2 Comments
Neural Foundry

Impressive milestone getting full inference working on the 2-layer MLP. The way the unified buffer swap between layers handles the H transfer is really elegant, especially how it cycles quantized outputs back through without extra memory overhead. Back when I was experimenting with custom accelerators, managing that state transition between compute stages always felt clunky, so seeing this clean FSM approach for chaining layers to arbitrary depth is refreshing.
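
For readers skimming the thread, a minimal Python sketch of the buffer-swap idea the comment describes: two activation buffers alternate roles between layers, so each layer's quantized output becomes the next layer's input without an extra copy. The quantization scheme and every name here (quantize, mlp_inference, the toy weights) are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def quantize(x, bits=8):
    # Round and clamp to a signed fixed-point range (assumed scheme).
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.rint(x), lo, hi).astype(np.int32)

def mlp_inference(x, weights, biases):
    # Two buffers alternate roles: buf_a holds the current layer's input,
    # buf_b receives its output, then they swap before the next layer.
    buf_a = quantize(x)
    for W, b in zip(weights, biases):
        h = W @ buf_a + b                   # compute stage (matrix-vector product)
        buf_b = quantize(np.maximum(h, 0))  # ReLU, then requantize into the output buffer
        buf_a, buf_b = buf_b, buf_a         # swap: this layer's output feeds the next layer
    return buf_a

# Toy 2-layer MLP (4 -> 8 -> 3) with small integer weights.
rng = np.random.default_rng(0)
weights = [rng.integers(-3, 4, size=(8, 4)), rng.integers(-3, 4, size=(3, 8))]
biases = [np.zeros(8, dtype=np.int64), np.zeros(3, dtype=np.int64)]
print(mlp_inference(rng.integers(-5, 6, size=4), weights, biases))
```

In hardware the same effect comes from reusing one physical buffer rather than swapping Python references, but the role reversal per layer is the same idea.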

Alan Ma

Thanks for the kind words and support, it means a lot to us!!