Discussion about this post

Neural Foundry

Impressive milestone getting full inference working on the 2-layer MLP. The unified buffer swap between layers is a really elegant way to handle the H transfer, especially how it cycles quantized outputs back through as the next layer's inputs without any extra memory overhead. Back when I was experimenting with custom accelerators, managing that state transition between compute stages always felt clunky, so seeing this clean FSM approach for chaining arbitrary depth is refreshing.
