Eli Bendersky's
Notes on implementing Attention
Some notes on implementing attention blocks in pure Python +
NumPy. The focus here is on the exact implementation in code, explaining all the
shapes throughout the process. The motivation for why attention works is not
covered here - there are plenty of excellent online resources explaining it. Several papers are mentioned …