Build a DeepSeek Model (From Scratch)
by Raj Abhijit Dandekar, Rajat Dandekar, Naman Dwivedi, Sreedath Pana
ISBN: 978-1638358121
Found in 2 comments on Hacker News
I have a BS in CS (and have been in the field for 25 years). I couldn't understand the transformer architecture until I built a few myself. Here are the books I worked through. I now feel I have a very good understanding of modern LLMs.

https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...

Not OP, but I worked through Sebastian Raschka's "Build a Large Language Model (From Scratch)" [0] and Raj Abhijit Dandekar's "Build a DeepSeek Model (From Scratch)" [1].

I don't think there is anything in a transformer I couldn't explain in the smallest detail now.

[0]: https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

[1]: https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...