Found in 2 comments on Hacker News
mindcrime · 2021-12-02
I don't even know if anybody has written a book specifically about search at "web scale" (no MongoDB jokes here, please). But about the closest things I know of would be something like:

https://www.amazon.com/Managing-Gigabytes-Compressing-Multim...

https://www.amazon.com/Information-Retrieval-Implementing-Ev...

https://www.amazon.com/Introduction-Information-Retrieval-Ch...

PaulHoule · 2021-06-18
This book has a nice treatment of that kind of compression:

https://www.amazon.com/Managing-Gigabytes-Compressing-Multim...

For instance, you might keep track of facts like

   the word "the" is contained in document 1
   the word "john" is contained in document 1
   the word "the" is contained in document 2
   ...
   the word "john" is contained in document 12

and you code the gaps between successive document numbers: the word "the" appears in every document, so its gap is always 1, but the gap for "john" is 11. A variable-length encoding uses fewer bits for smaller gaps, so you don't have to make "the" a stopword; you can afford to encode all of its postings.
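To make the gap idea concrete, here is a minimal Python sketch. The comment doesn't say which variable-length code it means; Elias gamma is used here as one assumed example because it spends a single bit on a gap of 1. The posting lists below are made up for illustration ("the" in documents 1 through 12, "john" only in documents 1 and 12), and the helper names `to_gaps`, `from_gaps`, `gamma_encode`, and `gamma_decode` are hypothetical.

```python
def to_gaps(doc_ids):
    """Turn a sorted list of document IDs into gaps; the first gap is measured from 0."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Rebuild document IDs by taking a running sum of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def gamma_encode(n):
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenated string of Elias gamma codes back into integers."""
    numbers, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers


# Hypothetical posting lists, for illustration only.
postings = {
    "the": list(range(1, 13)),
    "john": [1, 12],
}

for word, ids in postings.items():
    gaps = to_gaps(ids)
    bits = "".join(gamma_encode(g) for g in gaps)
    assert from_gaps(gamma_decode(bits)) == ids  # round trip is lossless
    print(f"{word}: gaps={gaps}, {len(bits)} bits")
```

Under these assumptions "the" costs 12 bits in total, one per posting, and "john" costs 8 bits (1 bit for the gap of 1, 7 bits for the gap of 11). Storing each document ID as a fixed 32-bit integer would cost 384 bits for "the" alone, which is why frequent words stop being a space problem once the gaps are coded.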