https://www.amazon.com/Managing-Gigabytes-Compressing-Multim...
https://www.amazon.com/Information-Retrieval-Implementing-Ev...
https://www.amazon.com/Introduction-Information-Retrieval-Ch...
https://www.amazon.com/Managing-Gigabytes-Compressing-Multim...
For instance, you might keep track of facts like
the word "the" is contained in document 1
the word "john" is contained in document 1
the word "the" is contained in document 2
...
the word "john" is contained in document 12
and you code the gaps between consecutive document numbers: the word "the" appears in every document, so its gap is always 1, while the gap for "john" is 11. With a variable-length encoding you use fewer bits for smaller gaps, so you don't have to make "the" a stopword, because you can afford to encode all of its postings.
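To make the gap idea concrete, here is a minimal sketch in Python. It assumes document numbers start at 1 and uses the Elias gamma code as the variable-length encoding; that choice, and the helper names (to_gaps, gamma_encode, encode_postings, and so on), are mine for illustration and not necessarily what any particular engine uses. Bits are kept as a string of '0'/'1' characters for readability, not efficiency.

```python
def to_gaps(doc_ids):
    """Turn a sorted list of document numbers into gaps between neighbours."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def gamma_encode(n):
    """Elias gamma code for n >= 1: (len - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes are only defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    numbers, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":      # count the unary length prefix
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers


def encode_postings(doc_ids):
    """Gap-encode a postings list and gamma-code each gap."""
    return "".join(gamma_encode(gap) for gap in to_gaps(doc_ids))


def decode_postings(bits):
    """Invert encode_postings: running sum over the decoded gaps."""
    doc_ids, total = [], 0
    for gap in gamma_decode(bits):
        total += gap
        doc_ids.append(total)
    return doc_ids


the_postings = list(range(1, 13))   # "the" is in documents 1..12
john_postings = [1, 12]             # "john" is in documents 1 and 12

print(to_gaps(john_postings))                # [1, 11]
print(len(encode_postings(the_postings)))    # 12 bits: one bit per posting
print(encode_postings(john_postings))        # 10001011
```

With this code a gap of 1 costs a single bit, so the postings for "the" cost one bit per document, while the rarer gap of 11 costs 7 bits.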
The former is basically a solved problem. Lucene/ElasticSearch and Google are using basically the same techniques, and you can read about them in Managing Gigabytes [1], which was first published over 2 decades ago. Google may be a generation or so ahead - they were working on a new system to take full advantage of SSDs (which turn out to be very good for search, because it's a very read-heavy workload) when I left, and I don't really know the details of it. But ElasticSearch is a perfectly adequate retrieval system, and it does basically the same stuff that Google's systems did circa 2013, and even does some stuff better than Google.
The really interesting work in search is in ranking functions, and this is where nobody comes close to Google. Some of this, as other commenters note, is because Google has more data than anyone else. Some of it is just because there have been more man-hours poured into it. IMHO, it's pretty doubtful that an open-source project could attract that sort of focused knowledge-work (trust me; it's pretty laborious) when Google will pay half a mil per year for skilled information-retrieval Ph.D.s.
[1] https://www.amazon.com/Managing-Gigabytes-Compressing-Multim...
[1] https://www.amazon.com/Managing-Gigabytes-Compressing-Indexi... - Note review from Peter Norvig!
http://www.amazon.com/Managing-Gigabytes-Compressing-Multime...
It is a nice book, but it might be a little bit outdated.
http://www.amazon.com/Managing-Gigabytes-Compressing-Multime...
[1] https://www.amazon.co.uk/Managing-Gigabytes-Compressing-Inde... (1999)