Found in 2 comments on Hacker News
jamesblonde · 2025-11-19 · Original thread
Cloudflare tried to build their own feature store, and got a grade F.

I wrote a book on feature stores, published by O'Reilly. The bug from the bad query they wrote in ClickHouse could also have been caused by a different error: duplicate rows in materialized feature data. Hopsworks, for example, prevents duplicate rows by building on primary key uniqueness enforcement in Apache Hudi. In contrast, Delta Lake and Iceberg do not enforce primary key constraints, and neither does ClickHouse. So they could hit the same bug again due to a bug in feature ingestion, and given that they hacked together their feature store, it is not beyond the bounds of possibility.
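
To make the duplicate-row point concrete, here is a rough sketch of what a primary key check at ingestion time buys you. It is plain Python, not Hopsworks, Hudi, or ClickHouse code. The column names (`entity_id`, `event_time`, `value`) and the helpers `find_duplicate_keys` and `upsert` are made up for illustration.

```python
from collections import Counter


def find_duplicate_keys(rows, primary_key):
    """Return the primary key values that occur more than once in rows."""
    counts = Counter(tuple(row[col] for col in primary_key) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def upsert(table, rows, primary_key):
    """Write rows into table (a dict keyed by primary key).

    A later row with the same key overwrites the earlier one, so the
    table can never hold two rows for one key.
    """
    for row in rows:
        key = tuple(row[col] for col in primary_key)
        table[key] = row
    return table


# Hypothetical feature batch in which ingestion emitted one row twice.
batch = [
    {"entity_id": 1, "event_time": "2025-01-01T00:00", "value": 12.0},
    {"entity_id": 1, "event_time": "2025-01-01T00:00", "value": 12.0},
    {"entity_id": 2, "event_time": "2025-01-01T00:00", "value": 8.5},
]
primary_key = ("entity_id", "event_time")

duplicates = find_duplicate_keys(batch, primary_key)
table = upsert({}, batch, primary_key)

print(duplicates)   # keys that appeared more than once in the batch
print(len(table))   # number of rows after key-based upsert
```

Without the key-based write, the duplicated row lands in the table as-is and every downstream consumer has to cope with it.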

Reference: https://www.oreilly.com/library/view/building-machine-learni...

jamesblonde · 2025-07-04 · Original thread
There are 10k+ air sensors that publish their PM2.5 measurements every 10 minutes to https://aqicn.org/

In my forthcoming O'Reilly book, the first project is to build an ML model to predict air quality at the location of one of those sensors:

Book:

https://learning.oreilly.com/library/view/building-machine-l...

Code:

https://github.com/featurestorebook/mlfs-book/