hereonout2 · 2023-12-14 · Original thread
Things like this give a good overview of the problems faced in productionising ML:

https://research.google/pubs/whats-your-ml-test-score-a-rubr...

Note they start to discuss things like unit testing, integration testing, processing pipelines, canary tests, rollbacks, etc. Sound familiar yet?
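
As a deliberately trivial sketch of what a "unit test" means for a model - the DummyModel below is a hypothetical stand-in for whatever you'd actually load, and the assertions just check output shape and probability range:

    import numpy as np

    class DummyModel:
        # Stand-in for a real model you'd load from a model repository.
        def predict(self, x):
            return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))  # sigmoid over row sums

    def test_output_shape_and_range():
        model = DummyModel()
        x = np.random.rand(8, 16).astype(np.float32)  # dummy batch of 8 rows
        probs = model.predict(x)
        assert probs.shape == (8,)                      # one score per row
        assert np.all((probs >= 0.0) & (probs <= 1.0))  # valid probabilities

Run it with pytest like any other test; the same pattern extends to invariance checks and comparisons against golden predictions.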

The same author has also written this book:

https://www.oreilly.com/library/view/reliable-machine-learni...

I don't see a software engineer's skills becoming redundant in this field, especially if you have a good level of experience in cloud infra and tooling. It seems more valuable than ever to me (e.g. I have worked with ML researchers who don't grasp HTTP, let alone could set up a fleet of servers to run their model developed entirely in a Jupyter notebook).
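
To make that gap concrete, here's a minimal sketch of wrapping a notebook-grown model in an HTTP service using FastAPI - the "model" here is a placeholder average, and a real service would load the trained artifact once at startup:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class PredictRequest(BaseModel):
        features: list[float]

    @app.post("/predict")
    def predict(req: PredictRequest):
        # Placeholder "model": in reality you'd call model.predict(req.features)
        score = sum(req.features) / max(len(req.features), 1)
        return {"score": score}

    # Run with: uvicorn serve:app --port 8000 (assuming this file is serve.py)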

I have found it helpful to acquaint myself with the right tools and terminology in order to speak the language - there are specific tools lots of people use, such as Weights & Biases for "Experiment Tracking"; terms like "Model Repository", which is just what it sounds like; "Vector Databases" (Elasticsearch has had this feature for years); and "Feature Stores", which feel familiar if you've worked with Bigtable-type databases.
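
For instance, the core of the Weights & Biases tracking API is just a few calls - the project name, config, and metric below are made up for illustration:

    import wandb

    run = wandb.init(project="demo-project", config={"lr": 1e-3, "epochs": 5})
    for epoch in range(run.config.epochs):
        train_loss = 1.0 / (epoch + 1)  # stand-in for a real training metric
        wandb.log({"epoch": epoch, "train_loss": train_loss})
    run.finish()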

Reading up on a typical use case like "RAG - Retrieval Augmented Generation" is a good idea - alongside starting to think about how you'd actually build and deploy one.
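
Here's a minimal sketch of the moving parts in a RAG setup - embed() is a fake stand-in for a real embedding model (consistent within a single run), and the in-memory matrix is where a vector database would sit:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Fake embedding: hash-seeded random unit vector. A real system would
        # call an embedding model here, e.g. via sentence-transformers.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(384)
        return v / np.linalg.norm(v)

    docs = ["Refunds take 5-7 days.", "Support is open 9-5 UTC.", "Plans renew monthly."]
    index = np.stack([embed(d) for d in docs])  # in production: a vector database

    def retrieve(query: str, k: int = 2) -> list[str]:
        scores = index @ embed(query)  # cosine similarity (vectors are unit-norm)
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    def build_prompt(query: str) -> str:
        context = "\n".join(retrieve(query))
        return f"Answer using this context:\n{context}\n\nQuestion: {query}"

    print(build_prompt("How long do refunds take?"))  # prompt you'd hand to an LLM

With the fake embeddings the retrieval here isn't actually semantic; swap in a real embedding model and a proper vector store to make it so.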

Above all, having a decent background in cloud infra and engineering, and knowing how to optimise systems and code for production deployment at scale, is very in demand at the moment.

Being the person who helps these teams of PhDs (many of whom have little industry experience) to productionise and deploy is where I am at right now - it feels like a fruitful place to be :)