Maro · 2018-12-31 · Original thread
I've put ML/NLP into production this year to automate away much of our ~1,000-person call center (successfully). I tried a lot of things along the way, and played around with scikit-learn (SKL) to see which features end up being useful. But what actually worked and is in production right now is fairly simple: tokenization, getting rid of stopwords, and a simple rule engine, where the rules come from a back-end ML job that does nothing fancier than medians/means/counts/ratios to find good rules. The fanciest library call I have is a fuzzy string match; I even got rid of SKL to reduce dependencies. It works very well, it's easy to understand and tune, and I can add exceptions, logging, etc. at each step.
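Roughly the shape of it, as a minimal sketch (the stopword list, rules, and routing labels here are made up for illustration; in practice the rules come out of the back-end job):

```python
# Sketch of the pipeline described above: tokenize, drop stopwords,
# then match hand-tunable rules against the tokens with fuzzy matching.
import re
from difflib import SequenceMatcher

STOPWORDS = {"the", "a", "an", "is", "are", "to", "my", "i", "please"}

# In production these rules are produced offline from counts/ratios over
# labelled tickets; hard-coded here purely for illustration.
RULES = [
    ({"reset", "password"}, "ROUTE_PASSWORD_RESET"),
    ({"cancel", "subscription"}, "ROUTE_CANCELLATION"),
]

def tokenize(text):
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def fuzzy_match(token, keyword, threshold=0.85):
    # Tolerates typos like "pasword" -> "password".
    return SequenceMatcher(None, token, keyword).ratio() >= threshold

def classify(text):
    tokens = tokenize(text)
    for keywords, label in RULES:
        if all(any(fuzzy_match(t, kw) for t in tokens) for kw in keywords):
            return label
    return "ROUTE_HUMAN_AGENT"  # fall through to a person

print(classify("Please reset my pasword"))  # ROUTE_PASSWORD_RESET
```

The point is that every step is plain Python you can log, test, and add exceptions to.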

When it comes to DL stuff, I think the most useful thing in production projects will be embeddings, and all the relatively simple things you can do once you have the word -> vector mapping. It's simple stuff; the new O'Reilly book 'Deep Learning Cookbook' [1] covers it in the first 4 chapters [2], and popular libraries have it baked in [3]. I think soon this will be as routine as making SQL calls in Django projects...
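For a sense of the "simple stuff", here's a sketch using gensim's pretrained GloVe vectors (the model name is my pick, not something from the links below):

```python
# Once you have word -> vector, nearest neighbours and similarity
# get you surprisingly far without any model training of your own.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads pretrained vectors on first use

# Words closest to "refund" in embedding space -- handy for expanding
# the keyword lists a rule engine matches on.
print(wv.most_similar("refund", topn=5))

# Pairwise similarity, e.g. to decide whether two ticket subjects mean the same thing.
print(wv.similarity("cancel", "terminate"))

# The classic analogy trick: king - man + woman ~ queen.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```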

[1] https://www.amazon.com/Deep-Learning-Cookbook-Practical-Reci...

[2] https://github.com/DOsinga/deep_learning_cookbook/blob/maste...

[3] https://pytorch.org/tutorials/beginner/nlp/word_embeddings_t...
