Pair it with Afra's book (http://www.amazon.com/Computing-Cambridge-Monographs-Computa...)
Some reading material:

A very general blog post about the philosophy: http://radar.oreilly.com/2015/07/data-has-a-shape.html
A slightly more in-depth blog post: https://shapeofdata.wordpress.com/2013/08/27/mapper-and-the-choice-of-scale/
A very accessible book about topology (especially from an algorithms perspective): http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&qid=1444971634&sr=8-1&keywords=topology+for+computing
Blog post explaining persistent homology: https://normaldeviate.wordpress.com/2012/07/01/topological-data-analysis/
Video explaining persistent homology: https://www.youtube.com/watch?v=CKfUzmznd9g

Some free software:

Python Mapper by Daniel Müllner: http://danifold.net/mapper/index.html
JPlex library by Harlan Sexton: http://www.math.colostate.edu/~adams/jplex/index.html
Dionysus by Dmitriy Morozov: http://www.mrzv.org/software/dionysus/
Topological Data Analysis in R: https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf

Infrastructure

Our tech stack is:

Backend:
HDFS for storage
Our ML and math code is hand-rolled C++ and Assembly (7% of LOC)
All coordination/distributed systems code is in Java
ZMQ for communication
Protocol Buffers for protocol

Frontend:
D3
Backbone
Hand-rolled WebGL graph visualization (we open-sourced it at https://github.com/ayasdi/grapher)

We currently don't use GPUs or any other fancy hardware, primarily because our customers today use commodity hardware, and getting F1000 companies to buy cutting-edge hardware is just plain horrible. We have an awesome GPU rig at our offices that we test algorithms on, and it can really make our algorithms scream, but again, none of our customers have GPUs or are willing to invest in them.

Apache Spark: in our experience, making it work for ML algorithms is really too much work unless you invest the time to understand the framework and its fundamentals. It performs very well for ETL-type tasks, which is what we use it for.
On a public offering: no comment :)

If you have more questions, I am easy to find :)

Gurjeet
http://www.amazon.com/Computing-Cambridge-Monographs-Computa...
Happy to help if you need it!
Under my elbow as I type is a small introductory book called Topology for Computing, which I’d recommend to anyone trying to investigate such links, for its concise clarity and elegant figures.
https://www.amazon.com/Computing-Cambridge-Monographs-Comput...