Found in 3 comments on Hacker News
jamesblonde · 2018-03-31 · Original thread
Anything that's not MNIST. Even Fashion-MNIST benefits from hyperparam search. Distributed TF is the standard inside Google and will be the standard for everybody else within a couple of years. Engineers there mostly just write Estimators, and distribution is largely transparent to them.

When hyperparam search is this easy with PySpark, why wouldn't you do it?

    def model_fn(learning_rate, dropout):
        # [TensorFlow code here]

    args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
    experiment.launch(spark, model_fn, args_dict)
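Here, experiment.launch looks like a helper from a library such as Hops; if you just want to see the idea, a rough sketch of the same grid-search fan-out with plain PySpark might look like the following (spark, model_fn, and the metric it returns are assumptions for illustration, not the library's actual implementation):

    # Hedged sketch: fan a hyperparameter grid out over Spark executors.
    # Assumes `spark` is an existing SparkSession and model_fn(learning_rate, dropout)
    # trains a model and returns a metric (both are placeholders here).
    from itertools import product

    def grid_search(spark, model_fn, args_dict):
        # Build the cross-product of all hyperparameter values.
        names = list(args_dict.keys())
        combos = [dict(zip(names, values)) for values in product(*args_dict.values())]
        # One task per combination; each executor runs one full training job.
        results = (spark.sparkContext
                        .parallelize(combos, numSlices=len(combos))
                        .map(lambda params: (params, model_fn(**params)))
                        .collect())
        return results

    args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
    # results = grid_search(spark, model_fn, args_dict)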

For distributed training, we favor Horovod. It requires minimal changes to code, and scales linearly on our DeepLearning11 servers: https://www.oreilly.com/ideas/distributed-tensorflow
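To give a sense of the "minimal changes" claim, this is roughly what the standard Horovod additions to TF1-style training code look like (the optimizer choice and everything outside the hvd.* calls are placeholders for illustration, not our actual training code):

    # Hedged sketch of the typical Horovod changes to TensorFlow (TF1-era) code.
    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()  # 1. initialize Horovod (one process per GPU)

    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())  # 2. pin each process to one GPU

    opt = tf.train.AdagradOptimizer(0.01 * hvd.size())  # 3. scale the learning rate by worker count
    opt = hvd.DistributedOptimizer(opt)                 # 4. wrap the optimizer to average gradients

    hooks = [hvd.BroadcastGlobalVariablesHook(0)]       # 5. sync initial weights from rank 0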

jamesblonde · 2017-12-25 · Original thread
If you want to know what Nvidia are afraid of, look at the last figure in this O'Reilly blog post on distributed TensorFlow.

https://www.oreilly.com/ideas/distributed-tensorflow

On a DeepLearning11 server (cost: $15K), you get about 60-75% of the DL training performance of a DGX-1 (cost: $150K).

jamesblonde · 2017-12-22 · Original thread
Here's a figure showing the scale-out you can get on the DeepLearning11 server (cost: $15K): it's about 60-80% of the performance of a DGX-1 for 1/10th of the price (for deep ConvNets and image classification, at least).

https://www.oreilly.com/ideas/distributed-tensorflow