Anything that's not MNIST. Even Fashion-MNIST benefits from hyperparameter search. Distributed TensorFlow is already the standard inside Google and will be the standard for everybody else within a couple of years. Internally, engineers mostly just run Estimators, and distribution is largely transparent to them.
When hyperparam search is this easy with PySpark, why wouldn't you do it?
Here's a figure on the scale-out you can get on the DeepLearning11 server (cost: ~$15k). It delivers roughly 60-80% of the performance of a DGX-1 at one tenth of the price (for deep ConvNets and image classification, at least).
def model_fn(learning_rate, dropout):
    [TensorFlow code here]

args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
experiment.launch(spark, model_fn, args_dict)
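The args_dict defines a grid of hyperparameters. A launcher like this presumably enumerates the Cartesian product of the values and runs one trial per combination; here is a minimal sketch of that idea (the `launch_grid` helper and toy `model_fn` are hypothetical, and it runs serially so it works anywhere — on a cluster, each trial would instead be shipped to a Spark executor):

```python
from itertools import product

def launch_grid(model_fn, args_dict):
    """Run model_fn once per hyperparameter combination.
    Hypothetical helper mimicking an experiment launcher; serial here.
    With Spark, combos could be distributed via:
      spark.sparkContext.parallelize(combos).map(lambda c: model_fn(**c)).collect()
    """
    names = sorted(args_dict)
    combos = [dict(zip(names, values))
              for values in product(*(args_dict[n] for n in names))]
    return [(combo, model_fn(**combo)) for combo in combos]

# Toy model_fn: returns a fake "accuracy" so the grid enumeration is visible.
def model_fn(learning_rate, dropout):
    return 1.0 - learning_rate - 0.1 * dropout

args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
results = launch_grid(model_fn, args_dict)
print(len(results))  # 3 learning rates x 2 dropouts = 6 trials
```

The point is that the search space is declared once as a dict, and the launcher handles fan-out; nothing about model_fn itself needs to know it is being grid-searched.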
For distributed training, we favor Horovod: it requires minimal code changes and scales linearly on our DeepLearning11 servers: https://www.oreilly.com/ideas/distributed-tensorflow
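Horovod's core operation is an allreduce that averages gradients across all workers after each training step (the minimal code changes amount to initializing Horovod and wrapping the optimizer). As a toy illustration of what that averaging computes, here is a pure-Python sketch — no Horovod, MPI, or GPUs involved; the real library performs this with ring-allreduce over NCCL/MPI:

```python
def allreduce_average(per_worker_grads):
    """Average each gradient component across workers, element-wise --
    the result every worker receives after Horovod's allreduce step.
    Toy serial version for illustration only."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [sum(g[i] for g in per_worker_grads) / n_workers
            for i in range(n_params)]

# Three workers, each holding gradients for two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = allreduce_average(grads)
print(avg)  # [3.0, 4.0]
```

Because every worker ends up with the same averaged gradient, all model replicas stay in sync without a central parameter server, which is a large part of why the approach scales linearly.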