https://www.oreilly.com/ideas/distributed-tensorflow
On a DeepLearning11 server (cost ~$15K), you get roughly 60-75% of the DL training performance of a DGX-1 (cost ~$150K).
When hyperparameter search is this easy with PySpark, why wouldn't you do it?
def model_fn(learning_rate, dropout):
    # [TensorFlow code here]

args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
experiment.launch(spark, model_fn, args_dict)
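The launcher presumably expands args_dict into the full cartesian grid of hyperparameter combinations and runs model_fn once per combination on the Spark executors. A minimal sketch of that expansion in plain Python (expand_grid is a hypothetical helper, not part of any library):

```python
from itertools import product

def expand_grid(args_dict):
    # Expand a dict of hyperparameter lists into the cartesian grid:
    # every combination of one value per key, as a list of dicts.
    keys = list(args_dict)
    return [dict(zip(keys, vals))
            for vals in product(*(args_dict[k] for k in keys))]

args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
grid = expand_grid(args_dict)
# 3 learning rates x 2 dropout values -> 6 trial configurations.
# With PySpark, each trial could then run in parallel, e.g.:
#   spark.sparkContext.parallelize(grid).map(lambda p: model_fn(**p)).collect()
```

Each dict in the grid maps directly onto model_fn's keyword arguments, which is why the trial function is written to take the hyperparameters as named parameters.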
For distributed training, we favor Horovod. It requires minimal changes to code, and scales linearly on our DeepLearning11 servers: https://www.oreilly.com/ideas/distributed-tensorflow