Why you need to improve your training data, and how to do it

Pete Warden's blog

sleep_lostPhoto by Lisha Li

Andrej Karpathy showed this slide as part of his talk at Train AI and I loved it! It captures the difference between deep learning research and production perfectly. Academic papers are almost entirely focused on new and improved models, with datasets usually chosen from a small set of public archives. Everyone I know who uses deep learning as part of an actual application spends most of their time worrying about the training data instead.

There are lots of good reasons why researchers are so fixated on model architectures, but it does mean that there are very few resources available to guide people who are focused on deploying machine learning in production. To address that, my talk at the conference was on “the unreasonable effectiveness of training data”, and I want to expand on that a bit in this blog post, explaining why data is so important…

View original post 3,917 more words

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.