29 How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets Evan Tachovsky 4:00 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#evan-tachovsky 2019-05-08T16:00:00 While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability. https://csvconf.com/img/speakers-2019/etachovsky.jpg
55 Data Scavenger Hunts: Learning about Data Together Ted Laderas 4:00 PM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#ted-laderas 2019-05-09T16:00:00 Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. To achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a shareable web app. We describe how we use data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have shared with each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data. https://csvconf.com/img/speakers-2019/tladeras.jpg

