54 |
Annotations in the Classroom; The Classroom in Annotations |
Asura Enkhbayar |
4:00 PM |
May 9 2019 |
Fuller Hall |
https://csvconf.com/speakers/#asura-enkhbayar |
2019-05-09T16:00:00 |
In this talk I want to explore the impact of using Hypothesis in the classroom. What does it mean to read, think, and annotate publicly? How does it change your learning experience as a student? How do you evaluate and assess different annotation styles as a teacher?
As a student I can share my own experience of this new mode of teaching and learning. As a data scientist, I want to give a taste of possible new metrics and measurements based on annotation data. Finally, as a critical scholar I am hoping to explore how this new metrification and monitoring of reading might affect education.
The talk will rely on data outlined in this essay: https://course-journals.lib.sfu.ca/index.php/pdc2018/article/view/240/213 |
https://csvconf.com/img/speakers-2019/aenkhbayar.jpg |
27 |
Missing Data for Data - Our Quest to Clean Up Institutional Affiliations in Dryad |
Daniella Lowenberg, Ted Habermann |
4:00 PM |
May 8 2019 |
Main Sanctuary |
https://csvconf.com/speakers/#daniella-lowenberg-ted-habermann |
2019-05-08T16:00:00 |
Data publications and other scholarly outputs do not have clean information on institutional affiliations for researchers. This is caused by a mix of not asking researchers for this information up front and incomplete metadata being submitted by repositories to DataCite and (publications to) Crossref. Without this standardized information we can't properly report on or provide statistics on deposits, usage metrics, or reach by institution. Join us for a session about our work using OpenRefine, organizational identifiers (ROR), and some manual sleuthing to update and improve Dryad institutional metadata for 25,000 data publications. |
https://csvconf.com/img/speakers-2019/dlowenberg_thabermann.jpg |
28 |
Where Has Your Data Come From? Data Ancestry and Other Tales |
Dr. Tania Allard |
4:00 PM |
May 8 2019 |
Fuller Hall |
https://csvconf.com/speakers/#dr-tania-allard |
2019-05-08T16:00:00 |
Over the last few years, great improvements have been made in the areas of reproducible scientific computing research and FAIR (findable, accessible, interoperable and reusable) data. As a consequence, data scientists and researchers alike have started to incorporate modern software development practices in their workflows (e.g. version control, testing). More and more emphasis has been placed on the need to look after the quality and validity of the software developed. But what about the data? Data validation and integrity are just as important as the adequacy of the code ingesting and processing the datasets.
In this talk, I will take a high-level look at concepts such as data lineage, provenance, and continuous data validation, and present examples in which these concepts have been applied to different real-world data pipelines, increasing not only the confidence in the results obtained but also the efficiency and integrity of the workflows themselves. |
https://csvconf.com/img/speakers-2019/tallard.jpg |
29 |
How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets |
Evan Tachovsky |
4:00 PM |
May 8 2019 |
Daisy Bingham Room |
https://csvconf.com/speakers/#evan-tachovsky |
2019-05-08T16:00:00 |
While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability. |
https://csvconf.com/img/speakers-2019/etachovsky.jpg |
53 |
Spanking and Spreadsheets: Data-driven Sex Journalism |
Jacqueline Nolis & Heather Nolis |
4:00 PM |
May 9 2019 |
Main Sanctuary |
https://csvconf.com/speakers/#jacqueline-nolis-heather-nolis |
2019-05-09T16:00:00 |
When we saw that the Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. But we had never written an article for a newspaper before, nor had we worked with data even remotely as dirty. It turns out that writing a good blog post or technical journal article is very different from writing for print, especially on such a sensitive topic. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).” |
https://csvconf.com/img/speakers-2019/jnolis_hnolis.jpg |
55 |
Data Scavenger Hunts: Learning about Data Together |
Ted Laderas |
4:00 PM |
May 9 2019 |
Daisy Bingham Room |
https://csvconf.com/speakers/#ted-laderas |
2019-05-09T16:00:00 |
Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. In order to achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a sharable web app. We describe how we have used data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data. |
https://csvconf.com/img/speakers-2019/tladeras.jpg |