csvconf: talks: 3 rows where day = "May 8 2019" and time = "4:00 PM" sorted by datetime

talks

3 rows where day = "May 8 2019" and time = "4:00 PM" sorted by datetime

Link	rowid	title	speaker	time	day	room	url	datetime ▼	abstract	image
27	27	Missing Data for Data - Our Quest to Clean Up Institutional Affiliations in Dryad	Daniella Lowenberg, Ted Habermann	4:00 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#daniella-lowenberg-ted-habermann	2019-05-08T16:00:00	Data publications and other scholarly outputs do not have clean information on institutional affiliations for researchers. This is caused by a mix of not asking researchers for this information up front, as well as incomplete metadata being submitted by repositories to DataCite and (publications to) Crossref. Without this standardized information we can't properly report on or provide statistics on deposits, usage metrics, or reach by institution. Join us for a session about our work using OpenRefine, organizational identifiers (ROR), and some manual sleuthing to update and improve Dryad institutional metadata for 25,000 data publications.	https://csvconf.com/img/speakers-2019/dlowenberg_thabermann.jpg
28	28	Where Has Your Data Come From? Data Ancestry and Other Tales	Dr. Tania Allard	4:00 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#dr-tania-allard	2019-05-08T16:00:00	Over the last few years, great improvements have been made around the areas of reproducible scientific computing research and FAIR (findable, accessible, interoperable and reusable) data. As a consequence, data scientists and researchers alike have started to incorporate modern software development practices in their workflows (i.e. version control, testing). More and more emphasis has been made on the need to look after the quality and validity of the software developed. But what about the data? Data validation and integrity is just as important as the adequacy of the code ingesting and processing the datasets. In this talk, I will take a high-level look at concepts such as data lineage, provenance, continuous data validation and present real-world examples in which these concepts have been applied to different real-world data pipelines increasing not only the confidence of the results obtained but also the efficiency and integrity of the workflows themselves.	https://csvconf.com/img/speakers-2019/tallard.jpg
29	29	How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets	Evan Tachovsky	4:00 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#evan-tachovsky	2019-05-08T16:00:00	While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.	https://csvconf.com/img/speakers-2019/etachovsky.jpg

CREATE TABLE [talks] (
   [title] TEXT,
   [speaker] TEXT,
   [time] TEXT,
   [day] TEXT,
   [room] TEXT,
   [url] TEXT,
   [datetime] TEXT,
   [abstract] TEXT,
   [image] TEXT
)
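
The filter in the page title (day = "May 8 2019", time = "4:00 PM", sorted by datetime) can be reproduced against this schema. The following is a minimal sketch using Python's standard sqlite3 module and an in-memory database, not the query the site itself runs. The function name talks_at is hypothetical, the abstract and image columns are left empty for brevity, and the tie-break on rowid is an assumption added so that talks sharing a datetime come back in a stable order.

```python
import sqlite3

# Schema as listed on the page; rowid is SQLite's implicit integer key.
SCHEMA = """
CREATE TABLE [talks] (
   [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT,
   [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT
)
"""

# The three rows shown in the table; abstract and image are omitted (NULL).
ROWS = [
    (27, "Missing Data for Data - Our Quest to Clean Up Institutional Affiliations in Dryad",
     "Daniella Lowenberg, Ted Habermann", "4:00 PM", "May 8 2019", "Main Sanctuary",
     "https://csvconf.com/speakers/#daniella-lowenberg-ted-habermann", "2019-05-08T16:00:00"),
    (28, "Where Has Your Data Come From? Data Ancestry and Other Tales",
     "Dr. Tania Allard", "4:00 PM", "May 8 2019", "Fuller Hall",
     "https://csvconf.com/speakers/#dr-tania-allard", "2019-05-08T16:00:00"),
    (29, "How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets",
     "Evan Tachovsky", "4:00 PM", "May 8 2019", "Daisy Bingham Room",
     "https://csvconf.com/speakers/#evan-tachovsky", "2019-05-08T16:00:00"),
]


def talks_at(conn, day, time):
    """Return (rowid, title, speaker, room) for every talk in one time slot."""
    sql = (
        "SELECT rowid, title, speaker, room FROM talks "
        "WHERE day = :day AND time = :time "
        "ORDER BY datetime, rowid"  # rowid tie-break is an assumption
    )
    return conn.execute(sql, {"day": day, "time": time}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO talks (rowid, title, speaker, time, day, room, url, datetime) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ROWS,
)

for rowid, title, speaker, room in talks_at(conn, "May 8 2019", "4:00 PM"):
    print(rowid, room, speaker, title, sep=" | ")
```

Because day and time are stored as TEXT, the comparison is an exact string match: "4:00 PM" matches, while "16:00" or "4:00 pm" would not. The ISO 8601 datetime column is what makes sorting work, since those strings sort chronologically as plain text.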