csvconf: talks: 14 rows where room = "Daisy Bingham Room" sorted by rowid

talks

14 rows where room = "Daisy Bingham Room" sorted by rowid

Link	rowid ▼	title	speaker	time	day	room	url	datetime	abstract	image
5	5	Frictionless Data Processing in the Wild	Amber D. York	10:30 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#amber-d-york	2019-05-08T10:30:00	Frictionless Data (FD) initiatives out of Open Knowledge International provide attractive informatics and processing capabilities. The BCO-DMO data repository used FD tools on real-world datasets, and we have some lessons learned to share. By building upon existing FD tools, we found ways to reduce the amount of time data managers spend generating metadata, and writing custom scripts. We are also developing ways for data managers with varying levels of scripting ability to make use of Frictionless Data tools.	https://csvconf.com/img/speakers-2019/adyork.jpg
8	8	What's Next after Notebooks?	Alexander Morley	11:00 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#alexander-morley	2019-05-08T11:00:00	Jane is a data scientist. Jane uses Jupyter notebooks as her working environment, and her presentation environment. These “computational essays” allow Jane to present her methods and her results to her colleagues at the same time. Jane is happy with this. But sometimes it’s difficult for Jane to share notebooks with her colleagues, and even harder for them to re-mix or re-use parts of the notebook, or to share their changes back to Jane. And sometimes Jane finds it hard to explain the flow of a particular notebook, or how different notebooks are tied together. There’s no provision for keeping things modular. First, I will discuss a few up-and-coming projects that are leveraging the power of new web technologies and faster browsers to solve all of fictional Jane’s problems, and more. Second, I will present a prototype for my own solution that is also web-based, and draws inspiration from some now-uncool graphical programming languages.	https://csvconf.com/img/speakers-2019/amorley.jpg
11	11	Bash <3's CSVs: Data Analysis on the cmdline	Nicholas Canzoneri	11:30 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#nicholas-canzoneri	2019-05-08T11:30:00	Your bash shell has a _lot_ of utilities that can be used to help you analyze your data, often more easily and quickly than importing your data into an external tool. But these utilities can be hard to find, and it can be even harder to figure out the right options. I'll walk through a data set and show examples of the best utility to use in different situations. I'll go over common commands like `grep` and `cut`, more exotic commands like `comm` and `tr`, and dig up very useful options to a command you might have overlooked, like `sort -k`.	https://csvconf.com/img/speakers-2019/ncanzoneri.jpg
16	16	Measurement Lab - Open Data on Global Internet Health	Chris Ritzo	1:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#chris-ritzo	2019-05-08T13:30:00	Measurement Lab (M-Lab) is the largest open internet measurement platform in the world, hosting internet-scale measurement experiments and releasing all data into the public domain (CC0). We are an open source project with contributors from civil society organizations, educational institutions, and private sector companies, and are a fiscally sponsored project of Code for Science & Society. Our mission is to measure the Internet, save the data, and make it universally accessible and useful. M-Lab works to advance network research and empower the public with useful information about broadband and mobile connections by maintaining a scalable, global platform for conducting internet measurements, and by supporting an ecosystem of external partners and users around the world interested in using the resulting open data. Our users are researchers, activists, analysts, journalists, experiment developers, hosting providers, regulators, municipalities, and everyday consumers. M-Lab works to enhance internet transparency and to help promote and sustain a healthy, innovative internet by supporting our users in their research and data analyses, developing and publicizing new use cases for our datasets, forming collaborative partnerships, and building open source measurement tools. In this talk we will introduce the M-Lab platform to the csvconf audience, share how our open data and open source tools are being used by communities around the world, and provide resources on how attendees might use them as well.	https://csvconf.com/img/speakers-2019/critzo.jpg
19	19	Hacking Open Data in Africa	Soila Kenya	2:00 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#soila-kenya	2019-05-08T14:00:00	This talk will cover the tips & tricks of community-sourcing for openAFRICA.net - the largest independent repository of open data on the African continent - used in order to digitise deadwood to give citizens actionable information. Data availability in many African countries is dismal. Files upon files of important government information lie gathering dust in abandoned storage rooms. On the other hand, journalists and citizens need this information to keep governments in check and ensure they are receiving the right services. So how do you turn paper-based government archives into machine-readable & API-accessible digital files?	https://csvconf.com/img/speakers-2019/skenya.jpg
22	22	US Energy Data Liberation	Zane Selvans	2:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#zane-selvans	2019-05-08T14:30:00	An alphabet soup of government agencies like FERC, EPA, EIA, PHMSA, MSHA and the ISOs and RTOs collect and publish terabytes of data about the US energy system. It includes operating costs and fuel consumption, hourly power output and GHG emissions, and the age and length of natural gas pipelines, the price of electricity every 5 minutes at thousands of nodes in the grid, coal production numbers and much much more. In theory all this data is public and freely available, but in practice it takes a lot of wrangling to make it usable for analysis. The result: it's packaged up by one or two platform monopolies that charge tens of thousands of dollars a year for easy access, excluding most non-corporate users. But for anyone interested in the ongoing transformation of our energy system and its climate impacts, this data is a treasure trove worth excavating. The Public Utility Data Liberation project (https://github.com/catalyst-cooperative/pudl) has been working for the last 2.5 years to liberate this data and make it freely accessible to activists, data journalists, and researchers working on US climate and energy policy. This talk will take a look at what the data is, where it comes from, why it's interesting, how we're processing it and making it available, and some of the challenges we're facing and opportunities we see ahead.	https://csvconf.com/img/speakers-2019/zselvans.jpg
26	26	Fundamentals of Research Software Sustainability	Daniel S. Katz	3:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#daniel-s-katz	2019-05-08T15:30:00	Software sustainability means different things to different groups of people, including the persistence of working software, and the persistence of people, or funding. While we can generally define sustainability as the state in which the inflow of resources is sufficient to do the needed work, where those resources both include and are somewhat transferrable into human effort, users, funders, managers, and developers (or maintainers) all mean somewhat different things when they use the word "sustainable" in the context of research software. This talk will illustrate some of these different views, and their corresponding aims. It will also provide some guidance on quantifying research software sustainability from some of these views.	https://csvconf.com/img/speakers-2019/dskatz.jpg
29	29	How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets	Evan Tachovsky	4:00 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#evan-tachovsky	2019-05-08T16:00:00	While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.	https://csvconf.com/img/speakers-2019/etachovsky.jpg
36	36	Should Real Estate Data be Open?	Andy Terrel	11:00 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#andy-terrel	2019-05-09T11:00:00	While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data is still closed. The MLS cites personal security as the primary reason. What data is being protected and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications.	https://csvconf.com/img/speakers-2019/aterrel.jpg
39	39	How to Build a Data-Driven Culture	Patrick McGarry	11:30 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#patrick-mcgarry	2019-05-09T11:30:00	The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.	https://csvconf.com/img/speakers-2019/pmcgarry.jpg
45	45	Datasette	Simon Willison	2:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#simon-willison	2019-05-09T14:00:00	Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool, csvs-to-sqlite, makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette	https://csvconf.com/img/speakers-2019/swillison.jpg
48	48	How open data can promote participatory democracy	Hector Dominguez	2:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#hector-dominguez	2019-05-09T14:30:00	In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.	https://csvconf.com/img/speakers-2019/hdominguez.jpg
52	52	Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure	Jose M Hernandez	3:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#jose-m-hernandez	2019-05-09T15:30:00	King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, there is a critical need to understand the prevalence and mechanisms of housing insecurity for government organizations tasked to address these issues. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact for their organizational practices and the communities they are tasked to serve. In this context, where there is an iterative and constant feedback loop, reproducibility of the results we present to them, from figures and tables to even written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and Git.	https://csvconf.com/img/speakers-2019/jmhernandez.jpg
55	55	Data Scavenger Hunts: Learning about Data Together	Ted Laderas	4:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#ted-laderas	2019-05-09T16:00:00	Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. In order to achieve this, we have developed an R package called `burro` that can enable public datasets to be explored together via a sharable web app. We describe our experience with using data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.	https://csvconf.com/img/speakers-2019/tladeras.jpg
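
The filter behind this listing (rows of `talks` where `room` is "Daisy Bingham Room", sorted by `rowid`) can be sketched in plain SQLite SQL. What follows is a minimal sketch using Python's built-in `sqlite3` module rather than Datasette itself. The in-memory table keeps only a few of the columns, three rows are copied from the listing above, and one placeholder row (not from the source) is there only so the filter has something to exclude. The helper name `talks_in_room` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the `talks` table; only a few columns are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE talks (title TEXT, speaker TEXT, room TEXT, datetime TEXT)"
)

rows = [
    # Three rows copied from the listing above (rowid, title, speaker, room, datetime).
    (45, "Datasette", "Simon Willison",
     "Daisy Bingham Room", "2019-05-09T14:00:00"),
    (5, "Frictionless Data Processing in the Wild", "Amber D. York",
     "Daisy Bingham Room", "2019-05-08T10:30:00"),
    (8, "What's Next after Notebooks?", "Alexander Morley",
     "Daisy Bingham Room", "2019-05-08T11:00:00"),
    # Placeholder row, NOT from the source: present only to show the filter working.
    (1000, "Placeholder talk", "Placeholder speaker",
     "Placeholder Room", "2019-05-08T09:00:00"),
]
conn.executemany(
    "INSERT INTO talks (rowid, title, speaker, room, datetime) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def talks_in_room(connection, room):
    """Return (rowid, title, speaker, datetime) tuples for one room, sorted by rowid."""
    cursor = connection.execute(
        "SELECT rowid, title, speaker, datetime "
        "FROM talks WHERE room = ? ORDER BY rowid",
        (room,),  # bound parameter, so the room name is never spliced into the SQL
    )
    return cursor.fetchall()


for row in talks_in_room(conn, "Daisy Bingham Room"):
    print(row)
```

Passing the room name as a bound parameter instead of formatting it into the query string keeps the sketch safe for arbitrary room names, including ones containing quotes.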

Link

rowid ▼

title

speaker

time

day

room

url

datetime

abstract

image

Frictionless Data Processing in the Wild

Amber D. York

10:30 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#amber-d-york

2019-05-08T10:30:00

Frictionless Data (FD) initiatives out of Open Knowledge International provide attractive informatics and processing capabilities. The BCO-DMO data repository used FD tools on real-world datasets, and we have some lessons learned to share. By building upon existing FD tools, we found ways to reduce the amount of time data managers spend generating metadata, and writing custom scripts. We are also developing ways for data managers with varying levels of scripting ability to make use of Frictionless Data tools.

https://csvconf.com/img/speakers-2019/adyork.jpg

What's Next after Notebooks?

Alexander Morley

11:00 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#alexander-morley

2019-05-08T11:00:00

Jane is a data scientist. Jane uses Jupyter notebooks as her working environment, and her presentation environment. These “computational essays” allow Jane to present her methods and her results to her colleagues at the same time. Jane is happy with this. But sometimes it’s difficult for Jane to share notebooks with her colleagues, and even harder for them to re-mix or re-use parts of the notebook, or to share their changes back to Jane. And sometimes Jane finds it hard to explain the flow of a particular notebook, or how different notebooks are tied together. There’s no provision for keeping things modular. First, I will discuss a few up-and-coming projects that are leveraging the power of new web technologies and faster browsers to solve all of fictional Jane’s problems, and more. Second, I will present a prototype for my own solution that is also web-based, and draws inspiration from some now-uncool graphical programming languages.

https://csvconf.com/img/speakers-2019/amorley.jpg

Bash <3's CSVs: Data Analysis on the cmdline

Nicholas Canzoneri

11:30 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#nicholas-canzoneri

2019-05-08T11:30:00

Your bash shell has a _lot_ utilities that can be used to help you analyze your data, often easier and faster than trying to import your data to an external tool. But these utilities can be hard to find and even harder to figure out the right options. I'll walkthrough a data set and show examples of the best utility to use in different situations. I'll go over common commands like `grep` and `cut`, more exotic commands like `comm` and `tr`, and dig up very useful options to a command you might have overlooked, like `sort -k`.

https://csvconf.com/img/speakers-2019/ncanzoneri.jpg

Measurement Lab - Open Data on Global Internet Health

Chris Ritzo

1:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#chris-ritzo

2019-05-08T13:30:00

Measurement Lab (M-Lab) is the largest open internet measurement platform in the world, hosting internet-scale measurement experiments and releasing all data into the public domain (CC0). We are an open source project with contributors from civil society organizations, educational institutions, and private sector companies, and are a fiscally sponsored project of Code for Science & Society. Our mission is to Measure the Internet, save the data, and make it universally accessible and useful. M-Lab works to advance network research and empowers the public with useful information about broadband and mobile connections by maintaining a scalable, global platform for conducting internet measurements, and by supporting an ecosystem of external partners and users around the world interested in using the resulting open data. Our users are researchers, activists, analysts, journalists, experiment developers, hosting providers, regulators, municipalities, and every day consumers. M-Lab works to enhance internet transparency, and help to promote and sustain a healthy, innovative internet by supporting our users in their research and data analyses, developing and publicizing new use cases for our datasets, forming collaborative partnerships, and building open source measurement tools. In this talk we will introduce the M-Lab platform with the csvconf audience, share how our open data and open source tools are being used by communities around the world, and provide resources on how attendees might use them as well.

https://csvconf.com/img/speakers-2019/critzo.jpg

Hacking Open Data in Africa

Soila Kenya

2:00 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#soila-kenya

2019-05-08T14:00:00

This talk will cover the tips & tricks of community-sourcing for openAFRICA.net - the largest independent repository of open data on the African continent - used in order to digitise deadwood to give citizens actionable information. Data availability in many African countries is dismal. Files upon files of important government information lay gathering dust in abandoned storage rooms. On the other hand, journalists and citizens need this information to keep governments in check and ensure they are receiving the right services. So how do you turn paper-based government archives into machine readable & API accessible digital files?

https://csvconf.com/img/speakers-2019/skenya.jpg

US Energy Data Liberation

Zane Selvans

2:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#zane-selvans

2019-05-08T14:30:00

An alphabet soup of government agencies like FERC, EPA, EIA, PHMSA, MSHA and the ISOs and RTOs collect and publish terabytes of data about the US energy system. It includes operating costs and fuel consumption, hourly power output and GHG emissions, and the age and length of natural gas pipelines, the price of electricity every 5 minutes at thousands of nodes in the grid, coal production numbers and much much more. In theory all this data is public and freely available, but in practice it takes a lot of wrangling to make it usable for analysis. The result: it's packaged up by one or two platform monopolies that charge tens of thousands of dollars a year for easy access, excluding most non-corporate users. But for anyone interested in the ongoing transformation of our energy system and its climate impacts, this data is a treasure trove worth excavating. The Public Utility Data Liberation project (https://github.com/catalyst-cooperative/pudl) has been working for the last 2.5 years to liberate this data and make it freely accessible to activists, data journalists, and researchers working on US climate and energy policy. This talk will take a look at what the data is, where it comes from, why it's interesting, how we're processing it and making it available, and some of the challenges we're facing and opportunities we see ahead.

https://csvconf.com/img/speakers-2019/zselvans.jpg

Fundamentals of Research Software Sustainability

Daniel S. Katz

3:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#daniel-s-katz

2019-05-08T15:30:00

Software sustainability means different things to different groups of people, including the persistence of working software, and the persistence of people, or funding. While we can generally define sustainability as the inflow of resources is sufficient to do the needed work, where those resources both include and are somewhat transferrable into human effort, users, funders, managers, and developers (or maintainers) all mean somewhat different things when they use sustainable in the context of research software. This talk will illustrate some of these different views, and their corresponding aims. It will also provide some guidance on quantifying research software sustainability from some of these views.

https://csvconf.com/img/speakers-2019/dskatz.jpg

How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets

Evan Tachovsky

4:00 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#evan-tachovsky

2019-05-08T16:00:00

While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.

https://csvconf.com/img/speakers-2019/etachovsky.jpg

Should Real Estate Data be Open?

Andy Terrel

11:00 AM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#andy-terrel

2019-05-09T11:00:00

While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data is still closed. The MLS quotes personal security as the primary reason. What data is being protected and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications.

https://csvconf.com/img/speakers-2019/aterrel.jpg

How to Build a Data-Driven Culture

Patrick McGarry

11:30 AM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#patrick-mcgarry

2019-05-09T11:30:00

The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.

https://csvconf.com/img/speakers-2019/pmcgarry.jpg

Datasette

Simon Willison

2:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#simon-willison

2019-05-09T14:00:00

Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette

https://csvconf.com/img/speakers-2019/swillison.jpg

How open data can promote participatory democracy

Hector Dominguez

2:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#hector-dominguez

2019-05-09T14:30:00

In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.

https://csvconf.com/img/speakers-2019/hdominguez.jpg

Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure

Jose M Hernandez

3:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#jose-m-hernandez

2019-05-09T15:30:00

King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, government organizations tasked with addressing these issues have a critical need to understand the prevalence and mechanisms of housing insecurity. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact on their organizational practices and the communities they are tasked to serve. In this context, with an iterative and constant feedback loop, reproducibility of the results we present to them, from figures and tables to written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure that produces reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and git.

https://csvconf.com/img/speakers-2019/jmhernandez.jpg

Data Scavenger Hunts: Learning about Data Together

Ted Laderas

4:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#ted-laderas

2019-05-09T16:00:00

Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. In order to achieve this, we have developed an R package called `burro` that can enable public datasets to be explored together via a sharable web app. We describe how we have used data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.

https://csvconf.com/img/speakers-2019/tladeras.jpg

CREATE TABLE [talks] ( [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT, [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT )
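
The view on this page (rows where room = "Daisy Bingham Room", sorted by rowid) can be approximated with Python's standard `sqlite3` module. The following is only a minimal sketch, not how Datasette itself runs the query: the schema is copied from the CREATE TABLE statement above, two rows reuse values listed on this page with their abstracts left empty for brevity, and the "Placeholder talk" row, the `build_db` helper, and the `talks_in_room` helper are hypothetical names introduced for illustration.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above.
SCHEMA = """
CREATE TABLE [talks] (
    [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT,
    [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT
)
"""

# Rows 1 and 3 reuse values from this page (abstracts left empty for brevity).
# Row 2 is a hypothetical placeholder in another room, added only so the
# room filter has something to exclude.
ROWS = [
    ("Fundamentals of Research Software Sustainability", "Daniel S. Katz",
     "3:30 PM", "May 8 2019", "Daisy Bingham Room",
     "https://csvconf.com/speakers/#daniel-s-katz", "2019-05-08T15:30:00",
     "", "https://csvconf.com/img/speakers-2019/dskatz.jpg"),
    ("Placeholder talk", "Example Speaker",
     "1:00 PM", "May 9 2019", "Example Room",
     "", "2019-05-09T13:00:00", "", ""),
    ("Datasette", "Simon Willison",
     "2:00 PM", "May 9 2019", "Daisy Bingham Room",
     "https://csvconf.com/speakers/#simon-willison", "2019-05-09T14:00:00",
     "", "https://csvconf.com/img/speakers-2019/swillison.jpg"),
]


def build_db():
    """Create an in-memory SQLite database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO talks VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def talks_in_room(conn, room):
    """Return (rowid, title, speaker, datetime) for one room, sorted by rowid."""
    sql = (
        "SELECT rowid, title, speaker, datetime FROM talks "
        "WHERE room = ? ORDER BY rowid"
    )
    # The room name is passed as a bound parameter, not pasted into the SQL.
    return conn.execute(sql, (room,)).fetchall()


conn = build_db()
rows = talks_in_room(conn, "Daisy Bingham Room")
for row in rows:
    print(row)
```

Passing the room name as a bound parameter avoids quoting problems with names that contain apostrophes or double quotes.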