5 Frictionless Data Processing in the Wild Amber D. York 10:30 AM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#amber-d-york 2019-05-08T10:30:00 Frictionless Data (FD) initiatives out of Open Knowledge International provide attractive informatics and processing capabilities. The BCO-DMO data repository used FD tools on real-world datasets, and we have some lessons learned to share. By building upon existing FD tools, we found ways to reduce the time data managers spend generating metadata and writing custom scripts. We are also developing ways for data managers with varying levels of scripting ability to make use of Frictionless Data tools. https://csvconf.com/img/speakers-2019/adyork.jpg
8 What's Next after Notebooks? Alexander Morley 11:00 AM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#alexander-morley 2019-05-08T11:00:00 Jane is a data scientist. Jane uses Jupyter notebooks as both her working environment and her presentation environment. These “computational essays” allow Jane to present her methods and her results to her colleagues at the same time. Jane is happy with this. But sometimes it’s difficult for Jane to share notebooks with her colleagues, and even harder for them to remix or reuse parts of a notebook, or to share their changes back with Jane. And sometimes Jane finds it hard to explain the flow of a particular notebook, or how different notebooks are tied together. There’s no provision for keeping things modular. First, I will discuss a few up-and-coming projects that leverage new web technologies and faster browsers to solve all of fictional Jane’s problems, and more. Second, I will present a prototype of my own solution, which is also web-based and draws inspiration from some now-uncool graphical programming languages. https://csvconf.com/img/speakers-2019/amorley.jpg
11 Bash <3's CSVs: Data Analysis on the cmdline Nicholas Canzoneri 11:30 AM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#nicholas-canzoneri 2019-05-08T11:30:00 Your bash shell has a _lot_ of utilities that can help you analyze your data, often more easily and quickly than importing your data into an external tool. But these utilities can be hard to find, and figuring out the right options can be even harder. I'll walk through a data set and show examples of the best utility to use in different situations. I'll go over common commands like `grep` and `cut`, more exotic commands like `comm` and `tr`, and dig up very useful options you might have overlooked, like `sort -k`. https://csvconf.com/img/speakers-2019/ncanzoneri.jpg
16 Measurement Lab - Open Data on Global Internet Health Chris Ritzo 1:30 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#chris-ritzo 2019-05-08T13:30:00 Measurement Lab (M-Lab) is the largest open internet measurement platform in the world, hosting internet-scale measurement experiments and releasing all data into the public domain (CC0). We are an open source project with contributors from civil society organizations, educational institutions, and private sector companies, and are a fiscally sponsored project of Code for Science & Society. Our mission is to measure the Internet, save the data, and make it universally accessible and useful. M-Lab works to advance network research and to empower the public with useful information about broadband and mobile connections by maintaining a scalable, global platform for conducting internet measurements, and by supporting an ecosystem of external partners and users around the world interested in using the resulting open data. Our users are researchers, activists, analysts, journalists, experiment developers, hosting providers, regulators, municipalities, and everyday consumers. M-Lab works to enhance internet transparency and to help promote and sustain a healthy, innovative internet by supporting our users in their research and data analyses, developing and publicizing new use cases for our datasets, forming collaborative partnerships, and building open source measurement tools. In this talk we will introduce the M-Lab platform to the csvconf audience, share how our open data and open source tools are being used by communities around the world, and provide resources on how attendees might use them as well. https://csvconf.com/img/speakers-2019/critzo.jpg
19 Hacking Open Data in Africa Soila Kenya 2:00 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#soila-kenya 2019-05-08T14:00:00 This talk will cover the community-sourcing tips and tricks that openAFRICA.net, the largest independent repository of open data on the African continent, uses to digitise deadwood (paper records) and give citizens actionable information. Data availability in many African countries is dismal. Files upon files of important government information lie gathering dust in abandoned storage rooms. Yet journalists and citizens need this information to keep governments in check and to ensure they are receiving the right services. So how do you turn paper-based government archives into machine-readable, API-accessible digital files? https://csvconf.com/img/speakers-2019/skenya.jpg
22 US Energy Data Liberation Zane Selvans 2:30 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#zane-selvans 2019-05-08T14:30:00 An alphabet soup of government agencies like FERC, EPA, EIA, PHMSA, MSHA and the ISOs and RTOs collect and publish terabytes of data about the US energy system. It includes operating costs and fuel consumption, hourly power output and GHG emissions, the age and length of natural gas pipelines, the price of electricity every 5 minutes at thousands of nodes in the grid, coal production numbers, and much more. In theory all this data is public and freely available, but in practice it takes a lot of wrangling to make it usable for analysis. The result: it's packaged up by one or two platform monopolies that charge tens of thousands of dollars a year for easy access, excluding most non-corporate users. But for anyone interested in the ongoing transformation of our energy system and its climate impacts, this data is a treasure trove worth excavating. The Public Utility Data Liberation project (https://github.com/catalyst-cooperative/pudl) has been working for the last 2.5 years to liberate this data and make it freely accessible to activists, data journalists, and researchers working on US climate and energy policy. This talk will take a look at what the data is, where it comes from, why it's interesting, how we're processing it and making it available, and some of the challenges we're facing and opportunities we see ahead. https://csvconf.com/img/speakers-2019/zselvans.jpg
26 Fundamentals of Research Software Sustainability Daniel S. Katz 3:30 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#daniel-s-katz 2019-05-08T15:30:00 Software sustainability means different things to different groups of people, including the persistence of working software and the persistence of people or funding. While we can generally define sustainability as the condition in which the inflow of resources is sufficient to do the needed work, where those resources both include and are somewhat transferable into human effort, users, funders, managers, and developers (or maintainers) all mean somewhat different things when they use "sustainable" in the context of research software. This talk will illustrate some of these different views and their corresponding aims. It will also provide some guidance on quantifying research software sustainability from some of these views. https://csvconf.com/img/speakers-2019/dskatz.jpg
29 How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets Evan Tachovsky 4:00 PM May 8 2019 Daisy Bingham Room https://csvconf.com/speakers/#evan-tachovsky 2019-05-08T16:00:00 While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability. https://csvconf.com/img/speakers-2019/etachovsky.jpg
36 Should Real Estate Data be Open? Andy Terrel 11:00 AM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#andy-terrel 2019-05-09T11:00:00 While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data itself is still closed. The MLS cites personal security as the primary reason. What data is being protected, and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications. https://csvconf.com/img/speakers-2019/aterrel.jpg
39 How to Build a Data-Driven Culture Patrick McGarry 11:30 AM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#patrick-mcgarry 2019-05-09T11:30:00 The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven. https://csvconf.com/img/speakers-2019/pmcgarry.jpg
45 Datasette Simon Willison 2:00 PM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#simon-willison 2019-05-09T14:00:00 Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool, csvs-to-sqlite, makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette https://csvconf.com/img/speakers-2019/swillison.jpg
48 How open data can promote participatory democracy Hector Dominguez 2:30 PM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#hector-dominguez 2019-05-09T14:30:00 In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities. https://csvconf.com/img/speakers-2019/hdominguez.jpg
52 Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure Jose M Hernandez 3:30 PM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#jose-m-hernandez 2019-05-09T15:30:00 King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, government organizations tasked with addressing these issues have a critical need to understand the prevalence and mechanisms of housing insecurity. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across King County housing and homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact on their organizational practices and the communities they are tasked to serve. In this context, where there is an iterative and constant feedback loop, the reproducibility of the results we present to them, from figures and tables to written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and git. https://csvconf.com/img/speakers-2019/jmhernandez.jpg
55 Data Scavenger Hunts: Learning about Data Together Ted Laderas 4:00 PM May 9 2019 Daisy Bingham Room https://csvconf.com/speakers/#ted-laderas 2019-05-09T16:00:00 Data exploration and visualization are a highly accessible gateway to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" that use web apps to democratize data science and make it accessible to a wide variety of audiences. To achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a sharable web app. We describe how we have used data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided, communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data. https://csvconf.com/img/speakers-2019/tladeras.jpg

