csvconf: talks: 57 rows sorted by abstract

talks

57 rows sorted by abstract

Link	rowid	title	speaker	time	day	room	url	datetime	abstract	image
1	1	Warm Breakfast Buffet / Espresso Cart / Hangout time		9:00 AM	May 8 2019	Main Sanctuary		2019-05-08T09:00:00
2	2	Intros / Hello in Main Sanctuary		10:00 AM	May 8 2019	Main Sanctuary		2019-05-08T10:00:00
12	12	Lunch in Fuller Hall		12:00 PM	May 8 2019	Main Sanctuary		2019-05-08T12:00:00
13	13	KEYNOTE	Dr. Kari L. Jordan	12:30 PM	May 8 2019	Main Sanctuary		2019-05-08T12:30:00
23	23	Break		3:00 PM	May 8 2019	Main Sanctuary		2019-05-08T15:00:00
30	30	KEYNOTE	Alix Dunn	4:30 PM	May 8 2019	Main Sanctuary		2019-05-08T16:30:00
31	31	Reception in Fuller Hall until 7pm		5:30 PM	May 8 2019	Main Sanctuary		2019-05-08T17:30:00
32	32	Warm Breakfast Buffet / Espresso Cart / Hangout time		9:00 AM	May 9 2019	Main Sanctuary		2019-05-09T09:00:00
33	33	KEYNOTE	Teon L. Brooks	10:00 AM	May 9 2019	Main Sanctuary		2019-05-09T10:00:00
40	40	Lunch in Fuller Hall		12:00 PM	May 9 2019	Main Sanctuary		2019-05-09T12:00:00
41	41	KEYNOTE	Kirstie Whitaker	12:30 PM	May 9 2019	Main Sanctuary		2019-05-09T12:30:00
42	42	Lightning Talks in The Main Sanctuary		1:30 PM	May 9 2019	Main Sanctuary		2019-05-09T13:30:00
49	49	Break		3:00 PM	May 9 2019	Main Sanctuary		2019-05-09T15:00:00
56	56	Outros/Goodbye in Main Sanctuary		4:30 PM	May 9 2019	Main Sanctuary		2019-05-09T16:30:00
57	57	5-6pm Hangout time		5:00 PM	May 9 2019	Main Sanctuary		2019-05-09T17:00:00
7	7	Let’s ROR together - building open research organization identifiers	Maria Gould	11:00 AM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#maria-gould	2019-05-08T11:00:00		https://csvconf.com/img/speakers-2019/comma.jpg
22	22	US Energy Data Liberation	Zane Selvans	2:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#zane-selvans	2019-05-08T14:30:00	An alphabet soup of government agencies like FERC, EPA, EIA, PHMSA, MSHA and the ISOs and RTOs collect and publish terabytes of data about the US energy system. It includes operating costs and fuel consumption, hourly power output and GHG emissions, and the age and length of natural gas pipelines, the price of electricity every 5 minutes at thousands of nodes in the grid, coal production numbers and much much more. In theory all this data is public and freely available, but in practice it takes a lot of wrangling to make it usable for analysis. The result: it's packaged up by one or two platform monopolies that charge tens of thousands of dollars a year for easy access, excluding most non-corporate users. But for anyone interested in the ongoing transformation of our energy system and its climate impacts, this data is a treasure trove worth excavating. The Public Utility Data Liberation project (https://github.com/catalyst-cooperative/pudl) has been working for the last 2.5 years to liberate this data and make it freely accessible to activists, data journalists, and researchers working on US climate and energy policy. This talk will take a look at what the data is, where it comes from, why it's interesting, how we're processing it and making it available, and some of the challenges we're facing and opportunities we see ahead.	https://csvconf.com/img/speakers-2019/zselvans.jpg
46	46	Beyond the WARC: Making Web Archives More Useful and User-friendly	Ilya Kreymer	2:30 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#ilya-kreymer	2019-05-09T14:30:00	Archives of the web contain not only web pages but any type of data. The only standard in web archiving is the ISO WARC file format, which specifies raw data captured from the web. However, the WARC files often lack any context or metadata about how this data was captured. The talk will briefly cover the basics of the WARC format, and also provide possible ideas for making web archiving data more user-friendly, present existing tools and suggest ideas for interoperable ways to describe collections and make sense of growing web archive data beyond the WARC format.	https://csvconf.com/img/speakers-2019/ikreymer.jpg
14	14	The n-of-many-ones: Fueling Community Science with Personal Data	Bastian Greshake Tzovaras	1:30 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#bastian-greshake-tzovaras	2019-05-08T13:30:00	As we are becoming more and more digitized, we are creating and collecting more personal data than ever before, offering unprecedented chances for research. This potential wealth of data for research comes with practical problems such as: How to merge data streams? And how can people responsibly share their personal information? In this talk we will explore how to enable responsible personal data sharing by giving individuals granular sharing options and how this can enable community science. Furthermore, we will also see how we can scale up personal data exploration from the n-of-one to an n-of-many-ones, using a JupyterHub setup built right into a community science platform.	https://csvconf.com/img/speakers-2019/bgtzovaras.jpg
38	38	Preparing Clients for Open Source Contributions	Aaron Couch	11:30 AM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#aaron-couch	2019-05-09T11:30:00	At CivicActions we've developed a number of methodologies to help enable our clients to be a part of the open source community. This talk will focus on a number of those strategies including capture management, project roles and tools, and reporting measures. This talk will be slightly shorter to allow for time for a more collaborative discussion.	https://csvconf.com/img/speakers-2019/acouch.jpg
55	55	Data Scavenger Hunts: Learning about Data Together	Ted Laderas	4:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#ted-laderas	2019-05-09T16:00:00	Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. In order to achieve this, we have developed an R package called `burro` that can enable public datasets to be explored together via a sharable web app. In this talk, we talk about our experience with using data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.	https://csvconf.com/img/speakers-2019/tladeras.jpg
27	27	Missing Data for Data - Our Quest to Clean Up Institutional Affiliations in Dryad	Daniella Lowenberg, Ted Habermann	4:00 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#daniella-lowenberg-ted-habermann	2019-05-08T16:00:00	Data publications and other scholarly outputs do not have clean information on institutional affiliations for researchers. This is caused by a mix of not asking researchers for this information up front, as well as incomplete metadata being submitted by repositories to DataCite and (publications to) Crossref. Without this standardized information we can't properly report on or provide statistics on deposits, usage metrics, or reach by institution. Join us for a session about our work using OpenRefine, organizational identifiers (ROR), and some manual sleuthing to update and improve Dryad institutional metadata for 25,000 data publications.	https://csvconf.com/img/speakers-2019/dlowenberg_thabermann.jpg
45	45	Datasette	Simon Willison	2:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#simon-willison	2019-05-09T14:00:00	Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette	https://csvconf.com/img/speakers-2019/swillison.jpg
3	3	The Time is Now	Afua Bruce	10:30 AM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#afua-bruce	2019-05-08T10:30:00	Despite the tech world’s image of being fast-moving and constantly evolving, segments of those working in, or wanting to work in, tech are often told to wait. It’s no secret that the tech and data worlds do not reflect the nation’s diversity. And for those of us working in Civic Tech or Public Interest Technology, the struggle to secure long-term funding for projects or identify career paths is real. What if we shifted our mindset from “with a lot of time and a lot of work, we’ll figure it out,” to “let’s experiment and incite change today.” The time is now to tackle the question: as the data-driven community matures, how does it do so in a way that’s inclusive and sustainable?	https://csvconf.com/img/speakers-2019/abruce.jpg
25	25	Building Communities of Practice around Environmental Open Data Science	Julia Lowndes	3:30 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#julia-lowndes	2019-05-08T15:30:00	Environmental scientists are a diverse community that ranges from climatologists to geneticists, but we are united by an enormous need to work efficiently with data – and by the fact that we seldom have formal computing or data analysis training of any kind. There is great opportunity to borrow from the work of software engineers and use collaborative open tools that facilitate better science in less time. However, a fundamental shift is needed in the environmental science community that prioritizes data science and provides emerging scientific leaders training in open science tools and practices to strengthen and accelerate their work. I will discuss my work to catalyze this shift through two programs I have developed and lead at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara. The first is the Ocean Health Index training program, which teaches international government and academic scientists how to channel the best available scientific information into marine policy using our scientific method and tools. And the second I have recently launched in January 2019 as a Mozilla Fellow: Openscapes, a mentorship program that empowers environmental scientists with open data science tools and grows the community of practice.	https://csvconf.com/img/speakers-2019/jlowndes.jpg
5	5	Frictionless Data Processing in the Wild	Amber D. York	10:30 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#amber-d-york	2019-05-08T10:30:00	Frictionless Data (FD) initiatives out of Open Knowledge International provide attractive informatics and processing capabilities. The BCO-DMO data repository used FD tools on real-world datasets, and we have some lessons learned to share. By building upon existing FD tools, we found ways to reduce the amount of time data managers spend generating metadata, and writing custom scripts. We are also developing ways for data managers with varying levels of scripting ability to make use of Frictionless Data tools.	https://csvconf.com/img/speakers-2019/adyork.jpg
10	10	Project Athena: Mapping African Militaries for Good	John Stupart	11:30 AM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#john-stupart	2019-05-08T11:30:00	I will discuss ADR's project aimed at creating a database mapping out, literally, where each and every significant item in a country's military is. Tanks, planes, barracks and the like are being categorised and placed in our custom-made "Athena" database. Aimed at pulling open the lid on arms flows into Africa, Athena is suited for journalists and those in the humanitarian or academic field alike working in anti-corruption and transparency as it pertains to defence and military affairs.	https://csvconf.com/img/speakers-2019/jstupart.jpg
47	47	How a File Format Led to a Crossword Scandal	Saul Pwanson	2:30 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#saul-pwanson	2019-05-09T14:30:00	In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk would cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.	https://csvconf.com/img/speakers-2019/spwanson.jpg
48	48	How open data can promote participatory democracy	Hector Dominguez	2:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#hector-dominguez	2019-05-09T14:30:00	In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.	https://csvconf.com/img/speakers-2019/hdominguez.jpg
54	54	Annotations in the Classroom; The Classroom in Annotations	Asura Enkhbayar	4:00 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#asura-enkhbayar	2019-05-09T16:00:00	In this talk I want to explore the impact of using Hypothesis in the classroom. What does it mean to read, think, and annotate publicly? How does it change your learning experience as a student? How do you evaluate and assess different annotation styles as a teacher? As a student I can share my own experience of this new mode of teaching and learning. As a data scientist, I want to give a taste of possible new metrics and measurements based on annotation data. Finally, as a critical scholar I am hoping to explore how this new metrification and monitoring of reading might affect education. The talk will rely on data outlined in this essay: https://course-journals.lib.sfu.ca/index.php/pdc2018/article/view/240/213	https://csvconf.com/img/speakers-2019/aenkhbayar.jpg
43	43	Data Science Training and Community Building through Hackweeks	Micaela Parker	2:00 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#micaela-parker	2019-05-09T14:00:00	Informal training activities enable researchers at all levels to rapidly learn data science tools and best practices that fit their research questions and make significant advances in their work. In this talk, I will describe a highly successful informal training that has emerged in recent years called Hackweeks. These hackathon-style events place a strong focus on cultivating data science literacy, building a community of practice, and developing resources within an existing domain-specific community. By bringing together researchers from many different universities to address methods challenges within a research domain, Hackweeks take advantage of a shared language and shared scientific objectives. The Hackweek structure is designed to foster collaboration and learning among people from various stages of their career and technical abilities, and catalyze a community through a shared interest in solving computational challenges within a field (Huppenkothen et al, 2018). Hackweeks originally came out of the Astronomy community (Astro Hack Week, entering its 6th year in 2019) and the model has been successfully propagated to: neuroscience (Neurohackweek, now a 2-week NIH-funded program called Neurohackademy), geospatial sciences (Geohackweek), oceanography (Oceanhackweek), and more.	https://csvconf.com/img/speakers-2019/mparker.jpg
8	8	What's Next after Notebooks?	Alexander Morley	11:00 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#alexander-morley	2019-05-08T11:00:00	Jane is a data scientist. Jane uses Jupyter notebooks as her working environment, and her presentation environment. These “computational essays” allow Jane to present her methods and her results to her colleagues at the same time. Jane is happy with this. But sometimes it’s difficult for Jane to share notebooks with her colleagues, and even harder for them to re-mix or re-use parts of the notebook, or to share their changes back to Jane. And sometimes Jane finds it hard to explain the flow of a particular notebook, or how different notebooks are tied together. There’s no provision for keeping things modular. First, I will discuss a few up-and-coming projects that are leveraging the power of new web technologies and faster browsers to solve all of fictional Jane’s problems, and more. Second, I will present a prototype for my own solution that is also web-based, and draws inspiration from some now-uncool graphical programming languages.	https://csvconf.com/img/speakers-2019/amorley.jpg
44	44	Crafting Data-Driven Stories for the Everyday Reader	Marisa Aquilina	2:00 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#marisa-aquilina	2019-05-09T14:00:00	Journalists don’t write for other journalists—they write for the curious and community-minded public. In the same way, statistical journalism should not be a black box of visualizations and narrative meant only for data makers like us. Crafting data-driven stories for a general audience means giving readers an opportunity to interact with a fun and practical use case while explaining the interpretative thinking that lies under the hood of statistical methods. I am an undergraduate at Cal Poly that writes and builds interactive, data-driven publishings with a team of students. I'll walk you through how we ideate fascinating questions, make methods explainable, and use Jupyter Notebooks to share reproducible code.	https://csvconf.com/img/speakers-2019/maquilina.jpg
52	52	Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure	Jose M Hernandez	3:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#jose-m-hernandez	2019-05-09T15:30:00	King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, there is a critical need to understand the prevalence and mechanisms of housing insecurity for government organizations tasked to address these issues. Currently, our team of Data and social scientists at the University of Washington, eScience Institute are collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact for their organizational practices and the communities they are tasked to serve. In this context and where there is an iterative and constant feedback loop present, reproducibility of the results we present to them, from figures, tables, and even written language is critical. To ensure a successful collaboration, our team has built an end to end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/Rstudio, Python, Rmarkdown, and git.	https://csvconf.com/img/speakers-2019/jmhernandez.jpg
16	16	Measurement Lab - Open Data on Global Internet Health	Chris Ritzo	1:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#chris-ritzo	2019-05-08T13:30:00	Measurement Lab (M-Lab) is the largest open internet measurement platform in the world, hosting internet-scale measurement experiments and releasing all data into the public domain (CC0). We are an open source project with contributors from civil society organizations, educational institutions, and private sector companies, and are a fiscally sponsored project of Code for Science & Society. Our mission is to Measure the Internet, save the data, and make it universally accessible and useful. M-Lab works to advance network research and empowers the public with useful information about broadband and mobile connections by maintaining a scalable, global platform for conducting internet measurements, and by supporting an ecosystem of external partners and users around the world interested in using the resulting open data. Our users are researchers, activists, analysts, journalists, experiment developers, hosting providers, regulators, municipalities, and everyday consumers. M-Lab works to enhance internet transparency, and help to promote and sustain a healthy, innovative internet by supporting our users in their research and data analyses, developing and publicizing new use cases for our datasets, forming collaborative partnerships, and building open source measurement tools. In this talk we will introduce the M-Lab platform to the csvconf audience, share how our open data and open source tools are being used by communities around the world, and provide resources on how attendees might use them as well.	https://csvconf.com/img/speakers-2019/critzo.jpg
37	37	Improving the Quality of Neuroimaging Scans	Jonathan Uriarte-Lopez	11:30 AM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#jonathan-uriarte-lopez	2019-05-09T11:30:00	My presentation will be on how adjustments to the human connectome project (HCP) pipeline, with the use of the advanced normalization tools (ANTS), improved the data quality of neuroimaging scans provided by the Autism Brain Imaging Data Exchange (ABIDE). Autism spectrum disorder (ASD) is a neurodevelopmental disorder consisting of altered social and communication difficulties along with repetitive and restrictive behaviors. It is difficult to study a living brain safely which is why we use neuroimaging techniques such as MRI. Data quality can be affected by subjects moving in the scanner, or due to computing pipeline issues. Adjustments to the HCP pipeline led to an increase in data quality, and a decrease in the amount of data lost. This will save researchers time, money, and data to study the neurophysiological aspects of ASD.	https://csvconf.com/img/speakers-2019/julopez.jpg
51	51	Squishy Amoeba-Like Objects	Darius Kazemi	3:30 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#darius-kazemi	2019-05-09T15:30:00	On June 19th, 1970, a group of computer scientists who were inventing the internet referred to key pieces of its proposed design as "squishy amoeba-like objects". Amoebas are porous yet have well-defined boundaries. Thinking about these creatures gives us new ways to think about networks and communities and technology. This talk makes a case for the squishy amoeba-like object as an organizing principle for what is broadly being called "the decentralized web", a web outside of monolithic, monopolistic actors.	https://csvconf.com/img/speakers-2019/dkazemi.jpg
28	28	Where Has Your Data Come From? Data Ancestry and Other Tales	Dr. Tania Allard	4:00 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#dr-tania-allard	2019-05-08T16:00:00	Over the last few years, great improvements have been made around the areas of reproducible scientific computing research and FAIR (findable, accessible, interoperable and reusable) data. As a consequence, data scientists and researchers alike have started to incorporate modern software development practices in their workflows (i.e. version control, testing). More and more emphasis has been made on the need to look after the quality and validity of the software developed. But what about the data? Data validation and integrity is just as important as the adequacy of the code ingesting and processing the datasets. In this talk, I will take a high-level look at concepts such as data lineage, provenance, continuous data validation and present real-world examples in which these concepts have been applied to different real-world data pipelines increasing not only the confidence of the results obtained but also the efficiency and integrity of the workflows themselves.	https://csvconf.com/img/speakers-2019/tallard.jpg
21	21	Using Research and Technology to Tackle Gender Bias	Mollie Marr	2:30 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#mollie-marr	2019-05-08T14:30:00	Qualitative and mixed methods research studies have provided insight into the language and patterns associated with unconscious bias across multiple fields. These patterns can be converted to rules for NLP and other text analysis programs making it possible to identify bias within a written document. This talk will explore one approach using qualitative research in gender bias and letters of recommendation and evaluation to define rules for a web-based automated text analysis program using NLP. The role of research and technology in addressing structural issues such as bias will be discussed and participants will be encouraged to think about ways in which existing research might be used to inspire new solutions to social problems.	https://csvconf.com/img/speakers-2019/mmarr.jpg
24	24	Qualitative Research Using Open Source Tools	Beth Duckles & Vicky Steeves	3:30 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#beth-duckles-vicky-steeves	2019-05-08T15:30:00	Qualitative research has long suffered from a lack of free tools for analysis, leaving no options for researchers without significant funds for software licenses. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools out there: qcoder (R package) and Taguette (desktop application). Drawing from the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility means for qualitative research, and how the two tools we've built facilitate equitable, open sharing.	https://csvconf.com/img/speakers-2019/bduckles_vsteeves.jpg
9	9	Chromatocracy: The Pantone® of Mexican Social Mobility.	Adrian Santuario Hernández	11:30 AM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#adrian-santuario-hernández	2019-05-08T11:30:00	Skin colour ratings have been used in several studies about racial discrimination and racial attitudes but have rarely been used in Mexico. Although the Mexican Constitution has since 1917 established the legal equality of citizens without distinction as to race, sex, language or religion, it is common to see discrimination based on skin colour in workplaces, educational facilities and government offices. Although the latest surveys led by INEGI (National Institute of Geography and Statistics) show signs of racial discrimination in their reports, a glance at the map will suffice to see clearly that skin colour is an important issue for Mexican social mobility. For example: 95% of the presenters on Mexican TV shows have a skin tone of 1-3 (based on the PERLA colour palette) while 85% of the total Mexican population have a skin tone of 5-7; that gap in tones generates an aspirational sentiment of status: whiter is better. To support that correlation between skin colour and social mobility I developed a web scraping, machine learning and facial recognition algorithm to answer two questions: Who is more successful in Mexico? (95 percent of CEOs tend to have whiter skin (1-3 PERLA) than the rest) and, Is there a correlation between your political affiliation and your skin tone? (The right-wing party (PVEM) is whiter than the left-wing party (PRD)). The work demonstrates how technology (a machine learning colour algorithm) can help to unravel hidden social dynamics in Mexican culture.	https://csvconf.com/img/speakers-2019/ashernández.jpg
26	26	Fundamentals of Research Software Sustainability	Daniel S. Katz	3:30 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#daniel-s-katz	2019-05-08T15:30:00	Software sustainability means different things to different groups of people, including the persistence of working software, and the persistence of people, or funding. While we can generally define sustainability as the condition in which the inflow of resources is sufficient to do the needed work, where those resources both include and are somewhat transferrable into human effort, users, funders, managers, and developers (or maintainers) all mean somewhat different things when they use sustainable in the context of research software. This talk will illustrate some of these different views, and their corresponding aims. It will also provide some guidance on quantifying research software sustainability from some of these views.	https://csvconf.com/img/speakers-2019/dskatz.jpg
35	35	The Data to Policy Project: Using Data to Build More Equitable Communities	Melissa Mejia	11:00 AM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#melissa-mejia	2019-05-09T11:00:00	The Data to Policy Project (D2P) is an initiative creating meaningful learning experiences for students by using analysis of open data to generate equity and evidence-based policy proposals addressing local community needs. D2P is integrated into credit-bearing courses where students explore issues like policing and affordable housing in the Denver region. Over the course of a semester, students find, cite, clean, analyze, and visualize data to identify gaps or problems in policing or affordable housing, create policy proposals that address what they found, then create a research poster to communicate their findings. We encourage a critical approach to data literacy that questions the objectivity and neutrality of data and situate it in a socio-political context. The project culminates in a D2P Symposium where students present their research to their peers, faculty, staff, and community members. By focusing on student-initiated concerns and using real data to try and address them, D2P forms a connection between the courses students take and the communities they live in, increasing its meaning and impact. We also partner with local community organizations, governments, and nonprofits to identify and frame the research questions students explore. Our goal is to intentionally include community voices so that the research we work on is relevant, context-specific, and in the interest of the community it will impact. This presentation will communicate the challenges and benefits of this kind of work, how it can be replicated in other contexts, and invite feedback on how to improve the project.	https://csvconf.com/img/speakers-2019/mmejia.jpg
4	4	Digging for Urban and Civic Data in Eastern Europe	Gleb Kanunnikau	10:30 AM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#gleb-kanunnikau	2019-05-08T10:30:00	The Open Data and Open Knowledge movement in developing countries is constantly bumping against the paradoxical question: is there a way to open up data when the datasets aren't yet present? Where do you get the data to make your effort worthwhile and how do you scale to make the initiative relevant to the larger society? More importantly, do the data producers (the state bodies and the surveillance apparatus, social networks or municipal service providers) share the interest of publishing data for everyone's and society's benefit? Is there a way to dig out interesting and relevant datasets that people didn't know existed to show new ways to analyze and fight urban problems, educate people about the environment, propose solutions to post-Soviet problems like unemployment, decaying public infrastructure, never-digitized cultural assets? Do we cooperate with the fashionable "Smart city" projects funded by Chinese state corporations or remain wary of the surveillance methods they introduce in cooperation with the police and other government structures? Practical cases, civic hacking and citizen data science, establishing cooperation between unlikely partners and other questions of interest for anyone interested in the community building process from scratch.	https://csvconf.com/img/speakers-2019/gkanunnikau.jpg
39	39	How to Build a Data-Driven Culture	Patrick McGarry	11:30 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#patrick-mcgarry	2019-05-09T11:30:00	The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.	https://csvconf.com/img/speakers-2019/pmcgarry.jpg
20	20	The Streets of Women. An Analysis of Street Nomenclature Data in Latin America and Spain through OpenStreetMap and Wikipedia	Selene Yang	2:30 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#selene-yang	2019-05-08T14:30:00	This is a collaborative project of Geochicas to produce a map of the streets named after women in Latin America and Spain. This project seeks to link and generate content in OSM and Wikipedia about prominent women. It is intended to make a survey of information on streets, avenues, passages, roads that have the names of women and also their respective biographies in Wikipedia.	https://csvconf.com/img/speakers-2019/syang.jpg
18	18	Data Analysis to Improve Diversity and Equity in Graduate-Level Education	Rachel Mallinga	2:00 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#rachel-mallinga	2019-05-08T14:00:00	This project grew from the need to determine what students of diverse backgrounds need to feel welcomed and represented in their graduate department at the University of Oregon. Two women of color took the initiative to conduct qualitative and quantitative research on how equity and diversity are represented in curriculum, services, and departmental resources. Based on our findings we researched resources on campus that address the problems identified in our data and best practices for graduate education implemented by similar graduate-level programs in Oregon. This talk illustrates how research methods can be used to inform institutional policies and practices to improve diversity and equity.	https://csvconf.com/img/speakers-2019/rmallinga.jpg
19	19	Hacking Open Data in Africa	Soila Kenya	2:00 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#soila-kenya	2019-05-08T14:00:00	This talk will cover the tips & tricks of community-sourcing for openAFRICA.net - the largest independent repository of open data on the African continent - used in order to digitise deadwood to give citizens actionable information. Data availability in many African countries is dismal. Files upon files of important government information lie gathering dust in abandoned storage rooms. On the other hand, journalists and citizens need this information to keep governments in check and ensure they are receiving the right services. So how do you turn paper-based government archives into machine readable & API accessible digital files?	https://csvconf.com/img/speakers-2019/skenya.jpg
34	34	Open Infrastructure for Open Science: How Binder Powers an Open Stack in the Cloud	Chris Holdgraf	11:00 AM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#chris-holdgraf	2019-05-09T11:00:00	This talk will discuss the Binder Project in the context of open data and open science, two primary use-cases that have driven the project. It will cover the basics of the Binder Project, such as how to define a reproducible repository to share with others. It will then discuss one of Binder's core goals, which is to build on open standards to facilitate the use of many open languages, interfaces, etc. Finally I'll discuss how BinderHub, the technology behind a Binder deployment, is itself open source and deployable anywhere. I'll finish by describing a goal of distributed, federated BinderHubs that provide a network of reproducible data analytics environments.	https://csvconf.com/img/speakers-2019/choldgraf.jpg
6	6	Data & Social Justice	Dan Phiffer	11:00 AM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#dan-phiffer	2019-05-08T11:00:00	This talk will provide an overview of the course I'm currently teaching at Bennington College called Data & Social Justice. I'll outline some of the issues my students have been organizing around, as well as techniques they've developed for doing outreach, using data visualization to support their causes, and describing how I've supported their efforts through my own faculty activism. n.b., I'm only halfway through the semester, but there is already plenty of material for a talk.	https://csvconf.com/img/speakers-2019/dphiffer.jpg
17	17	Social Data: Invading Privacy or Creating Better Cities?	Gala Camacho	2:00 PM	May 8 2019	Main Sanctuary	https://csvconf.com/speakers/#gala-camacho	2019-05-08T14:00:00	Urban designers have long heralded the value of the public realm in creating stronger communities. Yet, their processes and decisions are based around data that is far removed from the community, outdated and/or based on surveys and feedback forums which are generally attended by the same group of people and which can be overtaken by lobbyists. If we want to create cities that place people at the centre, it is essential that we find data about what makes neighbourhoods connected and wholesome, neighbourhoods which provide safe spaces for their community to engage. Social data (data from social media, crowdsourcing, mapping platforms, review apps, etc) can give us an opportunity to understand how people engage in their communities and interact with the places around them. It can be used to provide insights into the social health of local places and identify vulnerabilities, to feel the heartbeat of the neighbourhood. I will talk about what social data is, some of the challenges of getting it and collating it, the data's strengths and weaknesses, and how we are trying to use it to make cities more socially connected.	https://csvconf.com/img/speakers-2019/gcamacho.jpg
50	50	A Love Letter to the Boxplot	Melissa Santos	3:30 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#melissa-santos	2019-05-09T15:30:00	We'll briefly cover what the boxplot is, why it's so great to look at distributions instead of single statistics, and common boxplot variations. I'll spend at least half the talk showing boxplots of real data and comparing them to other summary methods. The talk will wrap up with some quick info on how to create boxplots in many common charting/statistics/BI tools. I hope this talk will make people more likely to use my favorite chart!	https://csvconf.com/img/speakers-2019/msantos.jpg
53	53	Spanking and Spreadsheets: Data-driven Sex Journalism	Jacqueline Nolis & Heather Nolis	4:00 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#jacqueline-nolis-heather-nolis	2019-05-09T16:00:00	When we saw that the Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. But we had never written an article for a newspaper before—nor had we worked with data even remotely as dirty. It turns out what makes for a good blog post or technical journal is very different than writing for print, especially for such a sensitive topic. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).”	https://csvconf.com/img/speakers-2019/jnolis_hnolis.jpg
36	36	Should Real Estate Data be Open?	Andy Terrel	11:00 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#andy-terrel	2019-05-09T11:00:00	While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data is still closed. The MLS quotes personal security as the primary reason. What data is being protected and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications.	https://csvconf.com/img/speakers-2019/aterrel.jpg
29	29	How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets	Evan Tachovsky	4:00 PM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#evan-tachovsky	2019-05-08T16:00:00	While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.	https://csvconf.com/img/speakers-2019/etachovsky.jpg
15	15	Lessons Learned: Creating Space for Inclusive Practices in Academia	Antoinette Foster & Lucille Moore	1:30 PM	May 8 2019	Fuller Hall	https://csvconf.com/speakers/#antoinette-foster-lucille-moore	2019-05-08T13:30:00	With the advent of big data, many people are beginning to explore fighting social inequity and structural systems of oppression with data in order to (1) define the problem and (2) effect changes in policy. We are learning that, for the most part, much of the data around these issues don’t exist, which largely reinforces systems of oppression. At Oregon Health & Science University in Portland, OR, a group of people have come together to focus on the lack of representation of historically underrepresented minorities (URM) in science as well as the lack of inclusive culture within OHSU’s graduate programs. Our group is called the Alliance for Visible Diversity in Science (AVDS). We found that data on a variety of topics, e.g. statistics on the number of URM graduate students that are interviewed/accepted/decide to matriculate, and well-designed climate surveys to assess the culture of inclusivity are lacking. This leads to decision-making and policies based on incomplete data that disproportionately hurt already vulnerable populations. For example, many programs require that applicants report their score on a standardized test called the Graduate Record Examination (GRE) despite the fact that research shows that GRE scores are more highly correlated with socioeconomic status than student success. We would like to share what we have learned through the process of forming AVDS: our successes, our challenges, and the imperative idea that we must in part approach social inequity issues with scientific and data-driven approaches.	https://csvconf.com/img/speakers-2019/afoster_lmoore.jpg
11	11	Bash <3's CSVs: Data Analysis on the cmdline	Nicholas Canzoneri	11:30 AM	May 8 2019	Daisy Bingham Room	https://csvconf.com/speakers/#nicholas-canzoneri	2019-05-08T11:30:00	Your bash shell has a _lot_ of utilities that can be used to help you analyze your data, often easier and faster than trying to import your data to an external tool. But these utilities can be hard to find, and it can be even harder to figure out the right options. I'll walk through a data set and show examples of the best utility to use in different situations. I'll go over common commands like `grep` and `cut`, more exotic commands like `comm` and `tr`, and dig up very useful options to a command you might have overlooked, like `sort -k`.	https://csvconf.com/img/speakers-2019/ncanzoneri.jpg

Link

rowid

title

speaker

time

day

room

url

datetime

abstract ▼

image

Warm Breakfast Buffet / Espresso Cart / Hangout time

9:00 AM

May 8 2019

Main Sanctuary

2019-05-08T09:00:00

Intros / Hello in Main Sanctuary

10:00 AM

May 8 2019

Main Sanctuary

2019-05-08T10:00:00

Lunch in Fuller Hall

12:00 PM

May 8 2019

Main Sanctuary

2019-05-08T12:00:00

KEYNOTE

Dr. Kari L. Jordan

12:30 PM

May 8 2019

Main Sanctuary

2019-05-08T12:30:00

Break

3:00 PM

May 8 2019

Main Sanctuary

2019-05-08T15:00:00

KEYNOTE

Alix Dunn

4:30 PM

May 8 2019

Main Sanctuary

2019-05-08T16:30:00

Reception in Fuller Hall until 7pm

5:30 PM

May 8 2019

Main Sanctuary

2019-05-08T17:30:00

Warm Breakfast Buffet / Espresso Cart / Hangout time

9:00 AM

May 9 2019

Main Sanctuary

2019-05-09T09:00:00

KEYNOTE

Teon L. Brooks

10:00 AM

May 9 2019

Main Sanctuary

2019-05-09T10:00:00

Lunch in Fuller Hall

12:00 PM

May 9 2019

Main Sanctuary

2019-05-09T12:00:00

KEYNOTE

Kirstie Whitaker

12:30 PM

May 9 2019

Main Sanctuary

2019-05-09T12:30:00

Lightning Talks in The Main Sanctuary

1:30 PM

May 9 2019

Main Sanctuary

2019-05-09T13:30:00

Break

3:00 PM

May 9 2019

Main Sanctuary

2019-05-09T15:00:00

Outros/Goodbye in Main Sanctuary

4:30 PM

May 9 2019

Main Sanctuary

2019-05-09T16:30:00

5-6pm Hangout time

5:00 PM

May 9 2019

Main Sanctuary

2019-05-09T17:00:00

Let’s ROR together - building open research organization identifiers

Maria Gould

11:00 AM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#maria-gould

2019-05-08T11:00:00

https://csvconf.com/img/speakers-2019/comma.jpg

US Energy Data Liberation

Zane Selvans

2:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#zane-selvans

2019-05-08T14:30:00

An alphabet soup of government agencies like FERC, EPA, EIA, PHMSA, MSHA and the ISOs and RTOs collect and publish terabytes of data about the US energy system. It includes operating costs and fuel consumption, hourly power output and GHG emissions, and the age and length of natural gas pipelines, the price of electricity every 5 minutes at thousands of nodes in the grid, coal production numbers and much much more. In theory all this data is public and freely available, but in practice it takes a lot of wrangling to make it usable for analysis. The result: it's packaged up by one or two platform monopolies that charge tens of thousands of dollars a year for easy access, excluding most non-corporate users. But for anyone interested in the ongoing transformation of our energy system and its climate impacts, this data is a treasure trove worth excavating. The Public Utility Data Liberation project (https://github.com/catalyst-cooperative/pudl) has been working for the last 2.5 years to liberate this data and make it freely accessible to activists, data journalists, and researchers working on US climate and energy policy. This talk will take a look at what the data is, where it comes from, why it's interesting, how we're processing it and making it available, and some of the challenges we're facing and opportunities we see ahead.

https://csvconf.com/img/speakers-2019/zselvans.jpg

Beyond the WARC: Making Web Archives More Useful and User-friendly

Ilya Kreymer

2:30 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#ilya-kreymer

2019-05-09T14:30:00

Archives of the web contain not only web pages but any type of data. The only standard in web archiving is the ISO WARC file format, which specifies raw data captured from the web. However, the WARC files often lack any context or metadata about how this data was captured. The talk will briefly cover the basics of the WARC format, and also provide possible ideas for making web archiving data more user-friendly, present existing tools and suggest ideas for interoperable ways to describe collections and make sense of growing web archive data beyond the WARC format.

https://csvconf.com/img/speakers-2019/ikreymer.jpg

The n-of-many-ones: Fueling Community Science with Personal Data

Bastian Greshake Tzovaras

1:30 PM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#bastian-greshake-tzovaras

2019-05-08T13:30:00

As we are becoming more and more digitized, we are creating and collecting more personal data than ever before, offering unprecedented chances for research. With this potential wealth of data for research come practical problems such as: How to merge data streams? And how can people responsibly share their personal information? In this talk we will explore how to enable responsible personal data sharing by giving individuals granular sharing options and how this can enable community science. Furthermore, we will also see how we can scale up personal data exploration from the n-of-one to an n-of-many-ones, using a JupyterHub setup built right into a community science platform.

https://csvconf.com/img/speakers-2019/bgtzovaras.jpg

Preparing Clients for Open Source Contributions

Aaron Couch

11:30 AM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#aaron-couch

2019-05-09T11:30:00

At CivicActions we've developed a number of methodologies to help enable our clients to be a part of the open source community. This talk will focus on a number of those strategies including capture management, project roles and tools, and reporting measures. This talk will be slightly shorter to allow for time for a more collaborative discussion.

https://csvconf.com/img/speakers-2019/acouch.jpg

Data Scavenger Hunts: Learning about Data Together

Ted Laderas

4:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#ted-laderas

2019-05-09T16:00:00

Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. In order to achieve this, we have developed an R package called `burro` that can enable public datasets to be explored together via a sharable web app. We talk about our experience with using data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.

https://csvconf.com/img/speakers-2019/tladeras.jpg

Missing Data for Data - Our Quest to Clean Up Institutional Affiliations in Dryad

Daniella Lowenberg, Ted Habermann

4:00 PM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#daniella-lowenberg-ted-habermann

2019-05-08T16:00:00

Data publications and other scholarly outputs do not have clean information on institutional affiliations for researchers. This is caused by a mix of not asking researchers for this information up front, as well as incomplete metadata being submitted by repositories to DataCite and (publications to) Crossref. Without this standardized information we can't properly report on or provide statistics on deposits, usage metrics, or reach by institution. Join us for a session about our work using OpenRefine, organizational identifiers (ROR), and some manual sleuthing to update and improve Dryad institutional metadata for 25,000 data publications.

https://csvconf.com/img/speakers-2019/dlowenberg_thabermann.jpg

Datasette

Simon Willison

2:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#simon-willison

2019-05-09T14:00:00

Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette

https://csvconf.com/img/speakers-2019/swillison.jpg

The Time is Now

Afua Bruce

10:30 AM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#afua-bruce

2019-05-08T10:30:00

Despite the tech world’s image of being fast-moving and constantly evolving, segments of those working in, or wanting to work in, tech are often told to wait. It’s no secret that the tech and data worlds do not reflect the nation’s diversity. And for those of us working in Civic Tech or Public Interest Technology, the struggle to secure long-term funding for projects or identify career paths is real. What if we shifted our mindset from “with a lot of time and a lot of work, we’ll figure it out,” to “let’s experiment and incite change today”? The time is now to tackle the question: as the data-driven community matures, how does it do so in a way that’s inclusive and sustainable?

https://csvconf.com/img/speakers-2019/abruce.jpg

Building Communities of Practice around Environmental Open Data Science

Julia Lowndes

3:30 PM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#julia-lowndes

2019-05-08T15:30:00

Environmental scientists are a diverse community that ranges from climatologists to geneticists, but we are united by an enormous need to work efficiently with data – and by the fact that we seldom have formal computing or data analysis training of any kind. There is great opportunity to borrow from the work of software engineers and use collaborative open tools that facilitate better science in less time. However, a fundamental shift is needed in the environmental science community that prioritizes data science and provides emerging scientific leaders training in open science tools and practices to strengthen and accelerate their work. I will discuss my work to catalyze this shift through two programs I have developed and lead at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara. The first is the Ocean Health Index training program, which teaches international government and academic scientists how to channel the best available scientific information into marine policy using our scientific method and tools. And the second I have recently launched in January 2019 as a Mozilla Fellow: Openscapes, a mentorship program that empowers environmental scientists with open data science tools and grows the community of practice.

https://csvconf.com/img/speakers-2019/jlowndes.jpg

Frictionless Data Processing in the Wild

Amber D. York

10:30 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#amber-d-york

2019-05-08T10:30:00

Frictionless Data (FD) initiatives out of Open Knowledge International provide attractive informatics and processing capabilities. The BCO-DMO data repository used FD tools on real-world datasets, and we have some lessons learned to share. By building upon existing FD tools, we found ways to reduce the amount of time data managers spend generating metadata, and writing custom scripts. We are also developing ways for data managers with varying levels of scripting ability to make use of Frictionless Data tools.

https://csvconf.com/img/speakers-2019/adyork.jpg

Project Athena: Mapping African Militaries for Good

John Stupart

11:30 AM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#john-stupart

2019-05-08T11:30:00

I will discuss ADR's project aimed at creating a database mapping out, literally, where each and every significant item in a country's military is. Tanks, planes, barracks and the like are being categorised and placed in our custom-made "Athena" database. Aimed at pulling open the lid on arms flows into Africa, Athena is suited for journalists and those in the humanitarian or academic field alike working in anti-corruption and transparency as it pertains to defence and military affairs.

https://csvconf.com/img/speakers-2019/jstupart.jpg

How a File Format Led to a Crossword Scandal

Saul Pwanson

2:30 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#saul-pwanson

2019-05-09T14:30:00

In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk would cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.

https://csvconf.com/img/speakers-2019/spwanson.jpg

How open data can promote participatory democracy

Hector Dominguez

2:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#hector-dominguez

2019-05-09T14:30:00

In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.

https://csvconf.com/img/speakers-2019/hdominguez.jpg

Annotations in the Classroom; The Classroom in Annotations

Asura Enkhbayar

4:00 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#asura-enkhbayar

2019-05-09T16:00:00

In this talk I want to explore the impact of using Hypothesis in the classroom. What does it mean to read, think, and annotate publicly? How does it change your learning experience as a student? How do you evaluate and assess different annotation styles as a teacher? As a student I can share my own experience of this new mode of teaching and learning. As a data scientist, I want to give a taste of possible new metrics and measurements based on annotation data. Finally, as a critical scholar I am hoping to explore how this new metrification and monitoring of reading might affect education. The talk will rely on data outlined in this essay: https://course-journals.lib.sfu.ca/index.php/pdc2018/article/view/240/213

https://csvconf.com/img/speakers-2019/aenkhbayar.jpg

Data Science Training and Community Building through Hackweeks

Micaela Parker

2:00 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#micaela-parker

2019-05-09T14:00:00

Informal training activities enable researchers at all levels to rapidly learn data science tools and best practices that fit their research questions and make significant advances in their work. In this talk, I will describe a highly successful informal training that has emerged in recent years called Hackweeks. These hackathon-style events place a strong focus on cultivating data science literacy, building a community of practice, and developing resources within an existing domain-specific community. By bringing together researchers from many different universities to address methods challenges within a research domain, Hackweeks take advantage of a shared language and shared scientific objectives. The Hackweek structure is designed to foster collaboration and learning among people from various stages of their career and technical abilities, and catalyze a community through a shared interest in solving computational challenges within a field (Huppenkothen et al, 2018). Hackweeks originally came out of the Astronomy community (Astro Hack Week, entering its 6th year in 2019) and the model has been successfully propagated to: neuroscience (Neurohackweek, now a 2-week NIH-funded program called Neurohackademy), geospatial sciences (Geohackweek), oceanography (Oceanhackweek), and more.

https://csvconf.com/img/speakers-2019/mparker.jpg

What's Next after Notebooks?

Alexander Morley

11:00 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#alexander-morley

2019-05-08T11:00:00

Jane is a data scientist. Jane uses Jupyter notebooks as her working environment, and her presentation environment. These “computational essays” allow Jane to present her methods and her results to her colleagues at the same time. Jane is happy with this. But sometimes it’s difficult for Jane to share notebooks with her colleagues, and even harder for them to re-mix or re-use parts of the notebook, or to share their changes back to Jane. And sometimes Jane finds it hard to explain the flow of a particular notebook, or how different notebooks are tied together. There’s no provision for keeping things modular. First, I will discuss a few up-and-coming projects that are leveraging the power of new web technologies and faster browsers to solve all of fictional Jane’s problems, and more. Second, I will present a prototype for my own solution that is also web-based, and draws inspiration from some now-uncool graphical programming languages.

https://csvconf.com/img/speakers-2019/amorley.jpg

Crafting Data-Driven Stories for the Everyday Reader

Marisa Aquilina

2:00 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#marisa-aquilina

2019-05-09T14:00:00

Journalists don’t write for other journalists—they write for the curious and community-minded public. In the same way, statistical journalism should not be a black box of visualizations and narrative meant only for data makers like us. Crafting data-driven stories for a general audience means giving readers an opportunity to interact with a fun and practical use case while explaining the interpretative thinking that lies under the hood of statistical methods. I am an undergraduate at Cal Poly who writes and builds interactive, data-driven publications with a team of students. I'll walk you through how we ideate fascinating questions, make methods explainable, and use Jupyter Notebooks to share reproducible code.

https://csvconf.com/img/speakers-2019/maquilina.jpg

Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure

Jose M Hernandez

3:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#jose-m-hernandez

2019-05-09T15:30:00

King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, there is a critical need to understand the prevalence and mechanisms of housing insecurity for government organizations tasked to address these issues. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact for their organizational practices and the communities they are tasked to serve. In this context, where there is a constant, iterative feedback loop, reproducibility of the results we present to them, from figures and tables to written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and Git.

https://csvconf.com/img/speakers-2019/jmhernandez.jpg

Measurement Lab - Open Data on Global Internet Health

Chris Ritzo

1:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#chris-ritzo

2019-05-08T13:30:00

Measurement Lab (M-Lab) is the largest open internet measurement platform in the world, hosting internet-scale measurement experiments and releasing all data into the public domain (CC0). We are an open source project with contributors from civil society organizations, educational institutions, and private sector companies, and are a fiscally sponsored project of Code for Science & Society. Our mission is to measure the Internet, save the data, and make it universally accessible and useful. M-Lab works to advance network research and empowers the public with useful information about broadband and mobile connections by maintaining a scalable, global platform for conducting internet measurements, and by supporting an ecosystem of external partners and users around the world interested in using the resulting open data. Our users are researchers, activists, analysts, journalists, experiment developers, hosting providers, regulators, municipalities, and everyday consumers. M-Lab works to enhance internet transparency, and helps to promote and sustain a healthy, innovative internet by supporting our users in their research and data analyses, developing and publicizing new use cases for our datasets, forming collaborative partnerships, and building open source measurement tools. In this talk we will introduce the M-Lab platform to the csvconf audience, share how our open data and open source tools are being used by communities around the world, and provide resources on how attendees might use them as well.

https://csvconf.com/img/speakers-2019/critzo.jpg

Improving the Quality of Neuroimaging Scans

Jonathan Uriarte-Lopez

11:30 AM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#jonathan-uriarte-lopez

2019-05-09T11:30:00

My presentation will be on how adjustments to the Human Connectome Project (HCP) pipeline, with the use of the Advanced Normalization Tools (ANTs), improved the data quality of neuroimaging scans provided by the Autism Brain Imaging Data Exchange (ABIDE). Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by social and communication difficulties along with repetitive and restrictive behaviors. It is difficult to study a living brain safely, which is why we use neuroimaging techniques such as MRI. Data quality can be affected by subjects moving in the scanner or by computing pipeline issues. Adjustments to the HCP pipeline led to an increase in data quality and a decrease in the amount of data lost. This will save researchers time, money, and data to study the neurophysiological aspects of ASD.

https://csvconf.com/img/speakers-2019/julopez.jpg

Squishy Amoeba-Like Objects

Darius Kazemi

3:30 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#darius-kazemi

2019-05-09T15:30:00

On June 19th, 1970, a group of computer scientists who were inventing the internet referred to key pieces of its proposed design as "squishy amoeba-like objects". Amoebas are porous yet have well-defined boundaries. Thinking about these creatures gives us new ways to think about networks and communities and technology. This talk makes a case for the squishy amoeba-like object as an organizing principle for what is broadly being called "the decentralized web", a web outside of monolithic, monopolistic actors.

https://csvconf.com/img/speakers-2019/dkazemi.jpg

Where Has Your Data Come From? Data Ancestry and Other Tales

Dr. Tania Allard

4:00 PM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#dr-tania-allard

2019-05-08T16:00:00

Over the last few years, great improvements have been made in the areas of reproducible scientific computing research and FAIR (findable, accessible, interoperable and reusable) data. As a consequence, data scientists and researchers alike have started to incorporate modern software development practices in their workflows (e.g. version control, testing). More and more emphasis has been placed on the need to look after the quality and validity of the software developed. But what about the data? Data validation and integrity are just as important as the adequacy of the code ingesting and processing the datasets. In this talk, I will take a high-level look at concepts such as data lineage, provenance, and continuous data validation, and present real-world examples in which these concepts have been applied to different data pipelines, increasing not only the confidence in the results obtained but also the efficiency and integrity of the workflows themselves.

https://csvconf.com/img/speakers-2019/tallard.jpg

Using Research and Technology to Tackle Gender Bias

Mollie Marr

2:30 PM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#mollie-marr

2019-05-08T14:30:00

Qualitative and mixed methods research studies have provided insight into the language and patterns associated with unconscious bias across multiple fields. These patterns can be converted to rules for NLP and other text analysis programs, making it possible to identify bias within a written document. This talk will explore one approach using qualitative research in gender bias and letters of recommendation and evaluation to define rules for a web-based automated text analysis program using NLP. The role of research and technology in addressing structural issues such as bias will be discussed, and participants will be encouraged to think about ways in which existing research might be used to inspire new solutions to social problems.

https://csvconf.com/img/speakers-2019/mmarr.jpg

Qualitative Research Using Open Source Tools

Beth Duckles & Vicky Steeves

3:30 PM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#beth-duckles-vicky-steeves

2019-05-08T15:30:00

Qualitative research has long suffered from a lack of free tools for analysis, leaving no options for researchers without significant funds for software licenses. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools out there: qcoder (R package) and Taguette (desktop application). Drawing from the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility mean for qualitative research, and how the two tools we've built facilitate equitable, open sharing.

https://csvconf.com/img/speakers-2019/bduckles_vsteeves.jpg

Chromatocracy: The Pantone® of Mexican Social Mobility.

Adrian Santuario Hernández

11:30 AM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#adrian-santuario-hernández

2019-05-08T11:30:00

Skin colour ratings have been used in several studies about racial discrimination and racial attitudes but have rarely been used in Mexico. Although the Mexican Constitution has established, since 1917, the legal equality of citizens without distinction as to race, sex, language or religion, it is common to see discrimination based on skin colour in workplaces, educational facilities and government offices. The latest surveys led by INEGI (National Institute of Statistics and Geography) show signs of racial discrimination in their reports, and a glance at the map suffices to see clearly that skin colour is an important issue for Mexican social mobility. For example, 95% of the presenters on Mexican TV shows have a skin tone of 1-3 (based on the PERLA colour palette), while 85% of the total Mexican population have a skin tone of 5-7; that gap in tones generates an aspirational sentiment of status: whiter is better. To support that correlation between skin colour and social mobility, I developed a Web Scraping, Machine Learning and Facial Recognition algorithm to answer two questions: Who is more successful in Mexico? (95 percent of CEOs tend to have whiter skin (1-3 PERLA) than the rest.) And is there a correlation between your political affiliation and your skin tone? (The right-wing party (PVEM) is whiter than the left-wing party (PRD).) The work demonstrates how technology (Machine Learning Color Algorithm) can help to unravel hidden social dynamics in Mexican culture.

https://csvconf.com/img/speakers-2019/ashernández.jpg

Fundamentals of Research Software Sustainability

Daniel S. Katz

3:30 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#daniel-s-katz

2019-05-08T15:30:00

Software sustainability means different things to different groups of people, including the persistence of working software, and the persistence of people, or funding. While we can generally define sustainability as the condition in which the inflow of resources is sufficient to do the needed work, where those resources both include and are somewhat transferable into human effort, users, funders, managers, and developers (or maintainers) all mean somewhat different things when they use "sustainable" in the context of research software. This talk will illustrate some of these different views, and their corresponding aims. It will also provide some guidance on quantifying research software sustainability from some of these views.

https://csvconf.com/img/speakers-2019/dskatz.jpg

The Data to Policy Project: Using Data to Build More Equitable Communities

Melissa Mejia

11:00 AM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#melissa-mejia

2019-05-09T11:00:00

The Data to Policy Project (D2P) is an initiative creating meaningful learning experiences for students by using analysis of open data to generate equity and evidence-based policy proposals addressing local community needs. D2P is integrated into credit-bearing courses where students explore issues like policing and affordable housing in the Denver region. Over the course of a semester, students find, cite, clean, analyze, and visualize data to identify gaps or problems in policing or affordable housing, create policy proposals that address what they found, then create a research poster to communicate their findings. We encourage a critical approach to data literacy that questions the objectivity and neutrality of data and situates it in a socio-political context. The project culminates in a D2P Symposium where students present their research to their peers, faculty, staff, and community members. By focusing on student-initiated concerns and using real data to try to address them, D2P forms a connection between the courses students take and the communities they live in, increasing its meaning and impact. We also partner with local community organizations, governments, and nonprofits to identify and frame the research questions students explore. Our goal is to intentionally include community voices so that the research we work on is relevant, context-specific, and in the interest of the community it will impact. This presentation will communicate the challenges and benefits of this kind of work, how it can be replicated in other contexts, and invite feedback on how to improve the project.

https://csvconf.com/img/speakers-2019/mmejia.jpg

Digging for Urban and Civic Data in Eastern Europe

Gleb Kanunnikau

10:30 AM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#gleb-kanunnikau

2019-05-08T10:30:00

The Open Data and Open Knowledge movement in developing countries is constantly bumping against a paradoxical question: is there a way to open up data when the datasets aren't yet present? Where do you get the data to make your effort worthwhile, and how do you scale to make the initiative relevant to the larger society? More importantly, do the data producers (the state bodies and the surveillance apparatus, social networks or municipal service providers) share the interest of publishing data for everyone's and society's benefit? Is there a way to dig out interesting and relevant datasets that people didn't know existed to show new ways to analyze and fight urban problems, educate people about the environment, and propose solutions to post-Soviet problems like unemployment, decaying public infrastructure, and never-digitized cultural assets? Do we cooperate with the fashionable "Smart city" projects funded by Chinese state corporations or remain wary of the surveillance methods they introduce in cooperation with the police and other government structures? This talk covers practical cases, civic hacking and citizen data science, establishing cooperation between unlikely partners, and other questions of interest to anyone interested in building a community from scratch.

https://csvconf.com/img/speakers-2019/gkanunnikau.jpg

How to Build a Data-Driven Culture

Patrick McGarry

11:30 AM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#patrick-mcgarry

2019-05-09T11:30:00

The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.

https://csvconf.com/img/speakers-2019/pmcgarry.jpg

The Streets of Women. An Analysis of Street Nomenclature Data in Latin America and Spain through OpenStreetMap and Wikipedia

Selene Yang

2:30 PM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#selene-yang

2019-05-08T14:30:00

This is a collaborative project of Geochicas to produce a map of the streets named after women in Latin America and Spain. This project seeks to link and generate content in OSM and Wikipedia about prominent women. It aims to survey the streets, avenues, passages, and roads that bear the names of women, along with their respective biographies in Wikipedia.

https://csvconf.com/img/speakers-2019/syang.jpg

Data Analysis to Improve Diversity and Equity in Graduate-Level Education

Rachel Mallinga

2:00 PM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#rachel-mallinga

2019-05-08T14:00:00

This project grew from the need to determine what students of diverse backgrounds need to feel welcomed and represented in their graduate department at the University of Oregon. Two women of color took the initiative to conduct qualitative and quantitative research on how equity and diversity are represented in curriculum, services, and departmental resources. Based on our findings we researched resources on campus that address the problems identified in our data and best practices for graduate education implemented by similar graduate-level programs in Oregon. This talk illustrates how research methods can be used to inform institutional policies and practices to improve diversity and equity.

https://csvconf.com/img/speakers-2019/rmallinga.jpg

Hacking Open Data in Africa

Soila Kenya

2:00 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#soila-kenya

2019-05-08T14:00:00

This talk will cover the tips & tricks of community-sourcing for openAFRICA.net - the largest independent repository of open data on the African continent - used in order to digitise deadwood to give citizens actionable information. Data availability in many African countries is dismal. Files upon files of important government information lie gathering dust in abandoned storage rooms. On the other hand, journalists and citizens need this information to keep governments in check and ensure they are receiving the right services. So how do you turn paper-based government archives into machine-readable & API-accessible digital files?

https://csvconf.com/img/speakers-2019/skenya.jpg

Open Infrastructure for Open Science: How Binder Powers an Open Stack in the Cloud

Chris Holdgraf

11:00 AM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#chris-holdgraf

2019-05-09T11:00:00

This talk will discuss the Binder Project in the context of open data and open science, two primary use-cases that have driven the project. It will cover the basics of the Binder Project, such as how to define a reproducible repository to share with others. It will then discuss one of Binder's core goals, which is to build on open standards to facilitate the use of *many* open languages, interfaces, etc. Finally I'll discuss how BinderHub, the technology behind a Binder deployment, is itself open source and deployable anywhere. I'll finish by describing a goal of distributed, federated BinderHubs that provide a network of reproducible data analytics environments.

https://csvconf.com/img/speakers-2019/choldgraf.jpg

Data & Social Justice

Dan Phiffer

11:00 AM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#dan-phiffer

2019-05-08T11:00:00

This talk will provide an overview of the course I'm currently teaching at Bennington College called Data & Social Justice. I'll outline some of the issues my students have been organizing around and the techniques they've developed for doing outreach and using data visualization to support their causes, and I'll describe how I've supported their efforts through my own faculty activism. N.B.: I'm only halfway through the semester, but there is already plenty of material for a talk.

https://csvconf.com/img/speakers-2019/dphiffer.jpg

Social Data: Invading Privacy or Creating Better Cities?

Gala Camacho

2:00 PM

May 8 2019

Main Sanctuary

https://csvconf.com/speakers/#gala-camacho

2019-05-08T14:00:00

Urban designers have long heralded the value of the public realm in creating stronger communities. Yet, their processes and decisions are based around data that is far removed from the community, outdated and/or based on surveys and feedback forums which are generally attended by the same group of people and which can be overtaken by lobbyists. If we want to create cities that place people at the centre, it is essential that we find data about what makes neighbourhoods connected and wholesome, neighbourhoods which provide safe spaces for their community to engage. Social data (data from social media, crowdsourcing, mapping platforms, review apps, etc.) can give us an opportunity to understand how people engage in their communities and interact with the places around them. It can be used to provide insights into the social health of local places and identify vulnerabilities, to feel the heartbeat of the neighbourhood. I will talk about what social data is, some of the challenges of getting it and collating it, the data's strengths and weaknesses, and how we are trying to use it to make cities more socially connected.

https://csvconf.com/img/speakers-2019/gcamacho.jpg

A Love Letter to the Boxplot

Melissa Santos

3:30 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#melissa-santos

2019-05-09T15:30:00

We'll briefly cover what the boxplot is, why it's so great to look at distributions instead of single statistics, and common boxplot variations. I'll spend at least half the talk showing boxplots of real data and comparing them to other summary methods. The talk will wrap up with some quick info on how to create boxplots in many common charting/statistics/BI tools. I hope this talk will make people more likely to use my favorite chart!

https://csvconf.com/img/speakers-2019/msantos.jpg
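
As a rough illustration of what a boxplot summarizes, the sketch below computes the quartiles, whisker ends, and outliers for a list of numbers. It assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the quartiles); the abstract does not say which variation the talk covers, and the function name `boxplot_summary` and the sample data are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return the numbers a Tukey-style boxplot would draw (assumed convention)."""
    data = sorted(values)
    # Three cut points splitting the data into four equal-probability groups.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Hypothetical sample with one extreme value.
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

A single mean would hide the extreme value in this sample, while the summary reports it separately as an outlier, which is the kind of contrast between distributions and single statistics the abstract describes.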

Spanking and Spreadsheets: Data-driven Sex Journalism

Jacqueline Nolis & Heather Nolis

4:00 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#jacqueline-nolis-heather-nolis

2019-05-09T16:00:00

When we saw that the Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. But we had never written an article for a newspaper before, nor had we worked with data even remotely as dirty. It turns out that writing a good blog post or technical journal article is very different from writing for print, especially on such a sensitive topic. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).”

https://csvconf.com/img/speakers-2019/jnolis_hnolis.jpg

Should Real Estate Data be Open?

Andy Terrel

11:00 AM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#andy-terrel

2019-05-09T11:00:00

While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data is still closed. The MLS quotes personal security as the primary reason. What data is being protected and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications.

https://csvconf.com/img/speakers-2019/aterrel.jpg

How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets

Evan Tachovsky

4:00 PM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#evan-tachovsky

2019-05-08T16:00:00

While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.

https://csvconf.com/img/speakers-2019/etachovsky.jpg

Lessons Learned: Creating Space for Inclusive Practices in Academia

Antoinette Foster & Lucille Moore

1:30 PM

May 8 2019

Fuller Hall

https://csvconf.com/speakers/#antoinette-foster-lucille-moore

2019-05-08T13:30:00

With the advent of big data, many people are beginning to explore fighting social inequity and structural systems of oppression with data in order to (1) define the problem and (2) effect changes in policy. We are learning that, for the most part, much of the data around these issues doesn't exist, which largely reinforces systems of oppression. At Oregon Health & Science University in Portland, OR, a group of people have come together to focus on the lack of representation of historically underrepresented minorities (URM) in science as well as the lack of inclusive culture within OHSU’s graduate programs. Our group is called the Alliance for Visible Diversity in Science (AVDS). We found that data on a variety of topics, e.g. statistics on the number of URM graduate students who are interviewed/accepted/decide to matriculate, and well-designed climate surveys to assess the culture of inclusivity are lacking. This leads to decision-making and policies based on incomplete data that disproportionately hurt already vulnerable populations. For example, many programs require that applicants report their score on a standardized test called the Graduate Record Examination (GRE) despite the fact that research shows that GRE scores are more highly correlated with socioeconomic status than student success. We would like to share what we have learned through the process of forming AVDS: our successes, our challenges, and the imperative idea that we must in part approach social inequity issues with scientific and data-driven approaches.

https://csvconf.com/img/speakers-2019/afoster_lmoore.jpg

Bash <3's CSVs: Data Analysis on the cmdline

Nicholas Canzoneri

11:30 AM

May 8 2019

Daisy Bingham Room

https://csvconf.com/speakers/#nicholas-canzoneri

2019-05-08T11:30:00

Your bash shell has a _lot_ of utilities that can help you analyze your data, often more easily and quickly than importing your data into an external tool. But these utilities can be hard to find, and the right options even harder to figure out. I'll walk through a data set and show examples of the best utility to use in different situations. I'll go over common commands like `grep` and `cut`, more exotic commands like `comm` and `tr`, and dig up very useful options to a command you might have overlooked, like `sort -k`.

https://csvconf.com/img/speakers-2019/ncanzoneri.jpg

CREATE TABLE [talks] ( [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT, [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT )
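
The schema above is enough to recreate the table locally. As a minimal sketch, assuming Python's standard `sqlite3` module and an in-memory database, the following builds the `talks` table and reproduces the "sorted by abstract" ordering shown on this page. The function names are hypothetical, only three of the nine columns are filled in, and the abstracts are shortened to their opening words purely for illustration.

```python
import sqlite3

# Schema copied from the table definition above.
SCHEMA = """
CREATE TABLE [talks] (
    [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT,
    [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT
)
"""

# A few rows from the listing; abstracts are truncated for illustration.
SAMPLE_ROWS = [
    ("Bash <3's CSVs: Data Analysis on the cmdline", "Nicholas Canzoneri", "Your bash shell has"),
    ("Break", None, None),
    ("A Love Letter to the Boxplot", "Melissa Santos", "We'll briefly cover"),
    ("US Energy Data Liberation", "Zane Selvans", "An alphabet soup"),
]


def build_demo_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO talks (title, speaker, abstract) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def talks_sorted_by_abstract(conn):
    """Return (title, abstract) pairs ordered by abstract, ascending."""
    cur = conn.execute("SELECT title, abstract FROM talks ORDER BY abstract")
    return cur.fetchall()


if __name__ == "__main__":
    for title, abstract in talks_sorted_by_abstract(build_demo_db()):
        print(title, "|", abstract)
```

SQLite sorts NULL before any other value in ascending order, which is why rows without an abstract (breaks, lunches, keynotes) appear first in the listing.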