csvconf: talks: 26 rows where day = "May 9 2019" sorted by speaker

talks

26 rows where day = "May 9 2019" sorted by speaker
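
As a rough illustration of the filter and sort named in the heading, the sketch below runs an equivalent query with Python's built-in `sqlite3` module. The three-column schema, the in-memory database, the `talks_on` helper, and the May 8 placeholder row are assumptions made for the example; the SQL that Datasette actually generates is not shown on this page.

```python
import sqlite3

# Hypothetical in-memory stand-in for the csvconf database. Only three of the
# columns and three of the rows listed on this page are reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talks (title TEXT, speaker TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO talks (title, speaker, day) VALUES (?, ?, ?)",
    [
        ("Datasette", "Simon Willison", "May 9 2019"),
        ("Break", None, "May 9 2019"),
        ("Preparing Clients for Open Source Contributions", "Aaron Couch", "May 9 2019"),
        # Invented placeholder row, included only so the filter has something to exclude.
        ("Placeholder talk on another day", "Example Speaker", "May 8 2019"),
    ],
)


def talks_on(day):
    """Return (title, speaker) pairs for one day, sorted by speaker."""
    cursor = conn.execute(
        "SELECT title, speaker FROM talks WHERE day = ? ORDER BY speaker",
        (day,),
    )
    return cursor.fetchall()


for title, speaker in talks_on("May 9 2019"):
    print(speaker or "(no speaker)", "|", title)
```

SQLite sorts NULL before any text value in ascending order, so rows without a speaker (breaks, lunch) come first, which matches the ordering of the table on this page.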


Link	rowid	title	speaker	time	day	room	url	datetime	abstract	image
32	32	Warm Breakfast Buffet / Espresso Cart / Hangout time		9:00 AM	May 9 2019	Main Sanctuary		2019-05-09T09:00:00
40	40	Lunch in Fuller Hall		12:00 PM	May 9 2019	Main Sanctuary		2019-05-09T12:00:00
42	42	Lightning Talks in The Main Sanctuary		1:30 PM	May 9 2019	Main Sanctuary		2019-05-09T13:30:00
49	49	Break		3:00 PM	May 9 2019	Main Sanctuary		2019-05-09T15:00:00
56	56	Outros/Goodbye in Main Sanctuary		4:30 PM	May 9 2019	Main Sanctuary		2019-05-09T16:30:00
57	57	5-6pm Hangout time		5:00 PM	May 9 2019	Main Sanctuary		2019-05-09T17:00:00
38	38	Preparing Clients for Open Source Contributions	Aaron Couch	11:30 AM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#aaron-couch	2019-05-09T11:30:00	At CivicActions we've developed a number of methodologies to help our clients become part of the open source community. This talk will focus on several of those strategies, including capture management, project roles and tools, and reporting measures. It will be slightly shorter to allow time for a more collaborative discussion.	https://csvconf.com/img/speakers-2019/acouch.jpg
36	36	Should Real Estate Data be Open?	Andy Terrel	11:00 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#andy-terrel	2019-05-09T11:00:00	While aggregators of multiple listing service (MLS) data have opened up much of the process of finding a house on the internet, the data is still closed. The MLS cites personal security as the primary reason. What data is being protected and what is the impact of that decision? As a consumer of data from numerous sources, REX has routinely been denied access to this data. In this talk we make the case for the societal benefits of opening this data and discuss the implications.	https://csvconf.com/img/speakers-2019/aterrel.jpg
54	54	Annotations in the Classroom; The Classroom in Annotations	Asura Enkhbayar	4:00 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#asura-enkhbayar	2019-05-09T16:00:00	In this talk I want to explore the impact of using Hypothesis in the classroom. What does it mean to read, think, and annotate publicly? How does it change your learning experience as a student? How do you evaluate and assess different annotation styles as a teacher? As a student I can share my own experience of this new mode of teaching and learning. As a data scientist, I want to give a taste of possible new metrics and measurements based on annotation data. Finally, as a critical scholar I am hoping to explore how this new metrification and monitoring of reading might affect education. The talk will rely on data outlined in this essay: https://course-journals.lib.sfu.ca/index.php/pdc2018/article/view/240/213	https://csvconf.com/img/speakers-2019/aenkhbayar.jpg
34	34	Open Infrastructure for Open Science: How Binder Powers an Open Stack in the Cloud	Chris Holdgraf	11:00 AM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#chris-holdgraf	2019-05-09T11:00:00	This talk will discuss the Binder Project in the context of open data and open science, two primary use-cases that have driven the project. It will cover the basics of the Binder Project, such as how to define a reproducible repository to share with others. It will then discuss one of Binder's core goals, which is to build on open standards to facilitate the use of many open languages, interfaces, etc. Finally I'll discuss how BinderHub, the technology behind a Binder deployment, is itself open source and deployable anywhere. I'll finish by describing a goal of distributed, federated BinderHubs that provide a network of reproducible data analytics environments.	https://csvconf.com/img/speakers-2019/choldgraf.jpg
51	51	Squishy Amoeba-Like Objects	Darius Kazemi	3:30 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#darius-kazemi	2019-05-09T15:30:00	On June 19th, 1970, a group of computer scientists who were inventing the internet referred to key pieces of its proposed design as "squishy amoeba-like objects". Amoebas are porous yet have well-defined boundaries. Thinking about these creatures gives us new ways to think about networks and communities and technology. This talk makes a case for the squishy amoeba-like object as an organizing principle for what is broadly being called "the decentralized web", a web outside of monolithic, monopolistic actors.	https://csvconf.com/img/speakers-2019/dkazemi.jpg
48	48	How open data can promote participatory democracy	Hector Dominguez	2:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#hector-dominguez	2019-05-09T14:30:00	In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.	https://csvconf.com/img/speakers-2019/hdominguez.jpg
46	46	Beyond the WARC: Making Web Archives More Useful and User-friendly	Ilya Kreymer	2:30 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#ilya-kreymer	2019-05-09T14:30:00	Archives of the web contain not only web pages but any type of data. The only standard in web archiving is the ISO WARC file format, which specifies raw data captured from the web. However, the WARC files often lack any context or metadata about how this data was captured. The talk will briefly cover the basics of the WARC format, and also provide possible ideas for making web archiving data more user-friendly, present existing tools and suggest ideas for interoperable ways to describe collections and make sense of growing web archive data beyond the WARC format.	https://csvconf.com/img/speakers-2019/ikreymer.jpg
53	53	Spanking and Spreadsheets: Data-driven Sex Journalism	Jacqueline Nolis & Heather Nolis	4:00 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#jacqueline-nolis-heather-nolis	2019-05-09T16:00:00	When we saw that the Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. But we had never written an article for a newspaper before, nor had we worked with data even remotely as dirty. It turns out that writing a good blog post or technical journal article is very different from writing for print, especially on such a sensitive topic. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).”	https://csvconf.com/img/speakers-2019/jnolis_hnolis.jpg
37	37	Improving the Quality of Neuroimaging Scans	Jonathan Uriarte-Lopez	11:30 AM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#jonathan-uriarte-lopez	2019-05-09T11:30:00	My presentation will be on how adjustments to the Human Connectome Project (HCP) pipeline, with the use of the Advanced Normalization Tools (ANTs), improved the data quality of neuroimaging scans provided by the Autism Brain Imaging Data Exchange (ABIDE). Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by social and communication difficulties along with repetitive and restrictive behaviors. It is difficult to study a living brain safely, which is why we use neuroimaging techniques such as MRI. Data quality can be affected by subjects moving in the scanner or by computing pipeline issues. Adjustments to the HCP pipeline led to an increase in data quality and a decrease in the amount of data lost. This will save researchers time, money, and data when studying the neurophysiological aspects of ASD.	https://csvconf.com/img/speakers-2019/julopez.jpg
52	52	Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure	Jose M Hernandez	3:30 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#jose-m-hernandez	2019-05-09T15:30:00	King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, government organizations tasked with addressing these issues critically need to understand the prevalence and mechanisms of housing insecurity. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact on their organizational practices and the communities they are tasked to serve. In this context, with an iterative and constant feedback loop, reproducibility of the results we present to them, from figures and tables to written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and git.	https://csvconf.com/img/speakers-2019/jmhernandez.jpg
41	41	KEYNOTE	Kirstie Whitaker	12:30 PM	May 9 2019	Main Sanctuary		2019-05-09T12:30:00
44	44	Crafting Data-Driven Stories for the Everyday Reader	Marisa Aquilina	2:00 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#marisa-aquilina	2019-05-09T14:00:00	Journalists don’t write for other journalists; they write for the curious and community-minded public. In the same way, statistical journalism should not be a black box of visualizations and narrative meant only for data makers like us. Crafting data-driven stories for a general audience means giving readers an opportunity to interact with a fun and practical use case while explaining the interpretative thinking that lies under the hood of statistical methods. I am an undergraduate at Cal Poly who writes and builds interactive, data-driven publications with a team of students. I'll walk you through how we ideate fascinating questions, make methods explainable, and use Jupyter Notebooks to share reproducible code.	https://csvconf.com/img/speakers-2019/maquilina.jpg
35	35	The Data to Policy Project: Using Data to Build More Equitable Communities	Melissa Mejia	11:00 AM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#melissa-mejia	2019-05-09T11:00:00	The Data to Policy Project (D2P) is an initiative creating meaningful learning experiences for students by using analysis of open data to generate equity and evidence-based policy proposals addressing local community needs. D2P is integrated into credit-bearing courses where students explore issues like policing and affordable housing in the Denver region. Over the course of a semester, students find, cite, clean, analyze, and visualize data to identify gaps or problems in policing or affordable housing, create policy proposals that address what they found, then create a research poster to communicate their findings. We encourage a critical approach to data literacy that questions the objectivity and neutrality of data and situates it in a socio-political context. The project culminates in a D2P Symposium where students present their research to their peers, faculty, staff, and community members. By focusing on student-initiated concerns and using real data to try to address them, D2P forms a connection between the courses students take and the communities they live in, increasing its meaning and impact. We also partner with local community organizations, governments, and nonprofits to identify and frame the research questions students explore. Our goal is to intentionally include community voices so that the research we work on is relevant, context-specific, and in the interest of the community it will impact. This presentation will communicate the challenges and benefits of this kind of work, show how it can be replicated in other contexts, and invite feedback on how to improve the project.	https://csvconf.com/img/speakers-2019/mmejia.jpg
50	50	A Love Letter to the Boxplot	Melissa Santos	3:30 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#melissa-santos	2019-05-09T15:30:00	We'll briefly cover what the boxplot is, why it's so great to look at distributions instead of single statistics, and common boxplot variations. I'll spend at least half the talk showing boxplots of real data and comparing them to other summary methods. The talk will wrap up with some quick info on how to create boxplots in many common charting/statistics/BI tools. I hope this talk will make people more likely to use my favorite chart!	https://csvconf.com/img/speakers-2019/msantos.jpg
43	43	Data Science Training and Community Building through Hackweeks	Micaela Parker	2:00 PM	May 9 2019	Main Sanctuary	https://csvconf.com/speakers/#micaela-parker	2019-05-09T14:00:00	Informal training activities enable researchers at all levels to rapidly learn data science tools and best practices that fit their research questions and make significant advances in their work. In this talk, I will describe a highly successful informal training format that has emerged in recent years called Hackweeks. These hackathon-style events place a strong focus on cultivating data science literacy, building a community of practice, and developing resources within an existing domain-specific community. By bringing together researchers from many different universities to address methods challenges within a research domain, Hackweeks take advantage of a shared language and shared scientific objectives. The Hackweek structure is designed to foster collaboration and learning among people at various career stages and levels of technical ability, and to catalyze a community through a shared interest in solving computational challenges within a field (Huppenkothen et al., 2018). Hackweeks originally came out of the Astronomy community (Astro Hack Week, entering its 6th year in 2019) and the model has been successfully propagated to: neuroscience (Neurohackweek, now a 2-week NIH-funded program called Neurohackademy), geospatial sciences (Geohackweek), oceanography (Oceanhackweek), and more.	https://csvconf.com/img/speakers-2019/mparker.jpg
39	39	How to Build a Data-Driven Culture	Patrick McGarry	11:30 AM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#patrick-mcgarry	2019-05-09T11:30:00	The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.	https://csvconf.com/img/speakers-2019/pmcgarry.jpg
47	47	How a File Format Led to a Crossword Scandal	Saul Pwanson	2:30 PM	May 9 2019	Fuller Hall	https://csvconf.com/speakers/#saul-pwanson	2019-05-09T14:30:00	In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns, and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk will cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.	https://csvconf.com/img/speakers-2019/spwanson.jpg
45	45	Datasette	Simon Willison	2:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#simon-willison	2019-05-09T14:00:00	Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette	https://csvconf.com/img/speakers-2019/swillison.jpg
55	55	Data Scavenger Hunts: Learning about Data Together	Ted Laderas	4:00 PM	May 9 2019	Daisy Bingham Room	https://csvconf.com/speakers/#ted-laderas	2019-05-09T16:00:00	Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. To achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a shareable web app. We describe our experience with using data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.	https://csvconf.com/img/speakers-2019/tladeras.jpg
33	33	KEYNOTE	Teon L. Brooks	10:00 AM	May 9 2019	Main Sanctuary		2019-05-09T10:00:00
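
The Datasette abstract in the table above mentions converting CSV files into a SQLite database and exporting query results as CSV. The sketch below imitates that round trip using only Python's standard library; it does not call Datasette or csvs-to-sqlite. The helper names `load_tsv` and `export_csv`, the `id` column name, and the abridged three-row sample are assumptions made for the illustration.

```python
import csv
import io
import sqlite3

# Three rows abridged from the table above (tab-separated, subset of columns).
TSV = (
    "rowid\ttitle\tspeaker\ttime\troom\n"
    "45\tDatasette\tSimon Willison\t2:00 PM\tDaisy Bingham Room\n"
    "50\tA Love Letter to the Boxplot\tMelissa Santos\t3:30 PM\tMain Sanctuary\n"
    "49\tBreak\t\t3:00 PM\tMain Sanctuary\n"
)


def load_tsv(text):
    """Load tab-separated text into an in-memory SQLite table named talks."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE talks "
        "(id INTEGER PRIMARY KEY, title TEXT, speaker TEXT, time TEXT, room TEXT)"
    )
    # Named placeholders are filled from the keys of each row dictionary.
    conn.executemany(
        "INSERT INTO talks VALUES (:rowid, :title, :speaker, :time, :room)",
        rows,
    )
    return conn


def export_csv(conn, sql, params=()):
    """Run a query and return its result as CSV text with a header row."""
    cursor = conn.execute(sql, params)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


conn = load_tsv(TSV)
print(
    export_csv(
        conn,
        "SELECT title, room FROM talks WHERE room = ? ORDER BY title",
        ("Main Sanctuary",),
    )
)
```

Passing the room as a bound parameter rather than formatting it into the SQL string keeps the query safe if the filter value ever comes from user input.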


Fuller Hall

https://csvconf.com/speakers/#darius-kazemi

2019-05-09T15:30:00

On June 19th, 1970, a group of computer scientists who were inventing the internet referred to key pieces of its proposed design as "squishy amoeba-like objects". Amoebas are porous yet have well-defined boundaries. Thinking about these creatures gives us new ways to think about networks and communities and technology. This talk makes a case for the squishy amoeba-like object as an organizing principle for what is broadly being called "the decentralized web", a web outside of monolithic, monopolistic actors.

https://csvconf.com/img/speakers-2019/dkazemi.jpg

How open data can promote participatory democracy

Hector Dominguez

2:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#hector-dominguez

2019-05-09T14:30:00

In this discussion, I will explore the nuances of building an open data program as a step towards participatory democracy and the challenges of creating trust with local communities.

https://csvconf.com/img/speakers-2019/hdominguez.jpg

Beyond the WARC: Making Web Archives More Useful and User-friendly

Ilya Kreymer

2:30 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#ilya-kreymer

2019-05-09T14:30:00

Archives of the web contain not only web pages but any type of data. The only standard in web archiving is the ISO WARC file format, which specifies raw data captured from the web. However, the WARC files often lack any context or metadata about how this data was captured. The talk will briefly cover the basics of the WARC format, and also provide possible ideas for making web archiving data more user-friendly, present existing tools and suggest ideas for interoperable ways to describe collections and make sense of growing web archive data beyond the WARC format.

https://csvconf.com/img/speakers-2019/ikreymer.jpg

Spanking and Spreadsheets: Data-driven Sex Journalism

Jacqueline Nolis & Heather Nolis

4:00 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#jacqueline-nolis-heather-nolis

2019-05-09T16:00:00

When we saw that the Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. But we had never written an article for a newspaper before—nor had we worked with data even remotely as dirty. It turns out that what makes for a good blog post or technical journal article is very different from what works in print, especially for such a sensitive topic. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).”

https://csvconf.com/img/speakers-2019/jnolis_hnolis.jpg

Improving the Quality of Neuroimaging Scans

Jonathan Uriarte-Lopez

11:30 AM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#jonathan-uriarte-lopez

2019-05-09T11:30:00

My presentation will be on how adjustments to the human connectome project (HCP) pipeline, with the use of the advanced normalization tools (ANTS), improved the data quality of neuroimaging scans provided by the Autism Brain Imaging Data Exchange (ABIDE). Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by social and communication difficulties along with repetitive and restrictive behaviors. It is difficult to study a living brain safely, which is why we use neuroimaging techniques such as MRI. Data quality can be affected by subjects moving in the scanner or by computing pipeline issues. Adjustments to the HCP pipeline led to an increase in data quality and a decrease in the amount of data lost. This will save researchers time, money, and data to study the neurophysiological aspects of ASD.

https://csvconf.com/img/speakers-2019/julopez.jpg

Version Controlled Stakeholder Reporting: Building an End-to-End Data Reporting Infrastructure

Jose M Hernandez

3:30 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#jose-m-hernandez

2019-05-09T15:30:00

King County, Washington is currently undergoing complex social and economic changes that have both positive and negative impacts on local residents. With rising rents displacing low-income households to outlying areas or into homelessness, there is a critical need to understand the prevalence and mechanisms of housing insecurity for government organizations tasked with addressing these issues. Currently, our team of data and social scientists at the University of Washington eScience Institute is collaborating with stakeholders across the King County Housing and Homelessness prevention agencies to derive meaningful insights from their data. While their aim is not to produce academic research, our findings may have significant and immediate impact for their organizational practices and the communities they are tasked to serve. In this context, where there is an iterative and constant feedback loop, reproducibility of the results we present to them, from figures and tables to written language, is critical. To ensure a successful collaboration, our team has built an end-to-end data reporting infrastructure to produce reports for our stakeholders that are reproducible and version controlled from raw data to final product. We employ some common open source tools to accomplish this, including R/RStudio, Python, R Markdown, and Git.

https://csvconf.com/img/speakers-2019/jmhernandez.jpg

KEYNOTE

Kirstie Whitaker

12:30 PM

May 9 2019

Main Sanctuary

2019-05-09T12:30:00

Crafting Data-Driven Stories for the Everyday Reader

Marisa Aquilina

2:00 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#marisa-aquilina

2019-05-09T14:00:00

Journalists don’t write for other journalists—they write for the curious and community-minded public. In the same way, statistical journalism should not be a black box of visualizations and narrative meant only for data makers like us. Crafting data-driven stories for a general audience means giving readers an opportunity to interact with a fun and practical use case while explaining the interpretative thinking that lies under the hood of statistical methods. I am an undergraduate at Cal Poly who writes and builds interactive, data-driven publications with a team of students. I'll walk you through how we ideate fascinating questions, make methods explainable, and use Jupyter Notebooks to share reproducible code.

https://csvconf.com/img/speakers-2019/maquilina.jpg

The Data to Policy Project: Using Data to Build More Equitable Communities

Melissa Mejia

11:00 AM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#melissa-mejia

2019-05-09T11:00:00

The Data to Policy Project (D2P) is an initiative creating meaningful learning experiences for students by using analysis of open data to generate equity and evidence-based policy proposals addressing local community needs. D2P is integrated into credit-bearing courses where students explore issues like policing and affordable housing in the Denver region. Over the course of a semester, students find, cite, clean, analyze, and visualize data to identify gaps or problems in policing or affordable housing, create policy proposals that address what they found, then create a research poster to communicate their findings. We encourage a critical approach to data literacy that questions the objectivity and neutrality of data and situates it in a socio-political context. The project culminates in a D2P Symposium where students present their research to their peers, faculty, staff, and community members. By focusing on student-initiated concerns and using real data to try to address them, D2P forms a connection between the courses students take and the communities they live in, increasing its meaning and impact. We also partner with local community organizations, governments, and nonprofits to identify and frame the research questions students explore. Our goal is to intentionally include community voices so that the research we work on is relevant, context-specific, and in the interest of the community it will impact. This presentation will communicate the challenges and benefits of this kind of work, how it can be replicated in other contexts, and invite feedback on how to improve the project.

https://csvconf.com/img/speakers-2019/mmejia.jpg

A Love Letter to the Boxplot

Melissa Santos

3:30 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#melissa-santos

2019-05-09T15:30:00

We'll briefly cover what the boxplot is, why it's so great to look at distributions instead of single statistics, and common boxplot variations. I'll spend at least half the talk showing boxplots of real data and comparing them to other summary methods. The talk will wrap up with some quick info on how to create boxplots in many common charting/statistics/BI tools. I hope this talk will make people more likely to use my favorite chart!

https://csvconf.com/img/speakers-2019/msantos.jpg

Data Science Training and Community Building through Hackweeks

Micaela Parker

2:00 PM

May 9 2019

Main Sanctuary

https://csvconf.com/speakers/#micaela-parker

2019-05-09T14:00:00

Informal training activities enable researchers at all levels to rapidly learn data science tools and best practices that fit their research questions and make significant advances in their work. In this talk, I will describe a highly successful informal training format that has emerged in recent years called Hackweeks. These hackathon-style events place a strong focus on cultivating data science literacy, building a community of practice, and developing resources within an existing domain-specific community. By bringing together researchers from many different universities to address methods challenges within a research domain, Hackweeks take advantage of a shared language and shared scientific objectives. The Hackweek structure is designed to foster collaboration and learning among people at various career stages and levels of technical ability, and to catalyze a community through a shared interest in solving computational challenges within a field (Huppenkothen et al., 2018). Hackweeks originally came out of the Astronomy community (Astro Hack Week, entering its 6th year in 2019) and the model has been successfully propagated to: neuroscience (Neurohackweek, now a 2-week NIH-funded program called Neurohackademy), geospatial sciences (Geohackweek), oceanography (Oceanhackweek), and more.

https://csvconf.com/img/speakers-2019/mparker.jpg

How to Build a Data-Driven Culture

Patrick McGarry

11:30 AM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#patrick-mcgarry

2019-05-09T11:30:00

The world of modern data teamwork isn't one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change. This talk will examine the traits and behaviors that lead organizations to be truly data-driven.

https://csvconf.com/img/speakers-2019/pmcgarry.jpg

How a File Format Led to a Crossword Scandal

Saul Pwanson

2:30 PM

May 9 2019

Fuller Hall

https://csvconf.com/speakers/#saul-pwanson

2019-05-09T14:30:00

In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk will cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.

https://csvconf.com/img/speakers-2019/spwanson.jpg

Datasette

Simon Willison

2:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#simon-willison

2019-05-09T14:00:00

Datasette is a tool for instantly publishing structured data on the internet. It makes it easy to construct and execute arbitrary SQL queries (using SQLite) and export the results as CSV. Its accompanying tool csvs-to-sqlite makes it easy to convert CSV files into a SQLite database. More info at https://github.com/simonw/datasette

https://csvconf.com/img/speakers-2019/swillison.jpg

Data Scavenger Hunts: Learning about Data Together

Ted Laderas

4:00 PM

May 9 2019

Daisy Bingham Room

https://csvconf.com/speakers/#ted-laderas

2019-05-09T16:00:00

Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" using web apps to democratize data science and make it accessible to a wide variety of audiences. To achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a shareable web app. We describe our experience with using data scavenger hunts to teach each other interesting things about data. In particular, we share our experiences with exploring the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.

https://csvconf.com/img/speakers-2019/tladeras.jpg

KEYNOTE

Teon L. Brooks

10:00 AM

May 9 2019

Main Sanctuary

2019-05-09T10:00:00

CREATE TABLE [talks] ( [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT, [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT )
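
The view on this page (rows where day = "May 9 2019", sorted by speaker) can be reproduced against the schema above with a plain SQLite query. The following is a minimal sketch using only Python's standard library, not Datasette's or csvs-to-sqlite's own code. The helper names (`build_db`, `talks_on_day`, `rows_to_csv`) are hypothetical, the abstract and image columns are left empty for brevity, and the third sample row is invented so that the filter has something to exclude.

```python
import csv
import io
import sqlite3

# Schema copied from the page.
SCHEMA = """
CREATE TABLE [talks] (
    [title] TEXT, [speaker] TEXT, [time] TEXT, [day] TEXT, [room] TEXT,
    [url] TEXT, [datetime] TEXT, [abstract] TEXT, [image] TEXT
)
"""

# Two rows taken from the table above (abstract and image omitted),
# plus one hypothetical row on a different day.
SAMPLE_ROWS = [
    ("Datasette", "Simon Willison", "2:00 PM", "May 9 2019",
     "Daisy Bingham Room", "https://csvconf.com/speakers/#simon-willison",
     "2019-05-09T14:00:00", None, None),
    ("A Love Letter to the Boxplot", "Melissa Santos", "3:30 PM", "May 9 2019",
     "Main Sanctuary", "https://csvconf.com/speakers/#melissa-santos",
     "2019-05-09T15:30:00", None, None),
    ("Hypothetical talk", "Example Speaker", "9:00 AM", "May 8 2019",
     "Example Room", None, "2019-05-08T09:00:00", None, None),
]

HEADER = ["rowid", "title", "speaker", "time", "room"]


def build_db(rows=SAMPLE_ROWS):
    """Create an in-memory SQLite database holding the talks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO talks VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def talks_on_day(conn, day):
    """Return the talks for one day, sorted by speaker."""
    cursor = conn.execute(
        "SELECT rowid, title, speaker, time, room "
        "FROM talks WHERE day = ? ORDER BY speaker",
        (day,),
    )
    return cursor.fetchall()


def rows_to_csv(rows, header):
    """Serialize query results as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    connection = build_db()
    print(rows_to_csv(talks_on_day(connection, "May 9 2019"), HEADER))
```

Passing the day as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe to reuse with arbitrary input.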