talks
Data source:
https://csvconf.com/
1 row
where abstract = "In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk would cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.", day = "May 9 2019" and speaker = "Saul Pwanson" sorted by url
rowid | title | speaker | time | day | room | url | datetime | abstract | image
47 | How a File Format Led to a Crossword Scandal | Saul Pwanson | 2:30 PM | May 9 2019 | Fuller Hall | https://csvconf.com/speakers/#saul-pwanson | 2019-05-09T14:30:00 | In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk would cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame. | https://csvconf.com/img/speakers-2019/spwanson.jpg
CREATE TABLE [talks] (
   [title] TEXT,
   [speaker] TEXT,
   [time] TEXT,
   [day] TEXT,
   [room] TEXT,
   [url] TEXT,
   [datetime] TEXT,
   [abstract] TEXT,
   [image] TEXT
)
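The filter described at the top of this page (abstract, day and speaker equal to given values, sorted by url) can be reproduced against the schema above. The following is a minimal sketch, not the query the site itself runs: it assumes an in-memory SQLite database, a shortened stand-in for the abstract text, and two hypothetical helper names, build_db and find_talks.

```python
import sqlite3

# Schema as shown on this page.
SCHEMA = """
CREATE TABLE [talks] (
   [title] TEXT,
   [speaker] TEXT,
   [time] TEXT,
   [day] TEXT,
   [room] TEXT,
   [url] TEXT,
   [datetime] TEXT,
   [abstract] TEXT,
   [image] TEXT
)
"""

# Assumption: a shortened stand-in for the full abstract, to keep the sketch short.
ABSTRACT = "In 2016 I designed a plain-text file format for crossword puzzle data, ..."


def build_db():
    """Create an in-memory database holding the single row shown on this page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO talks (rowid, [title], [speaker], [time], [day], [room], "
        "[url], [datetime], [abstract], [image]) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            47,  # explicit rowid, matching the row on the page
            "How a File Format Led to a Crossword Scandal",
            "Saul Pwanson",
            "2:30 PM",
            "May 9 2019",
            "Fuller Hall",
            "https://csvconf.com/speakers/#saul-pwanson",
            "2019-05-09T14:30:00",
            ABSTRACT,
            "https://csvconf.com/img/speakers-2019/spwanson.jpg",
        ),
    )
    return conn


def find_talks(conn, abstract, day, speaker):
    """Return rows matching all three column filters, ordered by url."""
    sql = (
        "SELECT rowid, * FROM talks "
        "WHERE [abstract] = ? AND [day] = ? AND [speaker] = ? "
        "ORDER BY [url]"
    )
    # Bound parameters avoid quoting problems in the long abstract text.
    return conn.execute(sql, (abstract, day, speaker)).fetchall()


conn = build_db()
rows = find_talks(conn, ABSTRACT, "May 9 2019", "Saul Pwanson")
for row in rows:
    print(row[0], row[1], row[7])  # rowid, title, datetime
```

The [day] column holds free text ("May 9 2019"), so the equality filter only matches that exact spelling; the ISO-formatted [datetime] column is the safer one to use for range or ordering queries.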