Wednesday, September 3, 2014

Dark data and the distribution of birth years

I'm working on a project that makes use of Day's Biographical Dictionary of the History of Technology as a source. What I wanted to do required the book's information to be in a database format, though. How do you go from "dark data" to something a computer can use? First, hope there is an ebook.

In this case, there is. I was able to get a PDF of the text, but it was still a book meant to be read and understood by humans. After thinking about it, I realized the format of the book was quite regular and would allow for automated processing.

Each entry began with the name of an inventor, and ended with the initials of the editor (or editors) who worked on that section. In between those START and STOP markers were well-defined details on things like birth (day, month, year, location), death (day, month, year), etc. There was also a page at the beginning of the book that listed all the editors and the abbreviations used for them at the end of each entry. Enter Python, but practically any language could have parsed this. Without going into a lot of detail:
  • Read the PDF's editor index, stored it as a list, and saved it to a file.
  • Read in the book's index of names, stored the names as a list, and saved it to a file.
  • Read in the body of the PDF and stored it as text.
  • Using the editor list and the name list, I split the main body of the text into a giant list where each entry began with the name of someone in the index and ended with the initials of one of the editors.
Of course it wasn't that easy. For example, I discovered numerous instances where a name was spelled one way in the text, and another way in the index. After numerous rounds of cleanup to account for differences in names or oddities in formatting, I ended up with data I could write to a CSV for other processing.
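
For anyone curious, here is a minimal sketch of the splitting step. It is not my actual script. It assumes each name from the index starts a line, that the editor initials sit on a line of their own, and that multiple editors are joined with a slash. The function name, the sample names, and the sample text are all made up for illustration.

```python
import re

def split_entries(body, names, editor_initials):
    """Split body text into entries that start with an indexed name
    and end with a line of editor initials (hypothetical layout)."""
    # Longest alternatives first, so a longer name wins over its own prefix.
    name_alt = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    init_alt = "|".join(
        re.escape(i) for i in sorted(editor_initials, key=len, reverse=True)
    )
    pattern = re.compile(
        rf"^(?P<name>{name_alt})"                           # START: name at line start
        rf"(?P<text>.*?)"                                   # entry details, non-greedy
        rf"^(?P<editors>(?:{init_alt})(?:/(?:{init_alt}))*)[ \t]*$",  # STOP: initials
        re.MULTILINE | re.DOTALL,
    )
    return [
        {
            "name": m["name"],
            "text": m["text"].strip(),
            "editors": m["editors"].split("/"),
        }
        for m in pattern.finditer(body)
    ]

# Invented sample data, only to show the shape of the input.
NAMES = ["Alpha, Anne", "Beta, Bob"]
EDITORS = ["AB", "CD"]
SAMPLE_BODY = """Alpha, Anne
b. 1 January 1800 Exampletown
d. 2 February 1870
Invented a hypothetical widget.
AB

Beta, Bob
b. 3 March 1850 Sampleville
d. 4 April 1900
Improved the widget.
AB/CD
"""

entries = split_entries(SAMPLE_BODY, NAMES, EDITORS)
for entry in entries:
    print(entry["name"], entry["editors"])
```

A name that is spelled differently in the index and in the text simply never matches here, which is one way mismatches like the ones I hit would show up as missing entries.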

Tonight, out of curiosity, I sat down and looked at the distribution of birth years for all inventors in my database born in or after 1690.


I am sure there are errors in the database since I have not thoroughly gone over it and checked entries, but this first pass showed N=1065 individuals, with a mean birth year of 1830 (median of 1834). I was somewhat surprised at how the numbers fell off as you get into the 20th century, but I have a thought as to why that might be. The book is a biographical take on the history of technology. For that reason, it is necessarily biased toward individuals, not toward the work of teams of people. For example, who invented the atomic bomb? Yes, Leo Szilard famously came up with the idea and patented it, but he certainly didn't build one in his back shed. A team built The Gadget.

Don't misunderstand me to be on the "the age of the lone inventor is dead" wagon. I'm not weighing in on that. I'm simply trying to explain what I'm seeing and guessing at the bias in this one source.
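
If you want to run the same kind of first pass on your own CSV, the summary numbers take only a few lines. This is a rough sketch, not my exact code. The column name birth_year is an assumption, and the sample rows are invented.

```python
import csv
import io
from statistics import mean, median

def birth_year_summary(csv_file, min_year=1690):
    """Count, mean, and median of birth years at or after min_year."""
    years = []
    for row in csv.DictReader(csv_file):
        # "birth_year" is a hypothetical column name.
        value = (row.get("birth_year") or "").strip()
        # Skip blank or non-numeric years rather than guessing at them.
        if value.isdigit() and int(value) >= min_year:
            years.append(int(value))
    if not years:
        return {"n": 0, "mean": None, "median": None}
    return {"n": len(years), "mean": mean(years), "median": median(years)}

# Invented rows, only to show the expected shape of the file.
SAMPLE_CSV = io.StringIO(
    "name,birth_year\n"
    "Alpha,1650\n"
    "Beta,1700\n"
    "Gamma,1800\n"
    "Delta,\n"
    "Epsilon,1900\n"
)

summary = birth_year_summary(SAMPLE_CSV)
print(summary)
```

Rows with a missing or unparseable year are dropped, so N only counts the entries that actually made it into the distribution.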

Tuesday, August 26, 2014

A Nerdy Parent's Movie Selections

UPDATED: 15 March, 2015. Maybe I should start another page....


Have you read the blog entries at Some Kind of Wonderful Noise? It chronicles how an adult (an ADULT!), born and raised in the U.S.A. Canada (EDIT: Canada is still not an excuse) somehow slipped through the cracks and never saw movies like Star Wars, Alien, Indiana Jones, etc. He live-blogs his reactions and thoughts while watching these cultural touchstones. I highly recommend starting from his very first entries and reading them in order, because you can watch his working cultural vocabulary build, and he begins to make comments you'd expect from seasoned MSTies. It's like watching an AI bootstrap its cultural subroutine.

The reason I mention Some Kind of Wonderful Noise is that my wife and I have been intentionally showing our sons various (mainly scifi) movies to introduce them to ones that we deem important to having a working cultural vocabulary. What does it mean when someone says "Light is green, trap is clean"? Well, now my sons can use it in a sentence to explain the state of something. Shaka. When the walls fell.

Numerous people have asked for a list of the movies we've shown, so here it is. I'll update this list as more movies get shown. Please note that not everything on the list is what we would consider important. Some are just fun.

-----Time Travel Theme-----
  •    Back to the Future 1,2,3
  •    The Time Machine (1960)
  •    Bill and Ted's Excellent Adventure
  •    Time Bandits
  •    Groundhog Day
  •    The Time Machine (2002)
  •    Flight of the Navigator
  •    Planet of the Apes
  •    Run Lola Run
  •    My Science Project
  •    The Girl Who Leapt Through Time (2006 cartoon)
Stargate
Close Encounters of the 3rd Kind
The Explorers
Teen Wolf
Forbidden Planet
Big Trouble in Little China
The Blob (short: Yip Yips discover a telephone)
Men in Black (short: Kermit the Frog. News Flash on Pinocchio)
The Goonies
Superman (1978)
Superman 2
Iron Giant
The Frighteners
Rewatch Star Wars (ep 4,5,6 and then 1,2,3)
Beetlejuice
Escape to Witch Mountain
Return to Witch Mountain
Cat from Outer Space
The Absentminded Professor
Who Framed Roger Rabbit
Gremlins
-----Stop motion theme-----
  •    Jason and the Argonauts
  •    The 7th Voyage of Sinbad
  •    Golden Voyage of Sinbad 
  •    Mighty Joe Young (1949)
  •    Clash of the Titans
  •    The Beast from 20,000 Fathoms
E.T.
Wargames
Labyrinth
Short Circuit
Teenage Mutant Ninja Turtles (1990)
5th Element
Batman (1989)
War of the Worlds (1953)
War of the Worlds (2005)
Independence Day
Spiderman (2002)
Spiderman 2 (2004)
Peter Pan (2003)
Spiderman 3 (2007)
Hook
Weird Science
Frankenstein (1931)
Creature from the Black Lagoon (1954)
The Wolf Man (1941)
Mothra reboot (1996)
Mothra (1961)
Popeye
Land Before Time
Weird Science
  -Short: Minuscule v1e1 "La coccinelle"
The Adventures of Baron Munchausen
  -Short: Minuscule v1e2 "Catapulte"

-----Avengers Theme-----
  • Iron Man
  • Iron Man 2
  •   -Short: Minuscule v1e3 "Bouse de là!"
  • Thor (2011)
  •   -Short: Minuscule v1e4 "Deux chenilles"
  • Captain America: The First Avenger (2011)
  • The Avengers (2012)
  • Iron Man 3 (2013)
  • Thor 2: The Dark World (2013)
  • Captain America 2: Winter Soldier (2014)
  • Guardians of the Galaxy (2014)
The Mask (1994)
The Last Unicorn (1982)

Justice League: War (2014)
Justice League: Throne of Atlantis (2015)

-----Star Wars Theme-----
  • Star Wars, A New Hope
  • Star Wars, Empire Strikes Back
  • Star Wars, Return of the Jedi
  • Star Wars, Attack of the Clones
  • Star Wars, The Clone Wars (2003)
  • Star Wars, Revenge of the Sith

Escape from Alcatraz (1979)
Wall-E (2008)
The Rocketeer (1991)