I’ve been playing with Google Ngrams. If you’ve never heard of them, here’s an introduction to the concept.
Google have scanned about 50 million books, and a couple of scientists at Harvard figured out some ways to analyse that mountain of data. Google then developed the ngrams tool based on this analysis.
I was trying to use it to compare fame, testing pairs or groups of famous people from roughly the same era to see who was more famous. Stephen Hawking beats Stephen Fry, Galileo beats Copernicus most of the time, and Edison beats Tesla (to the consternation of true geeks). Moving into fiction, Captain Kirk beats Han Solo, but Darth Vader beats both; apparently in the fame stakes it pays to be evil.
But one result amused me more than all the others: Michelangelo vs Da Vinci. There’s a huge uptick in results for Da Vinci in the early 2000s, right when Dan Brown published The Da Vinci Code.
Remember that ngrams counts mentions of a term across the 50 million books scanned so far, so this huge jump in the number of times the term is used cannot be due to one book alone. Ngrams also lets you drill down and see what was published in that period with the term “Da Vinci”, and indeed a whole range of books on the subject of Da Vinci was published: everything from biographies, to school books, to books analysing all the errors in Dan Brown’s book.
You need to use the drill-down function when playing around with ngrams. I tested “meme” vs “gene”. Gene won, which was no surprise, but I found data for the word meme being used as far back as the 1800s, which was a surprise. On drilling down, it turns out that it’s the French word “même” that is being counted.
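For the curious, here is a toy sketch of how a French “même” could end up counted under “meme”. It assumes the accent is lost somewhere along the way (in scanning, say); how Google's own pipeline actually treats accents is not something this post knows, and the tiny corpus and the function names `strip_accents` and `count_by_year` are made up purely for illustration.

```python
import unicodedata
from collections import Counter

# Toy corpus of (year, text) pairs, invented for illustration only.
TOY_CORPUS = [
    (1850, "Il a dit la même chose, et moi de même."),
    (1976, "A meme spreads from mind to mind much as a gene spreads."),
    (1990, "The gene was mapped; the meme idea was debated."),
]


def strip_accents(text):
    """Drop combining marks, so 'même' becomes 'meme'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_by_year(corpus, term, fold_accents):
    """Count whole-word occurrences of `term` per year (hypothetical helper)."""
    counts = Counter()
    for year, text in corpus:
        if fold_accents:
            text = strip_accents(text)
        # Crude tokenisation: split on spaces, trim punctuation, lowercase.
        words = [w.strip(".,;:!?").lower() for w in text.split()]
        counts[year] += words.count(term)
    return dict(counts)


if __name__ == "__main__":
    print(count_by_year(TOY_CORPUS, "meme", fold_accents=False))
    print(count_by_year(TOY_CORPUS, "meme", fold_accents=True))
```

With accents kept, the 1850 French sentence contributes nothing to “meme”; with accents folded away, it suddenly contributes two hits. That is the kind of false positive that only shows up when you drill down into the matching books.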