Digital Mining of Literature Shows Interesting Facts
Remember when you got your first e-reader and discovered that great Word Search feature, which lets you find every instance of a word in a particular work? Say you wanted to track down every occurrence of the word "horse" in Cervantes' classic, Don Quixote. What a great shortcut for finding a particular passage or for reviewing and analyzing a work. Taking it a step further, the same idea can be used to compare and contrast many works of literature, whether for personal interest or for scholarly purposes.
It turns out Ben Blatt did just that. He wrote about his findings in Publishers Weekly, in a piece on his own book, Nabokov's Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing (Simon & Schuster). Here are a few of the points that piqued my interest:
1. The authorship of texts that were disputed (or thought to be settled) can be traced to the real author(s) through the frequency, order, and use of words. For example, this kind of analysis has put to rest the question of whether Shakespeare collaborated with Marlowe: he positively did.
2. Exclamation points, the so-called marker of not-so-great writers... It turns out James Joyce, an undisputed GREAT writer, used them the most! (see below)
3. A comparison of the "shortest" and "longest" first sentences across authors. It turns out Toni Morrison and Margaret Atwood win for shortest, while Jane Austen and Vladimir Nabokov win for longest.
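If you want to try this kind of counting on your own texts, here is a minimal Python sketch. It assumes plain-text input, a naive regular expression for words, and a crude sentence splitter that ends a sentence at the first ".", "!" or "?". The per-100,000-words rate is an assumption chosen so texts of different lengths can be compared fairly; it is not necessarily the exact method Blatt used, and all function names here are hypothetical.

```python
import re

# Naive word pattern (an assumption): runs of letters, optionally with an
# internal apostrophe such as "don't".
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_words(text: str) -> int:
    """Count words using the naive pattern above."""
    return len(WORD_RE.findall(text))


def word_count(text: str, target: str) -> int:
    """Case-insensitive count of a whole word, like an e-reader's word search."""
    target = target.lower()
    return sum(1 for w in WORD_RE.findall(text) if w.lower() == target)


def exclamations_per_100k(text: str) -> float:
    """Exclamation points per 100,000 words (hypothetical normalization)."""
    words = count_words(text)
    if words == 0:
        return 0.0
    return text.count("!") * 100_000 / words


def first_sentence_length(text: str) -> int:
    """Word count of the first sentence, assumed to end at the first '.', '!' or '?'."""
    first = re.split(r"[.!?]", text.strip(), maxsplit=1)[0]
    return count_words(first)


if __name__ == "__main__":
    # A made-up sample, not a quotation from any book.
    sample = 'The horse stopped. "Whoa!" she cried. Another horse! Then silence.'
    print(word_count(sample, "horse"))      # 2
    print(exclamations_per_100k(sample))    # 20000.0
    print(first_sentence_length(sample))    # 3
```

Run over whole novels instead of a toy sample, the same three functions give a rough feel for the word-search, exclamation-point, and first-sentence comparisons described above.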
And there's more. But you'll have to get the book.
April 13, 2017