I attended James O'Sullivan's 2016 DHSI course, "Introduction to Computation for Literary Criticism." In the course we experimented with a number of tools and methods of statistical analysis, applying them to a variety of data sets.
In most instances we used the stats-centric programming language R with RStudio, an IDE (integrated development environment) for R. With RStudio, we generated different types of visualizations, including dendrograms, rolling delta scatterplots, and bootstrap consensus trees.
In other instances, we used the "web-based reading and analysis environment" Voyant Tools. Voyant comes packed with a number of visualization tools and gives scholars an instantaneous look at language usage across a number of texts.
Finally, we experimented with an open-source topic modeling tool and the network visualization tool Palladio.
Selected visualizations from my work in the class are given below.
The first image is a dendrogram that clusters James's corpus based on a stylometric analysis of the corpus's most frequent words. The second image is the same data fed into Palladio's network visualization tool, offering another view of the same information. The most obvious takeaway from this simple analysis is how well James's works cluster chronologically, seemingly typifying James's early, mid, and late styles.
Have you ever heard the rumor that Truman Capote secretly contributed significant passages to Harper Lee's To Kill a Mockingbird? With computational analysis, we are able to test those rumors to a significant degree. The first image is a dendrogram that clusters the texts based on a stylometric analysis of the data set's most frequent words. The second image is a rolling delta analysis that also compares texts by their most frequent words, but it does so by slicing each text into sections and comparing the sections individually. In both instances, Lee's style remains quite distinct from Capote's, making it unlikely that he contributed significant passages to any of her works.
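For readers curious about what a rolling analysis does under the hood, here is a minimal sketch in Python rather than the R we used in class. It assumes a Burrows-style Delta (the mean absolute difference of z-scored word frequencies) and is not the code that produced the images above. The function names, the window and step sizes, and the tokenizer are all my own illustrative choices.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rel_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocab]


def most_frequent_words(token_lists, n):
    """The n most frequent words across all reference texts combined."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


def delta(freqs_a, freqs_b, stdevs):
    """Mean absolute difference of z-scores (the means cancel out)."""
    diffs = [abs(a - b) / sd for a, b, sd in zip(freqs_a, freqs_b, stdevs) if sd > 0]
    return mean(diffs) if diffs else 0.0


def rolling_delta(test_tokens, reference, n_words=100, window=5000, step=2500):
    """Slide a window over the test text and score each slice against
    every reference text. `reference` maps a label to a token list.
    Returns a list of (start_offset, {label: delta}) pairs."""
    vocab = most_frequent_words(list(reference.values()), n_words)
    ref_freqs = {name: rel_freqs(toks, vocab) for name, toks in reference.items()}
    # Spread of each word's frequency across the reference texts.
    stdevs = [pstdev(column) for column in zip(*ref_freqs.values())]
    results = []
    last_start = max(len(test_tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk_freqs = rel_freqs(test_tokens[start:start + window], vocab)
        scores = {name: delta(chunk_freqs, f, stdevs) for name, f in ref_freqs.items()}
        results.append((start, scores))
    return results
```

A lower score means a slice looks more like that reference text. If a second author had written a stretch of the test text, the scores for that stretch would be expected to dip toward that author's reference samples.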
Voyant allows scholars a quick look at language across a large corpus. What knowledge might be gained by tracing terms like "god" and forms of "goddamn" across Salinger's chronology? I like the idea that his corpus grows less cynical and more spiritual as the Holden Caulfield persona gives way to Seymour Glass as a young boy.
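The same kind of term tracing can be roughed out in a few lines of Python. This is only a sketch of the idea, not how Voyant computes its trends: the `term_trend` name, the regular expressions, and the per-10,000-words scaling are assumptions of mine, and the texts would need to be supplied in chronological order.

```python
import re


def term_trend(texts, patterns, per=10_000):
    """Rate of each term per `per` words, text by text.

    texts: list of (label, text) pairs, assumed to be in chronological order.
    patterns: dict mapping a term name to a regex matched against whole words.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    rows = []
    for label, text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        total = len(tokens) or 1
        row = {
            name: per * sum(1 for tok in tokens if rx.fullmatch(tok)) / total
            for name, rx in compiled.items()
        }
        rows.append((label, row))
    return rows


# Hypothetical patterns: "god" alone, and any word starting with "goddam".
PATTERNS = {"god": r"god", "goddamn": r"goddam\w*"}
```

Calling `term_trend(texts, PATTERNS)` returns one row of rates per text, which can then be plotted against publication order to see whether one term rises as the other falls.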
Note: Voyant Tools does allow for data and visualization embeds, but they are not currently supported by GitHub Pages.