Bleg 1: String Distance
String distance measurements are useful for cleaning up the sort of messy data from multiple sources. There are a bunch of string distance algorithms, which usually rely on some form of calculations...
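As a rough illustration of the kind of calculation the excerpt alludes to, here is a minimal sketch of one widely used string distance, Levenshtein (edit) distance, computed with standard dynamic programming. It is a generic example and not necessarily the algorithm the post discusses; the function name `levenshtein` and the sample place names are hypothetical, chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # Keep only the previous row of the DP table to save memory.
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical cleanup step: map a messy entry to the closest canonical name.
canonical = ["New York", "New Jersey", "Newark"]
messy = "New Yrok"
best = min(canonical, key=lambda name: levenshtein(messy, name))
print(levenshtein("kitten", "sitting"))  # 3
```

Picking the candidate with the smallest distance is the simplest matching rule; real cleanup work usually adds a maximum-distance threshold so that entries with no close match are flagged rather than forced onto the nearest name.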
Finding the best ordering for states
Here’s a very technical, but kind of fun, problem: what’s the optimal order for a list of geographical elements, like the states of the USA? If you’re just here from the future, and don’t care about...
The Simpsons Bookworm
I thought it would be worth documenting the difficulty (or lack of) in building a Bookworm on a small corpus: I’ve been reading too much lately about the Simpsons thanks to the FX marathon, so figured...
Markdown, Historical Writing, and Killer Apps
Like many technically inclined historians (for instance, Caleb McDaniel, Jason Heppler, and Lincoln Mullen) I find that I’ve increasingly been using the plain-text format Markdown for almost all of my...
Searching for structures in the Simpsons and everywhere else.
This is a post about several different things, but maybe it’s got something for everyone. It starts with 1) some thoughts on why we want comparisons between seasons of the Simpsons, hits on 2) some...
Building topic models into Bookworm searches
Lately I’ve been looking a bit into how deeply we could integrate topic models into the underlying Bookworm architecture. My own chief interest in this, because I tend to be a little wary of topic models in...
More thoughts on topic models and tokens
I’ve been thinking a little more about how to work with the topic modeling extension I recently built for bookworm. (I’m curious if any of those running installations want to try it on their own...
Building outlines and slides from Markdown lectures with Pandoc
Just a quick follow-up to my post from last month on using Markdown for writing lectures. The GitHub repository for implementing this strategy is now online. The goal there was to have one master file...
The Bookworm-Mallet extension
I promised Matt Jockers I’d put together a slightly longer explanation of the weird constraints I’ve imposed on myself for topic models in the Bookworm system, like those I used to look at the...
Rate My Professor
Just some quick FAQs on my professor evaluations visualization: adding new ones to the front, so start with 1 if you want the important ones. -3 (addition): The largest and in many ways most...
Commodius vici of recirculation: the real problem with Syuzhet
Practically everyone in Digital Humanities has been posting increasingly epistemological reflections on Matt Jockers’ Syuzhet package since Annie Swafford posted a set of critiques of its assumptions....
Buying a computer for digital humanities work
I’ve gotten a couple e-mails this week from people asking advice about what sort of computers they should buy for digital humanities research. That makes me think there aren’t enough resources online...
Feature Reduction on the Underwood-Sellars corpus
This is some real inside baseball; I think only two or three people will be interested in this post. But I’m hoping to get one of them to act on or criticize a quick idea. This started as a comment on...