Channel: Benjamin M. Schmidt
Browsing all 13 articles

Bleg 1: String Distance

String distance measurements are useful for cleaning up the sort of messy data that comes from multiple sources. There are a bunch of string distance algorithms, which usually rely on some form of calculations...


Finding the best ordering for states

Here’s a very technical, but kind of fun, problem: what’s the optimal order for a list of geographical elements, like the states of the USA? If you’re just here from the future, and don’t care about...


The Simpsons Bookworm

I thought it would be worth documenting the difficulty (or lack thereof) in building a Bookworm on a small corpus: I’ve been reading too much lately about the Simpsons thanks to the FX marathon, so figured...


Markdown, Historical Writing, and Killer Apps

Like many technically inclined historians (for instance, Caleb McDaniel, Jason Heppler, and Lincoln Mullen) I find that I’ve increasingly been using the plain-text format Markdown for almost all of my...


Searching for structures in the Simpsons and everywhere else.

This is a post about several different things, but maybe it’s got something for everyone. It starts with 1) some thoughts on why we want comparisons between seasons of the Simpsons, hits on 2) some...


Building topic models into Bookworm searches

I’ve been seeing how deeply we could integrate topic models into the underlying Bookworm architecture a bit lately. My own chief interest in this, because I tend to be a little wary of topic models in...


More thoughts on topic models and tokens

I’ve been thinking a little more about how to work with the topic modeling extension I recently built for Bookworm. (I’m curious if any of those running installations want to try it on their own...


Building outlines and slides from Markdown lectures with Pandoc

Just a quick follow-up to my post from last month on using Markdown for writing lectures. The GitHub repository for implementing this strategy is now online. The goal there was to have one master file...



The Bookworm-Mallet extension

I promised Matt Jockers I’d put together a slightly longer explanation of the weird constraints I’ve imposed on myself for topic models in the Bookworm system, like those I used to look at the...


Rate My Professor

Just some quick FAQs on my professor evaluations visualization: adding new ones to the front, so start with 1 if you want the important ones. -3 (addition): The largest and in many ways most...


Commodius vici of recirculation: the real problem with Syuzhet

Practically everyone in Digital Humanities has been posting increasingly epistemological reflections on Matt Jockers’ Syuzhet package since Annie Swafford posted a set of critiques of its assumptions....


Buying a computer for digital humanities work

I’ve gotten a couple e-mails this week from people asking advice about what sort of computers they should buy for digital humanities research. That makes me think there aren’t enough resources online...


Feature Reduction on the Underwood-Sellars corpus

This is some real inside baseball; I think only two or three people will be interested in this post. But I’m hoping to get one of them to act out or criticize a quick idea. This started as a comment on...
