In part one of our two-part exploration of Seinfeld through the lens of data science, we ran every line spoken through a sentiment analysis. We measured every word's sentiment to plot out an emotional understanding of each episode, season and character. We tried to get at, perhaps, the raw emotion of Seinfeld and how that plays out through core storylines.
On the whole, it worked. Angry characters like Kramer's lawyer, Jackie, bubbled to the surface. Excitable characters like Steinbrenner cracked the visualizations, too. What's more, we were able to single out episodes steeped in one emotion or another. "The Bris," for instance, charted high in fear and disgust.
It also didn't work. Not 100% of the time. To keep characters who barely speak from muddying our results, we stripped a bit of the soul away from Seinfeld. By requiring a character to speak a minimum number of lines in order to show up in our analysis, we cut out core emotional folks like, say, the Soup Nazi.
The sentiment analysis mostly worked, but it fell short of what we were hoping to understand. What's Seinfeld about? Perhaps more importantly for our purposes, could we use data science to tease out that answer?
We know that Jerry Seinfeld himself pitched Seinfeld with Larry David to NBC in 1988, and they wanted to do a show about "how a comedian gets his material." We also know that the show was widely known as being about nothing, especially after characters George and Jerry pitch their own show to NBC later in Seinfeld's run.
Of course, the show's not about "nothing." Its nine seasons span such an absurd variety of jokes and content that suggesting it's about nothing is an absolute farce. It's funny, but it couldn't be further from the truth.
Seinfeld, ultimately, is about jokes. It's about the humor in everyday situations. It's about eating a Snickers with a fork, awkwardly greeting a neighbor, arguments over a cup of coffee and catchphrases. Oh, the catchphrases.
We had an idea. What if we ran a season-by-season bigram analysis of Seinfeld? We combed through our dataset of every line spoken to figure out which two words, in order, were said most in each episode over nine seasons. Our analysis didn't simply use word frequency, though. We used tf-idf, short for term frequency-inverse document frequency, a standard weighting from information retrieval. Essentially, it goes beyond counting which words are said a lot and weighs how often a pair of words turns up in one stretch of the show against how widely it turns up everywhere else. If a pair of words is said constantly over the span of the nine seasons of Seinfeld, it's less valuable than a pair of words with a sudden, spiked usage.
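For the curious, here is a minimal sketch of that kind of bigram tf-idf scoring in plain Python. It is not our actual pipeline: the function names, the tokenizing rule, the choice to treat each season as one "document" and the toy lines are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def bigrams(line):
    """Lowercase a line, split it into words, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", line.lower())
    return list(zip(words, words[1:]))


def top_bigrams(docs, k=3):
    """Rank the bigrams in each document by tf-idf.

    docs maps a label (assumed here to be a season) to the lines spoken in it.
    """
    counts = {label: Counter(bg for line in lines for bg in bigrams(line))
              for label, lines in docs.items()}
    # Document frequency: how many documents contain each bigram at least once.
    df = Counter(bg for c in counts.values() for bg in c)
    n_docs = len(counts)
    ranked = {}
    for label, c in counts.items():
        total = sum(c.values())
        # tf = share of this document's bigrams; idf = log(N / document frequency).
        scores = {bg: (n / total) * math.log(n_docs / df[bg])
                  for bg, n in c.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked[label] = sorted(scores, key=lambda bg: (-scores[bg], bg))[:k]
    return ranked


# Toy input, invented for this example (not real transcript data).
toy = {
    "season_a": ["you know what", "you know the marine biologist",
                 "a marine biologist"],
    "season_b": ["you know what", "you know the puffy shirt",
                 "a puffy shirt"],
}
result = top_bigrams(toy, k=1)
```

In the toy input, "you know" is the most frequent pair overall, but it shows up in both documents, so its idf is log(2/2) = 0 and it drops out. The pairs that spike in only one document rise to the top, which is the effect described above.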
The results are amazing. Plug any pair of words into Google along with "Seinfeld" and you'll get a precise episode.
Cable boy? Inka dink? Mustard stain? They're all here.
"Crazy Joe" elicits memories of Jerry hiding from Crazy Joe Divola. "Sponge worthy" instantly calls back to Elaine's closet full of contraceptive sponges. "Marine biologist..." Well, that one references one of the greatest scenes in Seinfeld's (perhaps all of television's) history.
"The sea was angry that day, my friends. Like an old man trying to send back soup in a deli."
It might be Seinfeld's characters that deliver great moments like these, but it's the individual jokes and bits that made it the gold standard of sitcom television in the 90s. These jokes defined Seinfeld, and no other show can quite attain the same status. To even a casual viewer of the series, phrases like "jerk store," "Tim Whatley," "AIDS Walk" and "puffy shirt" stir up random waves of nostalgia.
We decided to throw another variable into the mix for this analysis, just to see if we could drive the point home further. What if we figured out the top bigrams for each of the four core characters? Here we had success, too.
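The only step that changes for a per-character view is how the lines are grouped before scoring. A small sketch of that regrouping, assuming the dataset can be read as (speaker, line) pairs; the `CORE` set, the function name and the sample rows are hypothetical:

```python
from collections import defaultdict

# Assumed speaker labels for the four core characters.
CORE = {"JERRY", "GEORGE", "ELAINE", "KRAMER"}


def lines_by_character(rows, keep=CORE):
    """Group (speaker, line) pairs by speaker, keeping only the chosen speakers."""
    grouped = defaultdict(list)
    for speaker, line in rows:
        name = speaker.strip().upper()  # normalize casing and stray spaces
        if name in keep:
            grouped[name].append(line)
    return dict(grouped)


# Invented rows for illustration (not real transcript data).
rows = [
    ("Jerry", "Hello, Uncle Leo"),
    ("Newman", "Hello, Jerry"),
    ("jerry ", "Uncle Leo!"),
]
by_character = lines_by_character(rows)
```

Each character's list of lines can then be treated as one document and scored the same way as a season.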
From Jerry's weird relationship with Uncle Leo, to George's terrible wedding invitation selection, Elaine's friend Sue Ellen and Kramer's slow-motion "cable boy" scene, these bigrams paint a picture of each character and the jokes they told.
Seinfeld's plot is nonexistent. The characters don't grow, the setting hardly changes, and the relationships never really develop. Seinfeld exists almost without time, as these jokes could occur at any point along the full series run.
And yet... Just like Jerry Seinfeld and Larry David's initial pitch, Seinfeld is a show about material. It's about the humor that exists in the everyday nothingness that we all experience. Sure, it sounds a little hyperbolic, but Seinfeld is about everything.