
RE-DESIGN Posts

Predicting movie ratings with IMDb data and R

It’s Oscars season again, so why not explore how predictable (my) movie tastes are? This has literally been a million-dollar problem, and obviously I am not gonna solve it here, but it’s fun and slightly educational to do some number crunching. Below, I will proceed from a simple linear regression to a generalized additive model to an ordered logistic regression analysis, and I will illustrate the results with nice plots along the way. Of course, all done in R (you can get the script here).

Data

The data for this little project comes from the IMDb website and, in particular, from my personal ratings of 442 titles recorded there. IMDb keeps the movies you have rated in a nice little table, which includes the movie title, director, duration, year of release, genre, IMDb rating, and a few other less interesting variables. Conveniently, you can export the data directly as a csv file.

Outcome variable

The outcome variable that I want to predict is my personal movie rating. IMDb lets you score movies with one to ten stars; half-points and other fractions are not allowed. It is a tricky variable to work with. It is obviously not a continuous one. At the same time, ten ordered categories are a bit too many to treat as a regular categorical variable. Figure 1 plots the frequency distribution (black bars) and density (red area) of my ratings and the density of the IMDb scores (in blue) for the 442 observations…
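The post's own script is not reproduced here. What follows is only a minimal sketch, under my own assumptions, of the three model types named above, fitted in R to simulated data. The 442 rows mirror the sample size, but the ratings are invented. The column names (my_rating, imdb_rating, runtime), the choice of predictors, and the simulation rule are all hypothetical. The sketch uses lm(), gam() from mgcv, and polr() from MASS. Both packages ship with standard R installations as recommended packages.

```r
# A minimal sketch on SIMULATED data. The column names my_rating,
# imdb_rating and runtime are hypothetical stand-ins for the IMDb csv export.
library(MASS)   # polr(): proportional-odds (ordered logistic) regression
library(mgcv)   # gam(): generalized additive models

set.seed(2013)
n <- 442
movies <- data.frame(
  imdb_rating = round(runif(n, 4, 9), 1),  # IMDb scores with one decimal
  runtime     = round(rnorm(n, 115, 20))   # duration in minutes
)
# Personal rating: whole stars from 1 to 10, loosely tied to the IMDb score
# (an invented rule, used only to get something to fit)
latent <- movies$imdb_rating + rnorm(n, 0, 1.2)
movies$my_rating <- pmin(10L, pmax(1L, as.integer(round(latent))))

# Frequency distribution of the 1-10 outcome, including empty star levels
freq <- table(factor(movies$my_rating, levels = 1:10))
print(freq)

# 1. Linear regression: treats the stars as a continuous, equally spaced outcome
fit_lm <- lm(my_rating ~ imdb_rating + runtime, data = movies)

# 2. GAM: lets the IMDb score enter through a smooth, possibly nonlinear term
fit_gam <- gam(my_rating ~ s(imdb_rating) + runtime, data = movies)

# 3. Ordered logistic regression: treats the stars as ordered categories.
#    droplevels() removes star values that never occur in the data.
movies$my_rating_ord <- droplevels(factor(movies$my_rating,
                                          levels = 1:10, ordered = TRUE))
fit_polr <- polr(my_rating_ord ~ imdb_rating + runtime,
                 data = movies, Hess = TRUE)

print(coef(summary(fit_lm)))     # estimates, standard errors, t-values
print(summary(fit_gam)$s.table)  # significance of the smooth term
print(coef(fit_polr))            # slopes; the cut-points are in fit_polr$zeta
```

To try this on a real export, replace the simulated data frame with read.csv() on the downloaded file and rename the relevant columns. Check the header of your own file first, because the exact names IMDb uses are not given here. The ordered model is the one that matches the nature of the outcome. It estimates one set of slopes plus at most nine cut-points between adjacent star levels. It therefore does not assume, as lm() does, that the gap between 6 and 7 stars equals the gap between 9 and 10.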

Swimming in a sea of code

If you are looking for code here, move on.

In the beginning, there was only the relentless blinking of the cursor, with the maddening regularity of waves splashing on the shore: blink, blink, blink, blink… Beyond the cursor, the white wasteland of the empty page: vast, featureless, and terrifying as the sea. You stare at the empty page and primordial fear engulfs you. You are never gonna venture into this wasteland. You are never gonna leave the stable, solid, familiar world of menus and shortcuts, icons and buttons.

And then you take the first cautious steps.

> print 'Hello world'

Hello world, the sea obliges.

> 1+1
2
> 2+2
4

You are still scared, but your curiosity is aroused. The playful responsiveness of the sea is tempting, and it quickly becomes irresistible. Soon, you are jumping around like a child, rolling upside down and around and around:

> a=2
> b=3
> a+b
5
> for (x in 1:60) print (x)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35 36 37 38
39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60

The sense of freedom is exhilarating. You take a deep breath and dive:

> for (i in 1:10) ifelse (i>5, print ('ha'), print ('ho'))
[1] "ho"
[1] "ho"
[1] "ho"
[1]…

The origins of the digital universe

Just finished Turing’s Cathedral, a fine and stimulating book about the origins of the computer. It covers the interlinked history of the first computers and nuclear bombs, the role of John von Neumann in all that, the Institute for Advanced Study (IAS) in Princeton, and much more. It is a very thoroughly researched volume based on archival materials, interviews, etc. Actually, if I have one complaint, it is that the book is too scrupulous in presenting the background of all primary, secondary and tertiary characters in the story of the computer and in documenting the development of the various buildings at the IAS. For that reason I found the first part of the book a bit tedious. But the later chapters, in which the author allows his own ideas about the digital universe to roam more freely, are truly inspired and inspiring. It was also quite fascinating to learn about one of the first uses of the digital computer, apart from calculating nuclear fusion processes and trying to predict the weather. This was to run what would now be called agent-based modeling (by Nils Barricelli). Here is my favorite passage from the book: ‘Books are strings of code. But they have mysterious properties – like strings of DNA. Somehow the author captures a fragment of the universe, unravels it into a one-dimensional sequence, squeezes it through a keyhole, and hopes that a three-dimensional vision emerges in the reader’s mind. The translation is never exact.’ (p.312)

Constructivism in the world of Dragons

Here is an analysis of Game of Thrones from a realist international relations perspective. Inevitably, here is the response from a constructivist angle. These are supposed to be fun, so I approached them with a light heart and popcorn. But halfway through the second article I actually felt sick to my stomach. I am not exaggerating, and it wasn’t the popcorn. Seeing the same ‘arguments’ between realists and constructivists rehearsed in this new setting caused me physical pain: the same lame responses to the same lame points, the same ‘debate’ where nobody ever changes their mind, the same dreaded confluence of normative, theoretical, and empirical notions that plagues this never-ending exchange in the real (sorry, socially constructed) world. I felt entrapped. Even in this fantasy world there was no escape from the Realist and the Constructivist. The Seven Kingdoms were infected by the triviality of our IR theories. The magic of their world was desecrated. Forever… There is nothing wrong with the particular analyses. But precisely because they manage to be good examples of the genres they imitate, the bad taste in my mouth felt so real. So is it about interests or norms? Oh no. Is it realpolitik or the slow construction of a common moral order? Do leaders disregard the common folk to their own peril? Oh, please stop. How do norms construct identities? Noooo moooore. Send the Dragons!!! By the way, just one example of how George R.R. Martin can explain a difficult political idea better…

The failure of political science

Last week the American Senate supported, with a clear bipartisan majority, a decision to stop funding for political science research from the National Science Foundation. Of all disciplines, only political science has been singled out for the cuts, and the money will go to cancer research instead. The decision is obviously wrong for so many reasons, but my point is different. How could political scientists, who are supposed to understand better than anyone else how politics works, allow this to happen? What does it tell us about the state of the discipline that the academic experts in political analysis cannot prevent overt political action that hurts them directly and rather severely? To me, this failure of American political scientists to protect their own turf in the political game is scandalous. It is as bad as Nobel-winning economists Robert Merton and Myron Scholes leading the hedge fund ‘Long-Term Capital Management’ to bust and losing 4.6 billion dollars with the help of their Nobel-winning economic theories. Merton and Scholes’ hedge fund story reveals the true real-world value of (much) financial economic theory. In the same way, the humiliation of political science by Congress reveals the true real-world value of (much) political theory. Think about it: the world-leading academic specialists on collective action, interest representation and mobilization could not get themselves mobilized, organized and represented in Washington to protect their funding. The professors of the political process and legislative institutions could not find a way to work these same institutions to their own…