
RE-DESIGN Posts

Hyperlinks

Big data for evaluating education. Should have been done long ago, no?
Neanderthals painted
Most relaxing song ever [?!?]
Testosterone, digit ratio, and abstract reasoning ability [via MindBlog]
‘North Korea’ by Damir Šagolj, 1st Prize, World Press Photo 2012, Daily Life Category

Weighted variance and weighted coefficient of variation

Often we want to compare the variability of a variable in different contexts – say, the variability of unemployment in different countries over time, or the variability of height in two populations. The most commonly used measures of variability are the variance and the standard deviation (which is just the square root of the variance). However, for some types of data these measures are not entirely appropriate. For example, when data are generated by a Poisson process (e.g. when you have counts of rare events), the mean equals the variance by definition. Clearly, comparing the variability of two Poisson distributions using the variance or the standard deviation would not work if the means of these populations differ. A common and easy fix is to use the coefficient of variation instead, which is simply the standard deviation divided by the mean.

So far, so good. Things get tricky, however, when we want to calculate the weighted coefficient of variation. The weighted mean is just the mean in which some data points contribute more than others. For example, the mean of 0.4 and 0.8 is 0.6. If we assign the weight 0.9 to the first observation [0.4] and 0.1 to the second [0.8], the weighted mean is (0.9*0.4+0.1*0.8)/1, where 1 is the sum of the weights, which equals 0.44.

You would guess that we can compute the weighted variance by analogy, and you would be wrong. For example, the sample variance of {0.4,0.8} is given by [Wikipedia]: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$, or in our example ((0.4-0.6)^2+(0.8-0.6)^2) / (2-1), which equals 0.08. But the weighted sample variance cannot be computed by…
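The arithmetic in this excerpt can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it covers just the quantities worked out above (the weighted mean, the unweighted sample variance with the n-1 denominator, and the unweighted coefficient of variation). For the two observations 0.4 and 0.8, the squared deviations from the mean are 0.04 each, so the sample variance comes out at 0.08.

```python
import math

def weighted_mean(values, weights):
    # Sum of weight * value, divided by the sum of the weights.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def sample_variance(values):
    # Unweighted sample variance with the (n - 1) denominator.
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def coefficient_of_variation(values):
    # Standard deviation divided by the mean (unweighted).
    mean = sum(values) / len(values)
    return math.sqrt(sample_variance(values)) / mean

values = [0.4, 0.8]    # the two observations from the example
weights = [0.9, 0.1]   # the weights from the example

print(round(weighted_mean(values, weights), 4))     # approximately 0.44
print(round(sample_variance(values), 4))            # approximately 0.08
print(round(coefficient_of_variation(values), 4))   # approximately 0.4714
```

The results are rounded before printing only to hide floating-point noise.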

The Good, the Bad, and the Stranger

Once upon a time, in a land far away, there lived two brothers. The first brother was like an ox: strong, dutiful, and hard-working. The second brother was like a rotten apple – useless, menacing, and foul. The first brother set up a small enterprise, which quickly took root and sprawled. Soon, he needed to hire a helping hand. He could either employ his brother, who was wicked and lazy but still a relation, or a Stranger, who was diligent and qualified, but came from some distant God-forsaken place. At this point the story forks and you, the reader, have to choose which path to take:

– You hire the stranger. The enterprise grows and prospers. Your brother vanishes in misery. Every Christmas you send him a present to an address he has long abandoned. This is the way of the capitalist.
– You hire the brother. He might be trouble, but he is of your own blood. And, on his advice, you close your community to strangers. Soon, your brother stops showing up for work, and when he does, he shows up drunk. You quarrel and curse, but you stay loyal, and the enterprise rapidly goes into wreck. But you go down together. This is the way of the nationalist.
– You hire the stranger. Every month you take a generous slice from your profit and a big cut from the stranger’s salary, and you give them to your brother. Your brother acquires a big TV, junk food addiction, and…

No use for big data in electioneering, according to Hollywood

Over the last year, two major Hollywood movies that touch upon the use of big data and sophisticated data analysis hit the big screen. Which, of course, is two more than the mean (or was that the median). Moneyball shows how crunching numbers helps win baseball games and Margin Call shows how crunching numbers helps ruin financial firms. It’s kind of fun to see Brad Pitt and Kevin Spacey stare at spreadsheets and nod approvingly while someone explains statistical subtleties to them. But watching someone stare at somebody else’s spreadsheets quickly becomes tiresome … which probably explains why Regressing with the Stars, Dotchart Master, and America’s Next Multilevel Model haven’t yet taken over reality TV. So I was really disappointed to see that a third 2011 movie – The Ides of March – misses a golden opportunity to show the use of big data and sophisticated analysis for winning elections. The movie revolves around the presidential primary campaign of George Clooney (pardon, Governor Mike Morris) and the dirty politics behind the scenes. But for Hollywood in 2011, electioneering is still a game of horse-trading, media spinning and good-ol’ stabs in the back. All these things about election campaigns are probably true, but I was disappointed that there were no fancy graphs plotting approval ratings and prediction market quotes, no real-time election forecasts (or nowcasts) for George Clooney to stare at and nod approvingly, no GIS-supported campaign targeting, not even focus groups, tweets, Facebook pages, not to speak of Google circles. Now,…

Hyperlinks

Science visualization challenge 2011
192 answers to the question ‘What is your favorite deep, elegant, or beautiful explanation?’
Higher education for the masses [commentary by Felix Salmon]
Researchers feel pressure to cite superfluous papers