Imposing arbitrary scope conditions on causal arguments leads to the same problem as subgroup analysis: the ‘results’ are too often just random noise.

Research Design Matters

Imagine the following simple setup: there are two switches (X and Z) and a lamp (Y). Both switches and the lamp are ‘On’. You want to know what switch X does, but you have only one try to manipulate the switches. Which one would you choose to switch off: X, Z or it doesn’t matter?

These are the results of the quick Twitter poll I did on the question:

“Two switches X and Z control lamp Z. Both switches & the lamp are On. You wanna learn what X does. You have one try. Which switch to press?” (Dimiter Toshkov, @DToshkov, September 4, 2017)

Clearly, almost half of the respondents think it doesn’t matter, switching X is the second choice, and only 2 out of 15 would switch Z to learn what X does. Yet, it is by pressing Z that we have the best chance of learning something about the effect of X. This seems quite counter-intuitive, so let me explain.

First, let’s clarify the assumptions embedded in the setup:

(A1) both switches and the lamp can be either ‘On’ [1] or ‘Off’ [0];
(A2) the lamp is controlled only by these switches; there is nothing outside the system that controls its output;
(A3) X and Z can work individually or in combination (so that the lamp is ‘On’ only if both switches are ‘On’ simultaneously).

Now let’s represent the information we have in a table:

Switch X    Switch Z    Lamp Y
1           1           1
0           0           0

We are…

Having graded another batch of 40 student research proposals, I find that the distinction between ‘unit of analysis’ and ‘unit of observation’ proves to be, yet again, one of the trickiest for students to master. After several years of experience, I think I have a good grasp of the difference between the two, but it obviously remains a challenge to explain it to students. King, Keohane and Verba (1994) [KKV] introduce the difference in the context of descriptive inference, where it serves the argument that what goes under the heading of a ‘case study’ often actually has many observations (p.52, see also 116-117). But, admittedly, the book is somewhat unclear about the distinction, and unambiguous definitions are not provided. In my understanding, the unit of analysis (a case) is at the level at which you pitch the conclusions. The unit of observation is at the level at which you collect the data. So, the unit of observation and the unit of analysis can be the same, but they need not be. In the context of quantitative research, units of observation could be students and units of analysis classes, if classes are compared. Or students can be both the units of observation and the units of analysis, if students are compared. Or students can be the units of analysis and grades the units of observation, if several observations (grades) are available per student. So it all depends on the design. Simply put, the unit of observation is the row in the data table but the unit of analysis can…
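To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the data, the `aggregate` helper, the variable names), and averaging is only one of many ways to move from the level of observation to the level of analysis. Each row in the data table is one grade; the same rows can then support conclusions pitched at the level of students or at the level of classes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data table: each row is one observation (a single grade).
rows = [
    {"class": "A", "student": "s1", "grade": 7.0},
    {"class": "A", "student": "s1", "grade": 8.0},
    {"class": "A", "student": "s2", "grade": 6.0},
    {"class": "B", "student": "s3", "grade": 9.0},
    {"class": "B", "student": "s3", "grade": 7.0},
    {"class": "B", "student": "s4", "grade": 5.0},
]


def aggregate(records, key, value):
    """Average `value` within each level of `key` (illustrative helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {level: mean(values) for level, values in groups.items()}


# Design 1: grades are the units of observation, students the units of analysis.
student_means = aggregate(rows, "student", "grade")

# Design 2: students are the units of observation, classes the units of analysis.
# Build a student-level table first (one row per student), then compare classes.
student_class = {r["student"]: r["class"] for r in rows}
student_rows = [
    {"class": student_class[s], "student": s, "grade": g}
    for s, g in student_means.items()
]
class_means = aggregate(student_rows, "class", "grade")

print("Per student:", student_means)
print("Per class:", class_means)
```

The raw table never changes; what changes is the level at which the comparison is made, and that follows from the research design rather than from the data.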

Here is the result of my attempt to use Prezi during the last presentation for the class on Research Design in Public Administration. I tried to use Prezi’s functionality to present, in a novel form, the same main lessons I have been emphasizing during the six weeks (yes, it is a short course). Some of the stuff is obviously an over-simplification, but the purpose is to focus on the big picture and draw the various threads of the course together. Prezi seems fun, but I have two small complaints: (1) the handheld device I use to change PowerPoint slides from a distance doesn’t work with Prezi, and (2) I can’t find a way to make stuff (dis)appear à la PowerPoint without zooming in and out.

Is unit homogeneity a sufficient condition (assumption) for causal inference from observational data? Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed as KKV], I think they regard unit homogeneity and conditional independence as alternative assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two possible assumptions that enable us to get around the fundamental problem [of causal inference]” (p.91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (that is, establish the direction of causality). In my understanding, endogeneity is automatically ruled out with conditional independence, but not with unit homogeneity (“Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variable takes on a particular value” [p.91]).

Going back to Holland’s seminal article, which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a sufficient condition for inference (p.948). But Holland divides variables into pre-exposure and post-exposure before he even gets to discuss any of the additional assumptions, so reverse causality is ruled out altogether. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in my opinion, in KKV’s context unit homogeneity needs to be coupled with some further condition (temporal precedence, for example) to ascertain the causal direction when making inferences from data. The point is minor, but presenting unit homogeneity and conditional independence side by side as alternative assumptions for inference can create confusion.

I am again teaching the Research Design class for the MSc in Public Administration at Leiden University. It is a rather challenging course, since the backgrounds of the students are so diverse (from Religious Studies to Psychology to International Relations) and because most of the students have very little training in, and a certain dislike for, any formal method of data analysis. Here is the course outline that we prepared (with my colleague Brendan Carroll). All comments and suggestions are more than welcome.