Unit of analysis vs. Unit of observation

After grading another batch of 40 student research proposals, I find yet again that the distinction between ‘unit of analysis’ and ‘unit of observation’ is one of the trickiest for students to master.

After several years of experience, I think I have a good grasp of the difference between the two, but explaining it to students obviously remains a challenge. King, Keohane and Verba (1994) [KKV] introduce the difference in the context of descriptive inference, where it serves the argument that what goes under the heading of a ‘case study’ often actually contains many observations (p.52, see also 116-117). Admittedly, though, the book is somewhat unclear about the distinction and does not provide unambiguous definitions.

In my understanding, the unit of analysis (a case) is the level at which you pitch the conclusions. The unit of observation is the level at which you collect the data. So the unit of observation and the unit of analysis can be the same, but they need not be. In the context of quantitative research, the units of observation could be students and the units of analysis classes, if classes are compared. Or students can be both the units of observation and the units of analysis, if students are compared. Or students can be the units of analysis and grades the units of observation, if several observations (grades) are available per student. So it all depends on the design. Simply put, the unit of observation is the row in the data table, but the unit of analysis can be at a higher level of aggregation.
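To make the row-versus-aggregation point concrete, here is a minimal sketch in plain Python. The data, the field names and the helper `mean_by` are all made up for illustration; the only point is that the same table of grades can feed analyses pitched at the level of students or at the level of classes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each row is one grade, so the grade is the unit of observation.
grades = [
    {"class": "A", "student": "s1", "grade": 7.0},
    {"class": "A", "student": "s1", "grade": 8.0},
    {"class": "A", "student": "s2", "grade": 6.0},
    {"class": "B", "student": "s3", "grade": 9.0},
    {"class": "B", "student": "s3", "grade": 9.0},
    {"class": "B", "student": "s4", "grade": 6.0},
]


def mean_by(rows, key, value):
    """Average `value` within each group defined by `key` (illustrative helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


# Students as units of analysis: conclusions compare students,
# each summarised from several observations (grades).
student_means = mean_by(grades, "student", "grade")

# Classes as units of analysis: conclusions compare classes,
# built from the very same rows.
class_means = mean_by(grades, "class", "grade")

print(len(grades), "observations")
print(len(student_means), "students:", student_means)
print(len(class_means), "classes:", class_means)
```

Six rows yield four student-level cases or two class-level cases. Nothing in the data table itself says which of these is the unit of analysis; that is decided by the comparison the design calls for.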

In the context of qualitative research, it is more difficult to draw the distinction between the two, partly because the difference between analysis and observation is in general less clear-cut. In some sense, the same unit (case) traced over time provides distinct observations, but I am not sure to what extent these snapshots would be regarded as distinct ‘observations’ by qualitative researchers.

More importantly, I am starting to feel that the distinction between units of analysis and units of observation creates more confusion than clarity. For the purposes of research design instruction, we would be better off if the term ‘case’ did not exist at all, so that we could simply speak about observations (single observation vs. single case study, observation selection vs. case selection, etc.). Of course, language policing never works, so we seem to be stuck with an unfortunate but unavoidable ambiguity.

Is unit homogeneity a sufficient assumption for causal inference?

Is unit homogeneity a sufficient condition (assumption) for causal inference from observational data?

Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed as KKV], I think they regard unit homogeneity and conditional independence as alternative assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two possible assumptions that enable us to get around the fundamental problem [of causal inference]” (p.91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (that is, establish the direction of causality). In my understanding, endogeneity is automatically ruled out by conditional independence, but not by unit homogeneity (“Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variables takes on a particular value” [p.91]).

Going back to Holland’s seminal article, which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a sufficient condition for inference (p.948). But Holland divides variables into pre-exposure and post-exposure before he even gets to discuss any of the additional assumptions, so reverse causality is ruled out from the start. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in my opinion in KKV’s context unit homogeneity needs to be coupled with some further condition (temporal precedence, for example) to ascertain the causal direction when making inferences from data.
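For readers who prefer symbols, the argument can be sketched as follows. The notation is my own shorthand, loosely following Holland's setup rather than quoting it: Y_t(u) and Y_c(u) stand for the response of unit u under treatment t and under control c.

```latex
% Assumed notation: Y_t(u), Y_c(u) are the responses of unit u
% under treatment t and under control c.

% Causal effect of t (relative to c) on unit u:
\[ Y_t(u) - Y_c(u) \]

% Fundamental problem: for any single unit u, only one of
% Y_t(u) and Y_c(u) can be observed.

% Unit homogeneity for two units u_1 and u_2:
\[ Y_t(u_1) = Y_t(u_2), \qquad Y_c(u_1) = Y_c(u_2) \]

% Under this assumption the unobservable unit-level effect equals an
% observable difference (u_1 exposed to t, u_2 exposed to c):
\[ Y_t(u) - Y_c(u) = Y_t(u_1) - Y_c(u_2), \qquad u \in \{u_1, u_2\} \]
```

Note that Y is written as a response to the exposure, so it is post-exposure by construction. The direction of causality is fixed by the notation before the homogeneity assumption is ever invoked, and that is exactly the step that has to be supplied separately when the assumption is applied to observational data.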

The point is minor, but it can create confusion when unit homogeneity and conditional independence are presented side by side as alternative assumptions for inference.