Is unit homogeneity a **sufficient** condition (assumption) for causal inference from observational data?

Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed to it as KKV], I think they regard unit homogeneity and conditional independence as **alternative** assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two **possible** assumptions that enable us to get around the fundamental problem [of causal inference]” (p. 91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (that is, establish the direction of causality). In my understanding, endogeneity is automatically ruled out under conditional independence, but not under unit homogeneity (“*Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variable takes on a particular value*” [p. 91]).
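Note that KKV’s definition is a statement about *expected* values, not about individual realized outcomes: two units can be homogeneous even though any single draw of the dependent variable fluctuates. A minimal simulation sketch of that reading (the functional form and numbers here are hypothetical, not from KKV):

```python
import random

random.seed(0)

# Two hypothetical units sharing the same conditional expectation of the
# dependent variable Y given the explanatory variable X:
#   E[Y | X = x] = 2 * x   for both units  -> unit homogeneity holds.
# Realized outcomes still fluctuate around that common expectation.

def outcome(x):
    """One realized outcome: common expectation plus unit-level noise."""
    expectation = 2 * x            # identical across units => homogeneous
    noise = random.gauss(0, 1)     # random fluctuation in any single draw
    return expectation + noise

x = 1
draws_a = [outcome(x) for _ in range(100_000)]  # "unit A"
draws_b = [outcome(x) for _ in range(100_000)]  # "unit B"

mean_a = sum(draws_a) / len(draws_a)
mean_b = sum(draws_b) / len(draws_b)

# Individual draws differ, but the averages converge on the same
# expected value (2.0 here), which is all the definition requires.
print(round(mean_a, 2), round(mean_b, 2))
```

Nothing in this condition, of course, says anything about which way the causal arrow between X and Y points, which is the problem raised below.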

Going back to Holland’s seminal article, which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a **sufficient** condition for inference (p. 948). But Holland divides variables into **pre-exposure** and **post-exposure** before he even discusses any of the additional assumptions, so reverse causality is ruled out from the outset. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in KKV’s context, in my opinion, unit homogeneity needs to be coupled with some further condition (temporal precedence, for example) to ascertain the causal direction when making inferences from data.

The point is minor, but it can create confusion when unit homogeneity and conditional independence are presented side by side as alternative assumptions for inference.


Hi Ditmer,

Thank you so much for this post. I am currently reading KKV in graduate school and came across a related issue that you might be able to help me with:

What puzzles me is that the authors demand that the expected values of the dependent variable for each unit be THE SAME for data points with the same value of the explanatory variable. This assumption just seems very unrealistic. Shouldn’t we expect random fluctuations across these values in real-world data? This seems especially dubious if you assume that our data are generated in a probabilistic world…

I might be reading their assumption wrong, so it would be great to have your input on this! For instance, their statement would make a lot more sense if they referred to the average treatment effect within the same group of observations. I am just very confused that they explicitly ask for the (exact) same value of the dependent variable at a given value of the independent variable.

Thanks a lot for your help!