Diffusion of smoking bans in Europe

My paper on the diffusion of smoking bans in Europe has been accepted in Public Administration. It probably won’t be published until next year, so here is a link to the pre-print and a graph of two of the key results of the paper: the probability of enactment of a more comprehensive (full) smoking ban increases with lower levels of tobacco production and with rising levels of public support for smoking restrictions:

And the abstract:

Policy Making Beyond Political Ideology: The Adoption of Smoking Bans in Europe

Policy making is embedded in politics, but an increasing number of issues, like obesity, tobacco control, or road safety, do not map well onto the major dimensions of political conflict. This article analyzes the enactment of restrictions on smoking in bars and restaurants in 29 European countries – a conflictual issue which does not fit easily into traditional party ideologies. Indeed, the comparative empirical analyses demonstrate that government ideological positions are not associated with the strictness and the timing of adoption of the smoking bans. On the other hand, factors like the scale of tobacco production in a country, smoking prevalence in society, and public support for tough anti-smoking policy are all significantly related to the time it takes for a country to adopt smoking bans, and to the comprehensiveness and enforcement of these restrictions. In addition, horizontal policy diffusion is strongly implicated in the pattern of policy adoptions.

Writing with the rear-view mirror

Social science research is supposed to work like this:
1) You want to explain a certain case or a class of phenomena;
2) You develop a theory and derive a set of hypotheses;
3) You test the hypotheses with data;
4) You draw conclusions about the plausibility of the theory;
5) You write a paper with a structure (research question, theory, empirical analysis, conclusions) that mirrors the steps above.

But in practice, social science research often works like this:
1) You want to explain a certain case or a class of phenomena;
2) You test a number of hypotheses with data;
3) You pick the hypotheses that matched the data best and combine them into a theory;
4) You conclude that this theory is plausible and relevant;
5) You write a paper with a structure (research question, theory, empirical analysis, conclusions) that does not reflect the steps above.

In short, an inductive quest for a plausible explanation is masked and reported as deductive theory-testing. This fallacy is both well-known and rather common (at least in the fields of political science and public administration). And, in my experience, it turns out to be tacitly supported by the policies of some journals and reviewers.

For one of my previous research projects, I studied the relationship between public support and policy output in the EU. Since the state of the economy can influence both, I included levels of unemployment as a potential omitted variable in the empirical analysis. It turned out that lagged unemployment is positively related to the volume of policy output. In the paper, I mentioned this result in passing but didn’t really discuss it at length because 1) the original relationship between public support and policy output was not affected, and 2) although highly statistically significant, the result was quite puzzling.
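For concreteness, here is what such a check boils down to: a minimal sketch in R with simulated data and hypothetical variable names (the actual analysis in the paper is more involved).

```r
# Minimal sketch of the omitted-variable check: all data and variable
# names below are made up for illustration.
set.seed(42)
d <- data.frame(year    = 1975:2003,
                output  = rpois(29, 50),      # annual policy output
                support = runif(29, 30, 70),  # % public support
                unemp   = runif(29, 2, 12))   # % unemployment

# Align policy output in year t with unemployment in year t - 4
d$unemp_lag4 <- c(rep(NA, 4), head(d$unemp, -4))

# Does the support-output relationship survive the control?
summary(lm(output ~ support + unemp_lag4, data = d))
```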

When I submitted the paper to a leading political science journal, a large part of the reviewers’ critiques focused on the fact that I did not have an explanation for the link between unemployment and policy output in the paper. But why should I? I did not have a good explanation for why these variables should be related (with a lag of precisely four years) when I did the empirical analysis, so why pretend? Of course, I suspected unemployment as a confounding variable for the original relationship I wanted to study, so I took the pains of collecting the data and doing the tests, but that certainly doesn’t count as an explanation for the observed statistical relationship between unemployment and policy output. The point is, it would have been entirely possible to write the paper as if I had strong ex ante theoretical reasons to expect that rising unemployment increases the policy output of the EU, and that the empirical test supports (or, more precisely, does not reject) this hypothesis. That would certainly have greased the review process, and it would only have taken moving a few paragraphs from the concluding section to the theory part of the paper. So, if your data has a surprising story to tell, make sure it looks like you anticipated it all along – you even had a theory that predicted it! This is what I call ‘writing with the rear-view mirror’.

Why is it a problem? After all, an empirical association is an empirical association, no matter whether you theorized about it beforehand or not. So where is the harm? As I see it, by pretending to have theoretically anticipated an empirical association, you grant it undue credence: not only is the data consistent with a link between the two variables, but there are, supposedly, strong theoretical grounds to believe the link should be there. A surprising statistical association, however robust, is just what it is – a surprising statistical association that possibly deserves speculation, exploration and further research. On the other hand, a robust statistical association ‘predicted’ by a previously-developed theory is much more – it is a claim that we understand how the world works.

As long as journals and reviewers act as if proper science never deviates from the hypothetico-deductive canon, writers will pretend to follow it. And as long as openly descriptive and exploratory research is frowned upon, sham theory-testing will prevail.

Eventually, my paper on the links between public support, unemployment and policy output in the EU got accepted (in a different journal). Surprisingly, given the bumpy review process, it has just been selected as the best article published in that journal during 2011. Needless to say, an explanation of why unemployment might be related to EU policy output is still wanting.

Governing by Polls

The study of policy responsiveness to public opinion is blossoming and propagating. Work published over the last two years includes the 2010 book by Stuart Soroka and Chris Wlezien (Canada, the US and the UK), this paper by Sattler, Brandt, and Freeman on the UK, this paper on Denmark, my own article on the EU, Roberts and Kim’s work on post-Communist Europe, etc. The latest addition to the literature is this article by Jeffrey Lax and Justin Phillips from Columbia University (forthcoming in AJPS).

“The Democratic Deficit in the States” takes a cross-sectional rather than a dynamic (time series) perspective and analyzes both responsiveness (correlation) and congruence between policy outcomes and public opinion in the US states for eight policies. In short, there is a high degree of responsiveness but far from perfect congruence between majority opinion and policy. More salient policies fare better, and having powerful interest groups on your side helps. Altogether, this is an interesting and important study that adds yet another piece to our understanding of policy responsiveness.
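Because the responsiveness/congruence distinction does a lot of work in the paper, a toy simulation may help fix ideas. The sketch below is mine, uses entirely made-up data, and is not meant to reproduce the authors’ measures.

```r
# Responsiveness vs. congruence on simulated state-level data
set.seed(1)
support <- runif(50, 20, 80)   # % of each 'state' supporting the policy
# Adoption only becomes likely at around 60% support
policy  <- rbinom(50, 1, plogis((support - 60) / 5))

# Responsiveness: adoption probability rises with support
summary(glm(policy ~ support, family = binomial))

# Congruence: does policy match the majority position?
congruent <- (support > 50) == (policy == 1)
mean(congruent)  # share of 'states' where policy agrees with the majority
```

In this simulation the slope on support is strongly positive (responsiveness), yet many ‘states’ with majorities between 50 and 60 per cent lack the policy, so the congruence score falls well short of one: the same qualitative pattern the article reports.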

What starts to worry me, however, is that the normative implications of the policy responsiveness literature are too often taken for granted. Lax and Phillips seem to equate the lack of correspondence between public opinion and policy with democratic deficit (similarly, Sattler, Brandt and Freeman speak of ‘democratic accountability’). But there is quite a gap between the fact that a policy contradicts the majority of public opinion and the pronouncement of democratic failure. We need to start unpacking the normative implications of the (lack of) policy responsiveness.

Of course, at a very general level no political system can be democratic unless there is dynamic responsiveness and broad correspondence between the wishes of the public and what government does. But can we equate the congruence of policy with public opinion with democracy itself? I don’t think so. Precise responsiveness and congruence are neither necessary nor sufficient for democratic policy making. Why?

First and foremost, public opinion as such does not exist. One doesn’t need to embrace a radical post-modern position to admit that the numbers we love to crunch in studies of policy responsiveness are, at best, imperfect (snapshot) estimates of a fluid social construct. It is not only that estimates of aggregate public opinion are subject to the usual measurement problems. It has been shown time and again that the answers we get from public surveys are sensitive to the precise wording, form, and context of the questions (see George Bishop’s ‘The Illusion of Public Opinion’ for an overview). The questions themselves are often vague and imprecise. Polls will elicit responses even when people have no meaningful opinion on the issue (opinions are regularly given even on fictitious issues). Availability bias is often a problem too, especially in surveys of the ‘most important problem’ (open vs. closed forms of the question).

A second problem is that public opinion as portrayed by mass surveys need not be the same as the opinion of a group of people after they (1) have been given relevant information about the issue, (2) have been allowed ample time to think about it, and (3) have had the opportunity to deliberate about it (on deliberative polls, which come with their own set of problems, see James Fishkin). People know astoundingly little about current policies even when they are personally affected by them (here). Do we expect congruence and responsiveness between policy and public opinion as given over the telephone after a modicum of brain activity, or between policy and public opinion as it would have been if people made informed decisions with the common good in mind?

The third problem is that public opinion is expressed on various issues presented in isolation. I can very well support an increase in spending on defense, education, and health, and a decrease in the overall state budget at the very same time. My opinions and preferences need not be consistent, but policies must be. The problem is compounded by the possibility of preference cycles: even if individual opinions are rational and well-behaved, cycles in aggregate public opinion cannot be ruled out.
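The classic three-voter example shows how such a cycle can arise from perfectly transitive individual rankings; the few lines of R below make it explicit.

```r
# A textbook Condorcet cycle: three voters, three alternatives (A, B, C)
rankings <- list(v1 = c("A", "B", "C"),   # voter 1: A > B > C
                 v2 = c("B", "C", "A"),   # voter 2: B > C > A
                 v3 = c("C", "A", "B"))   # voter 3: C > A > B

# TRUE if a majority of voters ranks x above y
majority_prefers <- function(x, y)
  sum(sapply(rankings, function(r) which(r == x) < which(r == y))) >= 2

majority_prefers("A", "B")  # TRUE: A beats B
majority_prefers("B", "C")  # TRUE: B beats C
majority_prefers("C", "A")  # TRUE: C beats A, closing the cycle
```

Whichever option is in place, a majority prefers some alternative to it, so ‘following public opinion’ is not even well-defined in such cases.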

There is some unintended irony in Stimson et al. designating the aggregate of attitudes and opinions they construct the ‘policy mood’ of the public. Normatively speaking, do we really expect policy to respond to the mood of the public with all the irrationality, instability and caprice that a mood implies? All in all, the lack of perfect temporal and spatial correspondence between public opinion and policy cannot be interpreted directly as a sign of democratic deficit and failure. Political institutions translating mass preferences into policy exist for a reason (well, a number of reasons, including preference aggregation, deliberation, and inducing stability).

The other side of the same coin is that responsiveness is not sufficient for democracy. The fact that a government closely follows majority opinion as expressed in the polls and adjusts policy accordingly cannot be a substitute for a democratic policy-making process. This is especially clear in my own analysis of the EU: although I find that aggregate legislative production closely follows the ebbs and flows in public support for the EU during the 1970s, 1980s, and early 1990s, this cannot dispel our misgivings about the democratic deficit of the EU during this period – the polls are not a substitute for elections, representation, and accountability.

The lack of sufficient reflection on the democratic implications of the (lack of) policy responsiveness is especially worrying in view of the tendency (identified on the basis of my subjective reading of the political process in several European states) towards ever more reference to, and reliance on, ‘instant’ polls in making policy. The increased availability and speed of delivery of ‘representative’ public opinion polls lures politicians into dancing to the tune of public opinion on every occasion. Sensible policies are abandoned if the poll numbers are not right (e.g. second-hand smoking restrictions, see here), and retrogressive policies are enacted if the percentage of public support is high enough. But government by polls is only one step removed from government by mobs. Politicians should sometimes hold policy opinions different from the public’s, and they should have the courage to pursue these opinions in the face of (temporary and latent) opposition by the citizens. Meanwhile, social science has the important task of uncovering when and how policy responsiveness and congruence work. But I see no need to inflate and oversell the normative implications of the research.

The decline of the death penalty

I just finished reading ‘The Decline of the Death Penalty and the Discovery of Innocence’ (link, link to book’s website) by Frank Baumgartner, Suzanna De Boef and Amber Boydstun. It is a fine study of the rise of the ‘innocence’ frame and the decline of the use of capital punishment in the US (I have recently posted about the death penalty). The book has received well-deserved praise from several academic corners (a list of reviews is here). In this post I want to focus on several issues that, in my opinion, deserve further discussion.

One of the major contributions of the book is methodological. The systematic study of policy frames (‘discourse’ is a related concept that seems to be going out of fashion) is in many ways the holy grail of policy analysis – while we all intuitively feel that words and arguments and ideas matter more than standard models of collective decision making allow, it is quite tricky to demonstrate when and how these words and arguments and ideas matter. Policy frame analysis became something of a fad during the late 1970s and the 1980s, but it delivered less than it promised, so people started to look away (as this Google Ngram shows). Baumgartner, De Boef and Boydstun have produced a book with the potential to re-invigorate research into the impact of policy frames.

So far, the usual way to analyze policy frames quantitatively has been to count the number of newspaper articles on a topic, measure their tone (pro/anti) and classify the arguments into some predefined clusters (the frames). This is what the authors do with respect to the death penalty in Chapter 4. They collected all articles on capital punishment listed in the New York Times Index from 1960 to 2005, coded each article for its pro- or anti-death penalty orientation, and classified the arguments found in each article into a pre-defined set of 65 possible arguments, clustered along seven dimensions (efficacy, morality, cost, constitutionality, fairness, mode of execution, and international issues) (p.107). The approach allows one to track total attention to capital punishment, the net tone, and the relative attention to each of the seven dimensions over time. This is useful to identify, for example, the surge in attention after 1995 to issues of innocence and evidence in stories on the death penalty (p.120), which have ‘come to dominate’ the debate. Existing studies of policy frames usually stop here (a minimal sketch of this conventional first stage follows after the quote below). But as the authors argue:

[The frequency of attention] matters, of course, but also important is the extent to which these arguments are used in conjunction with one another to form a larger cohesive frame. (p.136)
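Before turning to their answer, here is what the conventional first stage amounts to: a minimal R sketch with simulated data and hypothetical column names (the real replication file is structured differently).

```r
# Conventional approach: count articles and compute net tone per year.
# Simulated data; 'tone' is coded 1 (pro), -1 (anti) or 0 (neutral).
set.seed(1)
articles <- data.frame(year = sample(1960:2005, 500, replace = TRUE),
                       tone = sample(c(-1, 0, 1), 500, replace = TRUE))

attention <- table(articles$year)                       # total attention
net_tone  <- tapply(articles$tone, articles$year, sum)  # pro minus anti

plot(as.numeric(names(net_tone)), net_tone, type = "h",
     xlab = "Year", ylab = "Net tone")
```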

Enter evolutionary factor analysis (Chapter 5). The technique is essentially a series of factor analyses performed on overlapping 5-year time windows. Factor analysis identifies inductively (from the data) which arguments tend to go together. So you start with a factor analysis of the arguments contained in the articles published from 1960 to 1965, treating each year as a single observation. You repeat for each subsequent window (1961-1966, 1962-1967, etc.), track the clusters of arguments (the frames) that seem stable, and trace how they move and change over time. Using this approach, the book claims that a set of 16 arguments centered around ‘innocence’ (the frame) emerged in 1992, captured the debate and is still going strong. Since 13 of these arguments are anti-death penalty, the rise of the innocence frame is responsible for the increasingly anti-death penalty tone of the newspaper coverage. As I said, the approach is path-breaking and holds lots of promise, but I have one concern. Currently, each factor analysis is based on 65 variables (since the authors ignore all arguments that appeared less than five times in any five-year period, the effective number of variables is much smaller but still usually greater than the number of observations) and only 5 observations (the years). This introduces lots of noise in the data (as the authors themselves acknowledge) and necessitates a series of more or less arbitrary decisions to get rid of statistical flukes. Rules of thumb about sample size in factor analysis often recommend a minimum of 100 observations and at least twice as many observations as variables [factanal in R even refuses to perform the factor analysis with more variables than observations; SPSS obliges]. So there is a potential problem, but what is a bit puzzling to me is that there seems to be a pretty obvious way to address the problem; a way which the authors do not discuss:
Why not run the factor analyses on all articles that appear in a year, taking the individual article as the unit of observation?
True, many articles are coded to feature only one argument, but the median number of arguments per article is two, and there are 1635 articles (so more than 40% of the sample) that have more than two arguments (that’s based on my quick-and-dirty calculations from the replication dataset available here). Apart from providing more observations, taking the article as a unit of observation makes theoretical sense as well – we want to know whether frames dominate individual contributions (articles), as well as the macro-debate in a given year.
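For what it is worth, here is a rough R sketch of what this article-level variant could look like (simulated data; this is my reading of the approach, not the authors’ code).

```r
# Evolutionary factor analysis with articles as observations (a sketch).
# X: one row per article, one 0/1 column per argument; 'year' per article.
set.seed(1)
n_art <- 2000; n_arg <- 20
X    <- matrix(rbinom(n_art * n_arg, 1, 0.15), n_art, n_arg)
year <- sample(1960:2005, n_art, replace = TRUE)

frames <- lapply(1960:2001, function(start) {
  rows <- year >= start & year <= start + 4          # 5-year window
  sub  <- X[rows, ]
  sub  <- sub[, colSums(sub) >= 5, drop = FALSE]     # drop rare arguments
  factanal(sub, factors = 2)$loadings                # argument clusters
})
```

With a couple of thousand article-level observations per window instead of five yearly ones, the usual sample-size rules of thumb for factor analysis are comfortably met.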

Having demonstrated the rise and the recent dominance of the ‘innocence’ frame, in Chapter 6 Baumgartner, De Boef and Boydstun proceed to estimate the impact of ‘net tone’ on public opinion. As explained above, the book attributes the major changes in ‘net tone’ (pro- vs. anti- sentiment of the newspaper articles) to the changing frames, so indirectly this tests the impact of frames as well. Using a vector error-correction model, the authors argue that levels of public opinion are ‘positively related to levels of homicides [control variable] and pro-death penalty media coverage’ (p.187). Chapter 7 turns to explaining the number of annual death sentences and concludes that both media ‘net tone’ and public opinion are significantly associated with this policy indicator. I wouldn’t be too quick to attribute any causal power to media tone, however. If one takes seriously the first part of the book, then the policy frame emerges as a potential confounding variable that works both directly (through framing the thinking of policy makers, jurors and judges) and indirectly through the media. If that were the case, the effect of media tone would be exaggerated in the statistical models, as it would pick up the direct effect of the policy frame as well. One can make a similar case for the effect of public opinion. I would also prefer to investigate more directly the direction of causality in such systems of variables that move together over time (using Granger causality tests or VAR) – I see little theoretical reason why the number of death sentences cannot have an impact on public opinion, for example.
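As an illustration of the kind of test I have in mind, here is a minimal sketch with simulated series (the variable names are placeholders, not the book’s data).

```r
# Granger causality in both directions between media tone and sentences
library(lmtest)
set.seed(1)
tone      <- arima.sim(list(ar = 0.7), n = 46)  # placeholder annual series
sentences <- arima.sim(list(ar = 0.7), n = 46)

grangertest(sentences ~ tone, order = 2)  # does tone predict sentences?
grangertest(tone ~ sentences, order = 2)  # or do sentences predict tone?
```

If the second test also rejects the null, the single-equation causal story becomes much harder to sustain.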

A bigger threat to the integrity of the story about the rise of ‘innocence’ and the decline of the death penalty, however, is the persistence of important differences among states in public opinion and in the number of death sentences and executions. Since this book focuses on the tone and framing of the death penalty debate in a national media outlet (the NYT), it cannot address the question of cross-state variation. But it is a valid question, and one that deserves more research, whether the population and policy makers in some states are less sensitive (immune?) than others to the effects of framing, or whether they are exposed to different media with a different net tone and a different frame. A recent paper by Kenneth Shirley and Andrew Gelman shows that black males, and to a lesser extent black females, “have shown the sharpest decline in support” over time (p.31, see also Figure 8), while the net change in support for the death penalty among non-black men and women is quite small (Figure 9). It would seem that the ‘innocence’ frame has resonated much more (only?) with black people, who have responded more strongly, and faster, to the arguments put forward by the frame. Perhaps the fact that many of the individuals exonerated from death row have been black can explain the differentiated impact of the innocence frame. In any case, there are interesting synergies between Shirley and Gelman’s study, with its emphasis on individual and state differences, and Baumgartner et al.’s focus on variation over time.

To conclude this rather lengthy post, ‘The Decline of the Death Penalty and the Discovery of Innocence’ uncovers an exciting new direction for policy frames research. In fact, I am already starting a project that attempts to apply the evolutionary factor analysis approach to policy framing in the context of anti-smoking policy in Europe.