Gelman's Blog

Personally, I prefer to write about literature, but, yes, I recognize that these sorts of entries are the bread and butter of this blog. (The posts on bad graphics are the red meat, of course, but that's another story. And, do we offer any vegetables?)

3 hours 47 min ago

Daniel Kramer writes:

In your book, Data Analysis Using Regression and Multilevel..., you state that in situations when little group-level variation is observed, multilevel models reduce to classical regression with no group indicators. Does this essentially mean that with zero group-variation, predicted coefficients, deviance, and AIC would be the same as the estimates obtained with classical regression? I ask because I have been asked by an editor to adopt a multimodel inference approach (Burnham and Anderson 2002) in my analysis. Typically a small set of candidate models is ranked using an information theoretic criterion and model averaging may be used to derive coefficient estimates or predictions. Thus, would it be appropriate to compare single-level and multi-level models derived from the same data set using AIC? I am skeptical since the deviances for the null models are different. Of course, there may be no reason to compare single and multi-level models if there is no cost (i.e. reduced model fit) in adopting a multi-level framework as long as the case can be made that the data are hierarchical. The only cost you mention in your book is the added complexity.

My reply: If you're fitting multilevel models, perhaps better to use DIC than AIC. DIC has problems too (search this blog for DIC for discussion of this point), but with AIC you'll definitely run into trouble counting parameters. (BIC has other, more serious problems: it's not a measure of predictive error the way that AIC and DIC are.) More generally, I can see the appeal of presenting and averaging over several models, even if I've rarely ended up doing that myself. Indeed, I'd prefer to let everything of interest vary by group.
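
To make the "reduces to classical regression" part of the question concrete, here is a minimal sketch (not code from the book) of the usual precision-weighted approximation for one group's mean in a varying-intercept model with no predictors. The function name and all the numbers are hypothetical. It speaks only to the estimates, not to whether AIC or DIC values are comparable across the two kinds of model.

```python
def partial_pool(group_mean, n_j, pooled_mean, sigma_y, sigma_alpha):
    """Approximate multilevel estimate of one group's mean.

    Weighted average of the group's own mean and the overall mean, with
    weights proportional to their precisions. sigma_y is the within-group
    sd, sigma_alpha the between-group sd.
    """
    if sigma_alpha == 0:
        # No group-level variation: complete pooling, i.e. the classical
        # estimate with no group indicators.
        return pooled_mean
    w_group = n_j / sigma_y**2
    w_pool = 1 / sigma_alpha**2
    return (w_group * group_mean + w_pool * pooled_mean) / (w_group + w_pool)


# Hypothetical group: 10 observations with mean 2.0, overall mean 1.0.
for sigma_alpha in (0.0, 0.01, 1.0, 100.0):
    est = partial_pool(group_mean=2.0, n_j=10, pooled_mean=1.0,
                       sigma_y=1.0, sigma_alpha=sigma_alpha)
    print(f"sigma_alpha={sigma_alpha:>6}: estimate = {est:.3f}")
```

As sigma_alpha shrinks toward zero the estimate moves to the pooled mean, and as it grows the estimate moves to the group's own mean, which is the sense in which the multilevel fit sits between the two classical extremes.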

P.S. What's the deal with the 256-character limit on titles, anyway?

Categories: Popular Blogs

Some zombies, some of the time

7 hours 1 min ago

Blake Messer writes:

I read your blog frequently, and just wanted to show you this model I made, revisiting the "Smith?" publication's findings on zombie outbreak dynamics.

My second interaction with Eliot Spitzer

9 hours 38 min ago

I've only met Eliot Spitzer once, back when he was the state Attorney General. I was part of a group presenting the findings of a study of racial patterns of police stops in the city. (See here for a writeup of our findings.) Spitzer asked a few questions during the meeting, and I was impressed by his intelligence. Maybe that's how people feel after meeting Bill Clinton, I dunno.

Recently, I had an opportunity for another interaction with Spitzer, this time indirectly, when Sarah Binder, John Sides, and I wrote a brief discussion of an article he wrote in the Boston Review on government's proper role in the market. Spitzer argues for a clearer definition of the role of government as a setter and enforcer of rules in the financial marketplace; as he puts it, "even though private companies compete, only government can ensure that there is competition. Everybody in business wants to be a monopolist. There's nothing wrong with wanting more market share. That's how you make money." He has lots of good stories:

When an investment bank does an IPO, and the IPO is hot--the stock is going to jump on that first day of sale--they give some of these hot stocks to the CEOs of their clients. Why? To keep them happy, so they stay as clients. As attorney general I said that should not be permitted; it violates the fiduciary duty of the CEO to the company. If the investment bank wants to give away something of value to keep a company as a client, it should give it to the shareholders, not the CEO. There's an uglier term for spinning: commercial bribery. In 2002 we negotiated a global deal and outlawed it. People got outraged. One extremely powerful regulator today, a Peter-Principle-on-Steroids survivor, asked me then, "Don't CEOs have any rights anymore?"

Spitzer makes a pretty convincing case that the current system (in which rich dudes pass multibillion dollar favors back and forth to each other) isn't good. I mean, sure, we all help out our friends, but I think there's a difference between giving your brother-in-law the contract for paving your parking lot, and these big-money financial deals.

I think the conservative argument against Spitzer's position would be that, sure, it would be great to have an impartial referee but that, realistically, the government is itself a special interest, and that in the economic realm businesses might need more protection from the government than from each other. I'm guessing that, in response to this particular argument, Spitzer might say that, yes, government corruption (or simply inefficiency or even well-meaning but obstructive regulations) is indeed a concern, but that such concern can best be addressed by more clearly defining the role of government intervention in the financial system, rather than by first denying such a role and then rushing in with the occasional trillion-dollar bailout.

The Boston Review article is accompanied by discussions from Dean Baker, Robert Johnson, and Binder/Sides/Gelman, and a response from Spitzer.

In our discussion, Sarah, John, and I don't address the substance of Spitzer's proposals--none of us being economics experts--but instead talk about how his ideas might fare in public opinion and in Congress. We write that political and institutional hurdles for lawmakers are high and could put serious reform out of reach:

The public also manifests considerable ambivalence about government regulation, consistently opposing regulation in general terms, while supporting it in specific cases.

In a January 2010 Gallup poll, just over half of respondents were "worried" about "too much regulation of business by the government," while just over a third worried about insufficient regulation. Similarly, half believed that government should be "less involved in regulating business," compared to a quarter who thought government should be more involved (the rest thought "things are about right the way they are"). . . .

Yet, perhaps reflecting the "angry populism" Spitzer cites, majorities in recent surveys support regulations that would limit the size and activities of the largest banks, create a special tax on large bonuses, and take steps to limit CEO pay at large companies. This ambivalence vis-à-vis regulation might be a product of views on business itself. . . . majorities oppose certain business practices and, in the case of the financial crisis, blame financial institutions. Majorities of Americans are willing to regulate businesses implicated in the financial crisis even as most oppose government regulation of business in generic terms.

What might our elected representatives do? We write that "they disagree over how future crises could and should be averted. . . . the intense partisan polarization of recent years could derail new legislation, even if lawmakers agree on a policy."

Given that a majority of Americans support reforms to the financial industry, wouldn't opposing reform be costly? Not necessarily. Policymakers routinely escape punishment when they are out of step with majority sentiment. In fact, one of Spitzer's examples of "core values"--the minimum wage--demonstrates this directly. As Princeton political scientist Larry M. Bartels documents in Unequal Democracy, most Americans--upwards of 80 percent, in some surveys--support an increase in the federal minimum wage. But actual increases are few and far between, and the minimum wage has declined substantially in real-dollar terms.

The financial crisis may prove a more salient issue than the minimum wage, making Americans more attentive to congressional action or inaction, but this remains to be seen. Americans are typically more attuned to the performance of the economy than to the specifics of policies that could affect it.

The financial bailout in 2008 was bipartisan--I assume that the experts told George W. Bush that this was something that had to be done, or else the economy would fall off a cliff. Barack Obama and many leading Democrats supported the emergency actions also, perhaps from similar feelings of economic urgency, or perhaps for fear that an even more drastic plan would have to be implemented after the Democrats' anticipated victory in November, or maybe for other reasons. In any case, that bipartisan moment seems to have passed.

Wittgenstein would be amused

March 10, 2010 - 4:25pm

When writing this comment, I learned that it isn't so easy to spell "Wittgenstein." I had to try several times. Luckily, it's in the spell-checker so I eventually got it by trial and error. Quine's in the spell-checker (but, oddly enough, not "Quine's"), but Tarski isn't.

Some others:

wittgenstein (lower-case): fails the spell-checker.
Wittgenstein's is ok, though. As it should be. So I don't know why Quine works but Quine's didn't.
Knuth. Yes.
Russell. Yes.
Whitehead. Yes.
Gelman. No.
Meng. No.
Rubin. Yes.

Hey, that's not fair!

Relative power of the upper and lower house in different countries?

March 10, 2010 - 3:58pm

We're doing a project involving political representation in different countries (related to the USA Today effect), and one thing we need is a measure of the relative power of the lower and upper house in each country. In the U.S., the power is divided roughly evenly between House and Senate; in the U.K., it's nearly all in the House of Commons; in other countries, ...? Is there a standard measure of this somewhere?

Model checking: it's not just for statisticians

March 10, 2010 - 7:48am

Regular readers will know the importance I attach to model checking: to the statistical paradigm in which we take a model seriously, follow its implications, and then look carefully for places where these implications don't make sense, thus revealing problems with the model, which we can then trace backwards to understand where our assumptions went wrong.

This sort of reasoning can be done qualitatively also. From Daniel Drezner, here's a fun example, an analysis of a recent political bestseller:

I [Drezner] hereby retract any and all enthusiasm for Game Change-- because I don't know which parts of it are true and which parts are not. . . . It was on page 89 that I began to wonder just how much Game Change's authors double-checked their sources. This section of the book recounts entertainment mogul David Geffen's "break" with Hillary Clinton's presidential campaign: The reaction to the column stunned Geffen. Besieged by interview requests, he put out a statement saying Dowd had quoted him accurately. Some of Geffen's friends in Hollywood expressed disbelief. Warren Beatty told him, She's going to be president of the United States--you must be nuts to have done this. But many more congratulated Geffen for having the courage to say what everyone else was thinking but was too afraid to put on the record. They said he'd made them feel safer openly supporting or donating to Obama. Soon after, when Geffen visited New York, people in cars on Madison Avenue beeped their horns and gave him the thumbs-up as he walked down the street (emphasis added [by Drezner]).

A self-refuting sentence indeed. Don't these guys have an editor? This reminds me of our recent discussion of the economics of fact checking.

Another hypothesis is that John Heilemann and Mark Halperin--the authors of Game Change--realized all along that the thumbs-up-on-Madison-Avenue story was implausible, but they felt that it was a good quote to include in order to give a sense of where Geffen was coming from. From this perspective, it should be obvious to the reader that the sentence beginning "Soon after, when Geffen visited New York" was a Geffen quote, nothing more and nothing less. In a book based on interviews, it would just be too awkward to explicitly identify each quotation by writing, for example, "Geffen told us that soon after he visited New York, people in cars . . ." Sure, that latter version would be more accurate but would disrupt the flow.

Similar reasoning might explain or excuse David Halberstam's notorious errors in his baseball book that were noted by Bill James: Halberstam's goal was not to convey what happened but rather to convey the memories of key participants. Similarly, maybe the point of Game Change is to tell us what people recall, not what was actually happening. An oral history presented in narrative form.

P.S. For more on model checking from a Bayesian statistical perspective, see chapter 6 of Bayesian Data Analysis or this article. Or, if you prefer it in French, this.

This is what is done

March 9, 2010 - 2:09pm

This is from a commercial software package:

This is page 1 of a 66-page document. This was essentially impossible to follow on the screen, so I printed it out in 6-pages-per-sheet format, at which size the tiny text was difficult but barely possible to read.

Now here's a fun assignment. How many flaws can you find in this display? Here's what I noticed (in no particular order):

- Sizing of graph is inconsistent with sizing of text: If the text is readable, the graph is too huge to process; if the graph is sized nicely (so that many pages can be fit on a single sheet), the text is tiny.

- The cyclical display of a piechart is inappropriate for these ordered variables.

- The ordering isn't even done right! See the legend: "6-10 hours" should be between "less than 1 hour" and "over 10 hours." The categories appear to have been sorted either alphabetically or in order of increasing frequency, neither of which makes sense here.

- At least one category (1-5 hours) is missing. Even if there are no responses to this one, it should be included.

- The labels 2, 3, 4 on the pie wedges have no connection to anything else on the page.

- The wedges should be labeled directly; no need for the reader to have to go back and forth between the legend and the picture. This will also allow you to get rid of the distracting colors. As Ed Tufte would say, if you want pretty colors, include a pretty picture with your report--but don't clutter your graphs with that stuff!

- If you are going to label the wedges, put the labels right there; the steppy lines connecting the labels to the pie are unnecessary; they just add to the complexity of the presentation. (Perhaps this is a desired effect, to make the results look more professional, in some sense?)

- 100% is written, inappropriately, as "100.00%".

- The text of the question (very top of the page) is completely separated from the responses (at the bottom).

- In fact, it's easy to miss the question entirely. The main label of the graph is some sort of gobbledygook. This sort of identifying information would be better kept in small print in the lower-right corner of the display. (If you're doing this, I'd also recommend a date and time identifier. Datasets get modified all the time.)

- With the distorted shape of the pie chart and the edge added to the lower border, the visible area of each wedge no longer corresponds to the numbers being displayed.

- The percentages are inappropriately displayed as 22.2%, 33.3%, 44.4%; they should be 22%, 33%, 44%. It's just about never meaningful to look at fractions of percentages: we are almost never studying proportions that can be measured to that level of accuracy. Certainly not with a sample of size 9, but even if n=10,000 I would round all percentages to the nearest integer.
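
The rounding point in the last item can be checked with a few lines of arithmetic. This is a rough sketch under one loud assumption: that the three wedges correspond to counts of 2, 3, and 4 out of 9, which would reproduce the displayed percentages but is not stated anywhere on the page. The 10,000-respondent case is likewise hypothetical.

```python
import math


def pct_and_se(count, n):
    """Return (percentage, standard error in percentage points) for count out of n."""
    p = count / n
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of a proportion
    return 100 * p, 100 * se


# Assumption: counts of 2, 3, and 4 out of 9 would give the displayed
# 22.2%, 33.3%, 44.4%; the actual counts are not given in the post.
n = 9
for count in (2, 3, 4):
    pct, se = pct_and_se(count, n)
    print(f"{count}/{n}: {pct:.0f}%, standard error about {se:.0f} percentage points")

# Same kind of proportion with a much larger (hypothetical) sample.
pct_big, se_big = pct_and_se(3333, 10_000)
print(f"3333/10000: {pct_big:.0f}%, standard error about {se_big:.1f} percentage points")
```

With n=9 the standard errors come out in the mid-teens of percentage points, so a decimal place carries no information; with n=10,000 the standard error is still roughly half a percentage point, which is consistent with rounding to the nearest integer even there.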

Somebody paid money for this!

Campaigning, governing, and the complexity of political speeches

March 9, 2010 - 9:01am

Sanjay Srivastava draws some interesting connections between a recent Obama speech and a paper by P. E. Tetlock published in a psychology journal in 1981 (!). In general, I think we as political scientists don't interact enough with research in psychology.

More evidence that "rent-seeking" is economist-speak for "something I don't like"

March 9, 2010 - 8:48am

That said, it's an interesting discussion.

P.S. Yes, I know the phrase has a technical meaning, but my impression is that it's more used as a generalized put-down.

What if everything happened according to plan?

March 8, 2010 - 1:57pm
I had occasion to revisit this graph: And then, it suddenly struck me: what if everything had gone as planned? From the perspective of Obama's reelection chances, the light blue graph ("without recovery plan") is much better than the dark... Andrew Gelman http://www.stat.columbia.edu/~gelman

Cutting chartjunk and red tape

March 8, 2010 - 11:18am

Jared points me to this report that Ed Tufte was appointed by the government to help visualize stimulus funds:

The purpose of the panel is to advise the Recovery Accountability and Transparency Board, whose aim is "To promote accountability by coordinating and conducting oversight of Recovery funds to prevent fraud, waste, and abuse and to foster transparency on Recovery spending by providing the public with accurate, user-friendly information."

Cool. I've been on many government panels but never anything so important. Once I was asked to serve on a panel to evaluate the evidence regarding a particular health risk. I agreed and they promptly sent me about 25 pages of paperwork to fill out. This seemed like too much, so I told them I'd be happy to serve for free. Turning down all compensation and reimbursement helped, but I still was left with about 6 complicated forms. I don't think I ever got around to completing them all. On the upside, the government has supported my research with many millions of dollars over the years, so they must be doing something right!

Seriously, though, this is wonderful news. It's great to see someone like Tufte involved with communication of public data.

Are High-Quality Schools Enough to Close the Achievement Gap? Evidence from a Bold Social Experiment in Harlem

March 8, 2010 - 9:23am

This note on charter schools by Alex Tabarrok reminded me of my remarks on the relevant research paper by Dobbie and Fryer, remarks which I somehow never got around to posting here. So here are my (inconclusive) thoughts from a few months ago:

Steve Levitt links to this article by Will Dobbie and Roland Fryer on an educational innovation to improve the education of ethnic minority children. Dobbie and Fryer write:

Harlem Children's Zone (HCZ) is arguably the most ambitious social experiment to alleviate poverty of our time. We [Dobbie and Fryer] provide the first empirical test of the causal impact of HCZ on educational outcomes, with an eye toward informing the long-standing debate whether schools alone can eliminate the achievement gap or whether the issues that poor children bring to school are too much for educators to overcome.

Their conclusions are extremely positive:

Harlem Children's Zone is enormously effective at increasing the achievement of the poorest minority children. Taken at face value, the effects in middle school are enough to reverse the black-white achievement gap in mathematics and reduce it in English Language Arts. The effects in elementary school close the racial achievement gap in both subjects. Harlem Gems and The Baby College, the only two community programs in HCZ that keep detailed administrative data, show mixed success. We conclude by presenting three pieces of evidence that high-quality schools or high-quality schools coupled with community investments generate the achievement gains. Community investments alone cannot explain the results.

Here's how they address the potential concern that kids in the program will be better-prepared than the control group of kids not in these schools:

We implement two identification strategies. First, we exploit the fact that HCZ charter schools are required to select students by lottery when the demand for slots exceeds supply. Second, we use the interaction between a student's home address and cohort year as an instrumental variable.
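
To see why the lottery speaks to the worry about better-prepared kids, here is a small simulation sketch. Everything in it is made up for illustration (the effect size, the sample size, the way preparation enters the score); none of it comes from the Dobbie and Fryer data. It compares a winners-minus-losers difference under random assignment with the same difference when the better-prepared kids are the ones who enroll.

```python
import random
import statistics

random.seed(0)
TRUE_EFFECT = 0.3  # hypothetical effect of attending, in test-score units
N = 20_000         # hypothetical number of applicants


def mean_diff(scores, treated):
    """Difference in mean score between treated and untreated students."""
    t = [s for s, w in zip(scores, treated) if w]
    c = [s for s, w in zip(scores, treated) if not w]
    return statistics.fmean(t) - statistics.fmean(c)


# Unobserved prior preparation and test-day noise for each student.
prep = [random.gauss(0, 1) for _ in range(N)]
noise = [random.gauss(0, 1) for _ in range(N)]

# Case 1: admission by lottery, independent of preparation.
lottery_win = [random.random() < 0.5 for _ in range(N)]
lottery_scores = [p + TRUE_EFFECT * w + e
                  for p, w, e in zip(prep, lottery_win, noise)]
lottery_estimate = mean_diff(lottery_scores, lottery_win)

# Case 2: the better-prepared half enrolls (no lottery).
self_selected = [p > 0 for p in prep]
selected_scores = [p + TRUE_EFFECT * w + e
                   for p, w, e in zip(prep, self_selected, noise)]
naive_estimate = mean_diff(selected_scores, self_selected)

print(f"true effect:                 {TRUE_EFFECT:.2f}")
print(f"lottery winners vs losers:   {lottery_estimate:.2f}")
print(f"self-selected vs the rest:   {naive_estimate:.2f}")
```

In this setup the lottery comparison lands near the assumed effect, while the self-selected comparison is inflated by the preparation gap. The instrumental-variable strategy in the paper is a separate argument and is not sketched here.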

Here's the punch line:

"Winners" here are students who receive a winning lottery number or who are in the top ten of the waitlist.

They also show results for English tests, which are positive but less impressive. They remark that, "Interventions in education often have larger impacts on math scores as compared to [English] scores (e.g. Decker et al., 2004; Rockoff, 2004; Jacob, 2005). This may be because it is relatively easier to teach math skills, or that reading skills are more likely to be learned outside of school. Another explanation is that language and vocabulary skills may develop early in life, making it difficult to impact reading scores later (Hart and Risley, 1995)."

What does this all mean?

I haven't looked at the statistical details of this paper--that's hard work!--but I do have a few comments, to be made on the assumption that Dobbie and Fryer's analysis is essentially correct.

My first comment is that my mindset, before reading this paper, was that more effective teaching methods do exist--KIPP and the like--and that the way they work is by getting the teachers and students to work harder and longer than is usual during the school day. The Dobbie and Fryer paper did not change my view on this; they write "Our rough estimate is that Promise Academy students that are behind grade level are in school for twice as many hours as a typical public school student in New York City. Students who are at or above grade level still attend the equivalent of about fifty percent more school in a calendar year."

This is not to dismiss the findings--it's not so easy to motivate teachers and students to work twice as hard--but just to connect these results to other things that I've heard.

My second comment is that these schools are described as a way to close the gap between whites and blacks in school performance. But if they're so effective, maybe they'd be applied to white kids also? Or is the point that these school changes would really only be applied as part of a package of interventions in predominantly-minority neighborhoods? I'd like to hear more about this issue in the Conclusion section of the article, which raises the idea of following up in regular public schools.

Silly little things

Dobbie and Fryer's paper has excellent graphs--something you don't always see in work by economists. I'm happy to see that the top economists are presenting their work graphically--this seems like an excellent sign. I just have a couple of minor comments:

I'd prefer if Figure 1 (the map) were shown in a non-distorted way and with more information that is relevant to the study. For example, more information about exactly where the kids live, where the schools are, etc. The existing map is hard to read partly because it is distorted (or so it looks to my eyes), meaning that the distance scale is not so meaningful; also, the orange background color makes it hard to see any details at all. Beyond this, the map includes irrelevant information such as the path of the Central Park road; this is the sort of thing that Ed Tufte correctly calls "chartjunk." In this case, the authors didn't add the chartjunk; they just put their info on an existing map. Nonetheless, the end result of this otherwise-potentially-useful map is to show nothing much more than that the Harlem Children's Zone is, indeed, located in Harlem.

Figure 2 is just great. I have only three small suggestions:
- Reduce the y-axis scale. There's no reason to go all the way from -.6 to +.5; you can restrict to the range of the data, which is from -.4 to +.3. Even a small change like this will help a lot, actually.
- There's something weird going on with the y-axis. You can't put "percent enrolled" on the same scale as test scores! That's like saying that my groceries cost $25 and it's 15 degrees out, so my groceries are higher than the temperature. Also, you have to be careful with the whole "percentage" thing. Does ".2" on the percentage scale correspond to 0.2% or to 20%?
- Also, once you get rid of the percentage thing, you can really expand the scale, because the red and blue lines are all between -.4 and .02 on the y-axis.
- Beyond this, how do we interpret a test score of -.2? That doesn't seem right. I assume that the actual scores are positive, and that this is all explained in the text, but I really think that graphs should be as self-contained as possible.
- The color scheme is great (once you can explain how percentages and test scores fit on a common scale). I'd recommend labeling the lines directly rather than using a legend. Once you fix the scale, the lines will be farther apart also.
- 2003 should come before 2004. In the graph shown, 2004 is on the left and 2003 is on the right, which is counter to the conventional way of displaying time ordering.

I won't go over the other graphs line by line, except to say that they're basically fine. I would prefer, however, that they use a consistent color scheme throughout. In Figure 2, blue represents Math score and red represents English score; in the other figures, blue means Lottery Winners and red means Lottery Losers.

And then there are the tables. I think you know already what I'm going to say, so I won't bother to say it. (I mean, 10.424 with a standard error of 7.167? What are these people thinking?) I know, I know, default choices don't need to be justified. But, still . . .
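
For what it's worth, the complaint about 10.424 (7.167) can be turned into a tiny helper. This is only one possible convention, assumed here for illustration: round both the estimate and the standard error to the decimal place of the standard error's leading significant digit. The function name is hypothetical and the rule is not one the post itself states.

```python
import math


def round_to_se(estimate, se, se_digits=1):
    """Round an estimate and its standard error to a common decimal place.

    The decimal place is chosen so that the standard error keeps
    `se_digits` significant digits (an assumed convention, not a standard).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    decimals = se_digits - 1 - math.floor(math.log10(se))
    return round(estimate, decimals), round(se, decimals)


est, se = round_to_se(10.424, 7.167)
print(f"{est:g} (standard error {se:g})")
```

Under this rule the table entry would be reported as an estimate of 10 with a standard error of 7, which conveys everything the three decimal places did.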

It's worth emphasizing, at this point, that I think the authors present their results very well, both graphically and in the text of their article. It's only because they took the leap to make these solid graphs, that I can take the next step and try to help them do even better next time. I think one of the roles of a statistician such as myself is to help researchers do their jobs even better--and this is particularly satisfying in settings such as this, where there's no way I would've been doing the research myself.

The last line of the acknowledgments says, "The usual caveat applies." I have no idea what that means--something in economics-speak? I have noticed in general that econ papers have longer acknowledgment sections than stat papers do. My theory has always been that economists write fewer articles and put more time into each one, whereas statisticians spit out articles at a machine-gun rate and don't look back. The two fields have different systems: my impression is that in econ, it's a big deal to be published in the American Economic Review or wherever, whereas, in stat, an article in JASA or Annals of Statistics or wherever won't necessarily get noticed anyway.

What if everything happened according to plan?

March 8, 2010 - 8:09am

I had occasion to revisit this graph:

And then, it suddenly struck me: what if everything had gone as planned? From the perspective of Obama's reelection chances, the light blue graph ("without recovery plan") is much better than the dark blue ("with recovery plan"). By Election Day, 2012, the two curves are nearly at the same point. But in the year from 2011 to 2012, the economy is improving much faster with the top curve than the bottom curve. And, as Doug Hibbs, Bob Erikson, Steven Rosenstone, and others have taught us, year-to-year change in the economy is what it's all about.

I'm not exactly saying that Obama and his team actually want unemployment in 2011 to be any higher than necessary; it's just funny how, from a crude curve-extrapolation perspective, the above graph is looking like it could be good news for them in two and a half years.

Once again, it's the Hoover-or-Reagan story.

Graph of the week

March 8, 2010 - 7:40am

Brendan Nyhan links to this hilariously bad graph from the Wall Street Journal:

It's cute how they scale the black line to go right between the red and blue lines, huh? I'm not quite sure how $7.25 can be 39% of something, while $5.15 is 10%, but I'm sure there's a perfectly good explanation . . .
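
One way to read those numbers: if the dollar axis and the percent axis are lined up so that $5.15 sits at 10% and $7.25 sits at 39% (an assumption about how the graph was drawn, since the image itself isn't reproduced here), then the two axes are related by an arbitrary straight line. A quick sketch of what that line would be:

```python
def implied_axis_map(wage_lo, pct_lo, wage_hi, pct_hi):
    """Slope and intercept of the line mapping the wage axis onto the percent axis."""
    slope = (pct_hi - pct_lo) / (wage_hi - wage_lo)
    intercept = pct_lo - slope * wage_lo
    return slope, intercept


# Assumed alignment points: $5.15 <-> 10% and $7.25 <-> 39%.
slope, intercept = implied_axis_map(5.15, 10.0, 7.25, 39.0)
wage_at_zero_pct = -intercept / slope

print(f"percentage points per dollar: {slope:.1f}")
print(f"wage that lines up with 0%:   ${wage_at_zero_pct:.2f}")
```

Under that assumed alignment, each dollar of minimum wage is drawn as about 14 percentage points, and 0% lines up with a wage of roughly $4.43. Neither number means anything; a different choice of axis limits would put the black line anywhere you like.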

Follow the above link for more details. As Brendan notes, the graph says essentially nothing about the relation between minimum wage laws and unemployment ("Any variable that trended in one direction during the current economic downturn will be correlated with the unemployment rate among teens or any other group."), and he also helpfully graphs unemployment among the general population, which shows a similar upward trend.

This is not to say that increases in the minimum wage are necessarily a good idea--that's not my area of expertise. I'm talking here about a horrible graph--all the worse, I fear, because of its professionalism. The above graph looks legit--it has many of the visual signifiers of seriousness, looking similar to a newsy graph you might see in the Economist, rather than like a joke graph of the sort identified with USA Today and parodied so well by the Onion.

P.S. I have no problem with the use of a crisp graph to make a political point; see for example here or here.

Criticizing statistical methods for mediation analysis

March 7, 2010 - 3:48pm

Brendan Nyhan passes along an article by Don Green, Shang Ha, and John Bullock, entitled "Enough Already about 'Black Box' Experiments: Studying Mediation Is More Difficult than Most Scholars Suppose," which begins:

The question of how causal effects are transmitted is fascinating and inevitably arises whenever experiments are presented. Social scientists cannot be faulted for taking a lively interest in "mediation," the process by which causal influences are transmitted. However, social scientists frequently underestimate the difficulty of establishing causal pathways in a rigorous empirical manner. We argue that the statistical methods currently used to study mediation are flawed and that even sophisticated experimental designs cannot speak to questions of mediation without the aid of strong assumptions. The study of mediation is more demanding than most social scientists suppose and requires not one experimental study but rather an extensive program of experimental research.

That last sentence echoes a point that I like to make, which is that you generally need to do a new analysis for each causal question you're studying. I'm highly skeptical of the standard poli sci or econ approach, which is to have a single master regression from which you can read off many different coefficients, each with its own causal interpretation.
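
Here is a rough simulated illustration of the difficulty (a sketch with invented numbers, not an analysis from the Green, Ha, and Bullock paper). The treatment X is randomized, but the mediator M is not, and an unobserved variable U affects both M and the outcome Y. Regressing Y on X and M together, which is assumed here as a stand-in for the usual mediation regression, then gets both the mediated and the direct effect wrong, even though the total effect of X is estimated fine.

```python
import random
import statistics

random.seed(1)
N = 20_000     # hypothetical sample size
A = 1.0        # assumed effect of X on M
B = 0.5        # assumed effect of M on Y
DIRECT = 0.0   # assumed direct effect of X on Y (none)

x = [random.random() < 0.5 for _ in range(N)]   # randomized treatment
u = [random.gauss(0, 1) for _ in range(N)]      # unobserved confounder of M and Y
m = [A * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
y = [B * mi + DIRECT * xi + ui + random.gauss(0, 1)
     for mi, xi, ui in zip(m, x, u)]


def group_mean(values, flag):
    """Mean of values among units whose treatment indicator equals flag."""
    return statistics.fmean([v for v, xi in zip(values, x) if xi == flag])


m_bar = {flag: group_mean(m, flag) for flag in (False, True)}
y_bar = {flag: group_mean(y, flag) for flag in (False, True)}

# Least-squares fit of Y on (1, X, M). With a binary X, partialling out X
# amounts to subtracting within-group means.
m_res = [mi - m_bar[xi] for mi, xi in zip(m, x)]
y_res = [yi - y_bar[xi] for yi, xi in zip(y, x)]
b_m = sum(a * b for a, b in zip(m_res, y_res)) / sum(a * a for a in m_res)

total_effect = y_bar[True] - y_bar[False]
b_x = total_effect - b_m * (m_bar[True] - m_bar[False])

print(f"total effect of X (experimental contrast): {total_effect:.2f}")
print(f"coefficient on M: {b_m:.2f}  (assumed truth {B})")
print(f"coefficient on X: {b_x:.2f}  (assumed truth {DIRECT})")
```

In this setup the coefficient on M comes out near 1.0 rather than 0.5, and the "direct effect" of X comes out near -0.5 rather than 0. Randomizing X protects the total effect but does nothing for the mediator, which is one way to see why strong assumptions are needed.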

The article seems reasonable to me (I'm basing my judgments on the downloadable version here), although I can't figure out why an article with three authors is written in the first person singular. Also, I'd slam them for writing a paper with no graphs--except that I just did the same thing, on the same topic!

Green et al. set things up by explaining why causal path analysis seems like a good idea:

One can scarcely fault scholars from expressing curiosity about the mechanisms by which an experimental treatment transmits its influence. After all, many of the most interesting discoveries in science have to do with the identifying mediating factors in a causal chain. For example, the introduction of limes into the diet of seafarers in the 18th century dramatically reduced the incidence of scurvy, and eventually 20th century scientists figured out that the key mediating ingredient was vitamin C. Equipped with knowledge about why an experimental treatment works, scientists may devise other, possibly more efficient ways of achieving the same effect. Modern seafarers can prevent scurvy with limes or simply with vitamin C tablets.

Arresting examples of mediators abound in the physical and life sciences. Indeed, not only do scientists know that vitamin C mediates the causal relationship between limes and scurvy, they also understand the biochemical process by which vitamin C counteracts the onset of scurvy. In other words, mediators themselves have mediators. Physical and life scientists continually seek to pinpoint ever more specific explanatory agents.

But now the bad news:

Given the strong requirements in terms of model specification and measurement, the enterprise of "opening the black box" or "exploring causal pathways" using endogenous mediators is largely a rhetorical exercise. I [Green, Ha, and Bullock] am at a loss to produce even a single example in political science in which this kind of mediation analysis has convincingly demonstrated how a causal effect is transmitted from X to Y.

And then they put it all in perspective:

My [Green, Ha, and Bullock's] argument is not that the search for mediators is pointless or impossible. Establishing the mediating pathways by which an effect is transmitted can be of enormous theoretical and practical value, as the vitamin C example illustrates. Rather, I take issue with the impatience that social scientists often express with experimental studies that fail to explain why an effect obtains. As one begins to appreciate the complexity of mediation analysis, it becomes apparent why the experimental investigation of mediators is slow work. Just as it took almost two centuries to discover why limes cure scurvy, it may take decades to figure out the mechanisms that account for the causal relationships observed in social science.

OK, what's everybody talkin bout?

Here's the method that Green et al. criticize:

Although path analysis goes back several decades, mediation analyses surged in popularity in the 1980s with the publication of Baron and Kenny (1986) . . . First, one regresses the outcome (Y) on the independent variable (X). Upon finding an effect to be explained, one proposes a possible mediating variable (M) and regresses it on X. If X appears to cause M, the final step is to examine whether the effect of X becomes negligible when Y is regressed on both M and X. If M predicts Y and X does not, the implication is that X transmits its influence through M.
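To make the three regressions in that description concrete, here is a minimal sketch in Python on simulated data. Everything in it is an illustrative assumption rather than anything from the article: the helper names `ols` and `baron_kenny_steps`, the sample size, and the coefficients 0.8 and 0.5. The simulation also builds in full mediation and no unmeasured confounding between M and Y, which is the kind of strong assumption the procedure leans on.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def baron_kenny_steps(x, m, y):
    """Run the three regressions described above (hypothetical helper)."""
    total = ols(y, x)[1]           # step 1: regress Y on X
    a = ols(m, x)[1]               # step 2: regress M on X
    _, direct, b = ols(y, x, m)    # step 3: regress Y on both X and M
    return {"total": total, "a": a, "direct": direct, "b": b}

# Simulated experiment (assumed values): X is randomized, M depends on X,
# and Y depends on X only through M, with independent noise throughout.
rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 2, size=n).astype(float)
m = 0.8 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)

result = baron_kenny_steps(x, m, y)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this setup the coefficient on X in the third regression should land near zero, which the procedure reads as "X transmits its influence through M." The tidy result comes from how the data were generated. If the noise terms for M and Y were correlated, the same three regressions would still produce numbers, but they would not have this interpretation.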

This approach has always seemed pretty hopeless to me, but a colleague whom I respect has defended it to me, a bit, by framing it as an adjunct to experimental research. As he puts it, the serious social psychologists would not dream of applying the mediation analysis stuff directly to observational data. Rather, it's their attempt to squeeze more out of experimental data. From that perspective, maybe it's not so horrible.

Beyond nihilism

Green et al. don't just sit around and criticize; they also offer suggestions for moving forward:

A more judicious approach at this juncture in the development of social science would be to encourage researchers to measure as many outcomes as possible when conducting experiments. For example, consider the many studies that have sought to increase voter turnout by means of some form of campaign contact, such as door-to-door canvassing. In addition to assessing whether the intervention increases turnout, one might also conduct a survey of random samples of the treatment and control groups in order to ascertain whether these groups differ in terms of interest in politics, feelings of civic responsibility, knowledge about where and how to vote, and so forth. With many mediators and only one intervention, this kind of experiment cannot identify which of the many causal pathways transmit the effect of the treatment, but if certain pathways are unaffected by the treatment, one may begin to argue they do not explain why mobilization works. As noted above, this kind of analysis makes some important assumptions about homogenous treatment effects, but the point is that this type of exploratory investigation may provide some useful clues to guide further experimental investigation.

As researchers gradually develop intuitions about the conditions under which effects are larger or smaller, they may begin to experiment with variations in the treatment in an effort to isolate the aspects of the intervention that produce the effect. For example, after a series of pilot studies that suggested that social surveillance might be effective in increasing voter turnout, Gerber, Green, and Larimer (2008) launched a study in which subjects were presented one of several interventions. One encouraged voting as a matter of civic duty; another indicated that researchers would be monitoring who voted; a third revealed the voting behavior of all the people living at the same address; and a final treatment revealed the voting behavior of those living on the block. This study stopped short of measuring mediators such as one's commitment to norms of civic participation or one's desire to maintain a reputation and an engaged citizen; nevertheless, the treatments were designed to activate mediators to varying degrees. One can easily imagine variations in this experimental design that would enable the researcher to differentiate more finely between mediators. And one can imagine introducing survey measures to check whether these inducements produce an intervening psychological effect consistent with the posited mediator.

You won't be surprised to hear that I like the focus on active research examples.

No comment

March 5, 2010 - 3:20pm

How come, when I posted a few entries last year on Pearl's and Rubin's frameworks for causal inference, I got about 100 comments, but when yesterday I posted my 12-page magnum opus on the topic, only three people commented?

My theory is that the Pearl/Rubin framing of the earlier discussion personalized the topic, and people get much more interested in a subject if it can be seen in terms of personalities.

Another hypothesis is that my recent review was so comprehensive and correct that people had nothing to say about it.

P.S. The present entry is an example of reverse causal inference, in the sense described in my review.