Gelman's Blog

Gladwell vs Pinker

6 hours 38 min ago

I just happened to notice this from last year. Eric Loken writes:

Steven Pinker reviewed Malcolm Gladwell's latest book and criticized him rather harshly for several shortcomings. Gladwell appears to have made things worse for himself in a letter to the editor of the NYT by defending a manifestly weak claim from one of his essays - the claim that NFL quarterback performance is unrelated to the order in which they were drafted out of college. The reason we [Loken and his colleagues] are implicated is that Pinker identified an earlier blog post of ours as one of three sources he used to challenge Gladwell (yay us!). But Gladwell either misrepresented or misunderstood our post in his response, and admonishes Pinker by saying "we should agree that our differences owe less to what can be found in the scientific literature than they do to what can be found on Google."

Well, here's what you can find on Google. Follow this link to request the data for NFL quarterbacks drafted between 1980 and 2006. Paste the data into a spreadsheet and make a simple graph of touchdowns thrown (as of 2008) versus order of selection in the draft to create the picture below.

The graph includes 373 QBs with a correlation of -.40. If you take the log of TDs the correlation increases to -.57. But correlation can be misleading here because the data are heavily skewed and stacked at zero. Instead, just focus on the perfectly transparent visual display. What is the probability that a quarterback throws 50 or more touchdowns if picked early in the draft? Is the probability lower for QBs picked later in the draft? If you were going to predict performance, would you want to know the draft position of the QB before you made your prediction? The answer to this last question is an unequivocal yes.
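A minimal sketch of the two summaries Loken describes: a correlation (raw and on a log scale) and the share of high producers among early versus late picks. The ten records are made up for illustration and are NOT the real 373-quarterback data set; the pick cutoff of 32 and the use of log(1 + TD) to handle zeros are assumptions, since the post does not say how the zeros were logged.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def share_at_least(records, threshold, lo, hi):
    """Share of players picked in [lo, hi] whose TD total is >= threshold."""
    group = [td for pick, td in records if lo <= pick <= hi]
    return sum(td >= threshold for td in group) / len(group)

# Made-up (draft pick, career TDs) pairs, for illustration only.
toy = [(1, 200), (3, 120), (12, 0), (25, 75), (60, 10),
       (110, 0), (150, 2), (199, 0), (230, 0), (250, 0)]

picks = [p for p, _ in toy]
tds = [t for _, t in toy]

r_raw = pearson(picks, tds)
r_log = pearson(picks, [math.log1p(t) for t in tds])  # log(1 + TD), an assumption
p_early = share_at_least(toy, 50, 1, 32)    # picks 1-32 (assumed "early" cutoff)
p_late = share_at_least(toy, 50, 33, 260)   # everyone else
print(r_raw, r_log, p_early, p_late)
```

The conditional shares answer the question in the text directly ("what is the probability of 50 or more touchdowns if picked early?") and do not depend on the skewness that makes a single correlation hard to read.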

So how do you make this plain-as-day-association disappear? You can eliminate some of the data by declaring it off limits. For example, an economist named David Berri has recently published an article claiming that the correct way to look at the above data is by filtering some observations and making some transformations. (I am working from his blog post here as the journal article is not yet available at my library.) On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias. Because early draft picks are given more opportunity to play, there is a natural correlation between draft order and playing time which might inflate the career statistics like total touchdowns.

Fair enough, but you have to be careful about writing off one source of covariance as a bias in need of correction. Longevity in the NFL is a function of opportunity and success. To attribute all the covariance between playing time and draft order to some sort of opportunity bias is to dramatically redefine the logic of the question. Does anyone believe that NFL owners and coaches are just "socially promoting" their early draft picks to run up these gaudy production stats, while equally able QBs with the misfortune of being selected later in the draft sit idly by and watch? Yes, there are Tom Bradys sitting on the bench... but very very few quarterbacks picked 199th in the draft are remotely as good as Brady proved to be, whereas several QBs picked in the early rounds are as good. You can't look at the above graph and not agree that there is some association between draft order and probability of being a high producer. It doesn't make sense to say that graph is an illusion due to uncorrected factors.

Even when I [Loken] do take a few chops at the above data, I can't eliminate the strong correlation. The correlation is still there when I do TDs per game. It's there when I restrict the data for at least 100 pass attempts. The correlation is even bigger when I do TD per game for QBs picked in the first 100 positions of the draft. I can't get the association to go away, and I'm going to let these graphs stand as a challenge to Gladwell's statement that no prediction is possible regarding the future success of NFL quarterbacks. The consensus of the predictive information reflected in draft order out of college unambiguously does predict future performance.

I don't have anything to add here, except to note that Loken's blog entry had a lot of internal links that I was too lazy to cut-and-paste over--I think there must be a way to do this automatically but I don't know how.

Also, here's my Q-and-A with David Berri from a few years ago. We just talk about basketball, not football.

Categories: Popular Blogs

Blending results from two relatively independent multi-level models

September 2, 2010

David Shor writes:

I [Shor] am working on a Bayesian Forecasting model for the Mid-term elections that has two components:

1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (Average Standard error for house seat level vote-share ~.055)

2) A Bafumi-style regression that applies national-swing to individual seats. (Average Standard error for house seat level vote-share ~.06)

Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But if a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and national swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I'm not sure what the correct way to "blend" these models would be, either for individual or top-line seat estimates.

In the meantime, I'm just creating variance-weighted averages in Excel for seat-level estimates and using Bayes' rule to mix the two seat distributions to get pdfs, which I suspect is sufficient for this particular application. But I'm very curious what the "right" thing to do would be.

My reply:

I'm not quite sure what the right thing to do is here--I'm not following the details of what you're doing, exactly--but I'll give you my general advice, which is that it's usually worth it to create a model for the data and work through the likelihood etc., rather than to create an estimator. If, instead of trying to figure out how to do a weighting, you directly model your data, an appropriate weighting might very well pop out satisfyingly from the posterior distribution. That's been my experience in various contexts, from elections to radon gas.

Shor clarifies:

The issue is that I have two competing models for these races, one that looks at horse races as a state-space model (random walk with noisy, biased observations), and the other being that races follow some sort of year-to-year random walk like you've specified in your House papers.

I suppose they aren't radically different, and your book would recommend trying to build some greater model that incorporates the two interpretations as special cases or construct some hybrid.

Yes, I think a larger model would make sense. But I understand the short-term goal of having a good weighted average. I suppose that a weighted average, weighting by inverse of forecast variance (which is what would be appropriate if the forecasts were truly independent) would make sense.
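A minimal sketch of that inverse-variance weighting for a single seat, valid only under the independence assumption stated above. The two vote-share estimates (0.52 and 0.48) are hypothetical; the standard errors are the averages Shor quotes for his two components.

```python
import math

def inverse_variance_blend(est_a, se_a, est_b, se_b):
    """Combine two independent estimates, weighting each by 1 / variance."""
    w_a = 1.0 / se_a ** 2
    w_b = 1.0 / se_b ** 2
    blended = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    blended_se = math.sqrt(1.0 / (w_a + w_b))
    return blended, blended_se

# Hypothetical seat: poll aggregate says 0.52, swing regression says 0.48.
est, se = inverse_variance_blend(0.52, 0.055, 0.48, 0.06)
print(est, se)
```

If the two forecast errors are positively correlated, the blended standard error computed this way is too small, which is one reason a joint model is the safer long-term route.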

Categories: Popular Blogs

Interactions of predictors in a causal model

September 2, 2010

Michael Bader writes:

What is the best way to examine interactions of independent variables in a propensity weights framework? Let's say we are interested in estimating breathing difficulty (measured on a continuous scale) and our main predictor is age of housing. The object is to estimate whether living in housing 20 years or older is associated with breathing difficulty compared counterfactually to those living in housing less than 20 years old; as a secondary question, we want to know whether that effect differs for those in poverty compared to those not in poverty. In our first-stage propensity model, we include whether the respondent lives in poverty. The weights applied to the other covariates in the propensity model are similar for those living in poverty and those who are not. Now, can I simply interact the poverty variable with the age of construction variable to look at the interaction of age of housing and poverty on breathing difficulty? My thought is no -- in order to get the true interaction, I would need to have the propensity model estimate the joint probability of being poor and living in older housing (as well as the other three cells in the 2x2 table) because without doing this, the main effect of being poor is being washed out by the propensity weights in the first step. On the other hand, if the weights are comparable across the model when we stratify on poverty, I'm not sure whether it will have much of an effect. Or, I could be totally incorrect and running the interaction with the poverty variable is sufficient.

I [Bader] am happy to read up on the subject; but when I tried doing a search, all I could find were debates about adding interactions into the propensity model itself, not looking at interactions of separate independent variables in the model.

My reply:

I don't think it's a good idea to frame this in terms of weights or weighting. I think of propensity scores as just one particular method for the more general problem of constructing similar groups in a treatment/control comparison. (See chapter 10 of ARM for further discussion of this point.) In the example you describe above, you could compare people who lived in housing 20 years or older to people who lived in more recent housing, matching on other variables including their previous poverty status. Then you can include the relevant interactions in your model. The whole propensity-weighting thing seems like a distraction from your real goals here.

Categories: Popular Blogs

R needs a good function to make line plots

September 2, 2010

More and more I'm thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can't tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights.

There's a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally.

Then when it's all working, we can talk the Excel people into implementing our ideas. I'm not asking to be paid here; all our ideas are in the public domain and I'm happy for Microsoft or Google or whoever to copy us.

P.S. Drew Conway writes:

This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code.

In fact, that website is a tremendous resource for all things data viz in R.

Categories: Popular Blogs

References on predicting elections

September 1, 2010

Mike Axelrod writes:

I [Axelrod] am interested in building a model that predicts voting on the precinct level, using variables such as party registration, age, sex, income etc. Surely political scientists have worked on this problem.

I would be grateful for any reference you could provide in the way of articles and books.

My reply: Political scientists have worked on this problem, and it's easy enough to imagine hierarchical models of the sort discussed in my book with Jennifer. I can picture what I would do if asked to forecast at the precinct level, for example to model exit polls. (In fact, I was briefly hired by the exit poll consortium in 2000 to do this, but then after I told them about hierarchical Bayes, they un-hired me!) But I don't actually know of any literature on precinct-level forecasting. Perhaps one of you out there knows of some references?

Categories: Popular Blogs

Ratios where the numerator and denominator both change signs

September 1, 2010

A couple years ago, I used a question by Benjamin Kay as an excuse to write that it's usually a bad idea to study a ratio whose denominator has uncertain sign. As I wrote then:

Similar problems arise with marginal cost-benefit ratios, LD50 in logistic regression (see chapter 3 of Bayesian Data Analysis for an example), instrumental variables, and the Fieller-Creasy problem in theoretical statistics. . . . In general, the story is that the ratio completely changes in interpretation when the denominator changes sign.

More recently, Kay sent in a related question:

I [Kay] wondered if you have any advice on handling ratios when the signs change as a result of a parameter.

I have three functions, one C * x^a, another D * x^a, and a third f(x,a) in my paper such that:

C * x^a < f(x,a) < D * x^a

C, D, and a all have the same sign.
We can divide through by C * x^a, but the result depends on the sign of C: either

1 < f(x,a) / C * x^a < D * x^a / C * x^a,

or

1 > f(x,a) / C * x^a > D * x^a / C * x^a.

That is, when the sign on a changes, the inequalities flip.

I want to say something about the ratio C/D being close to one so that I can say something about how tight the bounds are on f(x,a) / C * x^a.

So being close (say within 5%) has a confusing presentation.

a>0 : 1< f(x,a) / C * x^a < 1.05

a<0 : 1> f(x,a) / C * x^a >.95

Which no one who has read my paper likes. They mostly find it confusing. I cannot be the first person to have to deal with this, so I wondered if you had any suggestions?

My reply:

I have to admit I can't understand much of your notation but I think I get the general picture. In other settings what I've done is to try to reformulate the problem. For example, instead of looking at C/D, look at C - D, or perhaps delta*(C-D), where delta is a positive quantity set to a reasonable value (of the order of magnitude of |C+D|).

My feeling is that if you carefully express these things as decision problems, ultimately it's differences rather than ratios that really matter. We use ratios because they are conveniently scale-free, but really it shouldn't be hard to scale a difference in a reasonable way. The small amount of effort placed into scaling can pay off big-time in clean and direct interpretation.
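A small sketch of why a scaled difference reads more cleanly than a ratio in Kay's setting. The numbers are hypothetical, and dividing by |C + D| / 2 is just one positive yardstick chosen here for illustration, not a scaling prescribed in the post.

```python
def ratio_and_scaled_gap(c, d):
    """Return D/C and the gap D - C divided by a positive scale, |C + D| / 2."""
    ratio = d / c
    scale = abs(c + d) / 2.0   # assumed yardstick; any positive scale would do
    return ratio, (d - c) / scale

# Hypothetical bounds: C < D in both cases, but the signs differ.
pos_ratio, pos_gap = ratio_and_scaled_gap(1.0, 1.05)     # C, D > 0
neg_ratio, neg_gap = ratio_and_scaled_gap(-1.0, -0.95)   # C, D < 0
print(pos_ratio, pos_gap, neg_ratio, neg_gap)
```

The ratio lands above 1 in one case and below 1 in the other, which is the flip readers found confusing; the scaled gap is positive and roughly 5% in both, so "the bounds are within about 5% of each other" can be stated once.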

Categories: Popular Blogs

How does Bayes do it?

September 1, 2010

I received the following message from a statistician working in industry:

I am studying your paper, A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models. I am not clear why the Bayesian approaches with some priors can usually handle the issue of nonidentifiability or can get stable estimates of parameters in model fit, while the frequentist approaches cannot.

My reply:

1. The term "frequentist approach" is pretty general. "Frequentist" refers to an approach for evaluating inferences, not a method for creating estimates. In particular, any Bayes estimate can be viewed as a frequentist inference if you feel like evaluating its frequency properties. In logistic regression, maximum likelihood has some big problems that are solved with penalized likelihood--equivalently, Bayesian inference. A frequentist can feel free to consider the prior as a penalty function rather than a probability distribution of parameters.

2. The reason our approach works well is that we are adding information. In a logistic regression with separation, there is a lack of information in the likelihood, and the prior distribution helps out by ruling out unrealistic possibilities.

3. There are settings where our Bayesian method will mess up. For example, if the true logistic regression coefficient is -20, and you have a moderate sample size, our estimate will be much closer to zero (while the maximum likelihood estimate will be minus infinity, which for some purposes might be an acceptable estimate).
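Points 2 and 3 can be illustrated with a toy example, a sketch only. The six data points are made up and completely separated, the model has a single slope and no intercept, and the penalty is a Cauchy(0, 2.5) log-density on the slope (the scale 2.5 is an assumption here, and the toy predictor is not rescaled in any way). A grid search stands in for a real optimizer.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

# Toy, completely separated data: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def log_lik(b):
    """Log-likelihood of a no-intercept logistic regression with slope b."""
    return sum(log_sigmoid(b * x if y == 1 else -b * x) for x, y in zip(xs, ys))

def log_post(b, scale=2.5):
    """Log-likelihood plus a Cauchy(0, scale) log-prior, up to a constant."""
    return log_lik(b) - math.log1p((b / scale) ** 2)

grid = [i / 100 for i in range(0, 5001)]   # candidate slopes from 0 to 50
b_mle_on_grid = max(grid, key=log_lik)     # keeps climbing to the grid edge
b_penalized = max(grid, key=log_post)      # settles at a finite value
print(b_mle_on_grid, b_penalized)
```

The unpenalized maximum sits at the edge of whatever grid is supplied, which is the numerical face of an infinite maximum likelihood estimate; the penalized maximum stays at a moderate slope. The same pull toward zero is what would bias the estimate if the true coefficient really were as extreme as -20.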

Probably I should write more about this sometime. Various questions along those lines arose during my recent talk at Cambridge.

Categories: Popular Blogs

Predicting marathon times

August 31, 2010

Frank Hansen writes:

I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited.

So I decided to do some analysis of my own.

I was going to put together a web page where people could get their race time predictions, maybe sell some ads for sports gps watches, but it might also be publishable.

I have 2 requests which obviously I don't want you to spend more than a few seconds on.

1. I was wondering if you knew of any sports performance researchers working on performance of not just elite athletes, but the full range of runners.

2. Can you suggest a way to do multilevel modeling of this? There are several natural subsets for the data but it's not obvious what makes sense. I describe the data below.

3. Phil (the runner/co-blogger who posted about weight loss) might be interested.

I collected race results for the Chicago marathon and 3 shorter races: Chicago Half Marathon, Soldier Field 10 Miler, Ravenswood 5k. I collected data from 2003 through 2009. Within each year I matched results for finishers between each shorter race and that year's marathon based on full name and age. I used python to scrape web pages for the results.

Of course in a particular year a given marathoner may have run more than one of the shorter races. At this point I am ignoring that, treating them as independent records even though they have the same marathon finish data.

I would think that knowing several shorter races to predict a marathon time would help, but demanding several matches really cuts down the data.

I also collected weather data, so I know the temperature, humidity, wind speed near 8 am for each race (in Chicago).

I end up with around 13,000 records. A record contains a marathon time, a short race time, the type of short race, the temperature, humidity and wind speed difference between the short race and the marathon. I also know the age and sex of the marathon finisher.

Taking logs helps the R-squared, but this way it's easier to interpret.

int.form <- "mar.pace ~ short.pace + short.race.type + age + sex + temp.dif + humid.dif + wind.dif - 1"

Call:
lm(formula = int.form, data = full.dat)

Residuals:
     Min       1Q   Median       3Q      Max
-510.061  -36.867   -5.632   34.116  510.552

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
short.pace            0.999389   0.006703 149.087  < 2e-16 ***
short.race.typehalf  82.630974   4.242505  19.477  < 2e-16 ***
short.race.typerw   106.133301   4.347218  24.414  < 2e-16 ***
short.race.typesf10  89.458519   4.209498  21.252  < 2e-16 ***
age                   0.321860   0.064960   4.955 7.33e-07 ***
sexM                  8.444752   1.286381   6.565 5.41e-11 ***
temp.dif              1.516766   0.051981  29.179  < 2e-16 ***
humid.dif             0.128886   0.041519   3.104  0.00191 **
wind.dif             -1.534700   0.150816 -10.176  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 65.79 on 13004 degrees of freedom
Multiple R-squared: 0.9895, Adjusted R-squared: 0.9895
F-statistic: 1.368e+05 on 9 and 13004 DF, p-value: < 2.2e-16

In the regression results the marathon and short race "pace" variables are in seconds per mile, so a short.race.typehalf coefficient of about 82 means: add roughly 82 seconds to your half marathon mile pace to get the marathon mile pace, and so on for the other independent variables. Temperature is in Fahrenheit, humidity in %, wind speed in mph.

Marathon day for 2009 was really cold; predicting pace for 2009 based on a fit of the other years has larger errors than predicting 2008 using a fit for the non-2008 data.
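A minimal sketch of how the fitted coefficients turn a short-race pace into a marathon prediction. The coefficients are copied from the regression output above; the function name, the example runner, and the reading of "rw" and "sf10" as the Ravenswood 5k and Soldier Field 10 Miler are assumptions. The weather differences default to zero because the letter does not say in which direction they were computed.

```python
# Coefficients copied from the regression output (seconds per mile).
COEF = {
    "short.pace": 0.999389,
    "half": 82.630974,   # Chicago Half Marathon
    "rw": 106.133301,    # presumably the Ravenswood 5k
    "sf10": 89.458519,   # presumably the Soldier Field 10 Miler
    "age": 0.321860,
    "sexM": 8.444752,
    "temp.dif": 1.516766,
    "humid.dif": 0.128886,
    "wind.dif": -1.534700,
}
MARATHON_MILES = 26.21875  # 26 miles 385 yards

def predict_marathon_pace(short_pace, race_type, age, male,
                          temp_dif=0.0, humid_dif=0.0, wind_dif=0.0):
    """Predicted marathon pace in seconds per mile (hypothetical helper)."""
    return (COEF["short.pace"] * short_pace
            + COEF[race_type]
            + COEF["age"] * age
            + (COEF["sexM"] if male else 0.0)
            + COEF["temp.dif"] * temp_dif
            + COEF["humid.dif"] * humid_dif
            + COEF["wind.dif"] * wind_dif)

# Hypothetical runner: 35-year-old man with a 9:00/mile half marathon.
pace = predict_marathon_pace(540.0, "half", age=35, male=True)
total_seconds = pace * MARATHON_MILES
hours, rem = divmod(int(round(total_seconds)), 3600)
print(f"{pace:.0f} s/mile, about {hours}h {rem // 60:02d}m")
```

With a residual standard error near 66 seconds per mile, any single prediction carries an uncertainty of roughly half an hour or more over the full distance, so the point prediction should be reported with a wide band.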

My main piece of advice is to never ever ever ever ever use "summary" to display regression outputs in R. Only use "display" or "coefplot". Unless, that is, you care that your standard error is "4.242505" or that your p-value is "7.33e-07" or that your F-statistic is "1.368e+05". I don't. But, then again, I'm a Bayesian.

Categories: Popular Blogs

Somewhat Bayesian multilevel modeling

August 31, 2010

Eric McGhee writes:

I'm trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I'm on a tight schedule and worried about teaching myself something of that complexity in the time available.

I'm hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the "informal Bayesian" method. This has raised a few questions:

First, what are the costs of using this approach as opposed to full Bayesian?

Second, when I use the predictive simulation as described on p. 149 of "Data Analysis" on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with the coefficients and skip the step of random draws from a binomial distribution (i.e., use the technique described at the bottom of p. 148), I get results that are much more sensible (around +/- 5 points). Do the random draws from the binomial distribution only apply to out-of-sample predictions? Or do they apply to in-sample predictions, too? If the latter, any idea why I would be getting such a large range of results? Might that be signaling something wrong with the model, or with my R code?

Finally, when dealing with simulation results, what would most closely correspond to a margin of error? The 5%-95% interval mentioned above? Or something else? I need a way of summarizing uncertainty using a terminology that is familiar to a policy audience.

My reply:

The main benefit of full Bayes over approximate Bayes (of the sort done by lmer(), for example, and used in many of the examples in my book with Jennifer) arises when group-level variances are small. Approximate Bayes gives a point estimate of the variance parameters, which understates uncertainty compared to full Bayes. We are currently working on an add-on to lmer()-like programs to include some of that uncertainty, but we haven't done it yet, so I don't have any R package to conveniently offer you here.

Regarding your simulation question: Yes, if you're interested in estimating all of California, you don't want to do that binomial simulation--that's something you only do when you're simulating some finite amount of new data.

For the margin of error, you can just compute sd's from the simulations and then compute 2*sd. Or you can use the [2.5%, 97.5%] simulation points, but that will be pretty noisy unless you have thousands of simulations.
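A minimal sketch of both summaries, and of what the extra binomial step does. The simulated proportions are stand-ins drawn from an assumed normal distribution (mean 0.55, sd 0.025), not output from McGhee's model, and the 50 "new respondents" are likewise hypothetical.

```python
import random
import statistics

random.seed(1)
N_SIMS = 4000

# Stand-in for simulation draws of a county's true proportion (assumed values).
p_sims = [random.gauss(0.55, 0.025) for _ in range(N_SIMS)]

def margin_2sd(sims):
    """Margin of error as two standard deviations of the simulations."""
    return 2 * statistics.stdev(sims)

def central_interval(sims, lo=0.025, hi=0.975):
    """Empirical [2.5%, 97.5%] points of the simulations."""
    s = sorted(sims)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]

def add_binomial_noise(sims, n_new):
    """For each draw, the observed share among n_new hypothetical new respondents."""
    return [sum(random.random() < p for _ in range(n_new)) / n_new for p in sims]

p_new = add_binomial_noise(p_sims, 50)

m_param = margin_2sd(p_sims)   # uncertainty about the proportion itself
m_new = margin_2sd(p_new)      # uncertainty about a finite new sample
lo, hi = central_interval(p_sims)
print(m_param, m_new, lo, hi)
```

Under these assumed numbers the first margin is about 5 points and the second about 15, the same pattern McGhee reports: the binomial step answers "what would a new sample of this size look like?", not "what is the county's proportion?".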

Categories: Popular Blogs

Computer models of the oil spill

August 30, 2010

Chris Wilson points me to this visualization of three physical models of the oil spill in the Gulf of Mexico. Cool (and scary) stuff. Wilson writes:

One of the major advantages is that the models are 3D and show the plumes and tails beneath the surface. One of the major disadvantages is that they're still just models.
Andrew Gelman http://www.stat.columbia.edu/~gelman
Categories: Popular Blogs

Useful models, model checking, and external validation: a mini-discussion

August 30, 2010

I sent a copy of my paper (coauthored with Cosma Shalizi) on Philosophy and the practice of Bayesian statistics in the social sciences to Richard Berk, who wrote:

I read your paper this morning. I think we are pretty much on the same page about all models being wrong. I like very much the way you handle this in the paper. Yes, Newton's work is wrong, but surely useful. I also like your twist on Bayesian methods. Makes good sense to me. Perhaps most important, your paper raises some difficult issues I have been trying to think more carefully about.

1. If the goal of a model is to be useful, surely we need to explore what "useful" means. At the very least, usefulness will depend on use. So a model that is useful for forecasting may or may not be useful for causal inference.

2. Usefulness will be a matter of degree. So that for each use we will need one or more metrics to represent how useful the model is. In what looks at first to be a simple example, if the use is forecasting, forecasting accuracy by something like MSE may be a place to start. But that will depend on one's forecasting loss function, which might not be quadratic or even symmetric. This is a problem I have actually been working on and have some applications appearing. Other kinds of use imply a very different set of metrics --- what is a good usefulness metric for causal inference, for instance?

3. It seems to me that your Bayesian approach is one of several good ways (and not mutually exclusive ways) of doing data analysis. Taking a little liberty with what you say, you try a form of description and if it does not capture well what is in the data, you alter the description. But like use, it will be multidimensional and a matter of degree. There are these days so many interesting ways that statisticians have been thinking about description that I suspect it will be a while (if ever) before we have a compelling and systematic way to think about the process. And it goes to the heart of doing science.

4. I guess I am uneasy with your approach when it uses the same data to build and evaluate a model. I think we would agree that out-of-sample evaluation is required.

5. There are also some issues about statistical inference after models are revised and re-estimated using the same data. I have attached a recent paper written for criminologists, co-authored with Larry Brown and Linda Zhao, that appeared in Quantitative Criminology. It is frequentist in perspective. Larry and Ed George are working on a Bayesian version. Along with Andreas Buja and Larry Shepp, we are working on appropriate methods for post-model-selection inference, given that current practice is just plain wrong and often very misleading. Bottom line: what does one make of Bayesian output when the model involved has been tuned to the data?

My reply:

I agree with your points #1 and #2. We always talk about a model being "useful" but the concept is hard to quantify.
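Berk's point #2, that the best forecast depends on the loss function, can be made concrete with a small sketch. The outcomes are made up, and the 3-to-1 cost ratio for under-forecasting versus over-forecasting is an arbitrary assumption.

```python
def mean_loss(forecast, outcomes, loss):
    """Average loss of a single point forecast over a set of outcomes."""
    return sum(loss(forecast - y) for y in outcomes) / len(outcomes)

def squared(err):
    """Symmetric quadratic loss, the one behind MSE."""
    return err ** 2

def asymmetric(err, under_cost=3.0, over_cost=1.0):
    """Linear loss where forecasting too low (err < 0) costs three times as much."""
    return under_cost * -err if err < 0 else over_cost * err

outcomes = [0, 1, 2, 4, 8]                     # made-up outcomes
candidates = [x / 10 for x in range(0, 101)]   # forecasts 0.0, 0.1, ..., 10.0

best_sq = min(candidates, key=lambda f: mean_loss(f, outcomes, squared))
best_asym = min(candidates, key=lambda f: mean_loss(f, outcomes, asymmetric))
print(best_sq, best_asym)
```

Squared error picks the mean of the outcomes, while the asymmetric loss picks a higher value (an upper quantile), so a model ranked by MSE can look good or bad for reasons unrelated to the use it is meant for.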

I also agree with #3. Bayes has worked well for me but I'm sure that other methods could work fine also.

Regarding point #4, the use of the same data to build and evaluate the model is not particularly Bayesian. I see what we do as an extension of non-Bayesian ideas such as chi^2 tests, residual plots, and exploratory data analysis--all of which, in different ways, are methods for assessing model fit using the data that were used to fit the model. In any case, I agree that out-of-sample checks are vital to true statistical understanding.

To put it another way: I think you're imagining that I'm proposing within-sample checks as an alternative to out-of-sample checking. But that's not what I'm saying. What I'm proposing is to do within-sample checks as an alternative to doing no checking at all, which unfortunately is the standard in much of the Bayesian world (abetted by the subjective-Bayes theory/ideology). When a model passes a within-sample check, it doesn't mean the model is correct. But in many many cases, I've learned a lot from seeing a model fail a within-sample check.
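A stripped-down sketch of a within-sample check, under loud assumptions: the data are made up, the model is a plain normal distribution, and the fitted mean and sd are plugged in rather than drawn from a posterior, so this is a simplification of a posterior predictive check and not the procedure from the paper.

```python
import random
import statistics

random.seed(2)

# Made-up, right-skewed data with one large value.
y = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.3, 1.6, 2.1, 9.5]

# "Fit" a normal model by plugging in the sample mean and sd (a simplification).
mu, sigma = statistics.mean(y), statistics.stdev(y)

def replicate_max(n):
    """Largest value in one replicated data set of size n from the fitted model."""
    return max(random.gauss(mu, sigma) for _ in range(n))

# Compare the observed maximum with the maxima of 2000 replicated data sets.
reps = [replicate_max(len(y)) for _ in range(2000)]
p_value = sum(r >= max(y) for r in reps) / len(reps)
print(p_value)
```

Replicated data sets rarely produce a maximum as large as the observed one, so the normal model fails this check even though it was fit to the very same data. That failure is informative on its own, and passing the check would not have shown the model to be correct.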

Regarding your very last point, there is some classic work on Bayesian inference accounting for estimation of the prior from data. This is the work of various people in the 1960s and 1970s on hierarchical Bayes, when it was realized that "empirical Bayes" or "estimating the prior from data" could be subsumed into a larger hierarchical framework. My guess is that such ideas could be generalized to a higher level of the modeling hierarchy.

Categories: Popular Blogs

The Subtle Micro-Effects of Peacekeeping

August 29, 2010

Eric Mvukiyehe and Cyrus Samii write:

We [Mvukiyehe and Samii] use original survey data and administrative data to test a theory of the micro-level impacts of peacekeeping. The theory proposes that through the creation of local security bubbles and also through direct assistance, peacekeeping deployments contribute to economic and social revitalization that may contribute to more durable peace. This theory guides the design of current United Nations peacekeeping operations, and has been proposed as one of the explanations for peacekeeping's well-documented association with more durable peace.

Our evidence paints a complex picture that deviates substantially from the theory. We do not find evidence for local security bubbles around deployment base areas, and we do not find that deployments were substantial contributors to local social infrastructure. In addition, we find a negative relationship between deployment basing locations and NGO contributions to social infrastructure.

Nonetheless, we find that deployments do seem to stimulate local markets, leading to better employment possibilities and substantially higher incomes. The result is something of a puzzle, suggesting that more work needs to be done on other types of direct assistance by peacekeeping contingents--e.g. the impact of mission procurement and routine spending by those associated with the mission. Also, the findings with respect to NGO activities suggest that this is an important factor that past case studies and cross-national studies have not taken into account sufficiently.

(I put in the boldface and the paragraph breaks to add some emphasis.)

At this point, I'd usually say, Here are the graphs. But there are no graphs! I'm sure the article will be even better once they've presented their data and model in an accessible form. In the meantime, I think these guys know what they're doing, so if you're interested in peacekeeping, you should probably read their article right away.

Categories: Popular Blogs

ARM solutions

August 29, 2010

People sometimes email asking if a solution set is available for the exercises in ARM. The answer, unfortunately, is no. Many years ago, I wrote up 50 solutions for BDA and it was a lot of work--really, it was like writing a small book in itself. The trouble is that, once I started writing them up, I wanted to do it right, to set a good example. That's a lot more effort than simply scrawling down some quick answers.

Categories: Popular Blogs

Ethics and statistics in development research

August 29, 2010

From Banerjee and Duflo, "The Experimental Approach to Development Economics," Annual Review of Economics (2009):

One issue with the explicit acknowledgment of randomization as a fair way to allocate the program is that implementers may find that the easiest way to present it to the community is to say that an expansion of the program is planned for the control areas in the future (especially when such is indeed the case, as in phased-in design).

I can't quite figure out whether Banerjee and Duflo are saying that they would lie and tell people that an expansion is planned when it isn't, or whether they're deploring that other people do it.

I'm not bothered by a lot of the deception in experimental research--for example, I think the Milgram obedience experiment was just fine--but somehow the above deception bothers me. It just seems wrong to tell people that an expansion is planned if it's not.

P.S. Overall the article is pretty good. My only real problem with it is that when discussing data analysis, they pretty much ignore the statistical literature and just look at econometrics. In the long run, that's fine--any relevant developments in statistics should eventually make their way over to the econometrics literature. But for now I think it's a drawback in that it encourages a focus on theory and testing rather than modeling and scientific understanding.

Here are the titles of some of the cited papers:

Bootstrap tests for distributional treatment effects in instrumental variables models
Nonparametric tests for treatment effect heterogeneity
Testing the correlated random coefficient model
Asymptotics for statistical decision rules

Most of the paper, and most of the references, are applied rather than theoretical, so I'm not claiming that Banerjee and Duflo are ivory-tower theorists. Rather, I'm suggesting that their statistical methods might not be allowing them to get the most out of their data--and that they're looking in the wrong place when researching better methods. The problem, I think, is that they (like many economists) think of statistical methods not as a tool for learning but as a tool for rigor. So they gravitate toward math-heavy methods based on testing, asymptotics, and abstract theories, rather than toward complex modeling. The result is a disconnect between statistical methods and applied goals.

Categories: Popular Blogs

The mathematics of democracy

August 28, 2010
I was sent a copy of "Numbers Rule: The Vexing Mathematics of Democracy, from Plato to the Present," by George Szpiro. It's an interesting book that I think a lot of people will like, going over a bunch of voting... Andrew Gelman http://www.stat.columbia.edu/~gelman
Categories: Popular Blogs