Popular Blogs

Here's how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

Gelman's Blog - 2 hours 45 min ago

1. I remarked that Sharad had a good research article with some ugly graphs.

2. Dan posted Sharad's graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots.

3. I commented on Dan's site that, in this case, I'd much prefer a well-designed lineplot. I wrote:

There's a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place.

I think that's what's happening here. You're seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box.

(Graph 2--which I think you think is the kind of thing that Gelman would like--indeed is the kind of thing that I think the R gurus like, but I don't like it at all. It looks clean without actually being clean. Sort of like those modern architecture buildings from the 1930s-1960s that look all sleek and functional but really aren't so functional at all.)

The big problem with your graphs above is that they place two logical dimensions (the model and the scenario) on the same physical dimension (the y-axis). I find this sort of ABCABCABCABC pattern hard to follow. Instead, you want to be able to compare AAAA, BBBB, CCCC, while still being able to make the four separate ABC comparisons.

How to do this? I suggest a lineplot.

Here's how my first try would go:

On the x-axis, put Music, Games, Movies, and Flu, in that order. (Ordering is important in allowing you to see patterns that otherwise might be obscured; see the cover of my book with Jennifer for an example.)

On the y-axis, put the scale. I'll assume you know what you're doing here, so keep with the .4 to 1 scale. But you only need labels at .4, .6, .8, 1.0. The intermediate labels are overkill and just make the graph hard to follow.

Now draw three lines, one for Search, one for Baseline, and one for Combined. Color the lines differently and label each one directly on the plot (not using a legend).

The resulting graph will be compact, and the next step is for you to replicate your study under different conditions, with a new graph for each. You can put these side by side and make some good comparisons.

4. Sharad took my advice and made such a lineplot (see the Addendum at the end of Dan's blog).

5. Kaiser agrees with me and presents an excellent visualization showing why the lineplot is better. (Kaiser's picture is so great that I'll save it for its own entry here, for those of you who don't click through on all the links.)

6. David Smith posts that I prefer the dotplot. Nooooooooooooooooooooooo!!!!!!!!!!!

Categories: Popular Blogs

The China Study: fact or fallacy?

Gelman's Blog - 2 hours 58 min ago

Alex Chernavsky writes:

I recently came across an interesting blog post, written by someone who is self-taught in statistics (not that there's anything wrong with that).

I have no particular expertise in statistics, but her analysis looks impressive to me. I'd be very interested to find out the opinion of a professional statistician. Do you have any interest in blogging about this subject?

My (disappointing, I'm sure) reply: This indeed looks interesting. I don't have the time/energy to look at it more right now, and it's too far from any areas of my expertise for me to give any kind of quick informed opinion. It would be good for this sort of discussion to appear in a nutrition journal where the real experts could get at it. I expect there are some strong statisticians who work in that field, although I don't really know for sure.

P.S. I suppose I really should try to learn more about this sort of thing, as it could well affect my life more than a lot of other subjects (from sports to sex ratios) that I've studied in more depth.

QB2

Gelman's Blog - September 7, 2010

Dave Berri writes:

Saw you had a post on the research I did with Rob Simmons on the NFL draft. I have attached the article. This article has not officially been published, so please don't post this on-line.

The post you linked to states the following: "On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias."

Two points: First of all, we did not look at touchdowns per game (that is not a per play stat). More importantly -- as this post indicates -- we did far more than just look at data after five years.

We did mention the five year result, but directly below that discussion (and I mean, directly below), the following sentences appear.

Our data set runs from 1970 to 2007 (adjustments were made for how performance changed over time). We also looked at career performance after 2, 3, 4, 6, 7, and 8 years. In addition, we also looked at what a player did in each year from 1 to 10. And with each data set our story looks essentially the same. The above stats are not really correlated with draft position.

This analysis was also updated and discussed in this post (posted on-line last May). Hopefully that post will also help you see the point Rob and I are making.

I'm out of my depth on this football stuff so I'll leave it to you, the commenters.

The $900 kindergarten teacher

Gelman's Blog - September 7, 2010

Paul Bleicher writes:

This simply screams "post-hoc, multiple comparisons problem," though I haven't seen the paper.

A quote from the online news report:

The findings revealed that kindergarten matters--a lot. Students of kindergarten teachers with above-average experience earn $900 more in annual wages than students of teachers with less experience than average. Being in a class of 15 students instead of a class of 22 increased students' chances of attending college, especially for children who were disadvantaged . . . Children whose test scores improved to the 60th percentile were also less likely to become single parents, more likely to own a home by age 28, and more likely to save for retirement earlier in their work lives.

I haven't seen the paper either. $900 doesn't seem like so much to me, but I suppose it depends where you stand on the income ladder.

Regarding the multiple comparisons problem: this could be a great example for fitting a multilevel model. Seriously.

Inbox zero. Really.

Gelman's Blog - September 6, 2010

Just in time for the new semester:

This time I'm sticking with the plan:

1. Don't open a message until I'm ready to deal with it.
2. Don't store anything--anything--in the inbox.
3. Put to-do items in the (physical) bookje rather than the (computer) "desktop."
4. Never read email before 4pm. (This is the one rule I have been following.)
5. Only one email session per day. (I'll have to see how this one works.)

A review of a review of a review of a decade

Gelman's Blog - September 5, 2010

At the sister blog, David Frum writes, of a book by historian Laura Kalman about the politics of the 1970s:

As a work of history about the Ford and Carter years, there is nothing seriously wrong with it. The facts are accurate, the writing is clear and the point of view is not tendentious. Once upon a time, such a book might have been useful to somebody.

But the question it raises--and it's not a question about this book alone--is: What's the point of this kind of history in the age of the Internet? Suppose I'm an undergraduate who stumbles for the first time across the phrase "Proposition 13." I could, if I were minded, walk over to the university library, pull this book from the shelf and flip to the index. Or I could save myself two hours and Google it. I wouldn't learn more from a Google search than I'd learn in these pages. But I wouldn't learn a whole lot less either.

As a textbook writer, I think about some of these issues too! I have two things to add to Frum's remarks (which seem reasonable to me--I would go so far as to call them "perceptive remarks" except that I haven't actually seen Kalman's book, nor have I looked up Proposition 13 on the web, so I'm just taking Frum's word for it):

1. Kalman's book can't be all facts, it must be interpretation also. Given my own struggle with the conventional wisdom in some areas of statistics (as represented by Wikipedia; see, for example, the footnote on page 2 here), I can well understand a historian's motivation to get things right in a definitive article or book--and also a thoughtful student's desire to read a coherent view of a topic rather than a mere collection of received wisdom.

Don't get me wrong--Wikipedia, Snopes, and all the rest are great and are admirably effective in responding to controversy and quashing factual errors (see here and the impressive follow-up on Wikipedia)--but they won't necessarily give you a clear picture of a complex series of events.

2. As someone who tried (and failed) to write a popular social-science book, I believe more than ever before that people like storytelling. If, like David Frum, you've worked closely with famous people during important events, then you can tell stories that are new and relevant. If you're an academic historian, you're probably reduced to rehashing stories that you've read elsewhere. (If you're Doris Kearns Goodwin, you're reduced to copying stories your assistants have read and have inadvertently put your name on.) Rehashing stories is fine--it worked for Shakespeare, Mark Twain, Jeffrey Archer, and the people who compile all those joke books that come out every year. Not everyone can be Chris Rock, you know.

Seriously, though, people do seem to want stories, and the job of a popular history is to tell them and to put them into some sort of logical framework. Frum does get to this point--later on in his review, he criticizes Kalman for slapping down quotes without evaluating them--but I think he's going too far when he demands that a new book help the reader understand "subtle, far-reaching, and perverse effects." That might be fine but it probably isn't what most readers are looking for.

P.S. I followed a link in the comments and found this review by Mary Dudziak. I gotta say, though, that the blurbs quoted by Dudziak do not convince me that Kalman is saying anything new or interesting. I mean, the idea that the "weak leadership" of Jimmy Carter "paved the way for the triumph of Ronald Reagan's forceful conservatism." This ain't exactly a new idea!

Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Gelman's Blog - September 4, 2010

Masanao sends this one in, under the heading, "another incident of misunderstood p-value":

Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students. Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you're testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke - but still, there is always a certain probability that it was.

Statistical significance testing gives you an idea of what this probability is.

In science we're always testing hypotheses. We never conduct a study to 'see what happens', because there's always at least one way to make any useless set of data look important. We take a risk; we put our idea on the line and expose it to potential refutation. Therefore, all statistical tests in psychology test the possibility that the hypothesis is correct, versus the possibility that it isn't.

I like the BPS Research Digest, but one more item like this and I'll have to take them off the blogroll. This is ridiculous! I don't blame Warren Davies--it's all too common for someone teaching statistics to (a) make a mistake and (b) not realize it. But I do blame the editors of the website for getting a non-expert to emit wrong information. One thing that any research psychologist should know is that statistics is tricky. I hate to see this sort of mistake (saying that statistical significance is a measure of the probability the null hypothesis is true) being given the official endorsement of the British Psychological Society.

P.S. To any confused readers out there: The p-value is the probability of seeing something as extreme as the data or more so, if the null hypothesis were true. In social science (and I think in psychology as well), the null hypothesis is almost certainly false, false, false, and you don't need a p-value to tell you this. The p-value tells you the extent to which a certain aspect of your data is consistent with the null hypothesis. A lack of rejection doesn't tell you that the null hyp is likely true; rather, it tells you that you don't have enough data to reject the null hyp. For more more more on this, see for example this paper with David Weakliem which was written for a nontechnical audience.
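
To make that definition concrete, here is a minimal sketch in Python. The coin-flip setup, the numbers, and the function name are all made up for illustration; the point is only that the calculation conditions on the null hypothesis being true and then asks how unusual the data would be.

```python
from math import comb

def two_sided_p_value(observed, n, p_null=0.5):
    """Probability, if the null were true, of an outcome at least as far
    from the expected count n * p_null as the observed count is."""
    expected = n * p_null
    observed_gap = abs(observed - expected)
    total = 0.0
    for k in range(n + 1):
        # Add up every outcome that is as extreme as the data or more so.
        if abs(k - expected) >= observed_gap - 1e-12:
            total += comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
    return total

# Hypothetical experiment: 60 heads in 100 flips of a supposedly fair coin.
p = two_sided_p_value(60, 100)
print(f"p-value = {p:.3f}")
```

Nothing in this calculation gives the probability that the coin is fair; the null hypothesis is an input to the function, not an output.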

P.P.S. This "zombies" category is really coming in handy, huh?

Bayesian inference viewed as a computational approximation to classical calculations

Gelman's Blog - September 4, 2010

Dave Armstrong writes:

I have a hopefully quick question about Multilevel Models . . . While being Bayesian would make the attached question [having to do with calculating confidence intervals for linear combinations of fixed and varying coefficients] moot, and I am certainly sympathetic in my own work, I am looking to understand the Frequentist perspective as I need to explain how to do this in R to people without experience in WinBUGS and who are generally uninterested in gaining such experience.

My reply:

This sort of thing happens to me all the time, which is one reason I try to do these inferences using simulations, so I don't have to keep track of covariances. The simulation-based Bayes inferences can be interpreted as classical freq inferences; to put it another way, the Bayesian inference can be thought of as a computational trick to work with the multivariate normal and t distributions that arise in classical confidence intervals.
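
A rough sketch of the simulation idea, in Python rather than R and for just two coefficients. The estimates and covariance matrix are hypothetical, the function name is invented for this example, and the sketch treats the coefficients as exactly normal with known covariance (it ignores uncertainty in the variance parameters, which a full analysis would also simulate).

```python
import random
from math import sqrt

def simulate_linear_combination(beta_hat, cov, weights, n_sims=20000, seed=1):
    """Draw two coefficients from their estimated bivariate normal and
    summarize weights[0]*b0 + weights[1]*b1 across the draws."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = sqrt(cov[1][1] - l21 ** 2)
    sims = []
    for _ in range(n_sims):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = beta_hat[0] + l11 * z0
        b1 = beta_hat[1] + l21 * z0 + l22 * z1
        sims.append(weights[0] * b0 + weights[1] * b1)
    sims.sort()
    mean = sum(sims) / n_sims
    sd = sqrt(sum((s - mean) ** 2 for s in sims) / (n_sims - 1))
    # Central 95% interval read straight off the sorted simulations.
    interval = (sims[int(0.025 * n_sims)], sims[int(0.975 * n_sims)])
    return mean, sd, interval

# Hypothetical estimates and covariance matrix (not from any real fit).
beta_hat = [1.0, 2.0]
cov = [[0.04, -0.01], [-0.01, 0.09]]
mean, sd, interval = simulate_linear_combination(beta_hat, cov, weights=[1, 1])
print(mean, sd, interval)
```

For this simple sum the answer can be checked by hand: the standard deviation should come out near sqrt(0.04 + 0.09 - 2*0.01), about 0.33. The appeal of the simulations is that the same few lines work for any function of the coefficients, with no covariance bookkeeping.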

How does multilevel modeling affect the estimate of the grand mean?

Gelman's Blog - September 4, 2010

Subhadeep Mukhopadhyay writes:

I am convinced of the power of hierarchical modeling and the individual parameter pooling concept. I was wondering how multi-level modeling could influence the estimate of the grand mean (NOT individual level).

My reply: Multilevel modeling will affect the estimate of the grand mean in two ways:

1. If the group-level mean is correlated with group size, then the partial pooling will change the estimate of the grand mean (and, indeed, you might want to include group size or some similar variable as a group-level predictor).

2. In any case, the extra error term(s) in a multilevel model will typically affect the standard error of everything, including the estimate of the grand mean.
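
Both effects show up in a small sketch, assuming a normal model in which the within-group sd (sigma_y) and between-group sd (tau) are treated as known. The data and the function name are hypothetical. Under those assumptions the grand-mean estimate is a weighted average of the group means, with weights 1 / (sigma_y^2 / n_j + tau^2).

```python
from math import sqrt

def grand_mean(group_means, group_sizes, sigma_y, tau):
    """Precision-weighted estimate of the grand mean and its standard error,
    treating sigma_y (within-group sd) and tau (between-group sd) as known."""
    weights = [1.0 / (sigma_y ** 2 / n + tau ** 2) for n in group_sizes]
    total = sum(weights)
    estimate = sum(w * m for w, m in zip(weights, group_means)) / total
    return estimate, sqrt(1.0 / total)

# Hypothetical data in which bigger groups have higher means.
means = [1.0, 2.0, 3.0]
sizes = [5, 20, 100]

pooled = grand_mean(means, sizes, sigma_y=1.0, tau=0.0)      # no group-level variation
multilevel = grand_mean(means, sizes, sigma_y=1.0, tau=1.0)  # groups vary
print(pooled, multilevel)
```

With tau = 0 the weights are proportional to group size, so the estimate is the mean of all the observations. With tau > 0 the weights move toward equal, which pulls the estimate toward the unweighted mean of the group means and makes its standard error larger.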

Question about standard range for social science correlations

Gelman's Blog - September 4, 2010

Andrew Eppig writes:

I'm a physicist by training who is transitioning to the social sciences. I recently came across a reference in the Economist to a paper on IQ and parasites which I read as I have more than a passing interest in IQ research (having read much that you and others (e.g., Shalizi, Wicherts) have written). In this paper I note that the authors find a very high correlation between national IQ and parasite prevalence. The strength of the correlation (-0.76 to -0.82) surprised me, as I'm used to much weaker correlations in the social sciences. To me, it's a bit too high, suggesting that there are other factors at play or that one of the variables is merely a proxy for a large number of other variables. But I have no basis for this other than a gut feeling and a memory of a plot on Language Log about the distribution of correlation coefficients in social psychology.

So my question is this: Is a correlation in the range of (-0.82,-0.76) more likely to be a correlation between two variables with no deeper relationship or indicative of a missing set of underlying variables?

My reply:

First off, I don't think you can ever distinguish between correlations of .76 and .82 in this sort of situation, so let's just call it .8.

Second, I certainly agree that other factors could be involved. I don't think you can treat the high correlations as evidence against their argument. I'm not knowledgeable enough in this area to assess their hypotheses; I guess I'd be interested in hearing the thoughts of someone such as Wicherts who's more of an expert in this area.

Finally, are you related to the first author of the linked article, or is it just that you did a search on Eppig and encountered this stuff?

P.S. Eppig responds:

I am in fact related to the first author of the study -- he's my brother.

Since my first question, I've been wondering about how to interpret the results of a regression when some of the independent variables have been imputed via regression. So if I have a model:

fit.1 <- lm(y ~ x1 + x2 + x3 + x4 + ... + xn)

where x1 has had its missing values imputed using:

fit.2 <- lm(x1 ~ x2 + x3 + ... + xn)

Are there extra considerations required in interpreting the model fit.1? Can one read off the coefficient values and errors from fit.1 as one would in a "regular" (i.e. where no imputation had been performed) model? Naively, I feel that the errors in xn are now correlated with the other independent variables and a simple linear regression is no longer appropriate/valid. Are the coefficients of x1, x2,..., xn valid but the errors invalid?

In response to this latter question: This is called a measurement-error or simultaneous-equations model. In general you want to fit both models together, or, in general, to model all the variables jointly. That said, in practice I'll typically just take the imputed x-values as exact and not think too hard about it.

Gladwell vs Pinker

Gelman's Blog - September 3, 2010

I just happened to notice this from last year. Eric Loken writes:

Steven Pinker reviewed Malcolm Gladwell's latest book and criticized him rather harshly for several shortcomings. Gladwell appears to have made things worse for himself in a letter to the editor of the NYT by defending a manifestly weak claim from one of his essays - the claim that NFL quarterback performance is unrelated to the order they were drafted out of college. The reason we [Loken and his colleagues] are implicated is that Pinker identified an earlier blog post of ours as one of three sources he used to challenge Gladwell (yay us!). But Gladwell either misrepresented or misunderstood our post in his response, and admonishes Pinker by saying "we should agree that our differences owe less to what can be found in the scientific literature than they do to what can be found on Google."

Well, here's what you can find on Google. Follow this link to request the data for NFL quarterbacks drafted between 1980 and 2006. Paste the data into a spreadsheet and make a simple graph of touchdowns thrown (as of 2008) versus order of selection in the draft to create the picture below.

The graph includes 373 QBs with a correlation of -.40. If you take the log of TDs the correlation increases to -.57. But correlation can be misleading here because the data are heavily skewed and stacked at zero. Instead, just focus on the perfectly transparent visual display. What is the probability that a quarterback throws 50 or more touchdowns if picked early in the draft? Is the probability lower for QBs picked later in the draft? If you were going to predict performance, would you want to know the draft position of the QB before you made your prediction? The answer to this last question is an unequivocal yes.

So how do you make this plain-as-day-association disappear? You can eliminate some of the data by declaring it off limits. For example, an economist named David Berri has recently published an article claiming that the correct way to look at the above data is by filtering some observations and making some transformations. (I am working from his blog post here as the journal article is not yet available at my library.) On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias. Because early draft picks are given more opportunity to play, there is a natural correlation between draft order and playing time which might inflate the career statistics like total touchdowns.

Fair enough, but you have to be careful about writing off one source of covariance as a bias in need of correction. Longevity in the NFL is a function of opportunity and success. To attribute all the covariance between playing time and draft order as some sort of opportunity bias is to dramatically redefine the logic of the question. Does anyone believe that NFL owners and coaches are just "socially promoting" their early draft picks to run up these gaudy production stats, while equally able QBs with the misfortune of being selected later in the draft sit idly by and watch? Yes, there are Tom Bradys sitting on the bench... but very very few quarterbacks picked 199th in the draft are remotely as good as Brady proved to be, whereas several QBs picked in the early rounds are as good. You can't look at the above graph and not agree that there is some association between draft order and probability of being a high producer. It doesn't make sense to say that graph is an illusion due to uncorrected factors.

Even when I [Loken] do take a few chops at the above data, I can't eliminate the strong correlation. The correlation is still there when I do TDs per game. It's there when I restrict the data for at least 100 pass attempts. The correlation is even bigger when I do TD per game for QBs picked in the first 100 positions of the draft. I can't get the association to go away, and I'm going to let these graphs stand as a challenge to Gladwell's statement that no prediction is possible regarding the future success of NFL quarterbacks. The consensus of the predictive information reflected in draft order out of college unambiguously does predict future performance.

I don't have anything to add here, except to note that Loken's blog entry had a lot of internal links that I was too lazy to cut-and-paste over--I think there must be a way to do this automatically but I don't know how.

Also, here's my Q-and-A with David Berri from a few years ago. We just talk about basketball, not football.

Blending results from two relatively independent multi-level models

Gelman's Blog - September 2, 2010

David Shor writes:

I [Shor] am working on a Bayesian Forecasting model for the Mid-term elections that has two components:

1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (Average Standard error for house seat level vote-share ~.055)

2) A Bafumi-style regression that applies national-swing to individual seats. (Average Standard error for house seat level vote-share ~.06)

Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But if a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and National swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I'm not sure what the correct way to "blend" these models would be, either for individual or top-line seat estimates.

In the meantime, I'm just creating variance-weighted averages in Excel for seat-level estimates and using Bayes' rule to mix the two seat distributions to get pdfs, which I suspect is sufficient for this particular application. But I'm very curious what the "right" thing to do would be.

My reply:

I'm not quite sure what the right thing to do is here--I'm not following the details of what you're doing, exactly--but I'll give you my general advice, which is that it's usually worth it to create a model for the data and work through the likelihood etc, rather than to create an estimator. If, instead of trying to figure out how to do a weighting, you directly model your data, an appropriate weighting might very well pop out satisfyingly from the posterior distribution. That's been my experience in various contexts, from elections to radon gas.

Shor clarifies:

The issue is that I have two competing models for these races, one that looks at horse races as a state-space model (Random walk with noisy biased observations), and the other being that races follow some sort of year to year random walk like you've specified in your house papers.

I suppose they aren't radically different, and your book would recommend trying to build some greater model that incorporates the two interpretations as special cases or construct some hybrid.

Yes, I think a larger model would make sense. But I understand the short-term goal of having a good weighted average. I suppose that a weighted average, weighting by inverse of forecast variance (which is what would be appropriate if the forecasts were truly independent) would make sense.
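
A minimal sketch of that short-term weighted average, in Python. The two standard errors are the rough averages quoted in the question (.055 for the poll aggregation, .06 for the regression); the two vote-share estimates and the function name are hypothetical.

```python
from math import sqrt

def inverse_variance_average(estimates, std_errors):
    """Combine forecasts, weighting each by 1 / se^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    # This standard error is only right if the forecasts are independent.
    return combined, sqrt(1.0 / total)

# Hypothetical seat: polls say 0.52, the regression says 0.48.
combined, se = inverse_variance_average([0.52, 0.48], [0.055, 0.06])
print(round(combined, 3), round(se, 3))
```

The combined estimate sits slightly closer to the poll-based number because its standard error is smaller. If the two forecasts are positively correlated, the combined standard error computed this way will be too small.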

Interactions of predictors in a causal model

Gelman's Blog - September 2, 2010

Michael Bader writes:

What is the best way to examine interactions of independent variables in a propensity weights framework? Let's say we are interested in estimating breathing difficulty (measured on a continuous scale) and our main predictor is age of housing. The object is to estimate whether living in housing 20 years or older is associated with breathing difficulty compared counterfactually to those living in housing less than 20 years old; as a secondary question, we want to know whether that effect differs for those in poverty compared to those not in poverty. In our first-stage propensity model, we include whether the respondent lives in poverty. The weights applied to the other covariates in the propensity model are similar to those living in poverty compared to those who are not. Now, can I simply interact the poverty variable with the age of construction variable to look at the interaction of age of housing and poverty on breathing difficulty? My thought is no -- in order to get the true interaction, I would need to have the propensity model estimate the joint probability of being poor and living in older housing (as well as the other three cells in the 2x2 table) because without doing this, the main effect of being poor is being washed out by the propensity weights in the first step. On the other hand, if the weights are comparable across the model when we stratify on poverty, I'm not sure whether it will have much of an effect. Or, I could be totally incorrect and running the interaction with the poverty variable is sufficient.

I [Bader] am happy to read up on the subject; but when I tried doing a search, all I could find were debates about adding interactions into the propensity model itself, not looking at interactions of separate independent variables in the model.

My reply:

I don't think it's a good idea to frame this in terms of weights or weighting. I think of propensity scores as just one particular method for the more general problem of constructing similar groups in a treatment/control comparison. (See chapter 10 of ARM for further discussion of this point.) In the example you describe above, you could compare people who lived in housing 20 years or older to people who lived in more recent housing, matching on other variables including their previous poverty status. Then you can include the relevant interactions in your model. The whole propensity-weighting thing seems like a distraction from your real goals here.

R needs a good function to make line plots

Gelman's Blog - September 2, 2010

More and more I'm thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can't tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights.

There's a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally.

Then when it's all working, we can talk the Excel people into implementing our ideas. I'm not asking to be paid here; all our ideas are in the public domain and I'm happy for Microsoft or Google or whoever to copy us.

P.S. Drew Conway writes:

This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code.

In fact, that website is a tremendous resource for all things data viz in R.

References on predicting elections

Gelman's Blog - September 1, 2010

Mike Axelrod writes:

I [Axelrod] am interested in building a model that predicts voting on the precinct level, using variables such as party registration, age, sex, income etc. Surely political scientists have worked on this problem.

I would be grateful for any reference you could provide in the way of articles and books.

My reply: Political scientists have worked on this problem, and it's easy enough to imagine hierarchical models of the sort discussed in my book with Jennifer. I can picture what I would do if asked to forecast at the precinct level, for example to model exit polls. (In fact, I was briefly hired by the exit poll consortium in 2000 to do this, but then after I told them about hierarchical Bayes, they un-hired me!) But I don't actually know of any literature on precinct-level forecasting. Perhaps one of you out there knows of some references?

Ratios where the numerator and denominator both change signs

Gelman's Blog - September 1, 2010

A couple years ago, I used a question by Benjamin Kay as an excuse to write that it's usually a bad idea to study a ratio whose denominator has uncertain sign. As I wrote then:

Similar problems arise with marginal cost-benefit ratios, LD50 in logistic regression (see chapter 3 of Bayesian Data Analysis for an example), instrumental variables, and the Fieller-Creasy problem in theoretical statistics. . . . In general, the story is that the ratio completely changes in interpretation when the denominator changes sign.

More recently, Kay sent in a related question:

I [Kay] wondered if you have any advice on handling ratios when the signs change as a result of a parameter.

I have three functions, one C * x^a, another D * x^a, and a third f(x,a) in my paper such that:

C * x^a < f(x,a) < D * x^a

C, D and a all have the same signs.
We can divide through by C * x^a but the results depend on the sign of C: either

1 < f(x,a) / C * x^a < D * x^a / C * x^a,

or

1 > f(x,a) / C * x^a > D * x^a / C * x^a,

That is, when the sign on a changes, the inequalities flip.

I want to say something about the ratio C/D being close to one so that I can say something about how tight the bounds are on f(x,a) / C * x^a.

So being close (say within 5%) has a confusing presentation.

a>0 : 1< f(x,a) / C * x^a < 1.05

a<0 : 1> f(x,a) / C * x^a >.95

Which no one who has read my paper likes. They mostly find it confusing. I cannot be the first person to have to deal with this, so I wondered if you had any suggestions?

My reply:

I have to admit I can't understand much of your notation but I think I get the general picture. In other settings what I've done is to try to reformulate the problem. For example, instead of looking at C/D, look at C - D, or perhaps delta*(C-D), where delta is a positive quantity set to a reasonable value (of the order of magnitude of |C+D|).

My feeling is that if you carefully express these things as decision problems, ultimately it's differences rather than ratios that really matter. We use ratios because they are conveniently scale-free, but really it shouldn't be hard to scale a difference in a reasonable way. The small amount of effort placed into scaling can pay off big-time in clean and direct interpretation.
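A minimal sketch of the contrast between a ratio and a scaled difference, using made-up bounds that sit within about 5% of each other. The function names, the example values, and the choice of scale (the mean absolute size of C and D, applied by division) are all assumptions made for this illustration, not anything taken from Kay's paper.

```python
def ratio(c, d):
    """Ratio of the two bounds, D/C; how to read it depends on the sign of C."""
    return d / c


def scaled_difference(c, d):
    """Width of the bounds, D - C, put on a unit-free scale.

    The scale (mean absolute size of C and D) is an illustrative assumption;
    any positive quantity of roughly that magnitude would serve.
    """
    scale = (abs(c) + abs(d)) / 2.0
    return (d - c) / scale


# Hypothetical bounds within about 5% of each other, one pair per sign.
cases = {"positive": (1.0, 1.05), "negative": (-1.0, -0.95)}

for label, (c, d) in cases.items():
    print(f"{label}: D/C = {ratio(c, d):.3f}, "
          f"scaled D - C = {scaled_difference(c, d):.3f}")
```

The ratio lands above 1 in one case and below 1 in the other, which is the presentation readers found confusing. The scaled difference is positive and of similar size in both cases, so a single statement such as "the bounds are about 5% apart" covers either sign.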


How does Bayes do it?

Gelman's Blog - September 1, 2010

I received the following message from a statistician working in industry:

I am studying your paper, A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models. I am not clear why the Bayesian approaches with some priors can usually handle the issue of nonidentifiability or can get stable estimates of parameters in model fit, while the frequentist approaches cannot.

My reply:

1. The term "frequentist approach" is pretty general. "Frequentist" refers to an approach for evaluating inferences, not a method for creating estimates. In particular, any Bayes estimate can be viewed as a frequentist inference if you feel like evaluating its frequency properties. In logistic regression, maximum likelihood has some big problems that are solved with penalized likelihood--equivalently, Bayesian inference. A frequentist can feel free to consider the prior as a penalty function rather than a probability distribution of parameters.

2. The reason our approach works well is that we are adding information. In a logistic regression with separation, there is a lack of information in the likelihood, and the prior distribution helps out by ruling out unrealistic possibilities.

3. There are settings where our Bayesian method will mess up. For example, if the true logistic regression coefficient is -20, and you have a moderate sample size, our estimate will be much closer to zero (while the maximum likelihood estimate will be minus infinity, which for some purposes might be an acceptable estimate).

Probably I should write more about this sometime. Various questions along those lines arose during my recent talk at Cambridge.
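Points 2 and 3 can be seen in a toy example. The sketch below uses four made-up, completely separated data points and a one-parameter logistic regression with no intercept. The Cauchy prior with scale 2.5 is the default from the paper, but the sketch skips the paper's rescaling of inputs and uses a crude grid search, so treat it as an illustration of the idea, not the paper's algorithm.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


# Hypothetical, completely separated data: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]


def log_likelihood(b):
    """Log-likelihood of a no-intercept logistic regression with slope b."""
    return sum(log_sigmoid(x * b if y == 1 else -x * b)
               for x, y in zip(xs, ys))


def log_posterior(b, scale=2.5):
    """Log-likelihood plus log Cauchy(0, scale) prior density, up to a constant."""
    return log_likelihood(b) - math.log1p((b / scale) ** 2)


grid = [i / 100.0 for i in range(0, 5001)]  # candidate slopes from 0 to 50
b_mle_on_grid = max(grid, key=log_likelihood)
b_penalized = max(grid, key=log_posterior)

print("likelihood is maximized at the edge of the grid: b =", b_mle_on_grid)
print("penalized likelihood is maximized at b =", b_penalized)
```

The likelihood keeps increasing as the slope grows, so its maximum sits at whatever upper limit the grid has. The penalized version peaks at a moderate, finite slope. The flip side is point 3: if the true slope really were huge, this estimate would be pulled well toward zero.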


Predicting marathon times

Gelman's Blog - August 31, 2010

Frank Hansen writes:

I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited.

So I decided to do some analysis of my own.

I was going to put together a web page where people could get their race time predictions, maybe sell some ads for sports GPS watches, but it might also be publishable.

I have 2 requests which obviously I don't want you to spend more than a few seconds on.

1. I was wondering if you knew of any sports performance researchers working on performance of not just elite athletes, but the full range of runners.

2. Can you suggest a way to do multilevel modeling of this? There are several natural subsets for the data but it's not obvious what makes sense. I describe the data below.

3. Phil (the runner/co-blogger who posted about weight loss) might be interested.

I collected race results for the Chicago marathon and 3 shorter races: Chicago Half Marathon, Soldier Field 10 Miler, Ravenswood 5k. I collected data from 2003 through 2009. Within each year I matched results for finishers between each shorter race and that year's marathon based on full name and age. I used Python to scrape web pages for the results.

Of course in a particular year a given marathoner may have run more than one of the shorter races. At this point I am ignoring that, treating them as independent records even though they have the same marathon finish data.

I would think that knowing several shorter races to predict a marathon time would help, but demanding several matches really cuts down the data.

I also collected weather data, so I know the temperature, humidity, wind speed near 8 am for each race (in Chicago).

I end up with around 13,000 records. A record contains a marathon time, a short race time, the type of short race, the temperature, humidity and wind speed difference between the short race and the marathon. I also know the age and sex of the marathon finisher.

Taking logs helps the R-squared, but this way it's easier to interpret.

int.form <- "mar.pace ~ short.pace + short.race.type + age + sex + temp.dif + humid.dif + wind.dif - 1"

Call:
lm(formula = int.form, data = full.dat)

Residuals:
     Min       1Q   Median       3Q      Max
-510.061  -36.867   -5.632   34.116  510.552

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
short.pace            0.999389   0.006703 149.087  < 2e-16 ***
short.race.typehalf  82.630974   4.242505  19.477  < 2e-16 ***
short.race.typerw   106.133301   4.347218  24.414  < 2e-16 ***
short.race.typesf10  89.458519   4.209498  21.252  < 2e-16 ***
age                   0.321860   0.064960   4.955 7.33e-07 ***
sexM                  8.444752   1.286381   6.565 5.41e-11 ***
temp.dif              1.516766   0.051981  29.179  < 2e-16 ***
humid.dif             0.128886   0.041519   3.104  0.00191 **
wind.dif             -1.534700   0.150816 -10.176  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 65.79 on 13004 degrees of freedom
Multiple R-squared: 0.9895, Adjusted R-squared: 0.9895
F-statistic: 1.368e+05 on 9 and 13004 DF, p-value: < 2.2e-16

In the regression results the marathon and short race "pace" variable is in seconds per mile, so the short.race.typehalf equal to 82 means roughly add 82 seconds to your half marathon mile pace to get the marathon mile pace, and so on for the independent variables. Temperature is in Fahrenheit, humidity in %, wind speed in mph.

Marathon day for 2009 was really cold; predicting pace for 2009 based on a fit of the other years has larger errors than predicting 2008 using a fit for the non-2008 data.

My main piece of advice is to never ever ever ever ever use "summary" to display regression outputs in R. Only use "display" or "coefplot". Unless, that is, you care that your standard error is "4.242505" or that your p-value is "< 2e-16" or that your F-statistic is "1.368e+05". I don't. But, then again, I'm a Bayesian.
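To make the fitted model concrete, here is a rough sketch in Python that turns the coefficients from the regression output above into a pace prediction. The coefficients are rounded to two decimals, in the spirit of not fussing over extra digits. The function names and the example runner are invented for illustration, "rw" is assumed to be the Ravenswood 5k, and the direction of the weather differences (marathon minus short race, or the reverse) is not stated in the email, so those terms default to zero here.

```python
# Coefficients rounded from the regression output (units: seconds per mile).
SHORT_PACE = 1.00
RACE_TYPE = {"half": 82.63, "rw": 106.13, "sf10": 89.46}  # "rw": assumed 5k
AGE = 0.32
MALE = 8.44
TEMP_DIF = 1.52    # per degree Fahrenheit; sign convention is an assumption
HUMID_DIF = 0.13   # per percentage point
WIND_DIF = -1.53   # per mph

MARATHON_MILES = 26.21875  # 26 miles 385 yards


def predict_marathon_pace(short_pace, race_type, age, male,
                          temp_dif=0.0, humid_dif=0.0, wind_dif=0.0):
    """Predicted marathon pace in seconds per mile from one shorter race."""
    return (SHORT_PACE * short_pace
            + RACE_TYPE[race_type]
            + AGE * age
            + (MALE if male else 0.0)
            + TEMP_DIF * temp_dif
            + HUMID_DIF * humid_dif
            + WIND_DIF * wind_dif)


def format_hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical runner: 35-year-old man with an 8:00/mile half marathon.
pace = predict_marathon_pace(480, "half", 35, male=True)
print("predicted pace (sec/mile):", round(pace, 2))
print("predicted finish time:", format_hms(pace * MARATHON_MILES))
```

With a residual standard error of about 66 seconds per mile, any single prediction from this model carries a wide band, which is one reason the extra decimal places in the coefficients are not worth reporting.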


Somewhat Bayesian multilevel modeling

Gelman's Blog - August 31, 2010

Eric McGhee writes:

I'm trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I'm on a tight schedule and worried about teaching myself something of that complexity in the time available.

I'm hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the "informal Bayesian" method. This has raised a few questions:

First, what are the costs of using this approach as opposed to full Bayesian?

Second, when I use the predictive simulation as described on p. 149 of "Data Analysis" on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with the coefficients and skip the step of random draws from a binomial distribution (i.e., use the technique described at the bottom of p. 148), I get results that are much more sensible (around +/- 5 points). Do the random draws from the binomial distribution only apply to out-of-sample predictions? Or do they apply to in-sample predictions, too? If the latter, any idea why I would be getting such a large range of results? Might that be signaling something wrong with the model, or with my R code?

Finally, when dealing with simulation results, what would most closely correspond to a margin of error? The 5%-95% interval mentioned above? Or something else? I need a way of summarizing uncertainty using a terminology that is familiar to a policy audience.

My reply:

The main benefit of full Bayes over approximate Bayes (of the sort done by lmer(), for example, and used in many of the examples in my book with Jennifer) arises when group-level variances are small. Approximate Bayes gives a point estimate of the variance parameters, which understates uncertainty compared to full Bayes. We are currently working on an add-on to lmer()-like programs to include some of that uncertainty, but we haven't done it yet, so I don't have any R package to conveniently offer you here.

Regarding your simulation question: Yes, if you're interested in estimating all of California, you don't want to do that binomial simulation--that's something you only do when you're simulating some finite amount of new data.

For the margin of error, you can just compute sd's from the simulations and then compute 2*sd. Or you can use the [2.5%, 97.5%] simulation points, but that will be pretty noisy unless you have thousands of simulations.
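Here is a small sketch of both points, using invented numbers. The simulation draws for a county's proportion are faked as normal draws centered at 0.55 with sd 0.02 (in practice they would come from the fitted model), and the size of the "new data" sample, 50, is likewise an assumption. The sketch computes 2*sd and the [2.5%, 97.5%] points, with and without the extra binomial step.

```python
import random
import statistics

random.seed(1)

N_SIMS = 4000    # number of simulation draws
NEW_SAMPLE = 50  # hypothetical size of a finite batch of new respondents

# Stand-in for simulation draws of a county's true proportion.
theta_sims = [random.gauss(0.55, 0.02) for _ in range(N_SIMS)]


def summarize(draws):
    """Return (2*sd, 2.5% point, 97.5% point) of a list of simulation draws."""
    sd = statistics.stdev(draws)
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return 2 * sd, cuts[0], cuts[-1]


def binomial_proportion(p, n):
    """Observed proportion in n new binary responses with success probability p."""
    p = min(max(p, 0.0), 1.0)
    return sum(random.random() < p for _ in range(n)) / n


# Adding the binomial step: only appropriate for a finite amount of new data.
pred_sims = [binomial_proportion(theta, NEW_SAMPLE) for theta in theta_sims]

moe_est, lo_est, hi_est = summarize(theta_sims)
moe_pred, lo_pred, hi_pred = summarize(pred_sims)

print(f"estimate only: +/- {moe_est:.3f}, interval [{lo_est:.3f}, {hi_est:.3f}]")
print(f"with binomial: +/- {moe_pred:.3f}, interval [{lo_pred:.3f}, {hi_pred:.3f}]")
```

The binomial step widens the range considerably, because it adds the sampling noise of 50 new respondents on top of the uncertainty in the proportion itself. For a statement about the county as a whole, the first line is the relevant one, and 2*sd is the number to report as a margin of error.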


Computer models of the oil spill

Gelman's Blog - August 30, 2010

Chris Wilson points me to this visualization of three physical models of the oil spill in the Gulf of Mexico. Cool (and scary) stuff. Wilson writes:

One of the major advantages is that the models are 3D and show the plumes and tails beneath the surface. One of the major disadvantages is that they're still just models.