Biometrika :: latest articles

Biometrika - RSS feed of recent issues (covers the latest 3 issues, including the current issue)

Generalized empirical likelihood methods for analyzing longitudinal data

February 16, 2010 - 10:49am

Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and to compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is presented.

Categories: Statistical Journals

On the use of stochastic ordering to test for trend with clustered binary data

February 16, 2010 - 10:49am

We introduce the use of stochastic ordering for defining treatment-related trend in clustered exchangeable binary data, both when cluster sizes are fixed and when they vary randomly. In the latter case, there is a well-documented tendency for such data to be sparse, a problem we address by making an assumption of interpretability or, equivalently, marginal compatibility. Our procedures are based on a representation of the joint distribution of binary exchangeable random variables by a saturated model, and may hence be considered nonparametric. The definition of trend by stochastic ordering is proposed to ensure a flexibility that allows for various forms of monotone increases in response to the cluster as a whole to be included in the evaluation of the trend. We obtain maximum likelihood estimates of probability functions under stochastic ordering using mixture-likelihood-based algorithms. Since the data are sparse, we avoid the use of asymptotic results and obtain p-values of the likelihood ratio procedures by permutation resampling. We demonstrate how the proposed framework can be used in risk assessment, and provide comparisons with existing procedures.
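
The permutation resampling step mentioned above can be illustrated in a generic form. The sketch below is a minimal example and not the authors' procedure: it replaces the likelihood ratio statistic under stochastic ordering with a simple difference in group means, and the function name `permutation_p_value` and the per-cluster responder counts are hypothetical.

```python
import random

def permutation_p_value(control, treated, n_perm=999, seed=0):
    """One-sided permutation p-value for an increase in the treated group.

    Assumption: the statistic is a difference in group means, standing in
    for the likelihood ratio statistic described in the abstract.
    """
    rng = random.Random(seed)
    n_control = len(control)

    def stat(a, b):
        return sum(b) / len(b) - sum(a) / len(a)

    observed = stat(control, treated)
    pooled = list(control) + list(treated)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the clusters at random
        if stat(pooled[:n_control], pooled[n_control:]) >= observed:
            at_least_as_extreme += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return (1 + at_least_as_extreme) / (1 + n_perm)

# Hypothetical numbers of responders per cluster in two dose groups.
control = [0, 1, 0, 1, 0, 0, 1, 0]
treated = [3, 4, 5, 3, 4, 6, 5, 4]
p_value = permutation_p_value(control, treated)
```

Because the reference distribution is built from the data themselves, no large-sample approximation is needed, which is the reason the abstract gives for using resampling with sparse data.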

Stochastic approximation with virtual observations for dose-finding on discrete levels

February 16, 2010 - 10:49am

Phase I clinical studies are experiments in which a new drug is administered to humans to determine the maximum dose that causes toxicity with a target probability. Phase I dose-finding is often formulated as a quantile estimation problem. For studies with a biological endpoint, it is common to define toxicity by dichotomizing the continuous biomarker expression. In this article, we propose a novel variant of the Robbins–Monro stochastic approximation that utilizes the continuous measurements for quantile estimation. The Robbins–Monro method has seldom seen clinical applications, because it does not perform well for quantile estimation with binary data and it works with a continuum of doses that are generally not available in practice. To address these issues, we formulate the dose-finding problem as root-finding for the mean of a continuous variable, for which the stochastic approximation procedure is efficient. To accommodate the use of discrete doses, we introduce the idea of virtual observation that is defined on a continuous dosage range. Our proposed method inherits the convergence properties of the stochastic approximation algorithm and its computational simplicity. Simulations based on real trial data show that our proposed method improves accuracy compared with the continual re-assessment method and produces results robust to model misspecification.
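
For orientation, the classical Robbins–Monro recursion for solving E[Y | x] = target with a continuous response can be sketched as follows. This is the textbook procedure on a continuum of doses, not the virtual-observation variant proposed in the article; the linear dose-response function, the noise level and the gain constant are assumptions chosen only for illustration.

```python
import random

def robbins_monro(observe, target, x0, n_steps, gain=1.0):
    """Classical Robbins-Monro recursion for solving E[Y | x] = target."""
    x = x0
    for n in range(1, n_steps + 1):
        y = observe(x)                     # noisy continuous response at dose x
        x = x - (gain / n) * (y - target)  # step size gain/n shrinks over time
    return x

rng = random.Random(0)

def noisy_response(x):
    # Hypothetical dose-response: mean 2*x + 1 with unit-variance Gaussian noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)

# The mean response equals the target 5.0 at dose x = 2.0.
estimate = robbins_monro(noisy_response, target=5.0, x0=0.0, n_steps=5000)
```

The recursion moves the dose down when the observed response exceeds the target and up otherwise, with steps that shrink at rate 1/n so that the noise averages out.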

Sharp bounds on causal effects in case-control and cohort studies

February 16, 2010 - 10:49am

Evaluating the causal effect of an exposure on a response from case-control and cohort studies is a major concern in epidemiological and medical research. Since causal effects are in general nonidentifiable from such studies, this paper derives bounds on two causal measures: the causal risk difference and the causal risk ratio. We use the potential response approach and a linear programming method to derive sharp bounds on the causal risk difference, and a novel fractional programming method to derive bounds on the causal risk ratio. In addition, in the presence of missing data, we consider three different missingness mechanisms and propose sharp bounds under these situations. The results provide new guidance on causal inference in case-control and cohort studies.

A semiparametric random effects model for multivariate competing risks data

February 16, 2010 - 10:49am

We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins.

Estimation of the retransformed conditional mean in health care cost studies

February 16, 2010 - 10:49am

We propose a new approach for analyzing skewed and heteroscedastic health care cost data through regression of the conditional quantiles of the transformed cost. Using the appealing equivariance property of quantiles to monotone transformations, we propose a distribution-free estimator of the conditional mean cost on the original scale. The proposed method is extended to a two-part heteroscedastic model to account for zero costs commonly seen in health care cost studies. Simulation studies indicate that the proposed estimator has competitive and more robust performance than existing estimators in various heteroscedastic models.
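
The equivariance property referred to above states that, for a monotone increasing transformation h, the quantile of h(Y) equals h applied to the quantile of Y, whereas the mean has no such property. The sketch below checks this on a small set of made-up cost values using an order-statistic quantile; it illustrates the property only and is not the estimator proposed in the article. The helper `sample_quantile` and the data are hypothetical.

```python
import math

def sample_quantile(values, tau):
    """Order-statistic quantile: smallest value with at least a fraction tau of the sample at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

# Hypothetical right-skewed costs and their log transform.
costs = [120.0, 85.0, 4300.0, 260.0, 990.0, 150.0, 75.0, 610.0, 2200.0]
log_costs = [math.log(c) for c in costs]

# Quantiles commute with the monotone log transform.
median_cost = sample_quantile(costs, 0.5)
median_log_cost = sample_quantile(log_costs, 0.5)
quantile_commutes = (median_log_cost == math.log(median_cost))

# Means do not: the mean of the logs differs from the log of the mean.
mean_of_logs = sum(log_costs) / len(log_costs)
log_of_mean = math.log(sum(costs) / len(costs))
mean_commutes = math.isclose(mean_of_logs, log_of_mean)
```

This is why quantiles fitted on the transformed scale can be mapped back to the original scale directly, while a mean fitted on the transformed scale cannot.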

Mean loglikelihood and higher-order approximations

February 16, 2010 - 10:49am

Higher-order approximations to p-values can be obtained from the loglikelihood function and a reparameterization that can be viewed as a canonical parameter in an exponential family approximation to the model. This approach clarifies the connection between Skovgaard (1996) and Fraser et al. (1999a), and shows that the Skovgaard approximation can be obtained directly using the mean loglikelihood function.

On doubly robust estimation in a semiparametric odds ratio model

February 16, 2010 - 10:49am

We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation-independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).

On Bayesian testimation and its application to wavelet thresholding

February 16, 2010 - 10:49am

We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse l_p-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense l_p-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts.

Forecasting for quantile self-exciting threshold autoregressive time series models

February 16, 2010 - 10:49am

Self-exciting threshold autoregressive time series models have been used extensively, and the conditional mean obtained from these models can be used to predict the future value of a random variable. In this paper we consider quantile forecasts of a time series based on the quantile self-exciting threshold autoregressive time series models proposed by Cai and Stander (2008) and present a new forecasting method for them. Simulation studies and application to real time series show that the method works very well.

A note on the sensitivity to assumptions of a generalized linear mixed model

February 16, 2010 - 10:49am

A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component.

Pseudo-score confidence intervals for parameters in discrete statistical models

February 16, 2010 - 10:49am

We propose pseudo-score confidence intervals for parameters in models for discrete data. The confidence interval is obtained by inverting a test that uses a Pearson chi-squared statistic to compare fitted values for the working model with fitted values of the model when a parameter of interest takes various fixed values. For multinomial models, the pseudo-score method simplifies to the score method when the model is saturated and otherwise it is asymptotically equivalent to score and likelihood ratio test-based inferences. For cases in which ordinary score methods are impractical, such as when the likelihood function is not an explicit function of model parameters, the pseudo-score method is feasible. We illustrate the method for four such examples. Generalizations of the method are also presented for future research, including inference for complex sampling designs using a quasilikelihood Pearson statistic that compares fitted values for two models relative to the variance of the observations under the simpler model.
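
The idea of inverting a Pearson chi-squared test can be seen in the simplest saturated case, a single binomial proportion. The sketch below compares the saturated fitted counts with the fitted counts under a fixed value p0 and finds, by bisection, the two values of p0 at which the statistic equals the 0.95 chi-squared quantile with one degree of freedom. In this special case the result coincides with the familiar score (Wilson) interval. The function names are hypothetical, the code assumes 0 < successes < n, and it does not cover the non-saturated models that motivate the article.

```python
CHI2_95_1DF = 3.841458820694124  # 0.95 quantile of chi-squared with 1 df

def pearson_stat(successes, n, p0):
    """Pearson chi-squared comparing saturated fitted counts with fitted counts under p = p0."""
    fitted_full = [successes, n - successes]  # the saturated model reproduces the data
    fitted_null = [n * p0, n * (1.0 - p0)]
    return sum((f - g) ** 2 / g for f, g in zip(fitted_full, fitted_null))

def invert_pearson_test(successes, n, crit=CHI2_95_1DF, tol=1e-12):
    """Confidence limits: the values of p0 at which the Pearson statistic equals crit."""
    p_hat = successes / n

    def solve(lo, hi, increasing):
        # Bisection on an interval where the statistic is monotone in p0.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            above = pearson_stat(successes, n, mid) > crit
            if above == increasing:
                hi = mid
            else:
                lo = mid
        return 0.5 * (lo + hi)

    lower = solve(1e-12, p_hat, increasing=False)       # statistic falls as p0 rises to p_hat
    upper = solve(p_hat, 1.0 - 1e-12, increasing=True)  # statistic rises as p0 moves past p_hat
    return lower, upper

# Hypothetical data: 7 successes in 20 trials.
interval = invert_pearson_test(7, 20)
```

The same inversion logic applies whenever the statistic can be evaluated at each fixed parameter value, which is what makes the approach usable when the likelihood is not an explicit function of the parameters.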

Global and local spectral-based tests for periodicities

February 16, 2010 - 10:49am

We investigate tests for periodicity based on a spectral analysis of a time series, differentiating between global and local spectral-based tests. Global tests use information across the entire frequency band, whereas local tests are based on a window around the test frequency. We show that many spectral-based tests can be expressed in terms of a regression-based F test, which allows for approximate size and power calculations. Since global tests are usually derived assuming white noise errors, we extend to the correlated noise case. We demonstrate via a Monte Carlo study that although the global test may have better size and power, local tests are easier to use, and are comparable or better in terms of the power to detect periodicities, especially for spectra with a large dynamic range. We apply this methodology to a nonbehavioural test of hearing.

Weighted least squares approximate restricted likelihood estimation for vector autoregressive processes

February 16, 2010 - 10:49am

We derive a weighted least squares approximate restricted likelihood estimator for a k-dimensional pth-order autoregressive model with intercept. Exact likelihood optimization of this model is generally infeasible due to the parameter space, which is complicated and high-dimensional, involving pk^2 parameters. The weighted least squares estimator has significantly lower bias and mean squared error than the ordinary least squares estimator for both stationary and nonstationary processes. Furthermore, at the unit root, the limiting distribution of the weighted least squares approximate restricted likelihood estimator is shown to be the zero-intercept Dickey–Fuller distribution, unlike the ordinary least squares with intercept estimator that has a different distribution with significantly higher bias.

Nonparametric Bayesian inference for the spectral density function of a random field

February 16, 2010 - 10:49am

A powerful technique for inference concerning spatial dependence in a random field is to use spectral methods based on frequency domain analysis. Here we develop a nonparametric Bayesian approach to statistical inference for the spectral density of a random field. We construct a multi-dimensional Bernstein polynomial prior for the spectral density and devise a Markov chain Monte Carlo algorithm to simulate from the posterior of the spectral density. The posterior sampling enables us to obtain a smoothed estimate of the spectral density as well as credible bands at desired levels. Simulation shows that our proposed method is more robust than a parametric approach. For illustration, we analyse a soil data example.

The distribution-based p-value for the outlier sum in differential gene expression analysis

February 16, 2010 - 10:49am

Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes.

The maximal data piling direction for discrimination

February 16, 2010 - 10:49am

We study a discriminant direction vector that generally exists only in high-dimension, low sample size settings. Projections of data onto this direction vector take on only two distinct values, one for each class. There exist infinitely many such directions in the subspace generated by the data; but the maximal data piling vector has the longest distance between the projections. This paper investigates mathematical properties and classification performance of this discrimination method.

Systematic sampling with errors in sample locations

February 16, 2010 - 10:49am

Systematic sampling of points in continuous space is widely used in microscopy and spatial surveys. Classical theory provides asymptotic expressions for the variance of estimators based on systematic sampling as the grid spacing decreases. However, the classical theory assumes that the sample grid is exactly periodic; real physical sampling procedures may introduce errors in the placement of the sample points. This paper studies the effect of errors in sample positioning on the variance of estimators in the case of one-dimensional systematic sampling. First we sketch a general approach to variance analysis using point process methods. We then analyze three different models for the error process, calculate exact expressions for the variances, and derive asymptotic variances. Errors in the placement of sample points can lead to substantial inflation of the variance, dampening of zitterbewegung, that is, fluctuation effects, and a slower order of convergence. This suggests that the current practice in some areas of microscopy may be based on over-optimistic predictions of estimator accuracy.

Cross-covariance functions for multivariate random fields based on latent dimensions

February 16, 2010 - 10:49am

The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance performs better than other competing models.

Incorporating prior probabilities into high-dimensional classifiers

February 16, 2010 - 10:49am

In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman’s bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters.
