Wiley Online Library: Journal of the Royal Statistical Society: Series B (Statistical Methodology)
September 1, 2010
Summary. I describe the background for the paper ‘Controlling the false discovery rate: a new and powerful approach to multiple comparisons’ by Benjamini and Hochberg that was published in the Journal of the Royal Statistical Society, Series B, in 1995. I review the progress since made on the false discovery rate, as well as the major conceptual developments that followed.
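For readers unfamiliar with the procedure that the 1995 paper introduced, the step-up rule is usually stated as follows: sort the m p-values, find the largest rank k with p_(k) <= k*q/m, and reject the hypotheses with the k smallest p-values. The sketch below is a minimal illustration of that standard rule, not code from the paper under review; the function name `benjamini_hochberg` and the level `q` are names chosen here for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up rule: return a list of booleans, True where a hypothesis is rejected.

    Hypothetical helper for illustration; q is the target false discovery rate.
    """
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if an
    # intermediate p-value missed its own threshold (the "step-up" part).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))
```

Because the rule looks for the largest qualifying rank, a small p-value that misses its own threshold can still be rejected when a later rank qualifies.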
September 1, 2010
Summary. Bayesian methods for inference on finite population means and other parameters by using sample survey data face hurdles in all three phases of the inferential procedure: the formulation of a likelihood function, the choice of a prior distribution and the validity of posterior inferences under the design-based frequentist framework. In the case of independent and identically distributed observations, the profile empirical likelihood function of the mean and a non-informative prior on the mean can be used as the basis for inference on the mean and the resulting Bayesian empirical likelihood intervals are also asymptotically valid under the frequentist set-up. For complex survey data, we show that a pseudo-empirical-likelihood approach can be used to construct Bayesian pseudo-empirical-likelihood intervals that are asymptotically valid under the design-based set-up. The approach proposed compares favourably with a full Bayesian analysis under simple random sampling without replacement. It is also valid under general single-stage unequal probability sampling designs, unlike a full Bayesian analysis. Moreover, the approach is very flexible in using auxiliary population information and can accommodate two scenarios which are practically important: incorporation of known auxiliary population information for the construction of intervals by using the basic design weights; calculation of intervals by using calibration weights based on known auxiliary population means or totals.
September 1, 2010
Summary. Estimation of structure, such as in variable selection, graphical modelling or cluster analysis, is notoriously difficult, especially for high dimensional data. We introduce stability selection. It is based on subsampling in combination with (high dimensional) selection algorithms. As such, the method is extremely general and has a very wide range of applicability. Stability selection provides finite sample control for some error rates of false discoveries and hence a transparent principle to choose a proper amount of regularization for structure estimation. Variable selection and structure estimation improve markedly for a range of selection methods if stability selection is applied. We prove for the randomized lasso that stability selection will be variable selection consistent even if the necessary conditions for consistency of the original lasso method are violated. We demonstrate stability selection for variable selection and Gaussian graphical modelling, using real and simulated data.
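The subsampling idea in this summary can be sketched roughly as follows: run a selection algorithm on many random half-samples and keep the variables that are selected in a large fraction of them. This is only a minimal sketch under stated assumptions. The base selector `top_k_by_correlation` is a hypothetical stand-in for the lasso-type selectors discussed in the paper, and the threshold of 0.6, the half-sample size and the number of subsamples are illustrative choices, not values taken from the paper.

```python
import random


def top_k_by_correlation(X, y, k):
    """Hypothetical base selector: the k columns most correlated with y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y))
        var = sum((c - c_mean) ** 2 for c in col)
        scores.append(abs(cov) / var ** 0.5 if var > 0 else 0.0)
    return set(sorted(range(p), key=lambda j: -scores[j])[:k])


def stability_selection(X, y, selector, n_subsamples=100, threshold=0.6, seed=0):
    """Return (stable variables, selection frequencies) over random half-samples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        # Draw a subsample of size n/2 without replacement.
        idx = rng.sample(range(n), n // 2)
        selected = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in selected:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j in range(p) if freqs[j] >= threshold]
    return stable, freqs


# Toy data: only columns 0 and 1 influence y; the other eight are noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
y = [3 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]
stable, freqs = stability_selection(
    X, y, lambda A, b: top_k_by_correlation(A, b, 3)
)
print(stable, freqs)
```

The selector is passed in as a function, so any selection algorithm that returns a set of column indices could be substituted for the correlation screen used here.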
September 1, 2010
Summary. The paper considers construction of simultaneous confidence tubes for time varying regression coefficients in functional linear models. Using a Gaussian approximation result for non-stationary multiple time series, we show that the constructed simultaneous confidence tubes have asymptotically correct nominal coverage probabilities. Our results are applied to the problem of testing whether the regression coefficients are of certain parametric forms, which is a fundamental problem in the inference of functional linear models. As an application, we analyse an environmental data set and study the association between levels of pollutants and hospital admissions.
September 1, 2010
Summary. It is possible to implement importance sampling, and particle filter algorithms, where the importance sampling weight is random. Such random-weight algorithms have been shown to be efficient for inference for a class of diffusion models, as they enable inference without any (time discretization) approximation of the underlying diffusion model. One difficulty of implementing such random-weight algorithms is the requirement to have weights that are positive with probability 1. We show how Wald's identity for martingales can be used to ensure positive weights. We apply this idea to analysis of diffusion models from high frequency data. For a class of diffusion models we show how to implement a particle filter, which uses all the information in the data, but whose computational cost is independent of the frequency of the data. We use the Wald identity to implement a random-weight particle filter for these models which avoids time discretization error.
September 1, 2010
Summary. We study generalized linear latent variable models without requiring a distributional assumption of the latent variables. Using a geometric approach, we derive consistent semiparametric estimators. We demonstrate that these models have a property which is similar to that of a sufficient complete statistic, which enables us to simplify the estimating procedure and explicitly to formulate the semiparametric estimating equations. We further show that the explicit estimators have the usual root n consistency and asymptotic normality. We explain the computational implementation of our method and illustrate the numerical performance of the estimators in finite sample situations via extensive simulation studies. The advantage of our estimators over the existing likelihood approach is also shown via numerical comparison. We employ the method to analyse a real data example from economics.
June 1, 2010
Summary. Markov chain Monte Carlo and sequential Monte Carlo methods have emerged as the two main tools to sample from high dimensional probability distributions. Although asymptotic convergence of Markov chain Monte Carlo algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions that are used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high dimensional proposal distributions by using sequential Monte Carlo methods. This allows us not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously so. We demonstrate these algorithms on a non-linear state space model and a Lévy-driven stochastic volatility model.
June 1, 2010
Summary. We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed by using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte Carlo simulations are conducted to compare the finite sample performance of the new method with that delivered by the normal approximation and the block bootstrap approach.
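As a rough illustration of the self-normalization idea for the simplest case, a scalar mean, the sketch below normalizes the squared centred sample mean by a quantity built from recursive partial sums, so no long-run variance or bandwidth needs to be estimated. It is an assumption-laden sketch rather than the paper's general construction: the function names are hypothetical, and the critical value is approximated here by simulating the statistic on independent standard normal data instead of being taken from published tables.

```python
import random


def self_normalizer(x):
    """W_n = n^-2 * sum_t (S_t - t * mean)^2, using recursive partial sums S_t."""
    n = len(x)
    mean = sum(x) / n
    partial, total = 0.0, 0.0
    for t, value in enumerate(x, start=1):
        partial += value
        total += (partial - t * mean) ** 2
    return total / n ** 2


def simulate_critical_value(level=0.95, n=200, reps=2000, seed=0):
    """Monte Carlo quantile of n * mean^2 / W_n under i.i.d. N(0, 1) data."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(z) / n
        stats.append(n * mean ** 2 / self_normalizer(z))
    stats.sort()
    return stats[min(reps - 1, round(level * reps))]


def self_normalized_interval(x, critical_value):
    """Interval {mu : n * (mean - mu)^2 / W_n <= critical_value}."""
    n = len(x)
    mean = sum(x) / n
    half_width = (critical_value * self_normalizer(x) / n) ** 0.5
    return mean - half_width, mean + half_width


# Toy example: an AR(1) series with coefficient 0.5 and mean zero.
rng = random.Random(42)
series, prev = [], 0.0
for _ in range(500):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    series.append(prev)

crit = simulate_critical_value()
lower, upper = self_normalized_interval(series, crit)
print(crit, lower, upper)
```

The simulated critical value depends on the seed and the number of replications, so it should be read as an approximation to the quantile of the limiting distribution.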
June 1, 2010
Summary. We develop a sufficient dimension reduction paradigm for inhomogeneous spatial point processes driven by Gaussian random fields. Specifically, we introduce the notion of the kth-order central intensity subspace. We show that a central subspace can be defined as the combination of all central intensity subspaces. For many commonly used spatial point process models, we find that the central subspace is equivalent to the first-order central intensity subspace. To estimate the latter, we propose a flexible framework under which most existing benchmark inverse regression methods can be extended to the spatial point process setting. We develop novel graphical and formal testing methods to determine the structural dimension of the central subspace. These methods are extremely versatile in that they do not require any specific model assumption on the correlation structures of the covariates and the spatial point process. To illustrate the practical use of the methods proposed, we apply them to both simulated data and two real examples.
June 1, 2010
Summary. Spatial linear models are popular for the analysis of data on a spatial lattice, but statistical techniques for selection of covariates and a neighbourhood structure are limited. Here we develop new methodology for simultaneous model selection and parameter estimation via penalized maximum likelihood under a spatial adaptive lasso. A computationally efficient algorithm is devised for obtaining approximate penalized maximum likelihood estimates. Asymptotic properties of penalized maximum likelihood estimates and their approximations are established. A simulation study shows that the method proposed has sound finite sample properties and, for illustration, we analyse an ecological data set in western Canada.