J Royal Stat Soc, Ser C (JRSS,C)

Wiley Online Library: Journal of the Royal Statistical Society: Series C (Applied Statistics)

Mixed effect modelling of proteomic mass spectrometry data by using Gaussian mixtures

August 1, 2010
Summary.  Statistical methodology for the analysis of proteomic mass spectrometry data is proposed using mixed effects models. Each high dimensional spectrum is represented by using a near orthogonal low dimensional representation with a basis of Gaussian mixture functions. Linear mixed effect models are proposed in the lower dimensional space. In particular, differences between groups are investigated by using fixed effect parameters, and individual variability of spectra is modelled by using random effects. A deterministic peak fitting algorithm provides estimates of the near orthogonal Gaussian basis. The mixed effects model is fitted by using restricted maximum likelihood, and a parallel fitting procedure is used for computational convenience. The methodology is applied to proteomic mass spectrometry data from serum samples from melanoma patients who were categorized as stage I or stage IV, and significant locations of peaks are identified.
Categories: Statistical Journals

Interval-censored data with repeated measurements and a cured subgroup

August 1, 2010
Summary.  The hypobaric decompression sickness data study was conducted by the National Aeronautics and Space Administration to investigate the risk of decompression sickness in hypobaric environments. The quantity of interest is the time to onset of grade IV venous gas emboli, which was mixed case interval censored because of measurement limitations. In the study, some subjects participated in multiple experiments, leading to repeated and correlated measurements on those subjects. In addition, it has been suggested that some subjects had a much lower risk of developing grade IV venous gas emboli than others, i.e. those subjects were immune from the event of interest (or ‘cured’). We propose to use two-part models, where the first part describes the probability of cure and the second part describes the survival for susceptible subjects. We use two random effects to account for the correlated nature of measurements. A leverage bootstrap approach is proposed for model diagnosis. A simulation study shows satisfactory performance of the estimation and diagnosis approaches proposed. Model estimation and evaluation of the hypobaric decompression sickness data are carefully investigated.
Categories: Statistical Journals

Locally stationary wavelet fields with application to the modelling and analysis of image texture

August 1, 2010
Summary.  The paper proposes the modelling and analysis of image texture by using an extension of a locally stationary wavelet process model into two dimensions for lattice processes. Such a model permits construction of estimates of a spatially localized spectrum and localized autocovariance which can be used to characterize texture in a multiscale and spatially adaptive way. We provide the necessary theoretical support to show that our two-dimensional extension is properly defined and has the proper statistical convergence properties. Our use of a statistical model permits us to identify, and correct for, a bias in established texture measures based on non-decimated wavelet techniques. The method proposed performs nearly as well as optimal Fourier techniques on stationary textures and outperforms them in non-stationary situations. We illustrate our techniques by using pilled fabric data from a fabric care experiment and simulated tile data.
Categories: Statistical Journals

Estimating the prevalence of sensitive behaviour and cheating with a dual design for direct questioning and randomized response

August 1, 2010
Summary.  Randomized response is a misclassification design to estimate the prevalence of sensitive behaviour. Respondents who do not follow the instructions of the design are considered to be cheating. A mixture model is proposed to estimate the prevalence of sensitive behaviour and cheating in the case of a dual sampling scheme with direct questioning and randomized response. The mixing weight is the probability of cheating, where cheating is modelled separately for direct questioning and randomized response. For Bayesian inference, Markov chain Monte Carlo sampling is applied to sample parameter values from the posterior. The model makes it possible to analyse dual sample scheme data in a unified way and to assess cheating for direct questions as well as for randomized response questions. The research is illustrated with randomized response data concerning violations of regulations for social benefit.
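As a rough illustration of why randomized response is a misclassification design, the sketch below inverts the known misclassification probabilities to recover a prevalence estimate from the observed proportion of "yes" answers. It deliberately ignores cheating and the Bayesian mixture model that the summary describes; the function name, its parameters and the forced-response design in the usage line are assumptions made for illustration, not details from the paper.

```python
def rr_prevalence(yes_rate, p_yes_if_sensitive, p_yes_if_not):
    """Moment estimate of the prevalence pi under a misclassification design.

    Assumes every respondent follows the instructions (no cheating), so that
        yes_rate = pi * p_yes_if_sensitive + (1 - pi) * p_yes_if_not.
    """
    denom = p_yes_if_sensitive - p_yes_if_not
    if denom == 0:
        raise ValueError("the design carries no information about prevalence")
    pi_hat = (yes_rate - p_yes_if_not) / denom
    # Truncate to [0, 1]; the raw moment estimate can fall outside this range.
    return min(1.0, max(0.0, pi_hat))


# Hypothetical forced-response design: with probability 1/6 the respondent is
# forced to say "yes", with probability 1/6 forced to say "no", and otherwise
# answers truthfully.
estimate = rr_prevalence(0.30, p_yes_if_sensitive=5 / 6, p_yes_if_not=1 / 6)
```

Direct questioning corresponds to the special case `p_yes_if_sensitive=1` and `p_yes_if_not=0`, in which the estimate is simply the observed "yes" rate.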
Categories: Statistical Journals

Mathematical models for coinfection by two sexually transmitted agents: the human immunodeficiency virus and herpes simplex virus type 2 case

August 1, 2010
Summary.  To study the interactions between two sexually transmitted diseases without remission of the infections, we propose to use Markovian models. One model allows the estimation of the per-partnership female-to-male transmission probabilities for each infection, and the other the per-sex-act transmission probabilities. These models take into account the essential factors for the propagation of both infections, including the variability according to age of the rates of prevalence in the population of female partners for the male individuals constituting our sample. We estimate transmission probabilities and relative risks (for circumcision, usage of condoms and the effect of one infection on the infectivity of the other) by using the maximum likelihood method. Bootstrap procedures are used to provide confidence intervals for the parameters. We illustrate the new procedures with the study of the interactions between herpes simplex virus type 2 and human immunodeficiency virus by using data from the male circumcision trial that was conducted in Orange Farm (South Africa). The study shows that the probability that a susceptible male individual acquires one of the viruses is significantly higher when he is already infected with the other. Using the Akaike information criterion, we show that the per-partnership model fits the data better than the per-sex-act model.
Categories: Statistical Journals

Weighted area under the receiver operating characteristic curve and its application to gene selection

August 1, 2010
Summary.  The partial area under the receiver operating characteristic curve (PAUC) has been proposed for gene selection by Pepe and co-workers and thereafter applied in real data analysis. It was noticed from empirical studies that this measure has several key weaknesses, such as an inability to reflect non-uniform weighting of different decision thresholds, resulting in large numbers of ties. We propose the weighted area under the receiver operating characteristic curve (WAUC) to address the problems that are associated with PAUC. Our proposed measure enjoys a greater flexibility to describe the discrimination accuracy of genes. Non-parametric and parametric estimation methods are introduced, including PAUC as a special case, along with theoretical properties of the estimators. We also provide a simple variance formula, yielding a novel variance estimator for non-parametric estimation of PAUC, which has proven challenging in previous work. The methods proposed permit sensitivity analyses, whereby the effect of differing weight functions on gene rankings may be assessed and results may be synthesized across weights. Simulations and reanalysis of a well-known microarray data set illustrate the practical utility of WAUC.
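To make the relationship between AUC, PAUC and a weighted area concrete, here is a minimal non-parametric sketch. It assumes the weighted area is the integral of the empirical ROC curve against a weight on the false positive rate, supplied through its cumulative weight function; the paper's exact definition and estimators may differ. The names `weighted_auc` and `cum_weight` are hypothetical.

```python
def weighted_auc(cases, controls, cum_weight=lambda t: t):
    """Integrate the empirical ROC curve against a weight on the false positive rate.

    cum_weight(t) is the cumulative weight on [0, t]. The default
    cum_weight(t) = t gives the ordinary AUC, and
    cum_weight(t) = min(t, t0) gives the partial AUC up to t0.
    """
    m = len(controls)
    n = len(cases)
    # Thresholds from highest to lowest control score: the j-th threshold
    # covers false positive rates in ((j - 1) / m, j / m].
    ordered = sorted(controls, reverse=True)
    total = 0.0
    for j, threshold in enumerate(ordered, start=1):
        # True positive rate at this threshold (strict inequality).
        tpr = sum(1 for x in cases if x > threshold) / n
        total += tpr * (cum_weight(j / m) - cum_weight((j - 1) / m))
    return total


cases = [3, 4, 5]
controls = [1, 2, 3.5]
auc = weighted_auc(cases, controls)
pauc = weighted_auc(cases, controls, cum_weight=lambda t: min(t, 1 / 3))
```

With uniform weight the result coincides with the Mann-Whitney proportion of case-control pairs in which the case scores higher, provided there are no ties between cases and controls.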
Categories: Statistical Journals

Bayesian change-point analysis for atomic force microscopy and soft material indentation

August 1, 2010
Summary.  Material indentation studies, in which a probe is brought into controlled physical contact with an experimental sample, have long been a primary means by which scientists characterize the mechanical properties of materials. More recently, the advent of atomic force microscopy, which operates on the same fundamental principle, has in turn revolutionized the nanoscale analysis of soft biomaterials such as cells and tissues. The paper addresses the inferential problems that are associated with material indentation and atomic force microscopy, through a framework for the change-point analysis of pre-contact and post-contact data that is applicable to experiments across a variety of physical scales. A hierarchical Bayesian model is proposed to account for experimentally observed change-point smoothness constraints and measurement error variability, with efficient Monte Carlo methods developed and employed to realize inference via posterior sampling for parameters such as Young's modulus, which is a key quantifier of material stiffness. These results are the first to provide the materials science community with rigorous inference procedures and quantification of uncertainty, via optimized and fully automated high throughput algorithms, implemented as the publicly available software package BayesCP. To demonstrate the consistent accuracy and wide applicability of this approach, results are shown for a variety of data sets from both macromaterials and micromaterials experiments—including silicone, neurons and red blood cells—conducted by the authors and others.
Categories: Statistical Journals

Validation of methods for identifying discontinuation of treatment from prescription data

August 1, 2010
Summary.  Prescription databases are increasingly used in epidemiological studies concerning the use and the effects of drugs. However, this source of data does not provide direct observations of the time of initiation and discontinuation of drug treatment, and these time points therefore need to be estimated. The paper investigates the validity of methods that are used in the literature to identify discontinuation of treatment from prescription data, and we consider the example of post-menopausal hormone therapy. Validation is investigated in terms of a simulation study based on a multistate model for the relationship between episodes of treatment with hormone therapy and occurrence of prescription refills. The multistate model that is introduced is estimated from joint observations of a prescription registry and a cross-sectional survey, involving techniques from the analysis of backward recurrence times. We demonstrate that estimated time points of discontinuation of treatment are highly uncertain, and this may influence studies concerning the immediate effect of discontinuation of treatment. Despite this limitation, we find that a valid assessment of current treatment status (never, current or previous drug use) can be obtained from prescription data.
Categories: Statistical Journals

A Bayesian model for biclustering with applications

August 1, 2010
Summary.  The paper proposes a Bayesian method for biclustering with applications to gene microarray studies, where we want to cluster genes and experimental conditions simultaneously. We begin by embedding bicluster analysis into the framework of a plaid model with random effects. The corresponding likelihood is then regularized by the hierarchical priors in each layer. The resulting posterior, which is asymptotically equivalent to a penalized likelihood, can attenuate the effect of high dimensionality on cluster predictions. We provide an empirical Bayes algorithm for sampling posteriors, in which we estimate the cluster memberships of all genes and samples by maximizing an explicit marginal posterior of these memberships. The new algorithm makes the estimation of the Bayesian plaid model computationally feasible and efficient. The performance of our procedure is evaluated on both simulated and real microarray gene expression data sets. The numerical results show that our proposal substantially outperforms the original plaid model in terms of misclassification rates across a range of scenarios. Applying our method to two yeast gene expression data sets, we identify several new biclusters which show the enrichment of known annotations of yeast genes.
Categories: Statistical Journals

Relative risk estimated from the ratio of two median unbiased estimates

August 1, 2010
Summary.  Clinical trials often include binary end points. In some cases, no successes are observed and the usual large sample estimates of relative risk are undefined. The paper proposes an estimator for relative risk based on the median unbiased estimator. The relative risk estimator proposed is well defined and performs satisfactorily for a wide range of data configurations. To facilitate the use of the estimator, a deterministic bootstrap confidence interval is also proposed, and an SAS macro is available to perform the necessary calculations. An on-going randomized clinical trial motivated the development of the estimator and is used to illustrate the approach.
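The idea of a ratio of median unbiased estimates can be sketched as follows. The code assumes one common definition of the median unbiased estimate of a binomial proportion: the midpoint of the two values of p at which the lower and upper binomial tail probabilities equal 0.5. The paper's exact definition and its bootstrap interval are not reproduced here, and all function names are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def solve_decreasing(f, target, tol=1e-12):
    """Bisection on [0, 1] for f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def median_unbiased(x, n):
    """Midpoint of the two p values at which the binomial tails equal 0.5."""
    # Lower end: P(X >= x | p) = 0.5, i.e. P(X <= x - 1 | p) = 0.5; 0 if x == 0.
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 0.5)
    # Upper end: P(X <= x | p) = 0.5; 1 if x == n.
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), 0.5)
    return (lower + upper) / 2


def relative_risk(x1, n1, x0, n0):
    """Ratio of median unbiased estimates; defined even when x1 or x0 is 0."""
    return median_unbiased(x1, n1) / median_unbiased(x0, n0)
```

Under this definition the estimate for zero successes in n trials is (1 - 0.5^(1/n)) / 2, which is strictly positive, so the ratio remains finite when no successes are observed in the comparison group.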
Categories: Statistical Journals