Wiley Online Library : Biometrical Journal
June 1, 2010
The objective of this study was to develop methods to estimate the optimal threshold of a longitudinal biomarker and its credible interval when the diagnostic test is based on a criterion that reflects a dynamic progression of that biomarker. Two methods are proposed: one parametric and one non-parametric. In both cases, Bayesian inference was used to derive the posterior distribution of the optimal threshold, from which an estimate and a credible interval could be obtained. A numerical study shows that the bias of the parametric method is low and that the coverage probability of the credible interval is close to the nominal value, with a small coverage asymmetry in some cases. This is also true for the non-parametric method when sample sizes are large. Both methods were applied to estimate the optimal prostate-specific antigen nadir value for diagnosing prostate cancer recurrence after high-intensity focused ultrasound treatment. The parametric method can also be applied to non-longitudinal biomarkers.
June 1, 2010
In the development of structural equation models (SEMs), observed variables are usually assumed to be normally distributed. However, this assumption is likely to be violated in much practical research. As the non-normality of observed variables in an SEM can arise from non-normal latent variables, non-normal residuals, or both, semiparametric modeling with an unknown distribution of latent variables or an unknown distribution of residuals is needed. In this article, we find that an SEM becomes nonidentifiable when both the latent variable distribution and the residual distribution are unknown. Hence, it is impossible to reliably estimate both the latent variable distribution and the residual distribution without parametric assumptions on one or the other. We also find that the residuals in the measurement equation are more sensitive to the normality assumption than the latent variables, and that the negative impact of non-normal residuals on the estimation of parameters and distributions is more serious. Therefore, when there is no prior knowledge about parametric distributions for either the latent variables or the residuals, we recommend making a parametric assumption on the latent variables and modeling the residuals nonparametrically. We propose a semiparametric Bayesian approach using the truncated Dirichlet process with a stick-breaking prior to tackle the non-normality of residuals in the measurement equation. Simulation studies and a real data analysis demonstrate our findings and reveal the empirical performance of the proposed methodology. Free WinBUGS code to perform the analysis is available in the Supporting Information.
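The stick-breaking construction behind a truncated Dirichlet process can be sketched in a few lines. This is only a minimal illustration of the standard construction, not the authors' WinBUGS code: the function name `truncated_stick_breaking`, the concentration parameter `alpha`, and the truncation level are all assumptions introduced here.

```python
import random


def truncated_stick_breaking(alpha, truncation, rng):
    """Draw mixture weights from a stick-breaking prior truncated at `truncation` atoms.

    Each break fraction v_k is Beta(1, alpha); the last fraction is set to 1
    so that the truncated weights sum to one.
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("alpha must be positive and truncation at least 1")
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(truncation):
        if k == truncation - 1:
            v = 1.0  # truncation: the final atom takes whatever is left
        else:
            v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


# Hypothetical settings, chosen only for illustration.
example_weights = truncated_stick_breaking(alpha=2.0, truncation=20, rng=random.Random(0))
```

In a residual model of this kind, each weight would be paired with an atom (for example a mean and variance drawn from a base distribution), giving a finite mixture that approximates an unknown residual distribution.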
June 1, 2010
Important aspects of population evolution have been investigated using nucleotide sequences. Under the neutral Wright–Fisher model, the scaled mutation rate represents twice the average number of new mutations per generation and is one of the key parameters in population genetics. In this study, we present various methods for estimating this parameter, analytical studies of their asymptotic behavior, and comparisons of the distributional behavior of these estimators through simulations. Because computing the maximum likelihood estimator (MLE) requires knowledge of the genealogy, an application to real data is also presented, using the jackknife to correct the bias of the MLE that can arise from estimating the tree. We proved analytically that Watterson's estimator and the MLE are asymptotically equivalent, with the same rate of convergence to normality. Furthermore, we showed that the MLE has a better rate of convergence than Watterson's estimator for values of the parameter greater than one, and that this relationship is reversed when the parameter is less than one.
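For readers unfamiliar with Watterson's estimator, the sketch below computes it in its usual textbook form: the number of segregating sites S divided by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sequences. The function names and the toy alignment are assumptions made for illustration and do not come from the article.

```python
def count_segregating_sites(sequences):
    """Count alignment columns in which at least two sequences differ."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences):
    """Watterson's estimate of the scaled mutation rate: S / a_n."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic sum up to n - 1
    return count_segregating_sites(sequences) / a_n


# Toy alignment (hypothetical data): two of the four columns are segregating.
toy_alignment = ["AAAA", "AATA", "CATA", "CATA"]
theta_hat = watterson_theta(toy_alignment)
```

With four sequences, a_n = 1 + 1/2 + 1/3 = 11/6, so the toy alignment gives an estimate of 2 / (11/6) = 12/11.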
June 1, 2010
The clinical pulmonary infection score (CPIS) and bronchoalveolar lavage (BAL) are important diagnostic variables of pneumonia for forcefully ventilated patients who are susceptible to nosocomial infection. Because of its invasive nature, BAL is performed only for patients whose CPIS is greater than a certain threshold value. Thus, CPIS and BAL are closely related, yet a substantial proportion of BAL values are missing. In a randomized clinical trial, the control and oral treatment groups were compared based on the outcomes from these procedures. Because both outcomes are relevant to evaluating the efficacy of treatments, we propose and examine a nonparametric test based on these outcomes, which employs the empirical likelihood methodology. While efficient parametric methods are available when data are incompletely observed, performing appropriate goodness-of-fit tests to justify the parametric assumptions is difficult. Our motivation is to provide an approach that relies on no particular distributional assumption and that enables us to use all observed bivariate data, whether complete or not, in an approximate likelihood manner. A broad Monte Carlo study evaluates the asymptotic properties and efficiency of the proposed method for various sample sizes and underlying distributions. The proposed technique is applied to a data set from a pneumonia study, demonstrating its practical worth.
June 1, 2010
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is assessed on only a subset of the study subjects. In this study, we investigate a semiparametric estimated-likelihood approach for generalized linear mixed models (GLMMs) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation results show that the proposed method performs better than a method that ignores the random effects in the GLMM and a method that uses only the data in the validation data set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study on the relationship between the maternal serum 1,1-dichloro-2,2-bis(p-chlorophenyl) ethylene level and preterm births.
June 1, 2010
Multiple diagnostic tests and risk factors are commonly available for many diseases. This information can be either redundant or complementary. Combining them may improve the diagnostic/predictive accuracy, but may also unnecessarily increase complexity, risks, and/or costs. The improved accuracy gained by including additional variables can be evaluated by the increment in the area under the receiver-operating characteristic curve (AUC) with and without the new variable(s). In this study, we derive a new test statistic to accurately and efficiently determine the statistical significance of this incremental AUC under a multivariate normality assumption. Our test links the AUC difference to a quadratic form of a standardized mean shift in units of the inverse covariance matrix through a proper linear transformation of all diagnostic variables. The distribution of the quadratic estimator is related to the multivariate Behrens–Fisher problem. We provide explicit mathematical solutions for the estimator and its approximate non-central F-distribution, type I error rate, and sample size formula. We use simulation studies to show that our new test maintains prespecified type I error rates as well as reasonable statistical power under practical sample sizes. We use data from the Study of Osteoporotic Fractures as an application example to illustrate our method.
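To make the link between AUC and a quadratic form concrete, the sketch below uses a standard result for multivariate normal markers: the best linear combination has AUC equal to the normal CDF evaluated at the square root of delta' (S1 + S2)^-1 delta, where delta is the mean difference between cases and controls and S1, S2 are the two covariance matrices. This illustrates the population quantity only, not the authors' test statistic or its F approximation; the two-marker restriction, the function names, and the numbers are assumptions made here.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_linear_auc(delta, cov_cases, cov_controls):
    """AUC of the optimal linear combination of two normal markers."""
    # Sum the two 2x2 covariance matrices.
    a = cov_cases[0][0] + cov_controls[0][0]
    b = cov_cases[0][1] + cov_controls[0][1]
    c = cov_cases[1][0] + cov_controls[1][0]
    d = cov_cases[1][1] + cov_controls[1][1]
    det = a * d - b * c
    if det <= 0:
        raise ValueError("summed covariance matrix must be positive definite")
    # Quadratic form delta' (S1 + S2)^-1 delta via the explicit 2x2 inverse.
    q = (d * delta[0] ** 2 - (b + c) * delta[0] * delta[1] + a * delta[1] ** 2) / det
    return normal_cdf(math.sqrt(q))


def single_marker_auc(mean_shift, var_cases, var_controls):
    """AUC of one normal marker used on its own."""
    return normal_cdf(abs(mean_shift) / math.sqrt(var_cases + var_controls))


# Hypothetical example: two uncorrelated unit-variance markers, each shifted by 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
auc_both = best_linear_auc([1.0, 1.0], identity, identity)
auc_first = single_marker_auc(1.0, 1.0, 1.0)
incremental_auc = auc_both - auc_first
```

In this example the quadratic form equals 1, so the combined AUC is the normal CDF at 1 (about 0.841), compared with the normal CDF at 1/sqrt(2) (about 0.760) for the first marker alone.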