Inter-rater reliability is usually assessed by means of the intraclass correlation coefficient. Using two-way analysis of variance to model raters and subjects as random effects, we derive group sequential testing procedures for the design and analysis of reliability studies in which multiple raters evaluate multiple subjects. Compared with the conventional fixed sample procedures, the group sequential test has a smaller average sample number. The performance of the proposed technique is examined using simulation studies, and critical values are tabulated for a range of two-stage design parameters. The methods are exemplified using data from the Physician Reliability Study for diagnosis of endometriosis.
Interim analysis; inter-rater reliability; intraclass correlation coefficient; measurement errors; sample size and power; two-way ANOVA
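As a point of reference for the abstract above, the following is a minimal sketch of how a single-rating intraclass correlation coefficient can be computed from the mean squares of a two-way random-effects ANOVA. It illustrates the fixed-sample estimate only, not the group sequential procedure the authors derive. The function name `icc_two_way_random` and the ratings matrix are hypothetical, and the formula is the commonly used single-rating form for random subjects and random raters, which is assumed here rather than taken from the paper.

```python
import numpy as np

def icc_two_way_random(ratings):
    """Single-rating ICC under a two-way random-effects model.

    ratings: array of shape (n_subjects, k_raters), one rating per cell.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    rater_means = x.mean(axis=0)

    # Mean squares for subjects, raters, and residual error.
    ms_subj = k * np.sum((subj_means - grand) ** 2) / (n - 1)
    ms_rater = n * np.sum((rater_means - grand) ** 2) / (k - 1)
    resid = x - subj_means[:, None] - rater_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_rater - ms_err) / n
    )

# Illustrative ratings (6 subjects, 4 raters); not data from the study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_two_way_random(example), 3))
```

Large systematic differences between raters pull this coefficient down even when raters rank subjects similarly, which is why the rater mean square appears in the denominator.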
Women with a history of gestational diabetes mellitus (GDM) are at substantially increased risk of type 2 diabetes mellitus (T2DM). The identification of important modifiable factors could help prevent T2DM in this high-risk population.
To examine the role of physical activity and television watching and other sedentary behaviors, and changes in these behaviors in the progression from GDM to T2DM.
DESIGN, SETTING, AND PARTICIPANTS
Prospective cohort study of 4554 women from the Nurses’ Health Study II who had a history of GDM, as part of the ongoing Diabetes & Women’s Health Study. These women were followed up from 1991 to 2007.
Physical activity and television watching and other sedentary behaviors were assessed in 1991, 1997, 2001, and 2005.
MAIN OUTCOMES AND MEASURE
Incident T2DM identified through self-report and confirmed by supplemental questionnaires.
We documented 635 incident T2DM cases during 59287 person-years of follow-up. Each 5–metabolic equivalent hours per week (MET-h/wk) increment of total physical activity, which is equivalent to 100 minutes per week of moderate-intensity physical activity, was related to a 9% lower risk of T2DM (adjusted relative risk [RR], 0.91; 95% CI, 0.88–0.94); this inverse association remained significant after additional adjustment for body mass index (BMI). Moreover, an increase in physical activity was associated with a lower risk of developing T2DM. Compared with women who maintained their total physical activity levels, women who increased their total physical activity levels by 7.5 MET-h/wk or more (equivalent to 150 minutes per week of moderate-intensity physical activity) had a 47% lower risk of T2DM (RR, 0.53; 95% CI, 0.38–0.75); the association remained significant after additional adjustment for BMI. The multivariable adjusted RRs (95% CIs) for T2DM associated with television watching of 0 to 5, 6 to 10, 11 to 20, and 20 or more hours per week were 1 (reference), 1.28 (1.04–1.59), 1.41 (1.11–1.79), and 1.77 (1.28–2.45), respectively (P value for trend <.001); additional adjustment for BMI attenuated the association.
CONCLUSIONS AND RELEVANCE
Increasing physical activity may lower the risk of progression from GDM to T2DM. These findings suggest a hopeful message to women with a history of GDM: although they are at exceptionally high risk for T2DM, promoting an active lifestyle may lower the risk.
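The unit conversions quoted in the results above can be checked with a few lines of arithmetic. This is only a sketch: the value of 3 METs for moderate-intensity activity is an assumption implied by the stated equivalence (5 MET-h/wk and 100 minutes per week), and the function names are hypothetical. The crude incidence rate is computed directly from the reported case count and person-years and is not adjusted for anything.

```python
# Assumption: moderate-intensity activity corresponds to 3 METs, which is
# what the equivalence "5 MET-h/wk = 100 min/wk" in the abstract implies.
MODERATE_METS = 3.0

def met_hours_to_minutes(met_hours_per_week, mets=MODERATE_METS):
    """Minutes per week of activity at `mets` intensity giving the MET-h total."""
    return met_hours_per_week / mets * 60.0

def crude_incidence_per_1000(cases, person_years):
    """Crude incidence rate per 1000 person-years."""
    return cases / person_years * 1000.0

print(met_hours_to_minutes(5.0))    # increment used for the 9% lower risk
print(met_hours_to_minutes(7.5))    # threshold used for the 47% lower risk
print(round(crude_incidence_per_1000(635, 59287), 2))
```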
Multiple diagnostic tests or biomarkers can be combined to improve diagnostic accuracy. The problem of finding the optimal linear combinations of biomarkers to maximise the area under the receiver operating characteristic curve has been extensively addressed in the literature. The purpose of this article is threefold: (1) to provide an extensive review of the existing methods for biomarker combination; (2) to propose a new combination method, namely, the nonparametric stepwise approach; (3) to use the leave-one-pair-out cross-validation method, instead of the re-substitution method, which is overoptimistic and hence might lead to wrong conclusions, to empirically evaluate and compare the performance of different linear combination methods in yielding the largest area under the receiver operating characteristic curve. A data set of Duchenne muscular dystrophy was analysed to illustrate the applications of the discussed combination methods.
Multiple biomarkers; receiver operating characteristic curve; area under the receiver operating characteristic curve; linear combination; diagnostic/prognostic accuracy
Rationale and Objectives
The estimation of the area under the receiver operating characteristic (ROC) curve (AUC) often relies on the assumption that the truly positive population tends to have higher marker results than the truly negative population. The authors propose a discriminatory measure to relax such an assumption and apply the measure to identify the appropriate set of markers for combination.
Materials and Methods
The proposed measure is based on the maximum of the AUC and 1-AUC. The existing methods are applied to estimate the measure. The subset of markers is selected using a combination method which maximizes a function of the proposed discriminatory score with the number of markers as a penalty in the function.
The properties of the estimators for the proposed measure were studied through large-scale simulation studies. The application was illustrated through a real example to identify the set of markers to combine.
Simulation results showed excellent small-sample performance of the estimators for the proposed measure. The application in the example yielded a reasonable subset of markers for combination.
Receiver operating characteristic (ROC); Area under the ROC curves (AUC); Discriminatory score; Box-Cox transformation
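The discriminatory measure described in the abstract above, the maximum of the AUC and 1-AUC, can be sketched as follows. The empirical (Mann-Whitney) AUC estimator is used here as an assumption; the abstract only says that existing methods are applied, and the penalized marker-selection step is not shown. Function names are hypothetical.

```python
import numpy as np

def empirical_auc(neg, pos):
    """Empirical AUC: P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def discriminatory_measure(neg, pos):
    """max(AUC, 1 - AUC): does not assume positives have higher values."""
    auc = empirical_auc(neg, pos)
    return max(auc, 1.0 - auc)

# A marker that is LOWER in the truly positive group still scores highly.
negatives = [4.0, 5.0, 6.0]
positives = [1.0, 2.0, 3.0]
print(empirical_auc(negatives, positives))
print(discriminatory_measure(negatives, positives))
```

Because the measure is symmetric in the direction of the marker, it always lies between 0.5 and 1.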
Clinical trials utilizing predictive biomarkers have become a research focus in personalized medicine. We investigate the effects of biomarker misclassification on the design and analysis of stratified biomarker clinical trials. For a variety of inference problems including marker-treatment interaction in particular, we show that marker misclassification may have profound adverse effects on the coverage of confidence intervals, power of the tests, and required sample sizes. For each inferential problem we propose methods to adjust for the classification errors.
Biomarkers; classification error; correction for error; personalized medicine; power and sample size; prevalence; randomized controlled clinical trials; sensitivity and specificity
Diagnostic trials often require the use of a homogeneity test among several markers. Such a test may be necessary to determine the power both during the design phase and in the initial analysis stage. However, no formal method is available for the power and sample size calculation when the number of markers is greater than two and marker measurements are clustered in subjects. This article presents two procedures for testing the accuracy among clustered diagnostic markers. The first procedure is a test of homogeneity among continuous markers based on a global null hypothesis of the same accuracy. The result under the alternative provides the explicit distribution for the power and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on weighted areas under the receiver operating characteristic curves. This test is particularly useful if a global difference among markers is found by the homogeneity test. We apply our procedures to the BioCycle Study designed to assess and compare the accuracy of hormone and oxidative stress markers in distinguishing women with ovulatory menstrual cycles from those without.
ROC curve; biomarker; homogeneity test; sample size
We propose efficient nonparametric statistics to compare medical imaging modalities in multi-reader multi-test data and to compare markers in longitudinal ROC data. The proposed methods are based on the weighted area under the ROC curve, which includes the area under the curve and the partial area under the curve as special cases. The methods maximize the local power for detecting the difference between imaging modalities. The asymptotic results of the proposed methods are developed under a complex correlation structure. Our simulation studies show that the proposed statistics yield substantially higher power than existing statistics. We applied the proposed statistics to an endometriosis diagnosis study.
ROC curve; Optimal weights; Wilcoxon statistics; Correlated data
Infantile neuronal ceroid lipofuscinosis (INCL) is a devastating childhood neurodegenerative lysosomal storage disease (LSD) that has no effective treatment. It is caused by inactivating mutations in the palmitoyl-protein thioesterase-1 (PPT1) gene. PPT1-deficiency impairs the cleavage of thioester linkage in palmitoylated proteins (constituents of ceroid), preventing degradation by lysosomal hydrolases. Consequently, accumulation of lysosomal ceroid leads to INCL. Thioester linkage is cleaved by nucleophilic attack. Hydroxylamine, a potent nucleophilic cellular metabolite, may have therapeutic potential for INCL but its toxicity precludes clinical application. Here we report that a hydroxylamine-derivative, N-(tert-Butyl) hydroxylamine (NtBuHA), is non-toxic, cleaves thioester linkage in palmitoylated proteins and mediates lysosomal ceroid depletion in cultured cells from INCL patients. Importantly, in Ppt1−/− mice, which mimic INCL, NtBuHA crossed the blood-brain-barrier, depleted lysosomal ceroid, suppressed neuronal apoptosis, slowed neurological deterioration and extended lifespan. Our findings provide the proof of concept that thioesterase-mimetic and antioxidant small molecules like NtBuHA are potential drug-targets for thioesterase deficiency diseases like INCL.
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial.
Case-control designs; Diagnostic accuracy; Epidemiologic designs; Misclassification; Measurement error
Anorectal atresia is a serious birth defect of largely unknown etiology but candidate genes have been identified in animal studies and human syndromes. Because alterations in the activity of these genes might lead to anorectal atresia, we selected 71 common variants predicted to be in transcription factor binding sites, CpG windows, splice sites, and miRNA target sites of 25 candidate genes, and tested for their association with anorectal atresia. The study population comprised 150 anorectal atresia cases and 623 control infants without major malformations. Variants predicted to affect transcription factor binding, splicing, and DNA methylation in WNT3A, PCSK5, TCF4, MKKS, GLI2, HOXD12, and BMP4 were associated with anorectal atresia based on a nominal P value <0.05. The GLI2 and BMP4 variants are reported to be moderately associated with gene expression changes (Spearman’s rank correlation coefficients between −0.260 and 0.226). We did not find evidence for interaction between maternal pre-pregnancy obesity and variants in MKKS, a gene previously associated with obesity, on the risk of anorectal atresia. Our results for MKKS support previously suggested associations with anorectal malformations. Our findings suggest that more research is needed to determine whether altered GLI2 and BMP4 expression is important in anorectal atresia in humans.
anorectal malformations; imperforate anus; hindgut; congenital abnormalities
The objective of the study is to quantify the causal effect of β-cell function on type 2 diabetes by minimizing residual confounding and reverse causation. We employed a Mendelian randomization (MR) approach using TCF7L2 variant rs7903146 as an instrument for lifelong levels of β-cell function. We first conducted two sets of meta-analyses: one to quantify the association of the TCF7L2 variant with the risk of type 2 diabetes among 55 436 cases and 106 020 controls from 66 studies by calculating the pooled odds ratio (OR), and one to quantify the associations with multiple direct or indirect measures of β-cell function among 35 052 non-diabetic individuals from 31 studies by calculating the pooled mean difference. We further applied the method of MR to obtain the causal estimates for the effect of β-cell function on type 2 diabetes risk based on findings from the meta-analyses. The OR [95% confidence interval (CI)] was 0.87 (0.81–0.93) for each five unit increment in homeostasis model assessment of insulin secretion (HOMA-%B) (P = 3.0 × 10⁻⁵). In addition, for measures based on intravenous glucose tolerance test, ORs (95% CI) associated with type 2 diabetes risk were 0.24 (0.08–0.74) (P = 0.01) and 0.14 (0.04–0.48) (P = 0.002) per 1 standard deviation increment in insulin sensitivity index and disposition index, respectively. Findings from the present study lend support to a causal role of pancreatic β-cell function itself in the etiology of type 2 diabetes.
This article concerns construction of confidence intervals for the prevalence of a rare disease using Dorfman’s pooled testing procedure when the disease status is classified with an imperfect biomarker. Such an interval can be derived by converting a confidence interval for the probability that a group tests positive. Wald confidence intervals based on a normal approximation are shown to be inefficient in terms of coverage probability, even for a relatively large number of pools. A few alternatives are proposed and their performance is investigated in terms of coverage probability and length of intervals.
confidence intervals; coverage probability; exact inference; pooling; prevalence; rare event; sensitivity; specificity
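The conversion mentioned in the abstract above, from an interval for the probability that a pool tests positive to an interval for prevalence, can be sketched as follows. Several assumptions are made that the abstract does not state: pools have equal size `k`, sensitivity `se` and specificity `sp` are known and do not depend on pool size, and `se + sp > 1`. The Wilson score interval is used only as a simple stand-in for the pool-level interval; it is not claimed to be one of the alternatives the authors propose. All function names are hypothetical.

```python
import math

def pool_positive_prob(p, k, se, sp):
    """P(pool of size k tests positive) at prevalence p with an imperfect test."""
    q = (1.0 - p) ** k            # probability the pool contains no cases
    return se * (1.0 - q) + (1.0 - sp) * q

def prevalence_from_pool_prob(pi, k, se, sp):
    """Invert pool_positive_prob; the result is clipped to [0, 1]."""
    ratio = (se - pi) / (se + sp - 1.0)
    ratio = min(max(ratio, 0.0), 1.0)
    return 1.0 - ratio ** (1.0 / k)

def wilson_interval(x, n, z=1.96):
    """Wilson score interval for a binomial proportion x / n."""
    phat = x / n
    denom = 1.0 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def prevalence_interval(x_pos_pools, n_pools, k, se, sp, z=1.96):
    """Map the pool-level interval endpoints to the prevalence scale.

    The mapping is monotone increasing in pi, so endpoints map to endpoints.
    """
    lo, hi = wilson_interval(x_pos_pools, n_pools, z)
    return (prevalence_from_pool_prob(lo, k, se, sp),
            prevalence_from_pool_prob(hi, k, se, sp))

# Hypothetical inputs: 12 positive pools out of 100, pools of size 10.
print(prevalence_interval(12, 100, k=10, se=0.95, sp=0.98))
```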
We conducted a population-based case-control study of single nucleotide polymorphisms (SNPs) in selected genes to find common variants that play a role in the etiology of limb deficiencies (LDs). Included in the study were 389 infants with LDs of unknown cause and 980 unaffected controls selected from all births in New York State (NYS) for the years 1998 to 2005. We used cases identified from the NYS Department of Health (DOH) Congenital Malformations Registry. Genotypes were obtained for 132 SNPs in genes involved in limb development (SHH, WNT7A, FGF4, FGF8, FGF10, TBX3, TBX5, SALL4, GREM1, GDF5, CTNNB1, EN1, CYP26A1, CYP26B1), angiogenesis (VEGFA, HIF1A, NOS3), and coagulation (F2, F5, MTHFR). Genotype call rates were >97% and SNPs were tested for departure from Hardy-Weinberg expectations by race/ethnic subgroups. For each SNP, odds ratios (ORs) and confidence intervals (CIs) were estimated and corrected for multiple comparisons for all LDs combined and for LD subtypes. Among non-Hispanic white infants, associations between FGF10 SNPs rs10805683 and rs13170645 and all LDs combined were statistically significant following correction for multiple testing (OR=1.99; 95% CI=1.43-2.77; uncorrected p=0.000043 for rs10805683 heterozygous genotype, and OR=2.37; 95% CI=1.48-3.78; uncorrected p=0.00032 for rs13170645 homozygous minor genotype). We also observed suggestive evidence for associations with SNPs in other genes including CYP26B1 and WNT7A. Animal studies have shown that FGF10 induces formation of the apical ectodermal ridge and is necessary for limb development. Our data suggest that common variants in FGF10 increase the risk for a wide range of non-syndromic limb deficiencies.
limb deficiencies; polymorphisms; FGF10
Triad families are routinely used to test association between genetic variants and complex diseases. Triad studies are important and popular since they are robust in terms of being less prone to false positives due to population structure. In practice, one may collect not only complete triads, but also incomplete families such as dyads (affected child with one parent) and singleton monads (affected child without parents). Since there is a lack of convenient algorithms and software to analyze the incomplete data, dyads and monads are usually discarded. This may lead to loss of power and insufficient utilization of genetic information in a study.
We develop likelihood-based statistical models and likelihood ratio tests to test for association between complex diseases and genetic markers by using combinations of full triads, parent-child dyads, and affected singleton monads for a unified analysis. A likelihood is calculated directly to facilitate the data analysis without imputation and to avoid computational complexity. This makes it easy to implement the models and to explain the results.
By simulation studies, we show that the proposed models and tests are robust in that they accurately control the type I error rate, and are powerful according to empirical power evaluations. The methods are applied to test for association between the transforming growth factor alpha (TGFA) gene and cleft palate in an Irish study.
Association mapping of complex diseases; Likelihood ratio tests; Transmission disequilibrium tests
Hirschsprung’s disease (HSCR) results from failed colonization of the embryonic gut by enteric neural crest cells (ENCCs); colonization requires RET proto-oncogene (RET) signaling. We sequenced RET to identify coding and splice-site variants in a population-based case group and we tested for associations between HSCR and common variants in RET and candidate genes (ASCL1, HOXB5, L1CAM, PHOX2B, PROK1, PROKR1) chosen because they are involved in ENCC proliferation, migration, and differentiation in animal models. We conducted a nested case-control study of 304 HSCR cases and 1 215 controls. Among 38 (12.5%) cases with 34 RET coding and splice-site variants, 18 variants were previously unreported. We confirmed associations with common variants in HOXB5 and PHOX2B but the associations with variants in ASCL1, L1CAM, and PROK1 were not significant after multiple comparisons adjustment. RET variants were strongly associated with HSCR (P values between 10⁻³ and 10⁻³¹) but this differed by race/ethnicity: associations were absent in African-Americans. Our population-based study not only identified novel RET variants in HSCR cases but also showed that common RET variants may not contribute to HSCR in all race/ethnic groups. The findings for HOXB5 and PHOX2B provide supportive evidence that genes regulating ENCC proliferation, migration, and differentiation could be risk factors for HSCR.
congenital abnormalities; enteric nervous system; Hirschsprung disease; RET
To test the effect on diabetes management outcomes of a low-intensity, clinic-integrated behavioral intervention for families of youth with type 1 diabetes.
Families (n = 390) obtaining care for type 1 diabetes participated in a 2-year randomized clinical trial of a clinic-integrated behavioral intervention designed to improve family diabetes management practices. Measurement of hemoglobin A1c, the primary outcome, was obtained at each clinic visit and analyzed centrally. Blood glucose meter data were downloaded at each visit. Adherence was assessed by using a semistructured interview at baseline, mid-study, and follow-up. Analyses included 2-sample t tests at predefined time intervals and mixed-effect linear-quadratic models to assess for difference in change in outcomes across the study duration.
A significant overall intervention effect on change in glycemic control from baseline was observed at the 24-month interval (P = .03). The mixed-effect model showed a significant intervention by age interaction (P < .001). Among participants aged 12 to 14, a significant effect on glycemic control was observed (P = .009 for change from baseline to 24-month interval; P = .035 for mixed-effect model across study duration), but there was no effect among those aged 9 to 11. There was no intervention effect on child or parent report of adherence; however, associations of change in adherence with change in glycemic control were weak.
This clinic-integrated behavioral intervention was effective in preventing the deterioration in glycemic control evident during adolescence, offering a potential model for integrating medical and behavioral sciences in clinical care.
type 1 diabetes; children; adolescents; adherence; behavioral intervention; glycemic control
Several optimality properties of Dorfman’s (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases.
Binary outcome; Maximum likelihood estimation; Pooling; Prevalence; Sensitivity; Specificity
Diagnostic accuracy can be improved considerably by combining multiple biomarkers. Although the likelihood ratio provides the optimal solution to the combination of biomarkers, the method is sensitive to distributional assumptions, which are often difficult to justify. Alternatively, simple linear combinations can be considered, but their empirical solution may require extensive computation when the number of biomarkers is relatively large. Moreover, the optimal linear combinations derived under multivariate normality may suffer substantial loss of efficiency if the distributions depart from normality. In this paper we propose a new approach that linearly combines the minimum and maximum values of the biomarkers. Such a combination only involves searching for a single combination coefficient that maximizes the area under the receiver operating characteristic (ROC) curve and is thus computationally efficient. Simulation results show that the min-max combination may yield a larger partial or full area under the ROC curve and is more robust against distributional assumptions. The methods are illustrated using the growth-related hormones data from the Growth and Maturation in Children with Autism or Autistic Spectrum Disorder (ASD) Study (Autism/ASD Study).
Area under curves; linear combinations; receiver operating characteristic (ROC) curve; robustness; sensitivity; specificity
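The min-max idea in the abstract above can be sketched as a one-dimensional search. Each subject's markers are reduced to their maximum and minimum, and a single coefficient is chosen to maximize the empirical AUC of the combined score. The parametrization `max + a * min`, the grid of candidate coefficients, and the function names are assumptions made for illustration. The sketch also presumes the markers are on comparable scales and evaluates the AUC on the same data used for the search, which is optimistic.

```python
import numpy as np

def empirical_auc(neg_scores, pos_scores):
    """Empirical AUC with ties counted as one half."""
    diff = np.asarray(pos_scores, float)[:, None] - np.asarray(neg_scores, float)[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def min_max_combination(neg, pos, grid=None):
    """Grid search for `a` in score = max(markers) + a * min(markers).

    neg, pos: arrays of shape (n_subjects, n_markers).
    Returns (best_a, best_auc); the first maximizer on the grid is kept.
    """
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 201)   # hypothetical search range
    neg_max, neg_min = neg.max(axis=1), neg.min(axis=1)
    pos_max, pos_min = pos.max(axis=1), pos.min(axis=1)
    aucs = [empirical_auc(neg_max + a * neg_min, pos_max + a * pos_min)
            for a in grid]
    best = int(np.argmax(aucs))
    return float(grid[best]), float(aucs[best])

# Simulated markers for illustration only.
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=(50, 4))
cases = rng.normal(0.5, 1.0, size=(50, 4))
print(min_max_combination(controls, cases))
```

Whatever the number of markers, only one coefficient is searched, which is the source of the computational saving the abstract describes.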
Both taking folic acid-containing vitamins around conception and consuming food fortified with folic acid have been reported to reduce omphalocele rates. Genetic factors are etiologically important in omphalocele as well; our pilot study showed a relationship with the folate metabolic enzyme gene methylenetetrahydrofolate reductase (MTHFR). We studied 169 non-aneuploid omphalocele cases and 761 unaffected, matched controls from all New York State births occurring between 1998 and 2005 to look for associations with single nucleotide polymorphisms (SNPs) known to be important in folate, vitamin B12, or choline metabolism. In the total study population, variants in the transcobalamin receptor gene (TCblR), rs2232775 (Q8R), and the MTHFR gene, rs1801131 (1298A>C), were significantly associated with omphalocele. In African-Americans significant associations were found with SNPs in genes for the vitamin B12 transporter (TCN2) and the vitamin B12 receptor (TCblR). A SNP in the homocysteine-related gene, betaine-homocysteine S-methyltransferase (BHMT), rs3733890 (R239Q), was significantly associated with omphalocele in both African-Americans and Asians. Only the TCblR association in the total population remained statistically significant if Bonferroni correction was applied. The finding that transcobalamin receptor (TCblR) and transporter (TCN2) SNPs and a BHMT SNP were associated with omphalocele suggests that disruption of methylation reactions, in which folate, vitamin B12, and homocysteine play critical parts, may be a risk factor for omphalocele. Our data, if confirmed, suggest that supplements containing both folic acid and vitamin B12 may be beneficial in preventing omphaloceles.
omphalocele; folate; vitamin B12; homocysteine; transcobalamin; transcobalamin receptor
Cigarette smoking has been implicated in reproductive outcomes including delayed conception, but mechanisms underlying these associations remain unclear. One potential mechanism is the effect of cigarette smoking on reproductive hormones; however, studies evaluating associations between smoking and hormone levels are complicated by variability of hormones and timing of specimen collection. We evaluated smoking and its relationship to reproductive hormones among women participating in the BioCycle study, a longitudinal study of menstrual cycle function in healthy, premenopausal, regularly menstruating women (n=259). Fertility monitors were used to help guide timing of specimen collection. Serum levels of estradiol, progesterone, follicle-stimulating hormone (FSH), luteinizing hormone (LH) and total sex-hormone binding globulin (SHBG) across phases of the menstrual cycle were compared between smokers and nonsmokers.
We observed statistically significant phase-specific differences in hormone levels between smokers and nonsmokers. Compared to nonsmokers, smokers had higher levels of FSH in the early follicular phase and higher LH at menses after adjusting for potential confounding factors of age, race, BMI, nulliparity, vigorous exercise, and alcohol and caffeine intake through inverse probability of treatment weights. No statistically significant differences were observed for estradiol, progesterone or SHBG. These phase-specific differences in levels of LH and FSH in healthy, regularly menstruating women who are current smokers compared to nonsmokers reflect one mechanism by which smoking may impact fertility and reproductive health.
For comparison of multiple outcomes commonly encountered in biomedical research, Huang et al. (2005) improved O’Brien’s (1984) rank-sum tests through the replacement of the ad hoc variance by the asymptotic variance of the test statistics. The improved tests control the Type I error rate at the desired level and gain power when the differences between the two comparison groups in each outcome variable fall in the same direction. However, they may lose power when the differences are in different directions (e.g., some are positive and some are negative). These tests and the popular Bonferroni correction failed to show an important significant difference when applied to compare heart rates from a clinical trial to evaluate the effect of a procedure to remove the cardioprotective solution HTK. We propose an alternative test statistic, taking the maximum of the individual rank-sum statistics, which controls the type I error and maintains satisfactory power regardless of the directions of the differences. Simulation studies show the proposed test to be of higher power than other tests in certain regions of the alternative parameter space of interest. Furthermore, when used to analyze the heart rate data, the proposed test yields more satisfactory results.
Autism spectrum disorder; Behrens-Fisher problem; Cardioprotective solution; Case-control studies; Growth hormones; Multiple outcomes; Non-parametrics; Rank-sum statistics
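The maximum-type statistic in the abstract above can be sketched as follows. Each outcome gets a standardized Wilcoxon rank-sum statistic, and the test statistic is the largest absolute value across outcomes. Two choices here are assumptions rather than the authors' method: the absolute value is used so that differences in either direction count, and the null distribution is approximated by permuting group labels, whereas the paper works with the distribution of the statistic itself. The standardization ignores tie corrections, and the permutation step assumes exchangeability under the null. Function names are hypothetical.

```python
import numpy as np

def average_ranks(v):
    """Ranks of a 1-D array, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)                 # highest rank in each tie group
    return (upper - (counts - 1) / 2.0)[inverse]

def max_rank_sum_stat(x, y):
    """max over outcomes of |standardized rank-sum|; x: (n1, K), y: (n2, K)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    pooled = np.vstack([x, y])
    ranks = np.column_stack(
        [average_ranks(pooled[:, j]) for j in range(pooled.shape[1])]
    )
    w = ranks[:n1].sum(axis=0)                # rank sum of group x per outcome
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # no tie correction
    return float(np.max(np.abs((w - mean) / sd)))

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation p-value for the max statistic (stand-in for asymptotics)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.vstack([x, y])
    n1 = len(x)
    observed = max_rank_sum_stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))    # shuffle subjects, keep rows intact
        stat = max_rank_sum_stat(pooled[idx[:n1]], pooled[idx[n1:]])
        exceed += int(stat >= observed)
    return (exceed + 1) / (n_perm + 1)

# Two outcomes that differ in OPPOSITE directions between groups.
group_a = [[1, 10], [2, 9], [3, 8]]
group_b = [[4, 7], [5, 6], [6, 5]]
print(max_rank_sum_stat(group_a, group_b))
```

Permuting whole rows keeps the correlation between outcomes within a subject, so the reference distribution reflects the dependence among the individual statistics.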
The cost-efficient two-stage design is often used in genome-wide association studies (GWASs) to search for genetic loci underlying susceptibility to complex diseases. Replication-based analysis, which considers data from each stage separately, often suffers from loss of efficiency. A joint test that combines data from both stages has been proposed and widely used to improve efficiency. However, existing joint analyses are based on test statistics derived under an assumed genetic model, and thus might not have robust performance when the assumed genetic model is not appropriate.
In this paper, we propose joint analyses based on two robust tests, MERT and MAX3, for GWASs under a two-stage design. We developed computationally efficient procedures and formulas for significance level evaluation and power calculation. The performance of the proposed approaches is investigated through extensive simulation studies and a real example. Numerical results show that the joint analysis based on the MAX3 test statistic has the best overall performance.
MAX3 joint analysis is the most robust procedure among the considered joint analyses, and we recommend using it in a two-stage genome-wide association study.
Before a comparative diagnostic trial is carried out, maximum sample sizes for the diseased group and the nondiseased group need to be obtained to achieve a nominal power to detect a meaningful difference in diagnostic accuracy. Sample size calculation depends on the variance of the statistic of interest, which is the difference between receiver operating characteristic summary measures of 2 medical diagnostic tests. To obtain an appropriate value for the variance, one often has to assume an arbitrary parametric model and the associated parameter values for the 2 groups of subjects under 2 tests to be compared. It becomes more tedious to do so when the same subject undergoes 2 different tests because the correlation is then involved in modeling the test outcomes. The calculated variance based on incorrectly specified parametric models may be smaller than the true one, which will subsequently result in smaller maximum sample sizes, leaving the study underpowered. In this paper, we develop a nonparametric adaptive method for comparative diagnostic trials to update the sample sizes using interim data, while allowing early stopping during interim analyses. We show that the proposed method maintains the nominal power and type I error rate through theoretical proofs and simulation studies.
Diagnostic accuracy; Error spending function; ROC; Sensitivity; Specificity
Large comparative clinical trials usually target a wide range of patient populations in which subgroups exist according to certain patient characteristics. Often, scientific knowledge or existing empirical data support the assumption that patients’ improvement is larger among certain subgroups than others. Such information can be used to design a more cost-effective clinical trial.
The goal of the article is to use such information to design a more cost-effective clinical trial.
A two-stage sample-enrichment design strategy is proposed that begins with enrollment from a certain subgroup of patients and allows the trial to be terminated for futility in that subgroup.
Simulation studies show that the two-stage sample-enrichment strategy is cost-effective if indeed the null hypothesis of no treatment improvement is true, as also illustrated with data from a completed trial of calcium to prevent preeclampsia.
Feasibility of the proposed enrichment design relies on the knowledge prior to the start of the trial that certain patients can benefit more than others from the treatment.
The two-stage sample-enrichment approach borrows strength from treatment heterogeneity among target patients in a large-scale comparative clinical trial, and is more cost-effective if the treatments do not differ.
Sample size and power; stopping for futility; subgroup analysis; treatment heterogeneity
Intraclass correlation models with data missing at random are considered. With a properly reduced model, a general method, which allows repeated observations with values missing in a non-monotone pattern, is proposed to construct exact test statistics and simultaneous confidence intervals for linear contrasts in the means. Simulation results are given to compare exact and asymptotic simultaneous confidence intervals. A real example is provided for illustration of the proposed method.
Contrast; Exact test; Intraclass correlation model; Linear mixed model; Simultaneous confidence intervals