1.  Using absolute risks to assess the risks and benefits of treatment 
Thorax  2014;69(7):604-605.
PMCID: PMC4686127  PMID: 24550059
2.  Response 
PMCID: PMC4271035  PMID: 25381391
3.  Effects of Helicobacter pylori Treatment on Gastric Cancer Incidence and Mortality in Subgroups 
Among 2258 Helicobacter pylori–seropositive subjects randomly assigned to receive one-time H. pylori treatment with amoxicillin-omeprazole or its placebo, we evaluated the 15-year effect of treatment on gastric cancer incidence and mortality in subgroups defined by age, baseline gastric histopathology, and post-treatment infection status. We used conditional logistic and Cox regressions for covariable adjustments in incidence and mortality analyses, respectively. Treatment was associated with a statistically significant decrease in gastric cancer incidence (odds ratio = 0.36; 95% confidence interval [CI] = 0.17 to 0.79) and mortality (hazard ratio = 0.26; 95% CI = 0.09 to 0.79) at ages 55 years and older and a statistically significant decrease in incidence among those with intestinal metaplasia or dysplasia at baseline (odds ratio = 0.56; 95% CI = 0.34 to 0.91). Treatment benefits for incidence and mortality among those with and without post-treatment infection were similar. Thus H. pylori treatment can benefit older members and those with advanced baseline histopathology, and benefits are present even with post-treatment infection, suggesting treatment can benefit an entire population, not just the young or those with mild histopathology.
PMCID: PMC4067110  PMID: 24925350
4.  Maximizing DNA Yield for Epidemiologic Studies: No More Buffy Coats? 
American Journal of Epidemiology  2013;178(7):1170-1176.
Some molecular analyses require microgram quantities of DNA, yet many epidemiologic studies preserve only the buffy coat. In Frederick, Maryland, in 2010, we estimated DNA yields from 5 mL of whole blood and from equivalent amounts of all-cell-pellet (ACP) fraction, buffy coat, and residual blood cells from fresh blood (n = 10 volunteers) and from both fresh and frozen blood (n = 10). We extracted DNA with the QIAamp DNA Blood Midi Kit (Qiagen Sciences, Germantown, Maryland) for silica spin column capture and measured double-stranded DNA. Yields from frozen blood fractions were not statistically significantly different from those obtained from fresh fractions. ACP fractions yielded 80.6% (95% confidence interval: 66, 97) of the yield of frozen whole blood and 99.3% (95% confidence interval: 86, 100) of the yield of fresh blood. Frozen buffy coat and residual blood cells each yielded only half as much DNA as frozen ACP, and the yields were more variable. Assuming that DNA yield and quality from frozen ACP are stable, we recommend freezing plasma and ACP. Not only does ACP yield twice as much DNA as buffy coat but it is easier to process, and its yield is less variable from person to person. Long-term stability studies are needed. If one wishes to separate buffy coat before freezing, one should also save the residual blood cell fraction, which contains just as much DNA.
PMCID: PMC3783090  PMID: 23857774
all-cell-pellet fraction; buffy coat; DNA extraction yield; residual blood cells; whole blood
5.  Dramatic reduction of liver cancer incidence in young adults: 28 year follow-up of etiological interventions in an endemic area of China 
Carcinogenesis  2013;34(8):1800-1805.
Qidong City, China, has had high liver cancer incidence from endemic hepatitis B virus (HBV) infection and dietary exposure to aflatoxin. Based on etiologic studies, we began interventions in 1980 to reduce dietary aflatoxin and initiate neonatal HBV vaccination. We studied trends in liver cancer incidence rates in the 1.1 million inhabitants of Qidong and examined trends in aflatoxin exposure, staple food consumption, HBV infection markers and annual income. Aflatoxin exposure declined greatly in association with economic reform, increased earnings and educational programs to shift staple food consumption in the total population from moldy corn to fresh rice. A controlled neonatal HBV vaccination trial began in 1983 and ended in November 1990, when vaccination was expanded to all newborns. Liver cancer incidence fell dramatically in young adults. Compared with 1980–83, the age-specific liver cancer incidence rates in 2005–08 significantly decreased 14-fold at ages 20–24, 9-fold at ages 25–29, 4-fold at ages 30–34, 1.5-fold at ages 35–39, 1.2-fold at ages 40–44 and 1.4-fold at ages 45–49, but increased at older ages. The 14-fold reduction at ages 20–24 might reflect the combined effects of reduced aflatoxin exposure and partial neonatal HBV vaccination. Decreased incidence in age groups >25 years could mainly be attributable to rapid aflatoxin reduction. Compared with 1980–83, liver cancer incidence in 1990–93 significantly decreased 3.4-fold at ages 20–24, and 1.9-fold at ages 25–29 when the first vaccinees were <11 years old.
PMCID: PMC3731800  PMID: 23322152
6.  Using multiple risk models with preventive interventions 
Statistics in medicine  2012;31(23):2687-2696.
An ideal preventive intervention would have negligible side effects and could be applied to the entire population, thus achieving maximal preventive impact. Unfortunately, many interventions have adverse effects as well as beneficial effects. For example, tamoxifen reduces the risk of breast cancer by about 50% and the risk of hip fracture by 45%, but increases the risk of stroke by about 60%; other serious adverse effects include endometrial cancer and pulmonary embolus. Hence, tamoxifen should only be given to the subset of the population with high enough risks of breast cancer and hip fracture such that the preventive benefits outweigh the risks. Recommendations for preventive use of tamoxifen have been based primarily on breast cancer risk. Age- and race-specific rates were considered for other health outcomes, but not risk models. In this paper, I investigate the extent to which modeling not only the risk of breast cancer, but also the risk of stroke, can improve the decision to take tamoxifen. These calculations also give insight into the relative benefits of improving the discriminatory accuracy of such risk models versus improving the preventive effectiveness or reducing the adverse risks of the intervention. Depending on the discriminatory accuracies of the risk models, there may be considerable advantage to modeling the risks of more than one health outcome.
PMCID: PMC3926659  PMID: 22733645
absolute risk models; breast cancer; disease prevention; modeling multiple risks; risk-based prevention strategy; risk versus benefit
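The risk-benefit weighing described in the abstract above can be illustrated with a minimal sketch. It uses only the three relative effects quoted there (about 50% less breast cancer, 45% less hip fracture, about 60% more stroke). The absolute risks, the equal weighting of outcomes, and the function name are assumptions made for illustration; endometrial cancer and pulmonary embolus are left out, and this is not the paper's own calculation.

```python
# Relative effects of tamoxifen as quoted in the abstract above.
RR_BREAST_CANCER = 0.50   # about 50% reduction
RR_HIP_FRACTURE = 0.55    # 45% reduction
RR_STROKE = 1.60          # about 60% increase


def net_events_prevented(risk_bc, risk_hip, risk_stroke):
    """Expected events prevented minus events caused, per woman treated.

    Inputs are untreated absolute risks over the same time horizon.
    All outcomes are weighted equally, which is a simplifying assumption.
    """
    prevented = (risk_bc * (1 - RR_BREAST_CANCER)
                 + risk_hip * (1 - RR_HIP_FRACTURE))
    caused = risk_stroke * (RR_STROKE - 1)
    return prevented - caused


# Hypothetical absolute risks (NOT taken from the paper).
low_bc = net_events_prevented(risk_bc=0.010, risk_hip=0.002, risk_stroke=0.012)
high_bc = net_events_prevented(risk_bc=0.030, risk_hip=0.002, risk_stroke=0.012)

print(f"lower breast cancer risk:  net = {low_bc:+.4f}")
print(f"higher breast cancer risk: net = {high_bc:+.4f}")
```

With these made-up inputs the sign of the net effect flips between the two women even though their stroke risk is identical, which is why a model for each outcome, and not breast cancer risk alone, can change the decision.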
7.  Estimation of effect size distribution from genome-wide association studies and implications for future discoveries 
Nature genetics  2010;42(7):570-575.
We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn’s disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
PMCID: PMC4615599  PMID: 20562874
8.  Potential Usefulness of Single Nucleotide Polymorphisms to Identify Persons at High Cancer Risk: An Evaluation of Seven Common Cancers 
Journal of Clinical Oncology  2012;30(17):2157-2162.
To estimate the likely number and predictive strength of cancer-associated single nucleotide polymorphisms (SNPs) that are yet to be discovered for seven common cancers.
From the statistical power of published genome-wide association studies, we estimated the number of undetected susceptibility loci and the distribution of effect sizes for all cancers. Assuming a log-normal model for risks and multiplicative relative risks for SNPs, family history (FH), and known risk factors, we estimated the area under the receiver operating characteristic curve (AUC) and the proportion of patients with risks above risk thresholds for screening. From additional prevalence data, we estimated the positive predictive value and the ratio of non–patient cases to patient cases (false-positive ratio) for various risk thresholds.
Age-specific discriminatory accuracy (AUC) for models including FH and foreseeable SNPs ranged from 0.575 for ovarian cancer to 0.694 for prostate cancer. The proportions of patients in the highest decile of population risk ranged from 16.2% for ovarian cancer to 29.4% for prostate cancer. The corresponding false-positive ratios were 241 for colorectal cancer, 610 for ovarian cancer, and 138 or 280 for breast cancer in women age 50 to 54 or 40 to 44 years, respectively.
Foreseeable common SNP discoveries may not permit identification of small subsets of patients that contain most cancers. Usefulness of screening could be diminished by many false positives. Additional strong risk factors are needed to improve risk discrimination.
PMCID: PMC3397697  PMID: 22585702
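Two of the quantities in the abstract above are linked by simple formulas, sketched here. The first follows directly from the definition given there: with a false-positive ratio of r non-cases per case above the threshold, the positive predictive value is 1 / (1 + r). The second is a textbook relation for a pure log-normal risk distribution and a rare disease, AUC = Φ(σ/√2), where σ is the standard deviation of log risk. Treating the published AUCs this way is an assumption, since the paper's models also include family history, so the results are expected to be close to, not identical with, the reported proportions. Function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def ppv_from_false_positive_ratio(fp_ratio):
    """PPV when fp_ratio non-cases are flagged for every case flagged."""
    return 1.0 / (1.0 + fp_ratio)


def log_risk_sd_from_auc(auc):
    """SD of log risk implied by an AUC, assuming AUC = Phi(sigma / sqrt(2))."""
    return 2 ** 0.5 * STD_NORMAL.inv_cdf(auc)


def share_of_cases_in_top_decile(auc):
    """Share of cases above the 90th percentile of population risk.

    Under the log-normal assumption, log risk among cases is shifted
    upward by sigma standard deviations relative to the population.
    """
    sigma = log_risk_sd_from_auc(auc)
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(0.9) - sigma)


# False-positive ratios quoted in the abstract above.
for label, ratio in [("colorectal", 241), ("ovarian", 610),
                     ("breast, age 50 to 54", 138),
                     ("breast, age 40 to 44", 280)]:
    print(f"{label}: PPV = {ppv_from_false_positive_ratio(ratio):.2%}")

# AUCs quoted in the abstract above.
for label, auc in [("ovarian", 0.575), ("prostate", 0.694)]:
    print(f"{label}: share of cases in top decile = "
          f"{share_of_cases_in_top_decile(auc):.1%}")
```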
9.  Fifteen-Year Effects of Helicobacter pylori, Garlic, and Vitamin Treatments on Gastric Cancer Incidence and Mortality 
In the Shandong Intervention Trial, 2 weeks of antibiotic treatment for Helicobacter pylori reduced the prevalence of precancerous gastric lesions, whereas 7.3 years of oral supplementation with garlic extract and oil (garlic treatment) or vitamin C, vitamin E, and selenium (vitamin treatment) did not. Here we report 14.7-year follow-up for gastric cancer incidence and cause-specific mortality among 3365 randomly assigned subjects in this masked factorial placebo-controlled trial. Conditional logistic regression was used to estimate the odds of gastric cancer incidence, and the Cox proportional hazards model was used to estimate the relative hazard of cause-specific mortality. All statistical tests were two-sided. Gastric cancer was diagnosed in 3.0% of subjects who received H pylori treatment and in 4.6% of those who received placebo (odds ratio = 0.61, 95% confidence interval = 0.38 to 0.96, P = .032). Gastric cancer deaths occurred among 1.5% of subjects assigned H pylori treatment and among 2.1% of those assigned placebo (hazard ratio [HR] of death = 0.67, 95% CI = 0.36 to 1.28). Garlic and vitamin treatments were associated with non-statistically significant reductions in gastric cancer incidence and mortality. Vitamin treatment was associated with statistically significantly fewer deaths from gastric or esophageal cancer, a secondary endpoint (HR = 0.51, 95% CI = 0.30 to 0.87; P = .014).
PMCID: PMC3309129  PMID: 22271764
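The incidence figures in the abstract above (3.0% with H pylori treatment, 4.6% with placebo) can be restated on the absolute scale with a short sketch. The crude odds ratio computed here comes from the rounded percentages alone, so it is not expected to reproduce the reported 0.61, which was estimated by conditional logistic regression. The function names are illustrative.

```python
def absolute_risk_reduction(risk_control, risk_treated):
    """Difference in cumulative incidence between control and treated arms."""
    return risk_control - risk_treated


def number_needed_to_treat(risk_control, risk_treated):
    """Persons treated per event prevented (reciprocal of the ARR)."""
    return 1.0 / absolute_risk_reduction(risk_control, risk_treated)


def crude_odds_ratio(risk_treated, risk_control):
    """Unadjusted odds ratio computed from two cumulative incidences."""
    odds_treated = risk_treated / (1.0 - risk_treated)
    odds_control = risk_control / (1.0 - risk_control)
    return odds_treated / odds_control


# Gastric cancer incidence quoted in the abstract (rounded percentages).
RISK_TREATED = 0.030
RISK_PLACEBO = 0.046

arr = absolute_risk_reduction(RISK_PLACEBO, RISK_TREATED)
nnt = number_needed_to_treat(RISK_PLACEBO, RISK_TREATED)
crude_or = crude_odds_ratio(RISK_TREATED, RISK_PLACEBO)

print(f"absolute risk reduction: {arr:.3f}")
print(f"number needed to treat:  {nnt:.1f}")
print(f"crude odds ratio:        {crude_or:.2f}")
```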
10.  Risk Factor Modification and Projections of Absolute Breast Cancer Risk 
Although modifiable risk factors have been included in previous models that estimate or project breast cancer risk, there remains a need to estimate the effects of changes in modifiable risk factors on the absolute risk of breast cancer.
Using data from a case–control study of women in Italy (2569 case patients and 2588 control subjects studied from June 1, 1991, to April 1, 1994) and incidence and mortality data from the Florence Registries, we developed a model to predict the absolute risk of breast cancer that included five non-modifiable risk factors (reproductive characteristics, education, occupational activity, family history, and biopsy history) and three modifiable risk factors (alcohol consumption, leisure physical activity, and body mass index). The model was validated using independent data, and the percent risk reduction was calculated in high-risk subgroups identified by use of the Lorenz curve.
The model was reasonably well calibrated (ratio of expected to observed cancers = 1.10, 95% confidence interval [CI] = 0.96 to 1.26), but the discriminatory accuracy was modest. The absolute risk reduction from exposure modifications was nearly proportional to the risk before modifying the risk factors and increased with age and risk projection time span. Mean 20-year reductions in absolute risk among women aged 65 years were 1.6% (95% CI = 0.9% to 2.3%) in the entire population, 3.2% (95% CI = 1.8% to 4.8%) among women with a positive family history of breast cancer, and 4.1% (95% CI = 2.5% to 6.8%) among women who accounted for the highest 10% of the total population risk, as determined from the Lorenz curve.
These data give perspective on the potential reductions in absolute breast cancer risk from preventative strategies based on lifestyle changes. Our methods are also useful for calculating sample sizes required for trials to test lifestyle interventions.
PMCID: PMC3131219  PMID: 21705679
11.  Projecting Individualized Absolute Invasive Breast Cancer Risk in Asian and Pacific Islander American Women 
The Breast Cancer Risk Assessment Tool (BCRAT) of the National Cancer Institute is widely used for estimating absolute risk of invasive breast cancer. However, the absolute risk estimates for Asian and Pacific Islander American (APA) women are based on data from white women. We developed a model for projecting absolute invasive breast cancer risk in APA women and compared its projections to those from BCRAT.
Data from 589 women with breast cancer (case patients) and 952 women without breast cancer (control subjects) in the Asian American Breast Cancer Study were used to compute relative and attributable risks based on the age at menarche, number of affected mothers, sisters, and daughters, and number of previous benign biopsies. Absolute risks were obtained by combining this information with ethnicity-specific data from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program and with US ethnicity-specific mortality data to create the Asian American Breast Cancer Study model (AABCS model). Independent data from APA women in the Women’s Health Initiative (WHI) were used to check the calibration and discriminatory accuracy of the AABCS model.
The AABCS model estimated absolute risk separately for Chinese, Japanese, Filipino, Hawaiian, Other Pacific Islander, and Other Asian women. Relative and attributable risks for APA women were comparable to those in BCRAT, but the AABCS model usually estimated lower-risk projections than BCRAT in Chinese and Filipino, but not in Hawaiian women, and not in every age and ethnic subgroup. The AABCS model underestimated absolute risk by 17% (95% confidence interval = 1% to 38%) in independent data from WHI, but APA women in the WHI had incidence rates approximately 18% higher than those estimated from the SEER program.
The AABCS model was calibrated to ethnicity-specific incidence rates from the SEER program for projecting absolute invasive breast cancer risk and is preferable to BCRAT for counseling APA women.
PMCID: PMC3119648  PMID: 21562243
12.  Personalized estimates of breast cancer risk in clinical practice and public health 
Statistics in medicine  2011;30(10):1090-1104.
This paper defines absolute risk and some of its properties, and presents applications in breast cancer counseling and prevention. For counseling, estimates of absolute risk give useful perspective and can be used in management decisions that require weighing risks and benefits, such as whether or not to take tamoxifen to prevent breast cancer. Absolute risk models are also useful in designing intervention trials to prevent breast cancer and in assessing the potential reductions in absolute risk of disease that might result from reducing exposures that are associated with breast cancer. In these applications, it is important that the risk model be well calibrated, namely that it accurately predict the numbers of women who will develop breast cancer in various subsets of the population. Absolute risk models are also needed to implement a “high risk” prevention strategy that identifies a high risk subset of the population and focuses intervention efforts on that subset. The limitations of the high risk strategy are discussed, including the need for risk models with high discriminatory accuracy, and the need for less toxic interventions that can reduce the threshold of risk above which the intervention provides a net benefit. I also discuss the potential use of risk models in allocating prevention resources under cost constraints. High discriminatory accuracy of the risk model, in addition to good calibration, is desirable in this application, and the risk assessment should not be expensive in comparison with the intervention.
PMCID: PMC3079423  PMID: 21337591
absolute risk; allocation of prevention resources; breast cancer; calibration; crude risk; cumulative incidence; discriminatory accuracy; disease prevention; designing disease prevention trials; high risk prevention strategy; risk versus benefit
13.  Beyond Recreational Physical Activity: Examining Occupational and Household Activity, Transportation Activity, and Sedentary Behavior in Relation to Postmenopausal Breast Cancer Risk 
American Journal of Public Health  2010;100(11):2288-2295.
We prospectively examined nonrecreational physical activity and sedentary behavior in relation to breast cancer risk among 97039 postmenopausal women in the National Institutes of Health–AARP Diet and Health Study.
We identified 2866 invasive and 570 in situ breast cancer cases recorded between 1996 and 2003 and used Cox proportional hazards regression to estimate multivariate relative risks (RRs) and 95% confidence intervals (CIs).
Routine activity during the day at work or at home that included heavy lifting or carrying versus mostly sitting was associated with reduced risk of invasive breast cancer (RR = 0.62; 95% CI = 0.42, 0.91; P for trend = .024).
Routine activity during the day at work or home may be related to reduced invasive breast cancer risk. Domains outside of recreation time may be attractive targets for increasing physical activity and reducing sedentary behavior among postmenopausal women.
PMCID: PMC2951936  PMID: 20864719
14.  Association between upper digestive tract microbiota and cancer predisposing states in the esophagus and stomach 
The human upper digestive tract microbial community (microbiota) is not well characterized and few studies have explored how it relates to human health. We examined the relationship between upper digestive tract microbiota and two cancer predisposing states, serum pepsinogen I/pepsinogen II ratio (PGI/II) (predictor of gastric cancer risk), and esophageal squamous dysplasia (ESD) (the precursor lesion of esophageal squamous cell carcinoma (ESCC)) in a cross-sectional design.
The Human Oral Microbe Identification Microarray was used to test for the presence of 272 bacterial species in 333 upper digestive tract samples from a Chinese cancer screening cohort. Serum PGI and PGII were determined by enzyme-linked immunosorbent assays. ESD was determined by chromoendoscopy with biopsy.
Lower microbial richness (number of bacterial genera per sample) was significantly associated with lower PGI/II ratio (P=0.034) and the presence of ESD (P=0.018). We conducted principal component (PC) analysis on a β-diversity matrix (pairwise difference in microbiota), and observed significant correlations between PC1, PC3 and PGI/II (P=0.004, 0.009 respectively), and between PC1 and ESD (P=0.003).
Lower microbial richness in the upper digestive tract was independently associated with both cancer predisposing states in the esophagus and stomach (presence of ESD and lower PGI/II).
PMCID: PMC4011942  PMID: 24700175
microbiota; gastric cancer; esophageal squamous cell carcinoma; esophageal squamous dysplasia; serum pepsinogen I/pepsinogen II ratio
15.  Value of Adding Single-Nucleotide Polymorphism Genotypes to a Breast Cancer Risk Model 
Adding genotypes from seven single-nucleotide polymorphisms (SNPs), which had previously been associated with breast cancer, to the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT) increases the area under the receiver operating characteristic curve from 0.607 to 0.632.
Criteria that are based on four clinical or public health applications were used to compare BCRAT with BCRATplus7, which includes the seven genotypes. Criteria included number of expected life-threatening events for the decision to take tamoxifen, expected decision losses (in units of the loss from giving a mammogram to a woman without detectable breast cancer) for the decision to have a mammogram, rates of risk reclassification, and number of lives saved by risk-based allocation of screening mammography. For all calculations, the following assumptions were made: Hardy–Weinberg equilibrium, linkage equilibrium across SNPs, additive effects of alleles at each locus, no interactions on the logistic scale among SNPs or with factors in BCRAT, and independence of SNPs from factors in BCRAT.
Improvements in expected numbers of life-threatening events were only 0.07% and 0.81% for deciding whether to take tamoxifen to prevent breast cancer for women aged 50–59 and 40–49 years, respectively. For deciding whether to recommend screening mammograms to women aged 50–54 years, the reduction in expected losses was 0.86% if the ideal breast cancer prevalence threshold for recommending mammography was that of women aged 50–54 years. Cross-classification of risks indicated that some women classified by BCRAT would have different classifications with BCRATplus7, which might be useful if BCRATplus7 was well calibrated. Improvements from BCRATplus7 were small for risk-based allocation of mammograms under cost constraints.
The gains from BCRATplus7 are small in the applications examined. Models with SNPs, such as BCRATplus7, have not been validated for calibration in independent cohort data. Additional studies are needed to validate a model with SNPs and justify its use.
PMCID: PMC2704229  PMID: 19535781
16.  Are hospitals “keeping up with the Joneses”?: Assessing the spatial and temporal diffusion of the surgical robot 
The surgical robot has been widely adopted in the United States in spite of its high cost and controversy surrounding its benefit. Some have suggested that a “medical arms race” influences technology adoption. We wanted to determine whether a hospital would acquire a surgical robot if its nearest neighboring hospital already owned one.
We identified 554 hospitals performing radical prostatectomy from the Healthcare Cost and Utilization Project Statewide Inpatient Databases for seven states. We used publicly available data from the website of the surgical robot’s sole manufacturer (Intuitive Surgical, Sunnyvale, CA) combined with data collected from the hospitals to ascertain the timing of robot acquisition from 2001 to 2008. One hundred thirty-four hospitals (24%) had acquired a surgical robot by the end of 2008. We geocoded the address of each hospital and determined a hospital’s likelihood of acquiring a surgical robot based on whether its nearest neighbor owned a surgical robot. We developed a Markov chain method to model the acquisition process spatially and temporally and quantified the “neighborhood effect” on the acquisition of the surgical robot while adjusting simultaneously for known confounders.
After adjusting for hospital teaching status, surgical volume, urban status and number of hospital beds, the Markov chain analysis demonstrated that a hospital whose nearest neighbor had acquired a surgical robot had a higher likelihood of itself acquiring a surgical robot (OR = 1.71, 95% CI: 1.07–2.72, p = 0.02).
There is a significant spatial and temporal association for hospitals acquiring surgical robots during the study period. Hospitals were more likely to acquire a surgical robot during the robot’s early adoption phase if their nearest neighbor had already done so.
PMCID: PMC4376012  PMID: 25821720
17.  Efficient Adaptively Weighted Analysis of Secondary Phenotypes in Case-control Genome-wide Association Studies 
Human heredity  2012;73(3):159-173.
We propose and compare methods of analysis for detecting associations between genotypes of a single nucleotide polymorphism (SNP) and a dichotomous secondary phenotype (X), when the data arise from a case-control study of a primary dichotomous phenotype (D), which is not rare. We considered both a dichotomous genotype (G) as in recessive or dominant models, and an additive genetic model based on the number of minor alleles present. To estimate the log odds ratio, β1, relating X to G in the general population, one needs to understand the conditional distribution [D∣X,G], in the general population. For the most general model, [D∣X,G], one needs external data on P(D=1) to estimate β1. We show that for this “full model”, maximum likelihood (FM) corresponds to a previously proposed weighted logistic regression (WL) approach if G is dichotomous. For the additive model, WL yields results numerically close, but not identical, to those of the maximum likelihood, FM. Efficiency can be gained by assuming that [D∣X,G] is a logistic model with no interaction between X and G (the “reduced model”). However, the resulting maximum likelihood (RM) can be misleading in the presence of interactions. We therefore propose an adaptively weighted approach (AW) that captures the efficiency of RM but is robust to the occasional SNP that might interact with the secondary phenotype to affect risk of the primary disease. We study the robustness of FM, WL, RM and AW to misspecification of P(D=1). In principle, one should be able to estimate β1 without external information on P(D=1) under the reduced model. However, our simulations show that the resulting inference is unreliable. Therefore, in practice one needs to introduce external information on P(D=1), even in the absence of interactions between X and G.
PMCID: PMC4364044  PMID: 22710642
adaptively weighted; case-control study; genome-wide association study; maximum likelihood; secondary phenotype
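The role of external information on P(D=1) in the abstract above can be illustrated with a minimal sketch of inverse-probability weighting for a dichotomous G. Cases and controls are reweighted so that the sample mimics the general population, and β1 is then the log odds ratio of the weighted X-by-G table. The specific weights used here are one common choice and may not match the WL method in the paper exactly; the counts and the prevalence are hypothetical.

```python
import math

# Hypothetical case-control counts, keyed by (G, X). NOT data from the paper.
CASES = {(0, 0): 300, (0, 1): 200, (1, 0): 250, (1, 1): 250}
CONTROLS = {(0, 0): 450, (0, 1): 150, (1, 0): 300, (1, 1): 100}
PREVALENCE = 0.10  # assumed external value of P(D=1)


def weighted_table(cases, controls, prevalence):
    """Reweight cells so cases and controls match population proportions."""
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    n_total = n_cases + n_controls
    w_case = prevalence / (n_cases / n_total)
    w_control = (1.0 - prevalence) / (n_controls / n_total)
    return {cell: w_case * cases[cell] + w_control * controls[cell]
            for cell in cases}


def log_odds_ratio(table):
    """Log odds ratio relating X to G in a 2x2 table keyed by (G, X)."""
    return math.log((table[(1, 1)] * table[(0, 0)])
                    / (table[(1, 0)] * table[(0, 1)]))


pooled = {cell: CASES[cell] + CONTROLS[cell] for cell in CASES}
beta1_naive = log_odds_ratio(pooled)
beta1_weighted = log_odds_ratio(weighted_table(CASES, CONTROLS, PREVALENCE))

print(f"naive pooled log OR:     {beta1_naive:.3f}")
print(f"prevalence-weighted log OR: {beta1_weighted:.3f}")
```

Changing `PREVALENCE` changes the weighted estimate, which gives a quick way to see how sensitive β1 is to misspecification of P(D=1).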
18.  Discriminatory Accuracy From Single-Nucleotide Polymorphisms in Models to Predict Breast Cancer Risk 
One purpose for seeking common alleles that are associated with disease is to use them to improve models for projecting individualized disease risk. Two genome-wide association studies and a study of candidate genes recently identified seven common single-nucleotide polymorphisms (SNPs) that were associated with breast cancer risk in independent samples. These seven SNPs were located in FGFR2, TNRC9 (now known as TOX3), MAP3K1, LSP1, CASP8, chromosomal region 8q, and chromosomal region 2q35. I used estimates of relative risks and allele frequencies from these studies to estimate how much these SNPs could improve discriminatory accuracy measured as the area under the receiver operating characteristic curve (AUC). A model with these seven SNPs (AUC = 0.574) and a hypothetical model with 14 such SNPs (AUC = 0.604) have less discriminatory accuracy than a model, the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT), that is based on ages at menarche and at first live birth, family history of breast cancer, and history of breast biopsy examinations (AUC = 0.607). Adding the seven SNPs to BCRAT improved discriminatory accuracy to an AUC of 0.632, which was, however, less than the improvement from adding mammographic density. Thus, these seven common alleles provide less discriminatory accuracy than BCRAT but have the potential to improve the discriminatory accuracy of BCRAT modestly. Experience to date and quantitative arguments indicate that a huge increase in the numbers of case patients with breast cancer and control subjects would be required in genome-wide association studies to find enough SNPs to achieve high discriminatory accuracy.
PMCID: PMC2528005  PMID: 18612136
19.  Discriminatory Accuracy from Single-Nucleotide Polymorphisms in Models to Predict Breast Cancer Risk 
One purpose for seeking common alleles that are associated with disease is to use them to improve models for projecting individualized disease risk. Two genome-wide association studies and a study of candidate genes recently identified seven common single-nucleotide polymorphisms (SNPs) that were associated with breast cancer risk in independent samples. These seven SNPs were located in FGFR2, TNRC9, MAP3K1, LSP1, CASP8, chromosomal region 8q, and chromosomal region 2q35. I used estimates of relative risks and allele frequencies from these studies to estimate how much these SNPs could improve discriminatory accuracy measured as the area under the receiver operating characteristic curve (AUC). A model with these seven SNPs (AUC = 0.574) and a hypothetical model with 14 such SNPs (AUC = 0.604) have less discriminatory accuracy than a model, the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT), which is based on ages at menarche and at first live birth, family history of breast cancer, and history of breast biopsy examinations (AUC = 0.607). Adding the seven SNPs to BCRAT improved discriminatory accuracy to an AUC of 0.632, which was, however, less than the improvement from adding mammographic density. Thus, these seven common alleles provide less discriminatory accuracy than BCRAT but have the potential to improve the discriminatory accuracy of BCRAT modestly. Experience to date and quantitative arguments indicate that a huge increase in the numbers of case patients with breast cancer and control subjects would be required in genome-wide association studies to find enough SNPs to achieve high discriminatory accuracy.
PMCID: PMC2528005  PMID: 18612136
20.  The association between the upper digestive tract microbiota by HOMIM and oral health in a population-based study in Linxian, China 
BMC Public Health  2014;14:1110.
Bacteria affect oral health, but few studies have systematically examined the role of bacterial communities in oral diseases. We examined this relationship in a large population-based Chinese cancer screening cohort.
Human Oral Microbe Identification Microarrays were used to test for the presence of 272 human oral bacterial species (97 genera) in upper digestive tract (UDT) samples collected from 659 participants. Oral health was assessed using US NHANES (National Health and Nutrition Examination Survey) protocols. We assessed both dental health (total teeth missing; tooth decay; and the decayed, missing, and filled teeth (DMFT) score) and periodontal health (bleeding on probing (BoP) extent score, loss of attachment extent score, and a periodontitis summary estimate).
Microbial richness, estimated by number of genera per sample, was positively correlated with BoP score (P = 0.015), but negatively correlated with tooth decay and DMFT score (P = 0.008 and 0.022 respectively). Regarding β-diversity, as estimated by the UniFrac distance matrix for pairwise differences among samples, at least one of the first three principal components of the UniFrac distance matrix was correlated with the number of missing teeth, tooth decay, DMFT, BoP, or periodontitis. Of the examined genera, Parvimonas was positively associated with BoP and periodontitis. Veillonellaceae [G-1] was associated with a high DMFT score, and Filifactor and Peptostreptococcus were associated with a low DMFT score.
Our results suggest distinct relationships between UDT microbiota and dental and periodontal health. Poor dental health was associated with less microbial diversity, whereas poor periodontal health was associated with more diversity and the presence of potentially pathogenic species.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2458-14-1110) contains supplementary material, which is available to authorized users.
PMCID: PMC4223728  PMID: 25348940
Microbiota; Oral health; Dental caries; Periodontitis; Bleeding on probe; Attachment loss
21.  Relationships between computer-extracted mammographic texture pattern features and BRCA1/2 mutation status: a cross-sectional study 
Mammographic density is similar among women at risk of either sporadic or BRCA1/2-related breast cancer. It has been suggested that digitized mammographic images contain computer-extractable information within the parenchymal pattern, which may contribute to distinguishing between BRCA1/2 mutation carriers and non-carriers.
We compared mammographic texture pattern features in digitized mammograms from women with deleterious BRCA1/2 mutations (n = 137) versus non-carriers (n = 100). Subjects were stratified into training (107 carriers, 70 non-carriers) and testing (30 carriers, 30 non-carriers) datasets. Masked to mutation status, texture features were extracted from a retro-areolar region-of-interest in each subject’s digitized mammogram. Stepwise linear regression analysis of the training dataset identified variables to be included in a radiographic texture analysis (RTA) classifier model aimed at distinguishing BRCA1/2 carriers from non-carriers. The selected features were combined using a Bayesian Artificial Neural Network (BANN) algorithm, which produced a probability score rating the likelihood of each subject’s belonging to the mutation-positive group. These probability scores were evaluated in the independent testing dataset to determine whether their distribution differed between BRCA1/2 mutation carriers and non-carriers. A receiver operating characteristic analysis was performed to estimate the model’s discriminatory capacity.
In the testing dataset, a one standard deviation (SD) increase in the probability score from the BANN-trained classifier was associated with a two-fold increase in the odds of predicting BRCA1/2 mutation status: unadjusted odds ratio (OR) = 2.00, 95% confidence interval (CI): 1.59, 2.51, P = 0.02; age-adjusted OR = 1.93, 95% CI: 1.53, 2.42, P = 0.03. Additional adjustment for percent mammographic density did little to change the OR. The area under the curve for the BANN-trained classifier to distinguish between BRCA1/2 mutation carriers and non-carriers was 0.68 for features alone and 0.72 for the features plus percent mammographic density.
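The area under the ROC curve reported above can be read as the probability that a randomly chosen carrier receives a higher classifier score than a randomly chosen non-carrier. The sketch below computes that quantity by direct pairwise comparison. The scores are hypothetical, the function name is an illustrative choice, and this is not the authors' analysis code.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical probability scores; not study data.
carrier_scores = [0.9, 0.8, 0.4]
non_carrier_scores = [0.7, 0.3, 0.2]

auc_value = pairwise_auc(carrier_scores, non_carrier_scores)
print(f"AUC = {auc_value:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why values of 0.68 and 0.72 indicate modest discriminatory capacity.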
Our findings suggest that, unlike percent mammographic density, computer-extracted mammographic texture pattern features are associated with carrying BRCA1/2 mutations. Although still at an early stage, our novel RTA classifier has potential for improving mammographic image interpretation by permitting real-time risk stratification among women undergoing screening mammography.
Electronic supplementary material
The online version of this article (doi:10.1186/s13058-014-0424-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4268674  PMID: 25159706
23.  Feasibility of self-collection of fecal specimens by randomly sampled women for health-related studies of the gut microbiome 
BMC Research Notes  2014;7:204.
The field of microbiome research is growing rapidly. We developed a method for self-collection of fecal specimens that can be used in population-based studies of the gut microbiome. We conducted a pilot study to test the feasibility of our methods among a random sample of healthy, postmenopausal women who are members of Kaiser Permanente Colorado (KPCO). We aimed to collect questionnaire data and fecal and urine specimens from 60 women, aged 55–69, who recently had a normal screening mammogram. We designed the study such that all questionnaire data and specimens could be collected at home.
We mailed an invitation packet, consent form and opt-out postcard to 300 women, then recruited by telephone the women who did not opt out. Women who gave verbal consent were mailed an enrollment package including a risk factor questionnaire, link to an online diet questionnaire, specimen collection kit, and instructions for collecting stool and urine. Specimens were shipped overnight to the biorepository. Of the 300 women mailed an invitation packet, 58 (19%) returned the opt-out postcard. Up to 3 attempts were made to telephone the remaining women, of whom 130 (43%) could not be contacted, 23 (8%) refused, and 12 (4%) were ineligible. Enrollment packages were mailed to 77 women, of whom 59 returned the risk factor questionnaire and specimens. We found no statistically significant differences between enrolled women and those who refused participation or could not be contacted.
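The recruitment counts above can be checked with simple arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are illustrative, and the return rates it prints are derived here rather than reported by the authors.

```python
# Counts as reported in the abstract.
mailed_invitations = 300
opted_out = 58
not_contacted = 130
refused = 23
ineligible = 12
returned_specimens = 59

# Women remaining after all exclusions; the abstract reports 77 packages mailed.
packages_mailed = mailed_invitations - (
    opted_out + not_contacted + refused + ineligible
)


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for label, count in [
    ("opted out", opted_out),
    ("could not be contacted", not_contacted),
    ("refused", refused),
    ("ineligible", ineligible),
]:
    print(f"{label}: {count} ({percent(count, mailed_invitations)}% of invited)")

# Derived rates (not stated in the abstract).
print(f"packages mailed: {packages_mailed}")
print(f"returned, of packages mailed: {percent(returned_specimens, packages_mailed)}%")
print(f"returned, of all invited: {percent(returned_specimens, mailed_invitations)}%")
```

The four exclusion categories and the 77 mailed packages sum to the 300 invitations, so the reported flow is internally consistent.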
We demonstrated that a representative sample of women can be successfully recruited for a gut microbiome study; however, significant personal contact and carefully timed follow-up from the study personnel are required. The methods employed by our study could successfully be applied to analytic studies of a wide range of clinical conditions that have been postulated to be influenced by the gut microbial population.
PMCID: PMC3974920  PMID: 24690120
Study design; Microbiome; Breast cancer
24.  Evaluating breast cancer risk projections for Hispanic women 
Breast cancer research and treatment  2011;132(1):10.1007/s10549-011-1900-9.
For Hispanic women, the Breast Cancer Risk Assessment Tool (BCRAT; “Gail Model”) combines 1990–1996 breast cancer incidence for Hispanic women with relative risks for breast cancer risk factors from non-Hispanic white (NHW) women. BCRAT risk projections have never been comprehensively evaluated for Hispanic women. We compared the relative risks and calibration of BCRAT risk projections between 6,353 Hispanic and 128,976 NHW postmenopausal participants aged 50 and older in the Women’s Health Initiative (WHI). Calibration was assessed by the ratio of the number of breast cancers observed to that expected by the BCRAT (O/E). We re-evaluated calibration for an updated BCRAT that combined BCRAT relative risks with 1993–2007 breast cancer incidence that is contemporaneous with the WHI. Cox regression was used to estimate relative risks. Discriminatory accuracy was assessed using the concordance statistic (AUC). In the WHI Main Study, the BCRAT underestimated the number of breast cancers by 18% in both Hispanics (O/E = 1.18, P = 0.06) and NHWs (O/E = 1.18, P < 0.001). Updating the BCRAT improved calibration for Hispanic women (O/E = 1.08, P = 0.4) and NHW women (O/E = 0.98, P = 0.2). For Hispanic women, relative risks for number of breast biopsies (1.71 vs. 1.27, P = 0.03) and age at first birth (0.97 vs. 1.24, P = 0.02) differed between the WHI and BCRAT. The AUC was higher for Hispanic women than NHW women (0.63 vs. 0.58, P = 0.03). Updating the BCRAT with contemporaneous breast cancer incidence rates improved calibration in the WHI. The modest discriminatory accuracy of the BCRAT for Hispanic women might improve by using risk factor relative risks specific to Hispanic women.
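To make the calibration measure concrete, the sketch below computes an observed-to-expected (O/E) ratio with an approximate 95% confidence interval. The counts are hypothetical, chosen only so that O/E equals 1.18, because the abstract does not report the underlying counts. The interval treats the observed count as Poisson, which is a common approximation and an assumption here, not necessarily the authors' method.

```python
import math


def observed_to_expected(observed, expected, z=1.96):
    """Return (O/E, lower, upper) using a Poisson approximation on log(O)."""
    ratio = observed / expected
    # Approximate standard error of log(O) for a Poisson count is 1/sqrt(O).
    half_width = z / math.sqrt(observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Hypothetical counts; not taken from the study.
ratio, lower, upper = observed_to_expected(observed=236, expected=200.0)

# O/E above 1 means the model predicted fewer cancers than were observed.
excess_percent = (ratio - 1) * 100

print(f"O/E = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Observed count exceeds expected by {excess_percent:.0f}%")
```

An O/E of 1.0 indicates perfect calibration; values above 1 indicate underprediction by the model and values below 1 indicate overprediction.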
PMCID: PMC3827770  PMID: 22147080
Hispanic; Breast cancer; Risk prediction; Risk assessment; BCRAT
25.  Genome-wide association studies of gastric adenocarcinoma and esophageal squamous cell carcinoma identify a shared susceptibility locus in PLCE1 at 10q23 
Nature genetics  2012;44(10):1090-1097.
We conducted a genome-wide association study of gastric cancer (GC) and esophageal squamous cell carcinoma (ESCC) in ethnic Chinese subjects in which we genotyped 551,152 single nucleotide polymorphisms (SNPs). We report a combined analysis of 2,240 GC cases, 2,115 ESCC cases, and 3,302 controls drawn from five studies. In logistic regression models adjusted for age, sex, and study, multiple variants at 10q23 had genome-wide significance for GC and ESCC independently. A notable signal was rs2274223, a nonsynonymous SNP located in PLCE1, for GC (P = 8.40 × 10−10; per-allele odds ratio (OR) = 1.31) and ESCC (P = 3.85 × 10−9; OR = 1.34). The association with GC differed by anatomic subsite. For tumors located in the cardia the association was stronger (P = 4.19 × 10−15; OR = 1.57) and for those located in the noncardia stomach it was absent (P = 0.44; OR = 1.05). Our findings at 10q23 could provide insight into the high incidence rates of both cancers in China.
PMCID: PMC3513832  PMID: 22960999

