Related Articles

1.  AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data 
BMC Genomics  2014;15(Suppl 10):S1.
An optimal combination of markers is commonly sought for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing AUC-based work in a high-dimensional context depends mainly on a non-parametric, smooth approximation of the AUC; no work has used a parametric AUC-based approach for high-dimensional data.
We propose an AUC-based approach using penalized regression (AucPR), a parametric method for obtaining a linear combination of markers that maximizes the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and can thus apply penalized regression directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach avoids some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR to gene selection and classification using four real microarray data sets and synthetic data. In numerical studies, AucPR performs better than penalized logistic regression and the non-parametric AUC-based method in terms of AUC and sensitivity at a given specificity, particularly when there are many correlated genes.
We propose a powerful, parametric, and easily implementable linear classifier, AucPR, for gene selection and disease prediction with high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
PMCID: PMC4304290  PMID: 25559769
AUC; high-dimensional data; penalized regression; ROC curve
2.  A boosting method for maximizing the partial area under the ROC curve 
BMC Bioinformatics  2010;11:314.
The receiver operating characteristic (ROC) curve is a fundamental tool for assessing the discriminant performance of not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability of the score function to discriminate between controls and cases. Recently, the partial AUC (pAUC) has received more attention than the AUC, because it can focus on a range of the false positive rate suited to a given clinical situation. However, existing pAUC-based methods handle only a few markers and do not consider nonlinear combinations of markers.
We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentially for maximizing the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods, and demonstrate the utility using real data sets. As a result, we have much better discrimination performances in the sense of the pAUC in both simulation studies and real data analysis.
The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in a high-dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC, from marker selection to marker combination, for discrimination problems. The method can capture not only linear but also nonlinear associations between the outcome variable and the markers; such nonlinearity is known to be necessary in general for maximizing the pAUC. The method also emphasizes classification accuracy as well as interpretability of the association by offering simple and smooth score plots for each marker.
PMCID: PMC2898798  PMID: 20537139
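To make the distinction between the AUC and the pAUC concrete, the following minimal sketch computes an empirical ROC curve and integrates it only up to a chosen false positive rate. It is an illustration of the quantity being maximized, not the boosting method of the abstract; the function names, the trapezoidal integration, and the toy scores are assumptions introduced here.

```python
def roc_points(case_scores, control_scores):
    """Empirical ROC points (fpr, tpr), ordered by increasing fpr.

    A subject is called positive when its score is >= the threshold.
    """
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


def partial_auc(points, fpr_max=1.0):
    """Trapezoidal area under the ROC curve for fpr in [0, fpr_max].

    With fpr_max=1.0 this is the ordinary empirical AUC.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Clip the last segment at fpr_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores (hypothetical): higher score means more likely to be a case.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
pts = roc_points(cases, controls)
full_auc = partial_auc(pts)                # whole curve
low_fpr_auc = partial_auc(pts, fpr_max=0.25)  # only fpr <= 0.25
```

Restricting `fpr_max` discards the part of the curve at false positive rates that would be clinically unacceptable, which is why two score functions with similar AUC can differ markedly in pAUC.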
3.  Minimalist ensemble algorithms for genome-wide protein localization prediction 
BMC Bioinformatics  2012;13:157.
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.
This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at
PMCID: PMC3426488  PMID: 22759391
Protein subcellular localization; Ensemble algorithms; Classifiers; Logistic regression
4.  The discriminative ability of FRAX, the WHO algorithm, to identify women with prevalent asymptomatic vertebral fractures: a cross-sectional study 
A Moroccan model for the FRAX tool to determine the absolute risk of osteoporotic fracture at 10 years has been established recently. The study aimed to assess the discriminative capacity of FRAX in identifying women with prevalent asymptomatic vertebral fractures (VFs).
In this cross-sectional study, we enrolled 908 post-menopausal women with a mean age of 60.9 ± 7.7 years (50 to 91) and no prior known diagnosis of osteoporosis. Subjects were recruited from asymptomatic women selected from the general population. Lateral VFA images and scans of the lumbar spine and proximal femur were obtained using a GE Healthcare Lunar Prodigy densitometer. VFs were defined using a combination of the Genant semiquantitative (SQ) approach and morphometry. We calculated the absolute risk of major fracture and hip fracture with and without bone mineral density (BMD) using the FRAX website. The overall discriminative value of the different risk scores was assessed by calculating the areas under the ROC curve (AUC).
VFA images showed that 179 of the participants (19.7%) had at least one grade 2/3 VF. Women with VFs had significantly higher FRAX scores for major and hip fractures, with and without BMD, and lower weight, height, and lumbar spine and hip BMD and T-scores than those without a VFA-identified VF. The AUC of FRAX for major fracture was 0.757 (95% CI 0.718-0.797) without BMD and 0.736 (95% CI 0.695-0.777) with BMD; for FRAX hip fracture it was 0.756 (95% CI 0.716-0.796) without BMD and 0.747 (95% CI 0.709-0.785) with BMD. The AUCs of the lumbar spine T-score and femoral neck T-score were 0.660 (95% CI 0.611-0.708) and 0.707 (95% CI 0.664-0.751), respectively.
In asymptomatic post-menopausal women, the FRAX risk for major fracture without BMD had a better discriminative capacity for identifying women with prevalent VFs than lumbar spine and femoral neck T-scores, suggesting its usefulness in identifying women in whom VFA could be indicated.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2474-15-365) contains supplementary material, which is available to authorized users.
PMCID: PMC4226884  PMID: 25366306
FRAX; Bone density; Female; Vertebral fractures; VFA; DXA; Bone; Osteoporosis; Postmenopausal; Menopause; Risk factors; Sensitivity and specificity
5.  On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve 
BMC Bioinformatics  2007;8:477.
Novel molecular and statistical methods are in rising demand for disease diagnosis and prognosis, aided by recent advances in biotechnology. High-resolution mass spectrometry (MS) is one such biotechnology that is highly promising for improving health outcomes. Previous studies have identified proteomics biomarkers that can distinguish healthy individuals from cancer patients using MS data. In this paper, an MS study is presented that uses glycomics to identify ovarian cancer. Glycomics is the study of glycans and glycoproteins. The glycans on proteins may differ between a cancer cell and a normal cell, and the difference may be detectable in the blood. High-resolution MS has been applied to measure relative abundances of potential glycan biomarkers in human serum. Multiple potential glycan biomarkers are measured in MS spectra. With the objective of maximizing the empirical area under the ROC curve (AUC), an analysis method was considered that combines potential glycan biomarkers for the diagnosis of cancer.
Maximizing the empirical AUC of glycomics MS data is a high-dimensional optimization problem. The technical difficulty is that the empirical AUC function is not continuous; it is in fact an empirical 0–1 loss function with a large number of linear predictors. An approach was investigated that regularizes the area under the ROC curve while replacing the 0–1 loss function with a smooth surrogate function. The constrained threshold gradient descent regularization algorithm was applied, with the regularization parameters chosen by cross-validation and the confidence intervals of the regression parameters estimated by the bootstrap. The method is called the TGDR-AUC algorithm. The properties of the approach were studied through a numerical simulation study that incorporates the positivity of mass spectrometry data and the correlations between measurements within a person. The simulation demonstrated the asymptotic property that the estimated AUC approaches the true AUC. Finally, mass spectrometry data on serum glycans for ovarian cancer diagnosis were analyzed. The optimal combination based on the TGDR-AUC algorithm yields plausible results, and the detected biomarkers are confirmed by biological evidence.
The TGDR-AUC algorithm relaxes the normality and independence assumptions of the previous literature. In addition to its flexibility and easy interpretability, the algorithm performs well in combining potential biomarkers and is computationally feasible. Thus, TGDR-AUC is a plausible algorithm for classifying disease status on the basis of multiple biomarkers.
PMCID: PMC2211327  PMID: 18076765
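The idea of replacing the discontinuous 0–1 loss in the empirical AUC with a smooth surrogate can be sketched as follows. This is only an illustration of the general idea: the sigmoid surrogate, the bandwidth `h`, the function names, and the toy data are assumptions made here, and the abstract does not state which surrogate TGDR-AUC uses. No regularization or gradient descent step is shown.

```python
import math


def linear_score(beta, x):
    """Linear combination of biomarker values x with coefficients beta."""
    return sum(b * xi for b, xi in zip(beta, x))


def empirical_auc(beta, cases, controls):
    """Fraction of (case, control) pairs ranked correctly: a 0-1 loss
    averaged over pairs, hence a step function of beta."""
    correct = sum(
        1.0
        for c in cases
        for d in controls
        if linear_score(beta, c) > linear_score(beta, d)
    )
    return correct / (len(cases) * len(controls))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_auc(beta, cases, controls, h=0.1):
    """Smooth surrogate: the indicator I(diff > 0) is replaced by
    sigmoid(diff / h), which is differentiable in beta.
    Smaller h (assumed bandwidth) tracks the step function more closely."""
    total = sum(
        _sigmoid((linear_score(beta, c) - linear_score(beta, d)) / h)
        for c in cases
        for d in controls
    )
    return total / (len(cases) * len(controls))


# Toy data (hypothetical): two biomarkers per subject.
cases = [[2.0, 1.0], [1.5, 0.5]]
controls = [[1.0, 0.5], [1.0, 0.0], [2.0, 0.5]]
beta = [1.0, 1.0]
auc_step = empirical_auc(beta, cases, controls)
auc_smooth = smoothed_auc(beta, cases, controls, h=0.01)
```

Because the surrogate is differentiable, gradient-based procedures can be applied to it, whereas the step-function AUC has zero gradient almost everywhere.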
6.  A Comparison of Prediction Models for Fractures in Older Women: Is More Better? 
Archives of internal medicine  2009;169(22):2087-2094.
A web-based risk assessment tool (FRAX®) using clinical risk factors with and without femoral neck bone mineral density (BMD) has been incorporated into clinical guidelines regarding treatment to prevent fractures. Our objective is to determine whether prediction with FRAX® models is superior to that based on parsimonious models.
We conducted a prospective cohort study in 6252 women aged ≥65 years and compared the value of FRAX® models that include BMD to parsimonious models based on age and BMD alone for prediction of fractures. We also compared FRAX® models without BMD to simple models based on age and fracture history alone. Fractures (hip, major osteoporotic [hip, clinical vertebral, wrist, or humerus], and any clinical fracture) were ascertained during 10 years of follow-up. Area under the curve (AUC) statistics from receiver operating characteristic (ROC) curve analysis were compared between FRAX® models and simple models.
AUC comparisons revealed no differences between FRAX® models with BMD versus simple models with age and BMD alone in discriminating hip (AUC=0.75 for FRAX® model and 0.76 for simple model, p=0.26); major osteoporotic (AUC=0.68 for FRAX® model and 0.69 for simple model, p=0.51); or clinical fracture (AUC=0.64 for FRAX® model and 0.63 for simple model, p=0.16). Similarly, performance of parsimonious models containing age and fracture history alone was nearly identical to that of FRAX® models without BMD. The proportion of women in each quartile of predicted risk who actually experienced a fracture outcome did not differ between FRAX® and simple models (p≥0.16).
Simple models based on age and BMD alone or age and fracture history alone predicted 10-year risk of hip, major osteoporotic, and clinical fracture as well as more complex FRAX® models.
PMCID: PMC2811407  PMID: 20008691
7.  Assessment of Alveolar Bone Mineral Density as a Predictor of Lumbar Fracture Probability 
Advances in Therapy  2013;30(5):487-502.
Osteoporosis and tooth loss have been linked with advancing age, but no clear relationship between these conditions has been proven. Several studies of bone mineral density measurements of the jaw and spine have shown similarities in their rate of age-related deterioration. Thus, measurements of jawbone density may predict lumbar vertebral bone density. Using jawbone density as a proxy marker would circumvent the need for lumbar bone measurements and facilitate prediction of osteoporotic spinal fracture susceptibility at dental clinics. We aimed to characterize the correlation between bone density in the jaw and spine and the incidence of osteoporotic spinal fractures.
We used computerized radiogrammetry to measure alveolar bone mineral density (al-BMD) and dual-energy X-ray absorptiometry to measure lumbar bone mineral density (L-BMD). L-BMD and al-BMD in 30 female patients (average age: 59 ± 5 years) were correlated with various patient attributes. Statistical analysis included area under the curve (AUC) and probability of asymptomatic significance (PAS) in a receiver operating characteristic curve. The predictive strength of L-BMD T-scores (L-BMD[T]) and al-BMD measurements for fracture occurrence was then compared using multivariate analysis with category weight scoring.
L-BMD and al-BMD were significantly correlated with age, years since menopause, and alveolar bone thickness. Both were also negatively correlated with fracture incidence. Category weight scores were −0.275 for a L-BMD(T) <80%; +0.183 for a L-BMD(T) ≥80%; −0.860 for al-BMD <84.9 (brightness); and +0.860 for al-BMD ≥84.9. AUC and PAS analyses suggested that al-BMD had a higher association with fracture occurrence than L-BMD.
Our results suggest the possible association between al-BMD and vertebral fracture risk. Assessment of alveolar bone density may be useful in patients receiving routine dental exams to monitor the clinical picture and the potential course of osteoporosis in patients who may be at a higher risk of developing osteoporosis.
PMCID: PMC3680661  PMID: 23674163
Alveolar; Bone mineral density; Computerized; Fracture; Lumbar; Osteoporosis; Periodontitis; Predictive; Radiogrammetry
8.  FRAX® tool, the WHO algorithm to predict osteoporotic fractures: the first analysis of its discriminative and predictive ability in the Spanish FRIDEX cohort 
The WHO has recently published the FRAX® tool to determine the absolute risk of osteoporotic fracture at 10 years. This tool has not yet been validated in Spain.
A prospective observational study was undertaken in women in the FRIDEX cohort (Barcelona) not receiving bone-active drugs at baseline. Baseline measurements included known risk factors, including those of FRAX®, and a DXA scan. Follow-up data on incident major fractures (hip, spine, humerus and wrist) were self-reported and verified against patient records. The absolute risk of major fracture and hip fracture was calculated using the FRAX® website. This work follows the guidelines of the STROBE initiative for cohort studies. The discriminative capacity of FRAX® was analyzed by the area under the curve (AUC) of the receiver operating characteristic (ROC) and the Hosmer-Lemeshow goodness-of-fit test. The predictive capacity was determined using the ratio of observed fractures to fractures expected by FRAX® (ObsFx/ExpFx).
The study subjects were 770 women aged 40 to 90 years in the FRIDEX cohort. The mean age was 56.8 ± 8 years. Fractures were determined by structured telephone questionnaire and subsequent verification in medical records at 10 years. Sixty-five (8.4%) women presented major fractures (17 hip fractures). Women with fractures were older and had more previous fractures, more cases of rheumatoid arthritis and more osteoporosis on the baseline DXA. The AUC ROC of FRAX® for major fracture was 0.693 (CI 95%; 0.622-0.763) without bone mineral density (BMD) and 0.716 (CI 95%; 0.646-0.786) with the femoral neck (FN) T-score; for hip fracture, the corresponding values were 0.888 (CI 95%; 0.824-0.952) and 0.849 (CI 95%; 0.737-0.962). For the model with BMD alone, the values were 0.661 (CI 95%; 0.583-0.739) and 0.779 (CI 95%; 0.631-0.929). For the model with age alone, they were 0.668 (CI 95%; 0.603-0.733) and 0.882 (CI 95%; 0.832-0.936). In neither case was there a significant difference from the FRAX® model. The ObsFx/ExpFx ratio without BMD was 2.4 for major fracture and 2.8 for hip fracture; with BMD, it was 2.2 and 2.3, respectively. Sensitivity of the four was always less than 50%. The Hosmer-Lemeshow test showed a good correlation only after calibration with the ObsFx/ExpFx ratio.
The current version of FRAX® for Spanish women without BMD, analysed by the AUC ROC, demonstrates a poor discriminative capacity to predict major fractures but a good discriminative capacity for hip fractures. Its predictive capacity is not well adjusted, leading to underdiagnosis for both major and hip fractures. Simple models based only on age or on BMD alone predicted fractures similarly to the more complex FRAX® models.
PMCID: PMC3518201  PMID: 23088223
9.  Comparison of Candidate Serologic Markers for Type I and Type II Ovarian Cancer 
Gynecologic oncology  2011;122(3):560-566.
To examine the value of individual and combinations of ovarian cancer associated blood biomarkers for the discrimination between plasma of patients with type I or II ovarian cancer and disease-free volunteers.
Levels of 14 currently promising ovarian cancer-related biomarkers, including CA125, macrophage inhibitory factor-1 (MIF-1), leptin, prolactin, osteopontin (OPN), insulin-like growth factor-II (IGF-II), autoantibodies (AAbs) to eight proteins: p53, NY-ESO-1, p16, ALPP, CTSD, B23, GRP78, and SSX, were measured in the plasma of 151 ovarian cancer patients, 23 with borderline ovarian tumors, 55 with benign tumors and 75 healthy controls.
When examined individually, seven candidate biomarkers (MIF, Prolactin, CA-125, OPN, Leptin, IGF-II and p53 AAbs) had significantly different plasma levels between type II ovarian cancer patients and healthy controls. Based on the receiver operating characteristic (ROC) curves constructed and area under the curve (AUC) calculated, CA125 exhibited the greatest power to discriminate the plasma samples of type II cancer patients from normal volunteers (AUC 0.9310), followed by IGF-II (AUC 0.8514), OPN (AUC 0.7888), leptin (AUC 0.7571), prolactin (AUC 0.7247), p53 AAbs (AUC 0.7033), and MIF (AUC 0.6992). p53 AAbs levels exhibited the lowest correlation with CA125 levels among the six markers, suggesting the potential of p53 AAbs as a biomarker independent of CA125. Indeed, p53 AAbs increased the AUC of ROC curve to the greatest extent when combining CA125 with one of the other markers. At a fixed specificity of 100%, the addition of p53 AAbs to CA125 increased sensitivity from 73.8% to 85.7% to discriminate type II cancer patients from normal controls. Notably, seropositivity of p53 AAbs is comparable in type II ovarian cancer patients with negative and positive CA125, but has no value for type I ovarian cancer patients.
p53 AAbs might be a useful blood-based biomarker for the detection of type II ovarian cancer, especially when combined with CA125 levels.
PMCID: PMC3152615  PMID: 21704359
10.  Utilization of DXA Bone Mineral Densitometry in Ontario 
Executive Summary
Systematic reviews and analyses of administrative data were performed to determine the appropriate use of bone mineral density (BMD) assessments using dual energy x-ray absorptiometry (DXA), and the associated trends in wrist and hip fractures in Ontario.
Dual Energy X-ray Absorptiometry Bone Mineral Density Assessment
Dual energy x-ray absorptiometry bone densitometers measure bone density based on differential absorption of 2 x-ray beams by bone and soft tissues. It is the gold standard for detecting and diagnosing osteoporosis, a systemic disease characterized by low bone density and altered bone structure, resulting in low bone strength and increased risk of fractures. The test is fast (approximately 10 minutes) and accurate (exceeds 90% at the hip), with low radiation (1/3 to 1/5 of that from a chest x-ray). DXA densitometers are licensed as Class 3 medical devices in Canada. The World Health Organization has established criteria for osteoporosis and osteopenia based on DXA BMD measurements: osteoporosis is defined as a BMD that is more than 2.5 standard deviations below the mean BMD for normal young adults (i.e., T-score < –2.5), while osteopenia is defined as a BMD that is more than 1 standard deviation but no more than 2.5 standard deviations below the mean for normal young adults (i.e., –2.5 ≤ T-score < –1). DXA densitometry is presently an insured health service in Ontario.
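The T-score cut-offs described above can be written out as a short sketch. The function names and the reference mean and standard deviation in the example are hypothetical placeholders, not values from this review; real reference values depend on the skeletal site, device, and reference population.

```python
def t_score(bmd, young_adult_mean, young_adult_sd):
    """Number of standard deviations by which a BMD measurement differs
    from the mean BMD of normal young adults."""
    return (bmd - young_adult_mean) / young_adult_sd


def classify_t_score(t):
    """Apply the cut-offs as stated in the text:
    T < -2.5 osteoporosis; -2.5 <= T < -1 osteopenia; otherwise normal."""
    if t < -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Hypothetical example: BMD in g/cm^2 with an assumed reference mean and SD.
example_t = t_score(bmd=0.80, young_adult_mean=1.00, young_adult_sd=0.10)
example_class = classify_t_score(example_t)
```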
Clinical Need
Burden of Disease
The Canadian Multicenter Osteoporosis Study (CaMos) found that 16% of Canadian women and 6.6% of Canadian men have osteoporosis based on the WHO criteria, with prevalence increasing with age. Osteopenia was found in 49.6% of Canadian women and 39% of Canadian men. It is estimated that nearly 530,000 Ontarians have some degree of osteoporosis. Osteoporosis-related fragility fractures occur most often in the wrist, femur and pelvis. These fractures, particularly those in the hip, are associated with increased mortality and with decreased functional capacity and quality of life. A Canadian study showed that at 1 year after a hip fracture, the mortality rate was 20%. Another 20% required institutional care, 40% were unable to walk independently, and there was lower health-related quality of life due to attributes such as pain, decreased mobility and decreased ability to self-care. The cost of osteoporosis and osteoporotic fractures in Canada was estimated to be $1.3 billion in 1993.
Guidelines for Bone Mineral Density Testing
With 2 exceptions, almost all guidelines address only women. None of the guidelines recommend blanket population-based BMD testing. Instead, all guidelines recommend BMD testing in people at risk of osteoporosis, predominantly women aged 65 years or older. For women under 65 years of age, BMD testing is recommended only if one major or two minor risk factors for osteoporosis exist. Osteoporosis Canada did not restrict its recommendations to women, and thus their guidelines apply to both sexes. Major risk factors are age greater than or equal to 65 years, a history of previous fractures, family history (especially parental history) of fracture, and medication or disease conditions that affect bone metabolism (such as long-term glucocorticoid therapy). Minor risk factors include low body mass index, low calcium intake, alcohol consumption, and smoking.
Current Funding for Bone Mineral Density Testing
The Ontario Health Insurance Program (OHIP) Schedule presently reimburses DXA BMD at the hip and spine. Measurements at both sites are required if feasible. Patients at low risk of accelerated bone loss are limited to one BMD test within any 24-month period, but there are no restrictions on people at high risk. The total fee including the professional and technical components for a test involving 2 or more sites is $106.00 (Cdn).
Method of Review
This review consisted of 2 parts. The first part was an analysis of Ontario administrative data relating to DXA BMD, wrist and hip fractures, and use of antiresorptive drugs in people aged 65 years and older. The Institute for Clinical Evaluative Sciences extracted data from the OHIP claims database, the Canadian Institute for Health Information hospital discharge abstract database, the National Ambulatory Care Reporting System, and the Ontario Drug Benefit database using OHIP and ICD-10 codes. The data was analyzed to examine the trends in DXA BMD use from 1992 to 2005, and to identify areas requiring improvement.
The second part included systematic reviews and analyses of evidence relating to issues identified in the analyses of utilization data. Altogether, 8 reviews and qualitative syntheses were performed, consisting of 28 published systematic reviews and/or meta-analyses, 34 randomized controlled trials, and 63 observational studies.
Findings of Utilization Analysis
Analysis of administrative data showed a 10-fold increase in the number of BMD tests in Ontario between 1993 and 2005.
OHIP claims for BMD tests are presently increasing at a rate of 6 to 7% per year. Approximately 500,000 tests were performed in 2005/06 with an age-adjusted rate of 8,600 tests per 100,000 population.
Women accounted for 90% of all BMD tests performed in the province.
In 2005/06, there was a 2-fold variation in the rate of DXA BMD tests across local integrated health networks, but a 10-fold variation between the county with the highest rate (Toronto) and that with the lowest rate (Kenora). The analysis also showed that:
With the increased use of BMD testing, there was a concomitant increase in the use of antiresorptive drugs (as shown in people 65 years and older) and a decrease in the rate of hip fractures in people aged 50 years and older.
Repeat BMD made up approximately 41% of all tests. Most of the people (>90%) who had annual BMD tests in a 2-year or 3-year period were coded as being at high risk for osteoporosis.
18% (20,865) of the people who had a repeat BMD within a 24-month period and 34% (98,058) of the people who had one BMD test in a 3-year period were under 65 years, had no fracture in the year, and were coded as low risk.
Only 19% of people older than 65 years underwent BMD testing and 41% received osteoporosis treatment during the year following a fracture.
Men accounted for 24% of all hip fractures and 21% of all wrist fractures, but only 10% of BMD tests. The rates of BMD tests and treatment in men after a fracture were only half of those in women.
In both men and women, the rate of hip and wrist fractures mainly increased after age 65 with the sharpest increase occurring after age 80 years.
Findings of Systematic Review and Analysis
Serial Bone Mineral Density Testing for People Not Receiving Osteoporosis Treatment
A systematic review showed that the mean rate of bone loss in people not receiving osteoporosis treatment (including postmenopausal women) is generally less than 1% per year. Higher rates of bone loss were reported for people with disease conditions or on medications that affect bone metabolism. To be considered a genuine biological change, the change in BMD between serial measurements must exceed the least significant change (variability) of the testing, which ranges from 2.77% to 8% for precisions ranging from 1% to 3%, respectively. Progression in BMD was analyzed using different baseline BMD values, rates of bone loss, precisions, and BMD values for initiating treatment. The analyses showed that serial BMD measurements every 24 months (as per OHIP policy for low-risk individuals) are not necessary for people with no major risk factors for osteoporosis, provided that the baseline BMD is normal (T-score ≥ –1) and the rate of bone loss is less than or equal to 1% per year. For someone with a normal baseline BMD and a rate of bone loss of less than 1% per year, the change in BMD is not likely to exceed the least significant change (even for a 1% precision) in less than 3 years after the baseline test, and the BMD is not likely to drop to a level that requires initiation of treatment in less than 16 years after the baseline test.
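The arithmetic behind these figures can be sketched under two stated assumptions: that the least significant change (LSC) is taken at 95% confidence as 1.96 × √2 × precision (about 2.77 × precision, which matches the 2.77% quoted for a 1% precision), and that bone loss is linear at a constant annual rate. The function names are illustrative, and the review's own progression analysis may have been more detailed.

```python
import math


def least_significant_change(precision_pct, z=1.96):
    """LSC (%) for a given measurement precision (%), assuming two
    independent measurements and a two-sided 95% confidence level:
    LSC = z * sqrt(2) * precision, roughly 2.77 * precision."""
    return z * math.sqrt(2.0) * precision_pct


def years_to_exceed_lsc(precision_pct, annual_loss_pct):
    """Years until cumulative bone loss exceeds the LSC, assuming a
    constant (linear) annual rate of loss."""
    return least_significant_change(precision_pct) / annual_loss_pct


lsc_1 = least_significant_change(1.0)      # about 2.77%
lsc_3 = least_significant_change(3.0)      # about 8.3%
years_at_1pct = years_to_exceed_lsc(1.0, 1.0)
```

Under these assumptions, a loss of exactly 1% per year measured with 1% precision takes a little under 3 years to exceed the LSC, and any slower rate takes longer, which is in line with the statement that a 24-month retest interval is unlikely to detect a genuine change in low-risk individuals.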
Serial Bone Mineral Density Testing in People Receiving Osteoporosis Therapy
Seven published meta-analyses of randomized controlled trials (RCTs) and 2 recent RCTs on BMD monitoring during osteoporosis therapy showed that although higher increases in BMD were generally associated with reduced risk of fracture, the change in BMD explained only a small percentage of the fracture risk reduction.
Studies showed that some people with small or no increase in BMD during treatment experienced significant fracture risk reduction, indicating that other factors such as improved bone microarchitecture might have contributed to fracture risk reduction.
There is conflicting evidence relating to the role of BMD testing in improving patient compliance with osteoporosis therapy.
Even though BMD may not be a perfect surrogate for reduction in fracture risk when monitoring responses to osteoporosis therapy, experts advised that it is still the only reliable test available for this purpose.
A systematic review conducted by the Medical Advisory Secretariat showed that the magnitude of increases in BMD during osteoporosis drug therapy varied among medications. Although most of the studies yielded mean percentage increases in BMD from baseline that did not exceed the least significant change for a 2% precision after 1 year of treatment, there were some exceptions.
Bone Mineral Density Testing and Treatment After a Fragility Fracture
A review of 3 published pooled analyses of observational studies and 12 prospective population-based observational studies showed that the presence of any prevalent fracture increases the relative risk for future fractures by approximately 2-fold or more. A review of 10 systematic reviews of RCTs and 3 additional RCTs showed that therapy with antiresorptive drugs significantly reduced the risk of vertebral fractures by 40 to 50% in postmenopausal osteoporotic women and osteoporotic men, and 2 antiresorptive drugs also reduced the risk of nonvertebral fractures by 30 to 50%. Evidence from observational studies in Canada and other jurisdictions suggests that patients who had undergone BMD measurements, particularly if a diagnosis of osteoporosis is made, were more likely to be given pharmacologic bone-sparing therapy. Despite these findings, the rate of BMD investigation and osteoporosis treatment after a fracture remained low (<20%) in Ontario as well as in other jurisdictions.
Bone Mineral Density Testing in Men
There are presently no specific Canadian guidelines for BMD screening in men. A review of the literature suggests that risk factors for fracture and the rate of vertebral deformity are similar for men and women, but the mortality rate after a hip fracture is higher in men compared with women. Two bisphosphonates had been shown to reduce the risk of vertebral and hip fractures in men. However, BMD testing and osteoporosis treatment were proportionately low in Ontario men in general, and particularly after a fracture, even though men accounted for 25% of the hip and wrist fractures. The Ontario data also showed that the rates of wrist fracture and hip fracture in men rose sharply in the 75- to 80-year age group.
Ontario-Based Economic Analysis
The economic analysis focused on analyzing the economic impact of decreasing future hip fractures by increasing the rate of BMD testing in men and women age greater than or equal to 65 years following a hip or wrist fracture. A decision analysis showed the above strategy, especially when enhanced by improved reporting of BMD tests, to be cost-effective, resulting in a cost-effectiveness ratio ranging from $2,285 (Cdn) per fracture avoided (worst-case scenario) to $1,981 (Cdn) per fracture avoided (best-case scenario). A budget impact analysis estimated that shifting utilization of BMD testing from the low risk population to high risk populations within Ontario would result in a saving of $0.85 million to $1.5 million (Cdn) to the health system. The potential net saving was estimated at $1.2 million to $5 million (Cdn) when the downstream cost-avoidance due to prevention of future hip fractures was factored into the analysis.
Other Factors for Consideration
There is a lack of standardization for BMD testing in Ontario. Two different standards are presently being used and experts suggest that variability in results from different facilities may lead to unnecessary testing. There is also no requirement for standardized equipment, procedure or reporting format. The current reimbursement policy for BMD testing encourages serial testing in people at low risk of accelerated bone loss. This review showed that biennial testing is not necessary for all cases. The lack of a database to collect clinical data on BMD testing makes it difficult to evaluate the clinical profiles of patients tested and outcomes of the BMD tests. There are ministry initiatives in progress under the Osteoporosis Program to address the development of a mandatory standardized requisition form for BMD tests to facilitate data collection and clinical decision-making. Work is also underway for developing guidelines for BMD testing in men and in perimenopausal women.
Increased use of BMD in Ontario since 1996 appears to be associated with increased use of antiresorptive medication and a decrease in hip and wrist fractures.
Data suggest that as many as 20% (98,000) of the DXA BMD tests in Ontario in 2005/06 were performed in people aged less than 65 years, with no fracture in the current year, and coded as being at low risk for accelerated bone loss; this is not consistent with current guidelines. Even though some of these people might have been incorrectly coded as low-risk, the number of tests in people truly at low risk could still be substantial.
Approximately 4% (21,000) of the DXA BMD tests in 2005/06 were repeat BMDs in low-risk individuals within a 24-month period. Even though this is in compliance with current OHIP reimbursement policies, evidence showed that biennial serial BMD testing is not necessary in individuals without major risk factors for fractures, provided that the baseline BMD is normal (T-score ≥ –1). In this population, BMD measurements may be repeated 3 to 5 years after the baseline test to establish the rate of bone loss, and further serial BMD tests may not be necessary for another 7 to 10 years if the rate of bone loss is no more than 1% per year. Precision of the test needs to be considered when interpreting serial BMD results.
Although changes in BMD may not be the perfect surrogate for reduction in fracture risk as a measure of response to osteoporosis treatment, experts advised that it is presently the only reliable test for monitoring response to treatment and to help motivate patients to continue treatment. Patients should not discontinue treatment if there is no increase in BMD after the first year of treatment. Lack of response or bone loss during treatment should prompt the physician to examine whether the patient is taking the medication appropriately.
Men and women who have had a fragility fracture at the hip, spine, wrist or shoulder are at increased risk of having a future fracture, but this population is presently underinvestigated and undertreated. Additional efforts have to be made to communicate to physicians (particularly orthopaedic surgeons and family physicians) and the public about the need for a BMD test after fracture, and for initiating treatment if low BMD is found.
Men had a disproportionately low rate of BMD tests and osteoporosis treatment, especially after a fracture. Evidence and fracture data showed that the risk of hip and wrist fractures in men rises sharply at age 70 years.
Some counties had BMD utilization rates that were only 10% of that of the county with the highest utilization. The reasons for low utilization need to be explored and addressed.
Initiatives such as aligning reimbursement policy with current guidelines, developing specific guidelines for BMD testing in men and perimenopausal women, improving BMD reports to assist in clinical decision making, developing a registry to track BMD tests, improving access to BMD tests in remote/rural counties, establishing mechanisms to alert family physicians of fractures, and educating physicians and the public, will improve the appropriate utilization of BMD tests, and further decrease the rate of fractures in Ontario. Some of these initiatives such as developing guidelines for perimenopausal women and men, and developing a standardized requisition form for BMD testing, are currently in progress under the Ontario Osteoporosis Strategy.
PMCID: PMC3379167  PMID: 23074491
11.  Clinical performance of osteoporosis risk assessment tools in women aged 67 years and older 
Clinical performance of osteoporosis risk assessment tools was studied in women aged 67 years and older. Weight was as accurate as two of the tools to detect low bone density. Discriminatory ability was slightly better for the OST risk tool, which is based only on age and weight.
Screening performance of osteoporosis risk assessment tools has not been tested in a large, population-based US cohort.
We conducted a diagnostic accuracy analysis of the Osteoporosis Self-assessment Tool (OST), Osteoporosis Risk Assessment Instrument (ORAI), Simple Calculated Osteoporosis Risk Estimation (SCORE), and individual risk factors (age, weight or prior fracture) to identify low central (hip and lumbar spine) bone mineral density (BMD) in 7779 US women aged 67 years and older participating in the Study of Osteoporotic Fractures.
The OST had the greatest area under the receiver operating characteristic curve (AUC 0.76, 95% CI 0.74, 0.77). Weight had an AUC of 0.73 (95% CI 0.72, 0.75), which was greater than or equal to the AUC values for the ORAI, SCORE, age or prior fracture. Using cut points from the development papers, the risk tools had sensitivities ≥85% and specificities ≤48%. When new cut points were set to achieve a negative likelihood ratio of 0.1–0.2, the tools ruled out fewer than 1/4 of women without low central BMD.
Weight identified low central BMD as accurately as the ORAI and SCORE. The risk tools would be unlikely to show an advantage over simple weight cut points in an osteoporosis screening protocol for elderly women.
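The link between cut points, the negative likelihood ratio, and the share of women ruled out can be sketched as below. This is a minimal illustration using the standard definition LR− = (1 − sensitivity) / specificity; the example figures in the usage note are hypothetical and are not taken from the study.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = P(test negative | disease) / P(test negative | no disease)."""
    return (1.0 - sensitivity) / specificity

def fraction_ruled_out(true_negatives, false_positives):
    """Share of women without low BMD whom the tool correctly rules out.

    This is simply the specificity, computed from counts.
    """
    return true_negatives / (true_negatives + false_positives)
```

For example, a hypothetical cut point with 97% sensitivity and 20% specificity gives an LR− of about 0.15 while ruling out only one fifth of women without low central BMD, which is the kind of trade-off the abstract describes.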
PMCID: PMC2562917  PMID: 18219434
Bone density; Female; Mass screening; Osteoporosis; Postmenopause; Risk assessment
12.  A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers 
BMC Bioinformatics  2012;13:326.
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?
The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity.
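The two aggregation methods can be sketched as follows. This is an illustration only: the 0.5 probability cutoff, the default of one required vote, and the function names are assumptions, since the abstract does not give the exact rules used in the paper.

```python
def average_probability(probs, cutoff=0.5):
    """Call acute rejection if the mean predicted probability reaches the cutoff."""
    return sum(probs) / len(probs) >= cutoff

def vote_threshold(probs, min_votes=1, cutoff=0.5):
    """Call acute rejection if at least `min_votes` classifiers vote positive.

    Each classifier votes positive when its own probability reaches the cutoff.
    """
    votes = sum(1 for p in probs if p >= cutoff)
    return votes >= min_votes
```

With probabilities of 0.7, 0.4 and 0.3 from three classifiers, averaging gives a negative call while a one-vote threshold gives a positive call. A low vote threshold therefore tends to raise sensitivity and lower specificity, in line with the pattern reported above.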
Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
PMCID: PMC3575305  PMID: 23216969
Biomarkers; Computational; Pipeline; Genomics; Proteomics; Ensemble; Classification
13.  The WHO Absolute Fracture Risk Models (FRAX): Do Clinical Risk Factors Improve Fracture Prediction in Older Women Without Osteoporosis? 
Bone mineral density (BMD) is a strong predictor of fracture, yet most fractures occur in women without osteoporosis by BMD criteria. To improve fracture-risk prediction, the World Health Organization recently developed a country-specific fracture risk index of clinical risk factors (FRAX®) that estimates 10-year probabilities of hip and major osteoporotic fracture. Within differing baseline BMD categories, we evaluated 6252 women age 65 and older in the Study of Osteoporotic Fractures using FRAX 10-year probabilities of hip and major osteoporotic fracture (hip, clinical spine, wrist, humerus) compared to incidence of fractures over 10 years of follow-up. Overall ability of FRAX to predict fracture risk based on initial BMD T-score categories (normal, low bone mass, and osteoporosis) was evaluated with receiver-operating-characteristic (ROC) analyses using area-under-the-curve (AUC). Over 10 years of follow-up, 368 women incurred a hip fracture, and 1011 a major osteoporotic fracture. Women with low bone mass represented the majority (n=3791; 61%); they developed many hip (n=176; 48%) and major osteoporotic fractures (n=569; 56%). Among women with normal and low bone mass, FRAX (including BMD) was an overall better predictor of hip fracture risk (AUC = 0.78 and 0.70, respectively) than major osteoporotic fractures (AUC = 0.64 and 0.62). Simpler models (e.g., age+prior fracture) had similar AUCs to FRAX, including among women for whom primary prevention is sought (no prior fracture or osteoporosis by BMD). The FRAX, and simpler models, predict 10-year risk of incident hip and major osteoporotic fractures in older U.S. women with normal or low bone mass.
PMCID: PMC3622725  PMID: 21351144
osteopenia; osteoporosis; fracture; risk; prediction
14.  Bone fracture risk estimation based on image similarity 
Bone  2009;45(3):560-567.
We propose a fracture risk estimation technique based on image similarity. We employ image similarity indices to determine how images are similar to each other in their 3D bone mineral density distributions. Our premise for fracture risk estimation is that if a given scan is more similar to scans of subjects known to have fractures than to scans of control subjects, this subject is likely to have a higher degree of fracture risk. To test this hypothesis, we analyzed hip QCT scans of 37 patients with hip fractures and 38 age-matched controls. We divided the scans randomly into two groups: the Model Group and the Test Group. For each scan in the Test Group, the difference between the mean value of its image similarities to the Model fracture group and the mean value of its image similarities to the Model control group was used as an index of fracture risk. We then used the estimated fracture risk indices to discriminate the fractured patients and controls in the Test Group. A test scan with a larger mean value of image similarities with respect to the Model fracture group was classified as a scan from a fractured patient; otherwise it was classified as a scan from a control subject. Based on ROC analysis, we compared the discrimination performance of the image similarity measures with that obtained by using bone mineral density (BMD). When using BMD measured in the femoral neck, with the optimal BMD cutoff, the sensitivity and specificity were 86.5% and 73.7%. For the image similarity measures, the sensitivity ranged between 86.5% and 100%, and specificity ranged between 63.2% and 76.3%. By combining BMD with image similarity measures, the sensitivity and specificity reached 94.6% and 76.3% using the linear discriminant analysis (LDA) algorithm, or 91.9% and 81.6% using the recursive partitioning and regression trees (RPART) algorithm. In the RPART approach, the AUC value of the ROC curve was 0.923, higher than the AUC value of 0.835 when using BMD alone (p-value: 0.0046).
Our results showed that combining BMD with image similarity measures resulted in improved hip fracture risk estimation.
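The risk index described in this abstract can be sketched as a difference of mean similarities. This is a minimal illustration that assumes the similarity values have already been computed by some image similarity measure; the function names and the example numbers in the usage note are hypothetical.

```python
from statistics import mean

def fracture_risk_index(sim_to_fracture, sim_to_control):
    """Mean similarity to Model fracture scans minus mean similarity to Model controls."""
    return mean(sim_to_fracture) - mean(sim_to_control)

def classify_scan(sim_to_fracture, sim_to_control):
    """Label a test scan by whichever Model group it resembles more on average."""
    index = fracture_risk_index(sim_to_fracture, sim_to_control)
    return "fracture" if index > 0 else "control"
```

A test scan with similarities of 0.8 and 0.6 to two fracture scans and 0.5 and 0.5 to two control scans would get an index of about 0.2 and be labelled as a fracture case.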
PMCID: PMC2896043  PMID: 19414074
osteoporosis; proximal femur; QCT; mutual information; image registration
15.  FRAX®: Prediction of Major Osteoporotic Fractures in Women from the General Population: The OPUS Study 
PLoS ONE  2013;8(12):e83436.
The aim of this study was to analyse how well FRAX® predicts the risk of major osteoporotic and vertebral fractures over 6 years in postmenopausal women from the general population.
Patients and methods
The OPUS study was conducted in European women aged above 55 years, recruited in 5 centers from random population samples and followed over 6 years. The population for this study consisted of 1748 women (mean age 74.2 years) with information on incident fractures. 742 (43.1%) had a prevalent fracture; 769 (44%) and 155 (8.9%) of them received an antiosteoporotic treatment before and during the study respectively. We compared FRAX® performance with and without bone mineral density (BMD) using receiver operator characteristic (ROC) c-statistical analysis with ORs and areas under receiver operating characteristics curves (AUCs) and net reclassification improvement (NRI).
85 (4.9%) patients had incident major fractures over 6 years. FRAX® with and without BMD predicted these fractures with an AUC of 0.66 and 0.62 respectively. The AUCs were 0.60, 0.66 and 0.69 for history of low-trauma fracture alone, age and femoral neck (FN) BMD, and the combination of the 3 clinical risk factors, respectively. FRAX® with and without BMD predicted incident radiographic vertebral fracture (n = 65) with an AUC of 0.67 and 0.65 respectively. NRI analysis showed a significant improvement in risk assignment when BMD is added to FRAX®.
This study shows that FRAX® with BMD, and to a lesser extent also without FN BMD, predicts major osteoporotic and vertebral fractures in the general population.
PMCID: PMC3875449  PMID: 24386199
16.  Prediction of protein-protein interaction sites using an ensemble method 
BMC Bioinformatics  2009;10:426.
Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved.
In this paper, an ensemble method is proposed, which combines a bootstrap resampling technique, SVM-based fusion classifiers and a weighted voting strategy, to overcome the imbalanced problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and get an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and structural information respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites.
Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with resampling technique can alleviate the imbalanced problem and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
PMCID: PMC2808167  PMID: 20015386
NeuroImage  2013;87:220-241.
Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for the structural magnetic resonance imaging (MRI) modality and six times the control cases for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers, Random Forest and Support Vector Machines, based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids-based undersampling gives the best overall performance among the different data sampling techniques and the no-sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results.
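The idea of an ensemble built from multiple undersampled datasets can be sketched as below. Note the substitution: this sketch draws the majority class by simple random undersampling, not the K-Medoids-based undersampling that the abstract reports as best, and the function name, subset count and seed are hypothetical.

```python
import random

def balanced_subsets(majority, minority, n_subsets=5, seed=0):
    """Build several balanced training sets by random undersampling.

    Each subset pairs every minority sample with an equally sized random
    draw (without replacement) from the majority class. Requires
    len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    k = len(minority)
    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(majority, k)
        subsets.append((sampled_majority, list(minority)))
    return subsets
```

One classifier would then be trained per balanced subset and their predictions combined, for instance by averaging, so that no single random draw of the majority class dominates the result.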
PMCID: PMC3946903  PMID: 24176869
Alzheimer’s disease; classification; imbalanced data; undersampling; oversampling; feature selection
18.  Computerised analysis of osteoporotic bone patterns using texture parameters characterising bone architecture 
The British Journal of Radiology  2013;86(1021):20101115.
To evaluate the geometric change of osteoporotic bone trabecular patterns using root mean square (RMS) values, first moment power spectrum (FMP) values and fractal dimension values. With the use of these methods, we attempted computerised analysis of osteoporotic bone patterns using texture parameters characterising bone architecture and computer-aided diagnosis of osteoporosis.
32 patient cases from Korea University Guro Hospital were analysed. Patient ages ranged from 51 to 89 years, with a mean age of 65 years. Receiver operating characteristic curve analysis was performed with determination of the area under the curve (AUC).
The bone mineral density (BMD) measurement (AUC=0.78) was a better indicator of bone quantity than the RMS, FMP and fractal dimension values (AUC=0.72) for diagnosis; therefore the combination of RMS, FMP and fractal dimension values was a better indicator of bone quality.
Combining the BMD measurement with the RMS, FMP and fractal dimension values (AUC=0.85) produced better results for a diagnosis of osteoporosis than the use of the two parameter sets separately.
Advances in knowledge
For more effective application, additional study on more cases and data will be required.
PMCID: PMC3615401  PMID: 23239687
19.  Biomarker selection for medical diagnosis using the partial area under the ROC curve 
BMC Research Notes  2014;7:25.
A biomarker is usually used as a diagnostic or assessment tool in medical research. Finding an ideal biomarker is not easy and combining multiple biomarkers provides a promising alternative. Moreover, some biomarkers based on the optimal linear combination do not have enough discriminatory power. As a result, the aim of this study was to find the significant biomarkers based on the optimal linear combination maximizing the pAUC for assessment of the biomarkers.
Under the binormality assumption we obtain the optimal linear combination of biomarkers maximizing the partial area under the receiver operating characteristic curve (pAUC). Related statistical tests are developed for assessment of a biomarker set and of an individual biomarker. Stepwise biomarker selections are introduced to identify those biomarkers of statistical significance.
The results of a simulation study and three real examples (Duchenne Muscular Dystrophy, heart disease, and breast tissue) show that our methods are best suited to biomarker selection in data sets with a moderate number of biomarkers.
Our proposed biomarker selection approaches can be used to find the significant biomarkers based on hypothesis testing.
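The pAUC under the binormality assumption can be sketched numerically for a single score (for example, one fixed linear combination of biomarkers). This is a minimal illustration, assuming the score is normally distributed in controls (mean mu0, SD sigma0) and in cases (mean mu1, SD sigma1) with higher values indicating disease; it evaluates the pAUC only and does not implement the authors' optimization or tests. The function names and the midpoint-rule integration are choices made for this sketch.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def binormal_roc(t, mu0, sigma0, mu1, sigma1):
    """True-positive rate at false-positive rate t (0 < t < 1) under binormality."""
    a = (mu1 - mu0) / sigma1
    b = sigma0 / sigma1
    return _STD.cdf(a + b * _STD.inv_cdf(t))

def partial_auc(t0, mu0, sigma0, mu1, sigma1, steps=2000):
    """Area under the binormal ROC curve for false-positive rates in (0, t0).

    Uses the midpoint rule, so the ROC curve is never evaluated at 0 or 1.
    """
    h = t0 / steps
    total = sum(
        binormal_roc((i + 0.5) * h, mu0, sigma0, mu1, sigma1)
        for i in range(steps)
    )
    return h * total
```

Setting t0 to 1 recovers the full AUC, and a score with identical distributions in cases and controls gives the chance-level value t0²/2.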
PMCID: PMC3923449  PMID: 24410929
Discriminatory power; Hypothesis testing; Optimal linear combination; Partial area under ROC curve; Stepwise biomarker selection
20.  Underestimation of the Calculated Area Under the Concentration-Time Curve Based on Serum Creatinine for Vancomycin Dosing 
Infection & Chemotherapy  2014;46(1):21-29.
The ratio of the steady-state 24-hour area under the concentration-time curve (ssAUC24) to the MIC (AUC24/MIC) for vancomycin has been recommended as the preferred pharmacodynamic index. The aim of this study was to assess whether the calculated AUC24 (cAUC24) using the creatinine clearance (CLcr) differs from the ssAUC24 based on the individual pharmacokinetic data estimated by commercial software.
Materials and Methods
The cAUC24 was compared with the ssAUC24 with respect to age, body mass index, and trough concentration of vancomycin and the results were expressed as median and interquartile ranges. A correlation between the cAUC24 and ssAUC24 and the trough concentration of vancomycin was evaluated. The probability of reaching an AUC24/MIC of 400 or higher was compared between the cAUC24 and ssAUC24 for different MICs of vancomycin and different daily doses by simulation in a subgroup with a trough concentration of 10 mg/L and higher.
The cAUC24 was significantly lower than the ssAUC24 (392.38 vs. 418.32 mg·hr/L, P < 0.0001) and correlated more weakly with the trough concentration (r = 0.649 vs. r = 0.964). Assuming a MIC of 1.0 mg/L, the probability of reaching the value of 400 or higher was 77.5% for the cAUC24/MIC and 100% for the ssAUC24/MIC in patients with a trough concentration of 10 mg/L and higher. If the MIC increased to 2.0 mg/L, the probability was 57.7% for the cAUC24/MIC and 71.8% for the ssAUC24/MIC at a daily vancomycin dose of 4,000 mg.
The cAUC24 using the calculated CLcr is usually underestimated compared with the ssAUC24 based on individual pharmacokinetic data. Therefore, to obtain a more accurate AUC24, therapeutic monitoring of vancomycin rather than a simple calculation based on the CLcr should be performed, and a more accurate biomarker for renal function is needed.
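The AUC24/MIC target check can be sketched as follows. This is a minimal illustration of the ratio and the threshold of 400 named in the abstract; the function names are hypothetical, and the usage note applies the check to the reported median values only, not to individual patients.

```python
def auc24_over_mic(auc24_mg_h_per_l, mic_mg_per_l):
    """Ratio of the 24-hour AUC (mg·hr/L) to the MIC (mg/L)."""
    return auc24_mg_h_per_l / mic_mg_per_l

def reaches_target(auc24_mg_h_per_l, mic_mg_per_l, target=400.0):
    """True if AUC24/MIC is at or above the pharmacodynamic target."""
    return auc24_over_mic(auc24_mg_h_per_l, mic_mg_per_l) >= target
```

At an MIC of 1.0 mg/L the median cAUC24 of 392.38 mg·hr/L falls just below the target while the median ssAUC24 of 418.32 mg·hr/L meets it, which shows how an underestimated AUC24 can change the dosing decision.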
PMCID: PMC3970305  PMID: 24693466
Vancomycin; Pharmacodynamics; Area under curve; Drug monitoring, Therapeutic
21.  AMS 4.0: consensus prediction of post-translational modifications in protein sequences 
Amino Acids  2012;43(2):573-582.
We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during the training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm, and the data fusion of different training object representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motif types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools.
The source code and precompiled binaries of brainstorming tool are available at under Apache 2.0 licensing.
Electronic supplementary material
The online version of this article (doi:10.1007/s00726-012-1290-2) contains supplementary material, which is available to authorized users.
PMCID: PMC3397139  PMID: 22555647
Post-translational modifications; AMS-4; High quality indices; MLP; Consensus
22.  Computational Prediction of Conformational B-Cell Epitopes from Antigen Primary Structures by Ensemble Learning 
PLoS ONE  2012;7(8):e43575.
The conformational B-cell epitopes are the specific sites on the antigens that have immune functions. The identification of conformational B-cell epitopes is of great importance to immunologists for facilitating the design of peptide-based vaccines. As an attempt to narrow the search for experimental validation, various computational models have been developed for epitope prediction by using antigen structures. However, the application of these models is undermined by the limited number of available antigen structures. In contrast to most of the available structure-based methods, we here attempt to accurately predict conformational B-cell epitopes from antigen sequences.
In this paper, we explore various sequence-derived features, which have been observed to be associated with the location of epitopes or have previously been used in similar tasks. These features are evaluated and ranked by their discriminative performance on the benchmark datasets. From the perspective of information science, the combination of various features can usually lead to better results than the individual features. In order to build a robust model, we adopt the ensemble learning approach to incorporate various features, and develop an ensemble model to predict conformational epitopes from antigen sequences.
Evaluated by leave-one-out cross validation, the proposed method gives mean AUC scores of 0.687 and 0.651 on two datasets compiled from the bound structures and unbound structures, respectively. When compared with publicly available servers by using the independent dataset, our method yields better or comparable performance. The results demonstrate that the proposed method is useful for sequence-based conformational epitope prediction.
The web server and datasets are freely available at
PMCID: PMC3424238  PMID: 22927994
23.  Comparison of Traditional Cardiovascular Risk Models and Coronary Atherosclerotic Plaque as Detected by Computed Tomography for Prediction of Acute Coronary Syndrome in Patients With Acute Chest Pain 
The objective was to determine the association of four clinical risk scores and coronary plaque burden as detected by computed tomography (CT) with the outcome of acute coronary syndrome (ACS) in patients with acute chest pain. The hypothesis was that the combination of risk scores and plaque burden improved the discriminatory capacity for the diagnosis of ACS.
The study was a subanalysis of the Rule Out Myocardial Infarction Using Computer-Assisted Tomography (ROMICAT) trial—a prospective observational cohort study. The authors enrolled patients presenting to the emergency department (ED) with a chief complaint of acute chest pain, inconclusive initial evaluation (negative biomarkers, nondiagnostic electrocardiogram [ECG]), and no history of coronary artery disease (CAD). Patients underwent contrast-enhanced 64-multidetector-row cardiac CT and received standard clinical care (serial ECG, cardiac biomarkers, and subsequent diagnostic testing, such as exercise treadmill testing, nuclear stress perfusion imaging, and/or invasive coronary angiography), as deemed clinically appropriate. The clinical providers were blinded to CT results. The chest pain score was calculated and the results were dichotomized to ≥10 (high-risk) and <10 (low-risk). Three risk scores were calculated, Goldman, Sanchis, and Thrombolysis in Myocardial Infarction (TIMI), and each patient was assigned to a low-, intermediate-, or high-risk category. Because of the low number of subjects in the high-risk group, the intermediate- and high-risk groups were combined into one. CT images were evaluated for the presence of plaque in 17 coronary segments. Plaque burden was stratified into none, intermediate, and high (zero, one to four, and more than four segments with plaque). An outcome panel of two physicians (blinded to CT findings) established the primary outcome of ACS (defined as either an acute myocardial infarction or unstable angina) during the index hospitalization (from the presentation to the ED to the discharge from the hospital). Logistic regression modeling was performed to examine the association of risk scores and coronary plaque burden to the outcome of ACS. Unadjusted models were individually fitted for the coronary plaque burden and for Goldman, Sanchis, TIMI, and chest pain scores. 
In adjusted analyses, the authors tested whether the association between risk scores and ACS persisted after controlling for the coronary plaque burden. The prognostic discriminatory capacity of the risk scores and plaque burden for ACS was assessed using c-statistics. The differences in area under the receiver-operating characteristic curve (AUC) and c-statistics were tested by performing the −2 log likelihood ratio test of nested models. A p value <0.05 was considered statistically significant.
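The two statistical tools named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names (`c_statistic`, `chi2_sf`, `likelihood_ratio_test`) are hypothetical, the inputs in the usage note are made-up numbers, and the logistic models themselves are assumed to have been fitted elsewhere so that only their log-likelihoods are needed.

```python
import math


def c_statistic(scores, labels):
    """Fraction of (case, control) pairs in which the case has the higher
    score; ties count as one half. Equals the empirical AUC."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return concordant / (len(cases) * len(controls))


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    built from the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = max(stat, 0.0) / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """-2 log likelihood ratio test for two nested models.
    df_diff is the number of extra parameters in the full model."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, df_diff)
```

For example, `c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives 0.75, because three of the four case-control pairs are ordered correctly. With illustrative log-likelihoods of -100 for a reduced model and -98 for a model with one additional term, `likelihood_ratio_test(-100.0, -98.0, 1)` yields a statistic of 4.0 and a p value of about 0.046.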
Among 368 subjects, 31 (8%) subjects were diagnosed with ACS. Goldman (AUC = 0.61), Sanchis (AUC = 0.71), and TIMI (AUC = 0.63) had modest discriminatory capacity for the diagnosis of ACS. Plaque burden was the strongest predictor of ACS (AUC = 0.86; p < 0.05 for all comparisons with individual risk scores). The combination of plaque burden and risk scores improved prediction of ACS (plaque + Goldman AUC = 0.88, plaque + Sanchis AUC = 0.90, plaque + TIMI AUC = 0.88; p < 0.01 for all comparisons with coronary plaque burden alone).
Risk scores (Goldman, Sanchis, TIMI) have modest discriminatory capacity and coronary plaque burden has good discriminatory capacity for the diagnosis of ACS in patients with acute chest pain. The combined information of risk scores and plaque burden significantly improves the discriminatory capacity for the diagnosis of ACS.
PMCID: PMC3424404  PMID: 22849339
24.  Prediction-based structured variable selection through the receiver operating characteristic curves 
Biometrics  2010;67(3):896-905.
In many clinical settings, a commonly encountered problem is to assess the accuracy of a screening test for early detection of a disease. In these applications, the predictive performance of the test is of interest. Variable selection may be useful in designing a medical test. An example is a research study conducted to design a new screening test by selecting variables from an existing screener with a hierarchical structure among variables: there are several root questions followed by their stem questions. The stem questions will only be asked after a subject has answered the root question. It is therefore unreasonable to select a model that contains stem variables but not their root variable. In this work, we propose methods to perform variable selection with structured variables when predictive accuracy of a diagnostic test is the main concern of the analysis. We take a linear combination of individual variables to form a combined test. We then maximize a direct summary measure of the predictive performance of the test, the area under a receiver operating characteristic curve (AUC of an ROC), subject to a penalty function to control for overfitting. Since maximizing the empirical AUC of the ROC of a combined test is a complicated non-convex problem (Pepe et al. 2006), we explore the connection between the empirical AUC and a support vector machine (SVM). We cast the problem of maximizing predictive performance of a combined test as a penalized SVM problem and apply a re-parametrization to impose the hierarchical structure among variables. We also describe a penalized logistic regression variable selection procedure for structured variables and compare it with the ROC-based approaches. We use simulation studies based on real data to examine the performance of the proposed methods. Finally, we apply the developed methods to design a structured screener to be used in primary care clinics to refer potentially psychotic patients for further specialty diagnostics and treatment.
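The link between the empirical AUC and an SVM-style objective mentioned above can be illustrated with a generic pairwise hinge surrogate. The sketch below is not the authors' formulation: the function names, the unit margin, the L1 penalty, and the toy data are all assumptions chosen for illustration, and the re-parametrization that enforces the root/stem hierarchy is not shown.

```python
def linear_score(weights, x):
    """Combined test: a linear combination of the individual variables."""
    return sum(w * v for w, v in zip(weights, x))


def score_differences(weights, cases, controls):
    """Score of each case minus score of each control, over all pairs."""
    return [
        linear_score(weights, c) - linear_score(weights, k)
        for c in cases
        for k in controls
    ]


def empirical_auc(weights, cases, controls):
    """Share of case-control pairs ranked correctly (ties count one half).
    This is a step function of the weights, hence hard to maximize directly."""
    diffs = score_differences(weights, cases, controls)
    return sum(1.0 if d > 0 else 0.5 if d == 0 else 0.0 for d in diffs) / len(diffs)


def pairwise_hinge_loss(weights, cases, controls, margin=1.0):
    """Convex surrogate: mean of max(0, margin - d) over all pairs.
    With margin = 1 it is an upper bound on 1 - empirical AUC."""
    diffs = score_differences(weights, cases, controls)
    return sum(max(0.0, margin - d) for d in diffs) / len(diffs)


def penalized_objective(weights, cases, controls, lam=0.1):
    """Hinge surrogate plus an L1 penalty (assumed here) to limit overfitting."""
    return pairwise_hinge_loss(weights, cases, controls) + lam * sum(abs(w) for w in weights)


# Toy data: two variables per subject (illustrative values only).
cases = [(2.0, 1.0), (1.5, 0.5)]
controls = [(0.5, 1.0), (1.0, 0.0)]

for w in [(1.0, 0.0), (0.0, 1.0)]:
    auc = empirical_auc(w, cases, controls)
    loss = pairwise_hinge_loss(w, cases, controls)
    print(w, auc, loss)
```

Because each pair's hinge term is at least as large as its ranking error, minimizing the surrogate pushes the empirical AUC up while keeping the optimization convex in the weights, which is the reason an SVM-type problem is a convenient stand-in for direct AUC maximization.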
PMCID: PMC3134557  PMID: 21175555
ROC curve; Support vector machine; Area under the curve; Disease screening; Hierarchical variable selection
25.  Three-dimensional morphological and signal intensity features for detection of intervertebral disc degeneration from magnetic resonance images 
Background and objectives
Advances in MRI hardware and sequences are continually increasing the amount and complexity of data such as those generated in high-resolution three-dimensional (3D) scanning of the spine. Efficient informatics tools offer considerable opportunities for research and clinically based analyses of magnetic resonance studies. In this work, we present and validate a suite of informatics tools for automated detection of degenerative changes in lumbar intervertebral discs (IVD) from both 3D isotropic and routine two-dimensional (2D) clinical T2-weighted MRI.
Materials and methods
An automated segmentation approach was used to extract morphological (traditional 2D radiological measures and novel 3D shape descriptors) and signal appearance (extracted from signal intensity histograms) features. The features were validated against manual reference, compared between 2D and 3D MRI scans and used for quantification and classification of IVD degeneration across magnetic resonance datasets containing IVD with early and advanced stages of degeneration.
Results and conclusions
The combination of the novel 3D-based shape and signal intensity features on 3D (area under the receiver operating characteristic curve (AUC) 0.984) and 2D (AUC 0.988) magnetic resonance data delivers a significant improvement in automated classification of IVD degeneration, compared to the combination of previously used 2D radiological measurement and signal intensity features (AUC 0.976 and 0.983, respectively). Further work is required regarding the usefulness of 2D and 3D shape data in relation to clinical scores of lower back pain. The results reveal the potential of the proposed informatics system for computer-aided IVD diagnosis from MRI in large-scale research studies and as a possible adjunct for clinical diagnosis.
PMCID: PMC3822117  PMID: 23813538
Computer-aided diagnosis; Classification; Intervertebral discs; Disc degeneration; Statistical shape models; Morphology
