1.  LC–MS Profiling of N-Glycans Derived from Human Serum Samples for Biomarker Discovery in Hepatocellular Carcinoma 
Journal of Proteome Research  2014;13(11):4859-4868.
Defining clinically relevant biomarkers for early stage hepatocellular carcinoma (HCC) in a high-risk population of cirrhotic patients has potentially far-reaching implications for disease management and patient health. Changes in glycan levels have been associated with the onset of numerous diseases including cancer. In the present study, we used liquid chromatography coupled with electrospray ionization mass spectrometry (LC–ESI-MS) to analyze N-glycans in sera from 183 participants recruited in Egypt and the U.S. and identified candidate biomarkers that distinguish HCC cases from cirrhotic controls. N-Glycans were released from serum proteins and permethylated prior to the LC–ESI-MS analysis. Through two complementary LC–ESI-MS quantitation approaches, global profiling and targeted quantitation, we identified 11 N-glycans with statistically significant differences between HCC cases and cirrhotic controls. These glycans can further be categorized into four structurally related clusters, matching closely with the implications of important glycosyltransferases in cancer progression and metastasis. The results of this study illustrate the power of the integrative approach combining complementary LC–ESI-MS based quantitation approaches to investigate changes in N-glycan levels between HCC cases and patients with liver cirrhosis.
PMCID: PMC4227556  PMID: 25077556
cancer biomarker discovery; glycomics; hepatocellular carcinoma; liver cirrhosis; mass spectrometry; multiple reaction monitoring
2.  Assessing Validity of a Depression Screening Instrument in the Absence of a Gold Standard 
Annals of epidemiology  2014;24(7):527-531.
We evaluated the extent to which use of a hypothesized imperfect gold standard, the Composite International Diagnostic Interview (CIDI), biases the estimates of diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9). We also evaluated how statistical correction can be used to address this bias.
The study was conducted among 926 adults where structured interviews were conducted to collect information about participants’ current major depressive disorder (MDD) using PHQ-9 and CIDI instruments. First, we evaluated the relative psychometric properties of PHQ-9 using CIDI as a gold standard. Next, we employed a Bayesian latent-class model to correct for the bias.
In comparison with CIDI, the relative sensitivity and specificity of the PHQ-9 for detecting MDD at a cut point of ≥10 were 53.1% (95% CI: 45.4–60.8%) and 77.5% (95% CI: 74.5–80.5%), respectively. Using a Bayesian latent-class model to correct for the bias arising from the use of an imperfect gold standard increased the sensitivity and specificity of PHQ-9 to 79.8% (95% Bayesian credible interval (BCI): 64.9–90.8%) and 79.1% (95% BCI: 74.7–83.7%), respectively.
Our results provided evidence that the diagnostic validity of a mental health screening instrument can be assessed with appropriate statistical methods when a gold standard is not available.
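The bias described above can be illustrated with a short calculation. The sketch below is not the Bayesian latent-class model used in the study; it only shows, under an assumed conditional independence between the screening test and the reference given true disease status, how an imperfect reference distorts the apparent sensitivity and specificity. The function name and all input values are hypothetical.

```python
def apparent_accuracy(prevalence, se_test, sp_test, se_ref, sp_ref):
    """Apparent (sensitivity, specificity) of a test scored against an imperfect reference.

    Assumption: test and reference errors are independent given true disease status.
    """
    p = prevalence
    # Probability that the reference is positive, and that both are positive.
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    both_pos = p * se_test * se_ref + (1 - p) * (1 - sp_test) * (1 - sp_ref)
    # Probability that the reference is negative, and that both are negative.
    ref_neg = 1 - ref_pos
    both_neg = p * (1 - se_test) * (1 - se_ref) + (1 - p) * sp_test * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical inputs, not estimates from the study.
sens, spec = apparent_accuracy(
    prevalence=0.2, se_test=0.8, sp_test=0.8, se_ref=0.7, sp_ref=0.95
)
print(f"apparent sensitivity = {sens:.3f}, apparent specificity = {spec:.3f}")
```

With these illustrative inputs both apparent values fall below the assumed true accuracy of 0.8, which is the direction of bias the abstract reports.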
PMCID: PMC4104527  PMID: 24935465
3.  GC-MS Based Plasma Metabolomics for Identification of Candidate Biomarkers for Hepatocellular Carcinoma in Egyptian Cohort 
PLoS ONE  2015;10(6):e0127299.
This study evaluates changes in metabolite levels in hepatocellular carcinoma (HCC) cases vs. patients with liver cirrhosis by analysis of human blood plasma using gas chromatography coupled with mass spectrometry (GC-MS). Untargeted metabolomic analysis of plasma samples from participants recruited in Egypt was performed using two GC-MS platforms: a GC coupled to a single quadrupole mass spectrometer (GC-qMS) and a GC coupled to a time-of-flight mass spectrometer (GC-TOFMS). Analytes that showed statistically significant changes in ion intensities were selected using ANOVA models. These analytes and other candidates selected from related studies were further evaluated by targeted analysis in plasma samples from the same participants as in the untargeted metabolomic analysis. The targeted analysis was performed using the GC-qMS in selected ion monitoring (SIM) mode. The method confirmed significant changes in the levels of glutamic acid, citric acid, lactic acid, valine, isoleucine, leucine, alpha tocopherol, cholesterol, and sorbose in HCC cases vs. patients with liver cirrhosis. Specifically, our findings indicate up-regulation of metabolites involved in branched-chain amino acid (BCAA) metabolism. Although BCAAs are increasingly used as a treatment for cancer cachexia, others have shown that BCAA supplementation caused significant enhancement of tumor growth via activation of the mTOR/AKT pathway, which is consistent with our results that BCAAs are up-regulated in HCC.
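As a rough illustration of the ANOVA-based selection step, the sketch below computes a one-way ANOVA F statistic for a single analyte across groups. It is a simplification: the abstract does not specify the ANOVA model terms, and the function name and intensity values here are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical log ion intensities for one analyte (not study data).
hcc = [4.0, 5.0, 6.0]
cirrhosis = [1.0, 2.0, 3.0]
f_stat = one_way_anova_f([hcc, cirrhosis])
print(f"F = {f_stat:.2f}")
```

A large F statistic relative to the F distribution with the corresponding degrees of freedom would flag the analyte for follow-up; in practice a multiple-testing correction across analytes would also be needed.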
PMCID: PMC4452085  PMID: 26030804
4.  Identification of Functional Modules by Integration of Multiple Data Sources Using a Bayesian Network Classifier 
Prediction of functional modules is indispensable for detecting protein deregulation in human complex diseases such as cancer. Bayesian network (BN) is one of the most commonly used models to integrate heterogeneous data from multiple sources such as protein domain, interactome, functional annotation, genome-wide gene expression, and the literature.
Methods and Results
In this paper, we present a BN classifier that is customized to: 1) increase the ability to integrate diverse information from different sources, 2) effectively predict protein-protein interactions, 3) infer aberrant networks with scale-free and small world properties, and 4) group molecules into functional modules or pathways based on the primary function and biological features. Application of this model to the discovery of protein biomarkers of hepatocellular carcinoma (HCC) leads to the identification of functional modules that provide insights into the mechanism of the development and progression of HCC. These functional modules include cell cycle deregulation, increased angiogenesis (e.g., vascular endothelial growth factor, blood vessel morphogenesis), oxidative metabolic alterations, and aberrant activation of signaling pathways involved in cellular proliferation, survival, and differentiation.
The discoveries and conclusions derived from our customized BN classifier are consistent with previously published results. The proposed approach for determining BN structure facilitates the integration of heterogeneous data from multiple sources to elucidate the mechanisms of complex diseases.
PMCID: PMC4079061  PMID: 24736851
systems biology; statistical model; genomics; genetics; bioinformatics; functional genomics; gene expression; computational biology; protein-protein interaction
5.  Sleep disturbances and quality of life in Sub-Saharan African migraineurs 
Although in the past decade occidental countries have increasingly recognized the personal and societal burden of migraine, it remains poorly understood in Africa. No study has evaluated the impact of sleep disturbances and the quality of life (QOL) in sub-Saharan Africans with migraine.
This was a cross-sectional study evaluating adults, ≥ 18 years of age, attending outpatient clinics in Ethiopia. Standardized questionnaires were utilized to collect demographic, headache, sleep, lifestyle, and QOL characteristics in all participants. Migraine classification was based on International Classification of Headache Disorders (ICHD)-II criteria. The Pittsburgh Sleep Quality Index (PSQI) and the World Health Organization Quality of Life (WHOQOL-BREF) questionnaires were utilized to assess sleep quality and QOL characteristics, respectively. Multivariable logistic regression models were fit to estimate adjusted odds ratio (OR) and 95% confidence intervals (95% CI).
Of 1,060 participants, 145 (14%) met ICHD-II criteria for migraine. Approximately three-fifths of the study participants (60.5%) were found to have poor sleep quality. After adjustments, migraineurs had more than two-fold increased odds (OR = 2.24, 95% CI 1.49-3.38) of overall poor sleep quality (PSQI global score >5) as compared with non-migraineurs. Compared with non-migraineurs, migraineurs were also more likely to experience short sleep duration (≤7 hours) (OR = 2.07, 95% CI 1.43-3.00), long sleep latency (≥30 min) (OR = 1.97, 95% CI 1.36-2.85), daytime dysfunction due to sleepiness (OR = 1.51, 95% CI 1.12-2.02), and poor sleep efficiency (<85%) (OR = 1.93, 95% CI 1.31-2.88). Similar to occidental countries, Ethiopian migraineurs reported a reduced QOL as compared to non-migraineurs. Specifically, Ethiopian migraineurs were more likely to experience poor physical (OR = 1.56, 95% CI 1.08-2.25) and psychological health (OR = 1.75, 95% CI 1.20-2.56), as well as poor social relationships (OR = 1.56, 95% CI 1.08-2.25), and living environments (OR = 1.41, 95% CI 0.97-2.05) as compared to those without migraine.
Similar to occidental countries, migraine is highly prevalent among Ethiopians and is associated with poor sleep quality and a lower QOL. These findings support the need for physicians and policy makers to take action to improve the quality of headache care and access to treatment in Ethiopia.
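For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, the following sketch computes an unadjusted OR from a 2x2 table with a Woolf (log-scale) interval. This is a simplification of the multivariable logistic regression used in the study, which yields adjusted estimates; the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): poor vs. good sleep quality by migraine status.
or_est, ci_low, ci_high = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in most of the estimates reported above, indicates a statistically significant association at the 5% level.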
PMCID: PMC4385231  PMID: 25902831
Migraine; Sleep quality; Quality of life; Ethiopia
The annals of applied statistics  2014;8(1):148-175.
A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcripts abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines.
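The link between observed CGH measurements and latent copy number states can be sketched with a small hidden Markov model. The example below decodes the most likely state path with the Viterbi algorithm, which is swapped in here for simplicity; the paper itself performs posterior inference by MCMC within a larger measurement error model. The three states, their emission means, the noise level, and the transition probabilities are all assumed values.

```python
import math


def viterbi(observations, state_means, sigma, stay_prob):
    """Most likely state path for a Gaussian-emission HMM with sticky transitions."""
    n_states = len(state_means)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emit(x, k):
        # Log density of a normal distribution centred on the state's mean.
        return -0.5 * ((x - state_means[k]) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n_states) + log_emit(observations[0], k) for k in range(n_states)]
    back_pointers = []
    for x in observations[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, j))
        score = new_score
        back_pointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical log2 ratios along adjacent probes; states 0/1/2 stand for loss/neutral/gain.
log_ratios = [0.0, 0.1, -0.05, 0.6, 0.55, 0.5, 0.0, 0.05]
states = viterbi(log_ratios, state_means=[-0.5, 0.0, 0.5], sigma=0.2, stay_prob=0.9)
print(states)
```

The sticky transition matrix encodes the dependence across adjacent copy number states that the abstract mentions: a state change is accepted only when several neighbouring probes support it.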
PMCID: PMC4018204  PMID: 24834139
Bayesian Hierarchical Models; Comparative Genomic Hybridization Arrays; Gene Expression; Hidden Markov Models; Measurement Error; Variable Selection
7.  Evaluation of Metabolite Biomarkers for Hepatocellular Carcinoma through Stratified Analysis by Gender, Race and Alcoholic Cirrhosis 
The effects of hepatocellular carcinoma (HCC) on liver metabolism and circulating metabolites have been subjected to continuing investigation. This study compares the levels of selected metabolites in sera of HCC cases versus patients with liver cirrhosis and evaluates the influence of gender, race, and alcoholic cirrhosis on the performance of the metabolites as candidate biomarkers for HCC.
Targeted quantitation of 15 metabolites is performed by selected reaction monitoring (SRM) in sera from 89 Egyptian subjects (40 HCC cases and 49 cirrhotic controls) and 110 US subjects (56 HCC cases and 54 cirrhotic controls). Logistic regression models are used to evaluate the ability of these metabolites to distinguish HCC cases from cirrhotic controls. The influences of gender, race, and alcoholic cirrhosis on the performance of the metabolites are analyzed by stratified logistic regression.
Two metabolites are selected based on their significance to both cohorts. While both metabolites discriminate HCC cases from cirrhotic controls in males and Caucasians, they are insignificant in females and African Americans. One metabolite is significant in patients with alcoholic cirrhosis and the other in non-alcoholic cirrhosis.
The study demonstrates the potential of two metabolites as candidate biomarkers for HCC by combining them with α-fetoprotein and gender. Stratified statistical analyses reveal that gender, race, and alcoholic cirrhosis affect the relative levels of small molecules in serum.
The findings of this study contribute to a better understanding of the influence of gender, race, and alcoholic cirrhosis in investigating small molecules as biomarkers for HCC.
PMCID: PMC3947117  PMID: 24186894
Mass spectrometry; metabolomics; cancer biomarker; liver cirrhosis; health disparity
8.  Pathway and Network Approaches for Identification of Cancer Signature Markers from Omics Data 
Journal of Cancer  2015;6(1):54-65.
The advancement of high-throughput omic technologies during the past few years has made it possible to perform many complex assays in a much shorter time than the traditional approaches. The rapid accumulation and wide availability of omic data generated by these technologies offer great opportunities to unravel disease mechanisms, but also present significant challenges in extracting knowledge from such massive data and evaluating the findings. To address these challenges, a number of pathway and network based approaches have been introduced. This review article evaluates these methods and discusses their application in cancer biomarker discovery using hepatocellular carcinoma (HCC) as an example.
PMCID: PMC4278915  PMID: 25553089
Biological pathways; system biology; high-throughput omics data; cancer biomarker.
9.  Construct Validity and Factor Structure of the Pittsburgh Sleep Quality Index and Epworth Sleepiness Scale in a Multi-National Study of African, South East Asian and South American College Students 
PLoS ONE  2014;9(12):e116383.
The Pittsburgh Sleep Quality Index (PSQI) and the Epworth Sleepiness Scale (ESS) are questionnaires used to assess sleep quality and excessive daytime sleepiness in clinical and population-based studies. The present study aimed to evaluate the construct validity and factor structure of the PSQI and ESS questionnaires among young adults in four countries (Chile, Ethiopia, Peru and Thailand).
A cross-sectional study was conducted among 8,481 undergraduate students. Students were invited to complete a self-administered questionnaire that collected information about lifestyle, demographic, and sleep characteristics. In each country, the construct validity and factorial structures of PSQI and ESS questionnaires were tested through exploratory and confirmatory factor analyses (EFA and CFA).
The largest component-total correlation coefficient for sleep quality as assessed using PSQI was noted in Chile (r = 0.71) while the smallest component-total correlation coefficient was noted for sleep medication use in Peru (r = 0.28). The largest component-total correlation coefficient for excessive daytime sleepiness as assessed using ESS was found for item 1 (sitting/reading) in Chile (r = 0.65) while the lowest item-total correlation was observed for item 6 (sitting and talking to someone) in Thailand (r = 0.35). Using both EFA and CFA, a two-factor model was found for the PSQI questionnaire in Chile, Ethiopia and Thailand while a three-factor model was found for Peru. For the ESS questionnaire, we noted two factors for all four countries.
Overall, we documented cross-cultural comparability of sleep quality and excessive daytime sleepiness measures using the PSQI and ESS questionnaires among Asian, South American and African young adults. Although both the PSQI and ESS were originally developed as single-factor questionnaires, the results of our EFA and CFA revealed the multidimensionality of the scales, suggesting limited usefulness of the global PSQI and ESS scores to assess sleep quality and excessive daytime sleepiness.
PMCID: PMC4281247  PMID: 25551586
10.  Placental Genome and Maternal-Placental Genetic Interactions: A Genome-Wide and Candidate Gene Association Study of Placental Abruption 
PLoS ONE  2014;9(12):e116346.
While available evidence supports the role of genetics in the pathogenesis of placental abruption (PA), PA-related placental genome variations and maternal-placental genetic interactions have not been investigated. Maternal blood and placental samples collected from participants in the Peruvian Abruptio Placentae Epidemiology study were genotyped using Illumina’s Cardio-Metabochip platform. We examined 118,782 genome-wide SNPs and 333 SNPs in 32 candidate genes from mitochondrial biogenesis and oxidative phosphorylation pathways in placental DNA from 280 PA cases and 244 controls. We assessed maternal-placental interactions in the candidate gene SNPs and two imprinted regions (IGF2/H19 and C19MC). Univariate and penalized logistic regression models were fit to estimate odds ratios. We examined the combined effect of multiple SNPs on PA risk using weighted genetic risk scores (WGRS) with repeated ten-fold cross-validations. A multinomial model was used to investigate maternal-placental genetic interactions. In placental genome-wide and candidate gene analyses, no SNP was significant after false discovery rate correction. The top genome-wide association study (GWAS) hits were rs544201, rs1484464 (CTNNA2), rs4149570 (TNFRSF1A) and rs13055470 (ZNRF3) (p-values: 1.11e-05 to 3.54e-05). The top 200 SNPs of the GWAS overrepresented genes involved in cell cycle, growth and proliferation. The top candidate gene hits were rs16949118 (COX10) and rs7609948 (THRB) (p-values: 6.00e-03 and 8.19e-03). Participants in the highest quartile of WGRS based on cross-validations using SNPs selected from the GWAS and candidate gene analyses had an 8.40-fold (95% CI: 5.8–12.56) and a 4.46-fold (95% CI: 2.94–6.72) higher odds of PA, respectively, compared to participants in the lowest quartile. We found maternal-placental genetic interactions on PA risk for two SNPs in PPARG (chr3:12313450 and chr3:12412978) and maternal imprinting effects for multiple SNPs in the C19MC and IGF2/H19 regions.
Variations in the placental genome and interactions between maternal-placental genetic variations may contribute to PA risk. Larger studies may help advance our understanding of PA pathogenesis.
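A weighted genetic risk score of the kind mentioned above is, in its simplest form, a weighted sum of risk-allele counts. The sketch below assumes the weights are per-SNP log odds ratios, which the abstract does not state; it also omits the cross-validation used in the study. The function name, weights, and genotypes are hypothetical.

```python
from statistics import quantiles


def weighted_genetic_risk_score(allele_counts, weights):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by per-SNP effect sizes."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical per-SNP weights (assumed to be log odds ratios) and genotypes.
snp_weights = [0.2, 0.5, 0.1]
genotypes = {
    "participant_a": [1, 0, 2],
    "participant_b": [2, 2, 1],
    "participant_c": [0, 1, 0],
    "participant_d": [0, 0, 1],
}
scores = {pid: weighted_genetic_risk_score(g, snp_weights) for pid, g in genotypes.items()}

# Participants above the third quartile cut point form the highest-risk group.
upper_cut = quantiles(scores.values(), n=4)[2]
highest_quartile = [pid for pid, s in scores.items() if s > upper_cut]
print(scores, highest_quartile)
```

Comparing outcome odds between the highest and lowest quartile groups gives fold differences of the type reported in the abstract.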
PMCID: PMC4280220  PMID: 25549360
11.  Multi-profile Bayesian alignment model for LC-MS data analysis with integration of internal standards 
Bioinformatics  2013;29(21):2774-2780.
Motivation: Liquid chromatography-mass spectrometry (LC-MS) has been widely used for profiling expression levels of biomolecules in various ‘-omic’ studies including proteomics, metabolomics and glycomics. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time (RT) alignment, which is required to ensure that ion intensity measurements among multiple LC-MS runs are comparable, is one of the most important yet challenging preprocessing steps. Current alignment approaches estimate RT variability using either single chromatograms or detected peaks, but do not simultaneously take into account the complementary information embedded in the entire LC-MS data.
Results: We propose a Bayesian alignment model for LC-MS data analysis. The alignment model provides estimates of the RT variability along with uncertainty measures. The model enables integration of multiple sources of information including internal standards and clustered chromatograms in a mathematically rigorous framework. We apply the model to LC-MS metabolomic, proteomic and glycomic data. The performance of the model is evaluated based on ground-truth data, by measuring correlation of variation, RT difference across runs and peak-matching performance. We demonstrate that the Bayesian alignment model significantly improves RT alignment performance through appropriate integration of relevant information.
Availability and implementation: MATLAB code, raw and preprocessed LC-MS data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3799465  PMID: 24013927
12.  Daytime Sleepiness, Circadian Preference, Caffeine Consumption and Use of Other Stimulants among Thai College Students 
We conducted this study to evaluate the prevalence of daytime sleepiness and evening chronotype, and to assess the extent to which both are associated with the use of caffeinated stimulants among 3,000 Thai college students. Demographic and behavioral characteristics were collected using a self-administered questionnaire. The Epworth Sleepiness Scale and the Horne and Ostberg Morningness-Eveningness Questionnaire were used to evaluate prevalence of daytime sleepiness and circadian preference. Multivariable logistic regression models were used to evaluate the association between sleep disorders and consumption of caffeinated beverages. Overall, the prevalence of daytime sleepiness was 27.9% (95% CI: 26.2–29.5%) while the prevalence of evening chronotype was 13% (95% CI: 11.8–14.2%). Students who use energy drinks were more likely to be evening types. For instance, the use of M100/M150 energy drinks was associated with a more than 3-fold increased odds of evening chronotype (OR 3.50; 95% CI 1.90–6.44), while Red Bull users were more than twice as likely to have evening chronotype (OR 2.39; 95% CI 1.02–5.58). Additionally, those who consumed any energy drinks were more likely to be daytime sleepers. For example, Red Bull (OR 1.72; 95% CI 1.08–2.75) or M100/M150 (OR 1.52; 95% CI 1.10–2.11) consumption was associated with increased odds of daytime sleepiness. Our findings emphasize the importance of implementing educational and prevention programs targeted toward improving sleep hygiene and reducing the consumption of energy drinks among young adults.
PMCID: PMC4209847  PMID: 25356368
13.  Sleep Quality and Sleep Patterns in Relation to Consumption of Energy Drinks, Caffeinated Beverages and Other Stimulants among Thai College Students 
Sleep & breathing = Schlaf & Atmung  2012;17(3):1017-1028.
Poor sleep and heavy use of caffeinated beverages have been implicated as risk factors for a number of adverse health outcomes. Caffeine consumption and use of other stimulants are common among college students globally. However, to our knowledge, no studies have examined the influence of caffeinated beverages on sleep quality of college students in Southeast Asian populations. We conducted this study to evaluate the patterns of sleep quality; and to examine the extent to which poor sleep quality is associated with consumption of energy drinks, caffeinated beverages and other stimulants among 2,854 Thai college students.
A questionnaire was administered to ascertain demographic and behavioral characteristics. The Pittsburgh Sleep Quality Index (PSQI) was used to assess sleep habits and quality. Chi-square tests and multivariate logistic regression models were used to identify statistically significant associations.
Overall, the prevalence of poor sleep quality was found to be 48.1%. A significant percent of students used stimulant beverages (58.0%). Stimulant use (OR 1.50; 95%CI 1.28-1.77) was found to be statistically significant and positively associated with poor sleep quality. Alcohol consumption (OR 3.10; 95% CI 1.72-5.59) and cigarette smoking (OR 1.43; 95% CI 1.02-1.98) also had statistically significant association with increased daytime dysfunction. In conclusion, stimulant use is common among Thai college students and is associated with several indices of poor sleep quality.
Our findings underscore the need to educate students on the importance of sleep and the influences of dietary and lifestyle choices on their sleep quality and overall health.
PMCID: PMC3621002  PMID: 23239460
Sleep; Energy Drinks; Alcohol; Caffeine; Students; Cigarettes
14.  Daytime Sleepiness, Circadian Preference, Caffeine Consumption and Khat Use among College Students in Ethiopia 
Journal of sleep disorders-- treatment & care  2013;3(1):10.4172/2325-9639.1000130.
To estimate the prevalence of daytime sleepiness and circadian preferences, and to examine the extent to which caffeine consumption and Khat (a herbal stimulant) use are associated with daytime sleepiness and evening chronotype among Ethiopian college students.
A cross-sectional study was conducted among 2,410 college students. A self-administered questionnaire was used to collect information about sleep, behavioral risk factors such as caffeinated beverages, tobacco, alcohol, and Khat consumption. Daytime sleepiness and chronotype were assessed using the Epworth Sleepiness Scale (ESS) and the Horne & Ostberg Morningness /Eveningness Questionnaire (MEQ), respectively. Linear and logistic regression models were used to evaluate associations.
Daytime sleepiness (ESS≥10) was present in 26% of the students (95% CI: 24.4–27.8%) with 25.9% in males and 25.5% in females. A total of 30 (0.8%) students were classified as evening chronotypes (0.7% in females and 0.9% in males). Overall, Khat consumption, excessive alcohol use and cigarette smoking status were associated with evening chronotype. Use of any caffeinated beverages (OR=2.18; 95% CI: 0.82–5.77) and Khat consumption (OR=7.43; 95% CI: 3.28–16.98) increased the odds of evening chronotype.
The prevalence of daytime sleepiness among our study population was high while few were classified as evening chronotypes. We also found increased odds of evening chronotype with caffeine consumption and Khat use amongst Ethiopian college students. Prospective cohort studies that examine the effects of caffeinated beverages and Khat use on sleep disorders among young adults are needed.
PMCID: PMC4015623  PMID: 24818170
15.  Profile-Based LC-MS Data Alignment—A Bayesian Approach 
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets.
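Of the comparison methods named above, dynamic time warping is the simplest to show in a few lines. The sketch below is the textbook DTW recurrence applied to two intensity profiles; it is not BAM and not necessarily the DTW variant the authors benchmarked. The function name and profiles are hypothetical.

```python
def dtw_distance(profile_a, profile_b):
    """Classic dynamic time warping cost between two intensity profiles."""
    n, m = len(profile_a), len(profile_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning the first i and first j points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(profile_a[i - 1] - profile_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch profile_b
                cost[i][j - 1],      # stretch profile_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Hypothetical chromatogram fragments: the second is a time-stretched copy of the first.
reference = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(reference, stretched))
```

Because DTW warps point by point without a smoothness model, profile-based approaches such as BAM instead estimate a prototype function and smooth mapping functions, which is the distinction the abstract draws.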
PMCID: PMC3993096  PMID: 23929872
Alignment; Bayesian inference; block Metropolis-Hastings algorithm; liquid chromatography-mass spectrometry (LC-MS); Markov chain Monte Carlo (MCMC); stochastic search variable selection (SSVS)
16.  LC-MS Based Serum Metabolomics for Identification of Hepatocellular Carcinoma Biomarkers in Egyptian Cohort 
Journal of proteome research  2012;11(12):5914-5923.
Although hepatocellular carcinoma (HCC) has been subjected to continuous investigation and its symptoms are well known, early-stage diagnosis of this disease remains difficult and the survival rate after diagnosis is typically very low (3–5%). Early and accurate detection of metabolic changes in the sera of patients with liver cirrhosis can help improve the prognosis of HCC and lead to a better understanding of its mechanism at the molecular level, thus providing patients with in-time treatment of the disease. In this study, we compared metabolite levels in sera of 40 HCC patients and 49 cirrhosis patients from Egypt by using ultra-performance liquid chromatography coupled with a quadrupole time-of-flight mass spectrometer (UPLC-QTOF MS). Following data preprocessing, the most relevant ions in distinguishing HCC cases from cirrhotic controls are selected by statistical methods. Putative metabolite identifications for these ions are obtained through mass-based database search. The identities of some of the putative identifications are verified by comparing their MS/MS fragmentation patterns and retention times with those from authentic compounds. Finally, the serum samples are reanalyzed for quantitation of selected metabolites along with other metabolites previously selected as candidate biomarkers of HCC. This quantitation was performed using isotope dilution by selected reaction monitoring (SRM) on a triple quadrupole linear ion trap (QqQLIT) coupled to UPLC. Statistical analysis of the UPLC-QTOF data identified 274 monoisotopic ion masses with statistically significant differences in ion intensities between HCC cases and cirrhotic controls. Putative identifications were obtained for 158 ions by mass based search against databases. We verified the identities of selected putative identifications including glycocholic acid (GCA), glycodeoxycholic acid (GDCA), 3beta, 6beta-dihydroxy-5beta-cholan-24-oic acid, oleoyl carnitine, and Phe-Phe.
SRM-based quantitation confirmed significant differences between HCC and cirrhotic controls in metabolite levels of bile acid metabolites, long chain carnitines and a small peptide. Our study provides useful insight into appropriate experimental design and computational methods for serum biomarker discovery using LC-MS/MS based metabolomics. This study has led to the identification of candidate biomarkers with significant changes in metabolite levels between HCC cases and cirrhotic controls. This is the first MS-based metabolic biomarker discovery study on Egyptian subjects that led to the identification of candidate metabolites that discriminate early stage HCC from patients with liver cirrhosis.
PMCID: PMC3719870  PMID: 23078175
Hepatocellular carcinoma; liver cirrhosis; metabolic biomarker; cancer biomarker discovery; selected reaction monitoring; isotope dilution; mass spectrometry
17.  Gaussian process regression model for normalization of LC-MS data using scan-level information 
Proteome Science  2013;11(Suppl 1):S13.
Differences in sample collection, biomolecule extraction, and instrument variability introduce bias to data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using the Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on analysis order of LC-MS runs. Our method uses measurement variabilities estimated through LC-MS data acquired from quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters for the fitted GPRM. We review several normalization methods and compare their performance with GPRM.
To evaluate the performance of different normalization methods, we consider LC-MS data from a study where a metabolomic approach is utilized to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess the run-to-run variability and to evaluate the ability of various normalization methods to reduce this undesired variability. Also, ANOVA models are applied to the normalized LC-MS data to identify ions with intensity measurements that are significantly different between cases and controls.
One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. Through our in-house data set, we show that the GPRM outperforms other normalization methods considered here, in terms of decreasing the variability of ion intensities among quality control runs.
PMCID: PMC3908948  PMID: 24564985
Extracted ion chromatogram (EIC); Evaluation; Gaussian process; Liquid chromatography-mass spectrometry (LC-MS); Normalization; Quality control (QC); Scan-level data
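The run-order normalization idea in entry 17 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' GPRM: it works on one peak-level log-intensity per run rather than scan-level EIC data, it assumes a squared-exponential kernel, and it picks the length scale from a small hypothetical grid by maximizing the log marginal likelihood instead of performing a full maximum likelihood optimization. The function names, the grid, and the variance defaults are all made up for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between run-order positions a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(x, y, length_scale, signal_var, noise_var):
    """Gaussian process log marginal likelihood for zero-mean targets y."""
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def estimate_drift(qc_order, qc_log_intensity, all_order,
                   signal_var=1.0, noise_var=0.01,
                   length_scales=(2.0, 5.0, 10.0, 20.0, 50.0)):
    """Fit a GP to QC log-intensities versus analysis order and return the
    predicted drift (relative to the QC mean) at every run in all_order."""
    x = np.asarray(qc_order, dtype=float)
    y = np.asarray(qc_log_intensity, dtype=float)
    yc = y - y.mean()  # centre so the zero-mean GP models drift only
    # Assumed grid search standing in for a proper likelihood optimizer.
    best = max(length_scales,
               key=lambda ls: log_marginal_likelihood(x, yc, ls, signal_var, noise_var))
    K = rbf_kernel(x, x, best, signal_var) + noise_var * np.eye(len(x))
    Ks = rbf_kernel(np.asarray(all_order, dtype=float), x, best, signal_var)
    return Ks @ np.linalg.solve(K, yc)

if __name__ == "__main__":
    # Synthetic example: 40 runs, every 4th run is a QC injection.
    order = np.arange(40)
    qc_idx = order[::4]
    rng = np.random.default_rng(0)
    qc_log = 10.0 + 0.02 * qc_idx + rng.normal(0.0, 0.02, size=qc_idx.size)
    drift = estimate_drift(qc_idx, qc_log, order)
    corrected_qc = qc_log - drift[qc_idx]
    print("QC spread before:", qc_log.std(), "after:", corrected_qc.std())
```

Subtracting the estimated drift from every run's log-intensity (study samples as well as QC runs) is one simple way to apply the correction; comparing the spread of the QC runs before and after mirrors the evaluation criterion described in the abstract.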
18.  Utilization of Metabolomics to Identify Serum Biomarkers for Hepatocellular Carcinoma in Patients with Liver Cirrhosis 
Analytica chimica acta  2012;743C:90-100.
Characterizing the metabolic changes pertaining to hepatocellular carcinoma (HCC) in patients with liver cirrhosis is believed to contribute towards early detection, treatment, and understanding of the molecular mechanisms of HCC. In this study, we compare metabolite levels in sera of 78 HCC cases with 184 cirrhotic controls by using ultra performance liquid chromatography coupled with hybrid quadrupole time-of-flight mass spectrometry (UPLC-QTOF MS). Following data preprocessing, the most relevant ions in distinguishing HCC cases from patients with cirrhosis are selected by parametric and non-parametric statistical methods. Putative metabolite identifications for these ions are obtained through mass-based database search. Verification of the identities of selected metabolites is conducted by comparing their MS/MS fragmentation patterns and retention time with those from authentic compounds. Quantitation of these metabolites is performed in a subset of the serum samples (10 HCC and 10 cirrhosis) using isotope dilution by selected reaction monitoring (SRM) on triple quadrupole linear ion trap (QqQLIT) and triple quadrupole (QqQ) mass spectrometers. The results of this analysis confirm that metabolites involved in sphingolipid metabolism and phospholipid catabolism such as sphingosine-1-phosphate (S-1-P) and lysophosphatidylcholine (lysoPC 17:0) are up-regulated in sera of HCC cases vs. those with liver cirrhosis. Down-regulated metabolites include those involved in bile acid biosynthesis (specifically cholesterol metabolism) such as glycochenodeoxycholic acid 3-sulfate (3-sulfo-GCDCA), glycocholic acid (GCA), glycodeoxycholic acid (GDCA), taurocholic acid (TCA), and taurochenodeoxycholate (TCDCA). These results provide useful insights into HCC biomarker discovery utilizing metabolomics as an efficient and cost-effective platform.
Our work shows that metabolomic profiling is a promising tool to identify candidate metabolic biomarkers for early detection of HCC cases in a high-risk population of cirrhotic patients.
PMCID: PMC3419576  PMID: 22882828
Metabolomics; biomarkers; liquid chromatography-mass spectrometry; hepatocellular carcinoma; selected reaction monitoring
The annals of applied statistics  2011;5(3):10.1214/11-AOAS463.
The vast amount of biological knowledge accumulated over the years has allowed researchers to identify various biochemical interactions and define different families of pathways. There is an increased interest in identifying pathways and pathway elements involved in particular biological processes. Drug discovery efforts, for example, are focused on identifying biomarkers as well as pathways related to a disease. We propose a Bayesian model that addresses this question by incorporating information on pathways and gene networks in the analysis of DNA microarray data. Such information is used to define pathway summaries, specify prior distributions, and structure the MCMC moves to fit the model. We illustrate the method with an application to gene expression data with censored survival outcomes. In addition to identifying markers that would have been missed otherwise and improving prediction accuracy, the integration of existing biological knowledge into the analysis provides a better understanding of underlying molecular processes.
PMCID: PMC3650864  PMID: 23667412
Bayesian variable selection; gene expression; Markov chain Monte Carlo; Markov random field prior; pathway selection
20.  Inflammation induces fibrinogen nitration in experimental human endotoxemia 
Free radical biology & medicine  2009;47(8):1140-1146.
Elevated plasma fibrinogen is a prothrombotic risk factor for cardiovascular disease (CVD). Recent small studies report that fibrinogen oxidative modifications, specifically tyrosine residue nitration, can occur in inflammatory states and may modify fibrinogen function. HDL cholesterol is inversely related to CVD and suggested to reduce the oxidation of LDL cholesterol, but whether these antioxidant functions extend to fibrinogen modifications is unknown. We used a recently validated ELISA to quantify nitrated fibrinogen during experimental human endotoxemia (N=23) and in a cohort of healthy adults (N=361) who were characterized for inflammatory and HDL parameters as well as subclinical atherosclerosis measures, carotid artery intima-medial thickness (IMT) and coronary artery calcification (CAC). Fibrinogen nitration increased following endotoxemia and directly correlated with accelerated ex vivo plasma clotting velocity. In the observational cohort, nitrated fibrinogen was associated with levels of CRP and serum amyloid A. Nitrated fibrinogen levels were not lower with increasing HDL cholesterol and did not associate with IMT and CAC. In humans, fibrinogen nitration was induced during inflammation and was correlated with markers of inflammation and clotting function but not HDL cholesterol or subclinical atherosclerosis in our modest sample. Inflammation-induced fibrinogen nitration may be a risk factor for promoting CVD events.
PMCID: PMC3651370  PMID: 19631267
Fibrinogen; Nitration; Intima-medial thickness; Coronary artery calcification; High-density lipoprotein (HDL)
21.  The Epidemiology of Sleep Quality, Sleep Patterns, Consumption of Caffeinated Beverages, and Khat Use among Ethiopian College Students 
Sleep Disorders  2012;2012:583510.
Objective. To evaluate sleep habits, sleep patterns, and sleep quality among Ethiopian college students; and to examine associations of poor sleep quality with consumption of caffeinated beverages and other stimulants. Methods. A total of 2,230 undergraduate students completed a self-administered comprehensive questionnaire which gathered information about sleep complaints, sociodemographic and lifestyle characteristics, and the use of caffeinated beverages and khat. We used multivariable logistic regression procedures to estimate odds ratios for the associations of poor sleep quality with sociodemographic and behavioral factors. Results. Overall 52.7% of students were classified as having poor sleep quality (51.8% among males and 56.9% among females). In adjusted multivariate analyses, caffeine consumption (OR = 1.55; 95% CI: 1.25–1.92), cigarette smoking (OR = 1.68; 95% CI: 1.06–2.63), and khat use (OR = 1.72; 95% CI: 1.09–2.71) were all associated with increased odds of long sleep latency (>30 minutes). Cigarette smoking (OR = 1.74; 95% CI: 1.11–2.73) and khat consumption (OR = 1.91; 95% CI: 1.22–3.00) were also significantly associated with poor sleep efficiency (<85%), as well as with increased use of sleep medicine. Conclusion. Findings from the present study demonstrate the high prevalence of poor sleep quality and its association with stimulant use among college students. Preventive and educational programs for students should include modules that emphasize the importance of sleep and associated risk factors.
PMCID: PMC3581089  PMID: 23710363
22.  Probabilistic Mixture Regression Models for Alignment of LC-MS Data 
A novel framework of a probabilistic mixture regression model (PMRM) is presented for alignment of liquid chromatography-mass spectrometry (LC-MS) data with respect to both retention time (RT) and mass-to-charge ratio (m/z). The expectation maximization algorithm is used to estimate the joint parameters of spline-based mixture regression models and prior transformation density models. The latter accounts for the variability in RT points, m/z values, and peak intensities. The applicability of PMRM for alignment of LC-MS data is demonstrated through three datasets. The performance of PMRM is compared with other alignment approaches, including dynamic time warping, correlation optimized warping, and the continuous profile model, in terms of the coefficient of variation of replicate LC-MS runs and accuracy in detecting differentially abundant peptides/proteins.
PMCID: PMC3006656  PMID: 20837998
liquid chromatography; mass spectrometry; mixed-regression model; expectation-maximization
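One of the evaluation criteria named in entry 22, the coefficient of variation across replicate LC-MS runs, can be illustrated with a short sketch. The layout assumed here (rows are matched peaks, columns are replicate runs) and the function names are hypothetical choices for the example; the abstract does not specify how the authors organized or summarized their data.

```python
import numpy as np

def coefficient_of_variation(intensities):
    """Per-peak CV (sample standard deviation / mean) across replicate runs.

    intensities: 2-D array-like, rows = matched peaks, columns = replicates.
    """
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)  # ddof=1 gives the sample standard deviation
    return sd / mean

def median_cv(intensities):
    """Single summary number: the median of the per-peak CVs."""
    return float(np.median(coefficient_of_variation(intensities)))

if __name__ == "__main__":
    # Made-up intensities for two peaks measured in three replicate runs.
    example = [[100.0, 100.0, 100.0],
               [90.0, 100.0, 110.0]]
    print(coefficient_of_variation(example))
    print(median_cv(example))
```

Under this reading, a better alignment matches the same peak across replicates more consistently, so the per-peak CVs computed after alignment would be expected to drop.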
23.  Analysis of Normal-Tumour Tissue Interaction in Tumours: Prediction of Prostate Cancer Features from the Molecular Profile of Adjacent Normal Cells 
PLoS ONE  2011;6(3):e16492.
Statistical modelling, in combination with genome-wide expression profiling techniques, has demonstrated that the molecular state of the tumour is sufficient to infer its pathological state. These studies have been extremely important in diagnostics and have contributed to improving our understanding of tumour biology. However, their importance in in-depth understanding of cancer patho-physiology may be limited since they do not explicitly take into consideration the fundamental role of the tissue microenvironment in specifying tumour physiology. Because of the importance of normal cells in shaping the tissue microenvironment, we formulate the hypothesis that molecular components of the profile of normal epithelial cells adjacent to the tumour are predictive of tumour physiology. We addressed this hypothesis by developing statistical models that link gene expression profiles representing the molecular state of adjacent normal epithelial cells to tumour features in prostate cancer. Furthermore, network analysis showed that predictive genes are linked to the activity of important secreted factors, which have the potential to influence tumour biology, such as IL1, IGF1, PDGF BB, AGT, and TGFβ.
PMCID: PMC3068146  PMID: 21479216
24.  A Bayesian Based Functional Mixed-Effects Model for Analysis of LC-MS Data 
A Bayesian multilevel functional mixed-effects model with group specific random-effects is presented for analysis of liquid chromatography-mass spectrometry (LC-MS) data. The proposed framework allows alignment of LC-MS spectra with respect to both retention time (RT) and mass-to-charge ratio (m/z). Affine transformations are incorporated within the model to account for any variability along the RT and m/z dimensions. Simultaneous posterior inference of all unknown parameters is accomplished via Markov chain Monte Carlo method using the Gibbs sampling algorithm. The proposed approach is computationally tractable and allows incorporating prior knowledge in the inference process. We demonstrate the applicability of our approach for alignment of LC-MS spectra based on total ion count profiles derived from two LC-MS datasets.
PMCID: PMC2896560  PMID: 19963938
25.  Adipokines, Insulin Resistance and Coronary Artery Calcification 
We evaluated the hypothesis that plasma levels of adiponectin and leptin are independently but oppositely associated with coronary artery calcification (CAC), a measure of subclinical atherosclerosis. In addition, we assessed which biomarkers of adiposity and insulin resistance are the strongest predictors of CAC beyond traditional risk factors, the metabolic syndrome and plasma C-reactive protein (CRP).
Adipokines are fat-secreted biomolecules with pleiotropic actions that converge in diabetes and cardiovascular disease.
We examined the association of plasma adipocytokines with CAC in 860 asymptomatic, non-diabetic participants in the Study of Inherited Risk of Coronary Atherosclerosis (SIRCA).
Plasma adiponectin and leptin levels had opposite and distinct associations with adiposity, insulin resistance and inflammation. Plasma leptin was positively (top vs. bottom quartile) associated with higher CAC after adjusting for age, gender, traditional risk factors and Framingham Risk Scores (FRS) [tobit regression ratio 2.42 (95% CI 1.48–3.95, p=0.002)] and further adjusting for metabolic syndrome and CRP [ratio 2.31 (95% CI 1.36–3.94, p=0.002)]. In contrast, adiponectin levels were not associated with CAC. Comparative analyses suggested that levels of leptin, IL-6 and sol-TNFR2 as well as HOMA-IR predicted CAC scores but only leptin and HOMA-IR provided value beyond risk factors, the metabolic syndrome and CRP.
In SIRCA, while both leptin and adiponectin levels were associated with metabolic and inflammatory markers, only leptin was a significant independent predictor of CAC. Of several metabolic markers, leptin and the HOMA-IR index had the most robust, independent associations with CAC.
Condensed Abstract
Adipokines are fat-secreted biomolecules with pleiotropic actions and represent novel markers for cardiovascular risk. We examined the association of plasma adipocytokines with CAC in 860 asymptomatic, non-diabetic Caucasians. Leptin was positively (top vs. bottom quartile) associated with higher CAC even after adjustment for age, gender, traditional risk factors, Framingham Risk Score, metabolic syndrome, and CRP [ratio 2.31 (95% CI 1.36–3.94, p=0.002)]. Adiponectin levels were not associated with CAC. Comparative analyses suggested that levels of leptin, IL-6 and sol-TNFR2 as well as HOMA-IR predicted CAC scores, but only leptin and HOMA-IR provided value beyond risk factors, the metabolic syndrome and CRP.
PMCID: PMC2853595  PMID: 18617073
Adiponectin; Leptin; Coronary Artery Calcification; Atherosclerosis; Inflammation

