1.  OMICS in Ecology: Systems Level Analyses of Halobacterium salinarum Reveal Large-scale Temperature-Mediated Changes and a Requirement of CctA for Thermotolerance 
Halobacterium salinarum is an extremely halophilic archaeon that inhabits high-salinity aqueous environments in which the temperature can range widely, both daily and seasonally. An OMICS analysis of the 37°C and 49°C proteomes and transcriptomes for revealing the biomodules affected by temperature is reported here. Analysis of those genes/proteins displaying dramatic changes provided a clue to the coordinated changes in the expression of genes within five arCOG biological clusters. When proteins that exhibited minor changes in their spectral counts and insignificant p values were also examined, the apparent influence of the elevated temperatures on conserved chaperones, metabolism, translation, and other biomodules became more obvious. For instance, increases in all eight conserved chaperones and three arginine deiminase pathway enzymes and reductions in most tricarboxylic acid (TCA) cycle enzymes and ribosomal proteins suggest that complex system responses occurred as the temperature changed. When the requirement for the four proteins that showed the greatest induction at 49°C was analyzed, only CctA (chaperonin subunit α), but not Hsp5, DpsA, or VNG1187G, was essential for thermotolerance. Environmental stimuli and other perturbations may induce many minor gene expression changes. Simultaneous analysis of the genes exhibiting dramatic or minor changes in expression may facilitate the detection of systems level responses.
2.  miR-429 Identified by Dynamic Transcriptome Analysis Is a New Candidate Biomarker for Colorectal Cancer Prognosis 
Colorectal cancer (CRC) is a common malignant gastrointestinal cancer. Efforts for preventive and personalized medicine have intensified in the last decade with attention to novel forms of biomarkers. In the present study, microRNA and genetic analyses were performed in tandem for differential transcriptome profiling between primary tumors with or without nodes or distant metastases. Serial Test Cluster (STC) analysis demonstrated that 20 genes and two microRNAs showed distinctive expression patterns associated with the tumor, node, and metastasis (TNM) stage. The selected target genes were characterized by GO and Pathway analysis. A microRNA-target gene network analysis showed that miR-429 resided in the center of the network, indicating that miR-429 might serve important roles in the development of CRC. Real-time PCR and tissue microarrays showed that miR-429 had a dynamic expression pattern during the CRC progression stage, and was significantly downregulated in stage II and stage III clinical progression. The low expression of miR-429 was correlated with poor prognosis for CRC. Taken together, miR-429 warrants further clinical translation research as a candidate biomarker for CRC prognosis. Additional downstream targets and attendant gene function also need to be discerned to design a sound critical path to personalized medicine for persons susceptible to, or diagnosed with, CRC.
3.  Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications 
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
5.  Race and Sex Differences in Small-Molecule Metabolites and Metabolic Hormones in Overweight and Obese Adults 
In overweight/obese individuals, cardiometabolic risk factors differ by race and sex categories. Small-molecule metabolites and metabolic hormone levels might also differ across these categories and contribute to risk factor heterogeneity. To explore this possibility, we performed a cross-sectional analysis of fasting plasma levels of 69 small-molecule metabolites and 13 metabolic hormones in 500 overweight/obese adults who participated in the Weight Loss Maintenance trial. Principal-components analysis (PCA) was used for reduction of metabolite data. Race and sex-stratified comparisons of metabolite factors and metabolic hormones were performed. African Americans represented 37.4% of the study participants, and females 63.0%. Of thirteen metabolite factors identified, three differed by race and sex: levels of factor 3 (branched-chain amino acids and related metabolites, p<0.0001), factor 6 (long-chain acylcarnitines, p<0.01), and factor 2 (medium-chain dicarboxylated acylcarnitines, p<0.0001) were higher in males vs. females; factor 6 levels were higher in Caucasians vs. African Americans (p<0.0001). Significant differences were also observed in hormones regulating body weight homeostasis. Among overweight/obese adults, there are significant race and sex differences in small-molecule metabolites and metabolic hormones; these differences may contribute to risk factor heterogeneity across race and sex subgroups and should be considered in future investigations with circulating metabolites and metabolic hormones.
6.  Integrative Omics Approach Identifies Interleukin-16 as a Biomarker of Emphysema 
Interleukin-16 (IL-16) is a multifunctional cytokine that has been associated with autoimmune and allergic diseases. To investigate comprehensively whether IL-16 is also associated with chronic obstructive pulmonary disease (COPD) and emphysema, we performed an integrated analysis of multiple “omics” data. Over 500 subjects participating in the COPDGene® study donated blood and were clinically characterized and genetically profiled. IL-16 mRNA levels were measured in peripheral blood mononuclear cells (PBMC), and protein levels were measured in fresh frozen plasma. A multivariate analysis found plasma IL-16 positively associated with age and body mass index, and negatively associated with current smoking and emphysema in the upper lobes. PBMC IL-16 expression was positively associated with gender and a composite score for airflow obstruction, emphysema, and gas trapping. Whole-genome expression quantitative trait locus (eQTL) analysis identified a novel IL-16 missense SNP (rs11556218) associated with lower IL-16 in plasma. In summary, an integrated “omics” analysis in a very large cohort identified an association between decreased IL-16 and emphysema and discovered a novel IL-16 cis-eQTL. Thus IL-16 plasma levels and IL-16 genotyping may be useful in a personalized medicine approach for lung disease.
7.  Systems Biology Analysis of the Endocannabinoid System Reveals a Scale-free Network with Distinct Roles for Anandamide and 2-Arachidonoylglycerol 
We represented the endocannabinoid system (ECS) as a biological network, where ECS molecules are the nodes (123) and their interactions the links (189). The ECS network follows a scale-free topology, which confers robustness against random damage, easy navigability, and controllability. Network topological parameters, such as clustering coefficient (i.e., how the nodes form clusters) of 0.0009, network diameter (the longest shortest path among all pairs of nodes) of 12, average number of neighbors (the mean number of connections per node) of 3.073, and characteristic path length (the expected distance between two connected nodes) of 4.715, suggested that molecular messages are transferred through the ECS network quickly and specifically. Interestingly, ∼75% of nodes are located on, or are active at the level of, the cell membrane. The hubs of the ECS network are anandamide (AEA) and 2-arachidonoylglycerol (2-AG), which also have the highest betweenness centrality values, and their removal causes the network to collapse into multiple disconnected components. Importantly, AEA is a ubiquitous player, while 2-AG plays more restricted roles. In contrast, the product of their degradation, arachidonic acid, and their hydrolyzing enzyme, fatty acid amide hydrolase (FAAH), have a marginal impact on the ECS network; indeed, their removal did not significantly affect its topology.
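The topological parameters named in this abstract can be illustrated with a minimal Python sketch. The toy graph and all function names below are hypothetical and are not taken from the study; only the node and link counts (123 and 189) come from the abstract. The sketch assumes an undirected, connected graph stored as an adjacency dictionary.

```python
from collections import deque

def average_neighbors(n_nodes, n_links):
    # Each undirected link contributes one neighbor to each of its two endpoints.
    return 2 * n_links / n_nodes

def shortest_path_lengths(adj, source):
    # Breadth-first search: hop distance from source to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_path_length(adj):
    # Diameter: longest shortest path. Characteristic path length: mean
    # shortest-path distance over all connected ordered pairs of nodes.
    lengths = []
    for s in adj:
        dist = shortest_path_lengths(adj, s)
        lengths.extend(d for t, d in dist.items() if t != s)
    return max(lengths), sum(lengths) / len(lengths)

def components_after_removal(adj, removed):
    # Count connected components once one node (e.g. a hub) is deleted.
    remaining = {u: [v for v in nbrs if v != removed]
                 for u, nbrs in adj.items() if u != removed}
    seen, count = set(), 0
    for s in remaining:
        if s not in seen:
            seen.update(shortest_path_lengths(remaining, s))
            count += 1
    return count

# Hypothetical toy network: one hub with three neighbors and a short tail.
toy = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"],
       "c": ["hub", "d"], "d": ["c"]}

ecs_average = round(average_neighbors(123, 189), 3)   # counts from the abstract
toy_diameter, toy_path_length = diameter_and_path_length(toy)
toy_components = components_after_removal(toy, "hub")
```

With the abstract's counts, 2 × 189 / 123 rounds to 3.073, which matches the reported average number of neighbors. In the toy graph, deleting the hub splits the network into three components, a small-scale analogue of the collapse described for AEA and 2-AG.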
8.  Multi-Locus Candidate Gene Analyses of Lipid Levels in a Pediatric Turkish Cohort: Lessons Learned on LPL, CETP, LIPC, ABCA1, and SHBG 
Cardiovascular risk factors and atherosclerosis precursors were examined in 365 Turkish children and adolescents. Study participants were recruited at five different state schools. We tested single and multi-locus effects of six polymorphisms from five candidate genes, chosen based on prior known association with lipid levels in adults, for association with low (≤10th percentile) high density lipoprotein cholesterol (HDL-C) and high (≥90th percentile) triglycerides (TG), and the related continuous outcomes. We observed an association between CETP variant rs708272 and low HDL-C (allelic p=0.020, genotypic p=0.046), which was supported by an independent analysis, PRAT (PRAT control p=0.027). Sex-stratified logistic regression analysis showed that the B2 allele of rs708272 decreased odds of being in the lower tenth percentile of HDL-C measurements (OR=0.36, p=0.02) in girls; this direction of effect was also seen in boys but was not significant (OR=0.64, p=0.21). Logistic regression analysis also revealed that the T allele of rs6257 (SHBG) decreased odds of being in the top tenth percentile of TG measurements in boys (OR=0.43, p=0.03). Analysis of lipid levels as a continuous trait revealed a significant association between rs708272 (CETP) and LDL-C levels in males (p=0.02) with the B2B2 genotype group having the lowest mean LDL-C; the same direction of effect was also seen in females (p=0.05). An effect was also seen between rs708272 and HDL-C levels in girls (p=0.01), with the B2B2 genotype having the highest mean HDL-C levels. Multi-locus analysis, using quantitative multifactor dimensionality reduction (qMDR) identified the previously mentioned CETP variant as the best single locus model, and overall model, for predicting HDL-C levels in children. This study provides evidence for association between CETP and low HDL-C phenotype in children, but the results appear to be weaker in children than previous results in adults and may also be subject to gender effects.
9.  Antiviral Cationic Peptides as a Strategy for Innovation in Global Health Therapeutics for Dengue Virus: High Yield Production of the Biologically Active Recombinant Plectasin Peptide 
Dengue virus infects millions of people worldwide, and there is no vaccine or anti-dengue therapeutic available. Antimicrobial peptides have been shown to possess effective antiviral activity against various viruses. One of the main limitations of developing these peptides as potent antiviral drugs is the high cost of production. In this study, high yield production of biologically active plectasin peptide was inexpensively achieved by producing tandem plectasin peptides as inclusion bodies in E. coli. Antiviral activity of the recombinant peptide towards dengue serotype-2 NS2B-NS3 protease (DENV2 NS2B-NS3pro) was assessed as a target to inhibit dengue virus replication in Vero cells. Single units of recombinant plectasin were collected after applying consecutive steps of refolding, cleaving by Factor Xa, and nickel column purification to obtain recombinant proteins of high purity. The maximal nontoxic dose (MNTD) of the recombinant peptide against Vero cells was 20 μM (100 μg/mL). The reaction velocity of DENV2 NS2B-NS3pro decreased significantly as increasing concentrations of recombinant plectasin were applied to the reaction mixture. Plectasin peptide noncompetitively inhibited DENV2 NS2B-NS3pro with a Ki value of 5.03±0.98 μM. The percentage of viral inhibition was more than 80% at the MNTD value of plectasin. In this study, biologically active recombinant plectasin, which was able to inhibit dengue protease and viral replication in Vero cells, was successfully produced in E. coli in a time- and cost-effective manner. These findings are potentially important in the development of potent therapeutics against dengue infection.
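As an illustration of what noncompetitive inhibition with a given Ki implies for reaction velocity, the following Python sketch uses the textbook rate law for pure noncompetitive inhibition, v = Vmax·[S] / ((Km + [S])·(1 + [I]/Ki)). Only the Ki of 5.03 μM comes from the abstract. The Vmax and Km values, the function names, and the assumption of pure (rather than mixed) noncompetitive behavior are illustrative and not taken from the study.

```python
KI_UM = 5.03  # Ki reported in the abstract, in micromolar

def noncompetitive_velocity(s_um, i_um, vmax, km_um, ki_um=KI_UM):
    # Pure noncompetitive inhibition: the inhibitor lowers the apparent Vmax
    # by the factor (1 + [I]/Ki) and leaves Km unchanged.
    return vmax * s_um / ((km_um + s_um) * (1 + i_um / ki_um))

def relative_activity(i_um, ki_um=KI_UM):
    # Fraction of the uninhibited velocity that remains at inhibitor
    # concentration [I]; independent of substrate concentration in this model.
    return 1 / (1 + i_um / ki_um)

# Hypothetical kinetic constants, chosen only to make the arithmetic easy.
v_uninhibited = noncompetitive_velocity(s_um=10, i_um=0, vmax=100, km_um=10)
v_at_ki = noncompetitive_velocity(s_um=10, i_um=KI_UM, vmax=100, km_um=10)
```

Under this model, an inhibitor concentration equal to Ki halves the reaction velocity at any substrate concentration, which is the practical meaning of the reported constant.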
10.  Agrigenomics for Microalgal Biofuel Production: An Overview of Various Bioinformatics Resources and Recent Studies to Link OMICS to Bioenergy and Bioeconomy 
Microalgal biofuels offer great promise in contributing to the growing global demand for alternative sources of renewable energy. However, to make algae-based fuels cost competitive with petroleum, lipid production capabilities of microalgae need to improve substantially. Recent progress in algal genomics, in conjunction with other “omic” approaches, has accelerated the ability to identify metabolic pathways and genes that are potential targets in the development of genetically engineered microalgal strains with optimum lipid content. In this review, we summarize the current bioeconomic status of global biofuel feedstocks with particular reference to the role of “omics” in optimizing sustainable biofuel production. We also provide an overview of the various databases and bioinformatics resources available to gain a more complete understanding of lipid metabolism across algal species, along with the recent contributions of “omic” approaches in the metabolic pathway studies for microalgal biofuel production.
11.  Postgenomics Diagnostics: Metabolomics Approaches to Human Blood Profiling 
We live in exciting times with the prospects of postgenomics diagnostics. Metabolomics is a novel “omics” data-intensive science that is accelerating the development of postgenomics diagnostics, particularly with use of accessible peripheral tissue compartments. Metabolomics involves the study of a comprehensive set of low molecular weight substances (metabolites) present in biological systems. The metabolite profiles represent the molecular phenotype of biological systems and reflect the information encoded at the genomic level and implemented at the transcriptomic and proteomic levels. Analysis of the human blood metabolite profile is a universal and highly promising tool for clinical postgenomics applications because it reflects both the endogenous and exogenous (environmental) factors influencing an individual organism. This article presents a critical synthesis and original analysis of both the technical implementation of metabolic profiling of blood and statistical analysis of metabolite profiles for effective disease diagnostics and risk assessment in the present postgenomics era.
12.  Novel Multivariate Methods for Integration of Genomics and Proteomics Data: Applications in a Kidney Transplant Rejection Study 
Multi-omics research is a key ingredient of data-intensive life sciences research, permitting measurement of biological molecules at different functional levels in the same individual. For a complete picture at the biological systems level, however, appropriate statistical techniques must be developed to integrate different ‘omics’ data sets (e.g., genomics and proteomics). We report here multivariate projection-based analysis approaches to genomics and proteomics data sets, using as a case study observations in kidney transplant patients who experienced an acute rejection event (n=20) versus non-rejecting controls (n=20). In these data sets, we show how these novel methodologies might serve as promising tools for dimension reduction and selection of relevant features for different analytical frameworks. Unsupervised analyses highlighted the importance of post-transplant time-of-rejection, while supervised analyses identified gene and protein signatures that together predicted rejection status with little time effect. The selected genes are part of biological pathways that are representative of immune responses. Gene enrichment profiles revealed increases in innate immune responses and neutrophil activities and a depletion of T lymphocyte-related processes in rejection samples as compared to controls. In all, this article offers candidate biomarkers for future detection and monitoring of acute kidney transplant rejection, as well as ways forward for methodological advances to better harness multi-omics data sets.
13.  Functional Genetic Polymorphisms in CYP2C19 Gene in Relation to Cardiac Side Effects and Treatment Dose in a Methadone Maintenance Cohort 
Methadone maintenance therapy is an established treatment for heroin dependence. This study tested the influence of functional genetic polymorphisms in the CYP2C19 gene, which encodes a CYP450 enzyme that contributes to methadone metabolism, on treatment dose, plasma concentration, and side effects of methadone. Two single nucleotide polymorphisms (SNPs), rs4986893 (exon 4) and rs4244285 (exon 5), were selected and genotyped in 366 patients receiving methadone maintenance therapy in Taiwan. The steady-state plasma concentrations of both methadone and its EDDP metabolite enantiomers were measured. The SNP rs4244285 allele was significantly associated with the corrected QT interval (QTc) change in the electrocardiogram (p=0.021), and the Treatment Emergent Symptom Scale (TESS) total score (p=0.021) in patients who continued using heroin, as demonstrated by a positive urine opiate test. Using the gene dose (GD) models where the CYP2C19 SNPs were clustered into poor (0 GD) versus intermediate (1 GD) and extensive (2 GD) metabolizers, we found that the extensive metabolizers required a higher dose of methadone (p=0.035), and showed a lower plasma R-methadone/methadone dose ratio (p=0.007) in urine opiate test negative patients, as well as a greater QTc change (p=0.008) and higher total scores of TESS (p=0.018) in urine opiate test positive patients, than poor metabolizers. These results in a large study sample from Taiwan suggest that the gene dose of CYP2C19 may potentially serve as an indicator for the plasma R-methadone/methadone dose ratio and cardiac side effect in patients receiving methadone maintenance therapy. Further studies of pharmacogenetic variation in methadone pharmacokinetics and pharmacodynamics are warranted in different world populations.
14.  Cell Metabolomics 
Metabolomics technologies enable the examination and identification of endogenous biochemical reaction products, revealing information on the precise metabolic pathways and processes within a living cell. Metabolism is either directly or indirectly involved with every aspect of cell function, and metabolomics is thus believed to be a reflection of the phenotype of any cell. Metabolomics analysis of cells has many potential applications and advantages compared to currently used methods in the postgenomics era. Cell metabolomics is an emerging field that addresses fundamental biological questions and allows one to observe metabolic phenomena in cells. Cell metabolomics consists of four sequential steps: (a) sample preparation and extraction, (b) metabolic profiling of low-molecular-weight metabolites based on MS or NMR spectroscopy techniques, (c) pattern recognition approaches and bioinformatics data analysis, and (d) metabolite identification resulting in putative biomarkers and molecular targets. The biomarkers are eventually placed in metabolic networks to provide insight into the cellular biochemical phenomena. This article analyzes the recent developments in the use of metabolomics to characterize and interpret the cellular metabolome in a wide range of pathophysiological and clinical contexts, and the putative roles of the endogenous small molecule metabolites in this new frontier of postgenomics biology and systems medicine.
15.  Extracellular Proteome Analysis of Leptospira interrogans serovar Lai 
Leptospirosis is one of the most important zoonoses. Leptospira interrogans serovar Lai is a pathogenic spirochete that is responsible for leptospirosis. Extracellular proteins play an important role in the pathogenicity of this bacterium. In this study, L. interrogans serovar Lai was grown in protein-free medium; the supernatant was collected and subsequently analyzed as the extracellular proteome. A total of 66 proteins with more than two unique peptides were detected by MS/MS, and 33 of these were predicted to be extracellular proteins by a combination of bioinformatics analyses, including Psortb, cello, SoSuiGramN and SignalP. Comparisons of the transcriptional levels of these 33 genes between in vivo and in vitro conditions revealed that 15 genes were upregulated and two genes were downregulated in vivo compared to in vitro. A BLAST search for the components of secretion system at the genomic and proteomic levels revealed the presence of the complete type I secretion system and type II secretion system in this strain. Moreover, this strain also exhibits complete Sec translocase and Tat translocase systems. The extracellular proteome analysis of L. interrogans will supplement the previously generated whole proteome data and provide more information for studying the functions of specific proteins in the infection process and for selecting candidate molecules for vaccines or diagnostic tools for leptospirosis.
16.  SOX2 Targets Fibronectin 1 to Promote Cell Migration and Invasion in Ovarian Cancer: New Molecular Leads for Therapeutic Intervention 
Ovarian cancer ranks as the second most common tumor of the female reproductive system, with a large burden on global public health. Therefore, the identification of novel molecular targets and diagnostics is an urgent need for many women affected by this disease. To this end, the human transcription factor SOX2 is involved in a wide range of pathophysiological roles, such as the maintenance of stem cell characteristics and carcinogenesis. To date, in most studies, SOX2 has been shown to promote the development of cancer, although its inhibitory roles in cancer have also been reported. However, to the best of our knowledge, the role of SOX2, specifically in ovarian cancer cells, has not been examined in detail. In this article, we report, for the first time, that SOX2 promotes migration, invasion, and clonal formation of ovarian cancer cells. We further observed that SOX2 targeted FN1, a key gene that regulates cell migration in ovarian cancer. Our findings collectively suggest that the SOX2-FN1 axis is a key pathway in mediating the migration and invasion of ovarian cancer cells. This pathway offers crucial molecular insights and promises to develop putative candidate therapeutic interventions in women with ovarian cancer.
17.  Genome Wide Identification of the Immunophilin Gene Family in Leptosphaeria maculans: A Causal Agent of Blackleg Disease in Oilseed Rape (Brassica napus) 
Phoma stem canker (blackleg) is a disease of worldwide importance on oilseed rape (Brassica napus) and can cause serious crop losses globally. The disease is caused by the dothideomycetous fungus Leptosphaeria maculans, which is highly virulent/aggressive. Cyclophilins (CYPs) and FK506-binding proteins (FKBPs) are ubiquitous proteins belonging to the peptidyl-prolyl cis/trans isomerase (PPIase) family. They are collectively referred to as immunophilins (IMMs). In the present study, the IMM genes, CYP and FKBP, in the genome of the haploid L. maculans strain v23.1.3 were identified and classified. Twelve CYPs and five FKBPs were identified in total. Domain architecture analysis revealed the presence of a conserved cyclophilin-like domain (CLD) in the case of CYPs and FKBP_C in the case of FKBPs. Interestingly, IMMs in L. maculans also subgrouped into single domain (SD) and multidomain (MD) proteins. They were primarily found to be localized in cytoplasm, nuclei, and mitochondria. Homologous and orthologous gene pairs were also determined by comparison with the model organism Saccharomyces cerevisiae. Remarkably, IMMs of L. maculans contain shorter introns in comparison to exons. Moreover, CYPs, in contrast with FKBPs, contain few exons. However, two CYPs were determined as being intronless. The expression profile of IMMs in both mycelium and infected primary leaves of B. napus demonstrated their potential role during infection. Secondary structure analysis revealed the presence of an atypical fold architecture of eight β strands and two α helices. Gene ontology analysis of IMMs predicted their significant role in protein folding and PPIase activity. Taken together, our findings present, for the first time, new prospects for this highly conserved gene family in a phytopathogenic fungus.
18.  Effective Classification of MicroRNA Precursors Using Feature Mining and AdaBoost Algorithms 
MicroRNAs play important roles in most biological processes, including cell proliferation, tissue differentiation, and embryonic development, among others. They originate from precursor transcripts (pre-miRNAs), which contain phylogenetically conserved stem–loop structures. An important bioinformatics problem is to distinguish the pre-miRNAs from pseudo pre-miRNAs that have similar stem–loop structures. We present here a novel method for tackling this bioinformatics problem. Our method, named MirID, accepts an RNA sequence as input, and classifies the RNA sequence either as positive (i.e., a real pre-miRNA) or as negative (i.e., a pseudo pre-miRNA). MirID employs a feature mining algorithm for finding combinations of features suitable for building pre-miRNA classification models. These models are implemented using support vector machines, which are combined to construct a classifier ensemble. The accuracy of the classifier ensemble is further enhanced by the utilization of an AdaBoost algorithm. When compared with two closely related tools on twelve species analyzed with these tools, MirID outperforms the existing tools on the majority of the twelve species. MirID was also tested on nine additional species, and the results showed high accuracies on the nine species. The MirID web server is fully operational and freely accessible. Potential applications of this software in genomics and medicine are also discussed.
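The boosting step mentioned in this abstract can be sketched in a few lines of Python. This is a generic AdaBoost illustration, not MirID's implementation: it swaps in one-feature threshold rules (decision stumps) as weak learners in place of the support vector machine models the abstract describes, and the three-point data set and all function names are hypothetical.

```python
import math

def stump_predict(row, j, thr, pol):
    # Weak learner: predicts pol when feature j is at or above thr, else -pol.
    return pol if row[j] >= thr else -pol

def train_stump(X, y, w):
    # Exhaustively pick the stump with the lowest weighted training error.
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote of this weak learner
        ensemble.append((alpha, j, thr, pol))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: labels +1, -1, +1 cannot be separated by one threshold.
X = [[1], [2], [3]]
y = [1, -1, 1]
single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
boosted_predictions = [predict(boosted, row) for row in X]
```

On this toy set, no single stump classifies all three points, whereas the weighted vote of three stumps does. That is the general sense in which boosting can raise the accuracy of an ensemble of weak models.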
19.  Evaluation of Normalization Methods to Pave the Way Towards Large-Scale LC-MS-Based Metabolomics Profiling Experiments 
Combining liquid chromatography-mass spectrometry (LC-MS)-based metabolomics experiments that were collected over a long period of time remains problematic due to systematic variability between LC-MS measurements. Until now, most normalization methods for LC-MS data are model-driven, based on internal standards or intermediate quality control runs, where an external model is extrapolated to the dataset of interest. In the first part of this article, we evaluate several existing data-driven normalization approaches on LC-MS metabolomics experiments, which do not require the use of internal standards. According to variability measures, each normalization method performs relatively well, showing that the use of any normalization method will greatly improve data-analysis originating from multiple experimental runs. In the second part, we apply cyclic-Loess normalization to a Leishmania sample. This normalization method allows the removal of systematic variability between two measurement blocks over time and maintains the differential metabolites. In conclusion, normalization allows for pooling datasets from different measurement blocks over time and increases the statistical power of the analysis, hence paving the way to increase the scale of LC-MS metabolomics experiments. From our investigation, we recommend data-driven normalization methods over model-driven normalization methods, if only a few internal standards were used. Moreover, data-driven normalization methods are the best option to normalize datasets from untargeted LC-MS experiments.
20.  “OMICS” of Human Sperm: Profiling Protein Phosphatases 
Phosphorylation is a major regulatory mechanism in eukaryotic cells performed by the concerted actions of kinases and phosphatases (PPs). Protein phosphorylation has long been relevant to sperm physiology, from acquisition of motility in the epididymis to capacitation in the female reproductive tract. While the precise kinases involved in the regulation of sperm phosphorylation have been studied for decades, the PPs have only recently received research interest. Tyrosine phosphorylation was first implicated in the regulation of several sperm-related functions, from capacitation to oocyte binding. Only afterwards, in 1996, the inhibition of the serine/threonine-PP phosphoprotein phosphatase 1 (PPP1) by okadaic acid and calyculin-A was shown to initiate motility in caput epididymal sperm. Today, the current mechanisms of sperm motility acquisition based on PPP1 and its regulators are still far from being fully understood. PPP1CC2, specifically expressed in mammalian sperm, has been considered to be the only sperm-specific serine/threonine-PP, while other PPP1 isoforms were thought to be absent from sperm. This article examines the “Omics” of human sperm, and reports, for the first time, the identification of three new serine/threonine-protein PPs, PPP1CB, PPP4C, and PPP6C, in human sperm, together with two tyrosine-PPs, MKP1 and PTP1C. We specifically localized in sperm PPP1CB and PPP1CC2 from the PPP1 subfamily, and PPP2CA, PPP4C, and PPP6C from the PPP2 subfamily of the serine/threonine-PPs. A semi-quantitative analysis was performed to determine the various PPs' differential expression in sperm head and tail. These findings contribute to a comprehensive understanding of human sperm PPs, and warrant further research for their clinical and therapeutic significance.
21.  Molecular Phylogeny, Homology Modeling, and Molecular Dynamics Simulation of Race-Specific Bacterial Blight Disease Resistance Protein (xa5) of Rice: A Comparative Agriproteomics Approach 
Rice (Oryza sativa L.), a model plant belonging to the family Poaceae, is a staple food for a majority of the people worldwide. Grown in the tropical and subtropical regions of the world, this important cereal crop is under constant and serious threat from both biotic and abiotic stresses. Among the biotic threats, Xanthomonas oryzae pv. oryzae, which causes the damaging bacterial blight disease in rice, is a prominent pathogen. The xa5 gene in the host plant rice confers race-specific resistance to this pathogen. This recessive gene belongs to the Xa gene family of rice and encodes a gamma subunit of transcription factor IIA (TFIIAγ). In view of the importance of this gene in conferring resistance to the devastating disease, we reconstructed the phylogenetic relationship of this gene and developed a three-dimensional protein model, followed by long-term molecular dynamics (MD) simulation studies, to gain a better understanding of the evolution, structure, and function of xa5. The modeled structure was found to fit well with the small subunit of TFIIA from human, suggesting that it may also act as a small subunit of TFIIA in rice. The model had a stable conformation in response to the atomic flexibility and interaction when subjected to MD simulation for 20 nanoseconds in aqueous solution. Further structural analysis of xa5 indicated that the protein retained its basic transcription factor function, suggesting that it might govern a novel pathway responsible for bacterial blight resistance. Future molecular docking studies of xa5, now underway, with its corresponding avirulence gene are expected to shed more direct light on plant–pathogen interactions at the molecular level and thus pave the way for richer agriproteomic insights.
22.  Gene Panel Model Predictive of Outcome in Patients with Prostate Cancer 
In men at high risk for prostate cancer, established clinical and pathological parameters provide only limited prognostic information. Here we analyzed a French cohort of 103 prostate cancer patients and developed a gene panel model predictive of outcome in this group of patients. The model comprised a 15-gene TaqMan Low-Density Array (TLDA) card, with gene expression compared to a standardized reference. The RQ value for each gene was calculated, and a scoring system was developed. Summing the binary scores (0 or 1) for the 15 genes yields a global score between 0 and 15. This global score can be compared to the Gleason score (0 to 10) by recalculating it into a 0–10 scaled score. A scaled score ≥2 suggested that the patient has prostate cancer, and a scaled score ≥7 flagged aggressive cancer. Statistical analyses demonstrated a highly significant linear correlation (p=3.50E-08) between scaled score and Gleason score for this prostate cancer cohort (N=103). These results support the capacity of this 15-target-gene TLDA card approach to predict outcome in prostate cancer, opening up a new avenue for personalized medicine through future independent replication and applications for rapid identification of aggressive prostate cancer phenotypes for early intervention.
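The scoring arithmetic described above can be illustrated with a minimal Python sketch. It assumes that the 0–10 scaled score is a simple linear rescaling of the 0–15 global score, which the abstract does not state explicitly. The rule that converts each gene's RQ value into a binary score is not given, so the sketch starts from the binary scores; all function names are hypothetical.

```python
from typing import Sequence

N_GENES = 15  # size of the gene panel stated in the abstract


def global_score(binary_scores: Sequence[int]) -> int:
    """Sum the per-gene binary scores (0 or 1) into a 0-15 global score."""
    if len(binary_scores) != N_GENES:
        raise ValueError(f"expected {N_GENES} scores, got {len(binary_scores)}")
    if any(score not in (0, 1) for score in binary_scores):
        raise ValueError("each score must be 0 or 1")
    return sum(binary_scores)


def scaled_score(global_value: int) -> float:
    """Rescale the global score to 0-10 (ASSUMPTION: linear rescaling)."""
    return global_value * 10 / N_GENES


def interpret(scaled: float) -> str:
    """Apply the two thresholds quoted in the abstract (>=2 and >=7)."""
    if scaled >= 7:
        return "aggressive cancer flagged"
    if scaled >= 2:
        return "prostate cancer suggested"
    return "below cancer threshold"


if __name__ == "__main__":
    # Hypothetical patient with 9 of the 15 genes scored 1.
    scores = [1] * 9 + [0] * 6
    scaled = scaled_score(global_score(scores))
    print(scaled, interpret(scaled))
```

Under this assumed rescaling, a global score of 3 is the smallest that reaches the ≥2 threshold, and a global score of 11 is the smallest that reaches the ≥7 threshold.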
23.  Redundancy Control in Pathway Databases (ReCiPa): An Application for Improving Gene-Set Enrichment Analysis in Omics Studies and “Big Data” Biology 
Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of “Big Data” that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of ‘omics’-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large ‘omics’ datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
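The general idea of threshold-based redundancy control can be sketched in Python as follows. This is not the ReCiPa implementation, whose overlap measure and merging rules are not described in the abstract. It is a simplified stand-in that assumes overlap is measured as the shared fraction of the smaller gene-set and that pathways are merged greedily, highest overlap first. The function names and toy gene identifiers are hypothetical.

```python
from itertools import combinations


def overlap(a: set, b: set) -> float:
    """Fraction of the smaller gene-set shared with the other (ASSUMED measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def control_redundancy(pathways: dict, threshold: float) -> dict:
    """Greedily merge the most-overlapping pair until none reaches the threshold."""
    merged = {name: set(genes) for name, genes in pathways.items()}
    while True:
        best_pair, best_overlap = None, 0.0
        for p, q in combinations(sorted(merged), 2):
            ov = overlap(merged[p], merged[q])
            if ov >= threshold and ov > best_overlap:
                best_pair, best_overlap = (p, q), ov
        if best_pair is None:
            return merged
        p, q = best_pair
        # Replace the two redundant pathways with their union.
        merged[f"{p}+{q}"] = merged.pop(p) | merged.pop(q)


if __name__ == "__main__":
    # Toy pathway database with hypothetical gene identifiers.
    toy = {
        "A": {"g1", "g2", "g3"},
        "B": {"g2", "g3", "g4"},
        "C": {"g7", "g8"},
    }
    print(control_redundancy(toy, threshold=0.6))
```

In the toy database, pathways A and B share two of three genes, so a threshold of 0.6 merges them into one gene-set while C is left untouched; a threshold of 0.9 leaves all three pathways as they are.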
24.  An Orthology-Based Analysis of Pathogenic Protozoa Impacting Global Health: An Improved Comparative Genomics Approach with Prokaryotes and Model Eukaryote Orthologs 
A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluating similarities and differences among species, and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs) databases, which permitted us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The inferred core protozoan proteome comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J, and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
25.  Mid-ATR-FTIR Spectroscopic Profiling of HIV/AIDS Sera for Novel Systems Diagnostics in Global Health 
Global health, whether in developed or developing countries, is in need of robust systems diagnostics for major diseases, such as HIV/AIDS, impacting populations worldwide. Fourier transform infrared (FTIR) spectroscopy of serum is a quick and reagent-free methodology with which to analyze metabolic alterations such as those caused by disease or treatment. In this study, attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy was investigated as a means of distinguishing HIV-infected treatment-experienced (HIVpos ARTpos, n=39) and HIV-infected treatment-naïve (HIVpos ARTneg, n=16) subjects from uninfected control subjects (n=30). Multivariate pattern recognition techniques, including partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA), successfully distinguished sample classes, while univariate approaches identified significant differences (p<0.05) after Benjamini-Hochberg corrections. OPLS-DA discriminated between all groups with sensitivity, specificity, and accuracy of >90%. Compared to uninfected controls, HIVpos ARTpos and HIVpos ARTneg subjects displayed significant differences in spectral regions linked to lipids/fatty acids (3010 cm−1), carbohydrates (1299 cm−1; 1498 cm−1), glucose (1035 cm−1), and proteins (1600 cm−1; 1652 cm−1). These are all molecules shown by conventional biochemical analysis to be affected by HIV/ART interference. The biofluid metabolomics approach applied here successfully differentiated global metabolic profiles of HIV-infected patients and uninfected controls and detected potential biomarkers for development into indicators of host response to treatment and/or disease progression. Our findings therefore contribute to ongoing efforts for capacity-building in global health for robust omics science and systems diagnostics towards major diseases impacting population health.
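For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the standard step-up procedure can be sketched in a few lines of Python. The p values in the example are hypothetical and are not taken from the study, and the function name is illustrative; the abstract does not say which software the authors used.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p value, whether it is significant at FDR level alpha.

    Standard step-up rule: sort the m p values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the hypotheses with ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p values, e.g. one per spectral region tested.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p))
```

With these ten hypothetical p values, only the two smallest remain significant at alpha=0.05, although five are below 0.05 before correction.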
