Related Articles

1.  Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value 
PLoS Medicine  2013;10(5):e1001453.
Background
Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.
Methods and Findings
Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype–like, normal-like, serrated CC phenotype–like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II–III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse-free survival, even after adjusting for age, sex, stage, and the emerging prognostic classifier Oncotype DX Colon Cancer Assay recurrence score (hazard ratio 1.5, 95% CI 1.1–2.1, p = 0.0097). However, a limitation of this study is that information on tumor grade and number of nodes examined was not available.
Conclusions
We describe the first, to our knowledge, robust transcriptome-based classification of CC that improves the current disease stratification based on clinicopathological variables and common DNA markers. The biological relevance of these subtypes is illustrated by significant differences in prognosis. This analysis provides possibilities for improving prognostic models and therapeutic strategies. In conclusion, we report a new classification of CC into six molecular subtypes that arise through distinct biological pathways.
Editors' Summary
Background
Cancer of the large bowel (colorectal cancer) is the third most common cancer in men and the second most common cancer in women worldwide. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from this form of cancer—8% of all cancer deaths. The prognosis and treatment options for colorectal cancer depend on five pathological stages (0–IV), each of which has a different treatment option and five year survival rate, so it is important that the stage is correctly identified. Unfortunately, pathological staging fails to accurately predict recurrence (relapse) in patients undergoing surgery for localized colorectal cancer, which is a concern, as 10%–20% of patients with stage II and 30%–40% of those with stage III colorectal cancer develop recurrence.
Why Was This Study Done?
Previous studies have investigated whether there are any possible gene expression profiles (identified through microarray techniques) that can help predict prognosis of colorectal cancer, but so far, there have been no firm conclusions that can aid clinical practice. In this study, the researchers used genetic information from a French multicenter study to identify a standard, reproducible molecular classification based on gene expression analysis of colorectal cancer. The authors also assessed whether there were any associations between the identified molecular subtypes and clinical and pathological factors, common DNA alterations, and prognosis.
What Did the Researchers Do and Find?
The researchers used genetic information from a cohort of 750 patients with stage I to IV colorectal cancer who underwent surgery between 1987 and 2007 in seven centers in France. The researchers identified relevant clinical and pathological staging information for each patient from the medical records and calculated recurrence-free survival (the time from surgery to the first recurrence) for patients with stage II or III disease. In the genetic analysis, 566 tumor samples were suitable—443 were used in a discovery set, to create the classification, and the remainder were used in a validation set, to test the classification. The researchers also used information from eight public datasets to validate their findings.
Using these methods, the researchers classified the colon cancer samples into six molecular subtypes (based on gene expression data) and, on further analysis and validation, were able to distinguish the main biological characteristics and deregulated pathways associated with each subtype. Importantly, the researchers found that these six subtypes were associated with distinct clinical and pathological characteristics, molecular alterations, specific gene expression signatures, and deregulated signaling pathways. In the prognostic analysis based on recurrence-free survival, the researchers found that patients whose tumors were classified in one of two clusters (C4 and C6) had poorer recurrence-free survival than the other patients.
What Do These Findings Mean?
These findings suggest that it is possible to classify colorectal cancer into six robust molecular subtypes that might help identify new prognostic subgroups and could provide a basis for developing robust prognostic genetic signatures for stage II and III colorectal cancer and for identifying specific markers for the different subtypes that might be targets for future drug development. However, as this study was retrospective and did not include some known predictors of colorectal cancer prognosis, such as tumor grade and number of nodes examined, the significance and robustness of the prognostic classification require further confirmation with large prospective patient cohorts.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001453.
The American Cancer Society provides information about colorectal cancer and also about how colorectal cancer is staged
The US National Cancer Institute also provides information on colon and rectal cancer and colon cancer stages
doi:10.1371/journal.pmed.1001453
PMCID: PMC3660251  PMID: 23700391
2.  The Genome Organization of Thermotoga maritima Reflects Its Lifestyle 
PLoS Genetics  2013;9(4):e1003485.
The generation of genome-scale data is becoming more routine, yet the subsequent analysis of omics data remains a significant challenge. Here, an approach that integrates multiple omics datasets with bioinformatics tools was developed that produces a detailed annotation of several microbial genomic features. This methodology was used to characterize the genome of Thermotoga maritima, a phylogenetically deep-branching, hyperthermophilic bacterium. Experimental data were generated for whole-genome resequencing, transcription start site (TSS) determination, transcriptome profiling, and proteome profiling. These datasets, analyzed in combination with bioinformatics tools, served as a basis for the improvement of gene annotation, the elucidation of transcription units (TUs), the identification of putative non-coding RNAs (ncRNAs), and the determination of promoters and ribosome binding sites. This revealed many distinctive properties of the T. maritima genome organization relative to other bacteria. This genome has a high number of genes per TU (3.3), a paucity of putative ncRNAs (12), and few TUs with multiple TSSs (3.7%). Quantitative analysis of promoters and ribosome binding sites showed increased sequence conservation relative to other bacteria. The 5′UTRs follow an atypical bimodal length distribution comprised of “Short” 5′UTRs (11–17 nt) and “Common” 5′UTRs (26–32 nt). Transcriptional regulation is limited by a lack of intergenic space for the majority of TUs. Lastly, a high fraction of annotated genes are expressed independent of growth state and a linear correlation of mRNA/protein is observed (Pearson r = 0.63, p < 2.2×10^−16, t-test). These distinctive properties are hypothesized to be a reflection of this organism's hyperthermophilic lifestyle and could yield novel insights into the evolutionary trajectory of microbial life on earth.
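The mRNA/protein correlation reported above can be illustrated with a minimal sketch. It computes a Pearson r and the matching t statistic (n − 2 degrees of freedom) for paired abundance values. The function names and the five data points are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual analysis pipeline is not described in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0 (n - 2 degrees of freedom).

    Undefined for |r| == 1, where the denominator is zero.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical log-scale abundances for five genes (illustrative values only).
log_mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
log_protein = [1.2, 1.9, 3.4, 3.8, 5.1]

r = pearson_r(log_mrna, log_protein)
t = t_statistic(r, len(log_mrna))
print(f"r = {r:.3f}, t = {t:.2f}")
```

Converting the t statistic into a p-value requires the Student t distribution, which is left out here to keep the sketch free of third-party dependencies.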
Author Summary
Genomic studies have greatly benefited from the advent of high-throughput technologies and bioinformatics tools. Here, a methodology integrating genome-scale data and bioinformatics tools is developed to characterize the genome organization of the hyperthermophilic, phylogenetically deep-branching bacterium Thermotoga maritima. This approach elucidates several features of the genome organization and enables comparative analysis of these features across diverse taxa. Our results suggest that the genome of T. maritima is reflective of its hyperthermophilic lifestyle. Ultimately, constraints imposed on the genome have negative impacts on regulatory complexity and phenotypic diversity. Investigating the genome organization of Thermotogae species will help resolve various causal factors contributing to the genome organization such as phylogeny and environment. Applying a similar analysis of the genome organization to numerous taxa will likely provide insights into microbial evolution.
doi:10.1371/journal.pgen.1003485
PMCID: PMC3636130  PMID: 23637642
3.  Bench-to-bedside review: Future novel diagnostics for sepsis - a systems biology approach 
Critical Care  2013;17(5):231.
The early, accurate diagnosis and risk stratification of sepsis remains an important challenge in the critically ill. Since traditional biomarker strategies have not yielded a gold standard marker for sepsis, focus is shifting towards novel strategies that improve assessment capabilities. The combination of technological advancements and information generated through the human genome project positions systems biology at the forefront of biomarker discovery. While previously available, developments in the technologies focusing on DNA, gene expression, gene regulatory mechanisms, protein and metabolite discovery have made these tools more feasible to implement and less costly, and they have taken on an enhanced capacity such that they are ripe for utilization as tools to advance our knowledge and clinical research. Medicine is in a genome-level era that can leverage the assessment of thousands of molecular signals beyond simply measuring selected circulating proteins. Genomics is the study of the entire complement of genetic material of an individual. Epigenetics is the regulation of gene activity by reversible modifications of the DNA. Transcriptomics is the quantification of the relative levels of messenger RNA for a large number of genes in specific cells or tissues to measure differences in the expression levels of different genes, and the utilization of patterns of differential gene expression to characterize different biological states of a tissue. Proteomics is the large-scale study of proteins. Metabolomics is the study of the small molecule profiles that are the terminal downstream products of the genome and consists of the total complement of all low-molecular-weight molecules that cellular processes leave behind. Taken together, these individual fields of study may be linked during a systems biology approach. There remains a valuable opportunity to deploy these technologies further in human research. 
The techniques described in this paper not only have the potential to increase the spectrum of diagnostic and prognostic biomarkers in sepsis, but they may also enable the discovery of new disease pathways. This may in turn lead us to improved therapeutic targets. The objective of this paper is to provide an overview and basic framework for clinicians and clinical researchers to better understand the 'omics technologies' to enhance further use of these valuable tools.
doi:10.1186/cc12693
PMCID: PMC4057467  PMID: 24093155
4.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat Leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling on protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond highly specifically. The cells furthermore react to individual treatments by ‘fine tuning’ the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
doi:10.1038/msb.2011.37
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
5.  Survival-Related Profile, Pathways, and Transcription Factors in Ovarian Cancer 
PLoS Medicine  2009;6(2):e1000024.
Background
Ovarian cancer has a poor prognosis due to advanced stage at presentation and either intrinsic or acquired resistance to classic cytotoxic drugs such as platinum and taxoids. Recent large clinical trials with different combinations and sequences of classic cytotoxic drugs indicate that further significant improvement in prognosis with this type of drug is not to be expected. Currently, a large number of drugs targeting dysregulated molecular pathways in cancer cells have been developed and are being introduced in the clinic. A major challenge is to identify those patients who will benefit from drugs targeting these specific dysregulated pathways. The aims of our study were (1) to develop a gene expression profile associated with overall survival in advanced stage serous ovarian cancer, (2) to assess the association of pathways and transcription factors with overall survival, and (3) to validate our identified profile and pathways/transcription factors in an independent set of ovarian cancers.
Methods and Findings
According to a randomized design, profiling of 157 advanced stage serous ovarian cancers was performed in duplicate using ∼35,000 70-mer oligonucleotide microarrays. A continuous predictor of overall survival was built taking into account well-known issues in microarray analysis, such as multiple testing and overfitting. A functional class scoring analysis was utilized to assess pathways/transcription factors for their association with overall survival. The prognostic value of genes that constitute our overall survival profile was validated on a fully independent, publicly available dataset of 118 well-defined primary serous ovarian cancers. Furthermore, functional class scoring analysis was also performed on this independent dataset to assess the similarities with results from our own dataset. An 86-gene overall survival profile discriminated between patients with unfavorable and favorable prognosis (median survival, 19 versus 41 mo, respectively; permutation p-value of log-rank statistic = 0.015) and maintained its independent prognostic value in multivariate analysis. Genes that composed the overall survival profile were also able to discriminate between the two risk groups in the independent dataset. In our dataset 17/167 pathways and 13/111 transcription factors were associated with overall survival, of which 16 and 12, respectively, were confirmed in the independent dataset.
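The "permutation p-value of log-rank statistic" quoted above can be sketched as follows. This is a minimal illustration of the general technique, assuming two risk groups and right-censored survival times; it is not the authors' implementation, and the function names, the 999 permutations, the seed, and the toy data are all assumptions made for the example.

```python
import random


def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    groups : 0 or 1 (for example favorable / unfavorable profile)
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = [(g, e, u) for u, e, g in zip(times, events, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, e, u in at_risk if e and u == t)
        d1 = sum(1 for g, e, u in at_risk if e and u == t and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def permutation_p_value(times, events, groups, n_perm=999, seed=0):
    """Share of label shufflings whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = logrank_statistic(times, events, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if logrank_statistic(times, events, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Toy data: group 1 dies early (months 1-8), group 0 late (months 11-18).
times = list(range(1, 9)) + list(range(11, 19))
events = [1] * 16
groups = [1] * 8 + [0] * 8

statistic = logrank_statistic(times, events, groups)
p = permutation_p_value(times, events, groups)
print(f"log-rank statistic = {statistic:.2f}, permutation p = {p:.3f}")
```

Shuffling group labels builds the null distribution directly from the data, which avoids relying on the chi-square approximation when groups were themselves derived from the same expression data.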
Conclusions
Our study provides new clues to genes, pathways, and transcription factors that contribute to the clinical outcome of serous ovarian cancer and might be exploited in designing new treatment strategies.
Ate van der Zee and colleagues analyze the gene expression profiles of ovarian cancer samples from 157 patients, and identify an 86-gene expression profile that seems to predict overall survival.
Editors' Summary
Background.
Ovarian cancer kills more than 100,000 women every year and is one of the most frequent causes of cancer death in women in Western countries. Most ovarian cancers develop when an epithelial cell in one of the ovaries (two small organs in the pelvis that produce eggs) acquires genetic changes that allow it to grow uncontrollably and to spread around the body (metastasize). In its early stages, ovarian cancer is confined to the ovaries and can often be treated successfully by surgery alone. Unfortunately, early ovarian cancer rarely has symptoms so a third of women with ovarian cancer have advanced disease when they first visit their doctor with symptoms that include vague abdominal pains and mild digestive disturbances. That is, cancer cells have spread into their abdominal cavity and metastasized to other parts of the body (so-called stage III and IV disease). The outlook for women diagnosed with stage III and IV disease, which are treated with a combination of surgery and chemotherapy, is very poor. Only 30% of women with stage III, and 5% with stage IV, are still alive five years after their cancer is diagnosed.
Why Was This Study Done?
If the cellular pathways that determine the biological behavior of ovarian cancer could be identified, it might be possible to develop more effective treatments for women with stage III and IV disease. One way to identify these pathways is to use gene expression profiling (a technique that catalogs all the genes expressed by a cell) to compare gene expression patterns in the ovarian cancers of women who survive for different lengths of time. Genes with different expression levels in tumors with different outcomes could be targets for new treatments. For example, it might be worth developing inhibitors of proteins whose expression is greatest in tumors with short survival times. In this study, the researchers develop an expression profile that is associated with overall survival in advanced-stage serous ovarian cancer (more than half of ovarian cancers originate in serous cells, epithelial cells that secrete a watery fluid). The researchers also assess the association of various cellular pathways and transcription factors (proteins that control the expression of other proteins) with survival in this type of ovarian carcinoma.
What Did the Researchers Do and Find?
The researchers analyzed the gene expression profiles of tumor samples taken from 157 patients with advanced stage serous ovarian cancer and used the “supervised principal components” method to build a predictor of overall survival from these profiles and patient survival times. This 86-gene predictor discriminated between patients with favorable and unfavorable outcomes (average survival times of 41 and 19 months, respectively). It also discriminated between groups of patients with these two outcomes in an independent dataset collected from 118 additional serous ovarian cancers. Next, the researchers used “functional class scoring” analysis to assess the association between pathway and transcription factor expression in the tumor samples and overall survival. Seventeen of 167 KEGG pathways (“wiring” diagrams of molecular interactions, reactions and relations involved in cellular processes and human diseases listed in the Kyoto Encyclopedia of Genes and Genomes) were associated with survival, 16 of which were confirmed in the independent dataset. Finally, 13 of 111 analyzed transcription factors were associated with overall survival in the tumor samples, 12 of which were confirmed in the independent dataset.
What Do These Findings Mean?
These findings identify an 86-gene overall survival gene expression profile that seems to predict overall survival for women with advanced serous ovarian cancer. However, before this profile can be used clinically, further validation of the profile and more robust methods for determining gene expression profiles are needed. Importantly, these findings also provide new clues about the genes, pathways and transcription factors that contribute to the clinical outcome of serous ovarian cancer, clues that can now be exploited in the search for new treatment strategies. Finally, these findings suggest that it might eventually be possible to tailor therapies to the needs of individual patients by analyzing which pathways are activated in their tumors and thus improve survival times for women with advanced ovarian cancer.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000024.
This study is further discussed in a PLoS Medicine Perspective by Simon Gayther and Kate Lawrenson
See also a related PLoS Medicine Research Article by Huntsman and colleagues
The US National Cancer Institute provides a brief description of what cancer is and how it develops, and information on all aspects of ovarian cancer for patients and professionals (in English and Spanish)
The UK charity Cancerbackup provides general information about cancer, and more specific information about ovarian cancer
MedlinePlus also provides links to other information about ovarian cancer (in English and Spanish)
The KEGG Pathway database provides pathway maps of known molecular networks involved in a wide range of cellular processes
doi:10.1371/journal.pmed.1000024
PMCID: PMC2634794  PMID: 19192944
6.  An Integrative Approach for Interpretation of Clinical NGS Genomic Variant Data 
Antibody (Ab) discovery research has accelerated as monoclonal Ab (mAb)-based biologic strategies have proved efficacious in the treatment of many human diseases, ranging from cancer to autoimmunity. Initial steps in the discovery of therapeutic mAb require epitope characterization and preclinical studies in vitro and in animal models often using limited quantities of Ab. To facilitate this research, our Shared Resource Laboratory (SRL) offers microscale Ab conjugation. Ab submitted for conjugation may or may not be commercially produced, but have not been characterized for use in immunofluorescence applications. Purified mAb and even polyclonal Ab (pAb) can be efficiently conjugated, although the advantages of direct conjugation are more obvious for mAb. To improve consistency of results in microscale (<100 µg) conjugation reactions, we chose to utilize several different varieties of commercial kits. Kits tested were limited to covalent fluorophore labeling. Established quality control (QC) processes to validate fluorophore labeling either rely solely on spectrophotometry or utilize flow cytometry of cells expected to express the target antigen. This methodology is not compatible with microscale reactions using uncharacterized Ab. We developed a novel method for cell-free QC of our conjugates that reflects conjugation quality, but is independent of the biological properties of the Ab itself. QC is critical, as amine reactive chemistry relies on the absence of even trace quantities of competing amine moieties such as those found in the Good buffers (HEPES, MOPS, TES, etc.) or irrelevant proteins. Herein, we present data used to validate our method of assessing the extent of labeling and the removal of free dye by using flow cytometric analysis of polystyrene Ab capture beads to verify product quality. This microscale custom conjugation and QC allows for the rapid development and validation of high quality reagents, specific to the needs of our colleagues and clientele.
Next generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis. Most analysis pipelines do not connect genomic variants to disease and protein specific information during the initial filtering and selection of relevant variants. Robust bioinformatics pipelines were implemented for trimming, genome alignment, SNP, INDEL, or structural variation detection of whole genome or exon-capture sequencing data from Illumina. Quality control metrics were analyzed at each step of the pipeline to ensure data integrity for clinical applications. We further annotate the variants with statistics regarding the diseased population and variant impact. Custom algorithms were developed to analyze the variant data by filtering variants based upon criteria such as quality of variant, inheritance pattern (e.g. dominant, recessive, X-linked), and impact of variant. The resulting variants and their associated genes are linked to Integrated Genome Browser (IGV) in a genome context, and to the PIR iProXpress system for rich protein and disease information. This poster will present detailed analysis of whole exome sequencing performed on patients with facio-skeletal anomalies. We will compare and contrast data analysis methods and report on potential clinically relevant leads discovered by implementing our new clinical variant pipeline. Our variant analysis of these patients and their unaffected family members resulted in more than 500,000 variants.
By applying our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for further filtering of disease relevant variants that impact protein coding genes. Taken together, the integrative approach allows better selection of disease relevant genomic variants by using both genomic and disease/protein centric information. This type of clustering approach can help clinicians better understand the association of variants to the disease phenotype, enabling application to personalized medicine approaches.
PMCID: PMC4162289
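The variant-filtering step described in the abstract above (filtering on variant quality, inheritance pattern, and variant impact) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, the quality threshold of 30, the impact labels, and the simplified inheritance rules (which ignore sample sex, phasing, and compound heterozygosity) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a real VCF parser)."""
    gene: str
    chrom: str
    quality: float   # caller-reported quality score (assumed scale)
    impact: str      # assumed labels: "HIGH", "MODERATE", "LOW"
    genotypes: dict  # sample name -> number of alternate alleles (0, 1, 2)


def fits_inheritance(v, pattern, affected, unaffected):
    """Check a variant against a deliberately simplified inheritance model."""
    aff = [v.genotypes.get(s, 0) for s in affected]
    unaff = [v.genotypes.get(s, 0) for s in unaffected]
    if pattern == "dominant":
        # Every affected sample carries it; no unaffected sample does.
        return all(g >= 1 for g in aff) and all(g == 0 for g in unaff)
    if pattern == "recessive":
        # Affected samples are homozygous; unaffected may be carriers.
        return all(g == 2 for g in aff) and all(g < 2 for g in unaff)
    if pattern == "x_linked":
        # Simplification: on chrX, carried by all affected samples,
        # and never homozygous in an unaffected sample.
        return (v.chrom == "X"
                and all(g >= 1 for g in aff)
                and all(g < 2 for g in unaff))
    raise ValueError(f"unknown inheritance pattern: {pattern}")


def filter_variants(variants, pattern, affected, unaffected,
                    min_quality=30.0, impacts=("HIGH", "MODERATE")):
    """Keep variants passing the quality, impact, and inheritance filters."""
    return [
        v for v in variants
        if v.quality >= min_quality
        and v.impact in impacts
        and fits_inheritance(v, pattern, affected, unaffected)
    ]


# Toy trio with invented genes and genotypes.
variants = [
    Variant("GENE_A", "1", 50.0, "HIGH", {"child": 2, "mother": 1, "father": 1}),
    Variant("GENE_B", "2", 10.0, "HIGH", {"child": 2, "mother": 1, "father": 1}),
    Variant("GENE_C", "3", 60.0, "LOW", {"child": 2, "mother": 1, "father": 1}),
    Variant("GENE_D", "X", 45.0, "MODERATE", {"child": 1, "mother": 1, "father": 0}),
]

for pattern in ("dominant", "recessive", "x_linked"):
    kept = filter_variants(variants, pattern, ["child"], ["mother", "father"])
    print(pattern, [v.gene for v in kept])
```

Applying the cheap quality and impact checks before the inheritance test is only a design convenience here; a real pipeline would read genotypes from VCF files and take sample sex into account for X-linked models.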
7.  RNA-Seq Reveals Infection-Induced Gene Expression Changes in the Snail Intermediate Host of the Carcinogenic Liver Fluke, Opisthorchis viverrini 
Background
Bithynia siamensis goniomphalos is the snail intermediate host of the liver fluke, Opisthorchis viverrini, the leading cause of cholangiocarcinoma (CCA) in the Greater Mekong sub-region of Thailand. Despite the severe public health impact of Opisthorchis-induced CCA, knowledge of the molecular interactions occurring between the parasite and its snail intermediate host is scant. The examination of differences in gene expression profiling between uninfected and O. viverrini-infected B. siamensis goniomphalos could provide clues on fundamental pathways involved in the regulation of snail-parasite interplay.
Methodology/Principal Findings
Using high-throughput (Illumina) sequencing and extensive bioinformatic analyses, we characterized the transcriptomes of uninfected and O. viverrini-infected B. siamensis goniomphalos. Comparative analyses of gene expression profiling allowed the identification of 7,655 differentially expressed genes (DEGs), associated with 43 distinct biological pathways, including pathways associated with immune defense mechanisms against parasites. Amongst the DEGs with immune functions, transcripts encoding distinct proteases displayed the highest down-regulation in Bithynia specimens infected by O. viverrini; conversely, transcription of genes encoding heat-shock proteins and actins was significantly up-regulated in parasite-infected snails when compared to the uninfected counterparts.
Conclusions/Significance
The present study lays the foundation for functional studies of genes and gene products potentially involved in immune-molecular mechanisms implicated in the ability of the parasite to successfully colonize its snail intermediate host. The annotated dataset provided herein represents a ready-to-use molecular resource for the discovery of molecular pathways underlying susceptibility and resistance mechanisms of B. siamensis goniomphalos to O. viverrini and for comparative analyses with pulmonate snail intermediate hosts of other platyhelminths including schistosomes.
Author Summary
Despite recent significant advances in knowledge of the fundamental biology of the carcinogenic liver fluke Opisthorchis viverrini, little is known of the complement of molecular interactions occurring between this parasite and its prosobranch snail intermediate host, Bithynia siamensis goniomphalos. The determination of such interactions is a key, necessary component of the development of future integrated control strategies for liver fluke infection and associated bile duct cancer. Here, we use cutting-edge high-throughput sequencing technologies and advanced bioinformatic analyses to characterize, for the first time, qualitative and quantitative differences in gene expression between uninfected and O. viverrini-infected B. siamensis goniomphalos collected from an endemic region of Northeast Thailand. The analyses led to the identification of a number of molecules putatively involved in immune defense pathways against invading O. viverrini, and of key biological mechanisms potentially implicated in the ability of the parasite to successfully colonize its snail intermediate host. We believe that this ready-to-use molecular resource will provide the scientific community with new tools for the development of strategies to control the spread of liver fluke infection and the resulting bile duct cancer.
doi:10.1371/journal.pntd.0002765
PMCID: PMC3967946  PMID: 24676090
8.  Application of Genomic Tools in Plant Breeding 
Current Genomics  2012;13(3):179-195.
Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution of plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows discovering new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high density genetic maps. All these tools and resources facilitate studying the genetic diversity, which is important for germplasm management, enhancement and use. Also, they allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, ‘breeding by design’, or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the ‘superdomestication’ of crops and the genetic dissection and breeding for complex traits.
doi:10.2174/138920212800543084
PMCID: PMC3382273  PMID: 23115520
Bioinformatics; complex traits; genetic maps; marker assisted selection; molecular markers; next-generation-sequencing; quantitative trait loci.
9.  In silico gene expression analysis – an overview 
Molecular Cancer  2007;6:50.
Efforts aimed at deciphering the molecular basis of complex disease are underpinned by the availability of high throughput strategies for the identification of biomolecules that drive the disease process. The completion of the human genome-sequencing project, coupled to major technological developments, has afforded investigators myriad opportunities for multidimensional analysis of biological systems. Nowhere has this research explosion been more evident than in the field of transcriptomics. Affordable access and availability to the technology that supports such investigations has led to a significant increase in the amount of data generated. As most biological distinctions are now observed at a genomic level, a large amount of expression information is now openly available via public databases. Furthermore, numerous computational based methods have been developed to harness the power of these data. In this review we provide a brief overview of in silico methodologies for the analysis of differential gene expression such as Serial Analysis of Gene Expression and Digital Differential Display. The performance of these strategies, at both an operational and result/output level is assessed and compared. The key considerations that must be made when completing an in silico expression analysis are also presented as a roadmap to facilitate biologists. Furthermore, to highlight the importance of these in silico methodologies in contemporary biomedical research, examples of current studies using these approaches are discussed. The overriding goal of this review is to present the scientific community with a critical overview of these strategies, so that they can be effectively added to the tool box of biomedical researchers focused on identifying the molecular mechanisms of disease.
doi:10.1186/1476-4598-6-50
PMCID: PMC1964762  PMID: 17683638
10.  Development of Functional Genomic Tools in Trematodes: RNA Interference and Luciferase Reporter Gene Activity in Fasciola hepatica 
The growing availability of sequence information from diverse parasites through genomic and transcriptomic projects offers new opportunities for the identification of key mediators in the parasite–host interaction. Functional genomics approaches and methods for the manipulation of genes are essential tools for deciphering the roles of genes and identifying new intervention targets in parasites. Exciting advances in functional genomics for parasitic helminths are starting to occur, with transgene expression and RNA interference (RNAi) reported in several species of nematodes, but the area is still in its infancy in flatworms, with reports in just three species. While advancing in model organisms, there is a need to rapidly extend these technologies to other parasites responsible for several chronic diseases of humans and cattle. In order to extend these approaches to less well studied parasitic worms, we developed a test method for the presence of a viable RNAi pathway by silencing the exogenous reporter gene, firefly luciferase (fLUC). We established the method in the human blood fluke Schistosoma mansoni and then confirmed its utility in the liver fluke Fasciola hepatica. We transformed newly excysted juveniles of F. hepatica by electroporation with mRNA of fLUC and three hours later were able to detect luciferase enzyme activity, concentrated mainly in the digestive ceca. Subsequently, we tested the presence of an active RNAi pathway in F. hepatica by knocking down the exogenous luciferase activity by introduction into the transformed parasites of double-stranded RNA (dsRNA) specific for fLUC. In addition, we tested the RNAi pathway targeting an endogenous F. hepatica gene encoding leucine aminopeptidase (FhLAP), and observed a significant reduction in specific mRNA levels. In summary, these studies demonstrated the utility of RNAi targeting reporter fLUC as a reporter gene assay to establish the presence of an intact RNAi pathway in helminth parasites.
These could facilitate the study of gene function and the identification of relevant targets for intervention in organisms that are by other means intractable. More specifically, these results open new perspectives for functional genomics of F. hepatica, which hopefully can lead to the development of new interventions for fascioliasis.
Author Summary
Reverse genetics tools allow assessing the function of unknown genes. Their application for the study of neglected infectious diseases could lead eventually to the identification of relevant gene products to be used in diagnosis, or as drug targets or immunization candidates. Being technically simpler and less demanding than other reverse genetics tools such as transgenesis or knockouts, the suppression of gene activity mediated by double-stranded RNA has emerged as a powerful tool for the analysis of gene function. RNAi appeared as an obvious alternative to apply in complex biological systems where information is still scarce, a situation common to several infectious and parasitic diseases. However, several technical or practical difficulties have kept the development of this technique in parasites from meeting the expectations originally generated. We developed a simple method to test the presence of a viable RNAi pathway by silencing an exogenous reporter gene. The method was tested in F. hepatica, describing the conditions for transfection and confirming the existence of a viable RNAi pathway in this parasite. The experimental design created can be useful as a first approach in organisms where genetic analysis is still unavailable, providing a tool to unravel gene function and probably advancing new candidates relevant in pathobiology, prevention or treatment.
doi:10.1371/journal.pntd.0000260
PMCID: PMC2440534  PMID: 18612418
11.  Genomic Predictors for Recurrence Patterns of Hepatocellular Carcinoma: Model Derivation and Validation 
PLoS Medicine  2014;11(12):e1001770.
In this study, Lee and colleagues develop a genomic predictor that can identify patients at high risk for late recurrence of hepatocellular carcinoma (HCC) and provide new biomarkers for risk stratification.
Background
Typically observed at 2 y after surgical resection, late recurrence is a major challenge in the management of hepatocellular carcinoma (HCC). We aimed to develop a genomic predictor that can identify patients at high risk for late recurrence and assess its clinical implications.
Methods and Findings
Systematic analysis of gene expression data from human liver undergoing hepatic injury and regeneration revealed a 233-gene signature that was significantly associated with late recurrence of HCC. Using this signature, we developed a prognostic predictor that can identify patients at high risk of late recurrence, and tested and validated the robustness of the predictor in patients (n = 396) who underwent surgery between 1990 and 2011 at four centers (210 recurrences during a median of 3.7 y of follow-up). In multivariate analysis, this signature was the strongest risk factor for late recurrence (hazard ratio, 2.2; 95% confidence interval, 1.3–3.7; p = 0.002). In contrast, our previously developed tumor-derived 65-gene risk score was significantly associated with early recurrence (p = 0.005) but not with late recurrence (p = 0.7). In multivariate analysis, the 65-gene risk score was the strongest risk factor for very early recurrence (<1 y after surgical resection) (hazard ratio, 1.7; 95% confidence interval, 1.1–2.6; p = 0.01). The potential significance of STAT3 activation in late recurrence was predicted by gene network analysis and validated later. We also developed and validated 4- and 20-gene predictors from the full 233-gene predictor. The main limitation of the study is that most of the patients in our study were hepatitis B virus–positive. Further investigations are needed to test our prediction models in patients with different etiologies of HCC, such as hepatitis C virus.
Conclusions
Two independently developed predictors reflected well the differences between early and late recurrence of HCC at the molecular level and provided new biomarkers for risk stratification.
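The hazard ratios in the Methods and Findings above are reported with 95% confidence intervals and p-values, and the three numbers can be checked against each other. The sketch below is a generic consistency check, not the authors' analysis: it assumes the interval is a symmetric Wald interval on the log hazard ratio and that the p-value comes from a two-sided normal test. The function name `p_from_hr_ci` is hypothetical.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Recover SE, z, and a two-sided p-value from a hazard ratio and its 95% CI.

    Assumes a Wald interval on the log scale:
        log(upper) - log(lower) = 2 * z_crit * SE
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values quoted in the abstract (rounded as published).
late = p_from_hr_ci(2.2, 1.3, 3.7)        # 233-gene signature, late recurrence
very_early = p_from_hr_ci(1.7, 1.1, 2.6)  # 65-gene score, very early recurrence

for label, (se, z, p) in (("late", late), ("very early", very_early)):
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Because the published hazard ratios and interval bounds are rounded to one decimal place, the recovered p-values should only be expected to match the reported ones (p = 0.002 and p = 0.01) in order of magnitude.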
Editors' Summary
Background
Primary liver cancer—a tumor that starts when a liver cell acquires genetic changes that allow it to grow uncontrollably—is the second-leading cause of cancer-related deaths worldwide, killing more than 600,000 people annually. If hepatocellular cancer (HCC; the most common type of liver cancer) is diagnosed in its early stages, it can be treated by surgically removing part of the liver (resection), by liver transplantation, or by local ablation, which uses an electric current to destroy the cancer cells. Unfortunately, the symptoms of HCC, which include weight loss, tiredness, and jaundice (yellowing of the skin and eyes), are vague and rarely appear until the cancer has spread throughout the liver. Consequently, HCC is rarely diagnosed before the cancer is advanced and untreatable, and has a poor prognosis (likely outcome)—fewer than 5% of patients survive for five or more years after diagnosis. The exact cause of HCC is unclear, but chronic liver (hepatic) injury and inflammation (caused, for example, by infection with hepatitis B virus [HBV] or by alcohol abuse) promote tumor development.
Why Was This Study Done?
Even when it is diagnosed early, HCC has a poor prognosis because it often recurs. Patients treated for HCC can experience two distinct types of tumor recurrence. Early recurrence, which usually happens within the first two years after surgery, arises from the spread of primary cancer cells into the surrounding liver that were left behind during surgery. Late recurrence, which typically happens more than two years after surgery, involves the development of completely new tumors and seems to be the result of chronic liver damage. Because early and late recurrence have different clinical courses, it would be useful to be able to predict which patients are at high risk of which type of recurrence. Given that injury, inflammation, and regeneration seem to prime the liver for HCC development, might the gene expression patterns associated with these conditions serve as predictive markers for the identification of patients at risk of late recurrence of HCC? Here, the researchers develop a genomic predictor for the late recurrence of HCC by examining gene expression patterns in tissue samples from livers that were undergoing injury and regeneration.
What Did the Researchers Do and Find?
By comparing gene expression data obtained from liver biopsies taken before and after liver transplantation or resection and recorded in the US National Center for Biotechnology Information Gene Expression Omnibus database, the researchers identified 233 genes whose expression in liver differed before and after liver injury (the hepatic injury and regeneration, or HIR, signature). Statistical analyses indicate that the expression of the HIR signature in archived tissue samples was significantly associated with late recurrence of HCC in three independent groups of patients, but not with early recurrence (a significant association between two variables is one that is unlikely to have arisen by chance). By contrast, a tumor-derived 65-gene signature previously developed by the researchers was significantly associated with early recurrence but not with late recurrence. Notably, as few as four genes from the HIR signature were sufficient to construct a reliable predictor for late recurrence of HCC. Finally, the researchers report that many of the genes in the HIR signature encode proteins involved in inflammation and cell death, but that others encode proteins involved in cellular growth and proliferation such as STAT3, a protein with a well-known role in liver regeneration.
What Do These Findings Mean?
These findings identify a gene expression signature that was significantly associated with late recurrence of HCC in three independent groups of patients. Because most of these patients were infected with HBV, the ability of the HIR signature to predict late recurrence of HCC may be limited to HBV-related HCC and may not be generalizable to HCC related to other causes. Moreover, the predictive ability of the HIR signature needs to be tested in a prospective study in which samples are taken and analyzed at baseline and patients are followed to see whether their HCC recurs; the current retrospective study analyzed stored tissue samples. Importantly, however, the HIR signature associated with late recurrence and the 65-gene signature associated with early recurrence provide new insights into the biological differences between late and early recurrence of HCC at the molecular level. Knowing about these differences may lead to new treatments for HCC and may help clinicians choose the most appropriate treatments for their patients.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001770.
The US National Cancer Institute provides information about all aspects of cancer, including detailed information for patients and professionals about primary liver cancer (in English and Spanish)
The American Cancer Society also provides information about liver cancer (including information on support programs and services; available in several languages)
The UK National Health Service Choices website provides information about primary liver cancer (including a video about coping with cancer)
Cancer Research UK (a not-for-profit organization) also provides detailed information about primary liver cancer (including information about living with primary liver cancer)
MD Anderson Cancer Center provides information about symptoms, diagnosis, treatment, and prevention of primary liver cancer
MedlinePlus provides links to further resources about liver cancer (in English and Spanish)
doi:10.1371/journal.pmed.1001770
PMCID: PMC4275163  PMID: 25536056
12.  The emerging genomics and systems biology research lead to systems genomics studies 
BMC Genomics  2014;15(Suppl 11):I1.
Synergistically integrating multi-layer genomic data at systems level not only can lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also can guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology on the other hand studies complex biological systems, particularly systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at cellular or organism level. The prospectively emerging field can be referred to as systems genomics or genomic systems biology.
The Mid-South Bioinformatics Centre (MBC) and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this prospectively emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from the ChIP-seq specific for different phenotypes in combination with protein-protein interactions, and protein-DNA interactions to construct high-level gene networks for an integrative genome-phenome investigation at systems biology level.
doi:10.1186/1471-2164-15-S11-I1
PMCID: PMC4304174  PMID: 25558922
13.  Molecular Profiling of the Phytophthora plurivora Secretome: A Step towards Understanding the Cross-Talk between Plant Pathogenic Oomycetes and Their Hosts 
PLoS ONE  2014;9(11):e112317.
The understanding of molecular mechanisms underlying host–pathogen interactions in plant diseases is of crucial importance to gain insights on different virulence strategies of pathogens and unravel their role in plant immunity. Among plant pathogens, Phytophthora species are eliciting a growing interest for their considerable economical and environmental impact. Plant infection by Phytophthora phytopathogens is a complex process coordinated by a plethora of extracellular signals secreted by both host plants and pathogens. The characterization of the repertoire of effectors secreted by oomycetes has become an active area of research for deciphering molecular mechanisms responsible for host plants colonization and infection. Putative secreted proteins by Phytophthora species have been catalogued by applying high-throughput genome-based strategies and bioinformatic approaches. However, a comprehensive analysis of the effective secretome profile of Phytophthora is still lacking. Here, we report the first large-scale profiling of P. plurivora secretome using a shotgun LC-MS/MS strategy. To gain insight on the molecular signals underlying the cross-talk between plant pathogenic oomycetes and their host plants, we also investigate the quantitative changes of secreted protein following interaction of P. plurivora with the root exudate of Fagus sylvatica which is highly susceptible to the root pathogen. We show that besides known effectors, the expression and/or secretion levels of cell-wall-degrading enzymes were altered following the interaction with the host plant root exudate. In addition, a characterization of the F. sylvatica root exudate was performed by NMR and amino acid analysis, allowing the identification of the main released low-molecular weight components, including organic acids and free amino acids. This study provides important insights for deciphering the extracellular network involved in the highly susceptible P. plurivora-F. sylvatica interaction.
doi:10.1371/journal.pone.0112317
PMCID: PMC4221288  PMID: 25372870
14.  Dynamic interaction networks in a hierarchically organized tissue 
We have integrated gene expression profiling with database and literature mining, mechanistic modeling, and cell culture experiments to identify intercellular and intracellular networks regulating blood stem cell self-renewal. Blood stem cell fate in vitro is regulated non-autonomously by a coupled positive–negative intercellular feedback circuit, composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). The antagonistic signals converge in a core intracellular network focused around PI3K, Raf, PLC, and Akt. Model simulations enable functional classification of the novel endogenous ligands and signaling molecules.
Intercellular (between cell) communication networks are required to maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the recognized importance of intercellular networks in regulating adult stem and progenitor cell fate, the specific cell populations involved, and the underlying molecular mechanisms are largely undefined. Although a limited number of studies have applied novel bioinformatic approaches to unravel intercellular signaling in other cell systems (Frankenstein et al, 2006), a comprehensive analysis of intercellular communication in a stem cell-derived, hierarchical tissue network has yet to be reported.
As a model system to explore intercellular communication networks in a hierarchically organized tissue, we cultured human umbilical cord blood (UCB)-derived stem and progenitor cells in defined, minimal cytokine-supplemented liquid culture (Madlambayan et al, 2006). To systematically explore the molecular and cellular dynamics underlying primitive progenitor growth and differentiation, gene expression profiles of primitive (lineage negative; Lin−) and mature (lineage positive; Lin+) populations were generated during phases of stem cell expansion versus depletion. Parallel phenotypic and subproteomic experiments validated that mRNA expression correlated with complex measures of proteome activity (protein secretion and cell surface expression). Using a curated list of secreted ligand–receptor interactions and published expression profiles of purified mature blood populations, we implemented a novel algorithm to reconstruct the intercellular signaling networks established between stem cells and multi-lineage progeny in vitro. By correlating differential expression patterns with stem cell growth, we predict cell populations, pathways, and secreted ligands associated with stem cell self-renewal and differentiation (Figure 3A).
We then tested the correlative predictions in a series of cell culture experiments. UCB progenitor cell cultures were supplemented with saturating amounts of 18 putative regulatory ligands, or cocultured with purified mature blood lineages (megakaryocytes, monocytes, and erythrocytes), and analyzed for effects on total cell, progenitor, and primitive progenitor growth. At the primitive progenitor level, 3/5 novel predicted stimulatory ligands (EGF, PDGFB, and VEGF) displayed significant positive effects, 5/7 predicted inhibitory factors (CCL3, CCL4, CXCL10, TNFSF9, and TGFB2) displayed negative effects, whereas only 1/5 non-correlated ligands (CXCL7) displayed an effect. Also consistent with predictions from gene expression data, megakaryocytes and monocytes were found to stimulate and inhibit primitive progenitor growth, respectively, and these effects were attributable to differential secretome profiles of stimulatory versus inhibitory ligands.
Cellular responses to external stimuli, particularly in heterogeneous and dynamic cell populations, represent complex functions of multiple cell fate decisions acting both directly and indirectly on the target (stem cell) populations. Experimentally distinguishing the mode of action of cytokines is thus a difficult task. To address this we used our previously published interactive model of hematopoiesis (Kirouac et al, 2009) to classify experimentally identified regulatory ligands into one of four distinct functional categories based on their differential effects on cell population growth. TGFB2 was classified as a proliferation inhibitor, CCL4, CXCL10, SPARC, and TNFSF9 as self-renewal inhibitors, CCL3 a proliferation stimulator, and EGF, VEGF, and PDGFB as self-renewal stimulators.
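The validation counts and the four-way functional classification reported in the two paragraphs above can be held in a small lookup structure, which makes the categories easy to query. The sketch below only restates numbers and labels given in the text; the variable and function names are hypothetical, and it does not reproduce the authors' hematopoiesis model.

```python
from collections import defaultdict

# Functional category assigned to each ligand, as stated in the text.
CLASSIFICATION = {
    "TGFB2": "proliferation inhibitor",
    "CCL4": "self-renewal inhibitor",
    "CXCL10": "self-renewal inhibitor",
    "SPARC": "self-renewal inhibitor",
    "TNFSF9": "self-renewal inhibitor",
    "CCL3": "proliferation stimulator",
    "EGF": "self-renewal stimulator",
    "VEGF": "self-renewal stimulator",
    "PDGFB": "self-renewal stimulator",
}

# (ligands with an observed effect, ligands tested) per prediction group.
VALIDATION = {
    "predicted stimulatory": (3, 5),
    "predicted inhibitory": (5, 7),
    "non-correlated": (1, 5),
}


def group_by_category(classification):
    """Invert the ligand -> category map into category -> sorted ligand list."""
    groups = defaultdict(list)
    for ligand, category in classification.items():
        groups[category].append(ligand)
    return {category: sorted(ligands) for category, ligands in groups.items()}


def hit_rates(validation):
    """Fraction of tested ligands in each group that showed an effect."""
    return {group: hits / total for group, (hits, total) in validation.items()}


groups = group_by_category(CLASSIFICATION)
rates = hit_rates(VALIDATION)

for category, ligands in groups.items():
    print(f"{category}: {', '.join(ligands)}")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of tested ligands showed an effect")
```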
Stem and progenitor cells exposed to combinatorial extracellular signals must propagate this information through intracellular molecular networks, and respond appropriately by modifying cell fate decisions. To explore how our experimentally identified positive and negative regulatory signals are integrated at the intracellular level, we constructed a blood stem cell self-renewal signaling network through extensive literature curation and protein–protein interaction (PPI) network mapping. We find that signal transduction pathways activated by the various stimulatory and inhibitory ligands converge on a limited set of molecular control nodes, forming a core subnetwork enriched for known regulators of self-renewal (Figure 6A). To experimentally test the intracellular signaling molecules computationally predicted as regulators of stem cell self-renewal, we obtained five small molecule antagonists against the kinases Phosphatidylinositol 3-kinase (PI3K), Raf, Akt, Phospholipase C (PLC), and MEK1. Liquid cultures were supplemented with the five molecules individually, and resultant cell population outputs compared against model simulations to deconvolute the functional effects on proliferation (and survival) versus self-renewal. This analysis classifies inhibition of PI3K and Raf activity as selectively targeting self-renewal, PLC as selectively targeting survival, and Akt as selectively targeting proliferation; MEK inhibition appears non-specific for these processes.
This represents the first systematic characterization of how cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny. The complex intercellular communication networks can be approximated as an antagonistic positive–negative feedback circuit, wherein progenitor expansion is modulated by a balance of megakaryocyte-derived stimulatory factors (EGF, PDGF, VEGF, and possibly serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). This complex milieu of endogenous regulatory signals is integrated and processed within a core intracellular signaling network, resulting in modulation of cell-level kinetic parameters (proliferation, survival, and self-renewal). We reconstruct a stem cell associated intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. These findings lay the groundwork for novel strategies to control blood stem cell self-renewal in vitro and in vivo.
Intercellular (between cell) communication networks maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the importance of intercellular networks in stem cell biology, their rules, structure and molecular components are poorly understood. Herein, we describe the structure and dynamics of intercellular and intracellular networks in a stem cell derived, hierarchically organized tissue using experimental and theoretical analyses of cultured human umbilical cord blood progenitors. By integrating high-throughput molecular profiling, database and literature mining, mechanistic modeling, and cell culture experiments, we show that secreted factor-mediated intercellular communication networks regulate blood stem cell fate decisions. In particular, self-renewal is modulated by a coupled positive–negative intercellular feedback circuit composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). We reconstruct a stem cell intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. This represents the first systematic characterization of how stem cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny.
doi:10.1038/msb.2010.71
PMCID: PMC2990637  PMID: 20924352
cellular networks; hematopoiesis; intercellular signaling; self-renewal; stem cells
15.  Studying Whole-Body Protein Turnover Using Stable-Isotope Labeled Mice 
Protein turnover is a neglected dimension in functional genomics studies; it delineates the dynamic changes of protein regulation and links the transcriptome, proteome, and potentially the metabolome. It is known that proteins are degraded and synthesized continuously at the cellular level. The balance of degradation and biosynthesis is tightly regulated and has great implications in normal physiology, cellular regulation, and human diseases. Recent technological advances in high-resolution mass spectrometry open up new opportunities to make proteome-wide predictions of protein turnover. However, studies of proteome turnover in live animals require high efficiency of labeling using labeled amino acids and bioinformatics to deal with highly complex mass spectral data. To unravel the dynamics of protein turnover in intact animals, we developed an in vivo pulse-chase strategy and bioinformatics tools (APCIE) to perform analysis of proteome turnover. We pulsed C57BL/6 mice with 15N labeled amino acids through diet and chased 15N labeled young (3 weeks old) and old (4 months old) mice with unlabeled diet for 2 weeks. Lungs from these two groups of mice were harvested and analyzed by mass spectrometry-based shotgun proteomic techniques. Using APCIE, we determined protein degradation (loss of 15N labeled amino acids) and protein synthesis (incorporation of 14N amino acids) of over 300 proteins commonly found in young and old mice and observed age-dependent differences in proteome turnover. As expected, most proteins are turned over faster in young mice than in old mice, as reflected by massive loss of 15N amino acids and incorporation of 14N amino acids. Our newly developed in vivo pulse-chase technique and algorithm can be used for global profiling of protein turnover in intact animals to study complex mouse models mimicking human diseases.
Combining global profiling of protein turnover and protein expression will make a tremendous impact towards mechanism-driven therapeutics based on dynamic regulation of proteins and protein networks.
PMCID: PMC3186478
16.  Novel Insights into Adipogenesis from Omics Data 
Current Medicinal Chemistry  2009;16(23):2952-2964.
Obesity, the excess accumulation of adipose tissue, is one of the most pressing health problems in both the Western world and in developing countries. Adipose tissue growth results from two processes: the increase in number of adipocytes (hyperplasia) that develop from precursor cells, and the growth of individual fat cells (hypertrophy) due to incorporation of triglycerides. Adipogenesis, the process of fat cell development, has been extensively studied using various cell and animal models. While these studies pointed out a number of key factors involved in adipogenesis, the list of molecular components is far from complete.
The advance of high-throughput technologies has sparked many experimental studies aimed at the identification of novel molecular components regulating adipogenesis. This paper examines the results of recent studies on adipogenesis using high-throughput technologies. Specifically, it provides an overview of studies employing microarrays for gene expression profiling and studies using gel based and non-gel based proteomics as well as a chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) or sequencing (ChIP-seq). Due to the maturity of the technology, the bulk of the available data was generated using microarrays. Therefore these data sets were not only reviewed but also underwent meta analysis.
The review also shows that large-scale omics technologies in conjunction with sophisticated bioinformatics analyses can provide not only a list of novel players, but also a global view on biological processes and molecular networks. Finally, developing technologies and computational challenges associated with the data analyses are highlighted, and an outlook on the questions not previously addressed is provided.
doi:10.2174/092986709788803132
PMCID: PMC2765082  PMID: 19689276
Adipogenesis; obesity; gene-expression profiling; proteomics; genome-wide location analysis; data integration.
17.  Global proteomics analysis of testis and ovary in adult zebrafish (Danio rerio) 
Fish Physiology and Biochemistry  2011;37(3):619-647.
The molecular mechanisms controlling sex determination and differentiation in zebrafish (Danio rerio) are largely unknown. A genome-wide analysis may provide comprehensive insights into the processes involved. The mRNA expression in zebrafish gonads has been fairly well studied, but much less data on the corresponding protein expression are available, although the proteins are considered to be more relevant markers of gene function. Because mRNA and protein abundances rarely correlate well, mRNA profiles need to be complemented with information on protein expression. The work presented here analyzed the proteomes of adult zebrafish gonads by a multidimensional protein identification technology, generating the most populated lists to date of proteins expressed in mature zebrafish gonads. The acquired proteomics data partially confirmed existing transcriptomics information for several genes, including several novel transcripts. However, disagreements between mRNA and protein abundances were often observed, further stressing the necessity to assess expression at different levels before drawing conclusions on a certain gene’s expression and function. Several gene groups expressed in a sexually dimorphic way in zebrafish gonads were identified. Their potential importance for gonad development and function is discussed. The data gained in the current study provide a basis for further work on elucidating processes occurring during zebrafish development with the use of high-throughput proteomics.
Electronic supplementary material
The online version of this article (doi:10.1007/s10695-010-9464-x) contains supplementary material, which is available to authorized users.
doi:10.1007/s10695-010-9464-x
PMCID: PMC3146978  PMID: 21229308
Sex differentiation; Testis; Ovary; Proteomics; Multidimensional protein identification technology (MudPIT); Zebrafish
18.  Whole Genome Sequencing versus Traditional Genotyping for Investigation of a Mycobacterium tuberculosis Outbreak: A Longitudinal Molecular Epidemiological Study 
PLoS Medicine  2013;10(2):e1001387.
In an outbreak investigation of Mycobacterium tuberculosis comparing whole genome sequencing (WGS) with traditional genotyping, Stefan Niemann and colleagues found that classical genotyping falsely clustered some strains, and WGS better reflected contact tracing.
Background
Understanding Mycobacterium tuberculosis (Mtb) transmission is essential to guide efficient tuberculosis control strategies. Traditional strain typing lacks sufficient discriminatory power to resolve large outbreaks. Here, we tested the potential of using next generation genome sequencing for identification of outbreak-related transmission chains.
Methods and Findings
During long-term (1997 to 2010) prospective population-based molecular epidemiological surveillance comprising a total of 2,301 patients, we identified a large outbreak caused by an Mtb strain of the Haarlem lineage. The main performance outcome measure of whole genome sequencing (WGS) analyses was the degree of correlation of the WGS analyses with contact tracing data and the spatio-temporal distribution of the outbreak cases. WGS analyses of the 86 isolates revealed 85 single nucleotide polymorphisms (SNPs), subdividing the outbreak into seven genome clusters (two to 24 isolates each), plus 36 unique SNP profiles. WGS results showed that the first outbreak isolates detected in 1997 were falsely clustered by classical genotyping. In 1998, one clone (termed “Hamburg clone”) started expanding, apparently independently from differences in the social environment of early cases. Genome-based clustering patterns were in better accordance with contact tracing data and the geographical distribution of the cases than clustering patterns based on classical genotyping. A maximum of three SNPs were identified in eight confirmed human-to-human transmission chains, involving 31 patients. We estimated the Mtb genome evolutionary rate at 0.4 mutations per genome per year. This rate suggests that Mtb grows in its natural host with a doubling time of approximately 22 h (400 generations per year). Based on the genome variation discovered, emergence of the Hamburg clone was dated back to a period between 1993 and 1997, hence shortly before the discovery of the outbreak through epidemiological surveillance.
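The doubling-time figure quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the per-generation rate is simply the quotient of the two figures given in the abstract, which is not necessarily how the authors derived it.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def generations_per_year(doubling_time_h: float) -> float:
    """Number of cell divisions per year for a given doubling time in hours."""
    return HOURS_PER_YEAR / doubling_time_h


def mutations_per_generation(rate_per_year: float, doubling_time_h: float) -> float:
    """Per-genome mutation rate per generation implied by a yearly rate."""
    return rate_per_year / generations_per_year(doubling_time_h)


# Figures taken from the abstract: 22 h doubling time, 0.4 mutations/genome/year.
gens = generations_per_year(22)
per_gen = mutations_per_generation(0.4, 22)
print(f"{gens:.0f} generations/year, {per_gen:.4f} mutations/genome/generation")
```

A 22 h doubling time gives just under 400 generations per year, consistent with the rounded value in the abstract, and the implied rate is on the order of one mutation per thousand generations per genome.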
Conclusions
Our findings suggest that WGS is superior to conventional genotyping for Mtb pathogen tracing and investigating micro-epidemics. WGS provides a measure of Mtb genome evolution over time in its natural host context.
Editors' Summary
Background
Tuberculosis—a contagious bacterial disease that usually infects the lungs—is a major public health problem, particularly in low- and middle-income countries. In 2011, an estimated 8.7 million people developed tuberculosis globally, and 1.4 million people died from the disease. Tuberculosis is second only to HIV/AIDS in terms of global deaths from a single infectious agent. Mycobacterium tuberculosis, the bacterium that causes tuberculosis, is readily spread in airborne droplets when people with active disease cough or sneeze. The characteristic symptoms of tuberculosis include persistent cough, weight loss, fever, and night sweats. Diagnostic tests for the disease include sputum smear analysis (examination of mucus coughed up from the lungs for the presence of M. tuberculosis), mycobacterial culture (growth of M. tuberculosis from sputum), and chest X-rays. Tuberculosis can be cured by taking several antibiotics daily for at least six months, although the recent emergence of multidrug-resistant M. tuberculosis is making tuberculosis harder to treat.
Why Was This Study Done?
Although efforts to reduce the global burden of tuberculosis are showing some improvements, the annual decline in the number of people developing tuberculosis continues to be slow. To develop optimized control strategies, experts need to be able to accurately track M. tuberculosis transmission within human populations. Because M. tuberculosis, like all bacteria, accumulates genetic changes over time, there are many different strains (genetic variants) of M. tuberculosis. Genotyping methods have been developed that identify different bacterial strains by examining specific regions of the bacterial genome (blueprint), but because these methods examine only a small part of the genome, they may not distinguish between related transmission chains. That is, traditional strain genotyping methods may not be able to determine accurately where a tuberculosis outbreak started or how it spread through a population. In this longitudinal cohort study, the researchers compare the ability of whole genome sequencing (WGS), which is rapidly becoming widely available, and traditional genotyping to provide information about a recent German tuberculosis outbreak. In a longitudinal cohort study, a population is followed over time to analyze the occurrence of a specific disease.
What Did the Researchers Do and Find?
During long-term (1997–2010) population-based molecular epidemiological surveillance (disease surveillance that uses molecular techniques rather than reports of illness) in Hamburg and Schleswig-Holstein, the researchers identified a large tuberculosis outbreak caused by M. tuberculosis isolates of the Haarlem lineage using classical strain typing. The researchers examined each of the 86 isolates from this outbreak using WGS and classical genotyping and asked whether the results of these two approaches correlated with contact tracing data (information is routinely collected about the people a patient with tuberculosis has recently met so that these contacts can be tested for tuberculosis and treated if necessary) and with the spatio-temporal distribution of outbreak cases. WGS of the isolates identified 85 single nucleotide polymorphisms (SNPs; genomic sequence variants in which single building blocks, or nucleotides, are altered) that subdivided the outbreak into seven clusters of isolates and 36 unique isolates. The WGS results showed that the first isolates of the outbreak were incorrectly clustered by classical genotyping and that one strain—the “Hamburg clone”—started expanding in 1998. Notably, the genome-based clustering patterns were in better accordance with contact tracing data and with the geographical distribution of cases than clustering patterns based on classical genotyping, and they identified eight confirmed human-to-human transmission chains that involved 31 patients and a maximum of three SNPs. Finally, the researchers used their WGS results to estimate that the Hamburg clone emerged between 1993 and 1997, shortly before the discovery of the tuberculosis outbreak through epidemiological surveillance.
What Do These Findings Mean?
These findings show that WGS can be used to identify specific strains within large tuberculosis outbreaks more accurately than classical genotyping. They also provide new information about the evolution of M. tuberculosis during outbreaks and indicate how WGS data should be interpreted in future genome-based molecular epidemiology studies. WGS has the potential to improve the molecular epidemiological surveillance and control of tuberculosis and of other infectious diseases. Importantly, note the researchers, ongoing reductions in the cost of WGS, the increased availability of “bench top” genome sequencers, and bioinformatics developments should all accelerate the implementation of WGS as a standard method for the identification of transmission chains in infectious disease outbreaks.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001387.
The World Health Organization provides information (in several languages) on all aspects of tuberculosis, including the Global Tuberculosis Report 2012
The Stop TB Partnership is working towards tuberculosis elimination; patient stories about tuberculosis are available (in English and Spanish)
The US Centers for Disease Control and Prevention has information about tuberculosis, including information on tuberculosis genotyping (some information in English and Spanish)
The US National Institute of Allergy and Infectious Diseases also has detailed information on all aspects of tuberculosis
The Tuberculosis Survival Project, which aims to raise awareness of tuberculosis and provide support for people with tuberculosis, provides personal stories about treatment for tuberculosis; the Tuberculosis Vaccine Initiative also provides personal stories about dealing with tuberculosis
MedlinePlus has links to further information about tuberculosis (in English and Spanish)
Wikipedia has a page on whole-genome sequencing (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001387
PMCID: PMC3570532  PMID: 23424287
19.  A systems approach to clinical oncology: Focus on breast cancer 
Proteome Science  2006;4:5.
During the past decade, genomic microarrays have been applied with some success to the molecular profiling of breast tumours, which has resulted in a much more detailed classification scheme as well as in the identification of potential gene signature sets. These gene sets have been applied to both the prognosis and prediction of outcome to treatment and have performed better than the current clinical criteria. One of the main limitations of microarray analysis, however, is that frozen tumour samples are required for the assay. This imposes severe limitations on access to samples and precludes large scale validation studies from being conducted. Quantitative reverse transcriptase polymerase chain reaction (qRT-PCR), on the other hand, can be used with degraded RNAs derived from formalin-fixed paraffin-embedded (FFPE) tumour samples, the most important and abundant source of clinical material available. More recently, the novel DASL (cDNA-mediated Annealing, Selection, extension and Ligation) assay has been developed as a high throughput gene expression profiling system specifically designed for use with FFPE tumour tissue samples.
However, we do not believe that genomics is adequate as a sole prognostic and predictive platform in breast cancer. The key proteins driving oncogenesis, for example, can undergo post-translational modifications; moreover, if we are ever to move individualization of therapy into the practical world of blood-based assays, serum proteomics becomes critical. Proteomic platforms, including tissue micro-arrays (TMA) and protein chip arrays, in conjunction with surface-enhanced laser desorption ionization time-of-flight mass spectrometry (SELDI-TOF/MS), have been the technologies most widely applied to the characterization of tumours and serum from breast cancer patients, with still limited but encouraging results.
This review will focus on these genomic and proteomic platforms, with an emphasis placed on the utilization of FFPE tumour tissue samples and serum, as they have been applied to the study of breast cancer for the discovery of gene signatures and biomarkers for the early diagnosis, prognosis and prediction of treatment outcome. The ultimate goal is to be able to apply a systems biology approach to the information gleaned from the combination of these techniques in order to select the best treatment strategy, monitor its effectiveness and make changes as rapidly as possible where needed to achieve the optimal therapeutic results for the patient.
doi:10.1186/1477-5956-4-5
PMCID: PMC1456950  PMID: 16595007
20.  MAID : An effect size based model for microarray data integration across laboratories and platforms 
BMC Bioinformatics  2008;9:305.
Background
Gene expression profiling has the potential to unravel molecular mechanisms behind gene regulation and identify gene targets for therapeutic interventions. As microarray technology matures, the number of microarray studies has increased, resulting in many different datasets available for any given disease. The sensitivity and reliability of measurements of gene expression changes can be improved through a systematic integration of different microarray datasets that address the same or similar biological questions.
Results
Traditional effect size models cannot be used to integrate array data that directly compare treatment to control samples expressed as log ratios of gene expressions. Here we extend the traditional effect size model to integrate as many array datasets as possible. The extended effect size model (MAID) can integrate any array datatype generated with either single or two channel arrays using either direct or indirect designs across different laboratories and platforms. The model uses two standardized indices: the standard effect size score for experiments with two groups of data, and a new standardized index that measures the difference in gene expression between treatment and control groups for one-sample data with replicate arrays. The statistical significance of treatment effect across studies for each gene is determined by appropriate permutation methods depending on the type of data integrated. We apply our method to three different expression datasets from two different laboratories generated using three different array platforms and two different experimental designs. Our results indicate that the proposed integration model produces an increase in statistical power for identifying differentially expressed genes when integrating data across experiments and when compared to other integration models. We also show that genes found to be significant using our data integration method are of direct biological relevance to the three experiments integrated.
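The two kinds of standardized index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MAID implementation: the two-group score is the textbook standardized mean difference (Cohen's d with a pooled standard deviation), and the one-sample index is assumed here to be the mean log ratio divided by its standard deviation, since the abstract does not give the formula. The function names and the expression values are hypothetical placeholders.

```python
from math import sqrt
from statistics import mean, stdev


def two_group_effect_size(treatment, control):
    """Standardized mean difference (Cohen's d) using a pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def one_sample_index(log_ratios):
    """Mean treatment/control log ratio scaled by its standard deviation.

    Assumed form for illustration only; the abstract does not define the index.
    """
    return mean(log_ratios) / stdev(log_ratios)


# Placeholder values for one gene, not taken from any dataset in the paper.
treatment = [8.1, 8.4, 8.0, 8.3]   # log expression, single-channel arrays
control = [7.2, 7.5, 7.1, 7.4]
log_ratios = [0.9, 1.1, 0.8, 1.2]  # replicate two-channel arrays, direct design

print(two_group_effect_size(treatment, control))
print(one_sample_index(log_ratios))
```

Putting both designs on a unitless, standardized scale is what allows scores from different platforms to be combined for each gene; significance would then be assessed by permutation, which this sketch does not attempt.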
Conclusion
High-throughput genomics data provide a rich and complex source of information that could play a key role in deciphering intricate molecular networks behind disease. Here we propose an extension of the traditional effect size model to allow the integration of as many array experiments as possible with the aim of increasing the statistical power for identifying differentially expressed genes.
doi:10.1186/1471-2105-9-305
PMCID: PMC2483727  PMID: 18616827
21.  The Role of the Toxicologic Pathologist in the Post-Genomic Era 
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era. The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists. The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression. 
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era newly emerging DVM and MD scientists enter the work arena with a PhD in pathology often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be well done as an intermezzo between other tasks. It is a rare individual that has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace without many job opportunities in contrast to the pre-genomic era. Many face an identity crisis needing to decide to become a competent pathologist or, alternatively, to become a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay. The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity-relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much like computerized algorithms are currently used to forecast weather or to predict political elections, sophisticated computerized algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
doi:10.1293/tox.26.105
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
22.  The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells 
An in-depth proteomic comparison of human-induced pluripotent stem cells, and their parent fibroblast cells, with embryonic stem cells shows that the reprogramming process comprehensively remodels protein expression levels, creating cells that closely resemble natural stem cells.
We present here a large proteomic characterization of human embryonic stem cells, human-induced pluripotent stem cells and their parental fibroblast cell lines. Overall, 97.8% of the 2683 quantified proteins in four experiments showed no significant differences in abundance between hESC and hiPSC, highlighting the high similarity of these pluripotent cell lines. In total, 58 proteins were found significantly differentially expressed between hiPSCs and hESCs. The observed low overlap of these proteins with previous transcriptomic studies suggests that those differences do not reflect a recurrent molecular signature.
Human embryonic stem cells (hESCs) are capable of self-renewal and multi-lineage differentiation. However, the use of hESCs for clinical treatment entails ethical issues as they are derived from human embryos. Recently, reprogramming of somatic cells to an embryonic stem cell-like state, named induced pluripotent stem cells (iPSCs), was achieved through ectopic expression of defined factors. In addition to their clinical potential, hiPSCs represent a unique tool to develop cellular models for human diseases as well. Although current functional assays (e.g., tetraploid complementation) have confirmed the pluripotency of hiPSCs, there might still be significant differences (e.g., differentiation potential) when compared with their natural hESC counterparts. Consequently, an extensive molecular characterization to address differences and similarities between these two pluripotent cell lines seems to be a prerequisite before any clinical application is conducted. Although great efforts, mainly at the genomic level, have been made to address how similar hESCs and hiPSCs are, the definite answer to this fundamental question is still debated. Direct assessment of protein levels has yet to be incorporated into these integrative systems-level analyses. Protein levels are tuned by intricate mechanisms of gene expression regulation and it has recently been documented that mRNA and protein levels poorly correlate in mouse ESCs. Here, we use in-depth quantitative proteomics to gain insights into the differences and similarities in the protein content of two hiPS cell lines, their precursor IMR90 and 4Skin fibroblast cell lines and one hES cell line, providing novel molecular signatures that may assist in filling a gap in the understanding of pluripotency.
To study the degree of similarity, at the protein level, between hiPSCs and hESCs, four MS-based proteomic experiments were designed that use our in-house developed triplex dimethyl labeling chemistry followed by extensive fractionation by strong cation exchange (SCX) chromatography to reduce the sample complexity. High-resolution LC-MS/MS with dedicated fragmentation schemes (i.e., electron transfer dissociation, collision-induced dissociation and higher-energy collision dissociation) was subsequently used to maximize peptide identification rates. A total of 348 LC-MS/MS analyses (including technical and biological replicates) were performed. We confidently identified 1 593 446 peptide spectrum matches (peptide FDR<1%) corresponding to 10 628 unique protein groups (protein FDR∼4%). Using the extracted ion chromatograms, we also estimated the absolute abundance of the proteins within the samples spanning six orders of magnitude. To the best of our knowledge, the coverage obtained in this study represents the largest achieved by any proteomics screen on pluripotent cells.
Most importantly, our results indicate that the reprogramming process remodeled the proteome of both fibroblast cell lines to a profile that closely resembles the pluripotent hESC proteome: 97.8% of the quantified proteins (2638 proteins in all four experiments) showed nonsignificant changes. Nevertheless, a small fraction of 58 proteins, mainly related to metabolism, antigen processing and cell adhesion, was found significantly regulated between hiPSCs and hESCs. A comparison of the regulated proteins to previously published transcriptomic studies showed a low overlap, highlighting the emerging notion that differences between both pluripotent cell lines reflect experimental conditions rather than a recurrent molecular signature. On the other hand, the inclusion of the two parental fibroblast cell lines in our analysis allowed us to study changes in the proteome at both the starting and end points of the reprogramming process. As expected, the vast majority of the proteins (73.4%) showed differential expression between the parental fibroblasts and the reprogrammed pluripotent cells.
To find out if the differences observed in our study were a consequence of transcriptional or translational regulation, we performed paired genome-wide gene expression analyses on the same six samples that were used for the proteomic profiling. Overall, we observed a good correlation between mRNA and protein levels (r∼0.7). These results further authenticated the proteomic measurements and implied a high degree of control at the transcriptional level. Nevertheless, numerous genes were found uncorrelated highlighting the necessity of complementing transcriptomic-based approaches with proteomics.
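The mRNA-to-protein comparison described above rests on a correlation coefficient between paired measurements. The sketch below shows one way such a value could be computed, assuming a Pearson correlation over paired log-scale abundances; the abstract does not state which coefficient was used. The function name is hypothetical and the numbers are toy placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Toy placeholder values: one entry per gene, same gene order in both lists.
mrna_log_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_log_abundance = [1.2, 1.9, 3.5, 3.1, 5.0]

r = pearson_r(mrna_log_abundance, protein_log_abundance)
print(f"r = {r:.2f}")
```

Genes that fall far from the overall trend in such a comparison are the "uncorrelated" cases for which a proteomic measurement adds information beyond the transcript level.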
Assessing relevant molecular differences between human-induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs) is important, given that such differences may impact their potential therapeutic use. Controversy surrounds recent gene expression studies comparing hiPSCs and hESCs. Here, we present an in-depth quantitative mass spectrometry-based analysis of hESCs, two different hiPSCs and their precursor fibroblast cell lines. Our comparisons confirmed the high similarity of hESCs and hiPSCS at the proteome level as 97.8% of the proteins were found unchanged. Nevertheless, a small group of 58 proteins, mainly related to metabolism, antigen processing and cell adhesion, was found significantly differentially expressed between hiPSCs and hESCs. A comparison of the regulated proteins with previously published transcriptomic studies showed a low overlap, highlighting the emerging notion that differences between both pluripotent cell lines rather reflect experimental conditions than a recurrent molecular signature.
doi:10.1038/msb.2011.84
PMCID: PMC3261715  PMID: 22108792
human embryonic stem cells; human-induced pluripotent stem cells; proteomics; quantitation
23.  Stem cell systems informatics for advanced clinical biodiagnostics: tracing molecular signatures from bench to bedside 
Croatian Medical Journal  2013;54(4):319-329.
Development of innovative high-throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. The emergence of next-generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continues to produce a wealth of biological data that captures the molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore the molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of diseased vs healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage to advancing diagnostic methodology. Bioinformatic deconvolution of several “-omic” layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting-edge in silico archiving and processing methods. Comprehensive elucidation of the genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high-value resource in modern diagnosis and prognosis of patient disease.
doi:10.3325/cmj.2013.54.319
PMCID: PMC3760656  PMID: 23986272
24.  A Mouse to Human Search for Plasma Proteome Changes Associated with Pancreatic Tumor Development 
PLoS Medicine  2008;5(6):e123.
Background
The complexity and heterogeneity of the human plasma proteome have presented significant challenges in the identification of protein changes associated with tumor development. Refined genetically engineered mouse (GEM) models of human cancer have been shown to faithfully recapitulate the molecular, biological, and clinical features of human disease. Here, we sought to exploit the merits of a well-characterized GEM model of pancreatic cancer to determine whether proteomics technologies allow identification of protein changes associated with tumor development and whether such changes are relevant to human pancreatic cancer.
Methods and Findings
Plasma was sampled from mice at early and advanced stages of tumor development and from matched controls. Using a proteomic approach based on extensive protein fractionation, we confidently identified 1,442 proteins that were distributed across seven orders of magnitude of abundance in plasma. Analysis of proteins chosen on the basis of increased levels in plasma from tumor-bearing mice and corroborating protein or RNA expression in tissue documented concordance in the blood from 30 newly diagnosed patients with pancreatic cancer relative to 30 control specimens. A panel of five proteins selected on the basis of their increased level at an early stage of tumor development in the mouse was tested in a blinded study in 26 humans from the CARET (Carotene and Retinol Efficacy Trial) cohort. The panel discriminated pancreatic cancer cases from matched controls in blood specimens obtained between 7 and 13 mo prior to the development of symptoms and clinical diagnosis of pancreatic cancer.
Conclusions
Our findings indicate that GEM models of cancer, in combination with in-depth proteomic analysis, provide a useful strategy to identify candidate markers applicable to human cancer with potential utility for early detection.
Samir Hanash and colleagues identify proteins that are increased at an early stage of pancreatic tumor development in a mouse model and may be a useful tool in detecting early tumors in humans.
Editors' Summary
Background.
Cancers are life-threatening, disorganized masses of cells that can occur anywhere in the human body. They develop when cells acquire genetic changes that allow them to grow uncontrollably and to spread around the body (metastasize). If a cancer is detected when it is still small and has not metastasized, surgery can often provide a cure. Unfortunately, many cancers are detected only when they are large enough to press against surrounding tissues and cause pain or other symptoms. By this time, surgical removal of the original (primary) tumor may be impossible and there may be secondary cancers scattered around the body. In such cases, radiotherapy and chemotherapy can sometimes help, but the outlook for patients whose cancers are detected late is often poor. One cancer type for which late detection is a particular problem is pancreatic adenocarcinoma. This cancer rarely causes any symptoms in its early stages. Furthermore, the symptoms it eventually causes—jaundice, abdominal and back pain, and weight loss—are seen in many other illnesses. Consequently, pancreatic cancer has usually spread before it is diagnosed, and most patients die within a year of their diagnosis.
Why Was This Study Done?
If a test could be developed to detect pancreatic cancer in its early stages, the lives of many patients might be extended. Tumors often release specific proteins—“cancer biomarkers”—into the blood, a bodily fluid that can be easily sampled. If a protein released into the blood by pancreatic cancer cells could be identified, it might be possible to develop a noninvasive screening test for this deadly cancer. In this study, the researchers use a “proteomic” approach to identify potential biomarkers for early pancreatic cancer. Proteomics is the study of the patterns of proteins made by an organism, tissue, or cell and of the changes in these patterns that are associated with various diseases.
What Did the Researchers Do and Find?
The researchers started their search for pancreatic cancer biomarkers by studying the plasma proteome (the proteins in the fluid portion of blood) of mice genetically engineered to develop cancers that closely resemble human pancreatic tumors. Through the use of two techniques called high-resolution mass spectrometry and acrylamide isotopic labeling, the researchers identified 165 proteins that were present in larger amounts in plasma collected from mice with early and/or advanced pancreatic cancer than in plasma from control mice. Then, to test whether any of these protein changes were relevant to human pancreatic cancer, the researchers analyzed blood samples collected from patients with pancreatic cancer. These samples, they report, contained larger amounts of some of these proteins than blood collected from patients with chronic pancreatitis, a condition that has similar symptoms to pancreatic cancer. Finally, using blood samples collected during a clinical trial, the Carotene and Retinol Efficacy Trial (a cancer-prevention study), the researchers showed that the measurement of five of the proteins present in increased amounts at an early stage of tumor development in the mouse model discriminated between people with pancreatic cancer and matched controls up to 13 months before cancer diagnosis.
What Do These Findings Mean?
These findings suggest that in-depth proteomic analysis of genetically engineered mouse models of human cancer might be an effective way to identify biomarkers suitable for the early detection of human cancers. Previous attempts to identify such biomarkers using human samples have been hampered by the many noncancer-related differences in plasma proteins that exist between individuals and by problems in obtaining samples from patients with early cancer. The use of a mouse model of human cancer, these findings indicate, can circumvent both of these problems. More specifically, these findings identify a panel of proteins that might allow earlier detection of pancreatic cancer and that might, therefore, extend the life of some patients who develop this cancer. However, before a routine screening test becomes available, additional markers will need to be identified and extensive validation studies in larger groups of patients will have to be completed.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050123.
The MedlinePlus Encyclopedia has a page on pancreatic cancer (in English and Spanish). Links to further information are provided by MedlinePlus
The US National Cancer Institute has information about pancreatic cancer for patients and health professionals (in English and Spanish)
The UK charity Cancerbackup also provides information for patients about pancreatic cancer
The Clinical Proteomic Technologies for Cancer Initiative (a US National Cancer Institute initiative) provides a tutorial about proteomics and cancer and information on the Mouse Proteomic Technologies Initiative
doi:10.1371/journal.pmed.0050123
PMCID: PMC2504036  PMID: 18547137
25.  Enhancing the Role of Veterinary Vaccines Reducing Zoonotic Diseases of Humans: Linking Systems Biology with Vaccine Development 
Vaccine  2011;29(41):7197-7206.
The aim of research on infectious diseases is their prevention, and brucellosis and salmonellosis are classic examples of worldwide zoonoses to which a systems biology approach can be applied for enhanced rational vaccine development. When used optimally, vaccines prevent disease manifestations, reduce transmission of disease, decrease the need for pharmaceutical intervention, and improve the health and welfare of animals, as well as indirectly protecting against zoonotic diseases of people. Advances in the last decade or so using comprehensive systems biology approaches linking genomics, proteomics, bioinformatics, and biotechnology with immunology, pathogenesis and vaccine formulation and delivery are expected to enable enhanced approaches to vaccine development. The goal of this paper is to evaluate the role of computational systems biology analysis of host:pathogen interactions (the interactome) as a tool for enhanced rational design of vaccines. Systems biology is bringing a new, more robust approach to veterinary vaccine design based upon a deeper understanding of host-pathogen interactions and their impact on the molecular network of the host's immune system. A computational systems biology method was utilized to create interactome models of the host responses to Brucella melitensis (BMEL), Mycobacterium avium paratuberculosis (MAP), Salmonella enterica Typhimurium (STM), and a Salmonella mutant (isogenic ΔsipA, sopABDE2), and these models were linked to the basis for rational development of vaccines for brucellosis and salmonellosis as reviewed by Adams and Ficht (Adams et al. 2009; Ficht et al. 2009). A bovine ligated ileal loop biological model was established to capture the host gene expression response at multiple time points post infection.
New methods based on Dynamic Bayesian Network (DBN) machine learning were employed to conduct a comparative pathogenicity analysis of 219 signaling and metabolic pathways and 1620 Gene Ontology (GO) categories that defined the host's biosignatures to each infectious condition. Through this DBN computational approach, the method identified significantly perturbed pathways and GO category groups of genes that define the pathogenicity signatures of the infectious agent. Our preliminary results provide a deeper understanding of the overall complexity of the host innate immune response, as well as the identification of host gene perturbations that define a unique host temporal biosignature response to each pathogen. The application of advanced computational methods for developing interactome models based on DBNs has proven to be instrumental in elucidating novel host responses and in improving functional biological insight into the host defensive mechanisms. Evaluating the unique differences in pathway and GO perturbations across pathogen conditions allowed the identification of plausible host-pathogen interaction mechanisms. Accordingly, a systems biology approach to study molecular pathway gene expression profiles of host cellular responses to microbial pathogens holds great promise as a methodology to identify, model and predict the overall dynamics of the host-pathogen interactome. Thus, we propose that such an approach has immediate application to the rational design of brucellosis and salmonellosis vaccines.
doi:10.1016/j.vaccine.2011.05.080
PMCID: PMC3170448  PMID: 21651944
