Little is known about the mechanisms of persistent airflow obstruction that result from chronic occupational endotoxin exposure. We sought to analyze the inflammatory response underlying persistent airflow obstruction as a result of chronic occupational endotoxin exposure. We developed a murine model of daily inhaled endotoxin for periods of 5 days to 8 weeks. We analyzed physiologic lung dysfunction, lung histology, bronchoalveolar lavage fluid and total lung homogenate inflammatory cell and cytokine profiles, and pulmonary gene expression profiles. We observed an increase in airway hyperresponsiveness as a result of chronic endotoxin exposure. After 8 weeks, the mice exhibited an increase in bronchoalveolar lavage and lung neutrophils that correlated with an increase in proinflammatory cytokines. Detailed analyses of inflammatory cell subsets revealed an expansion of dendritic cells (DCs), and in particular, proinflammatory DCs, with a reduced percentage of macrophages. Gene expression profiling revealed the up-regulation of a panel of genes that was consistent with DC recruitment, and lung histology revealed an accumulation of DCs in inflammatory aggregates around the airways in 8-week–exposed animals. Repeated, low-dose LPS inhalation, which mirrors occupational exposure, resulted in airway hyperresponsiveness, associated with a failure to resolve the proinflammatory response, an inverted macrophage to DC ratio, and a significant rise in the inflammatory DC population. These findings point to a novel underlying mechanism of airflow obstruction as a result of occupational LPS exposure, and suggest molecular and cellular targets for therapeutic development.
airway resistance; inhalation; neutrophils; macrophages; dendritic cells; endotoxin
The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others.
This helps to identify the proper sequencing platform for whole genome studies with different application scopes.
Endotoxin is a near-ubiquitous environmental exposure that has been associated with both asthma and chronic obstructive pulmonary disease (COPD). These obstructive lung diseases have a complex pathophysiology, making them difficult to study comprehensively in the context of endotoxin. Genome-wide gene expression studies have been used to identify a molecular snapshot of the response to environmental exposures. Identification of differentially expressed genes shared across all published murine models of chronic inhaled endotoxin will provide insight into the biology underlying endotoxin-associated lung disease.
We identified three published murine models with gene expression profiling after repeated low-dose inhaled endotoxin. All array data from these experiments were re-analyzed, annotated consistently, and tested for shared genes found to be differentially expressed. Additional functional comparison was conducted by testing for significant enrichment of differentially expressed genes in known pathways. The importance of this gene signature in smoking-related lung disease was assessed using hierarchical clustering in an independent experiment where mice were exposed to endotoxin, smoke, and endotoxin plus smoke.
A 101-gene signature was detected in three murine models, more than expected by chance. The three model systems exhibit additional similarity beyond shared genes when compared at the pathway level, with increasing enrichment of inflammatory pathways associated with longer duration of endotoxin exposure. Genes and pathways important in both asthma and COPD were shared across all endotoxin models. Mice exposed to endotoxin, smoke, and smoke plus endotoxin were accurately classified with the endotoxin gene signature.
Despite the differences in laboratory, duration of exposure, and strain of mouse used in three experimental models of chronic inhaled endotoxin, surprising similarities in gene expression were observed. The endotoxin component of tobacco smoke may play an important role in disease development.
Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and a more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer: high grade, serous tumors, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis, in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse.
The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and identifies potential gene targets that, following further validation, may be valuable in the clinical management or treatment of ovarian cancer.
Background & Aims
The precise mechanisms by which IFN exerts its antiviral effect against HCV have not yet been elucidated. We sought to identify host genes that mediate the antiviral effect of IFN-α by conducting a whole-genome siRNA library screen.
High throughput screening was performed using an HCV genotype 1b replicon, pRep-Feo. Those pools with replicate robust Z scores ≥ 2.0 entered secondary validation in full-length OR6 replicon cells. Huh7.5.1 cells infected with JFH1 were then used to validate the rescue efficacy of selected genes for HCV replication under IFN-α treatment.
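The hit-selection rule described above (replicate robust Z scores ≥ 2.0) can be sketched as follows. This is a minimal illustration, not the screen's actual pipeline: it assumes the common median/MAD definition of the robust Z score, and the function names, the dictionary input format, and the assumption that a higher signal means rescued replication are all hypothetical.

```python
from statistics import median

def robust_z_scores(values):
    """Robust Z score: (x - median) / (1.4826 * MAD).

    The 1.4826 factor makes the MAD comparable to a standard deviation
    for normally distributed data (an assumed, commonly used convention).
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z score is undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

def select_hits(rep1, rep2, threshold=2.0):
    """Return pool IDs whose robust Z score meets the threshold in BOTH replicates.

    rep1 and rep2 are hypothetical dicts mapping siRNA pool ID -> reporter signal.
    """
    pools = sorted(rep1)
    z1 = dict(zip(pools, robust_z_scores([rep1[p] for p in pools])))
    z2 = dict(zip(pools, robust_z_scores([rep2[p] for p in pools])))
    return [p for p in pools if z1[p] >= threshold and z2[p] >= threshold]

# Toy data: pool "E" stands out in both replicates.
rep1 = {"A": 1.0, "B": 1.1, "C": 0.9, "D": 1.0, "E": 5.0}
rep2 = {"A": 1.2, "B": 0.8, "C": 1.0, "D": 1.1, "E": 4.0}
print(select_hits(rep1, rep2))
```

Using the median and MAD rather than the mean and standard deviation keeps a few strong hits from inflating the scale and masking themselves, which is the usual motivation for robust scoring in plate-based screens.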
We identified and confirmed 93 human genes involved in the IFN-α anti-HCV effect using a whole-genome siRNA library. Gene ontology analysis revealed that mRNA processing (23 genes, P=2.756e-22), translation initiation (9 genes, P=2.42e-6), and IFN signaling (5 genes, P=1.00e-3) were the most enriched functional groups. Nine genes were components of U4/U6.U5 tri-snRNP. We confirmed that silencing squamous cell carcinoma antigen recognized by T cells (SART1), a specific factor of tri-snRNP, abrogates IFN-α's suppressive effects against HCV in both replicon cells and JFH1 infectious cells. We further found that SART1 was not IFN-α inducible, and that its anti-HCV effect in the JFH1 infectious model was mediated through regulation of interferon-stimulated genes (ISGs) with or without IFN-α.
We identified 93 genes that mediate the anti-HCV effect of IFN-α through genome-wide siRNA screening; 23 and 9 genes were involved in mRNA processing and translation initiation, respectively. These findings reveal an unexpected role for mRNA processing in generation of the antiviral state, and suggest a new avenue for therapeutic development in HCV.
Hepatitis C Virus, HCV; Interferon-α, IFN-α; Small interfering RNA, siRNA; Squamous cell carcinoma Antigen Recognized by T cells, SART1; U4/U6.U5 tri-small nuclear ribonucleoproteins, U4/U6.U5 tri-snRNP
► We describe monolayers consisting of molecules with polar repeat units. ► The global energy gap in such layers can vanish. ► In such films the dipole moment per molecule saturates beyond a certain length. ► These effects result from electrostatic dimensionality effects. ► This represents an “organic electronics” analogue to inorganic “polar surfaces”.
In conjugated organic molecules, excitation gaps typically decrease reciprocally with an increasing number of repeat units, n. This usually holds for individual molecules as well as for the corresponding bulk materials. Here, we show using density-functional theory calculations that a qualitatively different evolution is found for layers built from molecules consisting of polar repeat units. Whereas a 1/n-dependence is still observed in the case of isolated polar molecules, the global gap decreases essentially linearly with n in the corresponding 2D-periodic systems and vanishes beyond a certain molecular length, with the frontier states being localized at opposite ends of the layer. The latter is accompanied by a saturation of the dipole moment per molecule, an effect not observed in the isolated polar molecules. Interestingly, in both cases the limit of the gap for long (but finite) molecules differs qualitatively from that of infinite length obtained in 1D-periodic and 3D-periodic calculations, the latter serving as models for polymers and the bulk. We rationalize these dimensionality effects as a consequence of the potential gradient within the finite-length layers. They arise from the collective action of intra-molecular dipoles in the 2D periodic layers and can be traced back to surface effects.
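The two scaling regimes contrasted above can be written schematically as follows. These forms are only an illustrative sketch consistent with the stated trends, not equations taken from the calculations: $E_\infty$, $A$, $E_0$ and $\Delta$ are assumed fitting parameters, with $\Delta$ standing for the energy shift per repeat unit caused by the potential gradient across the layer, and $n_c$ for the length at which the global gap closes.

```latex
% Isolated molecule: reciprocal (1/n) decrease toward a finite limit
E_{\mathrm{gap}}^{\mathrm{iso}}(n) \approx E_{\infty} + \frac{A}{n}

% 2D-periodic polar layer: essentially linear decrease, vanishing at n_c
E_{\mathrm{gap}}^{\mathrm{2D}}(n) \approx \max\!\left(0,\; E_{0} - n\,\Delta\right),
\qquad n_{c} \approx \frac{E_{0}}{\Delta}
```

In this picture, the saturation of the dipole moment per molecule for $n > n_c$ accompanies the closing of the gap, since the frontier states at opposite ends of the layer can no longer be shifted further against each other.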
Energy gap; Polar surface; Dimensionality effects; Band-structure calculations; Polar molecules; Thin-film electrostatics
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
Background: Animal studies suggest that early-life lead exposure influences gene expression and production of proteins associated with Alzheimer’s disease (AD).
Objectives: We attempted to assess the relationship between early-life lead exposure and potential biomarkers for AD among young men and women. We also attempted to assess whether early-life lead exposure was associated with changes in expression of AD-related genes.
Methods: We used sandwich enzyme-linked immunosorbent assays (ELISA) to measure plasma concentrations of amyloid β proteins Aβ40 and Aβ42 among 55 adults who had participated as newborns and young children in a prospective cohort study of the effects of lead exposure on development. We used RNA microarray techniques to analyze gene expression.
Results: Mean plasma Aβ42 concentrations were lower among 13 participants with high umbilical cord blood lead concentrations (≥ 10 μg/dL) than in 42 participants with lower cord blood lead concentrations (p = 0.08). Among 10 participants with high prenatal lead exposure, we found evidence of an inverse relationship between umbilical cord lead concentration and expression of ADAM metallopeptidase domain 9 (ADAM9), reticulon 4 (RTN4), and low-density lipoprotein receptor-related protein associated protein 1 (LRPAP1) genes, whose products are believed to affect Aβ production and deposition. Gene network analysis suggested enrichment in gene sets involved in nerve growth and general cell development.
Conclusions: Data from our exploratory study suggest that prenatal lead exposure may influence Aβ-related biological pathways that have been implicated in AD onset. Gene network analysis identified further candidates to study the mechanisms of developmental lead neurotoxicity.
Alzheimer’s disease; children; fetal basis of adult disease; human; lead
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10⁻⁶ revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10⁻⁷) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE)—an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.
A simple biochemical method to isolate mRNAs pulled down with a transfected, biotinylated microRNA was used to identify direct target genes of miR-34a, a tumor suppressor gene. The method reidentified most of the known miR-34a regulated genes expressed in K562 and HCT116 cancer cell lines. Transcripts for 982 genes were enriched in the pull-down with miR-34a in both cell lines. Despite this large number, validation experiments suggested that ∼90% of the genes identified in both cell lines can be directly regulated by miR-34a. Thus miR-34a is capable of regulating hundreds of genes. The transcripts pulled down with miR-34a were highly enriched for their roles in growth factor signaling and cell cycle progression. These genes form a dense network of interacting gene products that regulate multiple signal transduction pathways that orchestrate the proliferative response to external growth stimuli. Multiple candidate miR-34a–regulated genes participate in RAS-RAF-MAPK signaling. Ectopic miR-34a expression reduced basal ERK and AKT phosphorylation and enhanced sensitivity to serum growth factor withdrawal, while cells genetically deficient in miR-34a were less sensitive. Fourteen new direct targets of miR-34a were experimentally validated, including genes that participate in growth factor signaling (ARAF and PIK3R2) as well as genes that regulate cell cycle progression at various phases of the cell cycle (cyclins D3 and G2, MCM2 and MCM5, PLK1 and SMAD4). Thus miR-34a tempers the proliferative and pro-survival effect of growth factor stimulation by interfering with growth factor signal transduction and downstream pathways required for cell division.
microRNAs (miRNAs) are small RNAs that regulate gene expression by binding to mRNAs bearing a partially complementary sequence. miRNAs decrease the stability or translation of mRNA targets, leading to reduced protein expression. Understanding the biological function of a miRNA requires identifying its targets. Here we developed a sensitive and specific biochemical method to identify candidate microRNA targets that are enriched by pull-down with a tagged, transfected microRNA mimic. The method was applied to miR-34a, a miRNA that inhibits cell proliferation. We found that miR-34a can potentially regulate hundreds of genes. Computational analysis of these genes suggested a novel function for miR-34a—suppression of the pro-proliferative response to diverse growth factors. This function complements the previously known role of miR-34a in blocking cell cycle progression. Thus, by reducing the expression of an extensive network of genes, miR-34a dampens growth factor signaling as well as its downstream consequences, promotion of cell survival and proliferation.
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
Because they play important roles in gene regulation, microRNAs (miRNAs) are believed to be indispensable participants in the pathogenesis of myocardial infarction (MI), which causes significant morbidity and mortality. Working on the hypothesis that modulating only a few key members of the miRNA superfamily could benefit the ischemic heart, we propose a microarray-based network biology approach to identify them, using the recognized clinical effect of propranolol as a prompt.
A long-term rat model of MI was established in this study. Microarray technology was applied to determine the global changes in miRNA expression induced by propranolol. Multiple network analyses were then applied sequentially to evaluate the regulatory capacity, efficiency and emphasis of the miRNAs whose dysregulated expression in MI was significantly reversed by propranolol.
Microarray data analysis indicated that long-term propranolol administration reversed the expression of 18 of the 31 miRNAs dysregulated in MI, implying that intentional modulation of miRNA expression might have favorable effects on the ischemic heart. Our network analysis identified miR-1, miR-29b and miR-98 as the prime players in MI among these miRNAs. Further analysis revealed that miR-1 focused on the regulation of myocyte growth, whereas miR-29b and miR-98 emphasized fibrosis and inflammation, respectively.
Our study illustrates how a combination of microarray technology and
functional protein network analysis can be used to identify disease-related
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to adopt community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org
Computational modeling is used to describe the mechanisms governing energy level alignment between an organic semiconductor (OSC) and a metal covered by various self-assembled monolayers (SAMs). In particular, we address the question to what extent and under what circumstances SAM-induced work-function modifications lead to an actual change of the barriers for electron and hole injection from the metal into the OSC layer. Depending on the nature of the SAM, we observe clear transitions between Fermi level pinning and vacuum-level alignment regimes. Surprisingly, although in most cases the pinning occurs only when the metal is present, it is not related to charge transfer between the electrode and the organic layer. Instead, charge rearrangements at the interface between the SAM and the OSC are observed, accompanied by a polarization of the SAM.
computational modeling; density functional theory; electronic structure/processes/mechanisms; monolayers; organic electronics; molecular electronics; self-assembly; metal/organic interfaces
With the arrival of the postgenomic era, there is increasing interest in the discovery of biomarkers for the accurate diagnosis, prognosis, and early detection of cancer. Blood-borne cancer markers are favored by clinicians, because blood samples can be obtained and analyzed with relative ease. We have used a combined mining strategy based on an integrated cancer microarray platform, Oncomine, and the biomarker module of the Ingenuity Pathways Analysis (IPA) program to identify potential blood-based markers for six common human cancer types.
In the Oncomine platform, the genes overexpressed in cancer tissues relative to their corresponding normal tissues were filtered by Gene Ontology keywords, with the extracellular environment stipulated and a corrected Q value (false discovery rate) cut-off implemented. The identified genes were imported to the IPA biomarker module to separate out those genes encoding putative secreted or cell-surface proteins as blood-borne (blood/serum/plasma) cancer markers. The filtered potential indicators were ranked and prioritized according to normalized absolute Student t values. The retrieval of numerous marker genes that are already clinically useful or under active investigation confirmed the effectiveness of our mining strategy. To identify the biomarkers that are unique for each cancer type, the upregulated marker genes that are in common between each two tumor types across the six human tumors were also analyzed by the IPA biomarker comparison function.
The upregulated marker genes shared among the six cancer types may serve as a molecular tool to complement histopathologic examination, and the combination of the commonly upregulated and unique biomarkers may serve as differentiating markers for a specific cancer. This approach will be increasingly useful to discover diagnostic signatures as the mass of microarray data continues to grow in the ‘omics’ era.
About 5% of Western populations are afflicted by autoimmune diseases, many of which are affected by sex hormones. Autoimmune diseases are complex and involve many genes. Identifying these disease-associated genes contributes to the development of more effective therapies. Association studies frequently implicate genomic regions that contain disease-associated genes but fall short of pinpointing the genes themselves. Identifying disease-associated genes has always been challenging, and to date no universal and effective method has been developed.
We have developed a method to prioritize disease-associated genes for diseases affected strongly by sex hormones. Our method uses various types of information available for the genes, but no information that directly links genes with the disease. It generates a score for each of the considered genes and ranks genes based on that score. We illustrate our method on early-onset myasthenia gravis (MG) using genes potentially controlled by estrogen and localized in a genomic segment (which contains the MHC and surrounding region) strongly associated with MG. Based on the considered genomic segment, 283 genes are ranked for their relevance to MG and responsiveness to estrogen. The top three ranked genes, HLA-G, TAP2 and HLA-DRB1, are implicated in autoimmune diseases, while TAP2 is associated with SNPs characteristic of MG. Within the top 35 prioritized genes, our method identifies 90% of the 10 already known MG-associated genes from the considered region without using any information that directly links genes to MG. Among the top eight genes we identified HLA-G and TUBB as new candidates. We show that our ab-initio approach outperforms other methods for prioritizing disease-associated genes.
We have developed a method to prioritize disease-associated genes under the potential control of sex hormones. We demonstrate the success of this method by prioritizing the genes localized in the MHC and surrounding region and evaluating the role of these genes as potential candidates for estrogen control as well as MG. We show that our method outperforms other methods. The method has the potential to be adapted to prioritize genes relevant to other diseases.
Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives.
Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data.
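To make the idea of seriation concrete, the sketch below orders items so that highly correlated ones end up adjacent, by growing a single chain from its two ends. It is a deliberately simplified greedy stand-in and should not be read as the 'progressive construction of contigs' heuristic itself, whose details are not given here; the function name and the list-of-lists input format are assumptions.

```python
def greedy_seriation(corr):
    """Order items 0..n-1 so that strongly correlated items are adjacent.

    corr is a symmetric n x n list of lists of pairwise correlations.
    Simplified illustration: one chain, extended greedily at either end.
    """
    n = len(corr)
    if n < 2:
        return list(range(n))
    # Seed the chain with the most strongly correlated pair.
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: corr[pair[0]][pair[1]],
    )
    chain = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        head, tail = chain[0], chain[-1]
        candidates = sorted(remaining)  # sorted for deterministic tie-breaking
        best_head = max(candidates, key=lambda k: corr[head][k])
        best_tail = max(candidates, key=lambda k: corr[tail][k])
        # Attach whichever candidate correlates best with its chain end.
        if corr[head][best_head] > corr[tail][best_tail]:
            chain.insert(0, best_head)
            remaining.remove(best_head)
        else:
            chain.append(best_tail)
            remaining.remove(best_tail)
    return chain

# Toy correlation matrix whose natural ordering is 2-0-3-1 (or its reverse).
corr = [
    [1.0, 0.2, 0.9, 0.8],
    [0.2, 1.0, 0.1, 0.7],
    [0.9, 0.1, 1.0, 0.3],
    [0.8, 0.7, 0.3, 1.0],
]
print(greedy_seriation(corr))
```

Unlike clustering, the output is a linear ordering rather than a partition, so groups of co-expressed genes appear as contiguous runs when the reordered correlation matrix is visualized.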
We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.
High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated.
We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin.
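The majority-rule step can be illustrated with a short sketch. The exact criterion used in the study is not spelled out above, so this assumes one plausible reading: a coexpression group is "functionally characterized" by a term if more than half of its annotated genes carry that term, and the term is then proposed for the group's remaining genes. All names and the toy annotations are hypothetical.

```python
from collections import Counter

def majority_annotation(group, annotations):
    """Return the term carried by more than half of the annotated genes, else None.

    group is a list of gene IDs; annotations maps gene ID -> set of terms.
    """
    annotated = [g for g in group if annotations.get(g)]
    if not annotated:
        return None
    # Count each term at most once per gene.
    counts = Counter(term for g in annotated for term in set(annotations[g]))
    term, n = counts.most_common(1)[0]
    return term if n > len(annotated) / 2 else None

def predict_annotations(group, annotations):
    """Propose the group's majority term for every gene that lacks it."""
    term = majority_annotation(group, annotations)
    if term is None:
        return {}
    return {g: term for g in group if term not in annotations.get(g, set())}

# Toy coexpression group: two of three annotated genes share "ribosome".
group = ["G1", "G2", "G3", "G4"]
annotations = {
    "G1": {"ribosome"},
    "G2": {"ribosome", "translation"},
    "G3": {"transport"},
}
print(predict_annotations(group, annotations))
```

Counting only annotated genes in the denominator is a design choice made here for illustration; a stricter variant would require a majority of all genes in the group.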
We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations we provide candidate genes for several genetic diseases of unknown molecular basis.
A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.