Related Articles

1.  BioNetwork Bench: Database and Software for Storage, Query, and Analysis of Gene and Protein Networks 
Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from high-throughput analyses. Although many tools and databases are currently available for accessing such data, they go unused by bench scientists because they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for use by scientists with limited computational expertise. We describe BioNetwork Bench, an open source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. It enables biologists to analyze public as well as private gene expression data; interactively query gene expression datasets; integrate data from multiple networks; store and selectively share the data and results. Finally, we describe an application of BioNetwork Bench to the assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors. The tool is available from
The emergence of high-throughput technologies has allowed many biological investigators to collect a great deal of information about the behavior of genes and gene products over time or during a particular disease state. Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from such high-throughput analyses. There are a growing number of public databases, as well as tools for visualization and analysis of networks. However, such databases and tools have yet to be widely utilized by bench scientists, as they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for use by biological scientists with limited computational expertise.
We describe BioNetwork Bench, an open source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. BioNetwork Bench currently supports a broad class of gene and protein network models (eg, weighted and un-weighted, undirected graphs, multi-graphs). It enables biologists to analyze public as well as private gene expression, macromolecular interaction and annotation data; interactively query gene expression datasets; integrate data from multiple networks; query multiple networks for interactions of interest; store and selectively share the data as well as results of analyses. BioNetwork Bench is implemented as a plug-in for, and hence is fully interoperable with, Cytoscape, a popular open-source software suite for visualizing macromolecular interaction networks. Finally, we describe an application of BioNetwork Bench to the problem of assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors.
BioNetwork Bench provides a suite of open source software for construction, querying, and selective sharing of gene and protein networks. Although initially aimed at a community of biologists interested in retinal development, the tool can be adapted easily to work with other biological systems simply by populating the associated database with the relevant datasets.
PMCID: PMC3498971
network analysis; software; network construction; network integration
2.  Gene expression changes during retinal development and rod specification 
Molecular Vision  2015;21:61-87.
Retinitis pigmentosa (RP) typically results from individual mutations in any one of >70 genes that cause rod photoreceptor cells to degenerate prematurely, eventually resulting in blindness. Gene therapies targeting individual RP genes have shown efficacy at clinical trial; however, these therapies require the surviving photoreceptor cells to be viable and functional, and may be economically feasible for only the more commonly mutated genes. An alternative potential treatment strategy, particularly for late stage disease, may involve stem cell transplants into the photoreceptor layer of the retina. Rod progenitors from postnatal mouse retinas can be transplanted and can form photoreceptors in recipient adult retinas; optimal numbers of transplantable cells are obtained from postnatal day 3–5 (P3–5) retinas. These cells can also be expanded in culture; however, this results in the loss of photoreceptor potential. Gene expression differences between postnatal retinas, cultured retinal progenitor cells (RPCs), and rod photoreceptor precursors were investigated to identify gene expression patterns involved in the specification of rod photoreceptors.
Microarrays were used to investigate differences in gene expression between cultured RPCs that have lost photoreceptor potential, P1 retinas, and fresh P5 retinas that contain significant numbers of transplantable photoreceptors. Additionally, fluorescence-activated cell sorting (FACS) sorted Rho-eGFP-expressing rod photoreceptor precursors were compared with Rho-eGFP-negative cells from the same P5 retinas. Differential expression was confirmed with quantitative polymerase chain reaction (q-PCR).
Analysis of the microarray data sets, including the use of t-distributed stochastic neighbor embedding (t-SNE) to identify expression pattern neighbors of key photoreceptor specific genes, resulted in the identification of 636 genes differentially regulated during rod specification. Forty-four of these genes when mutated have previously been found to cause retinal disease. Although gene function in other tissues may be known, the retinal function of approximately 61% of the gene list is as yet undetermined. Many of these genes’ promoters contain binding sites for the key photoreceptor transcription factors Crx and Nr2e3; moreover, the genomic clustering of differentially regulated genes appears to be non-random.
This study aids in understanding gene expression differences between rod photoreceptor progenitors versus cultured RPCs that have lost photoreceptor potential. The results provide insights into rod photoreceptor development and should expedite the development of cell-based treatments for RP. Furthermore, the data set includes a large number of retinopathy genes; less-well-characterized genes within this data set are a resource for those seeking to identify novel retinopathy genes in patients with RP (GEO accession: GSE59201).
PMCID: PMC4300221
3.  Gene expression changes during retinal development and rod specification 
Molecular Vision  2015;21:61-87.
PMCID: PMC4301594
4.  Using Evolutionary Conserved Modules in Gene Networks as a Strategy to Leverage High Throughput Gene Expression Queries 
PLoS ONE  2010;5(9):e12525.
Large-scale gene expression studies have not yielded the expected insight into genetic networks that control complex processes. These anticipated discoveries have been limited not by technology, but by a lack of effective strategies to investigate the data in a manageable and meaningful way. Previous work suggests that using a pre-determined seed-network of gene relationships to query large-scale expression datasets is an effective way to generate candidate genes for further study and network expansion or enrichment. Based on the evolutionary conservation of gene relationships, we test the hypothesis that a seed network derived from studies of retinal cell determination in the fly, Drosophila melanogaster, will be an effective way to identify novel candidate genes for their role in mouse retinal development.
Methodology/Principal Findings
Our results demonstrate that a number of gene relationships regulating retinal cell differentiation in the fly are identifiable as pairwise correlations between genes from developing mouse retina. In addition, we demonstrate that our extracted seed-network of correlated mouse genes is an effective tool for querying datasets and provides a context to generate hypotheses. Our query identified 46 genes correlated with our extracted seed-network members. Approximately 54% of these candidates had been previously linked to the developing brain and 33% had been previously linked to the developing retina. Five of six candidate genes investigated further were validated by experiments examining spatial and temporal protein expression in the developing retina.
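The seed-network query described above can be illustrated with a minimal sketch, assuming that "correlated" means a Pearson correlation between expression profiles above some cutoff. The function names, the 0.9 threshold, and the toy profiles are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a perfectly flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def query_seed_network(expression, seed_genes, threshold=0.9):
    """Return non-seed genes whose profile correlates with any seed gene.

    expression: dict mapping gene name -> list of expression values
    threshold:  hypothetical cutoff on |r|, chosen only for illustration
    """
    candidates = {}
    for gene, profile in expression.items():
        if gene in seed_genes:
            continue
        for seed in seed_genes:
            r = pearson(profile, expression[seed])
            if abs(r) >= threshold:
                candidates.setdefault(gene, []).append((seed, round(r, 3)))
    return candidates


# Toy profiles across five developmental time points (invented numbers).
toy_expression = {
    "seedA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "seedB": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneX": [2.1, 4.0, 6.2, 8.1, 9.9],  # rises together with seedA
    "geneY": [3.0, 3.1, 2.9, 3.0, 3.1],  # roughly flat
}
hits = query_seed_network(toy_expression, {"seedA", "seedB"})
```

In this toy run only geneX is reported, linked to seedA. A real analysis would also need multiple-testing control and a justified threshold.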
We present an effective strategy for pursuing a systems biology approach that utilizes an evolutionary comparative framework between two model organisms, fly and mouse. Future implementation of this strategy will be useful to determine the extent of network conservation, not just gene conservation, between species and will facilitate the use of prior biological knowledge to develop rational systems-based hypotheses.
PMCID: PMC2932711  PMID: 20824082
5.  Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data 
BMC Bioinformatics  2006;7:116.
Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data.
Based on the analysis of publicly-available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was first performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than relatively more complex algorithms, e.g. neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re-sampling techniques were studied. A random over-sampling method supported the implementation of the most powerful prediction models. The KStar model was also able to achieve higher predictive sensitivities and specificities using random over-sampling techniques.
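Random over-sampling, as mentioned above, can be sketched in a few lines. This is a generic illustration of the technique rather than the authors' implementation: the function name, the fixed seed, and the toy feature vectors and labels are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())
    out_samples, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy, invented feature vectors: four majority-class tags, one minority tag.
toy_tags = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.9, 0.1]]
toy_labels = ["other", "other", "other", "other", "photoreceptor"]
balanced_x, balanced_y = random_oversample(toy_tags, toy_labels)
class_counts = Counter(balanced_y)
```

After resampling, both classes contain four samples. In practice the duplication is usually applied to the training split only, so that copies of one sample do not appear on both sides of an evaluation.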
The approaches assessed in this paper represent an efficient and relatively inexpensive in silico methodology for supporting large-scale analysis of photoreceptor gene expression by SAGE. They may be applied as complementary methodologies to support functional predictions before implementing more comprehensive, experimental prediction and validation methods. They may also be combined with other large-scale, data-driven methods to facilitate the inference of transcriptional regulatory networks in the developing retina. Furthermore, the methodology assessed may be applied to other data domains.
PMCID: PMC1421439  PMID: 16524483
6.  Structural and functional protein network analyses predict novel signaling functions for rhodopsin 
Proteomic analyses, literature mining, and structural data were combined to generate an extensive signaling network linked to the visual G protein-coupled receptor rhodopsin. Network analysis suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking.
Using a shotgun proteomic approach, we identified the protein inventory of the light-sensing outer segment of the mammalian photoreceptor. These data, combined with literature mining, structural modeling, and computational analysis, offer a comprehensive view of signal transduction downstream of the visual G protein-coupled receptor rhodopsin. The network suggests novel signaling branches downstream of rhodopsin to cytoskeleton dynamics and vesicular trafficking. The network serves as a basis for elucidating physiological principles of photoreceptor function and suggests potential disease-associated proteins.
Photoreceptor cells are neurons capable of converting light into electrical signals. The rod outer segment (ROS) region of the photoreceptor cells is a cellular structure made of a stack of around 800 closed membrane disks loaded with rhodopsin (Liang et al, 2003; Nickell et al, 2007). In disc membranes, rhodopsin arranges itself into paracrystalline dimer arrays, enabling optimal association with the heterotrimeric G protein transducin as well as additional regulatory components (Ciarkowski et al, 2005). Disruption of these highly regulated structures and processes by germline mutations is the cause of severe blinding diseases such as retinitis pigmentosa, macular degeneration, or congenital stationary night blindness (Berger et al, 2010).
Traditionally, signal transduction networks have been studied by combining biochemical and genetic experiments addressing the relations among a small number of components. More recently, high-throughput experiments using techniques such as two-hybrid screening or co-immunoprecipitation coupled to mass spectrometry have added a new level of complexity (Ito et al, 2001; Gavin et al, 2002, 2006; Ho et al, 2002; Rual et al, 2005; Stelzl et al, 2005). However, these studies do not take into account space, time, or the fact that many interactions detected for a particular protein are mutually incompatible. Structural information can help discriminate between direct and indirect interactions and, more importantly, can determine whether two or more predicted partners of any given protein or complex can bind a target simultaneously or instead compete for the same interaction surface (Kim et al, 2006).
In this work, we build a functional and dynamic interaction network centered on rhodopsin on a systems level, using six steps. In step 1, we experimentally identified the proteomic inventory of the porcine ROS, and we compared our data set with a recent proteomic study from bovine ROS (Kwok et al, 2008). The union of the two data sets was defined as the ‘initial experimental ROS proteome’. After removing contaminants and applying filtering methods, a ‘core ROS proteome’, consisting of 355 proteins, was defined.
In step 2, proteins of the core ROS proteome were assigned to six functional modules: (1) vision, signaling, transporters, and channels; (2) outer segment structure and morphogenesis; (3) housekeeping; (4) cytoskeleton and polarity; (5) vesicles formation and trafficking, and (6) metabolism.
In step 3, a protein-protein interaction network was constructed based on literature mining. Since for most of the interactions the experimental evidence was co-immunoprecipitation or pull-down experiments, and in addition many of the edges in the network are supported by a single piece of experimental evidence, often derived from high-throughput approaches, we refer to this network as the ‘fuzzy ROS interactome’. Structural information was used to predict binary interactions, based on the finding that similar domain pairs are likely to interact in a similar way (‘nature repeats itself’) (Aloy and Russell, 2002). To increase the confidence in the resulting network, edges supported by a single piece of evidence not coming from yeast two-hybrid experiments were removed, with the exception of interactions where the evidence was the existence of a three-dimensional structure of the complex itself, or of a highly homologous complex. This curated static network (‘high-confidence ROS interactome’) comprises 660 edges linking the majority of the nodes. By considering only edges supported by at least one piece of evidence of direct binary interaction, we end up with a ‘high-confidence binary ROS interactome’. We next extended the published core pathway (Dell'Orco et al, 2009) using evidence from our high-confidence network. We find several new direct binary links to different cellular functional processes (Figure 4): the active rhodopsin interacts with Rac1 and the GTP form of Rho. There is also a connection between active rhodopsin and Arf4, as well as PDEδ with Rab13 and the GTP-bound form of Arl3 that links the vision cycle to vesicle trafficking and structure. We see a connection between PDEδ with prenyl-modified proteins, such as several small GTPases, as well as with rhodopsin kinase. Further, our network reveals several direct binary connections between Ca2+-regulated proteins and cytoskeleton proteins; these are CaMK2A with actinin, calmodulin with GAP43 and S100B, and PKC with 14-3-3 family members.
In step 4, part of the network was experimentally validated using three different approaches to identify physical protein associations that would occur under physiological conditions: (i) co-segregation/co-sedimentation experiments, (ii) immunoprecipitations combined with mass spectrometry and/or subsequent immunoblotting, and (iii) utilizing the glycosylated N-terminus of rhodopsin to isolate its associated protein partners by Concanavalin A affinity purification. In total, 60 co-purification and co-elution experiments supported interactions that were already in our literature network, and new evidence from 175 co-IP experiments in this work was added. Next, we aimed to provide additional independent experimental confirmation for two of the novel networks and functional links proposed based on the network analysis: (i) the proposed complex between Rac1, RhoA, CRMP-2, tubulin, and ROCK II in ROS was investigated by culturing retinal explants in the presence of a ROCK II-specific inhibitor (Figure 6). While morphology of the retinas treated with ROCK II inhibitor appeared normal, immunohistochemistry analyses revealed several alterations on the protein level. (ii) We supported the hypothesis that PDEδ could function as a GDI for Rac1 in ROS, by demonstrating that PDEδ and Rac1 co-localize in ROS and that PDEδ could dissociate Rac1 from ROS membranes in vitro.
In step 5, we use structural information to distinguish between mutually compatible (‘AND’) or excluded (‘XOR’) interactions. This enables breaking a network of nodes and edges into functional machines or sub-networks/modules. In the vision branch, both ‘AND’ and ‘XOR’ gates synergize. This may allow dynamic tuning of light and dark states. However, all connections from the vision module to other modules are ‘XOR’ connections suggesting that competition, in connection with local protein concentration changes, could be important for transmitting signals from the core vision module.
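The AND/XOR distinction in step 5 can be illustrated with a small sketch, under the simplifying assumption that each partner of a hub protein is annotated with the surface patches it binds: partners that share a patch compete (XOR), while partners on disjoint patches can bind together (AND). The partner and patch names are hypothetical placeholders, and the authors' structural analysis is more detailed than this set comparison.

```python
from itertools import combinations


def classify_partner_pairs(interfaces):
    """Label each pair of partners of one hub protein as 'AND' or 'XOR'.

    interfaces: dict mapping partner name -> set of hub surface patches
                it binds (hypothetical labels, for illustration only).
    Overlapping patches imply competition (XOR); disjoint patches imply
    the two partners could bind simultaneously (AND).
    """
    result = {}
    for a, b in combinations(sorted(interfaces), 2):
        result[(a, b)] = "XOR" if interfaces[a] & interfaces[b] else "AND"
    return result


# Invented example: three partners of a single hub protein.
toy_interfaces = {
    "partner1": {"patchA"},
    "partner2": {"patchA", "patchB"},
    "partner3": {"patchC"},
}
pair_types = classify_partner_pairs(toy_interfaces)
```

Here partner1 and partner2 both use patchA and are labelled XOR, while partner3 is compatible with either of them.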
In the last step, we map and functionally characterize the known mutations that produce blindness.
In summary, this represents the first comprehensive, dynamic, and integrative rhodopsin signaling network, which can be the basis for integrating and mapping newly discovered disease mutants, to guide protein or signaling branch-specific therapies.
Orchestration of signaling, photoreceptor structural integrity, and maintenance needed for mammalian vision remain enigmatic. By integrating three proteomic data sets, literature mining, computational analyses, and structural information, we have generated a multiscale signal transduction network linked to the visual G protein-coupled receptor (GPCR) rhodopsin, the major protein component of rod outer segments. This network was complemented by domain decomposition of protein–protein interactions and then qualified for mutually exclusive or mutually compatible interactions and ternary complex formation using structural data. The resulting information not only offers a comprehensive view of signal transduction induced by this GPCR but also suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking, predicting an important level of regulation through small GTPases. Further, it demonstrates a specific disease susceptibility of the core visual pathway due to the uniqueness of its components present mainly in the eye. As a comprehensive multiscale network, it can serve as a basis to elucidate the physiological principles of photoreceptor function, identify potential disease-associated genes and proteins, and guide the development of therapies that target specific branches of the signaling pathway.
PMCID: PMC3261702  PMID: 22108793
protein interaction network; rhodopsin signaling; structural modeling
7.  A role for prenylated rab acceptor 1 in vertebrate photoreceptor development 
BMC Neuroscience  2012;13:152.
The rd1 mouse retina is a well-studied model of retinal degeneration where rod photoreceptors undergo cell death beginning at postnatal day (P) 10 until P21. This period coincides with photoreceptor terminal differentiation in a normal retina. We have used the rd1 retina as a model to investigate early molecular defects in developing rod photoreceptors prior to the onset of degeneration.
Using a microarray approach, we performed gene profiling comparing rd1 and wild type (wt) retinas at four time points starting at P2, prior to any obvious biochemical or morphological differences, and concluding at P8, prior to the initiation of cell death. Of the 143 identified differentially expressed genes, we focused on Rab acceptor 1 (Rabac1), which codes for the protein Prenylated rab acceptor 1 (PRA1) and plays an important role in vesicular trafficking. Quantitative RT-PCR analysis confirmed reduced expression of PRA1 in rd1 retina at all time points examined. Immunohistochemical observation showed that PRA1-like immunoreactivity (LIR) co-localized with the cis-Golgi marker GM-130 in the photoreceptor as the Golgi translocated from the perikarya to the inner segment during photoreceptor differentiation in wt retinas. Diffuse PRA1-LIR, distinct from the Golgi marker, was seen in the distal inner segment of wt photoreceptors starting at P8. Both plexiform layers contained PRA1 positive punctae independent of GM-130 staining during postnatal development. In the inner retina, PRA1-LIR also colocalized with the Golgi marker in the perinuclear region of most cells. A similar pattern was seen in the rd1 mouse inner retina. However, punctate and significantly reduced PRA1-LIR was present throughout the developing rd1 inner segment, consistent with delayed photoreceptor development and abnormalities in Golgi sorting and vesicular trafficking.
We have identified genes that are differentially regulated in the rd1 retina at early time points, which may give insights into developmental defects that precede photoreceptor cell death. This is the first report of PRA1 expression in the retina. Our data support the hypothesis that PRA1 plays an important role in vesicular trafficking between the Golgi and cilia in differentiating and mature rod photoreceptors.
PMCID: PMC3576285  PMID: 23241222
Retina; Photoreceptor; Mouse; Retinal degeneration; Photoreceptor development; Rabac1; Prenylated Rab Acceptor 1; Rab6; Vesicular trafficking
8.  Microarray Analysis of XOPS-mCFP Zebrafish Retina Identifies Genes Associated with Rod Photoreceptor Degeneration and Regeneration 
This report presents an analysis of the retinal gene expression profile in a transgenic strain of zebrafish that experiences a continuous cycle of rod photoreceptor development and regeneration.
XOPS-mCFP transgenic zebrafish experience a continual cycle of rod photoreceptor development and degeneration throughout life, making them a useful model for investigating the molecular determinants of rod photoreceptor regeneration. The purpose of this study was to compare the gene expression profiles of wild-type and XOPS-mCFP retinas and identify genes that may contribute to the regeneration of the rods.
Adult wild-type and XOPS-mCFP retinal mRNA was subjected to microarray analysis. Pathway analysis was used to identify biologically relevant processes that were significantly represented in the dataset. Expression changes were verified by RT-PCR. Selected genes were further examined during retinal development and in adult retinas by in situ hybridization and immunohistochemistry and in a transgenic fluorescent reporter line.
More than 600 genes displayed significant expression changes in XOPS-mCFP retinas compared with expression in wild-type controls. Many of the downregulated genes were associated with phototransduction, whereas upregulated genes were associated with several biological functions, including cell cycle, DNA replication and repair, and cell development and death. RT-PCR analysis of a subset of these genes confirmed the microarray results. Three transcription factors (sox11b, insm1a, and c-myb), displaying increased expression in XOPS-mCFP retinas, were also expressed throughout retinal development and in the persistently neurogenic ciliary marginal zone.
This study identified numerous gene expression changes in response to rod degeneration in zebrafish and further suggests a role for the transcriptional regulators sox11b, insm1a, and c-myb in both retinal development and rod photoreceptor regeneration.
PMCID: PMC3080176  PMID: 21217106
9.  Patient-specific iPSC-derived photoreceptor precursor cells as a means to investigate retinitis pigmentosa 
eLife  2013;2:e00824.
Next-generation and Sanger sequencing were combined to identify disease-causing USH2A mutations in an adult patient with autosomal recessive RP. Induced pluripotent stem cells (iPSCs), generated from the patient’s keratinocytes, were differentiated into multi-layer eyecup-like structures with features of human retinal precursor cells. The inner layer of the eyecups contained photoreceptor precursor cells that expressed photoreceptor markers and exhibited axonemes and basal bodies characteristic of outer segments. Analysis of the USH2A transcripts of these cells revealed that one of the patient’s mutations causes exonification of intron 40, a translation frameshift and a premature stop codon. Western blotting revealed upregulation of GRP78 and GRP94, suggesting that the patient’s other USH2A variant (Arg4192His) causes disease through protein misfolding and ER stress. Transplantation into 4-day-old immunodeficient Crb1−/− mice resulted in the formation of morphologically and immunohistochemically recognizable photoreceptor cells, suggesting that the mutations in this patient act via post-developmental photoreceptor degeneration.
eLife digest
Retinitis pigmentosa is an inherited disorder in which the gradual degeneration of light-sensitive cells in the outer retina, known as photoreceptors, causes a progressive loss of sight. Retinitis pigmentosa can also occur as part of a wider syndrome: patients with Usher syndrome, for example, suffer from early-onset deafness and then develop retinitis pigmentosa later in life. Usher syndrome is caused by mutations in any of more than ten genes, but the most commonly affected is USH2A, which encodes a protein called usherin. Mutations in USH2A can also cause retinitis pigmentosa on its own.
Clinical trials are underway to determine whether it is possible to treat various forms of inherited retinal degeneration using gene therapy. This involves inserting a functional copy of the gene associated with the disease into an inactivated virus, which is then injected into the eye. The virus carries the target gene to the light-sensitive photoreceptor cells where it can replace the faulty gene. This could be particularly useful for conditions such as Usher syndrome, in which the early-onset deafness makes it possible to diagnose retinitis pigmentosa before substantial numbers of photoreceptor cells have been lost.
For gene therapy to become a widely used strategy for the treatment of retinal degenerative disease, identification and functional interrogation of the disease-causing gene/mutations will be critical. This is especially true for large highly polymorphic genes such as USH2A that often have mutations that are difficult to identify by standard sequencing techniques. Likewise, viruses that can carry large amounts of genetic material, or endogenous genome editing approaches, will need to be developed and validated in an efficient patient-specific model system.
Tucker et al. might have found a way to address these problems. In their study, they used skin cells from a retinitis pigmentosa patient with mutations in USH2A to produce induced pluripotent stem cells. These are cells that can be made to develop into a wide variety of mature cell types, depending on the exact conditions in which they are cultured. Tucker et al. used these stem cells to generate photoreceptor precursor cells, which they transplanted into the retinas of immune-suppressed mice. The cells developed into normal-looking photoreceptor cells that expressed photoreceptor-specific proteins.
These results have several implications. First, they support the idea that stem cell-derived retinal photoreceptor cells, generated from patients with unknown mutations, can be used to identify disease-causing genes and to interrogate disease pathophysiology. This will allow for a more rapid development of gene therapy strategies. Second, they demonstrate that USH2A mutations cause retinitis pigmentosa by affecting photoreceptors later in life rather than by altering their development. This suggests that it should, via early intervention, be possible to treat retinitis pigmentosa in adult patients with this form of the disease. Third, the technique could be used to generate animal models in which to study the effects of specific disease-causing mutations on cellular development and function. Finally, this study suggests that skin cells from adults with retinitis pigmentosa could be used to generate immunologically matched photoreceptor cells that can be transplanted back into the same patients to restore their sight. Many questions remain to be answered before this technique can be moved into clinical trials but, in the meantime, it will provide a new tool for research into this major cause of blindness.
PMCID: PMC3755341  PMID: 23991284
next-generation sequencing; retinal degeneration; induced pluripotent stem cells; retinal transplantation; retinal cell differentiation; retinitis pigmentosa; Human; Mouse
10.  Transcriptional Regulation of Rod Photoreceptor Homeostasis Revealed by In Vivo NRL Targetome Analysis 
PLoS Genetics  2012;8(4):e1002649.
A stringent control of homeostasis is critical for functional maintenance and survival of neurons. In the mammalian retina, the basic motif leucine zipper transcription factor NRL determines rod versus cone photoreceptor cell fate and activates the expression of many rod-specific genes. Here, we report an integrated analysis of NRL-centered gene regulatory network by coupling chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–Seq) data from Illumina and ABI platforms with global expression profiling and in vivo knockdown studies. We identified approximately 300 direct NRL target genes. Of these, 22 NRL targets are associated with human retinal dystrophies, whereas 95 mapped to regions of as yet uncloned retinal disease loci. In silico analysis of NRL ChIP–Seq peak sequences revealed an enrichment of distinct sets of transcription factor binding sites. Specifically, we discovered that genes involved in photoreceptor function include binding sites for both NRL and homeodomain protein CRX. Evaluation of 26 ChIP–Seq regions validated their enhancer functions in reporter assays. In vivo knockdown of 16 NRL target genes resulted in death or abnormal morphology of rod photoreceptors, suggesting their importance in maintaining retinal function. We also identified histone demethylase Kdm5b as a novel secondary node in NRL transcriptional hierarchy. Exon array analysis of flow-sorted photoreceptors in which Kdm5b was knocked down by shRNA indicated its role in regulating rod-expressed genes. Our studies identify candidate genes for retinal dystrophies, define cis-regulatory module(s) for photoreceptor-expressed genes and provide a framework for decoding transcriptional regulatory networks that dictate rod homeostasis.
Author Summary
The rod and cone photoreceptors in the retina are highly specialized neurons that capture photons under dim and bright light, respectively. Loss of rod photoreceptors is an early clinical manifestation in most retinal neurodegenerative diseases that eventually result in cone cell death and blindness. The transcription factor NRL is a key regulator of rod photoreceptor cell fate and gene expression. Here, we report an integrated analysis of the global transcriptional targets of NRL. We have discovered that both NRL and CRX binding sites are present in genes involved in photoreceptor function, implying their close synergistic relationship. In vivo loss-of-function analysis of 16 NRL target genes in the mouse retina resulted in death or abnormal morphology of photoreceptor cells. Furthermore, we identified histone demethylase Kdm5b as a secondary node in the NRL-centered gene regulatory network. Our studies identify NRL target genes as excellent candidates for mutation screening of patients with retinal degenerative diseases, and they provide the foundation for elucidating regulation of rod homeostasis and targets for therapeutic intervention in diseases involving photoreceptor dysfunction.
PMCID: PMC3325202  PMID: 22511886
11.  Dynamic interaction networks in a hierarchically organized tissue 
We have integrated gene expression profiling with database and literature mining, mechanistic modeling, and cell culture experiments to identify intercellular and intracellular networks regulating blood stem cell self-renewal. Blood stem cell fate in vitro is regulated non-autonomously by a coupled positive–negative intercellular feedback circuit, composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). The antagonistic signals converge in a core intracellular network focused around PI3K, Raf, PLC, and Akt. Model simulations enable functional classification of the novel endogenous ligands and signaling molecules.
Intercellular (between cell) communication networks are required to maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the recognized importance of intercellular networks in regulating adult stem and progenitor cell fate, the specific cell populations involved, and the underlying molecular mechanisms are largely undefined. Although a limited number of studies have applied novel bioinformatic approaches to unravel intercellular signaling in other cell systems (Frankenstein et al, 2006), a comprehensive analysis of intercellular communication in a stem cell-derived, hierarchical tissue network has yet to be reported.
As a model system to explore intercellular communication networks in a hierarchically organized tissue, we cultured human umbilical cord blood (UCB)-derived stem and progenitor cells in defined, minimal cytokine-supplemented liquid culture (Madlambayan et al, 2006). To systematically explore the molecular and cellular dynamics underlying primitive progenitor growth and differentiation, gene expression profiles of primitive (lineage negative; Lin−) and mature (lineage positive; Lin+) populations were generated during phases of stem cell expansion versus depletion. Parallel phenotypic and subproteomic experiments validated that mRNA expression correlated with complex measures of proteome activity (protein secretion and cell surface expression). Using a curated list of secreted ligand–receptor interactions and published expression profiles of purified mature blood populations, we implemented a novel algorithm to reconstruct the intercellular signaling networks established between stem cells and multi-lineage progeny in vitro. By correlating differential expression patterns with stem cell growth, we predict cell populations, pathways, and secreted ligands associated with stem cell self-renewal and differentiation (Figure 3A).
We then tested the correlative predictions in a series of cell culture experiments. UCB progenitor cell cultures were supplemented with saturating amounts of 18 putative regulatory ligands, or cocultured with purified mature blood lineages (megakaryocytes, monocytes, and erythrocytes), and analyzed for effects on total cell, progenitor, and primitive progenitor growth. At the primitive progenitor level, 3/5 novel predicted stimulatory ligands (EGF, PDGFB, and VEGF) displayed significant positive effects, 5/7 predicted inhibitory factors (CCL3, CCL4, CXCL10, TNFSF9, and TGFB2) displayed negative effects, whereas only 1/5 non-correlated ligands (CXCL7) displayed an effect. Also consistent with predictions from gene expression data, megakaryocytes and monocytes were found to stimulate and inhibit primitive progenitor growth, respectively, and these effects were attributable to differential secretome profiles of stimulatory versus inhibitory ligands.
Cellular responses to external stimuli, particularly in heterogeneous and dynamic cell populations, represent complex functions of multiple cell fate decisions acting both directly and indirectly on the target (stem cell) populations. Experimentally distinguishing the mode of action of cytokines is thus a difficult task. To address this we used our previously published interactive model of hematopoiesis (Kirouac et al, 2009) to classify experimentally identified regulatory ligands into one of four distinct functional categories based on their differential effects on cell population growth. TGFB2 was classified as a proliferation inhibitor, CCL4, CXCL10, SPARC, and TNFSF9 as self-renewal inhibitors, CCL3 a proliferation stimulator, and EGF, VEGF, and PDGFB as self-renewal stimulators.
Stem and progenitor cells exposed to combinatorial extracellular signals must propagate this information through intracellular molecular networks, and respond appropriately by modifying cell fate decisions. To explore how our experimentally identified positive and negative regulatory signals are integrated at the intracellular level, we constructed a blood stem cell self-renewal signaling network through extensive literature curation and protein–protein interaction (PPI) network mapping. We find that signal transduction pathways activated by the various stimulatory and inhibitory ligands converge on a limited set of molecular control nodes, forming a core subnetwork enriched for known regulators of self-renewal (Figure 6A). To experimentally test the intracellular signaling molecules computationally predicted as regulators of stem cell self-renewal, we obtained five small molecule antagonists against the kinases Phosphatidylinositol 3-kinase (PI3K), Raf, Akt, Phospholipase C (PLC), and MEK1. Liquid cultures were supplemented with the five molecules individually, and resultant cell population outputs compared against model simulations to deconvolute the functional effects on proliferation (and survival) versus self-renewal. This analysis classifies inhibition of PI3K and Raf activity as selectively targeting self-renewal, PLC as selectively targeting survival, and Akt as selectively targeting proliferation; MEK inhibition appears non-specific for these processes.
This represents the first systematic characterization of how cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny. The complex intercellular communication networks can be approximated as an antagonistic positive–negative feedback circuit, wherein progenitor expansion is modulated by a balance of megakaryocyte-derived stimulatory factors (EGF, PDGF, VEGF, and possibly serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). This complex milieu of endogenous regulatory signals is integrated and processed within a core intracellular signaling network, resulting in modulation of cell-level kinetic parameters (proliferation, survival, and self-renewal). We reconstruct a stem cell associated intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. These findings lay the groundwork for novel strategies to control blood stem cell self-renewal in vitro and in vivo.
Intercellular (between cell) communication networks maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the importance of intercellular networks in stem cell biology, their rules, structure and molecular components are poorly understood. Herein, we describe the structure and dynamics of intercellular and intracellular networks in a stem cell derived, hierarchically organized tissue using experimental and theoretical analyses of cultured human umbilical cord blood progenitors. By integrating high-throughput molecular profiling, database and literature mining, mechanistic modeling, and cell culture experiments, we show that secreted factor-mediated intercellular communication networks regulate blood stem cell fate decisions. In particular, self-renewal is modulated by a coupled positive–negative intercellular feedback circuit composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). We reconstruct a stem cell intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. This represents the first systematic characterization of how stem cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny.
PMCID: PMC2990637  PMID: 20924352
cellular networks; hematopoiesis; intercellular signaling; self-renewal; stem cells
12.  VAN: an R package for identifying biologically perturbed networks via differential variability analysis 
BMC Research Notes  2013;6:430.
Large-scale molecular interaction networks are dynamic in nature and are of special interest in the analysis of complex diseases, which are characterized by network-level perturbations rather than changes in individual genes/proteins. The methods developed for the identification of differentially expressed genes or gene sets are not suitable for network-level analyses. Consequently, bioinformatics approaches that enable a joint analysis of high-throughput transcriptomics datasets and large-scale molecular interaction networks for identifying perturbed networks are gaining popularity. Typically, these approaches require the sequential application of multiple bioinformatics techniques – ID mapping, network analysis, and network visualization. Here, we present the Variability Analysis in Networks (VAN) software package: a collection of R functions to streamline this bioinformatics analysis.
VAN determines whether there are network-level perturbations across biological states of interest. It first identifies hubs (densely connected proteins/microRNAs) in a network and then uses them to extract network modules (each comprising a hub and all its interaction partners). The function identifySignificantHubs identifies dysregulated modules (i.e. modules with changes in expression correlation between a hub and its interaction partners) using a single expression and network dataset. The function summarizeHubData identifies dysregulated modules based on a meta-analysis of multiple expression and/or network datasets. VAN also converts protein identifiers present in a MITAB-formatted interaction network to gene identifiers (UniProt identifier to Entrez identifier or gene symbol using the function generatePpiMap) and generates microRNA-gene interaction networks using TargetScan and Microcosm databases (generateMicroRnaMap). The function obtainCancerInfo is used to identify hubs (corresponding to significantly perturbed modules) that are already causally associated with cancer(s) in the Cancer Gene Census database. Additionally, VAN supports the visualization of changes to network modules in R and Cytoscape (visualizeNetwork and obtainPairSubset, respectively). We demonstrate the utility of VAN using gene expression data from metastatic melanoma and a protein-protein interaction network from the Human Protein Reference Database.
Our package provides a comprehensive and user-friendly platform for the integrative analysis of -omics data to identify disease-associated network modules. This bioinformatics approach, which is essentially focused on the question of explaining phenotype with a 'network type' and in particular, how regulation is changing among different states of interest, is relevant to many questions including those related to network perturbations across developmental timelines.
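The notion of a dysregulated module described above (a change in expression correlation between a hub and its interaction partners across two states) can be illustrated with a minimal Python sketch. This is not VAN's R implementation or API: the function names, the use of a simple mean of Pearson correlations, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean_hub_correlation(expr, hub, partners):
    """Average correlation between a hub and each of its interaction partners."""
    return sum(pearson(expr[hub], expr[p]) for p in partners) / len(partners)

def module_shift(expr_a, expr_b, hub, partners):
    """Change in mean hub-partner correlation from state A to state B."""
    return (mean_hub_correlation(expr_b, hub, partners)
            - mean_hub_correlation(expr_a, hub, partners))

# Hypothetical toy data: expression of one hub and two partners in two states.
state_a = {"HUB": [1, 2, 3, 4], "P1": [2, 4, 6, 8], "P2": [1, 3, 5, 7]}
state_b = {"HUB": [1, 2, 3, 4], "P1": [8, 6, 4, 2], "P2": [7, 5, 3, 1]}

# Partners track the hub in state A and oppose it in state B.
shift = module_shift(state_a, state_b, "HUB", ["P1", "P2"])
print(shift)
```

A large absolute shift flags a module as a candidate for perturbation; in practice a significance test (for example, against permuted sample labels) would be needed before calling it dysregulated.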
PMCID: PMC4015612  PMID: 24156242
Protein-protein interaction networks; Network modules; Melanoma
13.  A Scalable Approach for Discovering Conserved Active Subnetworks across Species 
PLoS Computational Biology  2010;6(12):e1001028.
Overlaying differential changes in gene expression on protein interaction networks has proven to be a useful approach to interpreting the cell's dynamic response to a changing environment. Despite successes in finding active subnetworks in the context of a single species, the idea of overlaying lists of differentially expressed genes on networks has not yet been extended to support the analysis of multiple species' interaction networks. To address this problem, we designed a scalable, cross-species network search algorithm, neXus (Network - cross(X)-species - Search), that discovers conserved, active subnetworks based on parallel differential expression studies in multiple species. Our approach leverages functional linkage networks, which provide more comprehensive coverage of functional relationships than physical interaction networks by combining heterogeneous types of genomic data. We applied our cross-species approach to identify conserved modules that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies and functional linkage networks from mouse and human. We find hundreds of conserved active subnetworks enriched for stem cell-associated functions such as cell cycle, DNA repair, and chromatin modification processes. Using a variation of this approach, we also find a number of species-specific networks, which likely reflect mechanisms of stem cell function that have diverged between mouse and human. We assess the statistical significance of the subnetworks by comparing them with subnetworks discovered on random permutations of the differential expression data. We also describe several case examples that illustrate the utility of comparative analysis of active subnetworks.
Author Summary
Microarrays are a powerful tool for discovering genes whose expression is associated with a particular biological process or phenotype. Differential expression analysis can often generate a list of several hundred or even thousands of significant genes. While these genes represent real expression differences, the large number of candidates can make the process of hypothesis generation for further experimental studies challenging. Use of complementary datasets such as protein-protein interactions can help filter such candidate lists to genes involved with the most relevant pathways. This approach has been applied successfully by many groups, but to date, no one has developed an approach for discovering active pathways or subnetworks that are conserved across multiple species. We propose an algorithm, neXus (Network – cross(X)-species – Search), for cross-species active subnetwork discovery given candidate gene lists from two species and weighted protein-protein interaction networks. We validate our approach on expression studies from human and mouse stem cells. We find many active subnetworks that are conserved across species relevant to stem cell biology as well as other subnetworks that show species-specific behavior. We show that these networks are not likely to have been discovered by chance and discuss several specific cases that reveal potentially novel stem cell biology.
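The significance assessment mentioned above, comparing discovered subnetworks with those found on random permutations of the differential expression data, can be sketched in Python as follows. This is a simplified illustration rather than the neXus procedure: it scores one fixed subnetwork against permuted gene labels instead of re-running subnetwork discovery on each permutation, and the scoring function, gene names, and values are hypothetical.

```python
import random

def subnetwork_score(scores, genes):
    """Mean differential-expression score over the genes in a subnetwork."""
    return sum(scores[g] for g in genes) / len(genes)

def permutation_p_value(scores, genes, n_perm=1000, seed=0):
    """Empirical p-value: share of permutations scoring at least the observed value."""
    rng = random.Random(seed)
    observed = subnetwork_score(scores, genes)
    names = list(scores)
    values = list(scores.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)                   # reassign scores to genes at random
        permuted = dict(zip(names, values))
        if subnetwork_score(permuted, genes) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical scores: three strongly changed genes among twenty.
scores = {f"g{i}": 0.1 * i for i in range(3, 20)}
scores.update({"g0": 5.0, "g1": 5.0, "g2": 5.0})

p = permutation_p_value(scores, ["g0", "g1", "g2"])
print(p)
```

The add-one correction is a common convention for empirical p-values; the number of permutations bounds the smallest p-value that can be reported.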
PMCID: PMC3000367  PMID: 21170309
14.  EnRICH: Extraction and Ranking using Integration and Criteria Heuristics 
High throughput screening technologies enable biologists to generate candidate genes at a rate that, due to time and cost constraints, cannot be studied by experimental approaches in the laboratory. Thus, it has become increasingly important to prioritize candidate genes for experiments. To accomplish this, researchers need to apply selection requirements based on their knowledge, which necessitates qualitative integration of heterogeneous data sources and filtration using multiple criteria. A similar approach can also be applied to putative candidate gene relationships. While automation can assist in this routine and imperative procedure, flexibility of data sources and criteria must not be sacrificed. A tool that can optimize the trade-off between automation and flexibility to simultaneously filter and qualitatively integrate data is needed to prioritize candidate genes and generate composite networks from heterogeneous data sources.
We developed the Java application, EnRICH (Extraction and Ranking using Integration and Criteria Heuristics), in order to address this need. Here we present a case study in which we used EnRICH to integrate and filter multiple candidate gene lists in order to identify potential retinal disease genes. As a result of this procedure, a candidate pool of several hundred genes was narrowed down to five candidate genes, of which four are confirmed retinal disease genes and one is associated with a retinal disease state.
We developed a platform-independent tool that is able to qualitatively integrate multiple heterogeneous datasets and use different selection criteria to filter each of them, provided the datasets are tables that have distinct identifiers (required) and attributes (optional). With the flexibility to specify data sources and filtering criteria, EnRICH automatically prioritizes candidate genes or gene relationships for biologists based on their specific requirements. Here, we also demonstrate that this tool can be effectively and easily used to apply highly specific user-defined criteria and can efficiently identify high quality candidate genes from relatively sparse datasets.
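The filter-then-integrate workflow described above can be sketched in a few lines of Python. This is an illustrative sketch, not EnRICH itself (which is a Java application): the table layout, the per-source predicates, the support-count ranking, and all gene names and attribute values are assumptions.

```python
def filter_table(table, predicate):
    """Keep identifiers whose attribute dict passes the user-supplied criterion."""
    return {gene for gene, attrs in table.items() if predicate(attrs)}

def integrate(sources):
    """Count, for each identifier, how many filtered sources contain it."""
    support = {}
    for table, predicate in sources:
        for gene in filter_table(table, predicate):
            support[gene] = support.get(gene, 0) + 1
    return support

def prioritize(sources, min_support):
    """Identifiers found in at least min_support sources, best supported first."""
    support = integrate(sources)
    kept = [g for g, n in support.items() if n >= min_support]
    return sorted(kept, key=lambda g: (-support[g], g))

# Hypothetical tables: identifiers are required, attributes are optional.
expression = {"GENE_A": {"fold_change": 3.2},
              "GENE_B": {"fold_change": 1.1},
              "GENE_C": {"fold_change": 2.5}}
binding = {"GENE_A": {"peak_score": 40},
           "GENE_C": {"peak_score": 5},
           "GENE_D": {"peak_score": 60}}
linkage = {"GENE_A": {}, "GENE_D": {}}   # identifier-only list, no attributes

sources = [
    (expression, lambda a: a["fold_change"] >= 2.0),
    (binding, lambda a: a["peak_score"] >= 10),
    (linkage, lambda a: True),           # no criterion: keep every identifier
]

candidates = prioritize(sources, min_support=2)
print(candidates)
```

Keeping each criterion as a separate predicate per source reflects the stated goal of flexible, user-defined filtering without giving up automation.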
PMCID: PMC3564850  PMID: 23320748
Qualitative integration; High-throughput data; Heterogeneous data; Network; Network visualization; Candidate prioritization
15.  Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks 
PLoS Computational Biology  2013;9(11):e1003361.
Elucidating gene regulatory networks (GRNs) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network is similar to that associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.
Author Summary
Elucidating gene regulatory networks is crucial to understand disease mechanisms at the system level. A large number of algorithms have been developed to infer gene regulatory networks from gene-expression datasets. If you remember the success of IBM's Watson in the “Jeopardy!” quiz show, the critical features of Watson were the use of a very large number of heterogeneous algorithms to generate various hypotheses and the selection of one of them as the answer. We took a similar approach, “TopkNet”, to see if the “Wisdom of Crowd” approach can be applied to network reconstruction. We discovered that “Wisdom of Crowd” is a powerful approach where integration of optimal algorithms for a given dataset can achieve better results than the best individual algorithm. However, such an analysis raises the question “How to choose optimal algorithms for a given dataset?” We found that similarity among gene-expression datasets is key to selecting optimal algorithms, i.e., if dataset A for which optimal algorithms are known is similar to dataset B, the optimal algorithms for dataset A may also be optimal for dataset B. Thus, our “TopkNet” together with the similarity measure among datasets can provide a powerful strategy towards harnessing “Wisdom of Crowd” in high-quality reconstruction of gene regulatory networks.
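One simple way to combine edge predictions from several inference algorithms is to rank the edges within each algorithm and score every edge by its k-th best rank across algorithms. The Python sketch below assumes this rule for illustration; the abstract does not spell out how TopkNet forms its consensus, so the rule, the function names, and the confidence values are assumptions and not the published method.

```python
def ranks(confidences):
    """Map each edge to its rank under one algorithm (1 = most confident)."""
    ordered = sorted(confidences, key=lambda e: (-confidences[e], e))
    return {edge: i + 1 for i, edge in enumerate(ordered)}

def kth_best_rank_consensus(predictions, k):
    """Order edges by their k-th best rank across algorithms (lower is better).

    Assumes every algorithm scores the same set of edges.
    """
    per_algo = [ranks(p) for p in predictions]
    edges = set(per_algo[0])
    consensus = {e: sorted(r[e] for r in per_algo)[k - 1] for e in edges}
    return sorted(consensus, key=lambda e: (consensus[e], e))

# Hypothetical regulator-target edges and confidences from three algorithms.
A, B, C = ("g1", "g2"), ("g1", "g3"), ("g2", "g3")
algo1 = {A: 0.9, B: 0.5, C: 0.1}
algo2 = {A: 0.2, B: 0.8, C: 0.6}
algo3 = {A: 0.7, B: 0.6, C: 0.3}

top1 = kth_best_rank_consensus([algo1, algo2, algo3], k=1)
top3 = kth_best_rank_consensus([algo1, algo2, algo3], k=3)
print(top1, top3)
```

With k = 1 an edge needs only one enthusiastic algorithm to rank highly, whereas k equal to the number of algorithms rewards edges that no algorithm ranks poorly, which shows how the choice of k trades sensitivity against agreement.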
PMCID: PMC3836705  PMID: 24278007
16.  Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets 
PLoS Computational Biology  2010;6(4):e1000742.
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
Author Summary
The generation of high-dimensional datasets in the biological sciences has become routine (protein interaction, gene expression, and DNA/RNA sequence data, to name a few), stretching our ability to derive novel biological insights from them, with even less effort focused on integrating these disparate datasets available in the public domain. Hence a most pressing problem in the life sciences today is the development of algorithms to combine large-scale data on different biological dimensions to maximize our understanding of living systems. We present an algorithm for simultaneously clustering multiple biological networks to identify coherent sets of genes (clusters) underlying cellular processes. The algorithm allows theoretical guarantees on the quality of the detected clusters relative to the optimal clusters that are computationally infeasible to find, and could be applied to coexpression, protein interaction, protein-DNA networks, and other network types. When combining multiple physical and gene expression based networks in yeast, the clusters we identify are consistently enriched for reference classes capturing diverse aspects of biology, yield good coverage of the analysed genes, and highlight novel members in well-studied cellular processes.
PMCID: PMC2855327  PMID: 20419151
17.  A biphasic pattern of gene expression during mouse retina development 
Between embryonic day 12 and postnatal day 21, six major neuronal and one glial cell type are generated from multipotential progenitors in a characteristic sequence during mouse retina development. We investigated expression patterns of retina transcripts during the major embryonic and postnatal developmental stages to provide a systematic view of normal mouse retina development.
A tissue-specific cDNA microarray was generated using a set of sequence non-redundant EST clones collected from mouse retina. Eleven stages of mouse retina, from embryonic day 12.5 (E12.5) to postnatal day 21 (PN21), were collected for RNA isolation. Non-amplified RNAs were labeled for microarray experiments and three sets of data were analyzed for significance, hierarchical relationships, and functional clustering. Six individual gene expression clusters were identified based on expression patterns of transcripts through retina development. Two developmental phases were clearly divided with postnatal day 5 (PN5) as a separate cluster. Among 4,180 transcripts that changed significantly during development, approximately 2/3 of the genes were expressed at high levels up until PN5 and then declined whereas the other 1/3 of the genes increased expression from PN5 and remained at the higher levels until at least PN21. Less than 1% of the genes observed showed a peak of expression between the two phases. Among the later increased population, only about 40% of the genes are correlated with rod photoreceptors, indicating that multiple cell types contributed to gene expression in this phase. Within the same functional classes, however, different gene populations were expressed in distinct developmental phases. A correlation coefficient analysis of gene expression during retina development between previous SAGE studies and this study was also carried out.
This study provides a complementary genome-wide view of common gene dynamics and a broad molecular classification of mouse retina development. Different genes in the same functional clusters are expressed in the different developmental stages, suggesting that cells might change gene expression profiles from differentiation to maturation stages. We propose that large-scale changes in gene regulation during development are necessary for the final maturation and function of the retina.
PMCID: PMC1633734  PMID: 17044933
18.  Learning Gene Networks under SNP Perturbations Using eQTL Datasets 
PLoS Computational Biology  2014;10(2):e1003420.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
Author Summary
A complete understanding of how gene regulatory networks are wired in a biological system is important in many areas of biology and medicine. The most popular method for investigating a gene network has been based on experimental perturbation studies, where the expression of a gene is experimentally manipulated to observe how this perturbation affects the expressions of other genes. Such experimental methods are costly, laborious, and do not scale to a perturbation of more than two genes at a time. As an alternative, genetical genomics approach uses genetic variants as naturally-occurring perturbations of gene regulatory system and learns gene networks by decoding the perturbation effects by genetic variants, given population gene-expression and genotype data. However, since there exist millions of genetic variants in genomes that simultaneously perturb a gene network, it is not obvious how to decode the effects of such multifactorial perturbations from data. Our statistical approach overcomes this computational challenge and recovers gene networks under SNP perturbations using probabilistic graphical models. As population gene-expression and genotype datasets are routinely collected to study genetic architectures of complex diseases and phenotypes, our approach can directly leverage these existing datasets to provide a more effective way of identifying gene networks.
PMCID: PMC3937098  PMID: 24586125
19.  Defining the Human Macula Transcriptome and Candidate Retinal Disease Genes Using EyeSAGE 
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE).
Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR.
Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified.
The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
PMCID: PMC2813776  PMID: 16723438
20.  Robust Target Gene Discovery through Transcriptome Perturbations and Genome-Wide Enhancer Predictions in Drosophila Uncovers a Regulatory Basis for Sensory Specification 
PLoS Biology  2010;8(7):e1000435.
cisTargetX is a novel computational method that accurately predicts Atonal-governed regulatory networks in the retina of the fruit fly.
A comprehensive systems-level understanding of developmental programs requires the mapping of the underlying gene regulatory networks. While significant progress has been made in mapping a few such networks, almost all gene regulatory networks underlying cell-fate specification remain unknown and their discovery is significantly hampered by the paucity of generalized, in vivo validated tools of target gene and functional enhancer discovery. We combined genetic transcriptome perturbations and comprehensive computational analyses to identify a large cohort of target genes of the proneural and tumor suppressor factor Atonal, which specifies the switch from undifferentiated pluripotent cells to R8 photoreceptor neurons during larval development. Extensive in vivo validations of the predicted targets for the proneural factor Atonal demonstrate a 50% success rate of bona fide targets. Furthermore we show that these enhancers are functionally conserved by cloning orthologous enhancers from Drosophila ananassae and D. virilis in D. melanogaster. Finally, to investigate cis-regulatory cross-talk between Ato and other retinal differentiation transcription factors (TFs), we performed motif analyses and independent target predictions for Eyeless, Senseless, Suppressor of Hairless, Rough, and Glass. Our analyses show that cisTargetX identifies the correct motif from a set of coexpressed genes and accurately predicts target genes of individual TFs. The validated set of novel Ato targets exhibit functional enrichment of signaling molecules and a subset is predicted to be coregulated by other TFs within the retinal gene regulatory network.
Author Summary
Tens of thousands of regulatory elements determine the spatiotemporal expression pattern of protein-coding genes in the metazoan genome. Each regulatory element, when bound by the appropriate transcription factors, can affect the temporal transcription of a nearby target gene in a particular cell type. Annotating the genome for regulatory elements, as well as determining the input transcription factors for each element, is a key challenge in genome biology. In this study, we introduce a computational method, cisTargetX, that predicts transcription factor binding motifs and their target genes through the integration of gene expression data and comparative genomics. We first validate this method in silico using public gene expression data and then apply cisTargetX to the developmental program governing photoreceptor neuron specification in the retina of Drosophila melanogaster. In particular, we perturb predicted key transcription factors during the initial steps of neurogenesis; measure gene expression by microarrays; identify motifs and predict target genes; validate the predictions in vivo using transgenic animals; and study several functional and evolutionary aspects of the validated regulatory elements for the proneural factor Atonal. Overall, we show that cisTargetX efficiently predicts genetic regulatory interactions and provides mechanistic insight into gene regulatory networks of postembryonic developmental systems.
PMCID: PMC2910651  PMID: 20668662
21.  Systems Level Analysis of Systemic Sclerosis Shows a Network of Immune and Profibrotic Pathways Connected with Genetic Polymorphisms 
PLoS Computational Biology  2015;11(1):e1004005.
Systemic sclerosis (SSc) is a rare systemic autoimmune disease characterized by skin and organ fibrosis. The pathogenesis of SSc and its progression are poorly understood. The SSc intrinsic gene expression subsets (inflammatory, fibroproliferative, normal-like, and limited) are observed in multiple clinical cohorts of patients with SSc. Analysis of longitudinal skin biopsies suggests that a patient's subset assignment is stable over 6–12 months. Genetically, SSc is multi-factorial with many genetic risk loci for SSc generally and for specific clinical manifestations. Here we identify the genes consistently associated with the intrinsic subsets across three independent cohorts, show the relationship between these genes using a gene-gene interaction network, and place the genetic risk loci in the context of the intrinsic subsets. To identify gene expression modules common to three independent datasets from three different clinical centers, we developed a consensus clustering procedure based on mutual information of partitions, an information theory concept, and performed a meta-analysis of these genome-wide gene expression datasets. We created a gene-gene interaction network of the conserved molecular features across the intrinsic subsets and analyzed their connections with SSc-associated genetic polymorphisms. The network is composed of distinct, but interconnected, components related to interferon activation, M2 macrophages, adaptive immunity, extracellular matrix remodeling, and cell proliferation. The network shows extensive connections between the inflammatory- and fibroproliferative-specific genes. The network also shows connections between these subset-specific genes and 30 SSc-associated polymorphic genes including STAT4, BLK, IRF7, NOTCH4, PLAUR, CSK, IRAK1, and several human leukocyte antigen (HLA) genes. 
Our analyses suggest that the gene expression changes underlying the SSc subsets may be long-lived, but mechanistically interconnected and related to a patient's underlying genetic risk.
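The abstract above bases its consensus clustering on the mutual information of partitions. As a minimal sketch of that underlying quantity only (not the authors' consensus procedure, whose details are not given here), the following computes the mutual information between two cluster assignments of the same items. The function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two partitions of the same items.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    partitions assign to item i. Illustrative helper, not from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)                 # cluster sizes in partition A
    count_b = Counter(labels_b)                 # cluster sizes in partition B
    count_ab = Counter(zip(labels_a, labels_b))  # joint cluster memberships
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy example: two partitions that agree up to a relabeling of clusters
same = mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: two partitions that carry no information about each other
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Partitions that agree up to relabeling reach the maximum value (the entropy of the partition), while unrelated partitions score zero, which is why the measure suits comparing clusterings obtained from different cohorts.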
Author Summary
Systemic sclerosis (SSc) is a rare autoimmune disease characterized by skin thickening (fibrosis) and progressive organ failure. Previous studies of SSc skin biopsies have identified molecular subsets of SSc based upon gene expression termed the inflammatory, fibroproliferative, normal-like, and limited intrinsic subsets. These gene expression signatures are large and although the biological processes are conserved, the exact list of genes can vary across datasets due to random variation, as well as minor differences in the composition of the study cohorts (e.g. early vs. late disease). We developed a computational tool to identify the consensus genes underlying the subsets across heterogeneous data and characterized the biological role of the consensus genes in SSc in order to obtain a systems level perspective of the SSc subsets. Our analysis reveals a complex network of genes connecting two of the major SSc intrinsic subsets, inflammatory and fibroproliferative. Many genetic loci associated with SSc risk show connections with the consensus genes of the intrinsic subsets, indicating that differential expression of genes defining the subsets may be related to genetic risk for SSc, thus for the first time placing the genetic risk factors in the context of, and showing putative relationships with, the intrinsic gene expression subsets.
PMCID: PMC4288710  PMID: 25569146
22.  In vivo function of the orphan nuclear receptor NR2E3 in establishing photoreceptor identity during mammalian retinal development 
Human molecular genetics  2006;15(17):2588-2602.
Rod and cone photoreceptors in mammalian retina are generated from common pool(s) of neuroepithelial progenitors. NRL, CRX and NR2E3 are key transcriptional regulators that control photoreceptor differentiation. Mutations in NR2E3, a rod-specific orphan nuclear receptor, lead to loss of rods, increased density of S-cones and supernormal S-cone-mediated vision in humans. To better understand its in vivo function, NR2E3 was expressed ectopically in the Nrl−/− retina, where post-mitotic precursors fated to be rods develop into functional S-cones similar to the human NR2E3 disease. Expression of NR2E3 in the Nrl−/− retina completely suppressed cone differentiation and resulted in morphologically rod-like photoreceptors, which were however not functional. Gene profiling of FACS-purified photoreceptors confirmed the role of NR2E3 as a strong suppressor of cone genes but an activator of only a subset of rod genes (including rhodopsin) in vivo. Ectopic expression of NR2E3 in cone precursors and differentiating S-cones of wild-type retina also generated rod-like cells. The dual regulatory function of NR2E3 was not dependent upon the presence of NRL and/or CRX, but on the timing and level of its expression. Our studies reveal a critical role of NR2E3 in establishing functional specificity of NRL-expressing photoreceptor precursors during retinal neurogenesis.
PMCID: PMC1592580  PMID: 16868010
23.  Seeded Bayesian Networks: Constructing genetic networks from microarray data 
BMC Systems Biology  2008;2:57.
DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes – often represented as networks – in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results.
Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data.
The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or high-throughput protein-protein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.
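The bootstrap step described above can be sketched as follows. This is a hedged illustration of edge confidence by resampling only: the structure learner here is a deliberately simple stand-in (thresholded Pearson correlation), not the seeded Bayesian network search implemented in TM4, and all function names, the threshold, and the toy data are assumptions.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def learn_edges(samples, genes, threshold):
    """Stand-in structure learner (assumption): link pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(genes, 2):
        r = pearson([s[g1] for s in samples], [s[g2] for s in samples])
        if abs(r) >= threshold:
            edges.append((g1, g2))
    return edges


def bootstrap_edge_confidence(samples, genes, n_boot=200, threshold=0.8, seed=0):
    """Fraction of bootstrap resamples in which each edge is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample arrays (rows) with replacement, keeping the sample size
        resample = [rng.choice(samples) for _ in samples]
        counts.update(learn_edges(resample, genes, threshold))
    return {edge: c / n_boot for edge, c in counts.items()}


# Toy expression data: B tracks A exactly, C alternates independently of A
samples = [{"A": float(i), "B": 2.0 * i, "C": float((-1) ** i)} for i in range(1, 9)]
confidence = bootstrap_edge_confidence(samples, ["A", "B", "C"])
```

Edges that survive most resamples are the high-confidence candidates. In a seeded analysis, the learner would start from the literature or protein-protein interaction network rather than from an empty graph.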
PMCID: PMC2474592  PMID: 18601736
24.  Distinct and Atypical Intrinsic and Extrinsic Cell Death Pathways between Photoreceptor Cell Types upon Specific Ablation of Ranbp2 in Cone Photoreceptors 
PLoS Genetics  2013;9(6):e1003555.
Non-autonomous cell-death is a cardinal feature of the disintegration of neural networks in neurodegenerative diseases, but the molecular bases of this process are poorly understood. The neural retina comprises a mosaic of rod and cone photoreceptors. Cone and rod photoreceptors degenerate upon rod-specific expression of heterogeneous mutations in functionally distinct genes, whereas cone-specific mutations are thought to cause only cone demise. Here we show that conditional ablation in cone photoreceptors of Ran-binding protein-2 (Ranbp2), a cell context-dependent pleiotropic protein linked to neuroprotection, familial necrotic encephalopathies, acute transverse myelitis and tumor-suppression, promotes early electrophysiological deficits, subcellular erosive destruction and non-apoptotic death of cones, whereas rod photoreceptors undergo cone-dependent non-autonomous apoptosis. Cone-specific Ranbp2 ablation causes the temporal activation of a cone-intrinsic molecular cascade highlighted by the early activation of metalloproteinase 11/stromelysin-3 and up-regulation of Crx and CoREST, followed by the down-modulation of cone-specific phototransduction genes, transient up-regulation of regulatory/survival genes and activation of caspase-7 without apoptosis. Conversely, PARP1+-apoptotic rods develop upon sequential activation of caspase-9 and caspase-3 and loss of membrane permeability. Rod photoreceptor demise ceases upon cone degeneration. These findings reveal novel roles of Ranbp2 in the modulation of intrinsic and extrinsic cell death mechanisms and pathways. They also unveil a novel spatiotemporal paradigm of progression of neurodegeneration upon cell-specific genetic damage whereby a cone to rod non-autonomous death pathway with intrinsically distinct cell-type death manifestations is triggered by cell-specific loss of Ranbp2. 
Finally, this study casts new light onto cell-death mechanisms that may be shared by human dystrophies with distinct retinal spatial signatures as well as with other etiologically distinct neurodegenerative disorders.
Author Summary
The secondary demise of healthy neurons upon the degeneration of neurons harboring primary genetic defect(s) is a hallmark of neurodegenerative diseases. However, the factors and mechanisms driving these cell-death processes are not understood, a severe limitation which has hampered the therapeutic development of neuroprotective approaches. The neuroretina is comprised of two main types of photoreceptor neurons, rods and cones. These undergo degeneration upon heterogeneous mutations or environmental stressors and the underlying diseases present conspicuous spatiotemporal pathological signatures whose molecular bases are not understood. We employed the multifunctional protein, Ran-binding protein-2 (Ranbp2), which is implicated in cell-type and stress-dependent clinical manifestations, to examine its role(s) in primary and secondary photoreceptor death mechanisms upon its specific loss in cones. Contrary to prior findings, we found that dying cones can trigger the loss of healthy rods. This process arises by the immediate activation of novel Ranbp2-responsive factors and downstream cascade events in cones that promote extrinsically the demise of rods. The mechanisms of rod and cone demise are molecularly distinct. Collectively, the data uncover distinct Ranbp2 roles in intrinsic and extrinsic cell-death and will likely contribute to our understanding of the spatiotemporal onset and progression of diseases affecting photoreceptor mosaics and other neural networks.
PMCID: PMC3688534  PMID: 23818861
25.  Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape 
We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptically shaped mesh comprised of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, and comparing corresponding triangles allows easy identification of similar expression patterns. The virtual representation was used to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns. We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes. Using co-expression, we were able to assign putative functions to unknown genes.
Analyzing both temporal and spatial gene expression is essential for understanding development and regulatory networks of multicellular organisms. Interacting genes are commonly expressed in overlapping or adjacent domains. Thus, gene expression patterns can be used to assign putative gene functions and mined to infer candidates for networks.
We have generated a systematic two-dimensional mRNA expression atlas profiling embryonic development of Drosophila melanogaster (Tomancak et al, 2002, 2007). To date, we have collected over 70 000 images for over 6000 genes. To explore spatial relationships between gene expression patterns, we used a novel computational image-processing approach by converting expression patterns from the images into virtual representations (Figure 1). Using a custom-designed automated pipeline, for each image, we segmented and aligned the outline of the embryo to an elliptically shaped mesh, comprised of 311 small triangular regions each defining a unique location within the embryo. By comparing corresponding triangles, we produced a distance score to identify similar patterns. We generated those triangulated images (TIs) for our entire data set at all developmental stages and demonstrated that this representation can be used as an objective, computationally defined description of expression in in situ hybridization images from various sources, including images from the literature.
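Comparing corresponding triangles to obtain a distance score can be sketched as below. The abstract does not state which distance the authors used, so the mean absolute difference here is an assumption, as are the function names and the representation of a TI as a list of 311 per-triangle intensities in the range 0 to 1.

```python
N_TRIANGLES = 311  # mesh size stated in the abstract


def ti_distance(ti_a, ti_b):
    """Mean absolute difference between corresponding triangles.

    Assumed metric for illustration; 0.0 means identical patterns.
    """
    if len(ti_a) != len(ti_b):
        raise ValueError("triangulated images must use the same mesh")
    return sum(abs(x - y) for x, y in zip(ti_a, ti_b)) / len(ti_a)


def rank_by_similarity(query, atlas):
    """Return (gene, distance) pairs sorted from most to least similar."""
    scored = ((gene, ti_distance(query, ti)) for gene, ti in atlas.items())
    return sorted(scored, key=lambda pair: pair[1])


# Toy patterns (hypothetical): no expression, ubiquitous, and a small domain
blank = [0.0] * N_TRIANGLES
ubiquitous = [1.0] * N_TRIANGLES
small_domain = [0.0] * 300 + [1.0] * 11

ranking = rank_by_similarity(blank, {"gene_u": ubiquitous, "gene_s": small_domain})
```

Because every image is mapped onto the same mesh, the comparison reduces to an element-wise operation on fixed-length vectors, which is what makes large-scale clustering of patterns practical.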
We used the TIs to conduct a comprehensive analysis of the expression landscape. To this end, we created a novel approach to temporally sort and compact TIs to a non-redundant data set suitable for further computational processing. Although generally applicable for all developmental stages, for this study, we focused on developmental stages 4–6. For this stage range, we reduced the initial set of about 5800 TIs to 553 TIs containing 364 genes. Using this filtered data set, to discover how expression subdivides the embryo into regions, we clustered areas with similar expression and demonstrated that expression patterns divide the early embryo into distinct spatial regions resembling a fate map (Figure 3). To discover the range of unique expression patterns, we used affinity propagation clustering (Frey and Dueck, 2007) to group TIs with similar patterns and identified 39 clusters each representing a distinct pattern class. We integrated the remaining genes into the 39 clusters and studied the distribution of expression patterns and the relationships between the clusters.
The clustered expression patterns were used to identify putative positive and negative regulatory interactions. The similar TIs in each cluster grouped not only already known genes with related functions, but also previously undescribed genes. A comparative analysis identified subtle differences between the genes within each expression cluster. To investigate these differences, we developed a novel Markov Random Field (MRF) segmentation algorithm to extract patterns. We then extended the MRF algorithm to detect shared expression boundaries, generate similarity measurements, and discriminate even faint/uncertain patterns between two TIs. This enabled us to identify more subtle partial expression pattern overlaps and adjacent non-overlapping patterns. For example, by conducting this analysis on the cluster containing the gene snail, we identified the previously known huckebein, which restricts snail expression (Reuter and Leptin, 1994), and zfh1, which interacts with tinman (Broihier et al, 1998; Su et al, 1999).
By studying the functions of known genes, we assigned putative developmental roles to each of the 39 clusters. Of the 1800 genes investigated, only half of them had previously assigned functions.
Representing expression patterns with geometric meshes facilitates the analysis of a complex process involving thousands of genes. This approach is complementary to the cellular resolution 3D atlas for the Drosophila embryo (Fowlkes et al, 2008). Our method can be used as a rapid, fully automated, high-throughput approach to obtain a map of co-expression, which will serve to select specific genes for detailed multiplex in-situ hybridization and confocal analysis for a fine-grain atlas. Our data are similar to the data in the literature, and research groups studying reporter constructs, mutant animals, or orthologs can easily produce in situ hybridizations. TIs can be readily created and provide representations that are both comparable to each other and our data set. We have demonstrated that our approach can be used for predicting relationships in regulatory and developmental pathways.
Discovery of temporal and spatial patterns of gene expression is essential for understanding the regulatory networks and development in multicellular organisms. We analyzed the images from our large-scale spatial expression data set of early Drosophila embryonic development and present a comprehensive computational image analysis of the expression landscape. For this study, we created an innovative virtual representation of embryonic expression patterns using an elliptically shaped mesh grid that allows us to make quantitative comparisons of gene expression using a common frame of reference. Demonstrating the power of our approach, we used gene co-expression to identify distinct expression domains in the early embryo; the result is surprisingly similar to the fate map determined using laser ablation. We also used a clustering strategy to find genes with similar patterns and developed new analysis tools to detect variation within consensus patterns, adjacent non-overlapping patterns, and anti-correlated patterns. Of the 1800 genes investigated, only half had previously assigned functions. The known genes suggest developmental roles for the clusters, and identification of related patterns predicts requirements for co-occurring biological functions.
PMCID: PMC2824522  PMID: 20087342
biological function; embryo; gene expression; in situ hybridization; Markov Random Field
