Although a small number of the vast array of animal long non-coding RNAs (lncRNAs) have known effects on cellular processes examined in vitro, the extent of their contributions to normal cell processes throughout development, differentiation and disease for the most part remains less clear. Phenotypes arising from deletion of an entire genomic locus cannot be unequivocally attributed either to the loss of the lncRNA per se or to the associated loss of other overlapping DNA regulatory elements. The distinction between cis- and trans-effects is also often problematic. We discuss the advantages and challenges associated with the current techniques for studying the in vivo function of lncRNAs in the light of different models of lncRNA molecular mechanism, and reflect on the design of experiments to mutate lncRNA loci. These considerations should assist in the further investigation of these transcriptional products of the genome.
long non-coding RNAs; knockout mouse models; lethality; developmental defect; brain development; Science forum; Arabidopsis; D. melanogaster; human; mouse; zebrafish
Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.
The analysis of mammalian transcriptomes could provide new insights into human biology. Here the authors carry out RNA sequencing in a large collection of mouse tissues and compare these data to human transcriptome profiles, identifying a set of constrained genes that carry out basic cellular functions with remarkably constant expression levels across tissues and species.
Most RNA molecules are co- or post-transcriptionally modified to alter their chemical and functional properties to assist in their ultimate biological function. Among these modifications, the addition of a 5′ cap structure has been found to regulate turnover and localization. Here we report a study of the cap structure of human short (<200 nt) RNAs (sRNAs), using sequencing of cDNA libraries prepared by cap-sensitive enzymatic pretreatment of the sRNAs, thin layer chromatographic (TLC) analyses of isolated cap structures and mass spectrometric analyses for validation of the TLC analyses. Processed versions of snoRNA and tRNA sequences of less than 50 nt were observed in capped sRNA libraries, indicating additional processing and recapping of these annotated sRNA biotypes. We report for the first time 2,7 dimethylguanosine in human sRNA cap structures and, surprisingly, we find multiple type 0 cap structures (mGpppC, 7mGpppG, GpppG, GpppA, and 7mGpppA) in RNA length fractions shorter than 50 nt. Finally, we find additional uncharacterized cap structures that await identification, which will require the creation of reference compounds for use in TLC analyses. These studies suggest the existence of novel biochemical pathways leading to the processing of primary and sRNAs and the modification of their RNA 5′ ends with a spectrum of chemical modifications.
MiRNAs bear an increasing number of functions throughout development and in the aging adult. Here we address their role in establishing sexually dimorphic traits and sexual identity in male and female Drosophila. Our survey of miRNA populations in each sex identifies sets of miRNAs differentially expressed in male and female tissues across various stages of development. The pervasive sex-biased expression of miRNAs generally increases with the complexity and sexual dimorphism of tissues, gonads revealing the most striking biases. We find that the male-specific regulation of the X chromosome is relevant to miRNA expression on two levels. First, in the male gonad, testis-biased miRNAs tend to reside on the X chromosome. Second, in the soma, X-linked miRNAs do not systematically rely on dosage compensation. We set out to address the importance of a sex-biased expression of miRNAs in establishing sexually dimorphic traits. Our study of the conserved let-7-C miRNA cluster controlled by the sex-biased hormone ecdysone places let-7 as a primary modulator of the sex-determination hierarchy. Flies with modified let-7 levels present doublesex-related phenotypes and express sex-determination genes normally restricted to the opposite sex. In testes and ovaries, alterations of the ecdysone-induced let-7 result in aberrant gonadal somatic cell behavior and non-cell-autonomous defects in early germline differentiation. Gonadal defects as well as aberrant expression of sex-determination genes persist in aging adults under hormonal control. Together, our findings place ecdysone and let-7 as modulators of a somatic systemic signal that helps establish and sustain sexual identity in males and females and differentiation in gonads. This work establishes the foundation for a role of miRNAs in sexual dimorphism and demonstrates that hormonal control of cellular sexual identity, similar to that in vertebrates, exists in Drosophila.
miRNA; sex determination; ecdysteroid; gonad; development; Drosophila; genetics of sex
Recent deep sequencing of transcriptomes from worm to human reveals that individual transcripts can be composed of sequence segments that are not collinear, with some mapping great distances apart and others to other chromosomes. Some of these chimeric transcripts are formed by genetic rearrangements but others appear to arise during post-transcriptional events. While in lower eukaryotes this is accomplished by a well characterized trans-splicing process, in higher eukaryotes the processes leading to their formation remain unclear. While the biological importance of most chimeric RNAs is unclear as yet, the implications of their existence for the potential information content and functional organization of genomes are profound.
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific sub-cellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic sub-cellular localizations are also poorly understood. Since RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modifications and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations taken together prompt a redefinition of the concept of a gene.
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy.
Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
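The sequential maximum mappable seed search mentioned in the Results can be illustrated with a toy sketch. This is not STAR's C++ implementation: the function names, the naive suffix-array construction and the tiny genome are assumptions made purely for illustration, and the clustering and stitching steps are omitted.

```python
def build_suffix_array(genome):
    # Naive construction, acceptable only for a toy genome.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def _narrow(genome, sa, lo, hi, offset, c):
    # Within sa[lo:hi] all suffixes share their first `offset` characters,
    # so they are ordered by the character at `offset`. Binary-search the
    # sub-interval whose character at that offset equals c.
    a, b = lo, hi
    while a < b:
        m = (a + b) // 2
        if genome[sa[m] + offset:sa[m] + offset + 1] < c:
            a = m + 1
        else:
            b = m
    start, b = a, hi
    while a < b:
        m = (a + b) // 2
        if genome[sa[m] + offset:sa[m] + offset + 1] <= c:
            a = m + 1
        else:
            b = m
    return start, a


def maximal_mappable_prefix(read, genome, sa):
    # Longest prefix of `read` that occurs exactly in `genome`,
    # returned with all genome positions where it occurs.
    lo, hi, length = 0, len(sa), 0
    while length < len(read):
        new_lo, new_hi = _narrow(genome, sa, lo, hi, length, read[length])
        if new_lo == new_hi:
            break
        lo, hi, length = new_lo, new_hi, length + 1
    return length, (sorted(sa[lo:hi]) if length else [])


def seed_read(read, genome, sa):
    # Sequential search: after each seed, restart from the unmapped remainder.
    seeds, start = [], 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(read[start:], genome, sa)
        if length == 0:
            start += 1  # base absent from the genome; skip it
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy genome: exon 1, a GT...AG intron, exon 2 (all hypothetical).
genome = "GATTACAGG" + "GTTTTTTTAG" + "CATCGAGT"
sa = build_suffix_array(genome)
read = "TACAGG" + "CATCG"  # spliced read spanning the intron
seeds = seed_read(read, genome, sa)
```

In this toy case the read splits into two seeds, one per exon, and the gap between their genome positions corresponds to the intron; a real aligner would then cluster and stitch such seeds and score the resulting junction.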
The study of transcription using genomic tiling arrays has led to the identification of numerous additional exons. One example is the MECP2 gene on the X chromosome; using 5′ RACE and RT-PCR in human tissues and cell lines, we have found more than 70 novel exons (RACEfrags) connecting to at least one annotated exon. We sequenced all MECP2-connected exons and flanking sequences in 3 groups: 46 patients with Rett syndrome and without mutations in the currently annotated exons of the MECP2 and CDKL5 genes; 32 patients with Rett syndrome and identified mutations in the MECP2 gene; and 100 control individuals from the same geoethnic group. Approximately 13 kb were sequenced per sample (2.4 Mb of DNA resequencing). A total of 75 individuals had novel rare variants (mostly private variants) but no statistically significant difference was found among the 3 groups. These results suggest that variants in the newly discovered exons may not contribute to Rett syndrome. Interestingly, however, there are about twice as many variants in the novel exons as in the flanking sequences (44 vs. 21 for approximately 1.3 Mb sequenced for each class of sequences, p = 0.0025). Thus the evolutionary forces that shape these novel exons may be different from those acting on neighboring sequences.
MECP2; Rett syndrome; RACEfrags; SNP; rare variants; positive selection
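The comparison of 44 versus 21 variants can be checked with a simple count-based test. The abstract does not say which test produced p = 0.0025, so the exact two-sided binomial test below is an assumption and need not reproduce that value; it relies only on the stated fact that roughly equal amounts of sequence (about 1.3 Mb) were covered for each class, so that a variant is equally likely to fall in either class under the null hypothesis.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    # Exact two-sided binomial test: sum the probabilities of all outcomes
    # that are no more likely than the observed count k.
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


novel_exon_variants = 44   # from the abstract
flanking_variants = 21     # from the abstract
total = novel_exon_variants + flanking_variants

p_value = binomial_two_sided_p(novel_exon_variants, total)
print(f"two-sided binomial p-value: {p_value:.4g}")
```

Under these assumptions the excess of variants in the novel exons remains significant at conventional thresholds, which is consistent with the conclusion drawn in the abstract.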
The transcriptional landscape in embryonic stem cells (ESCs) and during ESC differentiation has received considerable attention, albeit mostly confined to the polyadenylated fraction of RNA, whereas the non-polyadenylated (NPA) fraction remained largely unexplored. Notwithstanding, the NPA RNA super-family has every potential to participate in the regulation of pluripotency and stem cell fate. We conducted a comprehensive analysis of NPA RNA in ESCs using a combination of whole-genome tiling arrays and deep sequencing technologies. In addition to identifying previously characterized and new non-coding RNA members, we describe a group of novel conserved RNAs (snacRNAs: small NPA conserved), some of which are differentially expressed between ESC and neuronal progenitor cells, providing the first evidence of a novel group of potentially functional NPA RNA involved in the regulation of pluripotency and stem cell fate. We further show that minor spliceosomal small nuclear RNAs, which are NPA, are almost completely absent in ESCs and are upregulated in differentiation. Finally, we show differential processing of the minor intron of the polycomb group gene Eed. Our data suggest that NPA RNA, both known and novel, play important roles in ESCs.
Many animal species use a chromosome-based mechanism of sex determination, which has led to the coordinate evolution of dosage-compensation systems. Dosage compensation not only corrects the imbalance in the number of X chromosomes between the sexes but also is hypothesized to correct dosage imbalance within cells that is due to monoallelic X-linked expression and biallelic autosomal expression, by upregulating X-linked genes twofold (termed ‘Ohno’s hypothesis’). Although this hypothesis is well supported by expression analyses of individual X-linked genes and by microarray-based transcriptome analyses, it was challenged by a recent study using RNA sequencing and proteomics. We obtained new, independent RNA-seq data, measured RNA polymerase distribution and reanalyzed published expression data in mammals, C. elegans and Drosophila. Our analyses, which take into account the skewed gene content of the X chromosome, support the hypothesis of upregulation of expressed X-linked genes to balance expression of the genome.
Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated the genome-wide mapping of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.
We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also makes new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy. We also found that expression levels measured by CAGE are better predicted than by RNA-PET or RNA-Seq, and different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA among different cell compartments, and PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA.
Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.
Analyses of bacterial transcriptomes have shown the existence of a genome-wide process of overlapping transcription due to the presence of antisense RNAs, as well as mRNAs that overlap along their entire length or in some portion of the 5′- and 3′-UTR regions. The biological advantages of such overlapping transcription are unclear, but it may play important regulatory roles at the level of transcription, RNA stability and translation. In a recent report, the human pathogen Staphylococcus aureus was observed to generate genome-wide overlapping transcription in the same bacterial cells, leading to a collection of short RNA fragments generated by the endoribonuclease III, RNase III. This processing appears most prominently in Gram-positive bacteria. The implications of both the use of pervasive overlapping transcription and the processing of these double stranded templates into short RNAs are explored and the consequences discussed.
overlapping transcription; RNase III; RNA processing; bacteria; transcriptome
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
Drosophila melanogaster is one of the most well studied genetic model organisms, nonetheless its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Despite recent controversies, the evidence that the majority of the human genome is transcribed into RNA remains strong.
Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and non-coding RNAs with capped 5′ ends that vary in size. Methods that identify the 5′ ends of transcripts will facilitate the discovery of novel promoters and 5′ ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we have developed nanoCAGE (Cap Analysis of Gene Expression), a method that captures the 5′ ends of transcripts from as little as 10 nanograms of total RNA and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5′ ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.
The p53 homologs, p63 and p73, share ∼85% amino acid identity in their DNA-binding domains, but they have distinct biological functions.
Using chromatin immunoprecipitation and high-resolution tiling arrays covering the human genome, we identify p73 DNA binding sites on a genome-wide level in ME180 human cervical carcinoma cells. Strikingly, the p73 binding profile is indistinguishable from the previously described binding profile for p63 in the same cells. Moreover, the p73∶p63 binding ratio is similar at all genomic loci tested, suggesting that there are few, if any, targets that are specific for one of these factors. As assayed by sequential chromatin immunoprecipitation, p63 and p73 co-occupy DNA target sites in vivo, suggesting that p63 and p73 bind primarily as heterotetrameric complexes in ME180 cells.
The observation that p63 and p73 associate with the same genomic targets suggests that their distinct biological functions are due to cell-type specific expression and/or protein domains that involve functions other than DNA binding.
The high-resolution transcriptome of wild-type and nonsense-mediated decay (NMD) defective C. elegans during development reveals insights into the NMD pathway and its role in development.
While many genome sequences are complete, transcriptomes are less well characterized. We used both genome-scale tiling arrays and massively parallel sequencing to map the Caenorhabditis elegans transcriptome across development. We utilized this framework to identify transcriptome changes in animals lacking the nonsense-mediated decay (NMD) pathway.
We find that while the majority of detectable transcripts map to known gene structures, >5% of transcribed regions fall outside current gene annotations. We show that >40% of these are novel exons. Using both technologies to assess isoform complexity, we estimate that >17% of genes change isoform across development. Next we examined how the transcriptome is perturbed in animals lacking NMD. NMD prevents expression of truncated proteins by degrading transcripts containing premature termination codons. We find that approximately 20% of genes produce transcripts that appear to be NMD targets. While most of these arise from splicing errors, NMD targets are enriched for transcripts containing open reading frames upstream of the predicted translational start (uORFs). We identify a relationship between the Kozak consensus surrounding the true start codon and the degree to which uORF-containing transcripts are targeted by NMD and speculate that translational efficiency may be coupled to transcript turnover via the NMD pathway for some transcripts.
We generated a high-resolution transcriptome map for C. elegans and used it to identify endogenous targets of NMD. We find that these transcripts arise principally through splicing errors, strengthening the prevailing view that splicing and NMD are highly interlinked processes.
The transcriptomes of eukaryotic cells are incredibly complex. Individual non-coding RNAs dwarf the number of protein-coding genes, and include classes that are well understood as well as classes for which the nature, extent and functional roles are obscure1. Deep sequencing of small RNAs (<200 nucleotides) from human HeLa and HepG2 cells revealed a remarkable breadth of species. These arose both from within annotated genes and from unannotated intergenic regions. Overall, small RNAs tended to align with CAGE (cap-analysis of gene expression) tags2, which mark the 5′ ends of capped, long RNA transcripts. Many small RNAs, including the previously described promoter-associated small RNAs3, appeared to possess cap structures. Members of an extensive class of both small RNAs and CAGE tags were distributed across internal exons of annotated protein coding and non-coding genes, sometimes crossing exon–exon junctions. Here we show that processing of mature mRNAs through an as yet unknown mechanism may generate complex populations of both long and short RNAs whose apparently capped 5′ ends coincide. Supplying synthetic promoter-associated small RNAs corresponding to the c-MYC transcriptional start site reduced MYC messenger RNA abundance. The studies presented here expand the catalogue of cellular small RNAs and demonstrate a biological impact for at least one class of non-canonical small RNAs.
RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. Here, we describe a strategy that uses array hybridization to improve sampling efficiency of human transcripts. The products of the RACE reaction are hybridized onto tiling arrays, and the exons detected are used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to direct cloning and sequencing of RACE products: it specifically targets novel transcripts, and often results in overall normalization of transcript abundances. We show theoretically and experimentally that this strategy leads indeed to efficient sampling of novel transcripts, and we investigate multiplexing it by pooling RACE reactions from multiple interrogated loci prior to hybridization.
The molecular mechanisms underlying pluripotency and lineage specification from embryonic stem (ES) cells are largely unclear. Differentiation pathways may be determined by the targeted activation of lineage specific genes or by selective silencing of genome regions during differentiation. Here we show that the ES cell genome is transcriptionally globally hyperactive and undergoes global silencing as cells differentiate. Normally silent repeat regions are active in ES cells and tissue-specific genes are sporadically expressed at low levels. Whole genome tiling arrays demonstrate widespread transcription in both coding and non-coding regions in pluripotent ES cells whereas the transcriptional landscape becomes more discrete as differentiation proceeds. The transcriptional hyperactivity in ES cells is accompanied by disproportionate expression of chromatin-remodeling genes and the general transcription machinery, but not histone modifying activities. Interference with several chromatin remodeling activities in ES cells affects their proliferation and differentiation behavior. We propose that global transcriptional activity is a hallmark of pluripotent ES cells that contributes to their plasticity and that lineage specification is strongly driven by reduction of the actively transcribed portion of the genome.
High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. These arrays are being increasingly used to study the associated processes of transcription, transcription factor binding, chromatin structure and their association. Studies of differential expression and/or regulation provide critical insight into the mechanics of transcription and regulation that occurs during the developmental program of a cell. The time-course experiment, which comprises an in-vivo system and the proposed analyses, is used to determine if annotated and un-annotated portions of genome manifest coordinated differential response to the induced developmental program.
We have proposed a novel approach, based on a piece-wise function, to analyze genome-wide differential response. This enables segmentation of the response based on protein-coding and non-coding regions; for genes the methodology also partitions differential response with a 5' versus 3' versus intra-genic bias.
The algorithm, built upon the framework of Significance Analysis of Microarrays, uses a generalized logic to define regions/patterns of coordinated differential change. By not adhering to the gene-centric paradigm, discordant differential expression patterns between exons and introns have been identified at an FDR of less than 12 percent. A co-localization of differential binding between RNA Polymerase II and tetra-acetylated histone has been quantified at a p-value < 0.003; it is most significant at the 5' end of genes, at a p-value < 10^-13. The prototype R code has been made available as supplementary material [see Additional file 1].
Regulatory T (T reg) cells are critical regulators of immune tolerance. Most T reg cells are defined based on expression of CD4, CD25, and the transcription factor, FoxP3. However, these markers have proven problematic for uniquely defining this specialized T cell subset in humans. We found that the IL-7 receptor (CD127) is down-regulated on a subset of CD4+ T cells in peripheral blood. We demonstrate that the majority of these cells are FoxP3+, including those that express low levels or no CD25. A combination of CD4, CD25, and CD127 resulted in a highly purified population of T reg cells accounting for significantly more cells than previously identified based on other cell surface markers. These cells were highly suppressive in functional suppressor assays. In fact, cells separated based solely on CD4 and CD127 expression were anergic and, although representing at least three times the number of cells (including both CD25+CD4+ and CD25−CD4+ T cell subsets), were as suppressive as the “classic” CD4+CD25hi T reg cell subset. Finally, we show that CD127 can be used to quantitate T reg cell subsets in individuals with type 1 diabetes, supporting the use of CD127 as a biomarker for human T reg cells.
High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. Tiling arrays are increasingly used in chromatin immunoprecipitation (IP) experiments (ChIP on chip). ChIP on chip facilitates the generation of genome-wide maps of in-vivo interactions between DNA-associated proteins including transcription factors and DNA. Analysis of the hybridization of an immunoprecipitated sample to a tiling array facilitates the identification of ChIP-enriched segments of the genome. These enriched segments are putative targets of antibody assayable regulatory elements. The enrichment response is not ubiquitous across the genome. Typically 5 to 10% of tiled probes manifest some significant enrichment. Depending upon the factor being studied, this response can drop to less than 1%. The detection and assessment of significance for interactions that emanate from non-canonical and/or un-annotated regions of the genome is especially challenging. This is the motivation behind the proposed algorithm.
We have proposed a novel rank and replicate statistics-based methodology for identifying and ascribing statistical confidence to regions of ChIP-enrichment. The algorithm is optimized for identification of sites that manifest low levels of enrichment but are true positives, as validated by alternative biochemical experiments. Although the method is described here in the context of ChIP on chip experiments, it can be generalized to any treatment-control experimental design. The results of the algorithm show a high degree of concordance with independent biochemical validation methods. The sensitivity and specificity of the algorithm have been characterized via quantitative PCR and independent computational approaches.
The algorithm ranks all enrichment sites based on their intra-replicate ranks and inter-replicate rank consistency. Following the ranking, the method allows segmentation of sites based on a meta p-value, a composite array signal enrichment criterion, or a composite of these two measures. The sensitivities obtained subsequent to the segmentation of data using a meta p-value of 10^-5, an array signal enrichment of 0.2 and a composite of these two values are 88%, 87% and 95%, respectively.
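The ranking idea described above can be sketched as follows. The abstract does not specify how ranks are combined or how the meta p-value is computed, so everything here is an assumption: sites are ordered by mean intra-replicate rank with the rank spread across replicates as a consistency tie-breaker, and Fisher's method stands in for the unspecified meta p-value. The function names and the example signals are hypothetical.

```python
from math import exp, log


def ranks(values):
    # Rank 1 = strongest enrichment within one replicate.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def fisher_meta_p(pvalues):
    # Fisher's method (a stand-in, not the paper's stated procedure).
    # -2 * sum(ln p) follows a chi-square with 2k degrees of freedom,
    # whose survival function has a closed form for even d.o.f.
    k = len(pvalues)
    half = -sum(log(p) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total


def rank_sites(replicate_signals):
    # replicate_signals: one list of per-site enrichment values per replicate.
    per_rep = [ranks(signal) for signal in replicate_signals]
    n_sites = len(replicate_signals[0])
    scored = []
    for site in range(n_sites):
        site_ranks = [r[site] for r in per_rep]
        mean_rank = sum(site_ranks) / len(site_ranks)
        spread = max(site_ranks) - min(site_ranks)  # inter-replicate consistency
        scored.append((mean_rank, spread, site))
    return [site for _, _, site in sorted(scored)]


# Hypothetical enrichment signals for 4 sites in 3 replicates.
signals = [
    [2.0, 0.1, 0.5, 0.05],
    [1.8, 0.2, 0.6, 0.0],
    [2.2, 0.6, 0.4, 0.1],
]
site_order = rank_sites(signals)
meta_p = fisher_meta_p([0.01, 0.01])
```

Sites could then be segmented by thresholding the meta p-value, the enrichment signal, or both, in the spirit of the three criteria listed above.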