With the availability of complete genome sequences for a growing number of organisms, high-throughput methods for gene annotation and analysis of genome dynamics are needed. The application of whole-genome tiling microarrays for studies of global gene expression is providing a more unbiased view of the transcriptional activity within genomes. For example, this approach has led to the identification and isolation of many novel non-protein-coding RNAs (ncRNAs), which have been suggested to comprise a major component of the transcriptome and to have novel functions in epigenetic regulation of the genome. Additionally, tiling arrays have recently been applied to the study of histone modifications and methylation of cytosine bases (DNA methylation). Surprisingly, recent studies combining the analysis of gene expression (transcriptome) and DNA methylation (methylome) using whole-genome tiling arrays revealed that DNA methylation regulates the expression levels of many ncRNAs. Further capture and integration of additional types of genome-wide data sets will help to illuminate additional hidden features of the dynamic genomic landscape that are regulated by both genetic and epigenetic pathways in plants.
With the completion of several plant genome sequences, and with many more genomes underway, it is now realistic to undertake a variety of experimental and/or computational studies at the whole-genome level. One approach that takes advantage of this flood of sequence data to capture genome-wide gene expression information is microarray technology [1, 2, 3]. A recent derivation of the standard gene expression microarray is the tiling microarray; these are high-density arrays composed of oligonucleotide probes that span the entire genome of an organism. Numerous experimental procedures can be applied to these arrays without requiring a completely annotated genome. The increase in resolution now provided by tiling arrays will allow dramatic improvements in the understanding of many previously unexplored aspects of genome biology. For instance, using this unbiased approach, several plant and animal genomes have been interrogated for 1) alternative splice sites [6-8], 2) transcription unit mapping/genome annotation [9-12], 3) transcription factor binding sites [13-16], 4) comparative genome hybridization [17, 18], and 5) mapping of DNA methylation sites [19••-21]. In this review, we focus on some recent experiments taking advantage of whole-genome tiling array technology. These studies have revealed that a large portion of the genome previously thought of as being unexpressed actually codes for novel non-protein-coding RNAs (ncRNAs), which have been suggested to be involved in epigenetic regulation. Additionally, tiling arrays have been used to identify a novel epigenetic pathway located within heterochromatic regions (previously viewed as junk DNA) and regulated by DNA methylation. Thus, whole-genome tiling microarrays are proving to be a useful tool for functional genomic analyses of a number of biological processes, and more generally for understanding genome complexity.
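As an illustration of what "probes that span the entire genome" means in practice, the following minimal sketch cuts a sequence into evenly spaced probes on both strands. The probe length (25 bases), the spacing (35 bases), and the function names are hypothetical choices made for this example only; they do not describe the design of any array discussed in this review.

```python
# Illustrative sketch only: probe length and step are assumed values,
# not the parameters of any published tiling array.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(sequence, probe_len=25, step=35):
    """Return (start, strand, probe) tuples tiling `sequence` on both strands."""
    probes = []
    for start in range(0, len(sequence) - probe_len + 1, step):
        forward = sequence[start:start + probe_len]
        probes.append((start, "+", forward))
        probes.append((start, "-", reverse_complement(forward)))
    return probes


toy_genome = "ACGT" * 25  # 100-base toy sequence standing in for a chromosome
probes = tile_probes(toy_genome)
for start, strand, probe in probes:
    print(start, strand, probe)
```

Because the probes are derived from the sequence alone, no gene annotation is needed to build the array, which is the property the text above relies on.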
Although great advances have been made in gene discovery approaches, simply applying computational gene prediction methods for annotation of genome sequences is not sufficient for accurate gene structure determination and/or identification of all transcription units of an organism. Additionally, large-scale cloning and sequencing of complementary DNA (cDNA) molecules corresponding to expressed gene products, the traditional approach for identifying coding regions, often misses very low abundance and non-polyadenylated transcripts. Furthermore, cDNA collections are often devoid of transcripts that are expressed only in response to specific physiological or environmental conditions. To circumvent such problems, Kapranov et al. and Shoemaker et al. designed tiling DNA microarrays using sequences homologous to human chromosomes 21 and 22. They used these arrays for gene expression studies by hybridizing targets made from RNA samples extracted from 11 human cell lines. The data obtained from these expression studies identified a large number of novel sites of active gene expression that were neither annotated by computational gene prediction algorithms using DNA sequencing data nor identified in massive collections of sequenced human cDNAs. Additionally, Yamada et al. used high-density tiling DNA microarrays containing probes with homology to the entire Arabidopsis genome. These arrays were used for gene expression studies by hybridizing targets made from RNA samples of four different tissues (flower, leaf, root, cultured cell). Similar to the expression studies done with human samples, the Arabidopsis studies also identified a large number of novel sites of active gene expression missed by computational gene prediction algorithms and cDNA collections.
Interestingly, many of the newly identified transcripts were expressed from the opposite DNA strand in the reverse orientation (anti-sense) to many previously annotated transcripts. Additionally, the study of Yamada et al. was able to capture a number of novel transcripts originating in centromeric regions, which were previously thought to be mostly devoid of active gene expression. More recently, Li et al. hybridized RNA samples from the indica rice subspecies to high-density tiling microarrays containing probes spanning the entire rice genome. These studies identified a large number of novel sites of active gene expression never before annotated in this model crop genome. Overall, these studies demonstrated that tiling microarrays can be successfully applied for discovery of novel sites of active transcription that lie within the “dark matter” in genomes.
The methylation of cytosine bases within DNA molecules (DNA methylation) is a heritable epigenetic modification that has been previously demonstrated to regulate the expression of a number of genes without permanent changes to their DNA sequence [24, 25]. The regulation of gene expression by DNA methylation can occur in cis (the gene itself is methylated) or in trans (methylation at another site in the genome regulates the target gene) [26, 27]. Recently, whole-genome tiling microarrays were used to map every site of DNA methylation within the Arabidopsis genome, thus resulting in the first genome-wide high-resolution methylation maps (the methylome) [19••, 20•]. To do this, these groups used an antibody that specifically recognized methylated cytosine bases within genomic DNA. Therefore, a few hundred base pairs surrounding each region of methylated DNA could be specifically immunoprecipitated using this antibody. The data obtained from these experiments demonstrated that regions containing the highest density of DNA methylation were located in highly repetitive DNA sequences. For instance, centromeric and peri-centromeric repeats of all chromosomes and areas containing large amounts of heterochromatin, like the knob region found on chromosome 4 of Arabidopsis, were found to be highly enriched in DNA methylation sites. Interestingly, the genome-wide methylome maps also revealed that over one-third of all previously annotated genes in Arabidopsis contain sites of DNA methylation within their transcribed regions, consistent with a previous report. Furthermore, most of the genes that were methylated within their transcribed regions were highly expressed and constitutively active. Additionally, these comprehensive, whole-genome studies demonstrated that the distribution of DNA methylation is clearly different between transposons and genes with annotated function.
While DNA methylation of transposons is distributed across their entire length, including both upstream and downstream regions, genes with annotated function tended to contain methylation sites with a biased distribution towards their 3′ half (Figure 1A). This distribution of DNA methylation may relate to a possible role in silencing of transcription start sites located in the 3′ region of open reading frames (ORFs). Surprisingly, only ~5% of annotated genes contained sites of DNA methylation in regions upstream of their ORFs (promoter regions). In fact, these data suggested that promoter regions are hypomethylated in general. Taken together, such findings suggest that the distribution of DNA methylation may determine how expression from specific genes is affected by this epigenetic mark.
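A positional bias of this kind can be quantified by expressing each methylated site as a fraction of the gene length measured from the 5′ end, taking strand into account. The sketch below is a hypothetical illustration with made-up coordinates; the function names and the simple "3′ half" cutoff at 0.5 are assumptions for this example, not the analysis used in the cited studies.

```python
# Illustrative sketch only: coordinates are invented toy values.
def relative_position(site, gene_start, gene_end, strand):
    """Map a genomic coordinate to 0.0 (5' end) .. 1.0 (3' end) of a gene."""
    frac = (site - gene_start) / (gene_end - gene_start)
    # On the minus strand the 5' end is at the higher coordinate.
    return frac if strand == "+" else 1.0 - frac


def fraction_in_3prime_half(sites, gene_start, gene_end, strand):
    """Fraction of methylated sites inside the gene that lie in its 3' half."""
    inside = [s for s in sites if gene_start <= s <= gene_end]
    if not inside:
        return None  # no methylated sites fall within this gene
    hits = sum(
        relative_position(s, gene_start, gene_end, strand) > 0.5 for s in inside
    )
    return hits / len(inside)


toy_sites = [1100, 1200, 1900, 2500]  # hypothetical methylated positions
plus_bias = fraction_in_3prime_half(toy_sites, 1000, 2000, "+")
minus_bias = fraction_in_3prime_half(toy_sites, 1000, 2000, "-")
print(plus_bias, minus_bias)
```

The same sites give opposite answers on the two strands, which is why the orientation of each gene has to be considered before methylation profiles are averaged across genes.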
Interestingly, the methyl groups added to cytosine bases within DNA molecules during the process of DNA methylation can also be removed by so-called DNA demethylases (Figure 1A). There are four such DNA demethylases in Arabidopsis: REPRESSOR OF SILENCING1 (ROS1), DEMETER (DME), DEMETER-LIKE2 (DML2), and DEMETER-LIKE3 (DML3). DME has previously been demonstrated to be required for genomic imprinting during Arabidopsis embryo development, while the closely related ROS1 is involved in regulating transcriptional gene silencing in a transgenic background mediated by DNA methylation. A recent study by Penterman et al. [32••] using tiling microarrays to study DNA methylation patterns in wild-type (WT) and mutant plants lacking three of the DNA demethylases (ROS1, DML2, DML3) identified 179 loci that are actively demethylated by DML enzymes in Arabidopsis. Therefore, these mutant plants (ros1-1 dml2-1 dml3-1) exhibited locus-specific DNA hypermethylation in response to the loss of these three DNA demethylases. Notably, demethylation by DML enzymes in gene coding regions primarily occurs at the 5′ and 3′ ends (Figure 1A), which is a pattern opposite to the overall distribution of WT DNA methylation (see above). Therefore, DNA methylation, which is widely considered a stable epigenetic mark, is actively removed, likely to protect genes from potentially deleterious methylation.
Previously, it was determined that DNA methylation and demethylation are critical for controlling the level of gene expression from loci such as FWA, SUPERMAN [33, 34], PAI [35, 36], and BAL. Additionally, it was suggested that DNA methylation may be involved in suppressing expression of transposons, but experimental evidence for this claim was limited [38-41]. Certain types of transposons are abundant in chromosomal regions near centromeres and constitute a large fraction of the DNA sequences found in heterochromatic knobs. Lippman et al. undertook gene expression studies and methylation mapping using microarrays printed with one-kilobase (kb) PCR products that tiled the entire heterochromatic knob of Arabidopsis chromosome 4. Upon hybridizing target RNA samples extracted from met1 mutant plants, which contain a mutation in the gene encoding the maintenance DNA methyltransferase (MET1) required for symmetric (CG) methylation, they found that transposons and pseudogenes normally not expressed in WT plants were reactivated. Overall, these array studies revealed that in plants containing genetic lesions in MET1, gene expression within highly heterochromatic and repetitive sequences was extremely elevated, mainly due to the expression of normally silenced transposons and pseudogenes. More recently, using whole-genome tiling arrays, Zhang et al. [19••] and Zilberman et al. [20•] also reported massive transcriptional reactivation of pseudogenes and transposons in met1 mutant plants at the level of the entire genome, including those found in chromosomal regions rich in heterochromatin and repetitive sequences.
In addition to a maintenance methyltransferase, plants also have three methyltransferases that are required for de novo (non-CG) DNA methylation: DRM1, DRM2, and CMT3. drm1 drm2 cmt3 triple mutant plants contain genetic lesions in the genes encoding all three de novo methyltransferases, and lose a great deal of the non-CG DNA methylation sites in the Arabidopsis genome. By hybridizing target RNA samples extracted from drm1 drm2 cmt3 triple mutant plants to whole-genome tiling arrays, Zhang et al. [19••] found that relatively few transposons or pseudogenes were reactivated (in terms of changes in steady-state RNA accumulation). In fact, the general transcriptional activity within these triple mutant plants was similar to WT plants at the whole-genome level. These findings are consistent with the observation that CG methylation was largely unchanged in heterochromatic genomic regions of triple mutant plants. Furthermore, they suggest that in most cases maintenance (CG) methylation alone is sufficient for the silencing of gene expression from methylated transposons and pseudogenes, and is largely unchanged even in the absence of non-CG methylation. Taken together, the results of these gene expression studies of methyltransferase mutants suggest that the widespread presence of transposons in the centromeres of Arabidopsis chromosomes may reflect their importance in heterochromatin formation and/or function [43••].
One of the most interesting findings from a number of recent high-throughput genomic analyses is the identification and isolation of a large class of previously uncharacterized non-protein-coding RNAs (ncRNAs) (Figure 1B-C). These transcripts were found to be expressed in regions not previously annotated as containing an actively expressed genic unit [9, 44]. This class of RNAs seems to represent a major component of the transcriptome that has been suggested to play a novel role in epigenetic gene regulation. Furthermore, a large fraction of these epigenetic regulatory ncRNAs are transcribed by RNA Polymerase II (RNAPII). One model for gene silencing mediated by this new class of RNAs suggests that expression of the ncRNA from the same or opposite strand could interfere with expression of a target gene, such as the case for the SRG1 ncRNA and its target SER3 (Figure 1B), the Tsix ncRNA and its ncRNA target Xist, and the Air ncRNA and its target Igf2r.
Interestingly, the previously mentioned gene expression studies of DNA methyltransferase (met1 and drm1 drm2 cmt3) mutants also identified a large number of ncRNAs whose expression levels increased in these plants [19••]. Therefore, genome-wide gene expression data in combination with mapping of the sites of DNA methylation suggested that the increased expression of these ncRNAs may indeed be a by-product of the loss of DNA methylation, which when present would silence expression from these loci (Figure 1C). Furthermore, a recent report by Kapranov et al. [48••] demonstrated that within mammalian genomes there are numerous small ncRNAs whose expression originates in sites of high-density DNA methylation (CpG islands), suggesting the existence of ncRNAs in mammals that may also be regulated by DNA methylation.
A number of the novel transcripts found in the Arabidopsis genome seem to be repeated many times in the genome (multi-copy elements), thereby suggesting they may represent unannotated transposons [19••]. Furthermore, this class of novel ncRNAs has been suggested to silence neighboring genes through their expression by 1) RNA interference (RNAi) [47, 49] and 2) occlusion of promoter binding by RNA polymerase (RNAP) and other transcription factors (Figure 1B) [46, 50]. Additionally, others of these novel ncRNAs were found to be single or low-copy genic units within the genome [19••]. Many of the ncRNAs in this class have no significant sequence homology to the genomes of other organisms, thus suggesting that the Arabidopsis genome contains a large number of fast-evolving ncRNAs whose expression is controlled epigenetically by DNA methylation. Indeed, a very large fraction of ncRNAs are poorly conserved, which has led to the speculation that they are not functional. However, poor conservation does not necessarily imply a lack of functionality, seeing that the DNA sequences of known functional ncRNAs such as Xist and Air are poorly conserved. Additionally, an independent study on expression of intergenic sequences in human and chimpanzee showed conservation of expression of ncRNAs in equivalent genomic positions, but not conservation of the RNA sequences. Therefore, these unique ncRNAs may have originated from evolutionary responses to the growth environment. For example, since plants are sessile organisms, they may have evolved novel genes/processes to rapidly regulate/respond to environmental changes. Interestingly, previous whole-genome gene expression studies using Arabidopsis cultured cells by Yamada et al. revealed that many pseudogenes, transposons, and ncRNAs are over-expressed in these de-differentiated cells, similar to the transcriptome profile witnessed for met1 mutant plants.
Together, these results suggest that DNA methylation likely plays a role in cellular de-differentiation and regeneration through its regulation of gene expression from putative pseudogenes, transposons, and ncRNAs that litter the Arabidopsis genome.
Massive sequencing efforts have been undertaken to isolate and characterize as many expressed gene products as can possibly be detected from large-scale cDNA libraries. Interestingly, a number of groups involved in these sequencing efforts recognized that there were a far greater number of expressed gene products in their cDNA libraries than potential coding regions predicted by computational methods [44•, 55, 56]. Upon further examination, these groups found that 70% of the cDNAs that did not fall into potential coding regions were actually RNAs derived from the strand of DNA opposite to ORFs for computationally predicted protein-coding genes (anti-sense transcripts) [9, 57••]. More recently, anti-sense transcripts have been found to regulate expression from the ORF on the opposite strand (sense transcript) through formation of a double-stranded RNA (dsRNA) molecule consisting of the sense/anti-sense pair [57••]. For instance, Borsani et al. [58••] reported a natural cis-anti-sense pair of transcripts involved in the regulation of salt tolerance in Arabidopsis. This anti-sense gene pair consists of P5CDH (encodes Δ1-pyrroline-5-carboxylate dehydrogenase) and a gene of unknown function (SRO5) that is only expressed upon salt stress conditions. Co-expression of these two genes results in the formation of a dsRNA molecule consisting of the very 3′ ends of both transcripts (see Figure 1D). Subsequently, this dsRNA acts as a substrate for small interfering RNA (siRNA) biogenesis. These siRNAs direct initial cleavage of P5CDH, thus resulting in an RNA molecule that can act as a substrate for further siRNA biogenesis. Ultimately, this cis-anti-sense pair of transcripts results in the down-regulation of P5CDH levels, and improved salt tolerance (Figure 1D).
Additionally, with the assistance of whole-genome tiling array data, Swiezewski et al. [59•] found that an anti-sense transcript overlapping the 3′ end of RNA for the repressor of Arabidopsis flowering time, FLOWERING LOCUS C (FLC), is involved in the regulation of flowering time. Furthermore, this group was able to demonstrate that the anti-sense transcript may also act as a biogenesis substrate for siRNAs that are responsible for the heterochromatization and subsequent silencing of this genomic region. Thus far, dsRNA molecules (sense/anti-sense pairs) have been demonstrated to regulate sense gene expression by 1) inhibition of transcript elongation by RNAP or 2) acting as a substrate for siRNA biogenesis, which can result in heterochromatin formation and subsequent silencing of the homologous chromosomal location [57-59•].
Previous studies of DNA methylation in a number of organisms have illuminated that a large number of genes are methylated in their coding regions (body-methylated), and suggested this type of methylation can negatively regulate expression from the anti-sense strand [60-62]. Furthermore, the recent combinatorial whole-methylome and transcriptome analyses performed by Zhang et al. [19••] and Zilberman et al. [20•] demonstrated that the majority of constitutively expressed genes are body-methylated (Figure 1A). Zhang et al. tested the hypothesis that body-methylation in ORFs negatively regulates anti-sense expression. To do this, the authors analyzed the whole-genome gene expression data for met1 mutant plants with the hypothesis that if intragenic “body methylation” silences anti-sense expression in WT plants, then the mutant plants should have increased levels of anti-sense transcripts. However, they found no evidence for a large-scale increase in anti-sense transcription in met1 mutant plants, which have lost most of the DNA methylation in the body of ORFs, compared to WT plants. These results are consistent with several recent reports concerning the regulation of anti-sense gene expression [63, 64]. However, there are possible explanations for why increased anti-sense expression in response to loss of DNA methylation might not be observed. For example, many anti-sense RNAs are likely not to be polyadenylated [65•] and thus, these transcripts would be missed in studies that exclusively use poly-A+ RNA as targets. Additionally, expression of anti-sense transcripts may be regulated by DNA methylation in a tissue or cell-type specific manner. Therefore, understanding of the functions and regulation of DNA methylation and its possible role in anti-sense transcription is at an early stage.
In addition to DNA methylation, the covalent addition of methyl groups to lysine or arginine residues of histone tails (histone methylation) has emerged as another crucial step in controlling eukaryotic genome dynamics [66-71]. For instance, trimethylation of lysine 27 of histone H3 (H3K27me3) plays critical roles in regulating animal development [66-68]. Furthermore, H3K27me3 is required for proper regulation of several genes important for development in plants [69-71]. Recently, genome-wide profiling of H3K27me3 in the Arabidopsis genome was carried out using tiling microarrays [72••]. Interestingly, the results from this study suggest that H3K27me3 is a major silencing mechanism in plants that regulates an unexpectedly large number of Arabidopsis genes. Furthermore, analysis of the whole-genome profile of H3K27me3 suggested that establishment and maintenance of this specific epigenetic modification is largely independent of other epigenetic pathways, such as DNA methylation or RNA silencing. Therefore, the use of whole-genome tiling microarrays to study a variety of epigenetic regulatory pathways in plants and animals has suggested an extremely complex network of mechanisms involved in regulation of genome dynamics. Furthermore, only now, through the combination of these various data types, can we begin to gain an appreciation for the complex nature of regulatory mechanisms employed in controlling genome dynamics.
The use of unbiased genome-wide approaches for characterization of the genomic landscape in numerous organisms is well underway. For instance, in Arabidopsis alone, high-throughput microarray analyses of the transcriptome, of the DNA and histone methylomes, and, more recently, of diversity among various ecotypes are now providing reams of genome data for this reference plant. In addition to microarrays, new technologies for extremely deep DNA sequencing (e.g., Solexa/Illumina, ABI SOLiD, 454/Roche) are also rapidly emerging. For instance, a recent report by Barski et al. [74••] demonstrates high-resolution profiling of 20 histone modifications and protein-DNA interactions in the human genome at single-base resolution using Solexa 1G sequencing.
Ultra high-throughput DNA sequencing can be viewed as a “virtual” tiling array, where hundreds of millions of short read sequences will push genomic analysis to an even higher level of resolution [74••, 75••]. Unlike microarrays, next-generation DNA sequencing technologies are not limited to sequenced genomes. Analysis of short nucleotide fragments, like transcription factor binding sites [74••, 75••] and small RNAs (smRNAs), can result in direct identification of the sequences of these genomic targets. Additionally, next-generation sequencing can also be employed for whole-transcriptome sequencing and for large-scale genome (re)sequencing for the detection of genetic variation amongst various individuals within a population. Interestingly, a recent study by Euskirchen et al. compared strategies for mapping transcription factor (TF) binding sites in the human genome using chromatin immunoprecipitation (ChIP) in combination with two different whole-genome methodologies, DNA microarray analysis (ChIP-chip) and DNA sequencing (ChIP-PET). Comparison of these two methods revealed strong agreement for the highest ranked binding sites, with less overlap for the lowest ranked targets. With advantages and disadvantages unique to each approach, this study showed that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect transcription factor binding site targets, whereby the most comprehensive and complete list of binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Therefore, whatever the technology to be employed, these genome-wide approaches have already begun to change the scope of our understanding of genome dynamics in only a few short years. Their continued development and implementation for capturing a deeper spectrum of genome-wide data will provide for an even greater understanding of the complexity of genetic and epigenetic control mechanisms.
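The comparison of two binding-site lists described above (agreement among top-ranked sites, then merging both lists into one set of regions) can be sketched as follows. All coordinates, the half-open interval convention, and the function names are hypothetical and chosen only for illustration; this is not the procedure used in the cited study.

```python
# Illustrative sketch only: sites are invented (chrom, start, end) tuples,
# half-open, with the first list ordered from best- to worst-ranked.
def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_confirmed(sites, other_sites):
    """Fraction of `sites` overlapped by at least one site in `other_sites`."""
    if not sites:
        return 0.0
    confirmed = sum(any(overlaps(s, o) for o in other_sites) for s in sites)
    return confirmed / len(sites)


def merge_sites(sites_a, sites_b):
    """Union of two site lists, fusing overlapping intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(sites_a + sites_b):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


chip_chip = [("chr1", 100, 200), ("chr1", 500, 600),
             ("chr2", 50, 150), ("chr2", 900, 1000)]
chip_pet = [("chr1", 150, 250), ("chr1", 520, 580), ("chr3", 10, 60)]

top_agreement = fraction_confirmed(chip_chip[:2], chip_pet)
low_agreement = fraction_confirmed(chip_chip[2:], chip_pet)
combined = merge_sites(chip_chip, chip_pet)
print(top_agreement, low_agreement, combined)
```

In this toy data the two top-ranked array sites are both confirmed by the sequencing list while the lower-ranked ones are not, and the merged list contains regions that either method alone would have missed.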
Work in our laboratory is supported by grants to J.R.E. from the National Science Foundation, the Department of Energy, and the National Institutes of Health. B.D.G. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation.