was selected to be the first tree genome to be sequenced, mainly because of its extraordinarily rapid growth rate and its relatively compact genome size (450–500 Mbp [1]). Biofuels are produced mainly from two sources: crops high in sugar or cellulose, e.g. sugarcane [3] and plants [4], and plants high in vegetable oils, such as soybean [5]. The rapid growth of Populus trichocarpa, coupled with its high lignocellulose content, has made it one of the model systems for the new generation of biofuels [4]. The current assembly of the poplar genome was released in June 2004, and its total length is ~485 Mbp. The 19 assembled chromosomes, which contain 7.66% gaps, account for 63.41% of the whole genome. Further efforts are still needed to close the gaps in the sequenced chromosomes.
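To make the assembly figures concrete, the short sketch below converts the quoted percentages into approximate lengths in megabase pairs. It assumes that the 7.66% gap fraction is measured relative to the 19 assembled chromosomes rather than to the whole genome; the text does not state this explicitly, and the function name `assembly_breakdown` is purely illustrative.

```python
def assembly_breakdown(total_mbp, chromosome_fraction, gap_fraction):
    """Convert assembly percentages into approximate lengths in megabase pairs.

    Assumption: gap_fraction is relative to the chromosome-assigned sequence,
    not to the whole genome.
    """
    chromosome_mbp = total_mbp * chromosome_fraction  # sequence placed on chromosomes
    gap_mbp = chromosome_mbp * gap_fraction           # gaps within those chromosomes
    unplaced_mbp = total_mbp - chromosome_mbp         # sequence not assigned to a chromosome
    return chromosome_mbp, gap_mbp, unplaced_mbp


# Figures quoted in the text: ~485 total, 63.41% on chromosomes, 7.66% gaps.
chromosome_mbp, gap_mbp, unplaced_mbp = assembly_breakdown(485.0, 0.6341, 0.0766)
print(f"on chromosomes: {chromosome_mbp:.1f}")
print(f"gaps:           {gap_mbp:.1f}")
print(f"unplaced:       {unplaced_mbp:.1f}")
```

Under this reading, roughly 308 megabase pairs are placed on chromosomes, of which about 24 are gaps, and roughly 177 remain unassigned.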
Repetitive elements represent a significant fraction of eukaryotic genomes: they can occupy as much as 80% of some land-plant genomes, such as wheat [6], and as little as 10–35% of the genomes of Arabidopsis thaliana [7] and rice [8]. There are three main classes of repetitive elements: local repeats (tandem and satellite repeats) [9], interspersed repeats (transposons), and segmental duplications (duplicated genomic segments). Among them, transposable elements are the most extensively studied, and they are classified as retrotransposons or DNA transposons according to whether they transpose through an RNA or a DNA intermediate [10]. Both interspersed repeats [11] and other duplicated elements [17] may induce homologous recombination and insertions/deletions in the host genome, which can make the repetitive regions of the host genome very difficult to assemble correctly.
Repetitive elements have typically been identified in a genome using two approaches: (1) identification of sequences homologous to known repetitive elements [18], and (2) identification of repeats by self-comparison of a given genome, followed by clustering them into families [19]. The first approach requires a library of manually curated repetitive elements, which may not be available for newly sequenced genomes, though it can identify the precise boundaries of repetitive elements, even for embedded partial copies. The second approach identifies repetitive elements in a de novo fashion, though it may require additional manual curation of the boundaries of the predicted elements.
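The contrast between the two approaches can be illustrated with a deliberately simplified sketch. It is not the method of the tools cited above: it assumes exact string matching in place of sequence alignment, it omits the clustering of repeats into families, and the function names, the toy sequence, and the library entry `toy_element` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_known_repeats(genome: str, library: Dict[str, str]) -> Dict[str, List[int]]:
    """Approach 1 (toy version): locate copies of known elements from a curated library.

    Exact matching stands in for the homology search a real tool would perform.
    """
    hits = {}
    for name, element in library.items():
        starts = []
        i = genome.find(element)
        while i != -1:
            starts.append(i)
            i = genome.find(element, i + 1)  # continue search; allows overlapping hits
        hits[name] = starts
    return hits


def find_repeated_kmers(genome: str, k: int, min_copies: int = 2) -> Dict[str, List[int]]:
    """Approach 2 (toy version): de novo self-comparison via a k-mer index.

    Every k-mer is indexed by start position; k-mers seen at least
    min_copies times are reported as candidate repeat seeds.
    """
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}


# Toy sequence containing two copies of the 7-base element ACGGATC.
genome = "TTACGGATCCAAACGGATCGG"
print(find_known_repeats(genome, {"toy_element": "ACGGATC"}))
print(find_repeated_kmers(genome, k=7))
```

Both calls recover the same two copies here, but only the first needs the element to be known in advance. The second reports fixed-length seeds rather than full elements, which loosely mirrors why de novo predictions may need their boundaries curated afterwards.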