Structural variation within the genome, including insertions, duplications, deletions, and inversions of up to several kilobase pairs, has recently been described in a variety of species, including humans [1-3], mice [4], rats [5], silkworms [6], Drosophila [7], and dogs [8]. These genomic variants have been found to be widespread, encompassing 5% of the human genome [9], and are thought to be involved in (co)determining complex phenotypes [10,11].
The contribution of structural variants (SVs) to complex phenotypes has been measured by testing for association between variation in gene expression levels (traits) and the presence of SVs. SNPs and SVs have been shown to account for 83.6% and 17.7%, respectively, of the total detected genetic variation in gene expression, with only limited overlap [12]. The effect of SVs on gene expression is likely underestimated, given the lower completeness and accuracy with which SVs could be queried at that time. In humans, SVs have been associated with sporadic and Mendelian diseases, such as Williams-Beuren syndrome, mental retardation, and red-green color blindness. SVs have also been associated with complex human traits, such as autism, schizophrenia, Crohn's disease, and susceptibility to HIV infection [13]. Because of these disease associations, the importance of SVs has become increasingly apparent [9,14,15]. For most other species, including the major farm animals (chickens, cattle, and pigs), the extent and biological consequences of SVs have remained largely unknown due to the lack of a cost-effective approach for detecting SVs.
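The phrase "limited overlap" for the 83.6% (SNP) and 17.7% (SV) figures can be made concrete with inclusion-exclusion. This is a worked illustration under an assumption not stated in the cited study's summary here: that both fractions are relative to the same total of detected expression variation and that, together, SNPs and SVs account for all of it. The symbols $f$ are notation introduced only for this calculation.

```latex
f_{\mathrm{SNP}\cap\mathrm{SV}}
  = f_{\mathrm{SNP}} + f_{\mathrm{SV}} - f_{\mathrm{SNP}\cup\mathrm{SV}}
  = 83.6\% + 17.7\% - 100\%
  = 1.3\%
```

Under that assumption, only about 1.3% of the detected signal would be shared between SNPs and SVs, which is consistent with the two classes of variation capturing largely distinct effects.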
Until recently, array comparative genomic hybridization (array-CGH) was the most commonly used method for detecting SVs [16]. Fosmid paired-end sequencing, a more laborious technique, has been used to detect SVs larger than 8 kb [17,18]. Because array-CGH cannot resolve smaller SVs, larger SVs are over-represented in current databases of structural variation (e.g., http://projects.tcag.ca/variation/). The resolution of array-CGH can be improved by using high-resolution whole-genome tiling arrays, but at extremely high cost. Moreover, most of these SVs have been identified by methods that do not resolve SV end points at the base pair level. In addition, methods like array-CGH rely on a reference genome that does not currently encompass all SVs present in the population and are therefore limited in scope: sequences that are absent from the reference genome, for example because the reference individual carries a deletion, are not represented on the array and cannot be analyzed for SVs.
Next generation sequencing (NGS) technology was recently shown to be a powerful alternative to array-CGH for identifying genomic structural variation [1,7,19]. With paired-end sequencing, SVs can be identified at single base pair resolution. Moreover, paired-end sequencing allows the detection of balanced rearrangements, in which no genomic region is gained or lost, such as inversions and translocations; these cannot be identified by array-CGH. Paired-end sequencing and mapping (PEM) involves sequencing both ends of DNA fragments of known insert size from a genomic DNA library and computationally mapping the reads to a reference genome. Read pairs whose mapped distance or orientation is inconsistent with the expected insert size and library orientation point to a structural difference between the sequenced sample and the reference.
Here, we used PEM on reduced representation libraries (RRLs) of pooled chicken DNA samples. In the chicken genome, only 43 (relatively large) SVs have been described thus far [20]. These SVs comprise 16 chicken-turkey inter-specific copy number variants (CNVs) and 32 chicken-duck inter-specific CNVs, five of which overlap with chicken-turkey CNVs [21]. In chicken, some phenotypes have already been linked to structural variation, including the pea-comb [22] and late feathering [23] phenotypes. By applying PEM to an RRL, we provide a cost-effective approach for exploring the presence of SVs at high resolution within four chicken breeds.
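The total of 43 previously described chicken SVs follows from the two inter-specific CNV sets by inclusion-exclusion, because the five chicken-duck CNVs that overlap chicken-turkey CNVs would otherwise be counted twice. Here $T$ and $D$ are notation introduced only for this check, denoting the chicken-turkey and chicken-duck CNV sets.

```latex
|T \cup D| = |T| + |D| - |T \cap D| = 16 + 32 - 5 = 43
```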