The sequence variants underlying phenotypic differences between individuals were once believed to consist mostly of small changes, such as single nucleotide polymorphisms (SNPs) [1]. However, recent studies comparing two or more genomes within a species have also commonly observed gene presence/absence (P/A) variation. Since Grant et al. found P/A polymorphisms in the RPM1 gene in Arabidopsis […], an increasing number of P/A polymorphisms have been reported among disease resistance genes in this model species [6] and in other land plants [8]. This phenomenon has also been described in the human genome [11], in the telomeric region of Drosophila […] and in bacterial genomes [15], which suggests that P/A polymorphisms have unique roles in species differentiation. Additionally, several human diseases have been associated with gene insertions or deletions [16]. In plants, there is evidence that P/A genes are involved in gene expression [18] and in the genomic noncollinearity implicated in heterosis [19]. These examples indicate the importance of P/A genes in the evolutionary history of various species.
The most commonly used definition of a P/A gene is a gene that is present at a particular locus in some individuals of a species but absent in others, although other definitions exist in the literature [6]. Under the narrow definition, a P/A gene is one that is present in one individual but absent from another anywhere in the genome. For example, it was reported in maize that 20% of genome segments (~10,000 genes or gene fragments) are not shared between the inbred lines B73 and Mo17 [8]. Yu et al. found that 2.2% and 3.3% of rice indica and japonica genes, respectively, are unique to each subspecies [20], while Ding et al. found that 5.2% of genes have P/A polymorphisms between Nipponbare and 93-11 [10]. However, a gene that is absent from a given locus in some genomes may still have a paralog at a different locus. Under a broad definition, which counts such locus-specific absences, an additional 4.7% of genes were classified as P/A genes among rice genomes [10]. Our study also uses the broad definition: a P/A gene is one found at a particular locus in some genomes but not in others.
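The difference between the narrow and broad definitions can be made concrete with a small toy example. The sketch below assumes a hypothetical representation in which each genome is a dictionary mapping a locus ID to the gene family annotated at that locus, with paralogs sharing a family name. The function names, locus IDs and gene names are illustrative only and do not reflect the data format or pipeline used in this study.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy format: genome -> {locus ID: gene family}. Paralogs at
# different loci share the same family name.
Genome = Dict[str, str]


def narrow_pa(genomes: Dict[str, Genome]) -> Set[str]:
    """Narrow definition: families present somewhere in at least one genome
    but absent from the whole of at least one other genome.
    Assumes at least one genome is supplied."""
    contents = [set(g.values()) for g in genomes.values()]
    return set().union(*contents) - set.intersection(*contents)


def broad_pa(genomes: Dict[str, Genome]) -> Set[Tuple[str, str]]:
    """Broad definition: (locus, family) pairs found at that locus in some
    genomes but not in others, even if a paralog exists elsewhere."""
    placed = [set(g.items()) for g in genomes.values()]
    return set().union(*placed) - set.intersection(*placed)


genomes = {
    "acc1": {"L1": "geneA", "L2": "geneB", "L3": "geneC", "L9": "geneC"},
    "acc2": {"L1": "geneA", "L2": "geneB", "L9": "geneC"},
    "acc3": {"L1": "geneA", "L9": "geneC"},
}
print(sorted(narrow_pa(genomes)))  # ['geneB']
print(sorted(broad_pa(genomes)))   # [('L2', 'geneB'), ('L3', 'geneC')]
```

In this toy case, geneC sits at locus L3 only in acc1. A paralog at L9 is present in every genome, so geneC counts as a P/A gene under the broad definition but not under the narrow one. This mirrors how the broad definition adds genes to the P/A count.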
Most land plants have evolved through whole-genome duplication followed by gene loss [21]. Such extensive rearrangements can produce a high proportion of P/A genes in plants. Transposable elements (TEs) are a dominant source of intraspecies diversity in maize [8], and large duplications can be another source of genetic variation [22]. In Arabidopsis, unequal and illegitimate recombination also play an important role in triggering large-scale indels [23]. The Arabidopsis genome is extremely redundant owing to segmental duplications and tandem arrays [24]. These features provide ample opportunity for unequal crossing over to generate P/A genes. Balancing selection is thought to be one of the mechanisms maintaining P/A polymorphisms, at least for some P/A disease resistance genes [6]. However, despite the large number of P/A polymorphisms detected, the mechanisms by which P/A genes are generated and maintained are complex and remain unclear.
Although P/A polymorphisms have been reported in several species [6], there is still no clear estimate of the number, proportion and variation pattern of P/A genes in any single species, because such studies require a large number of fully sequenced individual genomes. The recent release of 80 re-sequenced Arabidopsis genomes [26] provides a unique opportunity to study the characteristics of P/A genes systematically. By analyzing these data, we identified a remarkable number of P/A genes and estimated their total number and frequency distribution across worldwide Arabidopsis accessions. We also used this information to investigate how P/A gene patterns vary among accessions and to describe their preferred locations on chromosomes. Finally, we analyzed the relationship between the diversity and frequency of P/A genes to explore the natural selection pressures and other evolutionary forces acting on P/A genes in Arabidopsis populations, as well as the mechanisms that generate them.
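The sketch below gives a rough illustration of what a P/A frequency distribution means. It takes a hypothetical 0/1 presence matrix (genes by accessions) and tallies how many P/A genes are present in exactly k accessions. It assumes complete presence/absence calls for every gene and accession. Handling missing data, coverage thresholds and gene calling is outside its scope, and it is not the procedure used in this study.

```python
from collections import Counter
from typing import Dict, List


def pa_frequency_spectrum(matrix: Dict[str, List[int]]) -> Counter:
    """Tally P/A genes by the number of accessions in which they are present.

    `matrix` (hypothetical format) maps a gene ID to a 0/1 presence vector,
    one entry per accession, in the same accession order for every gene.
    Genes present in all accessions or in none are not P/A and are skipped.
    """
    spectrum: Counter = Counter()
    for calls in matrix.values():
        n_present = sum(calls)
        if 0 < n_present < len(calls):
            spectrum[n_present] += 1
    return spectrum


# Toy matrix over four accessions (illustrative data, not real calls).
matrix = {
    "g1": [1, 1, 1, 1],  # present everywhere: not P/A
    "g2": [1, 0, 0, 0],  # present in 1 accession
    "g3": [1, 1, 0, 1],  # present in 3 accessions
    "g4": [0, 1, 0, 0],  # present in 1 accession
}
spectrum = pa_frequency_spectrum(matrix)
for k in sorted(spectrum):
    print(f"present in {k}/4 accessions: {spectrum[k]} P/A gene(s)")
```

Dividing each k by the number of accessions turns these counts into the population frequencies of P/A genes. Those frequencies can then be compared with diversity measures.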