Understanding structure and function of human genome requires knowledge of genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.
Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After the exclusion of those in repetitive DNA, 1,429 or 23% of indels still remained. This group was characterized according to the local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were in proximity of known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected from random expectations, suggesting that selection is a factor in limiting their persistence. A subset of indels from coding regions was experimentally validated and their impacts were predicted based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans.
Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.