The rhesus macaque (Macaca mulatta) and human (Homo sapiens) are thought to have shared a common ancestor approximately 25 million years ago [1]. Because of their genetic, physiological and behavioral similarities to humans, and because of their hardiness, adaptability and availability, rhesus macaques have been widely used as a model in biomedical research [2]. Humans are presently the most numerous and widespread of primates. Furthermore, the hominid apes representing the ancestral lineage of humans were geographically widespread, with fossils found in both Africa and Asia. However, the human diaspora is relatively recent, with our African ancestry dating back only 80,000 to 150,000 years before present [4]. In addition, the worldwide human population numbered as few as one million as recently as 100,000 years ago [5], and because of limited dispersal and gene flow, effective population sizes were much smaller still. Substantial evidence indicates that the neutral genetic diversity of humans has been shaped, and in fact restricted, by an effective population size that until recently was less than 8,000 [6]. The current geographic range of the rhesus macaque extends from Afghanistan to the East China Sea. Its population presently numbers in the millions, and among primate species the rhesus macaque is exceeded in range and population size only by humans [7]. Fossil evidence indicates that the genus Macaca originated in North Africa and dispersed to various sites in Asia at least three million years ago [8]. The rhesus macaque has adapted to a variety of natural environments, including savannah and forests, and to various climatic zones. Rhesus macaques also thrive in cities, where they live side by side with humans. This diversity of environmental adaptations, together with large current and ancestral population sizes, suggests that the rhesus macaque genome may harbor more neutral and selectively significant genetic variation than the human genome. Understanding the differences in genomic diversity between macaques and humans, especially in functionally important genomic regions, will not only provide valuable information on their evolutionary dynamics but also improve the utility of the rhesus macaque as a non-human primate model in biomedical research.
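The argument that effective population size constrains neutral diversity can be made concrete with the standard neutral-theory expectation for a diploid population at mutation-drift equilibrium under the infinite-sites model, where θ is the expected per-site neutral diversity, N_e the effective population size and μ the per-site, per-generation mutation rate. The worked numbers below are purely illustrative: N_e is the upper bound quoted above for humans, and μ is an assumed round value, not an estimate from this study.

```latex
\theta = 4 N_e \mu
\qquad\Longrightarrow\qquad
\theta \approx 4 \times 8{,}000 \times 2.5\times10^{-8} = 8\times10^{-4}\ \text{per bp} \approx 0.8\ \text{per kb}
```

Because θ scales linearly with N_e, a species with a tenfold larger long-term effective population size and a similar per-generation mutation rate would be expected to carry roughly tenfold more neutral diversity; differences in mutation rate between species would modify this scaling.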
The rhesus macaque is a genetically diverse primate. Consistent with a high degree of genetic variation, substantial morphological variation has been observed both within and between rhesus macaque populations, with as many as 13 subspecies identified [9]. Within the species there is some evidence of genetic distinctiveness among populations at the molecular level, and Indian rhesus macaques may be among the least diverse [10]. Several studies using protein polymorphisms have found higher levels of diversity in rhesus macaques from China (where there are also more subspecies) than in those from India, and there is some evidence for a genetic bottleneck in Indian rhesus macaques [9]. However, substantial gene flow probably occurred later, which could have replenished genetic variation. In a study of six rhesus macaque populations, including Indian, Burmese and four Chinese populations, Indian macaques had one-third to one-sixth the mitochondrial DNA diversity of four of the other populations. However, Indian macaques were approximately equal in diversity to rhesus macaques from one of the western Chinese populations [9]. A recent study using more than 1,000 single nucleotide polymorphisms (SNPs), which are more mutationally stable than other types of genetic markers, found that Indian and Chinese rhesus macaques were nearly identical in genetic diversity [11]. Taken together, the evidence suggests that the rhesus macaque is a genetically diverse primate species, but that Indian macaques are, if anything, among its least heterogeneous populations. Genomic analysis of rhesus macaques of Indian origin, which are more commonly used in biomedical research, would thus provide a conservative estimate of rhesus macaque variability.
A single rhesus macaque of Indian origin was the source of the macaque draft genome sequence, which was completed in 2007 [3]. This draft sequence made it possible to map the amount and type of macaque genomic variation. Furthermore, characterization of genetic variation in macaques would greatly improve the value of the rhesus macaque as an animal model for biomedical research and human biology. However, only 8,134 macaque SNPs are currently recorded in dbSNP (Build 135, http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi). In 2007, Malhi et al. reported approximately 23,000 candidate SNPs detected by pyrosequencing [12]. Fawcett et al. recently reported 3 million SNPs in Indian-origin rhesus macaques using SOLiD re-sequencing combined with previous sequencing data [13]. Some 22 million SNPs are known in humans. The number of SNPs in the macaque is unknown but may be much larger. Based on their observation of 3 million SNPs, Fawcett et al. suggested that the rhesus macaque is at least as diverse as humans, or more so, but they did not analyze the two species with an equivalent approach and refrained from the direct quantitative comparison that we perform here.
There have been a few more limited efforts to estimate the diversity of the rhesus macaque relative to humans. In the original sequencing of a single animal, and consistent with a larger effective population size of the macaque over evolutionary time, the macaque appeared to have higher sequence diversity than humans [3]. SNP density was broadly estimated to range between 1 and 7.8 SNPs/kb [3]. However, the number of loci on which this conclusion was based was relatively small, and the loci were not selected in an unbiased fashion. In this study, we used SNPs identified in an equivalent manner in 14 humans and 14 rhesus macaques (mostly of Indian origin) by massively parallel sequencing, with both H3K4me3 (trimethylated histone H3 lysine 4) ChIPseq (chromatin immunoprecipitation followed by massively parallel DNA sequencing) and RNAseq (whole-transcriptome massively parallel shotgun sequencing) as sources of sequenced fragments. From more than 16,000 genic regions, some half a million macaque SNPs, most of them newly identified, were further analyzed. By surveying diversity in the tissue-specific transcriptomes and histone-marked regions of the two species, we were able to compare diversity in equivalent, functionally relevant regions and to detect effects of selection and drift on sequence substitutions, without the use of whole-genome sequencing or DNA capture technology (which was not available for the macaque).
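Comparing diversity between species is meaningful only after SNP counts are normalized for the amount of sequence actually assayed and for the number of chromosomes sampled; raw totals such as 22 million versus 3 million SNPs are not directly comparable. The sketch below shows one standard normalization, Watterson's estimator θ_W = S / (a_n L), where S is the number of segregating sites, L the number of callable bases and a_n the harmonic sum over n sampled chromosomes. It is a minimal illustration assuming diploid individuals, biallelic SNPs, error-free genotypes and a known callable length; the function names and example counts are hypothetical, and this is not presented as the pipeline used in this study.

```python
"""Hypothetical sketch: normalizing SNP counts for a cross-species comparison.

Assumes diploid individuals, biallelic SNPs, no genotyping errors, and a known
number of callable (adequately covered) bases. Not the pipeline of this study.
"""


def snp_density_per_kb(num_snps: int, callable_bp: int) -> float:
    """Return SNPs per kilobase of callable sequence."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return num_snps / callable_bp * 1_000


def harmonic_a(num_chromosomes: int) -> float:
    """Watterson's a_n = sum_{i=1}^{n-1} 1/i for n sampled chromosomes."""
    if num_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    return sum(1.0 / i for i in range(1, num_chromosomes))


def watterson_theta_per_bp(num_snps: int, callable_bp: int, num_individuals: int) -> float:
    """Watterson's estimator theta_W = S / (a_n * L), with n = 2 * individuals."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    n_chromosomes = 2 * num_individuals  # diploid assumption
    return num_snps / (harmonic_a(n_chromosomes) * callable_bp)


if __name__ == "__main__":
    # Made-up counts for two species, each sampled with 14 individuals.
    callable_length = 1_000_000
    theta_a = watterson_theta_per_bp(3_000, callable_length, 14)
    theta_b = watterson_theta_per_bp(1_000, callable_length, 14)
    print(f"density A: {snp_density_per_kb(3_000, callable_length):.2f} SNPs/kb")
    print(f"theta_W A: {theta_a:.2e} per bp, B: {theta_b:.2e} per bp")
    print(f"ratio A/B: {theta_a / theta_b:.2f}")
```

When both species are sampled with the same number of individuals, a_n is identical for the two, so the ratio of θ_W values reduces to the ratio of SNP densities per callable base; the a_n correction matters mainly when sample sizes differ.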