The DGRP lines, sequences, variant calls, phenotypes, and web tools for molecular population genomics and GWA analysis are publicly available. The DGRP lines contain at least 4,672,297 SNPs, 105,799 polymorphic microsatellites and 36,810 TEs, as well as insertion/deletion events and copy number variants, and are a valuable resource for understanding the genetic architecture of quantitative traits of ecological and evolutionary relevance as well as Drosophila models of human quantitative traits. These novel mutations have survived the sieve of natural selection and will enhance the functional annotation of the Drosophila genome, complementing the Drosophila Gene Disruption Project43 and the Drosophila
Genome-wide molecular population genetic analyses show that patterns of polymorphism, but not divergence, differ by autosomal chromosome region, and between the X chromosome and autosomes. Polymorphism is lower in centromeric than in non-centromeric regions of the autosomes, but not of the X chromosome. We hypothesize that the correlation of polymorphism with recombination in regions where recombination is < 2 cM/Mb is due to the reduced effective population size in regions of low recombination9. Selection is less efficient in regions of low recombination33, consistent with our observation that the fractions of strongly deleterious mutations and positively selected sites are reduced in these regions.
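The reasoning behind this hypothesis can be made explicit under the standard neutral model, which is an idealization assumed here for illustration rather than the model fitted in this study. With $N_e$ the local effective population size, $\mu$ the per-site mutation rate and $T$ the divergence time (symbols introduced here, and ignoring ancestral polymorphism), expected nucleotide diversity scales with $N_e$ whereas expected neutral divergence does not:

```latex
\pi \approx 4 N_e \mu, \qquad
K \approx 2 \mu T, \qquad
\frac{\pi_{\text{low}}}{\pi_{\text{high}}} \approx \frac{N_{e,\text{low}}}{N_{e,\text{high}}}
```

Under these assumptions, a local reduction in $N_e$ lowers polymorphism without changing divergence, which matches the pattern described above.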
All molecular population genomic analyses support the ‘faster X’ hypothesis. Relative to the autosomes, the X chromosome exhibits lower polymorphism, faster rates of molecular evolution, a higher percentage of gene regions undergoing adaptive evolution, a higher fraction of strongly deleterious sites, and a lower level of weak negative selection and relaxation of selection. New X-linked mutations are directly exposed to selection each generation in hemizygous males, and the X chromosome has greater recombination than autosomes45; both of these factors could contribute to this observation.
GWA analyses of three fitness-related quantitative traits reveal hundreds of novel candidate genes, highlighting our ignorance of the genetic basis of complex traits. Most variants associated with the traits are at low frequency, and there is an inverse relationship between frequency and effect. Given that low frequency alleles are likely to be deleterious for traits under directional or stabilizing selection, these results are consistent with the mutation-selection balance hypothesis1 for the maintenance of quantitative genetic variation. Regression models incorporating significant SNPs explain most of the phenotypic variance of the traits, in contrast with human association studies, where significant SNPs have tiny effects and together explain a small fraction of the total phenotypic variance7. If the genetic architecture of human complex traits is also dominated by low frequency causal alleles, we expect estimates of effect size based on LD with common variants to be strongly biased downwards.
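The expected downward bias can be illustrated with a minimal sketch. It assumes a single biallelic causal variant with additive effect `a`, a single tag marker, and allele frequencies given for the two alleles that are in coupling phase (both at most 0.5). The function names and numerical values are illustrative assumptions, not quantities estimated in this study. Under these assumptions the regression slope at the marker is a·D/(p_m(1 − p_m)), and the share of the causal variance captured by the marker is r².

```python
import math


def max_r2(p_causal: float, p_marker: float) -> float:
    """Upper bound on r^2 between two biallelic loci, given the frequencies
    of the coupled alleles (assumed here to be at most 0.5 each)."""
    lo, hi = sorted((p_causal, p_marker))
    return (lo * (1.0 - hi)) / (hi * (1.0 - lo))


def marker_effect(a: float, p_causal: float, p_marker: float, r: float) -> float:
    """Expected regression slope at the marker when the linked causal allele
    has additive effect a and the two loci have LD correlation r."""
    scale = math.sqrt(
        p_causal * (1.0 - p_causal) / (p_marker * (1.0 - p_marker))
    )
    return a * r * scale


# Hypothetical values: a rare causal allele tagged by a common marker.
a = 1.0           # true effect of the causal allele (arbitrary units)
p_causal = 0.05   # frequency of the causal allele
p_marker = 0.40   # frequency of the tagging marker allele

r2_bound = max_r2(p_causal, p_marker)   # best case: LD as strong as possible
b_marker = marker_effect(a, p_causal, p_marker, math.sqrt(r2_bound))

print(f"maximum r^2 between causal variant and marker: {r2_bound:.3f}")
print(f"apparent effect at marker / true effect:       {b_marker / a:.3f}")
```

With these illustrative frequencies, even maximal LD leaves the common marker with an apparent effect of about one eighth of the true effect, and it captures under 8% of the variance attributable to the causal variant.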
In the future, the full power of Drosophila genetics can be applied to validating marker-trait associations using mutations, RNAi constructs and QTL mapping populations. The DGRP is an ideal resource for systems genetics analyses of the relationship between molecular variation, causal molecular networks, and genetic variation for complex traits4,39,46, and will anchor evolutionary studies in comparison with sequenced Drosophila species to assess to what extent variation within a species corresponds to variation among species.