By re-sequencing ~1,000 chromosomes ascertained from several population samples, we have catalogued novel coding and noncoding DNA variants in CRP. The vast majority of these novel SNPs were rare, consistent with previous observations from candidate gene deep re-sequencing projects (Glatt et al. 2001). We also identified common CRP variation in the Asian and Mexican-descent population samples that overlapped to a great extent with common variation identified by SeattleSNPs in the European-American and African-American population samples.
For CRP-specific genetic association studies, these re-sequencing data suggest that common CRP SNPs are in dbSNP and that common disease/common variant study designs using a tagSNP approach can be applied to several population samples with very little population-specific modification. That is, if the most diverse population (e.g., African-descent populations) is considered in tagSNP selection for CRP, genetic diversity for CRP in other population samples will be well represented. However, if only a less diverse population (e.g., European-descent populations) were considered for tagSNP selection, one tagSNP (rs3093058) associated with serum CRP levels (Carlson et al. 2005; D. C. Crawford et al., submitted) would be missed, and the CRP diversity observed in African-Americans and Mexican-Americans would not be fully represented. This issue should be considered when designing genetic association studies in non-European population samples using public SNP resources such as HapMap or dbSNP (Clark et al. 2005).
The re-sequencing data for CRP also suggest that coding variation, if not identified in a small re-sequencing sample, is very rare in the general population. These results are not entirely unexpected because the sample sizes used in variation discovery for the present study are based on population genetics theory (i.e., effective population size and mutation rate) and are designed to adequately estimate the allele frequency for SNPs with a minor allele frequency of >5–10% (Kruglyak and Nickerson 2001). Indeed, tagSNPs determined in SeattleSNPs were recently genotyped in three larger studies, which demonstrated that the allele frequencies estimated in the SeattleSNPs sample of 47 individuals are nearly identical to those estimated in the larger studies (Carlson et al. 2005; D. C. Crawford et al., submitted; L. A. Lange et al., in preparation).
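The reasoning about discovery sample size can be illustrated with a simple binomial calculation. The sketch below is not the calculation used in the present study or by Kruglyak and Nickerson (2001). It assumes, for illustration only, that the 2N chromosomes in a discovery panel are independent draws from a single population, and it asks how likely it is that a variant with a given minor allele frequency is seen at least once. The function name `detection_probability` is hypothetical.

```python
def detection_probability(maf: float, n_individuals: int) -> float:
    """Probability that the minor allele is observed at least once.

    Assumes diploid individuals and 2N independently sampled chromosomes
    from one population (an illustrative simplification).
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    n_chromosomes = 2 * n_individuals
    # P(at least one copy) = 1 - P(no copies among 2N chromosomes)
    return 1.0 - (1.0 - maf) ** n_chromosomes


if __name__ == "__main__":
    # 47 is the SeattleSNPs sample size cited in the text; pooling it as a
    # single population is an assumption made only for this sketch.
    for maf in (0.10, 0.05, 0.01, 0.001):
        p = detection_probability(maf, 47)
        print(f"MAF {maf:.3f}: P(detected in 47 individuals) = {p:.3f}")
```

Under these assumptions, a variant with a minor allele frequency of 5% is almost certain to be seen in 47 individuals (about 99%), whereas a variant at 1% is seen only about 61% of the time. This is consistent with the observation that variants missed in a small re-sequencing sample tend to be rare.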
While the coding variation identified in the present study was rare in the general population, there is a possibility that CRP coding variation is strongly associated with a phenotype and could be enriched in a case population. The challenge, of course, is to predict a disease or extreme phenotype given that a putative mutation has been identified. Given that the nonsynonymous SNP in CRP may be damaging to the gene product, we may expect lower levels of serum CRP in heterozygous individuals than in individuals homozygous for the common allele. Alternatively, we may expect that individuals heterozygous for these SNPs are unable to mount a proper acute phase response, unlike individuals homozygous for the common allele. There is no known human deficiency for CRP, but there are several phenotypes associated with lack of acute phase response (reviewed in Pepys and Hirschfield 2003). Unfortunately, the CRP assay used in NHANES III is less sensitive than currently used assays (Centers for Disease Control and Prevention 1994), so we cannot determine how low the serum CRP levels are for the two individuals who are heterozygous for the nonsynonymous CRP SNP (D. C. Crawford et al., submitted).
The possibility that deleterious variation may exist in human CRP is intriguing. In addition to providing information for future functional CRP studies in humans, these deep re-sequencing data provide an opportunity to explore the effects of possibly deleterious genetic variation using model organisms and in vitro assays. One obvious choice for modeling SNP effects in the context of various environmental exposures is the mouse (MacAuley and Ladiges 2005). However, for CRP, the mouse is not an ideal model because its CRP gene does not function like the human CRP gene (Pepys and Hirschfield 2003). Nevertheless, the CRP example demonstrates the potential usefulness of identifying coding variants through deep re-sequencing, since such variants may be taken forward into other meaningful experiments that focus on function.
In summary, we describe here the genetic architecture of CRP with special emphasis on the detection and characterization of rare variants. While we were able to describe novel nonsynonymous variants for CRP, we did not observe these variants at an appreciable frequency when they were genotyped in the general population, making it impossible to explore their impact on serum CRP. Given this situation, for a quantitative trait such as CRP, it may be more efficient to re-sequence individuals at the extreme low and high ends of the distribution in the search for rarer SNPs that impact the phenotype of interest (Cohen et al. 2005; Cohen et al. 2004). In the age of more efficient, high-throughput genotyping (Hinds et al. 2004; Shen et al. 2005), it may be that re-sequencing used in conjunction with genotyping across different study designs is required to fully appreciate the spectrum of genetic variants associated with a human phenotype of interest.