Human GWA studies have identified clusters of SNPs, in linkage disequilibrium with each other, that mark regions of usually < 50 kb associated with a complex trait such as lipoprotein levels or atherosclerosis. Despite this level of resolution, it is often difficult to identify the causative variant; and when the best associated SNPs lie close to several genes or in an intergenic region, it may be difficult even to know which gene is responsible [7]. To better resolve a GWA locus and identify the best candidate genes and/or regulatory regions, one can use a comparative genomics approach that relies on the assumption that regulation of, or variation in, the same causative gene in concordant regions across multiple species can modify a phenotypic trait. Thus, comparison of a human GWA locus with concordant QTLs identified in mouse or rat strain intercrosses may facilitate the detection of functionally conserved genes and regulatory regions [8]. Rodent strain intercrosses can identify loci associated with a complex trait through the QTL method; however, this method yields large intervals that can contain hundreds of potential causative genes. Thus, both human GWA and rodent QTL methods have limitations, but integrating the two types of analysis may lead to the rapid identification of strong candidate genes relevant to human complex traits such as dyslipidemia and atherosclerosis.
Paigen and colleagues proposed a “toolbox” of comparative genomic and bioinformatics analyses that can be applied to rodent QTL studies to help identify the causal gene and variants (Figure 1) [9]. Mouse QTLs for a complex trait are sometimes mapped to the same locus in multiple strain intercrosses, which strengthens the evidence for a causative gene and often helps narrow the likely interval. These mouse QTLs may overlap with syntenic regions identified as rat QTLs or as human GWA or linkage-identified loci associated with a similar trait [11]. These are called “concordant loci,” and several studies have shown a remarkable concordance between human and mouse loci for HDL-C (93%), LDL-C (100%), and TG (80%) levels and for atherosclerosis (63%) [9]. However, because QTLs for complex multifactorial diseases are large and numerous, some of the observed concordance may be due to chance [8]. Comparative mapping of the concordant loci across species can be used to narrow the region of interest; however, if the concordance hypothesis does not hold for a specific locus, this step could discard the region in which the quantitative trait gene (QTG) resides.
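The narrowing step described above can be illustrated with a minimal sketch. It assumes that every concordant locus has already been projected onto mouse genome coordinates (the synteny mapping itself is not shown). The `Interval` type, the `narrow_qtl` function, and all coordinates are hypothetical illustrations, not values from the cited studies.

```python
from typing import Iterable, NamedTuple, Optional


class Interval(NamedTuple):
    """A genomic interval in mouse coordinates (hypothetical units: Mb)."""
    chrom: str
    start_mb: float
    end_mb: float


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start = max(a.start_mb, b.start_mb)
    end = min(a.end_mb, b.end_mb)
    if start >= end:
        return None
    return Interval(a.chrom, start, end)


def narrow_qtl(qtl: Interval, concordant_loci: Iterable[Interval]) -> Interval:
    """Shrink a QTL interval to the region shared with each overlapping locus.

    Loci that do not overlap the current interval are skipped rather than
    treated as evidence against the QTL.
    """
    narrowed = qtl
    for locus in concordant_loci:
        overlap = intersect(narrowed, locus)
        if overlap is not None:
            narrowed = overlap
    return narrowed


# Hypothetical example: one mouse QTL, a second cross, a projected human
# locus, and an unrelated locus on another chromosome.
qtl = Interval("chr1", 20.0, 60.0)
loci = [
    Interval("chr1", 30.0, 70.0),
    Interval("chr1", 35.0, 40.0),
    Interval("chr2", 10.0, 20.0),
]
narrowed = narrow_qtl(qtl, loci)
print(narrowed)
```

Note that the sketch inherits the caveat stated in the text: if the concordance at a locus is due to chance, intersecting with it can exclude the region that actually contains the causal gene.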
Figure 1. Use of bioinformatic tools including comparative genomics to refine a QTL interval. Hypothetical example of a mouse complex trait QTL with the shaded regions representing the QTL region of interest through stepwise analyses as denoted.
Combining data from mouse and human studies has led to the recent identification of several candidate genes for dyslipidemia and related diseases, and has provided insight into their mechanism of action. Su et al. identified HDL-C QTLs from two mouse intercrosses on chromosomes 1, 3, 5, 6, 8, 9, 17, 18, and 19; these QTLs were located in regions homologous to human QTLs for HDL-C identified by family-based linkage studies in various populations [13]. Comparative genomics helped narrow the mouse HDL-C QTL interval on chromosome 17 (Hdlq56) from 48 to 1.7 Mb, and in a similar fashion reduced the intervals of the HDL QTLs on chromosomes 3 (Hdlq21), 5 (Hdlq52), 8 (Hdlq44), and 18 (Hdlq57
]. The comparative genomics step was followed by mouse strain haplotype analysis. Sequence analysis has shown that the genomes of the common mouse strains are mosaics of blocks derived from several ancestral strains, and these regions of ancestral identity among strains have been mapped across the genome [19**]. Haplotype blocks that are shared between a strain pair are eliminated from further consideration to reduce the QTL interval, on the assumption that the causative variant lies where the two strains have distinct ancestral origins. Although 97% of strain differences have been attributed to ancestral strain origins [20], more recent mutations have occurred within the shared haplotype blocks, so this step can eliminate the causative variant. When this method was applied to the HDL QTLs, it narrowed the QTL intervals to several dispersed genomic regions that have different ancestral origins in the high-HDL vs. the low-HDL strains [17].
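The haplotype elimination step amounts to subtracting the shared blocks from the QTL interval, which is why the result is a set of dispersed regions. The following is a minimal sketch under the assumption that the shared blocks are already known as coordinate pairs on the same chromosome as the QTL; the function name `subtract_shared_blocks` and all coordinates are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start, end) on one chromosome, hypothetical Mb


def subtract_shared_blocks(qtl: Region, shared_blocks: List[Region]) -> List[Region]:
    """Return the parts of `qtl` not covered by any shared haplotype block.

    Blocks may overlap each other and may extend beyond the QTL interval.
    """
    start, end = qtl
    remaining: List[Region] = []
    cursor = start  # left edge of the region not yet accounted for
    for block_start, block_end in sorted(shared_blocks):
        if block_end <= cursor:
            continue  # block lies entirely to the left of the cursor
        if block_start >= end:
            break  # this and all later blocks lie to the right of the QTL
        if block_start > cursor:
            remaining.append((cursor, block_start))
        cursor = max(cursor, block_end)
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


# Hypothetical example: a 10 Mb QTL with three shared blocks, two of which
# overlap each other and one of which runs past the end of the QTL.
regions = subtract_shared_blocks(
    (0.0, 10.0),
    [(1.0, 3.0), (2.5, 4.0), (8.0, 12.0)],
)
print(regions)
```

As the text cautions, a causative variant that arose recently inside a shared block would be lost by this subtraction, so the discarded blocks may be worth revisiting if no candidate emerges from the remaining regions.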
The next step in this bioinformatics process involves looking for nonsynonymous SNPs and using gene expression differences to identify specific candidate genes. Su et al. used these methods to identify Lpl (lipoprotein lipase), a gene known for its function in regulating HDL levels in mice and humans, as the candidate gene for the Hdlq44
]. They also identified Nr2f2 (nuclear receptor subfamily 2, group F, member 2) as a candidate gene for chromosome 7 (Hdlq58)
gene was found to regulate the transcription of Apoa2, and polymorphisms in this gene were found to be associated with HDL levels in a GWA study [21], showing cross-species concordance.
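The candidate selection step can be sketched as a simple filter over the genes left in the narrowed interval. This is an illustration only: the `Gene` record, the `prioritize` function, the gene symbols, and the two-fold expression threshold are all assumptions, not criteria taken from the cited studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical per-gene summary for the two parental strains."""
    symbol: str
    nonsynonymous_snps: int   # coding differences between the strains
    expr_high_strain: float   # expression in the high-HDL strain
    expr_low_strain: float    # expression in the low-HDL strain


def prioritize(genes: List[Gene], min_fold_change: float = 2.0) -> List[str]:
    """Keep genes with a coding difference or an expression difference.

    `min_fold_change` is an assumed cutoff chosen for illustration.
    """
    candidates = []
    for gene in genes:
        has_coding_change = gene.nonsynonymous_snps > 0
        low, high = sorted((gene.expr_high_strain, gene.expr_low_strain))
        # Treat expression present in one strain and absent in the other
        # as a difference, and avoid dividing by zero.
        differs = high > 0 and (low == 0 or high / low >= min_fold_change)
        if has_coding_change or differs:
            candidates.append(gene.symbol)
    return candidates


# Hypothetical genes within a narrowed QTL interval.
genes = [
    Gene("GeneA", nonsynonymous_snps=1, expr_high_strain=10.0, expr_low_strain=10.0),
    Gene("GeneB", nonsynonymous_snps=0, expr_high_strain=40.0, expr_low_strain=10.0),
    Gene("GeneC", nonsynonymous_snps=0, expr_high_strain=10.0, expr_low_strain=12.0),
]
candidates = prioritize(genes)
print(candidates)
```

In practice the expression comparison would rest on a statistical test in a relevant tissue rather than a fixed ratio; the ratio is used here only to keep the sketch short.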
Furthermore, in a different study, Su et al. identified the partially overlapping Farp2 (FERM, RhoGEF and pleckstrin domain protein 2) and Stk25 (serine/threonine kinase 25) genes as candidate genes for an HDL QTL on chromosome 1 [22]. Although they did not use cross-species comparative genomics to narrow the interval, they observed that both genes are located on human chromosome 2q37.2, where a human QTL for HDL was previously identified [23]. In addition, a GWA study found polymorphisms in the FARP2 gene associated with variation in human HDL levels [21], providing strong evidence that Farp2 and/or the adjacent Stk25 gene identified in mice are likely to play a role in regulating HDL levels in humans as well.
Another successful example of the comparative genomics approach is the narrowing of a mouse HDL QTL on chromosome 18 from 9 Mb to 6.9 Mb and the identification of Lipg (endothelial lipase) as the most likely candidate gene for this QTL by haplotype, gene expression, and sequence variation analyses [24**]. Several recent GWA studies have found LIPG SNPs associated with HDL-C levels in humans [2]. Also, a rare loss-of-function mutation in the LIPG
) was found to be significantly associated with increased HDL-C levels in human subjects [28].
Application of the above-mentioned set of bioinformatic tools [10], including comparative genomics, identified Scarb1 (scavenger receptor class B, member 1) and Acads (acyl-Coenzyme A dehydrogenase, short chain) as candidate genes for HDL QTLs on mouse chromosome 5 [29*
in mice is adjacent to the Mvk
genes, which were found in a cluster of five genes positioned in a concordant human HDL QTL located at 12q24.23. Recently, two GWA studies have shown that this genomic region contains SNPs significantly associated with HDL levels in humans [26]. Additional studies, including generation of knockdown, knockout, and/or transgenic mice, are necessary to investigate the relationship of the Mvk
genes with HDL metabolism.