Related Articles

1.  ParallABEL: an R library for generalized parallelization of genome-wide association studies 
BMC Bioinformatics  2010;11:217.
Background
Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. However, acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous.
Results
Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors.
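The partitioning idea behind the pair-wise group of statistics can be illustrated with a minimal sketch. This is not ParallABEL's interface (ParallABEL is an R library built on Rmpi); the Python function names `partition`, `ibs_pair`, and `ibs_chunk`, the toy genotypes, and the simple mean identity-by-state score are assumptions made only for illustration. Pairs of individuals are split into near-equal chunks, each chunk is processed independently, and the partial results are merged.

```python
from itertools import combinations

def partition(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    items = list(items)
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def ibs_pair(g1, g2):
    """Mean identity-by-state of two genotype vectors coded 0/1/2 (illustrative score)."""
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]
    return sum(shared) / (2 * len(shared))

def ibs_chunk(genotypes, pairs):
    """Work done by one worker: the statistic for every pair in its chunk."""
    return {(i, j): ibs_pair(genotypes[i], genotypes[j]) for i, j in pairs}

# Toy data (hypothetical): three individuals, four SNPs.
genotypes = [[0, 1, 2, 1], [0, 1, 1, 1], [2, 1, 0, 1]]
pairs = list(combinations(range(len(genotypes)), 2))
chunks = partition(pairs, 2)

# Each chunk would be sent to a separate processor; here they run in sequence.
result = {}
for chunk in chunks:
    result.update(ibs_chunk(genotypes, chunk))
```

Because the chunks do not depend on one another, the merge step is a plain dictionary update, which is consistent with the near-linear speed-up reported for the identity-by-state matrix.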
Conclusions
Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
doi:10.1186/1471-2105-11-217
PMCID: PMC2879286  PMID: 20429914
2.  Efficient haplotype block recognition of very long and dense genetic sequences 
BMC Bioinformatics  2014;15:10.
Background
New sequencing technologies make it possible to scan very long and dense genetic sequences, yielding datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in identifying haplotypes that carry rare risk alleles. However, large-scale explorations of the linkage-disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity.
Results
We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, omits >80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. By replacing the standard likelihood-based D′ variance estimator with an approximated estimator, the runtime was further improved. While producing a coarser partition, the approximate method made it possible to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base-pairs in both HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies.
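To make the LD quantity concrete, the following is a minimal sketch of the standard D′ point estimate between two biallelic SNPs, computed from phased haplotypes coded 0/1. It is not the MIG++ implementation and it omits the D′ variance (confidence interval) estimation that the abstract refers to; the function name `d_prime` and the toy haplotypes are assumptions for illustration.

```python
def d_prime(haplotypes, i, j):
    """Standard D' between SNPs i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at SNP i
    p_b = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at SNP j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max

# Toy haplotypes (hypothetical): SNPs 0 and 1 always co-occur, SNP 2 is independent of SNP 0.
haps = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
complete_ld = d_prime(haps, 0, 1)
no_ld = d_prime(haps, 0, 2)
```

Evaluating this for every pair among n SNPs requires n(n-1)/2 calls, which is the quadratic cost that the abstract attributes to traditional approaches.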
Conclusions
MIG++ makes it possible to perform LD-based haplotype block recognition on genetic sequences of any length and density. In the new generation sequencing era, this can help identify haplotypes that carry rare variants of interest. The low computational requirements open the possibility of including the haplotype block structure in genome-wide association scans, downstream analyses, and visual interfaces for online genome browsers.
doi:10.1186/1471-2105-15-10
PMCID: PMC3898000  PMID: 24423111
3.  A Candidate Gene Approach Identifies the TRAF1/C5 Region as a Risk Factor for Rheumatoid Arthritis 
PLoS Medicine  2007;4(9):e278.
Background
Rheumatoid arthritis (RA) is a chronic autoimmune disorder affecting ∼1% of the population. The disease results from the interplay between an individual's genetic background and unknown environmental triggers. Although human leukocyte antigens (HLAs) account for ∼30% of the heritable risk, the identities of non-HLA genes explaining the remainder of the genetic component are largely unknown. Based on functional data in mice, we hypothesized that the immune-related genes complement component 5 (C5) and/or TNF receptor-associated factor 1 (TRAF1), located on Chromosome 9q33–34, would represent relevant candidate genes for RA. We therefore aimed to investigate whether this locus would play a role in RA.
Methods and Findings
We performed a multitiered case-control study using 40 single-nucleotide polymorphisms (SNPs) from the TRAF1 and C5 (TRAF1/C5) region in a set of 290 RA patients and 254 unaffected participants (controls) of Dutch origin. Stepwise replication of significant SNPs was performed in three independent sample sets from the Netherlands (n cases/controls = 454/270), Sweden (n cases/controls = 1,500/1,000) and US (n cases/controls = 475/475). We observed a significant association (p < 0.05) of SNPs located in a haplotype block that encompasses a 65 kb region including the 3′ end of C5 as well as TRAF1. A sliding window analysis revealed an association peak at an intergenic region located ∼10 kb from both C5 and TRAF1. This peak, defined by SNP14/rs10818488, was confirmed in a total of 2,719 RA patients and 1,999 controls (common odds ratio = 1.28, 95% confidence interval 1.17–1.39, combined p = 1.40 × 10⁻⁸) with a population-attributable risk of 6.1%. The A (minor susceptibility) allele of this SNP also significantly correlates with increased disease progression as determined by radiographic damage over time in RA patients (p = 0.008).
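For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, here is a minimal sketch for a single 2×2 allele-count table using the usual log-scale (Woolf) interval. It does not reproduce the pooled "common" odds ratio across cohorts reported above, and the counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for one 2x2 table.

    a, b: risk-allele and other-allele counts in cases
    c, d: risk-allele and other-allele counts in controls
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts, not data from the study.
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1, as the reported 1.17–1.39 does, indicates an association at the corresponding significance level.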
Conclusions
Using a candidate-gene approach we have identified a novel genetic risk factor for RA. Our findings indicate that a polymorphism in the TRAF1/C5 region increases the susceptibility to and severity of RA, possibly by influencing the structure, function, and/or expression levels of TRAF1 and/or C5.
Using a candidate-gene approach, Rene Toes and colleagues identified a novel genetic risk factor for rheumatoid arthritis in the TRAF1/C5 region.
Editors' Summary
Background.
Rheumatoid arthritis is a very common chronic illness that affects around 1% of people in developed countries. It is caused by an abnormal immune reaction to various tissues within the body; as well as affecting joints and causing an inflammatory arthritis, it can also affect many other organs of the body. Severe rheumatoid arthritis can be life-threatening, but even mild forms of the disease cause substantial illness and disability. Current treatments aim to give symptomatic relief with the use of simple analgesics, or anti-inflammatory drugs. In addition, most patients are also treated with what are known as disease-modifying agents, which aim to prevent joint damage. Rheumatoid arthritis is known to have a genetic component. For example, an association has been shown with the part of the genome that contains the human leukocyte antigens (HLAs), which are involved in the immune response. Information on other genes involved would be helpful both for understanding the underlying cause of the disease and possibly for the discovery of new treatments.
Why Was This Study Done?
Previous work in mice that have a disease similar to human rheumatoid arthritis has identified a number of possible candidate genes. One of these genes, complement component 5 (C5) is involved in the complement system—a primitive system within the body that is involved in the defense against foreign molecules. In humans the gene for C5 is located on Chromosome 9 close to another gene involved in the inflammatory response, TNF receptor-associated factor 1 (TRAF1). A preliminary study in humans of this region had shown some evidence, albeit weak, to suggest that this region might be associated with rheumatoid arthritis. The authors set out to look in more detail, and in a larger group of individuals, to see if they could prove this association.
What Did the Researchers Do and Find?
The researchers took 40 genetic markers, known as single-nucleotide polymorphisms (SNPs), from across the region that included the C5 and TRAF1 genes. SNPs have each been assigned a unique reference number that specifies a point in the human genome, and each is present in alternate forms so can be differentiated. They compared which of the alternate forms were present in 290 patients with rheumatoid arthritis and 254 unaffected participants of Dutch origin. They then repeated the study in three other groups of patients and controls of Dutch, Swedish, and US origin. They found a consistent association with rheumatoid arthritis of one region of 65 kilobases (a small distance in genetic terms) that included one end of the C5 gene as well as the TRAF1 gene. They could refine the area of interest to a piece marked by one particular SNP that lay between the genes. They went on to show that the genetic region in which these genes are located may be involved in the binding of a protein that modifies the transcription of genes, thus providing a possible explanation for the association. Furthermore, they showed that one of the alternate versions of the marker in this region was associated with more aggressive disease.
What Do These Findings Mean?
The finding of a genetic association is the first step in identifying a genetic component of a disease. The strength of this study is that a novel genetic susceptibility factor for RA has been identified and that the overall result is consistent in four different populations as well as being associated with disease severity. Further work will need to be done to confirm the association in other populations and then to identify the precise genetic change involved. Hopefully this work will lead to new avenues of investigation for therapy.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040278.
• Medline Plus, the health information site for patients from the US National Library of Medicine, has a page of resources on rheumatoid arthritis
• The UK's National Health Service online information site has information on rheumatoid arthritis
• The Arthritis Research Campaign, a UK charity that funds research on all types of arthritis, has a booklet with information for patients on rheumatoid arthritis
• Reumafonds, a Dutch arthritis foundation, gives information on rheumatoid arthritis (in Dutch)
• Autocure is an initiative whose objective is to transform knowledge obtained from molecular research into a cure for an increasing number of patients suffering from inflammatory rheumatic diseases
• The European league against Rheumatism, an organisation which represents the patient, health professionals, and scientific societies of rheumatology of all European nations
doi:10.1371/journal.pmed.0040278
PMCID: PMC1976626  PMID: 17880261
4.  Haplotype-Based Analysis: A Summary of GAW16 Group 4 Analysis 
Genetic epidemiology  2009;33(Suppl 1):S24-S28.
In this summary paper, we describe the contributions included in the haplotype-based analysis group (Group 4) at the Genetic Analysis Workshop 16, which was held September 17-20, 2008. Our group applied a large number of haplotype-based methods in the context of genome-wide association studies. Two general approaches were applied: a two-stage approach that selected significant single-nucleotide polymorphisms and then created haplotypes, and a genome-wide analysis of smaller sets of single-nucleotide polymorphisms selected by sliding windows or by estimating haplotype blocks. Genome-wide haplotype analyses performed in these ways were feasible. The presence of the very strong chromosome 6 association in the North American Rheumatoid Arthritis Consortium data was detected by every method, and additional analyses attempted to control for this strong result to allow detection of additional haplotype associations.
doi:10.1002/gepi.20468
PMCID: PMC2916652  PMID: 19924718
population stratification; multiple comparisons
5.  Multi-locus stepwise regression: a haplotype-based algorithm for finding genetic associations applied to atopic dermatitis 
BMC Medical Genetics  2012;13:8.
Background
Genome-wide association studies (GWAS) provide an increasing number of single nucleotide polymorphisms (SNPs) associated with diseases. Our aim is to exploit those closely spaced SNPs in candidate regions for a deeper analysis of association beyond single SNP analysis, combining the classical stepwise regression approach with haplotype analysis to identify risk haplotypes for complex diseases.
Methods
Our proposed multi-locus stepwise regression starts with an evaluation of all pair-wise SNP combinations and then extends each SNP combination stepwise by one SNP from the region, carrying out haplotype regression in each step. The best associated haplotype patterns are kept for the next step and must be corrected for multiple testing at the end. These haplotypes should also be replicated in an independent data set. We applied the method to a region of 259 SNPs from the epidermal differentiation complex (EDC) on chromosome 1q21 of a German GWAS using a case control set (1,914 individuals) and to 268 families with at least two affected children as replication.
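The search strategy described above (score all SNP pairs, keep the best patterns, extend each by one SNP, repeat) can be sketched as follows. This is only an outline of the search loop under stated assumptions: the `score` argument stands in for the haplotype regression step, which is not implemented here, and the names `stepwise_search`, `keep`, and `max_size` as well as the toy weights are hypothetical.

```python
from itertools import combinations

def stepwise_search(n_snps, score, keep=2, max_size=3):
    """Keep the best-scoring SNP sets at each size, then extend each by one SNP."""
    candidates = [frozenset(c) for c in combinations(range(n_snps), 2)]
    best = sorted(candidates, key=score, reverse=True)[:keep]
    history = [best]
    for _ in range(max_size - 2):
        # Extend every retained pattern by each SNP it does not yet contain.
        extended = {s | {k} for s in best for k in range(n_snps) if k not in s}
        best = sorted(extended, key=score, reverse=True)[:keep]
        history.append(best)
    return history

# Stand-in score (assumption): a sum of per-SNP weights with no ties.
weights = [1, 2, 4, 8, 16]
toy_score = lambda snp_set: sum(weights[k] for k in snp_set)

history = stepwise_search(len(weights), toy_score, keep=2, max_size=3)
```

In a real analysis the retained patterns would still need the multiple-testing correction and independent replication that the abstract calls for.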
Results
A 4-SNP haplotype pattern with high statistical significance in the case control set (p = 4.13 × 10⁻⁷ after Bonferroni correction) could be identified which remained significant in the family set after Bonferroni correction (p = 0.0398). Further analysis revealed that this pattern reflects mainly the effect of the well-known FLG gene; however, a FLG-independent haplotype in case control set (OR = 1.71, 95% CI: 1.32-2.23, p = 5.6 × 10⁻⁵) and family set (OR = 1.68, 95% CI: 1.18-2.38, p = 2.19 × 10⁻³) could be found in addition.
Conclusion
Our approach is a useful tool for finding allele combinations associated with diseases beyond single SNP analysis in chromosomal candidate regions.
doi:10.1186/1471-2350-13-8
PMCID: PMC3398269  PMID: 22284537
6.  Performance of Single Nucleotide Polymorphisms versus Haplotypes for Genome-Wide Association Analysis in Barley 
PLoS ONE  2010;5(11):e14079.
Genome-wide association studies (GWAS) may benefit from utilizing haplotype information for making marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on such factors as genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested by the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose subsequent to marker variants, and 2) in analysis of empirical heading date data. These results demonstrate that the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
doi:10.1371/journal.pone.0014079
PMCID: PMC2989918  PMID: 21124933
7.  Detection of disease-associated deletions in case–control studies using SNP genotypes with application to rheumatoid arthritis 
Human genetics  2009;126(2):303-315.
Genomic deletions have long been known to play a causative role in microdeletion syndromes. Recent whole-genome genetic studies have shown that deletions can increase the risk for several psychiatric disorders, suggesting that genomic deletions play an important role in the genetic basis of complex traits. However, the association between genomic deletions and common, complex diseases has not yet been systematically investigated in gene mapping studies. Likelihood-based statistical methods for identifying disease-associated deletions have recently been developed for familial studies of parent-offspring trios. The purpose of this study is to develop statistical approaches for detecting genomic deletions associated with complex disease in case–control studies. Our methods are designed to be used with dense single nucleotide polymorphism (SNP) genotypes to detect deletions in large-scale or whole-genome genetic studies. As more and more SNP genotype data for genome-wide association studies become available, development of sophisticated statistical approaches will be needed that use these data. Our proposed statistical methods are designed to be used in SNP-by-SNP analyses and in cluster analyses based on combined evidence from multiple SNPs. We found that these methods are useful for detecting disease-associated deletions and are robust in the presence of linkage disequilibrium using simulated SNP data sets. Furthermore, we applied the proposed statistical methods to SNP genotype data of chromosome 6p for 868 rheumatoid arthritis patients and 1,197 controls from the North American Rheumatoid Arthritis Consortium. We detected disease-associated deletions within the region of human leukocyte antigen in which genomic deletions were previously discovered in rheumatoid arthritis patients.
doi:10.1007/s00439-009-0672-3
PMCID: PMC2992885  PMID: 19415332
8.  Suggestive evidence for association between L-type voltage-gated calcium channel (CACNA1C) gene haplotypes and bipolar disorder in Latinos: a family-based association study 
Bipolar disorders  2013;15(2):206-214.
Objectives
Through recent genome-wide association studies (GWAS), several groups have reported significant association between variants in the alpha 1C subunit of the L-type voltage-gated calcium channel (CACNA1C) and bipolar disorder (BP) in European and European-American cohorts. We performed a family-based association study to determine whether CACNA1C is associated with BP in the Latino population.
Methods
This study consisted of 913 individuals from 215 Latino pedigrees recruited from the United States, Mexico, Guatemala, and Costa Rica. The Illumina GoldenGate Genotyping Assay was used to genotype 58 single-nucleotide polymorphisms (SNPs) that spanned a 602.9 kb region encompassing the CACNA1C gene including two SNPs (rs7297582 and rs1006737) previously shown to associate with BP. Individual SNP and haplotype association analyses were performed using Family-Based Association Test (version 2.0.3) and Haploview (version 4.2) software.
Results
An eight-locus haplotype block that included these two markers showed significant association with BP (global marker permuted p = 0.0018) in the Latino population. For individual SNPs, this sample had insufficient power (10%) to detect associations with SNPs with minor effect (odds ratio = 1.15).
Conclusions
Although we were not able to replicate findings of association between individual CACNA1C SNPs rs7297582 and rs1006737 and BP, we were able to replicate the GWAS signal reported for CACNA1C through a haplotype analysis that encompassed these previously reported significant SNPs. These results provide additional evidence that CACNA1C is associated with BP and provides the first evidence that variations in this gene might play a role in the pathogenesis of this disorder in the Latino population.
doi:10.1111/bdi.12041
PMCID: PMC3781018  PMID: 23437964
bipolar disorder; calcium channels; genetic association studies; haplotypes; Hispanic Americans; L-type; pedigree; polymorphism; single nucleotide
9.  A Pleiotropy-Informed Bayesian False Discovery Rate Adapted to a Shared Control Design Finds New Disease Associations From GWAS Summary Statistics 
PLoS Genetics  2015;11(2):e1004926.
Genome-wide association studies (GWAS) have been successful in identifying single nucleotide polymorphisms (SNPs) associated with many traits and diseases. However, at existing sample sizes, these variants explain only part of the estimated heritability. Leverage of GWAS results from related phenotypes may improve detection without the need for larger datasets. The Bayesian conditional false discovery rate (cFDR) constitutes an upper bound on the expected false discovery rate (FDR) across a set of SNPs whose p values for two diseases are both less than two disease-specific thresholds. Calculation of the cFDR requires only summary statistics and has several advantages over traditional GWAS analysis. However, existing methods require distinct control samples between studies. Here, we extend the technique to allow for some or all controls to be shared, increasing applicability. Several different SNP sets can be defined with the same cFDR value, and we show that the expected FDR across the union of these sets may exceed expected FDR in any single set. We describe a procedure to establish an upper bound for the expected FDR among the union of such sets of SNPs. We apply our technique to pairwise analysis of p values from ten autoimmune diseases with variable sharing of controls, enabling discovery of 59 SNP-disease associations which do not reach GWAS significance after genomic control in individual datasets. Most of the SNPs we highlight have previously been confirmed using replication studies or larger GWAS, a useful validation of our technique; we report eight SNP-disease associations across five diseases not previously declared. Our technique extends and strengthens the previous algorithm, and establishes robust limits on the expected FDR. This approach can improve SNP detection in GWAS, and give insight into shared aetiology between phenotypically related conditions.
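As an illustration of why only summary statistics are needed, the sketch below implements the commonly used empirical cFDR estimator for a principal disease i conditioned on a second disease j: the p-value threshold p_i divided by the fraction of SNPs with P_i ≤ p_i among those with P_j ≤ p_j. This is an assumption about the basic estimator only; it does not include the shared-control adjustment or the bound on the FDR over unions of SNP sets described in the abstract, and the function name `cfdr` and the toy p-values are hypothetical.

```python
def cfdr(p_i, p_j, pvals_i, pvals_j):
    """Empirical conditional FDR estimate for thresholds (p_i, p_j).

    pvals_i, pvals_j: per-SNP p-values for the two diseases, in the same SNP order.
    """
    n_j = sum(1 for b in pvals_j if b <= p_j)
    n_ij = sum(1 for a, b in zip(pvals_i, pvals_j) if a <= p_i and b <= p_j)
    if n_ij == 0:
        return 1.0                      # no SNP meets both thresholds
    return min(1.0, p_i * n_j / n_ij)

# Toy summary statistics (hypothetical): four SNPs, two diseases.
pvals_i = [0.001, 0.5, 0.2, 0.9]
pvals_j = [0.01, 0.02, 0.8, 0.03]
estimate = cfdr(0.001, 0.05, pvals_i, pvals_j)
```

Here three SNPs pass the threshold for disease j and one of them also passes the threshold for disease i, so the estimate is 0.001 × 3 / 1.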
Author Summary
Many diseases have a significant hereditary component, only part of which has been explained by analysis of genome-wide association studies (GWAS). Shared aetiology, treatment protocols, and overlapping results from existing GWAS suggest similarities in genetic susceptibility between related diseases, which may be exploited to detect more disease-associated SNPs without the need for further data. We extend an existing method for detecting SNPs associated with a given disease by conditioning on association with another disease. Our extension allows GWAS for the two conditions to share control samples, enabling larger overall control groups and application to the common case when GWAS for related diseases pool control samples. We demonstrate that our technique limits the expected overall false discovery rate at a threshold dependent on the two diseases. We apply our technique to genotype data from ten immune mediated diseases. Overall pleiotropy between phenotypes is demonstrated graphically. We are able to declare several SNPs significant at a genome-wide level whilst controlling at a lower false-discovery rate than would be possible using a conventional approach, identifying eight previously unknown disease associations. This technique can improve SNP detection in GWAS by re-analysing existing data, and gives insight into the shared genetic bases of autoimmune diseases.
doi:10.1371/journal.pgen.1004926
PMCID: PMC4450050  PMID: 25658688
10.  A Genome-Wide Association Study Confirms VKORC1, CYP2C9, and CYP4F2 as Principal Genetic Determinants of Warfarin Dose 
PLoS Genetics  2009;5(3):e1000433.
We report the first genome-wide association study (GWAS) whose sample size (1,053 Swedish subjects) is sufficiently powered to detect genome-wide significance (p<1.5×10⁻⁷) for polymorphisms that modestly alter therapeutic warfarin dose. The anticoagulant drug warfarin is widely prescribed for reducing the risk of stroke, thrombosis, pulmonary embolism, and coronary malfunction. However, Caucasians vary widely (20-fold) in the dose needed for therapeutic anticoagulation, and hence prescribed doses may be too low (risking serious illness) or too high (risking severe bleeding). Prior work established that ∼30% of the dose variance is explained by single nucleotide polymorphisms (SNPs) in the warfarin drug target VKORC1 and another ∼12% by two non-synonymous SNPs (*2, *3) in the cytochrome P450 warfarin-metabolizing gene CYP2C9. We initially tested each of 325,997 GWAS SNPs for association with warfarin dose by univariate regression and found the strongest statistical signals (p<10⁻⁷⁸) at SNPs clustering near VKORC1 and the second lowest p-values (p<10⁻³¹) emanating from CYP2C9. No other SNPs approached genome-wide significance. To enhance detection of weaker effects, we conducted multiple regression adjusting for known influences on warfarin dose (VKORC1, CYP2C9, age, gender) and identified a single SNP (rs2108622) with genome-wide significance (p = 8.3×10⁻¹⁰) that alters protein coding of the CYP4F2 gene. We confirmed this result in 588 additional Swedish patients (p<0.0029) and, during our investigation, a second group provided independent confirmation from a scan of warfarin-metabolizing genes. We also thoroughly investigated copy number variations, haplotypes, and imputed SNPs, but found no additional highly significant warfarin associations. We present power analysis of our GWAS that is generalizable to other studies, and conclude we had 80% power to detect genome-wide significance for common causative variants or markers explaining at least 1.5% of dose variance. These GWAS results provide further impetus for conducting large-scale trials assessing patient benefit from genotype-based forecasting of warfarin dose.
Author Summary
Recently, geneticists have begun assaying hundreds of thousands of genetic markers covering the entire human genome to systematically search for and identify genes that cause disease. We have extended this “genome-wide association study” (GWAS) method by assaying ∼326,000 markers in 1,053 Swedish patients in order to identify genes that alter response to the anticoagulant drug warfarin. Warfarin is widely prescribed to reduce blood clotting in order to protect high-risk patients from stroke, thrombosis, and heart attack. But patients vary widely (20-fold) in the warfarin dose needed for proper blood thinning, which means that initial doses in some patients are too high (risking severe bleeding) or too low (risking serious illness). Our GWAS detected two genes (VKORC1, CYP2C9) already known to cause ∼40% of the variability in warfarin dose and discovered a new gene (CYP4F2) contributing 1%–2% of the variability. Since our GWAS searched the entire genome, additional genes having a major influence on warfarin dose might not exist or be found in the near-term. Hence, clinical trials assessing patient benefit from individualized dose forecasting based on a patient's genetic makeup at VKORC1, CYP2C9 and possibly CYP4F2 could provide state-of-the-art clinical benchmarks for warfarin use during the foreseeable future.
doi:10.1371/journal.pgen.1000433
PMCID: PMC2652833  PMID: 19300499
11.  Single Nucleotide Polymorphism (SNP)-Strings: An Alternative Method for Assessing Genetic Associations 
PLoS ONE  2014;9(4):e90034.
Background
Genome-wide association studies (GWAS) identify disease-associations for single-nucleotide-polymorphisms (SNPs) from scattered genomic-locations. However, SNPs frequently reside on several different SNP-haplotypes, only some of which may be disease-associated. This circumstance lowers the observed odds-ratio for disease-association.
Methodology/Principal Findings
Here we develop a method to identify the two SNP-haplotypes, which combine to produce each person’s SNP-genotype over specified chromosomal segments. Two multiple sclerosis (MS)-associated genetic regions were modeled; DRB1 (a Class II molecule of the major histocompatibility complex) and MMEL1 (an endopeptidase that degrades both neuropeptides and β-amyloid). For each locus, we considered sets of eleven adjacent SNPs, surrounding the putative disease-associated gene and spanning ∼200 kb of DNA. The SNP-information was converted into an ordered-set of eleven-numbers (subject-vectors) based on whether a person had zero, one, or two copies of particular SNP-variant at each sequential SNP-location. SNP-strings were defined as those ordered-combinations of eleven-numbers (0 or 1), representing a haplotype, two of which combined to form the observed subject-vector. Subject-vectors were resolved using probabilistic methods. In both regions, only a small number of SNP-strings were present. We compared our method to the SHAPEIT-2 phasing-algorithm. When the SNP-information spanning 200 kb was used, SHAPEIT-2 was inaccurate. When the SHAPEIT-2 window was increased to 2,000 kb, the concordance between the two methods, in both of these eleven-SNP regions, was over 99%, suggesting that, in these regions, both methods were quite accurate. Nevertheless, correspondence was not uniformly high over the entire DNA-span but, rather, was characterized by alternating peaks and valleys of concordance. Moreover, in the valleys of poor-correspondence, SHAPEIT-2 was also inconsistent with itself, suggesting that the SNP-string method is more accurate across the entire region.
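The relationship between a subject-vector and its two SNP-strings can be shown with a short sketch. It only enumerates the pairs of candidate 0/1 strings whose element-wise sum equals the observed 0/1/2 vector; the probabilistic resolution step mentioned in the abstract is not reproduced. The function name `compatible_pairs` and the three-SNP toy data are assumptions (the study used eleven SNPs per region).

```python
from itertools import combinations_with_replacement

def compatible_pairs(subject_vector, snp_strings):
    """Return all unordered pairs of SNP-strings that sum to the subject-vector."""
    pairs = []
    for h1, h2 in combinations_with_replacement(snp_strings, 2):
        if all(a + b == g for a, b, g in zip(h1, h2, subject_vector)):
            pairs.append((h1, h2))
    return pairs

# Toy example (hypothetical): three candidate strings over three SNPs.
snp_strings = [(0, 0, 1), (1, 0, 1), (1, 1, 0)]
subject_vector = (1, 0, 2)      # copies of the variant at each SNP position
matches = compatible_pairs(subject_vector, snp_strings)
```

When only a small number of SNP-strings exist in a region, as the abstract reports, most subject-vectors are compatible with few candidate pairs, which is what makes resolution tractable.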
Conclusions/Significance
Accurate haplotype identification will enhance the detection of genetic-associations. The SNP-string method provides a simple means to accomplish this and can be extended to cover larger genomic regions, thereby improving a GWAS’s power, even for those published previously.
doi:10.1371/journal.pone.0090034
PMCID: PMC3984082  PMID: 24727690
12.  Regional replication of association with refractive error on 15q14 and 15q25 in the Age-Related Eye Disease Study cohort 
Molecular Vision  2013;19:2173-2186.
Purpose
Refractive error is a complex trait with multiple genetic and environmental risk factors, and is the most common cause of preventable blindness worldwide. The common nature of the trait suggests the presence of many genetic factors that individually may have modest effects. To achieve an adequate sample size to detect these common variants, large, international collaborations have formed. These consortia typically use meta-analysis to combine multiple studies from many different populations. This approach is robust to differences between populations; however, it does not compensate for the different haplotypes in each genetic background evidenced by different alleles in linkage disequilibrium with the causative variant. We used the Age-Related Eye Disease Study (AREDS) cohort to replicate published significant associations at two loci on chromosome 15 from two genome-wide association studies (GWASs). The single nucleotide polymorphisms (SNPs) that exhibited association on chromosome 15 in the original studies did not show evidence of association with refractive error in the AREDS cohort. This paper seeks to determine whether the non-replication in this AREDS sample may be due to the limited number of SNPs chosen for replication.
Methods
We selected all SNPs genotyped on the Illumina Omni2.5v1_B array or custom TaqMan assays or imputed from the GWAS data, in the region surrounding the SNPs from the Consortium for Refractive Error and Myopia study. We analyzed the SNPs for association with refractive error using standard regression methods in PLINK. The effective number of tests was calculated using the Genetic Type I Error Calculator.
Results
Although the same SNPs used in the Consortium for Refractive Error and Myopia study did not show any evidence of association with refractive error in this AREDS sample, other SNPs within the candidate regions demonstrated an association with refractive error. Significant evidence of association was found using the hyperopia categorical trait, with the most significant SNPs rs1357179 on 15q14 (p=1.69×10^−3) and rs7164400 on 15q25 (p=8.39×10^−4), which passed the replication thresholds.
Conclusions
This study adds to the growing body of evidence that the most significant SNPs found in one population may not be significant in another population, owing to differences in the linkage disequilibrium structure and/or allele frequency. This suggests that replication studies should include less significant SNPs in an associated region rather than only a few SNPs chosen by a significance threshold.
PMCID: PMC3826323  PMID: 24227913
13.  Racial or ethnic differences in allele frequencies of single‐nucleotide polymorphisms in the methylenetetrahydrofolate reductase gene and their influence on response to methotrexate in rheumatoid arthritis 
Annals of the Rheumatic Diseases  2006;65(9):1213-1218.
Background
The anti‐folate drug methotrexate (MTX) is commonly used to treat rheumatoid arthritis.
Objective
To determine the allele frequencies of five common coding single‐nucleotide polymorphisms (SNPs) in the methylenetetrahydrofolate reductase (MTHFR) gene in African‐Americans and Caucasians with rheumatoid arthritis and controls to assess whether there are differences in allele frequencies among these ethnic or racial groups and whether these SNPs differentially affect the efficacy or toxicity of MTX.
Methods
Allele frequencies in the 677, 1298 and 3 additional SNPs in the MTHFR coding region in 223 (193 Caucasians and 30 African‐Americans) patients with rheumatoid arthritis who previously participated in one of two prospective clinical trials were characterised, and genotypes were correlated with the efficacy and toxicity of MTX. Another 308 subjects with rheumatoid arthritis who participated in observational studies, one group predominantly Caucasian and the other African‐American, as well as 103 normal controls (53 African‐Americans and 50 Caucasians) were used to characterise allele frequencies of these SNPs and their associated haplotypes.
Results
Significantly different allele frequencies were seen in three of the five SNPs and haplotype frequencies between Caucasians and African‐Americans. Allele frequencies were similar between patients with rheumatoid arthritis and controls of the same racial or ethnic group. Frequencies of the rs4846051C, 677T and 1298C alleles were 0.33, 0.11 and 0.13, respectively, among African‐Americans with rheumatoid arthritis. Among Caucasians with rheumatoid arthritis, these allele frequencies were 0.08 (p<0.001 compared with African‐Americans with rheumatoid arthritis), 0.30 (p = 0.002) and 0.34 (p<0.001), respectively. There was no association between SNP alleles or haplotypes and response to MTX as measured by the mean change in the 28‐joint Disease Activity Score from baseline values. In Caucasians, the 1298 A (major) allele was associated with a significant increase in MTX‐related adverse events characteristic of a recessive genetic effect (odds ratio 15.86, 95% confidence interval 1.51 to 167.01; p = 0.021), confirming previous reports. There was an association between scores of MTX toxicity and the rs4846051 C allele, and haplotypes containing this allele, in African‐Americans, but not in Caucasians.
Conclusions
These results, although preliminary, highlight racial or ethnic differences in frequencies of common MTHFR SNPs. The MTHFR 1298 A and the rs4846051 C alleles were associated with MTX‐related adverse events in Caucasians and African‐Americans, respectively, but these findings should be replicated in larger studies. The rs4846051 SNP, which is far more common in African‐Americans than in Caucasians, may also prove to be a useful ancestry informative marker in future studies on genetic admixture.
doi:10.1136/ard.2005.046797
PMCID: PMC1798268  PMID: 16439441
14.  Common genetic variants associated with disease from genome-wide association studies are mutually exclusive in prostate cancer and rheumatoid arthritis 
BJU International  2012;111(7):1148-1155.
What's known on the subject? and What does the study add?
The link between inflammation and cancer has long been reported and inflammation is thought to play a role in the pathogenesis of many cancers, including prostate cancer (PrCa). Over the last 5 years, genome-wide association studies (GWAS) have reported numerous susceptibility loci that predispose individuals to many different traits. The present study aims to ascertain if there are common genetic risk profiles that might predispose individuals to both PrCa and the autoimmune inflammatory condition, rheumatoid arthritis. These results could have potential public health impact in terms of screening and chemoprevention.
Objectives
To investigate if potential common pathways exist for the pathogenesis of autoimmune disease and prostate cancer (PrCa). To ascertain if the single nucleotide polymorphisms (SNPs) reported by genome-wide association studies (GWAS) as being associated with susceptibility to PrCa are also associated with susceptibility to the autoimmune disease rheumatoid arthritis (RA).
Materials and Methods
The original Wellcome Trust Case Control Consortium (WTCCC) UK RA GWAS study was expanded to include a total of 3221 cases and 5272 controls. In all, 37 germline autosomal SNPs at genome-wide significance associated with PrCa risk were identified from a UK/Australian PrCa GWAS. Allele frequencies were compared for these 37 SNPs between RA cases and controls using a chi-squared trend test and corrected for multiple testing (Bonferroni).
Results
In all, 33 SNPs could be analysed in the RA dataset. Proxies could not be located for the SNPs in 3q26, 5p15 and for two SNPs in 17q12. After applying a Bonferroni correction for the number of SNPs tested, the SNP mapping to CCHCR1 (rs130067) retained statistically significant evidence for association (P = 6 × 10^−4; odds ratio [OR] = 1.15, 95% CI: 1.06–1.24); this has also been associated with psoriasis. However, further analyses showed that the association of this allele was due to confounding by RA-associated HLA-DRB1 alleles.
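The testing procedure named in this entry's Methods (a chi-squared trend test followed by Bonferroni correction) can be sketched as follows. This is a minimal illustration, assuming the trend test is the standard Cochran-Armitage form applied to genotype counts with additive weights 0, 1, 2; the abstract does not state the exact implementation. The function names and the genotype counts are hypothetical. Only the figures 33 (SNPs tested) and 6 × 10^−4 (reported P value) come from the abstract, and the 0.05 family-wise level is an assumption.

```python
import math

def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.
    Returns (chi-squared statistic with 1 df, two-sided p-value)."""
    n = [r + s for r, s in zip(cases, controls)]  # column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    var = sum(t * t * ni * (N - ni) for t, ni in zip(weights, n))
    var -= 2 * sum(weights[i] * weights[j] * n[i] * n[j]
                   for i in range(len(n)) for j in range(i + 1, len(n)))
    var *= R * S / N
    chi2 = T * T / var
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
chi2, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(chi2, p)

# Figures from the abstract: 33 SNPs tested, reported P = 6e-4.
print(6e-4 < bonferroni_threshold(0.05, 33))
```

Under these assumptions the per-test threshold is 0.05/33, roughly 1.5 × 10^−3, which is consistent with the reported P value for rs130067 remaining significant after correction.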
Conclusions
There is currently no evidence that SNPs associated with PrCa at genome-wide significance are associated with the development of RA. Studies like this are important in determining if common genetic risk profiles might predispose individuals to many diseases, which could have implications for public health in terms of screening and chemoprevention.
doi:10.1111/j.1464-410X.2012.11492.x
PMCID: PMC4491307  PMID: 22985493
genetic variants; genome-wide association studies (GWAS); prostate cancer; rheumatoid arthritis
15.  Association mapping of susceptibility loci for rheumatoid arthritis 
BMC Proceedings  2007;1(Suppl 1):S15.
We analyzed a case-control data set for chromosome 18q from the Genetic Analysis Workshop 15 to detect susceptibility loci for rheumatoid arthritis (RA). A total number of 460 cases and 460 unaffected controls were genotyped on 2300 single-nucleotide polymorphisms (SNPs) by the North American Rheumatoid Arthritis Consortium. Using a multimarker approach for association mapping under the framework of the Malecot model and composite likelihood, we identified a region showing significant association with RA (p < 0.002) and the predicted disease locus was at a genomic location of 53,306 kb with a 95% confidence interval (CI) of 53,295–53,331 kb. A common haplotype in this region was protective against RA (p = 0.002). In another region showing nominal significant association (51,585 kb, 95% CI: 51,541–51,628 kb, p = 0.037), a haplotype was also protective (p = 0.002). We further demonstrated that reducing SNP density decreased power and accuracy of association mapping. SNP selection based on equal linkage disequilibrium (LD) distance generally produced higher accuracy than that based on equal kilobase distance or tagging.
PMCID: PMC2367513  PMID: 18466494
16.  Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data 
Background
Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity.
Methods
We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration, which selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA.
Results
A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset is selected using optimal filtration criteria, with an error rate of 10.25%. Matching the 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration do not differ significantly from the prediction error rates of randomly selected SNP sets and T2D causal SNP combinations from optimal filtration.
Conclusions
We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
doi:10.1186/1472-6947-13-S1-S3
PMCID: PMC3618247  PMID: 23566118
17.  Inference of disease associations with unmeasured genetic variants by combining results from genome-wide association studies with linkage disequilibrium patterns in a reference data set 
BMC Proceedings  2009;3(Suppl 7):S55.
Results from whole-genome association studies of many common diseases are now available. Increasingly, these are being incorporated into meta-analyses to increase the power to detect weak associations with measured single-nucleotide polymorphisms (SNPs). Imputation of genotypes at unmeasured loci has been widely applied using patterns of linkage disequilibrium (LD) observed in the HapMap panels, but there is a need for alternative methods that can utilize the pooled effect estimates from meta-analyses and explore possible associations with SNPs and haplotypes that are not included in HapMap.
By a weighted average technique, we show that association results for common SNPs in an observed data set can be scaled and combined to infer the effect of a genetic variant that has been measured only in an independent reference data set. We show that the ratio p(R-1)/[1 + p(R-1)], where R is the relative risk associated with a measured or unmeasured allele of frequency p, is appropriately scaled by 1/D' and weighted in proportion to r^2, both common measures of LD being derived from the reference data set.
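The quantities in the preceding paragraph can be illustrated with a short Python sketch. The forward and inverse transforms follow directly from the formula p(R-1)/[1 + p(R-1)]. The combining step is only one plausible reading of "scaled by 1/D' and weighted in proportion to r^2" and should not be taken as the authors' exact estimator. All function names and input values are hypothetical.

```python
def risk_ratio_term(p, R):
    """The ratio p(R-1) / [1 + p(R-1)] for allele frequency p and relative risk R."""
    x = p * (R - 1)
    return x / (1 + x)

def relative_risk_from_term(p, a):
    """Invert the ratio: recover R from a = p(R-1) / [1 + p(R-1)]."""
    return 1 + a / (p * (1 - a))

def combine_measured_snps(terms, d_primes, r_squareds):
    """Assumed combining rule: scale each measured SNP's term by 1/D'
    and take an average weighted in proportion to r^2."""
    scaled = [a / d for a, d in zip(terms, d_primes)]
    total_weight = sum(r_squareds)
    return sum(w * s for w, s in zip(r_squareds, scaled)) / total_weight

# Hypothetical measured SNPs: (allele frequency, relative risk, D', r^2)
measured = [(0.20, 1.50, 0.9, 0.6), (0.35, 1.20, 0.8, 0.3)]
terms = [risk_ratio_term(p, R) for p, R, _, _ in measured]
inferred = combine_measured_snps(terms,
                                 [m[2] for m in measured],
                                 [m[3] for m in measured])
# Hypothetical frequency of the unmeasured variant in the reference data.
print(relative_risk_from_term(0.25, inferred))
```

The round trip between `risk_ratio_term` and `relative_risk_from_term` is exact algebra; the LD inputs (D' and r^2) would be taken from the reference data set, as the abstract specifies.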
We illustrate this computationally simple method by combining the results of a genome-wide association screen from the North American Rheumatoid Arthritis Consortium with LD measures from the British 1958 Birth Cohort, and explore the validity of underlying assumptions about the generalizability of LD from one population to another, and from healthy subjects to subjects with clinical disease.
PMCID: PMC2795955  PMID: 20018048
18.  Identification of polymorphic inversions from genotypes 
BMC Bioinformatics  2012;13:28.
Background
Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences between linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies.
Results
We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability to current genome-wide association studies (GWAS).
Conclusions
We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomic inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion.
doi:10.1186/1471-2105-13-28
PMCID: PMC3296650  PMID: 22321652
19.  Genetic variants associated with idiopathic pulmonary fibrosis susceptibility and mortality: a genome-wide association study 
The lancet. Respiratory medicine  2013;1(4):309-317.
Summary
Background
Idiopathic pulmonary fibrosis (IPF) is a devastating disease that probably involves several genetic loci. Several rare genetic variants and one common single nucleotide polymorphism (SNP) of MUC5B have been associated with the disease. Our aim was to identify additional common variants associated with susceptibility and ultimately mortality in IPF.
Methods
First, we did a three-stage genome-wide association study (GWAS): stage one was a discovery GWAS; and stages two and three were independent case-control studies. DNA samples from European-American patients with IPF meeting standard criteria were obtained from several US centres for each stage. Data for European-American control individuals for stage one were gathered from the database of genotypes and phenotypes; additional control individuals were recruited at the University of Pittsburgh to increase the number. For controls in stages two and three, we gathered data for additional sex-matched European-American control individuals who had been recruited in another study. DNA samples from patients and from control individuals were genotyped to identify SNPs associated with IPF. SNPs identified in stage one were carried forward to stage two, and those that achieved genome-wide significance (p<5 × 10^−8) in a meta-analysis were carried forward to stage three. Three case series with follow-up data were selected from stages one and two of the GWAS using samples with follow-up data. Mortality analyses were done in these case series to assess the SNPs associated with IPF that had achieved genome-wide significance in the meta-analysis of stages one and two. Finally, we obtained gene-expression profiling data for lungs of patients with IPF from the Lung Genomics Research Consortium and analysed correlation with SNP genotypes.
Findings
In stage one of the GWAS (542 patients with IPF, 542 control individuals matched one-by-one to cases by genetic ancestry estimates), we identified 20 loci. Six SNPs reached genome-wide significance in stage two (544 patients, 687 control individuals): three TOLLIP SNPs (rs111521887, rs5743894, rs5743890) and one MUC5B SNP (rs35705950) at 11p15.5; one MDGA2 SNP (rs7144383) at 14q21.3; and one SPPL2C SNP (rs17690703) at 17q21.31. Stage three (324 patients, 702 control individuals) confirmed the associations for all these SNPs, except for rs7144383. Linkage disequilibrium between the MUC5B SNP (rs35705950) and TOLLIP SNPs (rs111521887 [r^2=0.07], rs5743894 [r^2=0.16], and rs5743890 [r^2=0.01]) was low. 683 patients from the GWAS were included in the mortality analysis. Individuals who developed IPF despite having the protective TOLLIP minor allele of rs5743890 carried an increased mortality risk (meta-analysis with fixed-effect model: hazard ratio 1.72 [95% CI 1.24–2.38]; p=0.0012). TOLLIP expression was decreased by 20% in individuals carrying the minor allele of rs5743890 (p=0.097), 40% in those with the minor allele of rs111521887 (p=3.0 × 10^−4), and 50% in those with the minor allele of rs5743894 (p=2.93 × 10^−5) compared with homozygous carriers of common alleles for these SNPs.
Interpretation
Novel variants in TOLLIP and SPPL2C are associated with IPF susceptibility. One novel variant of TOLLIP, rs5743890, is also associated with mortality. These associations and the reduced expression of TOLLIP in patients with IPF who carry TOLLIP SNPs emphasise the importance of this gene in the disease.
Funding
National Institutes of Health; National Heart, Lung, and Blood Institute; Pulmonary Fibrosis Foundation; Coalition for Pulmonary Fibrosis; and Instituto de Salud Carlos III.
doi:10.1016/S2213-2600(13)70045-6
PMCID: PMC3894577  PMID: 24429156
20.  Enrichment of Genetic Variants for Rheumatoid Arthritis within T-Cell and NK-Cell Enhancer Regions 
Molecular Medicine  2015;21(1):180-184.
To identify disease-causative variants, we intersected the published results of a meta-analysis of genome-wide association studies (GWAS) for rheumatoid arthritis (RA) with the set of enhancer regions for 71 primary cell types that was provided by the FANTOM consortium. We first retrieved all single nucleotide polymorphisms (SNPs) that are associated (P < 5 × 10^−8) with RA in the GWAS meta-analysis and that are located in any of these enhancer regions. After excluding the major histocompatibility complex (MHC) region, we identified 50 such RA-associated SNPs that are located in enhancer regions. Enhancer sets from different cell types were then compared with each other for their number of RA-associated SNPs by permutation analysis. This analysis showed that RA-associated SNPs are preferentially located in enhancers from several immunological cell types. In particular, we see a strong relative enrichment in enhancer regions that are active in T cells (P < 0.001) and NK cells (P < 0.001). Several loci display multiple RA-associated SNPs in tight linkage disequilibrium that are located within the same or neighboring enhancers. These haplotypes may have a greater likelihood to influence enhancer activity than any SNP on its own. Taken together, these results support the hypothesis that RA-causative variants often act through altering the activity of immune cell enhancers. The enrichment in T-cell and NK-cell enhancer regions indicates that expression changes in these cell types are particularly relevant for the pathogenesis of RA. The specific SNPs that account for this enrichment can be used as a basis for focused genotype-phenotype studies of these cell types.
doi:10.2119/molmed.2014.00252
PMCID: PMC4503658  PMID: 25794145
21.  The frequent and conserved DR3-B8-A1 extended haplotype confers less diabetes risk than other DR3 haplotypes 
Diabetes, obesity & metabolism  2009;11(Suppl 1):25-30.
Aim
The goal of this study was to develop and implement methodology that would aid in the analysis of extended high-density single nucleotide polymorphism (SNP) major histocompatibility complex (MHC) haplotypes combined with human leucocyte antigen (HLA) alleles in relation to type 1 diabetes risk.
Methods
High-density SNP genotype data (2918 SNPs) across the MHC from the Type 1 Diabetes Genetics Consortium (1240 families), in addition to HLA data, were processed into haplotypes using PEDCHECK and MERLIN, and extended DR3 haplotypes were analysed.
Results
With this large dense set of SNPs, the conservation of DR3-B8-A1 (8.1) haplotypes spanned the MHC (≥99% SNP identity). Forty-seven individuals homozygous for the 8.1 haplotype also shared the same homozygous genotype at four ‘sentinel’ SNPs (rs2157678 ‘T’, rs3130380 ‘A’, rs3094628 ‘C’ and rs3130352 ‘T’). Conservation extended from HLA-DQB1 to the telomeric end of the SNP panels (3.4 Mb total). In addition, we found that the 8.1 haplotype is associated with lower risk than other DR3 haplotypes by both haplotypic and genotypic analyses [haplotype: p = 0.009, odds ratio (OR) = 0.65; genotype: p = 6.3 × 10^−5, OR = 0.27]. The 8.1 haplotype (from genotypic analyses) is associated with lower risk than the high-risk DR3-B18-A30 haplotype (p = 0.01, OR = 0.23), but the DR3-B18-A30 haplotype did not differ from other non-8.1 DR3 haplotypes relative to diabetes association.
Conclusion
The 8.1 haplotype demonstrates extreme conservation (>3.4 Mb) and is associated with significantly lower risk for type 1 diabetes than other DR3 haplotypes.
doi:10.1111/j.1463-1326.2008.01000.x
PMCID: PMC2769935  PMID: 19143812
8.1 haplotype; extended haplotypes; major histocompatibility complex; T1DGC; type 1 diabetes
22.  The Association Between Genetic Variants in SORL1 and Alzheimer’s Disease in an Urban, Multiethnic, Community-Based Cohort 
Archives of neurology  2007;64(4):501-506.
Context
Variants in 3′ and 5′ regions of SORL1, the neuronal sorting protein-related receptor, were recently found to be associated with late onset familial and sporadic Alzheimer’s disease in several datasets that were selected for familial aggregation or were ethnically diverse or clinic-based selected series.
Objective
To investigate the association between Alzheimer’s disease and variant alleles in SORL1 using a series of single nucleotide polymorphisms (SNPs) in an urban, multiethnic community-based population.
Design & Setting
We used a nested case-control analysis in a population-based, prospective study of aging and dementia in Medicare recipients, 65 years and older, residing in northern Manhattan.
Participants
There were 296 patients with probable Alzheimer’s disease and 428 healthy elderly controls. The participants were African American (34%), Caribbean Hispanic (51%) or non-Hispanic white (15%).
Main Outcome Measures
We genotyped all 29 SNPs in SORL1 that were examined in the earlier report. We assessed allelic association with AD using standard case-control methods which included APOE genotype as a covariate.
Results
Several individual SNPs and SNP haplotypes were significantly associated with AD in this prospectively collected community-based cohort, confirming the previously reported positive association of SORL1 with Alzheimer’s disease. SNP 12 near the 5′ region was associated with AD in African-Americans and Hispanics. Two SNPs in the 3′ region were also associated with AD in African-Americans (SNP 26) and Whites (SNP 20). A single haplotype in the 3′ region was associated with AD in Hispanics. However, several different haplotypes were associated with AD in the African-Americans and Whites, including the “TTC” haplotype at SNPs 23–25 (p=0.035) that was significantly associated with AD in the North European Whites in the previous report.
Conclusions
This study confirms the association between genetic variants in SORL1 and AD. While the associations observed in these datasets overlap with those previously reported, the finding of novel SNP and haplotype associations suggests that there may be extensive allelic heterogeneity in SORL1. Broad regions of the SORL1 gene will therefore need to be scrutinized for functional pathogenic variants.
doi:10.1001/archneur.64.4.501
PMCID: PMC2639214  PMID: 17420311
SORL1; Alzheimer’s disease; sporadic; African American; Caribbean Hispanic
23.  Analysis of genome-wide association data by large-scale Bayesian logistic regression 
BMC Proceedings  2009;3(Suppl 7):S16.
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA while controlling for collinearity and overfitting in a high dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1 and L2 penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
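The connection between priors and penalties mentioned above can be written out explicitly. The abstract does not say which priors were used, so the block below shows only the standard textbook correspondence, with assumed notation: \ell(\beta) is the logistic log-likelihood, \tau^2 is the variance of a zero-mean Gaussian prior, and b is the scale of a zero-mean Laplace prior, each placed independently on every coefficient \beta_j.

```latex
% Maximum a posteriori (MAP) estimate under independent priors \pi(\beta_j)
\hat{\beta}_{\mathrm{MAP}}
  = \arg\max_{\beta} \Big[ \ell(\beta) + \sum_{j} \log \pi(\beta_j) \Big]

% Gaussian prior  \beta_j \sim N(0, \tau^2)  gives an L2 (ridge) penalty
\hat{\beta}_{\mathrm{MAP}}
  = \arg\min_{\beta} \Big[ -\ell(\beta) + \frac{1}{2\tau^2} \sum_{j} \beta_j^2 \Big]

% Laplace prior  \pi(\beta_j) = \frac{1}{2b} \exp(-|\beta_j| / b)  gives an L1 (lasso) penalty
\hat{\beta}_{\mathrm{MAP}}
  = \arg\min_{\beta} \Big[ -\ell(\beta) + \frac{1}{b} \sum_{j} |\beta_j| \Big]
```

In this reading, a tighter prior (smaller \tau^2 or b) corresponds to a stronger penalty, and the L1 case can shrink coefficients exactly to zero, which is what makes it usable for SNP selection.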
PMCID: PMC2795912  PMID: 20018005
24.  Combined genotype and haplotype tests for region-based association studies 
BMC Genomics  2013;14:569.
Background
Although single-SNP analysis has proven to be useful in identifying many disease-associated loci, region-based analysis has several advantages. Empirically, it has been shown that region-based genotype and haplotype approaches may possess much higher power than single-SNP statistical tests. Both high quality haplotypes and genotypes may be available for analysis given the development of next generation sequencing technologies and haplotype assembly algorithms.
Results
As generally it is unknown whether genotypes or haplotypes are more relevant for identifying an association, we propose to use both of them with the purpose of preserving high power under both genotype and haplotype disease scenarios. We suggest two approaches for a combined association test and investigate the performance of these two approaches based on a theoretical model, population genetics simulations and analysis of a real data set.
Conclusions
Based on a theoretical model, population genetics simulations and analysis of a central corneal thickness (CCT) Genome Wide Association Study (GWAS) data set, we have shown that a combined genotype and haplotype approach has high potential utility for applications in association studies.
doi:10.1186/1471-2164-14-569
PMCID: PMC3852120  PMID: 23964661
Genotype-based tests; Haplotype-based tests; Association analysis; Test statistic combination
25.  A Genome-Wide Scan for Breast Cancer Risk Haplotypes among African American Women 
PLoS ONE  2013;8(2):e57298.
Genome-wide association studies (GWAS) simultaneously investigating hundreds of thousands of single nucleotide polymorphisms (SNPs) have become a powerful tool in the investigation of new disease susceptibility loci. Haplotypes are sometimes thought to be superior to SNPs and are promising in genetic association analyses. The application of genome-wide haplotype analysis, however, is hindered by the complexity of haplotypes themselves and sophistication in computation. We systematically analyzed the haplotype effects for breast cancer risk among 5,761 African American women (3,016 cases and 2,745 controls) using a sliding window approach on the genome-wide scale. Three regions on chromosomes 1, 4 and 18 exhibited moderate haplotype effects. Furthermore, among 21 breast cancer susceptibility loci previously established in European populations, 10p15 and 14q24 are likely to harbor novel haplotype effects. We also proposed a heuristic for determining the significance level and the effective number of independent tests by permutation analysis on chromosome 22 data. It suggests that the effective number was approximately half of the total (7,794 out of 15,645); thus half the total number of tests could serve as a quick reference for evaluating genome-wide significance if a similar sliding window approach to haplotype analysis is adopted in similar populations using similar genotype density.
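The "half the total" heuristic above amounts to a simple adjustment of a Bonferroni-style threshold. The sketch below uses the two counts reported in the abstract (7,794 effective tests out of 15,645 on chromosome 22); the 0.05 family-wise level and the function name `per_test_threshold` are assumptions made for illustration, and the permutation analysis that produced the effective number is not reproduced here.

```python
def per_test_threshold(alpha, n_tests):
    """Bonferroni-style per-test threshold for a given number of tests."""
    return alpha / n_tests

TOTAL_TESTS = 15645      # haplotype tests on chromosome 22 (from the abstract)
EFFECTIVE_TESTS = 7794   # effective independent tests (from the abstract)
ALPHA = 0.05             # assumed family-wise significance level

ratio = EFFECTIVE_TESTS / TOTAL_TESTS
naive = per_test_threshold(ALPHA, TOTAL_TESTS)
adjusted = per_test_threshold(ALPHA, EFFECTIVE_TESTS)
quick = per_test_threshold(ALPHA, TOTAL_TESTS / 2)  # the "half" rule of thumb

print(f"effective/total ratio: {ratio:.3f}")
print(f"naive threshold:       {naive:.3e}")
print(f"adjusted threshold:    {adjusted:.3e}")
print(f"half-rule threshold:   {quick:.3e}")
```

Because overlapping sliding windows are correlated, the adjusted threshold is about twice as lenient as the naive one, and the half-rule approximation lands very close to the permutation-based figure.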
doi:10.1371/journal.pone.0057298
PMCID: PMC3585353  PMID: 23468962
