For the analysis of rare-variant data in population-based designs, we propose a method to detect study subjects that may create population substructure in the study sample. Our approach is computationally fast and simple, permitting applications to whole-genome sequencing studies. The method does not require the variants to be in linkage equilibrium and can be applied to all the genetic loci that are available in the study. For both rare and common variants, we assess the performance of our approach by its application to the 1000 Genomes Project data, and in simulation studies. The results are compared to the commonly used outlier detection algorithm based on principal component analysis (PCA). The statistical power of both approaches to detect outliers is comparable in most of the scenarios, but the power of PCA to detect outliers is lower than that of the novel approach in the presence of linkage disequilibrium and for subpopulations that are genetically similar. The data analysis and the simulation studies suggest that the number of false-positive results differs between the two approaches. Our approach maintains the type I error rate while the outlier detection approach based on PCA does not. Taking additionally into account the minimal computational requirements of our approach and the ability to incorporate all the marker information, the proposed method will have important applications in sequencing studies and genome-wide association studies.
population substructure; outlier detection; GWAS; sequence data
Increased susceptibility to tuberculosis following HIV-1 seroconversion contributes significantly to the tuberculosis epidemic in sub-Saharan Africa. Lung-specific mechanisms underlying the interaction between HIV-1 and Mycobacterium (M.) tuberculosis infection are incompletely understood. This study addressed the effect of HIV-1 and latent M. tuberculosis infection on viral-entry receptors and ligands in bronchoalveolar lavage (BAL). Median fluorescence intensity (MFI) of entry receptor expression was measured by multiparameter flow cytometry and chemokine expression by multiplex bead array.
Irrespective of HIV-1 status, BAL T-cells expressed higher MFI for the beta-chemokine receptor (CCR)5 than peripheral blood T-cells (p<0.001); in particular, the CD8+ T-cells of HIV-1 infected persons showed elevated CCR5 expression (p=0.026). The concentrations of the BAL CCR5 ligands regulated upon activation normal T-cell expressed and secreted (RANTES; p<0.001) and macrophage inflammatory protein (MIP)-1β (p=0.004) were elevated in the BAL of HIV-1 infected persons compared to controls. CCR5 expression and RANTES concentration correlated strongly with HIV-1 viral load in BAL. By contrast, these alterations were not associated with M. tuberculosis sensitization in vivo, nor did M. tuberculosis infection of BAL cells ex vivo change RANTES expression.
These data suggest ongoing HIV-1 replication predominantly drives local pulmonary CCR5+ T-cell activation in HIV/latent M. tuberculosis co-infection.
BAL; CCR5; RANTES; TB; viral load
In genome-wide association studies (GWAS), family-based studies tend to have less power to detect genetic associations than population-based studies, such as case-control studies. This can be an issue when testing whether genes in a family-based GWAS have a direct effect on the phenotype of interest over and above their possible indirect effect through a secondary phenotype. When multiple SNPs are tested for a direct effect in the family-based study, a screening step can be used to minimize the burden of multiple comparisons in the causal analysis. We propose a 2-stage screening step that can be incorporated into the family-based association test (FBAT) approach, similar to the conditional mean model approach in the Van Steen algorithm (Van Steen et al., 2005). Simulations demonstrate that the type 1 error is preserved and that this method is advantageous when multiple markers are tested. This method is illustrated by an application to the Framingham Heart Study.
family-based association analysis; causal inference; genetic pathway; mediation; pleiotropy
The advent of genome-wide association studies has led to many novel disease-SNP associations, opening the door to focused study on their biological underpinnings. Because of the importance of analyzing these associations, numerous statistical methods have been devoted to them. However, fewer methods have attempted to associate entire genes or genomic regions with outcomes, which is potentially more useful knowledge from a biological perspective, and those methods currently implemented are often permutation-based.
One property of some permutation-based tests is that their power varies as a function of whether significant markers are in regions of linkage disequilibrium (LD) or not, which we show from a theoretical perspective. We therefore develop two methods for quantifying the degree of association between a genomic region and outcome, both of whose power does not vary as a function of LD structure. One method uses dimension reduction to “filter” redundant information when significant LD exists in the region, while the other, called the summary-statistic test, controls for LD by scaling marker Z-statistics using knowledge of the correlation matrix of markers. An advantage of this latter test is that it does not require the original data, but only their Z-statistics from univariate regressions and an estimate of the correlation structure of markers, and we show how to modify the test to protect the type 1 error rate when the correlation structure of markers is misspecified. We apply these methods to sequence data of oral cleft and compare our results to previously proposed gene tests, in particular permutation-based ones. We evaluate the versatility of the modification of the summary-statistic test since the specification of correlation structure between markers can be inaccurate.
We find a significant association in the sequence data between the 8q24 region and oral cleft using our dimension reduction approach and a borderline significant association using the summary-statistic based approach. We also implement the summary-statistic test using Z-statistics from an already-published GWAS of Chronic Obstructive Pulmonary Disease (COPD) and correlation structure obtained from HapMap. We experiment with the modification of this test because the correlation structure is assumed to be imperfectly known.
Dimension reduction; Eigenvector; Gene-based testing; Permutation tests
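The idea of scaling marker Z-statistics by the correlation matrix of the markers can be illustrated with a minimal sketch. It assumes the textbook quadratic form T = Z' R^-1 Z, which follows a chi-square distribution with k degrees of freedom under the null hypothesis when R is the true correlation matrix of the k markers. This is not necessarily the exact statistic of the summary-statistic test described above, and the function names `solve_linear` and `ld_adjusted_statistic` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix [a | b] so the inputs are not modified.
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is (near-)singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ld_adjusted_statistic(z, r):
    """Quadratic form z' R^-1 z for Z-statistics z and marker correlation matrix r."""
    x = solve_linear(r, z)
    return sum(zi * xi for zi, xi in zip(z, x))


# Hypothetical example: two markers with identical Z-statistics.
z = [2.0, 2.0]
t_independent = ld_adjusted_statistic(z, [[1.0, 0.0], [0.0, 1.0]])
t_in_ld = ld_adjusted_statistic(z, [[1.0, 0.9], [0.9, 1.0]])
print(t_independent, t_in_ld)
```

In this toy example the two correlated markers contribute less evidence than two independent markers with the same Z-statistics, which is the sense in which the correlation matrix controls for LD.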
Population stratification leads to a predictable phenomenon—a reduction in the number of heterozygotes compared to that calculated assuming Hardy-Weinberg Equilibrium (HWE). We show that population stratification results in another phenomenon—an excess in the proportion of spouse-pairs with the same genotypes at all ancestrally informative markers, resulting in ancestrally related positive assortative mating. We use principal components analysis to show that there is evidence of population stratification within the Framingham Heart Study, and show that the first principal component correlates with a North-South European cline. We then show that the first principal component is highly correlated between spouses (r=0.58, p=0.0013), demonstrating that there is ancestrally related positive assortative mating among the Framingham Caucasian population. We also show that the single nucleotide polymorphisms loading most heavily on the first principal component show an excess of homozygotes within the spouses, consistent with similar ancestry-related assortative mating in the previous generation. This nonrandom mating likely affects genetic structure seen more generally in the North American population of European descent today, and decreases the rate of decay of linkage disequilibrium for ancestrally informative markers.
population stratification; non-random mating; Hardy-Weinberg equilibrium
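The heterozygote deficit caused by stratification (the Wahlund effect) can be checked with a small worked example. The sketch below assumes a sample that mixes subpopulations, each in HWE internally but with different allele frequencies; the function name and the frequencies used are hypothetical and are not taken from the Framingham data.

```python
def heterozygote_proportions(allele_freqs, weights):
    """Return (observed, expected) heterozygote proportions for a mixed sample.

    observed: average of 2p(1-p) over subpopulations that are each in HWE.
    expected: 2*pbar*(1-pbar), the HWE value computed from the pooled frequency.
    """
    total = float(sum(weights))
    w = [x / total for x in weights]
    pooled = sum(wi * p for wi, p in zip(w, allele_freqs))
    observed = sum(wi * 2.0 * p * (1.0 - p) for wi, p in zip(w, allele_freqs))
    expected = 2.0 * pooled * (1.0 - pooled)
    return observed, expected


# Two equally sized subpopulations with allele frequencies 0.2 and 0.8.
observed, expected = heterozygote_proportions([0.2, 0.8], [1, 1])
deficit = expected - observed  # equals twice the variance of p across subpopulations
print(observed, expected, deficit)
```

The deficit vanishes only when all subpopulations share the same allele frequency, so markers with large frequency differences between ancestries show the strongest departure from HWE.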
We have conducted the first meta-analyses for nonsyndromic cleft lip with or without cleft palate (NSCL/P) using data from the two largest genome-wide association studies published to date. We confirmed associations with all previously identified loci and identified six additional susceptibility regions (1p36, 2p21, 3p11.1, 8q21.3, 13q31.1 and 15q22). Analysis of phenotypic variability identified the first specific genetic risk factor for NSCLP (nonsyndromic cleft lip plus palate) (rs8001641; P_NSCLP = 6.51 × 10⁻¹¹; homozygote relative risk = 2.41, 95% confidence interval (CI) 1.84–3.16).
In infection experiments with genetically distinct Mycobacterium tuberculosis complex (MTBC) strains, we identified clade-specific virulence patterns in human primary macrophages and in mice infected by the aerosol route, both reflecting relevant model systems. Exclusively human-adapted M. tuberculosis lineages, also termed clade I, comprising “modern” lineages, such as Beijing and Euro-American Haarlem strains, showed a significantly enhanced capability to grow compared to that of clade II strains, which include “ancient” lineages, such as East African Indian or M. africanum strains. However, a simple correlation of inflammatory response profiles with strain virulence was not apparent. Overall, our data reveal three different pathogenic profiles: (i) strains of the Beijing lineage are characterized by low uptake, low cytokine induction, and a high replicative potential, (ii) strains of the Haarlem lineage by high uptake, high cytokine induction, and high growth rates, and (iii) EAI strains by low uptake, low cytokine induction, and a low replicative potential. Our findings have significant implications for our understanding of host-pathogen interaction and factors that modulate the outcomes of infections. Future studies addressing the underlying mechanisms and clinical implications need to take into account the diversity of both the pathogen and the host.
Clinical strains of the Mycobacterium tuberculosis complex (MTBC) are genetically more diverse than previously anticipated. Our analysis of mycobacterial growth characteristics in primary human macrophages and aerogenically infected mice shows that the MTBC genetic differences translate into pathogenic differences in the interaction with the host. Our study reveals for the first time that “TB is not TB,” if put in plain terms. We are convinced that it is very unlikely that a single molecular mechanism may explain the observed effects. Our study refutes the hypothesis that there is a simple correlation between cytokine induction as a single functional parameter of host interaction and mycobacterial virulence. Instead, careful consideration of strain- and lineage-specific characteristics must guide our attempts to decipher what determines the pathological potential and thus the outcomes of infection with MTBC, one of the most important human pathogens.
Rationale: Variability in pulmonary disease severity is found in patients with cystic fibrosis (CF) who have identical mutations in the CF transmembrane conductance regulator (CFTR) gene. We hypothesized that one factor accounting for heterogeneity in pulmonary disease severity is variation in the family of genes affecting the biology of interleukin-1 (IL-1), which impacts acquisition and maintenance of Pseudomonas aeruginosa infection in animal models of chronic infection. Methods: We genotyped 58 single nucleotide polymorphisms (SNPs) in the IL-1 gene cluster in 808 CF subjects from the University of North Carolina and Case Western Reserve University (UNC/CWRU) joint cohort. All were homozygous for ΔF508, and categories of “severe” (cases) or “mild” (control subjects) lung disease were defined by the lowest or highest quartile of forced expired volume (FEV1) for age in the CF population. After adjustment for age and gender, genotypic data were tested for association with lung disease severity. Odds ratios (ORs) comparing severe versus mild CF were also calculated for each genotype (with the homozygote major allele as the reference group) for all 58 SNPs. From these analyses, nine SNPs with a moderate effect size, OR ≤ 0.5 or > 1.5, were selected for further testing. To replicate the case-control study results, we genotyped the same nine SNPs in a second population of CF parent-offspring trios (recruited from Children’s Hospital Boston), in which the offspring had similar pulmonary phenotypes. For the trio analysis, both family-based and population-based associations were performed. Results: SNPs rs1143634 and rs1143639 in the IL1B gene demonstrated a consistent association with lung disease severity categories (P < 0.10) and longitudinal analysis of lung disease severity (P < 0.10) in CF in both the case-control and family-based studies.
In females, there was a consistent association (false discovery rate adjusted joint P-value < 0.06 for both SNPs) in both the analysis of lung disease severity in the UNC/CWRU cohort and the family-based analysis of affection status. Conclusion: Our findings suggest that IL1β is a clinically relevant modulator of CF lung disease.
gene modifiers; cystic fibrosis; CFTR; IL-1 gene family
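The false discovery rate adjustment mentioned in the results above can be illustrated with the Benjamini-Hochberg step-up procedure. This is a minimal sketch: the abstract does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption, and the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four SNPs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

A SNP is then declared significant at FDR level q when its adjusted P-value is at most q.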
IL10 is an anti-inflammatory cytokine that has been found to have lower production in macrophages and mononuclear cells from asthmatics. Since reduced IL10 levels may influence the severity of asthma phenotypes, we examined IL10 single-nucleotide polymorphisms (SNPs) for association with asthma severity and allergy phenotypes as quantitative traits. Utilizing DNA samples from 518 Caucasian asthmatic children from the Childhood Asthma Management Program (CAMP) and their parents, we genotyped six IL10 SNPs: 3 in the promoter, 2 in introns, and one in the 3′ UTR. Using family-based association tests, each SNP was tested for association with asthma and allergy phenotypes individually. Population-based association analysis was performed with each SNP locus, the promoter haplotypes and the 6-loci haplotypes. The 3′ UTR SNP was significantly associated with FEV1 as a percent of predicted (FEV1PP) (P=0.0002) in both the family and population analyses. The promoter haplotype GCC was positively associated with IgE levels and FEV1PP (P=0.007 and 0.012, respectively). The promoter haplotype ATA was negatively associated with lnPC20 and FEV1PP (P=0.008 and 0.043, respectively). Polymorphisms in IL10 are associated with asthma phenotypes in this cohort. Further studies of variation in the IL10 gene may help elucidate the mechanism of asthma development in children.
interleukin 10 (IL10); single nucleotide polymorphism (SNP); genetic association; family-based association test (FBAT); haplotype; promoter; 3′ untranslated region (3′UTR)
Many Genome-Wide Association Studies (GWAS) have signals with unknown etiology. This paper addresses the question of whether such an association signal is caused by rare or common variants that lead to increased disease risk. For a genomic region implicated by a GWAS, we use Single Nucleotide Polymorphism (SNP) data in a case-control setting to predict how many common or rare variants there are, using a Bayesian analysis. Our objective is to compute posterior probabilities for configurations of rare and/or common variants. We use an extension of coalescent trees, the Ancestral Recombination Graphs (ARG), to model the genealogical history of the samples based on marker data. As we expect SNPs to be in Linkage Disequilibrium (LD) with common disease variants, we can expect the trees to reflect the type of variants. To demonstrate the application, we apply our method to candidate gene sequencing data from a German case-control study on nonsyndromic cleft lip with or without cleft palate (NSCL/P).
Coalescent Tree; Genetic Association; Rare Variant; Common Variant; Ancestral Recombination Graphs; Bayesian Modeling
The response to treatment for asthma is characterized by wide interindividual variability, with a significant number of patients who have no response. We hypothesized that a genomewide association study would reveal novel pharmacogenetic determinants of the response to inhaled glucocorticoids.
We analyzed a small number of statistically powerful variants selected on the basis of a family-based screening algorithm from among 534,290 single-nucleotide polymorphisms (SNPs) to determine changes in lung function in response to inhaled glucocorticoids. A significant, replicated association was found, and we characterized its functional effects.
We identified a significant pharmacogenetic association at SNP rs37972, replicated in four independent populations totaling 935 persons (P = 0.0007), which maps to the glucocorticoid-induced transcript 1 gene (GLCCI1) and is in complete linkage disequilibrium (i.e., perfectly correlated) with rs37973. Both rs37972 and rs37973 are associated with decrements in GLCCI1 expression. In isolated cell systems, the rs37973 variant is associated with significantly decreased luciferase reporter activity. Pooled data from treatment trials indicate reduced lung function in response to inhaled glucocorticoids in subjects with the variant allele (P = 0.0007 for pooled data). Overall, the mean (± SE) increase in forced expiratory volume in 1 second in the treated subjects who were homozygous for the mutant rs37973 allele was only about one third of that seen in similarly treated subjects who were homozygous for the wild-type allele (3.2 ± 1.6% vs. 9.4 ± 1.1%), and their risk of a poor response was significantly higher (odds ratio, 2.36; 95% confidence interval, 1.27 to 4.41), with genotype accounting for about 6.6% of overall inhaled glucocorticoid response variability.
A functional GLCCI1 variant is associated with substantial decrements in the response to inhaled glucocorticoids in patients with asthma. (Funded by the National Institutes of Health and others; ClinicalTrials.gov number, NCT00000575.)
To assess the feasibility of developing a Combined Clinical and Pharmacogenetic Predictive Test, comprised of multiple single nucleotide polymorphisms (SNPs) that is associated with poor bronchodilator response (BDR).
We genotyped SNPs that tagged the whole genome of the parents and children in the Childhood Asthma Management Program (CAMP) and implemented an algorithm using a family-based association test that ranked SNPs by statistical power. The top eight SNPs that were associated with BDR comprised the Pharmacogenetic Predictive Test. The Clinical Predictive Test was comprised of baseline forced expiratory volume in 1 s (FEV1). We evaluated these predictive tests and a Combined Clinical and Pharmacogenetic Predictive Test in three distinct populations: the children of the CAMP trial and two additional clinical trial populations of asthma. Our outcome measure was poor BDR, defined as BDR below the 20th percentile in each population. BDR was calculated as the percent difference between the prebronchodilator and postbronchodilator (two puffs of albuterol at 180 μg/puff) FEV1 value. To assess the predictive ability of the test, the corresponding areas under the receiver operating characteristic curve (AUROCs) were calculated for each population.
The AUROC values for the Clinical Predictive Test alone were not significantly different from 0.50, the AUROC of a random classifier. Our Combined Clinical and Pharmacogenetic Predictive Test, comprising genetic polymorphisms in addition to FEV1, predicted poor BDR with an AUROC of 0.65 in the CAMP children (n = 422) and 0.60 (n = 475) and 0.63 (n = 235) in the two independent populations. Both the Combined Clinical and Pharmacogenetic Predictive Test and the Pharmacogenetic Predictive Test were significantly more accurate than the Clinical Predictive Test (AUROC between 0.44 and 0.55) in each of the populations.
Our finding that genetic polymorphisms with a clinical trait are associated with BDR suggests that there is promise in using multiple genetic polymorphisms simultaneously to predict which asthmatics are likely to respond poorly to bronchodilators.
asthma; bronchodilator response; personalized medicine; pharmacogenetic test; predictive medicine
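The two quantities used in the methods above, BDR as a percent change in FEV1 and the AUROC of a predictive score, can be sketched as follows. The sketch assumes that the percent difference is taken relative to the prebronchodilator value and computes the AUROC as the probability that a randomly chosen poor responder receives a higher score than a randomly chosen non-poor responder, with ties counted as one half. The function names and the example values are hypothetical.

```python
def bdr_percent(pre_fev1, post_fev1):
    """Percent change in FEV1 after bronchodilator, relative to the pre value."""
    return 100.0 * (post_fev1 - pre_fev1) / pre_fev1


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the outcome of interest (e.g. poor BDR), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictive scores for four subjects.
example_auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(bdr_percent(2.0, 2.2), example_auc)
```

A score that carries no information gives an AUROC near 0.5, which is the reference value used for the Clinical Predictive Test above.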
For the meta-analysis of genome-wide association studies, we propose a new method to adjust for the population stratification and a linear mixed approach that combines family-based and unrelated samples. The proposed approach achieves similar power levels as a standard meta-analysis which combines the different test statistics or p values across studies. However, by virtue of its design, the proposed approach is robust against population admixture and stratification, and no adjustments for population admixture and stratification, even in unrelated samples, are required. Using simulation studies, we examine the power of the proposed method and compare it to standard approaches in the meta-analysis of genome-wide association studies. The practical features of the approach are illustrated with a meta-analysis of three genome-wide association studies for Alzheimer's disease. We identify three single nucleotide polymorphisms showing significant genome-wide association with affection status. Two single nucleotide polymorphisms are novel and will be verified in other populations in our follow-up study.
Meta-analysis; Genome-wide study; Population stratification
For genetic association studies in designs of unrelated individuals, current statistical methodology typically models the phenotype of interest as a function of the genotype and assumes a known statistical model for the phenotype. In the analysis of complex phenotypes, especially in the presence of ascertainment conditions, the specification of such model assumptions is not straightforward and is error-prone, potentially causing misleading results.
In this paper, we propose an alternative approach that treats the genotype as the random variable and conditions upon the phenotype. Thereby, the validity of the approach does not depend on the correctness of assumptions about the phenotypic model. Misspecification of the phenotypic model may lead to reduced statistical power. Theoretical derivations and simulation studies demonstrate both the validity and the advantages of the approach over existing methodology. In the COPDGene study (a GWAS for Chronic Obstructive Pulmonary Disease (COPD)), we apply the approach to a secondary, quantitative phenotype, the Fagerstrom nicotine dependence score, that is correlated with COPD affection status. The software package that implements this method is available.
The flexibility of this approach enables the straightforward application to quantitative phenotypes and binary traits in ascertained and unascertained samples. In addition to its robustness features, our method provides the platform for the construction of complex statistical models for longitudinal data, multivariate data, multi-marker tests, rare-variant analysis, and others.
Genetic association studies; Secondary phenotypes; Case-control; Ascertainment; Semi-parametric
The genetic risk factors for chronic obstructive pulmonary disease (COPD) are still largely unknown. To date, genome-wide association studies (GWASs) of limited size have identified several novel risk loci for COPD at CHRNA3/CHRNA5/IREB2, HHIP and FAM13A; additional loci may be identified through larger studies. We performed a GWAS using a total of 3499 cases and 1922 control subjects from four cohorts: the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE); the Normative Aging Study (NAS) and National Emphysema Treatment Trial (NETT); Bergen, Norway (GenKOLS); and the COPDGene study. Genotyping was performed on Illumina platforms with additional markers imputed using 1000 Genomes data; results were summarized using fixed-effect meta-analysis. We identified a new genome-wide significant locus on chromosome 19q13 (rs7937, OR = 0.74, P = 2.9 × 10⁻⁹). Genotyping this single nucleotide polymorphism (SNP) and another nearby SNP in linkage disequilibrium (rs2604894) in 2859 subjects from the family-based International COPD Genetics Network study (ICGN) demonstrated supportive evidence for association for COPD (P = 0.28 and 0.11 for rs7937 and rs2604894), pre-bronchodilator FEV1 (P = 0.08 and 0.04) and severe (GOLD 3&4) COPD (P = 0.09 and 0.017). This region includes RAB4B, EGLN2, MIA and CYP2A6, and has previously been identified in association with cigarette smoking behavior.
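The fixed-effect meta-analysis used to summarize the cohort results can be sketched with inverse-variance weighting of per-cohort log odds ratios. This is a minimal sketch of the standard technique, not the study's actual pipeline; the function name and the per-cohort estimates below are hypothetical and are not the values from the four cohorts.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios).
    Returns (pooled beta, pooled standard error, Z, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical log odds ratios and standard errors from four cohorts.
beta, se, z, p = fixed_effect_meta(
    [-0.35, -0.25, -0.30, -0.28], [0.10, 0.12, 0.15, 0.09]
)
print(round(math.exp(beta), 3), p)
```

Studies with smaller standard errors receive larger weights, and the pooled odds ratio is obtained by exponentiating the pooled log odds ratio.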
The excitement over findings from Genome-Wide Association Studies (GWASs) has been tempered by the difficulty in finding the location of the true causal disease susceptibility loci (DSLs), rather than markers that are correlated with the causal variants. In addition, many recent GWASs have studied multiple phenotypes – often highly correlated – making it difficult to understand which associations are causal and which are seemingly causal, induced by phenotypic correlations. In order to identify DSLs, which are required to understand the genetic etiology of the observed associations, statistical methodology has been proposed that distinguishes between a direct effect of a genetic locus on the primary phenotype and an indirect effect induced by the association with the intermediate phenotype that is also correlated with the primary phenotype. However, so far, the application of this important methodology has been challenging, as no user-friendly software implementation exists. The lack of software implementation of this sophisticated methodology has prevented its large-scale use in the genetic community. We have now implemented this statistical approach in a user-friendly and robust R package that has been thoroughly tested. The R package 'CGene' is available for download at http://cran.r-project.org/. The R code is also available at http://people.hsph.harvard.edu/~plipman.
causal modeling; statistical genetics; software
It is useful to have robust gene-environment interaction tests that can utilize a variety of family structures in an efficient way. This paper focuses on tests for gene-environment interaction in the presence of main genetic and environmental effects. The objective is to develop powerful tests that can combine trio data with parental genotypes and discordant sibships when parental genotypes are missing. We first make a modest improvement on a method for discordant sibs (discordant on phenotype), but the approach does not allow one to use families when all offspring are affected, e.g. trios. We then make a modest improvement on a Mendelian transmission-based approach that is inefficient when discordant sibs are available, but can be applied to any nuclear family. Finally, we propose a hybrid approach that utilizes the most efficient method for a specific family type, then combines over families. We utilize this hybrid approach to analyze a chronic obstructive pulmonary disease dataset to test for gene-environment interaction in the Serpine2 gene with smoking. The methods are freely available in the R package fbati.
Gene-Environment Interaction; Family-Based Association Tests; Candidate Gene Analysis; Binary Trait; COPD; Serpine2
Motivation: For the analysis of rare variants in sequence data, numerous approaches have been suggested. Fixed and flexible threshold approaches collapse the rare variant information of a genomic region into a test statistic with reduced dimensionality. Alternatively, the rare variant information can be combined in statistical frameworks that are based on suitable regression models, machine learning, etc. Although the existing approaches provide powerful tests that can incorporate information on allele frequencies and prior biological knowledge, differences in the spatial clustering of rare variants between cases and controls cannot be incorporated. Based on the assumption that deleterious variants and protective variants cluster or occur in different parts of the genomic region of interest, we propose a testing strategy for rare variants that builds on spatial cluster methodology and that guides the identification of the biologically relevant segments of the region. Our approach does not require any assumption about the directions of the genetic effects.
Results: In simulation studies, we assess the power of the clustering approach and compare it with existing methodology. Our simulation results suggest that the clustering approach for rare variants is well powered, even in situations that are ideal for standard methods. The efficiency of our spatial clustering approach is not affected by the presence of rare variants that have opposite effect size directions. An application to a sequencing study for non-syndromic cleft lip with or without cleft palate (NSCL/P) demonstrates its practical relevance. The proposed testing strategy is applied to a genomic region on chromosome 15q13.3 that was implicated in NSCL/P etiology in a previous genome-wide association study, and its results are compared with standard approaches.
Availability: Source code and documentation for the implementation in R will be provided online. Currently, the R implementation only supports genotype data. We are working on an extension for VCF files.
Cachexia, whether assessed by body mass index (BMI) or fat-free mass index (FFMI), affects a significant proportion of patients with chronic obstructive pulmonary disease (COPD), and is an independent risk factor for increased mortality, increased emphysema, and more severe airflow obstruction. The variable development of cachexia among patients with COPD suggests a role for genetic susceptibility. The objective of the present study was to determine genetic susceptibility loci involved in the development of low BMI and FFMI in subjects with COPD. A genome-wide association study (GWAS) of BMI was conducted in three independent cohorts of European descent with Global Initiative for Chronic Obstructive Lung Disease stage II or higher COPD: Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-Points (ECLIPSE; n = 1,734); Norway-Bergen cohort (n = 851); and a subset of subjects from the National Emphysema Treatment Trial (NETT; n = 365). A genome-wide association of FFMI was conducted in two of the cohorts (ECLIPSE and Norway). In the combined analyses, a significant association was found between rs8050136, located in the first intron of the fat mass and obesity–associated (FTO) gene, and BMI (P = 4.97 × 10⁻⁷) and FFMI (P = 1.19 × 10⁻⁷). We replicated the association in a fourth, independent cohort consisting of 502 subjects with COPD from COPDGene (P = 6 × 10⁻³). Within the largest contributing cohort of our analysis, lung function, as assessed by forced expiratory volume at 1 second, varied significantly by FTO genotype. Our analysis suggests a potential role for the FTO locus in the determination of anthropometric measures associated with COPD.
chronic obstructive pulmonary disease genetics; chronic obstructive pulmonary disease epidemiology; chronic obstructive pulmonary disease metabolism; genome-wide association study
Plasmacytoid dendritic cells (pDC) are rarely present in normal skin but have been shown to infiltrate lesions of infections or autoimmune disorders. Here, we report that several DC subsets including CD123+ BDCA-2/CD303+ pDC accumulate in the dermis in indurations induced by the tuberculin skin test (TST), used to screen immune sensitization by Mycobacterium tuberculosis. Although the purified protein derivative (PPD) used in the TST did not itself induce pDC recruitment or IFNα production, the positive skin reactions showed high expression of the IFNα inducible protein MxA. In contrast, the local immune response to PPD was associated with substantial cell death and high expression of the cationic antimicrobial peptide LL37, which together can provide a means for pDC activation and IFNα production. In vitro, pDC showed low uptake of PPD compared to CD11c+ and BDCA-3/CD141+ myeloid DC subsets. Furthermore, supernatants from pDC activated with LL37-DNA complexes reduced the high PPD uptake in myeloid DC as well as decreased their capacity to activate T cell proliferation. Infiltrating pDC in the TST reaction site may thus have a regulatory effect upon the antigen processing and presentation functions of surrounding potent myeloid DC subsets to limit potentially detrimental and excessive immune stimulation.
dendritic cells; plasmacytoid dendritic cells; skin; tuberculin skin test; LL37; delayed hypersensitivity reaction; PPD
As Next-Generation Sequencing data become available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data because of their enormous size. This is, and will remain, a problem that researchers working on genetic data encounter every day. Several options are available for compressing and storing such data, such as general-purpose compression software and the PBAT/PLINK binary formats. However, these currently available methods either do not offer sufficient compression rates or require a great amount of CPU time for decompression and loading every time the data are accessed.
Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundreds, which potentially allows SNP data of hundreds of Gigabytes to be stored in hundreds of Megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm gives a greater compression rate than the commonly used compression methods, and that the data-loading process takes less time. The C++ library also provides direct data-retrieval functions, which allow the compressed information to be easily accessed by other C++ programs.
The SpeedGene algorithm enables the storage and the analysis of next-generation sequencing data in current hardware environments, making system upgrades unnecessary.
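The abstract above does not describe the internals of the SpeedGene format, so the following is not that algorithm. It is only a minimal Python sketch of a generic idea behind compact genotype storage: a biallelic SNP genotype takes one of a few values, so four genotypes can share a single byte. The function names and the coding (0, 1, 2 for the minor-allele count and 3 for missing) are assumptions made for illustration.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes into bytes, four 2-bit codes per byte.

    Assumed coding (hypothetical): 0, 1, 2 = minor-allele count, 3 = missing.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"genotype code out of range: {g!r}")
        # Place the 2-bit code at its slot inside byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


if __name__ == "__main__":
    codes = [0, 1, 2, 3, 2, 0]
    blob = pack_genotypes(codes)
    print(len(codes), "genotypes stored in", len(blob), "bytes")
    print(unpack_genotypes(blob, len(codes)) == codes)
```

Because any genotype can be read with one shift and one mask, a layout of this kind can be queried without a separate decompression pass, which is the property the abstract emphasizes for its own format.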
We propose a new approach for the analysis of copy number variants (CNVs) for genome-wide association studies in family-based designs. Our new overall association test combines the between-family component and the within-family component of the data so that the new test statistic is fully efficient and, at the same time, achieves the same complete robustness against population admixture and stratification as classical family-based association tests, which are based only on the within-family component. Although all data are incorporated into the test statistic, an adjustment for genetic confounding is not needed, not even for the between-family component. The new test statistic is valid for testing either quantitative or dichotomous phenotypes. If external CNV data are available, the approach can also be used in completely ascertained samples. Similar to the approach by Ionita-Laza et al. (1), the proposed test statistic does not require a CNV-calling algorithm and is based directly on the CNV probe intensity data. We show, via simulation studies, that our methodology increases the power of the FBAT statistic to levels comparable to those of population-based designs. The advantages of the approach in practice are demonstrated by an application to a genome-wide association study for body mass index (BMI).
For genomewide association studies with family-based designs, we propose a Bayesian approach. We show that standard TDT/FBAT statistics can naturally be implemented in a Bayesian framework. We construct a Bayes factor conditional on the offspring phenotype and parental genotype data and then use the data we conditioned on to inform the prior odds for each marker. In the construction of the prior odds, the evidence for association for each single marker is obtained at the population-level by estimating the genetic effect size in the conditional mean model. Since such genetic effect size estimates are statistically independent of the effect size estimation within the families, the actual data set can inform the construction of the prior odds without any statistical penalty. In contrast to Bayesian approaches that have recently been proposed for genomewide association studies, our approach does not require assumptions about the genetic effect size; this makes the proposed method entirely data-driven. The power of the approach was assessed through simulation. We then applied the approach to a genomewide association scan to search for associations between single nucleotide polymorphisms and body mass index in the Childhood Asthma Management Program data.
family-based association tests; Bayes factors; complex traits
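The abstract above rests on the usual Bayesian identity that posterior odds equal the Bayes factor times the prior odds. The short Python sketch below shows only that arithmetic. It does not reproduce how the authors compute the Bayes factor from the TDT/FBAT statistic or how they derive the prior odds from the population-level effect size estimates; the function name and the numbers are hypothetical.

```python
def posterior_probability(bayes_factor, prior_odds):
    """Turn a Bayes factor and prior odds into a posterior probability.

    Uses posterior odds = Bayes factor * prior odds.
    Both inputs are assumed to be non-negative and supplied by the analyst.
    """
    if bayes_factor < 0 or prior_odds < 0:
        raise ValueError("bayes_factor and prior_odds must be non-negative")
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical marker: within-family evidence gives a Bayes factor of 50,
    # while the prior odds of association are 1 to 1000.
    print(posterior_probability(50.0, 1.0 / 1000.0))
```

In this toy calculation a marker with strong within-family evidence still ends up with a small posterior probability when the prior odds are low, which is why informing the prior odds from the data matters.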
For genome-wide association studies in family-based designs, a new, universally applicable approach is proposed. Using a modified Liptak’s method, we combine the p-value of the family-based association test (FBAT) statistic with the p-value for the Van Steen-statistic. The Van Steen-statistic is independent of the FBAT-statistic and utilizes information that is ignored by traditional FBAT-approaches. The new test statistic takes advantage of all available information about the genetic association, while, by virtue of its design, it achieves complete robustness against confounding due to population stratification. The approach is suitable for the analysis of almost any trait type for which FBATs are available, e.g., binary, continuous, time-to-onset, and multivariate traits. The efficiency and the validity of the new approach depend on the specification of a nuisance/tuning parameter and the weight parameters in the modified Liptak’s method. For different trait types and ascertainment conditions, we discuss general guidelines for the optimal specification of the tuning parameter and the weight parameters. Our simulation experiments and an application to an Alzheimer study show the validity and the efficiency of the new method, which achieves power levels that are comparable to those of population-based approaches.
FBAT; Liptak’s method; Tuning parameter
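The abstract above does not spell out the modification to Liptak's method, so the following Python sketch shows only the standard weighted Liptak (weighted Z-score) combination of independent one-sided p-values, which the modified version presumably builds on. The function name, the example p-values, and the weights are hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def weighted_liptak(p_values, weights):
    """Combine independent one-sided p-values with the weighted Z-score method.

    Each p-value is mapped to a standard normal quantile, the quantiles are
    averaged with the given weights, and the result is mapped back to a p-value.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and equal in length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("each p-value must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores))
    combined_z /= sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(combined_z)


if __name__ == "__main__":
    # Hypothetical inputs: a within-family (FBAT-type) p-value and an
    # independent p-value from a second statistic, with illustrative weights.
    print(weighted_liptak([0.03, 0.10], [0.7, 0.3]))
```

The combination is only valid when the input p-values are independent under the null hypothesis, which matches the abstract's statement that the Van Steen-statistic is independent of the FBAT-statistic. The choice of weights affects power, not validity, in this standard form.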
Biological and positional evidence supports the involvement of the GAD1 and distal-less homeobox genes (DLXs) in the etiology of autism. We investigated 42 SNPs in these genes as risk factors for autism spectrum disorders (ASD) in a large family-based association study of 715 nuclear families. No single marker showed significant association after correction for multiple testing. A rare haplotype in the DLX1 promoter was associated with ASD (p-value = 0.001). Given the importance of rare variants to the etiology of autism revealed in recent studies, the observed rare haplotype may be relevant to future investigations. Our observations, when taken together with previous findings, suggest that common genetic variation in the GAD1 and DLX genes is unlikely to play a critical role in ASD susceptibility.
Autism spectrum disorder; genetic association; candidate gene study; DLX homeobox; GAD1