Results 1-10 (10)
 

1.  Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome 
BMC Bioinformatics  2014;15:2.
Background
The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read length on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 bp to 1000 bp.
Results
We observe that the proportion of non-singleton k-mers decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different ranges of k. A slower decay at greater values of k indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of k-mers exhibit long tails with a power-law-like trend, and rank-frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.
Conclusion
Read mappability, as measured by the proportion of singletons, increases steadily up to a read length of around 200 bp. When read length increases above 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases much more slowly than linearly with read length. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed for efficiently producing high-quality data that cover the complete human genome.
doi:10.1186/1471-2105-15-2
PMCID: PMC3927684  PMID: 24386976
Next-generation sequencing; Read alignment; Repeat sequences; Genome redundancy; Long-tail distribution; k-mers
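As a rough illustration of the quantity discussed in the abstract above, the following sketch counts k-mers in a toy string and reports the fraction that are non-singletons. It assumes the fraction is taken over k-mer start positions and ignores reverse complements; the function name `non_singleton_fraction` and the toy sequence are hypothetical and not taken from the paper.

```python
from collections import Counter


def non_singleton_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs more than once in seq.

    Assumption: the fraction is taken over positions, not over distinct k-mers,
    and the reverse-complement strand is ignored.
    """
    n_positions = len(seq) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # Each k-mer seen c > 1 times contributes c non-unique positions.
    repeated_positions = sum(c for c in counts.values() if c > 1)
    return repeated_positions / n_positions


if __name__ == "__main__":
    toy = "ACGTACGA"  # hypothetical toy sequence, not genomic data
    for k in (2, 3, 4):
        print(k, round(non_singleton_fraction(toy, k), 3))
```

On this toy input the fraction falls as k grows, which is the qualitative behaviour the abstract describes; a genome-scale analysis would need a dedicated k-mer counter rather than an in-memory dictionary.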
2.  CSK regulatory polymorphism is associated with systemic lupus erythematosus and influences B cell signaling and activation 
Nature genetics  2012;44(11):1227-1230.
C-src tyrosine kinase, Csk, physically interacts with the intracellular phosphatase Lyp (PTPN22) and can modify the activation state of downstream Src kinases, such as Lyn, in lymphocytes. We identified an association of Csk with systemic lupus erythematosus (SLE) and refined its location to an intronic polymorphism rs34933034 (OR 1.32, p = 1.04 × 10⁻⁹). The risk allele is associated with increased CSK expression and augments inhibitory phosphorylation of Lyn. In carriers of the risk allele, B cell receptor (BCR)-mediated activation of mature B cells, as well as plasma IgM, are increased. Moreover, the fraction of transitional B cells is doubled in the cord blood of carriers of the risk allele compared to non-risk haplotypes due to an expansion of the late transitional cells, a stage targeted by selection mechanisms. This suggests that the Lyp-Csk complex increases susceptibility to lupus at multiple maturation and activation points of B cells.
doi:10.1038/ng.2439
PMCID: PMC3715052  PMID: 23042117
3.  Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis 
Nature genetics  2012;44(3):291-296.
The genetic association of the major histocompatibility complex (MHC) to rheumatoid arthritis risk has commonly been attributed to HLA-DRB1 alleles. Yet controversy persists about the causal variants in HLA-DRB1 and the presence of independent effects elsewhere in the MHC. Using existing genome-wide SNP data in 5,018 seropositive cases and 14,974 controls, we imputed and tested classical alleles and amino acid polymorphisms for HLA-A, B, C, DPA1, DPB1, DQA1, DQB1, and DRB1 along with 3,117 SNPs across the MHC. Conditional and haplotype analyses reveal that three amino acid positions (11, 71 and 74) in HLA-DRβ1, and single amino acid polymorphisms in HLA-B (position 9) and HLA-DPβ1 (position 9), all located in the peptide-binding grooves, almost completely explain the MHC association to disease risk. This study illustrates how imputation of functional variation from large reference panels can help fine-map association signals in the MHC.
doi:10.1038/ng.1076
PMCID: PMC3288335  PMID: 22286218
4.  A Simple Method for Analyzing Exome Sequencing Data Shows Distinct Levels of Nonsynonymous Variation for Human Immune and Nervous System Genes 
PLoS ONE  2012;7(6):e38087.
To measure the strength of natural selection that acts upon single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio between nonsynonymous SNVs (nsSNVs) per nonsynonymous site and synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a respective factor f that corrects for the bias of synonymous sites towards transitions in the genetic code and different mutation rates for transitions and transversions. This method approximates the relative density of nsSNVs (rdnsv) in comparison with the neutral expectation as inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30–40% for ISGs and RSGs. This smaller rdnsv of NSGs indicates overall stronger purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The obtained regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. This parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population, given the performed normalization of the observed nsSNV to sSNV ratio. A smaller y-intercept is displayed by NSGs, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de-novo mutation diseases that affect the nervous system.
doi:10.1371/journal.pone.0038087
PMCID: PMC3368947  PMID: 22701602
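The ratio described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the ratio is transformed with a factor f but does not give the form of the transformation, so multiplication by f is assumed here, and the function name `rdnsv` and all input counts are hypothetical.

```python
def rdnsv(ns_snvs: float, ns_sites: float, s_snvs: float, s_sites: float,
          f: float = 1.0) -> float:
    """Relative density of nonsynonymous SNVs versus the synonymous baseline.

    Assumption: the correction factor f is applied multiplicatively; the
    abstract does not state how f is derived or applied.
    """
    if ns_sites <= 0 or s_sites <= 0 or s_snvs <= 0:
        raise ValueError("site counts and synonymous SNV count must be positive")
    ns_density = ns_snvs / ns_sites  # nsSNVs per nonsynonymous site
    s_density = s_snvs / s_sites     # sSNVs per synonymous site
    return f * ns_density / s_density


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the arithmetic.
    print(rdnsv(ns_snvs=30, ns_sites=300.0, s_snvs=50, s_sites=100.0))
```

A value well below 1 means fewer nonsynonymous variants per site than the synonymous baseline would predict, which the abstract interprets as purifying selection.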
5.  Locus category based analysis of a large genome-wide association study of rheumatoid arthritis 
Human Molecular Genetics  2010;19(19):3863-3872.
To pinpoint true positive single-nucleotide polymorphism (SNP) associations in a genome-wide association study (GWAS) of rheumatoid arthritis (RA), we categorize genetic loci by external knowledge. We test both the ‘enrichment of associated loci’ in a locus category and the ‘combined association’ of a locus category. The former is quantified by the odds ratio for the presence of SNP associations at the loci of a category, whereas the latter is quantified by the number of loci in a category that have SNP associations. These measures are compared with their expected values as obtained from the permutation of the affection status. To account for linkage disequilibrium (LD) among SNPs, we view each LD block as a genetic locus. Positional candidates were defined as loci implicated by earlier GWAS results, whereas functional candidates were defined by annotations regarding the molecular roles of genes, such as gene ontology categories. As expected, immune-related categories show the largest enrichment signal, although it is not very strong. The intersection of positional and functional candidate information predicts novel RA loci near the genes TEC/TXK, MBL2 and PIK3R1/CD180. Notably, a combined association signal is produced not only by immune-related categories, but also by most other categories and even randomly defined categories. The unspecific quality of these signals limits the possible conclusions from combined association tests. It also reduces the magnitude of enrichment test results. These unspecific signals might result from common variants of small effect that are hardly concentrated in candidate categories, or from an inflated size of associated regions due to weak LD with infrequent mutations.
doi:10.1093/hmg/ddq304
PMCID: PMC2935861  PMID: 20639398
6.  Refining the association of MHC with multiple sclerosis in African Americans 
Human Molecular Genetics  2010;19(15):3080-3088.
Multiple sclerosis (MS) is a common demyelinating disease of the central nervous system mediated by autoimmune and neurodegenerative pathogenic mechanisms. Multiple genes account for its moderate heritability, but the only genetic region shown to have a large replicable effect on MS susceptibility is the major histocompatibility complex (MHC). Strong linkage disequilibrium (LD) across the MHC has made it difficult to fully characterize individual genetic contributions of this region to MS risk in previous studies. African Americans are at a lower risk for MS when compared with northern Europeans and Americans of European descent, but greater haplotypic diversity and distinct patterns of LD suggest that this population may be particularly informative for fine-mapping efforts. To examine the role of the MHC in African American MS, a case–control association study was performed with 499 African American MS patients and 750 African American controls that were genotyped for 6040 MHC region single nucleotide polymorphisms (SNPs). A replication data set consisting of 451 African American patients and 718 African American controls was genotyped for selected SNPs. Two MHC class II SNPs, rs2647040 and rs3135021, were significant in the replication cohort and partially tagged DRB1*15 alleles. Surprisingly, in comparison to similar studies of individuals of European descent, the MHC seems to play a smaller role in MS susceptibility in African Americans, consistent with pervasive genetic heterogeneity across ancestral groups, and may explain the difference in MS susceptibility between African Americans and individuals of European descent.
doi:10.1093/hmg/ddq197
PMCID: PMC2901136  PMID: 20466734
7.  Functionally defective germline variants of sialic acid acetylesterase in autoimmunity 
Nature  2010;466(7303):243-247.
Sialic acid acetylesterase (SIAE) is an enzyme that negatively regulates B lymphocyte antigen receptor signaling and is required for the maintenance of immunological tolerance in mice [1, 2]. Heterozygous loss-of-function germline rare variants and a homozygous defective polymorphic variant of SIAE were identified in 24/923 Caucasian subjects with relatively common autoimmune disorders and in 2/648 Caucasian controls. All heterozygous loss-of-function SIAE mutations tested were capable of functioning in a dominant negative manner. A homozygous secretion-defective polymorphic variant of SIAE was catalytically active, lacked the ability to function in a dominant negative manner, and was seen in 8 autoimmune subjects but in no control subjects. The odds ratio for inheriting defective SIAE alleles was 8.6 in all autoimmune subjects, 8.3 in subjects with rheumatoid arthritis, and 7.9 in subjects with type I diabetes. Functionally defective SIAE rare and polymorphic variants represent a strong genetic link to susceptibility in relatively common human autoimmune disorders.
doi:10.1038/nature09115
PMCID: PMC2900412  PMID: 20555325
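The carrier counts quoted in the abstract above (24 of 923 cases, 2 of 648 controls) can be turned into an odds ratio with a simple 2×2 calculation. The sketch below assumes an unadjusted odds ratio; it happens to reproduce the reported value of 8.6 for all autoimmune subjects, but the abstract does not state which procedure the authors used. The function name `odds_ratio` is hypothetical.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Unadjusted odds ratio from a 2x2 table of carriers versus non-carriers."""
    a = case_carriers
    b = case_total - case_carriers        # cases without the variant
    c = control_carriers
    d = control_total - control_carriers  # controls without the variant
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Counts taken from the abstract: 24/923 cases, 2/648 controls.
    print(round(odds_ratio(24, 923, 2, 648), 1))  # prints 8.6
```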
8.  Recent positive selection of a human androgen receptor/ectodysplasin A2 receptor haplotype and its relationship to male pattern baldness 
Human Genetics  2009;126:255-264.
Genetic variants in the human androgen receptor gene (AR) are associated with male pattern baldness (androgenetic alopecia, AGA) in Europeans. Previous observations of long-range linkage disequilibrium at the AR locus are consistent with the hypothesis of recent positive selection. Here, we further investigate this signature and its relationship to the AGA risk haplotype. The haplotype homozygosity suggests that the AGA risk haplotype was driven to high frequency by positive selection in Europeans although a low meiotic recombination rate contributed to the high haplotype homozygosity. Further, we find high levels of population differentiation as measured by FST and a series of fixed derived alleles along an extended region centromeric to AR in the Asian HapMap sample. The predominant AGA risk haplotype also carries the putatively functional variant 57K in the flanking ectodysplasin A2 receptor gene (EDA2R). It is therefore probable that the AGA risk haplotype rose to high frequency in combination with this EDA2R variant, possibly by hitchhiking on a positively selected 57K haplotype.
doi:10.1007/s00439-009-0668-z
PMCID: PMC3774421  PMID: 19373488
9.  Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome 
BMC Bioinformatics  2009;10(Suppl 1):S66.
Background
Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects.
Results
We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation.
Conclusion
Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.
doi:10.1186/1471-2105-10-S1-S66
PMCID: PMC2648766  PMID: 19208170
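The pattern reported in the abstract above, where two variables are uncorrelated until a third is conditioned on, can be illustrated with a first-order partial correlation. The sketch below uses the standard three-variable formula on hypothetical toy data in which z is the sum of two independent inputs x and y; it does not use the paper's data or its graph-construction procedure, and the function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical toy data: x and y are uncorrelated; z depends on both.
    x = [0, 0, 1, 1]
    y = [0, 1, 0, 1]
    z = [a + b for a, b in zip(x, y)]
    print("r(x, y)     =", pearson(x, y))
    print("r(x, y | z) =", partial_corr(x, y, z))
```

Here x and y play the role of two independent causes and z the common effect: the marginal correlation is zero, while conditioning on z induces a strong negative correlation, matching the qualitative pattern described for recombination rate, exon density and GC-content.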
10.  Transancestral mapping of the MHC region in systemic lupus erythematosus identifies new independent and interacting loci at MSH5, HLA-DPB1 and HLA-G 
Annals of the Rheumatic Diseases  2012;71(5):777-784.
Objectives
Systemic lupus erythematosus (SLE) is a chronic multisystem genetically complex autoimmune disease characterised by the production of autoantibodies to nuclear and cellular antigens, tissue inflammation and organ damage. Genome-wide association studies have shown that variants within the major histocompatibility complex (MHC) region on chromosome 6 confer the greatest genetic risk for SLE in European and Chinese populations. However, the causal variants remain elusive due to tight linkage disequilibrium across disease-associated MHC haplotypes, the highly polymorphic nature of many MHC genes and the heterogeneity of the SLE phenotype.
Methods
A high-density case-control single nucleotide polymorphism (SNP) study of the MHC region was undertaken in SLE cohorts of Spanish and Filipino ancestry using a custom Illumina chip in order to fine-map association signals in these haplotypically diverse populations. In addition, comparative analyses were performed between these two datasets and a northern European UK SLE cohort. A total of 1433 cases and 1458 matched controls were examined.
Results
Using this transancestral SNP mapping approach, novel independent loci were identified within the MHC region in UK, Spanish and Filipino patients with SLE with some evidence of interaction. These loci include HLA-DPB1, HLA-G and MSH5 which are independent of each other and HLA-DRB1 alleles. Furthermore, the established SLE-associated HLA-DRB1*15 signal was refined to an interval encompassing HLA-DRB1 and HLA-DQA1. Increased frequencies of MHC region risk alleles and haplotypes were found in the Filipino population compared with Europeans, suggesting that the greater disease burden in non-European SLE may be due in part to this phenomenon.
Conclusion
These data highlight the usefulness of mapping disease susceptibility loci using a transancestral approach, particularly in a region as complex as the MHC, and offer a springboard for further fine-mapping, resequencing and transcriptomic analysis.
doi:10.1136/annrheumdis-2011-200808
PMCID: PMC3329227  PMID: 22233601
