Results 1-8 (8)
 

author:("Luo, ruijiang")
1.  BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU 
PeerJ  2014;2:e421.
This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
doi:10.7717/peerj.421
PMCID: PMC4060040  PMID: 24949238
Secondary analysis; Whole-genome sequencing; Whole-exome sequencing; GPU; Variant calling; Genomics; NGS; HPC
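The 50-fold figure quoted in the BALSA abstract can be checked with simple arithmetic. The sketch below is only an illustration and rests on two assumptions that the abstract does not state: that "∼750 million paired-end reads" counts read pairs (two 100 bp reads each), and that the human genome is taken as roughly 3 Gbp. The function name `fold_coverage` is hypothetical and is not part of BALSA.

```python
def fold_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Approximate sequencing depth: total sequenced bases / genome size.

    Assumes each pair contributes two reads of equal length.
    """
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# Assumed inputs: 750 million pairs, 100 bp reads, ~3 Gbp genome.
depth = fold_coverage(750_000_000, 100, 3_000_000_000)
print(f"approximate depth: {depth:.1f}x")
```

Under these assumptions the result is consistent with the 50-fold depth reported; a different genome-size estimate shifts the number slightly.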
2.  Correction: SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner 
PLoS ONE  2013;8(8):10.1371/annotation/823f3670-ed17-41ec-ba51-b50281651915.
doi:10.1371/annotation/823f3670-ed17-41ec-ba51-b50281651915
PMCID: PMC3750100
3.  Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species 
Bradnam, Keith R | Fass, Joseph N | Alexandrov, Anton | Baranay, Paul | Bechner, Michael | Birol, Inanç | Boisvert, Sébastien | Chapman, Jarrod A | Chapuis, Guillaume | Chikhi, Rayan | Chitsaz, Hamidreza | Chou, Wen-Chi | Corbeil, Jacques | Del Fabbro, Cristian | Docking, T Roderick | Durbin, Richard | Earl, Dent | Emrich, Scott | Fedotov, Pavel | Fonseca, Nuno A | Ganapathy, Ganeshkumar | Gibbs, Richard A | Gnerre, Sante | Godzaridis, Élénie | Goldstein, Steve | Haimel, Matthias | Hall, Giles | Haussler, David | Hiatt, Joseph B | Ho, Isaac Y | Howard, Jason | Hunt, Martin | Jackman, Shaun D | Jaffe, David B | Jarvis, Erich D | Jiang, Huaiyang | Kazakov, Sergey | Kersey, Paul J | Kitzman, Jacob O | Knight, James R | Koren, Sergey | Lam, Tak-Wah | Lavenier, Dominique | Laviolette, François | Li, Yingrui | Li, Zhenyu | Liu, Binghang | Liu, Yue | Luo, Ruibang | MacCallum, Iain | MacManes, Matthew D | Maillet, Nicolas | Melnikov, Sergey | Naquin, Delphine | Ning, Zemin | Otto, Thomas D | Paten, Benedict | Paulo, Octávio S | Phillippy, Adam M | Pina-Martins, Francisco | Place, Michael | Przybylski, Dariusz | Qin, Xiang | Qu, Carson | Ribeiro, Filipe J | Richards, Stephen | Rokhsar, Daniel S | Ruby, J Graham | Scalabrin, Simone | Schatz, Michael C | Schwartz, David C | Sergushichev, Alexey | Sharpe, Ted | Shaw, Timothy I | Shendure, Jay | Shi, Yujian | Simpson, Jared T | Song, Henry | Tsarev, Fedor | Vezzi, Francesco | Vicedomini, Riccardo | Vieira, Bruno M | Wang, Jun | Worley, Kim C | Yin, Shuangye | Yiu, Siu-Ming | Yuan, Jianying | Zhang, Guojie | Zhang, Hao | Zhou, Shiguo | Korf, Ian F
GigaScience  2013;2:10.
Background
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
Results
In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Conclusions
Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
doi:10.1186/2047-217X-2-10
PMCID: PMC3844414  PMID: 23870653
Genome assembly; N50; Scaffolds; Assessment; Heterozygosity; COMPASS
4.  Sequencing of Fifty Human Exomes Reveals Adaptation to High Altitude 
Science (New York, N.Y.)  2010;329(5987):75-78.
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
doi:10.1126/science.1190371
PMCID: PMC3711608  PMID: 20595611
5.  SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner 
PLoS ONE  2013;8(5):e65632.
To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.
doi:10.1371/journal.pone.0065632
PMCID: PMC3669295  PMID: 23741504
6.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler 
GigaScience  2012;1:18.
Background
There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.
Findings
To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
Conclusions
Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
doi:10.1186/2047-217X-1-18
PMCID: PMC3626529  PMID: 23587118
Genome; Assembly; Contig; Scaffold; Error correction; Gap-filling
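The contig and scaffold N50 values cited in the SOAPdenovo2 abstract follow the usual definition of N50: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of that standard definition is given below; the function name `n50` and the toy input are illustrative assumptions, not code or data from SOAPdenovo2.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths.

    Walk the lengths from longest to shortest and return the first
    length at which the running total reaches half of all bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example with made-up lengths (not from any real assembly).
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can still have a high N50.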
7.  Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression 
BMC Genomics  2012;13:300.
Background
DNA methylation plays important biological roles in plants and animals. To examine the rice genomic methylation landscape and assess its functional significance, we generated single-base resolution DNA methylome maps for Asian cultivated rice Oryza sativa ssp. japonica, indica and their wild relatives, Oryza rufipogon and Oryza nivara.
Results
The overall methylation level of rice genomes is four times higher than that of Arabidopsis. Consistent with the results reported for Arabidopsis, methylation in promoters represses gene expression while gene-body methylation generally appears to be positively associated with gene expression. Interestingly, we discovered that methylation in gene transcriptional termination regions (TTRs) can significantly repress gene expression, and the effect is even stronger than that of promoter methylation. Through integrated analysis of genomic, DNA methylomic and transcriptomic differences between cultivated and wild rice, we found that primary DNA sequence divergence is the major determinant of methylation differences at the whole genome level, but DNA methylation differences alone can account for only limited gene expression variation between cultivated and wild rice. Furthermore, we identified a number of genes with significant differences in methylation level between wild and cultivated rice.
Conclusions
The single-base resolution methylomes of rice obtained in this study have not only broadened our understanding of the mechanism and function of DNA methylation in plant genomes, but also provided valuable data for future studies of rice epigenetics and the epigenetic differentiation between wild and cultivated rice.
doi:10.1186/1471-2164-13-300
PMCID: PMC3447678  PMID: 22747568
Cultivated and wild rice; Methylomes; Transcriptional termination regions (TTRs); Gene expression
8.  The DNA Methylome of Human Peripheral Blood Mononuclear Cells 
PLoS Biology  2010;8(11):e1000533.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Author Summary
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
doi:10.1371/journal.pbio.1000533
PMCID: PMC2976721  PMID: 21085693
