1.  Whole genome HBV deletion profiles and the accumulation of preS deletion mutant during antiviral treatment 
BMC Microbiology  2012;12:307.
Hepatitis B virus (HBV), because of its error-prone viral polymerase, has a high mutation rate leading to widespread substitutions, deletions, and insertions in the HBV genome. Deletions may significantly change viral biological features, complicating the progression of liver diseases. However, the clinical conditions associated with the accumulation of deletion mutants remain unclear. In this study, we explored HBV deletion patterns and their association with disease status and antiviral treatment by performing whole genome sequencing on samples from 51 hepatitis B patients and by monitoring changes in deletion variants during treatment. Clone sequencing was used to analyze preS regions in another cohort of 52 patients.
Among the core, preS, and basic core promoter (BCP) deletion hotspots, we identified preS to have the highest frequency and the most complex deletion pattern using whole genome sequencing. Further clone sequencing analysis on preS identified 70 deletions which were classified into 4 types, the most common being preS2. Also, in contrast to the core and BCP regions, most preS deletions were in-frame. Most deletions interrupted viral surface epitopes, and are possibly involved in evading immuno-surveillance. Among various clinical factors examined, logistic regression showed that antiviral medication affected the accumulation of deletion mutants (OR = 6.81, 95% CI = 1.296 ~ 35.817, P = 0.023). In chronic carriers of the virus, and individuals with chronic hepatitis, the deletion rate was significantly higher in the antiviral treatment group (Fisher exact test, P = 0.007). Particularly, preS2 deletions were associated with the usage of nucleos(t)ide analog therapy (Fisher exact test, P = 0.023). Dynamic increases in preS1 or preS2 deletions were also observed in quasispecies from samples taken from patients before and after three months of ADV therapy. In vitro experiments demonstrated that preS2 deletions alone were not responsible for antiviral resistance, implying the coordination between wild type and mutant strains during viral survival and disease development.
We present the HBV deletion distribution patterns and preS deletion substructures in viral genomes that are prevalent in northern China. The accumulation of preS deletion mutants during nucleos(t)ide analog therapy may be due to viral escape from host immuno-surveillance.
PMCID: PMC3549285  PMID: 23272650
HBV; Deletion; PreS; Chronic hepatitis; Antiviral therapy; Nucleotide analog
2.  Haplo2Ped: a tool using haplotypes as markers for linkage analysis 
BMC Bioinformatics  2011;12:350.
SNPs are abundant in the genome; however, they have low power in linkage analysis because of their limited heterozygosity. Haplotype markers, which are composed of many SNPs, greatly increase heterozygosity and perform better in linkage statistics.
Here we developed Haplo2Ped to automatically transform SNP data into haplotype markers and then to compute the logarithm (base 10) of odds (LOD) scores of regional haplotypes that are homozygous within the disease co-segregation haploid group. The results are reported as a hypertext file and a 3D figure to help users to obtain the candidate linkage regions. The hypertext file contains parameters of the disease linked regions, candidate genes, and their links to public databases. The 3D figure clearly displays the linkage signals in each chromosome. We tested Haplo2Ped in a simulated SNP dataset and also applied it to data from a real study. It successfully and accurately located the causative genomic regions. Comparison of Haplo2Ped with other existing software for linkage analysis further indicated the high effectiveness of this software.
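The LOD score mentioned above can be illustrated with the classical two-point definition, the base-10 logarithm of the likelihood at recombination fraction theta over the likelihood at theta = 0.5 (no linkage). The sketch below assumes phase-known, fully informative meioses. It is not Haplo2Ped's own algorithm, which the abstract does not describe in detail, and the function names are hypothetical.

```python
import math


def _log10_power(base: float, exponent: int) -> float:
    """Return log10(base ** exponent), handling base == 0 safely."""
    if exponent == 0:
        return 0.0
    if base == 0.0:
        return float("-inf")
    return exponent * math.log10(base)


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Classical two-point LOD score for phase-known meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    where r is the number of recombinants among n informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    log_l_linked = (_log10_power(theta, recombinants)
                    + _log10_power(1.0 - theta, non_recombinants))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


if __name__ == "__main__":
    # Hypothetical example: 10 informative meioses, no recombinants.
    print(round(two_point_lod(0, 10, 0.0), 3))
```

A LOD of 3 or more is the conventional threshold for declaring linkage; ten meioses without a recombinant just reach it at theta = 0.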
Haplo2Ped uses haplotype fragments as mapping markers in whole genome linkage analysis. The advantages of Haplo2Ped over other existing software include straightforward output files, increased accuracy and superior ability to deal with pedigrees showing incomplete penetrance. Haplo2Ped is freely available at:
PMCID: PMC3179971  PMID: 21854652
3.  Assessment of exonic single nucleotide polymorphisms in the adenosine A2A receptor gene to high myopia susceptibility in Chinese subjects 
Molecular Vision  2011;17:486-491.
The adenosine A2A receptor (A2AR) modulates collagen synthesis and extracellular matrix production in ocular tissues that contribute to eye growth and the development of myopia. We aimed to determine whether single nucleotide polymorphisms (SNPs) in A2AR exons are associated with high myopia in Chinese subjects.
DNA samples were prepared from venous lymphocytes of 175 Chinese subjects with high myopia of less than –8.00 diopters (D) correction and 101 ethnically similar controls with between –1.00 D and +1.00 D correction. The coding region sequences of A2AR were amplified by PCR and analyzed by Sanger sequencing. The detected variations were confirmed by reverse sequencing. Allelic frequencies of all detected common SNPs were assessed for Hardy–Weinberg equilibrium.
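The Hardy–Weinberg check mentioned above is commonly done with a one-degree-of-freedom chi-square test on genotype counts. The sketch below shows that standard test for a biallelic SNP; the abstract does not say which test the authors used, and the genotype counts in the usage example are hypothetical, not the study's data.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at the 0.05 significance level.
CHI2_CRITICAL_1DF_005 = 3.841


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed counts of the two homozygous
    genotypes and the heterozygous genotype of a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expected count (monomorphic SNP).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    stat = hwe_chi_square(30, 45, 26)
    print(stat, stat > CHI2_CRITICAL_1DF_005)
```

A statistic above 3.841 would indicate deviation from equilibrium at the 0.05 level; with small expected counts an exact test is usually preferred.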
Five variations in A2AR exons, 5675 A>G, 5765 C>T, 13325 G>A, 13448 C>T, and 14000 T>A, were detected in controls at a low frequency (<1%). However, one SNP, 13772 T>C (rs5751876), was polymorphic in 53.3% of the total study population. rs5751876 is a synonymous substitution located in a tyrosine codon of exon 2. Although genotype distribution did not differ significantly between cases and controls, the frequency of rs5751876 heterozygotes was significantly lower in subjects with high myopia.
The reduced frequency of the heterozygous rs5751876 genotype in subjects with high myopia suggests a possible association of A2AR with high myopia in a Chinese population.
PMCID: PMC3380451  PMID: 22740769
4.  Prevalent HBV point mutations and mutation combinations at BCP/preC region and their association with liver disease progression 
BMC Infectious Diseases  2010;10:271.
Mutations in the basic core promoter (BCP) and its adjacent precore (preC) region of the HBV genome are common in chronic hepatitis B patients. However, the patterns of mutation combinations in these two regions during chronic infection are less well understood. This study focused on single base mutations in the BCP and preC regions and on the multi-mutation patterns observed in patients with chronic HBV infection.
A total of 192 blood samples from patients with chronic HBV infection were included. Direct PCR sequencing of the target region of the HBV genome was successful in 157 samples. The remaining 35 samples were analyzed by clone sequencing. Only nucleotide substitutions with frequencies of at least 10% were included in the multi-mutation analysis, with the exception of the polymorphic sites between genotypes B and C.
Five high frequency mutations (≥10%) were found in BCP and preC region. Thirteen types of multi-mutations in one fragment were observed, among which 3 types were common combinations (≥5%). The top three multi-mutations were A1762T/G1764A (36%), A1762T/G1764A/G1896A (11%) and T1753(A/C)/A1762T/G1764A/G1896A (8%). Patients with multi-mutations in viral genomes (≥3) were more likely to have liver cirrhosis or hepatocellular carcinoma (OR = 3.1, 95% CI: 1.6-6.0, P = 0.001). G1896A mutation seemed to be involved in liver disease progression independent of the patient age (OR = 3.6, 95% CI: 1.5-8.6; P = 0.004). In addition, patients with more viral mutations detected (≥3) were more likely to be HBeAg negative (OR = 2.7, 95% CI: 1.1-6.4; P = 0.027). Moreover, G1776A mutation was shown to contribute to HBeAg negativity in our study (OR = 8.6, 95% CI: 1.2-44.9; P = 0.01).
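Odds ratios with 95% confidence intervals like those reported above can be approximated from a 2x2 table with the standard Wald interval on the log odds ratio. The sketch below shows that calculation. The abstract does not state whether its estimates came from 2x2 tables or from regression models, so this is an illustration of the statistic only, and the counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    Table layout (exposure = e.g. carrying >= 3 mutations):
                    outcome+   outcome-
        exposed        a          b
        unexposed      c          d
    Returns (odds_ratio, lower, upper); z = 1.96 gives a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_wald_ci(30, 20, 25, 50))
```

An interval that excludes 1 corresponds to a significant association at roughly the 0.05 level.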
Patients with advanced liver disease and patients with HBeAg negativity are more likely to carry multi-mutations in their HBV genomes, but with different mutation combination patterns. The G1896A mutation appears to be independent of infection history.
PMCID: PMC2949759  PMID: 20846420
5.  Evaluation of BLID and LOC399959 as candidate genes for high myopia in the Chinese Han population 
Molecular Vision  2010;16:1920-1927.
BH3-like motif containing, cell death inducer (BLID) and LOC399959 are two genes associated with the single nucleotide polymorphism (SNP) rs577948, which is a susceptibility locus for high myopia in Japanese subjects. The purpose of this study was to determine if BLID and LOC399959 are associated with high myopia in Chinese Han subjects.
High myopia subjects (n=476) had a spherical refractive error of less than −6.00 D in at least one eye and/or an axial length greater than 26 mm. Genomic DNA was extracted and genotyped from peripheral blood leukocytes of high myopes and controls (n=275). Using a case-control association study of candidate regions, linkage disequilibrium blocks for 19 tag SNPs (tSNPs), including rs577948, harbored within and surrounding the BLID and LOC399959 genes were analyzed on a MassArray platform using iPlex chemistry. Each of the tSNPs had an r² > 0.8 and a minor allele frequency >10% in the Chinese Han population. Haplotype association analysis was performed on Haploview 4.1 using Chi-square (χ²) tests.
None of the 19 tSNPs were statistically associated with high myopia.
While rs577948 may be associated with high myopia in Japanese subjects, it and the other tSNPs near the BLID and LOC399959 genes are not susceptibility loci for high myopia in the Chinese Han population. Thus, associations of SNPs with high myopia as determined by Genome-Wide Association Study (GWAS) may be restricted to certain ethnic or genetically distinct populations. Without systematic replication in other populations, the results of GWAS associations should be interpreted with great caution.
PMCID: PMC2956664  PMID: 21031016
6.  SNP@Evolution: a hierarchical database of positive selection on the human genome 
Positive selection is a driving force that has shaped the modern human. Recent developments in high throughput technologies and corresponding statistics tools have made it possible to conduct whole genome surveys at a population scale, and a variety of measurements, such as heterozygosity (HET), FST, and Tajima's D, have been applied to multiple datasets to identify signals of positive selection. However, great effort has been required to combine various types of data from individual sources, and incompatibility among datasets has been a common problem. SNP@Evolution, a new database which integrates multiple datasets, will greatly assist future work in this area.
As part of our research scanning for evolutionary signals in HapMap Phase II and Phase III datasets, we built SNP@Evolution as a multi-aspect database focused on positive selection. Among its many features, SNP@Evolution provides computed FST and HET of all HapMap SNPs, 5+ HapMap SNPs per qualified gene, and all autosome regions detected from whole genome window scanning. In an attempt to capture multiple selection signals across the genome, selection-signal enrichment strength (ES) values of HET, FST, and P-values of iHS of most annotated genes have been calculated and integrated within one frame for users to search for outliers. Genes with significant ES or P-values (with thresholds of 0.95 and 0.05, respectively) have been highlighted in color. Low diversity chromosome regions have been detected by sliding a 100 kb window in a 10 kb step. To allow this information to be easily disseminated, a graphical user interface (GBrowser) was constructed with the Generic Model Organism Database toolkit.
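The window scan described above (a 100 kb window moved in 10 kb steps) can be sketched as follows. This is a minimal illustration. It assumes per-SNP heterozygosity is the expected heterozygosity 2p(1 - p) of a biallelic site, which the abstract does not confirm, and the function names and example data are hypothetical.

```python
from typing import Iterator, List, Tuple


def expected_het(allele_freq: float) -> float:
    """Expected heterozygosity 2p(1 - p) of a biallelic SNP (assumed HET)."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def sliding_window_het(
    snps: List[Tuple[int, float]],
    window: int = 100_000,
    step: int = 10_000,
) -> Iterator[Tuple[int, int, float, int]]:
    """Yield (start, end, mean_het, n_snps) for each non-empty window.

    snps is a list of (position, allele_frequency) pairs on one chromosome,
    sorted by position. Windows are half-open intervals [start, end).
    """
    if not snps:
        return
    last_pos = snps[-1][0]
    start = 0
    while start <= last_pos:
        end = start + window
        # Linear scan per window: simple rather than fast.
        hets = [expected_het(f) for pos, f in snps if start <= pos < end]
        if hets:
            yield start, end, sum(hets) / len(hets), len(hets)
        start += step


if __name__ == "__main__":
    # Hypothetical SNPs: (position in bp, allele frequency).
    demo = [(5_000, 0.5), (15_000, 0.0), (120_000, 0.5)]
    for row in sliding_window_het(demo):
        print(row)
```

Windows whose mean heterozygosity falls below a chosen cutoff would be the low-diversity candidates; that cutoff is not given in the abstract.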
Available at , SNP@Evolution is a hierarchical database focused on positive selection of the human genome. Based on HapMap Phase II and III data, SNP@Evolution includes 3,619,226/1,389,498 SNPs with their computed HET and FST, as well as qualified genes of 21,859/21,099 with ES values of HET and FST. In at least one HapMap population group, window scanning for selection signals has resulted in 1,606/10,138 large low HET regions. Among Phase II and III geographical groups, 660 and 464 regions show strong differentiation.
PMCID: PMC2755008  PMID: 19732458
7.  A High-Temporal Resolution Technology for Dynamic Proteomic Analysis Based on 35S Labeling 
PLoS ONE  2008;3(8):e2991.
As more and more research effort is devoted to dynamic or differential proteomics, a method with high temporal resolution and high throughput is required. In the present study, 35S in vivo Labeling Analysis for Dynamic Proteomics (SiLAD) was designed and tested by analyzing dynamic proteome changes in highly synchronized A549 cells, as well as in rat liver after 2/3 partial hepatectomy. The results validated that the SiLAD technique, in combination with 2-dimensional electrophoresis, provides a highly sensitive method for revealing dynamic changes in undisturbed endogenous proteins with good temporal resolution and a high signal/noise ratio. A significant number of differential proteins can be discovered or re-categorized by this technique. Another unique feature of SiLAD is its capability of quantifying the rate of protein expression, which reflects cellular physiological turning points more effectively. Finally, the prescribed SiLAD proteome snapshot pattern could potentially be used as an exclusive symbol for characterizing each stage in well-regulated biological processes.
PMCID: PMC2500177  PMID: 18714357
8.  Retraction: Universal primers for HBV genome DNA amplification across subtypes: a case study for designing more effective viral primers 
Virology Journal  2007;4:119.
This is a retraction of the article submitted by Zhang et al. Virology J 2007, 4:92
PMCID: PMC2131747  PMID: 17974018
9.  Systematic analysis of alternative first exons in plant genomes 
BMC Plant Biology  2007;7:55.
Alternative splicing (AS) contributes significantly to protein diversity by selectively using different combinations of exons of the same gene under certain circumstances. One particular type of AS is the use of alternative first exons (AFEs), which can have consequences far beyond the fine-tuning of protein functions. For example, AFEs may change the N-termini of proteins and thereby direct them to different cellular compartments. When alternative first exons are distant, they are usually associated with alternative promoters, thereby conferring an extra level of gene expression regulation. However, only a few studies have examined the patterns of AFEs, and these analyses were mainly focused on mammalian genomes. Recent studies have shown that AFEs exist in the rice genome and are regulated in a tissue-specific manner. Our current understanding of AFEs in plants is still limited, including important issues such as their regulation, contribution to protein diversity, and evolutionary conservation.
We systematically identified 1,378 and 645 AFE-containing clusters in rice and Arabidopsis, respectively. From our data sets, we identified two types of AFEs according to their genomic organisation. In genes with type I AFEs, the first exons are mutually exclusive, while most of the downstream exons are shared among alternative transcripts. Conversely, in genes with type II AFEs, the first exon of one gene structure is an internal exon of an alternative gene structure. The functionality analysis indicated about half and ~19% of the AFEs in Arabidopsis and rice could alter N-terminal protein sequences, and ~5% of the functional alteration in type II AFEs involved protein domain addition/deletion in both genomes. Expression analysis indicated that 20~66% of rice AFE clusters were tissue- and/or development- specifically transcribed, which is consistent with previous observations; however, a much smaller percentage of Arabidopsis AFEs was regulated in this manner, which suggests different regulation mechanisms of AFEs between rice and Arabidopsis. Statistical analysis of some features of AFE clusters, such as splice-site strength and secondary structure formation further revealed differences between these two species. Orthologous search of AFE-containing gene pairs detected only 19 gene pairs conserved between rice and Arabidopsis, accounting only for a few percent of AFE-containing clusters.
Our analysis of AFE-containing genes in rice and Arabidopsis indicates that AFEs have multiple functions, from regulating gene expression to generating protein diversity. Comparisons of AFE clusters revealed different features in the two plant species, which indicates that AFEs may have evolved independently after the separation of rice (a model monocot) and Arabidopsis (a model dicot).
PMCID: PMC2174465  PMID: 17941993
10.  Universal primers for HBV genome DNA amplification across subtypes: a case study for designing more effective viral primers 
Virology Journal  2007;4:92.
The high heterogeneity of viruses is the major obstacle to efficient DNA amplification. Taking advantage of the large number of virus DNA sequences in public databases to select conserved sites for primer design is an optimal way to tackle the difficulties in virus genome amplification.
Here we use hepatitis B virus as an example to introduce a simple and efficient approach to virus primer design. Based on an alignment of HBV sequences from public databases and on BxB, a program written in Perl, our method selected several optimal sites for HBV primer design. Polymerase chain reaction showed that, compared with the most popular primers for whole genome amplification of HBV, one set of primers for full-length genome amplification and four sets of walking primers significantly improved the success rate. These newly designed primers are suitable for most subtypes of HBV.
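The idea of selecting conserved sites from a multiple sequence alignment can be sketched as below. This is not the BxB program, whose algorithm the abstract does not describe. It is a minimal Python illustration that scores each alignment column by the frequency of its most common base and reports windows in which every column meets an identity threshold. The function names, the default threshold and the toy alignment are all assumptions.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of sequences sharing the most common base in each column.

    Columns whose most common symbol is a gap ('-') score 0.0, so gapped
    regions are never reported as conserved.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        symbols = [s.upper() for s in column]
        top_symbol, count = Counter(symbols).most_common(1)[0]
        scores.append(0.0 if top_symbol == "-" else count / len(symbols))
    return scores


def conserved_windows(alignment: List[str], primer_len: int = 20,
                      min_identity: float = 0.95) -> List[int]:
    """Start indices (0-based) of windows where every column is conserved."""
    scores = column_conservation(alignment)
    return [
        start
        for start in range(len(scores) - primer_len + 1)
        if min(scores[start:start + primer_len]) >= min_identity
    ]


if __name__ == "__main__":
    # Toy alignment, for illustration only (not HBV sequence data).
    toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
    print(conserved_windows(toy, primer_len=4, min_identity=1.0))
```

Candidate windows found this way would still need the usual primer checks (melting temperature, GC content, self-complementarity), which this sketch does not attempt.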
Researchers can extend the method described here to design universal or subtype specific primers for various types of viruses. The BxB program based on multiple sequence alignment not only can be used as a separate tool but also can be integrated in any open source primer design software to select conserved regions for primer design.
PMCID: PMC2099425  PMID: 17892576
11.  Participation of the C-Terminal Domain of RNA Polymerase II in Exon Definition during Pre-mRNA Splicing 
Molecular and Cellular Biology  2000;20(21):8290-8301.
Interaction between transcription and pre-mRNA processing via binding of polymerase II (Pol II) to factors involved in capping, splicing, and polyadenylation has recently been demonstrated. The C-terminal domain (CTD), a highly phosphorylated repeat sequence of the largest subunit of Pol II, has been implicated in this interaction because deletion of this domain affects downstream RNA processing events and because it is the binding site for numerous processing factors. Here we show that recombinant CTD, free of other components of Pol II, activated in vitro splicing and assembly of the spliceosome in nuclear extracts if, and only if, the assayed precursor RNA was recognized via exon definition, i.e., if the substrates contained complete exons with both 3′ and 5′ splice sites. Furthermore, depletion of intact Pol II inactivated splicing of this set of precursor RNAs and addition of recombinant CTD restored activity. The added recombinant CTD was quickly hyper- and hypophosphorylated in extract, became associated with the precursor RNA, and stimulated the association of U1 snRNPs but not ASF/SF2 with substrate RNA. These observations suggest that the mode of interaction between the CTD and splicing factors is integrally tied to exon definition and the mechanism whereby distal exons can be recognized and brought into juxtaposition during assembly of the spliceosome.
PMCID: PMC86437  PMID: 11027297
12.  The Genomes of Oryza sativa: A History of Duplications 
Yu, Jun | Wang, Jun | Lin, Wei | Li, Songgang | Li, Heng | Zhou, Jun | Ni, Peixiang | Dong, Wei | Hu, Songnian | Zeng, Changqing | Zhang, Jianguo | Zhang, Yong | Li, Ruiqiang | Xu, Zuyuan | Li, Shengting | Li, Xianran | Zheng, Hongkun | Cong, Lijuan | Lin, Liang | Yin, Jianning | Geng, Jianing | Li, Guangyuan | Shi, Jianping | Liu, Juan | Lv, Hong | Li, Jun | Wang, Jing | Deng, Yajun | Ran, Longhua | Shi, Xiaoli | Wang, Xiyin | Wu, Qingfa | Li, Changfeng | Ren, Xiaoyu | Wang, Jingqiang | Wang, Xiaoling | Li, Dawei | Liu, Dongyuan | Zhang, Xiaowei | Ji, Zhendong | Zhao, Wenming | Sun, Yongqiao | Zhang, Zhenpeng | Bao, Jingyue | Han, Yujun | Dong, Lingli | Ji, Jia | Chen, Peng | Wu, Shuming | Liu, Jinsong | Xiao, Ying | Bu, Dongbo | Tan, Jianlong | Yang, Li | Ye, Chen | Zhang, Jingfen | Xu, Jingyi | Zhou, Yan | Yu, Yingpu | Zhang, Bing | Zhuang, Shulin | Wei, Haibin | Liu, Bin | Lei, Meng | Yu, Hong | Li, Yuanzhe | Xu, Hao | Wei, Shulin | He, Ximiao | Fang, Lijun | Zhang, Zengjin | Zhang, Yunze | Huang, Xiangang | Su, Zhixi | Tong, Wei | Li, Jinhong | Tong, Zongzhong | Li, Shuangli | Ye, Jia | Wang, Lishun | Fang, Lin | Lei, Tingting | Chen, Chen | Chen, Huan | Xu, Zhao | Li, Haihong | Huang, Haiyan | Zhang, Feng | Xu, Huayong | Li, Na | Zhao, Caifeng | Li, Shuting | Dong, Lijun | Huang, Yanqing | Li, Long | Xi, Yan | Qi, Qiuhui | Li, Wenjie | Zhang, Bo | Hu, Wei | Zhang, Yanling | Tian, Xiangjun | Jiao, Yongzhi | Liang, Xiaohu | Jin, Jiao | Gao, Lei | Zheng, Weimou | Hao, Bailin | Liu, Siqi | Wang, Wen | Yuan, Longping | Cao, Mengliang | McDermott, Jason | Samudrala, Ram | Wang, Jian | Wong, Gane Ka-Shu | Yang, Huanming | Bennetzen, Jeff
PLoS Biology  2005;3(2):e38.
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes.
PMCID: PMC546038  PMID: 15685292

