author:("Chen, nugen")
2.  A Comprehensive Functional Map of the Hepatitis C Virus Genome Provides a Resource for Probing Viral Proteins 
mBio  2014;5(5):e01469-14.
Pairing high-throughput sequencing technologies with high-throughput mutagenesis enables genome-wide investigations of pathogenic organisms. Knowledge of the specific functions of protein domains encoded by the genome of the hepatitis C virus (HCV), a major human pathogen that contributes to liver disease worldwide, remains limited to insight from small-scale studies. To enhance the capabilities of HCV researchers, we have obtained a high-resolution functional map of the entire viral genome by combining transposon-based insertional mutagenesis with next-generation sequencing. We generated a library of 8,398 mutagenized HCV clones, each containing one 15-nucleotide sequence inserted at a unique genomic position. We passaged this library in hepatic cells, recovered virus pools, and simultaneously assayed the abundance of mutant viruses in each pool by next-generation sequencing. To illustrate the validity of the functional profile, we compared the genetic footprints of viral proteins with previously solved protein structures. Moreover, we show the utility of these genetic footprints in the identification of candidate regions for epitope tag insertion. In a second application, we screened the genetic footprints for phenotypes that reflected defects in later steps of the viral life cycle. We confirmed that viruses with insertions in a region of the nonstructural protein NS4B had a defect in infectivity while maintaining genome replication. Overall, our genome-wide HCV mutant library and the genetic footprints obtained by high-resolution profiling represent valuable new resources for the research community that can direct the attention of investigators toward unidentified roles of individual protein domains.
Our insertional mutagenesis library provides a resource that illustrates the effects of relatively small insertions on local protein structure and HCV viability. We have also generated complementary resources, including a website and a panel of epitope-tagged mutant viruses that should enhance the research capabilities of investigators studying HCV. Researchers can now detect epitope-tagged viral proteins by established antibodies, which will allow biochemical studies of HCV proteins for which antibodies are not readily available. Furthermore, researchers can now quickly look up genotype-phenotype relationships and base further mechanistic studies on the residue-by-residue information from the functional profile. More broadly, this approach offers a general strategy for the systematic functional characterization of viruses on the genome scale.
PMCID: PMC4196222  PMID: 25271282
3.  A Quantitative High-Resolution Genetic Profile Rapidly Identifies Sequence Determinants of Hepatitis C Viral Fitness and Drug Sensitivity 
PLoS Pathogens  2014;10(4):e1004064.
Widely used chemical genetic screens have greatly facilitated the identification of many antiviral agents. However, the regions of interaction and inhibitory mechanisms of many therapeutic candidates have yet to be elucidated. Previous chemical screens identified Daclatasvir (BMS-790052) as a potent nonstructural protein 5A (NS5A) inhibitor for Hepatitis C virus (HCV) infection with an unclear inhibitory mechanism. Here we have developed a quantitative high-resolution genetic (qHRG) approach to systematically map the drug-protein interactions between Daclatasvir and NS5A and profile genetic barriers to Daclatasvir resistance. We implemented saturation mutagenesis in combination with next-generation sequencing technology to systematically quantify the effect of every possible amino acid substitution in the drug-targeted region (domain IA of NS5A) on replication fitness and sensitivity to Daclatasvir. This enabled determination of the residues governing drug-protein interactions. The relative fitness and drug sensitivity profiles also provide a comprehensive reference of the genetic barriers for all possible single amino acid changes during viral evolution, which we utilized to predict clinical outcomes using mathematical models. We envision that this high-resolution profiling methodology will be useful for next-generation drug development to select drugs with higher fitness costs to resistance, and also for informing the rational use of drugs based on viral variant spectra from patients.
Author Summary
The emergence of drug resistance during antiviral treatment limits treatment options and poses challenges to pharmaceutical development. Meanwhile, the search for novel antiviral compounds with chemical genetic screens has led to the identification of antiviral agents with undefined drug mechanisms. Daclatasvir, an effective NS5A inhibitor, is one such example. In traditional methods to identify critical residues governing drug-protein interactions, wild type virus is passaged under drug treatment pressure, enabling the identification of resistant mutations evolved after multiple viral passages. However, this method only characterizes a fraction of the positively selected variants. Here we have simultaneously quantified the relative change in replication fitness as well as the relative sensitivity to Daclatasvir for all possible single amino acid mutations in the NS5A domain IA, thereby identifying the entire panel of positions that interact with the drug. Using mathematical models, we predicted which mutations pose the greatest risk of causing emergence of resistance under different scenarios of treatment compliance. The mutant fitness and drug-sensitivity profiles obtained can also inform the patient-specific use of Daclatasvir and may facilitate the development of second-generation drugs with a higher genetic barrier to resistance.
PMCID: PMC3983061  PMID: 24722365
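As a rough illustration of how relative fitness and drug sensitivity could be derived from sequencing counts, the sketch below uses a common enrichment-ratio formulation. The abstract does not state the exact formula used in the study, so the function names, the normalization to wild type, and the example counts are all assumptions.

```python
def relative_fitness(mut_pre, mut_post, wt_pre, wt_post):
    """Enrichment of a mutant across selection, normalized to wild type.

    Assumed formulation (not taken from the paper):
        (mut_post / mut_pre) / (wt_post / wt_pre)
    """
    if min(mut_pre, wt_pre, wt_post) <= 0:
        raise ValueError("pre-selection counts and wild-type counts must be positive")
    return (mut_post / mut_pre) / (wt_post / wt_pre)


def drug_sensitivity_shift(fitness_with_drug, fitness_without_drug):
    """Fold change in relative fitness under drug.

    Values above 1 suggest the substitution reduces sensitivity to the drug.
    """
    if fitness_without_drug <= 0:
        raise ValueError("fitness without drug must be positive")
    return fitness_with_drug / fitness_without_drug


# Hypothetical read counts for a single amino acid substitution.
no_drug = relative_fitness(mut_pre=100, mut_post=50, wt_pre=1000, wt_post=1000)
with_drug = relative_fitness(mut_pre=100, mut_post=200, wt_pre=1000, wt_post=1000)
print(no_drug, with_drug, drug_sensitivity_shift(with_drug, no_drug))
```

In this toy case the substitution is costly without drug but enriched under drug, which is the pattern one would expect for a resistance mutation with a fitness cost.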
4.  Measles Contributes to Rheumatoid Arthritis: Evidence from Pathway and Network Analyses of Genome-Wide Association Studies 
PLoS ONE  2013;8(10):e75951.
Growing evidence from epidemiological studies indicates an association between rheumatoid arthritis (RA) and measles. However, the exact mechanism for this association remains unclear. We hypothesize that the strong association between the two diseases may be caused by shared genetic pathways. We performed a pathway analysis of a large-scale RA genome-wide association study (GWAS) dataset with 5,539 cases and 20,169 controls of European descent. We then evaluated our findings using previously identified RA loci, a protein-protein interaction network, and previous results from pathway analyses of GWAS of RA and other autoimmune diseases. We confirmed four pathways: Cytokine-cytokine receptor interaction, Jak-STAT signaling, T cell receptor signaling, and Cell adhesion molecules. We also highlighted, for the first time, the involvement of the Measles and Intestinal immune network for IgA production pathways in RA. Our results may explain the strong association between RA and measles, which may be caused by shared genetic pathways. We believe that our results will be helpful for future genetic studies in RA pathogenesis and may significantly assist in the development of therapeutic strategies.
PMCID: PMC3799991  PMID: 24204584
5.  Draft Genome Sequence of Alicyclobacillus hesperidum Strain URH17-3-68 
Journal of Bacteriology  2012;194(22):6348.
Alicyclobacillus hesperidum is a thermoacidophilic bacterium. We isolated strain URH17-3-68 from hot spring sludge in Tengchong, Yunnan province, China. Its extracellular products include heat- and acid-stable enzymes which are important for industrial applications. Here we report the draft genome of this strain.
PMCID: PMC3486414  PMID: 23105079
6.  Genome Sequence of Klebsiella oxytoca M5al, a Promising Strain for Nitrogen Fixation and Chemical Production 
Genome Announcements  2013;1(1):e00074-12.
Klebsiella oxytoca is an important microorganism for nitrogen fixation and chemical production. Here, we report an annotated draft genome of K. oxytoca strain M5al that contains 5,256 protein-coding genes and 95 structural RNAs, which provides a genetic basis for a better understanding of the physiology of this species.
PMCID: PMC3569362  PMID: 23405358
7.  Mutations in the RNA exosome component gene EXOSC3 cause pontocerebellar hypoplasia and spinal motor neuron degeneration 
Nature genetics  2012;44(6):704-708.
RNA exosomes are multi-subunit complexes conserved throughout evolution1 and emerging as the major cellular machinery for processing, surveillance, and turnover of a diverse spectrum of coding and non-coding RNA substrates essential for viability2. By exome sequencing, we discovered recessive mutations in exosome component 3 (EXOSC3) in four siblings with infantile spinal motor neuron disease, cerebellar atrophy, progressive microcephaly, and profound global developmental delay, consistent with pontocerebellar hypoplasia type 1 [PCH1; OMIM 607596]3–6. We identified mutations in EXOSC3 in an additional 8 of 12 families with PCH1. Morpholino knockdown of exosc3 in zebrafish embryos caused embryonic maldevelopment with small brain and poor motility, reminiscent of human clinical features and largely rescued by coinjected wildtype but not mutant exosc3 mRNA. These findings represent the first example of an RNA exosome gene responsible for a human disease and further implicate dysregulation of RNA processing in cerebellar and spinal motor neuron maldevelopment and degeneration.
PMCID: PMC3366034  PMID: 22544365
8.  Molecular diagnosis of putative Stargardt disease probands by exome sequencing 
BMC Medical Genetics  2012;13:67.
The commonest genetic form of juvenile or early adult onset macular degeneration is Stargardt Disease (STGD) caused by recessive mutations in the gene ABCA4. However, high phenotypic and allelic heterogeneity and a small but non-trivial amount of locus heterogeneity currently impede conclusive molecular diagnosis in a significant proportion of cases.
We performed whole exome sequencing (WES) of nine putative Stargardt Disease probands and searched for potentially disease-causing genetic variants in previously identified retinal or macular dystrophy genes. Follow-up dideoxy sequencing was performed for confirmation and to screen for mutations in an additional set of affected individuals lacking a definitive molecular diagnosis.
Whole exome sequencing revealed seven likely disease-causing variants across four genes, providing a confident genetic diagnosis in six previously uncharacterized participants. We identified four previously missed mutations in ABCA4 across three individuals. Likely disease-causing mutations in RDS/PRPH2, ELOVL, and CRB1 were also identified.
Our findings highlight the enormous potential of whole exome sequencing in Stargardt Disease molecular diagnosis and research. WES adequately assayed all coding sequences and canonical splice sites of ABCA4 in this study. Additionally, WES enables the identification of disease-related alleles in other genes. This work highlights the importance of collecting parental genetic material for WES testing as the current knowledge of human genome variation limits the determination of causality between identified variants and disease. While larger sample sizes are required to establish the precision and accuracy of this type of testing, this study supports WES for inherited early onset macular degeneration disorders as an alternative to standard mutation screening techniques.
PMCID: PMC3459799  PMID: 22863181
Stargardt Disease; Macular Degeneration; Exome; Mutation Screening; Molecular Diagnostics; ABCA4; PRPH2
9.  Identification of allele-specific alternative mRNA processing via transcriptome sequencing 
Nucleic Acids Research  2012;40(13):e104.
Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both single-nucleotide and whole-gene/isoform levels to identify allele-specific expression (ASE) and existence of allele-specific regulation of mRNA processing. We applied the methods to data obtained from the human glioblastoma cell line U87MG and primary breast cancer tissues and found that 26–45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our methods predicted potential mechanisms underlying ASE due to regulations affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms.
PMCID: PMC3401465  PMID: 22467206
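To make the idea of single-nucleotide allele-specific expression concrete, the sketch below applies an exact two-sided binomial test to reference and alternate allele read counts at one heterozygous site. This is a generic test chosen for illustration; the abstract does not describe the statistics ASARP actually uses, and the function name and example counts are hypothetical.

```python
from math import comb


def ase_binomial_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for allelic imbalance.

    Null hypothesis: both alleles are expressed equally (p = 0.5).
    Because the null distribution is symmetric, the two-sided p-value
    is twice the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no coverage, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical read counts at two heterozygous SNVs.
print(ase_binomial_p(10, 0))  # strongly imbalanced site
print(ase_binomial_p(5, 5))   # balanced site
```

In a genome-wide setting the resulting p-values would still need multiple-testing correction and a minimum coverage filter, which are omitted here.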
10.  Complete Genome Sequence of Clostridium acetobutylicum DSM 1731, a Solvent-Producing Strain with Multireplicon Genome Architecture
Journal of Bacteriology  2011;193(18):5007-5008.
Clostridium acetobutylicum is an important microorganism for solvent production. We report the complete genome sequence of C. acetobutylicum DSM 1731, a genome with multireplicon architecture. Compared with the sequenced type strain C. acetobutylicum ATCC 824, the genome of strain DSM 1731 harbors a 1.7-kb insertion and a novel 11.1-kb plasmid, which might have been acquired during evolution.
PMCID: PMC3165653  PMID: 21742891
11.  Melanomas acquire resistance to B-RAF(V600E) inhibition by RTK or N-RAS upregulation 
Nature  2010;468(7326):973-977.
Activating B-RAF(V600E) (also known as BRAF) kinase mutations occur in ~7% of human malignancies and ~60% of melanomas1. Early clinical experience with a novel class I RAF-selective inhibitor, PLX4032, demonstrated an unprecedented 80% anti-tumour response rate among patients with B-RAF(V600E)-positive melanomas, but acquired drug resistance frequently develops after initial responses2. Hypotheses for mechanisms of acquired resistance to B-RAF inhibition include secondary mutations in B-RAF(V600E), MAPK reactivation, and activation of alternative survival pathways3–5. Here we show that acquired resistance to PLX4032 develops by mutually exclusive PDGFRβ (also known as PDGFRB) upregulation or N-RAS (also known as NRAS) mutations but not through secondary mutations in B-RAF(V600E). We used PLX4032-resistant sub-lines artificially derived from B-RAF(V600E)-positive melanoma cell lines and validated key findings in PLX4032-resistant tumours and tumour-matched, short-term cultures from clinical trial patients. Induction of PDGFRβ RNA, protein and tyrosine phosphorylation emerged as a dominant feature of acquired PLX4032 resistance in a subset of melanoma sub-lines, patient-derived biopsies and short-term cultures. PDGFRβ-upregulated tumour cells have low activated RAS levels and, when treated with PLX4032, do not reactivate the MAPK pathway significantly. In another subset, high levels of activated N-RAS resulting from mutations lead to significant MAPK pathway reactivation upon PLX4032 treatment. Knockdown of PDGFRβ or N-RAS reduced growth of the respective PLX4032-resistant subsets. Overexpression of PDGFRβ or N-RAS(Q61K) conferred PLX4032 resistance to PLX4032-sensitive parental cell lines. Importantly, MAPK reactivation predicts MEK inhibitor sensitivity. Thus, melanomas escape B-RAF(V600E) targeting not through secondary B-RAF(V600E) mutations but via receptor tyrosine kinase (RTK)-mediated activation of alternative survival pathway(s) or activated RAS-mediated reactivation of the MAPK pathway, suggesting additional therapeutic strategies.
PMCID: PMC3143360  PMID: 21107323
12.  Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods 
PLoS ONE  2011;6(4):e18476.
Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of ‘nature's antibiotics’ is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desirable to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. The overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient was 0.73, indicating good predictive performance. Moreover, an in-depth feature analysis indicated that the results are consistent with prior knowledge that some amino acids are preferential in AMPs and that these amino acids play an important role in antimicrobial activity. For the convenience of experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web-server is provided at
PMCID: PMC3076375  PMID: 21533231
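For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes an overall success rate (accuracy) and the Matthews correlation coefficient from confusion-matrix counts. The formulas are the standard definitions; the counts are hypothetical and are not the paper's data.

```python
import math


def success_rate(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: AMPs are the positive class.
tp, tn, fp, fn = 90, 85, 15, 10
print(success_rate(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

In a jackknife (leave-one-out) evaluation, each sequence is predicted by a model built from all the others, and the counts above are accumulated over those held-out predictions.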
13.  Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants 
PLoS ONE  2011;6(2):e16517.
Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340.
PMCID: PMC3041756  PMID: 21364744
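The intuition behind scoring genes across independent mutants can be sketched with a simple length-scaled Poisson null: if mutations fall uniformly at random, a gene should collect hits in proportion to its length, and a gene hit far more often than that is a candidate cause. This is a swapped-in illustration, not the statistical method of the paper, which the abstract does not describe. The total of 4099 mutations comes from the abstract; the genome length is an approximate E. coli figure, and the gene length and hit count are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def gene_enrichment_p(hits, gene_len, total_mutations, genome_len):
    """Chance of seeing at least `hits` mutations in a gene under a uniform null."""
    expected = total_mutations * gene_len / genome_len
    return poisson_sf(hits, expected)


# 4099 mutations is from the abstract; the other values are assumptions.
p = gene_enrichment_p(hits=12, gene_len=3_000,
                      total_mutations=4099, genome_len=4_600_000)
print(p)
```

A real analysis would also correct for testing every gene in the genome and for the uncertainty that pooling introduces about which strain carries which mutation.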
14.  Fyn and Src are Effectors of Oncogenic EGFR Signaling in Glioblastoma Patients 
Cancer research  2009;69(17):6889-6898.
Activating EGFR mutations are common in many cancers including glioblastoma. However, clinical responses to EGFR inhibitors are infrequent and short-lived. We demonstrate that the Src family kinases (SFKs) Fyn and Src are effectors of oncogenic EGFR signaling, enhancing invasion and tumor cell survival in vivo. Expression of a constitutively active EGFR mutant, EGFRvIII, resulted in activating phosphorylation and physical association with Src and Fyn, promoting tumor growth and motility. Gene silencing of Fyn and Src limited EGFR and EGFRvIII-dependent tumor cell motility. The SFK inhibitor dasatinib inhibited invasion, promoted tumor regression and induced apoptosis in vivo, significantly prolonging survival of an orthotopic glioblastoma model expressing endogenous EGFRvIII. Dasatinib enhanced the efficacy of an anti-EGFR monoclonal antibody (mAb 806) in vivo, further limiting tumor growth and extending survival. Examination of a large cohort of clinical samples demonstrated frequent coactivation of EGFR and SFKs in glioblastoma patients. These results establish a mechanism linking EGFR signaling with Fyn and Src activation to promote tumor progression and invasion in vivo and provide rationale for combined anti-EGFR and anti-SFK targeted therapies.
PMCID: PMC2770839  PMID: 19690143
Fyn; Src; EGFR; glioblastoma; targeted therapy
16.  U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line 
PLoS Genetics  2010;6(1):e1000832.
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
Author Summary
Glioblastoma has a particularly dismal prognosis with median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 genes homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
PMCID: PMC2813426  PMID: 20126413
17.  Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing 
BMC Genomics  2009;10:646.
The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although the complete sequencing of the affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can be used to purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is being used routinely by our group and others, and we are continuing to improve its performance.
Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility with the capture of all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques by which the efficiency of the capture can be increased: by introducing a step to block cross hybridization mediated by common adapter sequences used in sequencing library construction, and by repeating the hybridization capture step. These improvements can boost the targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions.
The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
PMCID: PMC2808330  PMID: 20043857
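The targeting efficiency quoted in this abstract (the share of mapped reads falling within 100 bases of a targeted region) can be computed along the following lines. This is a minimal sketch under stated assumptions: each read is reduced to a single mapped coordinate, intervals are closed, and the function names and example coordinates are hypothetical rather than taken from the paper's pipeline.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(targets, flank=100):
    """Pad each (chrom, start, end) target by `flank` and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((max(0, start - flank), end + flank))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def on_target_fraction(read_positions, index):
    """Fraction of (chrom, pos) reads that fall inside a padded target."""
    hits = total = 0
    for chrom, pos in read_positions:
        total += 1
        intervals = index.get(chrom, [])
        # Last interval whose start is <= pos, if any.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical targets and mapped read positions.
targets = [("chr1", 1000, 2000), ("chr1", 2050, 3000), ("chr2", 500, 600)]
reads = [("chr1", 950), ("chr1", 3100), ("chr1", 3101),
         ("chr2", 399), ("chr2", 700), ("chr3", 10)]
print(on_target_fraction(reads, build_index(targets)))
```

Merging the padded intervals first keeps each lookup to a single binary search, which matters when hundreds of millions of reads are checked against thousands of targets.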
18.  Identification of EpCAM as the Gene for Congenital Tufting Enteropathy 
Gastroenterology  2008;135(2):429-437.
Background and Aims
Congenital Tufting Enteropathy (CTE) is a rare autosomal recessive diarrheal disorder presenting in the neonatal period. CTE is characterized by intestinal epithelial cell dysplasia leading to severe malabsorption and significant morbidity and mortality. The pathogenesis and genetics of this disorder are not well understood. The objective of this study was to identify the gene responsible for CTE.
Methods
A family with 2 children affected with CTE was identified. The affected children are double second cousins providing significant statistical power for linkage. Using Affymetrix 50K Single Nucleotide Polymorphism (SNP) chips, genotyping was performed on only two patients and one unaffected sibling. Direct DNA sequencing of candidate genes, RT-PCR, immunohistochemistry, and Western blotting were performed on specimens from patients and controls.
Results
SNP homozygosity mapping identified a unique 6.5 Mb haplotype of homozygous SNPs on chromosome 2p21 where approximately 40 genes are located. Direct sequencing of genes in this region revealed a homozygous G > A substitution at the donor splice site of exon 4 in Epithelial Cell Adhesion Molecule (EpCAM) of affected patients. RT-PCR of duodenal tissue demonstrated a novel alternative splice form with deletion of exon 4 in affected patients. Immunohistochemistry and Western blot of patient intestinal tissue revealed decreased expression of EpCAM. Direct sequencing of EpCAM from two additional unrelated patients revealed novel mutations in the gene.
Conclusions
Mutations in the gene for EpCAM are responsible for Congenital Tufting Enteropathy. This information will be used to gain further insight into the molecular mechanisms of this disease.
PMCID: PMC2574708  PMID: 18572020
19.  MicroRNA Profiling and Head and Neck Cancer 
Head and neck/oral cancer (HNOC) is a devastating disease. Despite advances in diagnosis and treatment, mortality rates have not improved significantly over the past three decades. Improvement in patient survival requires a better understanding of the disease progression so that HNOC can be detected early in the disease process and targeted therapeutic interventions can be deployed. Accumulating evidence suggests that microRNAs play important roles in many human cancers. They are pivotal regulators of diverse cellular processes including proliferation, differentiation, apoptosis, survival, motility, and morphogenesis. MicroRNA expression patterns may become powerful biomarkers for diagnosis and prognosis of HNOC. In addition, microRNA therapy could be a novel strategy for HNOC prevention and therapeutics. Recent advances in microRNA expression profiling have led to a better understanding of cancer pathogenesis. In this review, we survey recent technological advances in microRNA profiling and their applications in defining microRNA markers/targets for cancer prediction, diagnostics, treatment, and prognostics. MicroRNA alterations that are consistently identified in HNOC are discussed, such as upregulation of miR-21, miR-31, and miR-155, and downregulation of miR-26b, miR-107, miR-133b, miR-138, and miR-139.
PMCID: PMC2688814  PMID: 19753298
20.  Shotgun bisulfite sequencing of the Arabidopsis genome reveals DNA methylation patterning 
Nature  2008;452(7184):215-219.
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences 1, 2. Recent genomic studies in Arabidopsis have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels 3-5. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single base pair resolution of methylated cytosines for Arabidopsis, by combining bisulfite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyzer and Solexa sequencing technology 6. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. We describe methylation on previously inaccessible components of the genome along with an analysis of the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as mouse.
PMCID: PMC2377394  PMID: 18278030
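The core calculation behind a single-base methylation map can be sketched as follows. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the methylation level at a site is the share of C calls among C and T calls. The CG, CHG and CHH context classes (H being A, C or T) are the standard plant categories; the function names and example values are assumptions, and the handling of ambiguous bases is deliberately left out.

```python
def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at index i as CG, CHG or CHH.

    Returns None if seq[i] is not C or if the site is too close to the
    end of the sequence to classify.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1 : i + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2:
        return None
    return "CHG" if following[1] == "G" else "CHH"


def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine."""
    covered = c_reads + t_reads
    if covered == 0:
        return None  # no coverage at this site
    return c_reads / covered


# Hypothetical reference fragment and read counts at the cytosine at index 1.
print(cytosine_context("ACGTT", 1), methylation_level(c_reads=8, t_reads=2))
```

Cytosines on the reverse strand would be handled the same way after reverse-complementing the reference, with G-to-A changes playing the role of C-to-T.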
21.  Genomic assessments of the frequent LOH region on 8p22-p21.3 in head and neck squamous cell carcinoma 
Cancer genetics and cytogenetics  2007;176(2):100-106.
Most human cancers are characterized by genetic instabilities. Chromosomal aberrations include segments of allelic imbalance identifiable by loss of heterozygosity (LOH) at polymorphic loci, which may be used to implicate regions harboring tumor suppressor genes. Here we performed whole genome LOH profiling on over 40 human head and neck squamous cell carcinoma (HNSCC) cell lines. Several frequent LOH regions have been identified on chromosomal arms 3p, 4p, 4q, 5q, 8p, 9p, 10p, 11q, and 17p. A genomic region of ∼7 Mb located at 8p22-p21.3 exhibits the most frequent LOH (87.9%), which suggests that this region harbors important tumor suppressor gene(s). Mitochondrial tumor suppressor gene 1 (MTUS1) is a recently identified candidate tumor suppressor gene that resides in this region. Consistent down-regulation in expression was observed in HNSCC for MTUS1 as measured by real-time quantitative RT-PCR. Sequence analysis of the MTUS1 gene in HNSCC revealed several important sequence variants in the exon regions of this gene. Thus, our results suggest that MTUS1 is one of the candidate tumor suppressor gene(s) residing in 8p22-p21.3 for HNSCC. The identification of these candidate genes will facilitate the understanding of tumorigenesis of HNSCC. Further studies are needed to functionally evaluate those candidate genes.
PMCID: PMC2000851  PMID: 17656251
22.  Cartilage-selective genes identified in genome-scale analysis of non-cartilage and cartilage gene expression 
BMC Genomics  2007;8:165.
Cartilage plays a fundamental role in the development of the human skeleton. Early in embryogenesis, mesenchymal cells condense and differentiate into chondrocytes to shape the early skeleton. Subsequently, the cartilage anlagen differentiate to form the growth plates, which are responsible for linear bone growth, and the articular chondrocytes, which facilitate joint function. However, despite the multiplicity of roles of cartilage during human fetal life, surprisingly little is known about its transcriptome. To address this, a whole genome microarray expression profile was generated using RNA isolated from 18–22 week human distal femur fetal cartilage and compared with a database of control normal human tissues aggregated at UCLA, termed Celsius.
161 cartilage-selective genes were identified, defined as genes significantly expressed in cartilage with low expression and little variation across a panel of 34 non-cartilage tissues. Among these 161 genes were cartilage-specific genes such as cartilage collagen genes and 25 genes which have been associated with skeletal phenotypes in humans and/or mice. Many of the other cartilage-selective genes do not have established roles in cartilage or are novel, unannotated genes. Quantitative RT-PCR confirmed the unique pattern of gene expression observed by microarray analysis.
Defining the gene expression pattern for cartilage has identified new genes that may contribute to human skeletogenesis as well as provided further candidate genes for skeletal dysplasias. The data suggest that fetal cartilage is a complex and transcriptionally active tissue and demonstrate that the set of genes selectively expressed in the tissue has been greatly underestimated.
PMCID: PMC1906768  PMID: 17565682
23.  A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array 
BMC Bioinformatics  2007;8:145.
DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required.
We have developed a highly sensitive algorithm for the edge detection of copy number data which is especially suitable for the SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step.
Using simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.
PMCID: PMC1868765  PMID: 17477871
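As a rough illustration of the two-stage idea described in the abstract above (an over-sensitive edge-detection step followed by a test-based edge selection step), the following Python sketch may help. It is not the FASeg implementation: the function names, the sliding-window jump statistic, the thresholds, the Welch t-statistic, and the use of backward pruning only are all assumptions made for illustration. The published forward-backward selection procedure is not reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(left, right):
    """Absolute Welch t-statistic between two segments (each with >= 2 points)."""
    se = sqrt(variance(left) / len(left) + variance(right) / len(right))
    if se == 0.0:
        return float("inf") if mean(left) != mean(right) else 0.0
    return abs(mean(left) - mean(right)) / se


def candidate_edges(values, window=3, min_jump=0.015):
    """Stage 1 (deliberately over-sensitive): propose an edge wherever the
    difference between the window means on either side is a local peak
    and exceeds a low threshold."""
    jumps = {}
    for i in range(window, len(values) - window + 1):
        jumps[i] = abs(mean(values[i:i + window]) - mean(values[i - window:i]))
    return [
        i for i, jump in jumps.items()
        if jump > min_jump
        and jump >= jumps.get(i - 1, 0.0)
        and jump >= jumps.get(i + 1, 0.0)
    ]


def prune_edges(values, edges, t_min=4.0):
    """Stage 2 (test-based selection): repeatedly drop the weakest edge until
    every remaining edge separates its two neighbouring segments with a
    t-statistic of at least t_min."""
    edges = sorted(edges)
    while edges:
        bounds = [0] + edges + [len(values)]
        scores = []
        for k, edge in enumerate(edges):
            left = values[bounds[k]:edge]
            right = values[edge:bounds[k + 2]]
            # Segments too short to test are scored 0 so they are pruned first.
            ok = len(left) >= 2 and len(right) >= 2
            scores.append(welch_t(left, right) if ok else 0.0)
        weakest = min(range(len(edges)), key=scores.__getitem__)
        if scores[weakest] >= t_min:
            break
        del edges[weakest]
    return edges


if __name__ == "__main__":
    # Toy log-ratio signal with a single true copy number change at index 10.
    signal = [0.0, 0.05, -0.05, 0.02, -0.02, 0.03, -0.03, 0.0, 0.04, -0.04,
              1.0, 1.05, 0.95, 1.02, 0.98, 1.03, 0.97, 1.0, 1.04, 0.96]
    proposed = candidate_edges(signal)
    print("stage 1 candidates:", proposed)
    print("stage 2 selected:  ", prune_edges(signal, proposed))
```

The design point the sketch tries to convey is the division of labour: the first stage is allowed to propose spurious edges caused by noise so that small true segments are not missed, and the second stage relies on a statistical test between neighbouring segments to discard the false ones.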
