1.  CXCR5 polymorphisms in non-Hodgkin lymphoma risk and prognosis 
CXCR5 [chemokine (C-X-C motif) receptor 5; also known as Burkitt lymphoma receptor 1 (BLR1)] is expressed on mature B-cells, subsets of CD4+ and CD8+ T-cells, and skin-derived migratory dendritic cells. Together with its ligand, CXCL13, CXCR5 is involved in guiding B-cells into the B-cell zones of secondary lymphoid organs as well as in T-cell migration. This study evaluated the role of common germline genetic variation in CXCR5 in the risk and prognosis of non-Hodgkin lymphoma (NHL) using a clinic-based study of 1521 controls and 2694 NHL cases, including 710 chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL), 586 diffuse large B-cell lymphoma (DLBCL), 588 follicular lymphoma (FL), 137 mantle cell lymphoma (MCL), 230 marginal zone lymphoma (MZL) and 158 peripheral T-cell lymphoma (PTCL) cases. Of the ten CXCR5 tag SNPs in our study, five were associated with risk of NHL, with rs1790192 having the strongest association (OR=1.19, 95%CI 1.08–1.30; p=0.0003). This SNP was most strongly associated with the risk of FL (OR=1.44, 95%CI 1.25–1.66; p=3.1×10−7), with weaker associations with DLBCL (OR=1.16, 95%CI 1.01–1.33; p=0.04) and PTCL (OR=1.29, 95%CI 1.02–1.64; p=0.04) and no association with the risk of MCL or MZL. Among FL patients managed with observation as their initial strategy, the number of minor alleles of rs1790192 was associated with better event-free survival (EFS) (HR=0.64; 95%CI 0.47–0.87; p=0.004). These results provide additional evidence for a role of host genetic variation in CXCR5 in lymphomagenesis, particularly for FL.
PMCID: PMC3758443  PMID: 23812490
non-Hodgkin lymphoma; SNPs; prognosis; prospective cohort; case-control
2.  Clinical Correlates of Autosomal Chromosomal Abnormalities in an Electronic Medical Record–Linked Genome-Wide Association Study: A Case Series 
Although mosaic autosomal chromosomal abnormalities are being increasingly detected as part of high-density genotyping studies, the clinical correlates are unclear. From an electronic medical record (EMR)–based genome-wide association study (GWAS) of peripheral arterial disease, log-R-ratio and B-allele-frequency data were used to identify mosaic autosomal chromosomal abnormalities including copy number variation and loss of heterozygosity. The EMRs of patients with chromosomal abnormalities and those without chromosomal abnormalities were reviewed to compare clinical characteristics. Among 3336 study participants, 0.75% (n = 25, mean age = 74.8 ± 10.7 years, 64% men) had abnormal intensity plots indicative of autosomal chromosomal abnormalities. A hematologic malignancy was present in 8 patients (32%), of whom 4 also had a solid organ malignancy while 2 patients had a solid organ malignancy only. In 50 age- and sex-matched participants without chromosomal abnormalities, there was a lower rate of hematologic malignancies (2% vs 32%, P < .001) but not solid organ malignancies (20% vs 24%, P = .69). We also report the clinical characteristics of each patient with the observed chromosomal abnormalities. Interestingly, among 5 patients with 20q deletions, 4 had a myeloproliferative disorder while all 3 men in this group had prostate cancer. In summary, in a GWAS of 3336 adults, 0.75% had autosomal chromosomal abnormalities and nearly a third of them had hematologic malignancies. A potential novel association between 20q deletions, myeloproliferative disorders, and prostate cancer was also noted.
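The group comparisons above can be checked with a small worked example. The abstract does not name the statistical test, so the sketch below assumes a two-sided Fisher's exact test on the 2×2 counts implied by the percentages (8 of 25 vs 1 of 50 for hematologic malignancy; 6 of 25 vs 10 of 50 for solid organ malignancy). It should give P < .001 for the first comparison; the second value need not match the reported P = .69 exactly if a different test was used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]
    (rows = groups, columns = outcome present / absent)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the col1 outcomes fall in row 1.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables with the same margins that are no more likely than the
    # observed one; the small tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hematologic malignancy: 8 of 25 cases vs 1 of 50 matched controls.
p_heme = fisher_exact_two_sided(8, 17, 1, 49)
# Solid organ malignancy: 6 of 25 cases vs 10 of 50 matched controls.
p_solid = fisher_exact_two_sided(6, 19, 10, 40)
print(f"hematologic P = {p_heme:.1e}, solid organ P = {p_solid:.2f}")
```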
PMCID: PMC4130164  PMID: 25125939
copy number variation; genome-wide association studies; loss of heterozygosity; mosaic abnormalities; mosaic deletion; myeloproliferative disorders; prostate cancer; uniparental disomy
3.  Identification of Novel Variants in Colorectal Cancer Families by High-Throughput Exome Sequencing 
Colorectal cancer (CRC) in densely affected families without Lynch syndrome may be due to mutations in undiscovered genetic loci. Familial linkage analyses have yielded disparate results; exome sequencing of coding regions may identify novel segregating variants.
We completed exome sequencing on 40 affected cases from 16 multi-case pedigrees to identify novel loci. Variants shared among all sequenced cases within each family were identified and filtered to exclude common variants and single nucleotide variants (SNVs) predicted to be benign.
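A minimal sketch of the within-family filter described above, assuming each case's calls are a set of (chromosome, position, ref, alt) tuples and that each variant has a population minor allele frequency and a benign/not-benign prediction. The 1% MAF cut-off, the data structures and the toy coordinates are hypothetical; the abstract does not give the thresholds used.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


class Annotation(NamedTuple):
    population_maf: float   # minor allele frequency in a reference panel
    predicted_benign: bool  # output of some functional-prediction tool (assumed)


def shared_rare_variants(family_calls: List[Set[Variant]],
                         annotations: Dict[Variant, Annotation],
                         max_maf: float = 0.01) -> Set[Variant]:
    """Variants carried by every sequenced case in one family, excluding
    common variants and those predicted benign. Variants with no annotation
    (e.g. absent from population databases) are kept as potentially novel."""
    if not family_calls:
        return set()
    shared = set.intersection(*family_calls)
    return {v for v in shared
            if not (v in annotations
                    and (annotations[v].population_maf > max_maf
                         or annotations[v].predicted_benign))}


# Toy coordinates, not real CRC variants.
case_1 = {("chr1", 1000, "A", "G"), ("chr15", 2000, "C", "T"), ("chr2", 3000, "G", "A")}
case_2 = {("chr1", 1000, "A", "G"), ("chr15", 2000, "C", "T")}
ann = {("chr1", 1000, "A", "G"): Annotation(0.002, False),
       ("chr15", 2000, "C", "T"): Annotation(0.20, False)}
print(shared_rare_variants([case_1, case_2], ann))  # {('chr1', 1000, 'A', 'G')}
```

Taking the intersection first means a variant missing from even one affected relative is dropped before any annotation lookup.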
We identified 32 nonsense or splice-site SNVs, 375 missense SNVs, 1,394 synonymous or non-coding SNVs, and 50 indels in the 16 families. Of particular interest are two validated and replicated missense variants in CENPE and KIF23, which are both located within previously reported CRC linkage regions, on chromosomes 1 and 15, respectively.
Whole-exome sequencing identified DNA variants in multiple genes. Sequencing these genes in additional samples will further elucidate the role of variants in these regions in colorectal cancer susceptibility.
Exome sequencing of familial CRC cases can identify novel rare variants that may influence disease risk.
PMCID: PMC3704223  PMID: 23637064
colorectal cancer; familial and hereditary cancers; exome sequencing; rare variants; family study design
4.  A Phase I Trial of Immunostimulatory CpG 7909 Oligodeoxynucleotide and 90Yttrium Ibritumomab Tiuxetan Radioimmunotherapy for Relapsed B-cell Non-Hodgkin Lymphoma 
American journal of hematology  2013;88(7):589-593.
Radioimmunotherapy (RIT) for relapsed indolent non-Hodgkin lymphoma produces overall response rates (ORR) of 80%, with most responses being partial remissions. Synthetic CpG oligonucleotides change the phenotype of malignant B-cells, are immunostimulatory, and can produce responses when injected intratumorally and combined with conventional radiation. In this phase I trial we tested systemic administration of both CpG and RIT. Eligible patients had biopsy-proven, previously treated CD20+ B-cell NHL and met criteria for RIT. Patients received rituximab 250 mg/m² on days 1, 8, and 15; ¹¹¹In-ibritumomab tiuxetan on days 1 and 8; CpG 7909 on days 6, 13, 20, and 27; and 0.4 mCi/kg of ⁹⁰Y-ibritumomab tiuxetan on day 15. The doses of CpG 7909 tested were 0.08, 0.16, and 0.32 mg/kg (six patients each) and 0.48 mg/kg (12 patients), given IV over 2 hours, without dose-limiting toxicity. The ORR was 93% (28/30) with 63% (19/30) complete remission (CR); median progression-free survival was 42.7 months (95% CI 18–not reached) and median duration of response (DR) was 35 months (4.6–76+). Correlative studies demonstrated a decrease in IL10 and TNFα, and an increase in IL1β, in response to therapy. CpG 7909 at a dose of 0.48 mg/kg is safe with standard RIT and produces a high CR rate and long DR; these results warrant confirmation.
PMCID: PMC3951424  PMID: 23619698
lymphoma; radioimmunotherapy; rituximab; ibritumomab tiuxetan; CpG 7909
5.  MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing 
BMC Bioinformatics  2014;15:224.
Although the costs of next-generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used to obtain genomic features from transcriptomic sequencing data for any genome.
For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, and summarization of transcriptomics data with a final report. The workflow is available for human transcriptome analysis and can be easily adapted for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients.
Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from
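To illustrate what an exon read count is, here is a minimal sketch that assigns each aligned read to the exon containing its start position on one chromosome. This is a simplification for illustration and not MAP-RSeq's implementation: production tools also handle spliced reads, strand, reads spanning exon boundaries and overlapping annotations. The exon names and coordinates are invented.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int]  # (exon_id, start, end), 1-based inclusive


def count_reads_per_exon(exons: List[Exon], read_starts: List[int]) -> Dict[str, int]:
    """Count reads whose alignment start falls inside each exon on one
    chromosome. Exons are assumed non-overlapping."""
    exons = sorted(exons, key=lambda e: e[1])
    starts = [e[1] for e in exons]
    counts = Counter({e[0]: 0 for e in exons})
    for pos in read_starts:
        i = bisect_right(starts, pos) - 1   # last exon starting at or before pos
        if i >= 0 and pos <= exons[i][2]:
            counts[exons[i][0]] += 1
    return dict(counts)


exons = [("exon1", 100, 199), ("exon2", 300, 450), ("exon3", 600, 640)]
reads = [105, 150, 199, 250, 300, 451, 620]
print(count_reads_per_exon(exons, reads))
# {'exon1': 3, 'exon2': 1, 'exon3': 1}
```

Binary search over sorted exon starts keeps each lookup logarithmic in the number of exons, which matters when millions of reads are assigned.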
PMCID: PMC4228501  PMID: 24972667
Transcriptomic sequencing; RNA-Seq; Bioinformatics workflow; Gene expression; Exon counts; Fusion transcripts; Expressed single nucleotide variants; RNA-Seq reports
6.  PatternCNV: a versatile tool for detecting copy number changes from exome sequencing data 
Bioinformatics  2014;30(18):2678-2680.
Motivation: Exome sequencing (exome-seq) data, which are typically used for calling exonic mutations, have also been used to detect DNA copy number variations (CNVs). Despite the existence of several CNV detection tools, there is still a great need for a sensitive and accurate CNV-calling algorithm that has built-in QC steps and does not require a paired reference for each sample.
Results: We developed a novel method named PatternCNV, which (i) accounts for the read coverage variations between exons while leveraging the consistencies of this variability across different samples; (ii) reduces alignment BAM files to WIG format and therefore greatly accelerates computation; (iii) incorporates multiple QC measures designed to identify outlier samples and batch effects; and (iv) provides a variety of visualization options including chromosome, gene and exon-level views of CNVs, along with a tabular summarization of the exon-level CNVs. Compared with other CNV-calling algorithms using data from a lymphoma exome-seq study, PatternCNV has higher sensitivity and specificity.
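The idea in point (i), that exon-to-exon coverage differences are largely reproducible across samples, can be illustrated with a generic median-reference approach: normalize each sample for library size, take the per-exon median across samples as the expected coverage, and report log2(observed/expected). This is a hedged sketch of the general idea, not PatternCNV's actual model; the coverage values are invented, with sample C carrying a simulated one-copy loss at the second exon.

```python
import math
from statistics import median
from typing import Dict, List


def exon_log2_ratios(coverage: Dict[str, List[float]],
                     pseudo: float = 0.5) -> Dict[str, List[float]]:
    """coverage maps sample -> per-exon read coverage (same exon order in
    every sample). Returns per-sample, per-exon log2(observed / expected)."""
    samples = list(coverage)
    n_exons = len(coverage[samples[0]])
    # 1) Library-size normalization: scale each sample to one million reads.
    scaled = {s: [c / sum(coverage[s]) * 1e6 for c in coverage[s]] for s in samples}
    # 2) Expected coverage per exon: the median across samples captures the
    #    exon-to-exon variability that all samples share.
    expected = [median(scaled[s][i] for s in samples) for i in range(n_exons)]
    # 3) Log2 ratio per exon; the pseudo-count avoids log(0).
    return {s: [math.log2((scaled[s][i] + pseudo) / (expected[i] + pseudo))
                for i in range(n_exons)] for s in samples}


cov = {
    "A": [100, 200, 50, 150, 120, 80],
    "B": [110, 220, 55, 165, 132, 88],   # A scaled by 1.1 (deeper library)
    "C": [100, 100, 50, 150, 120, 80],   # simulated loss at exon 2
}
for sample, ratios in exon_log2_ratios(cov).items():
    print(sample, [round(r, 2) for r in ratios])
```

Because library-size normalization uses the total, a deletion slightly inflates the other exons of the affected sample (here about +0.22 on the log2 scale); median-based scaling is one way to reduce this effect.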
Availability and implementation: The software for PatternCNV is implemented using Perl and R, and can be used in Mac or Linux environments. Software and user manual are available at, and R package at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4155258  PMID: 24876377
7.  Mapping of the IRF8 gene identifies a 3’ UTR variant associated with risk of chronic lymphocytic leukemia but not other common non-Hodgkin lymphoma subtypes 
Our genome-wide association study (GWAS) of chronic lymphocytic leukemia (CLL) identified 4 highly correlated intronic variants within the IRF8 gene that were associated with CLL. These results were further supported by a recent meta-analysis of our GWAS with two other GWAS of CLL, making IRF8 a strong candidate gene for CLL risk.
To refine the genetic association of CLL risk, we performed Sanger sequencing of IRF8 in 94 CLL cases and 96 controls. We then performed fine-mapping by genotyping 39 variants (of which 10 were identified from sequencing) in 745 CLL cases and 1521 controls. We also assessed these associations with risk of other non-Hodgkin lymphoma (NHL) subtypes.
The strongest association with CLL risk was observed with a common SNP located within the 3’ UTR of IRF8 (rs1044873, log additive odds ratio = 0.7, P=1.81×10−6). This SNP was not associated with the other NHL subtypes (all P>0.05).
We provide evidence that rs1044873 in the IRF8 gene accounts for the initial GWAS signal for CLL risk. This association appears to be unique to CLL, with little support for association with other common NHL subtypes. Future work is needed to assess the functional role of IRF8 in CLL etiology.
These data provide support that a functional variant within the 3’ UTR of IRF8 may be driving the GWAS signal seen on 16q24.1 for CLL risk.
PMCID: PMC3596428  PMID: 23307532
CLL; NHL; SNPs; IRF8; risk locus
8.  Chronic Caloric Restriction Preserves Mitochondrial Function in Senescence Without Increasing Mitochondrial Biogenesis 
Cell metabolism  2012;16(6):777-788.
Caloric restriction (CR) mitigates many detrimental effects of aging and prolongs lifespan. CR has been suggested to increase mitochondrial biogenesis, thereby attenuating age-related declines in mitochondrial function; a concept that is challenged by recent studies. Here we show that lifelong CR in mice prevents age-related loss of mitochondrial oxidative capacity and efficiency, measured in isolated mitochondria and permeabilized muscle fibers. We find that these beneficial effects of CR occur without increasing mitochondrial abundance. Whole-genome expression profiling and large-scale proteomic surveys revealed expression patterns inconsistent with increased mitochondrial biogenesis, which is further supported by lower mitochondrial protein synthesis with CR. We find that CR decreases oxidant emission, increases antioxidant scavenging, and minimizes oxidative damage to DNA and protein. These results demonstrate that CR preserves mitochondrial function by protecting the integrity and function of existing cellular components rather than by increasing mitochondrial biogenesis.
PMCID: PMC3544078  PMID: 23217257
Caloric Restriction; Calorie restriction; dietary restriction; aging; mitochondria; protein synthesis; skeletal muscle; mitochondrial biogenesis
9.  Gene Expression, Single Nucleotide Variant and Fusion Transcript Discovery in Archival Material from Breast Tumors 
PLoS ONE  2013;8(11):e81925.
Advantages of RNA-Seq over array-based platforms include quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these applications is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA samples isolated from breast tumors, using the hybridization-based NanoString nCounter (226-gene panel) and whole-transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole-transcriptome expression of lincRNAs and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with the NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for the NanoString (226 genes) and ScriptSeq whole-transcriptome protocols, respectively (p<2×10−16). For lincRNAs specifically, we observed a very high Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across the NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high-confidence SNVs and 24% of high-confidence fusion transcripts. The low sensitivity of fusion transcript detection was not overcome by increasing sequencing depth up to 3-fold (from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between the NanoString and RNA-Seq platforms suggests that discovery-based whole-transcriptome studies of FFPE material will produce reliable expression data.
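For readers less familiar with the metric, the sketch below computes a Pearson correlation between matched fresh-frozen and FFPE expression profiles. It assumes log2 counts-per-million with a pseudo-count of 1 as the expression scale; the abstract does not state which transform was used, and the counts are invented.

```python
import math
from typing import List


def log2_cpm(counts: List[int], pseudo: float = 1.0) -> List[float]:
    """log2 counts-per-million with a pseudo-count (assumed transform)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudo) for c in counts]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented read counts for five genes in one matched fresh-frozen/FFPE pair.
fresh_frozen = [1500, 300, 45, 8000, 120]
ffpe = [1100, 350, 20, 6500, 90]
r = pearson(log2_cpm(fresh_frozen), log2_cpm(ffpe))
print(f"Pearson r = {r:.3f}")
```

Correlating on a log scale keeps a handful of very highly expressed genes from dominating the coefficient.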
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
PMCID: PMC3838386  PMID: 24278466
10.  An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer 
PLoS ONE  2013;8(11):e79298.
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular process pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PMCID: PMC3815156  PMID: 24223926
11.  A Two-Stage Evaluation of Genetic Variation in Immune and Inflammation Genes with Risk of Non-Hodgkin Lymphoma Identifies New Susceptibility Locus in 6p21.3 Region 
Non-Hodgkin lymphoma (NHL) is a malignancy of lymphocytes, and there is growing evidence for a role of germline genetic variation in immune genes in NHL etiology.
To identify susceptibility immune genes, we conducted a 2-stage analysis of single nucleotide polymorphisms (SNPs) from 1,253 genes using the Immune and Inflammation Panel. In Stage 1, we genotyped 7,670 SNPs in 425 NHL cases and 465 controls, and in Stage 2 we genotyped the top 768 SNPs on an additional 584 cases and 768 controls. The association of individual SNPs with NHL risk from a log-additive model was assessed using the Odds Ratios (ORs) and 95% confidence intervals (CI).
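A log-additive model codes each SNP as the number of minor alleles (0, 1 or 2), so exp(β) is the odds ratio per additional allele. The sketch below fits that model by Newton-Raphson in plain Python and returns a Wald 95% CI, exp(β ± 1.96·SE). It omits any covariate adjustment, and the genotype counts are invented for illustration.

```python
import math
from typing import List, Tuple


def log_additive_or(genotypes: List[int], is_case: List[int],
                    iters: int = 25) -> Tuple[float, float, float]:
    """Per-allele OR and Wald 95% CI from logistic regression of case status
    (1 = case, 0 = control) on minor-allele count, fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 is the (2, 2) element of the inverse information;
    # after convergence the last-iteration information is used.
    se = math.sqrt(h00 / det)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


def expand(genotype_counts: List[int], status: int) -> Tuple[List[int], List[int]]:
    """Turn counts of 0/1/2-allele carriers into per-person lists."""
    g: List[int] = []
    for allele_count, n in enumerate(genotype_counts):
        g += [allele_count] * n
    return g, [status] * len(g)


# Invented genotype counts (0, 1, 2 minor alleles) for cases and controls.
g_cases, y_cases = expand([40, 45, 15], 1)
g_ctrls, y_ctrls = expand([50, 40, 10], 0)
or_, lo, hi = log_additive_or(g_cases + g_ctrls, y_cases + y_ctrls)
print(f"per-allele OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```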
In the pooled analysis, only the TAP2 coding SNP rs241447 (MAF=0.26; Thr655Ala) at 6p21.3 (OR=1.34, 95%CI 1.17-1.53) achieved statistical significance after accounting for multiple testing (p=3.1 × 10−5). The TAP2 SNP was strongly associated with follicular lymphoma (FL, OR=1.82, 95%CI 1.46-2.26; p=6.9 × 10−8), and was independent of other known loci (rs10484561 and rs2647012) from this region. The TAP2 SNP was also associated with diffuse large B-cell lymphoma (DLBCL, OR=1.38, 95% CI 1.08-1.77; p=0.011), but not chronic lymphocytic leukemia (OR=1.08; 95% CI 0.88-1.32). Higher TAP2 expression was associated with the risk allele in both FL and DLBCL tumors.
Genetic variation in TAP2 was associated with NHL risk overall, and FL risk in particular, and this was independent of other established loci from 6p21.3.
Genetic variation in antigen presentation of HLA class I molecules may play a role in lymphomagenesis.
PMCID: PMC3467356  PMID: 22911334
genetics; non-Hodgkin lymphoma; immune function; single nucleotide polymorphisms
12.  Impact of Library Preparation on Downstream Analysis and Interpretation of RNA-Seq Data: Comparison between Illumina PolyA and NuGEN Ovation Protocol 
PLoS ONE  2013;8(8):e71745.
PolyA selection is the most common approach to RNA-seq library preparation. For limited amounts of RNA or degraded RNA, alternative protocols such as NuGEN's have been developed. However, it is not yet clear how different library preparations affect downstream analyses across the broad range of RNA sequencing applications.
Methods and Materials
Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina’s mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected fewer genes (12–20% fewer), with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17–20% of the differentially expressed genes between biological replicates were detected in common by the two protocols. Significantly more SNVs (5–6 times as many) were detected in the NuGEN samples, largely from intragenic and intergenic regions. The NuGEN protocol captured fewer exons (25% fewer) and had higher base-level coverage variance. While 6.3% of reads were mapped to intragenic regions in the PolyA samples, the percentages were much higher (20–25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but it targeted small and “novel” lincRNAs.
Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data. The NuGEN provides an alternative for limited or degraded RNA but it has limitations for some RNA-seq applications.
PMCID: PMC3747248  PMID: 23977132
13.  A genome-wide association study of venous thromboembolism identifies risk variants in chromosomes 1q24.2 and 9q 
To identify venous thromboembolism (VTE) disease-susceptibility genes.
We performed in silico genome wide association (GWAS) analyses using genotype data imputed to ~2.5 million single nucleotide polymorphisms (SNPs) from adults with objectively-diagnosed VTE (n=1503), and controls frequency-matched on age and sex (n=1459; discovery population). SNPs exceeding genome-wide significance were replicated in a separate population (VTE cases, n=1407; controls, n=1418). Genes associated with VTE were resequenced.
Seven SNPs exceeded genome-wide significance (P < 5 × 10−8): four on chromosome 1q24.2 (F5 rs6025 [Factor V Leiden], BLZF1 rs7538157, NME7 rs16861990 and SLC19A2 rs2038024) and three on chromosome 9q34.2 (ABO rs2519093 [ABO intron 1], rs495828, rs8176719 [ABO blood type O allele]). The replication study confirmed a significant association of F5, NME7, and ABO with VTE. However, F5 was the main signal on 1q24.2, as only ABO SNPs remained significantly associated with VTE after adjusting for F5 rs6025. This 1q24.2 region was shown to be inherited as a haplotype block. ABO resequencing identified 15 novel single nucleotide variations (SNVs) in ABO intron 6 and the ABO 3’ UTR that were strongly associated with VTE (P < 10−4) and belonged to three distinct linkage disequilibrium (LD) blocks; none were in LD with ABO rs8176719 or rs2519093. Our sample size provided 80% power to detect odds ratios of 2.0 and 1.51 for minor allele frequencies of 0.05 and 0.5, respectively (α = 1 × 10−8; 1% VTE prevalence).
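The power statement above can be approximated, under stated assumptions, with a two-sample test of allele frequencies: the control allele frequency p0 is taken as the population frequency (reasonable at 1% prevalence), the case frequency follows from the allelic odds ratio as p1 = OR·p0/(1 − p0 + OR·p0), and power uses the normal approximation. This is a generic textbook sketch, not the authors' calculation; different genetic models or tests will give different numbers, so it need not reproduce the 80% figures.

```python
from statistics import NormalDist


def allelic_test_power(p0: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided allelic (2 x 2 allele count) test.
    The negligible rejection region in the opposite tail is ignored."""
    z = NormalDist()
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)   # risk-allele freq in cases
    a1, a0 = 2 * n_cases, 2 * n_controls                 # alleles per group
    pbar = (p1 * a1 + p0 * a0) / (a1 + a0)
    se_null = (pbar * (1 - pbar) * (1 / a1 + 1 / a0)) ** 0.5
    se_alt = (p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Discovery-stage sizes and alpha as stated in the abstract.
for p0, odds in [(0.05, 2.0), (0.5, 1.51)]:
    power = allelic_test_power(p0, odds, n_cases=1503, n_controls=1459, alpha=1e-8)
    print(f"allele freq {p0}, OR {odds}: power ~ {power:.2f}")
```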
Aside from F5 rs6025, ABO rs8176719 and rs2519093, and F2 rs1799963, additional common and high VTE-risk SNPs among whites are unlikely.
PMCID: PMC3419811  PMID: 22672568
venous thromboembolism; deep vein thrombosis; pulmonary embolism; genetics; genome-wide scan; epidemiology
14.  Early life sun exposure, vitamin D-related gene variants, and risk of non-Hodgkin lymphoma 
Cancer causes & control : CCC  2012;23(7):1017-1029.
It has been hypothesized that vitamin D mediates the inverse relationship between sun exposure and non-Hodgkin lymphoma (NHL) risk reported in several recent studies. We evaluated the association of self-reported sun exposure at ages <13, 13–21, 22–40, and 41+ years and 19 single nucleotide polymorphisms (SNPs) from 4 candidate genes relevant to vitamin D metabolism (RXR, VDR, CYP24A1, CYP27B1) with NHL risk.
This analysis included 1,009 newly diagnosed NHL cases and 1,233 frequency-matched controls from an ongoing clinic-based study. Odds ratios (OR), 95 % confidence intervals (CI), and tests for trend were estimated using unconditional logistic regression.
There was a significant decrease in NHL risk with increased sun exposure at ages 13–21 years (OR for ≥15 vs. ≤3 h/week = 0.68; 95% CI, 0.43–1.08; p for trend = 0.0025), which attenuated for older ages at exposure. We observed significant main-effect associations for 3 SNPs in VDR and 1 SNP in CYP24A1: rs886441 (per-allele OR = 0.82; 95% CI, 0.70–0.96; p = 0.016), rs3819545 (per-allele OR = 1.24; 95% CI, 1.10–1.40; p = 0.00043), and rs2239186 (per-allele OR = 1.22; 95% CI, 1.05–1.41; p = 0.0095) for VDR, and rs2762939 (per-allele OR = 0.85; 95% CI, 0.75–0.98; p = 0.023) for CYP24A1. Moreover, the effect of sun exposure at ages 13–21 years on overall NHL risk appears to be modified by germline variation in VDR (rs4516035; p for interaction = 0.0066). Exploratory analysis indicated potential heterogeneity of these associations by NHL subtype.
These results suggest that germline genetic variation in VDR, and therefore the vitamin D pathway, may mediate an association between early life sun exposure and NHL risk.
PMCID: PMC3589750  PMID: 22544453
Ultraviolet radiation; Vitamin D; VDR; Molecular epidemiology; Non-Hodgkin lymphoma
15.  Concordance of Changes in Metabolic Pathways Based on Plasma Metabolomics and Skeletal Muscle Transcriptomics in Type 1 Diabetes 
Diabetes  2012;61(5):1004-1016.
Insulin regulates many cellular processes, but the full impact of insulin deficiency on cellular functions remains to be defined. Applying a mass spectrometry–based nontargeted metabolomics approach, we report here alterations of 330 plasma metabolites representing 33 metabolic pathways during an 8-h insulin deprivation in type 1 diabetic individuals. These pathways included those known to be affected by insulin, such as glucose, amino acid and lipid metabolism, Krebs cycle, and immune responses, and those hitherto unknown to be altered, including prostaglandin, arachidonic acid, leukotrienes, neurotransmitters, nucleotides, and anti-inflammatory responses. A significant concordance of metabolome and skeletal muscle transcriptome–based pathways supports an assumption that plasma metabolites are chemical fingerprints of cellular events. Although insulin treatment normalized plasma glucose and many other metabolites, there were 71 metabolites and 24 pathways that differed between nondiabetic and insulin-treated type 1 diabetic individuals. Confirmation of many known pathways altered by insulin using a single blood test offers confidence in the current approach. Future research should focus on the newly discovered pathways affected by insulin deficiency and systemic insulin treatment to determine whether they contribute to the high morbidity and mortality in type 1 diabetes despite insulin treatment.
PMCID: PMC3331761  PMID: 22415876
16.  SAAP-RRBS: streamlined analysis and annotation pipeline for reduced representation bisulfite sequencing 
Bioinformatics  2012;28(16):2180-2181.
Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation.
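The core of the methylation data extraction step can be illustrated with a minimal sketch: after bisulfite conversion, an unmethylated cytosine reads as T while a methylated CpG cytosine stays C, so the methylation level at a CpG is C / (C + T) over the reads covering it. The sketch assumes gap-free, forward-strand alignments given as (0-based start, sequence) and is not SAAP-RRBS's implementation; reverse-strand reads, mismatches and quality filtering are ignored, and the sequences are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cpg_methylation(reference: str, reads: List[Tuple[int, str]]) -> Dict[int, float]:
    """Methylation level (C / (C + T)) at each covered CpG cytosine,
    for gap-free forward-strand reads given as (0-based start, sequence)."""
    cpg_sites = {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}
    methylated: Dict[int, int] = defaultdict(int)
    covered: Dict[int, int] = defaultdict(int)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in cpg_sites and base in ("C", "T"):
                covered[pos] += 1
                methylated[pos] += int(base == "C")   # C = protected = methylated
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


reference = "ACGTTCGA"          # CpGs at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (1, "CGTTCG"), (4, "TTGA")]
print(cpg_methylation(reference, reads))  # {1: 0.6666666666666666, 5: 0.5}
```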
Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3413387  PMID: 22689387
17.  MicroRNA-mRNA interactions in a murine model of hyperoxia-induced bronchopulmonary dysplasia 
BMC Genomics  2012;13:204.
Bronchopulmonary dysplasia (BPD) is a chronic lung disease of premature neonates characterized by arrested pulmonary alveolar development. There is increasing evidence that microRNAs (miRNAs) regulate translation of messenger RNAs (mRNAs) during lung organogenesis. The potential role of miRNAs in the pathogenesis of BPD is unclear.
Following exposure of neonatal mice to 80% O2 or room air (RA) for either 14 or 29 days, lungs of hyperoxic mice displayed histological changes consistent with BPD. Comprehensive miRNA and mRNA profiling was performed using lung tissue from both O2 and RA treated mice, identifying a number of dynamically regulated miRNAs and associated mRNA target genes. Gene ontology enrichment and pathway analysis revealed that hyperoxia modulated genes involved in a variety of lung developmental processes, including cell cycle, cell adhesion, mobility and taxis, inflammation, and angiogenesis. MiR-29 was prominently increased in the lungs of hyperoxic mice, and several predicted mRNA targets of miR-29 were validated with real-time PCR, western blotting and immunohistochemistry. Direct miR-29 targets were further validated in vitro using bronchoalveolar stem cells.
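The inverse relationship described in these results (miRNAs up, predicted targets down) is often screened for by pairing fold changes. The sketch below flags predicted miRNA-mRNA pairs whose log2 fold changes (hyperoxia vs room air) have opposite signs and both exceed a threshold. The 1.0 log2 threshold, the names and the values are hypothetical; the abstract does not describe the exact pairing criteria.

```python
from typing import Dict, List, Tuple


def inverse_pairs(mirna_lfc: Dict[str, float], mrna_lfc: Dict[str, float],
                  predicted_targets: Dict[str, List[str]],
                  min_abs_lfc: float = 1.0) -> List[Tuple[str, str]]:
    """(miRNA, mRNA) pairs with opposite-sign log2 fold changes, both at
    least min_abs_lfc in magnitude, restricted to predicted targets."""
    pairs = []
    for mir, targets in predicted_targets.items():
        m = mirna_lfc.get(mir)
        if m is None or abs(m) < min_abs_lfc:
            continue
        for gene in targets:
            g = mrna_lfc.get(gene)
            if g is not None and abs(g) >= min_abs_lfc and m * g < 0:
                pairs.append((mir, gene))
    return pairs


# Hypothetical fold changes and target predictions.
mirna = {"mir_a": 1.8, "mir_b": 0.2}
mrna = {"gene_x": -1.4, "gene_y": 0.9, "gene_z": 1.5}
targets = {"mir_a": ["gene_x", "gene_y", "gene_z"], "mir_b": ["gene_x"]}
print(inverse_pairs(mirna, mrna, targets))  # [('mir_a', 'gene_x')]
```

Such a screen only nominates candidates; the abstract's direct targets were confirmed experimentally (real-time PCR, western blotting, immunohistochemistry and in vitro work).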
In newborn mice, prolonged hyperoxia induces an arrest of alveolar development similar to that seen in human neonates with BPD. This abnormal lung development is accompanied by significant increases in the levels of multiple miRNAs and corresponding decreases in the levels of predicted mRNA targets, many of which have known or suspected roles in pathways altered in BPD. These data support the hypothesis that dynamic regulation of miRNAs plays a prominent role in the pathophysiology of BPD.
PMCID: PMC3410783  PMID: 22646479
18.  Unique cellular and mitochondrial defects mediate FK506-induced islet β-cell dysfunction 
Transplantation  2011;91(6):615-623.
To determine the biological mechanisms involved in post-transplantation diabetes mellitus caused by the immunosuppressant FK506.
INS-1 cells and isolated rat islets were incubated with vehicle or FK506 and harvested at 24 hr intervals. Cells were assessed for viability, apoptosis, proliferation, insulin secretion and insulin content. Gene expression studies by microarray analysis, qPCR and motifADE analysis of the microarray data identified potential FK506-mediated pathways and regulatory motifs. Mitochondrial functions, including cell respiration, mitochondrial content and bioenergetics, were assessed.
Cell replication, viability, insulin secretion, oxygen consumption, and mitochondrial content were decreased (p < 0.05) 1.2-, 1.27-, 1.77-, 1.32-, and 1.43-fold, respectively, after 48 hr of FK506 treatment. Differences increased with time. FK506 (50 ng/ml) and Cyclosporine A (800 ng/ml) had comparable effects. FK506 significantly decreased mitochondrial content and mitochondrial bioenergetics and showed a trend towards decreased oxygen consumption in isolated islets. Cell apoptosis and proliferation, mitochondrial DNA copy number and ATP/ADP ratios were not significantly affected. Pathway analysis of microarray data showed FK506 modification of pathways involving ATP metabolism, membrane trafficking and cytoskeleton remodeling. PGC1-α mRNA was down-regulated by FK506. MotifADE identified nuclear factor of activated T-cells (NFAT), an important mediator of β cell survival and function, as a potential factor mediating both up- and down-regulation of gene expression.
At pharmacologically relevant concentrations, FK506 decreases insulin secretion and reduces mitochondrial density and function without changing apoptosis rates, suggesting that post-transplantation diabetes induced by FK506 may be mediated by its effects on mitochondrial function.
PMCID: PMC3339767  PMID: 21200364
19.  Comprehensive Assessment of Potential Multiple Myeloma Immunoglobulin Heavy Chain V-D-J Intraclonal Variation Using Massively Parallel Pyrosequencing 
Oncotarget  2012;3(4):502-513.
Multiple myeloma (MM) is characterized by the accumulation of malignant plasma cells (PCs) in the bone marrow (BM). MM is viewed as a clonal disorder because verified intraclonal sequence diversity in the immunoglobulin heavy chain variable region gene (IGHV) has not been found. However, this conclusion is based on analysis of a very limited number of IGHV subclones, and the methodology employed did not permit simultaneous analysis of the IGHV repertoire of non-malignant PCs in the same samples. Here we generated genomic DNA and cDNA libraries from purified MM BMPCs and performed massively parallel pyrosequencing to determine the frequency of cells expressing identical IGHV sequences. This method provided an unprecedented opportunity to interrogate the presence of clonally related MM cells and to evaluate the IGHV repertoire of non-MM PCs. Within the MM sample, 37 IGHV genes were expressed; 98.9% of all immunoglobulin sequences used the same IGHV gene as the MM clone, and 83.0% showed exact nucleotide sequence identity in the IGHV and heavy chain complementarity determining region 3 (HCDR3). Of interest, in both genomic DNA and cDNA libraries we observed 48 sets of identical sequences carrying single point mutations in the MM clonal IGHV or HCDR3 regions. These nucleotide changes suggested putative subclones and were therefore analyzed in detail to assess (1) whether they represent true subclones and (2) their significance in the context of MM. Finally, we report for the first time the IGHV repertoire of normal human BMPCs; our data demonstrate the extent of IGHV repertoire diversity as well as the frequency of clonally related normal BMPCs. This study demonstrates both the power and the potential weaknesses of in-depth sequencing as a tool to investigate the phylogeny of malignant PCs in MM and the IGHV repertoire of normal BMPCs.
PMCID: PMC3380583  PMID: 22522905
IGHV; multiple myeloma; heterogeneity; massively parallel sequencing
20.  Drug efflux by Breast Cancer Resistance Protein (BCRP) is a mechanism of resistance to the benzimidazole insulin-like growth factor receptor/insulin receptor inhibitor, BMS-536924 
Molecular cancer therapeutics  2011;10(1):117-125.
Preclinical investigations have identified insulin-like growth factor (IGF) signaling as a key mechanism of cancer growth and of resistance to clinically useful therapies in multiple tumor types, including breast cancer. Agents that block IGF signaling therefore hold promise for the treatment of solid tumors. To identify possible mechanisms of resistance to IGF pathway blockade, we generated an MCF-7-derived cell line (MCF-7R4) resistant to the IGF-1R/InsR benzimidazole inhibitors BMS-554417 and BMS-536924 and compared the expression profiles of the parental and resistant cell lines using Affymetrix GeneChip Human Genome U133 arrays. BCRP expression was increased 9-fold in MCF-7R4 cells compared with MCF-7 cells (p = 7.13×10−9), and this increase was confirmed by immunoblotting. BCRP was also upregulated in an independently derived resistant cell line, MCF7 924R. MCF-7R4 cells had significantly lower intracellular accumulation of BMS-536924 than MCF-7 cells. Expression of BCRP in MCF-7 cells was sufficient to reduce sensitivity to BMS-536924, and knockdown of BCRP in MCF-7R4 cells resensitized them to BMS-536924. Four cell lines selected for resistance to the pyrrolotriazine IGF-1R/InsR inhibitor BMS-754807 did not show upregulation of BCRP. These data suggest that benzimidazole IGF-1R/InsR inhibitors may select for upregulation of the ABC transporter BCRP and be effluxed by it, contributing to resistance. Pyrrolotriazine IGF-1R/InsR inhibitors, however, do not appear to be affected by this resistance mechanism.
PMCID: PMC3057506  PMID: 21220496
BCRP; BMS-536924; Receptor, IGF Type I; tyrosine kinase inhibitor mechanism of resistance
21.  Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations 
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA sequencing was performed on 15 primary lung adenocarcinomas (8 with mutant KRAS and 7 with wild-type KRAS). Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternative splicing isoforms, and single nucleotide variants (SNVs), were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternative splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes with two or more connections in the lung adenocarcinoma network were used for integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted from the integrated network based on gene–gene connections and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeting these pathways. In addition, our analysis indicates previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for mutant KRAS lung tumors. Our study is the first to integrate genomic features from NSCLC RNA-Seq data and to define a first-draft genomic landscape model unique to tumors with oncogenic KRAS mutations.
PMCID: PMC3356053  PMID: 22655260
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
22.  Meta-analysis of 8q24 for seven cancers reveals a locus between NOV and ENPP2 associated with cancer development 
BMC Medical Genetics  2011;12:156.
Human chromosomal region 8q24 contains several genes that may be functionally related to cancer, including the proto-oncogene c-MYC. However, the abundance of associations around 128 Mb on chromosome 8 could mask a weaker, but important, association elsewhere on 8q24.
In this study, we performed a meta-analysis of results from nine genome-wide association studies of seven types of solid-tumor cancers (breast, prostate, pancreatic, lung, ovarian, colon, and glioma) to identify additional associations that were not apparent in any individual study.
Fifteen SNPs in the 8q24 region had meta-analysis p-values < 1×10−4. In particular, the region spanning 120,576,000–120,627,000 bp contained 7 SNPs with p-values < 1×10−4, including rs6993464 (p = 1.25×10−7). This region lies between two genes, NOV and ENPP2, which have been shown to play a role in tumor development and motility. An additional region of 5 markers spanning 128,478,000–128,524,000 bp (around POU5F1B) also had p-values < 1×10−4, including rs6983267, which had the smallest p-value (p = 6.34×10−8). This result replicates previous reports of association between rs6983267 and prostate and colon cancer.
These results suggest that the 8q24 region between 120,576 and 120,630 kb may contain a locus that influences general cancer susceptibility; further research in this area is warranted.
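The abstract does not say how per-study results were combined. Purely as an illustration of pooling evidence across independent studies, the sketch below uses Fisher's method, which may differ from the method actually used in this meta-analysis; the function name `fisher_combined_p` and the input p-values are hypothetical. It assumes the studies are independent and uses the closed form of the chi-square survival function for even degrees of freedom, so only the Python standard library is needed.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
        P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0   # (x/2)**0 / 0!
    total = 0.0
    for i in range(k):
        if i > 0:
            term *= half_x / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-study p-values for a single SNP (not from the paper).
print(fisher_combined_p([0.01, 0.04, 0.20]))
```

Fisher's method uses only p-values and ignores the direction of effect, so GWAS meta-analyses often combine effect sizes or signed Z-scores instead.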
PMCID: PMC3267702  PMID: 22142333
23.  TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data 
Bioinformatics  2011;28(2):277-278.
Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for easy navigation and mining of variants from both targeted resequencing and whole-exome sequencing data. It integrates publicly available and in-house developed annotations and visualizations for variants, variant-hosting genes, and host-gene pathways.
Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website:
Supplementary information: Supplementary data are provided at Bioinformatics online.
PMCID: PMC3259432  PMID: 22088845
24.  c-Myc Regulates Self-Renewal in Bronchoalveolar Stem Cells 
PLoS ONE  2011;6(8):e23707.
Bronchoalveolar stem cells (BASCs) located in the bronchoalveolar duct junction are thought to regenerate both bronchiolar and alveolar epithelium during homeostatic turnover and in response to injury. The mechanisms directing self-renewal in BASCs are poorly understood.
BASCs (Sca-1+, CD34+, CD31−, and CD45−) were isolated from adult mouse lung by FACS, and their capacity for self-renewal and differentiation was demonstrated by immunostaining. Expression of a transcription factor network of 53 genes required for pluripotency in embryonic stem cells was assessed by real-time PCR in BASCs, in Kras-initiated lung tumor tissue, and during lung organogenesis. c-Myc was knocked down in BASCs by infection with a c-Myc shRNA lentivirus. Comprehensive miRNA and mRNA profiling of BASCs was performed, and significant miRNAs and mRNAs potentially regulated by c-Myc were identified. We explored a c-Myc regulatory network in BASCs with statistical and computational approaches using two strategies: (1) c-Myc/Max binding sites within individual gene promoters and (2) miRNA-regulated target genes.
c-Myc expression was upregulated in BASCs and downregulated over the time course of lung organogenesis in vivo. Depletion of c-Myc in BASCs resulted in decreased proliferation and cell death. Multiple mRNAs and miRNAs were dynamically regulated in c-Myc-depleted BASCs. Of the 250 dynamically regulated genes in c-Myc-depleted BASCs, 57 were identified as potential miRNA targets by miRBase- and TargetScan-based computational mapping. A further 88 genes were identified as potential downstream targets on the basis of their c-Myc binding motif.
c-Myc plays a critical role in maintaining the self-renewal capacity of lung bronchoalveolar stem cells through a combination of miRNA and transcription factor regulatory networks.
PMCID: PMC3157444  PMID: 21858211
25.  Casp8p41 expression in primary T cells induces a proinflammatory response 
AIDS (London, England)  2010;24(9):1251-1258.
HIV infection of CD4 T cells can lead to HIV protease-mediated cleavage of procaspase 8, generating a novel HIV-specific peptide called Casp8p41. Casp8p41 has at least two biologic functions: induction of cell death via mitochondrial depolarization and release of cytochrome c, and activation of nuclear factor kappa B (NFκB). We have previously shown that Casp8p41-induced NFκB activation enhances HIV LTR transcription and consequently increases HIV replication. Here, we asked whether Casp8p41-induced NFκB activation affects the cytokine profile of cells expressing Casp8p41.
Analysis of cells expressing Casp8p41 and HIV-infected T cells.
We assessed whether host genes are transcriptionally activated following Casp8p41 production using microarray analysis and cytokine quantification, followed by western blotting and flow cytometry.
Microarray analysis identified 259 genes significantly upregulated following expression of Casp8p41. Furthermore, Casp8p41 expression in primary CD4 T cells resulted in increased production of interleukin (IL)-2, IL-15, tumor necrosis factor (TNF), and IL-1RA, whereas levels of granulocyte macrophage colony-stimulating factor and interferon (IFN)-γ were reduced in Casp8p41-expressing cells. Intracellular flow cytometry confirmed the association of Casp8p41 with elevated TNF in HIV-infected cells.
These data indicate that Casp8p41 expression in HIV-infected CD4 T cells, in addition to promoting apoptosis and enhancing HIV replication, also promotes a proinflammatory cytokine milieu characteristic of untreated HIV infection.
PMCID: PMC3150465  PMID: 20299954
apoptosis; Casp8p41; HIV; inflammation; protease; tumor necrosis factor

