Motivation: There is now a large literature on statistical methods for the meta-analysis of genomic data from multiple studies. However, a crucial assumption for performing many of these analyses is that the data exhibit small between-study variation or that this heterogeneity can be sufficiently modelled probabilistically.
Results: In this article, we propose ‘assumption weighting’, which exploits a weighted hypothesis testing framework proposed by Genovese et al. to incorporate tests of between-study variation into the meta-analysis context. This methodology is fast and computationally simple to implement. Several weighting schemes are considered and compared using simulation studies. In addition, we illustrate application of the proposed methodology using data from several high-profile stem cell gene expression datasets.
Availability: http://works.bepress.com/debashis_ghosh/50/
Contact: ghoshd@psu.edu
doi:10.1093/bioinformatics/bts037
PMCID: PMC3307113
PMID: 22285559
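To make the weighted hypothesis testing idea in the abstract above concrete, here is a minimal sketch of a weighted Benjamini-Hochberg procedure: each p-value is divided by a non-negative weight, the weights are rescaled to average one, and the usual step-up rule is applied. The function name `weighted_bh` and the example weights are hypothetical. How the weights would be derived from tests of between-study variation is not stated in the abstract, so they are simply an input here.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up rule (illustrative sketch).

    pvalues: per-gene meta-analysis p-values.
    weights: non-negative per-gene weights (hypothetical input); they are
             rescaled so that their mean is one.
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]

    # A zero weight means the hypothesis can never be rejected.
    adjusted = [min(1.0, p / w) if w > 0 else 1.0
                for p, w in zip(pvalues, scaled)]

    # Standard BH step-up on the weighted p-values.
    order = sorted(range(m), key=lambda i: adjusted[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= alpha * rank / m:
            cutoff_rank = rank

    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.02, 0.04, 0.5]
    print(weighted_bh(p, [1, 1, 1, 1]))  # equal weights: plain BH
    print(weighted_bh(p, [1, 1, 2, 0]))  # up-weight gene 3, drop gene 4
```

With equal weights the procedure reduces to ordinary Benjamini-Hochberg, which is a convenient sanity check for any weighting scheme.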
doi:10.1111/j.1541-0420.2011.01638.x
PMCID: PMC3175271
PMID: 22547832
Background
Visceral leishmaniasis (VL) is a major public health problem in Bangladesh with the highest disease burden in the Mymensingh District. The disease is transmitted by sand fly bites, but it may also be transmitted through blood transfusions. No information is available about the prevalence of Leishmania infection among blood donors in Bangladesh; therefore we aimed to investigate this question.
Methods
The study was carried out in the Blood Transfusion Department of Mymensingh Medical College Hospital. One thousand one hundred and ninety-five healthy adult blood donors attending this department were enrolled in the study from August 2010 to April 2011. After obtaining written consent, socio-demographic data and a detailed health history were collected. The medical officer in the unit performed a complete physical examination to exclude any acute or chronic diseases, which was followed by sero-diagnosis for exposure to Leishmania by rK39 strip test using finger prick blood. Blood donors with a positive rK39 strip test underwent a PCR test for detection of Leishmania DNA in their peripheral blood buffy coat.
Results
Eighty-two percent of enrolled blood donors were male (n=985) and 18% (n=210) were female. The mean age of blood donors was 27 years (SD, 7.95 years). The majority of donors were literate and of middle-to-higher socioeconomic status, as reflected by self-reported household conditions. Only 2.6% had a family member with VL in the past. Three blood donors were positive for Leishmania infection by rK39 strip test (0.3%, 95% CI, 0.05%-0.73%). None of these three had active Leishmania infection as demonstrated by PCR analysis. During six months of follow-up, neither rK39-positive (n=3) nor rK39-negative (n=1192) donors developed VL.
Conclusion
The prevalence of Leishmania donovani infection among blood donors attending the Blood Transfusion Department of Mymensingh Medical College Hospital was very low. Therefore, the chance of transmission of VL through blood transfusion is negligible. We believe that the National VL Elimination Program does not need to set up routine screening for Leishmania donovani infection in blood transfusion departments located in VL-endemic areas of Bangladesh.
doi:10.1186/1471-2334-13-62
PMCID: PMC3565955
PMID: 23375008
Visceral leishmaniasis; Kala-azar; Blood donors; Transfusion; Leishmania donovani; Bangladesh
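The seroprevalence interval reported in the abstract above (3 positives among 1195 donors) can be approximated with an exact binomial confidence interval. The sketch below assumes the Clopper-Pearson method, which the abstract does not name, and solves for the bounds by bisection using only the standard library. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target, increasing):
    """Solve func(p) = target for p in [0, 1], func monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) increases with p; the lower bound makes it alpha/2.
        lower = _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) decreases with p; the upper bound makes it alpha/2.
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(3, 1195)
    print(f"prevalence {3 / 1195:.2%}, 95% CI {low:.2%} to {high:.2%}")
```

Under this assumption the bounds should land close to the reported 0.05% and 0.73%. A different interval method (for example Wilson) would give slightly different limits.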
Summary
Conditional independence assumptions are very important in causal inference modelling as well as in dimension reduction methodologies. These are two strikingly different statistical literatures, and we study links between the two in this article. The concept of covariate sufficiency plays an important role, and we provide theoretical justification for when dimension reduction and partial least squares methods allow valid causal inference to be performed. The methods are illustrated with application to a medical study and to simulated data.
doi:10.1016/j.spl.2011.03.002
PMCID: PMC3099445
PMID: 21617766
Average causal effect; matching; model misspecification; observational data; potential outcomes
Aromatase (CYP19A1) is an integral membrane enzyme that catalyzes the removal of the 19-methyl group and aromatization of the A-ring of androgens. All human estrogens are synthesized from their androgenic precursors by this unique cytochrome P450. The crystal structure of active aromatase purified from human placenta has recently been determined in complex with its natural substrate androstenedione in the high-spin ferric state of heme. Hydrogen bond forming interactions and tight packing hydrophobic side chains closely complement puckering of the steroid backbone, thereby providing the molecular basis for the androgenic specificity of aromatase. In the crystal, aromatase molecules are linked by a head-to-tail intermolecular interaction via a surface loop between helix D and helix E of one aromatase molecule that penetrates the heme-proximal cavity of the neighboring, crystallographically-related molecule, thus forming in tandem a polymeric aromatase chain. This intermolecular interaction is similar to the aromatase-cytochrome P450 reductase coupling and is driven by electrostatics between the negative potential surface of the D-E loop region and the positively charged heme-proximal cavity. This loop-to-proximal site link in aromatase is rather unique - there are only a few examples of somewhat similar intermolecular interactions in the entire P450 structure database. Furthermore, the amino acids involved in the intermolecular contact appear to be specific for aromatase. Higher order organization of aromatase monomers may have implications in lipid integration and catalysis.
doi:10.1016/j.steroids.2011.02.030
PMCID: PMC3217041
PMID: 21392520
The crystal structures of human placental aromatase in complex with the substrate androstenedione and exemestane have revealed an androgen-specific active site and the structural basis for higher order organization. However, X-ray structures do not provide accounts of movements due to short-range fluctuations, ligand binding and protein-protein association. In this work, we conduct normal mode analysis (NMA) revealing the intrinsic fluctuations of aromatase, deduce the internal modes in membrane-free and membrane-integrated monomers as well as the intermolecular modes in oligomers, and propose a quaternary organization for the endoplasmic reticulum (ER) membrane integration. Dynamics of the crystallographic oligomers from NMA is found to be in agreement with the isotropic thermal factors from the X-ray analysis. Calculations of the root mean square fluctuations of the C-alpha atoms from their equilibrium positions confirm that the rigid-core structure of aromatase is intrinsic regardless of the changes in steroid binding interactions, and that aromatase self-association does not deteriorate the rigidity of the catalytic cleft. Furthermore, NMA on membrane-integrated aromatase shows that the internal modes in all likelihood contribute to breathing of the active site access channel. The collective intermolecular hinge bending and twisting modes provide the flexibility in the quaternary association necessary for membrane integration of the aromatase oligomers. Taken together, fluctuations of the active site, the access channel, and the heme-proximal cavity, and a dynamic quaternary organization could all be essential components of the functional aromatase in its role as an ER membrane-embedded steroidogenic enzyme.
doi:10.1371/journal.pone.0032565
PMCID: PMC3288111
PMID: 22384274
With the rapid advances of various high-throughput technologies, generation of ‘-omics’ data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.
doi:10.1093/nar/gkr1265
PMCID: PMC3351145
PMID: 22262733
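As a small illustration of the kind of statistical procedure the review above covers, the sketch below combines per-study p-values for one gene with Fisher's method. This is one common choice in microarray meta-analysis and is used here only as an example; the abstract does not single it out. The closed-form tail for a chi-square variable with an even number of degrees of freedom avoids any dependency beyond the standard library. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(pvalues).
    For even degrees of freedom the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2

    term = 1.0   # (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Two studies, each borderline on its own, for the same gene.
    print(fisher_combined_pvalue([0.05, 0.05]))
```

Fisher's method assumes the studies are independent and ignores effect direction, so it is best read as a screening tool and not a replacement for effect-size models.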
Over the last decade, genome-wide association studies (GWAS) have become the standard tool for gene discovery in human disease research. While debate continues about how to get the most out of these studies and on occasion about how much value these studies really provide, it is clear that many of the strongest results have come from large-scale mega-consortia and/or meta-analyses that combine data from up to dozens of studies and tens of thousands of subjects. While such analyses are becoming more and more common, statistical methods have lagged somewhat behind. There are good meta-analysis methods available, but even when they are carefully and optimally applied there remain some unresolved statistical issues. This article systematically reviews the GWAS meta-analysis literature, highlighting methodology and software options and reviewing methods that have been used in real studies. We illustrate differences among methods using a case study. We also discuss some of the unresolved issues and potential future directions.
doi:10.1093/nar/gkr1255
PMCID: PMC3351172
PMID: 22241776
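For the GWAS meta-analysis review above, a minimal sketch of fixed-effect inverse-variance pooling for a single variant is given below, together with Cochran's Q and the I-squared heterogeneity summary. This is a standard textbook formulation and is not taken from the article. The function name and the example effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect estimates.

    Returns (pooled estimate, pooled standard error, Cochran's Q, I^2).
    """
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors from two cohorts.
    print(fixed_effect_meta([0.1, 0.3], [0.1, 0.1]))
```

A large Q relative to its degrees of freedom suggests that a random-effects model may be more appropriate than the fixed-effect pooling shown here.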
Background
During respiratory viral infections host injury occurs due in part to inappropriate host responses. In this study we sought to uncover the host transcriptional responses underlying differences between high- and low-pathogenic infections.
Results
From a compendium of 12 studies that included responses to influenza A subtype H5N1, reconstructed 1918 influenza A virus, and SARS coronavirus, we used meta-analysis to derive multiple gene expression signatures. We compared these signatures by their capacity to segregate biological conditions by pathogenicity and predict pathogenicity in a test data set. The highest-performing signature was expressed as a continuum in low-, medium-, and high-pathogenicity samples, suggesting a direct, analog relationship between expression and pathogenicity. This signature comprised 57 genes including a subnetwork of chemokines, implicating dysregulated cell recruitment in injury.
Conclusions
Highly pathogenic viruses elicit expression of many of the same key genes as lower pathogenic viruses but to a higher degree. This increased degree of expression may result in the uncontrolled co-localization of inflammatory cell types and lead to irreversible host damage.
doi:10.1186/1752-0509-5-202
PMCID: PMC3297540
PMID: 22189154
Summary
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and a general two-stage algorithm are proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
doi:10.1002/sim.4027
PMCID: PMC2997195
PMID: 20803482
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
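The two-stage idea in the abstract above can be illustrated with generic two-stage least squares, treating randomized treatment as the instrument, the surrogate as the endogenous covariate, and the true endpoint as the outcome. This is a textbook sketch under those assumptions and not the estimator proposed in the article. All names and data values are hypothetical.

```python
def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, surrogate, outcome):
    """Generic 2SLS with one instrument and one endogenous covariate."""
    # Stage 1: predict the surrogate from the instrument (treatment arm).
    a0, a1 = ols(instrument, surrogate)
    fitted = [a0 + a1 * z for z in instrument]
    # Stage 2: regress the true endpoint on the predicted surrogate.
    _, effect = ols(fitted, outcome)
    return effect


if __name__ == "__main__":
    z = [0, 0, 1, 1]   # hypothetical treatment assignment
    s = [1, 2, 3, 4]   # hypothetical surrogate endpoint
    t = [3, 4, 7, 9]   # hypothetical true endpoint
    print(two_stage_least_squares(z, s, t))
```

With a single binary instrument the result equals the ratio of the treatment effect on the true endpoint to the treatment effect on the surrogate, which is why its validity rests on the instrument affecting the outcome only through the surrogate.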
Chen, Fang | Ding, Xicheng | Ding, Ying | Xiang, Zuoshuang | Li, Xinna | Ghosh, Debashis | Schurig, Gerhardt G. | Sriranganathan, Nammalwar | Boyle, Stephen M. | He, Yongqun | Blanke, S. R.
Brucella spp. are intracellular bacteria that cause an infectious disease called brucellosis in humans and many domestic and wildlife animals. B. suis primarily infects pigs and is pathogenic to humans. The macrophage-Brucella interaction is critical for the establishment of a chronic Brucella infection. Our studies showed that smooth virulent B. suis strain 1330 (S1330) prevented programmed cell death of infected macrophages and rough attenuated B. suis strain VTRS1 (a vaccine candidate) induced strong macrophage cell death. To further investigate the mechanism of VTRS1-induced macrophage cell death, microarrays were used to analyze temporal transcriptional responses of murine macrophage-like J774.A1 cells infected with S1330 or VTRS1. In total 17,685 probe sets were significantly regulated based on the effects of strain, time and their interactions. A miniTUBA dynamic Bayesian network analysis predicted that VTRS1-induced macrophage cell death was mediated by a proinflammatory gene (the tumor necrosis factor alpha [TNF-α] gene), an NF-κB pathway gene (the IκB-α gene), the caspase-2 gene, and several other genes. VTRS1 induced significantly higher levels of transcription of 40 proinflammatory genes than S1330. A Mann-Whitney U test confirmed the proinflammatory response in VTRS1-infected macrophages. Increased production of TNF-α and interleukin 1β (IL-1β) was also detected in the supernatants of VTRS1-infected macrophage cell culture. Hyperphosphorylation of IκB-α was observed in macrophages infected with VTRS1 but not S1330. The important roles of TNF-α and IκB-α in VTRS1-induced macrophage cell death were further confirmed by individual inhibition studies. VTRS1-induced macrophage cell death was significantly inhibited by a caspase-2 inhibitor but not a caspase-1 inhibitor. The role of caspase-2 in regulating the programmed cell death of VTRS1-infected macrophages was confirmed in another study using caspase-2-knockout mice.
In summary, VTRS1 induces a proinflammatory, caspase-2- and NF-κB-mediated macrophage cell death. This unique cell death differs from apoptosis, which is not proinflammatory. It is also different from classical pyroptosis, which is caspase-1 mediated.
doi:10.1128/IAI.00050-11
PMCID: PMC3125819
PMID: 21464087
Background
Enrichment testing assesses the overall evidence of differential expression behavior of the elements within a defined set. When we have measured many molecular aspects, e.g. gene expression, metabolites, proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test using gene expression and metabolomics as the motivating platforms.
Results
Using two simulation models we explored the properties of several enrichment methods including two novel methods: the logistic regression 2-degree of freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. In relation to their univariate counterparts we find that the joint tests can improve our ability to detect results that are marginal univariately. We also find that joint tests improve the ranking of associated pathways compared to their univariate counterparts. However, there is a risk of Type I error inflation with some methods and self-contained methods lose specificity when the sets are not representative of underlying association.
Conclusions
In this work we show that consideration of data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power in detection of genomic associations with phenotypes.
doi:10.1186/1471-2105-12-459
PMCID: PMC3329720
PMID: 22118224
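To illustrate the permutation-based sum-of-squares testing mentioned in the abstract above, the sketch below computes a self-contained set statistic (the sum over features of squared group mean differences) and its permutation p-value. It pools features from all platforms into a single statistic, which is a simplification and not the two-dimensional permutation p-value the article proposes. Function names, the unstandardized statistic, and the data layout are assumptions.

```python
import random


def sum_squared_statistic(data, labels):
    """Sum over features of the squared difference in group means.

    data: list of samples, each a list of feature values (e.g. genes and
          metabolites belonging to one pathway).
    labels: 0/1 phenotype label per sample.
    """
    group1 = [row for row, lab in zip(data, labels) if lab == 1]
    group0 = [row for row, lab in zip(data, labels) if lab == 0]
    stat = 0.0
    for j in range(len(data[0])):
        mean1 = sum(row[j] for row in group1) / len(group1)
        mean0 = sum(row[j] for row in group0) / len(group0)
        stat += (mean1 - mean0) ** 2
    return stat


def permutation_pvalue(data, labels, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = sum_squared_statistic(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if sum_squared_statistic(data, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy = [[2.1, 1.9], [1.8, 2.2], [2.3, 2.0], [1.9, 2.1], [2.0, 1.8],
           [0.1, -0.1], [-0.2, 0.2], [0.0, 0.1], [0.2, 0.0], [-0.1, -0.2]]
    print(permutation_pvalue(toy, [1] * 5 + [0] * 5))
```

Because only the labels are permuted, this is a self-contained test in the abstract's terminology: it asks whether the set is associated with the phenotype at all, not whether it is more associated than other sets.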
A biomarker is defined to be a biological characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. The use of biomarkers in cancer has been advocated for a variety of purposes, which include use as surrogate endpoints, early detection of disease, proxies for environmental exposure and risk prediction. We deal with the latter issue in this paper.
Several authors have proposed use of the predictiveness curve for assessing the capacity of a biomarker for risk prediction. For most situations, it is reasonable to assume monotonicity of the biomarker effects on disease risk. In this article, we propose the use of flexible modelling of the predictiveness curve and its bivariate analogue, the predictiveness surface, through the use of spline algorithms that incorporate the appropriate monotonicity constraints. Estimation proceeds through use of a two-step algorithm that represents the “smooth, then monotonize” approach. Subsampling procedures are used for inference. The methods are illustrated with data from a melanoma study.
PMCID: PMC3193347
PMID: 22003414
Active set algorithm; Isotonic regression; Nonregular asymptotics; Pool adjacent violators algorithm; Risk prediction; Thin-plate spline
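The "smooth, then monotonize" approach in the abstract above can be sketched in one dimension as a smoother followed by isotonic regression via the pool adjacent violators algorithm, which the keywords mention. The moving-average smoother below is a deliberately simple stand-in for the spline step described in the abstract, and all function names are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average, truncated at the ends (stand-in smoother)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def pava(values, weights=None):
    """Pool adjacent violators: least-squares non-decreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [weighted sum, total weight, count]
    for v, w in zip(values, weights):
        blocks.append([v * w, w, 1])
        # Merge backwards while the previous block mean exceeds this one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, w_sum, count = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w_sum
            blocks[-1][2] += count
    fitted = []
    for s, w_sum, count in blocks:
        fitted.extend([s / w_sum] * count)
    return fitted


def smooth_then_monotonize(risks, window=3):
    """Two-step estimate: smooth first, then enforce monotonicity."""
    return pava(moving_average(risks, window))


if __name__ == "__main__":
    # Hypothetical risk estimates ordered by increasing biomarker value.
    print(smooth_then_monotonize([0.10, 0.30, 0.20, 0.25, 0.60, 0.55]))
```

Separating the two steps keeps the smoother unconstrained, so any smoothing method can be swapped in without touching the monotonization step.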
The analysis of recurrent failure time data from longitudinal studies can be complicated by the presence of dependent censoring. There has been a substantive literature that has developed based on an artificial censoring device. We explore in this article the connection between this class of methods and truncated data structures. In addition, a new procedure is developed for estimation and inference in a joint model for recurrent events and dependent censoring. Estimation proceeds using a mixed U-statistic based estimating function approach. New resampling-based methods for variance estimation and model checking are also described. The methods are illustrated by application to data from an HIV clinical trial as well as with a limited simulation study.
doi:10.1007/s10985-009-9150-4
PMCID: PMC2939236
PMID: 20063182
Accelerated failure time model; Cause-specific hazard; Comparability; Competing risks; Empirical process; Semi-competing risks data
Davis, Shannon W | Potok, Mary Anne | Brinkmeier, Michelle L | Carninci, Piero | Lyons, Robert H | MacDonald, James W. | Fleming, Michelle T | Mortensen, Amanda H | Egashira, Noboru | Ghosh, Debashis | Steel, Karen P. | Osamura, Robert Y | Hayashizaki, Yoshihide | Camper, Sally A
Genetic cases of congenital pituitary hormone deficiency are common and many are caused by transcription factor defects. Mouse models with orthologous mutations are invaluable for uncovering the molecular mechanisms that lead to problems in organ development and typical patient characteristics. We are using mutant mice defective in the transcription factors PROP1 and POU1F1 for gene expression profiling to identify target genes for these critical transcription factors and candidates for cases of pituitary hormone deficiency of unknown etiology. These studies reveal critical roles for Wnt signalling pathways including the TCF/LEF transcription factors and interacting proteins of the groucho family, bone morphogenetic protein antagonists, and targets of notch signalling. Current studies are investigating roles of novel homeobox genes and pathways that regulate the transition from proliferation to differentiation, cell adhesion and cell migration.
Pituitary adenomas are a common human health problem, yet most cases are sporadic, necessitating alternative approaches to traditional Mendelian genetic studies. Mouse models of adenoma formation offer the opportunity for gene expression profiling during progressive stages of hyperplasia, adenoma and tumorigenesis. This approach holds promise for identification of relevant pathways and candidate genes as risk factors for adenoma formation, understanding mechanisms of progression, and identifying drug targets and clinically relevant biomarkers.
doi:10.1159/000192447
PMCID: PMC3140954
PMID: 19407506
cell proliferation; apoptosis; transcription factors; Prop1; Emx2
Background
Visceral leishmaniasis (VL), caused by the intracellular parasite Leishmania donovani in the Indian subcontinent, is considered to be anthroponotic. The role of domestic animals in its transmission is still unclear. Although cattle are the preferred blood host for Phlebotomus argentipes, the sandfly vector of VL in the Indian subcontinent, very little information is available on their role in disease transmission. In this study, we examined domestic cattle for serological and molecular evidence of Leishmania infection in a VL-endemic area in Bangladesh. Blood samples from 138 domestic cattle were collected from houses with active or recently treated VL and post-kala-azar dermal leishmaniasis patients. The presence of anti-leishmanial antibodies in serum was investigated using enzyme-linked immunosorbent assay (ELISA) and then with direct agglutination tests (DAT). Nested PCR (Ln PCR) was performed to amplify the ssu-rRNA gene using DNA extracted from the buffy coat. The recently developed molecular assay loop-mediated isothermal amplification (LAMP) was also performed for more sensitive detection of parasite DNA.
Results
In this study, 9.4% (n = 13) of the cattle were found to be positive by ELISA. Of the 13 ELISA-positive cattle, only four (30.8%) were positive in DAT. Parasite DNA was not detected in either of the molecular assays (Ln PCR and LAMP).
Conclusions
The study confirmed the presence of antibodies against the Leishmania parasite in cattle. However, the absence of Leishmania DNA in the cattle indicates clearly that the cattle do not play a role as reservoir host. Similar studies need to be undertaken in the Indian subcontinent to determine the role of other domestic animals on which sandflies feed.
doi:10.1186/1746-6148-7-27
PMCID: PMC3125318
PMID: 21651757
In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing procedures in this context are explored. We propose the use of q-value estimation procedures in this setting. Data from a gene expression profiling experiment in prostate cancer are used to illustrate the methodology.
doi:10.1080/10543400903572704
PMCID: PMC2845329
PMID: 20309754
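The q-value estimation mentioned in the abstract above can be sketched with a simple Storey-type estimator: the proportion of true nulls is estimated from the p-values above a threshold, and each q-value is the smallest estimated false discovery rate at which that test would be called significant. The fixed threshold `lam=0.5` and the function name are assumptions, and the sketch does not address the discreteness of the p-values that the abstract identifies as a complication.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Storey-type q-values with a fixed tuning parameter lam.

    pi0, the estimated proportion of true null hypotheses, is the count of
    p-values above lam divided by m * (1 - lam), capped at 1. If no p-value
    exceeds lam, pi0 is 0 and every q-value is 0, which signals that a
    different lam (or pi0 = 1) should be used.
    """
    m = len(pvalues)
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam)))

    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    print(storey_qvalues([0.01, 0.02, 0.03, 0.6]))
```

Setting `pi0` to 1 recovers Benjamini-Hochberg adjusted p-values, which is the conservative fallback when the null proportion cannot be estimated reliably.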
Aromatase is a unique cytochrome P450 that catalyzes the removal of the 19-methyl group and aromatization of the A-ring of androgens for the synthesis of estrogens. All human estrogens are synthesized via this enzymatic aromatization pathway. Aromatase inhibitors thus constitute a frontline therapy for estrogen-dependent breast cancer. Despite decades of intense investigation, this enzyme of the endoplasmic reticulum membrane has eluded all structure determination efforts. We have determined the crystal structure of the highly active aromatase purified from human placenta, in complex with its natural substrate androstenedione. The structure shows the binding mode of androstenedione in the catalytically active oxidized high-spin ferric state of the enzyme. Hydrogen bond forming interactions and tight packing hydrophobic side chains that complement the puckering of the steroid backbone provide the molecular basis for the exclusive androgenic specificity of aromatase. Locations of catalytic residues and water molecules shed new light on the mechanism of the aromatization step. The structure also suggests a membrane integration model indicative of the passage of steroids through the lipid bilayer.
doi:10.1016/j.jsbmb.2009.09.012
PMCID: PMC2826573
PMID: 19808095
Abstract
Copy number aberration is a common form of genomic instability in cancer. Gene expression is closely tied to cytogenetic events by the central dogma of molecular biology, and serves as a mediator of copy number changes in disease phenotypes. Accordingly, it is of interest to develop proper statistical methods for jointly analyzing copy number and gene expression data. This work describes a novel Bayesian inferential approach for a double-layered mixture model (DLMM) which directly models the stochastic nature of copy number data and identifies abnormally expressed genes due to aberrant copy number. Simulation studies were conducted to illustrate the robustness of DLMM under various settings of copy number aberration frequency, confounding effects, and signal-to-noise ratio in gene expression data. Analysis of a real breast cancer data set shows that DLMM is able to identify expression changes specifically attributable to copy number aberration in tumors and that a sample-specific index built based on the selected genes is correlated with relevant clinical information.
doi:10.1089/cmb.2009.0019
PMCID: PMC3148827
PMID: 20170400
cancer genomics; statistics
Gudjonsson, Johann E. | Ding, Jun | Li, Xing | Nair, Rajan P | Tejasvi, Trilokraj | Qin, Zhaohui S. | Ghosh, Debashis | Aphale, Abhishek | Gumucio, Deborah L. | Voorhees, John J. | Abecasis, Goncalo | Elder, James T.
Psoriasis is a genetically determined inflammatory skin disease. Although the transition from uninvolved into lesional skin is accompanied by changes in the expression of multiple genes, much less is known about the difference between uninvolved skin from psoriatic patients as opposed to skin from normal individuals. Multiple biochemical and morphological changes were reported decades ago in uninvolved psoriatic skin but remain poorly understood. Here we demonstrate dysregulation of 223 transcripts representing 179 unique genes in uninvolved psoriatic skin, 178 of which were not previously known to be altered in their expression. The proteins encoded by these transcripts are involved in lipid metabolism, antimicrobial defenses, epidermal differentiation and control of cutaneous vasculature. Cluster analysis of transcripts with significantly altered expression identified a group of genes involved in lipid metabolism with highly correlated gene expression. Promoter analysis demonstrated enrichment for binding sites of three transcription factors: peroxisome proliferator-activated receptor alpha (PPARA), sterol regulatory element-binding protein (SREBF) and estrogen receptor 2 (ESR2), suggesting that coordinate regulation of lipid metabolic genes may be related to the action of these factors. Taken together, our results identify a “pre-psoriatic” gene expression signature, suggesting decreased lipid biosynthesis and increased innate immunity in uninvolved psoriatic skin.
doi:10.1038/jid.2009.173
PMCID: PMC2783967
PMID: 19571819
psoriasis; gene expression; microarray; skin; immunology
We report a catalog of the mouse embryonic pituitary gland transcriptome consisting of five cDNA libraries including wild type tissue from E12.5 and E14.5, Prop1df/df mutant at E14.5, and two cDNA subtractions: E14.5 WT-E14.5 Prop1df/df and E14.5 WT-E12.5 WT. DNA sequence information is assembled into a searchable database with gene ontology terms representing 12,009 expressed genes. We validated coverage of the libraries by detecting most known homeobox gene transcription factor cDNAs. A total of 45 homeobox genes were detected as part of the pituitary transcriptome, representing most expected ones, which validated library coverage, and many novel ones, underscoring the utility of this resource as a discovery tool. We took a similar approach for signaling-pathway members with novel pituitary expression and found 157 genes related to the BMP, FGF, WNT, SHH and NOTCH pathways. These genes are exciting candidates for regulators of pituitary development and function.
doi:10.1016/j.ygeno.2008.11.010
PMCID: PMC2935795
PMID: 19121383
Cap trapper; Homeobox gene; Prop1; Gene expression; Ames dwarf
Motivation: Chromatin immunoprecipitation (ChIP) experiments followed by array hybridization, or ChIP-chip, is a powerful approach for identifying transcription factor binding sites (TFBS) and has been widely used. Recently, massively parallel sequencing coupled with ChIP experiments (ChIP-seq) has been increasingly used as an alternative to ChIP-chip, offering cost-effective genome-wide coverage and resolution up to a single base pair. For many well-studied TFs, both ChIP-seq and ChIP-chip experiments have been applied and their data are publicly available. Previous analyses have revealed substantial technology-specific binding signals despite strong correlation between the two sets of results. Therefore, it is of interest to see whether the two data sources can be combined to enhance the detection of TFBS.
Results: In this work, a hierarchical hidden Markov model (HHMM) is proposed for combining data from ChIP-seq and ChIP-chip. In the HHMM, inference results from individual HMMs in ChIP-seq and ChIP-chip experiments are summarized by a higher-level HMM. Simulation studies show the advantage of the HHMM when data from both technologies co-exist. Analysis of two well-studied TFs, NRSF and CCCTC-binding factor (CTCF), also suggests that the HHMM yields improved TFBS identification in comparison to analyses using individual data sources or a simple merger of the two.
Availability: Source code for the software ChIPmeta is freely available for download at http://www.umich.edu/~hwchoi/HHMMsoftware.zip, implemented in C and supported on Linux.
Contact: ghoshd@psu.edu; qin@umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp312
PMCID: PMC2732365
PMID: 19447789
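As background for the HMM-based approach in the abstract above, the sketch below decodes the most likely hidden state path of a single two-state HMM with the Viterbi algorithm. It covers only the single-level building block, not the hierarchical combination of ChIP-seq and ChIP-chip results, and it is unrelated to the ChIPmeta source code. The state meanings, the discretized signal, and all probabilities are hypothetical.

```python
import math


def viterbi(observations, start, transition, emission):
    """Most likely hidden state path for a discrete-emission HMM.

    start[s]: initial probability of state s.
    transition[r][s]: probability of moving from state r to state s.
    emission[s][o]: probability of observing symbol o in state s.
    """
    n_states = len(start)
    first = observations[0]
    score = [math.log(start[s]) + math.log(emission[s][first])
             for s in range(n_states)]
    backpointers = []

    for obs in observations[1:]:
        previous = score
        score = []
        pointers = []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: previous[r] + math.log(transition[r][s]))
            pointers.append(best)
            score.append(previous[best] + math.log(transition[best][s])
                         + math.log(emission[s][obs]))
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # State 0 = background, state 1 = bound; symbols 0 = low, 1 = high signal.
    start = [0.5, 0.5]
    transition = [[0.9, 0.1], [0.1, 0.9]]
    emission = [[0.9, 0.1], [0.1, 0.9]]
    print(viterbi([0, 0, 1, 1, 1], start, transition, emission))
```

Working in log space avoids numerical underflow on long genomic tracks, where products of many small probabilities would otherwise round to zero.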
Protein microarrays have been used to explore whether a humoral response to pancreatic cancer-specific tumor antigens has utility as a biomarker of pancreatic cancer. To determine if such arrays can be used to identify novel autoantibodies in the sera from pancreatic cancer patients, proteins from a pancreatic adenocarcinoma cell line (MIAPACA) were resolved by 2-D liquid-based separations, and then arrayed on nitrocellulose slides. The slides were probed with serum from a set of patients diagnosed with pancreatic cancer and compared with age- and sex-matched normal subjects. To account for patient-to-patient variability, we used a rank-based non-parametric statistical testing approach in which proteins eliciting significant differences in the humoral response in cancer compared with control samples were identified. The prediction analysis for microarrays classification algorithm was used to explore the classification power of the proteins found to be differentially expressed in cancer and control sera. The generalization error of the classification analysis was estimated using leave-one-out cross-validation. A serum diagnosis of pancreatic cancer in this set was predicted with 86.7% accuracy, with a sensitivity and specificity of 93.3% and 80%, respectively. Candidate autoantibody biomarkers identified using this approach were studied for their classification power by performing a humoral response experiment on recombinant proteins using an independent sample set of 238 serum samples. Phosphoglycerate kinase-1 and histone H4 were noted to elicit a significant differential humoral response in cancer sera compared with age- and sex-matched sera from normal patients and patients with chronic pancreatitis and diabetes. This work demonstrates the use of natural protein arrays to study the humoral response as a means to search for potential markers of cancer in serum.
doi:10.1002/elps.200800857
PMCID: PMC2794663
PMID: 19582723
Cancer; Liquid separations; Microarrays
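The leave-one-out cross-validation and the accuracy, sensitivity and specificity figures in the abstract above can be illustrated with the sketch below. It uses a plain nearest-centroid classifier as a simplified stand-in for the prediction analysis for microarrays algorithm (which additionally shrinks centroids), and the toy intensities are hypothetical. The group sizes of 15 cancer and 15 control sera used in the usage note are an assumption chosen because they are consistent with the reported percentages; the abstract does not state them.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the class with the closest mean profile."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(sample, centroids[c])),
    )


def loocv_predictions(xs, ys):
    """Leave-one-out: predict each sample from a model fit on the rest."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(nearest_centroid_predict(train_x, train_y, xs[i]))
    return predictions


def summarize(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical two-protein intensities: three cancer, three control sera.
    xs = [[2.0, 1.9], [2.2, 2.1], [1.8, 2.3],
          [0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]
    ys = [1, 1, 1, 0, 0, 0]
    print(summarize(ys, loocv_predictions(xs, ys)))
```

As a consistency check under the assumed group sizes, 14 of 15 cancer sera and 12 of 15 control sera classified correctly give a sensitivity of 93.3%, a specificity of 80% and an accuracy of 26/30 = 86.7%, matching the reported values.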
Summary
Colorectal cancer is the second leading cause of cancer-related deaths in the United States, with more than 130,000 new cases of colorectal cancer diagnosed each year. Clinical studies have shown that genetic alterations lead to different responses to the same treatment, despite the morphologic similarities of tumors. A molecular test prior to treatment could help in determining an optimal treatment for a patient with regard to both toxicity and efficacy. This article introduces a statistical method appropriate for predicting and comparing multiple endpoints given different treatment options and molecular profiles of an individual. A latent variable-based multivariate regression model with a structured variance-covariance matrix is considered here. The latent variables account for the correlated nature of multiple endpoints and accommodate the fact that some clinical endpoints are categorical variables and others are censored variables. The mixture normal hierarchical structure admits a natural variable selection rule. Inference was conducted using the posterior distribution sampling Markov chain Monte Carlo method. We analyzed the finite-sample properties of the proposed method using simulation studies. The application to the advanced colorectal cancer study revealed associations between multiple endpoints and particular biomarkers, demonstrating the potential of individualizing treatment based on genetic profiles.
doi:10.1111/j.1541-0420.2008.01181.x
PMCID: PMC2870722
PMID: 19210736
Bayesian multivariate regression; Biomarker; Hierarchical model; Interaction; Latent variable; Oncology
PURPOSE
Interphotoreceptor retinoid-binding protein (IRBP) has been considered essential for normal rod and cone function, as it mediates the transport of retinoids between the photoreceptors and the retinal pigment epithelium. This study was performed to determine whether mutations in the IRBP gene (RBP3) are associated with photoreceptor degeneration.
METHODS
A consanguineous family was ascertained in which four children had autosomal recessive retinitis pigmentosa (RP). Homozygosity mapping performed with SNP microarrays revealed only one homozygous region shared by all four affected siblings. Sequencing of RBP3, contained in this region, was performed in this family and others with recessive RP. Screening was also performed on patients with various other forms of retinal degeneration or malfunction.
RESULTS
Sequence analysis of RBP3 revealed a homozygous missense mutation (p.Asp1080Asn) in the four affected siblings. The mutation affects a residue that is completely conserved in all four homologous modules of the IRBP protein of vertebrate species and in C-terminal-processing proteases, photosynthesis enzymes found in bacteria, algae, and plants. Based on the previously reported crystal structure of Xenopus IRBP, the authors predict that the Asp1080-mediated conserved salt bridge that appears to participate in scaffolding of the retinol-binding domain is abolished by the mutation. No RBP3 mutations were detected in 395 unrelated patients with recessive or isolate RP or in 680 patients with other forms of hereditary retinal degeneration.
CONCLUSIONS
Mutations in RBP3 are an infrequent cause of autosomal recessive RP. The mutation Asp1080Asn may alter the conformation of the IRBP protein by disrupting a conserved salt bridge.
doi:10.1167/iovs.08-2497
PMCID: PMC2823395
PMID: 19074801