Providing personalized treatments designed to maximize benefits and minimize harms is of tremendous current medical interest. One problem in this area is the evaluation of the interaction between the treatment and other predictor variables. Treatment effects in subgroups having the same direction but different magnitudes are called quantitative interactions, while those having opposite directions in subgroups are called qualitative interactions (QIs). Identifying QIs is challenging since they are rare and usually unknown among many potential biomarkers. Meanwhile, subgroup analysis reduces the power of hypothesis testing and multiple subgroup analyses inflate the type I error rate. We propose a new Bayesian approach to search for QIs in a multiple regression setting with adaptive decision rules. We consider various regression models for the outcome. This method is illustrated in two examples of Phase III clinical trials. The algorithm is straightforward and easy to implement using existing software packages. Sample code is provided in the appendix.
Interaction; Subgroup; Predictive Marker; Prognostic Marker; Clinical Trial
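The distinction between quantitative and qualitative interactions drawn above can be made concrete with a minimal sketch. The function name `classify_interaction` and the tolerance `tol` are hypothetical, and the sketch labels point estimates only. It ignores estimation uncertainty and is not the Bayesian procedure the abstract proposes.

```python
def classify_interaction(effect_a: float, effect_b: float, tol: float = 1e-12) -> str:
    """Label the pattern of two subgroup treatment effects (point estimates only).

    - "no interaction": the two effects are equal (within tol)
    - "qualitative":    the effects have opposite signs
    - "quantitative":   same direction (or one effect is zero), different magnitude
    """
    if abs(effect_a - effect_b) <= tol:
        return "no interaction"
    if effect_a * effect_b < 0:
        return "qualitative"
    return "quantitative"


if __name__ == "__main__":
    # Hypothetical subgroup effects, e.g. differences in mean response.
    print(classify_interaction(0.5, 1.2))   # same direction, different size
    print(classify_interaction(0.5, -0.3))  # opposite directions
```

In practice the sign of each subgroup effect is itself uncertain, which is why a formal decision rule is needed rather than a comparison of point estimates.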
Autism Spectrum Disorder (ASD) occurs more often among males than females in a 4:1 ratio. Among theories used to explain the causes of ASD, the X chromosome and the Y chromosome theories attribute ASD to the X-linked mutation and the male-limited gene expressions on the Y chromosome, respectively. Despite the rationale of these theories, studies have failed to attribute the sex-biased ratio to significant linkage or association in the regions of interest on the X chromosome. We further study the sex-biased ratio by examining the possible interaction effects between two genes on the sex chromosomes. We propose a logistic regression model with mixed effects to detect gene–gene interactions on sex chromosomes. We investigated the power and type I error rates of the approach for a range of minor allele frequencies and varying linkage disequilibrium between markers and QTLs. We also evaluated the robustness of the model to population stratification. We applied the model to a trio-family data set with an ASD-affected male child to study gene–gene interactions on sex chromosomes.
binary traits; gene–gene interaction; generalized linear mixed effect model; logistic model; trio data; sex chromosomes
Human cytochrome P450 aromatase catalyzes with high specificity the synthesis of estrogens from androgens. Aromatase inhibitors (AIs) such as exemestane, 6-methylideneandrosta-1,4-diene-3,17-dione, are preeminent drugs for the treatment of estrogen-dependent breast cancer. The crystal structure of human placental aromatase has shown an androgen-specific active site. By utilization of the structural data, novel C6-substituted androsta-1,4-diene-3,17-dione inhibitors have been designed. Several of the C6-substituted 2-alkynyloxy compounds inhibit purified placental aromatase with IC50 values in the nanomolar range. Antiproliferation studies in a MCF-7 breast cancer cell line demonstrate that some of these compounds have EC50 values better than 1 nM, exceeding that for exemestane. X-ray structures of aromatase complexes of two potent compounds reveal that, per their design, the novel side groups protrude into the opening to the access channel unoccupied in the enzyme–substrate/exemestane complexes. The observed structure–activity relationship is borne out by the X-ray data. Structure-guided design permits utilization of the aromatase-specific interactions for the development of next generation AIs.
In this note, we address the problem of surrogacy using a causal modelling framework that differs substantially from the potential outcomes model that pervades the biostatistical literature. The framework comes from econometrics and conceptualizes direct effects of the surrogate endpoint on the true endpoint. While this framework can incorporate the so-called semi-competing risks data structure, we also derive a fundamental non-identifiability result. Relationships to existing causal modelling frameworks are also discussed.
Clinical Trial; Counterfactual; Dependence; Nonlinear response; Prentice Criterion; Rubin causal model
The structure of NheA, a component of the Bacillus cereus Nhe tripartite toxin, has been solved at 2.05 Å resolution using selenomethionine multiple-wavelength anomalous dispersion (MAD). The structure shows it to have a fold that is similar to the Bacillus cereus Hbl-B and E. coli ClyA toxins, and it is therefore a member of the ClyA superfamily of α-helical pore forming toxins (α-PFTs), although its head domain is significantly enlarged compared with those of ClyA or Hbl-B. The hydrophobic β-hairpin structure that is a characteristic of these toxins is replaced by an amphipathic β-hairpin connected to the main structure via a β-latch that is reminiscent of a similar structure in the β-PFT Staphylococcus aureus α-hemolysin. Taken together these results suggest that, although it is a member of an archetypal α-PFT family of toxins, NheA may be capable of forming a β rather than an α pore.
TesA from Pseudomonas aeruginosa belongs to the GDSL hydrolase family of serine esterases and lipases that possess a broad substrate- and regiospecificity. It shows high sequence homology to TAP, a multifunctional enzyme from Escherichia coli exhibiting thioesterase, lysophospholipase A, protease and arylesterase activities. Recently, we demonstrated high arylesterase activity for TesA, but only minor thioesterase and no protease activity. Here, we present a comparative analysis of TesA and TAP at the structural, biochemical and physiological levels. The crystal structure of TesA was determined at 1.9 Å and structural differences were identified, providing a possible explanation for the differences in substrate specificities. The comparison of TesA with other GDSL-hydrolase structures revealed that the flexibility of active-site loops significantly affects their substrate specificity. This assumption was tested using a rational approach: we have engineered the putative coenzyme A thioester binding site of E. coli TAP into TesA of P. aeruginosa by introducing mutations D17S and L162R. This TesA variant showed increased thioesterase activity comparable to that of TAP. TesA is the first lysophospholipase A described for the opportunistic human pathogen P. aeruginosa. The enzyme is localized in the periplasm and may exert important functions in the homeostasis of phospholipids or detoxification of lysophospholipids.
The effect of insecticide-treated materials on reducing visceral leishmaniasis (VL) is disputed. In Bangladesh, we evaluated the effect of a community-based intervention with insecticide impregnation of existing bed-nets in reducing VL incidence. This intervention reduced VL by 66.5%. Widespread bed-net impregnation with slow-release insecticide may control VL in Bangladesh.
Bangladesh; visceral leishmaniasis; vector control; bed-net impregnation; vector-borne infections; insecticides; Leishmania spp.; parasites; sandflies
The preparation and reactivity of steroidal vinyldiazo compounds is reported, providing a convenient, substituent tolerant, chemo- and stereoselective entry into 4- and 6-substituted androgen analogues from a common precursor. Under dirhodium catalysis, O—H insertion occurs at the carbenoid site, leading to 4-substituted steroids, but under silver catalysis, O—H insertion occurs at the vinylogous position, leading to 6-substituted steroids.
We explore the utility of p-value weighting for enhancing the power to detect differential metabolites in a two-sample setting. Related gene expression information is used to assign an a priori importance level to each metabolite being tested. We map the gene expression to a metabolite through pathways and then gene expression information is summarized per-pathway using gene set enrichment tests. Through simulation we explore four styles of enrichment tests and four weight functions to convert the gene information into a meaningful p-value weight. We implement the p-value weighting on a prostate cancer metabolomics dataset. Gene expression on matched samples is used to construct the weights. Under certain regulatory conditions, the use of weighted p-values does not inflate the type I error above what we see for the unweighted tests except in high correlation situations. The power to detect differential metabolites is notably increased in situations with disjoint pathways and shows moderate improvement, relative to the proportion of enriched pathways, when pathway membership overlaps.
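As an illustration of how p-value weighting can operate, the following minimal sketch divides each p-value by a nonnegative weight (rescaled so the weights average one) and then applies the Benjamini-Hochberg step-up rule. The abstract does not state which multiple-testing procedure or weight function was used, so the choice of Benjamini-Hochberg, the function name `weighted_bh`, and the example weights are all assumptions made for illustration.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Benjamini-Hochberg applied to weighted p-values p_i / w_i.

    Weights must be nonnegative and are rescaled to average one, so that
    up-weighting some hypotheses necessarily down-weights others.
    Returns a list of booleans (True = rejected) in the input order.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = [w / mean_w for w in weights]

    # A zero weight means the hypothesis can never be rejected.
    q = [min(1.0, p / w) if w > 0 else 1.0 for p, w in zip(pvalues, scaled)]

    # Step-up rule: find the largest rank k with q_(k) <= alpha * k / m.
    order = sorted(range(m), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    p = [0.004, 0.03, 0.04, 0.60]           # hypothetical metabolite p-values
    print(weighted_bh(p, [1, 1, 1, 1]))      # unweighted baseline
    print(weighted_bh(p, [1, 2, 0.5, 0.5]))  # second test up-weighted a priori
```

With these hypothetical inputs the up-weighted second hypothesis is rejected only in the weighted analysis, which is the kind of power gain the weighting is meant to deliver when the prior information is informative.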
Motivation: There is now a large literature on statistical methods for the meta-analysis of genomic data from multiple studies. However, a crucial assumption for performing many of these analyses is that the data exhibit small between-study variation or that this heterogeneity can be sufficiently modelled probabilistically.
Results: In this article, we propose ‘assumption weighting’, which exploits a weighted hypothesis testing framework proposed by Genovese et al. to incorporate tests of between-study variation into the meta-analysis context. This methodology is fast and computationally simple to implement. Several weighting schemes are considered and compared using simulation studies. In addition, we illustrate application of the proposed methodology using data from several high-profile stem cell gene expression datasets.
Visceral leishmaniasis (VL) is a major public health problem in Bangladesh with the highest disease burden in the Mymensingh District. The disease is transmitted by sand fly bites, but it may also be transmitted through blood transfusions. No information is available about the prevalence of Leishmania infection among blood donors in Bangladesh; therefore we aimed to investigate this question.
The study was carried out in the Blood Transfusion Department of Mymensingh Medical College Hospital. One thousand one hundred and ninety-five healthy adult blood donors attending this department were enrolled in the study from August 2010 to April 2011. After obtaining written consent, socio-demographic data and a detailed health history were collected. The medical officer in the unit performed a complete physical examination to exclude any acute or chronic diseases, which was followed by sero-diagnosis for exposure to Leishmania by rK39 strip test using finger prick blood. Blood donors with a positive rK39 strip test underwent a PCR test for detection of Leishmania DNA in their peripheral blood buffy coat.
Eighty-two percent of enrolled blood donors were male (n=985) and 18% (n=210) were female. The mean age of blood donors was 27 years (SD, 7.95 years). The majority of donors were literate and had mid-to-higher socioeconomic status, as reflected by the household conditions they reported. Only 2.6% had a family member with VL in the past. Three blood donors were positive for Leishmania infection by rK39 strip test (0.3%; 95% CI, 0.05%-0.73%). None of these 3 had active Leishmania infection as demonstrated by PCR analysis. During six months of follow up, neither rK39 positive (n=3) nor rK39 negative (n=1192) donors developed VL.
The prevalence of Leishmania donovani infection among blood donors attending the Blood Transfusion Department of Mymensingh Medical College Hospital was very low. Therefore the chance for transmission of VL through blood transfusion is negligible. We believe that the National VL Elimination Program does not need to set up routine screening for Leishmania donovani infection in blood transfusion departments located in VL endemic areas of Bangladesh.
Visceral leishmaniasis; Kala-azar; Blood donors; Transfusion; Leishmania donovani; Bangladesh
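The seroprevalence reported above (3 rK39-positive donors out of 1195) can be reproduced with a short calculation. The sketch below computes the point estimate and an exact (Clopper-Pearson) binomial confidence interval by bisection. The abstract does not say which interval method was used, so the choice of the exact interval is an assumption, and the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for small k."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _root(g, lo=0.0, hi=1.0, iters=100):
    """Bisection for g(p) = 0, where g is increasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower limit: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _root(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    positives, donors = 3, 1195
    lo, hi = clopper_pearson(positives, donors)
    print(f"prevalence = {100 * positives / donors:.2f}%")
    print(f"95% CI     = {100 * lo:.2f}% to {100 * hi:.2f}%")
```

The point estimate 3/1195 is about 0.25%, which rounds to the reported 0.3%; the computed limits can be compared directly with the reported interval of 0.05% to 0.73%.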
Conditional independence assumptions are very important in causal inference modelling as well as in dimension reduction methodologies. These are two strikingly different statistical literatures, and we study links between the two in this article. The concept of covariate sufficiency plays an important role, and we provide theoretical justification for when dimension reduction and partial least squares methods allow valid causal inference to be performed. The methods are illustrated with application to a medical study and to simulated data.
Average causal effect; matching; model misspecification; observational data; potential outcomes
Aromatase (CYP19A1) is an integral membrane enzyme that catalyzes the removal of the 19-methyl group and aromatization of the A-ring of androgens. All human estrogens are synthesized from their androgenic precursors by this unique cytochrome P450. The crystal structure of active aromatase purified from human placenta has recently been determined in complex with its natural substrate androstenedione in the high-spin ferric state of heme. Hydrogen bond forming interactions and tight packing hydrophobic side chains closely complement puckering of the steroid backbone, thereby providing the molecular basis for the androgenic specificity of aromatase. In the crystal, aromatase molecules are linked by a head-to-tail intermolecular interaction via a surface loop between helix D and helix E of one aromatase molecule that penetrates the heme-proximal cavity of the neighboring, crystallographically-related molecule, thus forming in tandem a polymeric aromatase chain. This intermolecular interaction is similar to the aromatase-cytochrome P450 reductase coupling and is driven by electrostatics between the negative potential surface of the D-E loop region and the positively charged heme-proximal cavity. This loop-to-proximal site link in aromatase is rather unique: there are only a few examples of somewhat similar intermolecular interactions in the entire P450 structure database. Furthermore, the amino acids involved in the intermolecular contact appear to be specific for aromatase. Higher order organization of aromatase monomers may have implications in lipid integration and catalysis.
The crystal structures of human placental aromatase in complex with the substrate androstenedione and exemestane have revealed an androgen-specific active site and the structural basis for higher order organization. However, X-ray structures do not provide accounts of movements due to short-range fluctuations, ligand binding and protein-protein association. In this work, we conduct normal mode analysis (NMA) revealing the intrinsic fluctuations of aromatase, deduce the internal modes in membrane-free and membrane-integrated monomers as well as the intermolecular modes in oligomers, and propose a quaternary organization for the endoplasmic reticulum (ER) membrane integration. Dynamics of the crystallographic oligomers from NMA is found to be in agreement with the isotropic thermal factors from the X-ray analysis. Calculations of the root mean square fluctuations of the C-alpha atoms from their equilibrium positions confirm that the rigid-core structure of aromatase is intrinsic regardless of the changes in steroid binding interactions, and that aromatase self-association does not deteriorate the rigidity of the catalytic cleft. Furthermore, NMA on membrane-integrated aromatase shows that the internal modes in all likelihood contribute to breathing of the active site access channel. The collective intermolecular hinge bending and twisting modes provide the flexibility in the quaternary association necessary for membrane integration of the aromatase oligomers. Taken together, fluctuations of the active site, the access channel, and the heme-proximal cavity, and a dynamic quaternary organization could all be essential components of the functional aromatase in its role as an ER membrane-embedded steroidogenic enzyme.
With the rapid advances of various high-throughput technologies, generation of ‘-omics’ data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.
Over the last decade, genome-wide association studies (GWAS) have become the standard tool for gene discovery in human disease research. While debate continues about how to get the most out of these studies and on occasion about how much value these studies really provide, it is clear that many of the strongest results have come from large-scale mega-consortia and/or meta-analyses that combine data from up to dozens of studies and tens of thousands of subjects. While such analyses are becoming more and more common, statistical methods have lagged somewhat behind. There are good meta-analysis methods available, but even when they are carefully and optimally applied there remain some unresolved statistical issues. This article systematically reviews the GWAS meta-analysis literature, highlighting methodology and software options and reviewing methods that have been used in real studies. We illustrate differences among methods using a case study. We also discuss some of the unresolved issues and potential future directions.
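For readers unfamiliar with how study-level results are combined, the following minimal sketch shows the textbook fixed-effect (inverse-variance) meta-analysis of per-study effect estimates, together with Cochran's Q and the I² heterogeneity summary. It is not taken from the reviews summarized above, which cover many alternative methods. The function name `fixed_effect_meta` and the example inputs are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect estimates.

    Returns (pooled estimate, pooled standard error, Cochran's Q, I^2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se**2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    pooled_se = sqrt(1.0 / total_w)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors from three studies.
    betas = [0.20, 0.35, 0.10]
    ses = [0.10, 0.15, 0.08]
    pooled, se, q, i2 = fixed_effect_meta(betas, ses)
    print(f"pooled = {pooled:.3f} (SE {se:.3f}), Q = {q:.2f}, I2 = {i2:.2f}")
```

A large Q relative to its degrees of freedom suggests between-study heterogeneity, in which case a random-effects model is the usual alternative to the fixed-effect pooling shown here.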
During respiratory viral infections host injury occurs due in part to inappropriate host responses. In this study we sought to uncover the host transcriptional responses underlying differences between high- and low-pathogenic infections.
From a compendium of 12 studies that included responses to influenza A subtype H5N1, reconstructed 1918 influenza A virus, and SARS coronavirus, we used meta-analysis to derive multiple gene expression signatures. We compared these signatures by their capacity to segregate biological conditions by pathogenicity and predict pathogenicity in a test data set. The highest-performing signature was expressed as a continuum in low-, medium-, and high-pathogenicity samples, suggesting a direct, analog relationship between expression and pathogenicity. This signature comprised 57 genes including a subnetwork of chemokines, implicating dysregulated cell recruitment in injury.
Highly pathogenic viruses elicit expression of many of the same key genes as lower pathogenic viruses but to a higher degree. This increased degree of expression may result in the uncontrolled co-localization of inflammatory cell types and lead to irreversible host damage.
There has been substantial interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and a general two-stage algorithm are proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
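The idea of treating the surrogate as an endogenous covariate can be illustrated with generic two-stage least squares, using randomized treatment as the instrument. This is a textbook sketch under assumptions chosen for illustration: a linear model, a single binary instrument, and simulated data in which treatment affects the true endpoint only through the surrogate. It is not the estimator developed in the paper, and all function names and simulation parameters are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, s, t):
    """Effect of surrogate s on true endpoint t, using z as the instrument."""
    # Stage 1: regress the surrogate on the instrument and keep fitted values.
    b1 = slope(z, s)
    a1 = mean(s) - b1 * mean(z)
    s_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the true endpoint on the fitted surrogate.
    return slope(s_hat, t)


def simulate(n=20000, seed=1):
    """Simulated trial: u is an unobserved confounder of s and t."""
    rng = random.Random(seed)
    z, s, t = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                   # randomized treatment
        u = rng.gauss(0.0, 1.0)                  # unobserved confounder
        si = 1.0 * zi + u + rng.gauss(0.0, 1.0)  # surrogate endpoint
        ti = 2.0 * si + 3.0 * u + rng.gauss(0.0, 1.0)  # true effect of s is 2
        z.append(zi)
        s.append(si)
        t.append(ti)
    return z, s, t


if __name__ == "__main__":
    z, s, t = simulate()
    print("naive OLS slope:", round(slope(s, t), 2))
    print("2SLS estimate  :", round(two_stage_least_squares(z, s, t), 2))
```

In this simulation the naive regression of the true endpoint on the surrogate is biased upward by the confounder, while the two-stage estimate is close to the simulated effect of 2. The result depends entirely on the assumed exclusion restriction, which is the kind of assumption a sensitivity analysis is meant to probe.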
Brucella spp. are intracellular bacteria that cause an infectious disease called brucellosis in humans and many domestic and wildlife animals. B. suis primarily infects pigs and is pathogenic to humans. The macrophage-Brucella interaction is critical for the establishment of a chronic Brucella infection. Our studies showed that smooth virulent B. suis strain 1330 (S1330) prevented programmed cell death of infected macrophages and rough attenuated B. suis strain VTRS1 (a vaccine candidate) induced strong macrophage cell death. To further investigate the mechanism of VTRS1-induced macrophage cell death, microarrays were used to analyze temporal transcriptional responses of murine macrophage-like J774.A1 cells infected with S1330 or VTRS1. In total 17,685 probe sets were significantly regulated based on the effects of strain, time and their interactions. A miniTUBA dynamic Bayesian network analysis predicted that VTRS1-induced macrophage cell death was mediated by a proinflammatory gene (the tumor necrosis factor alpha [TNF-α] gene), an NF-κB pathway gene (the IκB-α gene), the caspase-2 gene, and several other genes. VTRS1 induced significantly higher levels of transcription of 40 proinflammatory genes than S1330. A Mann-Whitney U test confirmed the proinflammatory response in VTRS1-infected macrophages. Increased production of TNF-α and interleukin 1β (IL-1β) was also detected in the supernatants of VTRS1-infected macrophage cell cultures. Hyperphosphorylation of IκB-α was observed in macrophages infected with VTRS1 but not S1330. The important roles of TNF-α and IκB-α in VTRS1-induced macrophage cell death were further confirmed by individual inhibition studies. VTRS1-induced macrophage cell death was significantly inhibited by a caspase-2 inhibitor but not a caspase-1 inhibitor. The role of caspase-2 in regulating the programmed cell death of VTRS1-infected macrophages was confirmed in another study using caspase-2-knockout mice.
In summary, VTRS1 induces a proinflammatory, caspase-2- and NF-κB-mediated macrophage cell death. This unique cell death differs from apoptosis, which is not proinflammatory. It is also different from classical pyroptosis, which is caspase-1 mediated.
Enrichment testing assesses the overall evidence of differential expression behavior of the elements within a defined set. When we have measured many molecular aspects, e.g. gene expression, metabolites, proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test using gene expression and metabolomics as the motivating platforms.
Using two simulation models we explored the properties of several enrichment methods including two novel methods: the logistic regression 2-degree of freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. In relation to their univariate counterparts we find that the joint tests can improve our ability to detect results that are marginal univariately. We also find that joint tests improve the ranking of associated pathways compared to their univariate counterparts. However, there is a risk of Type I error inflation with some methods and self-contained methods lose specificity when the sets are not representative of underlying association.
In this work we show that consideration of data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power in detection of genomic associations with phenotypes.
A biomarker is defined to be a biological characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. The use of biomarkers in cancer has been advocated for a variety of purposes, which include use as surrogate endpoints, early detection of disease, proxies for environmental exposure and risk prediction. We deal with the latter issue in this paper.
Several authors have proposed use of the predictiveness curve for assessing the capacity of a biomarker for risk prediction. For most situations, it is reasonable to assume monotonicity of the biomarker effects on disease risk. In this article, we propose the use of flexible modelling of the predictiveness curve and its bivariate analogue, the predictiveness surface, through the use of spline algorithms that incorporate the appropriate monotonicity constraints. Estimation proceeds through use of a two-step algorithm that represents the “smooth, then monotonize” approach. Subsampling procedures are used for inference. The methods are illustrated with data from a melanoma study.
Active set algorithm; Isotonic regression; Nonregular asymptotics; Pool adjacent violators algorithm; Risk prediction; Thin-plate spline
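The “smooth, then monotonize” idea can be sketched in one dimension as follows. The monotonizing step uses the pool adjacent violators algorithm named in the keywords. The smoothing step here is a simple moving average, swapped in purely for brevity in place of the spline smoothers the abstract describes. The function names and the window size are hypothetical.

```python
def moving_average(y, half_window=1):
    """Crude smoother: average of each point and its neighbours."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def pava(y, weights=None):
    """Nondecreasing least-squares fit to y via pool adjacent violators."""
    w = [1.0] * len(y) if weights is None else list(weights)
    blocks = []  # each block is [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for block_mean, _, count in blocks:
        fit.extend([block_mean] * count)
    return fit


def smooth_then_monotonize(y, half_window=1):
    """Step 1: smooth the raw values. Step 2: project onto monotone fits."""
    return pava(moving_average(y, half_window))


if __name__ == "__main__":
    # Hypothetical risk estimates ordered by increasing biomarker value.
    risk = [0.10, 0.30, 0.20, 0.50, 0.40, 0.80]
    print(pava([1, 3, 2, 4]))
    print(smooth_then_monotonize(risk))
```

The values passed in are assumed to be ordered by the biomarker, so that a nondecreasing fit corresponds to risk increasing with the marker.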
The analysis of recurrent failure time data from longitudinal studies can be complicated by the presence of dependent censoring. A substantial literature has developed based on an artificial censoring device. We explore in this article the connection between this class of methods and truncated data structures. In addition, a new procedure is developed for estimation and inference in a joint model for recurrent events and dependent censoring. Estimation proceeds using a mixed U-statistic based estimating function approach. New resampling-based methods for variance estimation and model checking are also described. The methods are illustrated by application to data from an HIV clinical trial as well as a limited simulation study.
Accelerated failure time model; Cause-specific hazard; Comparability; Competing risks; Empirical process; Semi-competing risks data
Genetic cases of congenital pituitary hormone deficiency are common and many are caused by transcription factor defects. Mouse models with orthologous mutations are invaluable for uncovering the molecular mechanisms that lead to problems in organ development and typical patient characteristics. We are using mutant mice defective in the transcription factors PROP1 and POU1F1 for gene expression profiling to identify target genes for these critical transcription factors and candidates for cases of pituitary hormone deficiency of unknown etiology. These studies reveal critical roles for Wnt signalling pathways including the TCF/LEF transcription factors and interacting proteins of the groucho family, bone morphogenetic protein antagonists, and targets of notch signalling. Current studies are investigating roles of novel homeobox genes and pathways that regulate the transition from proliferation to differentiation, cell adhesion and cell migration.
Pituitary adenomas are a common human health problem, yet most cases are sporadic, necessitating alternative approaches to traditional Mendelian genetic studies. Mouse models of adenoma formation offer the opportunity for gene expression profiling during progressive stages of hyperplasia, adenoma and tumorigenesis. This approach holds promise for identification of relevant pathways and candidate genes as risk factors for adenoma formation, understanding mechanisms of progression, and identifying drug targets and clinically relevant biomarkers.
cell proliferation; apoptosis; transcription factors; Prop1; Emx2
Visceral leishmaniasis (VL), caused by the intracellular parasite Leishmania donovani in the Indian subcontinent, is considered to be anthroponotic. The role of domestic animals in its transmission is still unclear. Although cattle are the preferred blood host for Phlebotomus argentipes, the sandfly vector of VL in the Indian subcontinent, very little information is available on their role in disease transmission. In this study, we examined domestic cattle for serological and molecular evidence of Leishmania infection in a VL-endemic area in Bangladesh. Blood samples from 138 domestic cattle were collected from houses with active or recently treated VL and post-kala-azar dermal leishmaniasis patients. The presence of anti-leishmanial antibodies in serum was investigated using enzyme-linked immunosorbent assay (ELISA) and then with direct agglutination tests (DAT). Nested PCR (Ln PCR) was performed to amplify the ssu-rRNA gene using the DNA extracted from buffy coat. A recently developed molecular assay, loop-mediated isothermal amplification (LAMP), was also performed for more sensitive detection of parasite DNA.
In this study, 9.4% (n = 13) of the cattle were found to be positive by ELISA. Of the 13 ELISA-positive cattle, only four (30.8%) were positive in DAT. Parasite DNA was not detected in either of the molecular assays (Ln PCR and LAMP).
The study confirmed the presence of antibodies against the Leishmania parasite in cattle. However, the absence of Leishmania DNA in the cattle indicates clearly that the cattle do not play a role as a reservoir host. Similar studies need to be undertaken in the Indian subcontinent to determine the role of other domestic animals on which sandflies feed.