In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study.
biomarkers; genomic data integration; heterogeneity; microarray; mixture model; tumor subtypes
A compelling demonstration of adaptation by natural selection is the ability
of parasites to manipulate host behavior. One dramatic example involves fungal
species from the genus Ophiocordyceps that
control their ant hosts by inducing a biting behavior. Intensive sampling across
the globe of ants that died after being manipulated by Ophiocordyceps suggests that this phenomenon is highly
species-specific. We advance our understanding of this system by reconstructing
host manipulation by Ophiocordyceps parasites
under controlled laboratory conditions and combining this with field observations
of infection rates and a metabolomics survey.
We report on a newly discovered species of Ophiocordyceps unilateralis sensu lato from North America that we
use to address the species-specificity of Ophiocordyceps-induced manipulation of ant behavior. We show that
the fungus can kill all ant species tested, but only manipulates the behavior of
those it infects in nature. To investigate if this could be explained at the
molecular level, we used ex vivo culturing
assays to measure the metabolites that are secreted by the fungus to mediate
fungus-ant tissue interactions. We show the fungus reacts heterogeneously to
brains of different ant species by secreting a different array of metabolites. By
determining which ion peaks are significantly enriched when the fungus is grown
alongside brains of its naturally occurring host, we discovered candidate
compounds that could be involved in behavioral manipulation by O. unilateralis s.l. Two of these candidates are known
to be involved in neurological diseases and cancer.
The integrative work presented here shows that ant brain manipulation by
O. unilateralis s.l. is species-specific
seemingly because the fungus produces a specific array of compounds as a reaction
to the presence of the host brain it has evolved to manipulate. These studies have
resulted in the discovery of candidate compounds involved in establishing
behavioral manipulation by this specialized fungus and therefore represent a major
advancement towards an understanding of the molecular mechanisms underlying this
Electronic supplementary material
The online version of this article (doi:10.1186/s12862-014-0166-3) contains supplementary material, which is available to authorized
Behavioral manipulation; Host specificity; Secretome; Metabolomics; Ophiocordyceps unilateralis
Cytochrome P450 aromatase (CYP19A1) is the only enzyme known to catalyze the biosynthesis of estrogens from androgens. The crystal structure of human placental aromatase (pArom) has paved the way toward understanding the structure–function relationships of this remarkable enzyme. Using an amino terminus-truncated recombinant human aromatase (rArom) construct, we investigate the roles of key amino acids in the active site, at the intermolecular interface, inside the access channel, and at the lipid–protein boundary in enzyme function and higher-order organization. Replacing the active site residue D309 with an N yields an inactive enzyme, consistent with its proposed involvement in aromatization. Mutation of R192 at the lipid interface, pivotal to the proton relay network in the access channel, results in the loss of enzyme activity. In addition to the distal catalytic residues, we show that mutation of K440 and Y361 of the heme-proximal region critically interferes with substrate binding, enzyme activity, and heme stability. The D–E loop deletion mutant Del7, which disrupts the intermolecular interaction, significantly reduces enzyme activity. However, the less drastic Del4 and the point mutants E181A and E181K do not. Furthermore, native gel electrophoresis, size-exclusion chromatography, and analytical ultracentrifugation are used to show that mutations in the intermolecular interface alter the quaternary organization of the enzyme in solution. To validate the interpretation of the mutational results in the context of the native molecule, we determine the crystal structure of rArom and show that the active site, tertiary, and quaternary structures are identical to those of pArom.
To determine if health-related quality-of-life and self-rated health are associated with mortality in persons with diabetes.
Survey and medical record data were obtained from 7,892 patients with diabetes in Translating Research Into Action for Diabetes (TRIAD), a multicenter prospective observational study of diabetes care in managed care. Vital status at follow-up was determined from the National Death Index. Multivariable proportional hazards models were used to determine if a generic measure of health-related quality-of-life (EQ-5D) and self-rated health measured at baseline were associated with 4-year all-cause, cardiovascular, and noncardiovascular mortality.
At baseline, the mean EQ-5D score for decedents was 0.73 (SD=0.20) and for survivors was 0.81 (SD=0.18) (p<0.0001). Fifty-five percent of decedents and 36% of survivors (p<0.0001) rated their health as fair or poor. Lower EQ-5D scores and fair or poor self-rated health were associated with higher rates of mortality after adjusting for the demographic, socioeconomic, and clinical risk factors for mortality.
Health-related quality-of-life and self-rated health predict mortality in persons with diabetes. Health-related quality-of-life and self-rated health may provide additional information on patient risk independent of demographic, socioeconomic, and clinical risk factors for mortality.
diabetes; mortality; QoL
The visceral leishmaniasis (VL) elimination program in Bangladesh is in its attack phase. The primary goal of this phase is to decrease the burden of VL as much as possible. Active case detection (ACD) by the fever camp method and an approach using past VL cases in the last 6–12 months have been found useful for detection of VL patients in the community. We aimed to explore the yield of Accelerated Active Case Detection (AACD) of non-self reporting VL as well as the factors that are associated with non-self reporting to hospitals in endemic communities of Bangladesh.
Our study was conducted in the Trishal sub-district of Mymensingh, a highly VL endemic region of Bangladesh. We used a two-stage sampling strategy from 12 VL endemic unions of Trishal. Two villages from each union were selected at random. We looked for VL patients who had self-reported to the hospital and were under treatment from these villages. Then we conducted AACD for VL cases in those villages using house-to-house visit. Suspected VL cases were referred to the Trishal hospital where diagnosis and treatment of VL was done following National Guidelines for VL case management. We collected socio-demographic information from patients or a patient guardian using a structured questionnaire.
The total number of VL cases was 51. Nineteen of 51 (37.3%) were identified by AACD. Poverty, female gender and poor knowledge about VL were independent factors associated with non self-reporting to the hospital.
Our primary finding is that AACD is a useful method for early detection of VL cases that would otherwise not be reported to the hospital until a later stage because of poverty, poor knowledge about VL, and gender inequity. We recommend that the National VL Program consider AACD to strengthen its early VL case detection strategy.
To investigate visceral leishmaniasis (VL) deaths and risk factors in two VL endemic areas of Bangladesh.
Two geographically and culturally different VL endemic subdistricts, Godagari in the district of Rajshahi and Trishal in the district of Mymensingh in Bangladesh, August 2009–December 2011.
51 094 inhabitants from randomly selected Unions in the two subdistricts.
Main outcome measures
VL deaths, confirmed independently by qualified physicians using the verbal autopsy procedure ICD10 guideline.
The total number of people screened for VL deaths was 51 094 from 12 032 households in the Godagari and Trishal subdistricts. About 16% of the people from Godagari were Tribals. The average age of the study population was 25.6 years (SD 18.4) and 49.7% were females. The VL case fatality rate averaged 6.12% (12/196), including 2/137 in Trishal and 10/59 in Godagari. Most of the VL deaths (9/12, 75%) occurred at home and the rest in tertiary hospitals. None of these deaths had been reported in the national VL surveillance system. The VL case fatality rate in the Tribal ethnic population (22.2%) was about 17 times higher than that in the Bangali ethnic population (1.3%) (p<0.0001). Tribal ethnicity was associated with an 18-fold higher risk of VL death (OR=18.1, 95% CI 3.6 to 90.6) compared with Bangali ethnicity (p<0.0001).
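The case fatality rates quoted in these results follow directly from the reported counts. A minimal sketch of that arithmetic, using only the counts given in the abstract (the function and variable names are illustrative, not from the study):

```python
def case_fatality_rate(deaths, cases):
    """Return the case fatality rate as a percentage of cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# (deaths, cases) per subdistrict, as reported in the abstract.
counts = {"Trishal": (2, 137), "Godagari": (10, 59)}

# Pooled rate across both subdistricts.
overall = case_fatality_rate(
    sum(deaths for deaths, _ in counts.values()),
    sum(cases for _, cases in counts.values()),
)

# Rate within each subdistrict.
by_area = {area: case_fatality_rate(d, c) for area, (d, c) in counts.items()}
```

The pooled rate rounds to the reported 6.12%, and the per-area rates show how much of the burden sits in Godagari.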
VL deaths were found to be high in the study areas and were under-reported. The Tribal ethnic population was at the highest risk for VL deaths. The national VL Elimination Programme should give special attention to the tribal community in the endemic areas, especially for those in Rajshahi, and should strengthen VL surveillance by including tertiary hospitals in the national surveillance system.
EPIDEMIOLOGY; INFECTIOUS DISEASES; PUBLIC HEALTH
Meta-analysis has become increasingly popular in recent years, especially in genomic data analysis, due to the fast growth of available data and studies that target the same questions. Many methods have been developed, including classical ones such as Fisher’s combined probability test and Stouffer’s Z-test. However, not all meta-analyses have the same goal in mind. Some aim at combining information to find signals in at least one of the studies, while others hope to find more consistent signals across the studies. While many classical meta-analysis methods are developed with the former goal in mind, the latter goal has much more practicality for genomic data analysis.
In this paper, we propose a class of meta-analysis methods based on summaries of weighted ordered p-values (WOP) that aim at detecting significance in a majority of studies. We consider weighted versions of classical procedures such as Fisher’s method and Stouffer’s method, where the weight for each p-value is based on its order among the studies. In particular, we consider weights based on the binomial distribution, where the median of the p-values is weighted highest and the outlying p-values are down-weighted. We investigate the properties of our methods and demonstrate their strengths through simulation studies, comparing them to existing procedures. In addition, we illustrate application of the proposed methodology with several meta-analyses of gene expression data.
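The idea of weighting ordered p-values can be sketched as follows. This is a hedged illustration, not the authors' implementation: the abstract only says the weights are binomial-based and peak at the median, so the Binomial(K - 1, 0.5) parameterisation, the function names, and the Monte Carlo calibration of the null distribution are all assumptions made here for the example.

```python
import math
import random


def binomial_order_weights(k):
    """Weights for k ordered p-values from a Binomial(k - 1, 0.5) pmf.

    Assumption: this parameterisation is illustrative. It peaks at the
    median order statistic and down-weights the smallest and largest.
    """
    return [math.comb(k - 1, i) * 0.5 ** (k - 1) for i in range(k)]


def weighted_fisher_stat(pvals, weights):
    """Fisher-type statistic with weights attached to the sorted p-values."""
    ordered = sorted(pvals)
    return -2.0 * sum(w * math.log(p) for w, p in zip(weights, ordered))


def wop_fisher_pvalue(pvals, n_sim=10000, seed=0):
    """Monte Carlo p-value for the weighted ordered Fisher statistic.

    Assumption: under the global null the study p-values are independent
    Uniform(0, 1); input p-values must be strictly positive.
    """
    rng = random.Random(seed)
    k = len(pvals)
    weights = binomial_order_weights(k)
    observed = weighted_fisher_stat(pvals, weights)
    exceed = 0
    for _ in range(n_sim):
        null_p = [1.0 - rng.random() for _ in range(k)]  # uniform on (0, 1]
        if weighted_fisher_stat(null_p, weights) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

With this weighting, five studies that all report p = 0.01 give a very small combined p-value, whereas one extreme p-value among four unremarkable ones does not, which is the "signal in the majority of studies" behaviour the abstract describes.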
Our proposed weighted ordered p-value (WOP) methods displayed better performance compared to existing methods for testing the hypothesis that there is signal in the majority of studies. They also appeared to be much more robust in applications compared to the rth ordered p-value (rOP) method (Song and Tseng, Ann. Appl. Stat. 2014, 8(2):777–800). With the flexibility of incorporating different p-value combination methods and different weighting schemes, the weighted ordered p-values (WOP) methods have great potential in detecting consistent signal in meta-analysis with heterogeneity.
Fisher’s combined probability test; Meta-analysis; Ordered p-values; Weighted order statistic
Studies have shown the strong association between histone modification levels and gene expression levels. The detailed relationships between the two can vary substantially due to differential regulation, and hence a simple regression model may not be adequate. We apply a regression hidden Markov model (regHMM) to further investigate the potential multiple relationships between genes and histone methylation levels in mouse embryonic stem cells.
Seven histone methylation levels are used in the study. Histone modification levels averaged over non-overlapping 200 bp windows within ±1 kb of the transcription start site (TSS) are used as predictors, giving 70 explanatory variables in total. Based on regHMM results, genes segregate into two groups, referred to as State 1 and State 2, with distinct association strengths. Genes in State 1 are better explained by histone methylation levels, with R² = 0.72, while those in State 2 have a weaker association, with R² = 0.38. The regression coefficients in the two states are not very different in magnitude except for the intercept, which is 0.25 for State 1 and 1.15 for State 2. We found specific GO categories that may be attributed to the different relationships. The GO categories more frequently observed in State 2 match those of housekeeping genes, such as cytoplasm, nucleus, and protein binding. In addition, the housekeeping gene expression levels are significantly less well explained by histone methylation in mouse embryonic stem cells, which is consistent with the constitutive expression patterns that would be expected.
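The count of 70 explanatory variables comes from 7 marks times 10 windows (2,000 bp divided into 200 bp windows). A minimal sketch of that feature construction, assuming a per-base signal vector for each mark over the TSS ± 1 kb region; the input layout, mark names, and function names are hypothetical and not taken from the paper:

```python
def window_means(signal, window=200):
    """Average a per-base signal over non-overlapping windows."""
    if len(signal) % window != 0:
        raise ValueError("signal length must be a multiple of the window size")
    return [
        sum(signal[start:start + window]) / window
        for start in range(0, len(signal), window)
    ]


def build_predictors(marks, window=200):
    """Flatten windowed averages for every mark into one feature dict.

    marks: dict mapping a mark name to its per-base signal over the
    TSS +/- 1 kb region (2000 values per gene in this sketch).
    """
    features = {}
    for name, signal in marks.items():
        for idx, value in enumerate(window_means(signal, window)):
            features[f"{name}_w{idx}"] = value
    return features
```

For one gene with seven marks of 2,000 values each, `build_predictors` returns 70 named features, matching the count stated in the abstract.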
Gene expression levels are not universally affected by histone methylation levels, and the relationships between the two differ by the gene functions. The expression levels of the genes that perform the most common housekeeping genes’ GO categories are less strongly associated with histone methylation levels. We suspect that additional biological factors may also be strongly associated with the gene expression levels in State 2. We discover that the effect of the presence of CpG island in TSS ± 1 Kb is larger in State 2.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-360) contains supplementary material, which is available to authorized users.
Regression hidden Markov model; Histone modification; Gene expression level; Mouse embryonic stem cell
In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems.
We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study to identify 17β-estradiol-sensitive genes in breast cancer cells that are induced at low concentrations.
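A two-step hierarchical procedure of this kind can be sketched as follows. The abstract does not spell out the screening statistic or the within-set test, so this example makes explicit assumptions: a Bonferroni-combined screening p-value per hypothesis set, Benjamini-Hochberg selection of sets at level q, and Bonferroni testing within each selected set at level R·q/m, where R is the number of selected sets and m the total number of sets. All function names are illustrative.

```python
def bh_reject(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            cutoff = rank
    return set(order[:cutoff])


def two_step_set_testing(pvalue_sets, q=0.05):
    """Step 1: screen hypothesis sets; step 2: test within selected sets.

    pvalue_sets: list of lists, one list of p-values per feature (gene).
    Returns {set index: [indices of rejected hypotheses within the set]}.
    """
    m = len(pvalue_sets)
    # Assumed screening p-value: Bonferroni combination within each set.
    screening = [min(1.0, len(s) * min(s)) for s in pvalue_sets]
    selected = bh_reject(screening, q)
    level = len(selected) * q / m  # adjusted level for the second step
    results = {}
    for g in selected:
        k = len(pvalue_sets[g])
        # Assumed within-set test: Bonferroni at the adjusted level.
        results[g] = [j for j, p in enumerate(pvalue_sets[g]) if p <= level / k]
    return results
```

The second-step level shrinks with the fraction of sets selected, so selecting few sets makes the within-set tests correspondingly stricter.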
The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods.
Multiple testing; Overall false discovery rate; Mixed-directional false discovery rate; Benjamini-Hochberg procedure; Microarray; Time course; Dose response
Motivation: Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable.
Results: We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends on not only its own hidden states but also the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Following this, hidden states are inferred using a standard forward–backward algorithm, with the transition probabilities adjusted by the model at each position, which helps retain the order of computation close to fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves comparable sensitivity to the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but scHMM could recover previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis.
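The inference step described here, a forward–backward pass in which the transition probabilities change at every position, can be sketched as below. This is a simplified two-state illustration and not the scHMM package's code: the logistic form of the adjustment, the coefficient layout, and the assumption that the states of the related series are already known are all choices made for the example.

```python
import math


def adjusted_transitions(base_logit, effects, other_states):
    """Build a 2x2 transition matrix for one position.

    base_logit[j]: baseline log-odds of moving to state 1 from state j.
    effects[j][s]: assumed change in that log-odds when related series s
    is in state 1; other_states[s] is 0 or 1.
    """
    matrix = []
    for j in range(2):
        logit = base_logit[j] + sum(
            e * z for e, z in zip(effects[j], other_states)
        )
        p1 = 1.0 / (1.0 + math.exp(-logit))
        matrix.append([1.0 - p1, p1])
    return matrix


def forward_backward(init, trans, emis):
    """Posterior state probabilities for a non-homogeneous HMM.

    init[k]: initial state probabilities.
    trans[t][j][k]: P(state k at t + 1 | state j at t), one matrix per step.
    emis[t][k]: likelihood of the observation at position t under state k.
    """
    n_pos, n_states = len(emis), len(init)
    # Forward pass, rescaled at each position to avoid underflow.
    alpha = []
    for t in range(n_pos):
        if t == 0:
            row = [init[k] * emis[0][k] for k in range(n_states)]
        else:
            row = [
                emis[t][k]
                * sum(alpha[t - 1][j] * trans[t - 1][j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        total = sum(row)
        alpha.append([x / total for x in row])
    # Backward pass, also rescaled; scaling cancels in the posterior.
    beta = [[1.0] * n_states for _ in range(n_pos)]
    for t in range(n_pos - 2, -1, -1):
        row = [
            sum(trans[t][j][k] * emis[t + 1][k] * beta[t + 1][k] for k in range(n_states))
            for j in range(n_states)
        ]
        total = sum(row)
        beta[t] = [x / total for x in row]
    # Combine and normalise to get P(state at t | all observations).
    posterior = []
    for t in range(n_pos):
        row = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        total = sum(row)
        posterior.append([x / total for x in row])
    return posterior
```

Because each series only needs its own forward–backward pass with position-specific matrices, the cost stays close to that of fitting independent HMMs, which is the computational point the abstract makes.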
Availability: The scHMM package can be freely downloaded from http://sourceforge.net/p/schmm/ and is recommended for use in a linux environment.
Supplementary data are available at Bioinformatics online.
Background. For the treatment of visceral leishmaniasis in Bangladesh, single dose liposomal amphotericin B (ambisome) is supposed to be the safest and most effective treatment. Specific needs for application and storage raise questions about feasibility of its implementation and acceptance by patients and health staff. Methods. The study was carried out in the most endemic district of Bangladesh. Study population includes patients treated with ambisome or miltefosine, hospital staff, and a director of the national visceral leishmaniasis program. Study methods include direct observation (subdistrict hospitals), open interviews (health staff and program personnel), structured questionnaires, and focus group discussions (patients). Results. Political commitment for ambisome is strong; the general hospital infrastructure favours implementation but further strengthening is required, particularly for drug storage below 25°C (refrigerators), back-up energy (fuel for generators), and supplies for ambisome administration (like 5% dextrose solution). Ambisome was associated with high satisfaction among patients and hospital staff, fewer adverse events, and less income loss for patients compared to miltefosine. Conclusions. High political commitment, general capacities of subdistrict hospitals, and high acceptability favour the implementation of ambisome treatment in Bangladesh. However, strengthening of the infrastructure and uninterrupted supplies of essential accessories are mandatory before introducing single dose liposomal amphotericin B (sLAB) in Bangladesh.
Providing personalized treatments designed to maximize benefits and minimize harms is of tremendous current medical interest. One problem in this area is the evaluation of the interaction between the treatment and other predictor variables. Treatment effects in subgroups having the same direction but different magnitudes are called quantitative interactions, while those having opposite directions in subgroups are called qualitative interactions (QIs). Identifying QIs is challenging since they are rare and usually unknown among many potential biomarkers. Meanwhile, subgroup analysis reduces the power of hypothesis testing and multiple subgroup analyses inflate the type I error rate. We propose a new Bayesian approach to search for QIs in a multiple regression setting with adaptive decision rules. We consider various regression models for the outcome. This method is illustrated in two examples of Phase III clinical trials. The algorithm is straightforward and easy to implement using existing software packages. Sample code is provided in the appendix.
Interaction; Subgroup; Predictive Marker; Prognostic Marker; Clinical Trial
Autism Spectrum Disorder (ASD) occurs more often among males than females in a 4:1 ratio. Among theories used to explain the causes of ASD, the X chromosome and the Y chromosome theories attribute ASD to the X-linked mutation and the male-limited gene expressions on the Y chromosome, respectively. Despite the rationale of the theory, studies have failed to attribute the sex-biased ratio to the significant linkage or association on the regions of interest on X chromosome. We further study the gender biased ratio by examining the possible interaction effects between two genes in the sex chromosomes. We propose a logistic regression model with mixed effects to detect gene–gene interactions on sex chromosomes. We investigated the power and type I error rates of the approach for a range of minor allele frequencies and varying linkage disequilibrium between markers and QTLs. We also evaluated the robustness of the model to population stratification. We applied the model to a trio-family data set with an ASD affected male child to study gene–gene interactions on sex chromosomes.
binary traits; gene–gene interaction; generalized linear mixed effect model; logistic model; trio data; sex chromosomes
Human cytochrome P450 aromatase catalyzes with high specificity the synthesis of estrogens from androgens. Aromatase inhibitors (AIs) such as exemestane, 6-methylideneandrosta-1,4-diene-3,17-dione, are preeminent drugs for the treatment of estrogen-dependent breast cancer. The crystal structure of human placental aromatase has shown an androgen-specific active site. Using the structural data, novel C6-substituted androsta-1,4-diene-3,17-dione inhibitors have been designed. Several of the C6-substituted 2-alkynyloxy compounds inhibit purified placental aromatase with IC50 values in the nanomolar range. Antiproliferation studies in an MCF-7 breast cancer cell line demonstrate that some of these compounds have EC50 values better than 1 nM, exceeding that for exemestane. X-ray structures of aromatase complexes of two potent compounds reveal that, per their design, the novel side groups protrude into the opening to the access channel unoccupied in the enzyme–substrate/exemestane complexes. The observed structure–activity relationship is borne out by the X-ray data. Structure-guided design permits utilization of the aromatase-specific interactions for the development of next generation AIs.
In this note, we address the problem of surrogacy using a causal modelling framework that differs substantially from the potential outcomes model that pervades the biostatistical literature. The framework comes from econometrics and conceptualizes direct effects of the surrogate endpoint on the true endpoint. While this framework can incorporate the so-called semi-competing risks data structure, we also derive a fundamental non-identifiability result. Relationships to existing causal modelling frameworks are also discussed.
Clinical Trial; Counterfactual; Dependence; Nonlinear response; Prentice Criterion; Rubin causal model
The effect of insecticide-treated materials on reducing visceral leishmaniasis (VL) is disputable. In Bangladesh, we evaluated the effect of a community-based intervention with insecticide impregnation of existing bed-nets in reducing VL incidence. This intervention reduced VL by 66.5%. Widespread bed-net impregnation with slow-release insecticide may control VL in Bangladesh.
Bangladesh; visceral leishmaniasis; vector control; bed-net impregnation; vector-borne infections; insecticides; Leishmania spp.; parasites; sandflies
The preparation and reactivity of steroidal vinyldiazo compounds is reported, providing a convenient, substituent tolerant, chemo- and stereoselective entry into 4- and 6-substituted androgen analogues from a common precursor. Under dirhodium catalysis, O—H insertion occurs at the carbenoid site, leading to 4-substituted steroids, but under silver catalysis, O—H insertion occurs at the vinylogous position, leading to 6-substituted steroids.
We explore the utility of p-value weighting for enhancing the power to detect differential metabolites in a two-sample setting. Related gene expression information is used to assign an a priori importance level to each metabolite being tested. We map the gene expression to a metabolite through pathways and then summarize the gene expression information per pathway using gene set enrichment tests. Through simulation we explore four styles of enrichment tests and four weight functions to convert the gene information into a meaningful p-value weight. We implement the p-value weighting on a prostate cancer metabolomics dataset. Gene expression on matched samples is used to construct the weights. Under certain regulatory conditions, the use of weighted p-values does not inflate the type I error above what we see for the unweighted tests except in high correlation situations. The power to detect differential metabolites is notably increased in situations with disjoint pathways and shows moderate improvement, relative to the proportion of enriched pathways, when pathway membership overlaps.
Motivation: There is now a large literature on statistical methods for the meta-analysis of genomic data from multiple studies. However, a crucial assumption for performing many of these analyses is that the data exhibit small between-study variation or that this heterogeneity can be sufficiently modelled probabilistically.
Results: In this article, we propose ‘assumption weighting’, which exploits a weighted hypothesis testing framework proposed by Genovese et al. to incorporate tests of between-study variation into the meta-analysis context. This methodology is fast and computationally simple to implement. Several weighting schemes are considered and compared using simulation studies. In addition, we illustrate application of the proposed methodology using data from several high-profile stem cell gene expression datasets.
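The weighted hypothesis testing idea referred to here can be sketched as a weighted Benjamini-Hochberg procedure: each p-value is divided by a non-negative weight, with the weights averaging one, and the usual step-up rule is applied to the weighted p-values. How the weights would be derived from tests of between-study variation is not specified in the abstract, so the weights below are simply taken as given; the function names are illustrative.

```python
def bh_reject(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            cutoff = rank
    return set(order[:cutoff])


def weighted_bh(pvals, weights, q=0.05):
    """Apply Benjamini-Hochberg to p-values divided by their weights.

    Weights are rescaled to average one so that up-weighting some
    hypotheses is paid for by down-weighting others.
    """
    if len(pvals) != len(weights):
        raise ValueError("pvals and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    mean_w = sum(weights) / len(weights)
    scaled = [w / mean_w for w in weights]
    # A zero weight means the hypothesis can never be rejected.
    weighted = [min(1.0, p / w) if w > 0 else 1.0 for p, w in zip(pvals, scaled)]
    return bh_reject(weighted, q)
```

In the small example used to check this sketch, the unweighted procedure rejects nothing, while doubling the weight on the first hypothesis lets it through; that trade-off is the source of the power gain when the weights are informative.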
Visceral leishmaniasis (VL) is a major public health problem in Bangladesh with the highest disease burden in the Mymensingh District. The disease is transmitted by sand fly bites, but it may also be transmitted through blood transfusions. No information is available about the prevalence of Leishmania infection among blood donors in Bangladesh; therefore we aimed to investigate this question.
The study was carried out in the Blood Transfusion Department of Mymensingh Medical College Hospital. A total of 1,195 healthy adult blood donors attending this department were enrolled in the study from August 2010 to April 2011. After obtaining written consent, socio-demographic data and a detailed health history were collected. The medical officer in the unit performed a complete physical examination to exclude any acute or chronic diseases, which was followed by sero-diagnosis for exposure to Leishmania by rK39 strip test using finger prick blood. Blood donors with a positive rK39 strip test underwent a PCR test for detection of Leishmania DNA in their peripheral blood buffy coat.
Eighty-two percent of enrolled blood donors were male (n=985) and 18% (n=210) were female. The mean age of blood donors was 27 years (SD, 7.95 years). The majority of donors were literate and had mid-to-higher socioeconomic status, as reflected by the household conditions they reported. Only 2.6% had a family member with VL in the past. Three blood donors were positive for Leishmania infection by rK39 strip test (0.3%; 95% CI, 0.05%–0.73%). None of these 3 had active Leishmania infection as demonstrated by PCR analysis. During six months of follow-up, neither rK39 positive (n=3) nor rK39 negative (n=1192) donors developed VL.
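The seroprevalence interval can be approximated from the reported counts (3 positives among 1,195 donors). The abstract does not state which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, solved by bisection with only the standard library, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function whose sign differs at lo and hi."""
    sign_lo = func(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (func(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes in n."""
    if x == 0:
        lower = 0.0
    else:
        # Smallest p with P(X >= x | p) = alpha / 2.
        lower = _bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p) - alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Largest p with P(X <= x | p) = alpha / 2.
        upper = _bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(3, 1195)
```

Under this assumption the bounds should land close to the interval quoted in the abstract; a different method (for example Wilson or a normal approximation) would give somewhat different limits.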
The prevalence of Leishmania donovani infection among blood donors attending the Blood Transfusion Department of Mymensingh Medical College Hospital was very low. Therefore the chance for transmission of VL through blood transfusion is negligible. We believe that the National VL Elimination Program does not need to set up routine screening for Leishmania donovani infection in blood transfusion departments located in VL endemic areas of Bangladesh.
Visceral leishmaniasis; Kala-azar; Blood donors; Transfusion; Leishmania donovani; Bangladesh
Conditional independence assumptions are very important in causal inference modelling as well as in dimension reduction methodologies. These are two very strikingly different statistical literatures, and we study links between the two in this article. The concept of covariate sufficiency plays an important role, and we provide theoretical justification when dimension reduction and partial least squares methods will allow for valid causal inference to be performed. The methods are illustrated with application to a medical study and to simulated data.
Average causal effect; matching; model misspecification; observational data; potential outcomes
Aromatase (CYP19A1) is an integral membrane enzyme that catalyzes the removal of the 19-methyl group and aromatization of the A-ring of androgens. All human estrogens are synthesized from their androgenic precursors by this unique cytochrome P450. The crystal structure of active aromatase purified from human placenta has recently been determined in complex with its natural substrate androstenedione in the high-spin ferric state of heme. Hydrogen bond forming interactions and tightly packed hydrophobic side chains closely complement the puckering of the steroid backbone, thereby providing the molecular basis for the androgenic specificity of aromatase. In the crystal, aromatase molecules are linked by a head-to-tail intermolecular interaction via a surface loop between helix D and helix E of one aromatase molecule that penetrates the heme-proximal cavity of the neighboring, crystallographically related molecule, thus forming in tandem a polymeric aromatase chain. This intermolecular interaction is similar to the aromatase–cytochrome P450 reductase coupling and is driven by electrostatics between the negative potential surface of the D–E loop region and the positively charged heme-proximal cavity. This loop-to-proximal site link in aromatase is rather unique: there are only a few examples of somewhat similar intermolecular interactions in the entire P450 structure database. Furthermore, the amino acids involved in the intermolecular contact appear to be specific for aromatase. Higher order organization of aromatase monomers may have implications in lipid integration and catalysis.
The crystal structures of human placental aromatase in complex with the substrate androstenedione and exemestane have revealed an androgen-specific active site and the structural basis for higher order organization. However, X-ray structures do not provide accounts of movements due to short-range fluctuations, ligand binding and protein-protein association. In this work, we conduct normal mode analysis (NMA) revealing the intrinsic fluctuations of aromatase, deduce the internal modes in membrane-free and membrane-integrated monomers as well as the intermolecular modes in oligomers, and propose a quaternary organization for the endoplasmic reticulum (ER) membrane integration. Dynamics of the crystallographic oligomers from NMA is found to be in agreement with the isotropic thermal factors from the X-ray analysis. Calculations of the root mean square fluctuations of the C-alpha atoms from their equilibrium positions confirm that the rigid-core structure of aromatase is intrinsic regardless of the changes in steroid binding interactions, and that aromatase self-association does not deteriorate the rigidity of the catalytic cleft. Furthermore, NMA on membrane-integrated aromatase shows that the internal modes in all likelihood contribute to breathing of the active site access channel. The collective intermolecular hinge bending and twisting modes provide the flexibility in the quaternary association necessary for membrane integration of the aromatase oligomers. Taken together, fluctuations of the active site, the access channel, and the heme-proximal cavity, and a dynamic quaternary organization could all be essential components of the functional aromatase in its role as an ER membrane-embedded steroidogenic enzyme.
With the rapid advances of various high-throughput technologies, generation of ‘-omics’ data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.