Related Articles

1.  Where Have All the Interactions Gone? Estimating the Coverage of Two-Hybrid Protein Interaction Maps 
PLoS Computational Biology  2007;3(11):e214.
Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture–recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in Datasets S1 and S2, Figures S1–S5, and Tables S1−S6, and are also available from our Web site, http://www.baderzone.org.
Author Summary
The genome sequence of an organism provides a parts list of proteins, but not an instruction manual for assembling the parts into a cell. Assembly instructions now come from experiments such as two-hybrid screens that detect physical interactions between pairs of proteins. Defining the resources required for generating a full interaction map requires accurate estimates of the false-negative and false-positive rates of genome-scale screens. Two-hybrid screens often select a query protein and sample its interaction partners. True partners may be missed, and false partners may be spuriously identified. This sampling process resembles a capture–recapture experiment, except that classical capture–recapture theory assumes no false positives. Novel extensions to capture–recapture theory permit its application to proteomic screens. This new theory provides statistically grounded answers to long-standing questions: false-discovery rates of high-throughput screens (possibly over 50% per unique interaction, but probably no more than 15% per clone); the quality of different screening libraries; protein properties leading to “sticky” or “promiscuous” interactions; the global network topology; and, most importantly, the coverage of existing two-hybrid maps. Models estimate roughly 30,000 total pairwise interactions in yeast and 500,000 to 1,000,000 in metazoans. The majority of these interactions remain to be discovered.
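As a rough illustration of the classical capture-recapture idea that this work extends, the sketch below computes the Lincoln-Petersen estimate of the total number of interactions from two independent screens. It assumes no false positives, which is the assumption the authors relax, so it is not their model. The function names and example counts are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Classical two-sample estimate of the total population size.

    n1, n2  : number of interactions reported by screen 1 and screen 2
    overlap : number of interactions reported by both screens
    Assumes independent screens and no false positives (an idealization).
    """
    if overlap <= 0:
        raise ValueError("overlap must be positive for the estimate to exist")
    return n1 * n2 / overlap


def coverage(n_found: int, total_estimate: float) -> float:
    """Fraction of the estimated total that a single screen detected."""
    return n_found / total_estimate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
total = lincoln_petersen(n1=1000, n2=800, overlap=100)
print(total)                   # 8000.0
print(coverage(1000, total))   # 0.125
```

A small overlap relative to the screen sizes pushes the estimated total up and the estimated coverage down, which is the intuition behind the high false-negative rates reported above.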
doi:10.1371/journal.pcbi.0030214
PMCID: PMC2082503  PMID: 18039026
2.  Predicting Co-Complexed Protein Pairs from Heterogeneous Data 
PLoS Computational Biology  2008;4(4):e1000054.
Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC50 of 0.937. Assuming a negative-to-positive ratio of 600∶1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. 
This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.
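To make the notion of a diffusion kernel on a network concrete, here is a minimal sketch assuming the common formulation K = exp(-beta * L), where L is the graph Laplacian. The abstract does not give the exact kernel definition or parameters, so the formulation, the value of beta, the truncated Taylor series, and the toy four-protein path graph are all assumptions made for illustration.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(n, edges, beta=0.5, terms=40):
    """Approximate exp(-beta * L) with a truncated Taylor series.

    Adequate for tiny graphs; a real analysis would use a numerical
    library's matrix exponential instead.
    """
    M = [[-beta * x for x in row] for row in laplacian(n, edges)]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]          # holds M^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        for i in range(n):
            for j in range(n):
                K[i][j] += term[i][j]
    return K


# Toy path graph 0 - 1 - 2 - 3 (hypothetical proteins).
K = diffusion_kernel(4, [(0, 1), (1, 2), (2, 3)])
print(K[0][1] > K[0][3])   # True: direct neighbours score higher than distant nodes
```

Unlike a score based only on shared neighbours, the kernel value between two proteins reflects all paths connecting them, which is one reading of "the full network topology" in the abstract.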
Author Summary
Many proteins perform their jobs as part of multi-protein units called complexes, and several technologies exist to identify these complexes and their components with varying precision and throughput. In this work, we describe and apply a computational framework for combining a variety of experimental data to identify pairs of yeast proteins that participate in a complex, so-called co-complexed protein pairs (CCPPs). The method uses machine learning to generalize from well-characterized CCPPs, making predictions of novel CCPPs on the basis of sequence similarity, tandem affinity mass spectrometry data, yeast two-hybrid data, genetic interactions, microarray expression data, ChIP-chip assays, and colocalization by fluorescence microscopy. The resulting model accurately summarizes this heterogeneous body of data: in a cross-validated test, the model achieves an estimated coverage of 89% at a false discovery rate of 10%. The final collection of predicted CCPPs is available as a public resource. These predictions, as well as the general methodology described here, provide a valuable summary of diverse yeast interaction data and generate quantitative, testable hypotheses about novel CCPPs.
doi:10.1371/journal.pcbi.1000054
PMCID: PMC2275314  PMID: 18421371
3.  Are scale-free networks robust to measurement errors? 
BMC Bioinformatics  2005;6:119.
Background
Many complex random networks have been found to be scale-free. Existing literature on scale-free networks has rarely considered potential false positive and false negative links in the observed networks, especially in biological networks inferred from high-throughput experiments. Therefore, it is important to study the impact of these measurement errors on the topology of the observed networks.
Results
This article addresses the impact of erroneous links on network topological inference and explores possible error mechanisms for scale-free networks with an emphasis on Saccharomyces cerevisiae protein interaction networks. We study this issue by both theoretical derivations and simulations. We show that ignoring erroneous links in network analysis may lead to biased estimates of the scale parameter and recommend robust estimators in such scenarios. Possible error mechanisms of yeast protein interaction networks are explored by comparisons between real data and simulated data.
Conclusion
Our studies show that, in the presence of erroneous links, the connectivity distribution of scale-free networks is still scale-free for the middle range connectivities, but can be greatly distorted for low and high connectivities. It is more appropriate to use robust estimators such as the least trimmed mean squares estimator to estimate the scale parameter γ under such circumstances. Moreover, we show by simulation studies that the scale-free property is robust to some error mechanisms but untenable to others. The simulation results also suggest that different error mechanisms may be operating in the yeast protein interaction networks produced from different data sources. In the MIPS gold standard protein interaction data, there appears to be a high rate of false negative links, and the false negative and false positive rates are more or less constant across proteins with different connectivities. However, the error mechanism of yeast two-hybrid data may be very different, where the overall false negative rate is low and the false negative rates tend to be higher for links involving proteins with more interacting partners.
doi:10.1186/1471-2105-6-119
PMCID: PMC1156868  PMID: 15904487
4.  Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation 
Background
The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance.
Objective
The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness.
Methods
Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011.
Results
Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most authors (24/32, 75%) recommended that social media programs should primarily be used to support existing surveillance programs.
Conclusions
The use of search queries and social media for disease surveillance are relatively recent phenomena (first reported in 2006). Both the tools themselves and the methodologies for exploiting them are evolving over time. While their accuracy, speed, and cost compare favorably with existing surveillance systems, the primary challenge is to refine the data signal by reducing surrounding noise. Further developments in digital disease surveillance have the potential to improve sensitivity and specificity, passively through advances in machine learning and actively through engagement of users. Adoption, even as supporting systems for existing surveillance, will entail a high level of familiarity with the tools and collaboration across jurisdictions.
doi:10.2196/jmir.2740
PMCID: PMC3785982  PMID: 23896182
disease; surveillance; social media; review
5.  Divergence of nucleosome positioning between two closely related yeast species: genetic basis and functional consequences 
Inter-species hybrids can be used to dissect the relative contribution of cis and trans effects to the evolution of nucleosome positioning. Most (∼70%) differences in nucleosome positioning between two closely related yeast species are due to cis effects. Cis effects are primarily due to divergence of AT-rich nucleosome-disfavoring sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis. Divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a neutral mode of evolution.
Phenotypic diversity is often due to changes in gene regulation, and recent studies have characterized extensive differences between the gene expression programs of closely related species (Khaitovich et al, 2006; Tirosh et al, 2009). However, very little is known about the mechanisms that drive this divergence. Here, we analyze the evolution of nucleosome positioning, by comparing the patterns of nucleosomes between two yeast species, as well as generating the allele-specific nucleosome profile in their hybrid. We ask two main questions: (1) what is the genetic basis of inter-species differences in nucleosome positioning? and (2) what is the regulatory function of these differences?
Generally speaking, we can classify the genetic basis of the divergence in nucleosome positioning into two mechanisms. First, mutations in the local DNA sequence may influence the ability to bind nucleosomes at this region; we refer to these as cis effects. Second, mutations may affect the activity of various proteins that alter nucleosome positioning either actively (e.g. chromatin-remodeling enzymes) or by simply competing with nucleosomes for binding to the same DNA sequence (e.g. transcription factors); we refer to these as trans effects.
To classify the observed inter-species differences into cis versus trans effects, we measured allele-specific nucleosome positions within the inter-specific hybrid of the two species (Wittkopp et al, 2004; Tirosh et al, 2009). The hybrid contains the alleles of both species; hence, cis effects, which involve mutations that discriminate between the two alleles, will be maintained in the hybrid so that nucleosome positioning will be different between the alleles coming from the different species. Trans effects, in contrast, will not discriminate between the two hybrid alleles from the different species, as these two alleles reside together at the same trans environment (hybrid nucleus) and are thus regulated by the same set of proteins—the combination of proteins from the two species. Using this approach, we found that ∼70% of the inter-species differences in nucleosome positioning are due to cis effects, whereas the rest is due to trans effects.
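The classification logic described above can be sketched as follows. This is only an illustration of the reasoning, not the authors' procedure: the single shared threshold, the numeric encoding of positioning differences, and the function name are all hypothetical.

```python
def classify_difference(parent_diff: float, hybrid_allele_diff: float,
                        threshold: float = 1.0) -> str:
    """Label one nucleosome as conserved, cis, or trans.

    parent_diff        : positioning difference between the two parental species
    hybrid_allele_diff : positioning difference between the two alleles in the hybrid
    threshold          : hypothetical cutoff for calling a difference real
    """
    if abs(parent_diff) < threshold:
        return "conserved"      # no inter-species difference to explain
    if abs(hybrid_allele_diff) >= threshold:
        return "cis"            # difference persists in a shared trans environment
    return "trans"              # difference vanishes when the regulators are shared


# Hypothetical measurements (arbitrary units).
print(classify_difference(3.0, 2.8))   # cis
print(classify_difference(3.0, 0.2))   # trans
print(classify_difference(0.1, 0.0))   # conserved
```

The key design point is that the hybrid supplies a common set of regulators, so any difference that survives there must be encoded in the local DNA sequence.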
The local DNA sequence is indeed known to affect nucleosome positions, and many features of DNA sequences were proposed to influence nucleosome binding, either by rejecting nucleosomes, or by being favorable for nucleosome binding (Segal et al, 2006; Lee et al, 2007; Kaplan et al, 2009). We find, however, that nucleosome positions diverged primarily through changes in AT-rich sequences, which exclude nucleosomes, whereas mutations in sequences that correlate with high-nucleosome occupancy do not influence inter-species divergence.
Nucleosomes restrict the access of proteins to the DNA and may thus affect DNA-related processes such as transcription, recombination or replication. Indeed, promoters and regulatory sequences are often depleted of nucleosomes, and highly transcribed genes are associated with low occupancy of nucleosomes at their promoters (Lee et al, 2007). Several earlier studies also suggested that evolutionary divergence of gene expression is driven by changes in chromatin structure (Lee et al, 2006; Choi and Kim, 2008; Tirosh et al, 2008; Field et al, 2009). However, we find that nucleosome positions (or occupancy) at regulatory elements are largely conserved, and furthermore, that the inter-species differences in nucleosome positions do not correlate with gene expression differences. These results suggest that nucleosome positioning is not a central mechanism for evolutionary changes in gene regulation and that most of the observed changes may be due to neutral drift.
Does the apparent low influence of nucleosome positioning on gene expression divergence imply that nucleosome positions do not have a function in gene regulation? To address this, we examined two additional modes of gene regulation: transcriptional response to changes in growth conditions (glucose versus glycerol media), and the expression differences between different cell types (haploid versus diploid cells). Consistent with earlier studies, we found that the response to growth conditions is significantly, albeit weakly, associated with changes in nucleosome positioning. Interestingly, we also found a strikingly strong association between gene expression and nucleosomal changes in the two cell types. Taken together, these results suggest that nucleosome positioning is used preferentially for biological processes in which genes are turned on and off (e.g. different cell type), but less so during divergence of closely related species in which gradual changes accumulate over time.
Gene regulation differs greatly between related species, constituting a major source of phenotypic diversity. Recent studies characterized extensive differences in the gene expression programs of closely related species. In contrast, virtually nothing is known about the evolution of chromatin structure and how it influences the divergence of gene expression. Here, we compare the genome-wide nucleosome positioning of two closely related yeast species and, by profiling their inter-specific hybrid, trace the genetic basis of the observed differences into mutations affecting the local DNA sequences (cis effects) or the upstream regulators (trans effects). The majority (∼70%) of inter-species differences is due to cis effects, leaving a significant contribution (30%) for trans factors. We show that cis effects are well explained by mutations in nucleosome-disfavoring AT-rich sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis, and we provide evidence that nucleosome-free regions, but not the +1 nucleosome, serve as stable border elements. Surprisingly, although we find that differential nucleosome positioning among cell types is strongly correlated with differential expression, this does not seem to be the case for evolutionary changes: divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a primarily neutral mode of evolution. Our results provide evolutionary insights to the genetic determinants and regulatory function of nucleosome positioning.
doi:10.1038/msb.2010.20
PMCID: PMC2890324  PMID: 20461072
evolution; gene regulation; nucleosome positioning
6.  High throughput flow cytometry based yeast two-hybrid array approach for large-scale analysis of protein-protein interactions 
The analysis of protein-protein-interactions is a key focus of proteomics efforts. The yeast two-hybrid system has been the most commonly used method in genome-wide searches for protein interaction partners. However, the throughput of the current yeast two-hybrid array approach is hampered by the involvement of the time-consuming LacZ assay and/or the incompatibility of liquid handling automation due to the requirement for selection of colonies/diploids on agar plates. To facilitate large-scale yeast two-hybrid assays, we report a novel array approach by coupling a GFP reporter based yeast two-hybrid system with high throughput flow cytometry that enables the processing of a 96 well plate in as little as 3 minutes. In this approach, the yEGFP reporter has been established in both AH109 (MATa) and Y187 (MATα) reporter cells. It not only allows the generation of two copies of GFP reporter genes in diploid cells, but also allows the convenient determination of self-activators generated from both bait and prey constructs by flow cytometry. We demonstrate a Y2H array assay procedure that is carried out completely in liquid media in 96-well plates by mating bait and prey cells in liquid YPD media, selecting the diploids containing positive interaction pairs in selective media and analyzing the GFP reporter directly by flow cytometry. We have evaluated this flow cytometry based array procedure by showing that the interaction of the positive control pair P53/T is able to be reproducibly detected at 72 hrs post-mating compared to the negative control pairs. We conclude that our flow cytometry based yeast two-hybrid approach is robust, convenient, quantitative, and is amenable to large-scale analysis using liquid-handling automation.
doi:10.1002/cyto.a.21144
PMCID: PMC3250062  PMID: 21954189
HT flow cytometry; Protein-protein interaction; Yeast two-hybrid system; Array approach
7.  Social Media and Rating Sites as Tools to Understanding Quality of Care: A Scoping Review 
Background
Insight into the quality of health care is important for any stakeholder including patients, professionals, and governments. In light of a patient-centered approach, it is essential to assess the quality of health care from a patient’s perspective, which is commonly done with surveys or focus groups. Unfortunately, these “traditional” methods have significant limitations that include social desirability bias, a time lag between experience and measurement, and difficulty reaching large groups of people. Information on social media could be of value to overcoming these limitations, since these new media are easy to use and are used by the majority of the population. Furthermore, an increasing number of people share health care experiences online or rate the quality of their health care provider on physician rating sites. The question is whether this information is relevant to determining or predicting the quality of health care.
Objective
The goal of our research was to systematically analyze the relation between information shared on social media and quality of care.
Methods
We performed a scoping review with the following goals: (1) to map the literature on the association between social media and quality of care, (2) to identify different mechanisms of this relationship, and (3) to determine a more detailed agenda for this relatively new research area. A recognized scoping review methodology was used. We developed a search strategy based on four themes: social media, patient experience, quality, and health care. Four online scientific databases were searched, articles were screened, and data extracted. Results related to the research question were described and categorized according to type of social media. Furthermore, national and international stakeholders were consulted throughout the study, to discuss and interpret results.
Results
Twenty-nine articles were included, of which 21 were concerned with health care rating sites. Several studies indicate a relationship between information on social media and quality of health care. However, some drawbacks exist, especially regarding the use of rating sites. For example, since rating is anonymous, rating values are not risk adjusted and therefore vulnerable to fraud. Also, ratings are often based on only a few reviews and are predominantly positive. Furthermore, people providing feedback on health care via social media are presumably not always representative for the patient population.
Conclusions
Social media and particularly rating sites are an interesting new source of information about quality of care from the patient’s perspective. This new source should be used to complement traditional methods, since measuring quality of care via social media has other, but not less serious, limitations. Future research should explore whether social media are suitable in practice for patients, health insurers, and governments to help them judge the quality performance of professionals and organizations.
doi:10.2196/jmir.3024
PMCID: PMC3961699  PMID: 24566844
social media; rating sites; patient experiences; patient satisfaction; quality of health care
8.  Misrepresentation of Randomized Controlled Trials in Press Releases and News Coverage: A Cohort Study 
PLoS Medicine  2012;9(9):e1001308.
A study conducted by Amélie Yavchitz and colleagues examines the factors associated with “spin” (specific reporting strategies, intentional or unintentional, that emphasize the beneficial effect of treatments) in press releases of clinical trials.
Background
Previous studies indicate that in published reports, trial results can be distorted by the use of “spin” (specific reporting strategies, intentional or unintentional, emphasizing the beneficial effect of the experimental treatment). We aimed to (1) evaluate the presence of “spin” in press releases and associated media coverage; and (2) evaluate whether findings of randomized controlled trials (RCTs) based on press releases and media coverage are misinterpreted.
Methods and Findings
We systematically searched for all press releases indexed in the EurekAlert! database between December 2009 and March 2010. Of the 498 press releases retrieved and screened, we included press releases for all two-arm, parallel-group RCTs (n = 70). We obtained a copy of the scientific article to which the press release related and we systematically searched for related news items using Lexis Nexis.
“Spin,” defined as specific reporting strategies (intentional or unintentional) emphasizing the beneficial effect of the experimental treatment, was identified in 28 (40%) scientific article abstract conclusions and in 33 (47%) press releases. From bivariate and multivariable analysis assessing the journal type, funding source, sample size, type of treatment (drug or other), results of the primary outcomes (all nonstatistically significant versus other), author of the press release, and the presence of “spin” in the abstract conclusion, the only factor associated with “spin” in the press release was “spin” in the article abstract conclusions (relative risk [RR] 5.6, [95% CI 2.8–11.1], p<0.001). Findings of RCTs based on press releases were overestimated for 19 (27%) reports. News items were identified for 41 RCTs; 21 (51%) were reported with “spin,” mainly the same type of “spin” as those identified in the press release and article abstract conclusion. Findings of RCTs based on the news items were overestimated for ten (24%) reports.
Conclusion
“Spin” was identified in about half of press releases and media coverage. In multivariable analysis, the main factor associated with “spin” in press releases was the presence of “spin” in the article abstract conclusion.
Editors' Summary
Background
The mass media play an important role in disseminating the results of medical research. Every day, news items in newspapers and magazines and on the television, radio, and internet provide the general public with information about the latest clinical studies. Such news items are written by journalists and are often based on information in “press releases.” These short communications, which are posted on online databases such as EurekAlert! and sent directly to journalists, are prepared by researchers or more often by the drug companies, funding bodies, or institutions supporting the clinical research and are designed to attract favorable media attention to newly published research results. Press releases provide journalists with the information they need to develop and publish a news story, including a link to the peer-reviewed journal (a scholarly periodical containing articles that have been judged by independent experts) in which the research results appear.
Why Was This Study Done?
In an ideal world, journal articles, press releases, and news stories would all accurately reflect the results of health research. Unfortunately, the findings of randomized controlled trials (RCTs—studies that compare the outcomes of patients randomly assigned to receive alternative interventions), which are the best way to evaluate new treatments, are sometimes distorted in peer-reviewed journals by the use of “spin”—reporting that emphasizes the beneficial effects of the experimental (new) treatment. For example, a journal article may interpret nonstatistically significant differences as showing the equivalence of two treatments although such results actually indicate a lack of evidence for the superiority of either treatment. “Spin” can distort the transposition of research into clinical practice and, when reproduced in the mass media, it can give patients unrealistic expectations about new treatments. It is important, therefore, to know where “spin” occurs and to understand the effects of that “spin”. In this study, the researchers evaluate the presence of “spin” in press releases and associated media coverage and analyze whether the interpretation of RCT results based on press releases and associated news items could lead to the misinterpretation of RCT results.
What Did the Researchers Do and Find?
The researchers identified 70 press releases indexed in EurekAlert! over a 4-month period that described two-arm, parallel-group RCTs. They used Lexis Nexis, a database of news reports from around the world, to identify associated news items for 41 of these press releases and then analyzed the press releases, news items, and abstracts of the scientific articles related to each press release for “spin”. Finally, they interpreted the results of the RCTs using each source of information independently. Nearly half the press releases and article abstract conclusions contained “spin” and, importantly, “spin” in the press releases was associated with “spin” in the article abstracts. The researchers overestimated the benefits of the experimental treatment from the press release as compared to the full-text peer-reviewed article for 27% of reports. Factors that were associated with this overestimation of treatment benefits included publication in a specialized journal and having “spin” in the press release. Of the news items related to press releases, half contained “spin”, usually of the same type as identified in the press release and article abstract. Finally, the researchers overestimated the benefit of the experimental treatment from the news item as compared to the full-text peer-reviewed article in 24% of cases.
What Do These Findings Mean?
These findings show that “spin” in press releases and news reports is related to the presence of “spin” in the abstract of peer-reviewed reports of RCTs and suggest that the interpretation of RCT results based solely on press releases or media coverage could distort the interpretation of research findings in a way that favors experimental treatments. This interpretation shift is probably related to the presence of “spin” in peer-reviewed article abstracts, press releases, and news items and may be partly responsible for a mismatch between the perceived and real beneficial effects of new treatments among the general public. Overall, these findings highlight the important role that journal reviewers and editors play in disseminating research findings. These individuals, the researchers conclude, have a responsibility to ensure that the conclusions reported in the abstracts of peer-reviewed articles are appropriate and do not over-interpret the results of clinical research.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001308.
The PLOS Hub for Clinical Trials, which collects PLOS journals relating to clinical trials, includes some other articles on “spin” in clinical trial reports
EurekAlert! is a free online database of science press releases
The UK National Health Service Choices website includes Beyond the Headlines, a resource that provides an unbiased and evidence-based analysis of health stories that make the news for both the public and health professionals
The US-based organization HealthNewsReview, a project supported by the Foundation for Informed Medical Decision Making, also provides expert reviews of news stories
doi:10.1371/journal.pmed.1001308
PMCID: PMC3439420  PMID: 22984354
9.  Genome-wide transcriptional plasticity underlies cellular adaptation to novel challenge 
By recruiting the essential HIS3 gene to the GAL regulatory system and switching to a repressing glucose medium, we confronted yeast cells with a novel challenge they had not encountered before in their evolutionary history. Adaptation to this challenge involved a global transcriptional response of a sizeable fraction of the genome, which relaxed on the timescale of the population adaptation, on the order of 10 generations. For a large fraction of the responding genes there is no simple biological interpretation connecting them to the specific cellular demands imposed by the novel challenge. Strikingly, repeating the experiment did not reproduce similar transcription patterns in either the transient phase or the adapted state in glucose. These results suggest that physiological selection operates on the new metabolic configurations generated by the nonspecific large-scale transcriptional response to eventually stabilize an adaptive state.
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. Some common perturbations, such as changes in nutrient composition, elicit well-characterized transcriptional responses that can be understood by simple engineering-like design principles as satisfying specific demands imposed by the perturbation. However, cells also have the ability to adapt to novel and unforeseen challenges. This ability is central in realizing the evolvability potential of cells as they respond to dramatic genetic or environmental changes along evolution. Little is known about the mechanisms underlying such adaptations to novel challenges; in particular, the role of the transcriptional regulatory network in such adaptations has not been characterized. Genome-wide measurements have revealed that, in many cases, perturbations lead to a global transcriptional response involving a sizeable fraction of the genome (Gasch et al, 2000; Jelinsky et al, 2000; Causton et al, 2001; Ideker et al, 2001; Lai et al, 2005). Such global behavior suggests that general collective properties of the genetic network, rather than specific pre-designed pathways, determine an important part of the transcriptional response. It is not known however what fraction of genes within such massive transcriptional responses is essential to the specific cellular demands. It is also unknown whether the non-pre-designed part of the response can have a functional role in adaptation to novel challenges.
To study these questions, we confronted yeast cells with a novel challenge they had not encountered before along their history in evolution. A strain of the yeast Saccharomyces cerevisiae was engineered to recruit the gene HIS3, an essential enzyme from the histidine biosynthesis pathway (Hinnebusch, 1992), to the GAL regulatory system, responsible for galactose utilization (Stolovicki et al, 2006). The GAL system is known to be strongly repressed when the cells are exposed to glucose. Therefore, upon switching to a medium containing glucose and lacking histidine, the GAL system and with it HIS3 are highly repressed immediately following the switch and the cells encounter a severe challenge. We have recently shown that a cell population carrying this rewired genome can adapt to grow competitively in a chemostat in a medium containing pure glucose (Stolovicki et al, 2006). This adaptation occurred on a timescale of ∼10 generations; applying a stronger environmental pressure in the form of a competitive inhibitor to HIS3 (3AT) resulted in a similar adaptation albeit with a longer timescale. Figure 1 shows the dynamics of the population's cell density (blue lines, measured by OD) following a medium switch from galactose to glucose in the chemostat without (A) and with (B) 3AT. The experiments revealed that adaptation occurs on physiological timescales (much shorter than required by spontaneous random mutations), but the mechanisms underlying this adaptation have remained unclear (Stolovicki et al, 2006).
Yeast cells had not encountered recruitment of HIS3 to the GAL system along their evolutionary history, and their genome could not possibly have been selected to specifically address glucose repression of HIS3. This experiment, therefore, provides a unique opportunity to characterize the spontaneous transcriptional response during adaptation to a novel challenge and to assess the functional role of the regulatory system in this adaptation. We used DNA microarrays to measure the genome-wide expression levels at time points along the adaptation process, with and without 3AT. These measurements revealed that a sizeable fraction of the genome responded by induction or repression to the switch into glucose. Superimposed on the OD traces, Figure 1 shows the results of a clustering analysis of the expression of genes as measured by the arrays along time in the experiments. This analysis revealed two dominant clusters, each containing hundreds of genes in each experiment, which responded to the medium switch to glucose by a strong transient induction or repression followed by relaxation to steady state on the timescale of the adaptation process, ∼ 10 generations. The two clusters in each experiment show similar but opposite dynamics.
A detailed analysis of the gene content in the two clusters revealed that only a small portion of the response was induced by a change in carbon source (15% overlap between the corresponding clusters in the two experiments, with and without 3AT). Moreover, it revealed a very low overlap with the universal stress response observed for a wide range of environmental stresses (Gasch et al, 2000; Causton et al, 2001) and with the typical response to amino-acid starvation (Natarajan et al, 2001). Additionally, all known specific responses to stress in the literature are characterized by transient induction or repression with relaxation to steady state within a generation time (Gasch et al, 2000; Koerkamp et al, 2002; Wu et al, 2004), whereas in our experiments relaxation of the transcriptional response occurs over many generations. Taken together, these results show that the transcriptional response observed here is neither a metabolic response to the change in carbon source nor is it a standard response to stress or amino-acid starvation. This raises the possibility that it is a spontaneous collective response that is largely composed of genes that do not have a specific function. This possibility was tested directly by repeating the experiment with different populations and comparing their responses. This procedure revealed reproducible adaptation dynamics and steady states in terms of population density, but showed significantly different transcriptional transient responses and steady states for the two repeated experiments. Thus, a significant portion of the genes that changed their expression during the adaptation process do not have a well-defined and reproducible function in the challenging environment.
The application of a stronger environmental pressure in the form of 3AT had a dramatic effect on the global characteristics of the transcriptional response: it induced a markedly higher correlation among the hundreds of responding genes. Figure 3A compares the array data in color code for the two experiments. It is seen that the emergent pattern of transcription exhibits a higher degree of order by the introduction of high external pressure. Observation of the transcriptional patterns for specific metabolic pathways illustrates the different contributions to the correlated dynamics (Figure 3B–D). A general energetic module such as glycolysis exhibited similar patterns of induction and relaxation in experiments with and without 3AT (Figure 3B). However, in general, we found that more than one-third of the known metabolic modules (30 out of 88 modules described in KEGG) exhibited high expression correlation among their genes when the environmental pressure was high but not when it was low. As an example, Figure 3C shows the histidine biosynthesis pathway and Figure 3D the purine pathway. Note the highly ordered trajectories in the lower panels (with 3AT) compared to the disordered ones in the upper panels (no 3AT). This order extends also between genes belonging to different and even distant metabolic modules. It indicates that a global transcriptional regulatory mechanism is in operation, rather than a local specific one. Surprisingly, genes belonging to the same metabolic pathway exhibited simultaneous positively and negatively correlated dynamics. Thus, an important conclusion of this work is that the global transcriptional response to a novel challenge cannot be explained by a simple cellular or metabolic logic. This is to be expected if the response had not been specifically selected in evolution and was not pre-designed for the challenge.
Our data clearly reveal that the massive transcriptional response underlies the adaptation process to a novel challenge. The novelty of the challenge presented to the cells excludes the possibility that this response has been specifically selected toward this challenge. Thus, transcriptional regulation has dynamic properties resulting in a general massive nonspecific response to a novel perturbation. Such a response in turn allows for metabolic rearrangements, which by feeding back on transcription lead to adaptation of the cells to the unforeseen situation. The drastic change in the expression state of the cell opens multiple new metabolic pathways. Physiological selection works then on these multiple metabolic pathways to stabilize an adaptive state that causes relaxation of the perturbed expression pattern. This scenario, involving the creation of a library of possibilities and physiological selection over this library, is compatible with our understanding of a broad class of biological systems, placing the cellular metabolic/regulatory networks on the same footing as the neural or the immune systems (Gerhart and Kirschner, 1997).
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. An open question is to what extent transcriptional response to perturbations has been specifically selected along evolution. To test the possibility that transcriptional reprogramming does not need to be ‘pre-designed’ to lead to an adaptive metabolic state on physiological timescales, we confronted yeast cells with a novel challenge they had not previously encountered. We rewired the genome by recruiting an essential gene, HIS3, from the histidine biosynthesis pathway to a foreign regulatory system, the GAL network responsible for galactose utilization. Switching medium to glucose in a chemostat caused repression of the essential gene and presented the cells with a severe challenge to which they adapted over approximately 10 generations. Using genome-wide expression arrays, we show here that a global transcriptional reprogramming (>1200 genes) underlies the adaptation. A large fraction of the responding genes is nonreproducible in repeated experiments. These results show that a nonspecific transcriptional response reflecting the natural plasticity of the regulatory network supports adaptation of cells to novel challenges.
doi:10.1038/msb4100147
PMCID: PMC1865588  PMID: 17453047
adaptation; cellular metabolism; expression arrays; plasticity; transcriptional response
10.  Standardization of fluorescence in situ hybridization studies on chronic lymphocytic leukemia (CLL) blood and marrow cells by the CLL Research Consortium 
Cancer genetics and cytogenetics  2010;203(2):141-148.
Five laboratories in the Chronic Lymphocytic Leukemia (CLL) Research Consortium (CRC) investigated standardizing and pooling of fluorescence in situ hybridization (FISH) results as a collaborative research project. This investigation used fixed bone marrow and blood cells available from previous conventional cytogenetic or FISH studies in two pilot studies, a one-day workshop, and a proficiency test. Multiple FISH probe strategies were used to detect 6q-, 11q-, +12, 13q-, 17p-, and IGH rearrangements. Ten specimens were studied by participants who used their own probes (pilot study 1). Of 312 FISH interpretations, 224 (72%) were true-negative, 74 (24%) true-positive, 6 (2%) false-negative, and 8 (3%) false-positive. In pilot study 2, each participant studied two specimens using identical FISH probe sets to control for variation due to probe sets and probe strategies. Of 80 FISH interpretations, no false interpretations were identified. At a subsequent workshop, discussions produced agreement on scoring criteria. The proficiency test that followed produced no false-negative results and 4% (3/68) false-positive interpretations. Interpretation disagreements among laboratories were primarily attributable to inadequate normal cutoffs, inconsistent scoring criteria, and the use of different FISH probe strategies. Collaborative organizations that use pooled FISH results may wish to impose more conservative empiric normal cutoff values or use an equivocal range between the normal cutoff and the abnormal reference range to eliminate false-positive interpretations. False-negative results will still occur, and would be expected in low-percentage positive cases; these would likely have less clinical significance than false-positive results. Individual laboratories can help by closely following rigorous quality assurance guidelines to ensure accurate and consistent FISH studies in their clinical practice and research.
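The pilot study 1 tallies can be checked with a few lines of arithmetic. The sketch below is illustrative only: the four counts are those quoted in the abstract, while the helper names (shares, sensitivity, specificity) and the derived sensitivity and specificity figures are assumptions added here and are not reported by the consortium.

```python
# Counts quoted in the abstract for pilot study 1 (312 FISH interpretations).
PILOT_1 = {
    "true_negative": 224,
    "true_positive": 74,
    "false_negative": 6,
    "false_positive": 8,
}


def shares(counts):
    """Return each category as a fraction of all interpretations."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def sensitivity(counts):
    """TP / (TP + FN): share of truly abnormal results called abnormal."""
    tp, fn = counts["true_positive"], counts["false_negative"]
    return tp / (tp + fn)


def specificity(counts):
    """TN / (TN + FP): share of truly normal results called normal."""
    tn, fp = counts["true_negative"], counts["false_positive"]
    return tn / (tn + fp)


for name, fraction in shares(PILOT_1).items():
    print(f"{name}: {fraction:.1%}")
# Derived values, not stated in the abstract.
print(f"sensitivity: {sensitivity(PILOT_1):.1%}")
print(f"specificity: {specificity(PILOT_1):.1%}")
```

The rounded shares agree with the percentages quoted above; they sum to 101% only because each is rounded independently.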
doi:10.1016/j.cancergencyto.2010.08.009
PMCID: PMC3763815  PMID: 21156226
11.  Precision and recall estimates for two-hybrid screens 
Bioinformatics  2008;25(3):372-378.
Motivation: Yeast two-hybrid screens are an important method to map pairwise protein interactions. This method can generate spurious interactions (false discoveries), and true interactions can be missed (false negatives). Previously, we reported a capture–recapture estimator for bait-specific precision and recall. Here, we present an improved method that better accounts for heterogeneity in bait-specific error rates.
Result: For yeast, worm and fly screens, we estimate the overall false discovery rates (FDRs) to be 9.9%, 13.2% and 17.0% and the false negative rates (FNRs) to be 51%, 42% and 28%. Bait-specific FDRs and the estimated protein degrees are then used to identify protein categories that yield more (or fewer) false positive interactions and more (or fewer) interaction partners. While membrane proteins have been suggested to have elevated FDRs, the current analysis suggests that intrinsic membrane proteins may actually have reduced FDRs. Hydrophobicity is positively correlated with decreased error rates and fewer interaction partners. These methods will be useful for future two-hybrid screens, which could use ultra-high-throughput sequencing for deeper sampling of interacting bait–prey pairs.
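For readers who think in terms of precision and recall rather than error rates, the conversion is direct: precision = 1 − FDR and recall = 1 − FNR. The sketch below applies this to the rates quoted above, assuming they are listed in the order yeast, worm, fly. The function names and the example counts at the end are hypothetical and are not taken from the authors' C software.

```python
def rates_from_counts(tp, fp, fn):
    """Return (FDR, FNR) from true-positive, false-positive, false-negative counts."""
    fdr = fp / (tp + fp)   # fraction of reported interactions that are spurious
    fnr = fn / (tp + fn)   # fraction of true interactions that were missed
    return fdr, fnr


def precision_recall(fdr, fnr):
    """Precision and recall are the complements of FDR and FNR."""
    return 1.0 - fdr, 1.0 - fnr


# (FDR, FNR) pairs quoted in the abstract; the species order is assumed.
SCREENS = {
    "yeast": (0.099, 0.51),
    "worm": (0.132, 0.42),
    "fly": (0.170, 0.28),
}

for species, (fdr, fnr) in SCREENS.items():
    precision, recall = precision_recall(fdr, fnr)
    print(f"{species}: precision {precision:.1%}, recall {recall:.0%}")

# Hypothetical counts, used only to show how the two rates are defined.
print(rates_from_counts(tp=90, fp=10, fn=30))
```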
Availability: All software (C source) and datasets are available as supplemental files and at http://www.baderzone.org under the Lesser GPL v. 3 license.
Contact: joel.bader@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btn640
PMCID: PMC2639075  PMID: 19091773
12.  A model of yeast cell-cycle regulation based on multisite phosphorylation 
Multisite phosphorylation of CDK target proteins provides the requisite nonlinearity for cell cycle modeling using elementary reaction mechanisms. Stochastic simulations, based on Gillespie's algorithm and using realistic numbers of protein and mRNA molecules, compare favorably with single-cell measurements in budding yeast. The role of transcription–translation coupling is critical in the robust operation of protein regulatory networks in yeast cells.
Progression through the eukaryotic cell cycle is governed by the activation and inactivation of a family of cyclin-dependent kinases (CDKs) and auxiliary proteins that regulate CDK activities (Morgan, 2007). The many components of this protein regulatory network are interconnected by positive and negative feedback loops that create bistable switches and transient pulses (Tyson and Novak, 2008). The network must ensure that cell-cycle events proceed in the correct order, that cell division is balanced with respect to cell growth, and that any problems encountered (in replicating the genome or partitioning chromosomes to daughter cells) are corrected before the cell proceeds to the next phase of the cycle. The network must operate robustly in the context of unavoidable molecular fluctuations in a yeast-sized cell. With a volume of only 5×10⁻¹⁴ l, a yeast cell contains one copy of the gene for each component of the network, a handful of mRNA transcripts of each gene, and a few hundreds to thousands of protein molecules carrying out each gene's function. How large are the molecular fluctuations implied by these numbers, and what effects do they have on the functioning of the cell-cycle control system?
To answer these questions, we have built a new model (Figure 1) of the CDK regulatory network in budding yeast, based on the fact that the targets of CDK activity are typically phosphorylated on multiple sites. The activity of each target protein depends on how many sites are phosphorylated. The target proteins feed back on CDK activity by controlling cyclin synthesis (SBF's role) and degradation (Cdh1's role) and by releasing a CDK-counteracting phosphatase (Cdc14). Every reaction in Figure 1 can be described by a mass-action rate law, with an accompanying rate constant that must be estimated from experimental data. As the transcription and translation of mRNA molecules have major effects on fluctuating numbers of protein molecules (Pedraza and Paulsson, 2008), we have included mRNA transcripts for each protein in the model.
To create a deterministic model, the rate laws are combined, according to standard principles of chemical kinetics, into a set of 60 differential equations that govern the temporal dynamics of the control system. In the stochastic version of the model, the rate law for each reaction determines the probability per unit time that a particular reaction occurs, and we use Gillespie's stochastic simulation algorithm (Gillespie, 1976) to compute possible temporal sequences of reaction events. Accurate stochastic simulations require knowledge of the expected numbers of mRNA and protein molecules in a single yeast cell. Fortunately, these numbers are available from several sources (Ghaemmaghami et al, 2003; Zenklusen et al, 2008). Although the experimental estimates are not always in good agreement with each other, they are sufficiently reliable to populate a stochastic model with realistic numbers of molecules.
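To make the stochastic procedure concrete, here is a minimal sketch of Gillespie's direct method applied to a single gene, not to the full cell-cycle network. It assumes a two-stage model (transcription, mRNA decay, translation, protein decay) with mass-action propensities. The function names, the rate constants and the 100-min protein half-life are illustrative assumptions; only the target means of about 8 mRNA and 800 protein molecules and the 1-min effective mRNA half-life come from the text.

```python
import math
import random


def gillespie_two_stage(k_m, g_m, k_p, g_p, t_end, seed=0):
    """Direct-method SSA for a two-stage gene expression model.

    Reactions (propensities):
      0 -> M      (k_m)      transcription
      M -> 0      (g_m * M)  mRNA degradation
      M -> M + P  (k_p * M)  translation
      P -> 0      (g_p * P)  protein degradation
    Returns event times and the mRNA and protein counts after each event.
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    times, mrna, protein = [t], [m], [p]
    # State change (dM, dP) for each reaction, in the order listed above.
    changes = ((1, 0), (-1, 0), (0, 1), (0, -1))
    while True:
        propensities = (k_m, g_m * m, k_p * m, g_p * p)
        total = sum(propensities)          # > 0 because k_m > 0
        t += rng.expovariate(total)        # waiting time to the next event
        if t >= t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        dm, dp = rng.choices(changes, weights=propensities)[0]
        m += dm
        p += dp
        times.append(t)
        mrna.append(m)
        protein.append(p)
    return times, mrna, protein


def time_average(times, values, t_end):
    """Time-weighted mean of a piecewise-constant trajectory on [0, t_end]."""
    edges = times + [t_end]
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values)) / t_end


# Illustrative parameters (per minute): mean mRNA ~8, mean protein ~800.
G_M = math.log(2) / 1.0      # mRNA half-life of 1 min
G_P = math.log(2) / 100.0    # protein half-life of 100 min (assumed)
K_M = 8 * G_M
K_P = 800 * G_P / 8

times, mrna, protein = gillespie_two_stage(K_M, G_M, K_P, G_P, t_end=200.0)
print("time-averaged mRNA count:", time_average(times, mrna, 200.0))
```

Repeating such runs with different seeds gives the sample of trajectories from which means and standard deviations are computed.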
By simulating thousands of cells (as in Figure 5), we can build up representative samples for computing the mean and s.d. of any measurable cell-cycle property (e.g. interdivision time, size at division, duration of G1 phase). The excellent fit of simulated statistics to observations of cell-cycle variability is documented in the main text and Supplementary Information.
Of particular interest to us are observations of Di Talia et al (2007) of the timing of a crucial G1 event (export of Whi5 protein from the nucleus) in a population of budding yeast cells growing at a specific growth rate α=ln2/(mass-doubling time). Whi5 export is a consequence of Whi5 phosphorylation, and it occurs simultaneously with the release (activation) of SBF (see Figure 1). Using fluorescently labeled Whi5, Di Talia et al could easily measure (in individual yeast cells) the time, T1, from cell birth to the abrupt loss of Whi5 from the nucleus. Correlating T1 to the size of the cell at birth, Vbirth, they found that, for a sample of daughter cells, αT1 versus ln(Vbirth) could be fit with two straight lines of slope −0.7 and −0.3. Our simulation of this experiment (Figure 7 of the main text) compares favorably with Figure 3d and e in Di Talia et al (2007).
The major sources of noise in our model (and in protein regulatory networks in yeast cells, in general) are related to gene transcription and the small number of unique mRNA transcripts. As each mRNA molecule may instruct the synthesis of dozens of protein molecules, the coefficient of variation of molecular fluctuations at the protein level (CVP) may be dominated by fluctuations at the mRNA level, as expressed in the formula (Pedraza and Paulsson, 2008) CVP² ≈ 1/NP + (1/NM)·ρ/(1+ρ), where NM, NP denote the number of mRNA and protein molecules, respectively, and ρ=τM/τP is the ratio of half-lives of mRNA and protein molecules. For a yeast cell, typical values of NM and NP are 8 and 800, respectively (Ghaemmaghami et al, 2003; Zenklusen et al, 2008). If ρ=1, then CVP≈25%. Such large fluctuations in protein levels are inconsistent with the observed variability of size and age at division in yeast cells, as shown in the simplified cell-cycle model of Kar et al (2009) and as we have confirmed with our more realistic model. The size of these fluctuations can be reduced to a more acceptable level by assuming a shorter half-life for mRNA (say, ρ=0.1).
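The quoted numbers can be reproduced with a short calculation. The sketch below assumes the standard two-stage expression-noise relation CVP² = 1/NP + (1/NM)·ρ/(1+ρ). That form is an assumption here, chosen because it returns the stated CVP≈25% for NM=8, NP=800 and ρ=1. The function name protein_cv is hypothetical.

```python
import math


def protein_cv(n_mrna, n_protein, rho):
    """Protein-level coefficient of variation under the assumed relation
    CV_P^2 = 1/N_P + (1/N_M) * rho / (1 + rho), with rho = tau_M / tau_P."""
    return math.sqrt(1.0 / n_protein + (1.0 / n_mrna) * rho / (1.0 + rho))


# Typical yeast values quoted in the text: N_M = 8, N_P = 800.
for rho in (1.0, 0.1):
    print(f"rho = {rho}: CV_P = {protein_cv(8, 800, rho):.1%}")
```

With these inputs the mRNA term dominates: at ρ=1 it contributes 1/16 to CVP², against 1/800 from the protein term, which is why shortening the mRNA half-life is so effective at reducing noise.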
There must be some mechanisms whereby yeast cells lessen the protein fluctuations implied by transcription–translation coupling. Following Pedraza and Paulsson (2008), we suggest that mRNA gestation and senescence may resolve this problem. Equation (3) is based on a simple, one-stage, birth–death model of mRNA turnover. In Supplementary Appendix 1, we show that a model of mRNA processing, with 10 stages each of mRNA gestation and senescence, gives reasonable fluctuations at the protein level (CVP≈5%), even if the effective half-life of mRNA is 10 min. A one-stage model with τM=1 min gives comparable fluctuations (CVP≈5%). In the main text, we use a simple birth–death model of mRNA turnover with an ‘effective’ half-life of 1 min, in order to limit the computational complexity of the full cell-cycle model.
In order for the cell's genome to be passed intact from one generation to the next, the events of the cell cycle (DNA replication, mitosis, cell division) must be executed in the correct order, despite the considerable molecular noise inherent in any protein-based regulatory system residing in the small confines of a eukaryotic cell. To assess the effects of molecular fluctuations on cell-cycle progression in budding yeast cells, we have constructed a new model of the regulation of Cln- and Clb-dependent kinases, based on multisite phosphorylation of their target proteins and on positive and negative feedback loops involving the kinases themselves. To account for the significant role of noise in the transcription and translation steps of gene expression, the model includes mRNAs as well as proteins. The model equations are simulated deterministically and stochastically to reveal the bistable switching behavior on which proper cell-cycle progression depends and to show that this behavior is robust to the level of molecular noise expected in yeast-sized cells (∼50 fL volume). The model gives a quantitatively accurate account of the variability observed in the G1-S transition in budding yeast, which is governed by an underlying sizer+timer control system.
doi:10.1038/msb.2010.55
PMCID: PMC2947364  PMID: 20739927
bistability; cell-cycle variability; size control; stochastic model; transcription–translation coupling
13.  Rapid organism identification from Bactec NR blood culture media in a diagnostic microbiology laboratory. 
Journal of Clinical Pathology  1994;47(9):796-798.
AIMS--To evaluate rapid organism identification on positive blood culture Bactec NR media (phial types 26, 27, 42 and 17), and to assess the usefulness of these procedures in a diagnostic microbiology laboratory. METHODS--Two hundred and sixty, first positive, blood culture bottles from individual patients were tested by rapid identification methods selected on the basis of Gram film organism morphology. Tube coagulase and latex agglutination were applied to presumptive staphylococci; latex agglutination antigen detection methods to suspected pneumococci, Neisseria and Haemophilus sp; and latex agglutination grouping tests for cultures thought to be non-pneumococcal streptococci. RESULTS--Media type did not influence test performance (p > 0.05 for all comparisons). Misapplication of methods occurred on eight occasions and there were 14 false positive results, nine involving the latex reagents for group C streptococci and pneumococci. The positive predictive values for tube coagulase tests and latex reactions for H influenzae type b, and N meningitidis groups B and C were 100%. The pneumococcal and staphylococcal latex tests gave positive predictive values of 94.1% and 62.5%, respectively, and the corresponding figure for streptococcal grouping reactions was 75.9%. With the exception of staphylococcal latex testing (80%) all investigation negative predictive values were > 90%. CONCLUSIONS--The performance of the staphylococcal latex agglutination method was unsatisfactory and it is not appropriate for use with the media studied. In view of the cross-reactions observed with the tests used to identify group C streptococci and pneumococci, positive findings must be interpreted with caution. In all other regards the protocol evaluated produced rapid, reliable, clinically useful information and, subject to local experience, is recommended to users of Bactec NR media.
PMCID: PMC494934  PMID: 7962646
14.  From co-expression to co-regulation: how many microarray experiments do we need? 
Genome Biology  2004;5(7):R48.
The ability to identify co-regulated genes from microarray clustering results is strongly dependent on the number of microarray experiments used in cluster analysis and the accuracy of these associations plateaus at between 50 and 100 experiments on yeast data. Even with large numbers of experiments, the false positive rate may exceed the true positive rate.
Background
Cluster analysis is often used to infer regulatory modules or biological function by associating unknown genes with other genes that have similar expression patterns and known regulatory elements or functions. However, clustering results may not have any biological relevance.
Results
We applied various clustering algorithms to microarray datasets with different sizes, and we evaluated the clustering results by determining the fraction of gene pairs from the same clusters that share at least one known common transcription factor. We used both yeast transcription factor databases (SCPD, YPD) and chromatin immunoprecipitation (ChIP) data to evaluate our clustering results. We showed that the ability to identify co-regulated genes from clustering results is strongly dependent on the number of microarray experiments used in cluster analysis and the accuracy of these associations plateaus at between 50 and 100 experiments on yeast data. Moreover, the model-based clustering algorithm MCLUST consistently outperforms more traditional methods in accurately assigning co-regulated genes to the same clusters on standardized data.
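The evaluation measure described above, the fraction of same-cluster gene pairs that share at least one known transcription factor, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the rule of skipping genes without annotation, and the toy gene and factor labels are all assumptions.

```python
from itertools import combinations


def coregulated_pair_fraction(clusters, tf_annotations):
    """Fraction of same-cluster gene pairs sharing at least one known TF.

    clusters       : dict mapping gene -> cluster label
    tf_annotations : dict mapping gene -> set of transcription factors
    Genes lacking an annotation are skipped (an assumption of this sketch).
    """
    genes = sorted(g for g in clusters if g in tf_annotations)
    same_cluster = 0
    shared = 0
    for a, b in combinations(genes, 2):
        if clusters[a] != clusters[b]:
            continue
        same_cluster += 1
        if tf_annotations[a] & tf_annotations[b]:   # non-empty intersection
            shared += 1
    return shared / same_cluster if same_cluster else float("nan")


# Toy data with hypothetical gene and factor names.
toy_clusters = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
toy_tfs = {
    "g1": {"TF_A"},
    "g2": {"TF_A", "TF_B"},
    "g3": {"TF_C"},
    "g4": {"TF_B"},
    "g5": {"TF_B"},
}
print(coregulated_pair_fraction(toy_clusters, toy_tfs))
```

In the toy data, four gene pairs fall in the same cluster and two of them share a factor, so the score is 0.5.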
Conclusions
Our results are consistent with respect to independent evaluation criteria that strengthen our confidence in our results. However, when one compares ChIP data to YPD, the false-negative rate is approximately 80% using the recommended p-value of 0.001. In addition, we showed that even with large numbers of experiments, the false-positive rate may exceed the true-positive rate. In particular, even when all experiments are included, the best results produce clusters with only a 28% true-positive rate using known gene transcription factor interactions.
doi:10.1186/gb-2004-5-7-r48
PMCID: PMC463312  PMID: 15239833
15.  Identification of Candida glabrata by a 30-Second Trehalase Test 
Journal of Clinical Microbiology  2002;40(10):3602-3605.
Rapid (30-s) trehalase tests done with material from colonies of 482 yeasts suspended in a drop of trehalose solution on a commercially supplied glucose test strip were positive for 225 (99.1%) of 227 Candida glabrata isolates grown on either of two differential media, Candida ID medium or CandiSelect medium. The test was positive for only 3 (1.2%) and 12 (4.7%) of 255 isolates of other medically important yeast species grown on the same two media, respectively. A rapid maltase test done with a subset of 255 yeast isolates was negative for all but 1 of 64 trehalase-positive C. glabrata isolates, raising the specificity of the rapid testing for C. glabrata to 98.4 to 100%, depending on the isolation medium used. Rapid trehalase and maltase tests done independently in two laboratories with 217 yeast isolates showed sensitivities of 96.0 to 98.0% and specificities of 98.2 to 99.4% for identification of C. glabrata from colonies grown on Candida ID medium. The specificity was much lower because of frequent false-positive trehalose test results when the source of colonies was Sabouraud agar formulated with 4% glucose. We conclude that direct recognition of C. albicans as blue colonies on Candida ID isolation medium coupled with the performance of the 30-s trehalase and maltase tests for C. glabrata among the white colonies on this medium will allow the rapid presumptive identification of the two yeast species most commonly encountered in clinical samples.
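The headline percentages follow directly from the isolate counts. The sketch below recomputes them from the figures quoted above; the variable and function names are illustrative, and the trehalase-only specificity values it prints are derived here and are not stated in the abstract.

```python
def percent(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


# Counts quoted in the abstract.
C_GLABRATA_TOTAL = 227
C_GLABRATA_POSITIVE = 225
OTHER_YEASTS_TOTAL = 255
# Trehalase-positive isolates of other species, by isolation medium.
OTHER_YEASTS_POSITIVE = {"Candida ID": 3, "CandiSelect": 12}

# Sensitivity: C. glabrata isolates correctly testing positive.
print(f"sensitivity: {percent(C_GLABRATA_POSITIVE, C_GLABRATA_TOTAL):.1f}%")

for medium, false_positives in OTHER_YEASTS_POSITIVE.items():
    fp_rate = percent(false_positives, OTHER_YEASTS_TOTAL)
    # Derived value: specificity of the trehalase test alone on this medium.
    spec = percent(OTHER_YEASTS_TOTAL - false_positives, OTHER_YEASTS_TOTAL)
    print(f"{medium}: false-positive rate {fp_rate:.1f}%, specificity {spec:.1f}%")
```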
doi:10.1128/JCM.40.10.3602-3605.2002
PMCID: PMC130844  PMID: 12354852
16.  A Novel Scoring Approach for Protein Co-Purification Data Reveals High Interaction Specificity 
PLoS Computational Biology  2009;5(9):e1000515.
Large-scale protein interaction networks (PINs) have typically been discerned using affinity purification followed by mass spectrometry (AP/MS) and yeast two-hybrid (Y2H) techniques. It is generally recognized that Y2H screens detect direct binary interactions while the AP/MS method captures co-complex associations; however, the latter technique is known to yield prevalent false positives arising from a number of effects, including abundance. We describe a novel approach to compute the propensity for two proteins to co-purify in an AP/MS data set, thereby allowing us to assess the detected level of interaction specificity by analyzing the corresponding distribution of interaction scores. We find that two recent AP/MS data sets of yeast contain enrichments of specific, or high-scoring, associations as compared to commensurate random profiles, and that curated, direct physical interactions in two prominent data bases have consistently high scores. Our scored interaction data sets are generally more comprehensive than those of previous studies when compared against four diverse, high-quality reference sets. Furthermore, we find that our scored data sets are more enriched with curated, direct physical associations than Y2H sets. A high-confidence protein interaction network (PIN) derived from the AP/MS data is revealed to be highly modular, and we show that this topology is not the result of misrepresenting indirect associations as direct interactions. In fact, we propose that the modularity in Y2H data sets may be underrepresented, as they contain indirect associations that are significantly enriched with false negatives. The AP/MS PIN is also found to contain significant assortative mixing; however, in line with a previous study we confirm that Y2H interaction data show weak disassortativeness, thus revealing more clearly the distinctive natures of the interaction detection methods. 
We expect that our scored yeast data sets are ideal for further biological discovery and that our scoring system will prove useful for other AP/MS data sets.
Author Summary
To understand and model cellular processes, we require accurate descriptions of the interactions occurring between constituent proteins. Large-scale protein interaction maps have typically been measured in two distinct ways. The first detects direct pair-wise associations by testing only two proteins at a time for an interaction. The second detects large groups of proteins that have conglomerated or purified together. With regard to the latter, it is difficult to deduce which pairs of proteins are physically interacting in the purification data, and interaction maps generally appear random and unstructured. We have developed a novel computational method to analyze the purification data (from the second method) and identify which proteins are directly interacting. The resultant protein interaction map is highly modular, meaning that the proteins organize themselves into localized, densely connected regions that likely represent individually functioning units. We also analyzed interaction maps of the first method and propose that their lack of modularity is a consequence of missing interactions that are undetected for unclear reasons. This study provides insights into the differences between the two interaction detection methods as well as the nature of biological organization.
doi:10.1371/journal.pcbi.1000515
PMCID: PMC2738424  PMID: 19779545
17.  Nucleic Acid Amplification Based Diagnostic of Lyme (Neuro-)borreliosis – Lost in the Jungle of Methods, Targets, and Assays? 
The Open Neurology Journal  2012;6:129-139.
Laboratory based diagnosis of infectious diseases usually relies on culture of the disease causing micro-organism, followed by identification and susceptibility testing. Since Borrelia burgdorferi sensu lato, the etiologic agent of Lyme disease or Lyme borreliosis, requires very specific culture conditions (e.g. specific liquid media, long term culture), traditional bacteriology is often not done on a routine basis. Instead, confirmation of the clinical diagnosis needs either indirect techniques (like serology or measurement of cellular activity in the presence of antigens) or direct but culture independent techniques, like microscopy or nucleic acid amplification techniques (NAT), with polymerase chain reaction (PCR) being the most frequently applied NAT method in routine laboratories.
NAT uses nucleic acids of the disease causing micro-organism, isolated from various clinical specimens, as the template for amplification. Although the underlying principle, adoption of the enzymatic process that runs during DNA duplication prior to prokaryotic cell division, is comparatively simple, a number of ‘pitfalls’ are associated with the technique itself as well as with interpretation of the results.
At present, no commercial, CE-marked and sufficiently validated PCR assay is available. A number of homebrew assays have been published, which differ in terms of target (i.e. the gene targeted by the amplification primers), method (nested PCR, PCR followed by hybridization, real-time PCR) and validation criteria. Inhibitory compounds may lead to false negative results if no appropriate internal control is included. Carry-over of amplicons, insufficient handling and workflow and/or insufficiently validated targets/primers may result in false positive results. Different targets may yield different analytical sensitivity, depending, among other factors, on the redundancy of a target gene in the genome. Performance characteristics (e.g. analytical sensitivity and specificity, clinical sensitivity and specificity, reproducibility, etc.) are, if available, only applicable to a specific assay, running in a specific laboratory. Finally, not only the NAT/PCR method itself, but also the process of DNA isolation from the specimen, is highly diverse and may have fundamental impact on the (expected) PCR result. Of concern are distribution effects of DNA, in particular if only low numbers of bacteria/genomes are present in a sample, as is the case for instance in cerebrospinal fluids.
For the ordering physician and for the patient requesting PCR analysis, these ‘pitfalls’ are usually invisible. As a consequence, the reported result (i.e. PCR negative or positive for B. burgdorferi) is hard to interpret, especially if the reported PCR result is contradictory to the clinical diagnosis or other laboratory findings. Moreover, due to the high number of different assays in use, two laboratories testing the same specimen might come to different PCR results.
The current paper summarizes the available PCR/NAT assays for the detection of B. burgdorferi DNA in clinical specimens, with special attention to neurologic disorders, and discusses the associated difficulties in PCR analysis and result interpretation. In view of growing numbers of patients who are diagnosed as having Lyme disease, and acknowledging a substantial growth in knowledge regarding other tick- or vector-borne pathogens, which might be able to induce symptoms comparable to Lyme (neuro-)borreliosis, efforts are urgently needed to standardize and harmonize methods for B. burgdorferi nucleic acid amplification.
doi:10.2174/1874205X01206010129
PMCID: PMC3514706  PMID: 23230454
Polymerase chain reaction; Borrelia burgdorferi; Lyme disease; Lyme neuroborreliosis.
18.  Dye bias correction in dual-labeled cDNA microarray gene expression measurements. 
Environmental Health Perspectives  2004;112(4):480-487.
A significant limitation to the analytical accuracy and precision of dual-labeled spotted cDNA microarrays is the signal error due to dye bias. Transcript-dependent dye bias may be due to gene-specific differences of incorporation of two distinctly different chemical dyes and the resultant differential hybridization efficiencies of these two chemically different targets for the same probe. Several approaches were used to assess and minimize the effects of dye bias on fluorescent hybridization signals and maximize the experimental design efficiency of a cell culture experiment. Dye bias was measured at the individual transcript level within each batch of simultaneously processed arrays by replicate dual-labeled split-control sample hybridizations and accounted for a significant component of fluorescent signal differences. This transcript-dependent dye bias alone could introduce unacceptably high numbers of both false-positive and false-negative signals. We found that within a given set of concurrently processed hybridizations, the bias is remarkably consistent and therefore measurable and correctable. The additional microarrays and reagents required for paired technical replicate dye-swap corrections commonly performed to control for dye bias could be costly to end users. Incorporating split-control microarrays within a set of concurrently processed hybridizations to specifically measure dye bias can eliminate the need for technical dye swap replicates and reduce microarray and reagent costs while maintaining experimental accuracy and technical precision. These data support a practical and more efficient experimental design to measure and mathematically correct for dye bias.
PMCID: PMC1241902  PMID: 15033598
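The split-control correction described in the abstract above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes the per-transcript dye bias is estimated as the mean log2 ratio of the two channels across split-control arrays from the same batch, and that the correction is a simple subtraction of that estimate from each experimental log2 ratio. The function names, the `(ch1, ch2)` intensity layout, and the toy values are all hypothetical.

```python
import math
from statistics import mean


def estimate_dye_bias(split_control_arrays):
    """Estimate per-transcript dye bias from split-control arrays.

    Each array is a dict mapping transcript -> (ch1, ch2) intensities.
    Because both channels carry the same split sample, any nonzero
    log2(ch1/ch2) is attributed to dye bias (an assumption of this sketch).
    """
    transcripts = split_control_arrays[0].keys()
    return {
        t: mean(math.log2(arr[t][0] / arr[t][1]) for arr in split_control_arrays)
        for t in transcripts
    }


def correct_log_ratios(experiment_array, dye_bias):
    """Subtract the estimated bias from each experimental log2 ratio."""
    return {
        t: math.log2(ch1 / ch2) - dye_bias[t]
        for t, (ch1, ch2) in experiment_array.items()
    }


# Hypothetical toy data: two split-control arrays from one batch.
split_controls = [
    {"geneA": (2000.0, 1000.0), "geneB": (500.0, 500.0)},
    {"geneA": (4000.0, 2000.0), "geneB": (800.0, 800.0)},
]
bias = estimate_dye_bias(split_controls)

# One experimental array processed in the same batch.
experiment = {"geneA": (4000.0, 1000.0), "geneB": (300.0, 600.0)}
corrected = correct_log_ratios(experiment, bias)
```

In this toy case geneA shows a twofold channel bias in the controls, so half of its apparent fourfold change in the experiment is removed, while geneB, which shows no bias, is left unchanged.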
19.  ROCS: a Reproducibility Index and Confidence Score for Interaction Proteomics Studies 
BMC Bioinformatics  2012;13:128.
Background
Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones.
Methods
To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Prey Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Prey Proteins account for measures of protein identifiability as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criterions for parameter estimation and error-controlled procedures.
Results
We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions, or in combination with other AP-MS scoring methods to significantly improve inferences.
Conclusions
Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called “ROCS”, freely available from the CRAN repository http://cran.r-project.org/.
doi:10.1186/1471-2105-13-128
PMCID: PMC3568013  PMID: 22682516
Experimental Reproducibility; Indicator Prey Proteins; Confidence Score; Protein-Protein Interaction; Affinity-Purification Mass-Spectrometry
20.  ROCS: A reproducibility index and confidence score for interaction proteomics 
BMC Public Health  2013;13:1011.
Background Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. Results To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Proteins account for measures of protein identification as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criterions for parameter estimation and error-controlled procedures. 
We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions or in combination with other AP-MS scoring methods to significantly improve inferences. Conclusions Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or especially in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called "ROCS", freely available from the CRAN repository http://cran.r-project.org/.
doi:10.1186/1471-2458-13-1011
PMCID: PMC3854457  PMID: 24160674
21.  ROCS: A reproducibility index and confidence score for interaction proteomics 
BMC Public Health  2013;13:1010.
Background Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. Results To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Proteins account for measures of protein identification as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criterions for parameter estimation and error-controlled procedures. 
We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions or in combination with other AP-MS scoring methods to significantly improve inferences. Conclusions Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or especially in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called "ROCS", freely available from the CRAN repository http://cran.r-project.org/.
doi:10.1186/1471-2458-13-1010
PMCID: PMC3840679  PMID: 24160571
22.  ROCS: A reproducibility index and confidence score for interaction proteomics 
BMC Public Health  2013;13:1009.
Background Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. Results To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Proteins account for measures of protein identification as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criterions for parameter estimation and error-controlled procedures. 
We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions or in combination with other AP-MS scoring methods to significantly improve inferences. Conclusions Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or especially in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called "ROCS", freely available from the CRAN repository http://cran.r-project.org/.
doi:10.1186/1471-2458-13-1009
PMCID: PMC3854487  PMID: 24156626
23.  AhR/Arnt:XRE interaction: Turning false negatives into true positives in the modified yeast one-hybrid assay 
Analytical biochemistry  2008;382(2):101-106.
Given the frequent occurrence of false negatives in yeast genetic assays, it is both interesting and practical to address the possible mechanisms of false negatives and, more important, to turn false negatives into true positives. We recently developed a modified yeast one-hybrid system (MY1H) useful for investigation of simultaneous protein–protein and protein:DNA interactions in vivo. We coexpressed the basic helix–loop–helix/Per-Arnt-Sim (bHLH/PAS) domains of aryl hydrocarbon receptor (AhR) and aryl hydrocarbon receptor nuclear translocator (Arnt)—namely NAhR and NArnt, respectively—which are known to form heterodimers and bind the cognate xenobiotic response element (XRE) sequence both in vitro and in vivo, as a positive control in the study of XRE-binding proteins in the MY1H system. However, we observed negative results, that is, no positive signal detected from binding of the NAhR/NArnt heterodimer and XRE site. We demonstrate that by increasing the copy number of XRE sites integrated into the yeast genome and using double GAL4 activation domains, the NAhR/NArnt heterodimer forms and specifically binds the cognate XRE sequence, an interaction that is now clearly detectable in the MY1H system. This methodology may be helpful in troubleshooting and correcting false negatives that arise from unproductive transcription in yeast genetic assays.
doi:10.1016/j.ab.2008.07.026
PMCID: PMC2643841  PMID: 18722998
Aryl hydrocarbon receptor; Basic helix–loop–helix; Xenobiotic response element; Protein–protein interaction; Protein:DNA interaction; Yeast one-hybrid
24.  Isolation of plant transcription factors using a modified yeast one-hybrid system 
Plant Methods  2006;2:3.
Background
The preparation of expressional cDNA libraries for use in the yeast two-hybrid system is quick and efficient when using the dedicated Clontech™ product, the MATCHMAKER Library Construction and Screening Kit 3. This kit employs SMART technology for the amplification of full-length cDNAs, in combination with cloning using homologous recombination.
Unfortunately, such cDNA libraries prepared directly in yeast cannot be used for the efficient recovery of purified plasmids and thus are incompatible with existing yeast one-hybrid systems, which use yeast transformation for the library screen.
Results
Here we propose an adaptation of the yeast one-hybrid system for identification and cloning of transcription factors using a MATCHMAKER cDNA library. The procedure is demonstrated using a cDNA library prepared from the liquid part of the multinucleate coenocyte of wheat endosperm. The method is a modification of a standard one-hybrid screening protocol, utilising a mating step to introduce the library construct and reporter construct into the same cell. Several novel full length transcription factors from the homeodomain, AP2 domain and E2F families of transcription factors were identified and isolated.
Conclusion
In this paper we propose a method to extend the compatibility of MATCHMAKER cDNA libraries from yeast two-hybrid screens to one-hybrid screens. The utility of the new yeast one-hybrid technology is demonstrated by the successful cloning from wheat of full-length cDNAs encoding several transcription factors from three different families.
doi:10.1186/1746-4811-2-3
PMCID: PMC1402289  PMID: 16504065
25.  Controlled Clinical Comparison of VersaTREK and BacT/ALERT Blood Culture Systems
Journal of Clinical Microbiology  2006;45(2):299-302.
To assess the relative yields in automated microbial detection systems of bacteria and yeasts isolated from the blood of adult patients with suspected sepsis, we compared the new VersaTREK system (VTI) (TREK Diagnostic Systems, Cleveland, OH) to the BacT/ALERT 3D system (3D) (bioMérieux, Inc., Durham, NC). Identical protocols were followed for the two systems. Paired aerobic (REDOX 1) and anaerobic (REDOX 2) VTI media were compared with standard aerobic (SA) and anaerobic (SN) 3D media; each of the four culture bottles was filled with 6 to 9 ml of blood. All bottles flagged positive by the instruments were subcultured to determine both true-positive (growth) and false-positive (no growth) cultures. Additionally, to assess false-negative bottles, terminal subcultures were done on all negative companion bottles to true-positive bottles. All isolates were identified by standard methods. All 4 bottles were adequately filled in 5,389 (79%) of the 6,786 4-bottle sets obtained, and these sets yielded 413 clinically significant isolates. Although no overall difference in yield or in time to detection was detected between the two systems, significantly more streptococci and enterococci as a group were detected by VTI. Moreover, significantly more microorganisms were detected by VTI for patients receiving antimicrobial therapy. The two systems were comparable (P, not significant) at detecting the 179 unimicrobial episodes of bacteremia seen. False-positive rates for aerobic and anaerobic bottles, respectively, were 1.6% and 0.9% for VTI and 0.7% and 0.8% for 3D. We conclude that the VTI and 3D systems are comparable for detection of bloodstream infections with bacteria or yeasts.
doi:10.1128/JCM.01697-06
PMCID: PMC1829065  PMID: 17122016
