1.  Acute and Chronic Plasma Metabolomic and Liver Transcriptomic Stress Effects in a Mouse Model with Features of Post-Traumatic Stress Disorder 
PLoS ONE  2015;10(1):e0117092.
Acute responses to intense stressors can give rise to post-traumatic stress disorder (PTSD). PTSD diagnostic criteria include trauma exposure history and self-reported symptoms. Individuals who meet PTSD diagnostic criteria often meet criteria for additional psychiatric diagnoses. Biomarkers promise to contribute to reliable phenotypes of PTSD and comorbidities by linking biological system alterations to behavioral symptoms. Here we have analyzed unbiased plasma metabolomics and other stress effects in a mouse model with behavioral features of PTSD. In this model, C57BL/6 mice are repeatedly exposed to a trained aggressor mouse (albino SJL) using a modified, resident-intruder, social defeat paradigm. Our recent studies using this model found that aggressor-exposed mice exhibited acute stress effects including changed behaviors, body weight gain, increased body temperature, as well as inflammatory and fibrotic histopathologies and transcriptomic changes of heart tissue. Some of these acute stress effects persisted, reminiscent of PTSD. Here we report elevated proteins in plasma that function in inflammation and responses to oxidative stress and damaged tissue at 24 hrs post-stressor. Additionally at this acute time point, transcriptomic analysis indicated liver inflammation. The unbiased metabolomics analysis showed altered metabolites in plasma at 24 hrs that only partially normalized toward control levels after stress-withdrawal for 1.5 or 4 wks. In particular, gut-derived metabolites were altered at 24 hrs post-stressor and remained altered up to 4 wks after stress-withdrawal. Also at the 4 wk time point, hyperlipidemia and suppressed metabolites of amino acids and carbohydrates in plasma coincided with transcriptomic indicators of altered liver metabolism (activated xenobiotic and lipid metabolism). 
Collectively, these system-wide sequelae to repeated intense stress suggest that the simultaneous perturbed functioning of multiple organ systems (e.g., brain, heart, intestine and liver) can interact to produce injuries that lead to chronic metabolic changes and disorders that have been associated with PTSD.
PMCID: PMC4309402  PMID: 25629821
2.  Identification of Gene Signatures Used to Recognize Biological Characteristics of Gastric Cancer Upon Gene Expression Data 
Biomarker Insights  2014;9:67-76.
High-throughput gene expression microarrays can be examined by machine-learning algorithms to identify gene signatures that recognize the biological characteristics of specific human diseases, including cancer, with high sensitivity and specificity. A previous study compared 20 gastric cancer (GC) samples against 20 normal tissue (NT) samples and identified 1,519 differentially expressed genes (DEGs). In this study, Classification Information Index (CII), Information Gain Index (IGI), and RELIEF algorithms are used to mine the previously reported gene expression profiling data. In all, 29 of these genes are identified by all three algorithms and are treated as GC candidate biomarkers. Three biomarkers, COL1A2, ATP4B, and HADHSC, are selected and further examined using quantitative real-time polymerase chain reaction (qRT-PCR) and immunohistochemistry (IHC) staining in two independent sets of GC and normal adjacent tissue (NAT) samples. Our study shows that COL1A2 and HADHSC are the two best biomarkers from the microarray data, distinguishing all GC from the NT, whereas ATP4B is diagnostically significant in lab tests because of its wider range of fold-changes in expression. Herein, a data-mining model applicable to small sample sizes is presented and discussed. Our results suggest that this mining model may be useful in small sample-size studies to identify putative biomarkers and potential biological features of GC.
PMCID: PMC4149392  PMID: 25210421
gastric cancer; gene signature; microarray; machine-learning algorithm
3.  The differential processing of telomeres in response to increased telomeric transcription and RNA–DNA hybrid accumulation 
RNA Biology  2014;11(2):95-100.
Telomeres are protective nucleoprotein structures at the ends of eukaryotic chromosomes. Despite the heterochromatic state of telomeres they are transcribed, generating non-coding telomeric repeat-containing RNA (TERRA). Strongly induced TERRA transcription has been shown to cause telomere shortening and accelerated senescence in the absence of both telomerase and homology-directed repair (HDR). Moreover, it has recently been demonstrated that TERRA forms RNA–DNA hybrids at chromosome ends. The accumulation of RNA–DNA hybrids at telomeres also leads to rapid senescence and telomere loss in the absence of telomerase and HDR. Conversely, in the presence of HDR, telomeric RNA–DNA hybrid accumulation and increased telomere transcription promote telomere recombination, and hence, delayed senescence. Here, we demonstrate that despite these similar phenotypic outcomes, telomeres that are highly transcribed are not processed in the same manner as those that accumulate RNA–DNA hybrids.
PMCID: PMC3973735  PMID: 24525824
TERRA; telomere; senescence; Exo1; RNA-DNA hybrid; R-loop; RNase H
4.  Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data 
PLoS ONE  2013;8(12):e80503.
As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.
PMCID: PMC3846626  PMID: 24312478
5.  Guanine Holes Are Prominent Targets for Mutation in Cancer and Inherited Disease 
PLoS Genetics  2013;9(9):e1003816.
Single base substitutions constitute the most frequent type of human gene mutation and are a leading cause of cancer and inherited disease. These alterations occur non-randomly in DNA, being strongly influenced by the local nucleotide sequence context. However, the molecular mechanisms underlying such sequence context-dependent mutagenesis are not fully understood. Using bioinformatics, computational and molecular modeling analyses, we have determined the frequencies of mutation at G•C bp in the context of all 64 5′-NGNN-3′ motifs that contain the mutation at the second position. Twenty-four datasets were employed, comprising >530,000 somatic single base substitutions from 21 cancer genomes, >77,000 germline single-base substitutions causing or associated with human inherited disease and 16.7 million benign germline single-nucleotide variants. In several cancer types, the number of mutated motifs correlated both with the free energies of base stacking and the energies required for abstracting an electron from the target guanines (ionization potentials). Similar correlations were also evident for the pathological missense and nonsense germline mutations, but only when the target guanines were located on the non-transcribed DNA strand. Likewise, pathogenic splicing mutations predominantly affected positions in which a purine was located on the non-transcribed DNA strand. Novel candidate driver mutations and tissue-specific mutational patterns were also identified in the cancer datasets. We conclude that electron transfer reactions within the DNA molecule contribute to sequence context-dependent mutagenesis, involving both somatic driver and passenger mutations in cancer, as well as germline alterations causing or associated with inherited disease.
Author Summary
A large number of DNA mutations identified in cells from patients with cancer or human inherited disease were analyzed to address a fundamental issue in human pathology, viz, the mutational mechanisms that cause irreversible changes to DNA. By using bioinformatics and computational methods, we found that mutations do not occur randomly, but instead affect specific bases, most often guanines flanked by other guanines or adenines. We attribute this effect to electron transfer, a chemical reaction known to underlie basic biological processes such as cellular respiration and photosynthesis. Certain types of carcinogens, oxidants or radiation can interact with DNA and abstract an electron. Our results imply that the ensuing sites of electron loss can migrate from their original position in the DNA to neighboring guanines where they become trapped, leading to further chemical modifications that may eventually result in mutations. Many of the mutations known to be important for tumor growth (driver mutations), as well as passenger mutations and mutations associated with inherited disease, appear to be caused by electron transfer. Beyond pathological mutations, electron transfer may represent a universal mechanism by which genetic changes occur in all life forms to drive population fitness over evolutionary time.
PMCID: PMC3784513  PMID: 24086153
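The motif bookkeeping described in this abstract can be illustrated with a minimal Python sketch. It enumerates the 64 possible 5′-NGNN-3′ contexts and tallies the context of each mutated guanine in a sequence. The function names, the input format (a DNA string plus 0-based mutation positions) and the restriction to guanines on the given strand are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def ngnn_motifs():
    """Return all 64 four-base motifs with G fixed at the second position."""
    return [a + "G" + b + c for a, b, c in product(BASES, repeat=3)]


def count_ngnn_contexts(sequence, mutated_positions):
    """Tally the 5'-NGNN-3' context of each mutated guanine.

    sequence: DNA string read 5'->3' (hypothetical input format).
    mutated_positions: 0-based indices of mutated bases.
    Positions that are not G, or that lack one upstream and two
    downstream unambiguous bases, are skipped.
    """
    sequence = sequence.upper()
    counts = Counter({motif: 0 for motif in ngnn_motifs()})
    for pos in mutated_positions:
        if pos < 1 or pos + 2 >= len(sequence):
            continue  # context window would run off the sequence
        context = sequence[pos - 1: pos + 3]
        if context in counts:  # implies the second base is G
            counts[context] += 1
    return counts
```

A call such as `count_ngnn_contexts("AGGATGCA", [1, 2, 5])` tallies one hit each for AGGA, GGAT and TGCA. Mutations at the paired cytosine on the opposite strand would need the reverse complement, which this sketch leaves out.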
6.  Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools 
Nucleic Acids Research  2012;41(Database issue):D94-D100.
The non-B DB, available at, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
PMCID: PMC3531222  PMID: 23125372
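As an illustration of sequence-based motif prediction, the sketch below scans a DNA string for putative G-quadruplex motifs using a widely used heuristic: four runs of at least three guanines separated by loops of one to seven bases. The abstract does not state the thresholds used by non-B DB, so this pattern and the function name are assumptions and may differ from the database's own definitions.

```python
import re

# Heuristic pattern (an assumption, not the database's stated rule):
# G-run, loop, G-run, loop, G-run, loop, G-run.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(sequence):
    """Return (start, end, motif) for each non-overlapping match.

    Coordinates are 0-based and end-exclusive. Only the given strand
    is scanned; C-rich matches on the complement are not reported.
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group())
            for m in G4_PATTERN.finditer(sequence)]
```

Because `finditer` returns non-overlapping matches, overlapping candidate motifs within a long G-rich tract are reported only once.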
7.  Genes affected by mouse mammary tumor virus (MMTV) proviral insertions in mouse mammary tumors are deregulated or mutated in primary human mammary tumors 
Oncotarget  2012;3(11):1320-1334.
The accumulation of mutations is a contributing factor in the initiation of premalignant mammary lesions and their progression to malignancy and metastasis. We have used a mouse model in which the carcinogen is the mouse mammary tumor virus (MMTV), which induces clonal premalignant mammary lesions and malignant mammary tumors by insertional mutagenesis. Identification of the genes and signaling pathways affected in MMTV-induced mouse mammary lesions provides a rationale for determining whether genetic alteration of the human orthologues of these genes/pathways may contribute to human breast carcinogenesis. A high-throughput platform for inverse PCR to identify MMTV-host junction fragments and their nucleotide sequences in a large panel of MMTV-induced lesions was developed. Validation of the genes affected by MMTV insertion was carried out by microarray analysis. A common integration site (CIS) is defined as a gene altered by an MMTV proviral insertion in at least two independent lesions arising in different hosts. Three of the new genes identified as CIS for MMTV were assayed for their capability to confer on HC11 mouse mammary epithelial cells the ability for invasion, anchorage-independent growth and tumor development in nude mice. Analysis of MMTV-induced mammary premalignant hyperplastic outgrowth (HOG) lines and mammary tumors led to the identification of CIS restricted to 35 loci. Within these loci, members of the Wnt, Fgf and Rspo gene families plus two linked genes (Npm3 and Ddn) were frequently activated in tumors induced by MMTV. A second group of 15 CIS occur at a low frequency (2-5 observations) in mammary HOGs or tumors. In this latter group, the expression of either Phf19 or Sdc2 was shown to increase the invasion capability of HC11 cells. Foxl1 expression conferred on HC11 cells the capability for anchorage-independent colony formation in soft agar and tumor development in nude mice. 
The published transcriptome and nucleotide sequence analysis of gene expression in primary human breast tumors was interrogated. Twenty of the human orthologues of MMTV CIS associated genes are deregulated and/or mutated in human breast tumors.
PMCID: PMC3717796  PMID: 23131872
mouse mammary tumor virus; premalignant lesions; mammary tumors; genes; human breast carcinomas; metastases
8.  An Optimized Method for Computing 18O/16O Ratios of Differentially Stable-isotope Labeled Peptides in the Context of Post-digestion 18O Exchange/Labeling 
Analytical chemistry  2010;82(13):5878-5886.
Differential 18O/16O stable isotope labeling of peptides, which relies on enzyme-catalyzed oxygen exchange at their carboxyl termini in the presence of H2(18)O, has been widely used for relative quantitation of peptides/proteins. The role of tryptic proteolysis in bottom-up shotgun proteomics, together with low reagent costs, has made trypsin-catalyzed 18O post-digestion exchange a convenient and affordable stable isotope labeling approach. However, it is known that trypsin-catalyzed 18O exchange at the carboxyl terminus is in many instances inhomogeneous/incomplete. The extent of the 18O exchange/incorporation fluctuates from peptide to peptide, mostly due to variable enzyme-substrate affinity. Thus, accurate calculation and interpretation of peptide ratios are analytically complicated and in some regards deficient. Therefore, a computational approach capable of improved measurement of actual 18O incorporation for each differentially labeled peptide pair is needed. In this regard, we have developed an algorithmic method that relies on the trapezoidal rule to integrate peak intensities of all detected isotopic species across a particular peptide ion over the retention time, and fits the isotopic manifold to Poisson distributions. Optimal values for manifold fitting were calculated and then 18O/16O ratios derived via evolutionary programming. The algorithm is tested using trypsin-catalyzed 18O post-digestion exchange to differentially label bovine serum albumin (BSA) at a priori determined ratios. Both accuracy and precision are improved utilizing this rigorous mathematical approach. Utilizing this algorithmic technique, we demonstrate the effectiveness of this method to accurately calculate 18O/16O ratios for differentially labeled BSA peptides, by accounting for artifacts caused by a variable degree of post-digestion 18O exchange. 
We further demonstrate the effectiveness of this method to accurately calculate 18O/16O ratios in a large scale proteomic quantitation of detergent resistant membrane microdomains (DRMMs) isolated from cells expressing wild-type HIV-1 Gag and its non myristylated mutant.
PMCID: PMC3479679  PMID: 20540505
quantitation; 18O/16O stable isotope labeling; variable/incomplete 18O exchange
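The integration step named in this abstract, the trapezoidal rule applied to peak intensities over retention time, can be sketched in a few lines of Python. The sketch covers only that step. The Poisson fitting of the isotopic manifold and the evolutionary programming search are not reproduced, and the function names and the simple heavy/light area ratio are illustrative assumptions, not the authors' implementation.

```python
def trapezoid_area(times, intensities):
    """Integrate intensity over retention time with the trapezoidal rule.

    times: retention times in increasing order.
    intensities: peak intensity observed at each time point.
    """
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have equal length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = 0.5 * (intensities[i] + intensities[i - 1])
        area += width * mean_height
    return area


def naive_heavy_to_light_ratio(times, light, heavy):
    """Ratio of integrated heavy (18O) to light (16O) signal.

    This naive ratio does NOT correct for incomplete 18O exchange,
    which is the artifact the published method is designed to handle.
    """
    return trapezoid_area(times, heavy) / trapezoid_area(times, light)
```

For a triangular peak sampled at times 0, 1 and 2 with intensities 0, 2 and 0, the integrated area is 2.0. Doubling the intensities of the heavy channel gives a naive ratio of 2.0.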
9.  Rif2 Promotes a Telomere Fold-Back Structure through Rpd3L Recruitment in Budding Yeast 
PLoS Genetics  2012;8(9):e1002960.
Using a genome-wide screening approach, we have established the genetic requirements for proper telomere structure in Saccharomyces cerevisiae. We uncovered 112 genes, many of which have not previously been implicated in telomere function, that are required to form a fold-back structure at chromosome ends. Among other biological processes, lysine deacetylation, through the Rpd3L, Rpd3S, and Hda1 complexes, emerged as being a critical regulator of telomere structure. The telomeric-bound protein, Rif2, was also found to promote a telomere fold-back through the recruitment of Rpd3L to telomeres. In the absence of Rpd3 function, telomeres have an increased susceptibility to nucleolytic degradation, telomere loss, and the initiation of premature senescence, suggesting that an Rpd3-mediated structure may have protective functions. Together these data reveal that multiple genetic pathways may directly or indirectly impinge on telomere structure, thus broadening the potential targets available to manipulate telomere function.
Author Summary
Impaired telomere elongation eventually results in telomere dysfunction and can lead to diseases such as dyskeratosis congenita, which is associated with bone-marrow failure and pulmonary fibrosis. Cancer cells require continuous telomere maintenance to ensure continued cellular proliferation. Therefore the regulation of telomere function, both positively (in the case of dyskeratosis congenita) and negatively (for cancer), may be of therapeutic benefit. In this study we have used yeast to determine which genetic factors are important for a certain telomeric structure (the loop structure), which may help to maintain chromosome ends in a protected state. We found that multiple genetic factors and pathways affect telomere structure, ranging from metabolic signaling to specific telomere-binding proteins. We found that proper chromatin structure at the telomere is essential to maintain a telomere fold-back structure. Importantly, there was a strong correlation between telomere structure and function, as the mutants found in our screen (looping defective) were often associated with rapid senescence and telomere dysfunction phenotypes. We believe that, through the regulation of the various genetic pathways uncovered in our screen, one may be able to both positively and negatively influence telomere function.
PMCID: PMC3447961  PMID: 23028367
10.  The Mph1 Helicase Can Promote Telomere Uncapping and Premature Senescence in Budding Yeast 
PLoS ONE  2012;7(7):e42028.
Double strand breaks (DSBs) can be repaired via either Non-Homologous End Joining (NHEJ) or Homology directed Repair (HR). Telomeres, which resemble DSBs, are refractory to repair events in order to prevent chromosome end fusions and genomic instability. In some rare instances telomeres engage in Break-Induced Replication (BIR), a type of HR, in order to maintain telomere length in the absence of the enzyme telomerase. Here we have investigated how the yeast helicase, Mph1, affects DNA repair at both DSBs and telomeres. We have found that overexpressed Mph1 strongly inhibits BIR at internal DSBs but allows it to proceed at telomeres. Furthermore, while overexpressed Mph1 potently inhibits NHEJ at telomeres, it has no effect on NHEJ at DSBs within the chromosome. At telomeres, Mph1 is able to promote telomere uncapping and the accumulation of ssDNA, which results in premature senescence in the absence of telomerase. We propose that Mph1 is able to direct repair towards HR (thereby inhibiting NHEJ) at telomeres by remodeling them into a nuclease-sensitive structure, which promotes the accumulation of a recombinogenic ssDNA intermediate. We thus put forward that Mph1 is a double-edged sword at the telomere: it prevents NHEJ but promotes senescence in cells with dysfunctional telomeres by increasing the levels of ssDNA.
PMCID: PMC3407055  PMID: 22848695
11.  Deregulated telomere transcription causes replication-dependent telomere shortening and promotes cellular senescence 
Nucleic Acids Research  2012;40(14):6649-6659.
Telomeres are transcribed into non-coding TElomeric Repeat containing RNAs (TERRA). We have employed a transcriptionally inducible telomere to investigate how telomere transcription affects telomere function in Saccharomyces cerevisiae. We report that telomere shortening resulting from high levels of telomere transcription stems from a DNA replication-dependent loss of telomere tracts, which can occur independent of both telomerase inhibition and homologous recombination. We show that in order for telomere loss to occur, transcription must pass through the telomere tract itself producing a TERRA molecule. We demonstrate that increased telomere transcription of a single telomere leads to a premature cellular senescence in the absence of a telomere maintenance mechanism (telomerase and homology directed repair). Similar rapid senescence and telomere shortening are also seen in sir2Δ cells with compromised telomere maintenance, where TERRA levels are increased at natural telomeres. These data suggest that telomere transcription must be tightly controlled to prevent telomere loss and early onset senescence.
PMCID: PMC3413150  PMID: 22553368
12.  The Role of Methylation in the Intrinsic Dynamics of B- and Z-DNA 
PLoS ONE  2012;7(4):e35558.
Methylation of cytosine at the 5-carbon position (5mC) is observed in both prokaryotes and eukaryotes. In humans, DNA methylation at CpG sites plays an important role in gene regulation and has been implicated in development, gene silencing, and cancer. In addition, the CpG dinucleotide is a known hot spot for pathologic mutations genome-wide. CpG tracts may adopt left-handed Z-DNA conformations, which have also been implicated in gene regulation and genomic instability. Methylation facilitates this B-Z transition but the underlying mechanism remains unclear. Herein, four structural models of the dinucleotide d(GC)5 repeat sequence in B-, methylated B-, Z-, and methylated Z-DNA forms were constructed and an aggregate 100 nanoseconds of molecular dynamics simulations in explicit solvent under physiological conditions was performed for each model. Both unmethylated and methylated B-DNA were found to be more flexible than Z-DNA. However, methylation significantly destabilized the BII, relative to the BI, state through the Gp5mC steps. In addition, methylation decreased the free energy difference between B- and Z-DNA. Comparisons of α/γ backbone torsional angles showed that torsional states changed marginally upon methylation for B-DNA, and Z-DNA. Methylation-induced conformational changes and lower energy differences may contribute to the transition to Z-DNA by methylated, over unmethylated, B-DNA and may be a contributing factor to biological function.
PMCID: PMC3328458  PMID: 22530050
13.  Getting in (and out of) the loop: regulating higher order telomere structures 
Frontiers in Oncology  2012;2:180.
The DNA at the ends of linear chromosomes (the telomere) folds back onto itself and forms an intramolecular lariat-like structure. Although the telomere loop has been implicated in the protection of chromosome ends from nuclease-mediated resection and unscheduled DNA repair activities, it potentially poses an obstacle to the DNA replication machinery during S-phase. Therefore, the coordinated regulation of telomere loop formation, maintenance, and resolution is required in order to establish a balance between protecting the chromosome ends and promoting their duplication prior to cell division. Until recently, the only factor known to influence telomere looping in human cells was TRF2, a component of the shelterin complex. Recent work in yeast and mouse cells has uncovered additional regulatory factors that affect the loop structure at telomeres. In the following “perspective” we outline what is known about telomere looping and highlight the latest results regarding the regulation of this chromosome end structure. We speculate about how the manipulation of the telomere loop may have therapeutic implications in terms of diseases associated with telomere dysfunction and uncontrolled proliferation.
PMCID: PMC3510458  PMID: 23226680
t-loop; telomere; RTEL1; end protection; cancer; Mph1
15.  Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes 
Nucleic Acids Research  2010;39(Database issue):D383-D391.
Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at
PMCID: PMC3013731  PMID: 21097885
16.  18O Stable Isotope Labeling in MS-based Proteomics 
A variety of stable isotope labeling techniques have been developed and used in mass spectrometry (MS)-based proteomics, primarily for relative quantitation of changes in protein abundances between two compared samples, but also for qualitative characterization of differentially labeled proteomes. Differential 16O/18O coding relies on the 18O exchange that takes place at the C-terminal carboxyl group of proteolytic fragments, where two 16O atoms are typically replaced by two 18O atoms by enzyme-catalyzed oxygen exchange in the presence of H2(18)O. The resulting mass shift between differentially labeled peptide ions permits identification, characterization and quantitation of proteins from which the peptides are proteolytically generated. This review focuses on the utility of 16O/18O labeling within the context of mass spectrometry-based proteome research. Different strategies employing 16O/18O are examined in the context of global comparative proteome profiling, targeted subcellular proteomics, analysis of post-translational modifications and biomarker discovery. Also discussed are analytical issues related to this technique, including variable 18O exchange along with advantages and disadvantages of 16O/18O labeling in comparison with other isotope-coding techniques.
PMCID: PMC2722262  PMID: 19151093
18O labeling; enzyme-mediated isotope incorporation; stable isotope labeling; MS-based proteomics; relative protein quantitation; LC/MS/MS
17.  TERRA: telomeric repeat-containing RNA 
The EMBO Journal  2009;28(17):2503-2510.
Telomeres, the physical ends of eukaryotic chromosomes, consist of tandem arrays of short DNA repeats and a large set of specialized proteins. A recent analysis has identified telomeric repeat-containing RNA (TERRA), a large non-coding RNA in animals and fungi, which forms an integral component of telomeric heterochromatin. TERRA transcription occurs at most or all chromosome ends and it is regulated by RNA surveillance factors and in response to changes in telomere length. TERRA functions that are emerging suggest important roles in the regulation of telomerase and in orchestrating chromatin remodelling throughout development and cellular differentiation. The accumulation of TERRA at telomeres can also interfere with telomere replication, leading to a sudden loss of telomere tracts. Such a phenotype can be observed upon impairment of the RNA surveillance machinery or in cells from ICF (Immunodeficiency, Centromeric region instability, Facial anomalies) patients, in which TERRA is upregulated because of DNA methylation defects in the subtelomeric region. Thus, TERRA may mediate several crucial functions at the telomeres, a region of the genome that had been considered to be transcriptionally silent.
PMCID: PMC2722245  PMID: 19629047
chromatin; non-coding RNA; telomerase; telomeres; TERRA
18.  Examining the significance of fingerprint-based classifiers 
BMC Bioinformatics  2008;9:545.
Experimental examinations of biofluids to measure concentrations of proteins or their fragments or metabolites are being explored as a means of early disease detection, distinguishing diseases with similar symptoms, and drug treatment efficacy. Many studies have produced classifiers with a high sensitivity and specificity, and it has been argued that accurate results necessarily imply some underlying biology-based features in the classifier. The simplest test of this conjecture is to examine datasets designed to contain no information with classifiers used in many published studies.
The classification accuracy of two fingerprint-based classifiers, a decision tree (DT) algorithm and a medoid classification algorithm (MCA), is examined. These methods are used to examine 30 artificial datasets that contain random concentration levels for 300 biomolecules. Each dataset contains between 30 and 300 Cases and Controls, and since the 300 observed concentrations are randomly generated, these datasets are constructed to contain no biological information. A modest search of decision trees containing at most seven decision nodes finds a large number of unique decision trees with an average sensitivity and specificity above 85% for datasets containing 60 Cases and 60 Controls or fewer, and for datasets with 90 Cases and 90 Controls many DTs have an average sensitivity and specificity above 80%. For even the largest dataset (300 Cases and 300 Controls) the MCA procedure finds several unique classifiers that have an average sensitivity and specificity above 88% using only six or seven features.
While it has been argued that accurate classification results must imply some biological basis for the separation of Cases from Controls, our results show that this is not necessarily true. The DT and MCA classifiers are sufficiently flexible and can produce good results from datasets that are specifically constructed to contain no information. This means that a chance fitting to the data is possible. All datasets used in this investigation are available on the web.
This work is funded by NCI Contract N01-CO-12400.
PMCID: PMC2628908  PMID: 19091087
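The chance-fitting effect described in this abstract can be reproduced in miniature. The Python sketch below generates random "concentrations" for 300 features in 30 Cases and 30 Controls, then searches for the single-threshold rule with the best apparent accuracy. A one-node threshold rule is a deliberate simplification standing in for the paper's decision trees and medoid classifier. The function names, seed and dataset size are illustrative assumptions.

```python
import random


def best_stump_accuracy(values, labels):
    """Best training accuracy of a one-threshold rule on one feature.

    labels: 1 for Case, 0 for Control. Both rule directions are tried.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: values[i])
    total_cases = sum(labels)
    best = max(total_cases, n - total_cases) / n  # always-one-class rule
    cases_below = 0
    for k, i in enumerate(order, start=1):
        cases_below += labels[i]
        controls_below = k - cases_below
        # Rule A: at or below threshold -> Control, above -> Case.
        correct_a = controls_below + (total_cases - cases_below)
        # Rule B is the mirror image of rule A.
        correct_b = n - correct_a
        best = max(best, correct_a / n, correct_b / n)
    return best


def chance_fit(n_cases=30, n_controls=30, n_features=300, seed=0):
    """Best apparent accuracy over purely random features."""
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * n_controls
    best = 0.0
    for _ in range(n_features):
        values = [rng.random() for _ in labels]
        best = max(best, best_stump_accuracy(values, labels))
    return best
```

Because the features carry no information about the labels, any accuracy above 50% returned by `chance_fit()` comes purely from searching many candidate rules, which is the point the abstract makes about flexible classifiers on small datasets.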
19.  Increased serum levels of complement C3a anaphylatoxin indicate the presence of colorectal tumors 
Gastroenterology  2006;131(4):1020-1284.
Background & Aims
Late diagnosis of colorectal carcinomas results in a significant reduction of average survival times. Yet, despite screening programs about 70% of tumors are detected at advanced stages (UICC III/IV). We explored whether detection of malignant disease would be possible through identification of tumor specific protein biomarkers in serum samples.
A discovery set of sera from patients with colorectal malignancy (n=58) and healthy control individuals (n=32) was screened for potential differences using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). Candidate proteins were identified, and their expression levels validated in independent sample sets using a specific immunoassay (ELISA).
Utilizing class comparison and custom developed algorithms we identified several m/z values that were differentially expressed between the malignant samples and the healthy controls of the discovery set. Characterization of the most prominent m/z values revealed a member of the complement system, the stable form of C3a anaphylatoxin, i.e., C3a-desArg. Based on a specific ELISA, serum levels of complement C3a-desArg predicted the presence of colorectal malignancy in a blinded validation set (n=59) with a sensitivity of 96.8% and a specificity of 96.2%. Increased serum levels were also detected in 86.1% of independently collected sera from patients with colorectal adenomas (n=36), while only 5.6% were classified as normal.
Complement C3a-desArg is present at significantly higher levels in serum from patients with colorectal adenomas (p<0.0001) and carcinomas (p<0.0001) than in healthy individuals. This suggests that quantification of C3a-desArg levels could improve existing screening tests for colorectal cancer.
PMCID: PMC2532535  PMID: 17030172
Colorectal Cancer; Polyps; Screening; Serum; SELDI-TOF MS; C3a-desArg
20.  Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions 
BMC Bioinformatics  2008;9:146.
The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize the likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives.
We developed the Polymorphism Interaction Analysis tool (PIA version 2.0) to include different approaches for ranking and scoring SNP combinations, to account for imbalances between case and control ratios, stratify on particular factors, and examine associations of user-defined pathways (based on SNP or gene) with case status. PIA v. 2.0 detected 2-SNP interactions as the highest ranking model 77% of the time, using simulated data sets of genetic models of interaction (minor allele frequency = 0.2; heritability = 0.01; N = 1600) generated previously [Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH: A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 2007, 31:306–315.]. Interacting SNPs were detected in both balanced (20 SNPs) and imbalanced data (case:control 1:2 and 1:4, 10 SNPs) in the context of non-interacting SNPs.
PIA v. 2.0 is a useful tool for exploring gene*gene or gene*environment interactions and identifying a small number of putative associations which may be investigated further using other statistical methods and in replication study populations.
PMCID: PMC2335300  PMID: 18325117
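The abstract above mentions accounting for imbalanced case:control ratios and cites a balanced accuracy function. As a minimal sketch of why that matters, balanced accuracy is commonly defined as the mean of sensitivity and specificity; whether PIA scores SNP combinations in exactly this way is an assumption here, and the counts below are hypothetical.

```python
def plain_accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity; insensitive to class ratio."""
    sensitivity = tp / (tp + fn)  # fraction of cases called cases
    specificity = tn / (tn + fp)  # fraction of controls called controls
    return (sensitivity + specificity) / 2


# Hypothetical 1:4 case:control dataset (100 cases, 400 controls) and a
# useless model that labels every sample a control.
tp, fn, tn, fp = 0, 100, 400, 0

acc = plain_accuracy(tp, fn, tn, fp)
bal = balanced_accuracy(tp, fn, tn, fp)
print(f"plain accuracy: {acc:.2f}, balanced accuracy: {bal:.2f}")
```

In this hypothetical case the uninformative model scores 0.80 on plain accuracy but only 0.50 on balanced accuracy, which is why a balanced measure is the safer ranking criterion when cases and controls are unequal in number.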
21.  Exploring SNP-SNP interactions and colon cancer risk using polymorphism interaction analysis 
Several single nucleotide polymorphisms (SNPs) in genes derived from distinct pathways are associated with colon cancer risk; however, few studies have examined SNP-SNP interactions concurrently. We explored the association between colon cancer and 94 SNPs, using a novel approach, polymorphism interaction analysis (PIA). We developed PIA to examine all possible SNP combinations, based on the 94 SNPs studied in 216 male colon cancer cases and 255 male controls, employing 2 separate functions that cross-validate and minimize false-positive results in the evaluation of SNP combinations to predict colon cancer risk. PIA identified previously described null polymorphisms in glutathione-S-transferase T1 (GSTT1) as the best predictor of colon cancer among the studied SNPs, and also identified novel polymorphisms in the inflammation and hormone metabolism pathways that singly or jointly predict cancer risk. PIA identified SNPs that may interact with the GSTT1 polymorphism, including coding polymorphisms in TP53 (Arg72Pro in p53) and CASP8 (Asp302His in caspase 8), which may modify the association between this polymorphism and colon cancer. This was confirmed by logistic regression, as the GSTT1 null polymorphism in combination with either the TP53 or the CASP8 polymorphism significantly alters colon cancer risk (p_interaction < 0.02 for both). GSTT1 prevents DNA damage by detoxifying mutagenic compounds, while the p53 protein facilitates repair of DNA damage and induces apoptosis, and caspase 8 is activated in p53-mediated apoptosis. Our results suggest that PIA is a valid method for suggesting SNP-SNP interactions that may be validated in future studies, using more traditional statistical methods on different datasets. (Supplementary material can be found on the International Journal of Cancer website.)
PMCID: PMC1451415  PMID: 16217767
polymorphism interaction analysis; single nucleotide polymorphism; colon cancer
22.  Saccharomyces cerevisiae Ebs1p is a putative ortholog of human Smg7 and promotes nonsense-mediated mRNA decay 
Nucleic Acids Research  2007;35(22):7688-7697.
The Smg proteins Smg5, Smg6 and Smg7 are involved in nonsense-mediated RNA decay (NMD) in metazoans, but no orthologs have been found in the budding yeast Saccharomyces cerevisiae. Sequence alignments reveal that yeast Ebs1p is similar in structure to the human Smg5-7, with highest homology to Smg7. We demonstrate here that Ebs1p is involved in NMD and behaves similarly to human Smg proteins. Indeed, both loss and overexpression of Ebs1p result in stabilization of NMD targets. However, Ebs1-loss in yeast or Smg7-depletion in human cells only partially disrupts NMD and in the latter, Smg7-depletion is partially compensated for by Smg6. Ebs1p physically interacts with the NMD helicase Upf1p and overexpressed Ebs1p leads to recruitment of Upf1p into cytoplasmic P-bodies. Furthermore, Ebs1p localizes to P-bodies upon glucose starvation along with Upf1p. Overall our findings suggest that NMD is more conserved in evolution than previously thought, and that at least one of the Smg5-7 proteins is conserved in budding yeast.
PMCID: PMC2190716  PMID: 17984081
23.  The Use of Urine Proteomic and Metabonomic Patterns for the Diagnosis of Interstitial Cystitis and Bacterial Cystitis 
Disease markers  2004;19(4-5):169-183.
The advent of systems biology approaches that have stemmed from the sequencing of the human genome has led to the search for new methods to diagnose diseases. While much effort has been focused on the identification of disease-specific biomarkers, recent efforts are underway toward the use of proteomic and metabonomic patterns to indicate disease. We have developed and contrasted the use of both proteomic and metabonomic patterns in urine for the detection of interstitial cystitis (IC). The methodology relies on advanced bioinformatics to scrutinize information contained within mass spectrometry (MS) and high-resolution proton nuclear magnetic resonance (1H-NMR) spectral patterns to distinguish IC-affected from non-affected individuals as well as those suffering from bacterial cystitis (BC). We have applied a novel pattern recognition tool that employs an unsupervised system (self-organizing-type cluster mapping) as a fitness test for a supervised system (a genetic algorithm). With this approach, a training set composed of mass spectra and 1H-NMR spectra from urine derived from either unaffected individuals or patients with IC is employed so that the most fit combination of relative, normalized intensity features defined at precise m/z or chemical shift values plotted in n-space can reliably distinguish the cohorts used in training. Using this bioinformatic approach, we were able to discriminate spectral patterns associated with IC-affected, BC-affected, and unaffected patients with a success rate of approximately 84%.
PMCID: PMC3850593  PMID: 15258332
24.  An analysis of a preoperative pediatric autologous blood donation program 
Canadian Journal of Surgery  2000;43(2):125-129.
To determine the efficacy of a pediatric autologous blood donation program.
A retrospective study of patient charts and blood-bank records.
The Children’s Hospital of Eastern Ontario, Ottawa, a tertiary care, pediatric centre.
One hundred and seventy-three children who received blood transfusions for a total of 182 procedures between June 1987 and June 1997.
Autologous and homologous blood transfusion required for major surgical intervention, primarily spinal fusion.
Main outcome measures
Surgeons’ accuracy in predicting the number of autologous blood units required for a given procedure, compliance rate (children’s ability to donate the requested volume of blood), utilization rate of autologous units and rate of allogeneic transfusion.
The surgeons’ accuracy in predicting the number of autologous units required for a given procedure was 53.8%. The compliance rate of children to donate the requested amount of blood was 80.3%. In children below the standard age and weight criteria for blood donation the compliance rate was 75.5%. The utilization rate of autologous units obtained was 84.4% and the incidence of allogeneic transfusion was 26.6%.
There was a high rate of compliance and utilization of predonated autologous blood in the children in the study. Preoperative blood donation programs are safe and effective in children, even in those below the standard age and weight criteria of 10 years and 40 kg.
PMCID: PMC3695125  PMID: 10812347
