In an era of biodiversity crisis, arthropods have great potential to inform conservation assessment and test hypotheses about community assembly. This is because their relatively narrow geographic distributions and high diversity offer high-resolution data on landscape-scale patterns of biodiversity. However, a major impediment to the wider application of arthropod data to a range of scientific and policy questions is the poor state of modern arthropod taxonomy, especially in the tropics. Inventories of spiders and other megadiverse arthropods from tropical forests are dominated by undescribed species. Such studies typically organize their data using morphospecies codes, which make it difficult to compare and combine data from independent inventories. To address this shortcoming, we offer cyberdiversity, an online, community-based approach for reconciling the results of independent inventory studies where current taxonomic knowledge is incomplete. Participating scientists can upload images and DNA barcode sequences to dedicated databases and submit occurrence data and links to a website (www.digitalSpiders.org). Taxonomic determinations can be shared through a crowdsourced commenting feature, and researchers can discover specimens of interest available for loan and request aliquots of genomic DNA extract. To demonstrate the value of the cyberdiversity framework, we reconcile data from three rapid structured inventories of spiders conducted in Vietnam with an independent inventory (Doi Inthanon, Thailand) using online image libraries. Species richness and inventory completeness were assessed using non-parametric estimators. Community similarity was evaluated using a novel index based on the Jaccard index, in which observed values are replaced with estimated values to correct for unobserved species.
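The abstract names non-parametric richness estimators and a Jaccard-based index that substitutes estimated for observed values, but gives no formulas. The sketch below assumes the bias-corrected Chao1 estimator and derives estimated shared richness by inclusion-exclusion from the pooled sample; both choices, the function names and the toy counts are illustrative assumptions, not the index defined in the study.

```python
from collections import Counter


def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from per-species abundance counts."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    # The bias-corrected form stays finite when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def observed_jaccard(site_a, site_b):
    """Classic Jaccard index from species presence: shared / union."""
    a, b = set(site_a), set(site_b)
    return len(a & b) / len(a | b)


def estimated_jaccard(site_a, site_b):
    """Jaccard index with every richness value replaced by a Chao1 estimate.

    Shared richness comes from inclusion-exclusion on the pooled sample:
    J = (S_A + S_B - S_AB) / S_AB. Illustrative assumption only; with very
    sparse sampling the result can fall outside [0, 1].
    """
    pooled = Counter(site_a) + Counter(site_b)
    s_a = chao1(site_a.values())
    s_b = chao1(site_b.values())
    s_ab = chao1(pooled.values())
    return (s_a + s_b - s_ab) / s_ab


# Hypothetical morphospecies counts for two inventories.
site_a = {"sp1": 5, "sp2": 1, "sp3": 2, "sp4": 1}
site_b = {"sp1": 3, "sp3": 1, "sp5": 1, "sp6": 4}
print(round(observed_jaccard(site_a, site_b), 3))   # 0.333
print(round(estimated_jaccard(site_a, site_b), 3))  # 0.056
```

With many singletons, the estimated union grows faster than the estimated site totals, so the corrected similarity can drop well below the observed one.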
We use a distance-decay framework to demonstrate a rudimentary model of landscape-scale changes in community composition that will become increasingly informative as additional inventories participate. With broader adoption of the cyberdiversity approach, networks of information-sharing taxonomists can more efficiently and effectively address taxonomic impediments while elucidating landscape-scale patterns of biodiversity.
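A distance-decay model relates pairwise community similarity to geographic distance. Assuming an exponential form, S(d) = a * exp(-b * d), the parameters can be estimated by ordinary least squares on ln S. The model form and the function names are assumptions; the abstract does not specify them.

```python
import math


def fit_distance_decay(distances, similarities):
    """Least-squares fit of ln(similarity) = ln(a) - b * distance.

    Returns (a, b); b > 0 means similarity decays with distance. Pairs with
    zero similarity are dropped because ln(0) is undefined.
    """
    pairs = [(d, math.log(s)) for d, s in zip(distances, similarities) if s > 0]
    n = len(pairs)
    mean_d = sum(d for d, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pairs)
    sxx = sum((d - mean_d) ** 2 for d, _ in pairs)
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_d), -slope


def halving_distance(b):
    """Distance over which predicted similarity halves under the fitted model."""
    return math.log(2) / b
```

Each additional inventory contributes one new pairwise (distance, similarity) point per existing site, which is why the fit sharpens as participation grows.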
Neuroanatomically precise, genome-wide maps of transcript distributions are critical resources to complement genomic sequence data and to correlate functional and genetic brain architecture. Here we describe the generation and analysis of a transcriptional atlas of the adult human brain, comprising extensive histological analysis and comprehensive microarray profiling of ~900 neuroanatomically precise subdivisions in two individuals. Transcriptional regulation varies enormously by anatomical location, with different regions and their constituent cell types displaying robust molecular signatures that are highly conserved between individuals. Analysis of differential gene expression and gene co-expression relationships demonstrates that brain-wide variation strongly reflects the distributions of major cell classes such as neurons, oligodendrocytes, astrocytes and microglia. Local neighbourhood relationships between fine anatomical subdivisions are associated with discrete neuronal subtypes and genes involved in synaptic transmission. The neocortex displays a relatively homogeneous transcriptional pattern, but with distinct features associated selectively with primary sensorimotor cortices and with enriched frontal lobe expression. Notably, the spatial topography of the neocortex is strongly reflected in its molecular topography: the closer two cortical regions, the more similar their transcriptomes. This freely accessible online data resource forms a high-resolution transcriptional baseline for neurogenetic studies of normal and abnormal human brain function.
Neuroscience; Genetics; Genomics; Databases
Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD “seed” genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology.
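One way to build a seed-based network and check it against an independent gene list is sketched below. It assumes a simple membership rule (a gene joins the network if its Pearson correlation with any hcASD seed reaches a hypothetical threshold `r_min`) and summarizes overlap with the pASD list as fold enrichment. The abstract does not state the actual construction or enrichment statistic, so both are placeholders.

```python
import numpy as np


def seed_network(expr, genes, seeds, r_min=0.8):
    """Seed genes plus every gene whose Pearson r with any seed is >= r_min.

    expr: array of shape (n_genes, n_samples), rows ordered as in `genes`.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-by-gene correlations
    index = {g: i for i, g in enumerate(genes)}
    members = set(seeds)
    for seed in seeds:
        hits = np.flatnonzero(corr[index[seed]] >= r_min)
        members.update(genes[j] for j in hits)
    return members


def fold_enrichment(members, candidates, n_universe):
    """Observed / expected count of candidate genes among network members."""
    observed = len(members & candidates)
    expected = len(members) * len(candidates) / n_universe
    return observed / expected


# Hypothetical expression matrix: 6 genes x 5 samples.
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
expr = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1],
        [1, 2, 3, 4, 6], [3, 1, 4, 1, 5], [2, 2, 2, 2, 3]]
net = seed_network(expr, genes, seeds=["g0"])
print(sorted(net))  # ['g0', 'g1', 'g3']
print(fold_enrichment(net - {"g0"}, {"g1", "g4"}, n_universe=len(genes)))  # 1.5
```

Seeds are excluded before testing so that the candidate list is evaluated only on genes recruited by coexpression.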
The anatomical and functional architecture of the human brain is largely determined by prenatal transcriptional processes. We describe an anatomically comprehensive atlas of mid-gestational human brain, including de novo reference atlases, in situ hybridization, ultra-high resolution magnetic resonance imaging (MRI) and microarray analysis on highly discrete laser microdissected brain regions. In developing cerebral cortex, transcriptional differences are found between different proliferative and postmitotic layers, wherein laminar signatures reflect cellular composition and developmental processes. Cytoarchitectural differences between human and mouse have molecular correlates, including species differences in gene expression in subplate, although surprisingly we find minimal differences between the inner and human-expanded outer subventricular zones. Both germinal and postmitotic cortical layers exhibit fronto-temporal gradients, with particular enrichment in the frontal lobe. Finally, many neurodevelopmental disorder and human evolution-related genes show patterned expression, potentially underlying unique features of human cortical formation. These data provide a rich, freely accessible resource for understanding human brain development.
Human brain; Transcriptome; Microarray; Development; Gene expression; Evolution
Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon.
Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists.
One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.
Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
Biodiversity informatics; Data enrichment; Hackathon; Intelligent openness; Linked data; Open source; Software; Semantic Web; Taxonomy; Web services
Although diagnostic x-ray procedures provide important medical benefits, they may also carry cancer risks that are not well characterized. The US Radiologic Technologists Study (1983–2006) is a nationwide, prospective cohort study with extensive questionnaire data on the history of personal diagnostic imaging procedures, collected prior to cancer diagnosis. We used Cox proportional hazards regression to estimate thyroid cancer risks related to the number and type of selected procedures. We assessed potential modifying effects of age and calendar year of the first x-ray procedure in each category of procedures. Incident thyroid cancers (n = 251) were diagnosed among 75,494 technologists (1.3 million person-years; mean follow-up = 17 years). Overall, there was no clear evidence of thyroid cancer risk associated with diagnostic x-rays, except for dental x-rays. We observed a 13% increase in thyroid cancer risk for every 10 reported dental radiographs (hazard ratio = 1.13, 95% confidence interval: 1.01, 1.26), which was driven by dental x-rays first received before 1970; however, we found no evidence that the relationship between dental x-rays and thyroid cancer was associated with childhood or adolescent exposures, as would have been anticipated. The lack of association between thyroid cancer and x-ray procedures that expose the thyroid to higher radiation doses than dental x-rays do underscores the need for a detailed radiation exposure assessment to enable quantitative evaluation of risk.
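If the hazard ratio is modelled as log-linear in the number of reported dental radiographs (an assumption; the abstract reports only the per-10 estimate), the per-radiograph coefficient and the implied ratio at other exposure counts follow directly. The last line checks that the reported person-years and cohort size agree with the stated mean follow-up.

```latex
\begin{aligned}
\mathrm{HR}(k) &= \exp(\beta k), \qquad \mathrm{HR}(10) = e^{10\beta} = 1.13 \\
\beta &= \frac{\ln 1.13}{10} \approx 0.0122 \ \text{per reported radiograph} \\
\mathrm{HR}(20) &= e^{20\beta} = 1.13^{2} \approx 1.28 \\
\bar{t}_{\text{follow-up}} &\approx \frac{1.3 \times 10^{6}\ \text{person-years}}{75{,}494\ \text{technologists}} \approx 17\ \text{years}
\end{aligned}
```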
radiation; radiography; thyroid gland; thyroid neoplasms; x-rays
We studied cancer mortality in a cohort of 5,573 women with scoliosis and other spine disorders who were diagnosed between 1912 and 1965 and exposed to frequent diagnostic X-ray procedures. Patients were identified from medical records in 14 orthopedic medical centers in the United States and followed for vital status and address through Dec 31, 2004, using publicly available regional, state, and nationwide databases. Causes of death were obtained from death certificates or through linkage with the National Death Index (NDI). Statistical analyses included standardized mortality ratios (SMR=observed/expected) based on death rates for U.S. females, and internal comparisons using Cox regression models with attained age as the time scale. Diagnostic radiation exposure was estimated from radiology files for over 137,000 procedures; estimated average cumulative radiation doses to the breast, lung, thyroid and bone marrow were 10.9, 4.1, 7.4, and 1.0 cGy, respectively. After a median follow-up period of 47 years, 1,527 women had died, including 355 from cancer. Cancer mortality was 8% higher than expected, although the excess was not statistically significant (SMR=1.08; 95% CI: 0.97–1.20). Mortality from breast cancer was significantly elevated (SMR=1.68; 95% CI: 1.38–2.02), whereas death rates from several other cancers were below expectation, in particular lung (SMR=0.77), cervical (SMR=0.31), and liver (SMR=0.17). The excess relative risk (ERR) for breast cancer mortality increased significantly with 10-yr lagged radiation dose to the breast (ERR/Gy=3.9; 95% CI: 1.0–9.3).
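The summary statistics above can be unpacked with two standard relations: the SMR as observed over expected deaths, and the linear excess relative risk model with slope γ = ERR/Gy. Plugging the mean cumulative breast dose (10.9 cGy = 0.109 Gy) into the lagged-dose estimate is only a rough illustration, not a result reported by the study.

```latex
\begin{aligned}
\mathrm{SMR} &= \frac{O}{E}, \qquad E_{\text{cancer}} \approx \frac{355}{1.08} \approx 329\ \text{expected cancer deaths} \\
\mathrm{RR}(D) &= 1 + \gamma D, \qquad \gamma = 3.9\ \mathrm{Gy}^{-1} \\
\mathrm{RR}(0.109\ \mathrm{Gy}) &\approx 1 + 3.9 \times 0.109 \approx 1.43
\end{aligned}
```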
breast neoplasms; radiation-induced; mortality; radiation; radiography; scoliosis; cohort study; epidemiology
Crassignatha danaugirangensis sp. n. (Araneae: Symphytognathidae) was discovered during a tropical ecology field course held at the Danau Girang Field Centre in Sabah, Malaysia. A taxonomic description and accompanying ecological study were completed as course activities. To assess the ecology of this species, which belongs to the ground-web-building spider community, three habitat types were surveyed: riparian forest, recently inundated riverine forest, and oil palm plantation. Crassignatha danaugirangensis sp. n. is the most abundant ground-web-building spider species in riparian forest; it is rare or absent in the recently inundated forest and was not found in a nearby oil palm plantation. The availability of this taxonomic description may facilitate the accumulation of data about this species and the role of inundated riverine forest in shaping invertebrate communities.
; disturbance; inundation; oil palm plantation; riparian forest; riverine forest; tropical field course
High-throughput sequencing is gradually replacing microarrays as the preferred method for studying mRNA expression levels, providing nucleotide resolution and accurately measuring absolute expression levels of almost any transcript, known or novel. However, existing microarray data from clinical, pharmaceutical, and academic settings represent valuable and often underappreciated resources, and methods for assessing and improving the quality of these data are lacking.
To quantitatively assess the quality of microarray probes, we directly compare RNA-Seq to Agilent microarrays by processing 231 unique samples from the Allen Human Brain Atlas using RNA-Seq. Both techniques provide highly consistent, highly reproducible gene expression measurements in adult human brain, with RNA-Seq slightly outperforming microarray results overall. We show that RNA-Seq can be used as ground truth to assess the reliability of most microarray probes, remove probes with off-target effects, and scale probe intensities to match the expression levels identified by RNA-Seq. These sequencing-scaled microarray intensities (SSMIs) provide more reliable, quantitative estimates of absolute expression levels for many genes when compared with unscaled intensities. Finally, we validate this result in two human cell lines, showing that linear scaling factors can be applied across experiments using the same microarray platform.
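A minimal sketch of probe-level scaling, assuming each probe's log2 microarray intensity is mapped to log2 RNA-Seq expression by a least-squares line fitted across matched samples, and that probes whose correlation with RNA-Seq falls below a hypothetical cutoff `min_r` are discarded as unreliable. The abstract describes linear scaling factors but not this exact procedure.

```python
import numpy as np


def scale_probe(array_log2, rnaseq_log2, min_r=0.5):
    """Fit a per-probe linear map from microarray to RNA-Seq (both log2).

    Returns (slope, intercept), or None when the probe correlates poorly with
    RNA-Seq, a crude stand-in for flagging off-target probes.
    """
    x = np.asarray(array_log2, dtype=float)
    y = np.asarray(rnaseq_log2, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    if r < min_r:
        return None
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return slope, intercept


def apply_scaling(array_log2, fit):
    """Convert microarray intensities to the RNA-Seq scale with a fitted line."""
    slope, intercept = fit
    return slope * np.asarray(array_log2, dtype=float) + intercept
```

Because the fit is per probe and platform-specific, the same (slope, intercept) pairs could in principle be reused on new samples run on that platform, which is the property the cell-line validation tests.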
Microarrays provide consistent, reproducible gene expression measurements, which are improved using RNA-Seq as ground truth. We expect that our strategy could be used to improve probe quality for many data sets from major existing repositories.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-154) contains supplementary material, which is available to authorized users.
Allen Brain Atlas; Microarray; RNA-Seq; High-throughput sequencing; Transcriptome profiling; Reliability; Gene expression; Brain
We report initial results from an ongoing effort to build a library of DNA barcode sequences for Dutch spiders and investigate the utility of museum collections as a source of specimens for barcoding spiders. Source material for the library comes from a combination of specimens freshly collected in the field specifically for this project and museum specimens collected in the past. For the museum specimens, we focus on 31 species that have been frequently collected over the past several decades. A series of progressively older specimens representing these 31 species were selected for DNA barcoding. Based on the pattern of sequencing successes and failures, we find that smaller-bodied species expire before larger-bodied species as tissue sources for single-PCR standard DNA barcoding. Body size and age of oldest successful DNA barcode are significantly correlated after factoring out phylogenetic effects using independent contrasts analysis. We found some evidence that extracted DNA concentration is correlated with body size and inversely correlated with time since collection, but these relationships are neither strong nor consistent. DNA was extracted from all specimens using standard destructive techniques involving the removal and grinding of tissue. A subset of specimens was selected to evaluate nondestructive extraction. Nondestructive extractions significantly extended the DNA barcoding shelf life of museum specimens, especially small-bodied species, and yielded higher DNA concentrations compared to destructive extractions. All primary data are publicly available through a Dryad archive and the Barcode of Life database.
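Felsenstein's independent contrasts convert species values into phylogenetically independent differences, which are then correlated through the origin. The sketch below uses a hypothetical four-species tree encoded as nested tuples and made-up trait values; a real analysis would typically use established software such as the `pic` function in the R package ape.

```python
import math


def contrasts(node, traits):
    """Felsenstein's phylogenetically independent contrasts for one trait.

    node is (tip_name, branch_length) for a tip or ((left, right), branch_length)
    for an internal node. Returns (node_value, adjusted_branch_length, contrasts).
    """
    child, v = node
    if isinstance(child, str):  # tip: observed trait value
        return traits[child], v, []
    left, right = child
    x1, v1, c1 = contrasts(left, traits)
    x2, v2, c2 = contrasts(right, traits)
    c = (x1 - x2) / math.sqrt(v1 + v2)  # standardized contrast
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral value
    v_adj = v + v1 * v2 / (v1 + v2)  # lengthen branch for estimation error
    return x, v_adj, c1 + c2 + [c]


def contrast_correlation(tree, trait_x, trait_y):
    """Correlation of two traits' contrasts, computed through the origin."""
    _, _, cx = contrasts(tree, trait_x)
    _, _, cy = contrasts(tree, trait_y)
    num = sum(a * b for a, b in zip(cx, cy))
    return num / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Hypothetical tree ((A, B), (C, D)) with unit branch lengths.
clade_ab = ((("A", 1.0), ("B", 1.0)), 1.0)
clade_cd = ((("C", 1.0), ("D", 1.0)), 1.0)
tree = ((clade_ab, clade_cd), 0.0)

body_size = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 6.0}
oldest_barcode_age = {"A": 10.0, "B": 25.0, "C": 15.0, "D": 40.0}  # hypothetical years
print(round(contrast_correlation(tree, body_size, oldest_barcode_age), 3))  # 0.994
```

The correlation is forced through the origin because the sign of each contrast depends only on the arbitrary ordering of children, which is shared by both traits.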
DNA extraction; DNA concentration; DNA preservation; DNA degradation; independent contrasts; museum
The study of parasitoids and their hosts suffers from a lack of reliable taxonomic data. We use a combination of morphological characters and DNA sequences to produce taxonomic determinations that can be verified with reference to specimens in an accessible collection and DNA barcode sequences posted to the Barcode of Life database (BOLD). We demonstrate that DNA can be successfully extracted from consumed host spiders and the shed pupal case of a wasp using non-destructive methods. We found Acrodactyla quadrisculpta to be a parasitoid of Tetragnatha […] percontatoria and Zatypota bohemani both are parasitoids of Neottiura […] anomala is a parasitoid of an as yet unidentified host in the family Dictynidae, but the host species may be possible to identify in the future as the library of reference sequences on BOLD continues to grow. The study of parasitoids and their hosts traditionally requires specialized knowledge and techniques, and accumulating data is a slow process. DNA barcoding could allow more professional and amateur naturalists to contribute data to this field of study. A publication venue dedicated to aggregating datasets of all sizes online is well suited to this model of distributed science.
DNA barcode; host; morphological identification; non-destructive extraction; parasitoid
A recent report found that left-handed adolescents were more than three times as likely to carry an Apolipoprotein E (APOE) ε2 allele. This study was unable to replicate this association in young adults (N = 166). A meta-analysis of nine other datasets (N = 360 to 7,559; power > 0.999), including that of the National Alzheimer’s Coordinating Center, also failed to find an over-representation of ε2 among left-handers, indicating that the earlier finding was most likely a statistical artifact.
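The abstract does not say how the nine datasets were combined. A common choice is inverse-variance fixed-effect pooling of per-study log odds ratios, sketched here with hypothetical inputs; a random-effects model would be the usual alternative when studies are heterogeneous.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled_OR, lower_95, upper_95) on the odds-ratio scale.
    """
    weights = [1 / se ** 2 for se in std_errs]  # precision weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    z = 1.959964  # two-sided 95% standard normal quantile
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))
```

Larger studies dominate the pooled estimate through their smaller standard errors, which is why adding datasets with thousands of participants yields such high power.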
APOE; handedness; right
Male spermatogenesis is a complex biological process that is regulated by hormonal signals from the hypothalamus (GnRH), the pituitary gonadotropins (LH and FSH) and the testis (androgens, inhibin). The two key somatic cell types of the testis, Leydig and Sertoli cells, respond to gonadotropins and androgens and regulate the development and maturation of fertilization-competent spermatozoa. Although progress has been made in identifying specific transcripts that are translated in Sertoli and Leydig cells and in characterizing their response to hormones, efforts to expand these studies have been restricted by technical hurdles. To address this problem, we applied an in vivo ribosome tagging strategy (RiboTag) that allows a detailed and physiologically relevant characterization of the “translatome” (polysome-associated mRNAs) of Leydig or Sertoli cells in vivo. Our analysis identified all previously characterized Leydig and Sertoli cell-specific markers and comprehensively identified novel markers of both cell types; we also investigated the translational response of these two cell types to gonadotropins or testosterone. A small subset of Sertoli cell genes was modulated after FSH and testosterone stimulation. However, Leydig cells responded robustly to gonadotropin deprivation and LH restoration with acute changes in polysome-associated mRNAs. These studies identified the transcription factors that are induced by LH stimulation, uncovered novel potential regulators of LH signaling and steroidogenesis, and demonstrated the effects of LH on the translational machinery of the Leydig cell in vivo.
Transcriptional studies suggest Alzheimer's disease (AD) involves dysfunction of many cellular pathways, including synaptic transmission, cytoskeletal dynamics, energetics, and apoptosis. Despite the known progression of AD pathologies, it is unclear how such striking regional vulnerability occurs, or which genes play causative roles in disease progression.
To address these issues, we performed a large-scale transcriptional analysis in the CA1 and relatively less vulnerable CA3 brain regions of individuals with advanced AD and nondemented controls. In our study, we assessed differential gene expression across region and disease status, compared our results to previous studies of similar design, and performed an unbiased co-expression analysis using weighted gene co-expression network analysis (WGCNA). Several disease genes were identified and validated using qRT-PCR.
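WGCNA builds a weighted network by raising absolute gene-gene correlations to a soft-thresholding power β and summarizes each gene's connectivity as the sum of its edge weights; highly connected genes are candidate hubs. A minimal unsigned-network sketch follows; β = 6 is assumed here as a commonly used value, not the study's setting, and the function name is hypothetical.

```python
import numpy as np


def soft_threshold_connectivity(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|**beta and connectivity.

    expr: array (n_genes, n_samples). Returns (adjacency, connectivity), where
    connectivity k_i = sum over j != i of a_ij.
    """
    adj = np.abs(np.corrcoef(np.asarray(expr, dtype=float))) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-edges
    return adj, adj.sum(axis=1)
```

Raising correlations to a power shrinks weak edges far more than strong ones (a correlation of 0.45 contributes under 0.01 at β = 6), which approximates a scale-free topology without a hard cutoff.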
We find disease signatures consistent with several previous microarray studies, then extend these results to show a relationship between disease status and brain region. Specifically, genes showing decreased expression with AD progression tend to show enrichment in CA3 (and vice versa), suggesting transcription levels may reflect a region's vulnerability to disease. Additionally, we find several candidate vulnerability (ABCA1, MT1H, PDK4, RHOBTB3) and protection (FAM13A1, LINGO2, UNC13C) genes based on expression patterns. Finally, we use a systems-biology approach based on WGCNA to uncover disease-relevant expression patterns for major cell types, including pathways consistent with a key role for early microglial activation in AD.
These results paint a picture of AD as a multifaceted disease involving slight transcriptional changes in many genes between regions, coupled with a systemic immune response, gliosis, and neurodegeneration. Despite this complexity, we find that a consistent picture of gene expression in AD is emerging.
Human Immunodeficiency Virus-1 (HIV) infection frequently results in neurocognitive impairment. While the cause remains unclear, recent gene expression studies have identified genes whose transcription is dysregulated in individuals with HIV-associated neurocognitive disorder (HAND). However, methods for interpreting such data have lagged behind the technical advances that allow the decoding of genetic material. Here, we employ systems biology methods novel to the field of NeuroAIDS to further interrogate extant transcriptome data derived from the brains of HIV+ patients in order to further elucidate the neuropathogenesis of HAND. Additionally, we compare these data to those derived from the brains of individuals with Alzheimer’s disease (AD) in order to identify common pathways of neuropathogenesis.
In Study 1, using data from three brain regions in 6 HIV-seronegative and 15 HIV+ cases, we first employed weighted gene co-expression network analysis (WGCNA) to further explore transcriptome networks specific to HAND with HIV-encephalitis (HIVE) and HAND without HIVE. We then used a symptomatic approach, employing standard expression analysis and WGCNA to identify networks associated with neurocognitive impairment (NCI), regardless of HIVE or HAND diagnosis. Finally, we examined the association between the CNS penetration effectiveness (CPE) of antiretroviral regimens and brain transcriptome. In Study 2, we identified common gene networks associated with NCI in both HIV and AD by correlating gene expression with pre-mortem neurocognitive functioning.
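Relating a co-expression module to neurocognitive performance is commonly done by correlating the module eigengene (the first principal component of the module's standardized expression) with a clinical score. The sketch below assumes genes in rows and samples in columns and uses a hypothetical trait vector; the study's exact procedure is not given in the abstract.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a module's standardized expression.

    expr: array (n_genes, n_samples). The sign is chosen so the eigengene
    correlates positively with the average standardized expression.
    """
    x = np.asarray(expr, dtype=float)
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    me = vt[0]  # leading right singular vector = sample-wise PC1
    if np.corrcoef(me, z.mean(axis=0))[0, 1] < 0:
        me = -me
    return me


def module_trait_correlation(expr, trait):
    """Pearson correlation between a module eigengene and a per-sample trait."""
    return np.corrcoef(module_eigengene(expr), np.asarray(trait, dtype=float))[0, 1]
```

Summarizing a module by one profile turns thousands of gene-level tests into a handful of module-level ones, which matters in an under-powered cohort of this size.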
Study 1: WGCNA largely corroborated findings from standard differential gene expression analyses, but also identified possible meta-networks composed of multiple gene ontology categories and oligodendrocyte dysfunction. Differential expression analysis identified hub genes highly correlated with NCI, including genes implicated in gliosis, inflammation, and dopaminergic tone. Enrichment analysis identified gene ontology categories that varied across the three brain regions, the most notable being downregulation of genes involved in mitochondrial functioning. Finally, WGCNA identified dysregulated networks associated with NCI, including oligodendrocyte and mitochondrial functioning. Study 2: Common gene networks dysregulated in relation to NCI in AD and HIV included mitochondrial genes, whereas upregulation of various cancer-related genes was found.
While under-powered, this study identified possible biologically-relevant networks correlated with NCI in HIV, and common networks shared with AD, opening new avenues for inquiry in the investigation of HAND neuropathogenesis. These results suggest that further interrogation of existing transcriptome data using systems biology methods can yield important information.
HIV encephalitis; HIV-associated dementia; HIV-associated neurocognitive disorder; Weighted gene coexpression network analysis; WGCNA; CNS penetration effectiveness; National neuroAIDS tissue consortium; Coexpression module
cybertaxonomy; open access publishing; semantic content; XML markup
The National NeuroAIDS Tissue Consortium (NNTC) performed a brain gene expression array to elucidate pathophysiologies of Human Immunodeficiency Virus type 1 (HIV-1)-associated neurocognitive disorders.
Twenty-four human subjects in four groups were examined: A) Uninfected controls; B) HIV-1 infected subjects with no substantial neurocognitive impairment (NCI); C) Infected with substantial NCI without HIV encephalitis (HIVE); D) Infected with substantial NCI and HIVE. RNA from neocortex, white matter, and neostriatum was processed with the Affymetrix® array platform.
In subjects with HIVE, the HIV-1 RNA load in brain tissue was three log10 units higher than in the other groups, and over 1,900 gene probes were regulated. Interferon response genes (IFRGs), antigen presentation, complement components and CD163 antigen were strongly upregulated. In frontal neocortex, downregulated neuronal pathways strongly dominated in HIVE, including GABA receptors, glutamate signaling, synaptic potentiation, axon guidance, clathrin-mediated endocytosis and 14-3-3 protein. Expression was completely different in neuropsychologically impaired subjects without HIVE. They had low brain HIV-1 loads, weak brain immune responses, lacked neuronally expressed changes in neocortex and exhibited upregulation of endothelial cell type transcripts. HIV-1-infected subjects with normal neuropsychological test results had upregulation of neuronal transcripts involved in synaptic transmission of neostriatal circuits.
Two patterns of brain gene expression suggest that more than one pathophysiological process occurs in HIV-1-associated neurocognitive impairment. Expression in HIVE suggests that lowering brain HIV-1 replication might improve NCI, whereas NCI without HIVE may not respond in kind; array results suggest that modulation of transvascular signaling is a potentially promising approach. Striking brain regional differences highlighted the likely importance of circuit level disturbances in HIV/AIDS. In subjects without impairment, regulation of genes that drive neostriatal synaptic plasticity reflects adaptation. The array provides an infusion of public resources including brain samples, clinicopathological data and correlative gene expression data for further exploration (http://www.nntc.org/gene-array-project).
The family Eresidae C. L. Koch, 1850 is reviewed at the genus level. The family comprises nine genera including one new genus. They are: Adonea Simon, 1873, Dorceus C. L. Koch, 1846, Dresserus Simon, 1876, Eresus Walckenaer, 1805, Gandanameno Lehtinen, 1967, Loureedia gen. n., Paradonea Lawrence, 1968, Seothyra Purcell, 1903, and Stegodyphus Simon, 1873. A key to all genera and major lineages is provided along with corresponding diagnoses, as well as descriptions of selected species. These are documented with collections of photographs, scanning electron micrographs, and illustrations. A new phylogeny of Eresidae based on molecular sequence data expands on a previously published analysis. A species of the genus Paradonea Lawrence, 1968 is sequenced and placed phylogenetically for the first time. New sequences from twenty Gandanameno Lehtinen, 1967 specimens were added to investigate species limits within the genus. The genus Loureedia gen. n. is proposed to accommodate Eresus annulipes Lucas, 1857. Two species, Eresus semicanus Simon, 1908 and Eresus jerbae El-Hennawy, 2005, are synonymized with Loureedia annulipes comb. n. One new species, Paradonea presleyi sp. n., is described. Eresus algericus El-Hennawy, 2004 is transferred to Adonea Simon, 1873. The female of Dorceus fastuosus C. L. Koch, 1846 is described for the first time. The first figures depicting Paradonea splendens (Lawrence, 1936) are presented.
ladybird spiders; molecular phylogeny; spinneret spigot morphology; taxonomy
A new troglomorphic spider from caves in Central Java, Indonesia, is described and placed in the ctenid genus Amauropelma Raven, Stumkat & Gray, until now containing only species from Queensland, Australia. Only juveniles and mature females of the new species are known. We give our reasons for placing the new species in Amauropelma, discuss conflicting characters, and make predictions about the morphology of the as yet undiscovered male that will test our taxonomic hypothesis. The description includes DNA barcode sequence data.
conservation; DNA barcode; Indonesia; Jonggrangan Limestone; troglobite
Genomic and other high-dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to deconvolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative ways to find representatives. Currently, few collapsing functions have been developed and widely applied.
We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to the best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways.
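collapseRows itself is an R function (in the WGCNA package); the Python sketch below only illustrates two of the ideas described above: keeping the row with the highest mean, or keeping the most highly connected row within each group. The function name and method labels are hypothetical and do not mirror collapseRows' arguments.

```python
import numpy as np


def collapse_rows(expr, row_groups, method="max_mean"):
    """Pick one representative row per group (e.g. one probe per gene).

    expr: array (n_rows, n_samples); row_groups: one group label per row.
    "max_mean" keeps the row with the highest average value;
    "max_connectivity" keeps the row with the largest summed |correlation|
    to the rows of its group (the group's hub). Returns {group: row_index}.
    """
    if method not in ("max_mean", "max_connectivity"):
        raise ValueError(f"unknown method: {method}")
    expr = np.asarray(expr, dtype=float)
    chosen = {}
    for g in dict.fromkeys(row_groups):  # unique labels, first-seen order
        idx = [i for i, lab in enumerate(row_groups) if lab == g]
        if len(idx) == 1 or method == "max_mean":
            best = idx[int(np.argmax(expr[idx].mean(axis=1)))]
        else:
            # Self-correlation adds 1 to every row, so it does not change the argmax.
            k = np.abs(np.corrcoef(expr[idx])).sum(axis=1)
            best = idx[int(np.argmax(k))]
        chosen[g] = best
    return chosen


# Hypothetical probes: rows 0-1 map to gene A, rows 2-4 to gene B.
expr = [[1, 2, 3, 4], [5, 6, 7, 9], [2, 1, 2, 1], [4, 3, 5, 6], [1, 2, 3, 5]]
groups = ["A", "A", "B", "B", "B"]
print(collapse_rows(expr, groups))                      # {'A': 1, 'B': 3}
print(collapse_rows(expr, groups, "max_connectivity"))  # {'A': 0, 'B': 4}
```

The two rules can disagree, as gene B shows: the brightest probe is not necessarily the one most representative of its group, which is consistent with the finding that the best strategy depends on the analysis goal.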
The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications, we provide evidence that both types of methods are robust and biologically relevant tools.
In budding yeast, the eukaryotic initiator protein ORC (origin recognition complex) binds to a bipartite sequence consisting of an 11 bp ACS element and an adjacent B1 element. However, the genome contains many more matches to this consensus than actually bind ORC or function as origins in vivo. Although ORC-dependent loading of the replicative MCM helicase at origins is enhanced by a distal B2 element, less is known about this element. Here, we analyzed four highly active origins (ARS309, ARS319, ARS606, and ARS607) by linker-scanning mutagenesis and found that sequences adjacent to the ACS contributed substantially to origin activity and ORC binding. Using the sequences of four additional B2 elements, we generated a B2 multiple sequence alignment and identified a shared, degenerate 8 bp sequence that was enriched within 228 known origins. In addition, our high-resolution analysis revealed that not all origins lie within nucleosome-free regions: a class of Sir2-regulated origins has a stably positioned nucleosome overlapping or near B2. This study illustrates the conserved yet flexible nature of yeast origin architecture in promoting ORC binding and origin activity, and helps explain why a strong match to the ORC binding site is insufficient to identify origins within the genome.
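Why consensus matches alone over-predict origins can be seen by simply scanning a sequence for the ACS. The sketch below assumes the commonly cited IUPAC form of the 11 bp ACS, WTTTAYRTTTW, and reports overlapping matches on both strands. The function names are hypothetical, and the scan deliberately ignores B1, B2, ORC binding, and chromatin, which is why a hit from it cannot by itself distinguish a functional origin from an inert match.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R=[AG] pairs with Y=[CT]).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a DNA sequence or IUPAC pattern."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(pattern):
    # Zero-width lookahead so overlapping matches are all reported.
    return re.compile("(?=" + "".join(IUPAC[c] for c in pattern.upper()) + ")")


def find_matches(pattern, seq):
    """Return sorted (start, strand) pairs for every match of an IUPAC
    pattern on either strand, with start given in forward-strand coordinates.

    Matching the reverse-complemented pattern on the forward strand is
    equivalent to matching the pattern on the reverse strand. A pattern that
    equals its own reverse complement would be reported on both strands.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in to_regex(pattern).finditer(seq)]
    hits += [(m.start(), "-") for m in to_regex(revcomp(pattern)).finditer(seq)]
    return sorted(hits)


ACS = "WTTTAYRTTTW"  # assumed 11 bp ACS consensus, for illustration
```

Run over a whole chromosome, a scan like this returns far more candidate sites than there are active origins, which is the gap that the flanking B1 and B2 elements and local chromatin help to close.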
The ability to selectively detect and target cancer cells that have undergone an epithelial-mesenchymal transition (EMT) may lead to improved methods to treat cancers such as pancreatic cancer. Remodeling of cellular glycosylation has previously been associated with cell differentiation and may represent a valuable class of molecular targets for EMT.
As a first step toward investigating the nature of glycosylation alterations in EMT, we characterized the expression of glycan-related genes in three in vitro model systems, each representing a complementary aspect of pancreatic cancer EMT. These models included: 1) TGFβ-induced EMT, which provided a look at the active transition between states; 2) a panel of 22 pancreatic cancer cell lines, which represented terminally differentiated epithelial-like or mesenchymal-like states; and 3) actively migrating and stationary cells, which provided a look at the mechanism of migration. We analyzed expression data for a list of 587 genes involved in glycosylation (biosynthesis, sugar transport, glycan binding, etc.) or EMT. Glycogenes were altered at a higher prevalence than all other genes in the first two models (p < 0.05 and p < 0.005, respectively) but not in the migration model. Several functional themes were shared between the induced-EMT model and the cell line panel, including alterations to matrix components and proteoglycans; the sulfation of glycosaminoglycans; mannose receptor family members; the initiation of O-glycosylation; and certain forms of sialylation. Protein-level changes were confirmed by Western blot for the mannose receptor MRC2 and the O-glycosylation enzyme GALNT3, and cell-surface sulfation changes were confirmed using Alcian Blue staining.
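The abstract does not name the test behind the glycogene prevalence p-values. One common way to ask whether a gene category is over-represented among altered genes is the hypergeometric upper tail, which is equivalent to a one-sided Fisher's exact test. The sketch below assumes that test; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_category, n_altered, n_total):
    """P(X >= k) for X ~ Hypergeometric.

    n_total genes are measured, n_altered of them are altered, and the
    category of interest (e.g. glycogenes) has n_category members.
    X is the number of altered genes that fall in the category, so a small
    value means the category holds more altered genes than chance predicts.
    """
    denom = comb(n_total, n_altered)
    upper = min(n_category, n_altered)  # X cannot exceed either bound
    favorable = sum(
        comb(n_category, i) * comb(n_total - n_category, n_altered - i)
        for i in range(k, upper + 1)
    )
    return favorable / denom  # exact integer ratio, rounded once to float
```

As a toy check: with 10 genes in total, 3 of them altered, and a 3-gene category in which all 3 are altered, the function gives P = 1/120, the chance of drawing exactly those three genes.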
Alterations to glycogenes are a major component of cancer EMT and are characterized by changes to matrix components, the sulfation of glycosaminoglycans (GAGs), mannose receptors, O-glycosylation, and specific sialylated structures. These results provide leads for targeting aggressive and drug-resistant forms of pancreatic cancer cells.
Pleural malignant mesothelioma (MM) is an aggressive cancer with a very long latency and a very short median survival. Little is known about the genetic events that trigger MM and their relation to poor outcome. The goal of our study was to characterize major genomic gains and losses associated with MM origin and progression and to assess their clinical significance. We performed Representative Oligonucleotide Microarray Analysis (ROMA) on DNA isolated from the tumors of 22 patients whose disease recurred at variable intervals after surgery. The total number of copy number alterations (CNA) and the frequent imbalances in patients with a short (<12 months from surgery) or long time to recurrence were recorded and mapped using the Analysis of Copy Errors (ACE) algorithm. We report a profound increase in CNA in the short-time recurrence group, with most chromosomes affected, which can be explained by the chromosomal instability associated with MM. Deletions at 22q12.2, 19q13.32, and 17p13.1 appeared to be the most frequent events (55-74%) shared among MM patients, followed by deletions in 1p, 9p, 9q, 4p, and 3p and gains in 5p, 18q, 8q, and 17q (23-55%). Deletions in 9p21.3 encompassing CDKN2A/ARF and CDKN2B were characterized as specific to the short-time recurrence group. Analysis of the minimal common areas of frequent gains and losses identified candidate genes that may be involved in different stages of MM: OSM (22q12.2), FUS1 and PL6 (3p21.3), DNAJA1 (9p21.1), and CDH2 (18q11.2-q12.3). Imbalances seen by ROMA were confirmed by Affymetrix genome analysis in a subset of samples.
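Comparing event frequencies between recurrence groups amounts to splitting patients at the 12-month cutoff and tabulating, for each region, the percentage of patients in each group who carry a given gain or deletion. The sketch below assumes that segmentation has already produced a set of (event type, cytoband) calls per patient; it does not reproduce ROMA or the ACE algorithm, and all names and inputs are hypothetical.

```python
from collections import Counter


def split_by_recurrence(months_to_recurrence, cutoff=12):
    """Split patients into short (< cutoff months) and long recurrence groups.

    months_to_recurrence: dict mapping patient ID -> months from surgery
    to recurrence.
    """
    short = {p for p, m in months_to_recurrence.items() if m < cutoff}
    long_ = set(months_to_recurrence) - short
    return short, long_


def event_frequency(events, patients):
    """Percentage of `patients` carrying each copy number event.

    events: dict mapping patient ID -> set of events, e.g. ("del", "9p21.3").
    Patients with no recorded events still count in the denominator.
    """
    if not patients:
        return {}
    counts = Counter(e for p in patients for e in events.get(p, set()))
    return {e: 100.0 * c / len(patients) for e, c in counts.items()}
```

Events whose frequency is high in the short group but low or absent in the long group are the candidates for group-specific changes, such as the 9p21.3 deletions described above; with only 22 patients, any such contrast would still need a formal test.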
mesothelioma; ROMA; CGH; copy number alterations; tumor suppressors; oncogenes