1.  Immunogenetics: Genome-Wide Association of Non-Progressive HIV and Viral Load Control: HLA Genes and Beyond 
Very early after the identification of the human immunodeficiency virus (HIV), host genetic factors were anticipated to play a role in viral control and disease progression. As early as the mid-1990s, candidate gene studies demonstrated a central role for the chemokine co-receptor/ligand (e.g., CCR5) and human leukocyte antigen (HLA) systems. In the last decade, the advent of genome-wide arrays opened a new era for unbiased genetic exploration of the genome and raised high expectations for the identification of new, unexpected genes and pathways involved in HIV/AIDS. More than 15 genome-wide association studies targeting various HIV-linked phenotypes have been published since 2007. Surprisingly, only the two HIV-chemokine co-receptors and HLA loci have exhibited consistent and reproducible statistically significant genetic associations. In this chapter, we will review the findings from the genome-wide studies focusing especially on non-progressive and HIV control phenotypes, and discuss the current perspectives.
PMCID: PMC3664380  PMID: 23750159
genome-wide association study; SNP; HIV-1; viral control; long-term non-progression; chemokine receptors region; HLA
2.  The Nobel Prize as a Reward Mechanism in the Genomics Era: Anonymous Researchers, Visible Managers and the Ethics of Excellence 
Journal of Bioethical Inquiry  2010;7(3):299-312.
The Human Genome Project (HGP) is regarded by many as one of the major scientific achievements in recent science history, a large-scale endeavour that is changing the way in which biomedical research is done and expected, moreover, to yield considerable benefit for society. Thus, since the completion of the human genome sequencing effort, a debate has emerged over the question of whether this effort merits a Nobel Prize and if so, who should be the one(s) to receive it, as (according to current procedures) no more than three individuals can be selected. In this article, the HGP is taken as a case study to consider the ethical question to what extent it is still possible, in an era of big science, of large-scale consortia and global team work, to acknowledge and reward individual contributions to important breakthroughs in biomedical fields. Is it still viable to single out individuals for their decisive contributions in order to reward them in a fair and convincing way? Whereas the concept of the Nobel prize as such seems to reflect an archetypical view of scientists as solitary researchers who, at a certain point in their careers, make their one decisive discovery, this vision has proven to be problematic from the very outset. Already during the first decade of the Nobel era, Ivan Pavlov was denied the Prize several times before finally receiving it, on the basis of the argument that he had been active as a research manager (a designer and supervisor of research projects) rather than as a researcher himself. The question then is whether, in the case of the HGP, a research effort that involved the contributions of hundreds or even thousands of researchers worldwide, it is still possible to “individualise” the Prize. The “HGP Nobel Prize problem” is regarded as an exemplary issue in current research ethics, highlighting a number of quandaries and trends involved in contemporary life science research practices more broadly.
PMCID: PMC2917546  PMID: 20730106
Human Genome Project; Nobel Prize; Research ethics; Fairness of reward mechanism in biomedical research
3.  Systems solutions by lactic acid bacteria: from paradigms to practice 
Microbial Cell Factories  2011;10(Suppl 1):S2.
Lactic acid bacteria are among the powerhouses of the food industry, colonize the surfaces of plants and animals, and contribute to our health and well-being. The genomic characterization of LAB has rocketed and presently over 100 complete or nearly complete genomes are available, many of which serve as scientific paradigms. Moreover, functional and comparative metagenomic studies are taking off and provide a wealth of insight in the activity of lactic acid bacteria used in a variety of applications, ranging from starters in complex fermentations to their marketing as probiotics. In this new era of high throughput analysis, biology has become big science. Hence, there is a need to systematically store the generated information, apply this in an intelligent way, and provide modalities for constructing self-learning systems that can be used for future improvements. This review addresses these systems solutions with a state of the art overview of the present paradigms that relate to the use of lactic acid bacteria in industrial applications. Moreover, an outlook is presented of the future developments that include the transition into practice as well as the use of lactic acid bacteria in synthetic biology and other next generation applications.
PMCID: PMC3231926  PMID: 21995776
4.  A biological treasure metagenome: pave a way for big science 
Indian Journal of Microbiology  2008;48(2):163-172.
Recent research trends, in which synthetic biology and white technology pursued through systems approaches based on “omics” technologies are recognized as the foundation of biotechnology, signal the arrival of the ‘metagenome era’, which accesses the genomes of all microbes with the aim of understanding and industrially applying microbial resources as a whole. Remarkable advances in technologies for mining and analyzing metagenomes are enabling not only practical applications of the metagenome but also systems approaches at the mixed-genome level based on accumulated information. This review introduces the trends and methods of metagenome research and examines the big science that related resources are likely to drive in the future.
PMCID: PMC3450180  PMID: 23100711
Metagenome; Gene mining; Novel metabolites; Systems approach; Biological treasure
5.  Big Defensins, a Diverse Family of Antimicrobial Peptides That Follows Different Patterns of Expression in Hemocytes of the Oyster Crassostrea gigas 
PLoS ONE  2011;6(9):e25594.
Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections.
Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid.
We provide here the first report showing that big defensins form a family of antimicrobial peptides diverse not only in terms of sequences but also in terms of genomic organization and regulation of gene expression.
PMCID: PMC3182236  PMID: 21980497
6.  Fish Assemblages in Streams Subject to Anthropogenic Disturbances Along The Natchez Trace Parkway, Mississippi, USA 
A three-year study (July 2000 – June 2003) of fish assemblages was conducted in four tributaries of the Big Black River: Big Bywy, Little Bywy, Middle Bywy and McCurtain creeks that cross the Natchez Trace Parkway, Choctaw County, Mississippi, USA. Little Bywy and Middle Bywy creeks were within watersheds influenced by the lignite mining. Big Bywy and Middle Bywy creeks were historically impacted by channelisation. McCurtain Creek was chosen as a reference (control) stream. Fish were collected using a portable backpack electrofishing unit (Smith-Root Inc., Washington, USA). Insectivorous fish dominated all of the streams. There were no pronounced differences in relative abundances of fishes among the streams (P > 0.05) but fish assemblages fluctuated seasonally. Although there were some differences among streams with regard to individual species, channelisation and lignite mining had no discernable adverse effects on functional components of fish assemblages suggesting that fishes in these systems are euryceous fluvial generalist species adapted to the variable environments of small stream ecosystems.
PMCID: PMC3819055  PMID: 24575177
Fish; Mining; Channelisation
7.  Some experiences and opportunities for big data in translational research 
Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of “big data.” The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.
PMCID: PMC3906918  PMID: 24008998
clinical data representation; big data; genomics; health information technology standards
8.  Characterization of whole genome radiation hybrid mapping resources for non-mammalian vertebrates. 
Nucleic Acids Research  1998;26(15):3562-3566.
Radiation hybrid panels are already available for genome mapping in human and mouse. In this study we have used two model organisms (chicken and zebrafish) to show that hybrid panels that contain a full complement of the donor genome can be generated by fusion to hamster cells. The quality of the resulting hybrids has been assessed using PCR and FISH. We confirmed the utility of our panels by establishing the percentage of donor DNA present in the hybrids. Our hybrid resources will allow inexpensive gene mapping and we expect that this technology can be transferred to many other species. Such successes are providing the basis for a new era of mapping tools, in the form of whole genome radiation hybrid panels, and are opening new possibilities for systematic genome analysis in the animal genetics community.
PMCID: PMC147736  PMID: 9671819
9.  BigWig and BigBed: enabling browsing of large distributed datasets 
Bioinformatics  2010;26(17):2204-2207.
Summary: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets.
Availability and implementation: Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at Source code for the creation and visualization software is freely available for non-commercial use at, implemented in C and supported on Linux. The UCSC Genome Browser is available at
Supplementary information: Supplementary byte-level details of the BigWig and BigBed file formats are available at Bioinformatics online. For an in-depth description of UCSC data file formats and custom tracks, see and
PMCID: PMC2922891  PMID: 20639541
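The range-restricted access described in the BigWig and BigBed summary above can be illustrated with a minimal sketch. It assumes a hypothetical toy layout (independently compressed blocks plus a flat list index) and is not the real BigWig byte format, which uses R trees and several zoom levels; the names `build`, `query` and `RECORD` are invented for illustration.

```python
import struct
import zlib

# Hypothetical toy format, NOT the real BigWig layout: each block holds
# (start, end, value) records and is compressed independently with zlib.
RECORD = struct.Struct("<IIf")


def build(records, block_size=2):
    """Pack sorted (start, end, value) records into compressed blocks plus an index."""
    blob = bytearray()
    index = []  # one (first_start, last_end, byte_offset, byte_length) per block
    for i in range(0, len(records), block_size):
        chunk = records[i:i + block_size]
        raw = b"".join(RECORD.pack(*r) for r in chunk)
        packed = zlib.compress(raw)
        index.append((chunk[0][0], max(r[1] for r in chunk), len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def query(blob, index, start, end):
    """Return records overlapping [start, end) and the number of bytes read."""
    hits, bytes_read = [], 0
    # A linear scan stands in for the R-tree lookup mentioned in the abstract.
    for b_start, b_end, offset, length in index:
        if b_start < end and b_end > start:  # block overlaps the query window
            raw = zlib.decompress(blob[offset:offset + length])
            bytes_read += length
            for rec in RECORD.iter_unpack(raw):
                if rec[0] < end and rec[1] > start:
                    hits.append(rec)
    return hits, bytes_read


# Illustrative data only.
records = [(0, 100, 1.0), (100, 200, 2.0), (200, 300, 3.0), (300, 400, 4.0)]
blob, index = build(records)
hits, bytes_read = query(blob, index, 250, 350)
```

In this sketch only the second block is decompressed for the query window, so `bytes_read` is smaller than `len(blob)`; over HTTP the same idea would map to byte-range requests for just those blocks.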
10.  A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study 
Molecular systematics occupies one of the central stages in biology in the genomic era, ushered in by unprecedented progress in DNA technology. The inference of organismal phylogeny is now based on many independent genetic loci, a widely accepted approach to assemble the tree of life. Surprisingly, this approach is hindered by lack of appropriate nuclear gene markers for many taxonomic groups especially at high taxonomic level, partially due to the lack of tools for efficiently developing new phylogenetic markers. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes – the largest vertebrate clade in need of phylogenetic resolution.
A total of 154 candidate molecular markers – relatively well conserved, putatively single-copy gene fragments with long, uninterrupted exons – were obtained by comparing whole genome sequences of two model organisms, Danio rerio and Takifugu rubripes. Experimental tests of 15 of these (randomly picked) markers on 36 taxa (representing two-thirds of the ray-finned fish orders) demonstrate the feasibility of amplifying by PCR and directly sequencing most of these candidates from whole genomic DNA in a vast diversity of fish species. Preliminary phylogenetic analyses of sequence data obtained for 14 taxa and 10 markers (total of 7,872 bp for each species) are encouraging, suggesting that the markers obtained will make significant contributions to future fish phylogenetic studies.
We present a practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny. Our method is an improvement over traditional approaches (e.g., manually picking genes for testing) because it uses genomic information and automates the process to identify large numbers of candidate markers. This approach is shown here to be successful for fishes, but also could be applied to other groups of organisms for which two or more complete genome sequences exist, which has important implications for assembling the tree of life.
PMCID: PMC1838417  PMID: 17374158
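The marker criteria named in the phylogenomics abstract above (single-copy, long uninterrupted exons, relatively well conserved between two genomes) can be sketched as a simple filter. This is only an illustration: the `Exon` fields, the threshold values and the example records are assumptions made here, not the parameters or data used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    gene_id: str
    length_bp: int           # length of the uninterrupted exon
    copies_in_genome_a: int  # matches of the exon within the first genome
    copies_in_genome_b: int  # matches of the exon within the second genome
    identity: float          # fractional identity between the two genomes' copies


def candidate_markers(exons, min_length=800, min_identity=0.70, max_identity=0.95):
    """Keep long, single-copy exons of intermediate conservation.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    return [
        e for e in exons
        if e.length_bp >= min_length
        and e.copies_in_genome_a == 1
        and e.copies_in_genome_b == 1
        and min_identity <= e.identity <= max_identity
    ]


# Invented example records.
exons = [
    Exon("geneA", 1200, 1, 1, 0.82),  # passes every filter
    Exon("geneB", 1500, 2, 1, 0.85),  # duplicated in the first genome
    Exon("geneC", 300, 1, 1, 0.80),   # exon too short
    Exon("geneD", 900, 1, 1, 0.99),   # too conserved to be informative
]
selected = [e.gene_id for e in candidate_markers(exons)]
```

The upper identity bound reflects one possible design choice: a fragment that is nearly identical between genomes carries little phylogenetic signal, while one that is too divergent is hard to amplify with shared primers.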
11.  External parasite infection of common carp (Cyprinus carpio) and big head (Hypophthalmichthys nobilis) in fish farms of Mashhad, northeast of Iran 
A total of 75 common carp and 100 big head fish were caught by net from fish farms in Mashhad, northeast Iran. In the laboratory, the skin, eyes and fins of the fish were inspected under a stereomicroscope and, in a second phase, direct smears were prepared from probable lesions. Gills were dissected and their filaments were placed in petri dishes and fixed with glycerin. In total, 50 parasites (19 protozoa and 31 metazoa) were recorded from the fish. The parasites belonged to Protozoa (Ichthyophthirius multifiliis and Trichodina sp.), Monogenea (Dactylogyrus spp.) and Copepoda (Lernaea cyprinacea). During this study, infection with Dactylogyrus spp. was recorded on fish in all months. The mean intensity of Dactylogyrus spp. varied significantly among the seasons (P < 0.05). The maximum mean intensity was recorded in winter. Infection with Lernaea cyprinacea was also significantly higher in spring than in other seasons (P < 0.05). The results of this study, together with the previously recorded prevalence of parasitic infection in fishes, support the view that infection with external parasites (both protozoa and metazoa) is a widespread cause of losses in fish farms in Iran.
PMCID: PMC3590374  PMID: 24431554
External parasite; Cyprinus carpio; Hypophthalmichthys nobilis
12.  A need for a 'whole-istic functional genomics' approach in complex human diseases: arthritis 
Arthritis Research & Therapy  2003;5(2):76-79.
'Genomic tools', such as gene/protein chips, single nucleotide polymorphism and haplotype analyses, are empowering us to generate staggering amounts of correlative data, from human/animal genetics and from normal and disease-affected tissues obtained from complex diseases such as arthritis. These tools are transforming molecular biology into a 'data rich' science, with subjects with an '-omic' suffix. These disciplines have to converge and integrate at a systemic level to examine the structure and dynamics of cellular and organismal function ('functionomics') simultaneously, using a multidimensional approach for cells, tissues, organs, rodents and Zebra fish models, which intertwines various approaches and readouts to study the development and homeostasis of a system. In summary, the postgenomic era of functionomics will facilitate narrowing the bridge between correlative data and causative data, thus integrating 'intercoms' of interacting and interdependent disciplines and forming a unified whole.
PMCID: PMC165036  PMID: 12718747
arthritis; genomics; inflammation; proteomics
13.  Exploiting Cellular-Developmental Evolution as the Scientific Basis for Preventive Medicine 
Medical hypotheses  2009;72(5):596-602.
In the post-genomic era, we must make maximal use of this technological advancement to broaden our perspective on biology and medicine. Our understanding of the evolutionary process is undermined by looking at it retrospectively, perpetuating a descriptive rather than a mechanistic approach. The reintroduction of developmental biologic principles into evolutionary studies, or evo-devo, allows us to apply embryologic cell-molecular biologic principles to the mechanisms of phylogeny, obviating the artificial space and time barriers between ontogeny and phylogeny. This perspective allows us to consider the continuum between the proximate and ultimate causes of speciation, which was unthinkable when looked at from the descriptive perspective. Using a cell-cell interactive ‘middle-out’ approach, we have gained insight to the evolution of the lung from the swim bladder of fish based on gene regulatory networks that generate both lung ontogeny and phylogeny, i.e. decreased alveolar size, decreased alveolar wall thickness, and increased alveolar wall strength. Vertical integration of cell-cell interactions predicts the adaptivity and maladaptivity of the lung, leading to novel insights for chronic lung disease. Since we have employed principles involved in all of development, this approach is amenable to all biologic structures, functions, adaptations, maladaptations, and diseases, providing an operational basis for preventive medicine.
PMCID: PMC2677996  PMID: 19147298
14.  Conservation of all three p53 family members and Mdm2 and Mdm4 in the cartilaginous fish 
Cell Cycle  2011;10(24):4272-4279.
Analysis of the genome of the elephant shark (Callorhinchus milii), a member of the cartilaginous fishes (class Chondrichthyes), reveals that it encodes all three members of the p53 gene family, p53, p63 and p73, each with clear homology to the equivalent gene in bony vertebrates (class Osteichthyes). Thus, the gene duplication events that led to the presence of three family members in the vertebrates date to before the Silurian era. It also encodes Mdm2 and Mdm4 genes but does not encode the p19Arf gene. Detailed comparison of the amino acid sequences of these proteins in the vertebrates reveals that they are evolving at highly distinctive rates, and this variation occurs not only between the three family members but extends to distinct domains in each protein.
PMCID: PMC3272259  PMID: 22107961
p53; p63; p73; Mdm2; Mdm4; elephant shark
15.  MazG, a Nucleoside Triphosphate Pyrophosphohydrolase, Interacts with Era, an Essential GTPase in Escherichia coli 
Journal of Bacteriology  2002;184(19):5323-5329.
Era is an essential GTPase in Escherichia coli, and Era has been implicated in a number of cellular functions. Homologues of Era have been identified in various bacteria and some eukaryotes. Using the era gene as bait in the yeast two-hybrid system to screen E. coli genomic libraries, we discovered that Era interacts with MazG, a protein of unknown function which is highly conserved among bacteria. The direct interaction between Era and MazG was also confirmed in vitro, being stronger in the presence of GDP than in the presence of GTPγS. MazG was characterized as a nucleoside triphosphate pyrophosphohydrolase which can hydrolyze all eight of the canonical ribo- and deoxynucleoside triphosphates to their respective monophosphates and PPi, with a preference for deoxynucleotides. A mazG deletion strain of E. coli was constructed by replacing the mazG gene with a kanamycin resistance gene. Unlike mutT, a gene for another conserved nucleotide triphosphate pyrophosphohydrolase that functions as a mutator gene, the mazG deletion did not result in a mutator phenotype in E. coli.
PMCID: PMC135369  PMID: 12218018
16.  Targeting BIG3–PHB2 interaction to overcome tamoxifen resistance in breast cancer cells 
Nature Communications  2013;4:2443.
The acquisition of endocrine resistance is a common obstacle in endocrine therapy of patients with oestrogen receptor-α (ERα)-positive breast tumours. We previously demonstrated that the BIG3–PHB2 complex has a crucial role in the modulation of oestrogen/ERα signalling in breast cancer cells. Here we report a cell-permeable peptide inhibitor, called ERAP, that regulates multiple ERα-signalling pathways associated with tamoxifen resistance in breast cancer cells by inhibiting the interaction between BIG3 and PHB2. Intrinsic PHB2 released from BIG3 by ERAP directly binds to both nuclear- and membrane-associated ERα, which leads to the inhibition of multiple ERα-signalling pathways, including genomic and non-genomic ERα activation and ERα phosphorylation, and the growth of ERα-positive breast cancer cells both in vitro and in vivo. More importantly, ERAP treatment suppresses tamoxifen resistance and enhances tamoxifen responsiveness in ERα-positive breast cancer cells. These findings suggest inhibiting the interaction between BIG3 and PHB2 may be a new therapeutic strategy for the treatment of luminal-type breast cancer.
Oestrogen receptor-α (ERα) signalling has a role in breast cancer drug resistance. Here, the authors report a synthetic peptide that disrupts the interaction between the signalling molecules BIG3 and PHB2, and thereby suppresses tamoxifen resistance.
PMCID: PMC3791465  PMID: 24051437
17.  Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata) 
BMC Genomics  2011;12:370.
Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush.
cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples.
We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.
PMCID: PMC3150299  PMID: 21767398
18.  From Darwin to the Census of Marine Life: Marine Biology as Big Science 
PLoS ONE  2013;8(1):e54284.
With the development of the Human Genome Project, a heated debate emerged on biology becoming ‘big science’. However, biology already has a long tradition of collaboration, as natural historians were part of the first collective scientific efforts: exploring the variety of life on earth. Such mappings of life still continue today, and if field biology is gradually becoming an important subject of studies into big science, research into life in the world's oceans is not taken into account yet. This paper therefore explores marine biology as big science, presenting the historical development of marine research towards the international ‘Census of Marine Life’ (CoML) making an inventory of life in the world's oceans. Discussing various aspects of collaboration – including size, internationalisation, research practice, technological developments, application, and public communication – I will ask if CoML still resembles traditional collaborations to collect life. While showing both continuity and change, I will argue that marine biology is a form of natural history: a specific way of working together in biology that has transformed substantially in interaction with recent developments in the life sciences and society. As a result, the paper does not only give an overview of transformations towards large scale research in marine biology, but also shines a new light on big biology, suggesting new ways to deepen the understanding of collaboration in the life sciences by distinguishing between different ‘collective ways of knowing’.
PMCID: PMC3544803  PMID: 23342119
19.  Temporal lobe necrosis: a dwindling entity in a patient with nasopharyngeal cancer after radiation therapy 
Our objective was to report a case of misdiagnosed temporal lobe necrosis (TLN) in a patient with nasopharyngeal cancer (NPC) after radiation therapy.
Case Presentation
We report a case of a 45-year-old Chinese woman who developed moderate to severe headache and dizziness 1 year after 2D radiation therapy for NPC. Subsequent MRI scanning revealed a large enhancing mass in the right temporal lobe. The initial diagnosis was metastatic or intracranial extension of NPC, or a primary intracranial malignancy. She was referred to the neurosurgery department where a maximal surgical resection of the lesion was performed. A diagnosis of TLN was made according to the final histology.
TLN still matters in the IMRT era. The diagnostic quagmire of TLN lies in its close resemblance to neoplasm on clinical presentation and imaging. Reviewing the patient's treatment plan to scrutinize the dose to the temporal lobes is an important prerequisite for diagnosis.
PMCID: PMC3042977  PMID: 21310054
20.  Sagace: A web-based search engine for biomedical databases in Japan 
BMC Research Notes  2012;5:604.
In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database.
We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry.
Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at
PMCID: PMC3532241  PMID: 23110816
Search engine; Biomedical data; Biomedical resources; Faceted search; Microdata
21.  A sea of standards for omics data: sink or swim? 
In the era of Big Data, omic-scale technologies, and increasing calls for data sharing, it is generally agreed that the use of community-developed, open data standards is critical. Far less agreed upon is exactly which data standards should be used, the criteria by which one should choose a standard, or even what constitutes a data standard. It is impossible simply to choose a domain and have it naturally follow which data standards should be used in all cases. The ‘right’ standard to use often depends on the use case scenarios for a given project. Potential downstream applications for the data, however, may not always be apparent at the time the data are generated. Similarly, technology evolves, adding further complexity. Would-be standards adopters must strike a balance between planning for the future and minimizing the burden of compliance. Better tools and resources are required to help guide this balancing act.
PMCID: PMC3932466  PMID: 24076747
Data Standards; Data Sharing; Terminology; Information dissemination
22.  The Role of the Toxicologic Pathologist in the Post-Genomic Era 
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era. The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists. The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression.
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era, newly emerging DVM and MD scientists enter the work arena with a PhD in pathology, often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be done well as an intermezzo between other tasks. It is a rare individual who has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace with few job opportunities, in contrast to the pre-genomic era. Many face an identity crisis, needing to decide whether to become a competent pathologist or, alternatively, a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay. The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity-relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much as computerized algorithms are currently used to forecast weather or to predict political elections, sophisticated computerized algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
23.  The UCSC Genome Browser database: update 2011 
Nucleic Acids Research  2010;39(Database issue):D876-D882.
The University of California, Santa Cruz Genome Browser offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a ‘mean+whiskers’ windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.
PMCID: PMC3242726  PMID: 20959295
24.  The Gene for 16S rRNA Methyltransferase (ksgA) Functions as a Multicopy Suppressor for a Cold-Sensitive Mutant of Era, an Essential RAS-Like GTP-Binding Protein in Escherichia coli 
Journal of Bacteriology  1998;180(19):5243-5246.
Era, a Ras-like GTP-binding protein in Escherichia coli, has been shown to be essential for growth. However, its cellular functions still remain elusive. In this study, a genetic screening of an E. coli genomic library was performed to identify those genes which can restore the growth ability of a cold-sensitive mutant, Era(Cs) (E200K), at a restrictive temperature when expressed in a multicopy plasmid. Among eight suppressors isolated, six were located at 1 min of the E. coli genomic map, and the gene responsible for the suppression of Era(Cs) (E200K) was identified as the ksgA gene for 16S rRNA transmethylase, whose mutation causes a phenotype of resistance to kasugamycin, a translation initiation inhibitor. This is the first demonstration of suppression of impaired function of Era by overproduction of a functional enzyme. A possible mechanism of the suppression of the Era cold-sensitive phenotype by KsgA overproduction is discussed.
PMCID: PMC107565  PMID: 9748462
25.  Phenotypic Properties and Microbial Diversity of Methanogenic Granules from a Full-Scale Upflow Anaerobic Sludge Bed Reactor Treating Brewery Wastewater 
Methanogenic granules from an anaerobic bioreactor that treated wastewater of a beer brewery consisted of different morphological types of granules. In this study, the microbial compositions of the different granules were analyzed by molecular microbiological techniques: cloning, denaturing gradient gel electrophoresis and fluorescent in situ hybridization (FISH), and scanning and transmission electron microscopy. We propose here that the different types of granules reflect the different stages in the life cycle of granules. Young granules were small, black, and compact and harbored active cells. Gray granules were the most abundant granules. These granules have a multilayer structure with channels and void areas. The core was composed of dead or starving cells with low activity. The brown granules, which were the largest granules, showed a loose and amorphous structure with big channels that resulted in fractured zones and corresponded to the older granules. Firmicutes (as determined by FISH) and Nitrospira and Deferribacteres (as determined by cloning and sequencing) were the predominant Bacteria. Remarkably, Firmicutes could not be detected in the brown granules. The methanogenic Archaea identified were Methanosaeta concilii (70 to 90% by FISH and cloning), Methanosarcina mazei, and Methanospirillum spp. The phenotypic appearance of the granules reflected the physiological condition of the granules. This may be valuable to easily select appropriate seed sludges to start up other reactors.
PMCID: PMC1489364  PMID: 16820491

