Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls depending on the sample population and disease model. If multiple loci are involved, the power increases significantly to detect at least one locus such that relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which is a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
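The gain in power when several loci are involved can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that the loci are independent and that each is detected with the same per-locus power; both are simplifications introduced here for illustration, and the function names are hypothetical. Under those assumptions, the probability of detecting at least one of k loci is 1 - (1 - p)^k.

```python
def power_at_least_one(per_locus_power: float, n_loci: int) -> float:
    """Probability that at least one of n_loci independent loci is detected,
    assuming every locus has the same per-locus power (illustrative assumption)."""
    if not 0.0 <= per_locus_power <= 1.0:
        raise ValueError("per_locus_power must be in [0, 1]")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return 1.0 - (1.0 - per_locus_power) ** n_loci


def per_locus_power_needed(target_power: float, n_loci: int) -> float:
    """Per-locus power required so that at least one of n_loci independent
    loci is detected with probability target_power."""
    if not 0.0 <= target_power <= 1.0:
        raise ValueError("target_power must be in [0, 1]")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return 1.0 - (1.0 - target_power) ** (1.0 / n_loci)
```

Under these assumptions, 80% power to detect at least one locus requires only about 55% per-locus power with two loci and about 33% with four. Translating that reduced per-locus power into a detectable relative risk depends on the disease model and allele frequencies, which this sketch does not attempt.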
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
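The idea of a tag SNP standing in for a group of correlated SNPs can be sketched as a greedy selection over pairwise r² values. This is a minimal illustration, not the selection procedure used to build the panels: the 0.8 threshold, the 0/1 haplotype coding, and the function names are assumptions made here.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs, each given as a
    list of 0/1 alleles across the same phased haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP carries no LD information
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_tag_snps(snps, threshold=0.8):
    """snps: dict mapping SNP name -> list of 0/1 alleles.
    Returns a dict mapping each chosen tag SNP to the set of SNPs it captures."""
    names = list(snps)
    # Every SNP captures itself, plus any SNP in high LD with it.
    captures = {s: {s} for s in names}
    for s, t in combinations(names, 2):
        if r_squared(snps[s], snps[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)
    untagged = set(names)
    tags = {}
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs.
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags[best] = captures[best] & untagged
        untagged -= tags[best]
    return tags
```

For example, if three SNPs are perfectly correlated and a fourth is only weakly correlated with them, the function returns two tags: one representing the group of three and the fourth tagging itself.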
Longer telomeres and lifespans in females than in males have been documented in many animals. Such a linkage, however, has never been reported in fish. Progressive shortening of telomere length is an important aging mechanism. Mounting in vitro evidence has shown that telomere shortening beyond a critical length triggers replicative senescence or cell death. Estrogen has been postulated as a key factor contributing to maintenance of telomeres and sex-dependent longevity in animals. This postulation remains unproven due to the lack of a suitable animal system for testing. Here, we introduce a teleost model, the Japanese medaka Oryzias latipes, which shows promise for research into the molecular mechanism(s) controlling sex differences in aging.
Using the medaka, we demonstrate for the first time in a teleost that (i) sex differences (female > male) in telomere length and longevity also exist in fish, and (ii) a natural, ‘menopause’-like decline of plasma estrogen was evident in females during aging. Estrogen levels significantly correlated with telomerase activity as well as telomere length in female organs (not in males), suggesting estrogen could modulate telomere length via telomerase activation in a sex-specific manner. A hypothetical in vivo ‘critical’ terminal restriction fragment (TRF, representing telomere) length of approximately 4 kb was deduced in medaka liver for prediction of organismal mortality, which is highly comparable with that for human cells. An age conversion model was also established to enable age translation between medaka (in months) and human (in years). These novel tools are useful for future research on comparative biology of aging using medaka.
The striking similarity in estrogen profile between aging female O. latipes and women enables studying the influence of “postmenopausal” decline of estrogen on telomere and longevity without the need of invasive ovariectomy. Medaka fish is advantageous for studying the direct effect of increased estrogen on telomere length and longevity without the breast cancer complications reported in rodents. The findings strongly support the notion that O. latipes is a unique non-mammalian model for validation of estrogenic influence on telomere and longevity in vertebrates. This laboratory model fish is of potential significance for deciphering the ostensibly conserved mechanism(s) of sex-associated longevity in vertebrates.
Lifespan; Aging; Telomerase and telomere; Estrogen profile; Sex difference and medaka O. latipes
MicroRNAs (miRNAs) are endogenous non-protein-coding RNA genes which exist in a wide variety of organisms, including animals, plants, viruses and even unicellular organisms. Medaka (Oryzias latipes) is a useful model organism among vertebrate animals. However, no medaka miRNAs have been investigated systematically. It is therefore beneficial to conduct a genome-wide miRNA discovery study using next-generation sequencing (NGS) technology, which has emerged as a powerful tool for high-throughput analysis.
In this study, we adopted the ABI SOLiD platform to generate small RNA sequence reads from medaka tissues and mapped these reads back to the medaka genome. The mapped genomic loci were considered candidate miRNAs and were further processed by a support vector machine (SVM) classifier. As a result, we identified 599 novel medaka pre-miRNAs, many of which were found to encode more than one isomiR. In addition, minor miRNAs (also called miRNA star) can be detected as sequencing depth improves. These quantifiable isomiRs and minor miRNAs enable us to further characterize medaka miRNA genes in several respects. First, many medaka candidate pre-miRNAs are positioned close to each other, forming miRNA clusters, some of which are conserved across other vertebrates. Second, during miRNA maturation, there is an arm selection preference for mature miRNAs within precursors. We observed differences in arm selection preference between our candidate pre-miRNAs and their orthologs, and classified these differences into three categories based on the distribution of NGS reads. Finally, we investigated the relationship between conservation status and expression level of miRNA genes and concluded that the evolutionarily conserved miRNAs were usually the most abundant ones.
Medaka is a widely used model animal in many biomedical studies, including those on developmental biology. Identifying and characterizing medaka miRNA genes would benefit studies using medaka as a model organism.
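Arm selection preference can be read off the NGS counts for the two arms of a precursor. The sketch below is only an illustration of that idea: the two-fold ratio cutoff, the labels, and the function names are assumptions made here, and the labels are not the three categories used in the study, which the abstract does not define.

```python
def arm_preference(reads_5p: int, reads_3p: int, min_ratio: float = 2.0) -> str:
    """Label the dominant arm of a pre-miRNA from its 5p and 3p read counts.

    Returns '5p' or '3p' when one arm has at least min_ratio times the reads
    of the other, 'both' when neither arm dominates, and 'undetected' when no
    reads map to either arm. The cutoff is a hypothetical choice."""
    if reads_5p < 0 or reads_3p < 0:
        raise ValueError("read counts must be non-negative")
    if reads_5p == 0 and reads_3p == 0:
        return "undetected"
    if reads_5p >= min_ratio * reads_3p:
        return "5p"
    if reads_3p >= min_ratio * reads_5p:
        return "3p"
    return "both"


def arm_preference_differs(candidate_counts, ortholog_counts, min_ratio=2.0):
    """Compare a candidate pre-miRNA with its ortholog.

    Each argument is a (reads_5p, reads_3p) tuple. Returns True when the two
    precursors receive different arm-preference labels."""
    return (arm_preference(*candidate_counts, min_ratio=min_ratio)
            != arm_preference(*ortholog_counts, min_ratio=min_ratio))
```

A fixed ratio is a crude rule; with low read counts a statistical test on the two counts would be a more defensible choice.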
FXYD proteins are novel regulators of Na+-K+-ATPase (NKA). In fish subjected to salinity challenges, NKA activity in osmoregulatory organs (e.g., gills) is a primary driving force for the many ion transport systems that act in concert to maintain a stable internal environment. Although teleostean FXYD proteins have been identified and investigated, previous studies focused on only a limited group of species. The purposes of the present study were to establish the brackish medaka (Oryzias dancena) as a potential saltwater fish model for osmoregulatory studies and to investigate the diversity of teleostean FXYD expression profiles by comparing two closely related euryhaline model teleosts, brackish medaka and Japanese medaka (O. latipes), upon exposure to salinity changes. Seven members of the FXYD protein family were identified in each medaka species, and the expression of most branchial fxyd genes was salinity-dependent. Among the cloned genes, fxyd11 was expressed specifically in the gills and at a significantly higher level than the other fxyd genes. In the brackish medaka, branchial fxyd11 expression was localized to the NKA-immunoreactive cells in gill epithelia. Furthermore, the FXYD11 protein interacted with the NKA α-subunit and was expressed at a higher level in freshwater-acclimated individuals relative to fish in other salinity groups. The protein sequences and tissue distributions of the FXYD proteins were very similar between the two medaka species, but different expression profiles were observed upon salinity challenge for most branchial fxyd genes. Salinity changes produced different effects on the FXYD11 and NKA α-subunit expression patterns in the gills of the brackish medaka. To our knowledge, this report is the first to focus on FXYD expression in the gills of closely related euryhaline teleosts. 
Given the advantages conferred by the well-developed Japanese medaka system, we propose the brackish medaka as a saltwater fish model for osmoregulatory studies.
Linkage maps are important tools in evolutionary genetics and in studies of speciation. We performed a karyotyping study and constructed high-density linkage maps for two closely related killifish species, Lucania parva and L. goodei, that differ in salinity tolerance and still hybridize in their contact zone in Florida. Using SNPs from orthologous EST contigs, we compared synteny between the two species to determine how genomic architecture has shifted with divergence. Karyotyping revealed that L. goodei possesses 24 acrocentric chromosomes (1N) whereas L. parva possesses 23 chromosomes (1N), one of which is a large metacentric chromosome. Likewise, high-density single-nucleotide polymorphism-based linkage maps indicated 24 linkage groups for L. goodei and 23 linkage groups for L. parva. Synteny mapping revealed two linkage groups in L. goodei that were highly syntenic with the largest linkage group in L. parva. Together, this evidence points to the largest linkage group in L. parva being the result of a chromosomal fusion. We further compared synteny between Lucania and the genome of a more distant teleost relative, medaka (Oryzias latipes), and found good conservation of synteny at the chromosomal level. Each Lucania linkage group (LG) had a single best match with a medaka chromosome. These results provide the groundwork for future studies on the genetic architecture of reproductive isolation and salinity tolerance in Lucania and other Fundulidae.
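The synteny comparison described above can be sketched as a tally of shared markers between linkage groups in one map and linkage groups or chromosomes in the other. The input format, the 25% cutoff, and the function name are assumptions made here for illustration, not the method of the study.

```python
from collections import Counter, defaultdict


def syntenic_partners(markers, min_fraction=0.25):
    """Summarize which partner groups each linkage group is syntenic with.

    markers: iterable of (linkage_group, partner_group) pairs, one per marker
    placed in both maps. Returns {linkage_group: {partner_group: fraction}},
    keeping only partners that hold at least min_fraction of the linkage
    group's markers (a hypothetical cutoff)."""
    counts = defaultdict(Counter)
    for lg, partner in markers:
        counts[lg][partner] += 1
    result = {}
    for lg, partner_counts in counts.items():
        total = sum(partner_counts.values())
        result[lg] = {
            partner: n / total
            for partner, n in partner_counts.items()
            if n / total >= min_fraction
        }
    return result
```

In this representation, a one-to-one relationship appears as a linkage group with a single partner holding most of its markers, whereas a fused chromosome appears as a linkage group with two substantial partners.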
synteny; Robertsonian fusion; chromosomal rearrangement; linkage map; speciation; EST-based SNPs; fundulidae
A novel sarcomeric myosin heavy chain gene, MYH14, was identified following the completion of the human genome project. MYH14 contains an intronic microRNA, miR-499, which is expressed in a slow/cardiac muscle specific manner along with its host gene; it plays a key role in muscle fiber-type specification in mammals. Interestingly, teleost fish genomes contain multiple MYH14 and miR-499 paralogs. However, the evolutionary history of MYH14 and miR-499 has not been studied in detail. In the present study, we identified MYH14/miR-499 loci on various teleost fish genomes and examined their evolutionary history by sequence and expression analyses.
Synteny and phylogenetic analyses depict an evolutionary history of MYH14/miR-499 loci in which a teleost-specific duplication and several subsequent rounds of species-specific gene loss took place. Interestingly, miR-499 was not located in the MYH14 introns of certain teleost fish. An MYH14 paralog lacking miR-499 exhibited an accelerated rate of evolution compared with those containing miR-499, suggesting a putative functional relationship between MYH14 and miR-499. In medaka, Oryzias latipes, miR-499 is present although MYH14 is completely absent from the genome. Furthermore, in situ hybridization and small RNA sequencing showed that miR-499 was expressed in the notochord at the medaka embryonic stage and in slow/cardiac muscle at the larval and adult stages. Comparing the flanking sequences of MYH14/miR-499 loci between torafugu Takifugu rubripes, zebrafish Danio rerio, and medaka revealed some highly conserved regions, suggesting that cis-regulatory elements have been functionally conserved in medaka miR-499 despite the loss of its host gene.
This study reveals the evolutionary history of the MYH14/miRNA-499 locus in teleost fish, indicating divergent distribution and expression of MYH14 and miR-499 genes in different teleost fish lineages. We also found that medaka miR-499 was even expressed in the absence of its host gene. To our knowledge, this is the first report that shows the conversion of intronic into non-intronic miRNA during the evolution of a teleost fish lineage.
Myosin heavy chain; MYH14 (MYH7b); microRNA; miR-499; Muscle; Muscle fiber-type; Teleostei
Medaka (Oryzias latipes) is a small egg-laying freshwater teleost native to East Asia that has become an excellent model system for developmental genetics and evolutionary biology. The draft medaka genome sequence (700 Mb) was reported in June 2007, and its substantial genomic resources have been opened to the public through the University of Tokyo Genome Browser Medaka (UTGB/medaka) database. This database provides basic genomic information, such as predicted genes, expressed sequence tags (ESTs), guanine/cytosine (GC) content, repeats and comparative genomics, as well as unique data resources including (i) 2473 genetic markers and experimentally confirmed PCR primers that amplify these markers, (ii) 142 414 bacterial artificial chromosome (BAC) and 217 344 fosmid end sequences that amount to 15.0- and 11.1-fold clone coverage of the entire genome, respectively, and were used for draft genome assembly, (iii) 16 519 460 single nucleotide polymorphisms (SNPs), and 2 859 905 insertions/deletions detected between two medaka inbred strain genomes and (iv) 841 235 5′-end serial analyses of gene-expression (SAGE) tags that identified 344 266 transcription start sites on the genome. UTGB/medaka is available at: http://medaka.utgenome.org/
Oryzias latipes (medaka) has been established as a vertebrate genetic model for more than a century and recently has been rediscovered outside its native Japan. The power of new sequencing methods now makes it possible to reinvigorate medaka genetics, in particular by establishing a near-isogenic panel derived from a single wild population. Here we characterize the genomes of wild medaka catches obtained from a single Southern Japanese population in Kiyosu as a precursor for the establishment of a near-isogenic panel of wild lines. The population is free of significant detrimental population structure and has advantageous linkage disequilibrium properties suitable for the establishment of the proposed panel. Analysis of morphometric traits in five representative inbred strains suggests phenotypic mapping will be feasible in the panel. In addition, high-throughput genome sequencing of these medaka strains confirms their evolutionary relationships on lines of geographic separation and provides further evidence that there has been little significant interbreeding between the Southern and Northern medaka population since the Southern/Northern population split. The sequence data suggest that the Southern Japanese medaka existed as a larger older population that went through a relatively recent bottleneck approximately 10,000 years ago. In addition, we detect patterns of recent positive selection in the Southern population. These data indicate that the genetic structure of the Kiyosu medaka samples is suitable for the establishment of a vertebrate near-isogenic panel and therefore inbreeding of 200 lines based on this population has commenced. Progress of this project can be tracked at http://www.ebi.ac.uk/birney-srv/medaka-ref-panel.
Medaka; inbreeding; population genomics; strain specific features
The gene segments encoding antibodies have been studied in many capacities and represent some of the best-characterized gene families in traditional animal disease models (mice and humans). To date, multiple immunoglobulin light chain (IgL) isotypes have been found in vertebrates and it is unclear as to which isotypes might be more primordial in nature. Sequence data emerging from an array of fish genome projects is a valuable resource for discerning complex multigene assemblages in this critical branch point of vertebrate phylogeny. Herein, we have analyzed the genomic organization of medaka (Oryzias latipes) IgL gene segments based on recently released genome data. The medaka IgL locus located on chromosome 11 contains at least three clusters of IgL gene segments comprised of multiple gene assemblages of the kappa light chain isotype. These data suggest that medaka IgL gene segments may undergo both intra- and inter-cluster rearrangements as a means to generate additional diversity. Alignments of expressed sequence tags to concordant gene segments revealed that each of the three IgL clusters is expressed. Collectively, these data provide a genomic framework for IgL genes in medaka and indicate that Ig diversity in this species is achieved from at least three distinct chromosomal regions.
Immunoglobulin genes; Teleost immunity; Antibody diversity; Evolution
Gene duplication is a major force of evolution. One whole genome duplication (WGD) event in the fish ancestor generated genome-wide duplicates in all modern species. Coloration and patterning on the animal body surface exhibit enormous diversity, representing a mysterious and ideal system for understanding gene evolution. Surface colors and patterns are determined primarily by pigment cells in the skin and eye. Thus, microphthalmia-associated transcription factor (Mitf), as a master regulator of melanocyte development, is excellent for studying the evolution of WGD-derived gene duplicates. Here we report the evolution of the mitf duplicates, mitf1 and mitf2, in the fish medaka (Oryzias latipes), which encode medaka co-homologs Mitf1 and Mitf2 of the mouse Mitf. Compared to mitf1, mitf2 exhibits an accelerated sequence divergence and loses melanocytic expression in embryos at critical developmental stages. Compared to a Xiphophorus counterpart, the medaka Mitf2 displayed a reduced activity in activating melanogenic gene expression by reporter assays and RT-PCR analyses. We show that the medaka Mitf2 has the ability to induce melanocyte differentiation in medaka embryonic stem cells but at a remarkably reduced efficiency compared to the Xiphophorus counterpart. Our data suggest differential evolution of the medaka mitf duplicates, with mitf1 adopting conservation and mitf2 employing degeneration, which is different from the duplication-degeneration-complementation proposed as the mechanism to preserve many gene duplicates in zebrafish. Our finding reveals species-specific variations in mitf duplicate evolution, in agreement with the enormous diversity of body coloration and patterning.
mitf; gene duplicate; melanocyte; neural crest; pigmentation.
This study seeks to delineate the ligand interactions that drive biomarker induction in fish exposed to estrogenic pollutants and to provide a case study on the capacity of human (h) estrogen receptor (ER)-based in vitro screening assays to predict estrogenic effects in aquatic species. Adult male Japanese medaka (Oryzias latipes) were exposed to solutions of singular steroidal estrogens or to the estrogenic extract of an anaerobic swine waste lagoon. All exposure concentrations were calibrated to be equipotent based on the yeast estrogen screen (YES), which reports activation of hERα. These exposures elicited significantly different magnitudes of hepatic vitellogenin and choriogenin gene induction in the male medaka. Effects of the same YES-calibrated solutions in the T47D-KBluc assay, which reports activation of hERα and hERβ, generally recapitulated observations in medaka. Using competitive ligand binding assays, it was found that the magnitude of vitellogenin/choriogenin induction by different estrogenic ligands correlated positively with preferential binding affinity for medaka ERβ subtypes, which are highly expressed in male medaka liver prior to estrogen exposure. Results support emerging evidence that ERβ subtypes are critically involved in the teleost estrogenic response, with the ERα:ERβ ratio being of particular importance. Accordingly, incorporation of multiple ER subtypes into estrogen screening protocols may increase predictive value for the risk assessment of aquatic systems, including complex estrogenic mixtures.
Genetic polymorphisms are thought to generate intraspecific behavioral diversities, both within and among populations. The mechanisms underlying genetic control of behavioral properties, however, remain unclear in wild-type vertebrates, including humans. To explore this issue, we used diverse inbred strains of medaka fish (Oryzias latipes) established from the same and different local populations. Medaka exhibit a startle response to a visual stimulus (extinction of illumination) by rapidly bending their bodies (C-start) 20 ms after the stimulus presentation. We measured the rates of the response to repeated stimuli (1-s interval, 40 times) among four inbred strains, HNI-I, HNI-II, HO5, and Hd-rR-II1, and quantified two properties of the startle response: sensitivity (response rate to the first stimulus) and attenuation of the response probability with repeated stimulus presentation. Among the four strains, the greatest differences in these properties were detected between HNI-II and Hd-rR-II1. HNI-II exhibited high sensitivity (approximately 80%) and no attenuation, while Hd-rR-II1 exhibited low sensitivity (approximately 50%) and almost complete attenuation after only five stimulus presentations. Our findings suggested behavioral diversity of the startle response within a local population as well as among different populations. Linkage analysis with F2 progeny between HNI-II and Hd-rR-II1 detected quantitative trait loci (QTL) highly related to attenuation, but not to sensitivity, with a maximum logarithm of odds score of 11.82 on linkage group 16. The three genotypes (homozygous for HNI-II and Hd-rR-II1 alleles, and heterozygous) at the marker nearest the QTL correlated with attenuation. Our findings are the first to suggest that a single genomic region might be sufficient to generate individual differences in startle behavior between wild-type strains.
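The two startle properties can be computed from a table of per-trial responses. In the sketch below, sensitivity follows the definition given above (response rate to the first stimulus). The attenuation index is a hypothetical measure introduced here for illustration, since the abstract does not state how attenuation was quantified; the function names and the `last_n` parameter are likewise assumptions.

```python
def response_rate_by_trial(responses):
    """responses: list of per-fish lists of 0/1 values, one entry per stimulus.
    Returns the fraction of fish responding at each stimulus presentation."""
    n_fish = len(responses)
    n_trials = len(responses[0])
    return [sum(fish[t] for fish in responses) / n_fish for t in range(n_trials)]


def sensitivity(responses):
    """Response rate to the first stimulus."""
    return response_rate_by_trial(responses)[0]


def attenuation_index(responses, last_n=5):
    """Hypothetical index: 1 - (mean response rate over the last_n trials
    divided by the response rate to the first stimulus).
    0 means no attenuation; 1 means the response disappeared entirely.
    Returns None when no fish responded to the first stimulus."""
    rates = response_rate_by_trial(responses)
    first = rates[0]
    if first == 0:
        return None
    late = sum(rates[-last_n:]) / last_n
    return 1.0 - late / first
```

With 40 stimuli per fish, a strain that keeps responding throughout would score near 0 on this index, and a strain whose responses vanish after a few presentations would score near 1.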
Further identification of genetic polymorphisms that define the behavioral trait will contribute to our understanding of the neural mechanisms underlying behavioral diversity, allowing us to investigate the adaptive significance of intraspecific behavioral polymorphisms of the startle response.
The small freshwater teleost, medaka (Oryzias latipes), has a history of usage in studies of chronic toxicity of liver and biliary system. Recent progress with this model has focused on defining the medaka hepatobiliary system. Here we investigate critical liver function and toxicity by examining the in vivo role and function of the farnesoid X receptor alpha (FXRα, NR1H4), a member of the nuclear receptor superfamily that plays an essential role in the regulation of bile acid homeostasis. Quantitative mRNA analysis of medaka FXRα demonstrates differential expression of two FXRα isoforms designated Fxrα1 and Fxrα2, in both free swimming medaka embryos with remaining yolk (eleutheroembryos, EEs) and adults. Activation of medaka Fxrα in vivo with GW4064 (a strong FXRα agonist) resulted in modification of gene expression for defined FXRα gene targets including the bile salt export protein, small heterodimer partner, and cytochrome P450 7A1. Histological examination of medaka liver subsequent to GW4064 exposure demonstrated significant lipid accumulation, cellular and organelle alterations in both hepatocytes and biliary epithelial cells of the liver. This report of hepatobiliary injury following GW4064 exposure extends previous investigations of the intrahepatic biliary system in medaka, reveals sensitivity to toxicant exposure, and illustrates the need for added resolution in detection and interpretation of toxic responses in this vertebrate.
Farnesoid X receptor; Liver; Medaka; Toxicity; Eleutheroembryo
An accumulating body of research indicates there is an increased cancer risk associated with chronic infections. The genus Mycobacterium contains a number of species, including M. tuberculosis, which mount chronic infections and have been implicated in higher cancer risk. Several non-tuberculosis mycobacterial species, including M. marinum, are known to cause chronic infections in fish and, like human tuberculosis, often go undetected. The elevated carcinogenic potential for fish colonies infected with Mycobacterium spp. could have far-reaching implications because fish models are widely used to study human diseases. Japanese medaka (Oryzias latipes) is an established laboratory fish model for toxicology, mutagenesis, and carcinogenesis, and produces a chronic tuberculosis-like disease when infected by M. marinum. We examined the role that chronic mycobacterial infections play in cancer risk for medaka. Experimental M. marinum infections of medaka alone did not increase the mutational loads or proliferative lesion incidence in any of the tissues examined. However, we showed that chronic M. marinum infections increased hepatocellular proliferative lesions in fish also exposed to low doses of the mutagen benzo[a]pyrene. These results indicate that chronic mycobacterial infections of medaka are acting as tumor promoters and thereby suggest increased risks for cancer promotion in human populations burdened with chronic tuberculosis infections.
benzo[a]pyrene; carcinogenesis; hepatic neoplasia; inflammation; Oryzias latipes; tuberculosis; non-tuberculosis mycobacteria
Genetic variation in genomes is organized in haplotype blocks, and species-specific block structure is defined by differential contribution of population history effects in combination with mutation and recombination events. Haplotype maps characterize the common patterns of linkage disequilibrium in populations and have important applications in the design and interpretation of genetic experiments. Although evolutionary processes are known to drive the selection of individual polymorphisms, their effect on haplotype block structure dynamics has not been shown. Here, we present a high-resolution haplotype map for a 5-megabase genomic region in the rat and compare it with the orthologous human and mouse segments. Although the size and fine structure of haplotype blocks are species dependent, there is a significant interspecies overlap in structure and a tendency for blocks to encompass complete genes. Extending these findings to the complete human genome using haplotype map phase I data reveals that linkage disequilibrium values are significantly higher for equally spaced positions in genic regions, including promoters, as compared to intergenic regions, indicating that a selective mechanism exists to maintain combinations of alleles within potentially interacting coding and regulatory regions. Although this characteristic may complicate the identification of causal polymorphisms underlying phenotypic traits, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
Differences at the DNA level are the major contributor to the phenotypic diversity between individuals in a population. The most common type of this genetic variation is the single nucleotide polymorphism (SNP). Although the majority of SNPs do not have a functional effect, others may affect chromosome organization, gene expression, or protein function. SNPs and their individual states (alleles) are not randomly distributed throughout the genome and within a population. Recombination and mutation events, in combination with selection processes and population history, have resulted in common block-like structures in genomes. These structures are characterized by a common combination of SNP alleles, a so-called haplotype. Selection for specific haplotypes within a population is primarily driven by the advantageous effect of an individual polymorphism in the haplotype block.
By comparing the orthologous rat, mouse, and human haplotype structure of a 5-megabase region from rat Chromosome 1, the authors now show that haplotype block structure is conserved across mammals, most prominently in genic regions, suggesting the existence of an evolutionary selection process that drives the conservation of long-range allele combinations. Indeed, genome-wide gene-centric analysis of human HapMap data revealed that equally spaced polymorphic positions in genic regions and their upstream regulatory regions are genetically more tightly linked than in non-genic regions.
These findings may complicate the identification of causal polymorphisms underlying phenotypic traits, because in regions where haplotype structure is conserved, not a single polymorphism, but rather combinations of tightly linked polymorphisms could contribute to the phenotypic difference. On the other hand, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
Human infection by Mycobacterium tuberculosis is endemic, with approximately 2 billion people infected, and is the most common cause of adult death due to an infectious agent. Because of the slow growth rate of M. tuberculosis and risk to researchers, other species of Mycobacterium have been employed as alternative model systems to study human tuberculosis (TB). Mycobacterium marinum may be a good surrogate pathogen, conferring TB-like chronic infections in some fish. Medaka (Oryzias latipes) has been established for over five decades as a laboratory fish model for toxicology, genotoxicity, teratogenesis, carcinogenesis, classical genetics and embryology. We are investigating whether medaka might also serve as a host for M. marinum in order to model human TB. We show that both acute and chronic infections are inducible in a dose-dependent manner. Colonization of target organs and systemic granuloma formation have been demonstrated through the use of histology. M. marinum expressing green fluorescent protein (Gfp) was used to monitor bacterial colonization of these organs in fresh tissues as well as in intact animals. Moreover, we have employed the See-Through fish line, a variety of medaka devoid of major pigments, to monitor real-time disease progression in living animals. We have also compared the susceptibility of another prominent fish model, zebrafish (Danio rerio), to our medaka-M. marinum model. We determined that the course of infection in zebrafish is significantly more severe than in medaka. Together, these results indicate that the medaka-M. marinum model provides unique advantages for studying chronic mycobacteriosis.
Danio rerio; granuloma; medaka; Mycobacterium marinum; Oryzias latipes; tuberculosis; zebrafish
Myc is a global transcriptional regulator and one of the most frequently overexpressed oncoproteins in human tumors. It is well established that activation of Myc leads to enhanced cell proliferation but can also lead to increased apoptosis. The use of animal models expressing deregulated levels of Myc has helped to both elucidate its function in normal cells and give insight into how Myc initiates and maintains tumorigenesis. Analyses of the medaka (Oryzias latipes) genome uncovered the unexpected presence of two Myc gene copies in this teleost species. Comparison of these Myc versions to other vertebrate species revealed that one gene, myc17, differs by the loss of some conserved regulatory protein motifs present in all other known Myc genes. To investigate how such differences might affect the basic biological functions of Myc, we generated a tamoxifen-inducible in vivo model utilizing a natural, fish-specific Myc gene. Using this model we show that, when activated, Myc17 leads to increased proliferation and to apoptosis in a dose-dependent manner, similar to human Myc. We have also shown that long-term Myc17 activation triggers liver hyperplasia in adult fish, allowing this newly established transgenic medaka model to be used to study the transition from hyperplasia to liver cancer and to identify Myc-induced tumorigenesis modifiers.
The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs. Duplicate (paralogous) GGT sequences for GGT1 (GGT1 a and b), GGTL1 (GGTL1 a and b) and GGTL3 (GGTL3 a and b) were identified for each species. Phylogenetic analysis suggests that GGTs are ancient proteins conserved across most metazoan phyla and that paralogous GGTs in teleosts likely arose from the serial 3R genome duplication events. A third GGTL1 gene (GGTL1c) was found in green spotted pufferfish; however, this gene is not present in medaka, stickleback or fugu. Similarly, one or both paralogs of GGTL3 appear to have been lost in green spotted pufferfish, fugu and zebrafish. Syntenic relationships were highly maintained between duplicated teleost chromosomes, among teleosts and across ray-finned (Actinopterygii) and lobe-finned (Sarcopterygii) species. To assess subfunction partitioning, six medaka GGT genes were cloned and assessed for developmental and tissue-specific expression. Based upon these data, we propose a modification of the “duplication-degeneration-complementation” (DDC) model of subfunction partitioning where quantitative differences rather than absolute differences in gene expression are observed between gene paralogs. Our results demonstrate that multiple GGT genes have been retained within teleost genomes. Questions remain, however, regarding the functional roles of multiple GGTs in these species.
Positive selection is a driving force that has shaped the modern human. Recent developments in high-throughput technologies and corresponding statistical tools have made it possible to conduct whole-genome surveys at a population scale, and a variety of measurements, such as heterozygosity (HET), FST, and Tajima's D, have been applied to multiple datasets to identify signals of positive selection. However, great effort has been required to combine various types of data from individual sources, and incompatibility among datasets has been a common problem. SNP@Evolution, a new database which integrates multiple datasets, will greatly assist future work in this area.
As part of our research scanning for evolutionary signals in HapMap Phase II and Phase III datasets, we built SNP@Evolution as a multi-aspect database focused on positive selection. Among its many features, SNP@Evolution provides computed FST and HET of all HapMap SNPs, 5+ HapMap SNPs per qualified gene, and all autosome regions detected from whole genome window scanning. In an attempt to capture multiple selection signals across the genome, selection-signal enrichment strength (ES) values of HET, FST, and P-values of iHS of most annotated genes have been calculated and integrated within one frame for users to search for outliers. Genes with significant ES or P-values (with thresholds of 0.95 and 0.05, respectively) have been highlighted in color. Low diversity chromosome regions have been detected by sliding a 100 kb window in a 10 kb step. To allow this information to be easily disseminated, a graphical user interface (GBrowser) was constructed with the Generic Model Organism Database toolkit.
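The window scan described above can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with known allele frequencies, takes HET as the expected heterozygosity 2p(1-p), and uses a simple Nei-style FST for two equally weighted populations. The estimators actually used for SNP@Evolution are not stated here, and all function names and toy values below are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def nei_fst(p1, p2):
    """Nei-style FST for one SNP in two equally weighted populations (assumed estimator)."""
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic in the pooled sample
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


def sliding_window_het(snps, window=100_000, step=10_000):
    """Mean HET per window for one chromosome.

    snps: list of (position_bp, allele_frequency) tuples.
    Returns a list of (window_start, mean_het, snp_count); empty windows are skipped.
    """
    if not snps:
        return []
    snps = sorted(snps)
    last_pos = snps[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        end = start + window
        hets = [expected_heterozygosity(p) for pos, p in snps if start <= pos < end]
        if hets:
            results.append((start, sum(hets) / len(hets), len(hets)))
        start += step
    return results


if __name__ == "__main__":
    # Toy data: three SNPs on a hypothetical chromosome.
    toy = [(5_000, 0.5), (50_000, 0.1), (120_000, 0.0)]
    for window_start, mean_het, count in sliding_window_het(toy):
        print(window_start, round(mean_het, 3), count)
```

Windows with unusually low mean HET relative to the genome-wide distribution would be the candidates for low-diversity regions; how the cutoff is chosen is outside the scope of this sketch.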
Available at , SNP@Evolution is a hierarchical database focused on positive selection of the human genome. Based on HapMap Phase II and III data, SNP@Evolution includes 3,619,226/1,389,498 SNPs with their computed HET and FST, as well as qualified genes of 21,859/21,099 with ES values of HET and FST. In at least one HapMap population group, window scanning for selection signals has resulted in 1,606/10,138 large low HET regions. Among Phase II and III geographical groups, 660 and 464 regions show strong differentiation.
There are approximately 25 000 species in the division Teleostei and most are believed to have arisen during a relatively short period of time ca. 200 Myr ago. The discovery of 'extra' Hox gene clusters in zebrafish (Danio rerio), medaka (Oryzias latipes), and pufferfish (Fugu rubripes) has led to the hypothesis that genome duplication provided the genetic raw material necessary for the teleost radiation. We identified 27 groups of orthologous genes which included one gene from man, mouse and chicken, one or two genes from tetraploid Xenopus and two genes from zebrafish. A genome duplication in the ancestor of teleost fishes is the most parsimonious explanation for the observations that for 15 of these genes, the two zebrafish orthologues are sister sequences in phylogenies that otherwise match the expected organismal tree, the zebrafish gene pairs appear to have been formed at approximately the same time, and are unlinked. Phylogenies of nine genes differ a little from the tree predicted by the fish-specific genome duplication hypothesis: one tree shows a sister sequence relationship for the zebrafish genes but differs slightly from the expected organismal tree and in eight trees, one zebrafish gene is the sister sequence to a clade which includes the second zebrafish gene and orthologues from Xenopus, chicken, mouse and man. For these nine gene trees, deviations from the predictions of the fish-specific genome duplication hypothesis are poorly supported. The two zebrafish orthologues for each of the three remaining genes are tightly linked and are, therefore, unlikely to have been formed during a genome duplication event. We estimated that the unlinked duplicated zebrafish genes are between 300 and 450 Myr old. Thus, genome duplication could have provided the genetic raw material for teleost radiation. Alternatively, the loss of different duplicates in different populations (i.e. 'divergent resolution') may have promoted speciation in ancient teleost populations.
The HapMap project aimed to catalog millions of common single nucleotide polymorphisms (SNPs) in the human genome in four major populations, in order to facilitate association studies of complex diseases. To examine the transferability of Han Chinese in Beijing HapMap data to the Southern Han Chinese in Shanghai, we performed comparative analyses of genotypes from over 4,500 SNPs in a 21 Mb region on chromosome 1q21-q25 in 80 unrelated Shanghai Chinese and 45 HapMap Chinese individuals.
Three thousand and forty-two SNPs were analyzed after removal of SNPs that failed quality control and those not in the HapMap panel. We compared the allele frequency distributions, linkage disequilibrium patterns, haplotype frequency distributions and the transferability of tagging SNP sets between the HapMap population and the Shanghai Chinese population. Among the four HapMap populations, Beijing Chinese showed the best correlation with the Shanghai population on allele frequencies, linkage disequilibrium and haplotype frequencies. Tagging SNP sets selected from the four HapMap populations at different thresholds were evaluated in the Shanghai sample. Under a threshold of r² equal to 0.8 or 0.5, both HapMap Chinese and Japanese data showed better coverage and tagging efficiency than Caucasian and African data.
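The evaluation of tag transferability at an r² threshold can be sketched as follows. This is a minimal illustration that assumes phased haplotypes coded as 0/1 alleles and a simple greedy tag selection; the tagging software and settings used in the study are not stated here, and the SNP names, function names and toy haplotypes are hypothetical.

```python
def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs from phased 0/1 haplotype alleles."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic
    d = pab - pa * pb
    return d * d / denom


def greedy_tags(haps, threshold=0.8):
    """Greedily pick tags until every SNP is captured at r^2 >= threshold.

    haps: dict mapping SNP id -> list of 0/1 alleles across haplotypes.
    """
    ids = list(haps)
    captures = {
        s: {t for t in ids if t == s or r_squared(haps[s], haps[t]) >= threshold}
        for s in ids
    }
    untagged = set(ids)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(ids, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


def coverage(tags, target_haps, threshold=0.8):
    """Fraction of SNPs in a target sample captured by the given tags."""
    captured = 0
    for s, alleles in target_haps.items():
        if s in tags or any(
            r_squared(alleles, target_haps[t]) >= threshold for t in tags
        ):
            captured += 1
    return captured / len(target_haps)


if __name__ == "__main__":
    # Toy reference panel: A and B are in perfect LD, C is only weakly correlated.
    reference = {
        "A": [0, 0, 1, 1, 0, 1, 0, 1],
        "B": [0, 0, 1, 1, 0, 1, 0, 1],
        "C": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    # Toy target sample in which LD between A and B has decayed.
    target = dict(reference, B=[1, 0, 1, 1, 0, 1, 0, 0])
    tags = greedy_tags(reference)
    print(tags, coverage(tags, target))
```

Selecting tags in one panel and scoring them in another sample, as above, is the basic shape of a transferability comparison; tagging efficiency could then be summarized as SNPs captured per tag.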
Our study supports the applicability of HapMap Beijing Chinese SNP data to the study of complex diseases in southern Chinese populations.
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r² of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
The recent completion of the Haplotype Map (HapMap) project of the human genome provides considerable information on the patterns of variation in the genome of four populations. One of the applications is a description of a set of tags that act as proxies for many other surrounding variants. This will greatly help researchers in their quest to find complex disease genes by reducing the number of genetic variants to test in association studies. To evaluate its usefulness, several aspects of the map, including its transferability to other populations, still needed to be verified experimentally. Using genomic regions where variants had been thoroughly documented in Caucasian samples from Estonia, the researchers found that the transferability of tags is extremely good. The researchers also found that variants with low frequency in the general population (i.e., less than 5%) could not be accurately captured with tags, and that the regional density of variants in the HapMap project had a major impact on the performance of the tags. This research indicates that the HapMap project will be useful, but that careful consideration of hypotheses and study design will be essential for the success of association studies.
The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
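The comparison of eQTL effect size and direction across populations can be pictured with a small Python sketch. It assumes an additive model in which expression is regressed on genotype dosage (0, 1 or 2 copies of an allele) by ordinary least squares; the association statistic actually used in the study is not specified in this abstract, and the function names and toy values are hypothetical.

```python
def ols_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage (assumed additive model)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        return 0.0  # SNP is monomorphic in this sample, no effect can be estimated
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


def same_direction(slopes):
    """True if all per-population effect estimates share one sign."""
    return all(s > 0 for s in slopes) or all(s < 0 for s in slopes)


if __name__ == "__main__":
    # Toy data for one SNP-gene pair in two hypothetical populations.
    populations = {
        "pop1": ([0, 1, 2, 1], [1.0, 2.0, 3.0, 2.0]),
        "pop2": ([0, 0, 1, 2], [5.0, 5.0, 5.5, 6.0]),
    }
    slopes = {name: ols_slope(g, e) for name, (g, e) in populations.items()}
    print(slopes, same_direction(list(slopes.values())))
```

A real analysis would also need significance testing and correction for non-genetic factors and admixture, which this sketch leaves out.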
Variation among individuals in the degree to which genes are expressed (i.e. turned on or off) is a characteristic exhibited by all species, and studies have identified regions of the genome harboring genetic variation affecting gene expression levels. To assess the degree of human inter-population variability in regulatory variation, we describe mapping of regions of the genome that have functional effects on gene expression levels. We analyzed genome-wide gene expression in human cell lines derived from 726 unrelated individuals representing 8 global populations that have been genetically well-characterized by the International HapMap Project. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We identify ∼5,700 genes whose expression levels are associated with genetic variation located physically close to the gene, and we observe significant sharing of associations that is partially dependent on population genetic relatedness, among Asians, European-admixed, and African subpopulations. We identify biological functions affected by regulatory variation and describe common and unique characteristics of population-specific and population-shared associations. These results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%.
To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes.
Comparison of the two data sets showed that approximately 72% of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5% (separately for each population), 99% of HapMap SNPs were found in 1000 Genomes data.
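The overlap calculation, with and without a MAF filter, can be sketched in a few lines of Python. The sketch assumes SNPs are matched by identifier and that a per-population MAF is available for each HapMap SNP; the matching rule used in the study is not stated here, and the identifiers, frequencies and function name are hypothetical toy values.

```python
def overlap_fraction(hapmap_maf, other_snps, min_maf=0.0):
    """Fraction of HapMap SNPs (after MAF filtering) that also appear in another set.

    hapmap_maf: dict mapping SNP id -> minor allele frequency in one population.
    other_snps: set of SNP ids present in the second resource.
    min_maf:    SNPs with MAF below this value are dropped before comparing.
    """
    kept = {snp for snp, maf in hapmap_maf.items() if maf >= min_maf}
    if not kept:
        return float("nan")  # nothing left to compare after filtering
    return len(kept & other_snps) / len(kept)


if __name__ == "__main__":
    # Toy data: two common and two rare SNPs in a hypothetical population.
    hapmap = {"snp1": 0.30, "snp2": 0.02, "snp3": 0.10, "snp4": 0.01}
    second_resource = {"snp1", "snp3", "snp5"}
    print(overlap_fraction(hapmap, second_resource))
    print(overlap_fraction(hapmap, second_resource, min_maf=0.05))
```

Applying the filter separately for each population, as the text describes, would mean calling the function once per population with that population's MAF table.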
Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.
Patients suffering from Usher syndrome (USH) exhibit sensorineural hearing loss, retinitis pigmentosa (RP) and, in some cases, vestibular dysfunction. USH is the most common genetic disorder affecting hearing and vision and is included in a group of hereditary pathologies associated with defects in ciliary function known as ciliopathies. This syndrome is clinically classified into three types: USH1, USH2 and USH3. USH2 accounts for well over one-half of all Usher cases, and mutations in the USH2A gene are responsible for the majority of USH2 cases, but also for atypical Usher syndrome and recessive non-syndromic RP. Because medaka fish (Oryzias latipes) is an attractive model organism for genetic-based studies in biomedical research, we investigated the expression and function of the USH2A ortholog in this teleost species. Ol-Ush2a encodes a protein of 5,445 amino acids, containing the same motif arrangement as human USH2A. Ol-Ush2a is expressed during early stages of medaka fish development and persists into adulthood. Temporal Ol-Ush2a expression analysis using whole mount in situ hybridization (WMISH) on embryos at different embryonic stages showed expression restricted to otoliths and retina, suggesting that Ol-Ush2a might play a conserved role in the development and/or maintenance of retinal photoreceptors and cochlear hair cells. Knockdown of Ol-Ush2a in medaka fish caused embryonic developmental defects (small eyes and heads, otolith malformations and shortened bodies with curved tails) resulting in late embryo lethality. These embryonic defects, observed in our study and in other ciliary disorders, are associated with defective cell movement specifically implicated in left-right (LR) axis determination and planar cell polarity (PCP).