For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ~1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
Hydatidiform mole (HM) is an abnormal human pregnancy, where the placenta presents with vesicular swelling of the chorionic villi. A fetus is either not present, or malformed and not viable. Most moles are diploid androgenetic as if one spermatozoon fertilized an empty oocyte, or triploid with one maternal and two paternal chromosome sets as if two spermatozoa fertilized a normal oocyte. However, diploid moles with both paternal and maternal markers of the nuclear genome have been reported. Among 162 consecutively collected diploid moles, we have earlier found indications of both maternal and paternal genomes in 11. In the present study, we have performed detailed analysis of DNA-markers in tissue and single cells from these 11 HMs. In 3/11, we identified one biparental cell population only, whereas in 8/11, we demonstrated mosaicism: one biparental cell population and one androgenetic cell population. One mosaic mole was followed by persistent trophoblastic disease (PTD). In seven of the mosaics, one spermatozoon appeared to have contributed to the genomes of both cell types. Our observations make it likely that mosaic conceptuses, encompassing an androgenetic cell population, result from various postzygotic abnormalities, including paternal pronuclear duplication, asymmetric cytokinesis, and postzygotic diploidization. This corroborates the suggestion that fertilization of an empty egg is not mandatory for the creation of an androgenetic cell population. Future studies of mosaic conceptuses may disclose details about fertilization, early cell divisions and differentiation. Apparently, only a minority of diploid moles with both paternal and maternal markers are ‘genuine’ diploid biparental moles (DiBiparHMs).
hydatidiform mole; mosaicism; biparental diploidy; triploidy; genomic imprinting; persistent trophoblastic disease
The FET family of proteins is composed of FUS/TLS, EWS/EWSR1, and TAF15 and possesses RNA- and DNA-binding capacities. The FET-proteins are involved in transcriptional regulation and RNA processing, and FET-gene deregulation is associated with development of cancer and protein granule formations in amyotrophic lateral sclerosis, frontotemporal lobar degeneration, and trinucleotide repeat expansion diseases. We here describe a comparative characterization of FET-protein localization and gene regulatory functions. We show that FUS and TAF15 locate to cellular stress granules to a larger extent than EWS. FET-proteins have no major importance for stress granule formation and cellular stress responses, indicating that FET-protein stress granule association most likely is a downstream response to cellular stress. Gene expression analyses showed that the cellular response towards FUS and TAF15 reduction is relatively similar whereas EWS reduction resulted in a more unique response. The presented data support that FUS and TAF15 are more functionally related to each other, and that the FET-proteins have distinct functions in cellular signaling pathways which could have implications for the neurological disease pathogenesis.
Timely intervention for cancer requires knowledge of its earliest genetic aberrations. Sequencing of tumors and their metastases reveals numerous abnormalities occurring late in progression. A means to temporally order aberrations in a single cancer, rather than inferring them from serially acquired samples, would define changes preceding even clinically evident disease. We integrate DNA sequence and copy number information to reconstruct the order of abnormalities as individual tumors evolve for two separate cancer types. We detect vast, unreported expansion of simple mutation sharply demarcated by recombinative loss of the second copy of TP53 in cutaneous squamous cell carcinomas (cSCCs) and serous ovarian adenocarcinomas, in the former surpassing 50 mutations per megabase. In cSCCs, we also report diverse secondary mutations in known and novel oncogenic pathways, illustrating how such expanded mutagenesis directly promotes malignant progression. These results reframe paradigms in which TP53 mutation is required later, to bypass senescence induced by driver oncogenes.
mutation; p53; cancer genetics; genomic; Notch
Integrins constitute a superfamily of transmembrane signaling receptors that play pivotal roles in cutaneous homeostasis by modulating cell growth and differentiation as well as inflammatory responses in the skin. Suprabasal expression of integrins α2 and/or β1 entails hyperproliferation and aberrant differentiation of keratinocytes and leads to dermal and epidermal influx of activated T-cells. The anatomical and physiological similarities between porcine and human skin make the pig a suitable model for human skin diseases. In efforts to generate a porcine model of cutaneous inflammation, we employed the Sleeping Beauty DNA transposon system for production of transgenic cloned Göttingen minipigs expressing human β1 or α2 integrin under the control of a promoter specific for suprabasal keratinocytes. Using pools of transgenic donor fibroblasts, cloning by somatic cell nuclear transfer was utilized to produce reconstructed embryos that were subsequently transferred to surrogate sows. The resulting pigs were all transgenic and harbored from one to six transgene integrants. Molecular analyses on skin biopsies and cultured keratinocytes showed ectopic expression of the human integrins and localization within the keratinocyte plasma membrane. Markers of perturbed skin homeostasis, including activation of the MAPK pathway, increased expression of the pro-inflammatory cytokine IL-1α, and enhanced expression of the transcription factor c-Fos, were identified in keratinocytes from β1 and α2 integrin-transgenic minipigs, suggesting the induction of a chronic inflammatory phenotype in the skin. Notably, cellular dysregulation obtained by overexpression of either β1 or α2 integrin occurred through different cellular signaling pathways. Our findings mark the creation of the first cloned pig models with molecular markers of skin inflammation.
Despite the absence of an overt psoriatic phenotype, these animals may possess increased susceptibility to severe skin damage-induced inflammation and should be of great potential in studies aiming at the development and refinement of topical therapies for cutaneous inflammation including psoriasis.
Animal breeding via somatic cell nuclear transfer (SCNT) has enormous potential in agriculture and biomedicine. However, because the efficiency of cloning by SCNT is much lower than that of natural breeding or in vitro fertilization (IVF), concerns have been raised about whether SCNT animals are as healthy or epigenetically normal as conventionally bred ones. We therefore performed genome-wide gene expression and DNA methylation profiling of phenotypically normal cloned pigs and control pigs in two tissues (muscle and liver), using the Affymetrix Porcine expression array as well as modified methylation-specific digital karyotyping (MMSDK) and Solexa sequencing technology. Typical tissue-specific differences with respect to both gene expression and DNA methylation were observed in muscle and liver from cloned as well as control pigs. Gene expression profiles were highly similar between cloned pigs and controls, though a small set of genes showed altered expression. DNA methylation patterns in unique sequences differed more between cloned pigs and controls in both tissues. In particular, a small set of genomic sites had different DNA methylation status, with a trend towards slightly increased methylation levels in cloned pigs. Molecular network analysis of the genes that contained such differential methylation loci revealed a significant network related to tissue development. In conclusion, our study showed that phenotypically normal cloned pigs were highly similar to conventionally bred pigs in their gene expression, but moderate alterations in DNA methylation still exist, especially in certain unique genomic regions.
Gene targeting by homologous recombination using recombinant adeno-associated virus (rAAV) is becoming a useful tool for basic research and therapeutic applications due to the remarkably high targeting frequency of rAAV virus vectors. However, the screening for the pure gene-targeted and random-integration-free primary cell clones is difficult since the cells have a limited proliferation capacity and often cannot be grown to produce sufficient DNA for non-PCR based analysis. This hampers the applications of this technology.
In this study, we have developed an improved PCR screening method, which can be used for fast screening of clones with unwanted random integration (RI) of the rAAV genome. This improved screening method includes four PCRs: a PCR for the selection gene (e.g. Neo-PCR), a PCR for targeted gene knockout (e.g. BRCA1-KO-PCR), and two generalized PCRs for random integration of the rAAV genome (5'-AAV-RI-PCR, and 3'-AAV-RI-PCR). We have shown that this screening method greatly facilitates the procedure of screening for BRCA1 (BReast CAncer susceptibility gene 1) targeted cell clones, eliminating cell clones with both BRCA1 knockout and random integration of the rAAV genome.
This screening method has facilitated the screening of correct gene-targeted cells. As the AAV-RI-PCRs are generalized PCRs, this method can also be applied for screening of rAAV-mediated targeting of other genes.
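The four-PCR screen described above can be read as a small decision rule. The sketch below is illustrative only: the class name, field names, and category labels are hypothetical, and it assumes that a positive result in either AAV-RI-PCR indicates random integration of the rAAV genome. It is not the authors' protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrResult:
    """Outcome of the four screening PCRs for one cell clone (True = band present)."""
    neo: bool        # PCR for the selection gene (e.g. Neo-PCR)
    knockout: bool   # PCR for the targeted knockout (e.g. BRCA1-KO-PCR)
    ri_5prime: bool  # 5'-AAV-RI-PCR
    ri_3prime: bool  # 3'-AAV-RI-PCR


def classify_clone(result: PcrResult) -> str:
    """Classify a clone; the labels are hypothetical, chosen for illustration."""
    if not result.neo:
        return "no selection gene detected"
    # Assumption: either RI-PCR being positive signals random integration.
    random_integration = result.ri_5prime or result.ri_3prime
    if result.knockout and not random_integration:
        return "targeted, RI-free"
    if result.knockout and random_integration:
        return "targeted with random integration"
    if random_integration:
        return "random integration only"
    return "unresolved"


# Hypothetical clones: only the first would be kept for further work.
clones = {
    "clone_A": PcrResult(neo=True, knockout=True, ri_5prime=False, ri_3prime=False),
    "clone_B": PcrResult(neo=True, knockout=True, ri_5prime=True, ri_3prime=False),
}
for name, result in clones.items():
    print(name, "->", classify_clone(result))
```

Under this reading, only clones classified as "targeted, RI-free" pass the screen, which matches the stated goal of eliminating clones that carry both the knockout and a random integration.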
Analogues of vitamin D3 are extensively used in the treatment of various illnesses, such as osteoporosis, inflammatory skin diseases, and cancer. Functional testing of new vitamin D3 analogues and formulations for improved systemic and topical administration is supported by sensitive screening methods that allow a comparative evaluation of drug properties. As a new tool in functional screening of vitamin D3 analogues, we describe a genomically integratable sensor for sensitive drug detection. This system facilitates assessment of the pharmacokinetic and pharmacodynamic properties of vitamin D3 analogues. The tri-cistronic genetic sensor encodes a drug-sensoring protein, a reporter protein expressed from an activated sensor-responsive promoter, and a resistance marker.
The three expression cassettes, inserted in a head-to-tail orientation in a Sleeping Beauty DNA transposon vector, are efficiently inserted as a single genetic entity into the genome of cells of interest in a reaction catalyzed by the hyperactive SB100X transposase. The applicability of the sensor for screening purposes is demonstrated by the functional comparison of potent synthetic analogues of vitamin D3 designed for the treatment of psoriasis and cancer. In clones of human keratinocytes carrying from a single to numerous insertions of the vitamin D3 sensor, a sensitive sensor read-out is detected upon exposure to even low concentrations of vitamin D3 analogues. In comparative studies, the sensor unveils superior potency of new candidate drugs in comparison with analogues that are currently in clinical use.
Our findings demonstrate the use of the genetic sensor as a tool in first-line evaluation of new vitamin D3 analogues and pave the way for new types of drug delivery studies in sensor-transgenic animals.
Transfer of full-length genes including regulatory elements has been the preferred gene therapy strategy for clinical applications. However, with significant drawbacks emerging, targeted gene alteration (TGA) has recently become a promising alternative to this method. By means of TGA, endogenous DNA repair pathways of the cell are activated leading to specific genetic correction of single-base mutations in the genome. This strategy can be implemented using single-stranded oligodeoxyribonucleotides (ssODNs), small DNA fragments (SDFs), triplex-forming oligonucleotides (TFOs), adeno-associated virus vectors (AAVs) and zinc-finger nucleases (ZFNs). Despite difficulties in the use of TGA, including lack of knowledge on the repair mechanisms stimulated by the individual methods, the field holds great promise for the future. The objective of this review is to summarize and evaluate the different methods that exist within this particular area of human gene therapy research.
Different cell subpopulations in a single tumor may show diverse capacities for growth, differentiation, metastasis formation, and sensitivity to treatments. Thus, heterogeneity is an important feature of tumors. However, due to limitations in experimental and analytical techniques, tumor heterogeneity has rarely been studied in detail.
Presentation of the hypothesis
Different tumor types have different heterogeneity patterns, thus heterogeneity could be a characteristic feature of a particular tumor type.
Testing the hypothesis
We applied our previously published mathematical heterogeneity model to decipher tumor heterogeneity through the analysis of genetic copy number aberrations revealed by array CGH data for tumors of three different tissues: breast, colon, and skin. The model estimates the number of subpopulations present in each tumor. The analysis confirms that different tumor types have different heterogeneity patterns. Computationally derived genomic copy number profiles from each subpopulation have also been analyzed and discussed with reference to the multiple hypothetical relationships between subpopulations in origin-related samples.
Implications of the hypothesis
Our observations imply that tumor heterogeneity could be seen as an independent parameter for determining the characteristics of tumors. In the context of more comprehensive usage of array CGH or genome sequencing in a clinical setting our study provides a new way to realize the full potential of tumor genetic analysis.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
MicroRNAs (miRNAs) are 18-25 nt small RNAs playing critical roles in many biological processes. The majority of known miRNAs were discovered by conventional cloning and a Sanger sequencing approach. The next-generation sequencing (NGS) technologies enable in-depth characterization of the global repertoire of miRNAs, and different protocols for miRNA library construction have been developed. However, the possible biases in relative expression levels and sequences introduced by different library preparation protocols have rarely been explored.
We assessed three different miRNA library preparation protocols, SOLiD, Illumina versions 1 and 1.5, using cloning or SBS sequencing of total RNA samples extracted from skeletal muscles from Hu sheep and Dorper sheep, and then validated 9 miRNAs by qRT-PCR. Our results show that SBS sequencing data highly correlate with Illumina cloning data. The SOLiD data, when compared to Illumina's, indicate more dispersed distribution of length, higher frequency variation for nucleotides near the 3'- and 5'-ends, higher frequency occurrence for reads containing end secondary structure (ESS), and higher frequency for reads that do not map to known miRNAs. qRT-PCR results showed the best correlation with SOLiD cloning data. Fold difference of Hu sheep and Dorper sheep between qRT-PCR result and SBS sequencing data correlated well (r = 0.937), and fold difference of miR-1 and miR-206 among SOLiD cloning data, qRT-PCR and SBS sequencing data was similar.
The sequencing depth can influence the quantitative measurement of miRNA abundance, but the discrepancy caused by it was not statistically significant as high correlation was observed between Illumina cloning and SBS sequencing data. Bias of length distribution, sequence variation, and ESS was observed between data obtained with the different protocols. SOLiD cloning data differ from Illumina cloning data mainly because of distinct methods of adapter ligation. The good correlation between qRT-PCR result and SOLiD data might be due to the similarities of the hybridization-based methods. The fold difference analysis indicated that methods based on hybridization may be superior for quantitative measurement of miRNA abundance. Because the genome sequence of the sheep is not available, our data may not fully explain the bias in natural miRNA expression in sheep or other mammals; unbiased, artificially synthesized miRNAs would help in evaluating methods of miRNA library preparation.
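The comparison of fold differences between platforms comes down to a correlation coefficient such as the reported r = 0.937. The following is a minimal sketch of a Pearson correlation on log2 fold differences. It assumes Pearson's r was the statistic used, which the text does not state, and the function names and example values are hypothetical.

```python
import math


def log2_fold(abundance_a, abundance_b):
    """log2 fold difference between two breeds for one miRNA."""
    return math.log2(abundance_a / abundance_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 fold differences for the same miRNAs on two platforms.
qpcr_fold = [1.2, -0.4, 2.1, 0.3, -1.0]
sequencing_fold = [1.0, -0.2, 1.8, 0.5, -1.3]
print(round(pearson_r(qpcr_fold, sequencing_fold), 3))
```

Working on log2 fold differences rather than raw ratios keeps up- and down-regulation symmetric around zero, which is a common convention rather than something the abstract specifies.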
The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing.
Assemblies of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30× genome coverage. The WGS sequences, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (GenBank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication.
In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for analysis of the pig genome sequence, and for the application and publication of the results.
Recent studies in human genomes have demonstrated the use of de novo assemblies to identify genetic variations that are difficult for mapping-based approaches. Construction of multiple human genome assemblies is enabled by massively parallel sequencing, but a conventional bioinformatics solution is costly and slow, creating bottle-necks in the process. This review describes two public short-read de novo assembly applications that can handle human genomes, ABySS and SOAPdenovo. It also discusses the technical aspects and future challenges of human genome de novo assembly by short reads.
de novo assembly; de Bruijn graph; massively parallel sequencing
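ABySS and SOAPdenovo are both de Bruijn graph assemblers. A toy sketch of the underlying graph construction is given below. It assumes error-free reads and ignores reverse complements, error correction, and graph simplification, all of which real assemblers must handle, and the function name is hypothetical.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Build de Bruijn graph edges from reads.

    Each k-mer contributes one edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix; the Counter value is how often that k-mer was seen.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


# Two overlapping toy reads; the shared k-mer CGT is counted twice.
for (prefix, suffix), count in sorted(de_bruijn_edges(["ACGT", "CGTA"], k=3).items()):
    print(f"{prefix} -> {suffix} (x{count})")
```

Contigs correspond to unambiguous paths through such a graph. Holding the k-mers of a whole human genome in memory is one of the scaling problems the review alludes to.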
DNA methylation is a widely studied epigenetic mechanism known to correlate with gene repression and genomic stability. Development of sensitive methods for global detection of DNA methylation events is of particular importance.
We here describe a technique, called modified methylation-specific digital karyotyping (MMSDK), based on methylation-specific digital karyotyping (MSDK) with a novel sequencing approach. Briefly, after a tandem digestion of genomic DNA with a methylation-sensitive mapping enzyme and a fragmenting enzyme, short sequence tags are obtained. These tags are amplified, followed by direct, massively parallel sequencing (Solexa 1G Genome Analyzer). This method allows high-throughput and low-cost genome-wide DNA methylation mapping. We applied this method to investigate global DNA methylation profiles for widely used breast cancer cell lines, MCF-7 and MDA-MB-231, which are representative of luminal-like and mesenchymal-like cancer types, respectively. By comparison, a highly similar overall DNA methylation pattern was revealed for the two cell lines. However, a cohort of individual genomic loci with significantly different DNA methylation status between the two cell lines was identified. Furthermore, we revealed a genome-wide significant correlation between gene expression and the methylation status of gene promoters with CpG islands (CGIs) in the two cancer cell lines, and a correlation of gene expression and the methylation status of promoters without CGIs in MCF-7 cells.
The MMSDK method will be a valuable tool to increase the current knowledge of genome wide DNA methylation profiles.
A small region of about 70 kb on human chromosome 19q13.3 encompasses 4 genes of which 3, ERCC1, ERCC2, and PPP1R13L (aka RAI) are related to DNA repair and cell survival, and one, CD3EAP, aka ASE1, may be related to cell proliferation. The whole region seems related to the cellular response to external damaging agents and markers in it are associated with risk of several cancers.
We downloaded the genotypes of all markers typed in the 19q13.3 region in the HapMap populations of European, Asian and African descent and inferred haplotypes. We combined the European HapMap individuals with a Danish breast cancer case-control data set and inferred the association between HapMap haplotypes and disease risk.
We found that the susceptibility haplotype in our European sample had increased from 2 to 50 percent very recently in the European population, and to almost the same extent in the Asian population. The cause of this increase is unknown. The maximal proportion of overall genetic variation due to differences between groups for Europeans versus Africans and Europeans versus Asians (the Fst value) closely matched the putative location of the susceptibility variant as judged from haplotype-based association mapping.
The combination of a common haplotype that increases cancer risk in Europeans and high differentiation between human populations is highly unusual. It suggests a causal relationship with the recent increase in Europeans, caused either by genetic drift overruling selection against the susceptibility variant or by positive selection for the same haplotype. The data do not allow us to distinguish between these two scenarios. The analysis suggests that the region is not involved in cancer risk in Africans and that the susceptibility variants may be more finely mapped in Asian populations.
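The Fst value referred to above measures the proportion of total genetic variation that is due to differences between groups. The sketch below shows one common textbook form for a single biallelic marker in two equally weighted populations, Fst = (H_T - H_S) / H_T. The estimator actually used in the study is not stated, so this form, the function name, and the example frequencies are all assumptions.

```python
def fst_biallelic(p1, p2):
    """Wright's Fst for one biallelic marker in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                                         # marker is monomorphic overall
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(round(fst_biallelic(0.1, 0.6), 3))
```

Scanning such a value marker by marker across the region and locating its maximum is one way to compare population differentiation with the position suggested by association mapping.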
Array-based comparative genomic hybridization (CGH) is a commonly-used approach to detect DNA copy number variation in whole genome-wide screens. Several statistical methods have been proposed to define genomic segments with different copy numbers in cancer tumors. However, most tumors are heterogeneous and show variation in DNA copy numbers across tumor cells. The challenge is to reveal the copy number profiles of the subpopulations in a tumor and to estimate the percentage of each subpopulation.
We describe a relation between experimental data and exact DNA copy number and develop a statistical method to reveal the heterogeneity of tumors containing a mixture of different-stage cells. Furthermore, we validate the method on simulated data and apply the method to 29 pairs of breast primary tumors and their matched lymph node metastases.
We demonstrate a new method for CGH array analysis that allows a tumor sample to be classified according to its heterogeneity. The method gives an interpretable series of copy number profiles, one for each major subpopulation in a tumor. The profiles facilitate identification of copy number alterations in cancer development.
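To illustrate why a mixture of subpopulations blurs array CGH signals, the sketch below computes the expected log2 ratio of a genomic segment when cells with different copy numbers are mixed. The relation used here (signal proportional to the fraction-weighted mean copy number, against a diploid reference) is a common simplifying assumption and is not necessarily the relation derived in the paper. All names and values are hypothetical.

```python
import math


def expected_log2_ratio(fractions, copy_numbers, reference_copies=2):
    """Expected log2(test/reference) ratio for one segment in a mixed sample.

    fractions:    proportion of cells in each subpopulation (must sum to 1)
    copy_numbers: integer copy number of the segment in each subpopulation
    """
    if len(fractions) != len(copy_numbers):
        raise ValueError("one copy number is needed per subpopulation")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("subpopulation fractions must sum to 1")
    mean_copies = sum(f * c for f, c in zip(fractions, copy_numbers))
    if mean_copies <= 0:
        raise ValueError("segment is absent from all cells; log ratio undefined")
    return math.log2(mean_copies / reference_copies)


# Hypothetical tumor: 50% of cells diploid, 50% carrying a gain to four copies.
print(round(expected_log2_ratio([0.5, 0.5], [2, 4]), 3))
```

Under this assumption a pure four-copy gain would give a log2 ratio of 1.0, whereas the 50/50 mixture gives an intermediate value, which is why observed ratios that fall between integer copy-number levels hint at heterogeneity.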
Previous results have suggested an association of the region of 19q13.3 with several forms of cancer. In the present study, we investigated 27 public markers within a previously identified 69 kb stretch of chromosome 19q for association with breast cancer by using linkage disequilibrium mapping. The study groups included 434 postmenopausal breast cancer cases and an identical number of individually matched controls.
Methods and Results
Studying one marker at a time, we found a region spanning the gene RAI (alias PPP1R13L or iASPP) and the 5' portion of XPD to be associated with this cancer. The region corresponds to a haplotype block, in which there seems to be very limited recombination in the Danish population. Studying combinations of markers, we found that two to four neighboring markers gave the most consistent and strongest result. The haplotypes with strongest association with cancers were located in the gene RAI and just 3' to the gene. Coinciding peaks were seen in the region of RAI in groups of women of different age.
In a follow-up to these results we sequenced 10 cases and 10 controls in a 44 kb region spanning the peaks of association. This revealed 106 polymorphisms, many of which were not in the public databases. We tested an additional 44 of these for association with disease and found a new tandem repeat marker, called RAI-3'd1, located downstream of the transcribed region of RAI, which was more strongly associated with breast cancer than any other marker we have tested (RR = 2.44 (1.41–4.23), p = 0.0008, all cases; RR = 6.29 (1.49–26.6), p = 0.01, cases up to 55 years of age).
We expect the marker RAI-3'd1 to be (part of) the cause of the association of the chromosome 19q13.3 region with cancer.
Keratoacanthoma (KA) is a benign keratinocytic neoplasm that usually presents as a solitary nodule on sun-exposed areas, develops within 6-8 weeks and spontaneously regresses after 3-6 months. KAs share features such as infiltration and cytological atypia with squamous cell carcinomas (SCCs). Furthermore, there are reports of KAs that have metastasized, raising the question of whether or not KA is a variant of SCC. To date no reported criteria are sensitive enough to discriminate reliably between KA and SCC, and consequently there is a clinical need for discriminating markers. We screened fresh frozen material from 132 KAs and 37 SCCs for gross chromosomal aberrations by using comparative genomic hybridization (CGH). Forty-nine KAs (37.1%) and 31 SCCs (83.7%) showed genomic aberrations, indicating a higher degree of chromosomal instability in SCCs. Gains of chromosomal material from 1p, 14q, 16q, 20q, and losses from 4p were seen significantly more frequently in SCCs compared with KAs (P-values 0.0033, 0.0198, 0.0301, 0.0017, and 0.0070), whereas loss from 9p was seen significantly more frequently in KAs (P-value 0.0434). The patterns of recurrent aberrations were also different in the two types of neoplasms, pointing to different genetic mechanisms involved in their development.
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.
A resource consisting of one million porcine ESTs is described, providing an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.
Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.
Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.
This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.
Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate SNP finding and analysis within the framework of medical research. Using a user-friendly web interface, genes can be searched by name, description, position, SNP ID or clone name. Several public databases are integrated, including gene information from Ensembl, protein features from Uniprot/SWISS-PROT, Pfam and DAS-CBS. Gene relationships are fetched from BIND, MINT, KEGG and are integrated with ortholog data from TreeFam to extend the current interaction networks. Integrated tools for primer-design and mis-splicing analysis have been developed to facilitate experimental analysis of individual genes with focus on their variation. Snap is available at and at .