We have simulated the evolution of sexually reproducing populations composed of individuals represented by diploid genomes. A series of eight bits formed an allele occupying one of 128 loci of one haploid genome (chromosome). The environment required a specific activity of each locus, this being the sum of the activities of both alleles located at the corresponding loci on two chromosomes. This activity is represented by the number of bits set to zero. In a constant environment the best fitted individuals were homozygous with alleles’ activities corresponding to half of the environment requirement for a locus (in diploid genome two alleles at corresponding loci produced a proper activity). Changing the environment under a relatively low recombination rate promotes generation of more polymorphic alleles. In the heterozygous loci, alleles of different activities complement each other fulfilling the environment requirements. Nevertheless, the genetic pool of populations evolves in the direction of a very restricted number of complementing haplotypes and a fast changing environment kills the population. If simulations start with all loci heterozygous, they stay heterozygous for a long time.
Monte-Carlo simulations; Allele complementation; Polymorphic loci
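The genome representation described in the abstract above can be sketched in a few lines. This is a minimal illustration rather than the authors' simulation code: the function names and the mismatch score (sum of absolute deviations from the required activity) are assumptions made here, since the abstract does not state the fitness function.

```python
N_LOCI = 128      # loci per haploid genome (from the abstract)
ALLELE_BITS = 8   # bits per allele (from the abstract)


def allele_activity(allele):
    """Activity of one allele: the number of its eight bits set to zero."""
    if not 0 <= allele < 2 ** ALLELE_BITS:
        raise ValueError("an allele must fit in eight bits")
    return ALLELE_BITS - bin(allele).count("1")


def locus_activities(chrom_a, chrom_b):
    """Per-locus activity: the sum of the activities of both alleles."""
    return [allele_activity(a) + allele_activity(b)
            for a, b in zip(chrom_a, chrom_b)]


def mismatch(chrom_a, chrom_b, environment):
    """Hypothetical score: total deviation from the required activities."""
    return sum(abs(act - req)
               for act, req in zip(locus_activities(chrom_a, chrom_b), environment))


# An environment that requires activity 8 at every locus (assumed value).
environment = [8] * N_LOCI
homozygote = [0b00001111] * N_LOCI   # activity 4 per allele
hap_high = [0b00000011] * N_LOCI     # activity 6 per allele
hap_low = [0b00111111] * N_LOCI      # activity 2 per allele
```

Under this score, both the homozygote (4 + 4) and the complementing heterozygote (6 + 2) meet the requirement exactly, which mirrors the complementation of alleles of different activities described in the abstract.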
In a diploid organism such as a human, the two alleles of a particular gene can be expressed at different levels because of X chromosome inactivation, gene imprinting, different local promoter activity, or differences in mRNA stability. Recently, imbalanced allelic expression was found to be common in humans and can follow Mendelian inheritance. Here we present a method that employs real competitive PCR for allele-specific expression analysis.
A transcribed mutation such as a single nucleotide polymorphism (SNP) is used as the marker for allele-specific expression analysis. A synthetic mutation created in the competitor is close to a natural mutation site in the cDNA sequence. PCR is used to amplify the two cDNA sequences from the two alleles and the competitor. A base extension reaction with a mixture of ddNTPs/dNTP is used to generate three oligonucleotides for the two cDNAs and the competitor. The three products are identified and their ratios are calculated based on their peak areas in the MALDI-TOF mass spectrum. Several examples are given to illustrate how allele-specific gene expression can be applied in different biological studies.
This technique can quantify the absolute expression level of each individual allele of a gene with high precision and throughput.
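The quantification step can be illustrated with a small calculation. The sketch below is not the published protocol: it assumes that peak area is proportional to template amount, that the competitor amplifies with the same efficiency as both alleles, and that a single competitor concentration is used. The function names and the example numbers are hypothetical.

```python
def allele_copies(area_allele1, area_allele2, area_competitor, competitor_copies):
    """Estimate absolute copies of each allele from MALDI-TOF peak areas.

    Assumes peak area scales linearly with template amount, so the known
    competitor input calibrates the two allele peaks.
    """
    if area_competitor <= 0:
        raise ValueError("competitor peak area must be positive")
    scale = competitor_copies / area_competitor
    return area_allele1 * scale, area_allele2 * scale


def allelic_ratio(area_allele1, area_allele2):
    """Relative expression of allele 1 versus allele 2 (no competitor needed)."""
    if area_allele2 <= 0:
        raise ValueError("allele 2 peak area must be positive")
    return area_allele1 / area_allele2


# Hypothetical peak areas and a hypothetical competitor input of 1000 copies.
copies_1, copies_2 = allele_copies(300.0, 100.0, 200.0, 1000.0)
```

With these illustrative inputs, the competitor peak calibrates the two allele peaks to 1500 and 500 copies, a threefold allelic imbalance.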
allele-specific; gene expression; MALDI-TOF mass spectrometry; PCR; heterozygous
The diploid Solanum caripense, a wild relative of potato and tomato, possesses valuable resistance to potato late blight, and we are interested in the genetic basis of this resistance. Due to extremely low levels of genetic variation within the S. caripense genome, it proved impossible to generate a dense genetic map and to assign individual Solanum chromosomes through the use of conventional chromosome-specific SSR, RFLP, and AFLP markers, as well as gene- or locus-specific markers. The ease of detection of DNA polymorphisms depends on both the frequency and the form of sequence variation. The narrow genetic background of close relatives and inbreds complicates the detection of the reduced polymorphism that persists and is a challenge to the development of reliable molecular markers. Nonetheless, monomorphic DNA fragments that are not directly usable as conventional markers can contain considerable variation at the level of single nucleotide polymorphisms (SNPs). This variation can be used for the design of allele-specific molecular markers. The reproducible detection of allele-specific markers based on SNPs has been a technical challenge.
We present a fast and cost-effective protocol for the detection of allele-specific SNPs by applying Sequence Polymorphism-Derived (SPD) markers. These markers proved highly efficient for fingerprinting of individuals possessing a homogeneous genetic background. SPD markers are obtained from within non-informative, conventional molecular marker fragments that are screened for SNPs to design allele-specific PCR primers. The method makes use of primers containing a single, 3'-terminal Locked Nucleic Acid (LNA) base. We demonstrate the applicability of the technique by successful genetic mapping of allele-specific SNP markers derived from monomorphic Conserved Ortholog Set II (COSII) markers mapped to Solanum chromosomes, in S. caripense. By using SPD markers it was possible for the first time to map the S. caripense alleles of 16 chromosome-specific COSII markers and to assign eight of the twelve linkage groups to consensus Solanum chromosomes.
The method, based on individual allelic variants, allows for an order-of-magnitude higher resolution of genetic variation than conventional marker techniques. We show that the majority of monomorphic molecular marker fragments from organisms with reduced heterozygosity levels still contain SNPs that are sufficient to trace individual alleles.
Genomic imprinting is an epigenetic phenomenon leading to parent-of-origin specific differential expression of maternally and paternally inherited alleles. In plants, genomic imprinting has mainly been observed in the endosperm, an ephemeral triploid tissue derived after fertilization of the diploid central cell with a haploid sperm cell. In an effort to identify novel imprinted genes in Arabidopsis thaliana, we generated deep sequencing RNA profiles of F1 hybrid seeds derived after reciprocal crosses of Arabidopsis Col-0 and Bur-0 accessions. Using polymorphic sites to quantify allele-specific expression levels, we could identify more than 60 genes with potential parent-of-origin specific expression. By analyzing the distribution of DNA methylation and epigenetic marks established by Polycomb group (PcG) proteins using publicly available datasets, we suggest that for maternally expressed genes (MEGs) repression of the paternally inherited alleles largely depends on DNA methylation or PcG-mediated repression, whereas repression of the maternal alleles of paternally expressed genes (PEGs) predominantly depends on PcG proteins. While maternal alleles of MEGs are also targeted by PcG proteins, such targeting does not cause complete repression. Candidate MEGs and PEGs are enriched for cis-proximal transposons, suggesting that transposons might be a driving force for the evolution of imprinted genes in Arabidopsis. In addition, we find that MEGs and PEGs are evolving significantly faster than other genes in the genome. In contrast to the predominant location of mammalian imprinted genes in clusters, cluster formation was detected for only a few MEGs and PEGs, suggesting that clustering is not a major requirement for imprinted gene regulation in Arabidopsis.
Genomic imprinting violates the Mendelian rules of inheritance, which state the functional equality of maternally and paternally inherited alleles. Imprinted genes are expressed dependent on their parent-of-origin, implying an epigenetic asymmetry of maternal and paternal alleles. Genomic imprinting occurs in mammals and flowering plants. In both groups of organisms, nourishing of the progeny depends on ephemeral tissues, the placenta and the endosperm, respectively. In plants, genomic imprinting predominantly occurs in the endosperm, which is derived after fertilization of the diploid central cell with a haploid sperm cell. In this study we identify more than 60 potentially imprinted genes and show that different epigenetic mechanisms cause maternal-specific and paternal-specific gene expression. We show that maternally expressed genes are regulated by DNA methylation or Polycomb group (PcG)-mediated repression, while paternally expressed genes are predominantly regulated by PcG proteins. From an evolutionary perspective, we also show that imprinted genes are associated with transposons and are evolving more rapidly than other genes in the genome. Many MEGs and PEGs encode transcriptional regulators, implying important functional roles of imprinted genes in endosperm and seed development.
X inactivation, the transcriptional silencing of one X chromosome copy per female somatic cell, is universal among therian mammals, yet the choice of which X to silence exhibits considerable variation among species. X inactivation strategies can range from strict paternally inherited X inactivation (PXI), which renders females haploid for all maternally inherited alleles, to unbiased random X inactivation (RXI), which equalizes expression of maternally and paternally inherited alleles in each female tissue. However, the underlying evolutionary processes that might account for this observed diversity of X inactivation strategies remain unclear. We present a theoretical population genetic analysis of X inactivation evolution and specifically consider how conditions of dominance, linkage, recombination, and sex-differential selection each influence evolutionary trajectories of X inactivation. The results indicate that a single, critical interaction between allelic dominance and sex-differential selection can select for a broad and continuous range of X inactivation strategies, including unequal rates of inactivation between maternally and paternally inherited X chromosomes. RXI is favored over complete PXI as long as alleles deleterious to female fitness are sufficiently recessive, and the criteria for RXI evolution are considerably more restrictive when fitness variation is sexually antagonistic (i.e., alleles deleterious to females are beneficial to males) relative to variation that is deleterious to both sexes. Evolutionary transitions from PXI to RXI also generally increase mean relative female fitness at the expense of decreased male fitness. These results provide a theoretical framework for predicting and interpreting the evolution of chromosome-wide expression of X-linked genes and lead to several useful predictions that could motivate future studies of allele-specific gene expression variation.
With the exception of its most primitive members, mammal species practice X inactivation, where one copy of each X chromosome pair is silenced in each cell of the female body. The particular copy of the X that is silenced nevertheless shows considerable variability among species, and the evolutionary causes for this variability remain unclear. Here, we show that X inactivation strategies are likely to evolve in response to the sex-differential fitness properties of X-linked genetic variation. Genetic variation with similar effects on male and female fitness will generally favor the evolution of random X inactivation, potentially including preferential inactivation of the maternally inherited X chromosome. Variation with opposing fitness effects in each sex (“sexually antagonistic” variation, which includes mutations that both decrease female fitness and enhance male fitness) selects for preferential or complete inactivation of the paternally inherited X. Paternally biased X inactivation patterns appear to be common in nature, which suggests that sexually antagonistic genetic variation might be an important factor underlying the evolution of X inactivation. The theory provides a conceptual framework for understanding the evolution of X inactivation strategies and generates several novel predictions that may soon be tested with modern genome sequencing technologies.
Mono-allelic expression at the mouse IGF2/H19 locus is controlled by differential allelic DNA methylation of the imprinting control region (ICR). Because a randomly integrated H19 ICR fragment, when incorporated into the genome of transgenic mice (TgM), was allele-specifically methylated in somatic, but not in germ cells, it was suggested that allele-discriminating epigenetic signature, set within or somewhere outside of the Tg H19 ICR fragment in germ cells, was later translated into a differential DNA methylation pattern. To test if the chicken β-globin HS4 (cHS4) chromatin insulator might interfere with methylation imprinting establishment at the H19 ICR, we inserted the H19 ICR fragment, flanked by a set of floxed cHS4 core sequences, into a human β-globin locus YAC and generated TgM (insulated ICR' TgM). As controls, the cHS4 sequences were removed from one side (5'HS4-deleted ICR') or both sides (pseudo-WT ICR') of the insulated ICR' by in vivo cre-loxP recombination. The data show that while maternally inherited transgenic H19 ICR was not methylated in insulated ICR' TgM, it was significantly methylated upon paternal transmission, though the level was lower than in the pseudo-WT ICR' control. Because this reduced level of methylation was also observed in the 5'HS4-deleted ICR' TgM, we speculate that the phenotype is due to VEZF1-dependent demethylation activity, rather than the insulator function, borne in cHS4. Collectively, although we cannot rule out the possibility that cHS4 is incapable of blocking an allele-discriminating signal from outside of the transgene, the epigenetic signature appears to be marked intrinsically within the H19 ICR.
Recent reports have shown that most of the genome is transcribed and that transcription frequently occurs concurrently on both DNA strands. In diploid genomes, the expression level of each allele conditions the degree to which sequence polymorphisms affect the phenotype. It is thus essential to quantify expression in an allele- and strand-specific manner. Using a custom-designed tiling array and a new computational approach, we piloted the measurement of allele- and strand-specific expression in yeast. Confident quantitative estimates of allele-specific expression were obtained for about half of the coding and non-coding transcripts of a heterozygous yeast strain, of which 371 transcripts (13%) showed significant allelic differential expression greater than 1.5-fold. The data revealed complex allelic differential expression on opposite strands. Furthermore, combining allele-specific expression with linkage mapping made it possible to identify allelic variants that act in cis and in trans to regulate allelic expression in the heterozygous strain. Our results provide the first high-resolution analysis of differential expression on all four strands of a eukaryotic genome.
allele-specific expression; linkage mapping; microarray analysis; phosphate metabolism; strand-specific expression
Motivation: The sequencing of personal genomes enabled analysis of variation in transcription factor (TF) binding, chromatin structure and gene expression and indicated how they contribute to phenotypic variation. It is hypothesized that using the reference genome for mapping ChIP-seq or RNA-seq reads may introduce errors, especially at polymorphic genomic regions.
Results: We developed a Personal Genome Editor (perEditor) that converts the reference human genome (NCBI36/hg18) into an individual genome, taking into account single nucleotide polymorphisms (SNPs), insertions and deletions, copy number variation, and chromosomal rearrangements. perEditor outputs the two alleles (maternal and paternal) of the individual genome, ready for mapping ChIP-seq and RNA-seq reads, which enables analyses of allele-specific binding, chromatin structure, and gene expression.
Availability: perEditor is available at http://biocomp.bioen.uiuc.edu/perEditor.
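The idea of turning a reference sequence into two parental alleles can be shown with a deliberately reduced sketch. It is not perEditor's code or interface: it handles only SNPs (no indels, copy number variation, or rearrangements), and the function name and the input format (a dictionary from 0-based position to a maternal/paternal base pair) are assumptions made for illustration.

```python
def personalize(reference, snps):
    """Build maternal and paternal sequences from a reference and phased SNPs.

    reference: reference sequence as a string.
    snps: hypothetical format, {0-based position: (maternal_base, paternal_base)}.
    Only substitutions are handled, so coordinates stay identical to the reference.
    """
    maternal = list(reference)
    paternal = list(reference)
    for pos, (maternal_base, paternal_base) in snps.items():
        if not 0 <= pos < len(reference):
            raise IndexError(f"SNP position {pos} lies outside the reference")
        maternal[pos] = maternal_base
        paternal[pos] = paternal_base
    return "".join(maternal), "".join(paternal)


# Toy reference with two heterozygous sites (illustrative data only).
maternal_seq, paternal_seq = personalize("ACGTACGT", {1: ("C", "T"), 6: ("A", "G")})
```

Because substitutions do not shift coordinates, reads mapped to either output can be compared position by position; supporting insertions and deletions would additionally require a coordinate map back to the reference.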
We have investigated the genetics and molecular biology of orange flesh colour in potato (Solanum tuberosum L.). To this end the natural diversity in three genes of the carotenoid pathway was assessed by SNP analyses. Association analysis was performed between SNP haplotypes and flesh colour phenotypes in diploid and tetraploid potato genotypes. We observed that among eleven beta-carotene hydroxylase 2 (Chy2) alleles only one dominant allele has a major effect, changing white into yellow flesh colour. In contrast, none of the lycopene epsilon cyclase (Lcye) alleles seemed to have a large effect on flesh colour. Analysis of zeaxanthin epoxidase (Zep) alleles showed that all (diploid) genotypes with orange tuber flesh were homozygous for one specific Zep allele. This Zep allele showed a reduced level of expression. The complete genomic sequence of the recessive Zep allele, including the promoter, was determined, and compared with the sequence of other Zep alleles. The most striking difference was the presence of a non-LTR retrotransposon sequence in intron 1 of the recessive Zep allele, which was absent in all other Zep alleles investigated. We hypothesise that the presence of this large sequence in intron 1 caused the lower expression level, resulting in reduced Zep activity and accumulation of zeaxanthin. Only genotypes combining presence of the dominant Chy2 allele with homozygosity for the recessive Zep allele produced orange-fleshed tubers that accumulated large amounts of zeaxanthin.
Solanum tuberosum; SNP analysis; Allele mining; Orange flesh; Carotenoids; Zeaxanthin epoxidase
Today, there are at least a dozen different genetic disorders caused by mutations within the LMNA gene, and collectively, they are named laminopathies. Interestingly, the same mutation can cause phenotypes with different severities or even different disorders and might, in some cases, be asymptomatic. We hypothesized that one possible contributing mechanism for this phenotypic variability could be the existence of high and low expressing alleles in the LMNA locus. To investigate this hypothesis, we developed an allele-specific absolute quantification method for lamin A and lamin C transcripts using the polymorphic rs4641C/T LMNA coding SNP. The contribution of each allele to the total transcript level was investigated in nine informative human primary dermal fibroblast cultures from Hutchinson-Gilford progeria syndrome (HGPS) and unaffected controls. Our results show differential expression of the two alleles. The C allele is more frequently expressed and accounts for ∼70% of the lamin A and lamin C transcripts. Analysis of samples from six patients with Hutchinson-Gilford progeria syndrome showed that the c.1824C>T, p.G608G mutation is located in both the C and the T allele, which might account for the variability in phenotype seen among HGPS patients. Our method should be useful for further studies of human samples with mutations in the LMNA gene and to increase the understanding of the link between genotype and phenotype in laminopathies.
Tetraploid cells of Saccharomyces cerevisiae are generated spontaneously in a homothallic MATa/MATα diploid population at low frequency (approximately 10^-6 per cell) through the homozygosity of mating-type alleles by mitotic recombination followed by homothallic switching of the mating-type alleles. To isolate tetraploid clones more effectively, a selection method was developed that used a dye plate containing 40 mg each of eosin Y and amaranth in synthetic nutrient agar per liter. It was possible to isolate tetraploid clones on the dye plate at a frequency of 1 to 3% among the colonies colored dark red in contrast to the light red of the original diploid colonies. Isogenic series of haploid to tetraploid clones with homozygous or heterozygous genomic configurations were easily constructed with the tetraploid strains. No significant differences in specific growth rate or fermentative rate were observed corresponding to differences in ploidy, although the haploid clones showed a higher frequency of spontaneous respiratory-deficient cells than did the others. However, a significant increment in the fermentative rate in glucose nutrient medium was observed in the hybrid strains constructed with two independent homozygous cell lines. These observations strongly suggest that the polyploid strains favored by the brewing and baking industries perform well not because of the physical increment of the cellular volume by polyploidy but because of the genetic complexity or heterosis by heterozygosity of the genome in the hybrid polyploid cells.
Differences in gene expression are thought to be an important source of phenotypic diversity, so dissecting the genetic components of natural variation in gene expression is important for understanding the evolutionary mechanisms that lead to adaptation. Gene expression is a complex trait that, in diploid organisms, results from transcription of both maternal and paternal alleles. Directly measuring allelic expression rather than total gene expression offers greater insight into regulatory variation. The recent emergence of high-throughput sequencing offers an unprecedented opportunity to study allelic transcription at a genomic scale for virtually any species. By sequencing transcript pools derived from heterozygous individuals, estimates of allelic expression can be directly obtained. The statistical power of this approach is influenced by the number of transcripts sequenced and the ability to unambiguously assign individual sequence fragments to specific alleles on the basis of transcribed nucleotide polymorphisms. Here, using mathematical modelling and computer simulations, we determine the minimum sequencing depth required to accurately measure relative allelic expression and detect allelic imbalance via high-throughput sequencing under a variety of conditions. We conclude that, within a species, a minimum of 500–1000 sequencing reads per gene is needed to test for allelic imbalance, and consequently, at least five to 10 million reads are required for studying a genome expressing 10 000 genes. Finally, using 454 sequencing, we illustrate an application of allelic expression by testing for cis-regulatory divergence between closely related Drosophila species.
cis-regulation; Drosophila melanogaster; Drosophila simulans; gene expression; hybrids
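The relationship between read depth and the power to detect allelic imbalance can be explored with a short calculation. The sketch below is not the model used in the study: it assumes that every read is assigned unambiguously to one allele, that reads are independent, and that imbalance is tested with an exact two-sided binomial test against a 50:50 null. All function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def critical_count(n_reads, alpha):
    """Largest c such that rejecting for k <= c or k >= n - c keeps size <= alpha."""
    cumulative = 0.0
    critical = -1
    k = 0
    while 2 * k < n_reads:
        cumulative += binom_pmf(k, n_reads, 0.5)
        if 2.0 * cumulative > alpha:
            break
        critical = k
        k += 1
    return critical


def power_allelic_imbalance(n_reads, p_true, alpha=0.05):
    """Probability of detecting imbalance when a fraction p_true (0 < p_true < 1)
    of the reads comes from allele 1."""
    c = critical_count(n_reads, alpha)
    if c < 0:
        return 0.0
    lower = sum(binom_pmf(k, n_reads, p_true) for k in range(c + 1))
    upper = sum(binom_pmf(k, n_reads, p_true) for k in range(n_reads - c, n_reads + 1))
    return lower + upper


def reads_for_transcriptome(n_genes, reads_per_gene):
    """Total reads if every gene received the same depth (a simplification)."""
    return n_genes * reads_per_gene
```

For example, `reads_for_transcriptome(10000, 500)` reproduces the lower bound of five million reads quoted in the abstract, and `power_allelic_imbalance` can be evaluated at different depths to see how quickly a 60:40 imbalance becomes detectable. Real transcriptomes are far from uniformly covered, so the per-gene depth is the quantity that matters.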
A computational pipeline for constructing a personal diploid genome and determining sites of allele-specific activity is developed. Using a regulatory network framework, allele-specific binding and expression are found to be significantly coordinated across the genome.
Software was developed for building a personal diploid genome sequence and determining sites of allele-specific binding and expression (AlleleSeq). This computational pipeline was used to analyze variation data, and deeply sequenced RNA-Seq and ChIP-Seq datasets, for individual NA12878 from the 1000 Genomes Project. The interaction between allele-specific binding and allele-specific expression is investigated, revealing clear coordination.
To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.
allele-specific; ChIP-Seq; networks; RNA-Seq
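Identifying allele-specific events from maternal and paternal read counts can be illustrated with an exact binomial test. The sketch below is a simplified stand-in, not the AlleleSeq implementation: the function names and the input format are hypothetical, the null hypothesis is an even 50:50 split, and no multiple-testing correction is applied.

```python
import math


def binom_two_sided_p(maternal, paternal):
    """Exact two-sided binomial p-value against a 50:50 null.

    The null distribution is symmetric, so the p-value is twice the lower tail
    at the smaller of the two counts, capped at 1.
    """
    n = maternal + paternal
    if n == 0:
        return 1.0
    low = min(maternal, paternal)
    if 2 * low == n:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(low + 1))
    return min(1.0, 2 * tail / 2 ** n)


def call_allele_specific(counts, alpha=0.05):
    """Return the sites whose read counts differ significantly between alleles.

    counts: hypothetical format, {site_id: (maternal_reads, paternal_reads)}.
    """
    return [site for site, (mat, pat) in counts.items()
            if binom_two_sided_p(mat, pat) < alpha]


# Illustrative counts only; the site names are made up.
example_counts = {"site_a": (10, 0), "site_b": (5, 5), "site_c": (3, 1)}
significant = call_allele_specific(example_counts)
```

In a genome-wide analysis the threshold would need to account for the number of sites tested, and reads that map preferentially to the reference allele would bias the counts, which is the technical challenge the abstract mentions.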
Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common types of polymorphism and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present. Haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because of the continuously increasing number of sequences deposited in databases, which allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false positive SNPs and also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP.
HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy.
Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for visualization of SNPs and alleles. HaploSNPer is available from .
Multiple lines of evidence suggest that regulatory variation plays an important role in phenotypic evolution and disease development, but few regulatory polymorphisms have been characterized genetically and molecularly. Recent technological advances have made it possible to identify bona fide regulatory sequences experimentally on a genome-wide scale and opened the window for the biological interrogation of germ-line polymorphisms within these sequences. In this study, through a forward genetic analysis of bona fide p53 binding sites identified by a genome-wide chromatin immunoprecipitation and sequence analysis, we discovered a SNP (rs1860746) within the motif sequence of a p53 binding site where p53 can function as a regulator of transcription. We found that the minor allele (T) binds p53 poorly and has low transcriptional regulation activity as compared to the major allele (G). Significantly, homozygosity for the minor allele was found to be associated with an increased risk of ER negative breast cancer (OR = 1.47, P = 0.038) in the analysis of five independent breast cancer samples of European origin consisting of 6,127 breast cancer patients and 5,197 controls. rs1860746 resides in the third intron of the PRKAG2 gene that encodes the γ subunit of the AMPK protein, a major sensor of metabolic stress and a modulator of p53 action. However, this gene does not appear to be regulated by p53 in lymphoblastoid cell lines or in a cancer cell line. These results suggest that either the rs1860746 locus regulates another gene through distant interactions, or that this locus is in linkage disequilibrium with a second causal mutation. This study shows the feasibility of using genomic scale molecular data to uncover disease associated SNPs, but underscores the complexity of determining the function of regulatory variants in human populations.
p53 binding sites; PRKAG2 gene; Polymorphism; ER negative tumors; Breast cancer susceptibility
Analysis of haplotypes is an important tool in population genetics, familial heredity and gene mapping. Determination of haplotypes of multiple single nucleotide polymorphisms (SNPs) or other simple mutations is time consuming and expensive when analyzing large populations, and often requires the help of computational and statistical procedures. Based on double PCR amplification of specific alleles, described previously, we have developed a simple, rapid and low-cost method for direct haplotyping of multiple SNPs and simple mutations found within relatively short specific regions or genes (micro-haplotypes). Using this method, it is possible to directly determine the physical linkage of multiple heterozygous alleles, by conducting a series of double allele-specific PCR amplification sets with simple analysis by gel electrophoresis. Application of the method requires prior information as to the sequence of the segment to be haplotyped, including the polymorphic sites. We applied the method to haplotyping of nine sites in the chicken HSP108 gene. One of the haplotypes in the population apparently arose by recombination between two existing haplotypes, and we were able to locate the point of recombination within a segment of 19 bp. We anticipate rapidly growing needs for SNP haplotyping in human (medical and pharmacogenetics), animal and plant genetics; in this context, the multiple double PCR amplifications of specific alleles (MD-PASA) method offers a useful haplotyping tool.
We developed a digital RNA allelotyping method for quantitatively interrogating allele-specific gene expression. This method involves ultra-deep sequencing of padlock captured SNPs from the transcriptome. We characterized four cell lines established from two human subjects in the Personal Genome Project. Approximately 11–22% of the heterozygous mRNA-associated SNPs show allele-specific expression in each cell line; and 4.3–8.5% are tissue-specific, suggesting the presence of tissue-specific cis-regulation. When applied to two pairs of sibling human embryonic stem cell lines, the sibling lines were more similar in allele-specific expression than were the genetically unrelated lines. We found that the variation of allelic ratios in gene expression among different cell lines is primarily explained by genetic variations, much more so than by specific tissue types or culturing conditions. Comparison of expressed SNPs on the sense and anti-sense transcripts suggested that allelic ratios are primarily determined by cis-regulatory mechanisms on the sense transcripts.
Phenotypic diversity can arise rapidly through loss of heterozygosity (LOH) or by the acquisition of copy number variations (CNV) spanning whole chromosomes or shorter contiguous chromosome segments. In Candida albicans, a heterozygous diploid yeast pathogen with no known meiotic cycle, homozygosis and aneuploidy alter clinical characteristics, including drug resistance. Here, we developed a high-resolution microarray that simultaneously detects ∼39,000 single nucleotide polymorphism (SNP) alleles and ∼20,000 copy number variation loci across the C. albicans genome. An important feature of the array analysis is a computational pipeline that determines SNP allele ratios based upon chromosome copy number. Using the array and analysis tools, we constructed a haplotype map (hapmap) of strain SC5314 to assign SNP alleles to specific homologs, and we used it to follow the acquisition of loss of heterozygosity (LOH) and copy number changes in a series of derived laboratory strains. This high-resolution SNP/CGH microarray and the associated hapmap facilitated the phasing of alleles in lab strains and revealed detrimental genome changes that arose frequently during molecular manipulations of laboratory strains. Furthermore, it provided a useful tool for rapid, high-resolution, and cost-effective characterization of changes in allele diversity as well as changes in chromosome copy number in new C. albicans isolates.
comparative genome hybridization; single nucleotide polymorphisms; loss of heterozygosity; aneuploidy; genome profiling; haplotype mapping
Hybridization, a common process in nature, can give rise to a vast reservoir of allelic variants. Combination of these allelic variants may result in novel patterns of gene action and is thought to contribute to heterosis. In this study, we analyzed genome-wide allele-specific gene expression (ASGE) in the super-hybrid rice variety Xieyou9308 using RNA sequencing technology (RNA-Seq). We identified 9325 reliable single nucleotide polymorphisms (SNPs) distributed throughout the genome. Nearly 68% of the identified polymorphisms were C/T and G/A SNPs between R9308 and Xieqingzao B, suggesting the existence of DNA methylation, a heritable epigenetic mark, in the parents and their F1 hybrid. Of 2793 identified transcripts with consistent allelic biases, only 480 (17%) showed significant allelic biases during tillering and/or heading stages, implying that trans effects may mediate most transcriptional differences in hybrid offspring. Approximately 67% and 62% of the 480 transcripts showed R9308 allelic expression biases at tillering and heading stages, respectively. Transcripts with higher levels of gene expression in R9308 also exhibited R9308 allelic biases in the hybrid. In addition, 125 transcripts were identified with significant allelic expression biases at both stages, of which 74% showed R9308 allelic expression biases. R9308 alleles may tend to preserve their characteristic states of activity in the hybrid and may play important roles in hybrid vigor at both stages. The allelic expression of 355 transcripts was highly stage-specific, with divergent allelic expression patterns observed at different developmental stages. Many transcripts associated with stress resistance were differentially regulated in the F1 hybrid. The results of this study may provide valuable insights into molecular mechanisms of heterosis.
The p73 protein, a paralogue of the p53 tumor suppressor, is essential for normal development and survival of neurons. TP73 is therefore of interest as a candidate gene for Alzheimer's disease (AD) susceptibility. TP73 mRNA is transcribed from three promoters, termed P1–P3, and there is evidence for an additional complexity in its regulation, namely, a variable allelic expression bias in some human tissues.
We utilized RT-PCR/RFLP and direct cDNA sequencing to measure allele-specific expression of TP73 mRNA, SNP genotyping to assess genetic associations with AD, and promoter-reporter assays to assess allele-specific TP73 promoter activity.
Using a coding-neutral BanI polymorphism in TP73 exon 5 as an allelic marker, we found a pronounced allelic expression bias in one adult brain hippocampus, while 3 other brains (two adult, one fetal) showed approximately equal expression from both alleles. In a tri-ethnic elderly population of African-Americans, Caribbean Hispanics and Caucasians, a G/A single nucleotide polymorphism (SNP) at -386 in the TP73 P3 promoter was weakly but significantly associated with AD (crude O.R. for AD given any -386G allele 1.7; C.I. 1.2–2.5; after adjusting for age and education O.R. 1.5; C.I. 1.1–2.3; N = 1191). The frequency of the -386G allele varied by ethnicity and was highest in African-Americans and lowest in Caucasians. No significant differences in basal P3 promoter activity were detected comparing -386G vs. -386A promoter-luciferase constructs in human SK-NSH-N neuroblastoma cells.
There is a reproducible allelic expression bias in mRNA expression from the TP73 gene in some, though not all, adult human brains, and inter-individual variation in regulatory sequences of the TP73 locus may affect susceptibility to AD. However, additional studies will be necessary to exclude genetic admixture as an alternative explanation for the observed associations.
Analysis of allelic variation for relevant genes and monitoring of chromosome segment transmission during selection are important approaches in plant breeding and ecology. Minimizing the number of molecular markers required for this purpose is crucial because of cost and time constraints. To date, software for identifying the minimum number of required markers has been optimized for human genetics and only partly matches the needs of plant scientists and breeders. In addition, different software packages with insufficient interoperability need to be combined to extract this information from available allele sequence data, resulting in an error-prone multi-step process of data handling.
PolyMin, a computer program combining the detection of a minimum set of single nucleotide polymorphisms (SNPs) and/or insertions/deletions (INDELs) necessary for allele differentiation with the subsequent genotype differentiation in plant populations, has been developed. Its efficiency in finding minimum sets of polymorphisms is comparable to that of other available program packages.
A computer program detecting the minimum number of SNPs for haplotype discrimination and subsequent genotype differentiation has been developed, and its performance has been compared to that of other relevant software. The main advantage of PolyMin, especially for plant scientists, is the integration of procedures from sequence analysis to polymorphism selection within a single program, including both haplotype and genotype differentiation.
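The task of selecting a small set of polymorphic positions that still distinguishes every allele can be viewed as a set-cover problem. The sketch below uses a simple greedy heuristic on aligned allele sequences; it is not PolyMin's algorithm, which the abstract does not describe, and a greedy choice is not guaranteed to be the true minimum. The function name, the input format and the example sequences are hypothetical.

```python
from itertools import combinations


def greedy_marker_set(alleles: dict) -> list:
    """Pick alignment columns that together distinguish every pair of alleles.

    `alleles` maps an allele name to an aligned sequence of equal length.
    Greedy heuristic (assumption): repeatedly take the column that separates
    the most still-unresolved allele pairs.
    """
    names = sorted(alleles)
    if len(names) < 2:
        return []
    length = len(alleles[names[0]])
    if any(len(alleles[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")

    unresolved = set(combinations(names, 2))
    # For each polymorphic column, record which allele pairs it separates.
    separates = {}
    for pos in range(length):
        pairs = {(a, b) for a, b in unresolved if alleles[a][pos] != alleles[b][pos]}
        if pairs:
            separates[pos] = pairs

    chosen = []
    while unresolved:
        # Ties are broken in favour of the leftmost column.
        best = max(
            separates,
            key=lambda p: (len(separates[p] & unresolved), -p),
            default=None,
        )
        if best is None or not separates[best] & unresolved:
            raise ValueError("some alleles are identical at every position")
        chosen.append(best)
        unresolved -= separates[best]
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical aligned alleles (0-based column indices are reported).
    example = {"A1": "ACGT", "A2": "ACGA", "A3": "TCGA"}
    print(greedy_marker_set(example))
```

For the three example alleles, no single column separates all pairs, so two columns are required. Insertions/deletions could be handled in the same way by encoding gaps as an extra character in the alignment.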
An examination of gene expression in diploids may not always be sufficient for determination of the dominant or recessive character of an allele. In Saccharomyces cerevisiae, resistance to cryptopleurine has been attributed to a single recessive nuclear gene, cry1, located on chromosome III. We found, contrary to expectations, that resistance to cryptopleurine is not expressed in diploids that are monosomic for chromosome III. Examination of strains of different ploidy on gradient plates shows that the presence of the sensitive allele in a cell does not affect the level of resistance; rather, the level of resistance is directly related to the ratio of the number of resistant alleles to the number of chromosome sets.
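The dosage relationship described above can be made concrete with a short calculation. The sketch below computes the ratio of resistant alleles to chromosome sets for a few strain configurations; the function name and the listed configurations are illustrative assumptions, and the ratio is only a proxy for the resistance level, not a measured value.

```python
from fractions import Fraction


def resistant_dosage(resistant_alleles: int, chromosome_sets: int) -> Fraction:
    """Ratio of resistant alleles to chromosome sets (hypothetical helper)."""
    if chromosome_sets <= 0:
        raise ValueError("a cell needs at least one chromosome set")
    if resistant_alleles < 0:
        raise ValueError("allele count cannot be negative")
    return Fraction(resistant_alleles, chromosome_sets)


if __name__ == "__main__":
    # (resistant alleles, chromosome sets) for illustrative strains.
    strains = {
        "haploid, resistant": (1, 1),
        "diploid, homozygous resistant": (2, 2),
        "diploid, heterozygous": (1, 2),
        "diploid, monosomic for chromosome III, resistant allele only": (1, 2),
    }
    for name, (copies, sets_) in strains.items():
        print(f"{name}: {resistant_dosage(copies, sets_)}")
```

Under this reading, the monosomic diploid carries no sensitive allele yet has the same ratio as the heterozygote, which is consistent with the observation that it does not express resistance.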
Genomic hybridization platforms, including BAC-CGH and genotyping arrays, have been used to estimate chromosome copy number (CN) in tumor samples by detecting the relative strength of genomic signal. The methods rely on the assumption that the predominant chromosomal background of the samples is diploid, an assumption that is frequently incorrect for tumor samples. In addition to generally greater resolution, an advantage of genotyping arrays over CGH arrays is the ability to detect signals from individual alleles, allowing estimation of loss-of-heterozygosity (LOH) and allelic ratios to enhance the interpretation of copy number alterations. Copy number events associated with LOH potentially have the same genetic consequences as deletions.
We have utilized allelic ratios to detect patterns that are indicative of higher ploidy levels. An integrated analysis using allelic ratios, total signal and LOH indicates that many or most of the chromosomes from 24 glioblastoma tumors are in fact aneuploid. Some putative whole-chromosome losses actually represent trisomy, and many apparent sub-chromosomal losses are in fact relative losses against a triploid or tetraploid background.
These results suggest a re-interpretation of previous findings based only on total signal ratios. One interesting observation is that many single- or multiple-copy deletions occur at common putative tumor suppressor sites subsequent to chromosomal duplication; these losses do not necessarily result in LOH, but nonetheless occur in conspicuous patterns. The 500 K Mapping array was also capable of detecting many sub-megabase losses and gains that were overlooked by BAC-CGH arrays, and was superior to BAC-CGH arrays in resolving regions of complex CN variation.
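To illustrate why allelic ratios can reveal what total signal alone hides, the following minimal sketch lists the allele fractions expected at heterozygous SNPs for a given copy number and compares them with the log2 total-signal ratio measured against an assumed background ploidy. It assumes a pure tumor sample with no normal-cell contamination and noise-free signal; the function names are hypothetical and this is not the authors' analysis pipeline.

```python
from fractions import Fraction
from math import log2


def heterozygous_allele_fractions(copy_number: int) -> list:
    """Allele fractions possible when both alleles are retained at a SNP."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    # With n copies, one allele can be present in 1..n-1 of them.
    return [Fraction(b, copy_number) for b in range(1, copy_number)]


def total_signal_log2_ratio(segment_copies: int, background_ploidy: int) -> float:
    """Log2 ratio of a segment's signal relative to the sample's background."""
    if segment_copies < 1 or background_ploidy < 1:
        raise ValueError("copy numbers must be at least 1")
    return log2(segment_copies / background_ploidy)


if __name__ == "__main__":
    # Three copies in a tetraploid background: negative total signal ratio,
    # but heterozygous SNPs remain and cluster at 1/3 and 2/3.
    print(total_signal_log2_ratio(3, 4), heterozygous_allele_fractions(3))
    # One copy in a diploid background: negative ratio and no heterozygous
    # SNPs at all, i.e. loss of heterozygosity.
    print(total_signal_log2_ratio(1, 2), heterozygous_allele_fractions(1))
```

Both cases show a negative total-signal ratio, so they would look like losses if a diploid background were assumed; only the allele fractions separate a trisomic segment in a tetraploid background from a genuine single-copy state.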
Allele-specific DNA methylation, histone acetylation and histone methylation are recognized as epigenetic characteristics of imprinted genes and imprinting centers (ICs). These epigenetic modifications are also used to regulate tissue-specific gene expression. Epigenetic differences between alleles can be significant either in the function of the IC or in the cis-acting effect of the IC on ‘target’ genes responding to it. We have now examined the epigenetic characteristics of NDN, a target gene of the chromosome 15q11–q13 Prader–Willi Syndrome IC, using sodium bisulfite sequencing to analyze DNA methylation and chromatin immunoprecipitation to analyze histone modifications. We observed a bias towards maternal allele-specific DNA hypermethylation of the promoter CpG island of NDN, independent of tissue-specific transcriptional activity. We also found that NDN lies in a domain of paternal allele-specific histone hyperacetylation that correlates with transcriptional state, and a domain of differential histone H3 lysine 4 di- and tri-methylation that persists independent of transcription. These results suggest that DNA methylation and histone H3 lysine 4 methylation are persistent markers of imprinted gene regulation while histone acetylation participates in tissue-specific activity and silencing in somatic cells.
Multiple hybridization events gave rise to pentaploid dogroses, which can reproduce sexually despite their uneven ploidy level by the unique canina meiosis. Two homologous chromosome sets are involved in bivalent formation and are transmitted by the haploid pollen grains and the tetraploid egg cells. In addition, the egg cells contain three sets of univalent chromosomes, which are excluded from recombination. In this study we investigated whether differential behavior of chromosomes as bivalents or univalents is reflected by sequence divergence or transcription intensity between homeologous alleles of two single copy genes (LEAFY, cGAPDH) and one ribosomal DNA locus (nrITS).
We detected a maximum of four different alleles at all investigated loci in pentaploid dogroses and identified the respective allele with two copies, which is presumably located on bivalent-forming chromosomes. For the alleles of the ribosomal DNA locus and cGAPDH, only slight, if any, differential transcription was detected, whereas the LEAFY alleles with one copy were found to be significantly more strongly expressed than the LEAFY allele with two copies. Moreover, we found for the three marker genes that all alleles have been under similar regimes of purifying selection.
Analyses of both molecular sequence evolution and expression patterns did not support the hypothesis that unique alleles probably located on non-recombining chromosomes are less functional than duplicate alleles presumably located on recombining chromosomes.