To investigate the underlying phenotypic constructs in autism spectrum disorders (ASD) and to identify genetic loci that are linked to these empirically derived factors.
Exploratory factor analysis was applied to two datasets with 28 selected Autism Diagnostic Interview-Revised (ADI-R) algorithm items. The first dataset was from the Autism Genome Project (AGP) phase I (1,236 ASD subjects from 618 families); the second was from the AGP phase II (804 unrelated ASD subjects). Variables derived from the factor analysis were then used as quantitative traits in genome-wide variance components linkage analyses.
Six factors, joint attention, social interaction and communication, non-verbal communication, repetitive sensory-motor behaviour, peer interaction, and compulsion/restricted interests, were retained for both datasets. There was good agreement between the factor loading patterns from the two datasets. All factors showed familial aggregation. Suggestive evidence for linkage was obtained for the joint attention factor on 11q23. Genome-wide significant evidence for linkage was obtained for the repetitive sensory-motor behaviour factor on 19q13.3.
This study demonstrates that the underlying phenotypic constructs based on the ADI-R algorithm items are replicable in independent datasets; and the empirically derived factors are suitable and informative in genetic studies of ASD.
autism; ADI-R; factor analysis; linkage analysis; quantitative trait
We describe a recombinant inbred line (RIL) population of 161 F5 genotypes for the widest euploid cross that can be made to cultivated sorghum (Sorghum bicolor) using conventional techniques, S. bicolor × Sorghum propinquum, that segregates for many traits related to plant architecture, growth and development, reproduction, and life history. The genetic map of the S. bicolor × S. propinquum RILs contains 141 loci on 10 linkage groups collectively spanning 773.1 cM. Although the genetic map has DNA marker density well-suited to quantitative trait loci mapping and samples most of the genome, our previous observation that sorghum pericentromeric heterochromatin is recalcitrant to recombination is highlighted by the finding that the vast majority of recombination in sorghum is concentrated in small regions of euchromatin that are distal to most chromosomes. The advancement of the RIL population in an environment to which the S. bicolor parent was well adapted (indeed bred for) but the S. propinquum parent was not largely eliminated an allele for short-day flowering that confounded many other traits, for example, permitting us to map new quantitative trait loci for flowering that previously eluded detection. Additional recombination that has accrued in the development of this RIL population also may have improved resolution of apices of heterozygote excess, accounting for their greater abundance in the F5 than the F2 generation. The S. bicolor × S. propinquum RIL population offers advantages over early-generation populations that will shed new light on genetic, environmental, and physiological/biochemical factors that regulate plant growth and development.
quantitative trait locus; simple-sequence repeat; DNA marker; recombination; segregation distortion
Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD.
Percent mammographic breast density (PMD) is a strong heritable risk factor for breast cancer. However, the pathways through which this risk is mediated are still unclear. To explore whether PMD and breast cancer have a shared genetic basis, we identified genetic variants most strongly associated with PMD in a published meta-analysis of five genome-wide association studies (GWAS) and used these to construct risk scores for 3628 breast cancer cases and 5190 controls from the UK2 GWAS of breast cancer. The signed per-allele effect estimates of SNPs were multiplied by the respective allele counts in the individual and summed over all SNPs to derive the risk score for an individual. These scores were included as the exposure variable in a logistic regression model with breast cancer case-control status as the outcome. This analysis was repeated using ten different cut-offs for the most significant density SNPs (1-10% representing 5,222-50,899 SNPs). Permutation analysis was also performed across all 10 cut-offs. The association between risk score and breast cancer was significant for all cut-offs from 3-10% of top density SNPs, being most significant for the 6% (2-sided P = 0.002) to 10% (P = 0.001) cut-offs (overall permutation P = 0.003). Women in the top 10% of the risk score distribution had a 31% increased risk of breast cancer [OR = 1.31 (95% CI 1.08-1.59)] compared to women in the bottom 10%. Together, our results demonstrate that PMD and breast cancer have a shared genetic basis that is mediated through a large number of common variants.
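The risk score construction described above (signed per-allele effect estimate times allele count, summed over SNPs) can be sketched as follows. This is a minimal illustration only: the function name, the SNP identifiers, and the effect sizes are hypothetical and are not taken from the study.

```python
def polygenic_risk_score(effect_sizes, allele_counts):
    """Sum of (signed per-allele effect) x (allele count) over all scored SNPs.

    effect_sizes:  dict mapping SNP id -> signed per-allele effect estimate
    allele_counts: dict mapping SNP id -> count of the effect allele (0, 1 or 2)
    Both inputs are hypothetical structures chosen for this sketch.
    """
    score = 0.0
    for snp, beta in effect_sizes.items():
        if snp not in allele_counts:
            # Assumption: a missing genotype is treated as an error here;
            # a real analysis might impute it instead.
            raise KeyError(f"no genotype for {snp}")
        score += beta * allele_counts[snp]
    return score


# Made-up example: three SNPs with signed effects on density.
effects = {"snpA": 0.10, "snpB": -0.05, "snpC": 0.02}
genotype = {"snpA": 2, "snpB": 1, "snpC": 0}
score = polygenic_risk_score(effects, genotype)
```

In the analysis described, a score of this kind would then enter a logistic regression as the exposure variable; that modelling step is not shown here.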
breast cancer; mammographic density; SNPs; polygenic; Mendelian Randomisation
Background & Aims
RAC1 is a GTPase that has an evolutionarily conserved role in coordinating immune defenses, from plants to mammals. Chronic inflammatory bowel diseases (IBD) are associated with dysregulation of immune defenses. We studied the role of RAC1 in IBD using human genetic and functional studies and animal models of colitis.
We used a candidate gene approach to HapMap-Tag single nucleotide polymorphisms (SNPs) in a discovery cohort; findings were confirmed in 2 additional cohorts. RAC1 mRNA expression was examined from peripheral blood cells of patients. Colitis was induced in mice with conditional disruption of Rac1 in phagocytes by administration of dextran sulphate sodium (DSS).
We observed a genetic association of RAC1 with ulcerative colitis (UC) in a discovery cohort, 2 independent replication cohorts, and in combined analysis for the SNPs rs10951982 (Pcombined UC = 3.3 × 10⁻⁸, odds ratio [OR] = 1.43 [1.26–1.63]) and rs4720672 (Pcombined UC = 4.7 × 10⁻⁶, OR = 1.36 [1.19–1.58]). Patients with IBD who had the rs10951982 risk allele had increased expression of RAC1, compared to those without this allele. Conditional disruption of Rac1 in macrophages and neutrophils of mice protected them against DSS-induced colitis.
Studies of human tissue samples and knockout mice demonstrated a role for the GTPase RAC1 in the development of UC; increased expression of RAC1 was associated with susceptibility to colitis.
innate immunity; Crohn's disease; CD; Rac-1 knockout
Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions, uncovering the early events of sex chromosome evolution, and identifying the sex determination genes for crop improvement.
A reiterate chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapped BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb including a 900 Kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion.
The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2–3 million years ago. The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
Bacterial artificial chromosome (BAC); Carica papaya; Sex chromosomes; Sex determination; Suppression of recombination
Photoperiod-sensitive flowering is a key adaptive trait for sorghum (Sorghum bicolor) in West and Central Africa. In this study we performed an association analysis to investigate the effect of polymorphisms within the genes putatively related to variation in flowering time on photoperiod-sensitive flowering in sorghum. For this purpose a genetically characterized panel of 219 sorghum accessions from West and Central Africa was evaluated for their photoperiod response index (PRI) based on two sowing dates under field conditions.
Sorghum accessions used in our study were genotyped for single nucleotide polymorphisms (SNPs) in six genes putatively involved in the photoperiodic control of flowering time. Applying a mixed model approach and previously-determined population structure parameters to these candidate genes, we found significant associations of several SNPs with PRI for the genes CRYPTOCHROME 1 (CRY1-b1) and GIGANTEA (GI).
The negative values of Tajima's D, found for the genes of our study, suggested that purifying selection has acted on genes involved in photoperiodic control of flowering time in sorghum. The SNP markers of our study that showed significant associations with PRI can be used to create functional markers to serve as important tools for marker-assisted selection of photoperiod-sensitive cultivars in sorghum.
MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/.
We summarize the contributions of Group 9 of Genetic Analysis Workshop 17. This group addressed the problems of linkage disequilibrium and other longer range forms of allelic association when evaluating the effects of genotypes on phenotypes. Issues raised by long-range associations, whether a result of selection, stratification, possible technical errors, or chance, were less expected but proved to be important. Most contributors focused on regression methods of various types to illustrate problematic issues or to develop adaptations for dealing with high-density genotype assays. Study design was also considered, as was graphical modeling. Although no method emerged as uniformly successful, most succeeded in reducing false-positive results either by considering clusters of loci within genes or by applying smoothing metrics that required results from adjacent loci to be similar. Two unexpected results that questioned our assumptions of what is required to model linkage disequilibrium were observed. The first was that correlations between loci separated by large genetic distances can greatly inflate single-locus test statistics, and, whether the result of selection, stratification, possible technical errors, or chance, these correlations seem overabundant. The second unexpected result was that applying principal components analysis to genome-wide genotype data can apparently control not only for population structure but also for linkage disequilibrium.
score tests; two-stage study designs; robust regression; higher criticism; principal components analysis; graphical modeling
Fossil records indicate that life appeared in marine environments ∼3.5 billion years ago (Gyr) and transitioned to terrestrial ecosystems nearly 2.5 Gyr. Sequence analysis suggests that “hydrobacteria” and “terrabacteria” might have diverged as early as 3 Gyr. Bacteria of the genus Azospirillum are associated with roots of terrestrial plants; however, virtually all their close relatives are aquatic. We obtained genome sequences of two Azospirillum species and analyzed their gene origins. While most Azospirillum house-keeping genes have orthologs in its close aquatic relatives, this lineage has obtained nearly half of its genome from terrestrial organisms. The majority of genes encoding functions critical for association with plants are among horizontally transferred genes. Our results show that transition of some aquatic bacteria to terrestrial habitats occurred much later than the suggested initial divergence of hydro- and terrabacterial clades. The birth of the genus Azospirillum approximately coincided with the emergence of vascular plants on land.
Genome sequencing and analysis of plant-associated beneficial soil bacteria Azospirillum spp. reveals that these organisms transitioned from aquatic to terrestrial environments significantly later than the suggested major Precambrian divergence of aquatic and terrestrial bacteria. Separation of Azospirillum from their close aquatic relatives coincided with the emergence of vascular plants on land. Nearly half of the Azospirillum genome has been acquired horizontally, from distantly related terrestrial bacteria. The majority of horizontally acquired genes encode functions that are critical for adaptation to the rhizosphere and interaction with host plants.
Mammographic breast density is a highly heritable (h2 > 0.6) and strong risk factor for breast cancer. We conducted a genome-wide linkage study to identify loci influencing mammographic breast density (MD).
Epidemiological data were assembled on 1,415 families from the Australia, Northern California and Ontario sites of the Breast Cancer Family Registry, and additional families recruited in Australia and Ontario. Families consisted of sister pairs with age-matched mammograms and data on factors known to influence MD. Single nucleotide polymorphism (SNP) genotyping was performed on 3,952 individuals using the Illumina Infinium 6K linkage panel.
Using a variance components method, genome-wide linkage analysis was performed using quantitative traits obtained by adjusting MD measurements for known covariates. Our primary trait was formed by fitting a linear model to the square root of the percentage of the breast area that was dense (PMD), adjusting for age at mammogram, number of live births, menopausal status, weight, height, weight squared, and menopausal hormone therapy. The maximum logarithm of odds (LOD) score from the genome-wide scan was on chromosome 7p14.1-p13 (LOD = 2.69; 63.5 cM) for covariate-adjusted PMD, with a 1-LOD interval spanning 8.6 cM. A similar signal was seen for the covariate adjusted area of the breast that was dense (DA) phenotype. Simulations showed that the complete sample had adequate power to detect LOD scores of 3 or 3.5 for a locus accounting for 20% of phenotypic variance. A modest peak initially seen on chromosome 7q32.3-q34 increased in strength when only the 513 families with at least two sisters below 50 years of age were included in the analysis (LOD 3.2; 140.7 cM, 1-LOD interval spanning 9.6 cM). In a subgroup analysis, we also found a LOD score of 3.3 for DA phenotype on chromosome 12.11.22-q13.11 (60.8 cM, 1-LOD interval spanning 9.3 cM), overlapping a region identified in a previous study.
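The "1-LOD interval" reported with each peak is conventionally the region around the maximum in which the LOD score stays within one unit of the peak value. A minimal sketch of that calculation on a grid of map positions is given below; the function name and the LOD profile are hypothetical, and the boundaries are taken at grid points without interpolation, which is a simplifying assumption.

```python
def one_lod_interval(positions_cm, lod_scores, drop=1.0):
    """Return (left, peak, right) map positions of the 1-LOD support interval.

    positions_cm: map positions in cM, in increasing order
    lod_scores:   LOD score at each position
    drop:         LOD units below the peak that bound the interval (1.0 here)
    """
    if not lod_scores or len(positions_cm) != len(lod_scores):
        raise ValueError("positions and LOD scores must be non-empty and equal in length")
    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak] - drop
    # Extend outwards from the peak while the profile stays above the threshold.
    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]


# Made-up LOD profile on a 5 cM grid (not data from the study).
positions = [50, 55, 60, 65, 70, 75]
lods = [0.5, 1.8, 2.69, 2.0, 1.2, 0.3]
interval = one_lod_interval(positions, lods)
```

With a denser grid, or with linear interpolation between grid points, the interval boundaries would be estimated more finely than in this sketch.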
The suggestive peaks and the larger linkage signal seen in the subset of pedigrees with younger participants highlight regions of interest for further study to identify genes that determine MD, with the goal of understanding mammographic density and its involvement in susceptibility to breast cancer.
Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored.
In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution.
Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
We describe the construction of a BAC contig and identification of a minimal tiling path that encompass the dominant and monogenically inherited downy mildew resistance locus Pp523 of Brassica oleracea L. The selection of BAC clones for construction of the physical map was carried out by screening gridded BAC libraries with DNA overgo probes derived from both genetically mapped DNA markers flanking the locus of interest and BAC-end sequences that align to Arabidopsis thaliana sequences within the previously identified syntenic region. The selected BAC clones consistently mapped to three different genomic regions of B. oleracea. Although 83 BAC clones were accurately mapped within a ∼4.6 cM region surrounding the downy mildew resistance locus Pp523, a subset of 33 BAC clones mapped to another region on chromosome C8 that was ∼60 cM away from the resistance gene, and a subset of 63 BAC clones mapped to chromosome C5. These results reflect the triplication of the Brassica genomes since their divergence from a common ancestor shared with A. thaliana, and they are consonant with recent analyses of the C genome of Brassica napus. The assembly of a minimal tiling path constituted by 13 (BoT01) BAC clones that span the Pp523 locus sets the stage for map-based cloning of this resistance gene.
genetic resistance; plant disease resistance; map-based cloning; BAC contig; genome triplication
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
Pathway-based analysis has been recently used in joint tests of association between disease and a group of common genetic variants. Here we explore this idea for the joint effects analysis of rare genetic variants and their association with quantitative traits and disease. We accumulate multiple rare minor alleles in a genetic risk score for each individual in a given pathway; this score is then used to assess association with quantitative phenotypes and disease. We demonstrate that this approach may be better than studying single rare variants or a gene risk score for identifying individuals with significantly greater risk.
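One way to read the pathway-level score described above is as a simple count of rare minor alleles that an individual carries across all variants in a pathway. The sketch below assumes an unweighted count and a minor allele frequency threshold of 0.01 to define "rare"; both choices, along with all names and data, are assumptions made for illustration and are not specified in the text.

```python
def pathway_rare_allele_score(genotypes, maf, pathway_variants, maf_threshold=0.01):
    """Count rare minor alleles carried by one individual within a pathway.

    genotypes:        dict variant id -> minor allele count (0, 1 or 2)
    maf:              dict variant id -> minor allele frequency
    pathway_variants: iterable of variant ids assigned to the pathway
    maf_threshold:    assumed cut-off below which a variant counts as rare
    """
    score = 0
    for variant in pathway_variants:
        if maf[variant] < maf_threshold:
            # Variants without a genotype call contribute nothing (assumption).
            score += genotypes.get(variant, 0)
    return score


# Made-up example: v3 is common and therefore excluded from the score.
maf = {"v1": 0.005, "v2": 0.002, "v3": 0.2}
person = {"v1": 1, "v2": 0, "v3": 2}
score = pathway_rare_allele_score(person, maf, ["v1", "v2", "v3"])
```

The resulting per-individual score could then be tested for association with a quantitative phenotype or disease status, for example by regression.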
Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides means to explore their genomic complexity.
A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity.
A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes.
All the physical mapping data is freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123).
Comparative genomics; polyploidy; Arabidopsis thaliana
Alternative splicing (AS) of pre-mRNA is a fundamental molecular process that generates diversity in the transcriptome and proteome of eukaryotic organisms. SR proteins, a family of splicing regulators with one or two RNA recognition motifs (RRMs) at the N-terminus and an arg/ser-rich domain at the C-terminus, function in both constitutive and alternative splicing. We identified SR proteins in 27 eukaryotic species, which include plants, animals, fungi and “basal” eukaryotes that lie outside of these lineages. Using RNA recognition motifs (RRMs) as a phylogenetic marker, we classified 272 SR genes into robust sub-families. The SR gene family can be split into five major groupings, which can be further separated into 11 distinct sub-families. Most flowering plants have double or nearly double the number of SR genes found in vertebrates. The majority of plant SR genes are under purifying selection. Moreover, in all paralogous SR genes in Arabidopsis, rice, soybean and maize, one of the two paralogs is preferentially expressed throughout plant development. We also assessed the extent of AS in SR genes based on a splice graph approach (http://combi.cs.colostate.edu/as/gmap_SRgenes). AS of SR genes is a widespread phenomenon throughout multiple lineages, with alternative 3′ or 5′ splicing events being the most prominent type of event. However, plant-enriched sub-families have 57%–88% of their SR genes experiencing some type of AS compared to the 40%–54% seen in other sub-families. The SR gene family is pervasive throughout multiple eukaryotic lineages, conserved in sequence and domain organization, but differs in gene number across lineages with an abundance of SR genes in flowering plants. The higher number of alternatively spliced SR genes in plants emphasizes the importance of AS in generating splice variants in these organisms.
High percent mammographic density adjusted for age and body mass index (BMI) is one of the strongest risk factors for breast cancer. We conducted a meta-analysis of five genome-wide association studies of percent mammographic density and report an association with rs10995190 in ZNF365 (combined P = 9.6 × 10⁻¹⁰). This finding might partly explain the underlying biology of the recently discovered association between common variants in ZNF365 and breast cancer risk.
Recombination in the family Coronaviridae has been well documented and is thought to be a contributing factor in the emergence and evolution of different coronaviral genotypes as well as different species of coronavirus. However, there are limited data available on the frequency and extent of recombination in coronaviruses in nature and particularly for the avian gamma-coronaviruses where only recently the emergence of a turkey coronavirus has been attributed solely to recombination. In this study, the full-length genomes of eight avian gamma-coronavirus infectious bronchitis virus (IBV) isolates were sequenced and, along with other full-length IBV genomes available from GenBank, were analyzed for recombination. Evidence of recombination was found in every sequence analyzed and was distributed throughout the entire genome. Areas that have the highest occurrence of recombination are located in regions of the genome that code for nonstructural proteins 2, 3 and 16, and the structural spike glycoprotein. The extent of the recombination observed suggests that this may be one of the principal mechanisms for generating genetic and antigenic diversity within IBV. These data indicate that reticulate evolutionary change due to recombination in IBV likely plays a major role in the origin and adaptation of the virus, leading to new genetic types and strains of the virus.
gamma coronavirus; avian coronavirus; infectious bronchitis virus; genome; recombination
The complex inheritance of resistance to Cercospora leaf spot (CLS), the most severe fungal foliar disease in sugar beet, was investigated by means of quantitative trait loci (QTL) analysis. Over a three-year period, recombinant inbred lines (RILs) of sugar beet (Beta vulgaris L.), generated through a cross between lines resistant (‘NK-310mm-O’) and susceptible (‘NK-184mm-O’) to CLS, were field-tested for their resistance to the pathogen. Composite interval mapping (CIM) showed that four QTL involved in CLS resistance were consistently detected. Two resistant QTL (qcr1 on chromosome III, qcr4 on chromosome IX) bearing ‘NK-310mm-O’ derived alleles promoted resistance. Across 11 investigations, the qcr1 and qcr4 QTL explained approximately 10% and over 20%, respectively, of the variance in the resistance index. Two further QTL (qcr2 on chromosome IV, qcr3 on chromosome VI) bearing ‘NK-184mm-O’ derived alleles each explained about 10% of the variance. To identify the monogenic effect of the resistance, two QTL derived from ‘NK-310mm-O’ were isolated against the genetic background of ‘NK-184mm-O’ using molecular markers. The qcr1 and qcr4 were precisely mapped as single QTL, using progenies BC5F1 and BC2F1, respectively. Plants heterozygous for qcr1, which was located near e11m36-8, had CLS disease severity indices (DSI) about 15% lower than plants homozygous for the ‘NK-184mm-O’ genotype. As with qcr1, heterozygosis of qcr4, which was located near e17m47-81, reduced DSI by about 45% compared to homozygosis. These two resistant QTL might be of particular value in marker-assisted selection (MAS) programs in CLS resistance progression.
Cercospora leaf spot; disease resistance; mapping; QTL; sugar beet
Elevated serum soluble E-selectin levels have been associated with a number of diseases. Although E-selectin levels are heritable, little is known about the specific genetic factors involved. E-selectin levels have been associated with the ABO blood group phenotype.
Methods and Results
We performed a high-resolution genome-wide association study of serum soluble E-selectin levels in 685 white individuals with type 1 diabetes from the Diabetes Control and Complications Trial (DCCT)/Epidemiology of Diabetes Intervention and Complications (EDIC) study to identify major loci influencing levels. Highly significant evidence for association (P = 10⁻²⁹) was observed for rs579459 near the ABO blood group gene, accounting for 19% of the variance in E-selectin levels. Levels of E-selectin were higher in O/O than O/A heterozygotes, which were likewise higher than A/A genotypes. Analysis of subgroups of A alleles reveals heterogeneity in the association, and even after this was accounted for, an intron 1 SNP remained significantly associated. We replicate the ABO association in nondiabetic individuals.
ABO is a major locus for serum soluble E-selectin levels. We excluded population stratification, fine-mapped the association to sub-A alleles, and also document association with additional variation in the ABO region.
E-selectin; ABO blood group; genome-wide association; SNP
In a recently published study,1 both genome-level and local-level comparison between cotton and grape genomes provided new evidence that showed cotton to be an ancient polyploid. In particular, the gene loss pattern in the local-level sequence comparison across four different species (also including Arabidopsis and papaya) showed that roughly half the ancestral genes have been lost in the cotton lineage, resembling the effect of diploidization after a genome duplication event. We also observed that the lost homologues of duplicated gene pairs were mostly physically removed, rather than merely pseudogenized. In this letter to the editor, we continue to explore the possible implications of our observations with new data and further analysis.
genome duplication; gene loss; gene density; genome size; Gossypium; Vitis; Glycine; Populus
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total score is maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks, because it appropriately fills a quota of syntenic relationships within one genome or between two genomes having WGD events.
We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genomes A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons).
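The quota idea can be illustrated with a minimal Python sketch. This is not the QUOTA-ALIGN implementation: it enumerates every 0/1 assignment by brute force instead of calling a linear programming solver, so it is only practical for a handful of blocks. The function names (`max_depth`, `screen_blocks`), the block coordinates, and the scores are all hypothetical, and the overlap rule (closed intervals, depth counted per genome axis) is an assumption made for the example.

```python
from itertools import product


def max_depth(intervals):
    """Largest number of closed intervals covering any single point."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = open; sorts before a close at the same coordinate
        events.append((end, 1))    # 1 = close
    events.sort()
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def screen_blocks(blocks, quota_a, quota_b):
    """Pick the highest-scoring subset of blocks that respects both quotas.

    blocks  : list of (score, (a_start, a_end), (b_start, b_end))
    quota_a : maximum number of kept blocks covering any point of genome A
    quota_b : maximum number of kept blocks covering any point of genome B
    Each block has one binary decision variable (keep = 1, drop = 0);
    all assignments are enumerated exhaustively (assumption: few blocks).
    """
    best_score, best_choice = 0, tuple(0 for _ in blocks)
    for choice in product((0, 1), repeat=len(blocks)):
        kept = [b for b, x in zip(blocks, choice) if x]
        if max_depth([b[1] for b in kept]) > quota_a:
            continue
        if max_depth([b[2] for b in kept]) > quota_b:
            continue
        score = sum(b[0] for b in kept)
        if score > best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice


# Hypothetical blocks: three hits stacked on one region of genome A,
# plus one block that overlaps the first block on genome B.
blocks = [
    (100, (0, 100), (0, 100)),
    (90, (0, 100), (200, 300)),
    (30, (20, 80), (400, 460)),
    (40, (150, 250), (50, 120)),
]

# 1:2 relationship: a region of A may match two regions of B, not vice versa.
print(screen_blocks(blocks, quota_a=2, quota_b=1))  # (190, (1, 1, 0, 0))
```

With `quota_a=2` and `quota_b=1`, the two strongest blocks on the shared region of genome A are kept and the weak third hit is dropped, which mirrors the 1:2 case described above. Setting both quotas to 1 instead forces a one-to-one selection.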
The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user-specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST to the screened synteny blocks with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering easier access to thousands of genomes for non-programmers.
The papers in Genetic Analysis Workshop 16 Group 7 covered a wide range of topics. One group examined the effects of confounder misclassification and selection bias on association results. Another focused on bias introduced by various methods of accounting for treatment effects. Two groups used related methods to derive phenotypic traits but different analytic strategies for genetic association, and their results did not overlap; because they used different sets of single-nucleotide polymorphisms and significance criteria, this is not surprising. Another group relied on the well-characterized definition of type 2 diabetes to show the benefits of a novel predictive test. Transmission-ratio distortion was the focus of another paper, whose results were extended to show a potential secondary benefit of the test: identifying potentially mis-called single-nucleotide polymorphisms.
Genetic Analysis Workshop 16; association analysis; confounder misclassification; selection bias; optimal robust ROC; structural equation modelling; treatment adjustment; empirically derived phenotypes; transmission disequilibrium; transmission distortion; candidate genes; genome-wide association