The occurrence of polyploidy in land plant evolution has led to an acceleration of genome modifications relative to other crown eukaryotes and is correlated with key innovations in plant evolution. Extensive genome resources make it possible to relate genomic changes to the origins of novel morphological and physiological features of plants. Ancestral gene contents for key nodes of the plant family tree are inferred. Pervasive polyploidy in angiosperms appears likely to be the major factor generating novel angiosperm genes and expanding some gene families. However, most gene families lose most duplicated copies in a quasi-neutral process, and a few families are actively selected for single-copy status. One of the great challenges of evolutionary genomics is to link genome modifications to speciation, diversification and the morphological and/or physiological innovations that collectively compose biodiversity. Rapid accumulation of genomic data and its ongoing investigation may greatly improve the resolution at which evolutionary approaches can contribute to the identification of specific genes responsible for particular innovations. The resulting, more ‘particulate’ understanding of plant evolution may elevate to a new level fundamental knowledge of botanical diversity, including economically important traits in the crop plants that sustain humanity.
genome modification; ancestral gene content; polyploidy; gene family gain and loss
Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts, which are characterized by excess zeros that investigators often ignore. In this paper, we compare the performance of different competing methods to model data with zero inflated features through extensive simulations and application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on structural zeros and the count components. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and bias and effectiveness of parameter estimations. We also evaluate the abilities of model selection strategies using Akaike information criterion (AIC) or Vuong test to identify the correct model. The simulation studies show that hurdle and zero inflated models have well controlled type I errors, higher power, better goodness of fit measures, and are more accurate and efficient in the parameter estimation. In addition, the hurdle models have similar goodness of fit and parameter estimation for the count component as their corresponding zero inflated models. However, the estimation and interpretation of the parameters for the zero components differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero inflated data and implement it in a gut microbiome study of > 400 independent subjects.
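The difference between the two model families compared above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: it assumes a Poisson count component (the study also considers dispersion), and the function names and the parameters `pi` (zero probability) and `lam` (Poisson mean) are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero occurs with probability pi;
    otherwise the count is an ordinary Poisson draw, which may also be zero."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, pi, lam):
    """Hurdle Poisson: every zero comes from one binary process (probability pi);
    positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

def log_likelihood(counts, pmf, pi, lam):
    """Log-likelihood of observed counts under a given pmf (assumes 0 < pi < 1)."""
    return sum(math.log(pmf(k, pi, lam)) for k in counts)

def aic(loglik, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik
```

With the same `pi` and `lam`, the two models assign different probabilities to a zero count (`zip_pmf(0, pi, lam)` exceeds `hurdle_pmf(0, pi, lam)`), which is one way to see why the zero-component parameters are not interchangeable between the two families even when the count components fit similarly.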
In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which type of predictive accuracy measures and data visualization techniques are most useful for assessing these models.
We assessed the performance of the models for predicting risk using various methodologies, some of which include: receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not.
We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
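As a hedged illustration of the AUC discussed above, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen risk variant receives a higher score than a randomly chosen non-risk variant. The function name `auc` and the 0/1 label coding are assumptions made for this example; they are not taken from the models under review.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores, one per variant
    labels: 1 for variants classified as risk variants, 0 otherwise
    Ties between a risk and a non-risk variant count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because the statistic depends only on how the two classes are ranked relative to each other, two models can share the same AUC while producing very differently shaped score distributions, which is where distribution plots become informative.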
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1616-z) contains supplementary material, which is available to authorized users.
Predictive accuracy; Genetic prediction; Receiver operating characteristic curve
Rare copy number variants (CNVs) disrupting ASTN2 or both ASTN2 and TRIM32 have been reported at 9q33.1 by genome-wide studies in a few individuals with neurodevelopmental disorders (NDDs). The vertebrate-specific astrotactins, ASTN2 and its paralog ASTN1, have key roles in glial-guided neuronal migration during brain development. To determine the prevalence of astrotactin mutations and delineate their associated phenotypic spectrum, we screened ASTN2/TRIM32 and ASTN1 (1q25.2) for exonic CNVs in clinical microarray data from 89 985 individuals across 10 sites, including 64 114 NDD subjects. In this clinical dataset, we identified 46 deletions and 12 duplications affecting ASTN2. Deletions of ASTN1 were much rarer. Deletions near the 3′ terminus of ASTN2, which would disrupt all transcript isoforms (a subset of these deletions also included TRIM32), were significantly enriched in the NDD subjects (P = 0.002) compared with 44 085 population-based controls. Frequent phenotypes observed in individuals with such deletions include autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), speech delay, anxiety and obsessive compulsive disorder (OCD). The 3′-terminal ASTN2 deletions were significantly enriched compared with controls in males with NDDs, but not in females. Upon quantifying ASTN2 human brain RNA, we observed shorter isoforms expressed from an alternative transcription start site of recent evolutionary origin near the 3′ end. Spatiotemporal expression profiling in the human brain revealed consistently high ASTN1 expression while ASTN2 expression peaked in the early embryonic neocortex and postnatal cerebellar cortex. Our findings shed new light on the role of the astrotactins in psychopathology and their interplay in human neurodevelopment.
Noroviruses are recognized worldwide as the principal cause of acute, non-bacterial gastroenteritis, resulting in 19–21 million cases of disease every year in the United States. Noroviruses have a very low infectious dose, a short incubation period, high resistance to traditional disinfection techniques and multiple modes of transmission, making early, point-of-care detection essential for controlling the spread of the disease. The traditional diagnostic tools (electron microscopy, RT-PCR and ELISA) require sophisticated and expensive instrumentation, and are considered too laborious and slow to be useful during severe outbreaks. In this paper we describe the development of a new, rapid and sensitive lateral-flow assay using labeled phage particles for the detection of the prototypical norovirus GI.1 (Norwalk), with a limit of detection of 10^7 virus-like particles per mL, one hundred-fold lower than a conventional gold nanoparticle lateral-flow assay using the same antibody pair.
We assessed whether epigenetic histone posttranslational modifications are associated with the prolonged beneficial effects (metabolic memory) of intensive versus conventional therapy during the Diabetes Control and Complications Trial (DCCT) on the progression of microvascular outcomes in the long-term Epidemiology of Diabetes Interventions and Complications (EDIC) study. We performed chromatin immunoprecipitation linked to promoter tiling arrays to profile H3 lysine-9 acetylation (H3K9Ac), H3 lysine-4 trimethylation (H3K4Me3), and H3K9Me2 in blood monocytes and lymphocytes obtained from 30 DCCT conventional treatment group subjects (case subjects: mean DCCT HbA1c level >9.1% [76 mmol/mol] and progression of retinopathy or nephropathy by EDIC year 10 of follow-up) versus 30 DCCT intensive treatment subjects (control subjects: mean DCCT HbA1c level <7.3% [56 mmol/mol] and without progression of retinopathy or nephropathy). Monocytes from case subjects had statistically greater numbers of promoter regions with enrichment in H3K9Ac (active chromatin mark) compared with control subjects (P = 0.0096). Among the patients in the two groups combined, monocyte H3K9Ac was significantly associated with the mean HbA1c level during the DCCT and EDIC (each P < 2.2E-16). Of note, the top 38 case hyperacetylated promoters (P < 0.05) included >15 genes related to the nuclear factor-κB inflammatory pathway and were enriched in genes related to diabetes complications. These results suggest an association between HbA1c level and H3K9Ac, and a possible epigenetic explanation for metabolic memory in humans.
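The paired HbA1c units quoted above (percent and mmol/mol) are related by the standard IFCC-NGSP master equation. The short sketch below applies that equation to the two thresholds; the function name is hypothetical, and the equation is stated here as general background rather than as something the abstract itself reports.

```python
def hba1c_percent_to_mmol_per_mol(percent):
    """Convert NGSP/DCCT-aligned HbA1c (%) to IFCC units (mmol/mol)
    using the master equation IFCC = 10.93 * NGSP - 23.50."""
    return 10.93 * percent - 23.50

# Thresholds quoted for case and control subjects, rounded to whole mmol/mol.
case_threshold = round(hba1c_percent_to_mmol_per_mol(9.1))
control_threshold = round(hba1c_percent_to_mmol_per_mol(7.3))
```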
Mammographic density reflects the amount of stromal and epithelial tissues in relation to adipose tissue in the breast and is a strong risk factor for breast cancer. Here we report the results from a meta-analysis of genome-wide association studies (GWAS) of three mammographic density phenotypes: dense area, non-dense area and percent density in up to 7,916 women in stage 1 and an additional 10,379 women in stage 2. We identify genome-wide significant (P < 5 × 10^−8) loci for dense area (AREG, ESR1, ZNF365, LSP1/TNNT3, IGF1, TMEM184B, SGSM3/MKL1), non-dense area (8p11.23) and percent density (PRDM6, 8p11.23, TMEM184B). Four of these regions are known breast cancer susceptibility loci, and four additional regions were found to be associated with breast cancer (P < 0.05) in a large meta-analysis. These results provide further evidence of a shared genetic basis between mammographic density and breast cancer and illustrate the power of studying intermediate quantitative phenotypes to identify putative disease susceptibility loci.
Domestication has played an important role in shaping characteristics of the inflorescence and plant height in cultivated cereals. Taking advantage of meta-analysis of QTLs, phylogenetic analyses in 502 diverse sorghum accessions, GWAS in a sorghum association panel (n = 354) and comparative data, we provide insight into the genetic basis of the domestication traits in sorghum and rice.
We performed genome-wide association studies (GWAS) on 6 traits related to inflorescence morphology and 6 traits related to plant height in sorghum, comparing the genomic regions implicated in these traits by GWAS and QTL mapping, respectively. In a search for signatures of selection, we identify genomic regions that may contribute to sorghum domestication regarding plant height, flowering time and pericarp color. Comparative studies across taxa show functionally conserved ‘hotspots’ in sorghum and rice for awn presence and pericarp color that do not appear to reflect corresponding single genes but may indicate co-regulated clusters of genes. We also reveal homoeologous regions retaining similar functions for plant height and flowering time since genome duplication an estimated 70 million years ago or more in a common ancestor of cereals. In most such homoeologous QTL pairs, only one QTL interval exhibits strong selection signals in modern sorghum.
Intersections among QTL, GWAS and comparative data advance knowledge of genetic determinants of inflorescence and plant height components in sorghum, and add new dimensions to comparisons between sorghum and rice.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0477-6) contains supplementary material, which is available to authorized users.
Sorghum; GWAS; Biparental QTL mapping; Inflorescence; Flowering time; Plant height; Domestication; Genetic correspondence
Seed size is closely related to fitness of wild plants, and its modification has been a key recurring element in domestication of seed/grain crops. In sorghum, a genomic and morphological model for panicoid cereals, a rich history of research into the genetics of seed size is reflected by a total of 13 likelihood intervals determined by conventional QTL (linkage) mapping in 11 nonoverlapping regions of the genome. To complement QTL data and investigate whether the discovery of seed size QTL is approaching “saturation,” we compared QTL data to GWAS for seed mass, seed length, and seed width studied in 354 accessions from a sorghum association panel (SAP) that have been genotyped at 265,487 SNPs. We identified nine independent GWAS-based “hotspots” for seed size associations. Targeted resequencing near four association peaks with the most notable linkage disequilibrium provides further support for the role(s) of these regions in the genetic control of sorghum seed size and identifies two candidate causal variants with nonsynonymous mutations. Of nine GWAS hotspots in sorghum, seven have significant correspondence with rice QTL intervals and known genes for components of seed size on orthologous chromosomes. Identifying intersections between positional and association genetic data is a potentially powerful means to mitigate constraints associated with each approach, and nonrandom correspondence of sorghum (panicoid) GWAS signals to rice (oryzoid) QTL adds a new dimension to the ability to leverage genetic data about this important trait across divergent plants.
quantitative trait locus; genome-wide association studies
A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in a massively parallel fashion using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10,000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL.
Nanohole array; transmission optical microscope; immunoassay; biosensing; enzymatic silver staining; colorimetric detection
Global environmental change has influenced lake surface temperatures, a key driver of ecosystem structure and function. Recent studies have suggested significant warming of water temperatures in individual lakes across many different regions around the world. However, the spatial and temporal coherence associated with the magnitude of these trends remains unclear. Thus, a global data set of water temperature is required to understand and synthesize global, long-term trends in surface water temperatures of inland bodies of water. We assembled a database of summer lake surface temperatures for 291 lakes collected in situ and/or by satellites for the period 1985–2009. In addition, corresponding climatic drivers (air temperatures, solar radiation, and cloud cover) and geomorphometric characteristics (latitude, longitude, elevation, lake surface area, maximum depth, mean depth, and volume) that influence lake surface temperatures were compiled for each lake. This unique dataset offers an invaluable baseline perspective on global-scale lake thermal conditions as environmental change continues.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ∼8–10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
Climate change affects agricultural productivity worldwide. Increased prices of food commodities are the initial indication of drastic edible yield loss, which is expected to increase further due to global warming. This situation has compelled plant scientists to develop climate change-resilient crops, which can withstand broad-spectrum stresses such as drought, heat, cold, salinity, flood, submergence and pests, thus helping to deliver increased productivity. Genomics appears to be a promising tool for deciphering the stress responsiveness of crop species with adaptation traits or in wild relatives toward identifying underlying genes, alleles or quantitative trait loci. Molecular breeding approaches have proven helpful in enhancing the stress adaptation of crop plants, and recent advances in high-throughput sequencing and phenotyping platforms have transformed molecular breeding to genomics-assisted breeding (GAB). In view of this, the present review elaborates the progress and prospects of GAB for improving climate change resilience in crops, which is likely to play an ever increasing role in the effort to ensure global food security.
climate change; crop improvement; stress tolerance; breeding; genomics
Refractive error (RE) is a complex, multifactorial disorder characterized by a mismatch between the optical power of the eye and its axial length that causes object images to be focused off the retina. The two major subtypes of RE are myopia (nearsightedness) and hyperopia (farsightedness), which represent opposite ends of the distribution of the quantitative measure of spherical refraction. We performed a fixed effects meta-analysis of genome-wide association results of myopia and hyperopia from 9 studies of European-derived populations: AREDS, KORA, FES, OGP-Talana, MESA, RSI, RSII, RSIII and ERF. One genome-wide significant region was observed for myopia, corresponding to a previously identified myopia locus on 8q12 (p = 1.25 × 10^−8), which has been reported by Kiefer et al. as significantly associated with myopia age at onset and by Verhoeven et al. as significantly associated with mean spherical-equivalent (MSE) refractive error. We observed two genome-wide significant associations with hyperopia. These regions overlapped with loci on 15q14 (minimum p value = 9.11 × 10^−11) and 8q12 (minimum p value = 1.82 × 10^−11) previously reported for MSE and myopia age at onset. We also used an intermarker linkage-disequilibrium-based method for calculating the effective number of tests in targeted regional replication analyses. We analyzed myopia (which represents the closest phenotype in our data to the one used by Kiefer et al.) and showed replication of 10 additional loci associated with myopia previously reported by Kiefer et al. This is the first replication of these loci using myopia as the trait under analysis. “Replication-level” association was also seen between hyperopia and 12 of Kiefer et al.'s published loci. For the loci that show evidence of association with both myopia and hyperopia, the estimated effects of the risk alleles were in opposite directions for the two traits. This suggests that these loci are important contributors to variation of refractive error across the distribution.
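The fixed effects meta-analysis mentioned above can be illustrated with the common inverse-variance weighting scheme. The abstract does not state which weighting was used, so treating it as inverse-variance is an assumption of this sketch; the function name and the inputs (per-study effect estimates `betas` and standard errors `ses` for a single variant) are likewise hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed effects meta-analysis for one variant.

    betas: per-study effect estimates (for example, log odds ratios)
    ses:   per-study standard errors, in the same order
    Returns the pooled effect, its standard error, the z score and a
    two-sided p value from the normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]      # precise studies weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal tail area
    return beta, se, z, p
```

Under this scheme a locus whose risk allele has pooled effects of opposite sign for two traits would return `beta` values of opposite sign when the function is run once per trait.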
Peanut (Arachis hypogaea L.) causes one of the most serious food allergies. Peanut seed proteins, Arah1, Arah2, and Arah3, are considered to be among the most important peanut allergens. To gain insights into genome organization and evolution of allergen-encoding genes, approximately 617 kb from the genome of cultivated peanut and 215 kb from a wild relative were sequenced including three Arah1, one Arah2, eight Arah3, and two Arah6 gene family members. To assign polarity to differences between homoeologous regions in peanut, we used as outgroups the single orthologous regions in Medicago, Lotus, common bean, chickpea, and pigeonpea, which diverged from peanut about 50 Ma and have not undergone subsequent polyploidy. These regions were also compared with orthologs in many additional dicot plant species to help clarify the timing of evolutionary events. The lack of conservation of allergenic epitopes between species, and the fact that many different proteins can be allergenic, make the identification of allergens across species by comparative studies difficult. The peanut allergen genes are interspersed with low-copy genes and transposable elements. Phylogenetic analyses revealed lineage-specific expansion and loss of low-copy genes between species and homoeologs. Arah1 syntenic regions are conserved in soybean, pigeonpea, tomato, grape, Lotus, and Arabidopsis, whereas Arah3 syntenic regions show genome rearrangements. We infer that tandem and segmental duplications led to the establishment of the Arah3 gene family. Our analysis indicates differences in conserved motifs in allergen proteins and in the promoter regions of the allergen-encoding genes. Phylogenetic analysis and genomic organization studies provide new insights into the evolution of the major peanut allergen-encoding genes.
Arachis hypogaea L.; allergens; gene synteny; genome organization; homologs; evolution
Many patients with type 1 diabetes develop renal disease despite moderately good metabolic control, suggesting other risk factors may play a role. Recent evidence suggests that the haptoglobin (HP) 2-2 genotype, which codes for a protein with reduced antioxidant activity, may predict renal function decline in type 1 diabetes. We examined this hypothesis in 1,303 Caucasian participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) study. HP genotype was determined by polyacrylamide gel electrophoresis. Glomerular filtration rate was estimated by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation and albumin excretion based on timed urine samples. Participants were followed up for a mean of 22 years. HP genotype was significantly associated with the development of sustained estimated glomerular filtration rate (GFR) <60 mL/min/1.73 m^2 and with end-stage renal disease (ESRD), with HP 2-2 having greater risk than HP 2-1 and 1-1. No association was seen with albuminuria. Although there was no treatment group interaction, the associations were only significant in the conventional treatment group, where event rates were much higher. We conclude that the HP genotype is significantly associated with the development of reduced GFR and ESRD in the DCCT/EDIC study.
Little is known about the genetic factors that contribute to familial colorectal cancer type X (FCCX), characterized by hereditary nonpolyposis colorectal carcinoma with no mismatch repair defects. Genetic linkage analysis, exome sequencing, tumor studies, and functional investigations of 4 generations of a FCCX family led to the identification of a truncating germline mutation in RPS20, which encodes a component (S20) of the small ribosomal subunit and is a new colon cancer predisposition gene. The mutation was associated with a defect in pre–ribosomal RNA maturation. Our findings show that mutations in a gene encoding a ribosomal protein can predispose individuals to microsatellite-stable colon cancer. Evaluation of additional FCCX families for mutations in RPS20 and other ribosome-associated genes is warranted.
Colon Cancer; Hereditary Nonpolyposis Colorectal Cancer; Ribosome; Exome Sequencing; FCCX, hereditary nonpolyposis colorectal cancer type X; rRNA, ribosomal RNA
Background and Aims
Peanut (Arachis hypogaea) is an allotetraploid (AABB-type genome) of recent origin, with a genome of about 2·8 Gb and a high repetitive content. This study reports an analysis of the repetitive component of the peanut A genome using bacterial artificial chromosome (BAC) clones from A. duranensis, the most probable A genome donor, and the probable consequences of the activity of these elements since the divergence of the peanut A and B genomes.
The repetitive content of the A genome was analysed by using A. duranensis BAC clones as probes for fluorescence in situ hybridization (BAC-FISH), and by sequencing and characterization of 12 genomic regions. For the analysis of the evolutionary dynamics, two A genome regions are compared with their B genome homeologues.
BAC-FISH using 27 A. duranensis BAC clones as probes gave dispersed and repetitive DNA characteristic signals, predominantly in interstitial regions of the peanut A chromosomes. The sequences of 14 BAC clones showed complete and truncated copies of ten abundant long terminal repeat (LTR) retrotransposons, characterized here. Almost all dateable transposition events occurred <3·5 million years ago, the estimated date of the divergence of A and B genomes. The most abundant retrotransposon is Feral, apparently parasitic on the retrotransposon FIDEL, followed by Pipa, also non-autonomous and probably parasitic on a retrotransposon we named Pipoka. The comparison of the A and B genome homeologous regions showed conserved segments of high sequence identity, punctuated by predominantly indel regions without significant similarity.
A substantial proportion of the highly repetitive component of the peanut A genome appears to be accounted for by relatively few LTR retrotransposons and their truncated copies or solo LTRs. The most abundant of the retrotransposons are non-autonomous. The activity of these retrotransposons has been a very significant driver of genome evolution since the evolutionary divergence of the A and B genomes.
Arachis hypogaea; A. duranensis; peanut; groundnut; BAC-FISH; BAC sequencing; retrotransposons; genome evolution; phylogeny; homeology
Visual refractive errors (REs) are complex genetic traits with a largely unknown etiology. To date, genome-wide association studies (GWASs) of moderate size have identified several novel risk markers for RE, measured here as mean spherical equivalent (MSE). We performed a GWAS using a total of 7280 samples from five cohorts: the Age-Related Eye Disease Study (AREDS); the KORA study (‘Cooperative Health Research in the Region of Augsburg’); the Framingham Eye Study (FES); the Ogliastra Genetic Park-Talana (OGP-Talana) Study and the Multiethnic Study of Atherosclerosis (MESA). Genotyping was performed on Illumina and Affymetrix platforms with additional markers imputed to the HapMap II reference panel. We identified a new genome-wide significant locus on chromosome 16 (rs10500355, P = 3.9 × 10^−9) in a combined discovery and replication set (26 953 samples). This single nucleotide polymorphism (SNP) is located within the RBFOX1 gene which is a neuron-specific splicing factor regulating a wide range of alternative splicing events implicated in neuronal development and maturation, including transcription factors, other splicing factors and synaptic proteins.
Sugarcane is the source of sugar in all tropical and subtropical countries and is becoming increasingly important for bio-based fuels. However, its large (10 Gb), polyploid, complex genome has hindered genome based breeding efforts. Here we release the largest and most diverse set of sugarcane genome sequences to date, as part of an on-going initiative to provide a sugarcane genomic information resource, with the ultimate goal of producing a gold standard genome.
Three hundred and seventeen chiefly euchromatic BACs were sequenced. A reference set of one thousand four hundred manually-annotated protein-coding genes was generated. A small RNA collection and an RNA-seq library were used to explore expression patterns and the sRNA landscape. In the sucrose and starch metabolism pathway, 16 non-redundant enzyme-encoding genes were identified. One of the sucrose pathway genes, sucrose-6-phosphate phosphohydrolase, is duplicated in sugarcane and sorghum, but not in rice and maize. A diversity analysis of the s6pp duplication region revealed haplotype-structured sequence composition. Examination of hom(e)ologous loci indicates variation in both sequence structure and sRNA landscape. A synteny analysis shows that the sugarcane genome has expanded relative to the sorghum genome, largely due to the presence of transposable elements and uncharacterized intergenic and intronic sequences.
This release of sugarcane genomic sequences will advance our understanding of sugarcane genetics and contribute to the development of molecular tools for breeding purposes and gene discovery.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-540) contains supplementary material, which is available to authorized users.
Saccharum; Bacterial artificial chromosome sequencing; Polyploidy; Genome; Genetics; Grasses
The Saccharinae, especially sugarcane, Miscanthus and sorghum, present remarkable characteristics for bioenergy production. Biotechnology of these plants will be important for a sustainable feedstock supply. Herein, we review knowledge useful for their improvement and synergies gained by their parallel study.
Skin fluorescence (SF) is a non-invasive marker of AGEs and is associated with the long-term complications of diabetes. SF increases with age and is also greater among individuals with diabetes. A familial correlation of SF suggests that genetics may play a role. We therefore performed parallel genome-wide association studies of SF in two cohorts.
Cohort 1 included 1,082 participants, 35–67 years of age with type 1 diabetes. Cohort 2 included 8,721 participants without diabetes, aged 18–90 years.
rs1495741, which is known to tag the NAT2 acetylator phenotype, was significantly associated with SF in Cohort 1 (p < 6 × 10^−10). The fast acetylator genotype was associated with lower SF, explaining up to 15% of the variance. In Cohort 2, the top signal associated with SF (p = 8.3 × 10^−42) was rs4921914, also in NAT2, 440 bases upstream of rs1495741 (linkage disequilibrium r^2 = 1.0 for rs4921914 with rs1495741). We replicated these results in two additional cohorts, one with and one without type 1 diabetes. Finally, to understand which compounds are contributing to the NAT2–SF signal, we examined 11 compounds assayed from skin biopsies (n = 198): the fast acetylator genotype was associated with lower levels of the AGEs hydroimidazolones of glyoxal (p = 0.017).
We identified a robust association between NAT2 and SF in people with and without diabetes. Our findings provide proof of principle that genetic variation contributes to interindividual SF and that NAT2 acetylation status plays a major role.
Electronic supplementary material
The online version of this article (doi:10.1007/s00125-014-3286-9) contains peer-reviewed but unedited supplementary material, which is available to authorised users.
Acetylation; Genome-wide association study; NAT2; Skin autofluorescence; Skin fluorescence; Skin intrinsic fluorescence
Pleiotropy, which occurs when a single genetic factor influences multiple phenotypes, is present in many genetic studies of complex human traits. Longitudinal family data, such as the Genetic Analysis Workshop 18 data, combine the features of longitudinal studies in individuals and cross-sectional studies in families, thus providing richer information about the genetic and environmental factors associated with the trait of interest. We recently proposed a Bayesian latent variable methodology for the study of pleiotropy, in the presence of longitudinal and family correlation. The purpose of this work is to evaluate the Bayesian latent variable method in a real data setting using the Genetic Analysis Workshop 18 blood pressure phenotypes and sequenced genotype data. To detect single-nucleotide polymorphisms with pleiotropic effect on both diastolic and systolic blood pressure, we focused on a set of 6 single-nucleotide polymorphisms from chromosome 3 that was reported in the literature to be significantly associated with either diastolic blood pressure or the binary hypertension trait. Our analysis suggests that both diastolic blood pressure and systolic blood pressure are associated with the latent hypertension severity variable, but the analysis did not find any of the 6 single-nucleotide polymorphisms to have statistically significant pleiotropic effect on both diastolic blood pressure and systolic blood pressure.
The focus of our work is to evaluate several recently developed pooled association tests for rare variants and assess the impact of different gene annotation methods and binning strategies on the analyses of rare variants under Genetic Analysis Workshop 18 real and simulated data settings. We considered the sample of 103 unrelated individuals with sequence data, genotypes of rare variants from chromosome 3, real phenotype of hypertension status and simulated phenotypes of systolic blood pressure (SBP) and diastolic blood pressure (DBP), and covariates of age, sex, and the interaction between age and sex. In the analysis of real phenotype data, we did not obtain significant results for any binning strategy; however, we observed a slight deviation of the p-values from the uniform distribution based on the protein-damaging variant grouping strategy. Evaluation of methods using simulated data showed lack of power even at the conservative level of 0.05 for most of the causal genes on chromosome 3. Nevertheless, analysis of MAP4 produced good power for all tests at various levels of the tests for both DBP and SBP. Our results also confirmed that Fisher's method is not only robust but can also improve power over individual pooled linear and quadratic tests and is often better than other robust tests such as SKAT-O.
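For readers unfamiliar with Fisher's method, the following minimal sketch shows how it combines several p-values into one. It assumes the input p-values are independent, which may not hold for pooled linear and quadratic tests computed on the same gene, so it illustrates the combination rule only and not how the authors handled dependence. The function name is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method, assuming independence.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, where k is the number of p-values.
    For even degrees of freedom the upper tail has a closed form:
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0    # j = 0 term of the series
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total
```

A call such as `fisher_combined_p([p_linear, p_quadratic])`, with placeholder names for the two component tests, returns the chi-square statistic and the combined p-value.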