1.  Seforta, an integrated tool for detecting the signature of selection in coding sequences 
BMC Research Notes  2014;7:240.
The majority of amino acid residues are encoded by more than one codon, and a bias in the usage of such synonymous codons has been repeatedly demonstrated. One assumption is that this phenomenon has evolved to improve the efficiency of translation by reducing the time required for the recruitment of isoacceptors. The most abundant tRNA species are preferred at sites on the protein which are key for its functionality, a behavior which has been termed “translational accuracy”. Although this behavior has been observed in many species, no public domain software has yet been made available for its quantification.
We present here Seforta (Selection for Translational Accuracy), a program designed to quantify translational accuracy. It searches for synonymous codon usage bias in both conserved and non-conserved regions of coding sequences and computes a cumulative odds ratio and a Z-score. The specification of a set of preferred codons is desirable, but the program can also generate these. Finally, a randomization protocol calculates the probability that preferred codon combinations could have arisen by chance.
Seforta is the first public domain program able to quantify translational accuracy. It comes with a simple graphical user interface and can be readily installed and adjusted to the user's requirements.
PMCID: PMC4022393  PMID: 24739143
Codon bias; Translation optimization; Translational accuracy
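The cumulative odds ratio and Z-score mentioned in the Seforta abstract can be illustrated with a small sketch. It assumes that, for each amino acid, codons are counted in a 2x2 table (preferred vs. non-preferred codon, conserved vs. variable site) and that the tables are pooled with the Mantel-Haenszel procedure. The abstract does not state which estimator Seforta uses, so the choice of Mantel-Haenszel, the function name `mantel_haenszel` and the table layout are all assumptions made for illustration.

```python
from math import sqrt

def mantel_haenszel(tables):
    """Pool 2x2 tables into one odds ratio and one Z-score.

    Each table is a tuple (a, b, c, d), a hypothetical layout:
      a = preferred codons at conserved sites
      b = preferred codons at variable sites
      c = non-preferred codons at conserved sites
      d = non-preferred codons at variable sites
    """
    num = den = 0.0          # numerator / denominator of the pooled odds ratio
    obs = exp = var = 0.0    # observed, expected and variance of cell "a"
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:            # a table this small carries no information
            continue
        num += a * d / n
        den += b * c / n
        obs += a
        exp += (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den if den else float("inf")
    z_score = (obs - exp) / sqrt(var) if var else 0.0
    return odds_ratio, z_score

# One table per amino acid (made-up counts, for illustration only).
example_tables = [(10, 5, 5, 10), (8, 4, 4, 8)]
pooled_or, pooled_z = mantel_haenszel(example_tables)
```

An odds ratio above 1 with a positive Z-score would indicate that preferred codons are enriched at conserved sites, which is the pattern the abstract calls translational accuracy.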
2.  Genetic and Metabolite Diversity of Sardinian Populations of Helichrysum italicum 
PLoS ONE  2013;8(11):e79043.
Helichrysum italicum (Asteraceae) is a small shrub endemic to the Mediterranean Basin, growing in fragmented and diverse habitats. The species has attracted attention due to its secondary metabolite content, but little effort has as yet been dedicated to assessing the genetic and metabolite diversity present in these populations. Here, we describe the diversity of 50 H. italicum populations collected from a range of habitats in Sardinia.
H. italicum plants were AFLP fingerprinted and the composition of their leaf essential oil characterized by GC-MS. The relationships between the genetic structure of the populations, soil, habitat and climatic variables and the essential oil chemotypes present were evaluated using Bayesian clustering, contingency analyses and AMOVA.
Key results
The Sardinian germplasm could be partitioned into two AFLP-based clades. Populations collected from the southwestern region constituted a homogeneous group which remained virtually intact even at high levels of K. The second, much larger clade was more diverse. A positive correlation between genetic diversity and elevation suggested the action of natural purifying selection. Four main classes of compounds were identified among the essential oils, namely monoterpenes, oxygenated monoterpenes, sesquiterpenes and oxygenated sesquiterpenes. Oxygenated monoterpene levels were significantly correlated with the AFLP-based clade structure, suggesting a correspondence between gene pool and chemical diversity.
The results suggest an association between chemotype, genetic diversity and collection location which is relevant for the planning of future collections aimed at identifying valuable sources of essential oil.
PMCID: PMC3832510  PMID: 24260149
3.  gff2sequence, a new user friendly tool for the generation of genomic sequences 
BioData Mining  2013;6:15.
General Feature Format (GFF) files are used to store genome features such as genes, exons, introns, primary transcripts etc. Although many software packages (e.g. ab initio gene prediction programs) can annotate features by using such a standard, only a small number of tools have been developed to extract the corresponding sequence information from the original genome. Moreover, the present tools neither perform quality control nor offer a customizable filter for the annotated features.
gff2sequence is a program that extracts nucleotide/protein sequences from a genomic multifasta by using the information provided by a general feature format file. While a graphical user interface makes this software very easy to use, a C++ algorithm allows high performance together with low hardware demand. The software also allows the extraction of genic portions such as the untranslated and the coding sequences. Moreover, a highly customizable quality control pipeline can be used to deal with anomalous splicing sites, incorrect open reading frames and non-canonical characters within the retrieved sequences.
gff2sequence is a user friendly program that allows the generation of highly customizable sequence datasets by processing a general feature format file. The presence of a wide range of quality filters makes this tool also suitable for refining the ab initio gene predictions.
PMCID: PMC3848729  PMID: 24020993
Gene annotation; General feature format; Sequence quality
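The extraction step described in the gff2sequence abstract can be sketched in a few lines. This is not the program's C++ implementation; it is a minimal illustration that assumes tab-separated GFF records with 1-based inclusive coordinates and a multifasta held in memory. The function names `parse_fasta`, `extract_features` and `has_noncanonical` are hypothetical.

```python
# Translation table used to complement a nucleotide sequence.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def parse_fasta(text):
    """Read multifasta text into a dict mapping sequence id to sequence."""
    chunks, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # id is the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}

def extract_features(gff_text, genome, feature_type="CDS"):
    """Yield (attributes, sequence) for every feature of the given type."""
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seq_id, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        seq = genome[seq_id][start - 1:end]   # GFF is 1-based, inclusive
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]   # reverse complement
        yield cols[8], seq

def has_noncanonical(seq):
    """Simple quality filter: True if seq contains anything except A, C, G, T."""
    return bool(set(seq.upper()) - set("ACGT"))
```

A caller would typically chain the two steps, for example keeping only the features for which `has_noncanonical` returns `False`.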
4.  The Signatures of Selection for Translational Accuracy in Plant Genes 
Genome Biology and Evolution  2013;5(6):1117-1126.
Little is known about the natural selection of synonymous codons within the coding sequences of plant genes. We analyzed the distribution of synonymous codons within plant coding sequences and found that preferred codons tend to encode the more conserved and functionally important residues of plant proteins. This was consistent among several synonymous codon families and applied to genes with different expression profiles and functions. Most of the randomly chosen alternative sets of codons scored weaker associations than the actual sets of preferred codons, suggesting that codon position within plant genes and codon usage bias have coevolved to maximize translational accuracy. All these findings are consistent with the mistranslation-induced protein misfolding theory, which predicts the natural selection of highly preferred codons more frequently at sites where translation errors could compromise protein folding or functionality. Our results will provide an important insight in future studies of protein folding, molecular evolution, and transgene design for optimal expression.
PMCID: PMC3698923  PMID: 23695187
coding sequences evolution; codon bias; constrained sites
5.  The Relation of Codon Bias to Tissue-Specific Gene Expression in Arabidopsis thaliana 
Genetics  2012;192(2):641-649.
The codon composition of coding sequences plays an important role in the regulation of gene expression. Herein, we report systematic differences in the usage of synonymous codons among Arabidopsis thaliana genes that are expressed specifically in distinct tissues. Although we observed that both regionally and transcriptionally associated mutational biases were associated significantly with codon bias, they could not explain the observed differences fully. Similarly, given that transcript abundances did not account for the differences in codon usage, it is unlikely that selection for translational efficiency can account exclusively for the observed codon bias. Thus, we considered the possible evolution of codon bias as an adaptive response to the different abundances of tRNAs in different tissues. Our analysis demonstrated that in some cases, codon usage in genes that were expressed in a broad range of tissues was influenced primarily by the tissue in which the gene was expressed maximally. On the basis of this finding we propose that genes that are expressed in certain tissues might show a tissue-specific compositional signature in relation to codon usage. These findings might have implications for the design of transgenes in relation to optimizing their expression.
PMCID: PMC3454886  PMID: 22865738
Arabidopsis thaliana; expression pattern; codon bias; compositional signature
6.  Spatial Analyses of Mono, Di and Trinucleotide Trends in Plant Genes 
PLoS ONE  2011;6(8):e22855.
Genomic DNA sequences display compositional heterogeneity on many scales. In this paper we analyzed tendencies and anomalies in the occurrence of mono, di and trinucleotides in structural regions of plant genes. Representation of these trends as a function of position along genic sequences highlighted compositional features peculiar to either monocots or eudicots that were remarkably uniform within these two evolutionary clades. The most evident of these features appeared in the form of a gradient of base content along the direction of transcription. The robustness of such a representation was validated in sequence sub-datasets generated considering structural and compositional features such as total length of CDS, overall GC content and genic orientation in the genome. Piecewise regression analyses indicated that the gradients could be conveniently approximated by a two-segment model where a first region featuring a steep slope is followed by a second segment fitting a milder variation. In general, monocot species showed steeper segments than eudicots. The guanine gradient was the most distinctive feature between the two evolutionary clades, being moderately increasing in eudicots and firmly decreasing in monocots. Single gene investigation revealed that a high proportion of genes show compositional trends compatible with a segmented model, suggesting that these features are essential attributes of gene organization. Dinucleotide and trinucleotide biases were measured against the expectation based on a random union of the component elements. The average bias at dinucleotide level identified a significant underrepresentation of some dinucleotides and the overrepresentation of others. The bias at trinucleotide level was on average low. Finally, the analysis of bryophyte coding sequences showed mononucleotide, dinucleotide and trinucleotide compositional trends resembling those of higher plants. This finding suggested that the emergence of compositional bias is an ancient event in evolution which was already present at the time of land conquest by green plants.
PMCID: PMC3148226  PMID: 21829660
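The comparison of dinucleotide frequencies with the expectation from a random union of their component bases can be sketched as an observed/expected ratio. The abstract does not give the exact statistic used in the paper, so the ratio below, the function name `dinucleotide_bias` and the decision to skip ambiguous bases are assumptions for illustration.

```python
from collections import Counter

def dinucleotide_bias(seq):
    """Return observed/expected frequency ratios for each dinucleotide.

    The expected frequency of XY is f(X) * f(Y), i.e. what a random
    union of the two bases would give. Ratios below 1 indicate
    underrepresentation, ratios above 1 overrepresentation.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    mono = Counter(seq)                                  # base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))     # overlapping pairs
    total_pairs = n - 1
    bias = {}
    for pair, count in di.items():
        if not set(pair) <= set("ACGT"):
            continue                                     # skip ambiguous bases
        x, y = pair
        expected = (mono[x] / n) * (mono[y] / n)
        bias[pair] = (count / total_pairs) / expected
    return bias

ratios = dinucleotide_bias("ACACGTGTAACC")
```

The same idea extends to trinucleotides by comparing the frequency of XYZ with the product of three base frequencies.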
7.  Mutational Biases and Selective Forces Shaping the Structure of Arabidopsis Genes 
PLoS ONE  2009;4(7):e6356.
Recently, features of gene expression profiles have been associated with structural parameters of gene sequences in organisms representing a diverse set of taxa. The emerging picture indicates that natural selection, mediated by gene expression profiles, has a significant role in determining genic structures. However, the current situation is less clear in plants, as the available data indicate that the effect of natural selection mediated by gene expression is very weak. Moreover, the direction of the patterns in plants appears to contradict those observed in animal genomes. In the present work we analyzed expression data for >18000 Arabidopsis genes retrieved from public datasets obtained with different technologies (MPSS and high density chip arrays) and compared them with gene parameters. Our results show that the impact of natural selection mediated by expression on gene sequences is significant and distinguishable from the effects of regional mutational biases. In addition, we provide evidence that the level and the breadth of gene expression are related in opposite ways to many structural parameters of gene sequences. Higher levels of expression abundance are associated with smaller transcripts, consistent with the need to reduce costs of both transcription and translation. Expression breadth, however, shows a contrasting pattern, i.e. longer genes have higher breadth of expression, possibly to ensure those structural features associated with gene plasticity. Based on these results, we propose that the specific balance between these two selective forces plays a significant role in shaping the structure of Arabidopsis genes.
PMCID: PMC2712092  PMID: 19633720
8.  An Italian functional genomic resource for Medicago truncatula 
BMC Research Notes  2008;1:129.
Medicago truncatula is a model species for legumes. Its functional genomics have been considerably boosted in recent years due to initiatives based both in Europe and in the US. Collections of mutants are becoming increasingly available and this will help unravel the genetic control of important traits for many species of legumes.
Our report is on the production of three complementary mutant collections of the model species Medicago truncatula produced in Italy within the framework of a national genomic initiative. Well established strategies were used: Tnt1 mutagenesis, TILLING and activation tagging. Both forward and reverse genetics screenings proved the efficiency of the mutagenesis approaches adopted, enabling the isolation of interesting mutants which are currently being characterized. We anticipate that the reported collections will be complementary to the recently established functional genomics tools developed for Medicago truncatula both in Europe and in the United States.
PMCID: PMC2633015  PMID: 19077311
9.  Genetic Diversity and Population Structure of Wild Olives from the North-western Mediterranean Assessed by SSR Markers 
Annals of Botany  2007;100(3):449-458.
Background and Aims
This study examines the pattern of genetic variability and genetic relationships of wild olive (Olea europaea subsp. europaea var. sylvestris) populations in the north-western Mediterranean. Recent bottleneck events are also assessed and an investigation is made of the underlying population structure of the wild olive populations.
The genetic variation within and between 11 wild olive populations (171 individuals) was analysed with eight microsatellite markers. Conventional and Bayesian-based analyses were applied to infer genetic structure and define the number of gene pools in wild olive populations.
Key Results
Bayesian model-based clustering identified four gene pools, which was in overall concordance with the Factorial Correspondence Analysis and Fitch–Margoliash tree. Two gene pools were predominantly found in southern Spain and Italian islands, respectively, in samples gathered from undisturbed forests of the typical Mediterranean climate. The other two gene pools were mostly detected in the north-eastern regions of Spain and in continental Italy and belong to the transition region between the temperate and Mediterranean climate zones.
On the basis of these results, it can be assumed that the population structure of wild olives from the north-western Mediterranean partially reflects the evolutionary history of these populations, although hybridization between true oleasters and cultivated varieties in areas of close contact between the two forms must be assumed as well. The study indicates a degree of admixture in all the populations, and suggests some caution regarding genetic differentiation at the population level, making it difficult to identify clear-cut genetic boundaries between candidate areas containing either genuinely wild or feral germplasm.
PMCID: PMC2533604  PMID: 17613587
Olea europaea; genetic variability; gene pools; microsatellites; oleasters; population structure
10.  Genetic Structure of Wild and Cultivated Olives in the Central Mediterranean Basin 
Annals of Botany  2006;98(5):935-942.
• Background and Aims Olive cultivars and their wild relatives (oleasters) represent two botanical varieties of Olea europaea subsp. europaea (respectively europaea and sylvestris). Olive cultivars have undergone human selection and their area of diffusion overlaps that of oleasters. Populations of genuine wild olives seem restricted to isolated areas of Mediterranean forests, while most other wild-looking forms of olive may include feral forms that escaped cultivation.
• Methods The genetic structure of wild and cultivated olive tree populations was evaluated by amplified fragment length polymorphism (AFLP) markers at a microscale level in one continental and two insular Italian regions.
• Key Results The observed patterns of genetic variation were able to distinguish wild from cultivated populations and continental from insular regions. Island oleasters were highly similar to each other and were clearly distinguishable from those of continental regions. Ancient cultivated material from one island clustered with the wild plants, while the old plants from the continental region clustered with the cultivated group.
• Conclusions On the basis of these results, we can assume that olive trees have undergone a different selection/domestication process in the insular and mainland regions. The degree of differentiation between oleasters and cultivated trees on the islands suggests that all cultivars have been introduced into these regions from the outside, while the Umbrian cultivars have originated either by selection from local oleasters or by direct introduction from other regions.
PMCID: PMC2803593  PMID: 16935868
Olea europaea; AFLP; genetic diversity; population structure; wild populations
11.  In planta production of two peptides of the Classical Swine Fever Virus (CSFV) E2 glycoprotein fused to the coat protein of potato virus X 
BMC Biotechnology  2006;6:29.
Classical Swine Fever (CSF) is one of the most important viral infectious diseases affecting wild boars and domestic pigs. The etiological agent of the disease is the CSF virus (CSFV), a single stranded RNA virus belonging to the family Flaviviridae.
All preventive measures in domestic pigs have been focused on interrupting the chain of infection and on avoiding the spread of CSFV within wild boars as well as interrupting transmission from wild boars to domestic pigs. The use of a plant-based vaccine against CSFV would be advantageous, as plant organs can be distributed without the need for particular treatments such as refrigeration and therefore large areas populated by wild animals could be easily covered.
We report the in planta production of peptides of the classical swine fever (CSF) E2 glycoprotein fused to the coat protein of potato virus X. RT-PCR studies demonstrated that the peptide encoding sequences are correctly retained in the PVX construct after three sequential passages in Nicotiana benthamiana plants. Sequence analysis of RT-PCR products confirmed that the epitope coding sequences are replicated with high fidelity during PVX infection. Partially purified virions were able to induce an immune response in rabbits.
Previous reports have demonstrated that E2 synthetic peptides can efficiently induce an immunoprotective response in immunized animals. In this work we have shown that E2 peptides can be expressed in planta by using a modified PVX vector. These results are particularly promising for designing strategies for disease containment in areas inhabited by wild boars.
PMCID: PMC1534020  PMID: 16792815

