Transgenic crops are widespread in some countries and sectors of the agro-economy, but are also highly contentious. Proponents of transgenic crop improvement often cite the “substantial equivalence” of transgenic crops to their nontransgenic parents and sibling varieties. Opponents of transgenic crop improvement dismiss the substantial equivalence standard as being without statistical basis and emphasize the possible unintended effects to food quality and composition due to genetic transformation. Systems biology approaches should help consumers, regulators, and other stakeholders make better decisions regarding transgenic crop improvement by characterizing the composition of conventional and transgenically improved crop species and products. In particular, metabolomic profiling via mass spectrometry and nuclear magnetic resonance can make broad and deep assessments of food quality and content. The metabolome observed in a transgenic variety can then be assessed relative to the consumer- and regulator-accepted phenotypic range observed among conventional varieties. I briefly discuss both targeted (closed architecture) and nontargeted (open architecture) metabolomics with respect to the transgenic crop debate and highlight several challenges to the field. While most experimental examples come from tomato (Solanum lycopersicum), analytical methods from all of systems biology are discussed.
Not everything turns out quite the way people expect. Very few people shoot holes-in-one in golf, bowl perfect games or score 1600 on their SAT exams. We all try our best to accomplish our goals and largely do, no matter whether we are at work or at play. This is certainly true for plant breeding and improvement. Unintended effects to composition and quality occur at some frequency no matter what method is applied to make a bigger ear of corn or a tastier apple. Unintended effects represent statistically significant differences in phenotype, which, for example, could be the disappearance of a particular protein or alterations in polysaccharide composition.1 These differences may or may not have an impact on quality or ultimate safety. Unintended effects can be further classified as predictable or unpredictable, where predictable changes can be explained based on an understanding of the underlying biology, from prior knowledge of the genetics of the parental varieties, or from the function of a transgene or the site of genomic integration. Unpredictable changes fall outside obvious explanation.1 A National Research Council (U.S.) taskforce estimated the probability of unintended effects due to a variety of methods for crop improvement.2 They considered selection from homogenous populations to produce the smallest number of unintended effects (of any kind), with transformation of genes from closely related species producing similar outcomes to those observed by genetic crosses between existing germplasm pools. Genetic crosses between closely related species were estimated to produce a wider range of variance than the previous three methods. Transformation with genes from distantly related species was estimated to produce an even greater incidence of unintended effects, but fewer than those seen from mutational breeding methods, where deliberate mutagenesis using chemicals or ionizing radiation is used to induce novel genetic variation.
Most new plant varieties are generated by crossing highly related (homogeneous) varieties together and selecting the small number that are more desirable than either parent. This style of plant improvement has led to steady gains in many traits, such as yield in maize, but is not likely to produce crops with radical changes in quality, composition, or adaptation to a particular stress. That this strategy will succeed presupposes that genetic diversity exists within the “elite” panel of varieties. In species with little genetic diversity, such as tomato, genetic hybridization between closely related species has introduced novel characteristics that are highly desirable. Cultivated tomato (Solanum lycopersicum) has been improved by replacing chromosomal segments with ones from wild relatives (e.g., S. pennellii), which contain pathogen resistance or fruit quality genes not found in S. lycopersicum.3,4 Deliberate mutagenesis is a tool commonly used by geneticists to discover new genes and their functions. At least fourteen genes important for carbohydrate metabolism in maize seeds have been described by mutagenesis. Eight of these genes have been commercialized to some greater or lesser degree in the form of sweet corn (3 genes) or to provide specialty starches for various ethnic cuisines (5 genes).5 Sweet corn varieties can carry the sugary1 mutation, the shrunken2 mutation, contain both sugary1 and sugary-enhancer1, or carry all three mutations. Producers, consumers, regulators, and other stakeholders accept these products as safe. This implicitly means that the degree of phenotypic variation away from “normal” is acceptable as well.
Transgenic crop improvement is widespread for some commodities and in some parts of the world, but its use is hotly debated. In 2005, transgenic crops were planted on 87.2 million hectares around the world, including 47.4 million hectares in the United States.6 In 2007, the majority of the major commodity crops in the United States were transgenic: soybean (91%), cotton (87%), and maize (73%), with wheat being the notable exception (crops with >5 million hectares planted).7 Within a similar time frame, a poll conducted by the Mellman Group for the Pew Charitable Trusts determined that while 45% of Americans regard transgenic crops as “safe,” 29% regard them as “unsafe.”8 A similar poll found that within the then 25-member state European Union, only 27% of respondents supported the use of transgenic crop improvement.9
There are many points of contention in the debate over transgenic crops, including arguments that rely upon cultural, economic, or scientific underpinnings.10–14 Addressing cultural or economic concerns toward biotechnology is far beyond the scope of a minireview in a scientific journal. Thus, my comments will be confined to two of the scientific issues and how systems biology approaches, especially metabolomics, may help to constructively shape that debate.
Proponents of transgenic crop improvement often speak to the “substantial equivalence” (SE) of transgenic crops relative to their nontransgenic parents.15 In this context, SE means that two varieties are so similar to one another that they can be taken to be the same.16,17 On one level, the notion of SE makes sense: relative to the nontransgenic mother variety, a transgenic daughter contains the construct of interest, which may express one or a small number of genes, in addition to whatever disturbance may have arisen from the site of genomic integration. This degree of difference is much smaller in an absolute sense than the differences between two conventional varieties of the same market class. Maize has a high degree of nucleotide polymorphism, similar to that seen in potato, and far more than in tomato.18–20 This degree is so high that two maize varieties may be as different from each other as humans are from chimpanzees at the DNA level.21 Anyone would recognize that humans and chimpanzees are different, although it may take a highly trained botanist to tell the difference between, say, cultivars B73 and Mo17 in maize.
In a plant-breeding context, near isogenic lines (NIL) are often considered to be SE to their progenitors. NILs are constructed by crossing back into one parental variety repeatedly over several (5–8) generations, so that the new stock is largely identical to that recurrent parent. If one assumes that a typical crop plant has 50,000 genes, after 5 generations of backcrossing, the parent and daughter will differ at only 3.125% of their genome [i.e., (1/2)^5], which still represents differences at ~1600 genes. If 8 generations of backcrossing are used, parent and daughter will differ at only 0.4% of their genome [i.e., (1/2)^8], or ~200 genes. This degree of difference between parent and daughter varieties is acceptable and does not keep NILs from being considered SE to their progenitor mother varieties. However, the parent and daughter varieties are significantly different in at least one regard: the target trait that was improved in the daughter. Of course, the standard proposed by the Organization for Economic Cooperation and Development was of “substantial” equivalence rather than of “complete” or “total” equivalence.
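The backcrossing arithmetic in the paragraph above can be reproduced with a short sketch. It assumes the same simplification the text uses: each backcross generation halves the fraction of the genome that still differs from the recurrent parent, and genes are spread evenly across the genome. The function names and the 50,000-gene default are illustrative and not taken from any cited source.

```python
def differing_fraction(backcross_generations: int) -> float:
    """Expected fraction of the genome still differing from the recurrent parent.

    Uses the halving rule from the text: (1/2) ** n after n backcrosses.
    """
    return 0.5 ** backcross_generations


def expected_differing_genes(backcross_generations: int,
                             total_genes: int = 50_000) -> float:
    """Expected number of genes that differ, assuming genes are evenly spread."""
    return total_genes * differing_fraction(backcross_generations)


for n in (5, 8):
    fraction = differing_fraction(n)
    genes = expected_differing_genes(n)
    print(f"{n} backcrosses: {fraction:.3%} of genome, about {genes:.0f} genes")
```

The text rounds the two results to roughly 1600 and 200 genes. Linkage drag around the selected trait, which this sketch ignores, would tend to raise the real numbers.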
Two principal objections are made to the SE concept, and by extension to the regulation of transgenic crops. First, opponents of transgenic crop improvement dismiss SE, as there is no specific, statistical basis for the standard.22 In the original statement of principle, particular statistical tests were not explicitly defined to evaluate the SE standard.15 “Substantial” is an adjective rather than an F-score or p-value, which raises the obvious question: How different is acceptably different versus unacceptably different? Opponents of transgenic crop improvement also may willingly miss the point that SE is a part of, rather than the whole of, the risk determination.23 Second, opponents of transgenic crop improvement emphasize the possible consequences to food quality and composition due to unintended effects, either predictable or unpredictable ones. The original statement from the Organization for Economic Cooperation and Development emphasizes testing for known toxins and quality biomarkers, examples of which can be found in the excellent review of Cellini et al. (2004).1 This second objection raises the next obvious question: How can one investigate unintended effects using directed testing methods?
Systems biology can make a contribution to the transgenic crop debate by providing datasets that examine the composition of transgenic crops relative to their nontransgenic relatives. These hopefully comprehensive datasets can then be rigorously analyzed using the best statistical methods possible. To take full advantage of the literature available within a particular organism will require that the existing data be easily compared from one study or laboratory to another and combined in meta-analyses. Transgenic varieties should not be the only subjects for analysis; a deeper understanding of naturally occurring variation, which represents the range of consumer-acceptable variation, should also be included and thus provide a frame of reference. Complete and unbiased information should facilitate decision-making by consumers, regulators, and other stakeholders and provide more substantial bases for those decisions. I would argue that the combination of analytical tools available to the community, in gene, protein, and metabolite expression analysis and identification, is mature and useful to examine transgenic crops and potential unintended effects. Rather, our ability to design and analyze these experiments will be the factor that limits their utility and contributions to the transgenic crop debate.
To limit the scope of this review, I will confine my comments to studies within tomato, although excellent examples can be found in many other crops.1 Tomato has several notable features in this context. It is an economically important crop, with market classes that have divergent properties (e.g., fresh eating versus processing tomatoes; heirloom varieties versus improved hybrids).24 Tomato is a model organism for genomic studies, with genome sequencing underway, a large collection of molecular markers, cDNA and genomic libraries, and a worldwide community of researchers.25 Tomato also represents an early false-start in transgenic crop improvement (e.g., FlavrSavr), serving as a cautionary tale for future transgenic crops.26
Perhaps the principal agronomic consideration for tomato is the quality of the fruit, which is due to a large number of factors and has been the subject of intense research for many years.27 Research into fruit quality provides answers to basic scientific questions that have a significant relevance to applied research. Many different categories of chemical compounds contribute to fruit quality, including sugars, organic acids, amino acids, fatty acids, isoprenoids, and polyphenolic compounds. This wide range of quality biomarkers means that a wide range of separation chemistries have been used to investigate the tomato metabolome, using both targeted and nontargeted metabolomics. Targeted metabolomics are by far the most common studies, as most research programs are focused on understanding or improving a single target trait. Thus, a great deal of information exists that describes the range of stakeholder-acceptable phenotypic variation, although this information may not exist in an easily accessible format.
Small molecules can have large effects. The variation in the ratio between sweetness and acidity can cause tomatoes to taste sharp, sweet, insipid, or lovely.28 For breeding programs, simple, low-cost assays are required to accommodate the scale of research, where thousands of samples may need to be analyzed in a short period of time.29 While rather limited in their depth, phenotypic surveys of diverse germplasm have a very broad scope and help define the range of acceptable phenotypic variation. These kinds of organic acid and sugar data can be combined with gene expression analysis to look for the underlying genetic causes of fruit quality, turning applied data sets into more basic research results.30 The same trait information on carbohydrates and organic acids can also be developed using more sophisticated tools such as nuclear magnetic resonance (NMR) spectroscopy, which identifies far more compounds per assay than enzymatic or colorimetric methods but at far lower throughput.31 NMR spectroscopy also offers a high probability of an unambiguous structural determination for a novel metabolite of particular interest. Gas chromatography (GC) paired with mass spectrometry (MS) (GC-MS) is an alternative platform for broad-scope metabolomic profiling, with the notable benefit of higher throughput relative to NMR.32 On the other hand, GC-MS does require chemical derivatization, which may exclude classes of metabolites from the analysis, and also may not produce sufficient information for the unambiguous identification of a particular metabolite. However, the intersection of multiple data sets developed on these complementary analytical platforms offers a powerful strategy to analyze metabolomes.
Color and aroma are other targets for tomato improvement and study. Many of the pigments in tomato are isoprenoids, such as carotenoids, while others are polyphenolics, such as flavonoids.33 Traditionally, liquid chromatography (LC) protocols with commercial standards have been sufficient for carotenoid profiling.34 However, as investigators wish to build more complete estimates for various metabolomes, LC-MS is becoming more common for analysis of isoprenoids. The MS portion of the experiment can occur either in-line with the LC or in an off-line mode.35,36 In-line MS simplifies work flow, while off-line MS may provide greater sensitivity due to the greater reduction of sample complexity.36 NMR spectroscopy is also an option for isoprenoid profiling, which is effective at distinguishing E from Z isomers while MS methods cannot.37 This is significant, as different carotenoid isomers may have different biological activities and thus nutritive qualities.38 Carotenoid composition can change through food preparation and processing, both in quality (i.e., isomerization) and identity (i.e., degradation by heat); analysis of both raw and cooked samples may be necessary to present a full description of the isoprenoid portion of a metabolome.39,40 In addition to color, carotenoids also contribute to fruit aroma, as do fatty acid and amino acid derivatives.41 As all three categories represent volatile compounds, GC and GC-MS are the platforms of choice for separation and identification.41,42 A relatively small number of compounds contribute to aroma, which has facilitated the discovery of key biosynthetic and regulatory genes using traditional genetic and transgenic approaches.41,43–46
Flavonoids, like carotenoids, contribute to both the color and nutritional quality of tomato fruit.47 As with carotenoids, LC-based approaches have been widely utilized to quantitate and identify flavonoids, using either commercially available standards or MS.48,49 Flavonoids can degrade during food or sample preparation, so analysis of raw and cooked samples may need to be performed to ensure the relevance of the results.40 The regulation of flavonoid biosynthesis has been studied for decades and many genes are available for direct study.33 As a result, many studies in tomato utilize either conventional mutants (e.g., high-pigment1) or transgenic strategies to manipulate the flavonoid pathway and products.33,50–55 A third way to examine the flavonoid pathway (and others) is to observe changes through developmental time; the ripening process makes dramatic changes to fruit composition.56 In addition, there is compartmentalization within fruit for many different compounds.35 Tissue type explained far more variance in chemical species observed than did genetic differences in a small panel of fresh market/greenhouse adapted cultivars.56 Even greater resolution of variance was obtained by using a time-course approach to dissect the tissue-specific metabolomes in ripening fruit. Nearly 70% of phenotypic variance observed was explained by a principal component analysis, with the obvious clustering according to tissue and developmental stage.56 With further application of this sophisticated experimental design, more of the phenotypic variance could be explained; however, this dramatically increased the number of samples analyzed, as five tissues and five developmental time points were examined.56
Nontargeted metabolomic approaches offer many advantages over targeted ones, if the goal is to characterize unintended effects.57,58 If a primary concern is to understand the predictable and unpredictable unintended effects to composition and quality, then we need to look beyond the obvious quality and toxin biomarkers. However, this kind of survey does not require the unambiguous identification of every compound in the first draft of the analysis. Molecular fingerprinting is likely sufficient in the first draft of a metabolomic comparison between conventional and transgenic varieties.1,57,59 For example, if there are 2000 negative ions produced by MS and only 10 are different between the conventional and transgenic varieties, then obviously those require the most attention in the second round of metabolomic analysis. Nontargeted metabolomics also offer advantages to the analysis of conventional varieties, as they have no preconceptions about what the so-called “interesting” molecules will be, beyond the limitations of whatever extraction or separation chemistry is utilized for sample preparation and analysis. Nontargeted surveys have been conducted using MS on conventional mutants, developmental time series, diverse germplasm, and transgenic varieties.35,56,60,61 Similar studies have been conducted with NMR spectroscopy.55,62
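The fingerprinting idea above (many ions measured, only a handful differing) can be sketched as a simple screen. This is a minimal illustration and not a method from any of the cited studies: it assumes replicate intensity measurements per ion, uses Welch's t statistic as the comparison, and applies an arbitrary cutoff. The ion labels and intensities are invented for illustration, and a real analysis would also need a multiple-testing correction across thousands of ions.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group spread: identical means give 0, otherwise infinite t.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def flag_ions(conventional, transgenic, t_cutoff=4.0):
    """Return (ion, t) pairs with |t| above the cutoff, largest |t| first.

    Only ions present in both fingerprints are compared. The cutoff is an
    assumption chosen for illustration, not a recommended threshold.
    """
    flagged = []
    for ion in conventional.keys() & transgenic.keys():
        t = welch_t(conventional[ion], transgenic[ion])
        if abs(t) > t_cutoff:
            flagged.append((ion, t))
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical replicate intensities (arbitrary units) for three ions.
conventional = {
    "ion_A": [10.1, 9.8, 10.3, 10.0],
    "ion_B": [5.0, 5.2, 4.9, 5.1],
    "ion_C": [2.0, 2.1, 1.9, 2.0],
}
transgenic = {
    "ion_A": [10.0, 10.2, 9.9, 10.1],
    "ion_B": [5.1, 5.0, 5.2, 4.9],
    "ion_C": [4.0, 4.2, 3.9, 4.1],
}

flagged = flag_ions(conventional, transgenic)
for ion, t in flagged:
    print(f"{ion}: t = {t:.1f}")
```

Only the flagged ions would then go forward to the second round, where unambiguous identification by MS/MS or NMR becomes worth the effort.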
One of the common difficulties with analyzing MS or NMR data is the need for a highly curated database to best understand the spectra produced during the course of the experiment. Fortunately, recent developments in tomato in particular, and within the larger metabolomics community, provide these resources. A metabolite survey of approximately 100 Dutch tomato cultivars was conducted using LC-MS and MS/MS as necessary.63 A high-throughput methodology was described in a separate publication; software for comparison of LC-MS and GC-MS spectra (MetAlign) and the tomato metabolite database (MoToDB) are both freely available.60,63 A more comprehensive (>20,000 compounds) public database has been organized to warehouse MS and NMR data.64 The latter group has also made their software (Sesame) freely available to encourage community participation in the curation of the metabolite database.65 As the Sesame software was originally written to manage proteomic data, comparative tools will hopefully be developed for the joint analysis of metabolite and protein data.66
Many experiments executed to date have had well-defined goals but have lacked thorough descriptions of the experimental protocols and results. This is a general problem that is now being addressed by the establishment of community standards for reporting the details of experimental design and execution. This solution was first promoted in the gene expression community as the MIAME standard (the Minimum Information About a Microarray Experiment).67 For tomato, several public databases are MIAME compliant, including the Tomato Expression Database, the Gene Expression Omnibus, and the Plant Expression Database.68–70 One of the limitations for gene expression data mining is this distribution of community-generated results. In the cereals genomic community, computational biology tools (“middleware”) have been developed to facilitate the invisible exchange of information between model organism databases and clade-oriented databases.71 Hopefully, the development of middleware tools will be expanded to promote the amalgamation of data between multiple web-accessible databases and thus encourage meta-analysis. The ability to consolidate all of the tomato gene expression data would be highly useful to help connect gene and metabolite expression analyses, especially as several microarray-based studies have been made on tomato fruit development.72–77 Tools have been developed for the Arabidopsis community, such that both gene and metabolite expression analysis can be conducted within a single website.78 At the SOL Genomics Network, the clade-oriented database that includes tomato, comparative genomics tools are apparently being developed, while some already exist at the Tomato Expression Database, a model organism database.25,69
The same issue of data exchange exists within the proteomic research community. As with MIAME, a similar community standard has been established for protein expression profiling (proteomics) and is called the MIAPE, for Minimum Information About a Proteomics Experiment.79 For research groups that do apply both genomic and proteomic methods, tools have been developed to merge these data sets into single, coherent entities for consolidated analysis.80,81 Studies have described the tomato fruit proteome as it changes through developmental time.82,83 These methods have not been applied to tomato, although there is sufficient information in the literature to make that joint comparison of gene and protein expression.
There is at least one obstacle to effectively consolidating disparate data sources developed by different investigators. Data mining and middleware function are stymied by the tendency of researchers to use idiosyncratic or incomplete language to describe what they did and how they did it. The use of controlled vocabularies to describe genes, traits, and phenotypes can overcome some of these difficulties.84,85 Once controlled vocabularies are in place, gene and protein expression profiling experiments can be analyzed much more easily.86–88 This is in large part due to the fact that more of the decision-making can be made automatically rather than subjectively by the particular researcher, which is a key consideration when there can be millions of data points. Controlled vocabularies allow data sets to be consolidated with confidence when assembled by multiple researchers since everyone has agreed upon the methods to be used for organization and for the classification of their results.
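How a controlled vocabulary lets data from different laboratories be consolidated automatically can be shown with a small sketch. Everything here is hypothetical: the term identifiers (`EX:0001` and so on), the synonym lists, and the sample records are invented for illustration and do not come from any real ontology or database.

```python
# Hypothetical controlled vocabulary: one identifier per concept,
# with the free-text synonyms that different laboratories might use.
CONTROLLED_TERMS = {
    "EX:0001": {"label": "fruit pericarp", "synonyms": {"pericarp", "fruit wall", "flesh"}},
    "EX:0002": {"label": "fruit epidermis", "synonyms": {"epidermis", "peel", "skin"}},
}


def build_lookup(terms):
    """Map every lower-cased label and synonym to its term identifier."""
    lookup = {}
    for term_id, entry in terms.items():
        lookup[entry["label"].lower()] = term_id
        for synonym in entry["synonyms"]:
            lookup[synonym.lower()] = term_id
    return lookup


def normalize(description, lookup):
    """Return the term identifier for a free-text description, or None if unknown."""
    return lookup.get(description.strip().lower())


def group_by_term(records, lookup):
    """Group sample names by controlled term; unknown tissues are kept under None."""
    groups = {}
    for record in records:
        term_id = normalize(record["tissue"], lookup)
        groups.setdefault(term_id, []).append(record["sample"])
    return groups


# Hypothetical records from three laboratories describing tissue in their own words.
records = [
    {"sample": "labA_01", "tissue": "Peel"},
    {"sample": "labB_07", "tissue": "skin "},
    {"sample": "labB_08", "tissue": "fruit wall"},
    {"sample": "labC_03", "tissue": "jelly"},
]

lookup = build_lookup(CONTROLLED_TERMS)
groups = group_by_term(records, lookup)
print(groups)
```

Descriptions that do not match any term are collected under `None` rather than guessed at, so they can be sent back for manual curation instead of being silently misclassified.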
One of the driving forces for organization of metabolomic, genomic, and proteomic datasets is their scale. All of these approaches generate waves of information that can quickly overwhelm the investigator, which makes the visualization and appreciation of the results difficult. In the elegant study of Fraser et al.,32 normal and transgenic tomatoes overexpressing phytoene synthase1 were examined using directed metabolomics, gene expression, protein activity, and physiological parameters. This multidisciplinary approach allowed them to construct the most complete understanding of changes in the metabolism of the transgenic tomato fruit relative to nontransgenic. Their data were summarized in a color-coded metabolic pathway diagram, which made for a highly effective transmission of the scope of the changes observed. However, these diagrams were drawn for only two of the ten possible comparisons that could have been made (two varieties, five developmental stages tested) and focused only on a small subset of overall metabolic processes.32 For similar reasons, it seems highly likely that the sense of scale has interfered with reporting on the nondirected LC-MS studies on the panel of ~100 Dutch tomato varieties; all we have heard to date have been the descriptions of the methods necessary to process that many samples and the database that hosts some fraction of their results.60,63 It must be a difficult process to analyze and visualize a data set that contains hundreds of thousands of mass identifications derived from hundreds of LC runs, let alone parse this story into appreciable chapters. This work will provide a broad and deep description of the range of phenotypic variation observed and provide highly useful information to both basic and applied biologists working with tomato.
However, this is precisely the kind of information that is required to fully examine the predictable and unpredictable unintended effects to the composition of transgenic crops. Given the scope and depth of experiments on this scale, it is unlikely that single investigators will be able to conduct and analyze these experiments. I think that only through well-organized community efforts, where individual laboratories divide responsibilities and conduct experiments that can serve multiple functions, will it be possible to gather the data necessary to judge the existence and importance of unintended effects. These efforts need to be sufficiently large to describe the phenotypic range observed among a diverse selection of conventional varieties using metabolomic, genomic, and proteomic profiling methods. Once this is accomplished, the same metrics can be applied to a realistic panel of transgenic varieties with possible commercial application. While the characterization of transgenic varieties will likely serve only a biotechnology risk-assessment purpose, the natural diversity survey will provide broad and deep knowledge of food composition and quality. Such a systems biology approach should answer a large number of applied and basic biological questions, thus justifying the investment.
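The two-step strategy described above, first describing the phenotypic range among conventional varieties and then applying the same metrics to a transgenic variety, can be sketched as follows. This is a deliberately simple illustration that uses the observed minimum and maximum as the reference range; a real assessment would need replicates and a statistically defined interval. The variety names, metabolites, and values are invented for illustration.

```python
def reference_ranges(conventional_panel):
    """Per-metabolite (min, max) observed across the conventional varieties."""
    ranges = {}
    for profile in conventional_panel.values():
        for metabolite, value in profile.items():
            low, high = ranges.get(metabolite, (value, value))
            ranges[metabolite] = (min(low, value), max(high, value))
    return ranges


def out_of_range(profile, ranges):
    """Report metabolites outside the conventional range or lacking a reference."""
    report = {}
    for metabolite, value in profile.items():
        if metabolite not in ranges:
            report[metabolite] = "no reference"
            continue
        low, high = ranges[metabolite]
        if value < low:
            report[metabolite] = "below range"
        elif value > high:
            report[metabolite] = "above range"
    return report


# Hypothetical metabolite levels (arbitrary units) in three conventional varieties.
conventional_panel = {
    "variety_A": {"glucose": 12.0, "citrate": 4.0, "lycopene": 50.0},
    "variety_B": {"glucose": 15.0, "citrate": 3.2, "lycopene": 80.0},
    "variety_C": {"glucose": 10.5, "citrate": 5.1, "lycopene": 65.0},
}
# Hypothetical transgenic line measured with the same metrics.
transgenic_line = {"glucose": 13.0, "citrate": 4.4, "lycopene": 120.0}

ranges = reference_ranges(conventional_panel)
report = out_of_range(transgenic_line, ranges)
print(report)
```

The wider and more diverse the conventional panel, the more meaningful the reference range becomes, which is the argument for a community-scale survey of natural diversity.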
The MIAME and MIAPE standards are now requirements for publishing microarray and proteomic experiments in many journals. Hopefully, similar standards will be applied to metabolomic data as well. As more and more of these datasets are deposited in publicly accessible databases, meta-analyses that integrate multiple levels of information will allow us to ask many different systems biology questions. The adoption of controlled vocabularies for gene, trait, and phenotypic ontologies will further assist these meta-analyses. The benefit of this ability to leverage large collections of data should be obvious to the scientific community. Likewise, the identification of genetically informative populations has been very effective to address important biomedical and agronomic questions, such as the identification of cancer risk factors and genes important for carotenoid biofortification in staple crops.89,90 If these genetically informative populations are studied using metabolomic, genomic, and proteomic methods, this should provide an immediately useful but also durable resource. From this base of knowledge, the range and identity of unintended effects to composition and quality of transgenic foods can be assessed in the most complete manner, and help inform consumers, regulators, and other stakeholders in their decision-making.
This research program is supported by US Department of Agriculture, Agricultural Research Service base funds to O.A. Hoekenga, J.J. Giovannoni, and L.V. Kochian, “Determination and characterization of unintended effects in genetically modified crop plants” 1907-21000-028-00.