Soybeans accumulate protein and oil during development, making the mature soybean seed a valuable commodity. For 2008/09, the farm value of soybeans in the United States neared $30 billion (http://www.ers.usda.gov/Briefing/SoybeansOilCrops/), exceeded in value only by corn. The value of soybean meal as both a food and feed source depends on the character of its storage protein reserves. In soybean, the most abundant storage proteins are members of the cupin superfamily: the legumins (glycinins) and the vicilins (β-conglycinins; Derbyshire et al., 1976; Kinney et al., 2001; Takahashi et al., 2003; Radauer and Breiteneder, 2007), which together account for approximately 70% of total seed protein. Together with cotton and maize, soybeans represent the majority of the genetically modified (GM) products developed to date. Most of the soybeans grown in the United States are GM varieties: in 2011, about 94% of all soybeans cultivated for the commercial market in the United States were GM (USDA National Agricultural Statistics Service (NASS), 2011). Soybean is considered one of the “big eight” allergenic foods in the United States. Soybean allergens are grouped into five categories: the seed storage proteins Gly m 5 (β-conglycinin) and Gly m 6 (glycinin), Gly m TI (Kunitz trypsin inhibitor), Gly m Bd 28K, and Gly m Bd 30K (P34; FARRP version 10 allergen database, 2010; Ogawa et al., 2000; L’Hocine and Boye, 2007). One of the concerns involving GM soybeans, particularly in the European Union, is that levels of such endogenous allergens could be higher than in varieties obtained with traditional breeding methods, with a consequent increase in their sensitization or elicitation capacity. Data are lacking, however, on the natural variability of endogenous allergen levels in non-GM soybean (Doerrer et al., 2010). Variability in protein expression levels may result from genetic differences (Fehr et al., 2003), environmental differences (Murphy and Resurreccion, 1984; Maestri et al., 1998), nutrient stress (Gayler and Sykes, 1985; Paek et al., 1997), use of different breeding methods (Burton, 1989; Yaklich, 2001; Krishnan et al., 2007), or interaction between genotype and the environment (Paek et al., 1997; Piper and Boote, 1999). Data on this natural variability are critical for describing and understanding potential differences in allergen levels between GM and non-GM soybean varieties (Doerrer et al., 2010).
Much of what we know about the protein component of mature soybean has come from offline separation techniques such as polyacrylamide gel electrophoresis, with or without isoelectric focusing, coupled with antibody- or dye-based detection methods. Two-dimensional (2-D) gels that use immobilized pH gradients zoomed to either acidic or basic pH ranges have been very successful at separating the acidic and basic subunits of glycinin as well as the α and β subunits of β-conglycinin (Mooney et al., 2004; Hajduch et al., 2005; Natarajan et al., 2006; Danchenko et al., 2009). Achieving separation allows the various spots to be quantified by densitometry. Ultimately, these gel methods rely on mass spectrometry to distinguish the highly related proteins present in the various spots, which in the case of seed storage proteins number more than 100 (Agrawal et al., 2008). The multiplicity of 2-D spots that must each be accounted for in a quantitative study contributes to the biggest problem of 2-D gel-based quantification: low throughput (Rabilloud et al., 2010).
Quantification of proteins by mass spectrometry can be achieved using multiple reaction monitoring (MRM), in which signals from the endogenous protein are compared to those from a synthetic heavy-labeled internal standard (Kirkpatrick et al., 2005; Brun et al., 2009; Houston et al., 2010; Stevenson et al., 2010). More specifically, MRM analyses monitor peptides from the proteins of interest; these peptides are specific products of proteolysis, often generated using the enzyme trypsin. The internal standards are synthetic peptides identical to the endogenous peptides of interest except that they incorporate a heavy isotope-containing amino acid, which changes their mass but nothing else. Because MRM is a form of tandem mass spectrometry (MS/MS), peptides (precursors) are fragmented during the analysis. Fragmentation is used to verify the sequence of the specific peptide of interest, which provides an additional level of specificity. This technology is aided greatly by genome and RNA sequencing efforts that provide the sequences of the proteins of interest and so allow rapid MRM method development. Using this technology, proteins that differ in sequence by a single non-isobaric amino acid are theoretically discernible. In addition, MRM analysis using nanospray ionization is highly reproducible: replicate analyses show less than 15% variation on average, including analyses at the limits of instrument detection (Addona et al., 2009).
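The comparison of endogenous and heavy-labeled signals can be illustrated with a minimal sketch. It assumes a single-point calibration (the light/heavy peak-area ratio multiplied by the known amount of spiked standard) and a 1:1 stoichiometry between the monitored peptide and its parent protein. The function names, peak areas, spike amount, and molecular weight are hypothetical and are not taken from the cited studies.

```python
def endogenous_fmol(light_area, heavy_area, heavy_spike_fmol):
    """Estimate the endogenous (light) peptide amount in fmol.

    Assumes the light and heavy peptides respond identically in the
    instrument, so the peak-area ratio equals the molar ratio.
    """
    if heavy_area <= 0:
        raise ValueError("heavy-standard peak area must be positive")
    return (light_area / heavy_area) * heavy_spike_fmol


def ug_allergen_per_mg_protein(peptide_fmol, protein_mw_da, total_protein_ug):
    """Convert fmol of peptide to ug allergen per mg total protein.

    Assumes one monitored peptide per protein molecule (hypothetical
    simplification) and that total_protein_ug is the amount analyzed.
    """
    if total_protein_ug <= 0:
        raise ValueError("total protein must be positive")
    # fmol -> mol (1e-15), mol * g/mol -> g, g -> ug (1e6)
    allergen_ug = peptide_fmol * 1e-15 * protein_mw_da * 1e6
    # ug allergen per ug protein -> ug allergen per mg protein
    return allergen_ug / (total_protein_ug / 1000.0)


# Hypothetical numbers for illustration only.
amount = endogenous_fmol(light_area=2.0e6, heavy_area=1.0e6, heavy_spike_fmol=50.0)
level = ug_allergen_per_mg_protein(amount, protein_mw_da=50_000, total_protein_ug=1.0)
print(f"{amount:.1f} fmol endogenous peptide, {level:.2f} ug allergen/mg protein")
```

In practice a calibration curve over several spike levels, rather than a single ratio, would usually be preferred; the single-point form is shown only because it makes the light/heavy comparison explicit.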
Recently, to begin to understand the natural variation of allergenic protein levels in soybean, Houston et al. (2010) used tandem mass spectrometry to characterize the natural variation of 10 allergens (representing all five categories of soybean allergens identified to date) in 20 commercially available non-GM soybean varieties. The authors reported that the absolute quantities of the studied allergens, expressed per unit protein (μg allergen/mg protein), extended over a 10-fold range. The objective of the current study is to further evaluate the influence of genotype and environment. Four commercially available non-GM soybean varieties were each grown at six different locations (Georgia, Iowa, Kansas, Nebraska, Ontario, and Pennsylvania, Figure ). Soybean allergens were measured as in Houston et al. (2010), and absolute quantities of the allergens are reported. Principal component analysis (PCA) was conducted to explore patterns in the multivariate data, and a biplot (Gabriel, 1971) was used to present the results graphically. Genotypic variation and environmental variation were quantified relative to the overall mean and relative to residual variation using analysis of variance (ANOVA)-based F-tests. The data suggest that over a broad geographical region, the environment plays a larger role than genotype in determining allergen/anti-nutrient protein levels.
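As a rough illustration of how genotypic and environmental variation can be compared with residual variation, the sketch below computes F statistics for a two-way layout with one observation per variety and location. It assumes an additive model in which the residual term absorbs any genotype-by-environment interaction; this is a simplification and not necessarily the model fitted in this study. The function name and the allergen values are hypothetical.

```python
def two_way_anova(table):
    """F statistics for a genotype x environment table (one value per cell).

    table[g][e] is the allergen level of genotype g at environment e.
    Additive model: residual = total - genotype - environment.
    """
    n_gen = len(table)
    n_env = len(table[0])
    if any(len(row) != n_env for row in table):
        raise ValueError("every genotype needs one value per environment")

    grand_mean = sum(sum(row) for row in table) / (n_gen * n_env)
    gen_means = [sum(row) / n_env for row in table]
    env_means = [sum(row[j] for row in table) / n_gen for j in range(n_env)]

    # Sums of squares for each source of variation
    ss_gen = n_env * sum((m - grand_mean) ** 2 for m in gen_means)
    ss_env = n_gen * sum((m - grand_mean) ** 2 for m in env_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_res = ss_total - ss_gen - ss_env

    df_gen, df_env = n_gen - 1, n_env - 1
    df_res = df_gen * df_env
    ms_res = ss_res / df_res
    if ms_res <= 0:
        raise ValueError("no residual variation; F statistics are undefined")

    return {
        "grand_mean": grand_mean,
        "F_genotype": (ss_gen / df_gen) / ms_res,
        "F_environment": (ss_env / df_env) / ms_res,
        "df": (df_gen, df_env, df_res),
    }


# Hypothetical levels (ug allergen/mg protein): 4 varieties x 6 locations.
levels = [
    [10.2, 12.5, 9.8, 14.1, 11.0, 13.3],
    [9.7, 12.9, 10.4, 13.6, 10.1, 12.8],
    [10.9, 13.4, 9.5, 14.8, 11.6, 13.0],
    [10.0, 12.2, 10.1, 13.9, 10.7, 13.7],
]
print(two_way_anova(levels))
```

Each F statistic would then be compared against the F distribution with the corresponding degrees of freedom (here 3 and 15 for genotype, 5 and 15 for environment) to obtain a p-value.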