Plant material
Grape clusters were sampled from V. vinifera cv. Cabernet Sauvignon clone 15 grafted on rootstock 101-14 in a commercial vineyard near Osoyoos, British Columbia, in the 2004 and 2005 seasons. Sampling dates during each season were focused on the developmental stages undergoing ripening initiation. Clusters were sampled on a single date in 2004, August 12th, which was the timing of approximately 50% ripening initiation based on a turning pink color phenotype. For the 2005 season, the ripening initiation stage was sampled over a longer period (August 10th through August 16th), since in this growing season, ripening advanced slowly due to lower atmospheric temperatures. Five clusters from five different vines were sampled in each season and snap frozen directly in liquid nitrogen in the vineyard and then transported on dry ice to UBC Vancouver where they were stored at -80°C.
Individual grapes from each of the 2004 and 2005 clusters were developmentally staged based on a visual pigmentation assessment and were segregated for each season into green, pink/turning, fully turned red, and fully turned purple phenotypic classes. For the 2005 samples, green grapes were only taken from clusters collected on August 10th, since for this date and August 12th there was no visible change in color present in any of the grape clusters. Thirty grapes of similar sizes per pigmentation class per year were segregated for experimentation. Prior to total protein extraction, individual grapes were partially thawed in gloved hands and then, using a forceps, the exocarp tissue was carefully peeled away from the mesocarp and placed immediately into liquid nitrogen. Seeds were then carefully removed while keeping the remaining mesocarp tissue frozen in liquid nitrogen. Exocarp and mesocarp samples were ground to a powder under liquid nitrogen and then used for total protein extractions.
Tissue preparation for protein extraction
Preparation of exocarp tissue samples for protein extraction was performed according to a previously described protocol for olive leaf [15] with some modifications described here. The procedure was carried out on ice and centrifugations were performed at 4°C. Throughout the procedure, each wash was done by completely resuspending the tissue pellet. Four hundred mg of powdered exocarp tissue was placed in a 2 mL G-tube (Fisher Scientific Canada, Ottawa, ON). The tissue was suspended in 1.5 mL of a cold (-20°C) ethyl acetate:ethanol (1:2 (v/v)) solution by vortexing for 30 s; the ethyl acetate:ethanol extraction was previously found to be useful for removing pectins as well as pigments such as chlorophylls [16]. Following centrifugation for 3 min at 21000 × g, the supernatant was removed and the ethyl acetate:ethanol extraction and centrifugation steps were repeated on the remaining tissue. The sample was next extracted twice with cold (-20°C) 100% acetone by vortexing and centrifuging, as before. Subsequently, the tissue with added acetone was transferred from the G-tube to a mortar using a 1 mL pipette with the tip end excised to increase diameter and then the acetone was evaporated from the tissue at room temperature. After the addition of 1/3 vol of white quartz sand (Sigma-Aldrich, Oakville, ON, Canada) to the tissue, it was ground to an even finer powder. The powder was transferred back to a clean 2 mL G-tube by suspending the tissue in 1.5 mL of cold (-20°C) TCA:acetone (1:9 (v/v)) and vigorously mixed and centrifuged, as before. Extraction with 10% TCA:acetone was repeated five to seven times, or until no more anthocyanins (red-pigmented flavonoids) could be extracted from the tissue. This was followed by three washes with chilled (4°C) 10% TCA in water by vigorous mixing and centrifugation, as before, to extract the pectins and remaining anthocyanins from the tissue. After this, the tissue was washed twice with cold (-20°C) 80% acetone and centrifuged, as before. Protein extraction was performed after drying the tissue pellet to completion in a speed vacuum extractor (SPD131DDA, Thermo Scientific, Milford, MA, USA).
For the preparation of the mesocarp tissue, the same procedure as for the exocarp was used with the following modifications. Three g of starting material was used per sample and the first extractions up to the grinding step with white quartz sand were done in 50 mL Oakridge tubes. Since some protein can be extracted from the mesocarp via TCA:acetone extraction alone [14], a 20 min incubation time at -20°C was introduced after the first 100% acetone step and included in the subsequent TCA:acetone containing steps to ensure that all of the protein remained precipitated. In the TCA:H2O step, the 20 min incubation was done on ice. Since no anthocyanins are present in mesocarp, only two TCA:acetone extractions were carried out for the mesocarp tissue.
Total protein extraction
Two hundred to 300 mg of pre-extracted and dried exocarp or mesocarp tissue contained in a 2 mL G-tube was extracted by resuspending the pellet in 0.75 mL cold Tris-buffered phenol, pH 7.9. Then, 0.75 mL of dense SDS buffer (30% sucrose, 2% SDS, 0.1 M Tris-HCl, pH 8.0) was added. The mixture was vortexed for 30 s and incubated on ice for 40 min with intermittent vortexing. The phenol phase containing the protein as the top phase was separated by centrifugation at 21000 × g for 5 min and transferred into a clean 2 mL G-tube. The remaining SDS phase was re-extracted with another 0.75 mL Tris-buffered phenol and incubated for 20 min before centrifuging and subsequent transfer and combination of the two phenol phases. Protein was precipitated by adding a minimum of 5 vol cold methanol plus 0.1 M ammonium acetate to the combined phenol phase. Precipitation was carried out at -20°C for 30 min or overnight. After centrifugation at 21000 × g for 10 min, the pellet was washed twice with cold methanol containing 0.1 M ammonium acetate and subsequently with 80% acetone twice. Pellets were next dissolved in 200–300 μL fresh buffer containing 6 M urea, 2% CHAPS, 5 mM EDTA, and 30 mM HEPES, pH 8.1, to obtain a concentration of approximately 1.0 μg/μL. Careful sonication on ice was used to dissolve the samples.
Protein quantitation was done using a bicinchoninic acid (BCA) absorption assay (Sigma-Aldrich Canada Ltd., Oakville, ON) and read in a Victor V plate reader (PerkinElmer Life and Analytical Sciences, Woodbridge, ON, Canada) equipped with a photometric filter of 560 nm and 10 nm bandwidth. The quality of each protein sample was checked via SDS-PAGE; all samples were devoid of indications of degradation and showed good resolution with low background. Total protein samples were then shipped on dry ice to the University of Victoria-Genome BC Proteomics Centre in Victoria, BC, for iTRAQ analyses. Using a second BCA assay, each protein sample was re-quantified just before aliquoting 100 μg of each sample for iTRAQ labeling steps.
Experimental design and labeling of peptides with iTRAQ reagents
The experimental design consisted of the four developmental stages described earlier for each of exocarp 2004, mesocarp 2004, exocarp 2005, and mesocarp 2005. Two biological replicates were employed for each stage and tissue for the 2005 samples, whereas one 2004 sample was used for each stage of mesocarp or exocarp. An additional technical replicate was carried out for exocarp 2004, representing separate iTRAQ labeling reactions and analyses starting from the same protein sample.
Labeling of peptides with iTRAQ reagents (Applied Biosystems Canada, Streetsville, ON) was performed according to the manufacturer's recommendations as follows. One hundred μg of each protein sample in a maximum volume of 200 μL was precipitated overnight using 100% acetone and dissolved in 20 μL of denaturing buffer containing 1 μL denaturant and 2 μL reducing reagent (TCEP) as provided in the iTRAQ kit, followed by vortexing and incubation at 60°C for 1 h. One μL of cysteine blocking solution (MMTS) was then added to each sample, followed by incubation at room temperature for 10 min. These protein samples were digested overnight with trypsin (Promega, Madison, WI, USA) at 37°C. iTRAQ labeling was carried out by adding iTRAQ reagents 114, 115, 116, and 117 to either the exocarp or the mesocarp samples representing the four developmental stages, green, pink/turning stage, red/fully turned, and purple, respectively. Subsequently, these four samples were mixed by vortexing and further incubated at room temperature for 1 h.
The four iTRAQ-labeled peptide samples were pooled together, diluted 1:10 with cation exchange sample buffer (A) containing 25% acetonitrile in 10 mM KH2PO4, and then adjusted to pH 3.0 using phosphoric acid. Because of this acidification step, it is important to remove pectins prior to total protein extractions; we found in previous trials that the pectins likely polymerized and precipitated out of solution, converting samples mostly to a gelatinous state unsuitable for further analyses (data not shown). The combined peptide mixture was fractionated by strong cation exchange (SCX) chromatography on a BioCAD workstation (Applied Biosystems), using a 4.6 mm × 20 cm polysulfoethyl aspartamide column (PolyLC Inc, Columbia, MD, USA). First, the mixed samples were loaded in buffer A at a flow rate of 0.2 mL/min. Once completely loaded, the column was washed for 20 min with buffer A. Peptides were eluted by a linear gradient of 0 to 350 mM KCl in buffer B (20 mM KH2PO4, 25% acetonitrile, pH 3.0). Sixty-nine fractions were collected over the course of 70 min at a flow rate of 1 mL/min. Of these fractions, only 12 fractions containing the eluted labeled peptides as measured by optical density monitoring at 214 nm were chosen for analysis on a 2 h LC-MS/MS program. The fractionated samples were reduced to 150 μL in a speed-vac (Thermo-Savant, Holbrook, NY, USA) and transferred to autosampler tubes (LC Packings, Amsterdam, The Netherlands).
Liquid chromatography and mass spectrometry
The samples were analyzed for identification and quantitation on a QSTAR Pulsar i hybrid tandem mass spectrometry (LC-MS/MS) system (Applied Biosystems, MDS Sciex), fitted with a nano-electrospray ionization source (Proxeon, Odense, Denmark) using a 10 μm fused silica emitter tip (New Objectives, Woburn, MA, USA) and interfaced with an integrated LC system consisting of a Famos autosampler, SwitchOS II switching pump, and Ultimate micropump (LC Packings). Individual fractions containing peptides were injected onto a 300 μm × 5 cm C18 PepMap guard column (5 μm, 100 Å; LC Packings), resolved using a 75 μm × 150 mm analytical column (3 μm, 100 Å; LC Packings), and eluted using an automated binary gradient (200 nL/min) from 100% buffer A (2% acetonitrile (ACN), 0.05% formic acid in H2O) to 40% buffer B (0.05% formic acid in 98% ACN) in 40 min, then from 40% to 80% buffer B for 5 min. MS time of flight (TOF) scans were acquired from m/z 400 to 1200 for one second with up to two precursors selected for MS/MS from m/z 100 to 1500 using information-dependent acquisition at 2.5 seconds per scan; rolling collision energy was used to promote fragmentation.
Custom predicted tryptic peptide database
A schema showing the pipeline for production of the predicted peptide database in support of this subsection is shown in Figure . All publicly available EST data for each Vitis species (AS, all sequences), including those from all V. vinifera (wine grape) cultivars, were downloaded in August 2007 as FASTA files from the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA). These data were parsed on the basis of reported Vitis species of origin with the vast majority being from V. vinifera cultivars. Since we were specifically interested in studying the proteome in V. vinifera cv. Cabernet Sauvignon (CS) pericarp tissue, an additional, more rigorous approach to the parsing of the CS ESTs was carried out in order to reduce or eliminate the potential for subsequent assembly of paralogous CS sequences into invalid contigs, thereby striving to strengthen the validity of protein identification in our iTRAQ experiments. CS ESTs were obtained from the NCBI Genbank database or from an in-house EST project [17] and subdivided into the following categories based upon the reported source tissues for the cDNAs used for single pass sequencing: Whole berry including seed (CSB), berry without seed (pericarp, CSP), skin without seed or flesh (exocarp, CSE), seed only (CSS), and other tissues (CSO) including leaf, flower, tendril, and root. Because the in-house ESTs were also present in the NCBI Genbank database, the corresponding entries in Genbank were removed since the Genbank entries do not have sequence quality (phred [18]) scores. The following files containing EST data comprised each of the above mentioned groups: VV (VV.fasta representing all V. vinifera ESTs including in-house ESTs); WS (WS.fasta including ESTs from all available wild species, V. aestivalis; V. cinerea × V. riparia; V. cinerea × V. rupestris; Vitis hybrid (species not indicated in Genbank); V. labrusca; V. pseudoreticulata; V. riparia; V. rotundifolia; V. shuttleworthii), CSO (Bud.fasta; Flower_leaf _root.fasta; Leaf_blade.fasta; Petiole.fasta; Root.fasta; Flower-Pre-bloom.fasta; Inflorescence_including_flowers.fasta; Stem.fasta; Nectary_of_flowers.fasta; Flower_Bloom.fasta; Leaf.fasta; Inflorescence.fasta), CSS (Seed.fasta), CSP (Pericarp.fasta; Fruit_with_seeds_removed.fasta; Fruit_without_seeds.fasta), CSE (Fruit_skin.fasta), and CSB (Berry.fasta; Fruit.fasta).
Sequences were processed using cross_match (minmatch 12, penalty -2, minscore 20; http://www.phrap.org) and trim2 (G. Williams, http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/trimseq.html) in order to remove vector sequences as well as ambiguous nucleotides at the sequence ends. To successfully perform the above cleanup analyses, phred quality scores were used where available; otherwise, 'place-holder' quality scores were generated for any sequences for which no phred scores were available, as was the case for most of the ESTs in Genbank. Place-holder quality scores were also used later in the cluster assembly process, as discussed in more detail below. Following the cross_match and trim2 processing, the sequences were further trimmed using Perl scripts designed in-house to eliminate known invalid sequences (e.g. microbial sequences, simple sequence repeats) and trim polyA/T tails, if present in a given sequence. PolyA/T stretches were limited to 12 bp in order to prevent subsequent chimeric contig assembly based on those repeats. If polyA was followed by a > 30 bp stretch of AC, AT, GC, or GT repeats, the polyA stretch was trimmed to 12 bp and all sequence 3' to this was discarded; if polyT was preceded by a > 30 bp stretch of AC, AT, GC, or GT repeating sequence, the polyT stretch was trimmed to 12 bp and all sequence 5' to this was discarded. If a polyA stretch started at least two thirds of the way along the EST sequence, it was trimmed to 12 bp; if a polyT stretch started within the first third of the EST sequence, it was trimmed to 12 bp. Any part of a sequence that started or ended with > 30 bp of repeats of AC, AT, GC, or GT was deleted. If a sequence started or ended with 'N's (indicating ambiguous base calls), the 'N's were deleted and the corresponding quality scores were also removed.
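As an illustration only, two of the trimming rules above (capping polyA/T stretches at 12 bp and removing terminal 'N's together with their quality scores) can be sketched in Python. This is not the in-house Perl code; the function names are hypothetical, uppercase sequences are assumed, and the cap is applied to every run regardless of its position in the read, which is a simplification of the position-dependent rules described in the text.

```python
import re

MAX_RUN = 12  # maximum polyA/T length retained, as stated in the text


def cap_poly_at(seq: str, max_run: int = MAX_RUN) -> str:
    """Shorten every run of A or T longer than max_run to exactly max_run bases."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (max_run + 1, max_run + 1))
    return pattern.sub(lambda m: m.group(0)[0] * max_run, seq)


def strip_terminal_n(seq: str, quals: list) -> tuple:
    """Remove leading and trailing N's and the matching per-base quality scores."""
    start = len(seq) - len(seq.lstrip("N"))
    end = len(seq.rstrip("N"))
    if end <= start:  # the sequence consisted only of N's
        return "", []
    return seq[start:end], quals[start:end]
```

Keeping the quality list aligned with the sequence matters here because the same scores (phred or place-holder) are consumed later by the assembler.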
To better ensure that contig assemblies were based on high quality nucleotide sequence data, percent 'N' (ambiguous base call) content was determined for each sequence. If the percentage was > 0.3% (i.e. four or more 'N's per 1000 bp), the flanking 100 bp regions were scanned for 'N's and, if present, were trimmed to exclude the 'N's, thereby lowering the total 'N' percentage. Sequences shorter than 200 bp were trimmed to the first and last occurrences of an 'N'. For resulting sequences longer than 50 bp, the 'N' percentage was recalculated and, if still > 0.3%, a record of the sequence was made. Each of these sequences was then compared with other sequences in a combined dataset using BLASTN to determine its uniqueness. If a given sequence was already represented in the dataset by another sequence with a lower 'N' content, the sequence in question was eliminated.
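The 'N' content check can be illustrated with a short Python sketch. The function names are hypothetical and only the percentage calculation and the 0.3% cutoff are taken from the text; the flank trimming and BLASTN comparison are not reproduced.

```python
def percent_n(seq: str) -> float:
    """Return the percentage of ambiguous base calls ('N') in a sequence."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)


def exceeds_n_threshold(seq: str, threshold: float = 0.3) -> bool:
    """True when the 'N' percentage is strictly greater than the threshold (in %)."""
    return percent_n(seq) > threshold
```

For a 1000 bp read, three 'N's give exactly 0.3% and pass, whereas four 'N's give 0.4% and are flagged, which matches the "four or more per 1000 bp" wording.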
The curated sequence datasets were next clustered using PCAP software [19] with parameters of 95% overlap identity and 60 bp overlap length [17]. PCAP was used instead of CAP3 in order to take advantage of parallelized processing. Parallelization provided the ability to distribute each dataset assembly workload across 100 CPUs for significantly faster processing time. The PCAP assembly program was modified and recompiled with EST_flag set at 1 (the default is 0, which indicates genomic reads). The PCAP assembly step was followed by a series of post-assembly steps (bdocs -y 100 -z 0, bclean -y 100 -w 1, bcontig -y 100 -p 95, bconsen -y 100 -z 0 -p 95, bform -y 100). We performed two clustering permutations in order to test the effects of database design on peptide identification using our iTRAQ data. First, we clustered all sequences together to create the "AS" database, including WS, VV, and all CS_ files; all sequences were weighted evenly. Second, CSB, CSE, CSP, CSO, CSS, WS, and VV (including CS sequences) were clustered separately with higher weighting (place-holder scores) placed on CS sequences in the VV build and the original phred scores retained for the in-house CS sequences. Weighting was accomplished by assigning higher quality scores such that when polymorphisms were encountered by PCAP in an assembly, preference was given for selection of the CS nucleotide for the resulting contig. Following assemblies, the generated contigs and singletons were merged into one file for each dataset (AS, CSB, CSE, CSP, CSO, CSS, VV, WS). Any sequences longer than 2500 bp were suspected to be chimeric, so they were parsed to a separate file, translated in all six frames, and peptides with a minimal size of 80 amino acids before a predicted stop codon were submitted to a BLASTX search against the nr database. The resulting multiple peptides predicted within long contigs were coded with "LC", as well as with "F" for the translational frame, with the frame number (either positive for forward or negative for reverse) and the peptide number designated from among the multiple peptides (separated by a period from the frame number).
A BLASTX analysis was next performed on each contig and singleton sequence against the nr database in order to identify the best frame for subsequent in silico translation. The frame identified via BLASTX analysis was then used to generate the predicted ORF (i.e. amino acid translation) for a given contig or singleton. In order to further curate predicted ORFs, each was subjected to in silico cleavage at any 'unknown' amino acid ('X') or stop codon and then compared to a similarly generated list of peptides from the corresponding best scoring protein sequence identified in the BLASTX search. The 'best peptide' was then identified in the translation frame as the peptide with an exact match to the BLASTX peptide. If no such peptide could be identified, the longest peptide generated by in silico cleavage of the sequence at each occurrence of an unknown amino acid and/or stop codon was used. All sequences which resulted in "no hit found" in the BLASTX results were subsequently translated in all six frames and appended to the end of the 'best peptide' file. In all cases where a six-frame translation was applied, the resulting peptides (designated as 'NH' in the database) were cleaved in silico at every unknown amino acid and/or stop codon and only those sequences 80 amino acids or longer were kept.
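The six-frame translation with in silico cleavage at unknown residues and stop codons can be sketched as follows. This is a minimal Python illustration, not the pipeline's own code: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is translated as 'X'. Frames are numbered +1 to +3 (forward) and -1 to -3 (reverse), following the sign convention described above.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; ambiguous codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_peptides(seq: str, min_len: int = 80) -> list:
    """Return (frame, peptide) pairs for peptides of at least min_len residues."""
    seq = seq.upper()
    peptides = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            # Cleave at every unknown residue or stop codon.
            for pep in re.split(r"[X*]", protein):
                if len(pep) >= min_len:
                    peptides.append((strand * (offset + 1), pep))
    return peptides
```

The 80-residue default mirrors the minimum peptide length used for the 'NH' and long-contig peptides in the text.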
The resulting list of 'best peptides' for each of the sequences was then subjected to BLASTP analysis using the UniProtKB database in order to determine the sequence identity. The five highest BLASTP hits for each query sequence were aligned using an in-house Perl script to identify putative N- and/or C-termini. If no consensus site could be determined for an N- or C-terminus via alignment with similar sequences, then these sequences were trimmed at the tryptic digestion site nearest to the ends of the predicted ORF to eliminate potentially truncated predicted tryptic peptides from the database. The parameters programmed into the scripts included: 1) BLASTP e-values, where E = H indicates a less significant hit (E > 1e-05) and E = L indicates a stronger hit (E ≤ 1e-05), 2) the difference in length of the Vitis query sequence versus each of the top five subject sequences, and 3) the length of the exact match of amino acids to the top hit. These parameters improved automation of accurate predictions of methionine (M) sites and identification of likely full-length amino acid sequences without requiring manual inspection of the BLASTP results. Sequences identified with a predicted methionine at the N-terminus were coded with '(M)'. Determinations of C termini were done based on a small range cutoff (± 2) of amino acids between the stop codon in each top hit and the predicted stop codon in each corresponding Vitis ORF; if the difference was greater than two amino acids and deemed unclear, the C-terminal end of the predicted ORF was trimmed at the nearest upstream tryptic cleavage site.
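Trimming an ORF at the tryptic site nearest to an uncertain terminus might look like the following Python sketch. It is an assumption-laden illustration rather than the in-house Perl script: the function names are hypothetical, trypsin is modeled as cleaving after K or R except when followed by P, and a sequence with no tryptic site is returned unchanged.

```python
import re

# Assumed cleavage rule: after K or R, but not before P.
TRYPTIC_SITE = re.compile(r"[KR](?!P)")


def trim_n_terminus(orf: str) -> str:
    """Drop the possibly truncated first tryptic peptide of an ORF."""
    match = TRYPTIC_SITE.search(orf)
    return orf[match.end():] if match else orf


def trim_c_terminus(orf: str) -> str:
    """Keep the ORF up to and including its last tryptic cleavage site."""
    ends = [m.end() for m in TRYPTIC_SITE.finditer(orf)]
    return orf[:ends[-1]] if ends else orf
```

For the sequence AGKPLMRSTVKDE, the K at position 3 is skipped because it precedes P, so N-terminal trimming yields STVKDE and C-terminal trimming yields AGKPLMRSTVK.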
The detailed process, above, was applied to each data set individually (AS, WS, VV, CSO, CSS, CSP, CSE and CSB). In preparation for the merger of the datasets, the CS tissue-specific sequences were analyzed for uniqueness based on comparing each sequence to every other sequence and discarding all shorter sequences for each exact match. Once all CS duplicate sequences were removed, this dataset was merged with the remaining two sets, VV and WS. This final set consisting of all of the sequences was then subjected to yet another uniqueness test where each sequence was compared to every other sequence but this time CS sequences were intentionally not removed, even if an exact duplicate existed in either the VV or the WS set and was of greater length. This allowed for preferential retention of the CS sequences in order to keep information about the tissue of origin of a detected protein. Out of a total of 113243 sequences submitted to the uniqueness test, 52394 were identified with CS duplicates present due to the preferential retention of those sequences. From the resulting sequences, only those that started with a predicted methionine were then submitted for SignalP analysis (http://www.cbs.dtu.dk/services/SignalP/) and processing, which allowed for the identification and ultimate trimming of signal peptides giving rise to predicted mature protein sequences. Those trimmed sequences that had a predicted cleavable targeting signal were coded '(SP)'. After removal of the predicted signal peptides, a final uniqueness test was performed.
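The uniqueness test with preferential retention of CS sequences can be sketched in Python under one stated assumption: an "exact match" is interpreted here as the shorter sequence occurring verbatim within a longer one. The function name, the record layout (identifier, source dataset, sequence), and the `protected_sources` parameter are all hypothetical and are not taken from the original scripts.

```python
def remove_redundant(records, protected_sources=frozenset()):
    """Discard sequences contained in a longer kept sequence.

    records: iterable of (identifier, source, sequence) tuples.
    protected_sources: dataset labels (e.g. CS tissue sets) whose
    sequences are retained even when a longer duplicate exists.
    """
    # Longest first, so every potential container is seen before its substrings.
    by_length = sorted(records, key=lambda r: len(r[2]), reverse=True)
    kept = []
    for record in by_length:
        _identifier, source, seq = record
        contained = any(seq in other[2] for other in kept)
        if contained and source not in protected_sources:
            continue
        kept.append(record)
    return kept
```

Calling the function with an empty `protected_sources` corresponds to the first, CS-only pass; passing the CS dataset labels corresponds to the merged pass in which CS entries are kept for their tissue-of-origin information. The pairwise comparison is quadratic and is meant only to show the logic.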
Analysis of MS/MS data
iTRAQ MS/MS data were analyzed using ProteinPilot software v. 2.0.1 (Applied Biosystems) for both tryptic peptide identification and quantification. The peptides and corresponding relative abundances were obtained in ProteinPilot using a confidence cutoff (called a 'Prot Score') of > 1.3 (> 95%). Database searching for each sample was done on predicted tryptic peptide sequence data using either the MSDB database (Release 20063108, Hammersmith Campus of Imperial College London) or in-house databases (Vitis_spp_ORF_db_v1.0, untrimmed, or AS databases). Annotations and annotated protein names indicated in ProteinPilot output files were coded to indicate several parameters specific to the ORF identified as well as the EST or contig from which the ORF sequence was predicted.
iTRAQ data representing the four ripening initiation stages in each of the three exocarp samples (2004, 2005-1, 2005-2) were combined into a single tab delimited file. Likewise, iTRAQ data representing each of the three mesocarp samples (2004, 2005-1, 2005-2) were combined into a second tab delimited file. Duplicate entries among exocarp or mesocarp files were identified using an in-house script in the R environment with 'Custom ORF ID' as the search string. Then, ratiometric data at each of the three comparisons using 'green' as the reference stage (i.e. pink/green; red/green; purple/green) were averaged prior to export for cluster analyses. Entries with the same name but different template cDNA sources were not averaged since these may represent isoforms from different source tissues and/or cultivars. We chose to cluster all proteins detected in the exocarp or mesocarp in order to capture all information on expression patterns detected, without restricting our analyses to only those proteins that were replicated amongst the individual exocarp or mesocarp files. K-means clustering into four partitions was carried out on ratiometric data for the exocarp and mesocarp files separately using MultiExperiment Viewer (MeV) software (The Institute for Genomic Research; http://www.tm4.org/mev.html). We used a 1.5-fold threshold for biological significance, which was validated by consistencies between trends in protein expression presented here as increasing or decreasing with corresponding patterns of gene expression identified in previous publications (see Figure legend for citations).
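The averaging of replicate ratios and the 1.5-fold threshold can be illustrated with a small Python sketch (the original analysis used an in-house R script, which is not reproduced here). The function names are hypothetical, an arithmetic mean is assumed, and treating a ratio of 1/1.5 or lower as a decrease is an assumption about how the threshold was applied symmetrically.

```python
from collections import defaultdict
from statistics import mean


def average_ratios(rows):
    """Average (pink/green, red/green, purple/green) ratios per Custom ORF ID.

    rows: iterable of (orf_id, (pink_ratio, red_ratio, purple_ratio)).
    """
    grouped = defaultdict(list)
    for orf_id, ratios in rows:
        grouped[orf_id].append(ratios)
    # zip(*replicates) regroups the values column-wise, one column per comparison.
    return {orf_id: tuple(mean(column) for column in zip(*replicates))
            for orf_id, replicates in grouped.items()}


def classify(ratio: float, fold: float = 1.5) -> str:
    """Label a ratio relative to the green reference stage."""
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"
```

Because entries are grouped by the full identifier, ORFs that share a protein name but derive from different template cDNA sources remain separate, consistent with the text.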