Cell culture and cell apoptosis analysis
HUVECs were purchased from Cascade Biologics (USA) and cultured in Medium 200 supplemented with Low Serum Growth Supplement (LSGS) (Cascade Biologics, USA) in a CO2 incubator (5% CO2) at 37°C. The cells were treated with different concentrations of CoCl2 (100 μM, 300 μM, 600 μM, 900 μM) (Sigma, USA) for 0, 12, 24, 36 and 48 hrs to mimic hypoxia. The cells were then incubated with fluorescein isothiocyanate-conjugated Annexin V (A-FITC) and propidium iodide (PI) using the Apoptest kit (Jiancheng, China) according to the manufacturer's instructions. Flow cytometry analysis was performed using the FACSCalibur system (Becton Dickinson, USA). The data were analyzed using CellQuest software to estimate the apoptosis rate at different time points.
Sample preparation and array hybridization
After being cultured under normoxia or mimicked hypoxia (300 μM CoCl2 for 24 hrs), total RNA was extracted from the HUVECs using the TRIzol reagent, according to the manufacturer's protocol (Invitrogen, USA). Total RNA was dissolved in an appropriate volume of DEPC-treated water following A260/A280 measurement, while the total RNA integrity was evaluated by electrophoresis in a denaturing gel. The RNA samples were further purified using DNase (TaKaRa, Japan). For each experimental condition, three independent replicate samples were obtained for exon array analysis. For each sample, 1 μg of RNA was processed using the Affymetrix GeneChip® Whole Transcript Sense Target Labeling Assay. The GeneChip® WT cDNA Synthesis Kit, the WT cDNA Amplification Kit, and the WT Terminal Labeling Kit (Affymetrix, Inc., Santa Clara, CA) were used for the sample preparation. 8 μg of cDNA were used for the second cycle cDNA reaction. Hybridization cocktails containing 3–4 μg of fragmented, end-labeled cDNA were applied to the GeneChip® Human Exon 1.0 ST arrays. Hybridization was performed for 16 hrs using the MES_EukGE-WS2v5_450-DEV fluidics wash and stain script. The arrays were scanned using the Affymetrix GCS 3000 7G and Gene-Chip Operating Software v1.3 to produce the intensity files.
RT-PCR and quantitative Real-time RT-PCR
1 μg of each RNA sample was used for first strand cDNA synthesis using SuperScript II reverse transcriptase (Invitrogen, USA) and a combination of random hexamer primers and oligo-dT in a total volume of 10 μl. PCR was carried out using 2 μl of cDNA, with specific primers flanking the constitutive exons, and ExTaq Polymerase (TaKaRa, Japan) in a volume of 25 μl. The conditions for PCR amplification were denaturation at 95°C for 5 min, 32 cycles of 95°C for 30 sec, 55°C for 30 sec, and 72°C for 45 sec, followed by a final elongation step at 72°C for 7 min. The PCR products were then separated on 1.5% agarose gels. The RT-PCR products were gel-purified using a PCR purification kit (Promega, USA) and subcloned into the pGEM-T Easy Vector (Promega, USA) for direct sequencing to validate the transcript variants.
1 μl of each cDNA product was used for quantitative real-time PCR amplification with SYBR Green PCR Master Mix (TianGen). The primers were designed and verified by the primer specificity-checking program MFEprimer
http://biocompute.bmi.ac.cn/MFEprimer[
51]. PCR was carried out with an iCycler Real-time PCR detection system (Bio-Rad) under the following conditions (40 cycles): 95°C for 2 min, 95°C for 30 sec, 57°C for 30 sec, and 68°C for 30 sec. SYBR Green analyses were followed by dissociation curves in a temperature range of 60°C~90°C to assess the amplification specificity. Each sample was tested in triplicate and quantified according to the mean expression values obtained for both samples.
Low level analysis of the exon array
Low-level analysis of the optical intensity files of the exon array (".CEL" format) was performed by Affymetrix Power Tools (APT). Background noise was detected by the "Detection above Background (DABG)" algorithm. Normalization was performed using the "quantile normalization" algorithm for both the exon and gene levels. The "Probe Logarithmic Intensity Error Estimation" (PLIER) algorithm was used to estimate exon signals based on probe intensities. At the gene level, a variant algorithm called "Iter-PLIER" was used to summarize gene signals from probeset intensities. The "Iter-PLIER" algorithm can discard probesets with inconsistent signals to avoid low-weighted effects introduced by differentially expressed exons.
Filtering
Hierarchical filtering was then performed to eliminate noise and outliers at both the gene and exon levels. At the exon level, only the probesets considered "Present" (DABG P < 0.05) in at least 50% of the samples in either group were reserved. At the gene level, only the "core" meta-probesets with high confidence were used to estimate gene signals. The differentially expressed genes were considered acceptable based on two principles. First, genes with more than 50% of the "core" exons designated as "Present" (DABG P < 0.05) should appear in more than 50% of the samples in both groups. Second, the gene signals needed to exceed 100. We subsequently removed the probesets labeled as potential cross-hybridization targets based on Affymetrix CSV annotation files.
Detection of differentially expressed genes and alternative splicing
A Bioconductor package called "samr" was used to infer the differentially expressed genes (DEGs) between mimicked hypoxia and normal groups. Corrections for multiple hypothesis testing included using the Benjamini-Hochberg method [
52]. We set parameters Δ = 2.3 and FDR < 2.6 × 10
-4 as cut-off values for DEGs. Other than some regression models [
53,
54], most of the previously published papers used the "Splicing Index" model [
55,
56] to detect alternative splicing events from the exon array data. A program built in-house based on the "Splicing Index" model was used to detect differentially expressed exons. The rate of exon signals to summarized gene signals were defined as the transcription normalized exon signals:
The "Splicing Index (SI)" model was then employed to indicate alternative splicing capability based on the relative inclusion rate of exons:
The absolute value of SI represented the magnitude of difference of the exon inclusion rate between the two groups. To identify the significant alternatively spliced exons, a Student's t-test was used to compare TNS values between the two groups. Finally, the high proportion of true positives, with P-value < 0.015 and fold change magnitudes > 0.5, were retained as potential alternatively spliced exons.
Data deposition
The raw ".CEL" files and normalized data at both the gene and exon levels have been deposited in the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/geo under GEO Series record GSE12546.
Visualization and classification of alternative splicingevents
Before validating the exon array data by various approaches, an expert investigation on gene structure and genomic context was carried out to assess the positions and surrounding mRNA/cDNA sequences of alternatively spliced exons. The Blat program [
57] was used to map alternatively spliced exons in the UCSC Genome Browser
http://genome.ucsc.edu/ referred to the mRNA/cDNA sequences (from the NCBI RefSeq and GenBank databases) or expressed sequence tags (ESTs). Alternatively spliced multi-exon genes were classified into six splicing patterns according to the relative positions of the affected probe selected regions (PSRs) in exons and genes based on the sequence mapping. These classifications were cassette exons, namely exon inclusion and exon skipping, alternative promoters, alternative polyadenylation, alternative donor sites, alternative acceptor sites and intron retention.
Function and pathway analysis
GO, protein function, and pathway enrichment analyses were carried out by the DAVID tool
http://david.abcc.ncifcrf.gov/[
58]. DEGs and alternatively spliced genes were mapped to the KEGG database using GenMAPP software, in order to visualize their distributions in the pathways [
59].
After detecting alternatively spliced exons, their sequences and gene annotations were obtained from the Affymetrix website
http://www.affymetrix.com/support/technical/byproduct.affx?product=huexon-st. The protein sequences of the coding regions of alternatively spliced exons were extracted from the NCBI RefSeq database by a in-house developed Perl program [
60]. The InterProScan software was used to search protein domains via the interfaces of the PFAM, PROSITE, PRODOM, and SMART databases [
61].
Literature mining for functional modules
The purpose of the analysis is to find functional modules from complex biological networks. The functional module was defined as a part of a biological network with specific functions and topological features [
62]. The nodes represent genes, and the links represent regulatory relationships between genes in the modules. A two-step literature mining strategy was performed on up- and downregulated genes to find activated functional modules in affected HUVECs. First, we used the cytoscape plugin "Agilent Literature Search" to construct the biological networks by a literature mining algorithm [
63]. Only direct regulatory relationships between genes were preserved in building the network. Second, some orphan nodes and fake links were manually removed by checking relevant sentences in the obtained literatures in the first step. During the manual module check, the nodes were annotated by description from NCBI Entrez Gene [
64], and the links were classified by regulatory relationships stated in the sentences from the relevant literature.