We have developed novel molecular methods using a stool sample, which contains intact sloughed colon cells, to quantify colonic gene expression profiles. In this study, our goal was to identify diagnostic gene sets (combinations) for the noninvasive classification of different phenotypes. For this purpose, the effects of a legume-enriched, low glycemic index, high fermentable fiber diet were evaluated in subjects with four possible combinations of risk factors, including insulin resistance and a history of adenomatous polyps. In a randomized crossover design controlled feeding study, each participant (a total of 23; 5–12 per group) consumed the experimental diet (1.5 cups of cooked dry beans) and a control diet (isocaloric average American diet) for 4 weeks with a 3-week washout period between diets. Using prior biological knowledge, the complexity of feature selection was reduced to perform an exhaustive search on all allowable feature (gene) sets of size 3, and among these, 27 had (unbiased) error estimates of 0.15 or less. Linear discriminant analysis was successfully used to identify the best single genes and two- to three-gene combinations for distinguishing subjects with insulin resistance, a history of polyps, or exposure to a chemoprotective legume-rich diet. These results support our premise that gene products (RNA) isolated from stool have diagnostic value in terms of assessing colon cancer risk.
Colon cancer is one of the leading causes of cancer-related deaths in the United States. Early detection is one of the proven strategies resulting in a higher cure rate (1). Unfortunately, the currently adopted screening procedures for early detection are often invasive (e.g., colonoscopy), and discomfort associated with such procedures generally leads to resistance toward the screening process. Thus, adoption of noninvasive methods designed to reduce anxiety over colorectal cancer screening and improve overall acceptance of the screening process would be highly desirable.
We recently showed that a high level of dry bean intake reduced tumor formation in carcinogen-injected mice (2, 3) and decreased the risk of advanced colorectal adenoma recurrence among participants in the Polyp Prevention Trial (4). Based on these studies, we hypothesized that a legume-enriched diet may reduce the rate of absorption of carbohydrates, lowering the postprandial glycemic index and insulinemic response, leading to a suppression in the level of inflammatory mediators and markers of insulin resistance (IR; ref. 5). In addition, the high level of fermentable fibers in beans would enhance the production of butyrate, an anti-inflammatory, antineoplastic short-chain fatty acid (6, 7). Although further studies are warranted to characterize the molecular features of chemoprotective diets, rigorous analysis of the effects of diet on transcriptome profiling has been limited thus far, largely due to difficulties in obtaining appropriate samples. Therefore, the development of noninvasive molecular methods using stool for the purpose of quantifying colonic gene expression profiles would be highly desirable.
Approximately one sixth to one third of normal adult colonic epithelial cells are shed daily (8). Exploiting this fact, we have developed novel noninvasive methods using feces, containing exfoliated colonocytes, to quantify colonic mRNAs (9–11). Although RNA is generally less suitable than DNA because it is readily degraded, we and others have shown that intact fecal eukaryotic mRNA can be isolated because of the presence of viable exfoliated colonocytes in the fecal stream (9, 11–14).
Using exfoliated colonocytes, we have previously reported the discriminative mRNA expression signatures between inflammatory bowel disease versus normal and between adenoma versus normal (11). These data suggest that mRNA isolated from exfoliated human colonocytes can be used to detect early stages of colon cancer and possibly chronic inflammation. However, the microarray gene expression profile–based classification of colonic diseases for diagnostic purposes has yet to be solved. Therefore, in this study, we further determined the feasibility of the noninvasive mRNA procedure in patients at high risk for colorectal adenoma recurrence. Specifically, the effect of a legume-enriched, low glycemic index, high fermentable fiber diet on subjects exhibiting a combination of risk factors including IR and history of adenomatous polyps was evaluated. To our knowledge, this is the first controlled feeding study to examine the effects of legumes or a low glycemic index diet on changes in intestinal gene expression profiles using exfoliated colonocytes. Our goal was to develop diagnostic gene sets (combinations) for the objective classification of different phenotypes. Applying this approach to a test set of 23 subjects, we have identified the best single genes and two- to three-gene combinations for distinguishing polyps, IR, and exposure to a legume diet. We also report that using combinations of genes, the classification error rate can be significantly lowered. Two- and three-gene combinations thus provide robust classifiers with potential to noninvasively identify discriminative signatures for differential diagnostic purposes.
After obtaining informed consent of the subjects, a controlled feeding study was conducted, comparing the effects of a legume-enriched, low glycemic index diet to the average American diet (control) in four different groups of male participants: (a) previous history of adenomas and IR; (b) previous history of adenomas with no IR; (c) IR with no history of adenomas; and (d) no IR and no history of adenomas. Subjects were enrolled into a two-period crossover study in which all four groups were randomly allocated to each of two diets: (i) a control diet; (ii) a high-legume, low glycemic index diet. The subjects (a total of 23; 5–12 per group) consumed the experimental diets for 4 wk with a 3-wk washout period before crossing over to the other diet. The overall study design is shown in Fig. 1. Baseline samples were collected before commencing each diet period, and additional samples were collected at the end of each diet period. All procedures used in the study were reviewed and approved by the human subjects' committees at the Pennsylvania State University (PSU), Texas A&M University, and the NIH. Study procedures are briefly summarized below.
Subjects were recruited with the assistance of gastroenterologists performing colonoscopies at the Mount Nittany Medical Center in State College, Pennsylvania. Nursing staff reviewed all colonoscopy records for eligible participants. Eligible patients were mailed a letter signed by their endoscopist inviting them to participate in the study and asking them either to return a postcard indicating that they would like to be contacted or to call the PSU study coordinator (toll-free) if they would like to learn more about the study. A preliminary telephone eligibility screening was completed by the coordinator, and subjects eligible according to the phone interview were invited to the PSU General Clinical Research Center (GCRC). After informed consent was obtained, the participant's height, weight, and blood pressure were checked by study staff or the nurses at the clinic, and a fasting blood sample was taken to determine overall health (including fasting insulin and glucose to determine insulin sensitivity and cholesterol levels and lab tests for heart and liver function). A physician at the GCRC reviewed the results to determine eligibility for participation. All eligible consented participants were asked to return to the GCRC to assess their resting metabolic rate. Demographic, health, and lifestyle questionnaires were completed, and the participants were given instructions for completing a 4-day food record for the purpose of estimating pre-study baseline dietary intake.
Eligible participants for the study were males between 35 and 75 y of age, with a body mass index of 25.0 to 34.9 kg/m², who had undergone a screening colonoscopy within the past 2 y. Only male subjects were recruited because in our previous study, males had a greater response to diet (e.g., beans) and exhibited a higher risk for polyp recurrence (4). Participants could not have been diagnosed with a serious medical condition such as cancer, heart disease, kidney disease, or diabetes, or have a history of colorectal cancer, bowel resection, polyposis syndrome, or inflammatory bowel disease. Subjects were not permitted to take any medication that would alter inflammation markers, insulin, glucose, or blood lipids. The +Polyp group had polyps removed within the past 2 y. No subjects reported the development of colon polyps during the study.
Subjects consumed one meal per day, breakfast or dinner, on site during the weekdays and consumed a packed lunch, snack, and other meal at a time and place of convenience. Weekend meals were prepared and packed for carryout. No foods other than those provided by the study kitchen were permitted. At each visit, subjects were weighed and asked to return any uneaten foods. Thus, compliance was monitored on a daily basis by assessment of body weight, direct observation of the consumption of one “in-house” meal per day, and by daily review of uneaten foods. Subjects were also asked to record any food items not provided by the study. Alcohol consumption was limited to no more than two drinks per week during the controlled feeding period. In addition to the monitoring of the dietary records, subjects were queried daily about alcohol consumption to ensure compliance. No subjects reported the consumption of non-study foods or excessive alcohol during the week of fecal and blood collections.
A 7-day menu cycle was developed with a standard set of legumes of the Phaseolus vulgaris species, such as navy beans, pinto beans, and kidney beans, to limit nutrient and phytochemical differences in the 7-day diet cycle. The diet contained ~250 g of legumes per day (1.5 cups). This level added ~20 g of total dietary fiber and 8 g of soluble fiber per day. The diet was modified to provide other high glycemic index foods in the control diet so that the glycemic index of the control diet was ~70, compared with a glycemic index of 30 in the legume diet. Each daily menu was designed to maintain a constant level of fat (32-33 energy %), whereas the high-legume low glycemic index diet had a total dietary fiber intake of ~40 g/d, compared with 20 g/d for the high glycemic index diet. The protein level of both diets was ~18 energy %. To maintain the same level of red meat and fish (foods that have been associated with colon cancer) in both diets, the protein in legumes was substituted for protein from poultry. All nutrients were provided in amounts to meet the recommended dietary allowances for men of the same age groups. A food composite for each of the 6 days was freeze-dried and analyzed for macronutrient and fiber levels. Individual food items were purchased at the same time from the same supplier to ensure uniformity of the diet.
The overall structure of the microarray data set is shown in Supplementary Table S1. Stool samples were collected by the subject into a sterile cup, sealed, and placed at 4°C storage for up to 12 h. Samples were then coded by the Research Assistant, homogenized in a guanidinium-based solution, and stored at −80°C until poly A+ RNA was isolated. From each subject, poly A+ RNA was isolated from feces as we have previously described (11). Due to the high level of bacterial RNA in fecal samples, poly A+ RNA was isolated to obtain a highly enriched mammalian poly A+ RNA population. We have previously shown that with the isolation of poly A+ RNA, contamination with bacterial RNA is undetectable (9). In addition, an Agilent 2100 Bioanalyzer was used to assess integrity of fecal poly A+ RNA. Samples were processed in strict accordance with the CodeLink Gene Expression Assay manual (Applied Microarray) and analyzed using the Human Whole Genome Expression Bioarray as we have previously described (15). Each array contained the entire human genome derived from publicly available, well-annotated mRNA sequences.
Arrays were inspected for spot morphology. Marginal spots were flagged as background contaminated (C), irregularly shaped (I), or saturated (S) in the output of the scanning software. Spots that passed the quality control standards were categorized as good (G). In addition, spots marked with (L) indicated the reading was “near background.” The low (L) measurements reflect either true low gene expression levels or may have been caused by degradation of the mRNA resulting in a low signal. Typically, samples collected from colonic mucosa (15) exhibited a relatively low proportion (5-8%) of L spots. In contrast, the proportion of L spots obtained from fecal samples was significantly higher (65-83%).
For the purpose of interarray normalization, a set of housekeeping genes was used. These were determined in the following manner.
Common good probes (2,584) across all 86 microarrays were identified. A good probe is defined as having, at most, two low measures across all 86 microarrays. Using a list of 575 housekeeping genes (16), 18 genes were identified from the 2,584 probes found in the previous step. Subsequently, the raw intensity of each of the 18 housekeeping genes was quantified, and those with missing values were excluded. As a result, there were a total of 18 housekeeping genes used for normalization. Refer to Supplementary Methods and Supplementary Fig. S1 for details.
Arrays were grouped across time and the average values of 18 housekeeping genes were calculated (Supplementary Fig. S1). Median values of the averages were also calculated for the first 67 arrays. Subsequently, a robust piecewise linear regression was done and the corresponding regression value for each array was calculated. Following this step, the difference between the median and regression values for each array was determined, and the raw expression values of the genes on each array were shifted by the corresponding discrepancies.
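The shifting step described above can be illustrated with a minimal sketch. It assumes an additive shift on raw intensities and uses toy values. The robust piecewise linear regression is not reproduced: the `fitted` list is a hypothetical stand-in for the per-array regression values, and the gene names and the helper functions `housekeeping_averages` and `shift_arrays` are illustrative only.

```python
from statistics import mean, median

def housekeeping_averages(arrays, housekeeping):
    """Average the housekeeping-gene intensities on each array."""
    return [mean(a[g] for g in housekeeping) for a in arrays]

def shift_arrays(arrays, fitted, reference):
    """Shift every gene on array i by (reference - fitted[i])."""
    return [
        {gene: value + (reference - f) for gene, value in a.items()}
        for a, f in zip(arrays, fitted)
    ]

# Toy data: three arrays, two housekeeping genes, one gene of interest.
arrays = [
    {"HK1": 10.0, "HK2": 12.0, "GENE_X": 5.0},
    {"HK1": 12.0, "HK2": 14.0, "GENE_X": 7.0},
    {"HK1": 8.0, "HK2": 10.0, "GENE_X": 3.0},
]
averages = housekeeping_averages(arrays, ["HK1", "HK2"])
reference = median(averages)  # the study took the median over the first 67 arrays
fitted = averages             # ASSUMPTION: stand-in for the regression values
normalized = shift_arrays(arrays, fitted, reference)
```

In this toy case the three arrays differ only by a constant offset, so the shift brings `GENE_X` to the same value on every array.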
Details related to the development of an algorithm for identifying feature (gene) sets are described in Supplementary Methods. Because our main goal was to determine if mRNA data from exfoliated colonocytes have the potential to classify different colon cancer risk factors, we compared the obtained array data sets (termed A) with a set of 529 putative human colonic markers (termed B; refer to Supplementary Table S2). Using such prior biological knowledge, we investigated the set of genes common to the microarrays and putative colonic markers (A ∩ B). The number of common genes for various values of analysis parameters is given in Supplementary Table S3. Based on these results, we used a conservative approach that provided us with a subset of putative colonic biomarkers that have strong signal (k = 2), compared with the weaker CodeLink default condition (k = 1.5), and no more than one low signal spot (j = 1) in the entire data set. The microarray data can be grouped into various combinations of two different classes. This follows from the experimental design, which comprises risk factors: (+IR) and (−IR); four time points: baseline 1 (BL1), diet period 1 (DP1), baseline 2 (BL2), diet period 2 (DP2); and two diets: high legume/low glycemic index and control. These different groupings produced their respective sets of genes, which could be larger or smaller depending on the microarrays that were included in the corresponding groups or classes (Supplementary Table S4).
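As a rough illustration of how the parameter j restricts the common gene set, the following sketch keeps only marker genes with at most j low (L) flags across all arrays. The flag table, gene names, and the function `common_genes` are hypothetical, and the signal threshold k is not modeled here.

```python
def common_genes(flags, markers, j=1):
    """Return marker genes with at most j low ('L') flags across all arrays."""
    passed = {gene for gene, calls in flags.items() if calls.count("L") <= j}
    return sorted(passed & set(markers))

# Hypothetical quality flags for four arrays (G = good, L = near background).
flags = {
    "GENE_A": ["G", "G", "G", "G"],
    "GENE_B": ["G", "L", "G", "G"],
    "GENE_C": ["L", "L", "G", "L"],
    "GENE_D": ["G", "G", "G", "G"],  # passes quality control but is not a marker
}
markers = ["GENE_A", "GENE_B", "GENE_C"]  # stands in for set B

selected = common_genes(flags, markers, j=1)
strict = common_genes(flags, markers, j=0)
```

Lowering j shrinks the gene set, which is the trade-off summarized in Supplementary Table S3.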
For the purpose of identifying feature sets, we designed classifiers that categorize samples based on the expression values of the genes from the intersection of the array gene set and the colon biomarker list (A ∩ B). An important consideration is that the number of genes in the feature sets should be sufficiently small. Hence, we constructed the classifiers for feature sets of sizes 1, 2, and 3. Generally, there are two reasons why it is desirable to design classifiers involving small numbers of genes: (a) the limited number of samples often available in clinical studies makes classifier design and error estimation problematic for large feature sets (17); and (b) small gene sets facilitate design of practical immunohistochemical diagnostic panels. For similar reasons, simple classifiers are preferable for small samples; indeed, for small samples, if good classification is possible, then a simple classifier such as linear discriminant analysis (LDA) using a small number of genes will typically outperform a complex classifier (18).
Given a set of features on which to base a classifier, one has to address not only the classifier design from sample data but also the estimation of its error. When the number of potential feature sets is large, the key issue is whether a particular feature set provides good classification. Therefore, a concern is the precision with which the error of the designed classifier estimates the error of the optimal classifier. When data are limited, an error estimator may have a large variance and therefore may often be low even if it is approximately unbiased. This can produce many feature sets and classifiers with low error estimates. The algorithm used in this study mitigated this problem by applying the bolstered error estimation (19). It has advantages with respect to commonly used error estimators such as resubstitution, cross-validation, and bootstrap methods for error estimation in terms of speed and accuracy (bias and variance). Basically, this approach “bolsters” the original empirical distribution of the available data by means of suitable bolstering kernels placed at each data point location. The error can be computed analytically in some cases, such as in the case of LDA. The relatively small size of the gene set allows for the comparison of the errors of all of the possible feature sets of sizes 1, 2, and 3, thereby avoiding feature selection, which can be highly unreliable in small sample settings (20). The result of the overall approach is a list of “best” feature sets from among all possible feature sets. Hence, the best feature set is the one possessing minimum classification error. Because we only have data and not the underlying feature-label distributions, the errors have been estimated from the data. 
This approach takes into account that, in small-sample settings, we do not have much confidence in any single feature set and that it is much more likely that, if there is an adequate sized collection of good-performing feature sets, then there are likely to be some that perform well on the overall population (21).
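The overall procedure can be sketched in a few lines. This is a simplified illustration, not the algorithm of the Supplementary Methods: it fits an equal-prior LDA boundary for every feature set up to a given size and scores each one with an analytic bolstered resubstitution error, using spherical Gaussian kernels of one fixed width `sigma`. The fixed `sigma`, the toy data, and all function names are assumptions; the published estimator (19) derives the kernel size from the data.

```python
from itertools import combinations
from math import erf, sqrt

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination (assumes a is nonsingular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_lda(x0, x1):
    """Return (w, b) of the equal-prior LDA boundary w.x + b = 0."""
    d = len(x0[0])
    mu0 = [sum(p[i] for p in x0) / len(x0) for i in range(d)]
    mu1 = [sum(p[i] for p in x1) / len(x1) for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for pts, mu in ((x0, mu0), (x1, mu1)):
        for p in pts:
            for i in range(d):
                for j in range(d):
                    cov[i][j] += (p[i] - mu[i]) * (p[j] - mu[j])
    dof = len(x0) + len(x1) - 2  # pooled covariance
    cov = [[v / dof for v in row] for row in cov]
    w = solve(cov, [m1 - m0 for m0, m1 in zip(mu0, mu1)])
    b = -0.5 * sum(wi * (m0 + m1) for wi, m0, m1 in zip(w, mu0, mu1))
    return w, b

def bolstered_error(x0, x1, sigma):
    """Mean kernel mass falling on the wrong side of the LDA boundary."""
    w, b = fit_lda(x0, x1)
    norm = sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        return 0.5  # identical class means: the classifier is uninformative
    total = 0.0
    for pts, sign in ((x0, -1.0), (x1, 1.0)):
        for p in pts:
            margin = sign * (sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
            total += 0.5 * (1.0 + erf(-margin / (sigma * sqrt(2.0))))
    return total / (len(x0) + len(x1))

def rank_feature_sets(class0, class1, max_size=3, sigma=0.1):
    """Exhaustively score all feature sets of size 1..max_size, best first."""
    results = []
    for size in range(1, max_size + 1):
        for genes in combinations(range(len(class0[0])), size):
            x0 = [[s[g] for g in genes] for s in class0]
            x1 = [[s[g] for g in genes] for s in class1]
            results.append((bolstered_error(x0, x1, sigma), genes))
    return sorted(results)

# Toy data (rows = samples, columns = two hypothetical genes).
class0 = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]]
class1 = [[0.5, 1.6], [1.5, 2.4], [2.5, 3.6], [3.5, 4.4]]
ranked = rank_feature_sets(class0, class1, max_size=2, sigma=0.1)
```

In this toy example neither gene separates the classes well on its own, yet the pair ranks first with a much lower estimated error.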
Stool-based molecular diagnostic tests are emerging as important new approaches that have the potential of providing cost-effective, sensitive early detection of colorectal neoplasia. Details of many of the currently used and novel approaches have been recently reviewed (22). Because a single genetic product is unlikely to have sufficient detection sensitivity and specificity to be used as a “stand-alone” diagnostic test, a fecal-based DNA detection system that exploits the concept of chromosomal instability with mutations progressively accumulating in the adenomatous polyposis coli, p53 tumor suppressor genes, and the K-ras oncogene has been recently developed (23, 24). Publications in small trials (16-65 subjects) reported test sensitivity ranging from 62% to 91% for adenocarcinoma detection and 27% to 82% for adenoma detection, with specificity ranging from 93% to 98%. Validation of these preliminary data in a large (4,404 evaluated subjects) prospective colorectal cancer screening trial resulted in a sensitivity of 52% (95% confidence interval, 35-68%) for detection of adenocarcinoma and 15% (95% confidence interval, 12-19%) for detection of adenomas ≥1 cm, with double the sensitivity when the adenoma had dysplasia. Specificity for the fecal DNA test was 94% (23). Very recently, stool DNA test 2 and a novel digital melt curve assay, which targets more broadly informative markers, detected significantly more screen-relevant neoplasms compared with occult blood testing (25, 26). From these data, it is logical to assume that fecal DNA tests could serve as an intermediate, noninvasive screening tool for colorectal adenocarcinoma.
A major disadvantage of DNA-based methods is that it is inherently limited to a small number of hybridizing oligonucleotides, which reduces the likelihood that a neoplasia-associated mutation will be found in the large number and heterogeneity of mutational events occurring in human neoplasia. In addition, a fecal DNA testing panel using nucleotide probes will not detect important epigenetic events associated with human carcinogenesis. For example, epigenetic modifications of DNA (i.e., aberrant promoter hypermethylation) of multiple tumor suppressor genes lead to loss of expression (27). DNA-based methods do not detect these important molecular events. This severely limits the utility of current DNA-based assays. Recently, several attempts have been made to use DNA from stool to detect aberrant CpG island methylation (28, 29). Thus, it is possible that methylated genes may be effective early detection markers for colon adenomas, and offer another mechanistic approach that may increase performance characteristics of stool markers based on mutation detection alone.
To enhance current colon cancer molecular detection assays, our laboratory was first to develop a novel noninvasive molecular method using feces containing intact viable exfoliated colonocytes to quantify colonic mRNAs and determine gene expression profiles (9). Because “global” changes in patterns of gene expression occur throughout the colon well before macroscopic tumors are apparent (30, 31), these data suggest that “diagnostic” gene expression profiles are associated with a large number of shed cells, and hence, recovered cell number should not be a limiting factor (13).
In this feasibility study, our goal was to identify mRNA expression patterns that may establish the basis of a new noninvasive molecular diagnostic method. For this purpose, we applied an algorithm to 12 different pairs of classes arising from the experimental design as described in Fig. 1 and Supplementary Table S1. The number of genes/features for each linear classifier was limited to 3, which allowed for an exhaustive search. The use of small (three-gene) classifiers in the classification of cancer is not new (21, 32). As an initial step within the context of classification, we identified the best single genes (single-gene classifiers) to distinguish phenotype. To illustrate how this approach compares to the traditional statistical analysis, we considered the classes (+IR, +Polyps) versus (−IR, −Polyps) at BL1. The top 10 feature sets of size 1 were compared with the differentially expressed genes in the colonic biomarker set, where t tests were done using normalized and log-transformed gene intensity values. The comparison revealed that 7 of the 10 top one-feature sets (genes) identified by the linear (LDA) classifier also had P values <0.05. This is not surprising because individual differentially expressed genes have been traditionally used to discriminate between phenotypes (33). Interestingly, the results show that there are several cases where single genes can provide good (in terms of the error estimate) classification (Table 1). However, when comparing these results to the two-feature classification for the same two classes, a phenomenon was observed that has been recently documented in the context of gene network modeling (34). Specifically, the expression profiles of a group of genes predicted the target (either a gene or a phenotype) with greater accuracy relative to any proper subset of these genes.
For example, single-gene classifiers (one-feature) based on either the Homeobox protein-A3 (HOXA3) or uncoupling protein-2 (UCP2) performed very poorly when discriminating between (+IR, + Polyps) and (−IR, −Polyps) at BL1 (Table 1; Fig. 2A and B). Interestingly, HOXA3 was close to the worst predictor of all of the available 97 genes (ranked 94). In comparison, when combined as a two-feature set, UCP2 and HOXA3 provided one of the best two-feature classifiers (one misclassified data point only) among all of the 4,656 possible two-gene sets (Table 1; Fig. 2C). These data clearly illustrate why complex phenotypes can be explained better by multivariate feature sets.
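The size of the search space follows directly from the number of available genes. As a quick check, the count of 4,656 two-gene sets is the number of unordered pairs drawn from 97 genes, and the same calculation gives the number of three-gene sets covered by the exhaustive search:

```python
from math import comb

n_genes = 97  # genes available for this pair of classes
sets_by_size = {size: comb(n_genes, size) for size in (1, 2, 3)}
total_sets = sum(sets_by_size.values())
```

Here `sets_by_size[2]` is 4,656 and `sets_by_size[3]` is 147,440, which is small enough to evaluate every candidate rather than rely on a heuristic feature-selection step.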
To identify sets of genes that perform in a multivariate manner to provide strong classification, we specifically looked for pairs of genes that performed better than either of the genes individually, and triplets of genes that performed well and substantially better than the best-performing pair among the three, and so on. To estimate the improvements of the classification performance, we introduced two quantities for each feature set: εbolstered and Δ(εbolstered). εbolstered denotes the bolstered resubstitution error for the LDA classifier for the respective feature set, and Δ(εbolstered) denotes the largest decrease in error for the full feature set relative to all of its subsets. The feature sets were initially ranked based on the value of εbolstered, and subsequently ranked again based on the improvement Δ(εbolstered). For multiple-gene classifiers, we focused on feature sets with high rank in both lists. Along these lines, we designed two-feature classifiers for the classification of (+IR, +Polyps) versus (−IR, −Polyps) data at baseline BL1; (−IR, −Polyps, control diet) versus (−IR, −Polyps, legume diet) data at the end of the two diet periods DP1 and DP2; (+IR, +Polyps) versus (−IR, −Polyps) at baselines BL1 and BL2; (+Polyps) versus (−Polyps) at baselines BL1 and BL2; and (+IR) versus (−IR) at all of the time points. Tables 1 and 2 describe the best (according to this ranking procedure) feature sets identified for the first two of these classification categories, and Fig. 3A and B shows representative multivariate classifiers.
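One way to read the improvement measure is as the gap between the best-performing proper subset and the full feature set. The sketch below adopts that interpretation, which is an assumption rather than the definition in the Supplementary Methods. The error values are invented for illustration, and `delta_error` is a hypothetical helper.

```python
from itertools import combinations

def delta_error(errors, genes):
    """Error of the best proper subset minus the error of the full set."""
    full = errors[frozenset(genes)]
    subsets = [
        frozenset(s)
        for size in range(1, len(genes))
        for s in combinations(genes, size)
    ]
    return min(errors[s] for s in subsets) - full

# Invented bolstered error estimates for two hypothetical genes and their pair.
errors = {
    frozenset({"GENE_A"}): 0.49,
    frozenset({"GENE_B"}): 0.40,
    frozenset({"GENE_A", "GENE_B"}): 0.05,
}
improvement = delta_error(errors, ("GENE_A", "GENE_B"))
```

With these numbers the pair improves on its best single gene by about 0.35, so it would rank highly on both the error list and the improvement list.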
The results in Fig. 4 show that the two factors, IR and history of adenomas, should be considered in tandem when determining the risk for the patient. For example, combining baseline samples (BL1 and BL2) increased the classification error, indicating complications related to the crossover design (Fig. 4A). Similarly, the three-feature set LDA classifiers performed poorly when the classification was considered separately with respect to either one of the two experimental factors (IR) or (Polyps; Fig. 4B and C). The advantage of reporting the results in this way is that multivariate discriminatory power is revealed. This is clearly shown in Table 1 with regard to HOXA3. The gene did not appear on the single-gene list, indicating that the error of the respective classifier exceeded 0.3 (εbolstered = 0.4882). However, it appeared with UCP2, 14-3-3ζ (YWHAZ), insulin growth factor receptor-I (IGF1R), beclin-1 (BECN1), and mitogen-activated protein kinase-11 (MAPK11) genes in the two-gene and three-gene lists, which improved classification error. Interestingly, members of the homeoprotein family of transcription factors (HOXA3 and HOXC6) are developmental regulators of gastrointestinal growth, patterning, and differentiation (35). It is also noteworthy that YWHAZ and IGF1R are capable of regulating apoptosis and cell adhesion (36, 37); UCP2 promotes chemoresistance in cancer cells and mitochondrial Ca2+ sequestration (38, 39); BECN1 stimulates autophagy and inhibits tumor cell growth (40); and MAPK11 (p38β) mediates response to inflammatory cytokines and cellular stress (41). For comparative purposes, fold changes in select genes are presented in Supplementary Table S5.
Legumes and pulses are a rich source of fermentable dietary fibers, which are precursors to luminal butyrate (4). Butyrate has well-known anti-inflammatory and antineoplastic actions (6, 7). In addition, pulses have a low glycemic index (5). Some studies suggest that diets high in fiber and with a lower glycemic index may reduce risk of colorectal cancer and decrease inflammatory markers (4, 42, 43). Therefore, it was important to note that the approach applied in this study can be used to identify genes that are modulated by the consumption of a legume-rich diet (Table 2). Our data show that although transforming growth factor β (TGFβ), which plays a permissive role in cancer progression and wound repair (44, 45), is by itself a reasonable discriminator, when it is combined with HOXA3 and death-associated protein kinase (DAPK1), the error is significantly improved. These observations are worth noting in view of the fact that DAPK1 is an extremely pleiotropic molecule capable of influencing the propensity of cells to undergo autophagy (46). Moreover, it has been recently shown that dietary fiber (butyrate) can enhance TGFβ/Smad3-tumor suppressor signaling in the colon (47, 48). Considering that dietary legumes promote short-chain fatty acid production in the colonic lumen, it is probable that butyrate may have altered TGFβ expression. Clearly, additional studies are needed to elucidate the effect of legume consumption on TGFβ-dependent signaling.
The objective of this proof-of-principle study was to develop diagnostic gene sets for the noninvasive identification of different phenotypes. As opposed to using expression levels of either significantly increased or decreased genes, we applied novel mRNA-based noninvasive methods to identify the best single genes and two- to three-gene combinations for distinguishing polyps, IR, and exposure to a chemoprotective legume-enriched diet. Similar to previous studies (20, 21, 32, 49), we report that by using combinations of genes, the classification error rate can be significantly lowered. Two- and three-gene combinations thus provide robust classifiers with potential to noninvasively identify discriminative molecular signatures for differential diagnostic purposes. These findings provide insight into a new paradigm and support the development of noninvasive methods using exfoliated colonocytes to quantify colonic mRNAs. This strategy can be a complementary, and likely useful, approach to enhance current efforts to define colon cancer risk. In addition, because of a lack of genomic precision in defining clinically relevant phenotypes, two- and three-gene combinations may have application in personalized genomic medicine (e.g., the stratification of patients according to response to risk of recurrence in trials of adjuvant treatment of the disease). Further studies are needed to validate the prognostic power and reliability of this molecular diagnostic approach.
Supplemental Table 1. Overall structure of the microarray data set.
Supplemental Table 2. Final classifier gene list. Refer to attached 529 genes - XLS file.
Supplemental Table 3. Number of genes common to the set B of established colonic biomarkers and the spots on the microarray set that passed the quality threshold set by the parameters k and j. The value k=1.5 is the default value for the CodeLink image processing software, and j represents the number of accepted low (L) spots for a gene across all of the microarrays in the experiment.
Supplemental Table 4. Classification groups, sample size and number of common genes in each data set. BL1, baseline 1; BL2, baseline 2; +IR and −IR indicate presence or absence of insulin resistance, respectively. +Polyps and −polyps indicate the presence or absence of polyps, respectively.
Supplemental Table 5. Relative exfoliated cell gene expression levels in (+IR, +Polyps) vs (-IR, -Polyps) subjects at baseline 1 (BL1). Fold change represents the relative expression level in (+IR, +Polyps) subjects divided by (−IR, -Polyps) subjects for individual genes described in Table 1. p-values were computed using t-tests applied to the normalized data.
We thank Jennifer Goldsby (Genomics Core Facility, Texas A&M) for technical assistance.
Grant support: National Cancer Institute and NIH grants CA59034, CA129444, DK071707, and P30ES09106.
Note: Supplementary data for this article are available at Cancer Prevention Research Online (http://cancerprevres.aacrjournals.org/).
Disclosure of Potential Conflicts of Interest: No potential conflicts of interest were disclosed.