In this study we investigated the advantage of including network information in prioritizing disease genes for type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next, a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considers contributions both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome-wide association study (GWAS) and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels correlated with disease only in cohorts with low-risk HLA genotypes. The results suggest the importance of mapping disease gene networks for dissecting the genetics of complex diseases, and offer a general approach to network-based disease gene prioritization from multiple data sources.
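The naïve Bayesian integration step can be illustrated with a minimal sketch. Under the naïve (conditional independence) assumption, the posterior odds that a gene is disease-involved equal the prior odds multiplied by one likelihood ratio per data source. The source names, likelihood ratios, and prior below are hypothetical placeholders, not values or code from this study.

```python
import math

def probability_score(prior, likelihood_ratios):
    """Naive Bayes posterior probability from a prior and per-source likelihood ratios.

    prior: assumed prior probability that a gene is disease-involved.
    likelihood_ratios: P(evidence | involved) / P(evidence | not involved),
        one per data source, treated as conditionally independent.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)  # independence lets evidence add in log-odds space
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence for one gene from three data sources
lrs = {"expression": 3.0, "ppi_neighbors": 2.5, "literature": 1.2}
ps = probability_score(prior=0.01, likelihood_ratios=lrs.values())
print(f"PS = {ps:.4f}")  # PS = 0.0833
```

Working in log-odds keeps the product of many likelihood ratios numerically stable, and a source whose likelihood ratio is 1 leaves the score unchanged.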
It has been demonstrated that urokinase-type plasminogen activator (uPA) is involved in tumor cell metastasis by degrading the extracellular matrix. However, there is little direct clinical evidence of uPA system expression in peritoneal metastatic tissues of gastric cancer. The objective of this study was to investigate uPA system expression in peritoneal tissues of gastric cancer patients with and without peritoneal metastasis, and to explore the diagnostic value of the uPA system.
Expression of uPA, uPAR, and PAI-1 was measured by semi-quantitative RT-PCR and ELISA. uPA activity was measured using a uPA activity kit.
There was no significant difference in uPA, uPAR, or PAI-1 expression between the two types of peritoneal tissue from the seven patients with peritoneal metastasis. However, uPA, uPAR, and PAI-1 expression in peritoneal metastatic lesions was significantly higher than in normal peritoneal tissues from the 24 patients without peritoneal metastasis (P < 0.05). No statistically significant difference in uPA activity was observed among the different tissues.
Expression of the uPA system positively correlates with peritoneal metastasis of gastric cancer. This expression difference between patients with and without peritoneal metastasis may provide a reference for the diagnosis of peritoneal metastasis.
Gastric cancer; ELISA; Peritoneal metastasis; RT-PCR; uPA system
There has been growing interest in identifying context-specific active protein-protein interaction (PPI) subnetworks through the integration of PPI and time course gene expression data. However, the interaction dynamics during the biological process under study have not been sufficiently considered previously.
Here we propose a topology-phase locking (TopoPL) based scoring metric for identifying active PPI subnetworks from time series expression data. First, the temporal coordination in gene expression changes is evaluated through phase locking analysis. The results are then integrated with the PPI network to define an activity score for each PPI subnetwork, based on the expression of individual members as well as the topological characteristics of both the PPI network and the temporal coordination network of expression. Lastly, the top-scoring subnetworks in the whole PPI network are identified through a simulated annealing search.
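The final search step can be sketched as simulated annealing over connected node sets. The scoring function below is a hypothetical stand-in, not the TopoPL metric: it combines aggregated member z-scores with a bonus for the fraction of member pairs that are phase-locked. The toy network, z-scores, locked pairs, and annealing parameters are all illustrative assumptions.

```python
import math
import random

def is_connected(nodes, ppi):
    """Depth-first check that `nodes` form a connected subgraph of `ppi`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for v in ppi[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen == nodes

def score(nodes, z, locked, w=1.0):
    """Hypothetical activity score: sum(z)/sqrt(k) plus w * fraction of phase-locked pairs."""
    k = len(nodes)
    expr = sum(z[g] for g in nodes) / math.sqrt(k)
    pairs = k * (k - 1) / 2
    n_locked = sum(1 for a, b in locked if a in nodes and b in nodes)
    return expr + (w * n_locked / pairs if pairs else 0.0)

def anneal(ppi, z, locked, seed_node, steps=2000, t0=1.0, cooling=0.995, rng=None):
    """Grow/shrink a connected subnetwork, accepting moves by the Metropolis rule."""
    rng = rng or random.Random(0)
    current = {seed_node}
    cur_score = score(current, z, locked)
    best, best_score = set(current), cur_score
    t = t0
    for _ in range(steps):
        candidate = set(current)
        boundary = {v for u in current for v in ppi[u]} - current
        if boundary and (len(current) == 1 or rng.random() < 0.5):
            candidate.add(rng.choice(sorted(boundary)))  # add a neighbouring protein
        else:
            candidate.discard(rng.choice(sorted(current)))  # drop a member
        if is_connected(candidate, ppi):
            new_score = score(candidate, z, locked)
            # Always accept improvements; accept worse moves with probability exp(delta / t)
            if new_score >= cur_score or rng.random() < math.exp((new_score - cur_score) / t):
                current, cur_score = candidate, new_score
                if cur_score > best_score:
                    best, best_score = set(current), cur_score
        t *= cooling
    return best, best_score

# Toy data (hypothetical)
ppi = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B", "E"}, "E": {"D"}}
z = {"A": 2.0, "B": 1.5, "C": 1.8, "D": -1.0, "E": -0.5}
locked = [("A", "B"), ("A", "C"), ("B", "C")]
best, s = anneal(ppi, z, locked, seed_node="A")
print(sorted(best), round(s, 3))
```

Keeping the best state seen, rather than only the final state, guards against the chain drifting away from a good subnetwork before the temperature has dropped.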
Application of TopoPL to simulated data and to yeast cell cycle data showed that it identifies biologically meaningful subnetworks more sensitively than a method that uses only the static PPI topology or an additive scoring method. Using TopoPL, we identified a core subnetwork of 49 genes important to the yeast cell cycle. Interestingly, this core contains a protein complex known to be related to the arrangement of ribosome subunits, whose genes exhibit extremely high expression synchronization.
Inclusion of interaction dynamics is important for the identification of relevant gene networks.
Peritoneal metastasis of gastric cancer represents a widespread human health problem, but effective therapies with limited side effects are still lacking. Although previous research suggested that u-PA is involved in some types of tumor metastasis, such as lung-specific metastasis, the role of u-PA in peritoneal metastasis of gastric cancer is still unclear. The aim of this study was to explore whether selective pharmacological blockade of u-PA can affect peritoneal metastasis of gastric cancer both in vivo and in vitro.
In the present study, we evaluated the effects and explored the anti-tumor mechanisms of amiloride, a selective u-PA inhibitor, on a panel of gastric cancer cell lines and in a murine model of human gastric cancer MKN45.
The study showed that amiloride significantly inhibited tumor growth and prolonged the survival of tumor-bearing mice. In vitro, compared with controls, amiloride not only significantly down-regulated u-PA mRNA expression and protein levels in MKN45 cells in a dose-dependent manner, but also inhibited the adhesion of HMrSV5 cells and the migration and invasion of MKN45 cells.
The findings of our current report provide evidence that the selective u-PA inhibitor amiloride has potent effects against peritoneal metastasis of gastric cancer, suggesting its possible therapeutic value in the treatment of gastric cancer.
u-PA inhibitor; Amiloride; Peritoneal metastasis; Gastric cancer
Complex disorders often involve dysfunctions in multiple tissues and organs. Elucidating the communication among them is important to understanding disease pathophysiology. In this study we integrate gene expression data from multiple tissues and quantitative trait measurements from an obesity-induced diabetes mouse model with databases of molecular interaction networks to construct a cross-tissue trait-pathway network. The animals belong to two mouse strains (BTBR or B6), have one of two obesity statuses (obese or lean), and were studied at two ages (4 weeks and 10 weeks). Only the 10-week-old obese BTBR animals are diabetic. The expression data were first used to determine the state of every pathway in each tissue, which was subsequently used to construct a pathway co-expression network and to define trait-relevant and trait-linking pathways. Among the six tissues profiled, adipose tissue contains the largest number of trait-linking pathways. Among the eight traits measured, body weight and plasma insulin level have the largest numbers of relevant and linking pathways. Topological analysis of the trait-pathway network revealed that the glycolysis/gluconeogenesis pathway in liver and the insulin signaling pathway in muscle are of top importance to information flow in the network, with the highest degrees and betweenness centralities. Interestingly, pathways related to metabolism and oxidative stress actively interact with many other pathways in all animals, whereas among the 10-week-old animals, the inflammation pathways were preferentially interactive only in the diabetic ones. In summary, our method offers a systems approach to delineating disease-trait-relevant intra- and cross-tissue pathway interactions, and provides insights into the molecular basis of obesity-induced diabetes.
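The degree and betweenness measures used in the topological analysis can be computed with Brandes' algorithm for unweighted, undirected graphs. The sketch below assumes the trait-pathway network is already available as an edge list; the node names and edges are hypothetical and chosen only to give a small, checkable example.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def betweenness(graph):
    """Brandes' algorithm: unnormalized betweenness centrality, unweighted undirected graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted from both ends

# Hypothetical trait-pathway edges
edges = [("body_weight", "adipose:PPAR"), ("adipose:PPAR", "liver:glycolysis"),
         ("liver:glycolysis", "muscle:insulin_signaling"),
         ("muscle:insulin_signaling", "plasma_insulin"), ("liver:glycolysis", "liver:oxphos")]
g = build_graph(edges)
bc = betweenness(g)
for node in sorted(g, key=bc.get, reverse=True):
    print(node, len(g[node]), bc[node])
```

Betweenness counts how many shortest paths between other node pairs pass through a node, which is why it is a natural proxy for a pathway's importance to information flow.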
Inflammatory mediators associated with type 1 diabetes are dilute and difficult to measure in the periphery, necessitating development of more sensitive and informative biomarkers for studying diabetogenic mechanisms, assessing preonset risk, and monitoring therapeutic interventions.
RESEARCH DESIGN AND METHODS
We previously utilized a novel bioassay in which human type 1 diabetes sera were used to induce a disease-specific transcriptional signature in unrelated, healthy peripheral blood mononuclear cells (PBMCs). Here, we apply this strategy to investigate the inflammatory state associated with type 1 diabetes in biobreeding (BB) rats.
Consistent with their common susceptibility, sera of both spontaneously diabetic BB DRlyp/lyp and diabetes-inducible BB DR+/+ rats induced transcription of cytokines, immune receptors, and signaling molecules in PBMCs of healthy donor rats compared with control sera. Like the human type 1 diabetes signature, the DRlyp/lyp signature, which is associated with progression to diabetes, was differentiated from that of the DR+/+ by induction of many interleukin (IL)-1–regulated genes. Supplementing cultures with an IL-1 receptor antagonist (IL-1Ra) modulated the DRlyp/lyp signature (P < 10⁻⁶), while administration of IL-1Ra to DRlyp/lyp rats delayed onset (P = 0.007), and sera of treated animals did not induce the characteristic signature. Consistent with the presence of immunoregulatory cells in DR+/+ rats, their sera induced a signature containing negative regulators of transcription and inflammation.
Paralleling our human studies, serum signatures in BB rats reflect processes associated with progression to type 1 diabetes. Furthermore, these studies support the potential utility of this approach to detect changes in the inflammatory state during therapeutic intervention.
Bayesian network (BN) modeling is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by themselves suffer from high noise and a lack of power. Incorporating prior biological knowledge can improve performance. Because each type of prior knowledge may on its own be incomplete or limited by quality issues, it is desirable to integrate multiple sources of prior knowledge and exploit their consensus.
We introduce a new method to incorporate quantitative information from multiple sources of prior knowledge. It first uses a naïve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included co-citation in PubMed and semantic similarity of Gene Ontology annotations. A candidate network edge reservoir is then created, in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. During network simulation, a Markov chain Monte Carlo (MCMC) sampling algorithm samples from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcriptional regulations recovered, without a significant change in the false positive rate. In contrast, without prior knowledge, BN modeling was not always better than random selection, demonstrating the need to supplement gene expression data with additional information in network modeling.
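The edge reservoir can be sketched as a list in which each candidate edge is repeated in proportion to its estimated linkage likelihood, so that uniform draws from the list favour well-supported edges. The gene names, likelihoods, scale factor, and the rule of keeping at least one copy per edge are hypothetical choices for illustration. Note that a full Metropolis-Hastings sampler would also need to account for this non-uniform proposal distribution in its acceptance ratio.

```python
import random

def build_reservoir(linkage_prob, scale=100):
    """Repeat each candidate edge in proportion to its estimated linkage likelihood."""
    reservoir = []
    for edge, p in linkage_prob.items():
        copies = max(1, round(p * scale))  # assumption: keep every edge reachable
        reservoir.extend([edge] * copies)
    return reservoir

def propose_edge(reservoir, rng):
    """Uniform draw from the reservoir = likelihood-weighted draw over edges."""
    return rng.choice(reservoir)

# Hypothetical linkage likelihoods from the prior-knowledge classifier
linkage = {("geneA", "geneB"): 0.80, ("geneA", "geneC"): 0.05}
reservoir = build_reservoir(linkage)
rng = random.Random(42)
draws = [propose_edge(reservoir, rng) for _ in range(10000)]
print(draws.count(("geneA", "geneB")) / len(draws))  # close to 80/85
```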
Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in BN modeling of gene expression data, which significantly improves performance.
In nonlinear dynamic systems, synchrony through oscillation and frequency modulation is a general control strategy to coordinate multiple modules in response to external signals. Conversely, the synchrony information can be utilized to infer interaction. Increasing evidence suggests that frequency modulation is also common in transcription regulation.
In this study, we investigate the potential of phase locking analysis, a technique for studying synchrony patterns, in transcription network modeling of time course gene expression data. Using the yeast cell cycle data, we show that significant phase locking exists between transcription factors and their targets, between gene pairs with prior evidence of physical or genetic interactions, and among cell cycle genes. Compared with simple correlation, the phase locking metric identifies interacting gene pairs more efficiently. In addition, it automatically handles arbitrary time lags and different dynamic time scales among genes, without the need for alignment. Interestingly, many of the phase-locked gene pairs exhibit locking of higher order than 1:1, as well as significant phase lags relative to each other. Based on these findings, we propose a new phase locking metric for network reconstruction from time course gene expression data, and show that it efficiently identifies network modules with focused biological themes that are important to cell cycle regulation.
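A common way to quantify n:m phase locking is the synchronization index |mean(exp(i(nφ₁ − mφ₂)))|, which equals 1 when the generalized phase difference is constant and approaches 0 when it drifts uniformly. The sketch below is that standard index, not necessarily the exact metric proposed here, and it assumes the instantaneous phases have already been extracted (for example, via a Hilbert transform of detrended expression profiles). The synthetic phases are illustrative.

```python
import cmath

def phase_locking_index(phi1, phi2, n=1, m=1):
    """n:m synchronization index |mean(exp(i*(n*phi1 - m*phi2)))|, in [0, 1]."""
    if len(phi1) != len(phi2) or not phi1:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (n * a - m * b)) for a, b in zip(phi1, phi2))
    return abs(total) / len(phi1)

# Synthetic example: gene 2 oscillates twice as fast as gene 1, with a fixed lag
t = range(200)
phi1 = [0.1 * k for k in t]
phi2 = [0.2 * k + 0.7 for k in t]
print(phase_locking_index(phi1, phi2, n=2, m=1))  # 2:1 locking, index ~1
print(phase_locking_index(phi1, phi2, n=1, m=1))  # 1:1 locking absent, index small
```

Because the index depends only on the constancy of the phase difference, a fixed lag (0.7 rad here) does not reduce it, which is how time lags are handled without alignment.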
Our results demonstrate the potential of phase locking analysis in transcription network modeling. They also suggest the importance of understanding the dynamics underlying gene expression patterns.
We previously reported Matarray, a microarray image processing and data analysis package in which a quality score is defined for every spot to reflect the reliability and variability of the data acquired from that spot. In this article we present a new development in Matarray, in which the quality scores are incorporated as weights in the statistical evaluation and data mining of microarray data. With this approach, poor-quality data are filtered automatically through the reduction of their weights, eliminating both the need to manually flag or remove bad data points and the problem of missing values. More significantly, using a set of control clones spiked in at known input ratios ranging from 1:30 to 30:1, we find that quality-weighted statistics lead to more accurate gene expression measurements and more sensitive detection of expression changes, with significantly lower type II error rates. Further, we applied quality-weighted clustering to a time-course microarray data set and found that the new algorithm improves grouping accuracy. In summary, incorporating a quantitative quality measure of microarray data as a weight in complex data analysis improves reliability and convenience. In addition, it provides a practical way to deal with missing values when establishing automatic statistical tests.
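The idea of quality weighting can be sketched with a weighted mean and standard deviation over replicate spots. Matarray's actual weighting scheme is not given here; this sketch simply assumes per-spot quality scores in [0, 1] used directly as weights, with missing values and zero-weight spots dropping out automatically. The data are hypothetical.

```python
import math

def weighted_stats(values, weights):
    """Quality-weighted mean and (population-style) SD of replicate log-ratios.

    Spots with a missing value (None) or zero weight contribute nothing, so no
    manual flagging or imputation is needed.
    """
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None and w > 0]
    wsum = sum(w for _, w in pairs)
    if wsum == 0:
        return None, None
    mean = sum(w * x for x, w in pairs) / wsum
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / wsum
    return mean, math.sqrt(var)

# Hypothetical replicate log2 ratios with per-spot quality scores
log_ratios = [1.0, 1.2, None, 5.0]   # third spot missing, fourth is an artefact
quality = [0.9, 0.8, 0.7, 0.0]
mean, sd = weighted_stats(log_ratios, quality)
print(round(mean, 4), round(sd, 4))
```

A low but non-zero weight shrinks a spot's influence rather than discarding it outright, which is the practical difference from hard filtering.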
microarray; quality score; weighted algorithms; accurate expression measurement
Proteins that interact directly with each other tend to have similar functions and to be involved in the same cellular processes. Mutations in the genes encoding them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases using protein-protein interaction (PPI) information. However, such an approach is often considered too general to be practically useful for specific diseases.
In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). A total of 266 known disease genes and 983 positional candidate genes from the 18 established T1D linkage loci were compiled from T1Dbase (http://t1dbase.org). We found that the PPI network of known T1D genes has topological features distinct from those of other genes, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then defined the positional candidates that are first-degree PPI neighbours of the 266 known disease genes as new candidate disease genes. This yields a list of 68 genes for further study. Cross-validation using the known disease genes as a benchmark reveals that the enrichment is ~17.1-fold over random selection and ~4-fold better than using linkage information alone. We find that the new candidates are cited in T1D-related publications significantly more often than random genes (p<1e-7), even after excluding co-citation with the known disease genes, and that they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates.
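The selection rule (positional candidates that are first-degree PPI neighbours of known disease genes) and a simple fold-enrichment calculation can be sketched as follows. The gene identifiers and counts are hypothetical, and the enrichment function is a generic hit-rate ratio rather than the study's exact cross-validation procedure.

```python
def first_degree_candidates(ppi_edges, known, positional):
    """Positional candidates that directly interact with at least one known disease gene."""
    neighbors = set()
    for a, b in ppi_edges:
        if a in known and b not in known:
            neighbors.add(b)
        elif b in known and a not in known:
            neighbors.add(a)
    return neighbors & positional

def fold_enrichment(hits_selected, n_selected, hits_total, n_total):
    """Hit rate among selected genes divided by the hit rate among all candidates."""
    return (hits_selected / n_selected) / (hits_total / n_total)

# Hypothetical data
edges = [("K1", "P1"), ("K2", "P2"), ("P3", "X")]
known = {"K1", "K2"}
positional = {"P1", "P2", "P3"}
print(sorted(first_degree_candidates(edges, known, positional)))  # ['P1', 'P2']
print(fold_enrichment(2, 4, 5, 100))  # 0.5 / 0.05, i.e. 10-fold
```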
Our study demonstrates the potential of the PPI information in prioritizing positional candidate genes for T1D.
Members of the apolipoprotein gene cluster (APOA1/C3/A4/A5) on human chromosome 11q23 play an important role in lipid metabolism. Polymorphisms in both APOA5 and APOC3 are strongly associated with plasma triglyceride concentrations. The close genomic locations of these two genes, as well as their functional similarity, have hindered efforts to define whether each gene independently influences human triglyceride concentrations. In this study, we examined the linkage disequilibrium and haplotype structure of 49 SNPs in a 150-kb region spanning the gene cluster. We identified five common APOA5 haplotypes, each with a frequency greater than 8%, in samples of northern European origin. The APOA5 haplotype block did not extend beyond the 7 SNPs in the gene and was separated from the other apolipoprotein genes in the cluster by a region of significantly increased recombination. Furthermore, one previously identified triglyceride risk haplotype of APOA5 (APOA5*3) showed no association with three APOC3 SNPs previously associated with triglyceride concentrations, in contrast to the other risk haplotype (APOA5*2), which was associated with all three minor APOC3 SNP alleles. These results highlight the complex genetic relationship between APOA5 and APOC3 and support the notion that APOA5 is an independent risk gene affecting plasma triglyceride concentrations in humans.
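The four-gamete test listed among the keywords can be sketched on phased haplotypes: if all four allele combinations occur at a pair of biallelic SNPs, then under the infinite-sites model at least one historical recombination (or recurrent mutation) must have occurred between them. The haplotypes below are hypothetical, coded as strings of 0/1 alleles.

```python
from itertools import combinations

def four_gamete_test(haplotypes, i, j):
    """True if all four allele combinations occur at SNPs i and j."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4

def recombination_evidence(haplotypes):
    """All SNP pairs (i, j), i < j, that pass the four-gamete test."""
    n = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n), 2)
            if four_gamete_test(haplotypes, i, j)]

# Hypothetical phased haplotypes over four SNPs
haps = ["0011", "0110", "1010", "1101"]
print(recombination_evidence(haps))  # [(0, 1), (0, 3), (1, 3)]
```

Block boundaries are often drawn where pairs of adjacent SNPs start failing to be compatible under this test, which is one way a region of increased recombination shows up.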
Single nucleotide polymorphism; Apolipoprotein A5; Haplotype; Linkage disequilibrium; Recombination; Four-gamete test
Several studies have confirmed an increasing rate of type 1 diabetes mellitus (T1DM) in children and its link with increasing BMI at diagnosis, termed the ‘accelerator hypothesis’. Our objective was to assess whether the changing incidence of type 1 diabetes in a group of children and adolescents from the Midwestern United States was associated with changes in BMI.
Data from 1618 newly diagnosed children and adolescents (<19 years; 52.1% male, 47.9% female) with T1DM, admitted to Children's Hospital of Wisconsin (CHW) between January 1995 and December 2004, were analyzed in relation to body mass index (BMI) standard deviation score (SDS).
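The abstract does not state how BMI SDS was derived. One common approach, shown here only as an assumption, is Cole's LMS transformation with age- and sex-specific reference parameters (L, M, S) taken from a growth reference such as the CDC 2000 charts. The parameter values below are hypothetical, not reference values.

```python
import math

def lms_z_score(x, L, M, S):
    """Cole's LMS transformation of a measurement x to a z-score (SDS).

    z = ((x / M) ** L - 1) / (L * S), or ln(x / M) / S in the limit L == 0.
    """
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

# Hypothetical reference parameters for one age/sex stratum
L, M, S = -1.5, 16.0, 0.10
for bmi in (14.0, 16.0, 20.0):
    print(bmi, round(lms_z_score(bmi, L, M, S), 2))
```

The L (Box-Cox power) term corrects for the skewness of the BMI distribution, so that the resulting SDS is approximately standard normal within each age and sex group.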
An overall 10-year cumulative incidence of 27.92 per 100,000 (19.12 to 41.72 per 100,000) was observed, with an average yearly cumulative incidence of 2.39%. The increase was largest in the younger age groups, with the 0–4, 5–9, and 10–14 year age groups having average yearly increases of 2.4, 2.3, and 3.0%, respectively, corresponding to relative 10-year increases of 25.3, 33.8, and 38.0%, respectively. Age at diagnosis was inversely correlated with BMI SDS (p<0.001), and the correlation remained significant for both males and females.
Annual incidence of T1DM increased two-fold at CHW over the 10-year study period. The majority of the increase was observed in the youngest age groups, which also appeared to be the heaviest. This research adds to the growing literature supporting the hypothesis that excess weight gain during childhood may be a risk factor for early manifestation of T1DM.
The effects of AC field exposure on the viability and proliferation of mammalian cells under conditions appropriate for their dielectrophoretic manipulation and sorting were investigated using DS19 murine erythroleukemia cells as a model system. The frequency range 100 Hz-10 MHz and medium conductivities of 10 mS/m, 30 mS/m and 56 mS/m were studied for fields generated by applying signals of up to 7V peak to peak (p-p) to a parallel electrode array having equal electrode widths and gaps of 100 μm. Between 1 kHz and 10 MHz, cell viability after up to 40 min of field exposure was found to be above 95% and cells were able to proliferate. However, cell growth lag phase was extended with decreasing field frequency and with increasing voltage, medium conductivity and exposure duration. Modified growth behavior was not passed on to the next cell passage, indicating that field exposure did not cause permanent alterations in cell proliferation characteristics. Cell membrane potentials induced by field exposure were calculated and shown to be well below values typically associated with cell damage. Furthermore, medium treated by field exposure and then added to untreated cells produced the same modifications of growth as exposing cells directly, and these modifications occurred only when the electrode polarization voltage exceeded a threshold of ~0.4 V p-p. These findings suggested that electrochemical products generated during field exposure were responsible for the changes in cell growth. Finally, it was found that hydrogen peroxide was produced when sugar-containing media were exposed to fields and that normal cell growth could be restored by addition of catalase to the medium, whether or not field exposure occurred in the presence of cells. These results show that AC fields typically used for dielectrophoretic manipulation and sorting of cells do not damage DS19 cells and that cell alterations arising from electrochemical effects can be completely mitigated.
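The induced membrane potential calculation can be sketched with the first-order Schwan-type model for a spherical cell, ΔΨ = 1.5·E·r·cos θ / √(1 + (ωτ)²), using the common approximation τ = r·C_m·(ρ_in + ρ_out/2) with ρ = 1/σ. This is an assumed model, not necessarily the one used in the study; the cell radius, cytoplasmic conductivity, membrane capacitance, and the crude uniform-field estimate E ≈ V_peak / gap are hypothetical, since the real field above an electrode array is non-uniform.

```python
import math

def induced_membrane_potential(E, r, f, sigma_in, sigma_out, c_mem, theta=0.0):
    """Peak transmembrane potential (V) induced on a spherical cell by an AC field.

    1.5*E*r*cos(theta) / sqrt(1 + (omega*tau)^2), with
    tau = r*c_mem*(1/sigma_in + 1/(2*sigma_out)).
    Units: E in V/m, r in m, f in Hz, conductivities in S/m, c_mem in F/m^2.
    """
    tau = r * c_mem * (1.0 / sigma_in + 1.0 / (2.0 * sigma_out))
    omega = 2.0 * math.pi * f
    return 1.5 * E * r * math.cos(theta) / math.sqrt(1.0 + (omega * tau) ** 2)

# Hypothetical inputs: 7 V p-p (3.5 V peak) across a 100 um gap, 30 mS/m medium
E = 3.5 / 100e-6
for f in (1e3, 1e5, 1e7):
    v = induced_membrane_potential(E, r=5e-6, f=f, sigma_in=0.5, sigma_out=0.03, c_mem=0.01)
    print(f"{f:>10.0f} Hz: {1000 * v:.1f} mV")
```

The 1/√(1 + (ωτ)²) factor captures why the induced potential falls off at high frequency: the membrane capacitance is progressively short-circuited.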
AC field exposure; Lag in cell growth; Hydrogen peroxide; Catalase; Dielectrophoresis; Electrode polarization
The specific membrane capacitance and conductivity of mammalian cells, which reflect their surface morphological complexity and membrane barrier function, respectively, have been shown to respond to physiologic and pathologic changes in cells. Here, the effects of induced apoptosis on these membrane properties of cultured human promyelocytic HL-60 cells are reported. Changes in membrane capacitance and conductivity were deduced from measurements of cellular dielectrophoretic crossover frequencies following treatment with genistein (GEN). The apparent specific membrane capacitance of HL-60 cells fell from an initial value of 17.6±0.9 to 9.1±0.5 mF/m² 4 h after treatment. Changes began within minutes of treatment and preceded both the externalization of phosphatidylserine (PS), as gauged by the Annexin V assay, and the appearance of a sub-G1 cell subpopulation, as determined through ethidium bromide staining of DNA. Treatment with the broad-spectrum caspase inhibitor N-benzyloxycarbonyl-Val-Ala-Asp(O-methyl)-fluoromethylketone (zVAD-fmk) did not prevent these early membrane dielectric responses, suggesting that the caspase system was not involved. Although membrane conductivity did not change during the first 4 h of GEN treatment, it rose significantly and progressively thereafter. Finally, as the barrier function failed and the cells became necrotic, it increased by many orders of magnitude. The effective membrane capacitance and conductivity findings focus attention on the membrane as a site of early participation in apoptosis. In conjunction with our prior reports on the use of dielectric methods for cell manipulation and separation, these results demonstrate that dielectrophoretic technologies should be applicable to the rapid detection, separation, and quantification of normal, apoptotic, and necrotic cells from cell mixtures.
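Deducing membrane capacitance from the crossover frequency can be sketched with the widely used single-shell approximation f_xo = √2·σ_medium / (2π·r·C_mem), valid when membrane conductance is negligible. Whether the authors used exactly this relation is not stated here; the cell radius and medium conductivity below are hypothetical. With r and σ_medium held fixed, the reported drop from 17.6 to 9.1 mF/m² corresponds to the crossover frequency rising by a factor of about 17.6/9.1 ≈ 1.93.

```python
import math

def membrane_capacitance(f_xo, r, sigma_medium):
    """Specific membrane capacitance (F/m^2) from the DEP crossover frequency (Hz).

    C = sqrt(2) * sigma_medium / (2*pi*r*f_xo); single-shell model,
    negligible membrane conductance. r in m, sigma_medium in S/m.
    """
    return math.sqrt(2.0) * sigma_medium / (2.0 * math.pi * r * f_xo)

def crossover_frequency(c_mem, r, sigma_medium):
    """Inverse relation: predicted crossover frequency (Hz) for a given capacitance."""
    return math.sqrt(2.0) * sigma_medium / (2.0 * math.pi * r * c_mem)

# Hypothetical cell radius and medium conductivity
r, sigma = 6e-6, 0.01
f_before = crossover_frequency(17.6e-3, r, sigma)   # 17.6 mF/m^2
f_after = crossover_frequency(9.1e-3, r, sigma)     # 9.1 mF/m^2
print(round(f_before), round(f_after), round(f_after / f_before, 3))
```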
Apoptosis; Dielectrophoresis; Membrane capacitance; Membrane conductance; DEP crossover method; Detection of apoptotic cells
Type 1 diabetes (T1D) is a T-cell mediated autoimmune disease targeting the insulin-producing pancreatic β cells. Naturally occurring FOXP3+CD4+CD25high regulatory T cells (Tregs) play an important role in dominant tolerance, suppressing autoreactive CD4+ effector T cell activity. Previously, in both recent-onset T1D patients and β cell antibody-positive at-risk individuals, we observed increased apoptosis and decreased function of polyclonal Tregs in the periphery. Our objective here was to elucidate the genes and signaling pathways triggering apoptosis in Tregs from T1D subjects.
Gene expression profiles of unstimulated Tregs from recent-onset T1D (n = 12) and healthy control subjects (n = 15) were generated. Statistical analysis was performed using a Bayesian approach that is highly efficient in determining differentially expressed genes with a low number of replicate samples in each of the two phenotypic groups. Microarray analysis showed that several cytokine/chemokine receptor genes, HLA genes, GIMAP family genes, and cell adhesion genes were downregulated in Tregs from T1D subjects, relative to control subjects. Several downstream target genes of the AKT and p53 pathways were also upregulated in T1D subjects, relative to controls. Further, expression signatures and increased apoptosis in Tregs from T1D subjects partially mirrored the response of healthy Tregs under conditions of IL-2 deprivation. CD4+ effector T-cells from T1D subjects showed a marked reduction in IL-2 secretion. This could indicate that, prior to and during the onset of disease, Tregs in T1D may be caught up in a relatively deficient cytokine milieu.
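The specific Bayesian method is not named here. One well-known scheme of this type, shown purely as an illustration, is the regularized t-test of Baldi and Long (as implemented in Cyber-T), which shrinks each gene's sample variance toward a background variance: σ² = (ν₀σ₀² + (n − 1)s²) / (ν₀ + n − 2). The background variances, the pseudo-count ν₀, and the expression values below are hypothetical.

```python
import math
from statistics import mean, variance

def regularized_var(sample_var, n, prior_var, nu0):
    """Posterior variance shrinking a gene's sample variance toward a background variance."""
    return (nu0 * prior_var + (n - 1) * sample_var) / (nu0 + n - 2)

def bayes_t(x, y, prior_var_x, prior_var_y, nu0=10):
    """Two-group t statistic using regularized (shrunken) variances."""
    nx, ny = len(x), len(y)
    vx = regularized_var(variance(x), nx, prior_var_x, nu0)
    vy = regularized_var(variance(y), ny, prior_var_y, nu0)
    return (mean(x) - mean(y)) / math.sqrt(vx / nx + vy / ny)

# Hypothetical log2 expression of one gene in T1D and control Tregs
t1d = [5.1, 5.4, 4.9, 5.3]
control = [6.0, 6.2, 5.8, 6.1]
print(round(bayes_t(t1d, control, prior_var_x=0.05, prior_var_y=0.05), 2))
```

With few replicates, a gene's sample variance can be spuriously small; borrowing strength from a background variance guards against inflated t statistics from that source.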
In summary, expression signatures in Tregs from T1D subjects reflect a cellular response that leads to increased sensitivity to apoptosis, partially due to cytokine deprivation. Further characterization of these signaling cascades should enable the detection of genes that can be targeted for restoring Treg function in subjects predisposed to T1D.
For a dense set of genetic markers, such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive for reducing data complexity and increasing statistical power. However, because the underlying disease variant is unknown, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptibility locus (DSL), which may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, using a retrospective likelihood to adjust for case-control sampling ascertainment and to appropriately handle the Hardy-Weinberg equilibrium constraint, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and simultaneously estimate the joint haplotype frequencies of the DSL and markers. With the latent variable describing a flexible role for the DSL, the likelihood ratio statistic provides a joint association test for the set of markers without requiring adjustment for testing multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods.
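The full method fits a penetrance model with a latent DSL under a retrospective likelihood, which is beyond a short example. The sketch below shows only the core EM idea for haplotype frequency estimation from unphased genotypes at two biallelic SNPs, under Hardy-Weinberg equilibrium: the phase of a double heterozygote is the missing information, split in the E-step between its two possible phasings. The genotype data are hypothetical.

```python
from itertools import product

HAPS = ["".join(h) for h in product("01", repeat=2)]  # "00", "01", "10", "11"
ALLELES = {0: "00", 1: "01", 2: "11"}  # allele-1 count -> alleles on the two chromosomes

def resolve(g1, g2):
    """Two haplotypes for a genotype (allele-1 counts at SNP1, SNP2), or None if phase-ambiguous."""
    if g1 == 1 and g2 == 1:
        return None  # double heterozygote: 00/11 or 01/10
    a1, a2 = ALLELES[g1], ALLELES[g2]
    return [a1[0] + a2[0], a1[1] + a2[1]]

def em_haplotype_freqs(genotypes, iters=100, tol=1e-10):
    """EM estimate of two-SNP haplotype frequencies assuming Hardy-Weinberg equilibrium."""
    freqs = {h: 0.25 for h in HAPS}
    n_chrom = 2 * len(genotypes)
    for _ in range(iters):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            haps = resolve(g1, g2)
            if haps is not None:
                for h in haps:
                    counts[h] += 1.0
            else:
                # E-step: expected share of the cis (00/11) versus trans (01/10) phasing
                cis = freqs["00"] * freqs["11"]
                trans = freqs["01"] * freqs["10"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["00"] += w
                counts["11"] += w
                counts["01"] += 1.0 - w
                counts["10"] += 1.0 - w
        new = {h: c / n_chrom for h, c in counts.items()}  # M-step
        converged = max(abs(new[h] - freqs[h]) for h in HAPS) < tol
        freqs = new
        if converged:
            break
    return freqs

# Hypothetical genotypes: 4 x (0,0), 4 x (2,2), 2 double heterozygotes
genos = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
print({h: round(p, 4) for h, p in em_haplotype_freqs(genos).items()})
```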
haplotype association; retrospective likelihood; latent variable; logistic mixture model; EM algorithm
Insulin, the principal hormone regulating blood glucose, is released through the oscillatory bursting of pancreatic islets. Increasing evidence indicates the importance of islet morphostructure to its function, and the need for quantitative investigation. We recently studied this problem from the perspective of islet insulin bursting, using a new 3D hexagonal closest packing (HCP) model of islet structure that we developed. A quantitative, non-linear dependence of islet function on its structure was found. In this study, we further investigate two key structural measures: the number of neighboring cells to which each β-cell is coupled, nc, and the coupling strength, gc.
β-cell clusters of different sizes, with the number of β-cells nβ ranging from 1 to 343, nc from 0 to 12, and gc from 0 to 1000 pS, were simulated. Three functional measures of islet bursting were quantified: the fraction of bursting β-cells fb, the synchronization index λ, and the bursting period Tb. The results revealed a hyperbolic dependence on the combined effect of nc and gc. From this, we propose defining a dimensionless cluster coupling index (CCI) as a composite measure of islet morphostructural integrity. We show that the robustness of islet oscillatory bursting depends on CCI, with all three functional measures (fb, λ and Tb) increasing monotonically with CCI when it is small and plateauing around CCI = 1.
CCI is a good predictor of islet function. It has the potential to link islet structure and function, and to provide insight for identifying therapeutic targets for the preservation and restoration of islet β-cell mass and function.
Extracting biological insight from microarray data is important but challenging. Here we describe TAPPA, a Java-based tool for identifying phenotype-associated genetic pathways using pathway topological measures. This is achieved by first calculating a Pathway Connectivity Index (PCI) for each pathway and then evaluating its correlation with phenotypic variation. Our PCI definition not only efficiently captures the contributions of genes that show subtle but consistent changes in expression, but also naturally gives more weight to hub genes that interact with a large number of other genes in the pathway. TAPPA also allows evaluation of sub-modules within a pathway and their association with phenotypes.
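The PCI formula itself is not given here, so the sketch below uses an assumed form that has the two properties described. Each interacting gene pair (and each gene with itself) contributes sign(xᵢ + xⱼ)·√|xᵢ|·√|xⱼ| for standardized expression values x in one sample. The square root boosts small but consistent changes relative to their raw size, and hub genes appear in more pairs, so they contribute more terms. The genes and interactions are hypothetical; per-sample scores would then be correlated with the phenotype.

```python
import math

def pathway_connectivity_index(expr, adjacency):
    """Assumed topology-weighted pathway score for one sample.

    expr: dict gene -> standardized expression value in this sample.
    adjacency: set of frozenset({g1, g2}) interactions within the pathway.
    Sums sign(x_i + x_j) * sqrt(|x_i|) * sqrt(|x_j|) over interacting pairs
    (each counted once) and over each gene paired with itself.
    """
    genes = list(expr)
    total = 0.0
    for i, gi in enumerate(genes):
        for gj in genes[i:]:
            if gi != gj and frozenset((gi, gj)) not in adjacency:
                continue  # only self-terms and direct interactions contribute
            xi, xj = expr[gi], expr[gj]
            s = xi + xj
            sign = (s > 0) - (s < 0)
            total += sign * math.sqrt(abs(xi)) * math.sqrt(abs(xj))
    return total

# Hypothetical three-gene pathway with one interaction (A-B)
sample = {"A": 1.0, "B": 4.0, "C": -1.0}
edges = {frozenset(("A", "B"))}
print(pathway_connectivity_index(sample, edges))  # 1 + 4 - 1 + 2 = 6.0
```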
Oscillatory insulin release is fundamental to normal glycemic control. The basis of the oscillation is the intercellular coupling and bursting synchronization of β cells in each islet. The functional role of islet β-cell mass organization with respect to its oscillatory bursting is not well understood. This is of special interest in view of the recent finding of islet cytoarchitectural differences between humans and animal models. In this study we developed a new hexagonal closest packing (HCP) cell cluster model. The model captures real islet cell organization more accurately than the conventionally used simple cubic packing (SCP) cluster. Using our new model, we investigated the functional characteristics of β-cell clusters, including the fraction of cells able to burst fb, the synchronization index λ of the bursting β cells, the bursting period Tb, the plateau fraction pf, and the amplitude of intracellular calcium oscillation [Ca]. We determined their dependence on cluster architectural parameters, including the number of cells nβ, the number of inter-β-cell couplings of each β cell nc, and the coupling strength gc. We found that at low values of nβ, nc and gc, oscillation regularity improves as their values increase. This functional gain plateaus around their physiological values in real islets, at nβ∼100, nc∼6 and gc∼200 pS. In addition, normal β-cell clusters are robust against significant perturbation of their architecture, including the presence of non-β cells or dead β cells. In clusters with nβ > ~100, coordinated β-cell bursting can be maintained with up to 70% β-cell loss, which is consistent with laboratory and clinical findings on islets. Our results suggest that the bursting characteristics of a β-cell cluster depend quantitatively on its architecture in a non-linear fashion.
These findings are important to understand the islet bursting phenomenon and the regulation of insulin secretion, under both physiological and pathological conditions.
Quantitative microarray data that are sensitive to very small differences in target sequence would be useful in any setting where a sample consists of multiple related sequences present at various abundances. Examples include the measurement of quasispecies in viral infections and of the antibody or T cell receptor species that make up immune repertoires. Such a method must account for cross-hybridization and for differences in hybridization efficiency between the arrayed probes and their corresponding targets. We used the memory T cell repertoire to an influenza-derived peptide as a test case for developing such a method.
The arrayed probes corresponded to a 17-nucleotide TCR-specific region and distinguished sequences differing by as little as a single nucleotide. Differences in hybridization efficiency among the highly related Cy5-labeled subject sequences were normalized by including an equimolar mixture of Cy3-labeled synthetic targets representing all 108 arrayed probes. The same synthetic targets were used to measure the degree of cross-hybridization between probes. Reconstitution studies showed that the system was sensitive to input ratios as low as 0.5% and accurate in measuring known input percentages (R² = 0.81, R = 0.90, p < 0.0001). A data-handling protocol was developed to incorporate the differences in hybridization efficiency. To validate the array for T cell repertoire analysis, we used it to analyze recall responses to influenza in three human subjects and compared the results with those of traditional cloning and sequencing. The rank order of clonotype abundance did not differ significantly between the two methods (Wilcoxon rank-sum test, p > 0.05).
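The efficiency normalization and cross-hybridization correction described above can be sketched as a small linear unmixing problem. This is a hypothetical illustration, not the authors' published data-handling protocol. It assumes that each probe's Cy3 signal from the equimolar synthetic-target mixture acts as a pure per-probe efficiency factor, that a cross-hybridization matrix `cross_hyb[i][j]` (signal at probe i per unit of target j, on the efficiency-corrected scale) has been measured with the individual synthetic targets, and that signals add linearly. All function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def clonotype_percentages(cy5, cy3_reference, cross_hyb):
    """Estimate the percentage of each clonotype from subject (Cy5) probe signals."""
    # 1. Correct probe-specific hybridization efficiency with the equimolar Cy3 reference.
    corrected = [s / r for s, r in zip(cy5, cy3_reference)]
    # 2. Unmix cross-hybridization, assuming corrected = cross_hyb @ abundance.
    abundance = solve_linear(cross_hyb, corrected)
    abundance = [max(0.0, v) for v in abundance]  # clip small negative estimates
    total = sum(abundance)
    return [100.0 * v / total for v in abundance]
```

Clipping negative estimates to zero is the simplest choice; a non-negative least-squares solver would be a more principled alternative when there are more probes than clonotypes or when the matrix is noisy.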
This novel strategy appears to be robust and can be adapted to any situation where complex mixtures of highly similar sequences need to be quantitatively resolved.
Gene expression profiling using microarrays has become an important genetic tool. Spotted arrays prepared in academic labs offer low cost and high flexibility in design and content, but are often limited by their susceptibility to quality control (QC) problems. We previously reported a novel three-color microarray technology that enabled QC of array fabrication. In this report we further investigated its advantages for spot-level data QC.
We found that an inadequate amount of bound probe available for hybridization led to significant, gene-specific compression of ratio measurements, increased data variability, and printing-pin-dependent heterogeneity. The impact of these problems can be captured by defining quality scores and efficiently controlled through quality-dependent filtering and normalization. We compared gene expression measurements derived from our data processing pipeline with the known input ratios of spiked-in control clones and with measurements by quantitative real-time RT-PCR. In each case, highly linear relationships (R² > 0.94) were observed, with modest compression in the microarray measurements (correction factor < 1.17).
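One way to express a compression correction factor like the one quoted above is as the reciprocal of the slope of measured versus known log ratios for spiked-in controls, after low-quality spots are discarded. The Python sketch below assumes that definition, a slope fitted through the origin, and a generic per-spot quality score in [0, 1] with a hypothetical cut-off of 0.5. It is not the authors' pipeline, and the function names are hypothetical.

```python
import math

def compression_factor(known_ratios, measured_ratios, quality, min_quality=0.5):
    """Estimate ratio compression from spiked-in controls.

    Spots with quality below min_quality are dropped. The slope of
    log2(measured) against log2(known) is fitted through the origin;
    the correction factor is 1 / slope (> 1 means compressed data).
    """
    pairs = [(math.log2(k), math.log2(m))
             for k, m, q in zip(known_ratios, measured_ratios, quality)
             if q >= min_quality]
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxx / sxy  # equals 1 / slope, with slope = sxy / sxx

def decompress(log2_ratio, factor):
    """Rescale a measured log2 ratio by the estimated correction factor."""
    return log2_ratio * factor
```

A factor of 1 means no compression; under this definition, a factor of 1.17 corresponds to measured log ratios at about 85% of their true magnitude.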
Our analytical and technical advances in microarray processing enabled better dissection of the sources of data variability and hence more efficient QC. With these advances, highly accurate gene expression measurements can be achieved using cDNA microarray technology.
The frequently sampled intravenous glucose tolerance test (FSIVGTT) and its mathematical model, the minimal model (MINMOD), have become important clinical tools for evaluating the metabolic control of glucose in humans. A dimensional analysis of the model has not been available until now.
A formal dimensional analysis of MINMOD was carried out and its degrees of freedom were examined. By re-expressing all state variables and parameters in terms of their reference scales, MINMOD was transformed into a dimensionless form. Previously defined physiological indices, including insulin sensitivity, glucose effectiveness, and first- and second-phase insulin responses, were re-examined in this new formulation. Further, parameter estimation from FSIVGTT data was implemented using both the dimensional and the dimensionless formulations of MINMOD, and their performance was compared using Monte Carlo simulations as well as real human FSIVGTT data.
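The effect of rescaling can be illustrated on the glucose equations of the minimal model, written here in their commonly cited form dG/dt = -(S_G + X)G + S_G·G_b and dX/dt = -p2·X + p3·(I - I_b) with p3 = p2·S_I. The sketch below is hypothetical: it drives the glucose equations with a prescribed insulin profile instead of the insulin kinetics of the full MINMOD, uses made-up parameter values, and applies an illustrative scaling (g = G/G_b, x = X/p2, τ = p2·t, i = (I - I_b)/I_ref) that is not necessarily the one used in this study. Under this scaling, the five constants S_G, S_I, p2, G_b and G_0 collapse into three dimensionless ones: σ = S_G/p2, κ = S_I·I_ref/p2 and g(0) = G_0/G_b.

```python
import math

def rk4(f, y0, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with the classic fourth-order Runge-Kutta method."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [v + h / 2 * k for v, k in zip(y, k1)])
        k3 = f(t + h / 2, [v + h / 2 * k for v, k in zip(y, k2)])
        k4 = f(t + h, [v + h * k for v, k in zip(y, k3)])
        y = [v + h / 6 * (a + 2 * b + 2 * c + d)
             for v, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

# Hypothetical parameter values, chosen only for illustration.
S_G, S_I, P2 = 0.02, 5e-4, 0.03   # 1/min, (1/min)/(uU/mL), 1/min
G_B, G_0 = 90.0, 300.0            # basal and initial glucose, mg/dL
I_REF, T_DECAY = 100.0, 20.0      # insulin scale (uU/mL) and decay time (min)

def insulin_excess(t):
    """Assumed prescribed insulin profile above basal, I(t) - I_b."""
    return I_REF * math.exp(-t / T_DECAY)

def dimensional_rhs(t, y):
    G, X = y
    return [-(S_G + X) * G + S_G * G_B,
            -P2 * X + P2 * S_I * insulin_excess(t)]  # p3 = p2 * S_I

# Dimensionless groups for g = G/G_b, x = X/p2, tau = p2*t, i = (I - I_b)/I_REF.
SIGMA = S_G / P2
KAPPA = S_I * I_REF / P2

def dimensionless_rhs(tau, y):
    g, x = y
    i = insulin_excess(tau / P2) / I_REF
    return [-(SIGMA + x) * g + SIGMA, -x + KAPPA * i]

if __name__ == "__main__":
    G_end, _ = rk4(dimensional_rhs, [G_0, 0.0], 0.0, 180.0, 3600)
    g_end, _ = rk4(dimensionless_rhs, [G_0 / G_B, 0.0], 0.0, P2 * 180.0, 3600)
    print(G_end, G_B * g_end)  # both formulations give the same glucose, up to rounding
```

Because the scaling is linear and the step sizes correspond (Δτ = p2·Δt), the two integrations agree to rounding error; what changes is the number of constants that a fitting routine must explore.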
MINMOD was found to have seven degrees of freedom (DOF). The model was maximally simplified in the dimensionless formulation, which normalizes the variation in glucose and insulin during FSIVGTT. In the new formulation, the disposition index (DI), a composite parameter known to be important in diabetes pathology, arises naturally as one of the dimensionless parameters of the system. Numerical simulation with the dimensionless formulation was 1.5- to 5-fold faster and gave significantly more accurate and robust parameter estimates than the dimensional implementation.
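For orientation, the glucose equations of the minimal model in their commonly cited Bergman form are written out below, together with the conventional definitions of insulin sensitivity (S_I) and of the disposition index as S_I times the acute insulin response to glucose (AIR_g); S_G is glucose effectiveness. These are standard background definitions supplied as an assumption. The insulin-kinetics part of MINMOD and the dimensionless formulation in which DI appears as a parameter are not reproduced here.

```latex
\begin{aligned}
\frac{dG}{dt} &= -\bigl(S_G + X(t)\bigr)\,G(t) + S_G\,G_b, & G(0) &= G_0,\\
\frac{dX}{dt} &= -p_2\,X(t) + p_3\,\bigl(I(t) - I_b\bigr), & X(0) &= 0,\\
S_I &= \frac{p_3}{p_2}, \qquad \mathrm{DI} = S_I \times \mathrm{AIR}_g .
\end{aligned}
```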
Dimensional analysis of MINMOD led to simplification of the model, direct identification of important composite factors in the dynamics of glucose metabolic control, and better simulation algorithms.
Global gene expression studies with microarrays can offer biological insights never before possible. However, the technology has many sources of technical variability that hinder obtaining high-quality data sets. Because spotted microarrays offer design and content flexibility and potential cost savings over commercial systems, we have developed prehybridization quality control strategies for spotted cDNA and oligonucleotide arrays. These approaches use a third fluorescent dye (fluorescein) to monitor key fabrication variables, such as print and spot morphology, DNA retention, and background arising from probe redistributed during blocking. Here, our labeled cDNA array platform is used to study (1) compression of array data, using known input ratios of Arabidopsis in vitro transcripts and arrayed serial dilutions of homologous probes; (2) how the curing time of in-house poly-L-lysine-coated slides affects probe retention capacity; and (3) the retention characteristics of 13 commercially available surfaces.
When array element fluorescein intensity drops below 5,000 RFU/pixel, gene expression measurements become increasingly compressed, which validates this value as a prehybridization quality control threshold. We observed that the DNA retention capacity of in-house poly-L-lysine slides decreased rapidly over time (~50% reduction between 3 and 12 weeks after coating; p < 0.0002) and that retention characteristics differed considerably among commercially available poly-L-lysine and amino silane-coated slides.
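Applying the 5,000 RFU/pixel threshold before hybridization amounts to a simple per-element screen. The sketch below is a hypothetical helper, not the authors' software: it summarizes fluorescein intensities per print pin and reports the fraction of elements falling below the threshold. Grouping by pin is an assumption, motivated by the printing-pin-dependent heterogeneity observed for spotted arrays.

```python
from collections import defaultdict
from statistics import mean

QC_THRESHOLD_RFU = 5000.0  # prehybridization threshold, RFU/pixel

def qc_report(elements, threshold=QC_THRESHOLD_RFU):
    """Summarize prehybridization fluorescein intensity per print pin.

    elements: iterable of (pin_id, fluorescein_rfu_per_pixel) pairs.
    Returns {pin_id: (mean_intensity, fraction_below_threshold)}.
    """
    by_pin = defaultdict(list)
    for pin, rfu in elements:
        by_pin[pin].append(rfu)
    report = {}
    for pin, values in sorted(by_pin.items()):
        below = sum(v < threshold for v in values)
        report[pin] = (mean(values), below / len(values))
    return report
```

A pin whose elements fall below the threshold much more often than the others would point to a printing problem rather than to slide chemistry.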
High DNA retention rates are necessary for accurate gene expression measurements. Therefore, understanding the characteristics of an array surface and optimizing protocols for it are prerequisites for fabricating high-quality arrays.
Spotted 70-mer oligonucleotide arrays potentially offer greater specificity and an alternative to expensive cDNA library maintenance and amplification. Because microarray fabrication is a considerable source of data variance, we previously tagged cDNA probes directly with a third fluorophore for prehybridization quality control. Fluorescently modifying oligonucleotide sets is cost-prohibitive; therefore, we describe a co-spotted, fluorescein-labeled Staphylococcus aureus-specific "tracking" oligonucleotide to monitor fabrication variables of a Mycobacterium tuberculosis oligonucleotide microarray.
Printing in 15% DMSO/1.5 M betaine significantly improved DNA retention (p < 0.01) compared with the vendor-recommended buffers. Introducing the tracking oligonucleotide did not affect hybridization efficiency or introduce ratio measurement bias in hybridizations between M. tuberculosis H37Rv and M. tuberculosis mprA. The mean log Cy3/Cy5 ratios of differentially expressed genes were linearly related between arrays with and without the tracking oligonucleotide (R² = 0.90, p < 0.05), and Pearson's correlation coefficients of ratio data did not differ significantly among replicates with the tracking oligonucleotide (0.72 ± 0.07), replicates without it (0.74 ± 0.10), or pairs of replicates with and without it (0.70 ± 0.04). ANOVA confirmed that the tracking oligonucleotide introduced no bias. Titrating the target-specific oligonucleotide (40 μM to 0.78 μM) in the presence of 0.5 μM tracking oligonucleotide revealed that fluorescein fluorescence was inversely related to target-specific oligonucleotide molarity. This makes the tracking oligonucleotide signal useful for quality control measurements and for differentiating false negatives (synthesis failures and mechanical misses) from true negatives (no gene expression).
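The separation of false from true negatives can be written as a decision rule on two signals per spot. In the hypothetical sketch below, a missing tracking (fluorescein) signal marks a mechanical miss, and an unusually high tracking signal is read as a spot containing little target-specific oligonucleotide, following the inverse relationship reported above. This interpretation, the function and enum names, and all thresholds are illustrative assumptions that would need calibration against titration data.

```python
from enum import Enum

class SpotCall(Enum):
    DETECTED = "hybridization signal detected"
    MECHANICAL_MISS = "false negative: nothing printed"
    SYNTHESIS_FAILURE = "false negative: little target-specific oligonucleotide"
    TRUE_NEGATIVE = "true negative: printed, no expression"

def classify_spot(fluorescein, hyb_signal, *,
                  min_tracking=500.0, max_tracking=20000.0, min_hyb=200.0):
    """Classify one spot from its tracking (fluorescein) and hybridization signals.

    Thresholds are hypothetical placeholders, not values from the study.
    """
    if hyb_signal >= min_hyb:
        return SpotCall.DETECTED
    if fluorescein < min_tracking:
        return SpotCall.MECHANICAL_MISS    # no tracking oligo: the pin missed
    if fluorescein > max_tracking:
        return SpotCall.SYNTHESIS_FAILURE  # high tracking signal: little target oligo
    return SpotCall.TRUE_NEGATIVE          # well printed, but the gene is not expressed
```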
This novel approach enables prehybridization array visualization for spotted oligonucleotide arrays and sets the stage for more sophisticated slide qualification and data filtering applications.
Spotted oligonucleotide arrays; 70-mers; gene expression analysis
Construction methodologies for cDNA microarrays cannot determine array integrity prior to hybridization, leaving the array itself a source of uncontrolled experimental variation. We addressed this problem by developing a three-color cDNA array platform in which printed probes are tagged with fluorescein, which is compatible with Cy3 and Cy5 target labeling dyes when arrays are read on confocal laser scanners with narrow bandwidths. Here we use this approach to: (i) develop a tracking system to monitor the printing of probe plates at predicted coordinates; (ii) define the quantity of immobilized probe necessary for high-quality hybridized array data, in order to establish pre-hybridization array selection criteria; (iii) investigate factors that influence probe availability for hybridization; and (iv) explore the feasibility of filtering hybridized data using element fluorescein intensity. A direct and significant relationship (R² = 0.73, P < 0.001) between pre-hybridization average fluorescein intensity and subsequent hybridized replicate consistency was observed, showing that data quality can be improved by selecting arrays that meet defined pre-hybridization criteria. Furthermore, we demonstrate that our three-color approach provides a means to filter spots with insufficient bound probe from hybridized data sets, further improving data quality. Collectively, this strategy will improve microarray data and increase its utility as a sensitive screening tool.
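The array-selection step can be sketched as computing R² between each array's average pre-hybridization fluorescein intensity and a replicate-consistency measure, then keeping the arrays that meet an intensity criterion. The Python below is a minimal, hypothetical helper: the consistency metric and the criterion value are left to the user, and nothing here reproduces the authors' actual selection thresholds.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """R^2 of a simple least-squares line of y on x (the square of Pearson's r)."""
    return pearson_r(x, y) ** 2

def select_arrays(mean_fluorescein, criterion):
    """Indices of arrays whose average pre-hybridization fluorescein meets the criterion."""
    return [i for i, f in enumerate(mean_fluorescein) if f >= criterion]
```

In practice, `r_squared` would be run once on a training batch to confirm that intensity predicts consistency, and `select_arrays` would then screen new slides before they are hybridized.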