AdpA is the key transcriptional activator for a number of genes of various functions in the A-factor regulatory cascade in Streptomyces griseus, forming an AdpA regulon. Trypsin-like activity was detected at a late stage of growth in the wild-type strain but not in an A-factor-deficient mutant. Consistent with these observations, two trypsin genes, sprT and sprU, in S. griseus were found to be members of the AdpA regulon; AdpA activated the transcription of both genes by binding to the operators located at about −50 nucleotide positions with respect to the transcriptional start point. The transcription of sprT and sprU, induced by AdpA, was most active at the onset of sporulation. Most trypsin activity exerted by S. griseus was attributed to SprT, because trypsin activity in an sprT-disrupted mutant was greatly reduced but that in an sprU-disrupted mutant was only slightly reduced. This was consistent with the observation that the amount of the sprT mRNA was much greater than that of the sprU transcript. Disruption of both sprT and sprU (mutant ΔsprTU) reduced trypsin activity to almost zero, indicating that no trypsin genes other than these two were present in S. griseus. Even the double mutant ΔsprTU grew normally and developed aerial hyphae and spores over the same time course as the wild-type strain.
The conventional fMRI image analysis approach to associating stimuli to brain activation is performed by carrying out a massive number of parallel univariate regression analyses. fMRI blood-oxygen-level dependent (BOLD) signal, the basis of these analyses, is known for its low signal-noise-ratio and high spatial and temporal signal correlation. In order to ensure accurate localization of brain activity, stimulus administration in an fMRI session is often lengthy and repetitive. Real-time fMRI BOLD signal analysis is carried out as the signal is observed. This method allows for dynamic, real-time adjustment of stimuli through sequential experimental designs. We have developed a voxel-wise sequential probability ratio test (SPRT) approach for dynamically determining localization, as well as decision rules for stopping stimulus administration. SPRT methods and general linear model (GLM) approaches are combined to identify brain regions that are activated by specific elements of stimuli. Stimulus administration is dynamically stopped when sufficient statistical evidence is collected to determine activation status across regions of interest, following predetermined statistical error thresholds. Simulation experiments and an example based on real fMRI data show that scan volumes can be substantially reduced when compared with pre-determined, fixed designs while achieving similar or better accuracy in detecting activated voxels. Moreover, the proposed approach is also able to accurately detect differentially activated areas, and other comparisons between task-related GLM parameters that can be formulated in a hypothesis-testing framework. Finally, we give a demonstration of SPRT being employed in conjunction with a halving algorithm to dynamically adjust stimuli.
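The sequential stopping idea described above can be illustrated with a minimal sketch of Wald's SPRT for a single Gaussian mean. This is not the authors' voxel-wise GLM procedure: the function name `sprt_gaussian`, the known-variance Gaussian model, and the default error rates are assumptions chosen only to show how evidence accumulates until a boundary is crossed.

```python
import math


def sprt_gaussian(samples, mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Wald SPRT for H0: mean = mu0 versus H1: mean = mu1 (sigma known).

    Hypothetical helper for illustration only.
    Returns (decision, n_used) with decision in {"H0", "H1", "undecided"}.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment of one Gaussian observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    # Data ran out before either boundary was reached.
    return "undecided", n
```

In a real-time setting, each new scan would contribute one increment per voxel, and stimulus administration would stop once the voxels of interest have all reached a decision.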
Accurate classification into genotypes is critical in understanding evolution of divergent viruses. Here we report a new approach, MuLDAS, which classifies a query sequence based on the statistical genotype models learned from the known sequences. Thus, MuLDAS utilizes full spectra of well characterized sequences as references, typically of an order of hundreds, in order to estimate the significance of each genotype assignment.
MuLDAS starts by aligning the query sequence to the reference multiple sequence alignment and calculating the subsequent distance matrix among the sequences. They are then mapped to a principal coordinate space by multidimensional scaling, and the coordinates of the reference sequences are used as features in developing linear discriminant models that partition the space by genotype. The genotype of the query is then given as the maximum a posteriori estimate. MuLDAS tests the model confidence by leave-one-out cross-validation and also provides some heuristics for the detection of 'outlier' sequences that fall far outside or in-between genotype clusters. We have tested our method by classifying HIV-1 and HCV nucleotide sequences downloaded from NCBI GenBank, achieving the overall concordance rates of 99.3% and 96.6%, respectively, with the benchmark test dataset retrieved from the respective databases of Los Alamos National Laboratory.
The highly accurate genotype assignment coupled with several measures for evaluating the results makes MuLDAS useful in analyzing the sequences of rapidly evolving viruses such as HIV-1 and HCV. A web-based genotype prediction server is available at http://www.muldas.org/MuLDAS/.
This study investigates the use of saliva as an emerging diagnostic fluid, in conjunction with classification techniques, to discern biological heterogeneity in clinically labelled gingivitis and periodontitis subjects (80 subjects; 40/group). A battery of classification techniques was investigated as traditional single classifier systems as well as within a novel selective voting ensemble classification approach (SVA) framework. Unlike traditional single classifiers, SVA is shown to reveal patient-specific variations within disease groups, which may be important for identifying proclivity to disease progression or disease stability. Salivary expression profiles of IL-1β, IL-6, MMP-8, and MIP-1α from 80 patients were analyzed using four classification algorithms (Linear Discriminant Analysis [LDA], Quadratic Discriminant Analysis [QDA], Naïve Bayes Classifier [NBC] and Support Vector Machines [SVM]) as traditional single classifiers and within the SVA framework (SVA-LDA, SVA-QDA, SVA-NB and SVA-SVM). Our findings demonstrate that the performance measures (sensitivity, specificity and accuracy) of the traditional single classifiers were comparable to those of their SVA counterparts when the clinical labels of the samples were used as ground truth. However, unlike traditional single classifier approaches, the normalized ensemble vote-counts from SVA revealed varying proclivity of the subjects for each of the disease groups. More importantly, the SVA identified a subset of gingivitis and periodontitis samples that demonstrated a biological proclivity commensurate with the other clinical group. This subset was confirmed across SVA-LDA, SVA-QDA, SVA-NB and SVA-SVM. Heatmap visualization of their ensemble sets revealed a lack of consensus between these subsets and the rest of the samples within the respective disease groups, indicating the unique nature of the patients in these subsets.
While the source of variation is not known, the results presented clearly elucidate the need for novel approaches that accommodate inherent heterogeneity and personalized variations within disease groups in diagnostic characterization. The proposed approach falls within the scope of P4 medicine (predictive, preventive, personalized, and participatory) with the ability to identify unique patient profiles that may predict specific disease trajectories and targeted disease management.
When designing programs or software for the implementation of Monte Carlo (MC) hypothesis tests, we can save computation time by using sequential stopping boundaries. Such boundaries imply stopping resampling after relatively few replications if the early replications indicate a very large or very small p-value. We study a truncated sequential probability ratio test (SPRT) boundary and provide a tractable algorithm to implement it. We review two properties desired of any MC p-value, the validity of the p-value and a small resampling risk, where resampling risk is the probability that the accept/reject decision will be different than the decision from complete enumeration. We show how the algorithm can be used to calculate a valid p-value and confidence intervals for any truncated SPRT boundary. We show that a class of SPRT boundaries is minimax with respect to resampling risk and recommend a truncated version of boundaries in that class by comparing their resampling risk (RR) to the RR of fixed boundaries with the same maximum resample size. We study the lack of validity of some simple estimators of p-values and offer a new simple valid p-value for the recommended truncated SPRT boundary. We explore the use of these methods in a practical example and provide the MChtest R package to perform the methods.
Bootstrap; B-value; Permutation; Resampling Risk; Sequential Design; Sequential Probability Ratio Test
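As a rough illustration of sequential stopping in a Monte Carlo test, the sketch below applies a truncated Bernoulli SPRT to a stream of 0/1 indicators, where 1 means a resampled statistic was at least as extreme as the observed one. The function name `truncated_sprt_mc`, the values of `p0` and `p1` bracketing a 0.05 significance level, and the boundary error rates are all assumptions; this is not the MChtest implementation, and it deliberately returns only counts rather than a p-value, since the abstract notes that simple p-value estimators under sequential stopping can lack validity.

```python
import math
from itertools import islice


def truncated_sprt_mc(indicators, p0=0.04, p1=0.06, a=0.001, b=0.001,
                      max_n=10000):
    """Truncated SPRT of H0: p = p0 versus H1: p = p1 on 0/1 indicators.

    Hypothetical sketch. Returns (reason, hits, n) where reason is
    "p_large", "p_small", or "truncated" (max_n reached or data ran out).
    """
    upper = math.log((1 - b) / a)
    lower = math.log(b / (1 - a))
    step_hit = math.log(p1 / p0)               # increment when indicator is 1
    step_miss = math.log((1 - p1) / (1 - p0))  # increment when indicator is 0
    llr = 0.0
    hits = 0
    n = 0
    for n, hit in enumerate(islice(indicators, max_n), start=1):
        if hit:
            hits += 1
            llr += step_hit
        else:
            llr += step_miss
        if llr >= upper:
            return "p_large", hits, n   # early evidence of a large p-value
        if llr <= lower:
            return "p_small", hits, n   # early evidence of a small p-value
    return "truncated", hits, n
```

Because `indicators` can be a generator, resampling work is only done for the replications actually consumed before a boundary is crossed.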
Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery.
A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled if the odds of dying 30 days after surgery were at least twice the expected odds. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year.
We examined a total of 83 hospital-year observations. The RA-SPRT method alerted 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80).
The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting providing opportunities for earlier quality improvement interventions.
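The risk-adjusted SPRT described above can be sketched as follows, using the standard log-likelihood ratio for an odds-ratio alternative. The function name `ra_sprt` and the `(predicted_risk, died)` input format are assumptions, and the sketch omits the yearly increments and any other details of the automated tool; the defaults mirror the error rates and the doubling-of-odds threshold stated in the abstract.

```python
import math


def ra_sprt(cases, odds_ratio=2.0, alpha=0.05, beta=0.10):
    """Risk-adjusted SPRT over (predicted_risk, died) pairs in time order.

    Hypothetical sketch. H0: observed odds of death equal the predicted
    odds; H1: observed odds are `odds_ratio` times the predicted odds.
    Returns (status, n) with status in {"alert", "no_alert", "continue"}.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    log_r = math.log(odds_ratio)
    llr = 0.0
    n = 0
    for n, (risk, died) in enumerate(cases, start=1):
        # Per-patient increment: y * log(R) - log(1 - p + R * p).
        llr += (log_r if died else 0.0) - math.log(1 - risk + odds_ratio * risk)
        if llr >= upper:
            return "alert", n
        if llr <= lower:
            return "no_alert", n
    return "continue", n
```

Each death pushes the statistic up by an amount that is larger for low-risk patients, while each survival pulls it down slightly, which is how the risk adjustment enters the test.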
Gene expression in a developing embryo occurs in particular cells (spatial patterns) in a time-specific manner (temporal patterns), which leads to the differentiation of cell fates. Images of a Drosophila melanogaster embryo at a given developmental stage, showing a particular gene expression pattern revealed by a gene-specific probe, can be compared for spatial overlaps. The comparison is fundamentally important to formulating and testing gene interaction hypotheses. Expression pattern comparison is most biologically meaningful when images from a similar time point (developmental stage) are compared. In this paper, we present LdaPath, a novel formulation of Linear Discriminant Analysis (LDA) for automatic developmental stage range classification. It employs multivariate linear regression with the L1-norm penalty controlled by a regularization parameter for feature extraction and visualization. LdaPath computes an entire solution path for all values of regularization parameter with essentially the same computational cost as fitting one LDA model. Thus, it facilitates efficient model selection. It is based on the equivalence relationship between LDA and the least squares method for multi-class classifications. This equivalence relationship is established under a mild condition, which we show empirically to hold for many high-dimensional datasets, such as expression pattern images. Our experiments on a collection of 2705 expression pattern images show the effectiveness of the proposed algorithm. Results also show that the LDA model resulting from LdaPath is sparse, and irrelevant features may be removed. Thus, LdaPath provides a general framework for simultaneous feature selection and feature extraction.
Gene expression pattern image; dimensionality reduction; Linear Discriminant Analysis; linear regression
This study presents a 2-stage heartbeat classifier of supraventricular (SVB) and ventricular (VB) beats. Stage 1 performs a computationally efficient classification of SVB beats, using a simple correlation threshold criterion to find a close match with a predominant normal (reference) beat template. The non-matched beats are then subjected to measurement of 20 basic features, which track the beat and reference template morphology and RR-variability, for subsequent refined classification into the SVB or VB class by Stage 2. Four linear classifiers are compared: cluster, fuzzy, linear discriminant analysis (LDA) and classification tree (CT), all subjected to iterative training for selection of the optimal feature space from an extended 210-sized set that embodies interactive second-order effects between the 20 independent features. The optimization process minimizes, at equal weight, the false positives in the SVB class and the false negatives in the VB class. Training with the European ST-T, AHA and MIT-BIH Supraventricular Arrhythmia databases found the best performance settings of all classification models: Cluster (30 features), Fuzzy (72 features), LDA (142 coefficients), CT (221 decision nodes), with the top-3 best scored features being normalized current RR-interval, higher/lower frequency content ratio, and beat-to-template correlation. Unbiased test-validation with the MIT-BIH Arrhythmia database rates the classifiers in descending order of their specificity for the SVB class: CT (99.9%), LDA (99.6%), Cluster (99.5%), Fuzzy (99.4%); sensitivity for ventricular ectopic beats as part of the VB class (commonly reported in published beat-classification studies): CT (96.7%), Fuzzy (94.4%), LDA (94.2%), Cluster (92.4%); positive predictivity: CT (99.2%), Cluster (93.6%), LDA (93.0%), Fuzzy (92.4%). CT has superior accuracy by 0.3–6.8% points, with the advantage of easy model complexity configuration by pruning the tree, which consists of easily interpretable ‘if-then’ rules.
Linear discriminant analysis (LDA) is a classical statistical approach for dimensionality reduction and classification. In many cases, the projection direction of the classical and extended LDA methods is not considered optimal for special applications. Herein we combine the Partial Least Squares (PLS) method with LDA algorithm, and then propose two improved methods, named LDA-PLS and ex-LDA-PLS, respectively. The LDA-PLS amends the projection direction of LDA by using the information of PLS, while ex-LDA-PLS is an extension of LDA-PLS by combining the result of LDA-PLS and LDA, making the result closer to the optimal direction by an adjusting parameter. Comparative studies are provided between the proposed methods and other traditional dimension reduction methods such as Principal component analysis (PCA), LDA and PLS-LDA on two data sets. Experimental results show that the proposed method can achieve better classification performance.
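For readers unfamiliar with the projection direction that LDA-PLS and ex-LDA-PLS adjust, the following minimal sketch computes the classical two-class Fisher direction in two dimensions. It illustrates only plain LDA, not the proposed PLS-based corrections; the function name `fisher_direction` and the restriction to 2-D points are assumptions made to keep the example dependency-free.

```python
def fisher_direction(class0, class1):
    """Classical Fisher LDA direction w = Sw^{-1} (m1 - m0) for 2-D points.

    Hypothetical helper: class0 and class1 are lists of (x, y) tuples.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the 2x2 within-class scatter matrix: (sxx, sxy, syy).
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sxx, sxy, syy = (s0[i] + s1[i] for i in range(3))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Apply the closed-form inverse of [[sxx, sxy], [sxy, syy]].
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
```

Projecting each sample onto the returned vector (a dot product) gives the one-dimensional scores on which a classification threshold is then placed.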
More studies based on gene expression data have been reported in great detail; however, one major challenge for methodologists is the choice of classification method. The main purpose of this research was to compare the performance of linear discriminant analysis (LDA) and its modification methods for the classification of cancer based on gene expression data.
The classification performance of linear discriminant analysis (LDA) and its modification methods was evaluated by applying these methods to six public cancer gene expression datasets. These methods included linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), shrinkage centroid regularized discriminant analysis (SCRDA), shrinkage linear discriminant analysis (SLDA) and shrinkage diagonal discriminant analysis (SDDA). The procedures were performed by software R 2.80.
PAM selected fewer feature genes than the other methods from most datasets, the Brain dataset being the exception. Of the two shrinkage discriminant analysis methods, SLDA selected more genes than SDDA from most datasets, the 2-class lung cancer dataset being the exception. SLDA selected more genes than SCRDA from the 2-class lung cancer, SRBCT and Brain datasets; the opposite held for the remaining datasets. The average test error of the LDA modification methods was lower than that of the LDA method.
The classification performance of the LDA modification methods was superior to that of traditional LDA with respect to the average error, and there was no significant difference between these modification methods.
Linear discriminant analysis (LDA) is one of the most popular classification algorithms for brain-computer interfaces (BCI). LDA assumes a Gaussian distribution of the data with equal covariance matrices for the classes concerned; however, this assumption usually does not hold in actual BCI applications, where heteroscedastic class distributions are commonly observed. This paper proposes an enhanced version of LDA, namely z-score linear discriminant analysis (Z-LDA), which introduces a new decision boundary definition strategy to handle heteroscedastic class distributions. Z-LDA defines the decision boundary through z-scores, utilizing both the mean and standard deviation of the projected data, so that the boundary adapts to heteroscedastic distributions. Results from a simulation dataset and two actual BCI datasets consistently show that Z-LDA achieves significantly higher average classification accuracies than conventional LDA, indicating the superiority of the newly proposed decision boundary definition strategy.
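One plausible reading of the z-score decision rule is sketched below: after projecting the data onto the LDA direction, each class is summarized by the mean and standard deviation of its projected values, and a new sample is assigned to the class for which its absolute z-score is smallest. The abstract does not spell out the exact rule, so the function names `fit_z_rule` and `classify_z` and the nearest-z-score assignment are assumptions, not the authors' definition.

```python
import statistics


def fit_z_rule(projected_by_class):
    """Store (mean, standard deviation) of the projected values per class.

    Hypothetical helper: projected_by_class maps label -> list of 1-D scores.
    """
    return {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in projected_by_class.items()
    }


def classify_z(score, params):
    """Assign the class whose |z-score| for the projected score is smallest."""
    return min(
        params,
        key=lambda label: abs(score - params[label][0]) / params[label][1],
    )


# Class "A" is tight around 0 and class "B" is spread around 10.
params = fit_z_rule({"A": [-1.0, 0.0, 1.0], "B": [6.0, 10.0, 14.0]})
# A midpoint threshold between the two means (5.0) would place a score of
# 4.0 in class "A"; the z-score rule accounts for B's larger spread.
label_for_4 = classify_z(4.0, params)
```

With unequal spreads the effective boundary moves toward the tighter class, which is the kind of adaptation to heteroscedastic distributions that the abstract describes.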
Making an accurate diagnosis of schizophrenia and related psychoses early in the course of the disease is important for initiating treatment and counseling patients and families. In this study, we developed classification models for early disease diagnosis using structural MRI (sMRI) and neuropsychological (NP) testing. We used sMRI measurements and NP test results from 28 patients with recent-onset schizophrenia and 47 healthy subjects, drawn from the larger sample of the Mind Clinical Imaging Consortium. We developed diagnostic models based on Linear Discriminant Analysis (LDA) following two approaches; namely, (a) stepwise (STP) LDA on the original measurements, and (b) LDA on variables created through Principal Component Analysis (PCA) and selected using the Humphrey-Ilgen parallel analysis. Error estimation of the modeling algorithms was evaluated by leave-one-out external cross-validation. These analyses were performed on sMRI and NP variables separately and in combination. The following classification accuracy was obtained for different variables and modeling algorithms. sMRI only: (a) STP-LDA: 64.3% sensitivity and 76.6% specificity, (b) PCA-LDA: 67.9% sensitivity and 72.3% specificity. NP only: (a) STP-LDA: 71.4% sensitivity and 80.9% specificity, (b) PCA-LDA: 78.5% sensitivity and 91.5% specificity. Combined sMRI-NP: (a) STP-LDA: 64.3% sensitivity and 83.0% specificity, (b) PCA-LDA: 89.3% sensitivity and 93.6% specificity. (i) Maximal diagnostic accuracy was achieved by combining sMRI and NP variables. (ii) NP variables were more informative than sMRI, indicating that cognitive deficits can be detected earlier than volumetric structural abnormalities. (iii) PCA-LDA yielded more accurate classification than STP-LDA. As these sMRI and NP tests are widely available, they can increase accuracy of early intervention strategies and possibly be used in evaluating treatment response.
Schizophrenia; Schizophreniform; Schizoaffective; PCA; LDA; Biomarkers; Neuropsychology; MRI; Cross-validation; Diagnosis; MCIC
Optical spectroscopic techniques, including Raman spectroscopy, have shown promise for in vivo cancer diagnostics in a variety of organs. In this study, the potential use of a home-made Raman spectral system with a millimeter order excitation laser spot size combined with a multivariate statistical analysis for the rapid detection and discrimination of nasopharyngeal cancer from normal nasopharyngeal tissue was evaluated. Raman scattering signals were acquired from 16 normal and 32 nasopharyngeal carcinoma tissue samples. Linear discriminant analysis (LDA) based on principal component analysis (PCA) and partial least squares (PLS) were employed to generate diagnostic algorithms for the classification of different nasopharyngeal tissue types. Spectral differences in Raman spectra between the two types of tissues were revealed; the normalized intensities of Raman peaks at 1,001, 1,207 and 1,658 cm−1 were more intense for nasopharyngeal carcinoma tissue compared to normal tissue, while Raman bands at 848, 936 and 1,446 cm−1 were stronger in normal nasopharyngeal samples. The PCA-LDA algorithm together with leave-one-out cross validation yields a diagnostic sensitivity of 81% and a specificity of 87%, while the PLS method coupled with subwindow permutation analysis improves the diagnostic sensitivity and specificity to 85 and 88%, respectively. Therefore, Raman spectroscopy combined with PCA-LDA/PLS demonstrated good potential for improving the clinical diagnosis of nasopharyngeal cancers.
discriminant analysis; nasopharyngeal carcinoma; principal component analysis; Raman spectroscopy
To evaluate the effects of electric cortical stimulation in the experimentally induced focal traumatic brain injury (TBI) rat model on motor recovery and plasticity of the injured brain.
Twenty male Sprague-Dawley rats were pre-trained on a single pellet reaching task (SPRT) and on a Rotarod task (RRT) for 14 days. Then, the TBI model was induced by a weight drop device (40 g in weight, 25 cm in height) on the dominant motor cortex, and the electrode was implanted over the perilesional cortical surface. All rats were divided into two groups as follows: an electrical stimulation (ES) group with anodal continuous stimulation (50 Hz and 194 µs duration) and a sham-operated control (SOC) group with no electrical stimulation. The rats were then trained on the SPRT and RRT for 14 days for rehabilitation and assessed with Garcia's neurologic examination. Histopathological and immunostaining evaluations were performed after the experiment.
There were no differences in the slice number in the histological analysis. Garcia's neurologic scores and SPRT were significantly increased in the ES group (p<0.05), yet there was no difference in RRT between the two groups. The ES group showed more expression of c-Fos around the injured brain area than the SOC group.
Electric cortical stimulation with rehabilitation is considered to be one of the trial methods for motor recovery in TBI. However, more studies should be conducted for the TBI model in order to establish better stimulation methods.
Cortical stimulation; Traumatic brain injury; Rehabilitation; Motor recovery
This study investigated the feasibility of using the near infrared hyperspectral imaging (NIR-HSI) technique for non-destructive identification of sesame oil. Hyperspectral images of four varieties of sesame oil were obtained in the spectral region of 874–1734 nm. Reflectance values were extracted from each region of interest (ROI) of each sample. Competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA) and x-loading weights (x-LW) were carried out to identify the most significant wavelengths. Based on the sixty-four, seven and five wavelengths suggested by CARS, SPA and x-LW, respectively, two classification models (least squares-support vector machine, LS-SVM, and linear discriminant analysis, LDA) were established. Among the established models, the CARS-LS-SVM and CARS-LDA models performed well, with the highest classification rate (100%) in both calibration and prediction sets. The SPA-LS-SVM and SPA-LDA models obtained better results (95.59% and 98.53% classification rate in the prediction set) with only seven wavelengths (938, 1160, 1214, 1406, 1656, 1659 and 1663 nm). The x-LW-LS-SVM and x-LW-LDA models also obtained satisfactory results (>80% classification rate in the prediction set) with only five wavelengths (921, 925, 995, 1453 and 1663 nm). The results showed that the NIR-HSI technique could be used to identify the varieties of sesame oil rapidly and non-destructively, and that CARS, SPA and x-LW were effective wavelength selection methods.
This study aims to characterize and classify serum surface-enhanced Raman spectroscopy (SERS) spectra between bladder cancer patients and normal volunteers by genetic algorithms (GAs) combined with linear discriminant analysis (LDA). Two groups of serum SERS spectra excited with nanoparticles were collected from healthy volunteers (n = 36) and bladder cancer patients (n = 55). Six diagnostic Raman bands in the regions of 481–486, 682–687, 1018–1034, 1313–1323, 1450–1459 and 1582–1587 cm−1, related to proteins, nucleic acids and lipids, were picked out with the GAs and LDA. With the diagnostic models built on the six identified Raman bands, an improved diagnostic sensitivity of 90.9% and specificity of 100% were achieved for discriminating bladder cancer from normal serum SERS spectra. The results are superior to the sensitivity of 74.6% and specificity of 97.2% obtained with principal component analysis on the same serum SERS spectra dataset. Receiver operating characteristic (ROC) curves further confirmed the efficiency of the diagnostic algorithm based on the GA-LDA technique. This exploratory work demonstrates that serum SERS combined with the GA-LDA technique has enormous potential to characterize and non-invasively detect bladder cancer through peripheral blood.
To determine the prognostic significance of data collected early after starting certolizumab pegol (CZP) to predict low disease activity (LDA) at Week 52.
Data through Week 12 from 703 CZP-treated patients in the RA PreventIon of structural Damage (RAPID 1) trial were used as variables to predict LDA (DAS28 [ESR] ≤3.2) at Week 52. We identified variables, developed prediction models using classification trees, and tested performance using training and testing datasets. Additional prediction models were constructed using CDAI and an alternate outcome definition (composite of LDA or ACR50).
Using Week 6 and 12 data and across several different prediction models, response (LDA) and nonresponse at 1 year was predicted with relatively high accuracy (70–90%) for most patients. The best performing model predicting nonresponse by 12 weeks was 90% accurate and applied to 46% of the population. Model accuracy for predicted responders (30% of the RAPID1 population) was 74%. The area under the receiver operator curve was 0.76. Depending on the desired certainty of prediction at 12 weeks, ~12–24% of patients required >12 weeks of treatment to be accurately classified. CDAI-based models, and those evaluating the composite outcome (LDA or ACR50), achieved comparable accuracy.
We could accurately predict within 12 weeks of starting CZP whether most established RA patients with high baseline disease activity would likely achieve/not achieve LDA at 1 year. Decision trees may be useful to guide prospective management for RA patients treated with CZP and other biologics.
Cancer diagnosis is one of the most important tasks of biomedical research and has become the main objective of medical investigations. The present paper proposes an analytical strategy for distinguishing between normal and malignant colorectal tissues by combining the use of near-infrared (NIR) spectroscopy with chemometrics. The successive projection algorithm-linear discriminant analysis (SPA-LDA) was used to seek a reduced subset of variables/wavenumbers and build a diagnostic model of LDA. For comparison, the partial least squares-discriminant analysis (PLS-DA) based on full-spectrum classification was also used as the reference. Principal component analysis (PCA) was used for a preliminary analysis. A total of 186 spectra from 20 patients with partial colorectal resection were collected and divided into three subsets for training, optimizing, and testing the model. The results showed that, compared to PLS-DA, SPA-LDA provided a more parsimonious model using only three wavenumbers/variables (4065, 4173, and 5758 cm−1) to achieve a sensitivity of 84.6%, 92.3%, and 92.3% for the training, validation, and test sets, respectively, and a specificity of 100% for each subset. This indicates that the combination of NIR spectroscopy and the SPA-LDA algorithm can serve as a potential tool for distinguishing between normal and malignant colorectal tissues.
Epidermal growth factor receptor (EGFR) activating mutations are a predictor of tyrosine kinase inhibitor effectiveness in the treatment of non–small-cell lung cancer (NSCLC). The objective of this study is to build a model for predicting the EGFR mutation status of brain metastasis in patients with NSCLC.
Observation and model set-up.
This study was conducted between January 2003 and December 2011 in 6 medical centers in Southwest China.
The study included 31 NSCLC patients with brain metastases.
Eligibility requirements were histological proof of NSCLC, as well as sufficient quantity of paraffin-embedded lung and brain metastases specimens for EGFR mutation detection. The linear discriminant analysis (LDA) method was used for analyzing the dimensional reduction of clinical features, and a support vector machine (SVM) algorithm was employed to generate an EGFR mutation model for NSCLC brain metastases. Training-testing-validation (3 : 1 : 1) processes were applied to find the best fit in 12 patients (validation test set) with NSCLC and brain metastases treated with a tyrosine kinase inhibitor and whole-brain radiotherapy.
Primary and secondary outcome measures: EGFR mutation analysis in patients with NSCLC and brain metastases and the development of a LDA-SVM-based EGFR mutation model for NSCLC brain metastases patients.
EGFR mutation discordance between the primary lung tumor and brain metastases was found in 5 patients. Using LDA, 13 clinical features were transformed into 9 characteristics, and 3 were selected as primary vectors. The EGFR mutation model constructed with SVM algorithms had an accuracy, sensitivity, and specificity for determining the mutation status of brain metastases of 0.879, 0.886, and 0.875, respectively. Furthermore, the replicability of our model was confirmed by testing 100 random combinations of input values.
The LDA-SVM-based model developed in this study could predict the EGFR status of brain metastases in this small cohort of patients with NSCLC. Further studies with larger cohorts should be carried out to validate our findings in the clinical setting.
Computational theories of decision making in the brain usually assume that sensory 'evidence' is accumulated supporting a number of hypotheses, and that the first accumulator to reach threshold triggers a decision in favour of its associated hypothesis. However, the evidence is often assumed to occur as a continuous process whose origins are somewhat abstract, with no direct link to the neural signals - action potentials or 'spikes' - that must ultimately form the substrate for decision making in the brain. Here we introduce a new variant of the well-known multi-hypothesis sequential probability ratio test (MSPRT) for decision making whose evidence observations consist of the basic unit of neural signalling - the inter-spike interval (ISI) - and which is based on a new form of the likelihood function. We dub this mechanism s-MSPRT and show its precise form for a range of realistic ISI distributions with positive support. In this way we show that, at the level of spikes, the refractory period may actually facilitate shorter decision times, and that the mechanism is robust against poor choice of the hypothesized data distribution. We show that s-MSPRT performance is related to the Kullback-Leibler divergence (KLD) or information gain between ISI distributions, through which we are able to link neural signalling to psychophysical observation at the behavioural level. Thus, we find the mean information needed for a decision is constant, thereby offering an account of Hick's law (relating decision time to the number of choices). Further, the mean decision time of s-MSPRT shows a power law dependence on the KLD offering an account of Piéron's law (relating reaction time to stimulus intensity). These results show the foundations for a research programme in which spike train analysis can be made the basis for predictions about behavior in multi-alternative choice tasks.
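To make the idea of accumulating evidence from inter-spike intervals concrete, here is a heavily simplified multi-hypothesis SPRT sketch. It assumes a single spike train whose ISIs are exponentially distributed, with each hypothesis proposing a different firing rate and equal prior probabilities; the function name `msprt_exponential` and this hypothesis structure are assumptions and do not reproduce the s-MSPRT likelihood introduced in the paper.

```python
import math


def msprt_exponential(isis, rates, threshold=0.99):
    """Multi-hypothesis SPRT on inter-spike intervals (ISIs).

    Hypothetical sketch: hypothesis i states ISI ~ Exponential(rates[i]).
    Stops when the largest posterior probability (equal priors) reaches
    `threshold`. Returns (index_of_chosen_rate or None, n_isis_used).
    """
    log_like = [0.0] * len(rates)
    n = 0
    for n, isi in enumerate(isis, start=1):
        for i, rate in enumerate(rates):
            # Log-density of an exponential ISI under hypothesis i.
            log_like[i] += math.log(rate) - rate * isi
        # Normalize in log space for numerical stability.
        peak = max(log_like)
        weights = [math.exp(v - peak) for v in log_like]
        best = max(range(len(rates)), key=lambda i: log_like[i])
        if weights[best] / sum(weights) >= threshold:
            return best, n
    return None, n
```

The number of ISIs consumed before stopping plays the role of decision time, and it shrinks as the competing ISI distributions become easier to tell apart.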
Objective. This study aimed to evaluate linear discriminant analysis (LDA) and support vector machine (SVM) classifiers for estimating the final Gleason score preoperatively using multiparametric magnetic resonance imaging (mp-MRI) and clinical parameters. Materials and Methods. Thirty-three patients who underwent mp-MRI on a 3T clinical MR scanner and radical prostatectomy were enrolled in this study. The input features for the classifiers were age, the presence of a palpable prostate abnormality, prostate specific antigen (PSA) level, index lesion size, and Likert scales of T2 weighted MRI (T2w-MRI), diffusion weighted MRI (DW-MRI), and dynamic contrast enhanced MRI (DCE-MRI) estimated by an experienced radiologist. SVM based recursive feature elimination (SVM-RFE) was used for eliminating features. Principal component analysis (PCA) was applied to decorrelate the data. Results. Using a standard PCA before final Gleason score classification resulted in mean sensitivities of 51.19% and 64.37% and mean specificities of 72.71% and 39.90% for LDA and SVM, respectively. Using a Gaussian kernel PCA resulted in mean sensitivities of 86.51% and 87.88% and mean specificities of 63.99% and 56.83% for LDA and SVM, respectively. Conclusion. The SVM classifier yielded a slightly higher sensitivity but a lower specificity than the LDA classifier for final Gleason score prediction in prostate cancer in this limited patient population.
In class prediction problems using microarray data, gene selection is essential to improve the prediction accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods and is widely used. The SVM-based approach performs gene selection using the weight vector of the hyperplane constructed by the samples on the margin. However, its performance is easily affected by noise and outliers when it is applied to noisy microarray data with small sample sizes.
In this paper, we propose a recursive gene selection method using the discriminant vector of the maximum margin criterion (MMC), which is a variant of classical linear discriminant analysis (LDA). To overcome the computational drawback of classical LDA and the problem of high dimensionality, we present efficient and stable algorithms for MMC-based RFE (MMC-RFE). The MMC-RFE algorithms naturally extend to multi-class cases. The performance of MMC-RFE was extensively compared with that of SVM-RFE using nine cancer microarray datasets, including four multi-class datasets.
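For orientation, the maximum margin criterion is commonly written as below. This is the standard textbook form, given here as an assumption about what the abstract refers to; the symbols $S_b$ (between-class scatter), $S_w$ (within-class scatter), $\mu_c$, $\mu$ and $n_c$ are notation introduced for this sketch and do not appear in the source.

```latex
% Scatter matrices for C classes, class means \mu_c, overall mean \mu,
% and n_c samples in class c:
S_b = \sum_{c=1}^{C} n_c (\mu_c - \mu)(\mu_c - \mu)^{\top}, \qquad
S_w = \sum_{c=1}^{C} \sum_{x_i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}

% MMC discriminant vector: maximize the between-class scatter minus the
% within-class scatter along a unit direction w.
w^{*} = \arg\max_{\|w\| = 1} \; w^{\top} (S_b - S_w)\, w
```

Under this form, $w^{*}$ is the leading eigenvector of $S_b - S_w$, so no inverse of $S_w$ is required. Classical LDA, by contrast, needs $S_w^{-1}$, which does not exist when genes far outnumber samples. In a recursive elimination scheme the genes with the smallest $|w^{*}_j|$ would be the natural candidates for removal at each step.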
Our extensive comparison has demonstrated that for binary-class datasets MMC-RFE tends to show intermediate performance between hard-margin SVM-RFE and SVM-RFE with a properly chosen soft-margin parameter. Notably, MMC-RFE achieves significantly better performance with a smaller number of genes than SVM-RFE for multi-class datasets. The results suggest that MMC-RFE is less sensitive to noise and outliers due to the use of average margin, and thus may be useful for biomarker discovery from noisy data.
AIM: To evaluate the usefulness of differentially expressed proteins from colorectal cancer (CRC) tissues for differentiating cancer and normal tissues.
METHODS: A proteomic approach was used to identify the differentially expressed proteins between CRC and normal tissues. The proteins were extracted using Tris buffer and thiourea lysis buffer (TLB) for extraction of aqueous soluble and membrane-associated proteins, respectively. Chemometrics, namely principal component analysis (PCA) and linear discriminant analysis (LDA), were used to assess the usefulness of these proteins for identifying the cancerous state of tissues.
RESULTS: The differentially expressed proteins identified comprised 37 aqueous soluble proteins in Tris extracts and 24 membrane-associated proteins in TLB extracts. Based on the protein spot intensities on 2D-gel images, PCA with an eigenvalue > 1 criterion reduced the data to 12 principal components (PCs) for Tris extracts and seven PCs for TLB extracts; six PCs from each extract were subsequently used for LDA. For the Tris extract, LDA correctly classified 82.7% of the original samples and 82.7% of the cross-validated samples. For the TLB extract, LDA correctly classified 78.8% of the original samples and 71.2% of the cross-validated samples.
CONCLUSION: The classification of CRC tissues by PCA and LDA provided a promising distinction between normal and cancer types. These methods can possibly be used for identification of potential biomarkers among the differentially expressed proteins identified.
Colorectal cancer; Proteomics; Marker protein; Principal component analysis; Linear discriminant analysis
Raman spectroscopy is a molecular vibrational spectroscopic technique that is capable of optically probing the biomolecular changes associated with diseased transformation. The purpose of this study was to explore near-infrared (NIR) Raman spectroscopy for distinguishing dysplasia from normal gastric mucosa tissue. A rapid-acquisition dispersive-type NIR Raman system was utilised for tissue Raman spectroscopic measurements at 785 nm laser excitation. A total of 76 gastric tissue samples obtained from 44 patients who underwent endoscopy investigation or gastrectomy operation were used in this study. The histopathological examinations showed that 55 tissue specimens were normal and 21 were dysplastic. Both the empirical approach and multivariate statistical techniques, including principal components analysis (PCA) and linear discriminant analysis (LDA), together with the leave-one-sample-out cross-validation method, were employed to develop effective diagnostic algorithms for classification of Raman spectra between normal and dysplastic gastric tissues. High-quality Raman spectra in the range of 800–1800 cm−1 can be acquired from gastric tissue within 5 s. There are specific spectral differences in Raman spectra between normal and dysplastic tissue, particularly in the spectral ranges of 1200–1500 cm−1 and 1600–1800 cm−1, which contained signals related to amide III and amide I of proteins, CH3CH2 twisting of proteins/nucleic acids, and the C=C stretching mode of phospholipids, respectively. The empirical diagnostic algorithm based on the ratio of the Raman peak intensity at 875 cm−1 to the peak intensity at 1450 cm−1 gave a diagnostic sensitivity of 85.7% and specificity of 80.0%, whereas the diagnostic algorithms based on PCA-LDA yielded a diagnostic sensitivity of 95.2% and specificity of 90.9% for separating dysplasia from normal gastric tissue.
Receiver operating characteristic (ROC) curves further confirmed that the most effective diagnostic algorithm can be derived from the PCA-LDA technique. Therefore, NIR Raman spectroscopy in conjunction with multivariate statistical technique has potential for rapid diagnosis of dysplasia in the stomach based on the optical evaluation of spectral features of biomolecules.
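As a worked check on the reported figures, the sketch below computes sensitivity and specificity from confusion-matrix counts. The sample sizes (21 dysplastic, 55 normal) come from the abstract; the individual counts of correct and incorrect calls are not reported there and are an assumption, chosen only because they reproduce the stated percentages.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of dysplastic samples detected.
    specificity = TN / (TN + FP): fraction of normal samples cleared.
    """
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts (not given in the source), consistent with 21 dysplastic
# and 55 normal specimens and with the reported percentages.
pca_lda = sensitivity_specificity(tp=20, fn=1, tn=50, fp=5)
peak_ratio = sensitivity_specificity(tp=18, fn=3, tn=44, fp=11)

if __name__ == "__main__":
    for name, (sens, spec) in [("PCA-LDA", pca_lda), ("peak ratio", peak_ratio)]:
        print(f"{name}: sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%")
```

Under these assumed counts, the PCA-LDA algorithm misses one dysplastic sample against three for the peak-ratio rule, and produces five false positives against eleven.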
dysplasia; near-infrared Raman spectroscopy; optical diagnosis; stomach; principal components analysis; linear discriminant analysis
Extracting relevant information from microarray data is a very complex task due to the characteristics of the data sets, which comprise a large number of features while few samples are generally available. Feature selection is therefore a very important aspect of the analysis, helping both to identify relevant genes and to maximize predictive information.
Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and Multilayer perceptron (MLP)) are used within both frameworks on six freely available public datasets for the task of predicting cancer outcome.
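A minimal sketch of stepwise forward selection as a wrapper around a classifier follows. It is illustrative only: the nearest-centroid classifier, the leave-one-out scoring, the stopping rule and the toy data are all assumptions introduced here, not the classifiers or protocol of the study, and the GA framework is not shown.

```python
def loo_nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for held in range(len(X)):
        sums, counts = {}, {}
        # Build class centroids from every sample except the held-out one.
        for j, (row, label) in enumerate(zip(X, y)):
            if j == held:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            return sum((X[held][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        correct += min(sums, key=sq_dist) == y[held]
    return correct / len(X)


def forward_selection(X, y, score, max_features):
    """Greedily add the feature that most improves `score`; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(score(X, y, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy data (hypothetical): only column 1 separates the two classes.
X = [[0.5, 0.0, 3.0], [0.1, 0.2, 1.0], [0.9, 0.1, 2.0],
     [0.4, 1.0, 2.5], [0.2, 1.1, 1.5], [0.8, 0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(forward_selection(X, y, loo_nearest_centroid_accuracy, max_features=2))
```

Each round costs one classifier evaluation per remaining feature, which is why SFS is fast compared with a population-based search that evaluates many candidate subsets per generation.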
Better cancer outcome prediction results were obtained using the GA framework, although this approach, in comparison to SFS, leads to a larger selected set, uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also yielded a set of genes that can be considered more biologically relevant. Regarding the classifiers, standard feedforward neural networks (MLP), LDA and SVM gave similar and the best results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA yielded a more limited set of genes than SVM, NB and kNN, and C-MANTEC in particular proved the most robust classifier with respect to changes in the parameter settings.
This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results than the SFS approach, independently of the classifier used. Regarding classifiers, even though C-MANTEC did not achieve the best overall results, its performance was competitive and very robust with respect to the parameters of the algorithm, and thus it can be considered a candidate technique for future studies.
Microarray; Genetic algorithms; Constructive neural networks; Feature Selection