Intravascular optical coherence tomography (IVOCT) is rapidly becoming the method of choice for the in vivo investigation of coronary artery disease. While IVOCT visualizes atherosclerotic plaques with a resolution <20 µm, image analysis in terms of tissue composition is currently performed by a time-consuming manual procedure based on the qualitative interpretation of image features. We illustrate an algorithm for the automated and systematic characterization of IVOCT atherosclerotic tissue. The proposed method consists of a supervised classification of image pixels according to textural features combined with the estimated value of the optical attenuation coefficient. IVOCT images of 64 plaques, from 49 in vivo IVOCT data sets, constituted the algorithm’s training and testing data sets. Validation was obtained by comparing automated analysis results to the manual assessment of atherosclerotic plaques. An overall pixel-wise accuracy of 81.5%, with a classification feasibility of 76.5% and per-class accuracies of 89.5%, 72.1% and 79.5% for fibrotic, calcified and lipid-rich tissue, respectively, was found. Moreover, the measured optical properties were in agreement with previous results reported in the literature. As such, an algorithm for automated tissue characterization was developed and validated using in vivo human data, suggesting that it can be applied to clinical IVOCT data. This might be an important step towards the integration of IVOCT in cardiovascular research and routine clinical practice.
(100.0100) Image processing; (100.2960) Image analysis; (100.4995) Pattern recognition, metrics; (170.0170) Medical optics and biotechnology; (170.6935) Tissue characterization
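The attenuation-coefficient feature used above can be illustrated with a minimal sketch: fitting a single-scattering Beer-Lambert decay, I(z) ≈ I0·exp(−2·μt·z), to a depth window of one A-scan in the log domain. This is a simplified stand-in for the authors' estimator; the window bounds and the double-pass factor of 2 are assumptions.

```python
import numpy as np

def attenuation_coefficient(a_scan, pixel_size_mm, z0, z1):
    """Estimate mu_t (per mm) over depth window [z0, z1) of one A-scan,
    assuming single-scattering decay I(z) ~ I0 * exp(-2 * mu_t * z)."""
    z = np.arange(z0, z1) * pixel_size_mm
    log_i = np.log(a_scan[z0:z1] + 1e-12)   # avoid log(0) in dark pixels
    slope, _ = np.polyfit(z, log_i, 1)      # linear fit in the log domain
    return -slope / 2.0                     # undo the double-pass factor
```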
For quantitative analysis of histopathological images, such as those used in lymphoma grading systems, features are usually quantified on single cells before the cells are categorized by classification algorithms. To this end, we propose an integrated framework consisting of a novel supervised cell-image segmentation algorithm and a new touching-cell splitting method.
For the segmentation part, we separate the cell regions from the other areas by classifying the image pixels into either the cell or the extra-cellular category. Instead of using pixel color intensities, the color-texture extracted at the local neighborhood of each pixel is utilized as the input to our classification algorithm. The color-texture at each pixel is extracted by a local Fourier transform (LFT) from a new color space, the most discriminant color space (MDC). The MDC color space is optimized to be a linear combination of the original RGB color space so that the extracted LFT texture features in the MDC color space achieve the greatest discrimination in terms of classification (segmentation) performance. To speed up the texture feature extraction process, we develop an efficient LFT extraction algorithm based on image shifting and integral images.
For the splitting part, given a connected component of the segmentation map, we first determine whether it is a touching-cell clump or a single non-touching cell. The differentiation is mainly based on the distance between the most likely radial-symmetry center and the geometrical center of the connected component. The boundaries of touching-cell clumps are smoothed by a Fourier shape descriptor before carrying out an iterative splitting algorithm based on concave points and radial symmetry.
To test the validity, effectiveness and efficiency of the framework, it is applied to follicular lymphoma pathological images, which exhibit a complex background and extracellular texture under non-uniform illumination conditions. For comparison purposes, the results of the proposed segmentation algorithm are evaluated against the outputs of Superpixel, Graph-Cut, Mean-shift, and two state-of-the-art pathological image segmentation methods, using ground truth established by manual segmentation of cells in the original images. Our segmentation algorithm achieves better results than the other compared methods. The results of splitting are evaluated in terms of under-splitting, over-splitting, and encroachment errors. Summing the three types of errors, we achieve a total error rate of 5.25% per image.
Histopathological image segmentation; touching-cell splitting; supervised learning; color-texture feature extraction; local Fourier transform; discriminant analysis; radial-symmetry point; follicular lymphoma
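A grayscale, single-channel sketch of the per-pixel texture idea follows: local Fourier magnitudes of a small neighborhood feed a supervised pixel classifier. It omits the MDC color-space optimization and the fast shifting/integral-image computation that are the paper's actual contributions; the window size and the four retained coefficients are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def local_fourier_features(img, win=8):
    """Per-pixel descriptors: magnitudes of the lowest local Fourier
    components of a win x win neighborhood (slow reference version)."""
    h, w = img.shape
    pad = np.pad(img, win // 2, mode="reflect")
    feats = np.zeros((h, w, 4))
    for y in range(h):
        for x in range(w):
            spec = np.abs(np.fft.fft2(pad[y:y + win, x:x + win]))
            feats[y, x] = [spec[0, 0], spec[0, 1], spec[1, 0], spec[1, 1]]
    return feats.reshape(-1, 4)

# hypothetical training arrays: train_img and a mask with 1 = cell pixel
# clf = RandomForestClassifier(n_estimators=100).fit(
#         local_fourier_features(train_img), train_mask.ravel())
```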
There is a critical need for improved biomarker assessment platforms which integrate traditional pathological parameters (TNM stage, grade and ER/PR/HER2 status) with molecular profiling, to better define prognostic subgroups or systemic treatment response. One roadblock is the lack of semi-quantitative methods that reliably measure biomarker expression. Our study assesses the reliability of automated immunohistochemistry (IHC) scoring compared to manual scoring of five selected biomarkers in a tissue microarray (TMA) of 63 human breast cancer cases, and correlates these markers with clinico-pathological data. TMA slides were scanned into an Ariol Imaging System, and histologic (H) scores (% positive tumor area × staining intensity 0–3) were calculated using trained algorithms. H scores for all five biomarkers concurred with pathologists’ scores, based on Pearson correlation coefficients (0.80–0.90) for continuous data and Kappa statistics (0.55–0.92) for positive vs. negative stain. Using continuous data, significant associations of pERK expression with absence of LVI (p = 0.005) and lymph node negativity (p = 0.002) were observed. p53 over-expression, characteristic of dysfunctional p53 in cancer, and Ki67 were associated with high grade (p = 0.032 and 0.0007, respectively). Cyclin D1 correlated inversely with ER/PR/HER2-ve (triple negative) tumors (p = 0.0002). Thus automated quantitation of immunostaining concurs with pathologists’ scoring, and provides meaningful associations with clinico-pathological data.
breast cancer; p53/cyclin D1/Ki67/pERK; tissue microarray; automated image analysis; clinico-pathological parameters
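The H score used above combines staining intensity (0–3) with the percentage of positive tumor area; in its common weighted form it ranges from 0 to 300. A worked example, assuming per-intensity area percentages are already measured:

```python
def h_score(pct_area_by_intensity):
    """Weighted H score: sum over intensity levels 1..3 of
    (percent positive area at that level) x (level). Range 0-300."""
    return sum(level * pct for level, pct in pct_area_by_intensity.items())

# 10% weakly (1+), 30% moderately (2+), 20% strongly (3+) stained tumor area:
print(h_score({1: 10, 2: 30, 3: 20}))  # 1*10 + 2*30 + 3*20 = 130
```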
Summary: High-throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high-quality reference genome and measuring assembly accuracy under one or more scoring metrics.
We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high-quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
Availability: GPL source code and a usage tutorial are available at http://ngopt.googlecode.com
Supplementary information: Supplementary data are available at Bioinformatics online.
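The abstract leaves the scoring metrics unspecified; one widely used contiguity metric (not necessarily among those the tool implements) is N50, sketched below.

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L together cover
    at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

print(n50([100, 80, 60, 40, 20]))  # 80 (100 + 80 covers half of 300)
```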
Recent technological advances, as well as progress in the theoretical understanding of neural systems, have created a need for synthetic spike trains with controlled mean rate and pairwise cross-correlation. This report introduces and analyzes a novel algorithm for the generation of discretized spike trains with arbitrary mean rates and controlled cross-correlation. Pairs of spike trains with any pairwise correlation can be generated, and higher-order correlations are compatible with common synaptic input. Relations between allowable mean rates and correlations within a population are discussed. The algorithm is highly efficient, its complexity increasing linearly with the number of spike trains generated and therefore only as the square root of the number of cross-correlated pairs.
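One simple construction that achieves an exact pairwise correlation c for discretized Bernoulli trains of equal rate p — not necessarily the paper's algorithm — is a per-bin mixture with a shared "common input" source:

```python
import numpy as np

def correlated_pair(p, c, n_bins, rng=np.random.default_rng(0)):
    """Two spike trains with per-bin firing probability p and pairwise
    correlation exactly c: in each bin, with probability c both trains
    copy a shared Bernoulli(p) draw, otherwise they spike independently."""
    shared = rng.random(n_bins) < p
    mix = rng.random(n_bins) < c
    x = np.where(mix, shared, rng.random(n_bins) < p)
    y = np.where(mix, shared, rng.random(n_bins) < p)
    return x, y

x, y = correlated_pair(p=0.1, c=0.3, n_bins=200_000)
print(np.corrcoef(x, y)[0, 1])  # ~0.3
```

The exactness follows from E[XY] = cp + (1−c)p², so the covariance is cp(1−p) and the correlation coefficient is c.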
Motivation: High-throughput single nucleotide polymorphism (SNP) arrays have become the standard platform for linkage and association analyses. The high SNP density of these platforms allows high-resolution identification of ancestral recombination events even for distant relatives many generations apart. However, such inference is sensitive to marker mistyping and current error detection methods rely on the genotyping of additional close relatives. Genotyping algorithms provide a confidence score for each marker call that is currently not integrated in existing methods. There is a need for a model that incorporates this prior information within the standard identical by descent (IBD) and association analyses.
Results: We propose a novel model that incorporates marker confidence scores within IBD methods based on the Lander–Green Hidden Markov Model. The novel parameter of this model is the joint distribution of confidence scores and error status per array. We estimate this probability distribution by applying a modified expectation-maximization (EM) procedure on data from nuclear families genotyped with Affymetrix 250K SNP arrays. The converged tables from two different genotyping algorithms are shown for a wide range of error rates. We demonstrate the efficacy of our method in refining the detection of IBD signals using nuclear pedigrees and distant relatives.
Availability: Plinke, a new version of Plink with an extended pairwise IBD inference model allowing per-marker error probabilities, is freely available at: http://bioinfo.bgu.ac.il/bsu/software/plinke.
Contact: email@example.com; firstname.lastname@example.org
Supplementary Information: Supplementary data are available at Bioinformatics online.
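The core modeling idea — per-marker error probabilities entering the HMM emissions — can be sketched with a scaled forward pass. This is schematic only: the real Lander–Green state space grows with pedigree size, and the mapping from confidence score to error probability is the EM-estimated table described above.

```python
import numpy as np

def forward_loglik(init, trans, lik_true, err):
    """Scaled forward algorithm where marker m has mistyping probability
    err[m] (e.g. mapped from its confidence score). lik_true[m, s] is
    P(observed genotype | state s, no error); a mistyped marker is treated
    as uninformative (likelihood 1 under every state)."""
    emit = (1.0 - err)[:, None] * lik_true + err[:, None]
    alpha = init * emit[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for m in range(1, len(err)):
        alpha = (alpha @ trans) * emit[m]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```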
Motivation: High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis. Algorithms for computing maximal exact matches (MEMs) between sequences appear in two contexts where high-throughput sequencing will vastly increase the volume of sequence data: (i) seeding alignments of high-throughput reads for genome assembly and (ii) designating anchor points for genome–genome comparisons.
Results: We introduce a new algorithm for finding MEMs. The algorithm leverages a sparse suffix array (SA), a text index that stores every K-th position of the text. In contrast to a full text index that stores every position of the text, a sparse SA occupies much less memory. Even though we use a sparse index, the output of our algorithm is the same as that of a full text index algorithm, as long as the space between the indexed suffixes is not greater than the minimum length of a MEM. By relying on partial matches and additional text scanning between indexed positions, the algorithm trades memory for extra computation. The reduced memory usage makes it possible to determine MEMs between significantly longer sequences.
Availability: Source code for the algorithm is available under a BSD open source license at http://compbio.cs.princeton.edu/mems. The implementation can serve as a drop-in replacement for the MEMs algorithm in MUMmer 3.
Supplementary information: Supplementary data are available at Bioinformatics online.
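The key invariant — with a sparse index of every K-th position, any match of length ≥ K must cover at least one indexed text position and so can be recovered by extension — can be shown with a deliberately naive toy. The real algorithm binary-searches the sparse suffix array instead of the brute-force pairing below.

```python
def mems_toy(text, query, k, min_len):
    """All maximal exact matches of length >= min_len (with min_len >= k),
    found by extending outward from every (indexed text position, query
    position) pair by direct text scanning. Quadratic toy, for intuition."""
    assert min_len >= k
    hits = set()
    for i in range(0, len(text), k):          # indexed positions only
        for j in range(len(query)):
            if text[i] != query[j]:
                continue
            left = 0
            while (i - left > 0 and j - left > 0
                   and text[i - left - 1] == query[j - left - 1]):
                left += 1
            right = 1
            while (i + right < len(text) and j + right < len(query)
                   and text[i + right] == query[j + right]):
                right += 1
            if left + right >= min_len:       # the set dedupes repeat finds
                hits.add((i - left, j - left, left + right))
    return hits
```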
Hidden Markov models are widely employed by numerous bioinformatics programs used today. Applications range widely from comparative gene prediction to time-series analyses of micro-array data. The parameters of the underlying models need to be adjusted for specific data sets, for example the genome of a particular species, in order to maximize the prediction accuracy. Computationally efficient algorithms for parameter training are thus key to maximizing the usability of a wide range of bioinformatics applications.
We introduce two computationally efficient training algorithms, one for Viterbi training and one for stochastic expectation maximization (EM) training, which render the memory requirements independent of the sequence length. Unlike the existing algorithms for Viterbi and stochastic EM training which require a two-step procedure, our two new algorithms require only one step and scan the input sequence in only one direction. We also implement these two new algorithms and the already published linear-memory algorithm for EM training into the hidden Markov model compiler HMM-CONVERTER and examine their respective practical merits for three small example models.
Bioinformatics applications employing hidden Markov models can use the two algorithms in order to make Viterbi training and stochastic EM training more computationally efficient. Using these algorithms, parameter training can thus be attempted for more complex models and longer training sequences. The two new algorithms have the added advantage of being easier to implement than the corresponding default algorithms for Viterbi training and stochastic EM training.
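For orientation, the classical two-step Viterbi training loop is sketched below: decode the single best state path, then re-estimate parameters from path counts. Note that the backtrace matrix makes memory grow with sequence length — exactly the dependence the new one-step, single-direction algorithms remove; this textbook version is shown only as the baseline.

```python
import numpy as np

def viterbi_training(obs, init, trans, emit, n_iter=10):
    """Classical two-step Viterbi training: (1) Viterbi-decode the best
    state path under current parameters, (2) re-estimate transition and
    emission probabilities from counts along that path; repeat."""
    n_states, _ = emit.shape
    for _ in range(n_iter):
        lt, le = np.log(trans), np.log(emit)
        v = np.log(init) + le[:, obs[0]]
        back = np.zeros((len(obs), n_states), dtype=int)
        for t in range(1, len(obs)):
            scores = v[:, None] + lt          # scores[i, j]: state i -> j
            back[t] = scores.argmax(axis=0)
            v = scores.max(axis=0) + le[:, obs[t]]
        path = np.empty(len(obs), dtype=int)
        path[-1] = v.argmax()
        for t in range(len(obs) - 1, 0, -1):
            path[t - 1] = back[t, path[t]]
        trans = np.ones_like(trans)           # +1 pseudocounts
        emit = np.ones_like(emit)
        for t in range(len(obs) - 1):
            trans[path[t], path[t + 1]] += 1
        for t in range(len(obs)):
            emit[path[t], obs[t]] += 1
        trans /= trans.sum(axis=1, keepdims=True)
        emit /= emit.sum(axis=1, keepdims=True)
    return trans, emit
```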
DNA microarrays have emerged as a viable platform for detection of pathogenic organisms in clinical and environmental samples. These microbial detection arrays occupy a middle ground between low cost, narrowly focused assays such as multiplex PCR and more expensive, broad-spectrum technologies like high-throughput sequencing. While pathogen detection arrays have been used primarily in a research context, several groups are aggressively working to develop arrays for clinical diagnostics, food safety testing, environmental monitoring and biodefense. Statistical algorithms that can analyze data from microbial detection arrays and provide easily interpretable results are absolutely required in order for these efforts to succeed. In this article, we will review the most promising array designs and analysis algorithms that have been developed to date, comparing their strengths and weaknesses for pathogen detection and discovery.
microarrays; pathogens; genomics
The Gleason score is the single most important prognostic indicator for prostate cancer patients and plays a significant role in treatment planning. Histopathological imaging of prostate tissue samples provides the gold standard for obtaining the Gleason score, but the manual assignment of Gleason grades is a labor-intensive and error-prone process. We have developed a texture classification system for automatic and reproducible Gleason grading. Our system characterizes the texture in images belonging to a tumor grade by clustering extracted filter responses at each pixel into textons (basic texture elements). We have used random forests to cluster the filter responses into textons followed by the spatial pyramid match kernel in conjunction with an SVM classifier. We have demonstrated the efficacy of our system in distinguishing between Gleason grades 3 and 4.
Gleason grading; prostate cancer; texture classification
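A compact sketch of texton-based classification using the classical k-means clustering step (the paper replaces k-means with random forests and additionally applies a spatial pyramid match kernel, both omitted here); the filter bank is a small stand-in.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

N_TEXTONS = 32

def filter_responses(img, sigmas=(1, 2, 4)):
    """Per-pixel responses: Gaussian blur and gradient magnitude at a few
    scales (a small stand-in for a full filter bank)."""
    feats = [ndimage.gaussian_filter(img, s) for s in sigmas]
    feats += [ndimage.gaussian_gradient_magnitude(img, s) for s in sigmas]
    return np.stack(feats, axis=-1).reshape(-1, 2 * len(sigmas))

# 1) learn textons from responses pooled over training images
textons = MiniBatchKMeans(n_clusters=N_TEXTONS)
# textons.fit(np.vstack([filter_responses(im) for im in train_images]))

def texton_histogram(img):
    """Image descriptor: normalized histogram of per-pixel texton labels."""
    hist = np.bincount(textons.predict(filter_responses(img)),
                       minlength=N_TEXTONS).astype(float)
    return hist / hist.sum()

# 2) grade 3 vs. grade 4 from the histograms
# svm = SVC(kernel="linear").fit(
#         [texton_histogram(im) for im in train_images], grades)
```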
Currently, measuring ethanol behaviors in flies depends on expensive image analysis software or time-intensive experimenter observation. We have designed an automated system for the collection and analysis of locomotor behavior data, using the IEEE 1394 acquisition program dvgrab, the image toolkit ImageMagick and the programming language Perl. In the proposed method, flies are placed in a clear container and a computer-controlled camera takes pictures at regular intervals. Digital subtraction removes the background and non-moving flies, leaving white pixels where movement has occurred. These pixels are tallied, giving a value that corresponds to the number of animals that have moved between images. Perl scripts automate these processes, allowing compatibility with high-throughput genetic screens. Four experiments demonstrate the utility of this method: the first showing heat-induced locomotor changes, the second showing tolerance to ethanol in a climbing assay, the third showing tolerance to ethanol by scoring the recovery of individual flies, and the fourth showing a mouse’s preference for a novel object. Our lab will use this method to conduct a genetic screen for ethanol-induced hyperactivity and sedation; however, it could also be used to analyze the locomotor behavior of any organism.
ethanol; Drosophila melanogaster; tolerance; locomotor behavior; movement analysis; genetic screen; mammal
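The scoring step is simple enough to show directly. Below is a Python stand-in for the ImageMagick/Perl pipeline: subtract consecutive frames, threshold, and tally the changed pixels; the threshold value is an assumption.

```python
import numpy as np
from PIL import Image

def movement_score(frame_a_path, frame_b_path, threshold=30):
    """Count pixels that changed between two frames; digital subtraction
    removes the background and non-moving flies, so the tally tracks how
    many animals moved between images."""
    a = np.asarray(Image.open(frame_a_path).convert("L"), dtype=np.int16)
    b = np.asarray(Image.open(frame_b_path).convert("L"), dtype=np.int16)
    return int((np.abs(a - b) > threshold).sum())
```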
A Content-Based Retrieval Architecture (COBRA) for picture archiving and communication systems (PACS) is introduced. COBRA improves the diagnosis, research, and training capabilities of PACS by adding retrieval-by-content features to those systems. COBRA is an open architecture based on widely used health care and technology standards. In addition to regular PACS components, COBRA includes additional components to handle representation, storage, and content-based similarity retrieval. Within COBRA, an anatomy classification algorithm is introduced to automatically classify PACS studies based on their anatomy. Such a classification allows the use of different segmentation and image-processing algorithms for different anatomies. COBRA uses primitive retrieval criteria such as color, texture, shape, and more complex criteria including object-based spatial relations and regions of interest. A prototype content-based retrieval system for MR brain images was developed to illustrate the concepts introduced in COBRA.
content-based image retrieval; medical image databases; medical information system; picture archiving and communication systems; information retrieval
One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of samples available for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso; our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to a theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to selecting more meaningful features than those commonly used in the clinic. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm has been released as FeaLect, a documented R package in CRAN.
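In spirit, the bootstrap-and-combine scheme can be sketched as below, scoring each feature by its selection frequency across bootstrap Lasso fits; FeaLect's actual score, computed from the full Lasso regularization path, is richer than this frequency count, and the alpha and cutoff are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bootstrap_feature_scores(X, y, n_boot=100, alpha=0.1,
                             rng=np.random.default_rng(0)):
    """Fraction of bootstrap samples in which the Lasso gives each feature
    a nonzero coefficient -- a simple surrogate for FeaLect's scoring."""
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # bootstrap resample
        scores += Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_ != 0
    return scores / n_boot

# keep features selected in, say, more than 80% of the bootstrap runs
# selected = np.where(bootstrap_feature_scores(X, y) > 0.8)[0]
```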
In the auditory system, the stimulus-response properties of single neurons are often described in terms of the spectrotemporal receptive field (STRF), a linear kernel relating the spectrogram of the sound stimulus to the instantaneous firing rate of the neuron. Several algorithms have been used to estimate STRFs from responses to natural stimuli; these algorithms differ in their functional models, cost functions, and regularization methods. Here, we characterize the stimulus-response function of auditory neurons using a generalized linear model (GLM). In this model, each cell's input is described by: 1) a stimulus filter (STRF); and 2) a post-spike filter, which captures dependencies on the neuron's spiking history. The output of the model is given by a series of spike trains rather than instantaneous firing rate, allowing the prediction of spike train responses to novel stimuli. We fit the model by maximum penalized likelihood to the spiking activity of zebra finch auditory midbrain neurons in response to conspecific vocalizations (songs) and modulation limited (ml) noise. We compare this model to normalized reverse correlation (NRC), the traditional method for STRF estimation, in terms of predictive power and the basic tuning properties of the estimated STRFs. We find that a GLM with a sparse prior predicts novel responses to both stimulus classes significantly better than NRC. Importantly, we find that STRFs from the two models derived from the same responses can differ substantially and that GLM STRFs are more consistent between stimulus classes than NRC STRFs. These results suggest that a GLM with a sparse prior provides a more accurate characterization of spectrotemporal tuning than does the NRC method when responses to complex sounds are studied in these neurons.
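The structure of such a GLM — stimulus filter plus post-spike filter driving binned spike responses — can be sketched with a logistic approximation; the paper maximizes a penalized point-process likelihood rather than this surrogate, and the lag counts and one-dimensional stimulus are simplifications of the spectrogram input.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def glm_design(stim, spikes, n_stim_lags=20, n_hist_lags=10):
    """Design matrix rows: recent stimulus history (the STRF part, 1-D here
    for brevity) concatenated with the cell's own recent spike history
    (post-spike filter). Targets are the binned spikes."""
    start = max(n_stim_lags, n_hist_lags)
    rows = [np.concatenate([stim[t - n_stim_lags:t],
                            spikes[t - n_hist_lags:t]])
            for t in range(start, len(spikes))]
    return np.array(rows), spikes[start:]

# the L1 penalty plays the role of the sparse prior
# X, y = glm_design(stim, spikes)
# glm = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
# strf, post_spike = glm.coef_[0][:20], glm.coef_[0][20:]
```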
High-throughput microarray-based single nucleotide polymorphism (SNP) genotyping has revolutionized the way genome-wide linkage scans and association analyses are performed. One of the key features of the array-based GeneChip® Mapping 10K Array from Affymetrix is the automated SNP calling algorithm. The Affymetrix algorithm was trained on a database of ethnically diverse DNA samples to create SNP call zones that are used as static models to make genotype calls for experimental data. We describe here the implementation of clustering algorithms on large training datasets resulting in improved SNP call rates on the 10K GeneChip.
A database of 948 individuals genotyped on the GeneChip® Mapping 10K 2.0 Array was used to identify 822 SNPs that were called consistently less than 75% of the time. These SNPs represent on average 8.25% of the total SNPs on each chromosome with chromosome 19, the most gene-rich chromosome, containing the highest proportion of poor performers (18.7%). To remedy this, we created SNiPer, a new application which uses two clustering algorithms to yield increased call rates and equivalent concordance to Affymetrix called genotypes. We include a training set for these algorithms based on individual genotypes for 705 samples. SNiPer has the capability to be retrained for lab-specific training sets. SNiPer is freely available for download at .
The correct calling of poor performing SNPs may prove to be key in future linkage studies performed on the 10K GeneChip. It would prove particularly invaluable for those diseases that map to chromosome 19, known to contain a high proportion of poorly performing SNPs. Our results illustrate that SNiPer can be used to increase call rates on the 10K GeneChip® without sacrificing accuracy, thereby increasing the amount of valid data generated.
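The clustering idea behind such callers can be sketched for one SNP: project the two allele intensities onto a contrast coordinate and fit three clusters for the AA/AB/BB genotypes. SNiPer's two specific algorithms and its 705-sample training set are not reproduced here; a Gaussian mixture stands in.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def call_genotypes(a_signal, b_signal):
    """Cluster per-sample allele intensities of one SNP into three genotype
    groups; returns 0/1/2 = BB/AB/AA, ordered by allele-contrast mean."""
    contrast = ((a_signal - b_signal) / (a_signal + b_signal)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=3, random_state=0).fit(contrast)
    labels = gmm.predict(contrast)
    ranks = np.argsort(np.argsort(gmm.means_.ravel()))  # cluster -> rank
    return ranks[labels]
```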
An algorithm was developed to statistically predict ischemic tissue fate on a pixel-by-pixel basis. Quantitative high-resolution (200 × 200 µm) cerebral blood flow (CBF) and apparent diffusion coefficient (ADC) were measured on acute stroke rats subjected to permanent middle cerebral artery occlusion and an automated clustering (ISODATA) technique was used to classify ischemic tissue types. Probability and probability density profiles were derived from a training data set (n = 6) and probability maps of risk of subsequent infarction were computed in another group of animals (n = 6) as ischemia progressed. Predictions were applied to overall tissue fate. Performance measures (sensitivity, specificity, and receiver operating characteristic) showed that prediction made based on combined ADC + CBF data outperformed those based on ADC or CBF data alone. At the optimal operating points, combined ADC + CBF predicted tissue infarction with 86%±4% sensitivity and 89%±6% specificity. More importantly, probability of infarct (PI) for different ISODATA-derived ischemic tissue types were also computed: (1) For the ‘normal’ cluster in the ischemic right hemisphere, PI based on combined ADC + CBF data (PI[ADC + CBF]) accurately reflected tissue fate, whereas PI[ADC] and PI[CBF] overestimated infarct probability. (2) For the ‘perfusion–diffusion mismatch’ cluster, PI[ADC + CBF] accurately predicted tissue fate, whereas PI[ADC] underestimated and PI[CBF] overestimated infarct probability. (3) For the core cluster, PI[ADC + CBF], PI[ADC], and PI[CBF] prediction were high and similar (~90%). This study shows an algorithm to statistically predict overall, normal, ischemic core, and ‘penumbral’ tissue fate using early quantitative perfusion and diffusion information. It is suggested that this approach can be applied to stroke patients in a computationally inexpensive manner.
DWI; multispectral analysis; penumbra; perfusion–diffusion mismatch; PWI; viability thresholds
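The pixel-wise prediction step admits a simple Bayesian sketch: estimate class-conditional densities of (ADC, CBF) from training animals and apply Bayes' rule per pixel. The paper derives its probability profiles from training data differently; the KDE bandwidth and prior here are assumptions, and features should be standardized first.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def infarct_probability_map(adc, cbf, train_feats, train_infarcted,
                            prior=0.5):
    """Pixel-by-pixel probability of infarction from combined ADC + CBF.
    train_feats: (n, 2) standardized training pixels; train_infarcted: bool
    array marking which training pixels went on to infarct."""
    kde_pos = KernelDensity(bandwidth=0.2).fit(train_feats[train_infarcted])
    kde_neg = KernelDensity(bandwidth=0.2).fit(train_feats[~train_infarcted])
    feats = np.column_stack([adc.ravel(), cbf.ravel()])
    p_pos = np.exp(kde_pos.score_samples(feats)) * prior
    p_neg = np.exp(kde_neg.score_samples(feats)) * (1 - prior)
    return (p_pos / (p_pos + p_neg)).reshape(adc.shape)
```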
A methodology based on fuzzy set theory and the convolutional neural network (CNN) architecture is proposed to tackle the problem of reducing the false-positive rate in automatic lung nodule detection. The CNN, which simulates the human visual mechanism, was trained by a supervised back-propagation algorithm based on fuzzy membership functions. The training and testing database consists of image blocks (each 32 × 32 pixels) of suspected lung nodule areas (nodule candidates) which were generated by our pre-scanning program. A linguistic label was assigned to each nodule candidate of the training set; the label was then converted to a membership value through a pre-defined membership function and used as the teaching signal (desired output) during network learning. Before a nodule candidate was fed to the network input, it was pre-processed to reduce the complex background noise and the contrast discrepancy resulting from film development. During the network testing phase, a defuzzification process was applied to decipher the trained network's output triggered by the nodule candidates in the testing set. Finally, a receiver operating characteristic (ROC) analysis was used to evaluate the CNN's performance based on the defuzzified output of the testing database. Preliminary results showed an average Az (the performance index) of 0.84, which is equivalent to 0.80 true-positive detection (sensitivity) with an average of 2–3 false-positive detections per chest image.
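The fuzzification of the teaching signal and the final defuzzification can be sketched as below; the specific linguistic labels and membership values are hypothetical, not those of the paper.

```python
MEMBERSHIP = {  # hypothetical linguistic labels -> teaching signal in [0, 1]
    "definitely_nodule": 0.95, "probably_nodule": 0.75,
    "indeterminate": 0.50, "probably_not": 0.25, "definitely_not": 0.05,
}

def teaching_signal(label):
    """Desired network output for a labeled nodule candidate."""
    return MEMBERSHIP[label]

def defuzzify(network_output, threshold=0.5):
    """Crisp decision from the trained network's output at test time."""
    return "nodule" if network_output >= threshold else "non-nodule"
```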
Identification of prostatic calculi is an important basis for determining the tissue of origin. Computer-assisted diagnosis of prostatic calculi has promising potential but remains understudied. We studied the extraction of prostatic lumina and the automated recognition of calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu thresholding; recognition used a PCA-SVM classifier based on the texture features of prostatic calculi. The SVM classifier showed an average runtime of 0.1432 seconds, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We conclude that the algorithm, based on texture features and PCA-SVM, can readily recognize the concentric structure and visual features of calculi. Therefore, this method is effective for the automated recognition of prostatic calculi.
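The entropy-plus-Otsu extraction step maps directly onto standard image-processing primitives; a minimal sketch with scikit-image follows, where the neighborhood radius is an assumption and lumina are taken to be the smooth, low-entropy regions.

```python
from skimage.filters import threshold_otsu
from skimage.filters.rank import entropy
from skimage.morphology import disk
from skimage.util import img_as_ubyte

def lumen_mask(gray_img):
    """Candidate lumina: local entropy in a small disk, binarized with
    Otsu's threshold; keep the low-entropy (smooth) side."""
    ent = entropy(img_as_ubyte(gray_img), disk(5))
    return ent < threshold_otsu(ent)
```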
Follicular Lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL is based on the review of histopathological tissue sections under a microscope and is influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist’s disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, the use of these algorithms for analyzing the entire whole slide image requires significant changes to the processing methodology since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing these challenges. We discuss the use of parallel computing tools on commodity clusters and compare performance of the serial and parallel implementations of our approach.
Follicular Lymphoma; Immunohistochemical; CD10; K-means; Parallel Computing
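Since follicle detection is independent per region, tiling makes the computation embarrassingly parallel. A minimal sketch of the tile-level decomposition follows; read_region and detect_follicles are hypothetical stand-ins for the slide I/O (e.g. an OpenSlide reader) and the group's analysis routine, and overlap handling for follicles spanning tile borders is omitted.

```python
from multiprocessing import Pool

TILE = 4096  # tile side in pixels; each tile must fit in worker memory

def process_tile(xy):
    """Run the follicle pipeline on one tile of the whole-slide image."""
    x, y = xy
    tile = read_region(x, y, TILE, TILE)     # hypothetical tile reader
    return x, y, detect_follicles(tile)      # hypothetical analysis step

def analyze_slide(width, height, n_workers=16):
    coords = [(x, y) for x in range(0, width, TILE)
                     for y in range(0, height, TILE)]
    with Pool(n_workers) as pool:            # parallel over independent tiles
        return pool.map(process_tile, coords)
```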
Pixel-size limitation of lensfree on-chip microscopy can be circumvented by utilizing pixel-super-resolution techniques to synthesize a smaller effective pixel, improving the resolution. Here we report that by using the two-dimensional pixel-function of an image sensor-array as an input to lensfree image reconstruction, pixel-super-resolution can improve the numerical aperture of the reconstructed image by ~3 fold compared to a raw lensfree image. This improvement was confirmed using two different sensor-arrays that significantly vary in their pixel-sizes, circuit architectures and digital/optical readout mechanisms, empirically pointing to roughly the same space-bandwidth improvement factor regardless of the sensor-array employed in our set-up. Furthermore, such a pixel-count increase also renders our on-chip microscope into a Giga-pixel imager, where an effective pixel count of ~1.6–2.5 billion can be obtained with different sensors. Finally, using an ultra-violet light-emitting-diode, this platform resolves 225 nm grating lines and can be useful for wide-field on-chip imaging of nano-scale objects, e.g., multi-walled-carbon-nanotubes.
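For intuition only, the synthesis of a smaller effective pixel can be shown with naive shift-and-add: sub-pixel-shifted low-resolution frames are placed on a denser grid. The reported method instead solves a reconstruction that uses the measured two-dimensional pixel function, which this sketch does not attempt.

```python
import numpy as np

def shift_and_add(frames, shifts, factor):
    """Naive pixel super-resolution: place sub-pixel-shifted low-resolution
    frames onto a grid 'factor' times denser and average overlapping
    contributions. shifts are (dy, dx) in low-resolution pixel units."""
    h, w = frames[0].shape
    hi = np.zeros((h * factor, w * factor))
    weight = np.zeros_like(hi)
    for frame, (dy, dx) in zip(frames, shifts):
        oy = int(round(dy * factor)) % factor
        ox = int(round(dx * factor)) % factor
        hi[oy::factor, ox::factor] += frame
        weight[oy::factor, ox::factor] += 1
    return hi / np.maximum(weight, 1)
```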
We introduce GenRev, a network-based software package developed to explore the functional relevance of genes generated as an intermediate result from numerous high-throughput technologies. GenRev searches for optimal intermediate nodes (genes) for the connection of input nodes via several algorithms, including the Klein-Ravi algorithm, the limited kWalks algorithm and a heuristic local search algorithm. Gene ranking and graph clustering analyses are integrated into the package. GenRev has the following features. (1) It provides users with great flexibility to define their own networks. (2) Users are allowed to define each gene’s importance in a subnetwork search by setting its score. (3) It is standalone and platform independent. (4) It provides an optimization in subnetwork search, which dramatically reduces the running time. GenRev is particularly designed for general use so that users have the flexibility to choose a reference network and define the score of genes. GenRev is freely available at http://bioinfo.mc.vanderbilt.edu/GenRev.html.
Gene ranking; Network; Subnetwork; Klein-Ravi algorithm; limited kWalks algorithm; Disease genes
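To illustrate the task GenRev addresses — connecting input genes through intermediate network nodes — here is a much simpler shortest-path heuristic using networkx; it is not the Klein-Ravi, kWalks, or local-search algorithm shipped in the package.

```python
import networkx as nx

def connect_seed_genes(network, seeds):
    """Connect input (seed) genes through intermediate nodes by merging
    pairwise shortest paths; returns the induced subnetwork."""
    nodes = set(seeds)
    for i, s in enumerate(seeds):
        for t in seeds[i + 1:]:
            if nx.has_path(network, s, t):
                nodes.update(nx.shortest_path(network, s, t))
    return network.subgraph(nodes)
```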
The positron emission tomography (PET) imaging technique enables the measurement of receptor distribution or neurotransmitter release in the living brain, and of the changes in that distribution with time, and thus allows quantification of binding sites as well as of the affinity of a radioligand. However, quantification of receptor binding studies obtained with PET is complicated by tissue heterogeneity in the sampling image elements (i.e., voxels, pixels). This effect is caused by the limited spatial resolution of the PET scanner. Spatial heterogeneity is often essential to understanding the underlying receptor binding process. Tracer kinetic modeling also often requires the invasive collection of arterial blood samples. In this paper, we propose a likelihood-based framework in the voxel domain for quantitative imaging with or without blood sampling of the input function. Radioligand kinetic parameters are estimated together with the input function. The parameters are initialized by a subspace-based algorithm and further refined by an iterative likelihood-based estimation procedure. The performance of the proposed scheme is examined by simulations. The results show that the proposed scheme provides reliable estimation of factor time-activity curves (TACs) and the underlying parametric images. A good match is noted between the result of the proposed approach and that of the Logan plot. Real brain PET data are also examined, and good performance is observed in determining the TACs and the underlying factor images.
Brain receptor study; compartmental model; distribution volume; dynamic imaging; likelihood; PET; tracer kinetic modeling; voxel-domain quantitative imaging
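The voxel-level kinetics being estimated can be illustrated with the standard one-tissue compartment model, C_T(t) = K1 · (C_p ⊗ e^{−k2 t}), fitted by least squares; the paper additionally estimates the input function jointly and initializes with a subspace method, neither of which is shown. A uniform time grid is assumed for brevity (real PET frames are nonuniform).

```python
import numpy as np
from scipy.optimize import curve_fit

def one_tissue_tac(t, k1, k2, cp):
    """One-tissue compartment model evaluated by discrete convolution of
    the input function cp with an exponential on a uniform grid t."""
    dt = t[1] - t[0]
    return k1 * np.convolve(cp, np.exp(-k2 * t))[:len(t)] * dt

# fit K1, k2 of one voxel's TAC given the input function cp
# popt, _ = curve_fit(lambda tt, k1, k2: one_tissue_tac(tt, k1, k2, cp),
#                     t, voxel_tac, p0=[0.1, 0.1])
# distribution_volume = popt[0] / popt[1]   # K1/k2 for this model
```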
Description: VARNA is a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases.
Features: VARNA implements four drawing algorithms, supports input/output using the classic formats dbn, ct, bpseq and RNAML, and exports the drawing in five picture formats, either pixel-based (JPEG, PNG) or vector-based (SVG, EPS and XFIG). It also allows manual modification and structural annotation of the resulting drawing using an interactive point-and-click approach, within a web server, or through command-line arguments.
Availability: VARNA is free software, released under the terms of the GPLv3.0 license and available at http://varna.lri.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
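Of the input formats listed above, dbn (dot-bracket notation) is the simplest; a minimal parser recovering the base-pair list follows, ignoring the extra bracket types used for pseudoknots.

```python
def parse_dbn(structure):
    """Base pairs from dot-bracket notation: match '(' and ')' with a
    stack; pseudoknot brackets and unpaired dots are ignored here."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

print(parse_dbn("((..))."))  # [(1, 4), (0, 5)]
```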
We describe a protocol for fully automated detection and segmentation of asymmetric, presumed excitatory, synapses in serial electron microscopy images of the adult mammalian cerebral cortex, taken with a focused ion beam scanning electron microscope (FIB/SEM). The procedure is based on interactive machine learning and requires only a few labeled synapses for training. The statistical learning is performed on geometrical features of the 3D neighborhood of each voxel and can fully exploit the high z-resolution of the data. On a quantitative validation dataset of 111 synapses in 409 images of 1948×1342 pixels, with manual annotations by three independent experts, the error rate of the algorithm was found to be comparable to that of the experts (0.92 recall at 0.89 precision). Our software offers a convenient interface for labeling the training data and the possibility to visualize and proofread the results in 3D. The source code, the test dataset and the ground truth annotation are freely available at http://www.ilastik.org/synapse-detection.
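The voxel-wise learning step can be sketched with generic 3D features and a random forest; this mirrors the ilastik-style workflow in outline only, and the feature set and scales below are assumptions rather than the paper's geometrical features.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def voxel_features(volume, sigmas=(1.0, 2.0)):
    """Per-voxel features on the 3D stack: Gaussian smoothing, gradient
    magnitude and Laplacian at a few scales, computed in full 3D so the
    high z-resolution contributes to every feature."""
    feats = []
    for s in sigmas:
        feats.append(ndimage.gaussian_filter(volume, s))
        feats.append(ndimage.gaussian_gradient_magnitude(volume, s))
        feats.append(ndimage.gaussian_laplace(volume, s))
    return np.stack(feats, axis=-1).reshape(-1, 3 * len(sigmas))

# interactive loop: the user labels a few synapse/non-synapse voxels, the
# forest retrains, and predictions are proofread and corrected in 3D
# rf = RandomForestClassifier(n_estimators=100).fit(
#         voxel_features(vol)[labeled_idx], labels)
```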
A novel wildfire segmentation algorithm is proposed based on sample-trained 2D histogram θ-division and the minimum-error principle. θ-division methods based on the minimum-error principle and 2D color histograms were presented recently, but the application of prior knowledge to them has not been explored. For the specific problem of wildfire segmentation, we collect sample images with manually labeled fire pixels. We then define a probability function of division error to evaluate θ-division segmentations, and the optimal angle θ is determined by sample training. Performance in different color channels is compared, and the most suitable channel is selected. To further improve accuracy, a combination approach is presented that couples θ-division with other segmentation methods such as Gaussian mixture models (GMMs). Our approach is tested on real images, and the experiments demonstrate its effectiveness for wildfire segmentation.
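A simplified reading of the sample-training step is sketched below: sweep candidate angles θ, project each labeled pixel's two color coordinates onto that direction, and keep the angle with the fewest misclassified training pixels. The paper's probability-of-error criterion over the 2D histogram is richer than this direct error count, and the midpoint threshold is an assumption.

```python
import numpy as np

def train_theta(u, v, is_fire, n_angles=180):
    """Pick the dividing angle theta by sample training: project the 2-D
    color coordinates (u, v) of each labeled pixel onto direction theta,
    threshold at the midpoint of the class means, and keep the theta with
    the fewest misclassified training pixels (either polarity)."""
    best_theta, best_err = None, np.inf
    for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
        proj = u * np.cos(theta) + v * np.sin(theta)
        thr = (proj[is_fire].mean() + proj[~is_fire].mean()) / 2
        err = min(((proj > thr) != is_fire).sum(),
                  ((proj < thr) != is_fire).sum())
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta
```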