The Drosophila brain is formed by an invariant set of lineages, each of which is derived from a unique neural stem cell (neuroblast) and forms a genetic and structural unit of the brain. The task of reconstructing brain circuitry at the level of individual neurons can be made significantly easier by assigning neurons to their respective lineages. In this paper we address the automatization of neuron and lineage identification. We focused on the Drosophila brain lineages at the larval stage, when they form easily recognizable secondary axon tracts (SATs) that were previously partially characterized. We generated an annotated digital database containing all lineage tracts reconstructed from five registered wild-type brains, at higher resolution and including some that were previously not characterized. We developed a method for SAT structural comparisons based on a dynamic programming approach akin to nucleotide sequence alignment, and a machine learning classifier trained on the annotated database of reference SATs. We quantified the stereotypy of SATs by measuring the residual variability of aligned wild-type SATs. Next, we employed our method for the identification of SATs within wild-type larval brains, and found it highly accurate (93–99 %). The method proved highly robust for the identification of lineages in mutant brains, and in brains that differed in developmental time or labeling. We describe for the first time an algorithm that quantifies neuronal projection stereotypy in the Drosophila brain, and use the algorithm for automatic neuron and lineage recognition.
Access to molecular-genetic tools and the ability to reliably identify specific neurons across multiple individuals make the Drosophila brain a powerful model system for dissecting neuronal function and development (Jefferis et al., 2001; Urbach and Technau, 2004; Yu, 2009). The nervous system of Drosophila, and of insects in general, is formed by a relatively small number of genetically and structurally defined neural lineages. Each contains about 100–150 neurons that are produced by a single precursor cell, the neuroblast.
Neurons, neuronal lineages and compartments are highly stereotyped throughout all developmental stages of Drosophila. Neuroblasts undergo two proliferative phases. During embryonic development, each neuroblast produces 10–20 primary neurons. The axons of these neurons fasciculate with those of their sisters, forming primary axon tracts (PATs) that are recognizable across brains (Nassif et al., 1998). During the larval period, a second proliferative phase generates the secondary neurons (Ito and Hotta, 1992). These remain clustered and extend a cohesive bundle, the secondary axon tract (SAT; fig. 1) (Dumstrei et al., 2003; Pereanu and Hartenstein, 2006). Most SATs follow the corresponding PAT across the neuropile formed by the arborizations of primary neurons (Larsen et al., 2009). Secondary neurons generally do not differentiate further until pupal stages (Nassif et al., 2003). At that point they extend widespread terminal arborizations which, together with the remodeled arbors of surviving primary neurons, generate the adult brain neuropile. Neuronal lineages represent genetic (Urbach and Technau, 2004) and structural (Ito et al., 1997) modules of brain organization and provide a natural neuron-grouping system. An individual neuron can be mapped to its lineage by the spatial pattern of its axon tract. Currently, this mapping is computer-assisted manual labor in which expert neuroanatomists make the crucial structure recognition decisions relative to published descriptions. Such manual analysis is very laborious and requires substantial training. Annotating and mapping each of the more than 100,000 neurons of the adult fly brain will only be practical with semi-automated methods. These would improve identification speed and allow quantitative measurement of annotation reliability.
Our goal is to automatize the task of identifying neuronal lineages in the Drosophila brain. We have developed the Neurite Identification Tool (NIT) that takes as input a traced three-dimensional neurite and a set of fiduciary landmarks, and identifies its corresponding lineage by automated comparison with a reference library of manually annotated SATs. NIT performs pairwise global sequence alignment between the input trace and each reference annotated trace, and measures multiple shape- and Euclidean distance-based parameters using the correspondences. A machine learning classifier trained with manual annotations labels each match between the query SAT and each reference SAT as correct or incorrect.
We have measured the robustness of NIT for SAT identification on mutants, on brains with shorter and longer SAT traces than the reference ones, and across developmental time, with satisfactory results. The accuracy of the recognition surpasses 99 % for wild-type brains. Our algorithm can be extended to all developmental stages, and can form the foundation of mapping individual neurons to their parent lineages.
The five brains used for the generation of the SAT reference library were:
- three brains from a flip-out screen with wild-type clones (hs-FLP, elav-Gal4, UAS-mCD8-GFP/FM7c; FRT42D tub-Gal80/Cyo; Fung et al., 2009);
- an Oregon R brain carrying en-Gal4/+;UAS-cd8GFP/+ (Kumar et al., 2008);
- the 3rd instar brain on which the lineage nomenclature was established (Cha-Gal4,UAS-GFP; stock number 6793 from Bloomington Drosophila Stock Center; Pereanu and Hartenstein, 2006).

To generate secondary lineage clones, we applied the FLP/FRT technique (Ito et al., 1997) to induce GFP-labeled clones in early larval brains, as detailed previously (Pereanu and Hartenstein, 2006; Fung et al., 2009).
The larval secondary neurons were labeled with an antibody against the Neurotactin protein (BP106; Developmental Studies Hybridoma Bank), or an antibody against the Drosophila N-Cadherin protein (DN-Ex#8, Developmental Studies Hybridoma Bank). Apoptotic cells were labeled with anti-cleaved Caspase-3 antibody (Cell Signaling Technology, Cat #9661S). Glial cells were labeled with an antibody against the Repo protein (8D12; Developmental Studies Hybridoma Bank). Antibody staining of brains was performed as described in (Pereanu and Hartenstein, 2006). Confocal image stacks were acquired using a confocal microscope (40× objective; Laser sharp 2000 from Bio-Rad, Hercules, CA), with 1 or 2 μm section intervals.
The software and documentation are available at http://t2.ini.uzh.ch/nit/. The Neurite Identification Tool (NIT) algorithms have been implemented in Clojure (http://clojure.org) as a TrakEM2 component (http://t2.ini.uzh.ch/trakem2.html). NIT is released under the General Public License (Free Software Foundation; http://www.gnu.org/licenses/gpl-3.0.txt). Semiautomatic secondary axon tract tracing was implemented using the Simple Neurite Tracer library (Mark Longair; http://homepages.inf.ed.ac.uk/s9808248/imagej/tracer/). Three-dimensional visualization was built using the 3D Viewer library (Bene Schmid; http://www.neurofly.de). Registration used a high-performance all-purpose registration library by Stephan Saalfeld (http://fly.mpi-cbg.de/~saalfeld/Projects/). All components are distributed as part of Fiji (http://pacific.mpi-cbg.de), a scientific image processing application based on ImageJ (http://rsb.info.nih.gov/ij).
Secondary axon tracts (SATs) are mostly unbranched in the 3rd instar larva, presenting at most a single branch point (Pereanu and Hartenstein, 2006). With sparse labeling, such as with the anti-neurotactin antibody or with targeted GFP expression, SATs are recognizable in the 3rd instar brain as thick processes. They traverse the unlabeled primary neuron cell body clusters centripetally and terminate in the neuropile of primary neuron arborizations (fig. 2).
While identifying a SAT as such is simple, assigning it a precise identity in relation to published descriptions and nomenclature (Pereanu and Hartenstein, 2006) is extremely laborious and error-prone. Currently, an experienced Drosophila neuroanatomist requires between 2 and 5 days to identify all secondary lineages in a neurotactin-labeled third instar brain hemisphere (2.5 to 6.25 lineages per hour). The type II lineages (Bello et al., 2008) are particularly hard. Examples are those in the CM and DPM groups, each of which presents between 4 and 8 SATs.
The automatization of neuronal identification has only just begun, with the creation of databases of neuronal morphology to which researchers can deposit traced neurons (Ascoli et al., 2007). Numerous methods for quantitative neuroanatomy are available, such as:
- bouton spatial distribution analysis (Sholl analysis; Sholl, 1953; Condron, 2008);
- tree morphology (van Pelt et al., 1992);
- estimation of synaptic density (Geinisman et al., 1996).

Developments in computer vision and image processing are driving the automatization of neurite tracing (the digitization of the three-dimensional trajectory of a neurite) and of bouton and synapse recognition. This applies to both light microscopy (Weaver et al., 2004; Schmitt et al., 2004; Palhares Viana et al., 2009) and electron microscopy (Macke et al., 2008; Mishchenko, 2009; Jurrus et al., 2009), and substantially reduces manual labor. Quantitative analysis of neuronal structures usually takes the form of analyzing dendritic and axonal arbors with tree-edit-distance algorithms (Heumann and Wittum, 2009). The tree-edit distance is the minimum number of edit operations, such as adding, shifting or removing branches, needed to transform one tree into another. These analyses serve to study the spatial distribution of synapses (Weaver et al., 2004) and to classify neuron types (Ascoli et al., 2009).
There have been several attempts to identify elements of fly brains based on morphological features, and to quantify their stereotypy and variability, across multiple brain samples. These studies focused on three types of structures: volumes (such as Drosophila brain neuropile compartments; see Jenett et al., 2006), points (such as neuronal cell bodies in the ventral nerve cord, see Bossing et al., 1996 and Schmidt et al., 1997), and paths or arborizations (such as ORNs, or olfactory receptor neurons; see Brandt et al., 2005; Jefferis et al., 2007).
For approximately isometric volumes like neuropile compartments, a simple measure of their relative center of mass may suffice for identification. The measurement of their volume and relative location may suffice for a rough quantification of their variability (Jenett et al., 2006). However, SATs are essentially linear structures winding through three-dimensional space, for which there is no obvious spatial center. Even small differences in the starting and ending points, and in the length, may alter any center-of-mass-like measurement significantly. Because of their linear (rather than volumetric) nature, multiple SATs may be confined to the same enclosing volume and have nearly identical centers of mass. Some SATs may occupy nearly the exact same neuropile space, and yet project in opposite directions (for example, BAmv3 and DPMpm1:2; suppl. fig. 1). For this reason, precise image volume registration alone may be insufficient for identifying lineages or neuronal structures in the Drosophila brain, despite its high level of stereotypy.

One may argue that the center of mass of a volume is relevant because it approximates a homologous point or region of that volume across multiple brains. Following this idea, we searched for homologous points on linear objects like SATs. For this purpose, we trace and represent SATs as sequences of points in space. We then measure relative position (shape) and absolute position (distance) properties of these sequences relative to SAT traces from reference annotated brains. The measurement is performed on resampled SAT traces that have been aligned using their direction vector sequence representation (fig. 4; see below).
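The failure of a center-of-mass measure for linear structures can be illustrated with a toy sketch in Python with NumPy. The traces below are hypothetical straight segments, not real SATs. The second is the first traversed in the opposite direction, as for two SATs that share neuropile space but project in opposite directions. Their centroids coincide, whereas their direction vectors differ maximally.

```python
import numpy as np

def centroid(trace):
    """Center of mass of a trace given as an (n, 3) array of points."""
    return trace.mean(axis=0)

def direction_vectors(trace):
    """Vectors between consecutive points; they encode where the trace is heading."""
    return np.diff(trace, axis=0)

# Hypothetical trace: a straight segment along x with unit steps.
forward = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
# The same points traversed in the opposite direction.
backward = forward[::-1]

same_centroid = np.allclose(centroid(forward), centroid(backward))
# Summed length of the element-wise difference vectors (no insertions or deletions).
direction_mismatch = np.linalg.norm(
    direction_vectors(forward) - direction_vectors(backward), axis=1
).sum()
print(same_centroid, direction_mismatch)
```

Each of the three vector pairs contributes the maximal mismatch of 2 for unit-length steps, so a direction-based comparison separates the two traces even though their centroids are identical.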
The first step in the identification of a SAT is its representation as a sequence of points in space. For this purpose, we have created a simple manual method: using semi-3D Bézier curves (Bézier curves whose points have a linearly-interpolated Z coordinate), a human operator can segment a SAT in well under a minute, approximating the skeleton of the SAT (fig. 2). As an alternative and as part of an effort towards automatization, we also use a semi-automated tracing method (Simple Neurite Tracer by Mark Longair) that automatically generates the most likely 3D path between user-defined starting and ending points as a sequence of sampled points in space.
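A "semi-3D" Bézier curve, as described above, can be sketched as follows. The XY trajectory is evaluated with de Casteljau's algorithm, and Z is linearly interpolated between the curve's end points. This is a minimal sketch under stated assumptions. The helper names are illustrative and not taken from TrakEM2, as are the example control points. Interpolating Z along the curve parameter t, rather than along arc length, is likewise an assumption.

```python
import numpy as np

def de_casteljau(control_xy, t):
    """Evaluate a 2D Bezier curve at parameter t by repeated linear interpolation."""
    pts = np.asarray(control_xy, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def semi_3d_bezier(control_xy, z_start, z_end, n_samples=20):
    """Sample a semi-3D Bezier curve: XY from the Bezier curve, Z linear in t (assumption)."""
    ts = np.linspace(0.0, 1.0, n_samples)
    xy = np.array([de_casteljau(control_xy, t) for t in ts])
    z = z_start + (z_end - z_start) * ts
    return np.column_stack([xy, z])

# Hypothetical cubic curve spanning sections at z = 5 and z = 12.
trace = semi_3d_bezier([[0, 0], [10, 20], [30, 20], [40, 0]], z_start=5.0, z_end=12.0, n_samples=5)
print(trace)
```

Samples are taken at equal steps of t rather than at equal distances along the curve, so consecutive points end up unevenly spaced. This is one reason traces are resampled before comparison.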
Using these methods, we manually traced all neurotactin-labeled SATs from 5 Drosophila third instar brain hemispheres, building a reference database of annotated SAT traces.
Each brain differs in its orientation relative to the volume defined by the confocal image stack that contains it. To compare SAT traces between brains, the brains must first be registered with each other; otherwise, differences in shape and position are meaningless. For this purpose, we use internal brain reference points to estimate a 3D transformation. The most obvious reference points to the Drosophila neuroanatomist are the mushroom body lobes (fig. 3):
- the tip of the dorsal lobe;
- the tip of the medial lobe;
- the tip of the peduncle;
- the peduncle’s branching point into dorsal and medial lobes.

These 4 reference points are easily identifiable across a variety of labelings, even without any specific label, relying only on differences in background intensity.
The mushroom body constitutes an acceptable reference system for the following reasons:
1. All lobes reasonably approximate a straight line and present clear boundaries.
2. The lobes span a large proportion of the neuropile volume, on a par with the dimensions of the SATs to analyze.
3. Each brain hemisphere contains a unique mushroom body. The mushroom bodies of the two hemispheres of a brain are mirror images of each other.
4. A mushroom body has three approximately perpendicular lobes, which naturally suggests a well-posed 3D coordinate system.
5. The mushroom body is generally conspicuous as background, even in the absence of a specific label.

Moreover, the anti-neurotactin antibody, our molecular marker of choice for SATs, specifically labels the axons of newly born Kenyon cells in the mushroom body.

Our initial results suggested that the 4 reference points provided by the mushroom body allow for a good registration of lineages close to it. These are the posterior-dorsal, dorsal, dorso-medial, and dorso-anterior lineages. The points are not sufficient for lineages relatively far from the mushroom body: the lateral, basal, and posterior-medial lineages. We therefore searched for 4 additional reference points that are evenly spread throughout the rest of the brain and also easily recognizable in neurotactin-labeled brains. We defined the following points (fig. 3):
- the BLD6 elbow point;
- the BAla split point;
- the BLV junction point;
- the DPM entry point into the neuropile.

With all eight reference points, we estimate a non-linear transformation using the Moving Least Squares (MLS) method (Schaefer et al., 2006) for 3D affine transformations. This provides an appropriate registration for all proto- and deuterocerebral SATs. With this method, the brain of interest is smoothly warped onto a reference brain, preserving the relative positions of internal brain components.
With the 8-point 3D Moving Least Squares transformation, any query SAT can be brought into the coordinate space of the reference SATs. The approximation is very good and the computation very fast. The transformation corrects for global orientation, shear, and scale including mirroring (hemispheric chirality), and to a sufficient degree for local deformation (fig. 4). Depending on how much warping is desired, from minimal to maximal, the user can choose one of the following:
- a linear transformation: translation, rigid body, similarity (uniform scaling), or affine;
- the non-linear Moving Least Squares transformation.
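The landmark-based warp can be sketched with the affine variant of Moving Least Squares (Schaefer et al., 2006), written here for 3D points in Python with NumPy. This is an illustrative reimplementation, not the registration library used by NIT. The function name and the weight exponent alpha = 1 are assumptions. Each point is mapped by its own affine transform, fitted to the landmark pairs with weights that decay with distance from that point. Nearby landmarks therefore dominate, and the warp varies smoothly.

```python
import numpy as np

def mls_affine_3d(v, p, q, alpha=1.0):
    """Moving Least Squares affine deformation (Schaefer et al., 2006) applied in 3D.

    v: (3,) point to transform; p, q: (k, 3) source and target landmarks
    (k >= 4 and not coplanar, so the weighted 3x3 system is solvable).
    """
    v, p, q = (np.asarray(a, dtype=float) for a in (v, p, q))
    d2 = np.sum((p - v) ** 2, axis=1)
    hit = np.flatnonzero(d2 == 0.0)
    if hit.size:                          # v sits on a landmark: MLS interpolates it exactly
        return q[hit[0]].copy()
    w = 1.0 / d2 ** alpha                 # w_i = 1 / |p_i - v|^(2 alpha)
    p_star = w @ p / w.sum()              # weighted centroids of both landmark sets
    q_star = w @ q / w.sum()
    p_hat, q_hat = p - p_star, q - q_star
    A = (p_hat * w[:, None]).T @ p_hat    # sum_i w_i p_hat_i^T p_hat_i (3x3)
    B = (p_hat * w[:, None]).T @ q_hat    # sum_i w_i p_hat_i^T q_hat_i (3x3)
    M = np.linalg.solve(A, B)             # weighted least-squares affine matrix
    return (v - p_star) @ M + q_star

# Hypothetical landmarks: 8 random source points and their image under a global affine
# map that includes a mirror in x (as for hemispheric chirality).
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 100.0, size=(8, 3))
A_true = np.array([[-1.0, 0.1, 0.0], [0.0, 1.2, 0.0], [0.05, 0.0, 0.9]])
t_true = np.array([5.0, -3.0, 2.0])
q = p @ A_true + t_true
v = np.array([40.0, 50.0, 60.0])
print(np.allclose(mls_affine_3d(v, p, q), v @ A_true + t_true))
```

The example checks a useful property. When the target landmarks are an exact affine image of the source landmarks, every local fit recovers the same matrix, so the warp reduces to that global affine map. With real, locally deformed brains, the fitted matrix instead changes from point to point.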
Pixel-accurate 3D registration methods (Jefferis et al., 2007) would likely increase the accuracy of our SAT recognition system, if constrained to the invariant parts of the brain. Relying on manually selected fiduciary points instead has three practical advantages:
1. It avoids the computationally very time-consuming process of pixel-accurate brain registration and the inevitable loss of pixel resolution. The cost is accepting small, very local inaccuracies in the registration. We register only the traced SATs, not the brain, which is a very fast computation.
2. It degrades gracefully when fewer fiduciary points are available due to the lack of neurotactin or equivalent labeling. Fewer points give a less accurate but still valuable registration.
3. It improves as more fiduciary points become available, such as those obtained by automatic 3D feature extraction.
A SAT trace is a sequence of ordered points, in which the origin and the point order matter. Jiang et al. (2002) conceived a method to generate any number of intermediate lines between any pair of lines in the plane, by applying dynamic programming to an arbitrary point sequence representation of the lines. Their method is robust as long as the lines do not contain loops. SAT traces are well suited for applying the method of Jiang et al., with modifications, to quantify the similarity of any pair of SAT traces. The core concept is to transform lines into sequences of comparable elements, and then to define a cost function for the comparison of any two elements.
Both the manual and the semi-automatic tracing of a SAT in a confocal stack generate a sequence of points with uneven distances between consecutive points. For manual tracing, the cause is the nature of Bézier curves (or rather, their de Casteljau approximation); for semi-automatic tracing, it is the integer accuracy of pixel coordinates. Dynamic programming comparisons require sequences of comparable elements. To obtain them, we homogenize point interdistances by resampling SAT traces with an arbitrary point interdistance d, common to all SAT traces to compare. Then, we convert the sequences of points into their corresponding sequences of direction vectors between consecutive points (fig. 4). For any given pair of traces to compare, the number of vectors thus generated may differ. However, resampling to a common point interdistance d imposes a homogeneous, equal vector length (fig. 4).
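The resampling step can be sketched as a walk along the traced polyline. Each new point is placed exactly d away, in Euclidean distance, from the previous one, so that every direction vector has length d. This follows the spirit of Jiang et al. (2002) but is an assumed implementation with hypothetical function names. In particular, dropping a final remainder shorter than d is a choice made here, not one stated in the text.

```python
import numpy as np

def resample_equal_steps(points, d):
    """Place points exactly d apart (Euclidean) while walking forward along a polyline.

    A final remainder shorter than d is dropped, so all direction vectors have length d.
    """
    pts = np.asarray(points, dtype=float)
    cur = pts[0]
    out = [cur]
    i = 0
    while i < len(pts) - 1:
        a, b = pts[i], pts[i + 1]
        u, w = b - a, a - cur
        uu = u @ u
        if uu == 0.0:                       # skip zero-length segments (duplicated points)
            i += 1
            continue
        uw = u @ w
        # Solve |a + s u - cur| = d for s and take the larger (forward) root.
        disc = max(uw * uw - uu * (w @ w - d * d), 0.0)
        s = (-uw + np.sqrt(disc)) / uu
        if s <= 1.0:                        # the intersection lies on this segment: emit it
            cur = a + s * u
            out.append(cur)
        else:                               # the segment ends within distance d: go to the next
            i += 1
    return np.array(out)

def to_direction_vectors(resampled):
    """Differences between consecutive resampled points; each has length d."""
    return np.diff(resampled, axis=0)

# Hypothetical L-shaped trace resampled with d = 3.
vecs = to_direction_vectors(resample_equal_steps([[0, 0, 0], [10, 0, 0], [10, 10, 0]], 3.0))
print(np.linalg.norm(vecs, axis=1))
```

Measuring steps as Euclidean chords rather than arc length keeps every vector exactly length d, even where the trace bends. Comparing two vectors by subtraction then reflects only their difference in direction.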
The choice of the resampling point interdistance d determines the final number of points in the sequence. When d is small, sequences have more points and represent the original SAT trace more accurately. In the presence of various sources of noise, however, increasing d smooths 3D traces (by eliminating high frequencies), and may thereby increase the overall accuracy of recognition. The number of points, in turn, quadratically affects the number of pairwise operations to compute. The choice of an adequate d is thus critical for the optimal performance of our algorithm. The calibrated voxel size sets a lower bound (the length of the diagonal of a voxel). The diameter of the volume enclosing a trace (i.e. the brain surface) sets the upper bound. We explored a range of values for d numerically and found an optimal range of 1 to 6 μm for our Drosophila 3rd instar brain confocal stacks (suppl. fig. 2).
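The trade-off in choosing d can be made explicit. The following symbols are introduced here for illustration:
- Δx, Δy, Δz are the calibrated voxel dimensions;
- L_q and L_r are the lengths of the query and reference traces;
- n and m are the numbers of direction vectors in the two resampled traces.

The lower bound on d and the size of the dynamic programming table are then approximately:

$$
d_{\min} = \sqrt{\Delta x^{2} + \Delta y^{2} + \Delta z^{2}}, \qquad
n \approx \frac{L_q}{d}, \quad m \approx \frac{L_r}{d}, \qquad
n\,m \approx \frac{L_q\,L_r}{d^{2}}
$$

Halving d therefore roughly quadruples the number of table cells, and hence of pairwise vector comparisons, to compute.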
After resampling, these sequences of direction vectors constitute sequences of comparable elements. A vector from a query sequence may be compared to a vector of the reference sequence simply by subtraction (the cost function), as shown by Jiang et al. (2002) for the 2D case. When the two vectors are equal, the length of the resulting vector is zero. When they point in opposite directions, the length is maximal, 2d. With this simple cost function for correspondences, we can apply dynamic programming techniques to compare the shape of any two SAT traces (fig. 4). Dynamic programming is a mathematical optimization method that solves a complex problem, such as sequence alignment, by recursively breaking it down into simpler subproblems. In sequence alignment, these subproblems are the alignments of short subsequences, down to the comparison of individual elements. Dynamic programming requires two more costs: that of deleting and that of inserting a vector in the sequence. The trivial choice taken by Jiang et al. (2002) assigns a cost of 1d to both; our numerical parameter exploration suggests that a cost of 1.1d is a better choice empirically (fig. 2).
First, both query and reference SAT traces are transformed into a common reference space, resampled to an arbitrary and equal point interdistance d, and reformulated as sequences of direction vectors. We then perform global sequence alignment using dynamic programming. Sequence alignment algorithms have previously been used for two kinds of sequences:
- sequences of discrete variables, such as for suggesting correct word spellings (Wagner and Fischer, 1974) and for nucleotide sequence alignment (Smith and Waterman, 1981; Altschul et al., 1990);
- sequences of continuous variables, such as for curve morphing (Jiang et al., 2002) and for the recognition of Drosophila flight trajectory motifs (Grover and Tavaré, 2009).

Inspired by Jiang et al. (2002), we formulated an alignment algorithm that provides the basis for the quantification of SAT trace similarity (fig. 4, and supplementary text).
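The global alignment can be sketched as a Wagner-Fischer-style dynamic program over direction vectors, using the costs given above. A substitution costs the length of the difference vector; an insertion or deletion costs 1.1 d. This is a minimal Python sketch, not the NIT implementation, which is written in Clojure. The function name is assumed, as is the tie-breaking order in the traceback, which prefers substitutions. The returned cost is the weighted edit distance that the text calls the Levenshtein distance.

```python
import numpy as np

def align(query_vecs, ref_vecs, d, indel_factor=1.1):
    """Global alignment of two direction-vector sequences by dynamic programming.

    Substitution cost: length of the difference vector (0 if equal, 2d if opposite).
    Insertion and deletion cost: indel_factor * d.
    Returns the total edit cost and the substituted index pairs (i, j).
    """
    a = np.asarray(query_vecs, float)
    b = np.asarray(ref_vecs, float)
    n, m = len(a), len(b)
    indel = indel_factor * d
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * indel
    D[0, :] = np.arange(m + 1) * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = min(D[i - 1, j - 1] + sub,   # substitution (a correspondence)
                          D[i - 1, j] + indel,     # deletion of a query vector
                          D[i, j - 1] + indel)     # insertion of a reference vector
    # Trace back from the bottom-right cell to recover the correspondences.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        sub = np.linalg.norm(a[i - 1] - b[j - 1])
        if np.isclose(D[i, j], D[i - 1, j - 1] + sub):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(D[i, j], D[i - 1, j] + indel):
            i -= 1
        else:
            j -= 1
    return D[n, m], pairs[::-1]

# Hypothetical sequences with d = 1: the query has one extra step along x.
cost, pairs = align([[1, 0, 0]] * 3, [[1, 0, 0]] * 2, d=1.0)
print(cost, pairs)
```

With an indel cost of 1.1 d, replacing one vector by its opposite (cost 2d) is cheaper than deleting it and inserting another (cost 2.2 d). Opposite directions are therefore aligned and penalized rather than skipped.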
The global alignment matrix delivers the Levenshtein distance and the sequence of edits (the correspondences) between the elements of the direction vector sequences representing the SAT traces (fig. 4). The Levenshtein distance measures shape similarity. However, shape information alone is not enough to confer identity: two SAT traces of very similar shape but far apart in the brain would score as very similar. A good measure of SAT trace similarity should therefore incorporate not just shape but also Euclidean distance. Measuring Euclidean distances between two sequences of points in space requires a strategy to select which points to measure against which. The sequence of edits is one such strategy. It provides point correspondences between the query and the reference sequence, based on shape similarity around any given point. We combine these correspondences with the sequences of points in space describing each SAT trace. From this combination we compute the mean Euclidean distance between corresponding points, which synthesizes information on both shape and position in the brain (fig. 4 and suppl. text).
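Given the substitution pairs from an alignment, the mean Euclidean distance can be sketched as below. Vector k runs from point k to point k + 1 of a resampled trace. This sketch therefore assumes that each substituted vector pair (i, j) makes points i + 1 and j + 1 correspond. That mapping, the function name, and returning infinity when there are no correspondences are illustrative choices rather than details taken from the text.

```python
import numpy as np

def mean_euclidean_distance(query_pts, ref_pts, pairs):
    """Mean distance between corresponding points of two registered, resampled traces.

    pairs holds (i, j) indices of substituted direction vectors. Vector k runs from
    point k to point k + 1, so each pair is mapped here to end points (i + 1, j + 1).
    """
    P = np.asarray(query_pts, float)
    R = np.asarray(ref_pts, float)
    if not pairs:
        return float("inf")   # no correspondences: treat as maximally dissimilar (assumption)
    dists = [np.linalg.norm(P[i + 1] - R[j + 1]) for i, j in pairs]
    return float(np.mean(dists))

# Hypothetical example: identical shapes, offset by 5 units along y.
P = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], float)
R = P + np.array([0, 5, 0])
print(mean_euclidean_distance(P, R, [(0, 0), (1, 1)]))
```

The example illustrates the argument of the paragraph above. Two identically shaped traces have zero shape cost, yet here their mean Euclidean distance is 5, which tells them apart by position.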
We measure a total of eleven parameters from the global sequence alignment of two SAT traces (suppl. text). Implementing a working SAT trace comparison algorithm revealed a number of strengths and limitations of these eleven parameters. Of note, many parameters are sensitive to unequal SAT trace lengths and to large insertions or deletions. The parameters also differ in how well they handle branching events. We elaborate on all of these points in the supplementary text.
We developed the Neurite Identifier Tool (NIT) to semi-automatically identify secondary neuronal lineages in the Drosophila third instar larval brain. This requires both a comparison algorithm, as described above, and a reference set of annotated SAT traces representing all secondary neuronal lineages.
To build the reference set, we collected 5 brain hemispheres from non-isogenic third instar Drosophila melanogaster larvae (see Material and Methods for details). We labeled all brains with the BP106 anti-neurotactin antibody and imaged them by confocal microscopy from different orientations. The resulting 5 confocal image stacks were imported into the software package TrakEM2, and all their distinctive secondary axon tracts (SATs) were manually traced as described above. Partial SAT tracing was performed in 21 additional preparations containing GFP-labeled flip-out clones. These were labeled with anti-neurotactin or with an antibody against Drosophila N-Cadherin.
We traced all SATs of all proto- and deuterocerebral lineages in the 4 new stacks and in the stacks with flip-out clones. We first identified and annotated the clones manually by visual inspection. As the gold standard, we used the preparation on which the published nomenclature is based (Pereanu and Hartenstein, 2006). Annotation was a very laborious, iterative process. Many SATs with highly characteristic shapes and positions were easily classified into a lineage group or subgroup, or even fully identified (all BA, DAL, DAM and BLVa; DPMm1, BLD5 and BLD6). Other SATs were resolved only up to the lineage group. Numerous SATs in the reference brain were traced short of their distal ending; other SATs were mistraced (e.g. DPLd and DPMcm1, and most CM). Numerous lineage SATs were fully resolved in one of the five reference brains or as a GFP-labeled clone. We propagated partial lineage identity resolution from one of the five brains to the other four. This continued until every SAT trace had an identity consistent with the other brains and with the published nomenclature of Pereanu and Hartenstein (2006).
To improve the accuracy of traces and annotations, and to identify errors, we clustered all SATs from all 5 brains by mean Euclidean distance (fig. 5). When correctly annotated, each set of 5 SAT instances (one from each brain) should cluster closely together in the resulting tree, separately from the rest. When this was not the case, we revised the outlier SAT traces and annotations and reran the clustering. GFP-labeled lineage clones were crucial for resolving numerous lineages. They were particularly important for the type II poly-SAT lineages, such as those in the CM and DPM groups (Bello et al., 2008). The final tree contained only 4 outliers (see below; fig. 5).
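The annotation check behind the clustering can be approximated by a simpler rule: a trace whose nearest trace from another brain carries a different name deserves a second look. The sketch below applies that rule to a precomputed matrix of mean Euclidean distances. It is a hypothetical stand-in for inspecting the clustering tree, not the tree-building procedure itself, and the function name and toy data are illustrative.

```python
import numpy as np

def flag_inconsistent(dist, names, brains):
    """Flag traces whose closest trace from another brain carries a different name.

    dist: (N, N) symmetric matrix of mean Euclidean distances between traces;
    names: SAT annotation of each trace; brains: brain identifier of each trace.
    """
    dist = np.asarray(dist, float)
    flagged = []
    for k in range(len(names)):
        other = [j for j in range(len(names)) if brains[j] != brains[k]]
        if not other:
            continue
        nearest = min(other, key=lambda j: dist[k, j])
        if names[nearest] != names[k]:
            flagged.append(k)
    return flagged

# Hypothetical example: two brains, two SATs each; the labels in brain "b" are swapped.
dist = np.array([[0, 8, 1, 10],
                 [8, 0, 10, 1],
                 [1, 10, 0, 8],
                 [10, 1, 8, 0]], float)
print(flag_inconsistent(dist, ["A", "B", "B", "A"], ["a", "a", "b", "b"]))
```

A rule like this only proposes candidates for revision. The final decision still rests on inspecting the traces, as in the iterative procedure described above.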
The secondary neuronal lineage nomenclature described by Pereanu and Hartenstein (2006) provided individual names for most lineages. Several lineages (e.g., DALcl1/2) were described as pairs of adjacent ‘sister’ lineages whose SATs were so similar and close to each other that they could not be resolved. In addition, diversity within lineages was not considered further. Numerous lineages are composed of smaller units (sublineages: Bello et al., 2008; hemilineages: Cornbrooks et al., 2007). These initially project in a common SAT but may later split into two or more branches. With the help of GFP-labeled clones, we have resolved as many as possible of the sister lineages into individual lineages, and specified several SATs as sublineages.
Furthermore, we had to extend the nomenclature to distinguish the individual axon tracts (SATs) formed by hemilineages or sublineages. Each SAT takes the name of the enclosing lineage plus a numerical suffix. For example, lineage CM3 has multiple prominent sublineages termed CM3:1, CM3:2, CM3:3, etc. In general, lower numbers denote more medial and dorsal SAT traces.
The nomenclature described by Pereanu and Hartenstein (2006) defined a few lineages as groups of adjacent or sister lineages (such as DPMpl1/2, among others). We resolved as many of these as possible into individual lineages, with the help of flip-out clones and by comparing multiple densely labeled anti-neurotactin brains.
Six new lineages were added to the map published by Pereanu and Hartenstein (2006). Five of the six newly described lineages (BLD6; BLP6; DALl2; BAlp4; DPMpl4) could be identified in all five reference brains. BLD7 was identified in 2 brains only.
Four of the previously identified lineages (CM2, BAlc2, DPLc5, BLVp2) turned out to be sub/hemilineages, and have been removed. An updated list of all SATs is provided, which details the transition between the old and the new nomenclature (Suppl. table 1).
The large majority of SATs could be identified in all five brains. However, a few SATs lack homologs in some of the brains (see suppl. table 2). This indicates one of three causes:
- differential labeling by the anti-neurotactin antibody BP106;
- our inability to resolve the SAT trace in a dense neurotactin labeling (likely the case for the type II secondary lineages of the CM group; Bello et al., 2008);
- variability of SATs.

The affected SATs are:
- identified in only 2 of the 5 brains: BLD7, DPLc4;
- identified in only 3 of 5: BLAd3, BLD4, BLP1:1, DPMl2, DPMpm2:2;
- identified in only 4 of 5: BAmv1:1, BLAd4, CM3:2, CM3:8, CM4:1, CM5, CP1:2, DALl3, DPLl2:2, DPLpv:2, DPMpm1:1, DPMpm1:2, DPMpm2:1, DPMpm2:3.
Insect brains and Drosophila brains in particular have been described as highly stereotypical (Hartenstein and Campos Ortega, 1997; Jenett et al., 2006; Technau, 2008). Individual cells are recognizable across brains of different individuals (for neuroblasts, see Urbach and Technau, 2004; for late embryo ventral nerve cord neurons, see Bossing et al., 1996 and Schmidt et al., 1997) and in the first instar larva (personal observation). We used the most discriminative parameter of NIT, the mean Euclidean distance, to measure the interbrain variability of secondary axon tracts (SATs).
We use the tree of clustered lineages introduced above (fig. 5). The tree illustrates that roughly half of the lineage traces are most similar to their homonymous partners in the other brains and thus cluster together (for instance, DAMd1; fig. 5 A). Only three pairs are more similar to their sister lineages within the same brain than to their homonymous partners across brains (fig. 5 B):
- BAmas1 and 2, and to a lesser degree CP2/3:2 and :3;
- DAMv1 and 2;
- DALd and DALcm2:2.

A few sets of lineages appear unresolved. The least well resolved set consists of BLAd and BLD1–4. These are all closely apposed in space and all project into the dorsal half of the compact transverse superior fascicle (Pereanu and Hartenstein, 2006). A second unresolved set includes the CM1 and CM3:1 and :2 traces, and the DPMpm1 and 2, DPMpl2 and CM4 traces.
For the lineage BAlv, all 5 instances (one per brain) cluster together in the same clade, but lie at substantial distances from each other (fig. 5 C). As its name indicates, BAlv is located baso-anterior lateral ventral. This places it furthest away from the eight 3D landmarks used to estimate the 3D transformation for volume registration (fig. 3). All tritocerebral lineages are likewise affected, and have therefore been used as outgroups for neighbor joining.
Only 3 out of 634 traces are outliers, in the sense that they are poorly resolved and appear as sister branches to entire groups, near their homonymous traces in the other brains. These are e-BAlc:2:2, e-CM5, and d-DPLpv:1, where the prefix ‘a’ to ‘e’ indicates one of the five brains. In the brain, these traces lie next to non-idiosyncratic traces. Their idiosyncratic position in the tree therefore suggests true variability in the projection pattern of these SATs.
We created a consensus trace for each SAT by condensing up to five SAT traces into a single one. We measured the mean Euclidean distance for all possible pairwise combinations. Then, using a UPGMA strategy (Dubes and Jain, 1988), we iteratively merged the closest pair until only one trace remained. Merging was performed in a weighted manner (Jiang et al., 2002), where the weight represents the proportion of original traces that each trace in the pair contributes. For example, when two original SAT traces are merged, each contributes with a weight of 1/2. When an original trace is merged with a trace that resulted from merging two traces, the weights are 1/3 and 2/3, respectively.
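The weighted merging can be sketched as follows, under a strong simplifying assumption. Here the traces are already resampled to the same number of points and correspond index by index. The actual procedure instead aligns each pair before merging (Jiang et al., 2002). Distances to a merged trace are recomputed from the merged trace itself, a centroid-style simplification of UPGMA averaging. The function name is hypothetical.

```python
import numpy as np

def consensus(traces):
    """Merge traces into one consensus by repeatedly averaging the closest pair.

    Simplifying assumption: traces have the same number of points and correspond
    index by index. Each merged trace is weighted by the number of originals it holds.
    """
    items = [(np.asarray(t, float), 1) for t in traces]   # (trace, number of originals)
    while len(items) > 1:
        best = None
        for x in range(len(items)):
            for y in range(x + 1, len(items)):
                dist = np.linalg.norm(items[x][0] - items[y][0], axis=1).mean()
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        _, x, y = best
        (tx, nx), (ty, ny) = items[x], items[y]
        merged = (nx * tx + ny * ty) / (nx + ny)          # e.g. weights 1/3 and 2/3
        items = [it for k, it in enumerate(items) if k not in (x, y)] + [(merged, nx + ny)]
    return items[0][0]

# Hypothetical input: five noisy traces of six points each.
rng = np.random.default_rng(0)
traces = [rng.normal(size=(6, 3)) for _ in range(5)]
print(consensus(traces))
```

Each merge is weighted by the number of original traces it contains. Under this index-wise simplification, the final consensus therefore equals the plain average of all input traces, whatever the merge order. With real alignments, the merge order matters because it changes which points are matched.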
We used the consensus trace as the ideal trace of each SAT, and measured the deviation of the real SAT traces from this ideal in two ways:
1. by determining the minimum envelope that encloses all original SAT traces;
2. by plotting the standard deviation at each point of the consensus SAT trace sequence.

For each point of the consensus trace, we listed the points of the source traces that were used to generate it. We measured the maximum distance between the consensus point and its corresponding source points, and calculated the standard deviation. As a visual representation of SAT stereotypy, we built the minimum enclosing envelope as a tube centered on the consensus trace. The radius of the tube at any given point is that maximum distance. The stereotypy of a SAT is inversely related to the radius of its minimum enclosing envelope: the narrower the envelope, the more stereotyped the SAT.
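For each consensus point, given the source points that produced it, the envelope radius and the spread can be computed as below. The text does not specify which quantity the standard deviation is computed over. This sketch assumes it is taken over the consensus-to-source distances, using NumPy's default population formula. The function name is hypothetical.

```python
import numpy as np

def envelope(consensus_pts, sources):
    """Per-point radius of the minimum enclosing tube and spread of the source points.

    sources[k] lists the source-trace points that contributed to consensus point k.
    Returns (radius, std): the maximum and the standard deviation of their distances.
    """
    C = np.asarray(consensus_pts, float)
    radius, std = [], []
    for c, src in zip(C, sources):
        dist = np.linalg.norm(np.asarray(src, float) - c, axis=1)
        radius.append(dist.max())
        std.append(dist.std())
    return np.array(radius), np.array(std)

# Hypothetical two-point consensus trace with its contributing source points.
radius, std = envelope([[0, 0, 0], [5, 0, 0]],
                       [[[1, 0, 0], [0, 3, 0]], [[7, 0, 0], [5, 2, 0], [5, 0, -2]]])
print(radius, std)
```

In the second point of the example, all source points lie at the same distance from the consensus. The tube then has a nonzero radius while the standard deviation is zero, which is why the two measures are reported separately.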
When plotting the standard deviation of radii along the trace, we observed a pattern common to most SATs. The initial segment of the envelope, corresponding to the trajectory of the SAT as it crosses the brain cortex, has a conical shape, with the base of the cone at the brain surface. For most of the trajectory of a SAT within the neuropile, the minimum envelope is cylindrical and of relatively small diameter. Distally, towards the end of the SAT in its target area, we typically observed another, smaller widening of the envelope (see fig. 6).
The Neurite Identifier Tool (NIT) measures eleven parameters that describe the degree of (dis)similarity between any two 3D SAT traces. We tested the discriminative power of each parameter by comparing each annotated SAT of any of the five brains against the annotated SATs in each of the other four brains. For each comparison, we sorted the four lists of results and extracted from each the rank of the homonymous SAT, which ideally is the top one. Combinations in which a brain did not contain the homonymous SAT were excluded.
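The ranking step of this parameter evaluation can be sketched as follows. The score dictionaries, the names, and the convention that lower values mean greater similarity are assumptions for illustration; any one of the eleven NIT parameters could play the role of the score.

```python
def homonymous_rank(query_name, scores):
    """1-based rank of the reference SAT named like the query, or None if absent.

    scores maps reference SAT names to a dissimilarity value (lower = more
    similar), e.g. the mean Euclidean distance after alignment.
    """
    if query_name not in scores:
        return None
    ranked = sorted(scores, key=scores.get)
    return ranked.index(query_name) + 1


def ranks_for_query(query_name, per_brain_scores):
    """One rank per other brain; brains lacking the homonymous SAT are skipped."""
    ranks = []
    for scores in per_brain_scores:
        rank = homonymous_rank(query_name, scores)
        if rank is not None:
            ranks.append(rank)
    return ranks
```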
The results of the parameter analysis indicate that the mean Euclidean distance between the substitution correspondences has the highest discriminative power: the homonymous SAT is the top match in 82.0 % of cases and is within the top 2 in 93.8 %, and the curve saturates at 99.0 % within the top 5 (table 1). By the same parameter, the lineage group (e.g. BA, BLV) is recognized correctly in 97.1 % of cases. All other parameters present lower discriminative power. Classification accuracy varies among lineage groups, ranging from 100 % for SATs in groups BA and DAM to 90.4 % for group BLA (table 2).
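Given the ranks collected for one parameter, cumulative top-k percentages of the kind reported in table 1 can be computed with a hypothetical helper such as this one (a sketch, not the authors' code).

```python
def top_k_accuracy(ranks, k):
    """Fraction of comparisons in which the homonymous SAT ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def cumulative_accuracy(ranks, max_k=5):
    """Top-1 ... top-max_k accuracies; the curve saturates once no rank exceeds k."""
    return [top_k_accuracy(ranks, k) for k in range(1, max_k + 1)]
```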
Combining multiple parameters allows for more reliable SAT identification. We used a machine learning method known as Random Forest (Breiman, 2001), which combines many decision trees (70 in our case; see suppl. fig. 3), to learn how best to separate the samples into two classes: correct or incorrect match. Each decision tree is a binary tree grown on a bootstrap sample of the data, in which each node splits the samples on the feature, among a randomly chosen subset of features, that best separates the two classes. In this fashion, the deeper we go in the tree, the better the samples are separated; the forest then classifies a sample by majority vote of its trees.
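To make the Random Forest idea concrete, here is a small self-contained sketch in pure Python. It is not the WEKA implementation that the authors used: it grows depth-limited trees (a simplification, since Breiman's forests usually grow trees fully), considers a random subset of about √M of the M features at each node (the subset size is a tunable choice), and combines bootstrap sampling with majority voting. Feature vectors stand in for the NIT parameters of a query/reference pair, and the boolean label marks a correct match; all names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, depth, n_sub):
    """Leaves are labels; inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    # Only a random subset of the features is considered at each node.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], rng, depth - 1, n_sub)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], rng, depth - 1, n_sub)
    return (f, t, left, right)


def predict_tree(node, row):
    """Follow the splits down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=70, depth=5, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n_sub = max(1, round(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, depth, n_sub))
    return forest


def predict_forest(forest, row):
    """Majority vote of the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```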
We trained the algorithm on the expert-classified data, using the results of comparing each SAT trace in one brain to all SAT traces in every other brain (five brains in total, hence four lists of results per trace). Incorrect matches vastly outnumber correct ones; hence, to avoid overfitting to incorrect matches, we trained with only the top 8 results of each comparison, as sorted by the most discriminative parameter (the mean Euclidean distance, which in our data never ranks a correct match below position 8). We used an open-source implementation of the Random Forest approach (WEKA library; Witten and Frank, 2005), and stored the trained model for practical application to SAT classification.
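The construction of the training set can be sketched as follows, assuming each comparison yields a list of (reference name, feature vector) pairs whose first feature is the mean Euclidean distance. The data layout and names are hypothetical; only the top-8 cutoff comes from the text.

```python
def training_examples(query_name, per_brain_results, top_n=8):
    """Labeled examples from the top_n candidates of each result list.

    per_brain_results: one list per other brain of (reference_name, features)
    pairs; features[0] is assumed to be the mean Euclidean distance.
    Returns (features, is_correct) pairs; is_correct marks the homonymous SAT.
    """
    examples = []
    for results in per_brain_results:
        # Keep only the closest candidates to limit the excess of incorrect matches.
        ranked = sorted(results, key=lambda pair: pair[1][0])[:top_n]
        for ref_name, features in ranked:
            examples.append((features, ref_name == query_name))
    return examples
```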
The Random Forest classifier correctly accepted 99.6 % of true matches (2358 of 2368) and correctly rejected 99.997 % of false matches (292745 of 292753), with only 8 false positives and 10 false negatives (table 3). All 8 false positives involve matches between DAMd2 and DAMd3, two extremely similar and closely overlapping SATs.
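The rates above follow directly from the counts of true and false positives and negatives. This small helper (hypothetical names) computes them from the counts given in the text, which is a useful check when reporting such percentages.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and precision from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true matches that were accepted
    specificity = tn / (tn + fp)  # share of false matches that were rejected
    precision = tp / (tp + fp)    # share of accepted matches that are true
    return sensitivity, specificity, precision


# Counts from the text: 2358 + 10 true matches, 292745 + 8 false matches.
sens, spec, prec = classification_rates(tp=2358, fn=10, tn=292745, fp=8)
print(f"{100 * sens:.1f} % {100 * spec:.3f} % {100 * prec:.1f} %")
```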
We then tested the reliability of the classifier in identifying SATs of a brain not belonging to the training set. We traced 80 SATs, including all SATs of the hardest lineage groups, BLA and BLD. The classifier produced a high number of false positives (table 4), but these were unevenly distributed: 42 of the 80 traces had no false positives, and another 20 had three or fewer. For 75 of the 80 SAT traces, the classifier found at least one true positive (true positives: 3.30 ± 1.66; false positives: 1.71 ± 2.31; false negatives: 1.89 ± 1.61). The 5 SATs without a true positive were relatively short (BAlp3, BAlp4, BLD1:2, BLD7, and DPLm1). When results were sorted by the mean Euclidean distance, the top result was correct in 68/80 cases, and the top two results contained the homonymous SAT in 76/80 cases. An example of comparing an unknown SAT to all traced SATs in the database is shown in figure 7.
The ablation of glial cells results in severely deformed brains, affecting the growth and pathfinding of secondary axon tracts (Spindler et al., 2009). To test the robustness of our SAT identification algorithm, we traced and annotated 20, 25, 30 and 42 SATs, respectively, in four glia-less 3rd instar brain hemispheres (UAS-hid,UAS-rpr; Nirvana2-GAL4,UAS-GFP; tubGAL80[ts]; confocal image stacks kindly provided by Shana Spindler), including the subset of lineages analyzed for fasciculation and growth defects by Spindler et al. (2009) (fig. 8). Despite the SAT defects caused by the absence of neuropile glia, all SATs were classified correctly except for 1–4 lineages per brain, corresponding to lineages with a very high glia-association score (BAmas, DALcm1, DALcl1, and CP1). The BLD1–4 lineages, whose SATs are short and overlapping and for which the classifier has a higher error rate in wild type (table 2), were identified only at the lineage group level. The correct identification of a lineage such as CP1 in three brains but not in the fourth may be explained by the incomplete penetrance of the heat-shock induced glia-less phenotype.
In accordance with the findings of Spindler et al. (2009), we observed three types of pathfinding errors: fusion of the proximal segments of the SATs of sister lineages (the four mushroom body lineages, DPLal1–3, DAMv1/2, and BLD1–4; fig. 8 C–E); shortened terminal projections (BAmas, DALcm1 and DALcl1; fig. 8 F); and complete misrouting (CP1; fig. 8 G; see table 1 in Spindler et al., 2009). Only complete misrouting prevented the classifier from suggesting appropriate SAT annotations. Of note, the absence of glial cells severely disrupted the formation of the optic lobe, which lies immediately adjacent to the BLV, BLD and BLA groups; yet the SATs of the BLV and BLA groups were identified correctly.
The success in identifying nearly all SATs in severely disrupted 3rd instar brains suggests that the combination of a simple registration based on eight landmarks and the parameters used by NIT is robust enough for reliable SAT identification.
Secondary axon tracts show a temporally and spatially dynamic expression of molecular markers. At a given stage, some proteins may be found in only part of the neuron. As a result, a SAT traced for identification may appear labeled over a much shorter length than its counterpart in the database. The example we present here is a larval brain labeled with an antibody against DE-cadherin (fig. 9). This adhesion protein is transiently expressed in newly born secondary neurons and therefore visualizes SATs; however, it is also expressed on glial processes, which prevents one from following SATs within the neuropile.
On the other hand, SATs may be substantially longer than the traces by which they are represented in our database. This applies to the SATs of all lineages with a commissural projection. Since SATs of commissural lineages fasciculate with their contralateral counterparts, they cannot be traced beyond the midline in preparations that are globally labeled with anti-neurotactin. However, using clonal labeling techniques, these lineages can be visualized in their entirety.
The Neurite Identifier Tool (NIT) would, in most cases, deliver inappropriate results when scoring a SAT trace considerably shorter or longer than its true counterpart in the database. Because a true correspondence between two traces of the same SAT should not contain long runs of insertions or deletions, we devised the following strategy: the shorter of the query and reference SAT traces is globally aligned against every contiguous subsequence of the longer trace that has the same length as the shorter one (fig. 9 B).
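The sliding-window strategy can be sketched as below. The alignment is a Needleman–Wunsch-style global alignment whose substitution cost is the Euclidean distance between points; the constant gap penalty `GAP` is a hypothetical value, since the cost parameters are not given in this section, and the traces are assumed to be resampled at uniform spacing so that equal point counts correspond to equal lengths. Function names are illustrative only.

```python
import math

GAP = 5.0  # hypothetical insertion/deletion penalty, in the traces' length units


def global_alignment_cost(a, b, gap=GAP):
    """Minimum cost of globally aligning two 3D point sequences.

    Substituting point p for point q costs their Euclidean distance;
    each insertion or deletion costs `gap`.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimum cost of aligning a[:i] with b[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + math.dist(a[i - 1], b[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    return D[n][m]


def best_window_alignment(query, reference, gap=GAP):
    """Align the shorter trace against every same-length contiguous window of
    the longer one; return (lowest cost, start index of that window)."""
    if len(query) <= len(reference):
        short, longer = query, reference
    else:
        short, longer = reference, query
    k = len(short)
    best = (math.inf, -1)
    for start in range(len(longer) - k + 1):
        cost = global_alignment_cost(short, longer[start:start + k], gap)
        best = min(best, (cost, start))
    return best
```

Restricting the longer trace to windows of the shorter trace's length keeps the unmatched part of the longer trace from being absorbed as a long run of insertions or deletions.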
To test the reliability of the assignment for shorter SAT traces, we manually annotated SATs in an anti-DE-cadherin labeled brain. Of the eleven lineages analyzed in the test brain (fig. 9), nine were conclusively identified. For the remaining two, only the lineage name, but not the specific SAT, was identified (fig. 9 H). The random forest classifier produced an increased number of false positives. These arise because a short fragment can match several SATs very well: multiple SATs may share the same trajectory where they join a common brain tract, or near the cell bodies before they diverge significantly (fig. 9 H).
For longer SAT traces, we used flip-out clones of secondary lineages (which also aided in the construction of the database) whose SATs cross the midline or are considerably longer than the neurotactin-labeled fraction of the SAT. All 12 longer SAT traces tested were correctly identified by the classifier, with 0 to 6 false positives each.
Primary neurons project a primary axon tract (PAT) into the neuropile (Nassif et al., 1998; Younossi-Hartenstein et al., 2006), where they arborize profusely, forming the larval neuropile. Secondary neurons, which develop throughout the larval stages, form a secondary axon tract (SAT) that follows the PAT into the neuropile (Larsen et al., 2009). We traced the proximal segment of the PAT for four individual primary neurons and two clusters of three primary neurons labeled with anti-dopamine in 3rd instar brains. We then ran NIT on the traces to identify SATs that follow similar trajectories (fig. 10). We found that the four individual primary neurons were followed by four distinct SATs (BLVa1, a2 and a3; and DPMm1:4; fig. 10 A–E), and that one cluster of 3 primary neurons was followed by one SAT (CP2/3:1; fig. 10 A–C). The remaining dorsomedial cluster of 3 primary neurons was not followed by any SAT. Although comparison with SATs cannot conclusively identify primary neurons, NIT narrowed the search down to about 3 SATs in each case, suggesting between 2 and 6 positive matches, with repetitions (fig. 10 B, C).
Secondary neurons arborize during pupal stages, forming the largest fraction of the adult fly brain. The low-order branch of the SAT remains recognizable for at least a subset of lineages with the anti-Neuroglian antibody or with genetically targeted GFP labeling. We traced the two SATs of one of the two CP2/3 sister lineages in the adult brain (fig. 11 A) and ran NIT against the 3rd instar SAT reference library. There were 2 true positives for one SAT (CP2/3:3; fig. 11 C). While the classifier did not find any positives for the other SAT (CP2/3:1; fig. 11 B), the top 5 results contained 3 instances of CP2/3:1. The 3D registration with only 4 fiduciary landmarks (the four mushroom body corners) proved sufficient for NIT to provide good annotation suggestions for the adult CP2/3 lineage SATs.
The algorithmic classification of an anatomical structure depends both on the recognition and quantification of its geometrical properties, and on the correct elucidation of its spatial location relative to other, potentially similar structures. We have approached the spatial positioning problem with a 3D registration strategy based on eight fiduciary points, which are easily recognizable and evenly distributed (fig. 3). Each brain was mounted in a different orientation, and thus artifactual deformations were introduced in a non-systematic way. However, SAT identification across brains is accurate in wild-type brains, in the four mutant glia-less brains, and in the DE-cadherin labeled brain (tables 1–4).
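For illustration, landmark-based registration can be sketched as a least-squares 3D affine fit between corresponding fiduciary points. This is a deliberately simpler, swapped-in model: the text describes non-linear registration, which an affine transform does not capture. The sketch uses only the standard library and assumes at least four non-coplanar landmark pairs; with eight landmarks the fit is overdetermined, and the residual indicates how well the landmarks agree. All names are hypothetical.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_affine(source, target):
    """Least-squares affine map [A | t] with target ≈ A · source + t (3 rows of 4)."""
    rows = [list(p) + [1.0] for p in source]  # homogeneous coordinates
    # Normal equations (P^T P) m = P^T q, solved once per output axis.
    PtP = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    matrix = []
    for axis in range(3):
        Ptq = [sum(r[i] * q[axis] for r, q in zip(rows, target)) for i in range(4)]
        matrix.append(solve(PtP, Ptq))
    return matrix


def apply_affine(matrix, point):
    """Map one 3D point with the fitted transform."""
    x, y, z = point
    return tuple(m[0] * x + m[1] * y + m[2] * z + m[3] for m in matrix)


def rms_residual(matrix, source, target):
    """Root-mean-square distance between mapped source landmarks and their targets."""
    total = sum(math.dist(apply_affine(matrix, s), t) ** 2 for s, t in zip(source, target))
    return math.sqrt(total / len(source))
```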
We observed the effect of insufficient brain-to-brain registration for some SATs as an increased distance between corresponding SATs across the 5 reference brains (e.g. BAlv; fig. 5 C). The fact that no other lineage shows such large distances between its cognates (fig. 5) suggests that volume registration with only eight landmarks is sufficient for the identification of proto- and deuterocerebral lineages. The tritocerebral lineages, all of which lie beyond the volume circumscribed by the eight fiducial points, cannot be reliably identified.
The better the 3D brain registration, the more reliable the recognition of any brain structure will be. Our approach is flexible regarding the number of fiduciary points. For many lineages, using only the four corner points of the mushroom body is sufficient for accurate recognition. The additional four fiduciary points greatly enhance the fidelity for many other lineages, particularly the basal and lateral groups. If needed, further fiduciary points could be added. The strength of our approach is that it performs reliably even if ‘perfect’ registration is not possible (e.g. across mutant brains; fig. 8).
We presented the Neurite Identifier Tool (NIT) for the quantitative measurement of similarity between a pair of neurite traces in the same reference space. We built a reference library of traced and annotated secondary axon tracts (SATs) labeled with anti-Neurotactin at the 3rd instar larval stage of Drosophila, which is publicly available at http://t2.ini.uzh.ch/nit/. We applied the tool to the semiautomatic annotation of SATs in wild-type brains, and extended and refined the secondary lineage nomenclature of Pereanu and Hartenstein (2006). We quantified the accuracy of the automatically suggested annotations and found it to be between 93 % and 99 %. We tested the automatic annotation in mutant 3rd instar brains and in brains labeled with markers other than anti-Neurotactin, and found the accuracy to be very high. Remarkably, we were able to use NIT to discover primary neurons associated with SATs, and to identify SATs in the adult brain. Once a few fiduciary points have been located for 3D registration, NIT significantly decreases the time it takes to identify a neuron or a lineage. Our tool may therefore provide the means to link digital atlases of neurons and neuronal lineages across the developmental stages of Drosophila, from late embryo to adult.
Lineages differ in the accuracy with which they can be matched to their proper cognate in the reference database. Shorter SATs are more likely to share their trajectory with other SATs, and are less reliably identified. Lineages BLD1–4, BLAd2–3, and BLAl have relatively short SATs that project into the compact transverse superior fascicle. Apart from small differences in their relative position, their main distinctive feature is the position of their cell body clusters. Consequently, SATs in the lineage groups BLD and BLA are among the hardest to annotate by hand, and among the least well discriminated by the mean Euclidean distance parameter (table 2).
A second type of error in SAT identification occurs where multiple lineages share a compact fascicle over substantial parts of their trajectories before diverging to form more discrete terminal segments. This is the case for the CM lineage group, whose SATs all share the longitudinal central or the longitudinal superior medial fascicle. SATs of the CM group are also hard to annotate by hand in neurotactin-labeled brains, given 1) the high number of SATs per lineage, and 2) the high density of SATs from multiple lineages near the future location of the central complex. The generation of GFP-labeled flip-out clones was essential for the manual annotation of CM lineages.
The ‘sister lineages’ represent a third case of problematic manual and algorithmic identification. The prime example is the pair DAMd2 and DAMd3, which the trained classifier cannot discriminate, yielding numerous false positives. These lineages are particularly hard to annotate by hand, and their annotation may remain ambiguous until enough GFP-labeled flip-out clones reveal their individual characteristics.
The concept of morphological stereotypy may be defined as the degree of variability of a structure across individuals as measured under a given set of conditions. Our semiautomatic SAT annotation approach was made possible by the strong stereotypy of neuronal components observed in Drosophila and insect brains in general. In particular, neuroblasts (Technau, 2008) and neuronal cell bodies (Hiesinger et al., 2006), primary neuronal lineages (Nassif et al., 1998), secondary neuronal lineages (Ito and Hotta, 1992; Ito et al., 1997; Pereanu and Hartenstein, 2006), olfactory neurons (Jefferis et al., 2007), and neuropile compartments (Younossi-Hartenstein et al., 2003; Jenett et al., 2006) have been described as highly stereotypical in their position, dimensions and numbers.
Stereotypy, as defined, is a function of measurement conditions. Measuring stereotypy to generate consensus SAT traces, or to identify the same SAT in different brains, requires non-linear three-dimensional registration, which eliminates global and local differences in brain size and shape. These differences originate from artifactual deformations induced by sample preparation, from the exact developmental time at which each larva was fixed and dissected, and from phenotypical differences among individuals. Once the brains share the same coordinate space, corresponding SATs from different wild-type individuals are very similar, yet not identical. We have observed a non-homogeneous distribution of the residual variability (fig. 6), defined as the variability that remains between two structures after bringing them into approximately the same coordinate space.
The proximal segment of a SAT, located close to the cell bodies, is generally more variable than the rest, in agreement with reports of relatively high variability in the location of the neuroblasts and neuronal cell bodies (Pereanu and Hartenstein, 2006). Neuroblasts and their attached cell body clusters can vary in position by up to about one lineage diameter (approximately 10–12 μm). By contrast, we find a much lower variability of the middle and distal segments of the SAT, which indicates a stricter regulation of axon pathfinding and positioning within the neuropile (fig. 6).
We report a high accuracy in the recognition of SATs at the 3rd instar larval stage. We tested the robustness of our method by attempting to identify SATs in severely deformed glia-less mutant brains. Our method correctly identified numerous secondary axon tracts, indicating that the relative positions of SATs within the neuropile are preserved despite the lack of glial cells. The difficulty of manually annotating SATs in mutants is reduced to locating the fiduciary points, then tracing a SAT and comparing it to all SATs in the annotated library. The systematic application of our method to the SATs of glia-less brains highlighted some SATs without conclusive identification, suggesting that these lineages were strongly affected, in accordance with their close association with glial sheets (Spindler et al., 2009). The sequence analysis approach employed by our method indicates the presence of tract segment deletions (such as the missing terminal segment in BAmas1/2, fig. 8 G) or complete misrouting (fig. 8 F).
This robustness of NIT will facilitate studies of mutant phenotypes. Methods for assessing phenotypic changes in the brain have so far been mostly qualitative and restricted to regions where changes are most obvious, such as the mushroom body (Heisenberg et al., 1985; Heisenberg, 1998). Quantitative mutant analysis can now be extended to all central brain lineages.
How far can 3D registration be pushed to obtain valuable suggestions on the identity of a SAT? Beyond individuals, our approach enables identification of primary and secondary lineages across developmental time, nerve cord segments, and species (fig. 12). Easily identifiable lineages, such as those forming the mushroom body, antennal lobe, or central complex, have been identified across all insect taxa (Boyan and Williams, 2007; Strausfeld et al., 2009). With appropriate fiduciary points, NIT could relate lineages from different species to those of Drosophila.
Efforts are underway to use the same techniques by which lineages were reconstructed in the larva to follow neuronal lineage differentiation through the pupal stages into the adult. Our exploratory data (fig. 11) indicate that the brain, despite undergoing massive growth as secondary lineages arborize, does not change substantially in the relative position of its internal components. However, certain changes do occur that make it non-trivial to identify adult lineages from their larval counterparts. For example, certain SATs grow massively; more generally, the clustering of somata changes as the brain cortex expands and simultaneously becomes thinner (Larsen et al., 2009). As a result, proximal SAT segments change direction. For best accuracy, and to detect SATs that newly develop or are lost during the pupal period, we envision multiple digital atlases of SATs for several pupal stages and the adult brain. Our quantitative analysis of similarities between SATs will be invaluable in establishing lineage identity through time.
Systematic approaches to obtain markers for eventually every cell of the brain and ventral nerve cord have been initiated for the Drosophila brain (Pfeiffer et al., 2008). Such markers are crucial for the usage of Drosophila as a model for neural function, development or pathology.
Neurons of the vertebrate brain are defined topologically in relationship to rich spatial frameworks of reference, composed of compartments (e.g. ‘lateral geniculate nucleus’) and tracts (e.g. ‘olivocerebellar tract’). Neurons in invertebrate brains have been classified based on cell body position (‘dorsomedial group’) or special attributes (‘giant fiber neuron’; ‘optic lobe pioneers’), lacking detailed reference frameworks. This non-systematic classification is insufficient for the comparison of different sets of neurons. We propose neuronal lineages as a high-resolution topological framework for single neuron identification.
With NIT, traced low-order branch segments of labeled neurons may be used to assign neurons to their enclosing lineages with high reliability, providing them with a genetic address. The lineage represents an envelope, and much of what we know about the envelope will also apply to the individual neurons enclosed in it.
We thank Shana Spindler and Louie Garcia for confocal image stacks of glia-less brains and dopaminergic neurons, and Sergio Jiménez for help with designing and setting up a classifier with WEKA. Thanks to Parvez Ahammad and Marta Zlatić for critical comments on this manuscript. This work was funded by EU grant 216593 ‘SECO’, and the NIH Grant RO1 NS29357-15 to V.H.