Fly stocks and nucleic acid probes
Wild-type embryos were cultured in cages for many years, starting with a nominally CantonS strain.
Full length eve
cDNAs were inserted in Gateway pDEST-vectors (M Stapleton, B Grondona, unpublished data). A 1.7 kb Sna cDNA fragment in pBSK(+) was a gift from E Bier (UC Santa Cruz, CA, USA). To create linear DNA templates, pDEST full length cDNAs were amplified using extended vector primers such that the T3 primer sequence was 3' of the cDNA and the T7 primer lay 5' (T7: 5'-GTA ATA CGA CTC ACT ATA GGG ACA TCA CCT CGA ATC AAC A; T3: 5'-AAT TAA CCC TCA CTA AAG GGC GGG CTT TGT TAG CAG C). The pBSK+ cDNA was PCR-amplified using M13 ± primers. Antisense biotin (BIO), digoxigenin (DIG) or dinitrophenyl (DNP)-labeled RNA probes were prepared by in vitro
transcription from PCR generated DNA templates for each gene using T3 RNA polymerase. To increase signal, the probes were not hydrolyzed [43
Wild-type embryos were collected for 1 h and matured for 3 h at 25°C, then dechorionated with 50% household bleach for 3 minutes and fixed for 20 minutes with 1:4 (v/v) solution of 10% formaldehyde (Polysciences, Warrington, PA, USA) and heptane (Sigma, St. Louis, MO, USA). Fixed embryos were devitellinized by shaking vigorously in 1:1 methanol/heptane, after which they were washed three times with methanol and once with 100% ethanol, and stored in ethanol at -20°C.
Embryos were rehydrated in phosphate buffered saline pH 7.2, 0.05% Tween20, 0.2% TritonX-100 (PBT+Tx), post-fixed for 20 minutes in 5% formaldehyde/PBT+Tx, and, after several washes in hybridization buffer (50% formamide, 5 × SSC pH 5.2 to 5.4, 0.2% TritonX-100, 50 μg/ml heparin) at 55 to 59°C, prehybridized for 1 to 5 h in hybridization buffer. There was no proteinase K treatment. To improve the staining quality, the prehybridized eggs were stored in -20°C hybridization buffer for at least 16 h.
For each in situ hybridization, 50 to 100 μl of embryos were incubated in 300 μl of hybridization buffer with an RNA probe for one gene labeled with DIG and an RNA probe to a second gene labeled with either DNP or BIO. After 12 to 48 h co-hybridization at 55 to 59°C and several high-stringency and low stringency washes, the two probes were detected sequentially. The DIG-labeled probe was detected using 1:500 horseradish peroxidase conjugated anti-DIG-antibody (anti-DIG-POD; Roche, Basil, Switzerland) and either a Cy3 or coumarin-tyramide reagent (Perkin-Elmer TSA-kit, Wellesley, MA, USA). Before the second probe was detected, the anti-DIG-POD antibody was first removed with several 15 minute washes with 50% formamide, 5 × SSC, 0.2% TritonX-100 at 55°C, followed by inactivation of the remnants with 5% formaldehyde/PBT+Tx. Then the second probe was detected using 1:100 anti-DNP-HRP (Perkin-Elmer) and either the complementary coumarin or Cy3-TSA-tyramide reaction. To allow detection of nuclei with a nucleic acid binding stain, all RNA in the embryo was first removed by digestion with 0.18 μg/ml RNAseA in 500 μl overnight at 37°C, and then the DNA was stained overnight by incubation in 500 to 1,000 μl of a 1:5,000 dilution of Sytox Green dye (Molecular Probes, Carlsbad, CA, USA).
The kni protein expression was detected with guinea pig-anti-kni (a gift from J Reinitz, Stony Brook University, Stony Brook, NY, USA) and Alexa488-anti-guinea pig (Molecular Probes) in embryos hybridized against ftz DIG-mRNA that was detected with coumarin tyramides. For these embryos only, the nuclei were detected using mouse-anti-histoneH1 and Alexa555-anti-mouse.
The stained embryos were dehydrated with an ethanol-series and mounted in xylene-based DePex (Electron Microscopy Sciences, Hatfield, PA, USA). A #1 coverslip was placed on a bridge formed by two #1 coverslips to prevent embryo flattening. This mountant has the advantages of creating permanent slides that protect the fluorophore from oxygen, which makes the samples highly resistant to photobleaching. To estimate the refractive index of the mountant (which determines the scaling of the z-axis), we used the assumption that embryo morphology was independent of the orientation of the embryo when it was imaged. A d/v cross-section of multiple embryos was taken at 50% egg length. Within these cross-sections, the ratio of the d/v length to the left/right length was plotted against orientation angle (data not shown). The refractive index was then computed so that this ratio was independent of the orientation. The average refractive index calculated using this method was 1.62 ± 0.06.
Each of the imaged embryos was individually staged from a phase contrast view and the stages were recorded into BID. Embryos of stage 5 [17
] were subdivided into cohorts based on the degree to which membranes had invaginated during cellularization. For example, an embryo in which the cellular membranes had invaginated 50% of the distance across the cortical cytoplasm would be staged as stage 5:50%. Because the rate of cellular invagination varies along the d/v axis, being most rapid ventrally, the percentage of membrane invagination was visually estimated where possible at the ventral surface of the embryo. If the embryo was lying in an orientation where the ventral surface was not visible in cross-section, however, we estimated the degree of membrane invagination at that side of the embryo where invagination was most advanced. Later, the stage of these embryos was corrected based on our observation that membrane invagination is about 70% laterally when it is at 100% ventrally, yet at 40% invagination it is approximately even all around the embryo. The degree to which membranes had invaginated ventrally was estimated using a linear mapping for cases where membranes had invaginated laterally at least 50% using the function 50 + (5/2)(v
- 50) (where v
is the lateral invagination percentage). The d/v orientation of all embryos was determined from their respective PointClouds based on gene expression features (see below). For the analyses presented in this paper, we used embryos in the range stage 5:50-100% invagination, which is a time window of 10 to 15 minutes [44
Three-dimensional images of the whole embryos were obtained on a Zeiss LSM 510 META/NLO laser scanning microscope (Carl Zeiss MicroImaging, Inc., Thornwood, NY, USA) with a plan-apochromat 20×, 0.75 numerical aperture objective. This objective allowed imaging of entire embryos in a single field-of-view while providing sufficient resolution and sensitivity for the subsequent analyses. The fluorophores were excited simultaneously by dual 750 nm photons supplied by a Chameleon laser (Coherent, Inc., Santa Clara, CA, USA). The resulting emission spectrum was split by dichroic mirrors and collected by three independent photomultiplier tubes (PMTs). The signals were digitized into 12 bits and recorded as three-channel images, each of a size up to 1,024 by 1,024 by 150 pixels, which varied depending on the embryo size. Each pixel had a transverse dimension of 0.45 μm and an axial dimension of approximately 1.6 μm, which varied slightly with the refractive index of the mounting medium. The gain and offset of the PMTs were set so that all the pixels of interest fell within the 12 bit dynamic range.
The position and extent of the nuclei on the surface of the embryo were defined by a model-based three-dimensional segmentation analysis. Here we discuss some of the main aspects of the algorithm. All image processing and analysis algorithms were implemented in MATLAB (The MathWorks Inc, Natick, MA, USA) with the DIPimage toolbox [45
The segmentation routines used as input the image of the Sytox DNA stain channel, labeled 'DNA image' in Figure . To restrict the analysis to the nuclei on the embryo surface, a three-dimensional binary mask, the 'shell mask' (Figure ), was defined around the embryo surface by taking an adaptive threshold of the 'DNA image' that varied on a per-slice basis to account for signal attenuation (Figure ). This shell mask was used to direct spectral unmixing of the Cy3, Sytox and Coumarin channels. It also allowed the initial attenuation correction of the Sytox channel required for the segmentation. This was accomplished using a local contrast stretch within the shell mask. A global threshold was then applied to the unmixed, attenuation-corrected Sytox channel, which was then masked by the shell image. The resulting 'DNA mask' (Figure ) identified the regions in the image that belong to the blastoderm nuclei.
Figure 11 Overview of the segmentation algorithm. The main steps of the algorithm are illustrated here on a small portion of a slice through the middle of an embryo. Note that the actual images are three-dimensional and comprise a whole embryo. The DNA image is (more ...)
Sytox attenuation with depth. Relative intensity of the Sytox stain within each nucleus, plotted against the depth of the nucleus along the optical axis. Sytox levels were normalized by scaling the 99th percentile of intensity to 100.
To locate individual nuclei, the DNA image was convolved with a narrow Gaussian to reduce noise. Local maxima in the resulting image, termed 'seeds' (Figure ), were then used to determine nuclear position. Multiple seeds were often observed in a single nucleus along its apical-basal axis on the sides of images, due to anisotropic resolution and nuclear geometry. Multiple seeds were also occasionally detected on the bottom of the embryo, where the signal to noise ratio was low due to signal attenuation. To eliminate multiple seeds, the embryo 'surface normal' for each seed was computed by applying the structure tensor [47
] to the three-dimensional skeleton [49
] of the shell mask(Figure ). Neighboring seeds that lay along this normal were assumed to belong to the same nucleus and simply removed, leaving only a set of 'pruned seeds' (Figure ).
Once a single seed was determined per nucleus, the pruned seeds were grown to fill the nuclei, using a region-growing algorithm that combined a watershed algorithm [51
] and a gray-weighted distance transform [51
] of the DNA image (Figure ). The combination of these two algorithms created nuclear boundaries that matched actual boundaries when visible, yet divided distances between seeds equally when boundaries where not distinguishable.
In some cases nuclei, predominantly on the sides of images, did not posses a seed and were joined to one of its neighbors. These regions were detected by comparing their sizes to average sizes taken from the top and bottom of the image where segmentation was most accurate (Figure ). The original seeds for these regions were then replaced by an appropriate number of seeds using a cluster analysis algorithm [55
] that placed seeds on the brightest possible locations that created regions of similar total intensity. The region growing algorithm described above was executed again on this refined set of seeds. Finally, regions that were still too large were just split into an appropriate number of equal volumes without regard for the pixel intensities. Our Segmentation Volume Renderer [18
] was used extensively during the development of the segmentation algorithm.
Finally, the segmentation algorithm includes additional features that make it more robust to images with specific artifacts that would have otherwise resulted in failure to generate a PointCloud. One example is the occasional presence of impurities on the embryo surface that caused a bright artifactual fluorescence signal across all channels. These regions were detected in the image and ignored during subsequent analysis. A second example is the occasional presence of a yolk nucleus proximal to the blastoderm nuclei. Such a yolk nucleus results in a shell mask with a local basal bulge. This condition was simply detected and removed. For full details on the segmentation algorithm refer to the source code, available online [10
Measuring expression levels associated with each nucleus
To capture the labeled mRNA expression levels, we first had to estimate the cellular extent surrounding each nucleus. This was achieved by growing the nuclear segmentation mask, in the apical and basal directions, into the cytoplasm by tessellation. The distances grown were established by examining cytoplasmic auto-fluorescence in several sample images. This was then used in combination with the nuclear mask to divide each cell into three regions: apical, nuclear and basal (Figure ). The expression level was estimated in each of these regions and in the whole cell by taking the average values within them for both the Cy3 and Coumarin channels. These expression values, together with the average value of the Sytox channel within each nucleus, the center of mass of the nuclei, the volumes of the various cellular regions, and the neighborhood relationshps between cells were written to a PointCloud file.
For subsequent analysis, expression values from the PointClouds were corrected for attenuation by dividing these values with the average Sytox intensity within the corresponding nucleus. This approach assumes that the average Sytox intensity is constant from nucleus to nucleus, and it is representative of the attenuation of the other channels.
Cylindrical and orthographic projection of the blastoderm
We use two methods to display data on the embryo surface: the cylindrical projection and the orthographic projection. The cylindrical projection provides an 'unrolled' view of the full surface, which we use to display data mapped onto the blastoderm surface. The orthographic projection shows only half the surface, but produces less distortion and, therefore, is useful to show the location of borders of the a/p patterning system. The center of mass of the embryo was computed from the three-dimensional nuclear coordinates in the PointCloud as the mean coordinate of all points. The principal a/p axis of the embryo was estimated as the eigenvector associated with the smallest eigenvalue of the inertia tensor [47
]. The location of the dorsal-most point was determined manually for each PointCloud from the ftz
expression pattern. The embryo was then translated so that the center of mass was at the origin, and rotated so that the estimated a/p axis lay on the x-axis and the d/v axis lay on the z-axis, anterior to the left (negative x), dorsal up (positive z). The cylindrical projection then used the x
-coordinate on the horizontal and ϕ
on the vertical, where y
) and z
). This resulted in a rectangular plot with the embryo's anterior to the left, the dorsal midline split to the top and bottom, and the ventral midline in the middle. Orthographic projections simply used the x
-coordinate on the horizontal and the z
-coordinate on the vertical, discarding y
. As a further aid in managing the complexity of this three-dimensional dataset, we developed a flexible visual analysis tool, PointCloudXplore [19
], which can be used to interactively visualize and analyze the embryo PointClouds in three dimensions.
Computing packing density of nuclei
Nuclear packing densities were calculated as the number of nuclei per unit surface area. The surface of the embryo was first identified from the PointCloud using the Eigencrust algorithm [56
]. Briefly, a region was defined by sweeping a 15 μm arc on the embryo surface about each nucleus. The density was then estimated as the number of nuclei inside this region divided by its area. Average density maps were computed by resampling the per-nucleus density estimates for a given embryo onto a regular grid in cylindrical coordinates, and averaging these resampled projections over the embryos in a temporal cohort. Only the top and bottom parts of the z-stacks were used for density analyses, except for method evaluation comparison in Figure , where the laterals of the z-stacks were used.
Computing apical/basal shift of nuclei
Apical/basal shift was measured by fitting, using least squares, a quadratic surface to the 200 nearest neighbors of a nucleus, and determining the distance of the nucleus to this surface. Average shift maps were computed using resampled cylindrical projections, in the same manner as the average density maps. To eliminate the possibility that bleed-through from mRNA stain channels might influence the segmentation and localization of nuclei in the Sytox channel, we also examined average shift maps produced from subsets of embryos excluding those embryos stained for particular genes (data not shown). All of these maps showed qualitatively similar patterns of nuclear displacement.
Measuring expression boundary location
To determine an initial estimate of the boundary location, two algorithms were created to find the approximate location of the pair rule and gap gene stripe boundaries from PointCloud data. The first algorithm was fully automatic, once the number of stripes was specified. It used a local threshold to detect regions that contain the highest expression values. The edges of these regions provided approximate locations for stripe boundaries. A second semi-automatic algorithm was developed for immature patterns (such as the early ftz
pattern), and those that did not segment properly because of imaging artifacts. In these cases, a user clicked on a nucleus close to the stripe border of interest. The shortest geodesic path [57
] that circumnavigated the embryo through this point was determined. This was done using a gray-weighted distance transform [51
] of the gradient of the stripe expression pattern, taken along the a/p direction, and resulted in a path that followed the stripe edge. When this failed, the stripe boundary was determined manually by placing eight points on each edge.
To compute the location of the stripe boundaries, the embryo was first divided into 16 equal strips running along the a/p axis. Nuclei that fell within each strip were projected onto the a/p axis and their expression values were sampled at 400 regular intervals, using normalized convolution [58
] with a Gaussian of σ
= 1 interval (this yields 16 one-dimensional graphs). Accurate boundaries of expression stripes were then determined by finding the center of mass of peaks in the gradient of expression along the strip. The center of mass was used because it is more robust against noise than the expression gradient maximum, which marks the expression inflection point, a feature commonly used to mark edges.
For Figure where the boundaries were computed using a threshold, we thresholded the one-dimensional projections of the 16 strips as defined above, then determined the location of the boundary closest to the expected boundary location, as given by the inflection points. Due to variation between individuals, some embryos did not posses all points used in this graph. The measurements at each point were averaged for all embryos that possessed a threshold at that point. Where more than 50% of embryos lacked a point, that point was not shown.
Measuring stripe intensity
The intensity of pair rule gene stripes was determined using the 95th percentile of the expression level values (as a more robust substitute for the maximum), within a region determined by the 1/16th strip and the stripe borders as determined above.
Data management and storage
A BID was built to manage and store of all the data and metadata produced by this project [10
]. BID tracks the entire experimental process from the embryo preparation (genotype, phenotype, collection conditions, maturation conditions, and so on) and hybridization (nucleic acid probes, secondary antibodies, fluorophores, and so on, including detailed information such as the vector DNA sequence), all the way to the PointCloud data files (with associated metadata such as a quality score, thumbnails and links to the raw image data). For each step in the experimental process, a corresponding table or set of tables describes the fine-grained details of that process (Figure ).
BID schema. Each table corresponds to a step in the experimental process. The tables have been grouped into four blocks corresponding to a coarser subdivision of the pipeline.
Sophisticated search functions and overviews of the experiments are provided to aid project management. For example, it is possible to quickly find the slide and embryo location for a given PointCloud, should it need to be re-imaged or re-staged. This is accomplished by identifying each slide with a unique bar code and each embryo that was imaged by its coordinates on the slide. For a full schema see Figure .
The raw three-dimensional images are stored in a dedicated repository, and indexed with BID. Because of their large size (approximately 400 Mb each), the raw images require a different backup solution as well as a high-speed network between the storage and the computers used for processing them. The independent repository makes this possible.