|Home | About | Journals | Submit | Contact Us | Français|
With rapid urbanization, highly accurate and semantically rich virtualization of building assets in 3D become more critical for supporting various applications, including urban planning, emergency response and location-based services. Many research efforts have been conducted to automatically reconstruct building models at city-scale from remotely sensed data. However, developing a fully-automated photogrammetric computer vision system enabling the massive generation of highly accurate building models still remains a challenging task. One the most challenging task for 3D building model reconstruction is to regularize the noises introduced in the boundary of building object retrieved from a raw data with lack of knowledge on its true shape. This paper proposes a data-driven modeling approach to reconstruct 3D rooftop models at city-scale from airborne laser scanning (ALS) data. The focus of the proposed method is to implicitly derive the shape regularity of 3D building rooftops from given noisy information of building boundary in a progressive manner. This study covers a full chain of 3D building modeling from low level processing to realistic 3D building rooftop modeling. In the element clustering step, building-labeled point clouds are clustered into homogeneous groups by applying height similarity and plane similarity. Based on segmented clusters, linear modeling cues including outer boundaries, intersection lines, and step lines are extracted. Topology elements among the modeling cues are recovered by the Binary Space Partitioning (BSP) technique. The regularity of the building rooftop model is achieved by an implicit regularization process in the framework of Minimum Description Length (MDL) combined with Hypothesize and Test (HAT). The parameters governing the MDL optimization are automatically estimated based on Min-Max optimization and Entropy-based weighting method. The performance of the proposed method is tested over the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark datasets. The results show that the proposed method can robustly produce accurate regularized 3D building rooftop models.
A key problem domain that we address in this paper is to reconstruct a 3D geometric model of building rooftop from remotely sensed data such as airborne laser point clouds. The representation that we follow for 3D rooftop models draws on ideas from geometric modeling used in Photogrammetry and Geographical Information Science (GIS). In the representation scheme, a 3D rooftop is modeled with either primitive geometric elements (i.e., points, lines, planes and objects), or primitive topological elements (i.e., vertices, edges, faces, and edge-groups (rings of edges on faces)). Typically, both primitive geometric and topological elements are used together for representing 3D rooftop models (e.g., CityGML and Esri ArcGIS’s shapefile). CityGML is an open data model and XML-based format for the storage and exchange of virtual 3D city models .
In CityGML, 3D rooftop models can be differently represented according to the level-of-detail (LoD). A prismatic model of rooftop that is a height extrusion of a building footprint is defined as LoD1 in CityGML, while LoD2 requires a detailed representation of the primitive geometric and topological elements in a 3D rooftop model. An important aspect in GIS-driven 3D model representation is that the reconstructed model elements should correspond to semantically meaningful spatial entities used in architecture, civil and urban planning: for instance, the reconstructed geometric elements represent roof lines (ridges and eaves), roof planes (gables and hips), vents, windows, doors, wall columns, chimneys, etc. Thus, a photo-realistic reconstructed rooftop model can be used for assisting human decisions on but not limited to asset management, renovating planning, energy consumption, evacuation planning, etc. As discussed in Rottensteiner et al. , a city-scale building model will provide an important mean to manage urban infrastructure more effectively and safely for addressing critical issues related to rapid urbanization. In this study, we aim to reconstruct LoD2 models of the rooftops from remote sensed data.
Traditionally, 3D rooftop models are derived through interaction with a user using photogrammetric workstations (e.g., multiple-view plotting or mono-plotting technology). This labor-intensive model generation process is tedious and time-consuming, which is not suitable for reconstructing rooftop models at city-scale. As an alternative method, great research efforts have been made for developing a machine-intelligent algorithm to reconstruct photo-realistic rooftop models in a fully-automated manner for the last two decades . Recently, airborne light detection and ranging (LiDAR) scanners became one of the primary data acquisition tools, which enable rapid capturing of targeted environments in 3D with high density and accuracy. Due to these advantages, state-of-the-art technologies for automatically reconstructing 3D rooftop models using airborne LiDAR data have been proposed by many researchers [2,3,4,5,6]. However, only limited success in a controlled environment has been reported, and the success of developing an error-free rooftop modeling algorithm is not achieved yet .
In general, 3D rooftop models are derived automatically from 3D LiDAR point clouds by: (1) extracting the primitive geometric elements, namely “modeling cues”; and (2) recovering the primitive topological elements among the modeling cues. A critical problem to hinder the automation of 3D rooftop model generation is that many portions of the object (rooftop) are unknown, and recovered with errors caused by the following reasons:
As discussed previously, the aforementioned factors lead to errors in recovering the modeling cues sufficiently well for generating an error-free rooftop model. Typically, knowledge of a rooftop object of interest (e.g., roof type, structure, number of roof planes, etc.) is unknown. Thus, recovering all the primitive topological elements accurately with an error-free geometric model is a very challenging vision task. To address this issue, many researchers have introduced some modeling constraints to compensate the limitations of erroneous modeling cues [7,8,9,10]. These constraints are used as prior knowledge on targeted rooftop structures: (1) for constructing the modeling cues to conform to Gestalt law (i.e., parallelism, symmetry, and orthogonality), and linking fragmented modeling cues in the frame of perceptual grouping; and (2) by determining optimal parametric rooftop model fit into part of rooftop objects through a trial-and-error of model section from a given primitive model database. We refer these modeling constraints as an “explicit regularity” imposed on rooftop shape as the definition of regularity is fully and clearly described. However, only a few of the explicit regularity terms can be applicable, and the shapes of rooftops in reality appear too complex to be reconstructed with those limited constraints.
In this paper, we focus on the data-driven modeling approach to reconstruct 3D rooftop models from airborne LiDAR data by introducing flexible regularity constraints that can be adjusted to given objects in the recovery of modeling cues and topological elements. The regularity terms that are used in this study represent a regular pattern of the line orientations, and the linkage between adjacent lines. In contrast to the term of “explicit regularity”, we refer to it as an “implicit regularity” because its pattern is not directly expressed, but found with given data and object (rooftop). This implicit regularity is used as a constraint for changing the geometric properties of the modeling cues and topological relations among adjacent modeling cues to conform to a regular pattern found in the given data. This data-adaptive regularity (or regularization process) allows us to reconstruct more complex rooftop models.
In this paper, we describe a pipeline of 3D rooftop model reconstruction from airborne LiDAR data. First, to gain some computational efficiency, we decompose a rooftop object into a set of homogeneous point clouds based on height similarity and plane similarity, from which the modeling cues of line and plane primitives are extracted. Secondly, the topological elements among the modeling cues are recovered by iteratively partitioning and merging over a given point space with line primitives extracted at a global scale using the Binary Space Partitioning (BSP) technique. Thirdly, errors in the modeling cues and topological elements are implicitly regularized by removing erroneous vertices or rectifying the geometric properties to conform to globally derived regularity. This implicit regularization process is implemented in the framework of Minimum Description Length (MDL) combined with Hypothesize and Test (HAT). The parameters governing the MDL optimization are automatically estimated based on Min-Max optimization and Entropy-based weighting method. The proposed parameter estimators provide optimal weight values that adapt according to building properties such as; size, shape, and the number of boundary points. The proposed pipeline of rooftop model generation was developed based on previous works reported in . We extended the work by proposing data-adaptive parameter estimation, conducting an extensive performance evaluation and engineering works to implement a computationally efficient modeling pipeline.
Numerous building reconstruction algorithms have been published for the past two decades. Although it is difficult to clearly classify these various methods into specific categories, there are several ways to categorize the methods: the used data source (single vs. multi-sources), the data processing strategy (data-driven (or generic), model-driven (or parametric)), and the amount of human interaction (manual, semi-automatic, or fully automated) . Of those, classifying existing methods into data-driven or model-driven approaches provides a good insight for understanding and developing 3D building model reconstruction algorithms.
In the model-driven approaches, 3D building models are reconstructed by fitting parameterized primitives to data. This is possible because many buildings in rural and suburban area have common shapes in whole building or building roof parts. These common roof shapes such as flat, gable, and hip roof are considered as standard primitives for representing building rooftop structures. Simple buildings can be well represented as regularized building models using pre-defined parameterized primitives even with low density data and presence of missing data. However, complex buildings and arbitrarily shaped buildings are difficult to model using a basic set of primitives. In addition, the selection of the proper primitives among a set of primitives is not an easy task. To address the limitations, Verma et al.  presented a parametric modeling method to reconstruct relatively complex buildings by combining simple parametric roof shapes that are categorized into four types of simple primitives. In this study, the roof-topology graph is constructed to represent the relationships among the various planar patches of approximate roof geometry. The constructed roof-topology graph is decomposed into sub-graphs, which represents simple parametric roof shapes, and then parameters of the primitives are determined by fitting LiDAR data. Although they decomposed complex buildings into simple building parts, many building parts cannot be still explained by their four simple shape primitives. Similarly, Milde et al.  reconstructed 3D building models by matching sub-graphs of the region adjacency graph (RAG) with five basic roof shapes and then by combining them using three connectors. Kada and McKinley  decomposed the building’s footprint into cells, which provided the basic building blocks. Three types of roof shapes including basic, connecting, and manual shapes are defined. Basic shapes consist of flat, shed, gabled, hipped, and Berliner roofs while connecting shapes are used to connect the roofs of the sections with specific junction shapes. The parameterized roof shapes of all cells are determined from the normal direction of LiDAR points. The entire 3D building model is represented by integrating the parameterized roof elements with the neighboring pieces. Although a high level of automation is achieved, the method still requires manual works to adjust cell parameters and to model more complex roof shapes like mansard, cupola, barrel, and even some detail elements. Lafarge et al.  reconstructed building models from a digital surface model (DSM) by combining generic and parametric methods. Buildings are considered as assemblages of 3D parametric blocks from a library. After extracting 2D building supports, 3D parametric blocks are placed on the 2D supports using Gibbs model, which controls both the block assemblage and the fitting to data. The optimal configuration of 3D blocks is determined using the Bayesian framework. They mentioned that the optimization step needs to be improved to achieve both higher precision and shorter computing time as future work. Based on a predefined primitive library, Huang et al.  conducted a generative modeling to reconstruct roof models that fit the data. The library provides three groups including 11 types of roof primitives whose parameters consist of position parameters, contour parameters, and shape parameters. Building roofs are represented as one primitive or an assemblage of primitives allowing primitives overlaps. For combining primitives, they derived combination and merging rules which consider both vertical and horizontal intersections. Reversible Jump Markov Chain Monte Carlo (RJMCMC) with a specified jump mechanism is conducted for the selection of roof primitives, and the sampling of their parameters. Although they have shown potential and flexibility of their method, there are issues to be solved: (1) uncertainty and instability of the reconstructed building model; (2) influence of prior knowledge and scene complexity on completeness of the reconstruction; and (3) heavy computation time.
In contrast with model-driven approaches, data-driven approaches do not make any assumptions regarding to the building shapes, thus they can theoretically handle all kinds of buildings. However, the approach may cause considerable deformations due to the sensitivity to surface fluctuations and outliers in the data. In addition, it requires a regularization step during the reconstruction process. In general, the generic approach starts by extracting building modeling cues such as surface primitives, step lines, intersection lines, and outer boundary lines followed by reconstructing the 3D building model. The segmentation procedure for extracting surface primitives divides a given data set into homogeneous regions. Classical segmentation algorithms such as region growing [16,17] and RANSAC  can be used for segmenting building roof planes. In addition, Sampath and Shan  conducted eigenanalysis for each roof point within its Voronoi neighborhood, and then adopted the fuzzy k-means approach to cluster the planar points into roof segments based on their surface normal. Then, they separated the clusters into parallel and coplanar segments based on their distance and connectivity. Lafarge and Mallet  extracted geometric shapes such as planes, cylinders, spheres, or cones for identifying the roof sections by fitting points into various geometric shapes, and then proposed a method for arranging both the geometric shapes and the other urban components by propagating point labels based on MRF. Yan et al.  proposed a global solution for roof segmentation. Initial segmentation is optimized by minimizing a global energy function consisting of the distances of LiDAR points to initial planes, spatial smoothness between data points, and the number of planes. After segmenting points or extracting homogeneous surface primitives, modeling cues such as intersection lines and step lines can be extracted based on geometrical and topological relationships of the segmented roof planes. Intersection lines are easily obtained by intersecting two adjacent planes or segmented points while step lines are extracted at roof plane boundary with abrupt height discontinuity. To extract step lines, Rottensteiner et al.  detected edge candidate points and then extracted step lines from an adjustment considering edge points within user-specified threshold. In addition, Sohn et al.  proposed a step line extractor, called Compass Line filter (CLF), for extracting straight lines from irregularly distributed LiDAR points. Although outer boundary is one type of step line, it is recognized as a separate process in many data-driven approaches. Some researchers delineated initial boundary lines from building boundary points using alpha shape , ball-pivoting , and contouring algorithm . Then, the initial boundary was simplified or regularized. The detail reviews for simplification or regularization of boundary will be given in following paragraphs. Once all building modeling cues are collected, 3D building models are reconstructed by aggregating the modeling cues. To reconstruct topologically and geometrically correct 3D building models, Sohn et al.  proposed the BSP technique, which progressively partitions a building region into homogeneous binary convex polygons. Rau and Lin  proposed a line-based roof model reconstruction algorithm, namely TIN-Merging and Reshaping (TMR), to reconstruct topology with geometric modeling. Oude Elberink and Vosselman , and Perera and Maas  used a roof topology graph to preserve roof topology. In the latter, roof corners are geometrically modeled using the shortest closed cycles and the outermost cycle derived from the roof topology graph.
Detection of building boundary is an intermediate step for 3D building reconstruction although it is not required in all building reconstruction algorithms. Generally, the initial boundaries extracted from irregular LiDAR points have jagged shape with large numbers of vertices. Thus, a simplification or regularization process is required to delineate plausible building boundaries with certain regularities such as orthogonality, parallelism, and symmetry. Various techniques related to the regularization of building boundary have been proposed in the literature . In most methods, the boundary detection process starts by extracting boundary points from segmented points. From extracted boundary points, initial building boundaries are generated by tracing boundary points followed by a simplification or regularization process, which improves the initial boundary. The easiest method to improve initial boundary is to simplify the initial boundary by removing vertices but preserving relevant points. The well-known Douglas–Peucker (DP) algorithm  is widely recognized as the most visually effective line simplification algorithm. The algorithm starts by selecting two points which have the longest distance and recursively adding vertices whose distance from the line is less than a given threshold. However, the performance of the algorithm fully depends on the used threshold and is substantially affected by outliers. Another approach extracts straight lines from boundary points using the Hough Transform  or using RANSAC . The extracted lines are then connected by intersections of the extracted straight lines to generate closed outer boundary lines. However, Brenner  pointed out that the methods require some additional steps due to missing small building edges.
On the other hand, the regularization process imposes certain regularities when the initial boundary is simplified. Vosselman  assumed that building outlines are along or perpendicular to the main direction of a building. After defining the position of a line by the first two boundary points, the line is updated using the succeeding boundary points until the distance of a point to the line exceeds some bound. The next line starts from this point in a direction perpendicular to the previous line. A similar approach was proposed by Sampath and Shan . They grouped points on consecutive edges with similar slopes and then applied a hierarchical least squares solution to fit parametric lines representing the building boundary.
Some methods are based on the model hypothesis and verification approach. Ameri  introduced the Feature Based Model Verification (FBMV) for modification and refinement of polyhedral-like building objects. In their approach, they imposed the geometrical and topological model information to the FBMV process as external and internal constraints which consider linearity for straightening consecutive lines, connectivity for establishing topology between adjacent lines, orthogonality, and co-planarity. Then, the weighted least squares minimization was adopted to produce a good regularized description of a building model. Weidner and Förstner  adopted the MDL concept to regularize noisy building boundaries. For four local consecutive points, ten different hypothetical models are generated with respect to regularization criteria. Then, MDL, which depends on the mutual fit of the data and model and on the complexity of the model, is used to find the optimal regularity of the local configuration. Jwa et al.  extended the MDL-based regularization method by proposing new implicit hypothesis generation rules and by re-designing model complexity terms where line directionality, inner angle and number of vertices are considered as geometric parameters. Furthermore, Sohn et al.  used the MDL-based concept to regularize topologies within rooftop model. Zhou and Neumann  introduced global regularities in building modeling to reflect the orientation and placement similarities among 2.5D elements, which consist of planar roof patches and roof boundary segments. In their method, roof-roof regularities, roof-boundary regularities, and boundary-boundary regularities are defined and then the regularities are integrated into a unified framework.
Figure 1 shows the overall workflow implemented for generating 3D building rooftop models from airborne LiDAR point clouds, where individual buildings are detected. The method consists of three main parts: (1) modeling cue extraction; (2) topology element reconstruction; and (3) regularization. In the modeling cue extraction, roof element clusters, lines (intersection and step lines), and outer-boundaries are extracted from a set of laser point clouds labeled as single building objects (i.e., building labeled points) (Section 2.1). Then, the topology relations among the modeling cues are established by BSP (Section 2.2). Finally, an implicit regularization process is applied to outer-building boundaries and rooftop polygons (Section 3). The regularization process is based on the framework of MDL in combination with HAT optimization. Note that the regularization process is conducted twice; once for regularizing building outer-boundaries which represent LoD1 models, and then for rooftop models which represent LoD2 models. Two types of weight parameters in the MDL-based objective function are automatically determined by Min-Max optimization and Entropy-based parameter estimation method, respectively (Section 4).
The first step towards generating 3D building models using LiDAR data is to gather the evidence of building structures (i.e., primitive geometric elements). Planes and lines are recognized as the most important evidence to interpret building structures due to the fact that 3D building rooftop models can be mainly represented by planar roof faces and edges. The two different modeling cues (planar and linear modeling cues) have different properties and can be separately extracted from LiDAR points. In Section 2.1.1, building points are sequentially segmented into homogeneous clusters, first based on height similarity and then based on plane similarity. In Section 2.1.2, linear modeling cues are extracted using boundary points of the homogeneous clusters.
Roof element clustering segments building-labeled points into homogeneous rooftop regions with a hierarchical structure. A building rooftop in an urban area is a combination of multiple stories, each of which consists of various shapes of flat and sloped planes. Directly extracting homogeneous regions from entire building points may result in difficulties due to a high degree of shape complexity. To reduce the complexity, the building-labeled points are decomposed into homogeneous clusters by sequentially applying height similarity and plane similarity in order.
In the height clustering step, the rooftop region with n numbers of building-labeled points is divided into m height clusters . Height similarity at each point is measured over its adjacent neighboring points in Triangulated Irregular Network (TIN). A point with the maximum height is first selected as a seed point, and then a conventional region growing algorithm is applied to add neighbor points to a corresponding height cluster with a certain threshold (). This process is repeated until all building rooftop points are assigned to one of the height clusters. As a result, the height clusters satisfy the property , , . Note that each height cluster consists of one or more different roof planes.
In the plane clustering step, each height cluster is decomposed into k plane clusters based on a plane similarity criterion. The well-known random sample consensus (RANSAC) algorithm is adopted to obtain reliable plane clusters as suggested in previous studies [18,36]. The process starts by randomly selecting three points as seed points to generate an initial plane. After a certain period of random sampling, a plane, which has the maximum number of inliers with a user defined tolerance distance ζ (ζ = 0.1 m) from the estimated plane, is selected as a best plane. Points, which are assigned in the previous iteration, are excluded in the next step. The process continues until all points of the height cluster are assigned into certain plane clusters. Figure 2b,c shows examples of height clusters and plane clusters, respectively, where different colors represent different clusters.
Once building-labeled points are segmented into homogeneous clusters with a hierarchical structure, linear modeling cues are extracted from the homogeneous clusters. We divide linear modeling cues into three different types in order to reduce the complexity in the modeling cue extraction process as follows: (1) outer boundaries of height clusters; (2) intersection lines; and (3) step lines within each height cluster.
In boundaries of height clusters, two adjacent planes have a large height discontinuity. Thus, outer boundaries of height clusters can be recognized as step lines. However, distinguishing between outer boundaries of height clusters and step lines within each height cluster can reduce ambiguity in the topology recovering process (Section 2.2). In addition, outer boundaries of height clusters can serve to generate the LoD1 model. For these reasons, in this study, we separately extract outer boundaries of height clusters. The process starts by detecting boundary points of height clusters which share neighbor height clusters in a TIN structure. After selecting a starting boundary point, a next boundary point is determined by surveying neighbor boundary points, which are connected with the previous boundary point in TIN structure, and by selecting a boundary point which appears first in an anti-clockwise direction. The process continues until the boundary is closed. Then, the closed boundary is regularized by the MDL-based regularization method which will be described in Section 3.
An intersection line candidate is extracted by two adjacent roof planes. Candidates are accepted as valid intersection line if they separate the point sets of the planes and if a sufficient number of points is close to the generated lines.
For step lines, boundary points of plane clusters, which do not belong to outer boundaries or intersection lines, are considered as candidate points for step lines. Given a sequence of l candidate points, step lines are extracted in a similar way to the Douglas–Peucker (DP) algorithm. The process starts with a straight line () connecting the first point and last point of the sequence and then recursively adding candidate points which have a distance larger than a user-defined tolerance (0.5 m). Each segment of the line segments is considered a step line. Figure 3 gives examples of each type of linear modeling cues.
Once all modeling cues are collected, topological relations among the modeling cues are constructed by the BSP technique. In computer science, the BSP is a hierarchical partitioning method for recursively subdividing a space into convex sets with hyperlines. Sohn et al.  used the BSP to recover topological relations of 3D building rooftop planes. We adopt the method to reconstruct a topologically and geometrically correct 3D building rooftop model from incomplete modeling cues. The topology recovery process consists of a partitioning step and plane merging step. In the partitioning step, a hierarchical binary tree is generated by dividing a parent region into two child regions with hyperlines (linear modeling cue). The partitioning optimum is achieved by maximizing partitioning score which consists of planar homogeneity, geometric regularity and edge correspondence . In plane merging step, the adjacent roof planes having similar normal vector angles are merged by applying a user-defined threshold. The merging process continues until no plane can be accepted by the co-planar similarity test. Once all polygons are merged together, 3D building rooftop model can be reconstructed by collecting final leave nodes in the BSP tree. Figure 4 shows results of partitioning step, merging step and corresponding 3D rooftop model.
As mentioned before, recovering error-free 3D rooftop models from erroneous modeling cues is a challenging task. Geometric constraints such as parallelism, symmetry, and orthogonality can be explicitly used as a prior knowledge on rooftop structures to compensate the limitations of erroneous modeling cues. However, explicitly imposing the constraints has limitations on describing complex buildings that appear in reality. In this study, we propose an implicit regularization where regular patterns of building structures are not directly expressed, but implicitly imposed on reconstructed building models providing flexibility for describing more complex rooftop models. The proposed regularization process is conducted based on HAT optimization in MDL framework. Possible hypotheses are generated by incorporating regular patterns that are present in the given data. MDL is used as a criterion for selecting an optimal model out of the possible hypotheses. The MDL concept for model selection is introduced in Section 3.1 while Section 3.2 introduces a method for hypothesis generation.
The MDL proposed by Rissanen  is a method for inductive inference that provides a generic solution to the model selection problem . The MDL is based on the idea of transmitting data as a coded message, where the coding is based on some prearranged set of parametric statistical model. The full transmission has to include not only the encoded data values, but also the coded model parameter values . Thus, the MDL consists of model complexity and model closeness as follows:
where indicates a goodness-of-fit of observations D given a model H while represents how complex the model H is. is a weight parameter for balancing the model closeness and the model complexity. Assuming that an optimal model representing the data has the minimal description length, the model selection process allows a model H to be converged to the optimal model H* as follows:
The first term in Equation (1) is optimized for good data attachment to the corresponding model. With an assumption that an irregular distribution of data with n measurements caused by random errors follows a Gaussian distribution with expectation and variance , its density function can be represented as . By using a statistical model of the data, the degree of fit between a model and data can be measured by , and then the term of model closeness can be rewritten in a logarithmic form as follows:
In Equation (3), the last two terms can be ignored with an assumption that all the hypotheses have the same . Thus, the equation is simplified as follows:
where is the weighted sum of the squared residuals between a model H and a set of observations D, that is in matrix form.
The second term in Equation (1) is designed to encode the model complexity. In this study, the model complexity is explained by three geometric factors: (1) the number of vertices ; (2) the number of identical line directions ; and (3) the inner angle transition . By using the three geometric factors, an optimal model is chosen if its polygon has a small number of vertices and a small number of the identical line directions, and if the inner angle transition is smoother or more orthogonal.
Suppose that , , and are used for an initial model, while , , and are used for a hypothetical model generated from the initial model. To measure the description length for the number of vertices, we start by deriving the probability that a vertex is randomly selected from a given model, . Then, it can be expressed in bits as . Since a hypothetic model generated by hypothesis generation process has vertices, its description length is . Similarly, the probability for the number of identical line directions is and can be expressed in bits as . By considering the required number of line directions , the description length for identical line direction is measured by . To define line directions, we adopt compass line filter (CLF) suggested by Sohn et al. , as shown in Figure 5. The CLF is determined by the whole set of eight filtering lines with different slopes that is equally separated in steps of 22.5°. The representative angle for each slope, , is calculated by a weighted averaging of angles that takes the summed line length of each CLF slope into account.
Lastly, the description length for inner angle transition is measured by assigning a certain penalty value to quantized inner angles. As depicted in Equation (5), the penalty values are heuristically determined to have the minimum value of 0 (i.e., favour inner angle) if inner angle is close to 90° or 180°, while the maximum value of 2 (i.e., unfavorable inner angle) is assigned to very acute inner angles. This is because acute inner angle at two consecutive building vectors rarely appears in reality. Thus, the probability for can be derived from an inner angle that is located in one of the quantized angles, , and expressed in bits as . In the optimal model, the cost imposed by penalty values is , and its description length is calculated by .
As a result, the description length for sub-terms of model complexity is obtained by the summation of three geometric factors as follows:
where , , and are weight values for each sub-factor in the model complexity.
The hypothesis generation process proposes a set of possible hypotheses under certain configurations of a rooftop model (or building boundary). Suppose a rooftop model consists of a polygon and a polygon , where , and are common vertices in both polygons (Figure 6a). A task is to generate possible hypotheses at a certain vertex considering a given configuration of rooftop model. The hypothesis generation process starts by defining an Anchor Point (AP), Floating Point (FP), and Guide Point (GP) and then by deriving a Floating Line (FL = [AP, FP]) and Guiding Line (GL = [GP, FP]). The role of AP is to define the origin of a line to be changed (FL). FP is a point to be moved while GP is used to generate GL which guides the movement of FP. Hypotheses are generated by moving FP along the GL with AP as an origin of FL. The orientation of FL is determined by representative angles of CLF which consists of eight directions as shown in Figure 5. There are different cases for hypothesis generation: (1) depending on a relative direction of AP and FP (forward (clockwise) and backward (anti-clockwise)); (2) depending on whether a vertex is removed (removal or non-removal); and (3) depending on whether FP is a common vertex in more than two adjacent polygons (common vertex or non-common vertex). For the reader's understanding, some cases are explained as follows:
In the MDL-based objective function, two types of weight parameters are used to evaluate the relative importance of sub-terms. One is a weight parameter () for balancing the model closeness and the model complexity in Equation (1). The other is weight parameters () for the three sub-terms in the complexity term in Equation (6). In previous research , these weight parameters were set as constant values, which were empirically determined, for all building models ( = 0.5 and ). However, buildings have different shapes and sizes in reality. In addition, the density of LiDAR points varies on data acquisition settings and flight height. These properties, which vary on individual buildings, may cause unbalanced values in model closeness and model complexity. For instance, when building shape is very simple and the number of observations is significantly large, the closeness value is relatively larger than the complexity value. As a result, optimization process may be dominant to the variation of the model closeness. Thus, the weight parameters have to be appropriately tuned in an automated manner by individually considering the properties of each building. To automatically determine proper weight values, we propose two different weighting methods: (1) Min-Max weighting method (Section 4.1); and (2) Entropy-based weighting method (Section 4.2). The Min-Max weighting method is used to balance the model closeness and the model complexity while the Entropy-based weighting method is employed to determine the weight values for the three sub-terms in the complexity term.
The proposed MDL-based objective function consists of two conflicting terms: the model closeness term and the model complexity term as shown in Equation (1). is a weight parameter which affects modeling result. The smaller the value of , the simpler the optimal model is. In contrast, a larger value of emphasizes goodness-of-fit to data, causing under-simplified model (or over-fitting problem) (see Figure 7). To automatically estimate an appropriate weight value, we adopt Min-Max criterion , which minimizes possible loss while maximizing the gain. In this study, the Min-Max principle is closely related to minimizing the cost value DL for each and maximizing contributions from both of two terms, thereby finding the optimal . For each term, this leads to avoid the best scenario where one of two terms dominates by having an excessively low or high value of . To achieve this goal, the “Min” operator first finds the optimal model for each using Equation (2). Considering the boundary conditions, at and at corresponds to zero. Then, and are normalized using min-max normalization method, respectively, as follows:
where is a normalized value for the ith variable ; and are the minimum value and maximum value for variable x. After the total DL value is computed from normalized and for each , the “Max” operator derives an optimal weight value by selecting the worst scenario showing the maximum DL. Figure 7 shows an example of the Min-Max weighting method. As shown in Figure 7a, as is close to 0, a simple model is selected as the optimal model. As . gets larger, the optimal model is more complex because the DL value is more affected by the closeness term. In this example, 0.4 is selected as the best because it produces the maximum DL value.
Prior to determining the weight parameter , we estimate the weight values of geometric parameters forming the complexity term in Equation (6). The consists of three geometric terms including the number of vertices, the number of identical line directions and the inner angle transition.
In multi-attribute decision making, an entropy weighting method, which is one of the objective methods, is used to determine appropriate weights for attributes : the greater the value of the entropy corresponding to a special attribute, the smaller attribute’s weight. We adopt the entropy weighting method to determine the relative importance of three geometric terms in Equation (6). In information theory, entropy is understood as a measure of uncertainty about attributes drawn from data and can be normally characterized as follows:
The basic formulation can be rewritten to calculate entropy in the existence of two possibilities p and q = 1 − p as follows:
where p represents the event that a current hypothesized parameter set belongs to a class of optimal model parameters and q indicates the reverse situation of p. In this study, a probability for each term in Equation (6) is derived by calculating a probability that each geometric factor in a given model can converge to the optimal model. The optimal model in terms of model complexity, according to the definition of model complexity discussed in Section 3.1, is represented by a rectangle where the number of vertices is four, the number of identical line directions is two, and all inner angles have no penalty. Thus, the probability that four vertices are randomly selected from vertices is one over four combinations of , . Similarly, the probability that two identical line directions are selected from identical line directions is one over two combinations of , . The probability of inner angle with no penalty in Equation (5) is 3/16. Because all inner angles have no penalty to be optimal model, the probability for is . The estimated probabilities are converted into entropy using Equation (9). Weight parameters for three sub-terms are determined as suggested in previous studies [41,42]:
The performance of the proposed method was evaluated over the ISPRS benchmark datasets provided by the ISPRS WGIII/4 . The ISPRS benchmark datasets consist of three sub-regions (Area 1, Area 2, and Area 3) of the Vaihingen dataset, and two sub-regions (Area 4 and Area 5) of the Toronto dataset (Figure 8). The Vaihingen dataset was acquired by Leica ALS50 system at an altitude of 500 m above ground level in August 2008. Ten strips are overlapped with 30% rate and an average point density is approximately 6.7/m2 (~0.39 m point spacing). The 3D positional accuracy shows approximately ±10 cm. The Vaihingen dataset contains typical European building types showing various shapes including gable, hip roof, and their mixed structures. The Toronto dataset was taken by Optech’s ALTM-ORION M system at an altitude of 650 m in 2009. The test area includes six strips with about 6/m2 average point density (~0.41 m point spacing). The dataset contains representative scene characteristics of a modern mega city in North America including a mixture of low- and high-story building and a complex cluster of high-rise buildings. For both datasets, reference building models were generated by manual stereo plotting method. More detailed explanation on the data can be found in . To extract the building points, we applied the classification method described in .
The ISPRS benchmark project on urban classification and 3D building modeling led by ISPRS WGIII/4 provides evaluation metrics to estimate the results obtained from the latest state-of-the-art algorithms for building detection and 3D building reconstruction . The ISPRS evaluation metrics were designed for measuring the performance characteristics of individual algorithms by comparing multiple evaluation indices including confusion matrix (area-based and object-based), topological analysis among roof planes, and geometric accuracy (RMSE). Thus, the ISPRS metrics are used to evaluate our proposed method. In addition, we added two shape similarity measures (Hausdorff distance and turning function distance) and an angle-based evaluation index to evaluate different aspects of reconstructed building models. Hausdorff measures shape similarity between reference models and algorithmic models by taking the maximum distance among the minimum distances measured between each vertex for two model datasets . In contrast to RMSE, which assesses the average difference between two models, the Hausdorff distance can measure the maximum shape difference caused by over-simplification and under-simplification without any pre-defined value for the proximity criterion. The turning function distance represents a cumulative measure of the angles through which a polygonal curve turns . A turning function distance enables the direct measuring of turning pattern similarity between reference and algorithmic models. Thus, the turn function distance can measure a resemblance between two models at global scale. Additionally, an angle-based evaluation index measures the difference between main orientation of a building modeled in a reference dataset and the results produced by an algorithm. The main orientation of a building model is determined by analyzing the frequency of building lines for eight direction zones generated by the CLF. Table 1 summarizes an evaluation indices used in this paper.
Evaluations using confusion matrix were applied under three different conditions: (a) by applying area-based method for outer building boundary; and by applying object-based method (b) for all roof planes; and (c) for roof planes with more than 10 m2, respectively (Table 2).
In the area-based evaluation (Table 2a), our proposed rooftop reconstruction algorithm showed that the completeness, correctness, and quality of the reconstructed building models are 91.5%, 97.4%, and 89.2%, respectively. The results indicate that most of resulting building models were properly overlapped to the corresponding reference building models. The error rate for the completeness is larger than the error rate for the correctness. This is due to the fact that most of the boundary points from irregularly distributed points are not exactly located on the real building outline but they often feature a small offset to it. The inexact observations cause boundary displacement which is generally positioned toward the inside of the building. As a result, a building model tends to be shrunken compared to the reference building model. This leads to the increase of FNs and the decrease of TPs, degenerating the completeness.
In the object-based evaluation methods, a roof plane in one dataset was considered to be a true positive if a certain minimum percentage of its area (50% overlap) is covered by a roof plane in the other dataset. While the completeness, correctness, and quality for all roof planes are 79.5%, 96.0%, and 77.3%, respectively (Table 2b), the values are increased to 93.8%, 96.9%, and 91.3% if only large roof planes (>10 m2) are considered (Table 2c). The results indicate that small roof planes were not detected as well by our proposed method. This is mainly caused by the small number of points on small building roof planes which made it difficult to extract sufficient modeling cues for reconstructing rooftop models. Figure 9 clearly shows the effect of the size of roof plane. When only roof planes with an area smaller than 5 m2, are considered, the completeness is considerably low for all five datasets. In particular, the completeness for Area 2 (Figure 9b) and Area 5 (Figure 9e) were 26.3% and 37.4%, respectively. We observed that buildings in the two regions have many small objects on their roofs which were represented in reference building rooftop models.
As shown in Table 2, the area-based evaluations show that similar levels of model quality were achieved for both the Vaihigen dataset and the Toronto dataset. However, the object-based evaluations indicate that the model quality for the Vaihingen dataset is better than one for the Toronto dataset. This is mainly related to segmentation errors which occur more in complex scenes. We observed that many roof planes in the Toronto dataset were under-segmented by merging adjacent clusters. As a result, building rooftop models generated from under-segmented clusters caused a low success rate of the completeness.
In addition, we compared the evaluation results with those assessed for other algorithms that were reported in  where area-based evaluation results were not reported (Table 3). The object-based evaluation results (Table 3a) demonstrate that our method can outperform other building reconstruction algorithms except for the BNU in terms of the completeness and quality. In particular, when roof planes, whose area is larger than 10 m2, were considered, our proposed method showed more accurate results. The BNU, which outperform our method, was assessed only for Area 3. With regard to robustness, our proposed method outperforms the BNU. The correctness of our method is better than the average of all other evaluated methods. Considering that the correctness is above 90% for all compared methods except MON and FIE, the correctness of our method is large enough. In addition, the superiority of our method can be proven by Toronto dataset which consists of complex buildings. Only three participants submitted their results for Toronto dataset, and our method achieved the best results for all indices.
Geometrical errors in planimetry, and in height were assessed using RMSE. The RMSE measures Euclidean distance in two different ways: (1) from a vertex in the reconstructed rooftop model to its closest vertex in reference model; and (2) from a vertex in the reference model to its closest vertex in the reconstructed rooftop model. Note that only distances shorter than a certain tolerance distance (<3 m) were considered as introduced by .
The average RMSE of distances in planimetry for the Vaihigen dataset and the Toronto dataset are 0.76 m and 0.96 m, respectively. As shown in Table 3b, the geometric accuracy is better than the average geometric accuracy of building models reconstructed by other algorithms. Figure 10 shows the cumulative histogram of geometric accuracy in RMSE over the five sub-regions. Overall, more than 70% of evaluated vertices are located with less than 1.25 m RMSE. In most test regions, the results of RMSE of reference vertices (Figure 10b) are better than those of RMSE of extracted vertices (Figure 10a). The reason is that the proposed method provides under-simplified models with redundant vertices (i.e., having more numbers of vertices compared to the reference model). Note that the closest vertex within a certain tolerance distance (>3 m) was used to calculated RMSE. Thus, RMSE of extracted vertices, which have redundant vertices, tends to be worse than one of reference vertices.
Hausdorff distance was applied to 2D outer boundaries and to 3D roof planes with 1:1 correspondence, respectively (Table 4b). The averages of Hausdorff distance for 2D outer boundaries and for 3D roof planes are 1.81 m and 1.17 m, respectively. The results show that the maximum distance between the vertices of reference rooftop models and extracted rooftop models is expected to be less than roughly twice the RMSE by our proposed method. In addition, the average of the Hausdorff distance for 2D outer boundaries is larger than the value for 3D roof planes. This is mainly caused by topology relations between roof planes. As shown in Figure 11, two roof planes, which share a common edge in reference models (or in extracted models), were represented by separated roof planes in extracted models (or reference models). The different topology relations caused a large amount of shape differences in outer boundary representation.
Turning function distance, which measures how similar two shapes are, was applied to outer building boundaries and to roof planes with 90% overlap, respectively. Roughly, when the value is smaller than approximately 0.03, two corresponding shapes are very similar in terms of visual inspection. However, when the value is larger than approximately 0.05, the shapes are considerably dissimilar (Figure 12). For five sub-regions, the average turning function distances are 0.042 for 2D outer boundaries and 0.033 for 3D roof planes, respectively (Table 4c). Although turning function distances do not provide a specific range for which value is acceptable for building rooftop models, our results can be compared with examples given in Figure 12. The comparison indicates that the building rooftop models reconstructed by the proposed method can achieve acceptable shape similarities compared with reference building rooftop models in terms of visual inspection. Similarly to the results of Hausdorff distance, the turning function distance for 2D outer boundaries is larger than one for 3D roof planes due to different topologies and representations of rooftop models.
To evaluate the quality of model orientation, the angle difference was measured by calculating the difference of dominant orientations between reconstructed rooftop models and reference rooftop models. Table 4a shows the angle differences for five sub-regions where the averages of angle differences are 1.17° for 2D outer boundaries and 0.91° for 3D roof planes, respectively. Note that main angles for outer boundary and for 3D roof planes can be different because the main angle is separately determined for outer boundary and 3D roof planes. The orientation error was entirely caused by representative angles of CLF which were used to represent a regular pattern of the line orientation. The representative angles of CLF were calculated from all initial boundary lines connecting boundary points of individual building models without any prior knowledge of building orientations. Thus, a large amount of orientation error in small building models can be accidently caused if angles of the boundary lines were distorted by local distributions of boundary points.
Additionally, topology relations were assessed by comparing overlap area between reference rooftop planes and extracted rooftop planes. Table 5 represents the number of instances of 1:1, 1:M, N:1, and N:M relations. More than 63% of roof planes are matched with 1:1 relations; 22% of roof planes have N:1 relations; 7% of roof planes have 1:M relations; and 8% of roof planes have N:M relations. The topology errors are mainly caused by incorrect segmentation and incomplete modeling cues. In particular, relatively higher N:1 relations are caused by under-segmentations and superstructures on roofs which often occur in complex scene. Thus, the N:1 relations were observed more in the Toronto dataset.
To evaluate an effect of weight parameters in MDL-based objective function, we compared building models generated using fixed weight parameters with building models generated using the proposed weighting methods. Area-based evaluations using confusion matrix and shape-based indices were applied. The area-based evaluations using confusion matrix show an increase of 1.3% for the completeness, a decrease of 0.7% for the correctness, and an increase of 0.6% for the quality when the proposed weighting methods were used (Table 6). For Hausdorff distance and turning function distance, the improvements of 0.44 m and 0.003 were achieved, respectively (Table 7). While evaluation results using confusion matrix and evaluation results for turning function distance show slight improvements, the results for Hausdorff distance show relatively large improvements for all sub-regions except for Area 3. In addition, the most improvements for all evaluation methods were achieved by Area 4 where a relatively large number of shape differences at local scale between extracted models and reference models were observed. Figure 13 shows an example where shape difference at local scale is reduced by the proposed weighting methods. When fixed weight parameters were used, a lower part of the building model (red circle) was under-simplified (Figure 13c). This is related to the number of boundary points and a degree of model complexity. A large number of observations produced relatively high closeness value compared with complexity value. This caused imbalance between two values because fixed weight parameters do not consider the property of an individual building model. In contrast, the closeness term and the complexity term were balanced by using flexible weight parameters (Figure 13d). As shown in Table 6 and Table 7 and Figure 13, applying flexible weight values makes positive effects in preserving shapes similar to reference rooftop models.
Figure 14 visualizes reconstructed building rooftop models which are representative buildings of five sub-regions. Visual inspection indicates that the proposed building reconstruction method can robustly provide accurate regularized 3D building rooftop models in both simple scenes and complex scenes. Figure 15 shows all reconstruction building rooftop models over our test datasets.
In this study, we proposed an automatic 3D building reconstruction method which covers a full chain of rooftop modeling. Building-labeled points were segmented into homogeneous clusters with a hierarchical structure, which enables explicit interpretation of building rooftop configuration. To effectively gather evidence of a rooftop structure, three linear modeling cues including intersection line, step lines, and boundaries were separately extracted by considering their characteristics. In the proposed method, regularization is the most important process, which implicitly imposes geometric regularities on reconstructed rooftop models based on MDL principle. In the MDL framework, finding a regularized rooftop model was recognized as a model selection problem. The best model was selected by minimizing DL values among competing hypotheses generated by a newly designed hypothesis generation process. To automatically control weight parameters, a Min-Max based weighting method and Entropy-based weighting method were proposed. The experimental results showed that the proposed method can provide qualitatively and quantitatively well-regularized 3D building rooftop models. More specifically, the results are summarized as follows:
In current study, 3D point clouds obtained by airborne LiDAR was used as a primary information source for the automation of reconstructing rooftop models. However, the proposed method can be also applicable to photogrammetric point clouds generated by various dense matching technologies . As future work, we will investigate the impact of photogrammetric point clouds the quality of 3D rooftop models reconstructed, and thus seek for an optimal solution to make the proposed method be robust to various quality of point clouds. In addition, it will be required to examine the impact of the accuracy of building detection, especially in relation to the occlusion caused by the presence of vegetation and adjacent buildings.
This research was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery and the NSERC DATA Analytics and Visualization (DAV) CREATE program. We thank ISPRS WG III/4 benchmark team led by Dr. Franz Rottensteiner at Leibniz University of Hannover, who provided valuable data and evaluated our results for supporting this research. We also thank Woo-Sug Cho from Inha University, who provided insights and expertise that assisted the research.
The following are available online at www.mdpi.com/1424-8220/17/3/621/s1, Video S1: Sequential results of implicit regularization for 3D rooftop reconstruction.
G.S conceived and designed the research work; J.J and Y.J contributed with the implementation of the proposed methods, experiments and data analysis; and J.J, Y.J. and G.S. wrote the paper.
The authors declare no conflict of interest.