Advances in information technology have led to an exponential increase in the amount of data that scientists collect, to the extent that they are now in dire need of new methodologies to summarize and visualize the corresponding large datasets efficiently and rapidly. This is partly the reason that the study of complex networks, and in particular the identification of community structures within these networks, has become a primary focus of research in many fields. Interestingly, this surge in network research in the social, biological, physical and mathematical sciences and numerous other fields has also brought a significant surge in the popularity of the hierarchical clustering (HC) algorithm, which was originally proposed more than half a century ago. The main reasons for the popularity of HC methods are that they are seemingly easy to set up, their computing requirements are usually small, and they provide visual information on data at low cost. As has become common practice, an HC tree is constructed on the basis of a choice of an empirical relational measure, either a similarity or a distance, among the object nodes constituting a data cloud of interest, and an ad hoc choice of linkage, such as complete linkage, single linkage or many others, for prescribing “distances” among sets of nodes. This tree is then conveniently perceived as being able to reveal multi-scale structural information on the data cloud, such as which nodes and which sets of nodes are close to each other. Such a convenient visual apparatus is seemingly bestowed with a “local-to-global” capability. It is not unusual for scientists to report achieving the ultimate goal of partitioning object nodes into optimally homogeneous clusters in a multi-scale fashion with the HC technique.
Are all these achievements attributed to the HC algorithm “too good to be true”? Indeed, after decades of wide use in many scientific areas, confusing questions and doubts about the validity of HC methods have been raised. Despite these concerns, neither a satisfactory justification nor a sustained repudiation of the HC algorithm has been reported in the literature. Nowadays, a practitioner faced with an incoherent hierarchical clustering tree is more likely to place doubts on his or her own choice of empirical relational measure for the data than on the HC algorithm itself.
Let us start with a review of hierarchical clustering, as it is the method of choice for partitioning data into subsets that share similarities. Starting with an empirical distance or similarity measure, HC proceeds by first merging the two most similar data points. All subsequent steps require a distance between groups of data points. This problem was solved elegantly by Lance and Williams, who proposed a recurrence formula to compute the updated inter-cluster distance values that result from the mergers occurring at each level of the procedure. The recurrence formula gives the distance d(k, i∪j) between a data point or cluster k and the cluster i∪j formed by merging clusters i and j, as a function of the empirical distances d(k,i), d(k,j) and d(i,j):

d(k, i∪j) = α_i d(k,i) + α_j d(k,j) + β d(i,j) + γ |d(k,i) − d(k,j)|,

where α_i, α_j, β and γ are parameters which define the linkage process.
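The recurrence is straightforward to implement. The following sketch (plain Python; the coefficient values shown are the standard ones for single and complete linkage) illustrates how the familiar linkage rules emerge as special cases:

```python
def lance_williams(d_ki, d_kj, d_ij, a_i, a_j, b, g):
    """Lance-Williams update: distance between cluster k and the cluster
    formed by merging clusters i and j, from the pre-merge distances."""
    return a_i * d_ki + a_j * d_kj + b * d_ij + g * abs(d_ki - d_kj)

# Standard parameter choices recover the usual linkage rules:
#   single linkage:   a_i = a_j = 1/2, b = 0, g = -1/2  ->  min(d_ki, d_kj)
#   complete linkage: a_i = a_j = 1/2, b = 0, g = +1/2  ->  max(d_ki, d_kj)
print(lance_williams(3.0, 5.0, 2.0, 0.5, 0.5, 0.0, -0.5))  # 3.0 (single)
print(lance_williams(3.0, 5.0, 2.0, 0.5, 0.5, 0.0, +0.5))  # 5.0 (complete)
```

Other classical schemes (average, centroid, median, Ward) correspond to other settings of the four parameters.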
An interesting property of this recurrence relation is that it usually induces a monotonic hierarchy (i.e. the values in the distance matrix increase monotonically during the agglomerative hierarchical clustering), with the exception of the centroid and median linkage methods.
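A minimal numeric illustration of such an inversion under centroid linkage (the three equidistant points are an illustrative choice; which tied pair merges first does not matter by symmetry):

```python
import math

# Three points at the corners of an equilateral triangle with side 2.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3.0))

# Centroid linkage first merges a and b (all pairs are tied at distance 2),
# then measures the distance from their centroid to the remaining point.
first_merge = math.dist(a, b)                    # 2.0
centroid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
second_merge = math.dist(centroid, c)            # sqrt(3), roughly 1.732

# The second merge happens at a SMALLER height than the first:
print(second_merge < first_merge)                # True: the hierarchy is not monotonic
```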
Johnson showed that an algorithm that produces a monotonic hierarchy also induces a distance metric known as an ultrametric, i.e. one that satisfies

d(i,k) ≤ max(d(i,j), d(j,k))

for all triplets (i,j,k), where i, j and k refer to any subsets of the data points. This inequality is clearly stronger than the triangular inequality of a general metric; it has been argued that it should be preserved in order to capture the true structure of the data set.
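For concreteness, the strong triangular inequality can be checked directly on a distance matrix; a small sketch (the two example matrices are illustrative):

```python
from itertools import permutations

def is_ultrametric(d):
    """Check d[i][k] <= max(d[i][j], d[j][k]) for all triplets of indices."""
    n = len(d)
    return all(d[i][k] <= max(d[i][j], d[j][k])
               for i, j, k in permutations(range(n), 3))

# Cophenetic distances read off a dendrogram are ultrametric by construction:
print(is_ultrametric([[0, 2, 5], [2, 0, 5], [5, 5, 0]]))      # True
# A generic metric need not satisfy the stronger inequality:
print(is_ultrametric([[0, 1, 3], [1, 0, 2.5], [3, 2.5, 0]]))  # False
```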
While most hierarchical clustering algorithms are designed to preserve an ultrametric, they are unfortunately very sensitive to the quality of the empirical distance measure used to compare individual data points. If this empirical distance satisfies the ultrametric inequality, also called the strong triangular inequality, HC is expected to perform well. However, it is doubtful that real-life data sets and distance measures satisfy the ultrametric property exactly. Even if a margin of error is allowed for each comparison, it has been shown that ultrametric hierarchical clustering techniques are not robust with respect to the actual underlying cluster structure in the presence of noise in the empirical distance measure.
Noise, however, is not the only inherent problem of hierarchical clustering. The clustering structure obtained with HC is usually very complex, with many levels. Different choices of the ultrametric, such as complete linkage (i.e. pairwise maximum) or single linkage (i.e. pairwise minimum), often result in different hierarchies. As such, the ultrametric embedded in HC poorly reflects the geometry of the data cloud. Note that this ultrametric is imposed by the method, not derived from the data. The DCG-tree procedure described in this paper is designed to alleviate this difficulty by letting the empirical distance measure and the data define the ultrametric.
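This divergence between linkage choices is easy to reproduce. The following naive agglomerative sketch (pure Python; the 1-D points are an illustrative data set) builds the merge sequence under single and complete linkage and yields different tree topologies:

```python
def agglomerate(points, linkage):
    """Naive agglomerative clustering of 1-D points.
    `linkage` collapses the pairwise distances between two clusters into a
    single value: min for single linkage, max for complete linkage."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append(sorted(merged))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges

pts = [0.0, 1.0, 2.1, 3.3, 4.6]
print(agglomerate(pts, min))  # single linkage chains: {0,1}, then {0,1,2.1}, ...
print(agglomerate(pts, max))  # complete linkage: {0,1}, then {2.1,3.3}, ...
```

On these points, single linkage grows one chain from left to right, while complete linkage splits the data into the two compact groups {0, 1} and {2.1, 3.3, 4.6}: same data, same empirical distance, two different hierarchies.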
The main argument we make in this paper is that a good partitioning of data into clusters can only be achieved if we have a good understanding of the data geometry and topology. Many clustering techniques have been developed to reach this understanding. Most of these techniques can be formulated as a discrete optimization problem, in which case they involve two distinct steps, namely (i) the definition of some suitable cost function, and (ii) the computation of a partitioning of the data which minimizes this cost function. The number of potentially suitable cost functions for clustering is arbitrarily large; in fact, clustering techniques can be classified based on the similarity of their cost functions. Once the cost function is defined, in principle any optimization technique can be used to solve for the optimal partitioning of the data. In practice, however, exhaustive approaches are deemed intractable because of the dimensionality of the problems at hand. Many heuristic techniques have therefore been developed (for a review, see Puzicha, Hofmann and Buhmann). Among those, it is worth mentioning simulated annealing techniques based on Gibbs sampling, deterministic annealing, and mean field annealing. These three types of methods have in common that they rely on a “temperature” parameter. This parameter can be adjusted during the simulation to improve convergence: in the simulated annealing protocol, for example, the temperature is gradually lowered, mimicking the annealing process in metallurgy. It also provides the algorithm with the possibility to monitor phase transitions (i.e. cluster splits) in order to obtain a meaningful tree topology (see for example Rose).
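To make the temperature mechanism concrete, here is a hedged sketch of simulated annealing applied to a toy clustering cost, the within-cluster sum of squared deviations for 1-D points. The data, the cooling schedule and all parameter values are illustrative choices, not those of any method cited above:

```python
import math
import random

def sa_cluster(points, k=2, t0=2.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing on a toy cost: within-cluster sum of squared
    deviations. A random reassignment is always accepted when it lowers the
    cost, and accepted with Boltzmann probability exp(-delta/T) when it
    raises it; the temperature T is lowered geometrically."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]

    def cost(lab):
        total = 0.0
        for c in range(k):
            members = [p for p, l in zip(points, lab) if l == c]
            if members:
                mu = sum(members) / len(members)
                total += sum((p - mu) ** 2 for p in members)
        return total

    t, current = t0, cost(labels)
    for _ in range(steps):
        trial = labels[:]
        trial[rng.randrange(len(points))] = rng.randrange(k)
        delta = cost(trial) - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            labels, current = trial, current + delta
        t = max(t * cooling, 1e-9)   # geometric cooling, as in metallurgy
    return labels, current

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels, final_cost = sa_cluster(pts)
print(labels, final_cost)  # the two well-separated groups get distinct labels
```

Early on, the high temperature lets the walk accept cost-increasing moves and escape poor configurations; as the temperature drops, the dynamics become greedy and settle into the low-cost partition.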
Transforming the clustering problem into an optimization problem is, however, not a necessity. We have recently proposed an alternative approach inspired by statistical physics, in the same spirit as the deterministic annealing and mean field annealing methods mentioned above, which makes use of a temperature parameter to monitor transitions but does not explicitly consider a cost function. The main idea of this method is to embed the data geometry into a ferromagnetic potential landscape; its implementation is then based on two key observations. Firstly, the empirical distance measure imposes a weighted graph onto the collection of data points (renamed “nodes” in this context). By equating the weight on an edge with a ferromagnetic potential, this weighted graph is seen as equivalent to a potential landscape, typically characterized by many wells of various depths. Secondly, it is possible to explore this landscape, and therefore define its geometry, by using the popular dynamic Monte Carlo approach. A random walk as a function of “time” will identify the many wells of the potential, as well as the probability of jumping from one well to another. An additional advantage of using dynamic Monte Carlo is that its temperature parameter provides an additional dimension along which to explore the geometry of the landscape. At a high temperature, a Markovian walk on the energy landscape will transition from any node to most of the other nodes with more or less equal probabilities. At a low temperature, however, the Markov chain tends to get trapped in potential wells for various periods of time, depending on the size of the well, before it can escape. These two observations led to the following two-device algorithm, named Data Cloud Geometry (DCG), for deriving the underlying multi-scale geometry of a data cloud. At a given temperature, a regulated random walk on the equivalent ferromagnetic landscape as a function of “time” detects the number of clusters and the corresponding cluster membership of individual data points. By repeating this procedure at different temperatures, the algorithm derives the geometric hierarchy of the data cloud. DCG is similar in spirit to the granular model, which achieves clustering via a sequence of phase transitions on a paramagnetic potential landscape.
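The effect of temperature on such a walk can be illustrated with a minimal sketch. The toy graph and the Boltzmann form of the transition probabilities below are illustrative assumptions, not the exact DCG transition rule:

```python
import math

def transition_probs(distances, t):
    """Transition probabilities out of one node: an edge whose weight
    (potential) is d is taken with probability proportional to exp(-d/t)."""
    w = [math.exp(-d / t) for d in distances]
    z = sum(w)
    return [x / z for x in w]

# From node 0: nodes 1 and 2 share its well (distance 1), while node 3
# lies behind a barrier (distance 5).
row = [1.0, 1.0, 5.0]
hot = transition_probs(row, t=10.0)   # high temperature: nearly uniform
cold = transition_probs(row, t=0.5)   # low temperature: trapped in the well
print([round(p, 3) for p in hot])
print([round(p, 3) for p in cold])
```

At high temperature the walk crosses the barrier almost as readily as it moves inside the well; at low temperature the barrier crossing becomes exponentially rare, which is what lets the walk reveal the wells, i.e. the clusters.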
The DCG implementation, however, is simpler and computationally more efficient. It has been applied to analyze fMRI data, as well as to study binary networks.
The DCG procedure originally proposed by Fushing and McAssey is designed to extract unknown geometric information from a data cloud. In this paper we extend this concept and propose to summarize the information collected by DCG in the form of an ultrametric topological space, which is equivalent to a hierarchical tree, the DCG-tree, and which can also be represented by a Parisi matrix. We validate this approach on simulated and real data from different fields of application, comparing the resulting DCG-trees with the corresponding HC-trees. We use these results to illustrate some of the key features of the method, including its robustness with respect to measurement errors, its ability to handle non-convex data, and its self-correcting mechanisms. We discuss these results in comparison with similar results obtained with hierarchical clustering. We conclude with a discussion of further developments.