Biological projects and experiments are becoming more complex and larger in scope, and the data they produce are orders of magnitude larger than in the past. The increasing use of high-throughput technologies multiplies the amount of data generated per experiment and rapidly increases the size of databases. In the analysis step, data visualization is already a major bottleneck. The sheer amount of data and their heterogeneity pose a challenge for efficient visualization tools. The main goal of visualization tools should be an intuitive representation of data that supports efficient interpretation and allows hypothesis-driven planning of the next experiment.
To scale to big datasets, most layout algorithms follow a heuristic approach instead of an exhaustive one. Despite the wealth of existing algorithms, the layout problem remains one of the crucial bottlenecks in network visualization. Faster and more efficient algorithms are needed, especially to bring large-scale networks into a form that can be easily understood and interpreted by the human brain. One way to address the problem is to parallelize layout, clustering and graph theory algorithms so that they can handle large networks. One solution could be the implementation of web services or libraries that outsource most of the computational effort to remote, powerful machines that can run many parallel jobs; this would greatly speed up visualization and reduce the computational load on local computers. Another solution would be to write layout algorithms in such a way that they can take advantage of multi-core CPU or GPU technologies.
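As an illustration of how a layout computation can be split into independent jobs, the following minimal Python sketch divides the repulsive-force step of a force-directed layout into chunks of nodes. The force law is the Fruchterman-Reingold style repulsion k²/d; the function names, the chunking scheme and the `map_fn` parameter are assumptions made for this example and are not taken from any of the tools discussed here.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def repulsion_for_chunk(task):
    """Compute the repulsive displacement of each node in one chunk.

    Every chunk only reads the shared positions, so chunks are independent
    and can be evaluated by separate workers.
    """
    chunk, positions, k = task
    result = {}
    for v in chunk:
        xv, yv = positions[v]
        fx = fy = 0.0
        for u, (xu, yu) in positions.items():
            if u == v:
                continue
            dx, dy = xv - xu, yv - yu
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            force = k * k / dist  # Fruchterman-Reingold style repulsion
            fx += dx / dist * force
            fy += dy / dist * force
        result[v] = (fx, fy)
    return result


def repulsive_forces(positions, k=1.0, n_chunks=4, map_fn=map):
    """Split the nodes into chunks and merge the per-chunk results.

    map_fn is hypothetical glue: the built-in map runs serially, while the
    map method of an executor distributes the chunks over workers.
    """
    nodes = list(positions)
    chunks = [nodes[i::n_chunks] for i in range(n_chunks)]
    tasks = [(chunk, positions, k) for chunk in chunks if chunk]
    forces = {}
    for partial in map_fn(repulsion_for_chunk, tasks):
        forces.update(partial)
    return forces


if __name__ == "__main__":
    pos = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0)}
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(repulsive_forces(pos, n_chunks=2, map_fn=pool.map))
```

In standard CPython, threads do not run this pure-Python arithmetic on several cores at once; passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` instead is one way to use multiple cores, and the same chunked structure could in principle be dispatched to remote machines.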
In addition, the extension of layout algorithms to encompass a third dimension would be a central step towards a new generation of visualization tools. This becomes more important in cases such as the visualization of pathways or heterogeneous data sets. The extra dimension would allow a clearer structure and less cluttered views and could greatly facilitate navigation within the network. The extension of layout algorithms into three dimensions could thus render the representation of large-scale networks much more efficient, because 3D space reduces the chance of crossings between edges.
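To make the idea of a three-dimensional layout concrete, the sketch below runs a simple spring-embedder with Fruchterman-Reingold style forces (repulsion k²/d between all node pairs, attraction d²/k along edges) on coordinates with three components. It is a minimal illustration under stated assumptions, not the method of any particular tool: the function name, the default parameters and the cooling factor of 0.95 are all chosen for this example.

```python
import math
import random


def spring_layout_3d(nodes, edges, iterations=50, k=1.0, step=0.1, seed=0):
    """Return a dict mapping each node to an (x, y, z) position."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed makes the layout reproducible
    pos = {v: [rng.uniform(-1.0, 1.0) for _ in range(3)] for v in nodes}

    def delta_and_distance(v, u):
        delta = [a - b for a, b in zip(pos[v], pos[u])]
        dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
        return delta, dist

    for _ in range(iterations):
        disp = {v: [0.0, 0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                delta, dist = delta_and_distance(v, u)
                force = k * k / dist
                for axis in range(3):
                    push = delta[axis] / dist * force
                    disp[v][axis] += push
                    disp[u][axis] -= push

        # Attraction along every edge.
        for v, u in edges:
            delta, dist = delta_and_distance(v, u)
            force = dist * dist / k
            for axis in range(3):
                pull = delta[axis] / dist * force
                disp[v][axis] -= pull
                disp[u][axis] += pull

        # Move each node, limiting the displacement to the current step.
        for v in nodes:
            length = math.sqrt(sum(d * d for d in disp[v])) or 1e-9
            scale = min(length, step) / length
            for axis in range(3):
                pos[v][axis] += disp[v][axis] * scale

        step *= 0.95  # cooling: later iterations make smaller moves

    return {v: tuple(p) for v, p in pos.items()}


if __name__ == "__main__":
    layout = spring_layout_3d(["a", "b", "c"], [("a", "b"), ("b", "c")])
    for node, xyz in layout.items():
        print(node, xyz)
```

The only difference from a two-dimensional version is the number of coordinate components, which suggests that many force-directed heuristics can be carried over to 3D with little change; the all-pairs repulsion loop remains quadratic in the number of nodes, so the scaling concerns discussed above still apply.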
The third dimension would also offer an opportunity to fill a crucial gap in network data visualization: the representation of time. Currently, most network tools do not attempt to visualize time series data [42] and thus only produce a static snapshot of all the interactions happening in dynamic systems. Introducing time as an extra dimension in network visualization tools would thus provide a more complete picture of complex and highly dynamic biological systems. Being able to investigate the dynamics of a system could provide breakthroughs in fields such as pathway analysis or the observation of interactions at different cell cycle time points.
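One simple way to represent time in a network is to attach an interval of validity to each edge and to derive one snapshot per time point. The following Python sketch shows this idea; the `TimedEdge` type, the half-open interval convention and the function names are assumptions introduced for this example, and the node labels are placeholders rather than real data.

```python
from typing import NamedTuple


class TimedEdge(NamedTuple):
    """An interaction that is present from start (inclusive) to end (exclusive)."""
    source: str
    target: str
    start: float
    end: float


def snapshot(edges, t):
    """Return the set of (source, target) pairs that are active at time t."""
    return {(e.source, e.target) for e in edges if e.start <= t < e.end}


def frame_changes(edges, time_points):
    """Yield (t, appeared, disappeared) for consecutive time points.

    A viewer could use these differences to animate the network instead of
    redrawing every frame from scratch.
    """
    previous = set()
    for t in time_points:
        current = snapshot(edges, t)
        yield t, current - previous, previous - current
        previous = current


if __name__ == "__main__":
    # Placeholder interactions, not experimental data.
    edges = [TimedEdge("A", "B", 0, 2), TimedEdge("B", "C", 1, 3)]
    for t, appeared, disappeared in frame_changes(edges, [0, 1, 2, 3]):
        print(t, sorted(appeared), sorted(disappeared))
```

Each snapshot could then be laid out as one layer along the third axis or shown as one frame of an animation.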
The rapid growth of data calls for the incorporation of powerful filters into visualization tools. Filters that reduce the noise in a dataset and restrict the user's attention to a core set of nodes of particular interest could greatly improve visualization. Similarly, more efficient and interactive graphical user interfaces (GUIs) would allow the user to visualize and explore relevant subnetworks or limited areas of a whole network without having to sift through vast masses of data. To increase the performance of visualization tools further, efficient handling and allocation of memory is essential. This can be achieved by loading only the necessary parts of the graph into memory. In this way, the amount of data and the taxonomies that can be visualized can be greatly increased. In addition, the performance of Graphics Processing Unit (GPU) hardware increases over time, which allows visualization tools to employ more resource-demanding algorithms, such as those handling advanced graphics calculations.
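A filter of this kind can be as simple as restricting the view to the neighborhood of a few nodes of interest. The sketch below uses a breadth-first search to collect all nodes within a given number of hops of the seed nodes and then builds the induced subgraph. It assumes an undirected network stored as an adjacency dictionary in which every node appears as a key; the function names are hypothetical.

```python
from collections import deque


def neighborhood(adjacency, seeds, max_hops):
    """Return all nodes within max_hops edges of any seed node."""
    hops = {s: 0 for s in seeds if s in adjacency}
    queue = deque(hops)
    while queue:
        v = queue.popleft()
        if hops[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for u in adjacency.get(v, ()):
            if u not in hops:
                hops[u] = hops[v] + 1
                queue.append(u)
    return set(hops)


def induced_subgraph(adjacency, keep):
    """Keep only the nodes in keep and the edges between them."""
    return {
        v: [u for u in neighbors if u in keep]
        for v, neighbors in adjacency.items()
        if v in keep
    }


if __name__ == "__main__":
    # Placeholder path network a - b - c - d.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    core = neighborhood(adj, ["a"], max_hops=2)
    print(induced_subgraph(adj, core))
```

Only the returned subgraph has to be laid out and rendered. If the adjacency lists are fetched on demand from a database rather than held in a dictionary, the same traversal would also limit what is loaded into memory.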
The next generation of visualization tools should aim to reduce the gap between analysis and visualization. Most existing visualization tools only incorporate a limited number of data analysis functionalities, making it necessary to constantly switch between different applications. The user therefore has to be aware of the variety of tools that are suitable for analyzing their data. Information and data sharing between different tools has become a much simpler task due to standard file formats, which should be supported by newly developed visualization tools. Standard formats that are applicable to many different data types will be key to meeting the growing need to integrate heterogeneous data into a network. A true marriage of analysis and visualization, however, cannot be achieved merely by supporting multiple standard file formats. Instead, future visualization applications should directly include several of the analytical functionalities that are available in the presented tools.
Ideally, the next generation of visualization tools should be able to present very heterogeneous data coming from databases, experiments and text-mining applications. They need to be able to visualize multi-edged networks and to incorporate widely used clustering techniques, pattern recognition algorithms and statistical analysis methods. As technology evolves, visualization tools could explore the wider use of autostereoscopic 3D displays, which allow three-dimensional images to be seen without special glasses. A visualization tool designed to integrate most of the aforementioned functionalities would greatly simplify large-scale research in molecular biology and would significantly cut down the time and effort spent on data processing and analysis.
In summary, and to provide some concrete solutions to the challenges facing visualization tools, we suggest the following:
• Visualization tools should be able to load and save data using widely accepted standard file formats.
• Incorporation of appropriate statistical analysis of the networks.
• Algorithms that allow comparative analysis of different networks.
• Implementation of libraries and services that allow layout algorithms to run on remote, powerful computers.
• Efficient layout algorithms that are able to use multi-core CPU technology.
• Algorithms that implement rendering and graphical calculations on the GPU.
• Expansion of layout algorithms into 3D space, especially for the visualization of pathways or heterogeneous data.
• Visualization of the network behavior and its changes over time. Such animations are currently possible using Flash technologies.