PATRISTICv1.0 is a Java program that can be used as an applet on our website or downloaded. It calculates patristic distances from trees, generates scatter plots from ordered pairs of distances, and calculates correlation coefficients and other statistics from distance matrices. It reads trees in variants of the Newick format, including the NEXUS variant used by the package PAUP [6] and the variants used by the programs MEGA [6], PHYLIP [8], CLUSTALX [9] and TREEPUZZLE [10]. The patristic distances are calculated by an algorithm that traverses the various textual representations of trees [11]. Additional code allows different tree-text formats to be read, allows matrices to be selected easily for plotting from a large number of stored matrices, and displays matrices and plots. PATRISTICv1.0 runs on Windows, Mac and Linux systems with the Java Runtime Environment (JRE). A patristic distance matrix from a tree of 187 gene sequences was calculated in 12 seconds on a PC with an AMD CPU at 2.2 GHz and 256 RAM using JRE 1.5. PATRISTICv1.0 was tested by calculating patristic distances by hand across several small trees, and in every case the results of the program agreed with the hand calculations.
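To make the notion of a patristic distance concrete, the following is a minimal Python sketch that sums branch lengths along the path joining two leaves of a Newick tree. It is not the algorithm of [11] or the PATRISTIC source code; the function names `parse_newick` and `patristic_distances` are hypothetical, and the parser assumes a plain Newick string with unquoted labels and no comments or NEXUS wrapper.

```python
import re
from itertools import combinations

# Tokens are either single punctuation marks or runs of other characters.
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")
PUNCT = set("(),:;")


def parse_newick(newick):
    """Parse a plain Newick string into nested dicts (illustrative only)."""
    tokens = TOKEN.findall(newick.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        label, length = None, 0.0
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            label = tokens[pos]           # leaf name or internal label
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return {"label": label, "length": length, "children": children}

    return node()


def patristic_distances(newick):
    """Return {(leaf_a, leaf_b): summed branch length between them}."""
    root = parse_newick(newick)
    paths = {}  # leaf label -> [(node id, distance from root), ...]

    def walk(n, path, dist):
        dist += n["length"]
        path = path + [(id(n), dist)]
        if not n["children"]:
            paths[n["label"]] = path
        for child in n["children"]:
            walk(child, path, dist)

    walk(root, [], 0.0)

    result = {}
    for a, b in combinations(sorted(paths), 2):
        shared = 0.0  # root-to-ancestor distance of the deepest shared node
        for (node_a, d_a), (node_b, _) in zip(paths[a], paths[b]):
            if node_a != node_b:
                break
            shared = d_a
        result[(a, b)] = paths[a][-1][1] + paths[b][-1][1] - 2 * shared
    return result
```

For the tree `((A:1,B:2):3,C:4);` this gives 3.0 for A and B, 8.0 for A and C, and 9.0 for B and C. The sketch is recursive, so very deep trees could exceed Python's default recursion limit.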
The program also recognises distance matrices calculated by other programs from other components of sequence data, such as evolutionary distances calculated from pair-wise sequence comparisons. It reads distance matrices generated by the programs MEGA, PAUP and PHYLIP. In the current version, these externally generated matrices must be presented as upper-right or lower-left hemi-matrices or as a single column. Other measures that can be converted into distances, such as the isolation dates of virus samples, may be entered for each species or gene as real numbers. If sample times are entered directly, PATRISTICv1.0 will generate a matrix of time differences between the species or sequences.
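The two conversions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's code: the names `time_difference_matrix` and `full_from_lower` are hypothetical, time differences are assumed to be absolute values, and the lower-left hemi-matrix is assumed to be supplied without its diagonal. The actual file layouts written by MEGA, PAUP and PHYLIP are not reproduced here.

```python
def time_difference_matrix(sample_times):
    """Build a symmetric matrix of absolute time differences.

    sample_times maps each label to a real number (e.g. an isolation year).
    """
    labels = list(sample_times)
    matrix = [
        [abs(sample_times[a] - sample_times[b]) for b in labels]
        for a in labels
    ]
    return labels, matrix


def full_from_lower(lower):
    """Expand a lower-left hemi-matrix (no diagonal) into a full matrix.

    lower[0] holds one value, lower[1] two values, and so on.
    """
    n = len(lower) + 1
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = float(value)
    return full
```

For example, `time_difference_matrix({"v1": 1990.0, "v2": 1995.5})` returns the labels `["v1", "v2"]` and the matrix `[[0.0, 5.5], [5.5, 0.0]]`.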
The order of sequences or species represented in a tree almost always differs from the order in the original data file from which the tree was inferred. Hence, to plot patristic distances against distances calculated by other methods, or against patristic distances from a second tree, the program automatically reorders matrices with matching sequence or species labels. Matrices may also be edited and reordered within an editing window.
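Reordering by label amounts to permuting the rows and columns of one matrix so that they follow the label order of the other. A minimal Python sketch is given below; the function name `reorder_matrix` is hypothetical, and the sketch assumes that both matrices carry exactly the same set of unique labels.

```python
def reorder_matrix(labels, matrix, target_labels):
    """Permute rows and columns of `matrix` to follow `target_labels`."""
    if sorted(labels) != sorted(target_labels):
        raise ValueError("the two label sets do not match")
    index = {name: i for i, name in enumerate(labels)}
    order = [index[name] for name in target_labels]
    # Apply the same permutation to rows and to columns.
    return [[matrix[i][j] for j in order] for i in order]
```

Once both matrices share one label order, the entry at row i and column j of each matrix refers to the same pair of sequences, so the two values can be plotted as an ordered pair.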
A regression is calculated from the ordered pairs of distances when two matrices are plotted against each other (Figure 1), and simple statistics such as the sums of the distances are displayed. Correlation coefficients, differences and quotients between the ordered pairs of distances may also be calculated using PATRISTICv1.0, as may the mean and standard deviation of the differences or quotients. The program also has a facility for entering other formulae so that other statistics can be calculated.
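The statistics listed above can be illustrated with a short Python sketch. The text does not state which regression or correlation coefficient the program uses, so the sketch assumes ordinary least squares, Pearson's r and the sample standard deviation; the function names `paired_distances` and `summarize_pairs` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_distances(m1, m2):
    """Pair the upper-triangle entries of two equally ordered matrices."""
    n = len(m1)
    return [(m1[i][j], m2[i][j]) for i in range(n) for j in range(i + 1, n)]


def summarize_pairs(pairs):
    """Regression, correlation, differences and quotients of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx                      # least-squares slope of y on x
    diffs = [y - x for x, y in pairs]
    quots = [y / x for x, y in pairs if x != 0]  # skip zero denominators
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "r": sxy / sqrt(sxx * syy),        # Pearson correlation coefficient
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "mean_diff": mean(diffs),
        "sd_diff": stdev(diffs),
        "mean_quot": mean(quots),
        "sd_quot": stdev(quots),
    }
```

The sketch needs at least three pairs with some variation in both matrices; constant distances would make the slope or the correlation undefined.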
Figure 1 A screenshot of PATRISTICv1.0 in operation with a plot of patristic distances calculated from two different regions of bunyavirus segment S sequences. The green points represent paired distances that are more than two standard deviations from the mean
Points on a plot that lie outside a chosen multiple of the standard deviation are identified by colour. Points are also automatically identified on a plot when the mouse cursor is moved over them. A zoom feature allows the user to focus on a specific part of a plot by choosing the minimum and maximum distances for the two axes, which correspond to the two distance matrices. Plots of distance matrices may be inverted relative to the axes by a single mouse click. The user can also set the scale used on the axes and the dot size.
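The colour-coding rule can be sketched as a simple threshold test. The text does not specify which quantity the standard deviation is taken over, so this Python sketch assumes it is the difference between the two paired distances; the function name `flag_outliers` and the default multiple `k=2.0` are likewise illustrative.

```python
from statistics import mean, stdev


def flag_outliers(pairs, k=2.0):
    """Mark pairs whose difference lies more than k SDs from the mean.

    Assumes the test is applied to y - x for each (x, y) pair.
    """
    diffs = [y - x for x, y in pairs]
    centre, spread = mean(diffs), stdev(diffs)
    return [abs(d - centre) > k * spread for d in diffs]
```

A plotting routine could then draw the flagged points in a second colour. At least two pairs are required, because the sample standard deviation is undefined for a single value.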
Plots may be saved as PostScript or JPEG files, allowing them to be edited in graphics software in either vector or bitmap format. The program also allows distance matrices to be saved in comma-separated value (CSV) format, as a full matrix or as columns, so that they can be imported into a spreadsheet program. Matrices can also be saved in the DIP format used by the software DIPLOMO [12