There are three different ways of accessing the clan information. First, there is an additional release flatfile, Pfam-C, which contains all of the clan information and, for each clan, a list of the Pfam families that are its members. Second, all of the information is contained in the Pfam MySQL database that we make available for download. Third, clan information can be accessed via the websites. There are two web entry points to the clan information. A user can ‘browse by’ a list of clans or follow links from clan member families. For each clan, we display annotation and a list of Pfam families that constitute the clan. In addition, there are links to two additional features: a clan relationship diagram and a clan alignment.
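As an illustration of the flatfile route, the following minimal sketch extracts clan membership from Pfam-C-style text. It assumes a Stockholm-style layout in which `#=GF AC` carries the clan accession, each `#=GF MB` line carries one member family accession, and `//` ends an entry; these tags and the accessions in the sample are assumptions for illustration and should be checked against the release file.

```python
from collections import defaultdict


def parse_clan_members(lines):
    """Map each clan accession to the list of its member family accessions.

    Assumed layout (not confirmed by the text): '#=GF AC' holds the clan
    accession, '#=GF MB' holds one member per line, '//' ends an entry.
    """
    clans = defaultdict(list)
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#=GF AC"):
            # Drop any version suffix such as '.1' from the accession.
            current = line.split()[2].split(".")[0]
        elif line.startswith("#=GF MB") and current is not None:
            clans[current].append(line.split()[2].rstrip(";"))
        elif line.startswith("//"):
            current = None
    return dict(clans)


# Hypothetical entry; the accessions are made up for the example.
sample = """# STOCKHOLM 1.0
#=GF ID   ExampleClan
#=GF AC   CL9999.1
#=GF MB   PF99991;
#=GF MB   PF99992;
//
""".splitlines()

members = parse_clan_members(sample)
```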
Summary of the new website features and web services, including server location
The clan relationship diagrams show how the individual families are related to each other. To produce these diagrams, we perform an all-against-all profile–profile comparison between the clan members. In the relationship diagram, Pfam families are graph nodes. An edge, drawn as a solid line, is added between two nodes when a significant profile–profile score is observed between them. After all edges have been added in this way, any nodes (domains) that have no connecting edges are identified. Where possible, each detached node is connected by adding an edge between it and the node in the clan with which it has the best profile–profile score whose E-value falls above the 0.001 threshold (i.e. E-values 0.001–10). A dashed line represents these edges in the final graph. Domains that have been brought into the clan based on a structural similarity may remain detached, indicating that profile–profile comparisons are not able to detect all distant relationships. The E-values used to construct the edges are displayed. These E-values are also clickable links to a visualization of the profile–profile alignment (10).
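The edge-building procedure described above can be sketched as follows. This is a minimal illustration, not the production code: the function name, the data layout (a dictionary keyed by unordered family pairs) and the treatment of values lying exactly on the 0.001 and 10 boundaries are all assumptions.

```python
def build_clan_graph(families, evalues, significant=0.001, rescue_max=10.0):
    """Return (solid_edges, dashed_edges, detached_nodes).

    evalues maps frozenset({a, b}) -> profile-profile E-value.
    Boundary handling (<= vs <) is an assumption of this sketch.
    """
    # Step 1: solid edges for every significant pairwise comparison.
    solid = {pair for pair, e in evalues.items() if e <= significant}
    connected = {fam for pair in solid for fam in pair}

    dashed = set()
    detached = []
    # Step 2: for each node without a solid edge, try to add one dashed
    # edge to its best match with an E-value between the two thresholds.
    for fam in families:
        if fam in connected:
            continue
        candidates = [
            (e, pair)
            for pair, e in evalues.items()
            if fam in pair and significant < e <= rescue_max
        ]
        if candidates:
            _, best_pair = min(candidates, key=lambda c: c[0])
            dashed.add(best_pair)
        else:
            # No match in range: the node stays detached.
            detached.append(fam)
    return solid, dashed, detached


# Hypothetical families and E-values, for illustration only.
fams = ["FamA", "FamB", "FamC", "FamD"]
scores = {
    frozenset(("FamA", "FamB")): 1e-5,
    frozenset(("FamA", "FamC")): 0.5,
    frozenset(("FamB", "FamC")): 2.0,
    frozenset(("FamA", "FamD")): 50.0,
}
solid, dashed, detached = build_clan_graph(fams, scores)
```

In this made-up example, FamA and FamB are joined by a solid edge, FamC is attached to FamA by a dashed edge because 0.5 is its lowest E-value in range, and FamD remains detached because its only comparison exceeds the upper limit.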
The clan alignment is an alignment of all of the seed alignments in a clan. These are produced by an option in MUSCLE (11) that aligns two input multiple sequence alignments without altering the alignment within either input. Where more than two seed alignments are being aligned, we use the profile–profile comparison scores to guide the progressive alignment procedure so that the most similar seed alignments are aligned first, before more divergent alignments. These alignments push the boundaries of what is feasible to align by sequence alone, so they must be treated with some caution.
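One way to derive such a guide order is sketched below. The text does not specify the exact procedure, so the greedy scheme here is an assumption: it starts from the most similar pair of seed alignments and then repeatedly adds the remaining alignment whose best E-value against any already merged member is lowest. The function name and input layout are hypothetical, and the alignment step itself is not shown.

```python
def merge_order(names, evalues):
    """Return seed alignment names in the order they would be merged.

    evalues maps frozenset({a, b}) -> profile-profile E-value; pairs that
    are missing are treated as infinitely distant (an assumption).
    """
    def e(a, b):
        return evalues.get(frozenset((a, b)), float("inf"))

    names = list(names)
    if len(names) < 2:
        return names

    # Seed the alignment with the most similar pair (lowest E-value).
    _, a, b = min(
        (e(x, y), x, y) for i, x in enumerate(names) for y in names[i + 1:]
    )
    order = [a, b]
    remaining = set(names) - {a, b}

    # Add the closest remaining alignment at each step; ties are broken
    # by name so that the result is deterministic.
    while remaining:
        best = min(remaining, key=lambda r: (min(e(r, m) for m in order), r))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical seed alignments and E-values, for illustration only.
pair_scores = {
    frozenset(("S1", "S2")): 1e-10,
    frozenset(("S2", "S3")): 1e-4,
    frozenset(("S1", "S3")): 1e-2,
    frozenset(("S3", "S4")): 0.5,
}
order = merge_order(["S4", "S3", "S1", "S2"], pair_scores)
```

Each successive name in the returned list would be aligned to the growing clan alignment with the profile-to-profile alignment step, so the most divergent seed alignments are added last.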