Generalized vertex adjacency
Let
S be a gene network with a gene set
V = {1,...,
n}. Two genes
g and
h are
i-adjacent, and the pair (
g, h) is an
i-adjacency, in the gene network
S, written in

in
S, if there are
i - 1 genes between them in
S along a shortest path from one gene to the other. We define genes
g and
h to be (
i, j)
-adjacent, and the pair (
g, h) is called an (
i, j)
-adjacency, in two gene networks
S and
T, if they are
i-adjacent in either one of the gene networks and
j-adjacent in the other. We say
g is an
i-adjacent neighbor of the gene
h in a gene network
S, if
g and
h are
i-adjacent in
S.
We denote

the set of all
i adjacencies in a network
M , where 1 ≤
i ≤ Θ For two networks
S and
T with the same vertex set
V = {1,...,
n}, we define a subset of
C
V to be a (
θ, ψ)
generalized adjacency cluster, or (
θ, ψ)
cluster, if all vertices in the subset
C are also the whole vertices of a connected component of the graph

To obtain (
θ, ψ) clusters of two gene networks,
S and
T, the new network

need to be created first. The network

can be constructed by connecting two genes of gene networks
S and
T if they are
i-adjacent in
S and
j-adjacent in
T, where max(
i, j) ≤ max (
θ, ψ) and min(
i, j) ≤ min (
θ, ψ). Figure illustrates how the grid networks
S and
T determine the (1, 2) clusters {2,3,4,5,7,9,10,12,13,14,15,19,20},{11,17,18,22} and {16,21,23,24}. Figures and depict the same process for triangular graphs and hexagonal graphs, respectively.
Weight function
The definition of generalized adjacency cluster in the previous section does not discriminate among pairs of (i, j)-adjacent genes as long as i and j are less than some cut-off values. However, it seems reasonable to think that (i, j) with smaller i and j should be weighted more heavily in defining clusters. To explore this, consider two networks S and T with the same vertices. Let wij be the weight on two vertices that are (i, j)-adjacent, i.e., i-adjacent in one of the networks and j-adjacent in the other, such that
1. 0 ≤
ωij =
ωji, i, j ![[set membership]](/corehtml/pmc/pmcents/x2208.gif)
{1, 2,...,
n-1}
2.

3. ωi, j ≥ ωk, l if
(a) max(i, j) <max(k, l) or
(b) max(i, j) = max(k, l) and min(i, j) <min(k, l)
This is a very general class of weights with reasonable monotonicity and total weight conditions. We define the dissimilarity between two gene networks S and T as
where
P is the number of pairs (
x, y) that are (1, 1)-adjacent in two identical gene networks.
nij is the total number of pairs (
x, y) that are
i-adjacent in
S and
j-adjacent in
T.
l is the diameter of the network. We have argued elsewhere [
4] that the "natural" way of finding weights is to minimize
d and we proved the following surprising
Theorem 1.
Let
The weight ω that minimizes d(
S, T)
has where k* is an integer and maximizes the function
where nij is the number of gene pairs i-adjacent on S and j-adjacent on T.
This suggests that uniform weights are appropriate for all (
i, j) adjacencies up to a certain cutoff. Empirical work indicates that
k* is of the order of

where
n is the number of vertices in the network and so the cutoff would be for
i and
j to be less than some value

E.g., for a network with 100 vertices, it should suffice to consider 2- and 3-adjacencies, but 4-adjacencies need not be considered.
The expected number of (i, j) adjacencies in two random networks
An essential step in studying gene clusters is to verify their significance. Random networks are often used to estimate the significance of clusters. In this section, we represent some characteristics of the expected number of (i, j) adjacencies in two random networks, which can then be used in evaluating cluster significance.
Theorem 2. Let M be a randomly labelled square grid network with N vertices. Then the number of i adjacencies, ni, in the network M converges in distribution to the Poisson with parameter
the expected number of i adjacencies in the network M.
Proof. Because M is a random square grid network, we can use a coordinate system to represent it. Vertices in the network correspond to the points in the plane with integer coordinates, x-coordinates being in the range 1,..., m, y-coordinates being in the range 1,..., n, where N = mn. Without loss of generality, we set m ≤ n. Two vertices in the network are i-adjacent if the L1 distance between them in the integer coordinates is i.
Let

be 1 if vertices
u, v are
i-adjacent in the network
M and 0 for otherwise. Then
ni =

Since most vertices have 4
i i-adjacent neighbors, we can show that
where the error term is due to edge effects [
9]. Since
N =
mn,
where the error term includes the edge effects detailed in equation(5). Then
Therefore, based on the proof of Theorem 2 in [
10], we can conclude that
ni converges in distribution to the Poisson with parameter
E(
ni) the expected number of
i adjacencies in the network
M.
Theorem 3. For S and T two random square grid networks with the same N vertices, the number of pairs of vertices nij that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter
the expected number of (i, j) adjacencies in networks S and T.
Proof. Let

be 1 if vertices
g, h are
i-adjacent in the random square grid network
S and 0 otherwise. Similarly, define

to be 1 if vertices
g, h are
j-adjacent in the random square grid network
T and 0 otherwise. Let

be 1 if vertices
g, h are
i-adjacent in
S and
j-adjacent in
T. Otherwise

Because of the independence of
g, h being
i-adjacent in
S and
j-adjacent in
T, the probability that
g and
h are (
i, j)-adjacent in
S and
T is
So the expected number of (i, j)-adjacencies in the two networks S and T is
The term

in equation (10) represents the total number of (
g, h) combinations in two networks
S and
T based on pairs of location of (
g, h) in
S and
T. There are

pairs of location possible for (
g, h) in each of two networks and 2 alternatives for each gene pair (
g, h) in
S and
T. So

Hence, the expected number of (
i, j)-adjacencies in the two networks
S and
T is
Therefore, based on the proof of Theorem 2 in [
10], we can conclude that
nij converges in distribution to the Poisson with parameter
E(
nij), the expected number of (
i, j) adjacencies in networks
S and
T.
More generally we can use the same techniques to prove Theorems 4 and 5:
Theorem 4. Let D be the degree of a gene in the random genetic grid network, i.e. the number of 1-adjacent neighbors of this gene in the network. For two random genetic lattice networks S and T with same genes, the number of pairs of genes nij that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter
Even for networks as small as 400, simulations indicate that the distribution of nij is close to the Poisson in Theorem 4, for square (D = 4), hexagonal (D = 3), triangular (D = 6) grids as well as linear networks (D = 2). Looking beyond regular networks:
Theorem 5. Let Dk(M) be the number of k-adjacent gene pairs in the random network M. For two random networks S and T with same vertices, the number of pairs of vertices nij that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter