Generalized vertex adjacency

Let

*S *be a gene network with a gene set

*V *= {1,...,

*n*}. Two genes

*g *and

*h *are

*i-adjacent*, and the pair (

*g, h*) is an

*i-adjacency*, in the gene network

*S*, written in

in

*S*, if there are

*i - *1 genes between them in

*S *along a shortest path from one gene to the other. We define genes

*g *and

*h *to be (

*i, j*)

*-adjacent*, and the pair (

*g, h*) is called an (

*i, j*)

*-adjacency*, in two gene networks

*S *and

*T*, if they are

*i*-adjacent in either one of the gene networks and

*j*-adjacent in the other. We say

*g *is an

*i-adjacent neighbor *of the gene

*h *in a gene network

*S*, if

*g *and

*h *are

*i*-adjacent in

*S*.

We denote

the set of all

*i *adjacencies in a network

*M *, where 1 ≤

*i *≤ Θ For two networks

*S *and

*T *with the same vertex set

*V *= {1,...,

*n*}, we define a subset of

*C * *V *to be a (

*θ, ψ*)

*generalized adjacency cluster*, or (

*θ, ψ*)

*cluster*, if all vertices in the subset

*C *are also the whole vertices of a connected component of the graph

To obtain (

*θ, ψ*) clusters of two gene networks,

*S *and

*T*, the new network

need to be created first. The network

can be constructed by connecting two genes of gene networks

*S *and

*T *if they are

*i*-adjacent in

*S *and

*j*-adjacent in

*T*, where max(

*i, j*) ≤ max (

*θ, ψ*) and min(

*i, j*) ≤ min (

*θ, ψ*). Figure illustrates how the grid networks

*S *and

*T *determine the (1, 2) clusters {2,3,4,5,7,9,10,12,13,14,15,19,20},{11,17,18,22} and {16,21,23,24}. Figures and depict the same process for triangular graphs and hexagonal graphs, respectively.

Weight function

The definition of generalized adjacency cluster in the previous section does not discriminate among pairs of (*i, j*)-adjacent genes as long as *i *and *j *are less than some cut-off values. However, it seems reasonable to think that (*i, j*) with smaller *i *and *j *should be weighted more heavily in defining clusters. To explore this, consider two networks *S *and *T *with the same vertices. Let *w*_{ij }be the *weight *on two vertices that are (*i, j*)-adjacent, i.e., *i*-adjacent in one of the networks and *j*-adjacent in the other, such that

1. 0 ≤

*ω*_{ij }=

*ω*_{ji}, i, j {1, 2,...,

*n*-1}

2.

3. *ω*_{i, j }≥ *ω*_{k, l }if

(a) max(*i, j*) *<*max(*k, l*) or

(b) max(*i, j*) = max(*k, l*) and min(*i, j*) *<*min(*k, l*)

This is a very general class of weights with reasonable monotonicity and total weight conditions. We define the *dissimilarity *between two gene networks *S *and *T *as

where

*P *is the number of pairs (

*x, y*) that are (1, 1)-adjacent in two identical gene networks.

*n*_{ij }is the total number of pairs (

*x, y*) that are

*i*-adjacent in

*S *and

*j*-adjacent in

*T*.

*l *is the diameter of the network. We have argued elsewhere [

4] that the "natural" way of finding weights is to minimize

*d *and we proved the following surprising

**Theorem 1**.

*Let The weight ω that minimizes d*(

*S, T*)

*has*where k* is an integer and maximizes the function

*where n*_{ij }is the number of gene pairs i-adjacent on S and j-adjacent on T.

This suggests that uniform weights are appropriate for all (

*i, j*) adjacencies up to a certain cutoff. Empirical work indicates that

*k* *is of the order of

where

*n *is the number of vertices in the network and so the cutoff would be for

*i *and

*j *to be less than some value

E.g., for a network with 100 vertices, it should suffice to consider 2- and 3-adjacencies, but 4-adjacencies need not be considered.

The expected number of (*i, j*) adjacencies in two random networks

An essential step in studying gene clusters is to verify their significance. Random networks are often used to estimate the significance of clusters. In this section, we represent some characteristics of the expected number of (*i, j*) adjacencies in two random networks, which can then be used in evaluating cluster significance.

**Theorem 2**. *Let M be a randomly labelled square grid network with N vertices. Then the number of i adjacencies, n*_{i}, in the network M converges in distribution to the Poisson with parameter

*the expected number of i adjacencies in the network M*.

*Proof*. Because *M *is a random square grid network, we can use a coordinate system to represent it. Vertices in the network correspond to the points in the plane with integer coordinates, x-coordinates being in the range 1,..., *m*, y-coordinates being in the range 1,..., *n*, where *N *= *mn*. Without loss of generality, we set *m *≤ *n*. Two vertices in the network are *i*-adjacent if the *L*_{1 }distance between them in the integer coordinates is *i*.

Let

be 1 if vertices

*u, v *are

*i*-adjacent in the network

*M *and 0 for otherwise. Then

*n*_{i }=

Since most vertices have 4

*i i*-adjacent neighbors, we can show that

where the error term is due to edge effects [

9]. Since

*N *=

*mn*,

where the error term includes the edge effects detailed in equation(5). Then

Therefore, based on the proof of Theorem 2 in [

10], we can conclude that

*n*_{i }converges in distribution to the Poisson with parameter

*E*(

*n*_{i}) the expected number of

*i *adjacencies in the network

*M*.

**Theorem 3**. *For S and T two random square grid networks with the same N vertices, the number of pairs of vertices n*_{ij }that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter

*the expected number of *(*i, j*) *adjacencies in networks S and T*.

*Proof*. Let

be 1 if vertices

*g, h *are

*i*-adjacent in the random square grid network

*S *and 0 otherwise. Similarly, define

to be 1 if vertices

*g, h *are

*j*-adjacent in the random square grid network

*T *and 0 otherwise. Let

be 1 if vertices

*g, h *are

*i*-adjacent in

*S *and

*j*-adjacent in

*T*. Otherwise

Because of the independence of

*g, h *being

*i*-adjacent in

*S *and

*j*-adjacent in

*T*, the probability that

*g *and

*h *are (

*i, j*)-adjacent in

*S *and

*T *is

So the expected number of (*i, j*)-adjacencies in the two networks *S *and *T *is

The term

in equation (10) represents the total number of (

*g, h*) combinations in two networks

*S *and

*T *based on pairs of location of (

*g, h*) in

*S *and

*T*. There are

pairs of location possible for (

*g, h*) in each of two networks and 2 alternatives for each gene pair (

*g, h*) in

*S *and

*T*. So

Hence, the expected number of (

*i, j*)-adjacencies in the two networks

*S *and

*T *is

Therefore, based on the proof of Theorem 2 in [

10], we can conclude that

*n*_{ij }converges in distribution to the Poisson with parameter

*E*(

*n*_{ij}), the expected number of (

*i, j*) adjacencies in networks

*S *and

*T*.

More generally we can use the same techniques to prove Theorems 4 and 5:

**Theorem 4**. *Let D be the degree of a gene in the random genetic grid network, i.e. the number of *1*-adjacent neighbors of this gene in the network. For two random genetic lattice networks S and T with same genes, the number of pairs of genes n*_{ij }that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter

Even for networks as small as 400, simulations indicate that the distribution of *n*_{ij }is close to the Poisson in Theorem 4, for square (*D *= 4), hexagonal (*D *= 3), triangular (*D *= 6) grids as well as linear networks (*D *= 2). Looking beyond regular networks:

**Theorem 5**. *Let D*_{k}(*M*) *be the number of k-adjacent gene pairs in the random network M. For two random networks S and T with same vertices, the number of pairs of vertices n*_{ij }that are i-adjacent in S and j-adjacent in T converges in distribution to the Poisson with parameter