CNFs are a recently developed probabilistic graphical model (Peng *et al.*, 2009), which integrates the power of both Conditional Random Fields (CRFs) (Lafferty *et al.*, 2001) and neural networks (Haykin, 1999). CNFs borrow from CRFs by parameterizing the conditional probability in log-linear form, and from neural networks by implicitly modeling the complex, non-linear relationship between input features and output labels. CNFs have been applied to protein secondary structure prediction (Wang *et al.*, 2010), protein conformation sampling (Zhao *et al.*, 2010) and handwriting recognition (Peng *et al.*, 2009). Here we describe how to model protein sequence–template alignment using CNFs.
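The difference between the two parameterizations can be sketched in a few lines of Python. This is an illustration only, not code from any of the cited works; the feature and hidden-layer dimensions and the random weights are arbitrary assumptions. A CRF potential is linear in the input features, whereas a CNF potential first passes the features through a hidden non-linear layer:

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
n_features, n_hidden = 5, 3
x = rng.normal(size=n_features)          # input features at one position

# CRF-style potential: a linear function of the input features.
w_crf = rng.normal(size=n_features)
crf_score = w_crf @ x

# CNF-style potential: the features pass through a hidden sigmoid layer,
# so the score is a non-linear function of the input.
W_in = rng.normal(size=(n_hidden, n_features))   # input -> hidden weights
lam = rng.normal(size=n_hidden)                  # hidden -> output weights
hidden = 1.0 / (1.0 + np.exp(-(W_in @ x)))       # sigmoid gate, values in (0, 1)
cnf_score = lam @ hidden
```

Both scores enter the model through the same log-linear form; only the CNF score can capture non-linear interactions among the input features.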

Let *T* denote a template protein with solved structure and *S* a target protein without solved structure. Each protein is associated with a set of protein features, such as sequence profile, (predicted) secondary structure and (predicted) solvent accessibility. Let *A*={*a*_{1}, *a*_{2},…, *a*_{L}} denote an alignment between *T* and *S*, where *L* is the alignment length and *a*_{i} is one of the three possible states *M*, *I*_{t} and *I*_{s}. *M* represents two residues in alignment, *I*_{t} denotes an insertion in the template protein, and *I*_{s} denotes an insertion in the target protein. As shown in the figure, an alignment can be represented as a sequence of states drawn from *M*, *I*_{t} and *I*_{s}, and assigned a probability calculated by our CNF model. The alignment with the highest probability is deemed optimal. We calculate the probability of an alignment *A* as follows:

$$P_{\theta}(A\mid T,S)=\frac{1}{Z(T,S)}\exp\Big(\sum_{i=1}^{L}E(a_{i-1},a_{i},T,S,i)\Big)\qquad(1)$$

where θ is the model parameter vector to be trained, *i* indicates one alignment position and *Z*(*T*, *S*) is the normalization factor (i.e. partition function), summing over all possible alignments for a given protein pair. The function *E* in Equation (1) estimates the log-likelihood of the state transition from *a*_{i−1} to *a*_{i} based upon protein features. It is a non-linear scoring function, defined as follows:

$$E(a_{i-1},a_{i},T,S,i)=\psi(a_{i-1},a_{i},T,S,i)+\phi(a_{i},T,S,i)\qquad(2)$$

where ψ and ϕ represent the edge and label feature functions, respectively, estimating the log-likelihood of states (or state transitions) from protein features. Both the edge and label feature functions can be as simple as a linear function or as complex as a neural network. Here we use neural networks with only one hidden layer to construct them. Due to space limitations, we explain only the edge feature function in detail; the label feature function is similar but slightly simpler. In total, there are nine different state transitions, so there are nine edge feature functions, each corresponding to one type of state transition. The figure shows an example of the edge feature function for the state transition from *M* to *I*_{t}. Given one state transition from *u* to *v* at position *i*, where *u* and *v* are two alignment states, the edge feature function is defined as follows:

$$\psi_{u,v}(T,S,i)=\sum_{j}\lambda_{u,v}^{j}\,H_{u,v}^{j}\big(w_{u,v}^{j}\cdot f_{u,v}(T,S,i)\big)\qquad(3)$$

where the function *f* is the feature generation function, generating input features from the target and template proteins for the alignment at position *i*. The feature generation function is state-dependent, so we may use different features for different state transitions. In Equation (3), *j* is the index of a neuron in the hidden layer, λ_{u,v}^{j} is the model parameter between that hidden neuron and the output layer, *H*_{u,v}^{j}(*x*) (=1/(1+*e*^{−x})) is the gate function for the hidden neuron, conducting a non-linear transformation of its input, and *w*_{u,v}^{j} is the model parameter vector connecting the input layer to a hidden neuron. All the model parameters are state-dependent, but position-independent. In all, there are nine different neural networks for the nine state transitions, each with its own model parameters; together they constitute the model parameter vector θ introduced in Equation (1). We obtained the best performance when using 12 hidden neurons in the hidden layer for all neural networks.
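The scoring scheme above can be sketched end to end in Python. Everything below is illustrative rather than the authors' implementation: the feature dimension, the random stand-in parameters, the toy alignment length and the omission of the label feature functions ϕ are all assumptions. Only the structure follows the text: one one-hidden-layer network with sigmoid gates per state transition (nine in total, 12 hidden neurons each), and a partition function summing over all state sequences, computed here with the standard forward algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
STATES = ["M", "It", "Is"]        # match, template insertion, target insertion
n_feat, n_hidden = 10, 12         # n_feat is a hypothetical feature dimension

# One neural network per state transition (9 in total): "w" maps the input
# features to the hidden layer, "lam" maps the hidden layer to the score.
params = {
    (u, v): {
        "w": rng.normal(scale=0.1, size=(n_hidden, n_feat)),
        "lam": rng.normal(scale=0.1, size=n_hidden),
    }
    for u in STATES
    for v in STATES
}

def gate(x):
    """Sigmoid gate function H(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def edge_score(u, v, feats_i):
    """Edge feature function psi_{u,v}: a one-hidden-layer network (Eq. 3)."""
    p = params[(u, v)]
    return float(p["lam"] @ gate(p["w"] @ feats_i))

# Stand-in for the feature generation function f: one feature vector per
# alignment position of a toy alignment of length L.
L = 6
feats = rng.normal(size=(L, n_feat))

def score(path, feats):
    """Sum of transition scores E over an alignment (label terms omitted)."""
    return sum(edge_score(path[i - 1], path[i], feats[i])
               for i in range(1, len(path)))

def log_partition(feats):
    """log Z(T, S) over all state sequences, via the forward algorithm.
    The initial state is scored uniformly for simplicity."""
    logf = np.zeros(len(STATES))
    for i in range(1, len(feats)):
        new = np.empty(len(STATES))
        for vi, v in enumerate(STATES):
            terms = np.array([logf[ui] + edge_score(u, v, feats[i])
                              for ui, u in enumerate(STATES)])
            m = terms.max()                     # log-sum-exp, numerically stable
            new[vi] = m + np.log(np.exp(terms - m).sum())
        logf = new
    m = logf.max()
    return float(m + np.log(np.exp(logf - m).sum()))

# Probability of one particular alignment under the model (Eq. 1).
path = ["M", "M", "It", "M", "Is", "M"]
prob = float(np.exp(score(path, feats) - log_partition(feats)))
```

The forward recursion visits 3·3·(L−1) transitions instead of enumerating all 3^L state sequences, which is what makes the partition function in Equation (1) tractable; in practice Z(T, S) is computed this way, and the highest-probability alignment by the analogous Viterbi recursion.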