PJ Steiner:CRFall

From OpenWetWare
Jump to navigationJump to search

Notes on using Kevin Murphy's CRF matlab toolbox.

Getting it to work

To get testCRF to work, you'll probably need to recompile repmatC.c in the directory KPMtools. (It won't work if you're on *nix, and it won't be found if you're on OS X. I haven't tried on Windows.) Do this using mex. It may be on your path in *nix:

 mex /path/to/CRFall/KPMtools/repmatC.c

It won't be on your path in OS X:

 /Applications/MATLAB_R2008b.app/bin/mex /path/to/CRFall/KPMTools/repmatC.c


Linear Chain CRFs (crfchain)

I'm only working with the functions for linear chain CRFs. I think what follows is correct.

Limitations

You don't define 'feature functions' normally --- you don't define arbitrary functions of the form [math]\displaystyle{ f_k(t, y_{t-1}, y_t, \mathbf{x}) }[/math] each with a weight [math]\displaystyle{ \lambda_k }[/math].

Instead, you compute a 'feature vector' for each position in the sequence which is input to both the training and inference functions. Note that the symbol at position [math]\displaystyle{ i }[/math] has to be part of this feature vector. (For DNA, I defined four features corresponding to {A, T, G, C} and set the appropriate feature to one for each position.)

This effectively restricts you to two types of feature functions:

  • [math]\displaystyle{ f_k(y_{t-1}, y_t) }[/math]
These are potentials between states, and they're constant (can't be sequence dependent). The code learns these automatically.
  • [math]\displaystyle{ f_k(t, \mathbf{x}) }[/math]
These are the features in the feature vectors. Their values depend only on the input sequence. That sounds bad, but, the code learns weights [math]\displaystyle{ \lambda_{kq} }[/math] --- each is specific to a feature [math]\displaystyle{ k }[/math] and a state [math]\displaystyle{ q }[/math].

I don't think this is as powerful as a general linear chain CRF, but it's still more powerful than an HMM.


Training

To train, you need cell array of matrices X{s}(f,t) and a cell array of row vectors Y{s}(t). X{s} is a matrix --- each row corresponds to a feature, and each column corresponds to position in the sequence. Y{s} is a row vector containing the correct state labels for the sequence. (I believe state labels must be integers --- they're used to index matrices.)

The function crfchaintrain does the training:

 chain = crfchaintrain(chain, X, Y,         ...
                       'gradAlgo', 'scg',   ...
                       'checkGrad', 'on',   ...
                       'MaxIter', max_iter, ...
                       'verbose', 1);

Only chain, X, and Y are required arguments; the others are options.

Note: I've found that increasing the number of states (i.e. labels) can really increase running time.


Inference

To infer hidden states using a trained CRF chain, you need a matrix X_test(f,t) of feature vectors for the unlabeled sequence. (As above, rows correspond to individual features, columns correspond to elements in the sequence.) You infer like so:

 bel_chain = crfchaininfer(chain, X_test);
 [V, I] = max(bel_chain);

crfchaininfer computes probabilities for each state at each symbol in the sequence. (Rows are states, columns are symbols.) The call to max gives you two row vectors. V has the biggest energy/probability for each element in the sequence; I has the index into each column (i.e. the state) corresponding to that biggest value.