PJ Steiner:CRFall

Notes on using Kevin Murphy's CRF matlab toolbox.


 * Kevin Murphy's page: http://www.cs.ubc.ca/~murphyk/Software/CRF/crf.html


 * Another page with some notes: http://people.cs.uchicago.edu/~dinoj/crfallnotes.html (I'm not sure it's necessary to add a feature w/ value = 1.0 to every feature vector as done there.)

Getting it to work
To get the toolbox to work, you'll probably need to recompile repmatC.c in the KPMtools directory. (The precompiled version won't work if you're on *nix, and there isn't one for OS X. I haven't tried on Windows.) Do this using mex, which may already be on your path in *nix:

mex /path/to/CRFall/KPMtools/repmatC.c

mex won't be on your path in OS X, so give the full path:

/Applications/MATLAB_R2008b.app/bin/mex /path/to/CRFall/KPMtools/repmatC.c

Linear Chain CRFs
I'm only working with the functions for linear chain CRFs. I think what follows is correct.

Limitations
You don't define 'feature functions' in the usual sense --- that is, you can't define arbitrary functions of the form $$f_k(t, y_{t-1}, y_t, \mathbf{x})$$, each with a weight $$\lambda_k$$.

Instead, you compute a 'feature vector' for each position in the sequence which is input to both the training and inference functions. Note that the symbol at position $$i$$ has to be part of this feature vector. (For DNA, I defined four features corresponding to {A, T, G, C} and set the appropriate feature to one for each position.)
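For the DNA case, that one-hot encoding might be built like this (a sketch; the sequence and the {A, T, G, C} feature ordering are just illustrative):

```matlab
% One-hot feature matrix for a DNA sequence (sketch).
% Rows = features {A, T, G, C}; columns = positions in the sequence.
seq = 'ATGGC';                      % example input sequence (made up)
alphabet = 'ATGC';
X = zeros(length(alphabet), length(seq));
for t = 1:length(seq)
    X(alphabet == seq(t), t) = 1;   % set the matching feature to one
end
```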

This effectively restricts you to two types of feature functions:


 * $$f_k(y_{t-1}, y_t)$$
 * These are potentials between states, and they're constant (can't be sequence dependent). The code learns these automatically.


 * $$f_k(t, \mathbf{x})$$
 * These are the features in the feature vectors. Their values depend only on the input sequence. That sounds limiting, but the code learns weights $$\lambda_{kq}$$ --- each is specific to a feature $$k$$ and a state $$q$$.

I don't think this is as powerful as a general linear chain CRF, but it's still more powerful than an HMM.

Training
To train, you need a cell array of feature matrices X and a cell array of row vectors Y. Each X{i} is a matrix --- each row corresponds to a feature, and each column corresponds to a position in the sequence. Each Y{i} is a row vector containing the correct state labels for that sequence. (I believe state labels must be integers --- they're used to index matrices.)
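With two training sequences, the inputs might be assembled like this (a sketch with made-up values; only the shapes matter):

```matlab
% Cell arrays of per-sequence feature matrices and label vectors (sketch).
% X{i} is nFeatures-by-T_i; Y{i} is 1-by-T_i with integer state labels.
X = cell(1, 2);
Y = cell(1, 2);
X{1} = [1 0 0; 0 1 0; 0 0 1];   % 3 features, 3 positions
Y{1} = [1 2 2];                 % correct state label at each position
X{2} = [0 1; 1 0; 0 0];         % 3 features, 2 positions
Y{2} = [2 1];
```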

The function crfchaintrain does the training:

chain = crfchaintrain(chain, X, Y,        ...
                      'gradAlgo', 'scg',   ...
                      'checkGrad', 'on',   ...
                      'MaxIter', max_iter, ...
                      'verbose', 1);

Only chain, X, and Y are required arguments; the rest are options.

Note: I've found that increasing the number of states (i.e. labels) can really increase running time.

Inference
To infer hidden states using a trained CRF chain, you need a matrix X_test of feature vectors for the unlabeled sequence. (As above, rows correspond to individual features, and columns correspond to positions in the sequence.) You infer like so:

bel_chain = crfchaininfer(chain, X_test);
[V, I] = max(bel_chain);

crfchaininfer computes probabilities for each state at each symbol in the sequence. (Rows are states, columns are symbols.) The call to max gives you two row vectors: V has the biggest energy/probability for each element in the sequence, and I has the index into each column (i.e. the state) corresponding to that biggest value --- so I is the inferred state sequence.