PJ Steiner:CRFall
Notes on using Kevin Murphy's CRF matlab toolbox.
- Kevin Murphy's page: http://www.cs.ubc.ca/~murphyk/Software/CRF/crf.html
- Another page with some notes: http://people.cs.uchicago.edu/~dinoj/crfallnotes.html (I'm not sure it's necessary to add a feature w/ value = 1.0 to every feature vector as done there.)
Getting it to work
To get testCRF
to work, you'll probably need to recompile repmatC.c
in the directory KPMtools
. (It won't work if you're on *nix, and it won't be found if you're on OS X. I haven't tried on Windows.) Do this using mex
. It may be on your path in *nix:
mex /path/to/CRFall/KPMtools/repmatC.c
It won't be on your path in OS X:
/Applications/MATLAB_R2008b.app/bin/mex /path/to/CRFall/KPMTools/repmatC.c
Linear Chain CRFs (crfchain
)
I'm only working with the functions for linear chain CRFs. I think what follows is correct.
Limitations
You don't define 'feature functions' normally --- you don't define arbitrary functions of the form [math]\displaystyle{ f_k(t, y_{t-1}, y_t, \mathbf{x}) }[/math] each with a weight [math]\displaystyle{ \lambda_k }[/math].
Instead, you compute a 'feature vector' for each position in the sequence which is input to both the training and inference functions. Note that the symbol at position [math]\displaystyle{ i }[/math] has to be part of this feature vector. (For DNA, I defined four features corresponding to {A, T, G, C} and set the appropriate feature to one for each position.)
This effectively restricts you to two types of feature functions:
- [math]\displaystyle{ f_k(y_{t-1}, y_t) }[/math]
- These are potentials between states, and they're constant (can't be sequence dependent). The code learns these automatically.
- [math]\displaystyle{ f_k(t, \mathbf{x}) }[/math]
- These are the features in the feature vectors. Their values depend only on the input sequence. That sounds bad, but, the code learns weights [math]\displaystyle{ \lambda_{kq} }[/math] --- each is specific to a feature [math]\displaystyle{ k }[/math] and a state [math]\displaystyle{ q }[/math].
I don't think this is as powerful as a general linear chain CRF, but it's still more powerful than an HMM.
Training
To train, you need cell array of matrices X{s}(f,t)
and a cell array of row vectors Y{s}(t)
. X{s}
is a matrix --- each row corresponds to a feature, and each column corresponds to position in the sequence. Y{s}
is a row vector containing the correct state labels for the sequence. (I believe state labels must be integers --- they're used to index matrices.)
The function crfchaintrain
does the training:
chain = crfchaintrain(chain, X, Y, ... 'gradAlgo', 'scg', ... 'checkGrad', 'on', ... 'MaxIter', max_iter, ... 'verbose', 1);
Only chain
, X
, and Y
are required arguments; the others are options.
Note: I've found that increasing the number of states (i.e. labels) can really increase running time.
Inference
To infer hidden states using a trained CRF chain, you need a matrix X_test(f,t)
of feature vectors for the unlabeled sequence. (As above, rows correspond to individual features, columns correspond to elements in the sequence.) You infer like so:
bel_chain = crfchaininfer(chain, X_test); [V, I] = max(bel_chain);
crfchaininfer
computes probabilities for each state at each symbol in the sequence. (Rows are states, columns are symbols.) The call to max
gives you two row vectors. V
has the biggest energy/probability for each element in the sequence; I
has the index into each column (i.e. the state) corresponding to that biggest value.