¹²This is similar to training and executing on the same corpus, and is thus used as a smoothing technique. I also ran experiments with leave-one-out generalization, but that approach was much slower and yielded slightly lower performance.