I am working on a Named Entity Recognition (NER) project. Instead of using an existing library, I decided to implement one from scratch because I want to learn the basics of how PGMs work under the hood. I converted the words in sentences into feature vectors. The features are manually picked by me, and I can only think of roughly 20 features (such as: "Is the token capitalized?", "Is the token an English word?", etc.), as sketched below. However, I've heard that good NER algorithms represent tokens using far more than 20 features, sometimes hundreds. How do they manage to come up with so many features? Are there any recommended best practices for feature construction?
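
To make this concrete, my feature extraction currently looks roughly like the following (a simplified sketch; the exact feature set and helper names are placeholders, not my full list):

```python
import re

def token_features(tokens, i):
    """A handful of hand-picked indicator features for tokens[i]."""
    tok = tokens[i]
    return {
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps":    tok.isupper(),
        "has_digit":      any(ch.isdigit() for ch in tok),
        "has_hyphen":     "-" in tok,
        "is_punctuation": bool(re.fullmatch(r"\W+", tok)),
        "is_first_token": i == 0,
        # ... roughly 20 features like these in total
    }

# Example: features for the first token of a sentence
print(token_features(["Barack", "Obama", "visited", "Paris", "."], 0))
```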