I am working on a Named Entity Recognition (NER) project. Instead of using an existing library, I decided to implement one from scratch because I want to learn the basics of how PGMs work under the hood. I converted the words in sentences into feature vectors. The features are manually picked by me, and I can only think of roughly 20 features (such as: "Is the token capitalized?", "Is the token an English word?", etc.), as sketched below. However, I've heard that good NER algorithms represent tokens using far more than 20 features, sometimes hundreds. How do they manage to come up with so many features? Are there any recommended best practices for feature construction?
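
To make this concrete, my feature extraction currently looks roughly like the following (a simplified sketch; the exact feature set and helper names are placeholders, not my full list):

```python
import re

def token_features(tokens, i):
    """A handful of hand-picked indicator features for tokens[i]."""
    tok = tokens[i]
    return {
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps":    tok.isupper(),
        "has_digit":      any(ch.isdigit() for ch in tok),
        "has_hyphen":     "-" in tok,
        "is_punctuation": bool(re.fullmatch(r"\W+", tok)),
        "is_first_token": i == 0,
        # ... roughly 20 features like these in total
    }

# Example: features for the first token of a sentence
print(token_features(["Barack", "Obama", "visited", "Paris", "."], 0))
```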