Python Text Processing with NLTK 2.0 Cookbook
Tokenizing Text and WordNet Basics
Tokenizing text into sentences
Tokenizing sentences into words
Tokenizing sentences using regular expressions
Filtering stopwords in a tokenized sentence
Looking up synsets for a word in WordNet
Looking up lemmas and synonyms in WordNet
Calculating WordNet synset similarity
Replacing and Correcting Words
Lemmatizing words with WordNet
Translating text with Babelfish
Replacing words matching regular expressions
Spelling correction with Enchant
Replacing negations with antonyms
Creating a part-of-speech tagged word corpus
Creating a chunked phrase corpus
Creating a categorized text corpus
Creating a categorized chunk corpus reader
Creating a MongoDB backed corpus reader
Corpus editing with file locking
Training a unigram part-of-speech tagger
Combining taggers with backoff tagging
Training and combining Ngram taggers
Creating a model of likely word tags
Tagging with regular expressions
Chunking and chinking with regular expressions
Merging and splitting chunks with regular expressions
Expanding and removing chunks with regular expressions
Partial parsing with regular expressions
Training a tagger-based chunker
Training a named entity chunker
Chaining chunk transformations
Converting a chunk tree to text
Bag of Words feature extraction
Training a naive Bayes classifier
Training a decision tree classifier
Training a maximum entropy classifier
Measuring precision and recall of a classifier
Calculating high information words
Combining classifiers with voting
Classifying with multiple binary classifiers
Distributed Processing and Handling Large Datasets
Distributed tagging with execnet
Distributed chunking with execnet
Parallel list processing with execnet
Storing a frequency distribution in Redis
Storing a conditional frequency distribution in Redis
Storing an ordered dictionary in Redis
Distributed word scoring with Redis and execnet
Parsing dates and times with Dateutil
Time zone lookup and conversion
Tagging temporal expressions with Timex
Extracting URLs from HTML with lxml
Converting HTML entities with BeautifulSoup