Python Text Processing with NLTK 2.0 Cookbook

Table of Contents

Preface
Chapter 1: Tokenizing Text and WordNet Basics
Chapter 2: Replacing and Correcting Words
Chapter 3: Creating Custom Corpora
Chapter 4: Part-of-Speech Tagging
Chapter 5: Extracting Chunks
Chapter 6: Transforming Chunks and Trees
Chapter 7: Text Classification
Chapter 8: Distributed Processing and Handling Large Datasets
Chapter 9: Parsing Specific Data
Appendix: Penn Treebank Part-of-Speech Tags
Index

Preface

Chapter 1: Tokenizing Text and WordNet Basics
- Introduction
- Tokenizing text into sentences
- Tokenizing sentences into words
- Tokenizing sentences using regular expressions
- Filtering stopwords in a tokenized sentence
- Looking up synsets for a word in WordNet
- Looking up lemmas and synonyms in WordNet
- Calculating WordNet synset similarity
- Discovering word collocations

Chapter 2: Replacing and Correcting Words
- Introduction
- Stemming words
- Lemmatizing words with WordNet
- Translating text with Babelfish
- Replacing words matching regular expressions
- Removing repeating characters
- Spelling correction with Enchant
- Replacing synonyms
- Replacing negations with antonyms

Chapter 3: Creating Custom Corpora
- Introduction
- Setting up a custom corpus
- Creating a word list corpus
- Creating a part-of-speech tagged word corpus
- Creating a chunked phrase corpus
- Creating a categorized text corpus
- Creating a categorized chunk corpus reader
- Lazy corpus loading
- Creating a custom corpus view
- Creating a MongoDB backed corpus reader
- Corpus editing with file locking

Chapter 4: Part-of-Speech Tagging
- Introduction
- Default tagging
- Training a unigram part-of-speech tagger
- Combining taggers with backoff tagging
- Training and combining Ngram taggers
- Creating a model of likely word tags
- Tagging with regular expressions
- Affix tagging
- Training a Brill tagger
- Training the TnT tagger
- Using WordNet for tagging
- Tagging proper names
- Classifier based tagging

Chapter 5: Extracting Chunks
- Introduction
- Chunking and chinking with regular expressions
- Merging and splitting chunks with regular expressions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- Training a tagger-based chunker
- Classification-based chunking
- Extracting named entities
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker

Chapter 6: Transforming Chunks and Trees
- Introduction
- Filtering insignificant words
- Correcting verb forms
- Swapping verb phrases
- Swapping noun cardinals
- Swapping infinitive phrases
- Singularizing plural nouns
- Chaining chunk transformations
- Converting a chunk tree to text
- Flattening a deep tree
- Creating a shallow tree
- Converting tree nodes

Chapter 7: Text Classification
- Introduction
- Bag of Words feature extraction
- Training a naive Bayes classifier
- Training a decision tree classifier
- Training a maximum entropy classifier
- Measuring precision and recall of a classifier
- Calculating high information words
- Combining classifiers with voting
- Classifying with multiple binary classifiers

Chapter 8: Distributed Processing and Handling Large Datasets
- Introduction
- Distributed tagging with execnet
- Distributed chunking with execnet
- Parallel list processing with execnet
- Storing a frequency distribution in Redis
- Storing a conditional frequency distribution in Redis
- Storing an ordered dictionary in Redis
- Distributed word scoring with Redis and execnet

Chapter 9: Parsing Specific Data
- Introduction
- Parsing dates and times with Dateutil
- Time zone lookup and conversion
- Tagging temporal expressions with Timex
- Extracting URLs from HTML with lxml
- Cleaning and stripping HTML
- Converting HTML entities with BeautifulSoup
- Detecting and converting character encodings

Appendix: Penn Treebank Part-of-Speech Tags

Index
