Google's Duplicate Content webmaster guide defines duplicate content (for purposes of search engine optimization) as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar".
Search engines typically penalize duplicate content, either excluding it from search results or ranking it lower, because it is often associated with scraper sites, which copy content wholesale, and with simplistic article-spinning techniques, which generate "new" content by selectively replacing words in existing content.
Google's guide goes on to list the following as examples of duplicate content:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
Without a full accounting of the algorithms used to rank and index sites, whether a search engine will treat a given piece of content or technique as duplicate content is debatable. However, you can avoid penalties associated with duplicate content on your site by observing the following guidelines:
- Ensure that content is only accessible under one canonical URL
- If your site must return the same content under multiple URLs (e.g. for a "print view" page), specify the canonical URL manually with a link element (rel="canonical") in the document head
- In cases where your site returns similar content based upon parameters encoded in the URL (e.g. sorting a product catalog), exclude the URL parameters in Google Webmaster Tools
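
The guidelines above can be sketched in code. The following Python example is only an illustration, not a definitive implementation: it assumes that query parameters named sort, order, view and print change presentation but not content (these names are hypothetical and will differ per site), strips them to derive a single canonical URL, and builds the link element to emit in the document head. The function names canonical_url and canonical_link_tag are also invented for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only change presentation, not content.
# Adjust this set to match the parameters your own site actually uses.
PRESENTATION_PARAMS = {"sort", "order", "view", "print"}


def canonical_url(url: str) -> str:
    """Return the URL without presentation-only parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    ]
    # Rebuild the URL from the remaining parameters and drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the link element to place in the document head."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # A sorted "print view" of a catalog page maps back to the plain catalog URL.
    print(canonical_link_tag(
        "https://example.com/catalog?category=shoes&sort=price&view=print"
    ))
```

Every variant of a page (sorted, print view, and so on) would then emit the same link element, so a search engine can consolidate the variants under one URL. Whether a given parameter is safe to strip is a per-site decision: a parameter such as category in the example selects different content and must be kept.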