web-scraping

Here are 1,541 public repositories matching this topic...

opensource-assist commented Feb 2, 2020

Affected file: grab/document.py

>>> import libgenapi
... /usr/local/lib/python3.9/site-packages/grab/document.py:35: DeprecationWarning: defusedxml.lxml is no longer supported and will be removed in a future release.
  import defusedxml.lxml

The defusedxml.lxml subpackage will be removed in a future release, so be

HBossier commented May 6, 2017

Hello,

This may be a bit of a silly comment, and it may have been raised before (though I could not find it in the issues list).
But the Conditions of Use from IMDb explicitly state that:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

Hence, u

HugoPoi commented Jan 17, 2020

It might be good to add this to the documentation:

  • The user-agent strings provided are not "random": user-agents.json.gz contains browser fingerprints, not user-agent strings, so the output reflects what was most used during the period covered by the installed version of the library.
  • The database is replaced in full on each update, not incrementally, so old user agents are flushed out.

For example:

Top user agen
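
The behaviour described in the two bullets above can be illustrated with a small sketch. It assumes each fingerprint record carries a usage weight and that selection is a weighted draw; the record layout, the field names `userAgent` and `weight`, the sample values, and the helpers `pick_user_agent` and `sample_counts` are all hypothetical and are not taken from the library itself.

```python
import random
from collections import Counter

# Hypothetical fingerprint records: field names and weights are made up
# for illustration and do not come from user-agents.json.gz.
FINGERPRINTS = [
    {"userAgent": "ExampleBrowser/1.0", "weight": 0.7},
    {"userAgent": "ExampleBrowser/2.0", "weight": 0.2},
    {"userAgent": "OtherBrowser/5.1", "weight": 0.1},
]


def pick_user_agent(records, rng):
    """Draw one user agent, favouring records with a larger weight."""
    weights = [record["weight"] for record in records]
    return rng.choices(records, weights=weights, k=1)[0]["userAgent"]


def sample_counts(records, draws, seed=0):
    """Count how often each user agent comes up over many weighted draws."""
    rng = random.Random(seed)
    return Counter(pick_user_agent(records, rng) for _ in range(draws))


if __name__ == "__main__":
    # The most heavily weighted entry dominates, so the result is a picture
    # of popularity rather than a uniform random choice.
    for user_agent, count in sample_counts(FINGERPRINTS, 10_000).most_common():
        print(f"{count:>6}  {user_agent}")
```

Under these assumptions the draw is representative of usage share, which is probably what the documentation should say instead of "random".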

bombledmonk commented Mar 21, 2019

I have to admit I haven't spent any time troubleshooting, but it does look like this doesn't function as is anymore.

wayback-machine-scraper -f 20080623 -t 20080623 news.ycombinator.com
2019-03-21 11:50:11 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scrapybot)
2019-03-21 11:50:11 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3li

paras55 commented Feb 16, 2020

When run with the following code:
with MyConnectionScraper(cookie='AQEDAS9oddoAec7fAAABcD_bsSwAAAFwY-g1LFEAR5RwykzJFoxZQ1ZjaMH2vcXgsasLMFb0GwyGbqgh_guqW-K122YvSwg2_zhDnX_gbpdXrYjPqY5Mq9U2o3KmrfQCYbImYSAmUOTVDuPpUoBPkS') as scraper:
    connections = scraper.scrape()
    people = connections.to_dict()

no such element: Unable to locate element: {"method":"css selector","selector":".mn-connections > h
