#

scraping

Here are 1,856 public repositories matching this topic...

teodoroanca commented Apr 16, 2020

Description

When I scrape without a proxy, both https and http URLs work.
Through a proxy, https URLs work just fine. My problem is with http URLs.
For those I get the twisted.web.error.SchemeNotSupported: Unsupported scheme: b'' error.

As far as I can see, most people have this issue the other way around.
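
One commonly reported cause of an empty-scheme error like this is a proxy address given without a scheme (for example `host:port` instead of `http://host:port`). Whether that applies here is an assumption, since the issue text does not show the proxy setting. The helper below is a minimal sketch of normalizing the proxy address before use; `ensure_proxy_scheme` and the sample addresses are hypothetical names for illustration.

```python
def ensure_proxy_scheme(proxy: str, default_scheme: str = "http") -> str:
    """Return the proxy address with a scheme, adding default_scheme if missing.

    Hypothetical helper: it only checks for "://" and does not validate
    the host or port.
    """
    proxy = proxy.strip()
    if "://" in proxy:
        return proxy  # already has a scheme, leave untouched
    return f"{default_scheme}://{proxy}"


# Assuming a Scrapy spider (suggested by the Twisted error, not stated in the
# issue), the normalized value would go into the request's "proxy" meta key:
#   request.meta["proxy"] = ensure_proxy_scheme("203.0.113.5:8080")
```

If the proxy address already carries a scheme, the helper returns it unchanged, so it is safe to apply to every configured proxy.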

Steps to Reproduce

  1. Scrape an http link through a proxy

**Expected

oldani commented Feb 18, 2019

If you're using proxies with requests-html, all is good until you render a JS site. Once you render a website, pyppeteer doesn't know about these proxies and will expose your IP. This is undesired behavior when scraping with proxies.

The idea is that whenever someone passes proxies to the session object or to any method call, pyppeteer should use those proxies too. #265
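
A rough sketch of how the proposal could work, not the library's actual implementation: translate a requests-style `proxies` mapping into Chromium's `--proxy-server` launch flag and hand it to pyppeteer. The function name `chromium_proxy_args` is hypothetical, and preferring the `https` entry over the `http` one is an assumption made for simplicity.

```python
from typing import Dict, List


def chromium_proxy_args(proxies: Dict[str, str]) -> List[str]:
    """Translate a requests-style proxies mapping into Chromium launch args.

    Hypothetical helper: picks the "https" proxy if present, else "http",
    and returns an empty list when no proxy is configured.
    """
    proxy = proxies.get("https") or proxies.get("http")
    return [f"--proxy-server={proxy}"] if proxy else []


# The resulting list could then be passed when launching the browser, e.g.:
#   browser = await pyppeteer.launch(args=chromium_proxy_args(proxies))
```

Returning an empty list for an empty mapping means the same call works whether or not the session was configured with proxies.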

1BOB commented Nov 17, 2017

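Here is the problem: I want to scrape paginated product information, so I used a for loop that executes document = Jsoup.connect(domain+reviews+String.format(b, p)).get(), changing the value of p to change the review page number.

However, after the first page is scraped, scraping the second page of reviews (the statement document = Jsoup.connect(domain+reviews+String.format(b, p)).get(); runs each time a page of reviews is about to be scraped) fails with this error: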
java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.s
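
A common way to make a pagination loop like this more tolerant of connection timeouts is to pause between pages and retry a failed page a few times. The sketch below shows that pattern in Python rather than Java, with the page fetcher injected as a callable; `fetch_pages`, its parameters, and the retry counts are all hypothetical, and it is not known whether retries would resolve the timeout reported here. In jsoup itself, the request timeout can be adjusted with `Connection.timeout(...)`.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_pages(fetch: Callable[[int], str], pages: Iterable[int],
                retries: int = 3, delay: float = 1.0) -> Dict[int, str]:
    """Fetch each page, retrying on network errors with a growing pause."""
    results: Dict[int, str] = {}
    for page in pages:
        for attempt in range(1, retries + 1):
            try:
                results[page] = fetch(page)
                break  # success, move on to the next page
            except OSError:  # covers ConnectionError and TimeoutError
                if attempt == retries:
                    raise  # give up after the last attempt
                time.sleep(delay * attempt)  # wait longer before each retry
        time.sleep(delay)  # pause between pages to avoid hammering the site
    return results
```

Passing the fetcher in as an argument keeps the retry logic independent of any particular HTTP library.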

jlvdh commented Nov 27, 2018

What is the current behavior?

Crawling a website that uses # (hashes) for URL navigation does not crawl the pages whose URLs contain #.

The URLs using # are not followed.
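
Many crawlers deduplicate URLs after dropping the fragment (the part after #), which makes every hash-routed page look like the same URL. Whether this crawler does so is an assumption; the sketch below only illustrates the effect using Python's standard `urldefrag`. The helper `dedupe_key`, its `keep_fragment` option, and the example URLs are hypothetical.

```python
from urllib.parse import urldefrag


def dedupe_key(url: str, keep_fragment: bool = False) -> str:
    """Return the key a crawler might use to decide if a URL was already seen."""
    base, fragment = urldefrag(url)
    if keep_fragment and fragment:
        return f"{base}#{fragment}"  # treat each hash route as its own page
    return base  # default: fragment dropped, all hash routes collapse


# Two hash-routed pages (hypothetical paths) collapse to one key by default:
a = dedupe_key("https://example.com/en/#/collections")
b = dedupe_key("https://example.com/en/#/shops")
```

With the default setting `a` and `b` are identical, so the second page would be skipped as already visited; with `keep_fragment=True` they stay distinct.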

If the current behavior is a bug, please provide the steps to reproduce

Try crawling a website like mykita.com/en/

What is the motivation / use case for changing the behavior?

Though hashes are not meant to chan

ferret
brandonmp commented Nov 16, 2016

I'm trying to type some stuff into a page w/ artoo, and i think simulate() will do the trick.

i've never used simulate(), though, so i have no idea what the syntax is.

the github page for simulate linked in the artoo docs has no documentation, and the only docs i can find are for jquery-simulate-ext.

are there any examples I
