scraper
Here are 3,480 public repositories matching this topic...
Don't know how to navigate the docs
If one opens the docs link provided in the README, the page opens on readthedocs.io with no navigation bar for browsing to the Quick Start or Advanced pages. You can only reach them by searching for "quick start" and clicking the result; only then do navigation links for browsing the docs appear.
Just for the record: I'm using Firefox (60.9.0 ESR) on Windows 10 Pro.
What is the current behavior?
Crawling a website that uses # (hashes) for URL navigation does not crawl the pages reached via #; URLs containing # are not followed.
If the current behavior is a bug, please provide the steps to reproduce
Try crawling a website like mykita.com/en/
What is the motivation / use case for changing the behavior?
Though hashes are not meant to chan
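The behavior described above is a common consequence of URL normalization: per RFC 3986, the fragment after `#` is never sent to the server, so many crawlers strip it before deduplicating links. A minimal sketch of why every `#`-routed page then collapses into one URL, using only Python's standard library:

```python
from urllib.parse import urldefrag

# Hypothetical fragment-routed links like the ones on mykita.com/en/
links = [
    "https://mykita.com/en/#/collections",
    "https://mykita.com/en/#/stores",
    "https://mykita.com/en/",
]

seen = set()
for link in links:
    url, fragment = urldefrag(link)  # strips everything after '#'
    seen.add(url)

# All three links normalize to the same URL, so a crawler that
# dedupes this way visits the page once and follows no fragments.
print(seen)  # {'https://mykita.com/en/'}
```

Crawling such sites generally requires rendering JavaScript (a headless browser) rather than following the raw links.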
The developer of the website I intend to scrape information from is sloppy and has left a lot of broken links.
When I execute an otherwise effective Ferret script on a list of pages, it stops altogether at every 404.
Is there a DOCUMENT_EXISTS function, or anything else that would let the script continue?
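I can't confirm whether Ferret's FQL has a DOCUMENT_EXISTS builtin, but a language-agnostic workaround is to probe each URL first and drop the dead ones before handing the list to the scraper. A minimal sketch with Python's standard library:

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_or_none(url, timeout=10):
    """Return the page body, or None if the URL is broken (404, DNS failure, ...)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (HTTPError, URLError):
        return None


# Filter out dead links instead of aborting the whole run:
# live_pages = [u for u in urls if fetch_or_none(u) is not None]
```

Pre-filtering means each page is fetched twice; if that is too slow, an HTTP HEAD request instead of a full GET would cut the cost.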
This post is an example: the scraper does not collect IGTV posts, just FYI; these will be missing from the metadata JSON.
scoreText is broken
Tested in both the search and list methods: scoreText shows ' Rated 4.3 stars out of five stars ' instead of 4.3.
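Until the field is fixed, a workaround could be to extract the numeric rating from the verbose string. A sketch (in Python rather than the scraper's own JavaScript, and assuming the string always contains the rating as the first number):

```python
import re


def parse_score(score_text):
    """Pull the numeric rating out of strings like
    ' Rated 4.3 stars out of five stars '."""
    match = re.search(r"(\d+(?:\.\d+)?)", score_text)
    return float(match.group(1)) if match else None


print(parse_score(" Rated 4.3 stars out of five stars "))  # 4.3
```

Note the trailing "out of five stars" is safe here because "five" is spelled out; if the site ever renders it as "5", the regex would need anchoring to the "Rated" prefix.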
The "API Documentation" link on http://felipecsl.com/wombat/ points to http://rubydoc.info/gems/wombat/2.1.1/frames.
On that page, the "API Documentation" link points to https://www.rubydoc.info/gems/wombat/2.0.0/frames, and so on.
Unrelated: the Gemnasium badge is reporting errors.
Hope this helps a little. I'd send a PR, but I'm not using the gem right now.
Documentation incorrectly states that any software accepting the CONNECT method can be used as a proxy
Hello,
I was trying to build my own image with a third-party HTTP proxy.
Expected Behavior
According to the documentation:
you can use every software which accept the CONNECT method (Squid, Tinyproxy, etc.).
Actual Behavior
This is not the case, because Scrapoxy expects to receive a 200 response on http://xx.xx.
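The distinction the report draws is between tunneling (a proxy answering CONNECT, typically with "HTTP/1.1 200 Connection established") and whatever health check Scrapoxy runs, which per the report expects a plain 200 response. A minimal sketch of what those two pieces look like on the wire; the helper names are mine, not Scrapoxy's:

```python
def connect_request(host, port):
    """Build the raw CONNECT handshake a client sends to an HTTP proxy."""
    return (f"CONNECT {host}:{port} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n\r\n").encode("ascii")


def status_of(status_line):
    """Extract the status code: 'HTTP/1.1 200 Connection established' -> 200."""
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


print(connect_request("example.com", 443))
print(status_of("HTTP/1.1 200 Connection established"))  # 200
```

A proxy can accept the CONNECT handshake yet still answer non-tunneled requests with something other than 200, which would explain a health check of this shape failing.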
Issue description
The original title key translates the title. It should not.
Version of IMDbPY, Python and OS
- Python: 3.6.9
- IMDbPY: 6.9dev20200125 (installed from the repo here)
- OS: uname_result(system='Linux', node='blackfx', release='4.15.0-76-generic', version='#86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020', machine='x86_64', processor='x86_64')
As suggested by one of the programmers:
I would include a section to your README explaining how you'd combine the library with actually making HTTP requests. You could suggest a recommended approach. Otherwise it's another decision that an end user has to make, potentially leaving them to use x-ray instead.
Good call. Will include a section about how we do it at Applaudience.
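The README section being discussed might boil down to this pattern: keep the HTTP layer the user's choice and feed the library plain HTML strings. A sketch using only the Python standard library, with `html.parser` standing in for the actual scraping library:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Stand-in for the scraping library: extracts the <title> text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title


# The fetch step stays separate and pluggable, e.g. with urllib:
#   from urllib.request import urlopen
#   html = urlopen("https://example.com").read().decode("utf-8")
print(extract_title("<html><head><title>Hello</title></head></html>"))  # Hello
```

Separating fetching from parsing is exactly the decision the commenter wants documented: the library stays transport-agnostic, and the README just shows one recommended pairing.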
I'd suggest one of those clickable eyeball icons next to the credential for viewing it.
Possibly require a password to view or change it.