All Questions
Tagged with regex web-scraping
16 questions
8
votes
3
answers
686
views
Function in Python to extract web Data
I developed this function, which I think can be improved quite a bit. It produces the desired result, but it took many lines of code. Any ideas for optimizing it?
...
3
votes
0
answers
627
views
Scraper Class with Regex
I already posted an incomplete version of this class, but that question was closed because it contained failing tests.
Here is the full version; all 5 tests pass.
Originally there was ...
2
votes
1
answer
602
views
Scraping data unveiling a button from craigslist
I've written some code to parse the names and phone numbers from craigslist. It starts from the link in "m_url" then goes one layer deep to parse the name and then again another layer deep to parse ...
2
votes
1
answer
739
views
JavaScript Website-Content Grabber
In my firm, a few people have the following problem:
A content management system is hosted externally. The contract doesn't include database access.
The contract will expire in September, so they have to get ...
7
votes
2
answers
5k
views
Using python and beautifulsoup to iterate through a list of websites to find a particular string
I'm attempting to find companies that mention a particular service on their homepage. To do this, I am iterating through a CSV file with two columns - ID and URL. I'm using BeautifulSoup to get the ...
2
votes
1
answer
78
views
Parsing HTML to download e-books
I'm currently writing a little tool to get into Go.
As I'm not familiar with the language, I'm especially looking for:
Conventional Go stuff.
utility.go feels wrong. Should I wrap the client and email/...
13
votes
1
answer
373
views
Regex-guided crawler that downloads regex-matching images up to a crawling level
This is a simple crawler that downloads images from websites. The URL of each website to be crawled must match the regex, as must the URL of any image to download.
(Also, I know, I made my own thread ...
2
votes
1
answer
82
views
Wikipedia indexer and shortest link finder
I have the following code. How can I make it more efficient? Also, it doesn't always find the shortest route. (See Cat -> Tree)
...
3
votes
2
answers
193
views
Parsing HTML from multiple webpages simultaneously
My friend wrote a scraper in Go that takes the results from a house listing webpage and finds listings for houses that he's interested in. The initial search returns listings, they are filtered by ...
7
votes
3
answers
302
views
IP and router connections
How can I make my code more Pythonic? I definitely think there is a way to make this code a lot more readable, clearer, and shorter,
but I haven't found an effective way. Any techniques I can use to ...
1
vote
1
answer
197
views
Formatting HTML for use in a locally hosted iframe
This formats HTML for use in a locally hosted iframe so that you can manipulate the content in the iframe freely, without running into cross domain issues. It uses Goutte to retrieve the HTML. I'd ...
5
votes
2
answers
288
views
Press any login button on any site
I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...
2
votes
2
answers
959
views
Phone Number Extracting using RegEx And HtmlAgilityPack
I've written this code to extract cell phone numbers from a website. It extracts the numbers correctly but very slowly, and it also hangs my Form while extracting.
...
3
votes
1
answer
74
views
Cheat Code Scraper
During breaks, I find myself playing Emerald version a lot and got tired of having to use the school's slow Wi-Fi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...
15
votes
1
answer
60k
views
Getting data correctly from <span> tag with beautifulsoup and regex
I am scraping an online shop page, trying to get the price shown on that page. The price appears in the following block:
...