Tagged Questions
6
votes
2 answers
230 views
Why does this regex take so long to find email addresses in certain files?
I have a regular expression that looks for email addresses (this was taken from another SO post that I can't find and has been tested on all kinds of email configurations ... changing this is not ...
4
votes
1 answer
524 views
How to perform web scraping to find specific linked pages in Java on Google App Engine?
I need to retrieve text from a remote web site that does not provide an RSS feed.
What I know is that the data I need is always on pages linked to from the main page (http://www.example.com/) with a ...
3
votes
2 answers
203 views
Python find file download link on webpage
I need a regex that will return to me the text contained between double quotes that starts with a specified text block, and ends with a specific file extension (say .txt). I'm using urllib2 to get ...
3
votes
2 answers
74 views
How can I extract sentences with years in them with a regex?
I'm parsing Wikipedia articles. I want to extract every sentence with a year in it. The year can be anything from 1000 - 2012. Below is the regex I've been trying, but I can't quite get it right. ...
3
votes
3 answers
93 views
PHP Filtering an array for 1 url
I made a script that creates an array of URLs scraped from a page, and I want to filter the array for just one particular URL.
The array currently looks like this:
Array
(
[0] => index.jsp
[1] ...
3
votes
3 answers
279 views
Regex pattern with subpattern exceptions (Python)
I am using BeautifulSoup to extract table data (TD) tags from a table. The TDs have a class of either 'a', 'u', 'e', 'available-unavailable' or 'unavailable-available'. (Yes, I know, quirky class names, but ...
2
votes
2 answers
258 views
Android/Java: Html scraping, regex album art from Spotify
I'm working on a project that requires me to scrape a link to an album art image from open.spotify
Example: http://open.spotify.com/track/296mPMQavmf1vvxYrUvLN8
In this example I'm looking for this ...
1
vote
3 answers
370 views
PHP web scraping
I'm scraping a web page with PHP, and I want to get the price (3.65) for Sunday from the HTML code below:
<tr class="odd">
<td >
<b>Sunday</b> Info
...
1
vote
4 answers
181 views
PHP regex to return <option> values
Just wondering if you can help me out with a little task I'm trying to do in PHP.
I have text that looks something like this in a file:
(random html)
...
<OPTION VALUE="195" ...
1
vote
1 answer
219 views
Cannot find data from string using regex while string.find() works just fine
import re
import urllib
p = urllib.urlopen("http://sprunge.us/QZhU")
page = p.read()
pos = page.find("<h2><span>")
print page[pos:pos+48]
c = ...
1
vote
3 answers
268 views
Web Scraping of Person Descriptions
I've attempted to build a program to scrape the web for company management teams. It's very accurate at obtaining many things, including:
-names
-job titles
-images
-emails
-Qualifications (MD, ...
1
vote
1 answer
75 views
Inquiry: Why is my regex code not reading all characters?
I have the following description that I want to scrape using my program.
<hr>Provides AFROTC cadets up to 13 options for practical leadership and specialized training
through exposure to USAF ...
1
vote
2 answers
1k views
Need to ignore case in preg_match_all usage
I'm trying to scrape HTML and grab items between <tr> tags. Some of the tags are coming through as uppercase for some reason (<TR>) and are being ignored by my pattern. How can I tell my ...
1
vote
1 answer
964 views
How do I get rid of characters like ' that appear instead of apostrophes? [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
I am attempting to scrape a website using Python. I import and use the urllib2, BeautifulSoup and re modules.
...
1
vote
3 answers
2k views
How can I parse specific info from HTML source code using Java
I know there are lots of topics on this question, but I couldn't find a helpful solution. I can connect to the website and read it line by line in Java; now here is my problem. I want to parse a ...