All Questions
Tagged with web-scraping java
23 questions
4
votes
1
answer
117
views
Java classes for downloading all in-coming/out-going links of an article in the Wikipedia article graph
(The entire project is in GitHub.)
Introduction
This project provides facilities for retrieving the in-coming or out-going links of a given Wikipedia page.
Code
...
4
votes
1
answer
102
views
Webscraping tennis data 1.1
I incorporated the substantial changes suggested in my previous question that involved building a web-scraper for gathering tennis data.
The improved code is shown below:
...
8
votes
1
answer
265
views
Webscraping tennis data
So as a starter Java project I decided to web scrape some data (specifically all historical No. 1 ranked players for weeks starting from 1973) from the ATP website, and do something with it (IPR). I'...
2
votes
2
answers
708
views
Jsoup connection to URL
I have a simple class and want to ask whether it is possible to improve it. To me, it looks poor. Is there any way to use try-with-resources, stream or ...
0
votes
1
answer
3k
views
YouTube page scraping using Jsoup
I am trying to scrape the YouTube video streaming page to get the metadata of the video. I am considering this YouTube page as an example. You can find the HTML contents of that page over here (I have ...
2
votes
1
answer
122
views
Part of web crawler
This follows on from the first part: Forum crawler - counts statistics for words in chosen forum topic
I took into account the review by Janos and created an Iterator for my classes.
This is part of the whole ...
1
vote
1
answer
190
views
Forum crawler - counts statistics for words in chosen forum topic
I have made a skeleton part of the crawler and would like to ask you for a review. I'm especially unsure about how I divided the app into classes.
What does the app do?
It scrapes through all the ...
2
votes
1
answer
2k
views
Class for scraping images with JSoup
I refactored this class as far as I am currently able to, but I wonder whether it could be better. One thing that I'm not sure of is that I take parameters from a method, which is not a constructor, ...
3
votes
0
answers
52
views
Regularly watch recent posts of a blog for specific words with HTML scraping
Task
I want to watch the "Recent Posts" section of a blog for changes/new posts but only for specific posts containing a pre-defined word. Afterwards a list should be printed to the console with ...
6
votes
2
answers
4k
views
Multithreaded webcrawler
I've been trying to learn Java for the last day or two. This is the first project I am working on, so please bear with me. I worked on a multithreaded web crawler. It is fairly simple but I'd like to ...
2
votes
1
answer
160
views
Optimizing Java HTML parser
I wrote a program that goes through a webpage and returns matches of a regex. I used it on my letterboxd.com account to go through all of my movies (over 900 entries) and then find the genres field for each ...
5
votes
1
answer
461
views
Finding shortest paths in a Wikipedia article graph using Java
(See also Finding shortest paths in a Wikipedia article graph using Java - second attempt.)
I have this sort of a web crawler that asks for two (English) Wikipedia article titles (the source and the ...
2
votes
2
answers
280
views
Java web scraping robots
I am developing an application that goes through two websites and gets all the articles, but my code is identical in most parts. Is there a way to optimize this code? (TL and DN are the naming ...
0
votes
1
answer
857
views
Implementation of bridge design pattern for a web scraping app - follow-up
Earlier today I tried to implement an example of the bridge design pattern, but I ended up misinterpreting it.
I made a lot of changes:
...
0
votes
1
answer
177
views
Implementation of Bridge Design Pattern
I made an implementation of the Bridge Pattern to handle the ever-changing crawler APIs that I'm using in my app.
...