I'm tinkering with the idea of crafting a search engine in my spare time. More of a learning experience than anything at this point, but still a project. A key aspect of this system is checking whether a domain is live or not. That's what this code is trying to do (and succeeding).
It's written in Java, and using outside classes means that the code is pre-obfuscated! Yay!
Source Code
    import java.util.ArrayList;
    import java.util.Iterator;

    public class BaseCheckDriver extends Thread {

        public static void main(String[] args) {
            long start = System.currentTimeMillis(), end;
            // Passing "false" as the first argument re-checks only rows that have no status yet.
            // (Comparing against the boolean literal false never matches a String.)
            String query = (args.length > 0 && args[0].equals("false") ?
                    "select * from ... where is_live is null" :
                    "select * from ...");
            int numThreads = 16; // can be changed to however many
            ArrayList<Object[]> results = Database.query(query, null);
            // gets around "safe updates" and allows for easy monitoring
            Database.update("update ... set is_live = null where is_live is not null limit " + (results.size() + 1), null);

            // distribute results to lists
            ArrayList<ArrayList<Object[]>> listContainer = new ArrayList<ArrayList<Object[]>>();
            for (int i = 0; i < numThreads; i++) listContainer.add(new ArrayList<Object[]>());
            for (Object[] row : results) {
                // pick the list that is shorter than its predecessor, which keeps the lists even
                int addTo = 0;
                for (int i = 1; i < listContainer.size(); i++)
                    if (listContainer.get(i).size() < listContainer.get(i - 1).size()) addTo = i;
                listContainer.get(addTo).add(row);
            }

            // distribute lists to threads
            ArrayList<Thread> threadContainer = new ArrayList<Thread>();
            for (int i = 0; i < numThreads; i++) threadContainer.add(new BaseCheckDriver(listContainer.get(i)));
            for (Thread thread : threadContainer) thread.start();

            // let threads execute
            try {
                for (Thread thread : threadContainer) thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

            end = System.currentTimeMillis();
            System.out.println("All done!");
            System.out.println("\tTotal execution time: " + (end - start) + "ms");
            // Math.max avoids a division by zero when the query returns no rows
            System.out.println("\tAverage execution time: " + ((end - start) / Math.max(1, results.size())) + "ms");
        }

        // now the fun begins
        private ArrayList<Object[]> results;

        public BaseCheckDriver(ArrayList<Object[]> results) {
            this.results = results;
        }

        public void run() {
            System.out.println(Thread.currentThread().getName() + " Started!");
            long start = System.currentTimeMillis(), end;
            Iterator<Object[]> resultIterator = results.iterator();
            while (resultIterator.hasNext()) Indexer.indexBase(resultIterator.next());
            end = System.currentTimeMillis();
            System.out.println(Thread.currentThread().getName() + " Done!");
            System.out.println("\tTotal execution time: " + (end - start) + "ms");
            // Math.max avoids a division by zero when this thread received no rows
            System.out.println("\tAverage execution time: " + ((end - start) / Math.max(1, results.size())) + "ms");
        }
    }
Any improvements would be appreciated. Speed is at a good spot right now (stats below), but one thing I notice is that although the distribution of rows is technically even, the execution times are far from it. More often than not, a page that isn't live takes longer to check than one that is, and a page behind a very slow connection may take longer still. An example: in this last run Thread 12 was the last to complete, running 34 rows behind Thread 7, which was itself another 34 rows behind Thread 9.
I guess the best solution would be some form of opportunistic distribution, but I have no idea how to go about that. Essentially what I'm thinking of is passing a row off to a thread as soon as it's not busy, thus having all threads finish at about the same time.
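To make that idea concrete, here is a minimal sketch of one way it could look: every worker pulls its next row from a single shared queue, so a thread stuck on a slow row simply ends up taking fewer rows. This is only an illustration under assumptions. SharedQueueSketch, run and process are made-up names, the rows are plain integers, and process just sleeps as a stand-in for Indexer.indexBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class SharedQueueSketch {

    // Hypothetical stand-in for Indexer.indexBase(row): sleeps to simulate uneven work.
    static void process(int row) throws InterruptedException {
        Thread.sleep(row % 5 == 0 ? 20 : 2);
    }

    // Runs numThreads workers over numRows rows; returns how many rows each worker handled.
    static int[] run(int numRows, int numThreads) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < numRows; i++) queue.add(i);

        int[] handled = new int[numThreads]; // each slot is written by exactly one worker
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            workers.add(new Thread(() -> {
                Integer row;
                // poll() returns null once the queue is empty, which ends the worker
                while ((row = queue.poll()) != null) {
                    try {
                        process(row);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return; // stop early; the row being processed is not retried
                    }
                    handled[id]++;
                }
            }));
        }

        for (Thread worker : workers) worker.start();
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return handled;
    }

    public static void main(String[] args) {
        int[] handled = run(200, 4);
        for (int i = 0; i < handled.length; i++)
            System.out.println("worker " + i + " handled " + handled[i] + " rows");
    }
}
```

The per-worker counts will usually differ from run to run, which is the point: the split follows how long each row actually takes rather than being fixed up front.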
Output stats (status data removed for clarity):
...
Thread-15 Done!
    Total execution time: 21167099ms
    Average execution time: 25813ms
...
Thread-2 Done!
    Total execution time: 21201090ms
    Average execution time: 25823ms
...
Thread-10 Done!
    Total execution time: 21457947ms
    Average execution time: 26168ms
...
Thread-13 Done!
    Total execution time: 21608962ms
    Average execution time: 26352ms
...
Thread-4 Done!
    Total execution time: 21627681ms
    Average execution time: 26343ms
...
Thread-11 Done!
    Total execution time: 21638154ms
    Average execution time: 26387ms
...
Thread-5 Done!
    Total execution time: 21824853ms
    Average execution time: 26583ms
...
Thread-3 Done!
    Total execution time: 21890344ms
    Average execution time: 26663ms
...
Thread-6 Done!
    Total execution time: 21900767ms
    Average execution time: 26675ms
...
Thread-0 Done!
    Total execution time: 21909558ms
    Average execution time: 26686ms
...
Thread-8 Done!
    Total execution time: 21930624ms
    Average execution time: 26712ms
...
Thread-14 Done!
    Total execution time: 22053145ms
    Average execution time: 26894ms
...
Thread-1 Done!
    Total execution time: 22091676ms
    Average execution time: 26908ms
...
Thread-9 Done!
    Total execution time: 22167669ms
    Average execution time: 27033ms
...
Thread-7 Done!
    Total execution time: 22626100ms
    Average execution time: 27559ms
...
Thread-12 Done!
    Total execution time: 23560248ms
    Average execution time: 28732ms
All done!
    Total execution time: 23562053ms
    Average execution time: 1794ms
Update - Revised Code
I took a suggestion from the comments and switched to ExecutorService, and it makes things much easier (after I figured out how it worked and wrote a wrapper)! Also some miscellaneous cleanup and more comments.
Source Code
Output
All done!
    Total execution time: 22415396ms
    Average execution time: 1707ms
As you can see, with the same number of threads the total execution time is lower when ExecutorService hands out the work: 22415396ms against 23562053ms before, roughly 5% less. This really does help things!
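For anyone who hasn't used ExecutorService, the general pattern can be sketched roughly as follows. This is an illustration only, not the wrapper mentioned above: PoolSketch and checkAll are made-up names, and indexBase here is an empty placeholder standing in for Indexer.indexBase. Each row is submitted as its own task, and the fixed-size pool gives the next task to whichever thread becomes free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolSketch {

    // Hypothetical placeholder for Indexer.indexBase(row); does nothing here.
    static void indexBase(Object[] row) {
    }

    // Submits one task per row to a fixed-size pool and returns how many tasks completed.
    static int checkAll(List<Object[]> rows, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicInteger done = new AtomicInteger();

        for (Object[] row : rows) {
            // the pool queues the task and runs it on the next idle thread
            pool.execute(() -> {
                indexBase(row);
                done.incrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            // block until every queued task has finished
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow(); // cancel what is left if the caller is interrupted
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(new Object[] { i });
        System.out.println("rows processed: " + checkAll(rows, 16));
    }
}
```

Compared with splitting the rows into 16 fixed lists, no thread is left idle while another still has a backlog, which is a plausible reason for the lower total time.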