I'm having some performance issues (speed of execution) with a very simple task, explained below. My code is to dumb to show here, so i'll explain the problem in detail:
Consider the following sql table, which stores daily rainfall data over ~12 years:
historic data
year, month, day, rainfall_observed
2001, 1, 1, 4
2001, 1, 2, 0
...
2012, 12, 31, 10
Now, consider another table with identical structure, but holding ~1 year of daily forecast data:
forecast
year, month, day, rainfall_forec
2013, 1, 1, 0
2013, 1, 2, 3
...
2013, 12, 31, 15
The problem: for each day of forecast data, count the same (month, day) in historic data, such that rainfall_observed > rainfall_forec.
Example
Consider the forecast entry
year = 2013, month = 3, day = 15, rainfall_forec = 10
Then i need to count in historic data every entry with month = 3 AND day = 15 WHERE rainfall observed > 10 (easy with SQL). So, my output is a list with a count for each forecast day.
There's a lot way of doing this:
1 - Use a SQL query using inside a for loop (1 query per forecast day = 365 queries in a loop - i didn't test it yet)
2 - Query all the observed data and use nested loops for comparison (~ 365 * 12 * 365 comparisons...)
3 - Do some mix of the above or some math with Dates (but here missing data is a issue)
I'm currently using option 2, but i need it to be faster. I'll try option 1, but maybe i'm missing a simpler solution.
So, i appreciate any advice on how to solve this. Thanks!