This script fetches a list of "end positions" from a MySQL database. Each position is a row index that is the same in the MySQL table and in the R data frame.
Next, for each end position and each metric, the script takes the 34 elements up to and including that position and calculates their correlation with a query series.
The X rows with the highest total correlation (summed over the metrics) and the X rows with the lowest total correlation are saved.
Finally, the initial MySQL list is trimmed to contain only those saved points.
I am very new to R, and I am pretty sure my implementation is very inefficient. I am aiming for response times under 1 second.
#Executed during initialization:
library(RODBC)
library(stats)
channel<- odbcConnect("data")
c<-mat.or.vec(3000,5) #will hold correlations
n1<-seq(-33,0)
z <- sqlQuery(channel,"SELECT RPos,M1,M2,M3,M4 FROM `data`.`z` ") #Get whole series
al_string <- "select RPos,OpenTime FROM z JOIN actionlist on(OpenTime = pTime)"
trim_string<- "DELETE FROM ActionList WHERE OpenTime NOT IN (SELECT OpenTime FROM ReducedList)"
#This segment is called repeatedly. Each time actionlist contains different positions.
actionlist<- sqlQuery(channel,al_string)
#SIMULATION: (x will be filled by something else in reality)
x <- sqlQuery(channel,"SELECT z.pTime AS pTime,M1,M2,M3,M4 FROM z JOIN (SELECT pTime FROM z ORDER BY rand() limit 1) AS X ON (z.pTime<=X.pTime AND z.pTime> X.pTime-(300*34))")
i<-1
This is the part I am most interested in speeding up:
GetTopN<-function(n)
{
  for(i in 1:nrow(actionlist))
  {
    c[i,1]<-actionlist$OpenTime[i];
    for(j in 2:ncol(z)) c[i,j]<-cor(z[actionlist$RPos[i]+n1,j],x[,j]);
  }
  avc <- (cbind(c[,1],rowSums(c[,2:5])));
  topx <- c[order(avc[,2], decreasing=T),1][1:n];
  bottomx <- c[order(avc[,2], decreasing=F),1][1:n];
  DF<-as.data.frame(c(topx,bottomx),row.names=NULL);
  colnames(DF)[1]<-"OpenTime";
  sqlSave(channel,dat=DF,tablename="ReducedList",append=FALSE,rownames=FALSE,safer=FALSE);
  sqlQuery(channel,trim_string);
}
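One way to remove the nested loops, sketched here on synthetic data rather than the real tables: build every 34-row window at once with outer() and let cor() correlate a whole matrix of windows against each query column. The names window_cor, zm and xm and the random data are assumptions made for illustration; only the z / x / actionlist layout and the n1 offsets come from the script above.

```r
# Synthetic stand-ins for the MySQL tables (assumed shapes, random values).
set.seed(1)
n_rows <- 5000
z <- data.frame(RPos = seq_len(n_rows),
                M1 = rnorm(n_rows), M2 = rnorm(n_rows),
                M3 = rnorm(n_rows), M4 = rnorm(n_rows))
x <- data.frame(pTime = 1:34, z[1001:1034, 2:5])   # 34-row query series
actionlist <- data.frame(RPos = sample(34:n_rows, 300),
                         OpenTime = seq_len(300))
n1 <- seq(-33, 0)

# Hypothetical helper: correlation of every window with the query series,
# one row per end position and one column per metric.
window_cor <- function(zm, xm, rpos, offsets) {
  idx <- outer(offsets, rpos, "+")   # idx[k, i] = row of the k-th element of window i
  out <- matrix(NA_real_, nrow = length(rpos), ncol = ncol(zm))
  for (j in seq_len(ncol(zm))) {
    # Lay the windows out side by side: column i holds window i for metric j.
    w <- matrix(zm[as.vector(idx), j], nrow = length(offsets))
    out[, j] <- cor(w, xm[, j])      # all windows against the query column at once
  }
  out
}

# Numeric matrices are typically cheaper to index than data frames.
zm <- as.matrix(z[, 2:5])
xm <- as.matrix(x[, 2:5])
cors <- window_cor(zm, xm, actionlist$RPos, n1)
total <- rowSums(cors)               # summed correlation per end position
```

Sizing the result to nrow(actionlist) also keeps unused rows of a preallocated 3000-row matrix out of the ranking. Whether this gets under the 1-second target probably depends on the database round trips as much as on the R code, so it is worth timing both.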
1) Don't name a variable c. It's a very important base R function. 2) Now that you've renamed it, don't sort it twice. Sort it once, and then use head and tail. – joran Feb 7 '12 at 4:02
Change c[i,1]<-actionlist$OpenTime[i] to c[,1]<-actionlist$OpenTime and move it out of the loop. – Richie Cotton Feb 7 '12 at 11:33
See ?Rprof to know how you can check which part of your function is slowing everything down. – Joris Meys Feb 7 '12 at 15:31
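The "sort once, then use head and tail" suggestion could look roughly like this. pick_extremes and the example values are hypothetical and not part of the original script; the sketch assumes one score per id and that 2 * n does not exceed the number of ids (otherwise the two ends overlap).

```r
# Hypothetical helper: ids of the n highest and n lowest scores,
# using a single call to order().
pick_extremes <- function(ids, score, n) {
  sorted_ids <- ids[order(score, decreasing = TRUE)]  # one sort only
  c(head(sorted_ids, n), tail(sorted_ids, n))         # top n, then bottom n
}

ids   <- c(10, 20, 30, 40, 50)
score <- c(0.1, 0.9, -0.5, 0.3, -0.2)
picked <- pick_extremes(ids, score, 2)
print(picked)   # 20 40 50 30
```

Note that tail() returns the lowest scores in descending order, whereas the second order() call in the question returned them ascending; for building ReducedList the order should not matter.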