I'm writing code whose task is to grow Random Forest models for many combinations of parameters. In short:
- First, I declare a data frame in which the model parameters and some statistics will be saved.
- Second, I declare the model parameters and a loop counter (printed after every iteration).
- Next, I have three nested loops containing the model and the prediction function.
- Then the parameters and some statistics from the confusion matrix are saved to the data frame.
- Additionally, the iteration number is printed and incremented.
- Finally, the garbage collector is called.
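The three nested loops only enumerate every combination of `ntree`, class-0 sample size and `mtry`. A minimal sketch, using base R's `expand.grid()`, of building that combination table up front so that a single loop (or `lapply()`) can walk its rows; `n_ones` is a hypothetical stand-in for `sum(train.target)`.

```r
## Hypothetical stand-in for sum(train.target): number of positive cases
n_ones <- 200

## One row per parameter combination (2 x 5 x 6 = 60 rows)
param_grid <- expand.grid(
  ntree = c(300, 500),
  zeros = n_ones * 1:5,
  mtry  = c(30, 50, 70, 90, 110, 130)
)

nrow(param_grid)  # 60
head(param_grid, 3)
```

`expand.grid()` varies its first column fastest, so this order differs from the original loops (where `mtry` changed fastest); list `mtry` first if the order matters.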
The code looks like this:
```r
library(randomForest)

## data frame in which model parameters and some stats will be saved
## (check.names = FALSE keeps names such as "0_0" instead of "X0_0")
model_eff <- data.frame("ntrees" = numeric(0),
                        "zeros" = numeric(0),
                        "mvars" = numeric(0),
                        "eff" = numeric(0),
                        "0_0" = numeric(0),
                        "0_1" = numeric(0),
                        "1_0" = numeric(0),
                        "1_1" = numeric(0),
                        "predict_sum" = numeric(0),
                        "triangle" = numeric(0),
                        check.names = FALSE)
## parameters
ntrees <- c(300, 500)
zeros <- sum(train.target) * c(1, 2, 3, 4, 5)
mvars <- c(30, 50, 70, 90, 110, 130)
## loop counter
i <- 1
## loop with model, prediction etc.
for (j in seq_along(ntrees)) {
  for (k in seq_along(zeros)) {
    for (l in seq_along(mvars)) {
      ## i-th model
      model <- randomForest(train,
                            y = as.factor(train.target),
                            ntree = ntrees[j],
                            do.trace = TRUE,
                            sampsize = c('0' = zeros[k], '1' = sum(train.target)),
                            mtry = mvars[l])
      ## prediction - my function; apart from a regular prediction it
      ## outputs additional info (presumably eff_measures and TARGET3, used below)
      predict.model(model, val, val.target)
      ## inserting model parameters and stats into the data frame for further
      ## comparisons (a one-row data frame so that columns are matched by name)
      model_eff <- rbind(model_eff,
                         data.frame("ntrees" = ntrees[j],
                                    "zeros" = zeros[k],
                                    "mvars" = mvars[l],
                                    "eff" = eff_measures$eff,
                                    "0_0" = eff_measures$c.m[1, 1],
                                    "0_1" = eff_measures$c.m[1, 2],
                                    "1_0" = eff_measures$c.m[2, 1],
                                    "1_1" = eff_measures$c.m[2, 2],
                                    "predict_sum" = sum(TARGET3),
                                    "triangle" = eff_measures$triangle,
                                    check.names = FALSE))
      ## printing the iteration number
      cat("iteration =", i, "\n")
      i <- i + 1
      ## calling the garbage collector to free RAM
      gc()
    }
  }
}
```
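A minimal sketch of the same workflow restructured around a single loop over a parameter grid, with the results table preallocated rather than grown with `rbind()` (which copies the whole data frame on every iteration). The loop itself is cheap compared with fitting a forest, so the running time is still largely driven by `ntree`, `mtry` and the sample sizes. Assumptions: the data are synthetic stand-ins for `train`, `val` and their targets, the grid is shrunk so it runs quickly, accuracy replaces the author's `eff` measure, `predict_sum` is taken to be the number of predicted 1s, and `triangle` is omitted because it comes from the author's own `predict.model()`.

```r
library(randomForest)

## Hypothetical synthetic data standing in for train/val and their targets
set.seed(1)
n <- 600; p <- 20
train <- as.data.frame(matrix(rnorm(n * p), nrow = n))
train.target <- rbinom(n, 1, plogis(train$V1 - 1.5))
val <- as.data.frame(matrix(rnorm(n * p), nrow = n))
val.target <- rbinom(n, 1, plogis(val$V1 - 1.5))

n_ones <- sum(train.target)
y_train <- factor(train.target, levels = c("0", "1"))

## All parameter combinations (a smaller grid than the original, for speed)
param_grid <- expand.grid(mtry = c(2, 4), zeros = n_ones * 1:2, ntree = c(50, 100))

## Preallocate the results table instead of growing it with rbind()
results <- param_grid
results[c("eff", "c00", "c01", "c10", "c11", "predict_sum")] <- NA_real_

for (r in seq_len(nrow(param_grid))) {
  model <- randomForest(train, y = y_train,
                        ntree = param_grid$ntree[r],
                        mtry = param_grid$mtry[r],
                        sampsize = c("0" = param_grid$zeros[r], "1" = n_ones))
  pred <- predict(model, val)
  ## Confusion matrix with both classes always present
  cm <- table(actual = factor(val.target, levels = c("0", "1")),
              predicted = factor(pred, levels = c("0", "1")))
  results$eff[r] <- sum(diag(cm)) / sum(cm)  # accuracy, a stand-in for eff
  results[r, c("c00", "c01", "c10", "c11")] <- c(cm[1, 1], cm[1, 2], cm[2, 1], cm[2, 2])
  results$predict_sum[r] <- sum(pred == "1")  # number of predicted 1s
  cat("iteration =", r, "\n")
}

print(results)
```

Calling `gc()` explicitly inside the loop is usually unnecessary, since R collects garbage on its own when memory is needed, and `do.trace = TRUE` prints one line per tree, so it can be left out when timing the code.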
I have already separated the predictors from the target variables in both the training and validation sets (passing them to `randomForest()` as `x` and `y`), knowing that Random Forest handles data in that form more efficiently. I also tried the "foreach" package to parallelize the computations; however, with all cores in use, growing just one tree took 10-15% longer than without parallelization.
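One possible explanation for that slowdown is that parallelizing inside a single, already fast fit adds scheduling and data-copying overhead; a coarser split is to send whole parameter combinations to different workers. A minimal sketch with base R's `parallel` package (a PSOCK cluster, so it also works on Windows); the synthetic data, the hypothetical helper `fit_one()`, the two-worker setting and the smaller grid are all assumptions, and accuracy only stands in for the author's `eff` measure.

```r
library(parallel)

## Hypothetical synthetic data standing in for train/val and their targets
set.seed(1)
n <- 600; p <- 20
train <- as.data.frame(matrix(rnorm(n * p), nrow = n))
train.target <- rbinom(n, 1, plogis(train$V1 - 1.5))
val <- as.data.frame(matrix(rnorm(n * p), nrow = n))
val.target <- rbinom(n, 1, plogis(val$V1 - 1.5))

n_ones <- sum(train.target)
param_grid <- expand.grid(mtry = c(2, 4), zeros = n_ones * 1:2, ntree = c(50, 100))

## Hypothetical helper: fit and score row r of the grid on a worker
fit_one <- function(r) {
  model <- randomForest::randomForest(
    train, y = factor(train.target, levels = c("0", "1")),
    ntree = param_grid$ntree[r], mtry = param_grid$mtry[r],
    sampsize = c("0" = param_grid$zeros[r], "1" = n_ones))
  pred <- predict(model, val)
  data.frame(param_grid[r, ], accuracy = mean(pred == as.character(val.target)))
}

## Assumption: 2 workers; set this to the number of cores you want to use
cl <- makeCluster(2)
clusterExport(cl, c("train", "train.target", "val", "val.target",
                    "param_grid", "n_ones"))
clusterSetRNGStream(cl, 123)  # reproducible random streams on the workers
res_list <- tryCatch(parLapply(cl, seq_len(nrow(param_grid)), fit_one),
                     finally = stopCluster(cl))

results <- do.call(rbind, res_list)
print(results)
```

Each worker holds its own copy of the exported data, so with large training sets the number of workers may be limited by RAM rather than by the number of cores.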
I would like to know whether I can shorten the execution time of this code, and in particular whether there is a way to avoid the nested loops, since I have heard that loops are not the best way of programming in R.