My goal is to iterate over the rows of a data matrix in R, calculate the mean and the standard error of the mean (SEM) for each of the 2 groups in the matrix, and draw a bar plot comparing the 2 groups.
My code below works as a for loop. However, my initial goal was to output all 8 plots in one .png file. To that end, I posted the question here.
One of the comments suggested using 'apply' with a 'function'. I have never written R code using apply or my own functions.
Here's my input file:
TranscriptID GeneID Biotype TranscriptName CommonNAme GeneName TSS-ID Locus-ID DNp63D-DMECs-1 DNp63D-DMECs-2 DNp63D-DMECs-3 DNp63WTMECs-1 DNp63WTMECs-2 Fold 2-tailedtest
Test1 TestA protein_coding Fun1 Ex1 Ex1 ExA1 ExA1 1.15E-08 2.68E-12 0.005077929 4.99E-07 6.38E-08 6.02E+03 0.495089687
Test2 TestB protein_coding Fun2 Ex2 Ex2 ExA2 ExA2 3.69E-08 0.014129129 0.075213367 0.121370367 0.404553833 1.13E-01 0.123434776
Test3 TestC protein_coding Fun3 Ex3 Ex3 ExA3 ExA3 4.89E-05 0 0 6.58E-05 1.64E-34 4.96E-01 0.643007583
Test4 TestA protein_coding Fun4 Ex4 Ex4 ExA4 ExA4 0.058629449 0 0 0.056200966 0.253314667 1.26E-01 0.180082201
Test5 TestB protein_coding Fun5 Ex5 Ex5 ExA5 ExA5 7.80E-06 0 0 1.42E-11 4.20E-36 3.66E+05 0.495026427
Test6 TestC protein_coding Fun6 Ex6 Ex6 ExA6 ExA6 0 0 0 0 2.41E-101 0.00E+00 0.272228401
Test7 TestA protein_coding Fun7 Ex7 Ex7 ExA7 ExA7 3.77E-08 0.023945749 0.077103517 0.262936167 0.2940195 1.21E-01 0.004479038
Test8 TestB protein_coding Fun8 Ex8 Ex8 ExA8 ExA8 9.30E-09 4.82E-14 0.000827853 8.19E-07 7.47E-07 3.52E+02 0.496141526
Here is my for loop code:
library(ggplot2)

input <- read.delim(file="MECs-DNp63IsoformLevels.txt", header=TRUE, sep="\t")
input <- as.matrix(input)  # mixed column types give a character matrix, hence as.numeric() below
theme_set(theme_gray(base_size = 20))

for (i in seq_len(nrow(input))) {
  # columns 12:13 hold the WT replicates, columns 9:11 the DNp63D-D replicates
  mean1 <- mean(as.numeric(input[i,12:13]))
  mean2 <- mean(as.numeric(input[i,9:11]))
  sd1 <- sd(as.numeric(input[i,12:13]))
  sd2 <- sd(as.numeric(input[i,9:11]))
  # SEM = sd / sqrt(n); each sd is paired with the replicate count of its own group
  sem1 <- sd1/sqrt(length(input[i,12:13]))
  sem2 <- sd2/sqrt(length(input[i,9:11]))
  mean_sem <- data.frame(mean=c(mean1, mean2), sem=c(sem1, sem2), group=c("WT", "DNp63D-D"))
  mean_sem$group <- factor(mean_sem$group, levels=mean_sem$group, ordered=TRUE) # this prevents ggplot from ordering the x-axis alphabetically and keeps the order of the input data frame
  print(i)
  p <- ggplot(mean_sem, aes(x=group, y=mean)) +
    geom_bar(stat='identity', width=.3, colour="black", fill=c("blue", "red")) +
    geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),
                  width=.2) +
    geom_line(aes(colour=group)) +
    scale_colour_manual(values=c("blue", "red")) +
    xlab('Genotype of MECs') +
    ylab('Quantile Norm FPKM')
  q <- p + ggtitle(input[i,5])  # title each plot with the CommonNAme column
  ggsave(filename=paste(input[i,5],'.png', sep=""), plot=q)
}
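As a rough sketch of the same mean/SEM computation without an explicit loop, the snippet below uses rowMeans() and apply() on a small numeric matrix. The values are copied from rows Test2 and Test7 of the input above; the helper name sem and the variables expr, wt_cols, mut_cols and summary_stats are made up for illustration and are not part of the original code.

```r
# Expression values copied from rows Test2 and Test7 of the input table.
expr <- rbind(
  Test2 = c(3.69e-08, 0.014129129, 0.075213367, 0.121370367, 0.404553833),
  Test7 = c(3.77e-08, 0.023945749, 0.077103517, 0.262936167, 0.2940195)
)
colnames(expr) <- c("DNp63D-DMECs-1", "DNp63D-DMECs-2", "DNp63D-DMECs-3",
                    "DNp63WTMECs-1", "DNp63WTMECs-2")

# Hypothetical helper: standard error of the mean of a numeric vector.
sem <- function(x) sd(x) / sqrt(length(x))

mut_cols <- 1:3  # DNp63D-D replicates
wt_cols  <- 4:5  # WT replicates

# One row per transcript, with mean and SEM for each group.
summary_stats <- data.frame(
  id       = rownames(expr),
  mean_wt  = rowMeans(expr[, wt_cols]),
  sem_wt   = apply(expr[, wt_cols], 1, sem),
  mean_mut = rowMeans(expr[, mut_cols]),
  sem_mut  = apply(expr[, mut_cols], 1, sem)
)
print(summary_stats)
```

Keeping the SEM in one small function means each standard deviation is always divided by the replicate count of its own group.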
Here is my code with the apply function:
library(ggplot2)

input <- read.delim(file="MECs-DNp63IsoformLevels.txt", header=TRUE, sep="\t")
input <- as.matrix(input)
# apply() over rows (MARGIN = 1): the function receives one row as a character vector
invisible(apply(input, 1, function(row) {
  mean1 <- mean(as.numeric(row[12:13]))  # WT replicates
  mean2 <- mean(as.numeric(row[9:11]))   # DNp63D-D replicates
  sd1 <- sd(as.numeric(row[12:13]))
  sd2 <- sd(as.numeric(row[9:11]))
  sem1 <- sd1/sqrt(length(row[12:13]))
  sem2 <- sd2/sqrt(length(row[9:11]))
  # means go in the 'mean' column and SEMs in the 'sem' column, in the same order as 'group'
  mean_sem <- data.frame(mean=c(mean1, mean2), sem=c(sem1, sem2), group=c("WT", "DNp63D-D"))
  p <- ggplot(mean_sem, aes(x=group, y=mean)) +
    geom_bar(stat='identity', width=.3, colour="black", fill=c("blue", "red")) +
    geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), width=.2) +
    geom_line(aes(colour=group)) +
    scale_colour_manual(values=c("blue", "red")) +
    xlab('Genotype of MECs') +
    ylab('Quantile Norm FPKM')
  q <- p + ggtitle(row[1])
  ggsave(filename=paste(row[1],'.png', sep=""), plot=q) # plots each figure and names the file after column 1
  ## I need it to plot all 8 figures in 1 png file
}))
Can my for loop or function code be optimized? This is the first function/apply code I have written in R, and the syntax was mostly trial and error: I saw some people using 'c' and others using several nested {} blocks in each function.
[...] ## I need it to plot each 8 figures in 1 png file ? Can you provide 16 lines of data so we see which column can be used to group rows 8 by 8? – flodel Feb 3 at 12:29
[...] *apply, you can use facet_wrap when plotting afterwards, which would likely give you what you're looking for. – alistaire Feb 10 at 17:22
[...] *apply function, you'll have to use <<- to break out of its environment so you can append your computations to something; with a for loop, you can just use <-. Note that pre-allocating a data structure of the correct dimensions can speed up your loop, if necessary. You could probably also refactor your code with dplyr, which is usually convenient for these kinds of operations. – alistaire Feb 10 at 21:24
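The facet_wrap suggestion from the comments could look roughly like the sketch below: compute the summary statistics for all rows first, stack them into one long data frame, and let ggplot2 draw one panel per transcript so that a single ggsave() call writes everything to one .png. The values come from rows Test2, Test4 and Test7 of the input above; the names expr, groups, plot_data, sem and the output file name all_transcripts.png are assumptions made for illustration. The plotting part is wrapped in a requireNamespace() check so the data preparation still runs where ggplot2 is not installed.

```r
# Expression values copied from rows Test2, Test4 and Test7 of the input table.
expr <- rbind(
  Test2 = c(3.69e-08, 0.014129129, 0.075213367, 0.121370367, 0.404553833),
  Test4 = c(0.058629449, 0, 0, 0.056200966, 0.253314667),
  Test7 = c(3.77e-08, 0.023945749, 0.077103517, 0.262936167, 0.2940195)
)

# Hypothetical helper: standard error of the mean.
sem <- function(x) sd(x) / sqrt(length(x))

# Column positions of each group within expr (WT listed first to fix the x-axis order).
groups <- list("WT" = 4:5, "DNp63D-D" = 1:3)

# Long format: one row per (transcript, group) pair.
plot_data <- do.call(rbind, lapply(names(groups), function(g) {
  vals <- expr[, groups[[g]], drop = FALSE]
  data.frame(transcript = rownames(expr),
             group      = g,
             mean       = rowMeans(vals),
             sem        = apply(vals, 1, sem),
             row.names  = NULL)
}))
plot_data$group <- factor(plot_data$group, levels = names(groups))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_data, aes(x = group, y = mean, fill = group)) +
    geom_col(width = 0.3, colour = "black") +
    geom_errorbar(aes(ymin = mean - sem, ymax = mean + sem), width = 0.2) +
    scale_fill_manual(values = c("WT" = "blue", "DNp63D-D" = "red")) +
    facet_wrap(~ transcript, scales = "free_y") +  # one panel per transcript
    xlab("Genotype of MECs") +
    ylab("Quantile Norm FPKM")
  # A single file containing every panel.
  ggsave("all_transcripts.png", plot = p, width = 10, height = 6)
}
```

Because the statistics are collected into one data frame before plotting, nothing has to be appended from inside a function, so neither <<- nor a pre-allocated structure is needed in this variant.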