Algorithm for correcting missing values

Question

I have problem with setting missing values in data frame. In the first 3 columns there are ID of product, ID of store, and number of week. There are also 28 columns from 4 to 31 corresponding to last 28 days of selling item (last 7 days are days in our week). I want to set the missing values by comparing two records with the same first and second column but different number of weeks.

corrections <- function(x,y){
  #the functions changes vector y if the difference between weeks is not greeter than 3
  if (x[1]==y[1] && x[2]==y[2] && -(x[3]-y[3])<=3){
    t=y[3]-x[3]
    t=as.integer(t)
    a=x[(4+ (t*7) ):31]        
    b=y[4:(31- (t*7)) ]
    c= a-b

    for (i in 1:(28-(t*7))){
      if (is.na(c[i]))
      {
        if (!(is.na(a[i]) && is.na(b[i])))
        {
          if (is.na(b[i]))
            b[i]=a[i]
          else
            a[i]=b[i]
        }
      }    
    }
    y[4:(31- t*7)]=b
  }
  return(y)
}


for (i in 2:(dim(salesTraining)[1]) {
  salesTraining[i,]=corrections(salesTraining[i-1,], salesTraining[i,])
}

the loop takes 1 minute for every 1000 records so if my data have 212000 records it will take ~3,5 hours (if it's linear complexity). Is there any error or can I do it better - faster?

Example of data frame:

   productID storeID weekInData dailySales1 dailySales2 dailySales3 dailySales4 dailySales5
1          1       1         37           0           0           0           0           0
2          1       1         38           0           0           0           0           0
3          1       1         39           0           0           0           0           0
4          1       1         40           0          NA           0          NA           2
5          1       1         41          NA           0          NA           0           0
6          1       1         42           0           0           0          NA           0
7          1       1         43           0           0          NA           0          NA
8          1       1         44           0           2           1          NA           0
9          1       1         45          NA           0           0          NA           0
10         1       1         46          NA           0           0          NA          NA
   dailySales6 dailySales7 dailySales8 dailySales9 dailySales10 dailySales11 dailySales12 dailySales13
1           NA          NA           0          NA            0            0            0            0
2            0          NA          NA           0            0            0            0            0
3            0          NA           0           0            0           NA            2           NA
4            0          NA           0          NA            0           NA            0            0
5            0           0          NA           0            0            0            0            0
6           NA           0          NA           0            0            0            0            0
7            0           0           0           2           NA            0            0            0
8            0          NA           0          NA            0           NA            0            1
9            1           0           0           0            0            0            1            0
10           0           0           0          NA            0           NA            0            0
   dailySales14 dailySales15 dailySales16 dailySales17 dailySales18 dailySales19 dailySales20
1             0            0            0            0            0            0            0
2             0            0            0            0            5            2           NA
3             0            0            0            0            0            0            0
4             0            0            0            0            0            0            0
5             0            0            0            0            0            0            0
6             0            0            2            1            0            0           NA
7             0            0            0            0            0            0            1
8             0            0            0            0            0            1            0
9             0            0           -1            0            0            0            0
10            0            0            0            0            0            0            0
   dailySales21 dailySales22 dailySales23 dailySales24 dailySales25 dailySales26 dailySales27
1            NA            0            0            0            5            2            0
2             0            0            0            0            0            0            0
3             0            0            0            0            0            0            0
4             0            0            0            0            0            0            0
5             0            0           NA            1            0            0            0
6             0            0            0            0            0            0            1
7             0            0            0            0            0            1            0
8             0            0           NA            0            0            0            0
9            NA            0            0            0           NA            0            0
10            0            1            0            0            0            0            0
   dailySales28 daysStoreClosed_series daysStoreClosed_target dayOfMonth dayOfYear weekOfYear month
1             0                      5                      2         23       356         51    12
2             0                      6                      2         30       363         52    12
3             0                      6                      1          6         5          1     1
4             0                      6                      1         13        12          2     1
5             0                      6                      1         19        18          3     1
6             0                      5                      1         26        25          4     1
7             0                      4                      1          2        32          5     2
8             0                      4                      1          9        39          6     2
9             0                      4                      1         16        46          7     2
10            0                      4                      1         23        53          8     2
   quarter
1        4
2        4
3        1
4        1
5        1
6        1
7        1
8        1
9        1
10       1

I can help but I'd much prefer help you find a better format for your data at the same time... If each week (row) contains 28 days of sales, won't you have a lot of duplicated data? For example won't dailySales28 on week n match dailySales21 on week n+1? Would it not be better if each row only contained 7 days of data instead? Eventually, I feel a data.frame with daily data would be more appropriate, although it could depend on the type of data analysis you are planning. — flodel, May 16 at 16:25
Are you talking straight hot-deck imputation? Have you tried any of the packages suggested in the SocialSciences TaskView, such as mix, mitools, or even Hmisc? (With something that common -- in surveys for example -- it's very likely already implemented and efficiently so.) — r2evans, Jun 8 at 20:30

asked	3 months ago
viewed	45 times

current community

your communities

more stack exchange communities

Algorithm for correcting missing values

Your Answer

Browse other questions tagged algorithm r or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Algorithm for correcting missing values

Know someone who can answer? Share a link to this question via email, Google+, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged algorithm r or ask your own question.

Related

Hot Network Questions