Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am trying to fill values based on group, in my case id. I would like to fill the missing values according to the available date info for each id.

   id  date
1   1 23-04
2   1 23-04
3   1  <NA>
4   1  <NA>
5   2 24-04
6   2  <NA>
7   2  <NA>
8   2  <NA>
9   3 23-04
10  3  <NA>
11  3  <NA>
12  3  <NA>
13  4  <NA>
14  4  <NA>
15  4  <NA>
16  4  <NA>

What I need is:

    id  date
1   1 23-04
2   1 23-04
3   1 23-04
4   1 23-04
5   2 24-04
6   2 24-04
7   2 24-04
8   2 24-04
9   3 23-04
10  3 23-04
11  3 23-04
12  3 23-04
13  4  <NA>
14  4  <NA>
15  4  <NA>
16  4  <NA>

I figured out a loop, but I would like to avoid it because my data has 23 millions rows:

for(i in 2:nrow(dta)){
  if(dta$id[i-1] == dta$id[i])
  {
    dta$date[i] = dta$date[i-1]
  }
}

I cannot figure out how to translate this into dplyr syntax:

dta = structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"
), class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, NA, NA, NA, NA), .Label = c("23-04", 
"24-04"), class = "factor")), .Names = c("id", "date"), row.names = c(NA, 
-16L), class = "data.frame")
share|improve this question

1 Answer 1

up vote 2 down vote accepted

In R this is usually solved using the na.locf (Last Observation Carried Forward) function from the zoo package.

See also here:


# test data
x <- read.table(text="id;date
1;23-04
1;23-04
1;NA
1;NA
2;24-04
2;NA
2;NA
2;NA
3;23-04
3;NA
3;NA
3;NA
4;NA
4;NA
4;NA
4;NA", header=TRUE, sep=";")

library("dplyr")
library("zoo")

x %>% group_by(id) %>% transmute(date=na.locf(date, na.rm=FALSE))
Source: local data frame [16 x 2]
Groups: id

   id  date
1   1 23-04
2   1 23-04
3   1 23-04
4   1 23-04
5   2 24-04
6   2 24-04
7   2 24-04
8   2 24-04
9   3 23-04
10  3 23-04
11  3 23-04
12  3 23-04
13  4    NA
14  4    NA
15  4    NA
16  4    NA

Another option are rolling self-joins supported by the data.table package (see here).

share|improve this answer
    
interesting I will try it on the whole sample –  giacomoV Jul 27 at 15:52

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.