Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

In my data frame I assign values to the column Register by checking another column (Source) for specific values. Based on those values, the output in Register changes.

I am relatively new to R, but I have some coding knowledge. The following code contains lines that are all so similar that I can't help but wonder if it can't be done way simpler with a condition. In JavaScript, for instance, I'd do this with a case-function. I do not know, however, if something similar is possible in R.

d$Register <- "Banana placeholder"
d$Register[grep(".*/comp-a/.*", d$Source, perl=TRUE)] <- "a: spont-conv"
d$Register[grep(".*/comp-b/.*", d$Source, perl=TRUE)] <- "b: interv-ler-nl"
d$Register[grep(".*/comp-(c|d)/.*", d$Source, perl=TRUE)] <- "c/d: telefoon"
d$Register[grep(".*/comp-e/.*", d$Source, perl=TRUE)] <- "e: onderhandeling"
d$Register[grep(".*/comp-f/.*", d$Source, perl=TRUE)] <- "f: interv-radio-tv"
d$Register[grep(".*/comp-g/.*", d$Source, perl=TRUE)] <- "g: debat"
d$Register[grep(".*/comp-h/.*", d$Source, perl=TRUE)] <- "h: les"
d$Register[grep(".*/comp-i/.*", d$Source, perl=TRUE)] <- "i: spont-comm-radio-tv"
d$Register[grep(".*/comp-j/.*", d$Source, perl=TRUE)] <- "j: reportage-radio-tv"
d$Register[grep(".*/comp-k/.*", d$Source, perl=TRUE)] <- "k: nieuws-radio-tv"
d$Register[grep(".*/comp-l/.*", d$Source, perl=TRUE)] <- "l: comm-radio-tv"
d$Register[grep(".*/comp-m/.*", d$Source, perl=TRUE)] <- "m: misviering"
d$Register[grep(".*/comp-n/.*", d$Source, perl=TRUE)] <- "n: college"
d$Register[grep(".*/comp-o/.*", d$Source, perl=TRUE)] <- "o: voorgelezen"
share|improve this question

2 Answers 2

up vote 2 down vote accepted

It would be better to make a list of the parameters, and then use a loop to perform the assignments. That will reduce the duplication of logic, for example in the long grep(...).

params <- list(
  c("/comp-a/", "a: spont-conv"),
  c("/comp-b/", "b: interv-ler-nl"),
  c("/comp-(c|d)/", "c/d: telefoon"),
  c("/comp-e/", "e: onderhandeling"),
  c("/comp-f/", "f: interv-radio-tv"),
  c("/comp-g/", "g: debat"),
  c("/comp-h/", "h: les"),
  c("/comp-i/", "i: spont-comm-radio-tv"),
  c("/comp-j/", "j: reportage-radio-tv"),
  c("/comp-k/", "k: nieuws-radio-tv"),
  c("/comp-l/", "l: comm-radio-tv"),
  c("/comp-m/", "m: misviering"),
  c("/comp-n/", "n: college"),
  c("/comp-o/", "o: voorgelezen")
)

placeholder <- "PLACEHOLDER"
d$Register <- placeholder

for (i in 1:length(params)) {
  pattern.i <- params[[i]][1]
  label.i <- params[[i]][2]
  d$Register[grep(pattern.i, d$Source)] <- label.i
}

if (nrow(d[d$Register == placeholder]) > 0) stop('Register is not assigned in some rows')

Some other minor improvements:

  • Simplified the regular expressions: pattern will match the same things as .*pattern.*
  • No need for the perl = T parameter in grep
share|improve this answer
    
So, if I'm correct, your solution is more resource-friendly (and possibly faster) because it only once has to do a specific operation? –  Bram Vanroy Dec 21 '14 at 20:09
    
Not really. It's not more resource friendly, and not any faster. But it's still better, because it doesn't have duplicated logic. Btw I just noticed a few more possible improvements, see my updated post –  janos Dec 21 '14 at 20:29
    
At first I got an error: Error in $<-.data.frame(*tmp*, "Register", value = c("a: spont-conv", : replacement has 7159 rows, data has 7373, but after assigning a placeholder value to register before running the command it works. E.g. d$Register <- "none". –  Bram Vanroy Dec 22 '14 at 9:18
    
Yes, you still need the placeholder to create the column. I omitted it earlier for brevity. I added it now, with a validation at the end to warn for data where a value was not assigned. –  janos Dec 22 '14 at 9:52
1  
Ah, and now I also know how to throw an error in R! Thanks again! –  Bram Vanroy Dec 22 '14 at 10:06

This is more or less a code review. I am also learning R. There are some parts which you do "by hand", such as dealing with the duplication of topics ("c/d: telefoon"), but I coded a solution where everything is automated. Note however that it makes for a lot more code and you should probably stick to your original method, or @janos's, especially if you only have to build such a table once. But I am posting my solution here nonetheless since you might find something useful.

InvertListMapping <- function(list) {
  # Inverts the mapping relationship of a list: 
  # e.g. { 1 -> "a", 2 -> "a", 3 -> "b" } becomes { "a" -> c(1, 2), "b" -> 3} 
  inverted <- list()
  {
      lapply(names(list), function (key) {
            value <- toString(list[[key]])
            inverted[[value]] <<- c(inverted[[value]], key)
          })
  }
  return(inverted)  
}

ExtractAbbr <- function(groupname) {
  # Extract the source abbreviation: e.g. from "asdf/comp-b/asd" it extracts "b".
  res <- regexpr("/comp-(?<abbr>\\S+)/", groupname, perl=T)
  start <- attr(res, "capture.start")
  if (start < 1)
    stop(paste("Failed to extract the abbreviated newsgroup from:", groupname))
  length <- attr(res, "capture.length")
  return (substr(groupname, start, start + length - 1))
}

ConvertAbbrsToLegend <- function(topic2abbrs) {
  # Generates the map (list) from a set of abbreviations to the corresponding legend;
  # e.g. { a -> "a: spont-conv", c -> "c/d: telefoon", etc.}.
  abbrs2legend <- list()
  lapply(names(topic2abbrs), function (topic) {
        abbrs <- topic2abbrs[[topic]]
        legend <- paste(paste(abbrs, collapse = "/"), ": ", topic, sep = "")
        lapply(abbrs, function (abbr) { abbrs2legend[[abbr]] <<- legend })
      })
  return(abbrs2legend)
}

SourceToLegend <- function(sources, abbrs2legend) {
  # For each source string, it extracts the abbrevation and outputs
  # the corresponding legend.
  GetLegend <- function(source) {
    abbr <- ExtractAbbr(source)
    return(abbrs2legend[[abbr]])
  }
  return(Vectorize(GetLegend)(sources))
}


# Example:

abbr2topic <- list()
abbr2topic$a <- "spont-conv"
abbr2topic$b <- "interv-ler-nl"
abbr2topic$c <- "telefoon"
abbr2topic$d <- "telefoon"
sources <- c("aadf/comp-a/af", "vwef/comp-c/wefw")

topic2abbrs <- InvertListMapping(abbr2topic)
abbrs2legend <- ConvertAbbrsToLegend(topic2abbrs)

legends <- SourceToLegend(sources, abbrs2legend)

print(legends)  # "a: spont-conv"  "c/d: telefoon"

It throws an exception if it fails at some point, but maybe you could change that to have some dummy output value when a source is not recognized.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.