Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Given a string of length n, print all permutation of the given string.

The code bellow generates all permutations (with repetitions) for the given string and stores them in object res. Could you suggest ways to make the code faster (or a completely different algorithm written in R)? (For the "ABRACADABRA" word it takes more than 9 minutes to complete)

#############################################################
word <- "ABRACADABRA"
#############################################################

vec <- unlist(strsplit(word,""))
charTable <- sort(table(vec),decreasing = TRUE)
totChars  <- sum(charTable)

combTable <-choose(c(totChars,totChars-cumsum(charTable)),charTable)
combTable <-combTable[combTable>0]
names(combTable) <- names(charTable)
totCombs <- prod(combTable)

# initialize res
res <- data.frame(matrix(rep(NA,totCombs*totChars),ncol=totChars))


for(i in 1:length(combTable)){
  totRows <- prod(combTable[1:i])
  copyRows <- prod(combTable[1:(i-1)])

  if (totRows>copyRows){ 
    tmp <- res[1:copyRows,]
    for(j in 1:(combTable[i]-1)){
      res[(j*copyRows+1):((j+1)*copyRows),] <- tmp
    }
  }

  tmpCombs <- t(combn(ifelse(i>1,totChars-cumsum(charTable)[i-1],totChars), charTable[i]))
  letr <- names(charTable)[i]

  if(i==1){
    for(k in 1:totRows){
      res[k,tmpCombs[k,]] <- letr
    }
  } else if (i>1 & totRows==copyRows){
    for(k in 1:totRows){
      res[k,is.na(res[k,])][tmpCombs[1,]] <- letr
    }
  } else{
    grpRows <- prod(combTable[1:(i-1)])

    for(k in 1 : combTable[i]){ 
      for(m in ((k-1)*grpRows+1):((k)*grpRows)){ 
        res[m,is.na(res[m,])][tmpCombs[k,]] <- letr
      }
    }
  }
}
share|improve this question
    
Should the algorithm take advantage of repetitions at the cost of code simplicity? Can we assume a limit on the input size, e.g. will the output fit in memory? – flodel Apr 8 at 23:15
    
My aim is to produce exactly the number of results, without having to generate first the full factorial and then filter the duplicates, to avoid running out of memory on the first step. – George Dontas Apr 9 at 5:10
up vote 3 down vote accepted

This is one of many instances where people would be tempted to say, "see, loops in R are slow". And my response is, "loops aren't that slow, but you can put slow code in loops."

There's no reason res needs to be a data.frame. Leave it as a matrix and your code runs in under a second on my laptop.

# initialize res
res <- matrix(rep(NA,totCombs*totChars),ncol=totChars)

The lesson here is:

If you don't need the multi-typed columns of a data.frame, use a matrix.


You might look into factoradics as another possible algorithm. That would allow you to generate permutations in lexicographic order.

share|improve this answer
    
Thank you for the lesson learned! – George Dontas Apr 10 at 5:56

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.