Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I implemented basic string compression algorithm that uses the counts of repeated characters. For example: the string aabcccccaaa would become a2b1c5a3. What do you think about this, is there a better way to do this?

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

char* compressString(char* arr, int size);

int main() {

  char arr[] = "aabcccccaa";
  char *str;
  printf("Before compression: %s\n", arr);
  str = compressString(arr, strlen(arr));
  printf("After compression: %s\n", str);
  // free allocated memory
  free(str);
  return 0;
}

char* compressString(char* str, int len) {
  char last = str[0];
  char *buf = (char *)malloc(len * sizeof(char));
  int count = 1;
  int j = 0;

  for (int i = 1; i < len; i++) {
    if (last == str[i]) {
      count++;
    } else {
      buf[j++] = last;
      buf[j++] += count + '0';
      last = str[i];
      count = 1;
    }
  }
  buf[j++] = last;
  buf[j] += count + '0';
  buf[j] = '\0';
  return buf;
}
share|improve this question
    
Thoughts: const correctness, comment on encoding of count (why single character/digit?), look into PackBits to see how to avoid encoding a length "for every source character change". – greybeard Jan 8 at 5:38
1  
Please don't update your code with changes after you've received answers, see What to do after receiving answers for more information. – Quill Jan 8 at 6:18
    
Seeing your edit to revision 5: it is detrimental to have encode run in more than one place. – greybeard Jan 8 at 6:21
up vote 3 down vote accepted

A couple problems

  • You don't allocate enough space for the return buffer. You need to allocate 2*len + 1 bytes to handle the worst cast scenario. The +1 is for the null terminating byte.

  • If the count goes above 9, you will output a non-digit character instead of a digit. If the count goes above 256, the digit will wrap around back to '1' and your compression will have failed to encode the original string.

share|improve this answer
2  
@CodeCrack: perhaps you should get a rough overview of existing compression algorithms before reinventing the wheel in a particularly non-circular shape. For one thing this would show you all the ways that have been invented for storing counts efficiently (variable-length encodings based on bytes, nibbles or bits, Huffman-encoding of counts, arithmetic coding of counts, and so on). Run-length encodings like yours are often found in bitmaps (unsurprisingly called RLE bitmaps); looking at those might give you interesting ideas. Most of your questions have already been answered ages ago... – DarthGizka Jan 8 at 7:01

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.