String compression implementation in C

Question

I implemented basic string compression algorithm that uses the counts of repeated characters. For example: the string aabcccccaaa would become a2b1c5a3. What do you think about this, is there a better way to do this?

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

char* compressString(char* arr, int size);

int main() {

  char arr[] = "aabcccccaa";
  char *str;
  printf("Before compression: %s\n", arr);
  str = compressString(arr, strlen(arr));
  printf("After compression: %s\n", str);
  // free allocated memory
  free(str);
  return 0;
}

char* compressString(char* str, int len) {
  char last = str[0];
  char *buf = (char *)malloc(len * sizeof(char));
  int count = 1;
  int j = 0;

  for (int i = 1; i < len; i++) {
    if (last == str[i]) {
      count++;
    } else {
      buf[j++] = last;
      buf[j++] += count + '0';
      last = str[i];
      count = 1;
    }
  }
  buf[j++] = last;
  buf[j] += count + '0';
  buf[j] = '\0';
  return buf;
}

Thoughts: const correctness, comment on encoding of count (why single character/digit?), look into PackBits to see how to avoid encoding a length "for every source character change". — greybeard, Jan 8 at 5:38
Please don't update your code with changes after you've received answers, see What to do after receiving answers for more information. — Quill, Jan 8 at 6:18
Seeing your edit to revision 5: it is detrimental to have encode run in more than one place. — greybeard, Jan 8 at 6:21

JS1 · Accepted Answer · 2016-01-08 05:50:22Z

up vote 3 down vote accepted

A couple problems

You don't allocate enough space for the return buffer. You need to allocate 2*len + 1 bytes to handle the worst cast scenario. The +1 is for the null terminating byte.
If the count goes above 9, you will output a non-digit character instead of a digit. If the count goes above 256, the digit will wrap around back to '1' and your compression will have failed to encode the original string.

answered Jan 8 at 5:50

JS1

15.5k21559

2

@CodeCrack: perhaps you should get a rough overview of existing compression algorithms before reinventing the wheel in a particularly non-circular shape. For one thing this would show you all the ways that have been invented for storing counts efficiently (variable-length encodings based on bytes, nibbles or bits, Huffman-encoding of counts, arithmetic coding of counts, and so on). Run-length encodings like yours are often found in bitmaps (unsurprisingly called RLE bitmaps); looking at those might give you interesting ideas. Most of your questions have already been answered ages ago... – DarthGizka Jan 8 at 7:01

add a comment |

asked	3 months ago
viewed	79 times
active	3 months ago

current community

your communities

more stack exchange communities

String compression implementation in C

1 Answer 1

A couple problems

Your Answer

Not the answer you're looking for? Browse other questions tagged algorithm c strings compression or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

String compression implementation in C

1 Answer 1

A couple problems

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged algorithm c strings compression or ask your own question.

Linked

Related

Hot Network Questions