
Possible Duplicate: Converting byte array to string and back again in C#

I am using Huffman coding to compress and decompress some text, based on the code from here

The code there builds a Huffman tree and uses it for encoding and decoding. Everything works fine when I use the code directly.

In my situation, I need to get the compressed content, store it, and decompress it whenever needed.

The output of the encoder and the input to the decoder are both BitArray.

When I convert this BitArray to a String, convert it back to a BitArray, and decode it using the following code, I get a weird answer.

// Read the input first, so the tree can be built from it
string input = Console.ReadLine();

Tree huffmanTree = new Tree();
huffmanTree.Build(input);

BitArray encoded = huffmanTree.Encode(input);

// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
    Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();

// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);

// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);

// Convert string back to bytes
e = Encoding.UTF8.GetBytes(output);

// Convert bytes back to bit array
BitArray todecode = new BitArray(e);

string decoded = huffmanTree.Decode(todecode);

Console.WriteLine("Decoded: " + decoded);

Console.ReadLine();

The output of the original code from the tutorial is:

[image: console output of the original tutorial code]

The output of my code is:

[image: console output of my code]

Where am I going wrong? Thanks in advance.

Keep in mind that C# strings are UTF-16. –  antonijn Feb 3 '13 at 8:34
 
I tried ASCII and got the same weird answer. I tried Unicode (UTF-16), but that gives an answer that is half right and half junk. For example, Welcome gives WelcomESTT –  Gopikrishna S Feb 3 '13 at 8:38

2 Answers

You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.

string output = Encoding.UTF8.GetString(e);

e is just binary garbage at this point; it is not a UTF-8 string. So calling UTF-8 methods on it does not make sense.

Solution: Don't convert and back-convert to/from string. This does not round-trip. Why are you doing that in the first place? If you need a string use a round-trippable format like base-64 or base-85.
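If a string really is required, the base-64 suggestion could look roughly like the sketch below. The class and method names (BitArrayText, ToText, FromText) and the "bitcount:base64" layout are assumptions made for illustration, not part of the tutorial code. The bit count is stored because new BitArray(byte[]) always yields a multiple of 8 bits, and the padding bits would otherwise be decoded as extra symbols.

```csharp
using System;
using System.Collections;

public static class BitArrayText
{
    // Pack the bits into bytes and render them as Base64 text.
    // The bit count is prefixed because the last byte may be only partly used.
    public static string ToText(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bits.Length + ":" + Convert.ToBase64String(bytes);
    }

    // Reverse of ToText: parse the bit count, decode the Base64 payload,
    // then trim the BitArray back to the original number of bits.
    public static BitArray FromText(string text)
    {
        int sep = text.IndexOf(':');
        int bitCount = int.Parse(text.Substring(0, sep));
        byte[] bytes = Convert.FromBase64String(text.Substring(sep + 1));
        BitArray bits = new BitArray(bytes);
        bits.Length = bitCount; // drop the padding bits
        return bits;
    }
}
```

With the question's variables, this would be used as string output = BitArrayText.ToText(encoded); and later BitArray todecode = BitArrayText.FromText(output);.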


I'm pretty sure Encoding doesn't round-trip. That is, you can't convert an arbitrary sequence of bytes to a string, then use the same Encoding to get the bytes back, and always expect them to be the same.
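A small check along these lines can confirm the behaviour. The helper name Utf8RoundTrip.RoundTrips is hypothetical; the sketch relies only on the fact that the default Encoding.UTF8 substitutes a replacement character for byte sequences that are not valid UTF-8, so the original bytes cannot be recovered.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8RoundTrip
{
    // Returns true only if bytes -> string -> bytes gives back the same bytes.
    public static bool RoundTrips(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        byte[] back = Encoding.UTF8.GetBytes(text);
        return data.SequenceEqual(back);
    }

    public static void Main()
    {
        // 0xFF and 0xFE never occur in valid UTF-8, so they are replaced on decoding.
        Console.WriteLine(RoundTrips(new byte[] { 0xFF, 0xFE }));
        // Bytes that came from real text survive the round trip.
        Console.WriteLine(RoundTrips(Encoding.UTF8.GetBytes("Welcome")));
    }
}
```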

If you want to be able to roundtrip from your raw bytes to string and back to the same raw bytes, you'd need to use base64 encoding e.g.

http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx

But base64 produces output 4/3 the size of the input instead of compressing it. I need compression, not encoding (source: wikipedia.org) –  Gopikrishna S Feb 3 '13 at 9:29
 
@GopikrishnaS Then you need to use byte[], not string. string is for character data, not binary. –  svick Feb 3 '13 at 11:01
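Storing the compressed data as raw bytes, as this comment suggests, might look like the following sketch. The BitArrayFile class and its file layout (a 4-byte bit count followed by the packed bytes) are assumptions for illustration only. Decoding later would also need the same Huffman tree to be available, which this sketch does not cover.

```csharp
using System;
using System.Collections;
using System.IO;

public static class BitArrayFile
{
    // Write the bit count, then the packed bytes, with no string conversion.
    public static void Save(string path, BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(bits.Length);
            writer.Write(bytes);
        }
    }

    // Read the bit count and bytes back, trimming the padding bits.
    public static BitArray Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int bitCount = reader.ReadInt32();
            byte[] bytes = reader.ReadBytes((bitCount + 7) / 8);
            return new BitArray(bytes) { Length = bitCount };
        }
    }
}
```

This keeps the stored size at roughly one byte per 8 encoded bits plus a 4-byte header, so the compression gain is not lost to a text encoding.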
