Converting a string to byte-array without using an encoding (byte-by-byte)

Question

How do I convert a string to a byte[] in .NET (C#)?

Also, why should encoding be taken into consideration? Can't I simply get the bytes the string is stored as? Why is there a dependency on character encodings?

Every string is stored as an array of bytes, right? Why can't I simply have those bytes? – Agnel Kurian, Jan 23 '09 at 14:05
The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case). – Lucas Jones, Apr 13 '09 at 14:13
To play devil's advocate: if you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. compute a CRC32), and NEVER EVER wanted to decode them back into the original string... it isn't straightforward why you'd care about encodings or how you'd choose which one to use. – Greg, Dec 1 '09 at 19:47
Surprised no-one has given this link yet: joelonsoftware.com/articles/Unicode.html – Bevan, Jun 29 '10 at 2:57
A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions, but I digress.) Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one mapping: ASCII. However, to accommodate all of human symbology, the 256 possible values of a byte were insufficient, and encodings were devised to selectively use more bytes. – George, Aug 28 '14 at 15:43
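
To make the point in the comments above concrete, here is a minimal sketch showing that the same letter 'A' comes out as different byte sequences depending on the encoding chosen. The class and method names (EncodingComparison, ToHex) are made up for illustration; the Encoding properties and BitConverter.ToString are standard .NET calls.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Returns the bytes of `text` under the given encoding as a hex string such as "41-00".
    // (Hypothetical helper, not part of the original thread.)
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("A", Encoding.ASCII));            // 41
        Console.WriteLine(ToHex("A", Encoding.UTF8));             // 41
        Console.WriteLine(ToHex("A", Encoding.Unicode));          // 41-00 (UTF-16, little-endian)
        Console.WriteLine(ToHex("A", Encoding.BigEndianUnicode)); // 00-41
        Console.WriteLine(ToHex("A", Encoding.UTF32));            // 41-00-00-00
    }
}
```

So "the bytes of the string" is not a single answer: even for one ASCII letter, the result depends on which mapping from characters to bytes is used.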

Zhaph - Ben Duguid · Answer 1 · 2009-01-23 14:03:30Z

You need to take the encoding into account, because 1 character could be represented by 1 or more bytes (up to 4 in UTF-8 as currently standardized; the original UTF-8 design allowed up to 6), and different encodings will map characters to bytes differently.

Joel has a posting on this:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
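
As a rough illustration of "1 or more bytes", the sketch below counts how many bytes UTF-8 needs for a few sample characters. The class and method names (Utf8Widths, Utf8ByteCount) and the sample characters are chosen for this example only; Encoding.UTF8.GetByteCount is the standard .NET call.

```csharp
using System;
using System.Text;

public static class Utf8Widths
{
    // Number of bytes UTF-8 needs to represent `text`.
    // (Hypothetical helper, not part of the original answer.)
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));          // 1
        Console.WriteLine(Utf8ByteCount("\u00E9"));     // 2 (é)
        Console.WriteLine(Utf8ByteCount("\u20AC"));     // 3 (€)
        Console.WriteLine(Utf8ByteCount("\U0001F600")); // 4 (an emoji outside the Basic Multilingual Plane)
        Console.WriteLine("\U0001F600".Length);         // 2 (UTF-16 code units, not characters)
    }
}
```

The last line shows that even inside .NET one character may occupy more than one char, because strings are held as UTF-16 code units.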

"1 character could be represented by 1 or more bytes" I agree. I just want those bytes regardless of what encoding the string is in. The only way a string can be stored in memory is in bytes. Even characters are stored as 1 or more bytes. I merely want to get my hands on them bytes. – Agnel Kurian Jan 23 '09 at 14:07

You don't need the encodings unless you (or someone else) actually intend(s) to interpret the data, instead of treating it as a generic "block of bytes". For things like compression, encryption, etc., worrying about the encoding is meaningless. See my answer for a way to do this without worrying about the encoding. – Mehrdad Apr 30 '12 at 7:54

@Mehrdad - Totally, but the original question, as stated when I initially answered, didn't specify what the OP was going to do with those bytes after converting them, and for future searchers that information is pertinent. This is covered by Joel's answer quite nicely, and as you state within your answer: provided you stick within the .NET world and use your methods to convert to and from, you're happy. As soon as you step outside of that, encoding will matter. – Zhaph - Ben Duguid Apr 30 '12 at 10:48
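
For readers wondering what "just the bytes, no encoding" could look like in practice, here is one possible sketch. It assumes the goal is the string's UTF-16 code units copied verbatim, and it is not necessarily identical to the answer referenced in the comments above. The class and method names (RawStringBytes, GetRawBytes, GetString) are hypothetical; Buffer.BlockCopy and string.ToCharArray are standard .NET calls.

```csharp
using System;

public static class RawStringBytes
{
    // Copies the string's UTF-16 code units into a byte array,
    // in whatever byte order the current machine uses.
    public static byte[] GetRawBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Reverses GetRawBytes. Assumes an even-length array produced by
    // GetRawBytes on a machine with the same byte order.
    public static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }
}
```

On a little-endian machine, and for well-formed strings, this should give the same bytes as Encoding.Unicode.GetBytes, which is why the caveat in the comment above applies: the result is only meaningful to code that knows it is looking at UTF-16 in that byte order.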

cyberbobcat · Answer 2 · 2009-01-23 13:43:58Z

// C# to convert a string to a byte array.
// Note: ASCII covers only U+0000..U+007F; any other character is encoded as '?' (0x3F).
public static byte[] StrToByteArray(string str)
{
    System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);
}


// C# to convert a byte array to a string.
// Bytes above 0x7F are not valid ASCII and will not decode to the original characters.
byte[] dBytes = ...
string str;
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
str = enc.GetString(dBytes);

1) That will lose data due to using ASCII as the encoding. 2) There's no point in creating a new ASCIIEncoding - just use the Encoding.ASCII property. – Jon Skeet Jan 27 '09 at 6:35
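
The data loss mentioned in the comment above can be seen with a short round trip. This is a sketch only: the class and method names (AsciiLoss, RoundTrip) and the sample string are invented for the example, and it relies on the default replacement behavior of Encoding.ASCII.

```csharp
using System;
using System.Text;

public static class AsciiLoss
{
    // Encodes `text` with the given encoding and decodes the bytes again.
    // (Hypothetical helper, not part of the original answer.)
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string original = "caf\u00E9"; // "café"

        // ASCII cannot represent 'é', so it is replaced with '?'.
        Console.WriteLine(RoundTrip(original, Encoding.ASCII));            // caf?

        // UTF-8 can represent every Unicode character, so nothing is lost.
        Console.WriteLine(RoundTrip(original, Encoding.UTF8) == original); // True
    }
}
```

It also uses the shared Encoding.ASCII property rather than constructing a new ASCIIEncoding, as the comment suggests.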

gkrogers · Answer 3 · 2009-01-23 13:43:18Z

byte[] strToByteArray(string str)
{
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    return enc.GetBytes(str);
}

This doesn't always work. Some special characters can get lost when using such a method, as I've found out the hard way. – JB King Jan 23 '09 at 17:14

If the charset were UTF, it wouldn't work! – ahmadali shafiee Sep 18 '12 at 6:27

asked	6 years ago
viewed	886435 times
active	1 month ago
