Sign up ×
Stack Overflow is a community of 4.7 million programmers, just like you, helping each other. Join them, it only takes a minute:

How do I convert a string to a byte[] in .NET (C#)?

Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

share|improve this question
6  
Every string is stored as an array of bytes right? Why can't I simply have those bytes? – Agnel Kurian Jan 23 '09 at 14:05
69  
The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case). – Lucas Jones Apr 13 '09 at 14:13
9  
To play devil's advocate: If you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (i.e. CRC32), and NEVER EVER wanted to decode it back into the original string...it isn't straight forward why you'd care about encodings or how you choose which one to use. – Greg Dec 1 '09 at 19:47
36  
Surprised no-one has given this link yet: joelonsoftware.com/articles/Unicode.html – Bevan Jun 29 '10 at 2:57
8  
A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (A words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions -- but I digress). Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was simple one to one: ASCII. However, to accommodate all of human symbology, the 256 permutations of a byte were insufficient and encodings were devised to selectively use more bytes. – George Aug 28 '14 at 15:43

33 Answers 33

A character is both a lookup key into a font table and a lexical tradition such as ordering, upper and lower case versions, etc.

Consequently, a character is not a byte (8-bits) and a byte is not a character. In particular, the 256 permutations of a byte cannot accommodate the thousands of symbols within some written languages, much less all languages. Hence, various methods for encoding characters have been devised. Some encode for a particular class of languages (ASCII encoding); multiple languages using code pages (Extended ASCII); or, ambitiously, all languages by selectively including additional bytes as needed, Unicode.

Within a system, such as the .Net framework, a String implies a particular character encoding. In .Net this encoding is Unicode. Since the framework reads and writes Unicode by default, dealing with character encoding is typically not necessary in .Net.

However, in general, to load a character string into the system from a byte stream you need to know the source encoding to therefore interpret and subsequently translate it correctly (otherwise the codes will be taken as already being in the system's default encoding and thus render gibberish). Similarly, when a string is written to an external source, it will be written in a particular encoding.

share|improve this answer

If you really want a copy of the underlying bytes of a string, you can use a function like the one that follows. However, you shouldn't please read on to find out why.

[DllImport(
        "msvcrt.dll",
        EntryPoint = "memcpy",
        CallingConvention = CallingConvention.Cdecl,
        SetLastError = false)]
private static extern unsafe void* UnsafeMemoryCopy(
    void* destination,
    void* source,
    uint count);

public static byte[] GetUnderlyingBytes(string source)
{
    var length = source.Length * sizeof(char);
    var result = new byte[length];
    unsafe
    {
        fixed (char* firstSourceChar = source)
        fixed (byte* firstDestination = result)
        {
            var firstSource = (byte*)firstSourceChar;
            UnsafeMemoryCopy(
                firstDestination,
                firstSource,
                (uint)length);
        }
    }

    return result;
}

This function will get you a copy of the bytes underlying your string, pretty quickly. You'll get those bytes in whatever way they are encoding on your system. This encoding is almost certainly UTF-16LE but that is an implementation detail you shouldn't have to care about.

It would be safer, simpler and more reliable to just call,

System.Text.Encoding.Unicode.GetBytes()

In all likelihood this will give the same result, is easier to type, and the bytes will always round-trip with a call to

System.Text.Encoding.Unicode.GetString()
share|improve this answer

OP's question: "How do I convert a string to a byte array in .NET (C#)?" [sic]

You can use the following code:

static byte[] ConvertString (string s) {
    return new byte[0];
}

As a benefit, encoding does not matter! Oh wait, this is an ecoding... it's just trivial and highly lossy.

share|improve this answer

protected by Paŭlo Ebermann Jun 27 '13 at 19:25

Thank you for your interest in this question. Because it has attracted low-quality answers, posting an answer now requires 10 reputation on this site.

Would you like to answer one of these unanswered questions instead?

Not the answer you're looking for? Browse other questions tagged or ask your own question.