Converting a string to byte-array without using an encoding (byte-by-byte)

Question

How do I convert a string to a byte[] in .NET (C#)?

Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

Every string is stored as an array of bytes right? Why can't I simply have those bytes? — Agnel Kurian, Jan 23 '09 at 14:05
The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case). — Lucas Jones, Apr 13 '09 at 14:13
To play devil's advocate: if you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. CRC32), and NEVER EVER wanted to decode them back into the original string... it isn't straightforward why you'd care about encodings or how you'd choose which one to use. — Greg, Dec 1 '09 at 19:47
Surprised no-one has given this link yet: joelonsoftware.com/articles/Unicode.html — Bevan, Jun 29 '10 at 2:57
A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions -- but I digress). Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one mapping: ASCII. However, to accommodate all of human symbology, the 256 possible values of a byte were insufficient, and encodings were devised to selectively use more bytes. — George, Aug 28 '14 at 15:43

4 revs · Answer 1 · 2014-08-30 01:15:33Z

A character is both a lookup key into a font table and a carrier of lexical conventions such as ordering and upper- and lower-case forms.

Consequently, a character is not a byte (8 bits) and a byte is not a character. In particular, the 256 possible values of a byte cannot accommodate the thousands of symbols within some written languages, much less all languages. Hence, various methods for encoding characters have been devised. Some cover a particular class of languages (ASCII); some cover multiple languages by switching code pages (Extended ASCII); and Unicode, ambitiously, covers all languages, with encodings such as UTF-8 and UTF-16 using additional bytes only as needed.
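To make the point concrete, here is a minimal sketch showing that one character yields different bytes under different encodings. The class and method names (EncodingDemo, ToHex) are made up for illustration; the Encoding and BitConverter calls are standard .NET.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: the bytes an encoding produces for the text, as hex.
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string text = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE

        Console.WriteLine(ToHex(text, Encoding.UTF8));    // C3-A9
        Console.WriteLine(ToHex(text, Encoding.Unicode)); // E9-00 (UTF-16, little-endian)
        Console.WriteLine(ToHex(text, Encoding.UTF32));   // E9-00-00-00
        Console.WriteLine(ToHex(text, Encoding.ASCII));   // 3F ('?', because ASCII cannot represent it)
    }
}
```

The same character becomes one, two or four bytes depending on the encoding, and ASCII silently loses it. That is why "the bytes of a string" is not well defined until an encoding has been chosen.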

Within a system such as the .NET Framework, a String implies a particular character encoding. In .NET this encoding is Unicode; more precisely, a string is held as a sequence of 16-bit UTF-16 code units. Since the framework handles that representation for you, dealing with character encoding is typically not necessary while text stays inside .NET.

However, in general, to load a character string into the system from a byte stream you need to know the source encoding in order to interpret and translate it correctly. Otherwise the bytes will be read as if they were already in the system's default encoding, and the result will be gibberish. Similarly, when a string is written to an external destination, it is written in a particular encoding.
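A small sketch of what goes wrong when the source encoding is guessed incorrectly. DecodeDemo and Decode are hypothetical names used only for this example; it assumes the "iso-8859-1" encoding name is available, which it is by default in .NET.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: interpret raw bytes under an assumed encoding.
    public static string Decode(byte[] bytes, Encoding assumed) => assumed.GetString(bytes);

    static void Main()
    {
        // "café" written out as UTF-8: 63 61 66 C3 A9
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("caf\u00E9");

        // Correct assumption: the original text comes back.
        string right = Decode(utf8Bytes, Encoding.UTF8);

        // Wrong assumption: each of the two UTF-8 bytes for 'é' is read as
        // its own Latin-1 character, giving "caf" + U+00C3 + U+00A9.
        string wrong = Decode(utf8Bytes, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(right == "caf\u00E9");        // True
        Console.WriteLine(wrong == "caf\u00C3\u00A9");  // True
        Console.WriteLine(wrong.Length);                // 5, one character longer than the original
    }
}
```

Nothing in the byte array itself says which interpretation is right; that information has to come from outside the data.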

Jodrell · Answer 2 · 2014-11-25 10:29:12Z

If you really want a copy of the underlying bytes of a string, you can use a function like the one that follows. However, you shouldn't; please read on to find out why.

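// Requires "using System.Runtime.InteropServices;" and compiling with unsafe code enabled.
// msvcrt.dll is the Windows C runtime, so this import is Windows-only.
[DllImport(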
        "msvcrt.dll",
        EntryPoint = "memcpy",
        CallingConvention = CallingConvention.Cdecl,
        SetLastError = false)]
private static extern unsafe void* UnsafeMemoryCopy(
    void* destination,
    void* source,
    uint count);

public static byte[] GetUnderlyingBytes(string source)
{
    var length = source.Length * sizeof(char);
    var result = new byte[length];
    unsafe
    {
        fixed (char* firstSourceChar = source)
        fixed (byte* firstDestination = result)
        {
            var firstSource = (byte*)firstSourceChar;
            UnsafeMemoryCopy(
                firstDestination,
                firstSource,
                (uint)length);
        }
    }

    return result;
}

This function will get you a copy of the bytes underlying your string, pretty quickly. You'll get those bytes in whatever way they are encoded on your system. This encoding is almost certainly UTF-16LE, but that is an implementation detail you shouldn't have to care about.

It would be safer, simpler and more reliable to just call

System.Text.Encoding.Unicode.GetBytes()

In all likelihood this will give the same result, it is easier to type, and for any well-formed string the bytes will round-trip with a call to

System.Text.Encoding.Unicode.GetString()
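As a rough illustration of that round trip, the sketch below encodes and decodes a string with Encoding.Unicode and compares the result with a plain copy of the in-memory chars. RoundTripDemo, RawCopy and RoundTrip are hypothetical names; RawCopy uses ToCharArray and Buffer.BlockCopy as a safe stand-in for the memcpy version, not as a reproduction of it.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: copy the string's 16-bit code units without unsafe code.
    public static byte[] RawCopy(string source)
    {
        char[] chars = source.ToCharArray();
        byte[] bytes = new byte[chars.Length * sizeof(char)];
        Buffer.BlockCopy(chars, 0, bytes, 0, bytes.Length); // last argument is a byte count
        return bytes;
    }

    // Encode to UTF-16LE bytes and decode them again.
    public static string RoundTrip(string source)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(source);
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        string text = "na\u00EFve \u20AC";

        // Well-formed text survives the round trip unchanged.
        Console.WriteLine(RoundTrip(text) == text); // True

        // On a little-endian machine the raw copy matches the encoded bytes.
        bool same = RawCopy(text).SequenceEqual(Encoding.Unicode.GetBytes(text));
        Console.WriteLine(same || !BitConverter.IsLittleEndian); // True

        // A lone surrogate is not well-formed UTF-16. With the default
        // replacement fallback it is encoded as U+FFFD, so this is expected
        // to print False.
        string malformed = "\uD800";
        Console.WriteLine(RoundTrip(malformed) == malformed);
    }
}
```

The raw copy depends on the machine's byte order and on the runtime's internal layout, while Encoding.Unicode states the byte order explicitly. The trade-off is that the encoder normalises malformed input rather than copying it bit for bit.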

Thomas Eding · Answer 3 · 2013-09-27 23:26:41Z

OP's question: "How do I convert a string to a byte array in .NET (C#)?" [sic]

You can use the following code:

static byte[] ConvertString (string s) {
    return new byte[0];
}

As a benefit, encoding does not matter! Oh wait, this is an encoding... it's just trivial and highly lossy.

asked	6 years ago
viewed	870534 times
active	23 days ago
