
How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes into play here.

Also, why should encoding even be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

  • Every string is stored as an array of bytes, right? Why can't I simply have those bytes? Commented Jan 23, 2009 at 14:05
  • The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case); a small sketch after these comments makes the mapping concrete. Commented Apr 13, 2009 at 14:13
  • To play devil's advocate: if you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. CRC32), and NEVER EVER wanted to decode it back into the original string... it isn't straightforward why you'd care about encodings or how you choose which one to use. Commented Dec 1, 2009 at 19:47
  • A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions -- but I digress.) Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one: ASCII. However, to accommodate all of human symbology, the 256 permutations of a byte were insufficient and encodings were devised to selectively use more bytes. Commented Aug 28, 2014 at 15:43
  • Four years later, I stand by my original comment on this question. It's fundamentally flawed because the fact that we're talking about a string implies interpretation. The encoding of that string is an implicit part of the serialized contract, otherwise it's just a bunch of meaningless bits. If you want meaningless bits, why generate them from a string at all? Just write a bunch of 0's and be done with it. Commented Dec 12, 2014 at 22:44
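
To make the mapping the comments describe concrete, here is a minimal, illustrative C# sketch (editorial, not from the thread) showing that the same one-character string produces different byte sequences under different encodings:

using System.Text;

string s = "A";
byte[] ascii = Encoding.ASCII.GetBytes(s);   // { 0x41 }                   -- one byte, value 65
byte[] utf16 = Encoding.Unicode.GetBytes(s); // { 0x41, 0x00 }             -- two bytes (UTF-16 LE)
byte[] utf32 = Encoding.UTF32.GetBytes(s);   // { 0x41, 0x00, 0x00, 0x00 } -- four bytes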

41 Answers

byte[] buffer = UnicodeEncoding.UTF8.GetBytes(something);  // for converting to UTF-8, then getting its bytes

byte[] buffer = ASCIIEncoding.ASCII.GetBytes(something);   // for converting to ASCII, then getting its bytes

Simple code with LINQ:

string s = "abc"
byte[] b = s.Select(e => (byte)e).ToArray();

EDIT: as commented below, it is not a good way.

But you can still use it to understand LINQ with more appropriate code:

string s = "abc";
byte[] b = s.Cast<byte>().ToArray();
  • It's hardly any faster, let alone the fastest. It's certainly an interesting alternative, but it's essentially the same as Encoding.Default.GetBytes(s) which, by the way, is way faster. Quick testing suggests that Encoding.Default.GetBytes(s) performs at least 79% faster. YMMV. Commented Oct 25, 2013 at 4:36
  • Try it with a character that doesn't fit in a single byte. This code will not crash, but it will return a wrong result (which is even worse). Try casting to a short instead of a byte to see the difference. (See the sketch below.) Commented Dec 18, 2013 at 8:57
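
As an illustration of the wrong result mentioned above (an editorial sketch, not from the thread): casting a char to a byte silently keeps only the low 8 bits, whereas a real encoder preserves the information:

using System.Text;

string s = "\u20AC";                       // the euro sign, a char well above the byte range
byte truncated = (byte)s[0];               // 0xAC -- the high bits are silently dropped
byte[] utf8 = Encoding.UTF8.GetBytes(s);   // { 0xE2, 0x82, 0xAC } -- the encoder keeps the character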

Simply use this:

byte[] myByte = System.Text.ASCIIEncoding.Default.GetBytes(myString);
  • ...and lose all characters with a code point higher than 127. In my native language it is perfectly valid to write "Árvíztűrő tükörfúrógép." System.Text.ASCIIEncoding.Default.GetBytes("Árvíztűrő tükörfúrógép.").ToString(); will return "Árvizturo tukörfurogép.", losing information which cannot be retrieved. (And I haven't yet mentioned Asian languages, where you would lose all characters.) See the note below. Commented Jan 11, 2018 at 15:09
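
An editorial note on the two encodings involved (hedged, since behaviour differs by runtime): ASCIIEncoding.Default is just the inherited static Encoding.Default, so the answer above does not actually use ASCII at all -- it uses the system ANSI code page on .NET Framework and UTF-8 on .NET Core / .NET 5+. A true ASCII encoder replaces anything above 127 with '?':

using System.Text;

byte[] a = Encoding.ASCII.GetBytes("Á");    // { 0x3F } -- the character becomes '?'
byte[] d = Encoding.Default.GetBytes("Á");  // platform-dependent: ANSI code page or UTF-8 bytes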

I have written a Visual Basic extension similar to the accepted answer, but directly using .NET memory and Marshalling for conversion, and it supports character ranges unsupported in other methods, like UnicodeEncoding.UTF8.GetString or UnicodeEncoding.UTF32.GetString or even MemoryStream and BinaryFormatter (invalid characters like: 񩱠 & ChrW(55906) & ChrW(55655)):

<Extension> _
Public Function ToBytesMarshal(ByRef str As String) As Byte()
    Dim gch As GCHandle = GCHandle.Alloc(str, GCHandleType.Pinned)
    Dim handle As IntPtr = gch.AddrOfPinnedObject
    ToBytesMarshal = New Byte(str.Length * 2 - 1) {}
    Try
        For i As Integer = 0 To ToBytesMarshal.Length - 1
            ToBytesMarshal.SetValue(Marshal.ReadByte(IntPtr.Add(handle, i)), i)
        Next
    Finally
        gch.Free()
    End Try
End Function

<Extension> _
Public Function ToStringMarshal(ByRef arr As Byte()) As String
    Dim gch As GCHandle = GCHandle.Alloc(arr, GCHandleType.Pinned)
    Try
        ToStringMarshal = Marshal.PtrToStringAuto(gch.AddrOfPinnedObject)
    Finally
        gch.Free()
    End Try
End Function
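
For readers working in C#, a rough equivalent of the idea above (copying the raw UTF-16 code units without interpreting them as text) is sketched below; this is an editorial illustration in the spirit of the accepted answer, not part of the original post:

using System;

static byte[] ToBytesRaw(string str)
{
    // Two bytes per char: copy the in-memory UTF-16 code units as-is.
    var bytes = new byte[str.Length * sizeof(char)];
    Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

static string ToStringRaw(byte[] bytes)
{
    // Reverse the copy; the byte array must have an even length.
    var chars = new char[bytes.Length / sizeof(char)];
    Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}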

A character is both a lookup key into a font table and a lexical tradition such as ordering, upper and lower case versions, etc.

Consequently, a character is not a byte (8 bits) and a byte is not a character. In particular, the 256 permutations of a byte cannot accommodate the thousands of symbols within some written languages, much less all languages. Hence, various methods for encoding characters have been devised. Some encode a particular class of languages (ASCII); some encode multiple languages using code pages (extended ASCII); and some, ambitiously, encode all languages by selectively including additional bytes as needed (Unicode).

Within a system, such as the .NET framework, a String implies a particular character encoding. In .NET this encoding is Unicode. Since the framework reads and writes Unicode by default, dealing with character encoding is typically not necessary in .NET.

However, in general, to load a character string into the system from a byte stream, you need to know the source encoding in order to interpret and translate it correctly (otherwise the bytes will be taken as already being in the system's default encoding and thus render gibberish). Similarly, when a string is written to an external source, it will be written in a particular encoding.
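
A minimal sketch of that last point (editorial, and assuming ISO-8859-1 is available on the runtime in use):

using System.Text;

string original = "Árvíztűrő tükörfúrógép";

byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);                          // written out as UTF-8

string ok  = Encoding.UTF8.GetString(utf8Bytes);                              // round-trips correctly
string bad = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);         // wrong encoding: mojibake, not the original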

  • Unicode is not an encoding. Unicode is an abstract mapping of characters to codepoints. There are multiple ways of encoding Unicode; in particular, UTF-8 and UTF-16 are the most common. .NET uses UTF-16, though I'm unsure if it's UTF-16 LE or UTF-16 BE. Commented Aug 26, 2017 at 3:22
  • UTF-16 LE or UTF-16 BE is not relevant: strings use unbreakable 16-bit code units without any interpretation. UTF-16 BE or UTF-16 LE may become relevant only when you convert strings to byte arrays or the reverse because, at that time, you'll specify an encoding (and in that case the string must first be valid UTF-16, but strings don't have to be valid UTF-16). GetBytes() is not necessarily returning valid UTF-16 BE/LE, it uses simple arithmetic; the returned array is also not valid UTF-8 but arbitrary bytes. The byte order in the result is system-specific if no encoding is specified. Commented Sep 7, 2019 at 16:05
  • This also means that Encoding.UTF8.GetBytes() may throw encoding exceptions for arbitrary strings whose content is not valid UTF-16 (a sketch follows these comments). In C# you have the choice of encoders/decoders (codecs) to use. You may want to use your own codec which will pack/unpack bytes differently, or may silently drop unpaired surrogates (if the codec attempts to interpret the string as UTF-16), or may drop the high bytes, or replace/interpret the code units invalid in UTF-16 by U+FFFD. The codec may also use data compression, or hexadecimal/base64, or escaping... Codecs are not restricted to just the UTF-8 encoding. Commented Sep 7, 2019 at 16:15
  • Note: I use the term "codec" here deliberately instead of "encoding", which is more specific and used only for text. Strings in C#, C, C++, Java, JavaScript/ECMAScript/ActiveScript are NOT restricted to just valid text: they are just a generic storage structure, convenient for text and treated as text by libraries (but not all). As such, the UTF forms are not enforced at all except inside specific APIs using them (including UTF encoding objects). Yes, you can store a binary program or a PNG image in a compact immutable string instead of a mutable array, but you can I/O all strings to text channels. Commented Sep 7, 2019 at 18:50
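
To illustrate the point about invalid UTF-16 content (an editorial sketch, not from the thread): a lone surrogate is a perfectly legal .NET string but not valid UTF-16 text, and what GetBytes does with it depends on the encoder's fallback:

using System.Text;

string lone = "a" + '\uD800' + "b";            // contains an unpaired surrogate

// The default UTF-8 encoder silently replaces the bad unit with U+FFFD (EF BF BD).
byte[] replaced = Encoding.UTF8.GetBytes(lone);

// A strict encoder throws instead.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
try
{
    strict.GetBytes(lone);
}
catch (EncoderFallbackException)
{
    // Reached: the string cannot be represented as valid UTF-8.
}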

From byte[] to string:

        return BitConverter.ToString(bytes);
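
Worth noting (editorial remark): BitConverter.ToString does not decode the bytes as text; it produces a hyphen-separated hex dump, e.g.:

byte[] bytes = { 0x41, 0x42, 0x43 };
string hex = BitConverter.ToString(bytes);   // "41-42-43", not "ABC"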

To convert a string to a byte[], use the following solution:

string s = "abcdefghijklmnopqrstuvwxyz";
byte[] b = System.Text.Encoding.UTF32.GetBytes(s);

I hope it helps.

  • that's not a solution! Commented Apr 12, 2014 at 17:12
  • Before your edit it was: s.Select(e => (byte)e); this only works for ASCII characters, but the char type stores UTF-16 code units. Now, after your edit, the code is at least correct, but it varies from environment to environment, hence rendering it virtually useless. IMHO Encoding.Default should only be used for interacting with legacy Windows "ANSI code page" code. Commented Apr 13, 2014 at 8:04
  • Good point. How do you feel about byte[] b = new System.Text.UTF32Encoding().GetBytes(s); ? Commented Apr 14, 2014 at 8:30
  • Use byte[] b = System.Text.Encoding.UTF32.GetBytes(s); UTF-8 is equally fine. Commented Apr 14, 2014 at 9:12
// C# to convert a string to a byte array.
public static byte[] StrToByteArray(string str)
{
    System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);
}


// C# to convert a byte array to a string.
byte[] dBytes = ...
string str;
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
str = enc.GetString(dBytes);
  • 1) That will lose data due to using ASCII as the encoding. 2) There's no point in creating a new ASCIIEncoding - just use the Encoding.ASCII property. Commented Jan 27, 2009 at 6:35

Here is the code:

// Input string.
const string input = "Dot Net Perls";

// Invoke GetBytes method.
// ... You can store this array as a field!
byte[] array = Encoding.ASCII.GetBytes(input);

// Loop through contents of the array.
foreach (byte element in array)
{
    Console.WriteLine("{0} = {1}", element, (char)element);
}

I had to convert a string to a byte array for a serial communication project - I had to handle 8-bit characters, and I was unable to find a method using the framework converters to do so that didn't either add two-byte entries or mis-translate the bytes with the eighth bit set. So I did the following, which works:

string message = "This is a message.";
byte[] bytes = new byte[message.Length];
for (int i = 0; i < message.Length; i++)
    bytes[i] = (byte)message[i];   // keeps only the low 8 bits of each char
  • It's not safe this way, and you will lose the original data if the input string contains characters outside the single-byte range. Commented Feb 11, 2016 at 19:43
  • This was for a serial communication project, which couldn't handle unicode anyway. Granted that it was an extremely narrow case. Commented Feb 6, 2017 at 20:55

OP's question: "How do I convert a string to a byte array in .NET (C#)?" [sic]

You can use the following code:

static byte[] ConvertString (string s) {
    return new byte[0];
}

As a benefit, encoding does not matter! Oh wait, this is an encoding... it's just trivial and highly lossy.

  • It's not a conversion. It's a new byte array. What the OP really needed was a pointer and memcpy. Or a cast: byte[] b = (byte[]) s;. Commented Apr 28, 2014 at 12:44
  • Furthermore, "s" isn't even used here. Definitely not a solution. Commented Oct 14, 2014 at 7:18