
How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes into play here.

Also, why should encoding even be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

  • Every string is stored as an array of bytes, right? Why can't I simply have those bytes? Commented Jan 23, 2009 at 14:05
  • The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case); a small sketch after these comments makes the mapping concrete. Commented Apr 13, 2009 at 14:13
  • To play devil's advocate: if you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. CRC32), and NEVER EVER wanted to decode it back into the original string... it isn't straightforward why you'd care about encodings or how you choose which one to use. Commented Dec 1, 2009 at 19:47
  • A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions -- but I digress.) Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one: ASCII. However, to accommodate all of human symbology, the 256 permutations of a byte were insufficient and encodings were devised to selectively use more bytes. Commented Aug 28, 2014 at 15:43
  • Four years later, I stand by my original comment on this question. It's fundamentally flawed because the fact that we're talking about a string implies interpretation. The encoding of that string is an implicit part of the serialized contract, otherwise it's just a bunch of meaningless bits. If you want meaningless bits, why generate them from a string at all? Just write a bunch of 0's and be done with it. Commented Dec 12, 2014 at 22:44
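
To make the mapping the comments describe concrete, here is a minimal, illustrative C# sketch (editorial, not from the thread) showing that the same one-character string produces different byte sequences under different encodings:

using System.Text;

string s = "A";
byte[] ascii = Encoding.ASCII.GetBytes(s);   // { 0x41 }                   -- one byte, value 65
byte[] utf16 = Encoding.Unicode.GetBytes(s); // { 0x41, 0x00 }             -- two bytes (UTF-16 LE)
byte[] utf32 = Encoding.UTF32.GetBytes(s);   // { 0x41, 0x00, 0x00, 0x00 } -- four bytes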

41 Answers

byte[] buffer = UnicodeEncoding.UTF8.GetBytes(something);  // for converting to UTF-8, then getting its bytes

byte[] buffer = ASCIIEncoding.ASCII.GetBytes(something);   // for converting to ASCII, then getting its bytes

Simple code with LINQ:

string s = "abc"
byte[] b = s.Select(e => (byte)e).ToArray();

EDIT: as commented below, it is not a good way.

But you can still use it to understand LINQ with more appropriate code:

string s = "abc";
byte[] b = s.Cast<byte>().ToArray();
  • It's hardly any faster, let alone the fastest. It's certainly an interesting alternative, but it's essentially the same as Encoding.Default.GetBytes(s) which, by the way, is way faster. Quick testing suggests that Encoding.Default.GetBytes(s) performs at least 79% faster. YMMV. Commented Oct 25, 2013 at 4:36
  • Try it with a character that doesn't fit in a single byte. This code will not crash, but it will return a wrong result (which is even worse). Try casting to a short instead of a byte to see the difference. (See the sketch below.) Commented Dec 18, 2013 at 8:57
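
As an illustration of the wrong result mentioned above (an editorial sketch, not from the thread): casting a char to a byte silently keeps only the low 8 bits, whereas a real encoder preserves the information:

using System.Text;

string s = "\u20AC";                       // the euro sign, a char well above the byte range
byte truncated = (byte)s[0];               // 0xAC -- the high bits are silently dropped
byte[] utf8 = Encoding.UTF8.GetBytes(s);   // { 0xE2, 0x82, 0xAC } -- the encoder keeps the character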

Simply use this:

byte[] myByte = System.Text.ASCIIEncoding.Default.GetBytes(myString);
  • ...and lose all characters with a code point higher than 127. In my native language it is perfectly valid to write "Árvíztűrő tükörfúrógép." System.Text.ASCIIEncoding.Default.GetBytes("Árvíztűrő tükörfúrógép.").ToString(); will return "Árvizturo tukörfurogép.", losing information which cannot be retrieved. (And I haven't yet mentioned Asian languages, where you would lose all characters.) See the note below. Commented Jan 11, 2018 at 15:09
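
An editorial note on the two encodings involved (hedged, since behaviour differs by runtime): ASCIIEncoding.Default is just the inherited static Encoding.Default, so the answer above does not actually use ASCII at all -- it uses the system ANSI code page on .NET Framework and UTF-8 on .NET Core / .NET 5+. A true ASCII encoder replaces anything above 127 with '?':

using System.Text;

byte[] a = Encoding.ASCII.GetBytes("Á");    // { 0x3F } -- the character becomes '?'
byte[] d = Encoding.Default.GetBytes("Á");  // platform-dependent: ANSI code page or UTF-8 bytes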

I have written a Visual Basic extension similar to the accepted answer, but directly using .NET memory and Marshalling for conversion, and it supports character ranges unsupported in other methods, like UnicodeEncoding.UTF8.GetString or UnicodeEncoding.UTF32.GetString or even MemoryStream and BinaryFormatter (invalid characters like: 񩱠 & ChrW(55906) & ChrW(55655)):

<Extension> _
Public Function ToBytesMarshal(ByRef str As String) As Byte()
    Dim gch As GCHandle = GCHandle.Alloc(str, GCHandleType.Pinned)
    Dim handle As IntPtr = gch.AddrOfPinnedObject
    ToBytesMarshal = New Byte(str.Length * 2 - 1) {}
    Try
        For i As Integer = 0 To ToBytesMarshal.Length - 1
            ToBytesMarshal.SetValue(Marshal.ReadByte(IntPtr.Add(handle, i)), i)
        Next
    Finally
        gch.Free()
    End Try
End Function

<Extension> _
Public Function ToStringMarshal(ByRef arr As Byte()) As String
    Dim gch As GCHandle = GCHandle.Alloc(arr, GCHandleType.Pinned)
    Try
        ToStringMarshal = Marshal.PtrToStringAuto(gch.AddrOfPinnedObject)
    Finally
        gch.Free()
    End Try
End Function
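
For readers working in C#, a rough equivalent of the idea above (copying the raw UTF-16 code units without interpreting them as text) is sketched below; this is an editorial illustration in the spirit of the accepted answer, not part of the original post:

using System;

static byte[] ToBytesRaw(string str)
{
    // Two bytes per char: copy the in-memory UTF-16 code units as-is.
    var bytes = new byte[str.Length * sizeof(char)];
    Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

static string ToStringRaw(byte[] bytes)
{
    // Reverse the copy; the byte array must have an even length.
    var chars = new char[bytes.Length / sizeof(char)];
    Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}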

A character is both a lookup key into a font table and a lexical tradition such as ordering, upper and lower case versions, etc.

Consequently, a character is not a byte (8 bits) and a byte is not a character. In particular, the 256 permutations of a byte cannot accommodate the thousands of symbols within some written languages, much less all languages. Hence, various methods for encoding characters have been devised. Some encode a particular class of languages (ASCII); some encode multiple languages using code pages (extended ASCII); and some, ambitiously, encode all languages by selectively including additional bytes as needed (Unicode).

Within a system, such as the .NET framework, a String implies a particular character encoding. In .NET this encoding is Unicode. Since the framework reads and writes Unicode by default, dealing with character encoding is typically not necessary in .NET.

However, in general, to load a character string into the system from a byte stream, you need to know the source encoding in order to interpret and translate it correctly (otherwise the bytes will be taken as already being in the system's default encoding and thus render gibberish). Similarly, when a string is written to an external source, it will be written in a particular encoding.
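
A minimal sketch of that last point (editorial, and assuming ISO-8859-1 is available on the runtime in use):

using System.Text;

string original = "Árvíztűrő tükörfúrógép";

byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);                          // written out as UTF-8

string ok  = Encoding.UTF8.GetString(utf8Bytes);                              // round-trips correctly
string bad = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);         // wrong encoding: mojibake, not the original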

  • Unicode is not an encoding. Unicode is an abstract mapping of characters to codepoints. There are multiple ways of encoding Unicode; in particular, UTF-8 and UTF-16 are the most common. .NET uses UTF-16, though I'm unsure if it's UTF-16 LE or UTF-16 BE. Commented Aug 26, 2017 at 3:22
  • UTF-16 LE or UTF-16 BE is not relevant: strings use unbreakable 16-bit code units without any interpretation. UTF-16 BE or UTF-16 LE may become relevant only when you convert strings to byte arrays or the reverse because, at that time, you'll specify an encoding (and in that case the string must first be valid UTF-16, but strings don't have to be valid UTF-16). GetBytes() is not necessarily returning valid UTF-16 BE/LE, it uses simple arithmetic; the returned array is also not valid UTF-8 but arbitrary bytes. The byte order in the result is system-specific if no encoding is specified. Commented Sep 7, 2019 at 16:05
  • This also means that Encoding.UTF8.GetBytes() may throw encoding exceptions for arbitrary strings whose content is not valid UTF-16 (a sketch follows these comments). In C# you have the choice of encoders/decoders (codecs) to use. You may want to use your own codec which will pack/unpack bytes differently, or may silently drop unpaired surrogates (if the codec attempts to interpret the string as UTF-16), or may drop the high bytes, or replace/interpret the code units invalid in UTF-16 by U+FFFD. The codec may also use data compression, or hexadecimal/base64, or escaping... Codecs are not restricted to just the UTF-8 encoding. Commented Sep 7, 2019 at 16:15
  • Note: I use the term "codec" here deliberately instead of "encoding", which is more specific and used only for text. Strings in C#, C, C++, Java, JavaScript/ECMAScript/ActiveScript are NOT restricted to just valid text: they are just a generic storage structure, convenient for text and treated as text by libraries (but not all). As such, the UTF forms are not enforced at all except inside specific APIs using them (including UTF encoding objects). Yes, you can store a binary program or a PNG image in a compact immutable string instead of a mutable array, but you can I/O all strings to text channels. Commented Sep 7, 2019 at 18:50
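
To illustrate the point about invalid UTF-16 content (an editorial sketch, not from the thread): a lone surrogate is a perfectly legal .NET string but not valid UTF-16 text, and what GetBytes does with it depends on the encoder's fallback:

using System.Text;

string lone = "a" + '\uD800' + "b";            // contains an unpaired surrogate

// The default UTF-8 encoder silently replaces the bad unit with U+FFFD (EF BF BD).
byte[] replaced = Encoding.UTF8.GetBytes(lone);

// A strict encoder throws instead.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
try
{
    strict.GetBytes(lone);
}
catch (EncoderFallbackException)
{
    // Reached: the string cannot be represented as valid UTF-8.
}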

From byte[] to string:

        return BitConverter.ToString(bytes);
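
Worth noting (editorial remark): BitConverter.ToString does not decode the bytes as text; it produces a hyphen-separated hex dump, e.g.:

byte[] bytes = { 0x41, 0x42, 0x43 };
string hex = BitConverter.ToString(bytes);   // "41-42-43", not "ABC"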

To convert a string to a byte[], use the following solution:

string s = "abcdefghijklmnopqrstuvwxyz";
byte[] b = System.Text.Encoding.UTF32.GetBytes(s);

I hope it helps.

  • that's not a solution! Commented Apr 12, 2014 at 17:12
  • Before your edit it was: s.Select(e => (byte)e); this only works for ASCII characters, but the char type stores UTF-16 code units. Now, after your edit, the code is at least correct, but it varies from environment to environment, hence rendering it virtually useless. IMHO Encoding.Default should only be used for interacting with legacy Windows "ANSI code page" code. Commented Apr 13, 2014 at 8:04
  • Good point. How do you feel about byte[] b = new System.Text.UTF32Encoding().GetBytes(s); ? Commented Apr 14, 2014 at 8:30
  • Use byte[] b = System.Text.Encoding.UTF32.GetBytes(s); UTF-8 is equally fine. Commented Apr 14, 2014 at 9:12
// C# to convert a string to a byte array.
public static byte[] StrToByteArray(string str)
{
    System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);
}


// C# to convert a byte array to a string.
byte[] dBytes = ...
string str;
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
str = enc.GetString(dBytes);
  • 1) That will lose data due to using ASCII as the encoding. 2) There's no point in creating a new ASCIIEncoding - just use the Encoding.ASCII property. Commented Jan 27, 2009 at 6:35

Here is the code:

// Input string.
const string input = "Dot Net Perls";

// Invoke GetBytes method.
// ... You can store this array as a field!
byte[] array = Encoding.ASCII.GetBytes(input);

// Loop through contents of the array.
foreach (byte element in array)
{
    Console.WriteLine("{0} = {1}", element, (char)element);
}

I had to convert a string to a byte array for a serial communication project - I had to handle 8-bit characters, and I was unable to find a method using the framework converters to do so that didn't either add two-byte entries or mis-translate the bytes with the eighth bit set. So I did the following, which works:

string message = "This is a message.";
byte[] bytes = new byte[message.Length];
for (int i = 0; i < message.Length; i++)
    bytes[i] = (byte)message[i];   // keeps only the low 8 bits of each char
  • It's not safe this way, and you will lose the original data if the input string contains characters outside the single-byte range. Commented Feb 11, 2016 at 19:43
  • This was for a serial communication project, which couldn't handle unicode anyway. Granted that it was an extremely narrow case. Commented Feb 6, 2017 at 20:55

OP's question: "How do I convert a string to a byte array in .NET (C#)?" [sic]

You can use the following code:

static byte[] ConvertString (string s) {
    return new byte[0];
}

As a benefit, encoding does not matter! Oh wait, this is an encoding... it's just trivial and highly lossy.

  • It's not a conversion. It's a new byte array. What the OP really needed was a pointer and memcpy. Or a cast: byte[] b = (byte[]) s;. Commented Apr 28, 2014 at 12:44
  • Furthermore, "s" isn't even used here. Definitely not a solution. Commented Oct 14, 2014 at 7:18