c# Detect xml encoding from Byte Array?

Question

Well, I have a byte array, and I know it holds an XML-serialized object. Is there any way to get the encoding from it?

I'm not going to deserialize it, but I'm saving it in an xml field on SQL Server... so I need to convert it to a string?

Jon Skeet · Accepted Answer · 2009-02-24 11:05:01Z

You could look at the first 40-ish bytes*. They should contain the document declaration (assuming it has a document declaration), which should either contain the encoding or let you assume it's UTF-8 or UTF-16; which of those two it is should be obvious from how you've understood the "<?xml" part.
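A minimal sketch of that pattern check, assuming the document starts directly with "<?xml" and has no byte order mark: the first four characters "<?xm" have a different byte layout in UTF-8 and in the two UTF-16 byte orders. DetectFromDeclaration is a hypothetical name, not a framework API, and it throws when neither layout matches.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not a framework API): picks UTF-8 or UTF-16 from the
// byte layout of a leading "<?xm" and throws if neither layout matches.
// Byte order marks are not handled here.
static Encoding DetectFromDeclaration(byte[] bytes)
{
    bool StartsWith(params byte[] prefix) => bytes.Take(prefix.Length).SequenceEqual(prefix);

    // One byte per character: UTF-8 (or another ASCII-compatible encoding).
    if (StartsWith(0x3C, 0x3F, 0x78, 0x6D)) return Encoding.UTF8;
    // Each ASCII character followed by a zero byte: UTF-16 little-endian.
    if (StartsWith(0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00, 0x6D, 0x00)) return Encoding.Unicode;
    // Each ASCII character preceded by a zero byte: UTF-16 big-endian.
    if (StartsWith(0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00, 0x6D)) return Encoding.BigEndianUnicode;

    throw new ArgumentException("No UTF-8 or UTF-16 XML declaration found.", nameof(bytes));
}

string declared = "<?xml version=\"1.0\"?><root/>";
Console.WriteLine(DetectFromDeclaration(Encoding.UTF8.GetBytes(declared)).WebName);    // utf-8
Console.WriteLine(DetectFromDeclaration(Encoding.Unicode.GetBytes(declared)).WebName); // utf-16
```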

Realistically, do you expect you'll ever get anything other than UTF-8 or UTF-16? If not, you could check for the patterns you get at the start of both of those and throw an exception if it doesn't follow either pattern. Alternatively, if you want to make another attempt, you could always try to decode the document as UTF-8, re-encode it and see if you get the same bytes back. It's not ideal, but it might just work.
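The "decode as UTF-8, re-encode, compare" idea can be sketched like this. RoundTripsAsUtf8 is a hypothetical name; it relies on Encoding.UTF8 replacing invalid byte sequences with U+FFFD during decoding, so bytes that are not valid UTF-8 come back different.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: the "decode as UTF-8, re-encode, compare" heuristic.
// Encoding.UTF8 replaces invalid byte sequences with U+FFFD while decoding,
// so bytes that are not valid UTF-8 do not survive the round trip.
static bool RoundTripsAsUtf8(byte[] bytes)
{
    string text = Encoding.UTF8.GetString(bytes);
    byte[] reencoded = Encoding.UTF8.GetBytes(text);
    return Enumerable.SequenceEqual(reencoded, bytes);
}

Console.WriteLine(RoundTripsAsUtf8(Encoding.UTF8.GetBytes("<a>\u00E9</a>"))); // True
// 0xE9 is "é" in ISO-8859-1 but an incomplete sequence in UTF-8.
Console.WriteLine(RoundTripsAsUtf8(new byte[] { 0x3C, 0x61, 0x3E, 0xE9, 0x3C, 0x2F, 0x61, 0x3E })); // False
```

For a strict check instead, new UTF8Encoding(false, true) throws a DecoderFallbackException on invalid bytes. Note that UTF-16 text made only of ASCII characters is also valid UTF-8 (the zero bytes decode to U+0000), so this heuristic only makes sense after UTF-16 has been ruled out.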

I'm sure there are more rigorous ways of doing this, but they're likely to be finicky :)

* Quite possibly less than this. I figure 20 characters should be enough, which is 40 bytes in UTF-16.

Downvoters: if you're going to downvote, please provide a comment. Otherwise the downvote serves no real purpose.

Peter Lillevold · Answer 2 · 2009-03-12 10:56:13Z

A solution similar to the one for this question is to put a Stream over the byte array and let an XmlTextReader detect the encoding. Then you won't have to fiddle at the byte level. Like this:

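using System.IO;
using System.Text;
using System.Xml;

// bytes is the byte array holding the serialized XML
Encoding encoding;
using (var stream = new MemoryStream(bytes))
{
    using (var xmlreader = new XmlTextReader(stream))
    {
        // Moving to the first content node makes the reader process the BOM
        // and XML declaration; Encoding defaults to UTF-8 if neither is present.
        xmlreader.MoveToContent();
        encoding = xmlreader.Encoding;
    }
}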
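Since the question ultimately needs a string, here is a sketch that reuses the XmlTextReader approach to pick the encoding and then decodes the whole array. XmlBytesToString is a hypothetical name, and dropping a leading byte order mark (which Encoding.GetString keeps as U+FEFF) is an assumption about what you want to store.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

// Hypothetical helper: detect the encoding with XmlTextReader, then decode
// the whole byte array into a string.
static string XmlBytesToString(byte[] bytes)
{
    Encoding encoding;
    using (var stream = new MemoryStream(bytes))
    using (var reader = new XmlTextReader(stream))
    {
        reader.MoveToContent();                       // consumes any BOM and XML declaration
        encoding = reader.Encoding ?? Encoding.UTF8;  // defensive fallback
    }

    string text = encoding.GetString(bytes);
    // GetString keeps a byte order mark as a leading U+FEFF character; drop it.
    return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
}

string xml16 = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root>\u00E9</root>";
byte[] withBom = Encoding.Unicode.GetPreamble().Concat(Encoding.Unicode.GetBytes(xml16)).ToArray();
Console.WriteLine(XmlBytesToString(withBom) == xml16); // True
```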

AnthonyWJones · Answer 3 · 2009-02-24 11:08:51Z

The first 2 or 3 bytes may be a byte order mark (BOM), which can tell you whether the stream is UTF-8, Unicode little-endian or Unicode big-endian.

UTF-8 BOM: 0xEF 0xBB 0xBF
Unicode big-endian BOM: 0xFE 0xFF
Unicode little-endian BOM: 0xFF 0xFE
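Checking for the three BOMs listed above might look like the following sketch. DetectBom is a hypothetical name; it returns null when no BOM is found. Like the list, it ignores UTF-32 BOMs, so a UTF-32 little-endian BOM (0xFF 0xFE 0x00 0x00) would be reported as UTF-16 little-endian.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: returns the encoding implied by a leading BOM,
// or null when the bytes do not start with one of the three BOMs.
static Encoding? DetectBom(byte[] bytes)
{
    bool StartsWith(params byte[] bom) => bytes.Take(bom.Length).SequenceEqual(bom);

    if (StartsWith(0xEF, 0xBB, 0xBF)) return Encoding.UTF8;             // UTF-8
    if (StartsWith(0xFE, 0xFF)) return Encoding.BigEndianUnicode;       // UTF-16 big-endian
    if (StartsWith(0xFF, 0xFE)) return Encoding.Unicode;                // UTF-16 little-endian
    return null;
}

Console.WriteLine(DetectBom(new byte[] { 0xFF, 0xFE, 0x3C, 0x00 })?.WebName); // utf-16
Console.WriteLine(DetectBom(Encoding.ASCII.GetBytes("<?xml")) == null);      // True
```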

If none of these BOMs is present, you can use ASCII to test for <?xml (note that most modern XML generators follow the rule in the standard that no whitespace may precede the XML declaration).

ASCII is used up to the closing ?>, so you can look for encoding= and read its value. If the encoding attribute or the <?xml declaration itself is not present, you can assume UTF-8.
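A sketch of reading the declared encoding as ASCII, assuming there is no BOM and the declaration is in an ASCII-compatible encoding (this does not work for UTF-16 without a BOM). ReadDeclaredEncoding is a hypothetical name and the 100-byte search limit is an arbitrary choice. It returns the name rather than an Encoding: on .NET Core and .NET 5+, Encoding.GetEncoding only resolves names such as windows-1252 after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: read the encoding name from an XML declaration written
// in an ASCII-compatible encoding. Returns "utf-8" when there is no
// declaration or no encoding pseudo-attribute.
static string ReadDeclaredEncoding(byte[] bytes)
{
    // The declaration is short; 100 bytes is an arbitrary, generous limit.
    string head = Encoding.ASCII.GetString(bytes, 0, Math.Min(bytes.Length, 100));
    if (!head.StartsWith("<?xml", StringComparison.Ordinal)) return "utf-8";

    // Only search up to the "?>" that closes the declaration.
    int end = head.IndexOf("?>", StringComparison.Ordinal);
    string declaration = end >= 0 ? head.Substring(0, end) : head;

    // EncName per the XML spec: a letter followed by letters, digits, '.', '_' or '-'.
    Match match = Regex.Match(declaration, @"encoding\s*=\s*[""']([A-Za-z][A-Za-z0-9._-]*)[""']");
    return match.Success ? match.Groups[1].Value : "utf-8";
}

Console.WriteLine(ReadDeclaredEncoding(Encoding.ASCII.GetBytes(
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"))); // ISO-8859-1
Console.WriteLine(ReadDeclaredEncoding(Encoding.ASCII.GetBytes("<a/>"))); // utf-8
```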

asked	3 years ago
viewed	7186 times
active	8 months ago
