Well, I have a byte array, and I know it holds an XML-serialized object. Is there any way to get the encoding from it?
I'm not going to deserialize it, but I'm saving it in an xml field on a SQL Server... so I need to convert it to a string?
You could look at the first 40-ish bytes [1]. They should contain the document declaration (assuming it has a document declaration), which should either contain the encoding or let you assume it's UTF-8 or UTF-16; which of the two should be obvious from how you've understood the declaration.

Realistically, do you expect you'll ever get anything other than UTF-8 or UTF-16? If not, you could check for the patterns you get at the start of both of those and throw an exception if it doesn't follow either pattern.

Alternatively, if you want to make another attempt, you could always try to decode the document as UTF-8, re-encode it and see if you get the same bytes back. It's not ideal, but it might just work.

I'm sure there are more rigorous ways of doing this, but they're likely to be finicky :)

[1] Quite possibly fewer than this. I figure 20 characters should be enough, which is 40 bytes in UTF-16.
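Here is a rough sketch of both ideas in Python, not a definitive implementation. The function names, the 200-byte prefix (more generous than the 40-ish bytes suggested above, so the `encoding` attribute fits even in UTF-16) and the BOM-skipping step are my own assumptions.

```python
import codecs
import re

# Byte patterns for "<?xml" in the encodings we expect to see.
_PATTERNS = [
    ("utf-16-le", "<?xml".encode("utf-16-le")),
    ("utf-16-be", "<?xml".encode("utf-16-be")),
    ("utf-8", b"<?xml"),  # also matches other ASCII-compatible encodings
]

# Pulls the encoding name out of a decoded XML declaration.
_DECL_ENCODING = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes, prefix_len: int = 200) -> str:
    """Return the declared encoding, or the encoding family if none is declared.

    Raises ValueError when the data does not start with "<?xml" in
    UTF-8 or UTF-16.
    """
    head = data[:prefix_len]
    # Assumption: a leading byte order mark may be present; skip it.
    for bom in (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if head.startswith(bom):
            head = head[len(bom):]
            break
    for family, pattern in _PATTERNS:
        if head.startswith(pattern):
            # errors="ignore" tolerates a character cut off by the prefix limit.
            text = head.decode(family, errors="ignore")
            match = _DECL_ENCODING.match(text)
            return match.group(1) if match else family
    raise ValueError("no XML declaration in UTF-8 or UTF-16 found")


def round_trips_as_utf8(data: bytes) -> bool:
    """Decode as UTF-8, re-encode, and compare with the original bytes."""
    # Invalid sequences become U+FFFD, so the re-encoded bytes differ.
    # Caveat: pure-ASCII text stored as UTF-16 is also valid UTF-8
    # (with NUL characters), so a True result is not conclusive.
    return data.decode("utf-8", errors="replace").encode("utf-8") == data
```

A document with no declaration at all raises here, which matches the "throw an exception" suggestion; whether you'd rather fall back to UTF-8 in that case is a judgment call.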
A solution similar to this question could solve this by using a Stream over the byte array. Then you won't have to fiddle at the byte level. Like this:
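A minimal sketch of the stream idea, swapping in Python's expat parser rather than whatever the linked question used. The function name `declared_xml_encoding` is hypothetical. It wraps the byte array in an in-memory stream and lets the parser report what the XML declaration says, so there is no byte-level fiddling on your side.

```python
import io
import xml.parsers.expat


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Called by expat when it reads the <?xml ... ?> declaration;
        # encoding is None when the declaration omits it.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    # A stream over the byte array; the parser works out how to read it.
    parser.ParseFile(io.BytesIO(data))
    return found.get("encoding")
```

Note that this parses the whole document and raises `xml.parsers.expat.ExpatError` on malformed input, so it also works as a cheap well-formedness check before the data goes into the database.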
The first 2 or 3 bytes may be a Byte Order Mark (BOM), which can tell you whether the stream is UTF-8, Unicode-LittleEndian or Unicode-BigEndian.

UTF-8 BOM is 0xEF 0xBB 0xBF
Unicode-BigEndian is 0xFE 0xFF
Unicode-LittleEndian is 0xFF 0xFE

If none of these are present, then you can use ASCII to test for the XML declaration. ASCII is used up until ...
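The BOM check can be sketched like this in Python. The function names and the UTF-8 fallback for BOM-less data are assumptions of mine, not part of the answer above.

```python
import codecs

# The three byte order marks listed above, with Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if there is one; the fallback is an assumption."""
    name, skip = detect_bom(data)
    # Drop the BOM itself so it does not end up in the string.
    return data[skip:].decode(name or fallback)
```

For example, `decode_with_bom(codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le"))` gives back the plain string `<a/>` with the BOM removed.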