I have a file that seems to mix encodings. The text looks like it is stored as Unicode, but the length prefix of each string looks as if the string were encoded as UTF-8 or similar. Here is an example:

05 41 00 72 00 69 00 61 00 6C 00
5  A  .  r  .  i  .  a  .  l  .

In this example the string is stored as Unicode, with the extra 00 byte after each character, but the length prefix is half of what it should be: 05 instead of 0A, as if the string were encoded as UTF-8.

If I use:

using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
  temp = reader.ReadString();
}

When I run this, temp is "Ar".

I have this code that works. But is there a better way?

using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
  tempByte = reader.ReadByte();
  var length = Convert.ToInt32(tempByte) * 2;
  byteArray = reader.ReadBytes(length);
  for (var ww = 0; ww < length; ww = ww + 2)
  {
    tempString = tempString + (char)byteArray[ww];
  }
}
    
As far as I can see, that looks like UTF-16 encoding. – Bobby Feb 14 '14 at 17:54

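BinaryReader.ReadString() expects the string to be prefixed with the number of bytes to read, not the number of characters (I think this is because of variable-length encodings, especially UTF-8, but also UTF-16). With a prefix of 05 it reads only five bytes, which is two and a half UTF-16 characters, and that is why you get "Ar".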
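To see the byte-count prefix for yourself, you can write a string with BinaryWriter and look at the raw bytes. This is only a small sketch; PrefixDemo and WriteWithBinaryWriter are made-up names for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class PrefixDemo
{
    // Returns the raw bytes that BinaryWriter.Write(string) produces
    // for the given text when the writer uses Encoding.Unicode (UTF-16LE).
    public static byte[] WriteWithBinaryWriter(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.Unicode))
            {
                writer.Write(text);
            }
            // ToArray() still works after the stream has been closed.
            return stream.ToArray();
        }
    }
}
```

BitConverter.ToString(PrefixDemo.WriteWithBinaryWriter("Arial")) gives 0A-41-00-72-00-69-00-61-00-6C-00, so the prefix is 0A (ten bytes), not the 05 found in the file from the question.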

So, you can't use ReadString() directly, but you also don't have to convert the characters byte by byte like you do (which wouldn't work for non-ASCII characters anyway).

For this, you can use ReadChars(), which takes as a parameter the number of characters (not bytes) to read.

You also need to figure out how the character count is stored. It could be a plain single-byte number (which means the string can have at most 255 characters), or it could be VLQ-encoded, which you can read using Read7BitEncodedInt(). That method is protected, though, so I'm going to assume the former for simplicity.
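If the count does turn out to be VLQ-encoded, one way around the protected method is to derive from BinaryReader and call it from there. The following is a minimal sketch under that assumption; PrefixedStringReader and ReadCharCountPrefixedString are hypothetical names, not part of the framework.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class: a BinaryReader fixed to UTF-16LE that can call
// the inherited Read7BitEncodedInt() because it is a subclass.
class PrefixedStringReader : BinaryReader
{
    public PrefixedStringReader(Stream input)
        : base(input, Encoding.Unicode)
    {
    }

    // Reads a VLQ (7-bit encoded) character count,
    // then that many UTF-16 characters.
    public string ReadCharCountPrefixedString()
    {
        int characterCount = Read7BitEncodedInt();
        return new string(ReadChars(characterCount));
    }
}
```

For counts below 128 the VLQ form is the same single byte as a plain byte count, so this reads the sample from the question as "Arial" too. The two formats only differ for longer strings.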

So, the code could look like this:

using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    int characterCount = reader.ReadByte();
    char[] characters = reader.ReadChars(characterCount);
    return new string(characters);
}
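To check the difference between ReadChars() and the byte-by-byte loop, both can be run against in-memory data. This is a sketch only: ReadCharsDemo and its two methods are made-up names, and ReadLowBytesOnly is a paraphrase of the loop from the question.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadCharsDemo
{
    // Reads a single-byte character count followed by that many UTF-16LE characters.
    public static string ReadWithReadChars(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.Unicode))
        {
            int characterCount = reader.ReadByte();
            return new string(reader.ReadChars(characterCount));
        }
    }

    // The question's approach: keep only the low byte of each two-byte code unit.
    public static string ReadLowBytesOnly(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.Unicode))
        {
            int length = reader.ReadByte() * 2;
            byte[] bytes = reader.ReadBytes(length);
            var builder = new StringBuilder();
            for (int i = 0; i < bytes.Length; i += 2)
            {
                builder.Append((char)bytes[i]);
            }
            return builder.ToString();
        }
    }
}
```

For the bytes from the question (05 41 00 72 00 69 00 61 00 6C 00) both methods return "Arial". For the input 01 41 01, which is a single U+0141 character (Ł), ReadWithReadChars returns "Ł" while ReadLowBytesOnly returns "A", because the high byte is thrown away.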

To me it looks like somebody just used BinaryWriter.Write(String), and you should be able to extract those strings with a BinaryReader set to UTF-16, using the BinaryReader.ReadString() method.

This is where I'm stuck. If UTF16 is the encoding for this, I don't have it as an option among my encodings. I have ASCII, BigEndianUnicode, Default, UTF32, UTF7, UTF8 and Unicode, none of which decode this properly. – Family Feb 16 '14 at 12:22
    
Try Unicode; it's UTF16 (or UCS2, which should be similar for most use cases). – thomasch Feb 16 '14 at 13:13
