Following a recent thread on Stack Overflow, I'm posting a new question. I have several strings from which I want to extract the encoding type, and I would like to do it with a regex.
Examples:
utf-8, quoted-printable
string str = "=?utf-8?Q?=48=69=67=68=2d=45=6e=64=2d=44=65=73=69=67=6e=65=72=2d=57=61=74=63=68=2d=52=65=70=6c=69=63=61=73=2d=53=61=76=65=2d=54=48=4f=55=53=41=4e=44=53=2d=32=30=31=32=2d=4d=6f=64=65=6c=73?=";
utf-8, Base64
string fld4 = "=?utf-8?B?VmFsw6lyaWUgTWVqc25lcm93c2tp?= <[email protected]>";
windows-1258, Base64
string msg2= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
iso-8859-1, quoted-printable
string fld2 = "=?iso-8859-1?Q?Fr=E9d=E9ric_Germain?= <[email protected]>";
etc...
In order to write a generic decoding function, we need to extract:
the charset (utf-8, windows-1258, etc.)
the transfer encoding type (quoted-printable or Base64)
the encoded string
Any idea how to extract the parts delimited by ?xxx?Q? or ?xxx?B?, where xxx is the charset?
Note: the Q or B marker can be uppercase or lowercase.
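To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind. It assumes every encoded word has the shape =?charset?Q-or-B?text?= seen in the examples above. The names EncodedWordParser and Parse are made up for illustration, and RegexOptions.IgnoreCase is there so that q/Q and b/B are both accepted. It only splits the parts; it does not decode anything.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EncodedWordParser
{
    // =?charset?encoding?text?=
    // charset:  one or more characters other than '?'
    // encoding: a single Q or B (case-insensitive through IgnoreCase)
    // text:     everything up to the closing "?="
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?]+)\?(?<encoding>[QB])\?(?<text>[^?]*)\?=",
        RegexOptions.IgnoreCase);

    // Returns one (charset, encoding, text) tuple per encoded word found in the input.
    public static List<(string Charset, string Encoding, string Text)> Parse(string input)
    {
        var result = new List<(string Charset, string Encoding, string Text)>();
        foreach (Match m in EncodedWord.Matches(input))
        {
            result.Add((
                m.Groups["charset"].Value,
                m.Groups["encoding"].Value.ToUpperInvariant(), // normalize q/b to Q/B
                m.Groups["text"].Value));
        }
        return result;
    }
}
```

With the msg2 example above, Parse should return two entries, both with charset windows-1258 and encoding B, one per encoded word in the string.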
Thanks.
string.Split can also work for something so simple. – Jon Jul 10 '13 at 20:35
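A rough sketch of what the string.Split route could look like, assuming the input is exactly one encoded word with nothing before or after it (so it would not handle the msg2 example, or a trailing address, without extra work). EncodedWordSplitter and SplitEncodedWord are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class EncodedWordSplitter
{
    // Splits a single encoded word into { charset, encoding, text }.
    public static string[] SplitEncodedWord(string word)
    {
        // "=?utf-8?B?abc?=" split on '?' gives: "=", "utf-8", "B", "abc", "="
        string[] parts = word.Split('?');
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=")
        {
            throw new FormatException("Not a single encoded word: " + word);
        }
        // Normalize the encoding marker so callers can compare against "Q" or "B".
        return new[] { parts[1], parts[2].ToUpperInvariant(), parts[3] };
    }
}
```

This relies on the encoded text never containing a literal '?', which holds for Base64 and for the quoted-printable samples in the question.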