Tagged Questions


UTF-8 (Unicode Transformation Format, 8-bit) is a character encoding that represents each Unicode code point as a sequence of one to four bytes. (The original design allowed sequences of up to six bytes; RFC 3629 restricts it to four.) It is backwards-compatible with ASCII and can still represent every Unicode code point.
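
As a small illustration of the variable-length behaviour described above, here is a minimal Python sketch using the built-in `str.encode`. The sample characters and the helper name `utf8_length` are chosen here for illustration only and do not come from any of the questions listed on this page.

```python
# Illustrative sample characters (an assumption, not from the page):
# one for each encoded length that occurs in practice.
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return how many bytes the UTF-8 encoding of one character occupies."""
    return len(ch.encode("utf-8"))


# Map each sample character to its encoded length in bytes.
lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    # Show the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {n} byte(s), hex {ch.encode('utf-8').hex()}")

# ASCII compatibility: pure-ASCII text yields identical bytes in both encodings.
ascii_compatible = "hello".encode("utf-8") == "hello".encode("ascii")
```

Code points in the ASCII range take a single byte, which is why an ASCII file is also a valid UTF-8 file; characters further up the code space take progressively more bytes.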


votes

0 answers

37 views

Metalsmith plugin for stripping UTF8 BOM from files

I've been developing a metalsmith static site and I came across an issue where Visual Studio was automatically adding a BOM to the pages. I wrote the following plugin for metalsmith (it needs to be ...

asked Sep 23 at 0:17

James Khoury
2,4041945

vote

1 answer

31 views

Robustly dealing with malformed Unicode files

I'm writing a script that reads UTF-8-encoded XML files and writes parts of those files into a tempfile for further processing. Sometimes, the input files will have a few malformed characters. ...

python file utf-8

asked Sep 9 at 16:20

Thor
9016

votes

2 answers

258 views

Validating UTF-8 byte array

I'm writing a validator function that receives a byte[] and checks whether it represents a valid UTF-8 byte sequence, according to this table. Is my approach ...

java bitwise utf-8

asked Aug 7 at 19:12

Óscar López
386110

votes

3 answers

322 views

Functions to escape CSS rules in PHP

Some context: I've been tasked with supplying an escaping function for arbitrary CSS values that are entered through a form. The goals and caveats are: I know it's bad practice to let users input ...

php css security utf-8 escaping

asked Jun 30 at 8:25

Madara Uchiha
2,813829

votes

1 answer

126 views

Method to return a string of max length (in bytes vs. characters)

In my C# code, I need to generate a string (from a longer string) which, when UTF-8 encoded, is no longer than a given max length (in bytes). ...

c# strings utf-8

asked Jun 24 at 9:41

Darragh
232

votes

1 answer

63 views

Replacing Persian and Arabic digits

I'm using this function to replace UTF-8 characters representing numbers in text with 'normal' digits. I'm wondering if this is optimized code since this is using two ...

php strings converting utf-8

asked May 7 at 11:08

tinybyte
1185

vote

1 answer

44 views

Avoiding use of .encode() in rss2html

My concern with this code is the excessive use of .encode('utf-8'). Any advice on refining these functions would be very helpful. rss2html GitHub repo ...

python python-2.7 utf-8 rss

asked May 3 at 6:48

Ricky Wilson
1305

votes

2 answers

344 views

Customised Java UTF-8

I have implemented a customized UTF-8 encoding mechanism. The code works fine, but I have a lot of concerns regarding the code. ...

java reinventing-the-wheel utf-8

asked Feb 26 at 12:23

srikanth
1819

votes

4 answers

1k views

Function to convert ISO-8859-1 to UTF-8

I wrote this function last year to convert between the two encodings and just found it. It takes a text buffer and its size, then converts to UTF-8 if there's enough space. What should be changed to ...

c strings converting unicode utf-8

asked Feb 3 at 20:54

2013Asker
979211

votes

3 answers

2k views

Count byte length of string

I am looking for some guidance and optimization pointers for my custom JavaScript function which counts the bytes in a string rather than just chars. The website uses UTF-8 and I am looking to ...

javascript optimization strings utf-8

asked Dec 16 '13 at 16:22

MonkeyZeus
1456

votes

2 answers

88 views

Macros to detect UTF-8

I'm working on a program that handles UTF-8 characters. I've made the following macros to detect UTF-8. I've tested them with a few thousand words and they seem to work. I'll add another one to do ...

c macros utf-8

asked Oct 11 '13 at 11:15

2013Asker
979211

votes

1 answer

390 views

Better code for converting a char to its UTF-8 percent encoding representation?

This is working code for a URI template (RFC 6570) implementation; when the character to render is not within a specific character set, it is needed to grab the UTF-8 representation of that character ...

java guava utf-8

asked May 24 '13 at 4:51

fge
86329

votes

1 answer

428 views

Please review my UTF-8 character reader function

You may see full code here (note that the link points to the specific commit). Language is "clean C" (that is, a subset of C89, C99 and C++98 — it is intended to compile under all of these ...

c++ c portability utf-8

asked Apr 2 '11 at 14:02

Alexander Gladysh
1236
