Unicode is intended to be a universal character set for describing all the characters required for written text incorporating all writing systems, technical symbols and punctuation.
2
votes
1answer
36 views
Sort the characters in a UnicodeString
The task: sort the characters in a string, as provided by the ICU UnicodeString. This is because I want to be able to find anagrams using the suggestion from "...
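The sorting idea in this excerpt can be sketched in Python rather than with ICU's UnicodeString. This is a minimal sketch, not the asker's code: the names `anagram_key` and `are_anagrams` are hypothetical, and it assumes that comparing code points after NFC normalization is an acceptable notion of "character".

```python
import unicodedata


def anagram_key(text: str) -> str:
    """Return the characters of text in sorted code point order.

    Assumption: NFC normalization is applied first so that a precomposed
    letter and its base-plus-combining-mark spelling produce the same key.
    """
    normalized = unicodedata.normalize("NFC", text)
    return "".join(sorted(normalized))


def are_anagrams(a: str, b: str) -> bool:
    """Two strings are anagrams when their sorted keys are equal."""
    return anagram_key(a) == anagram_key(b)
```

Sorting by code point does not keep grapheme clusters together when a base letter and its combining mark have no precomposed form, so this sketch is only suitable for text where NFC yields one code point per user-perceived character.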
3
votes
0answers
30 views
Convert accented character to user name
I am using the function below to convert accented characters to a user name.
...
7
votes
3answers
145 views
Unicode Chess PvP with Move Validation
Main Purpose
This script allows two players to play chess on a virtual chessboard
printed on the screen by making use of the Unicode chess characters.
Visual appearance
...
3
votes
0answers
62 views
Reduce encoded length of UTF-8 encoded Ruby string in C extension
I'm writing a Ruby extension in C. It's a string processing module working on UTF-8 encoded strings only.
One method, full_width_to_ascii!, converts full width ...
3
votes
2answers
64 views
Check that files don't contain Unicode
I use this code to check that all files in a directory are free from Unicode and non-printable characters.
Can I improve the structure of the code?
...
5
votes
1answer
68 views
Alternate letters to UpperCase
As an exercise I repeated this Java question, but in Go: Convert string to mixed case
The objective is for every second letter to be converted to uppercase.
Go string processing is relatively new to ...
3
votes
1answer
84 views
Get argument as unicode string from argparse in Python 2 and 3
Code from my last project, that has to work on Python 2.7 and Python 3.
I want to get a path as a unicode string from argparse:
...
1
vote
2answers
54 views
Anchor Text String Transformation in Swift
I need to convert a string using these rules:
Downcase the string
Replace spaces, a blacklist of "invalid" chars, and non-ascii letters (like é) with -
Replace ...
9
votes
1answer
82 views
Highlight specific words in a sentence with diacritics
I am searching for some improvements, particularly in the regex, in the way I highlight specific words in a string.
I have keywords stored in my database without any diacritics
The user comes ...
2
votes
0answers
61 views
Termios/Xterm line editor for APL interpreter
As an interesting sub-part of an interpreter -- just the Read part of the REPL -- I present my raw-mode line-oriented editor that I intend to use for my APL interpreter. (The Eval part has been posted ...
3
votes
1answer
113 views
Converting Hindi Devanagari proprietary font to Unicode font
This working macro converts Hindi Devanagari proprietary font to Unicode font. It reads the conversion table from a tab delimited file, and converts one by one by finding all matches and then does ...
6
votes
2answers
71 views
Unicode-capable symbol table (N-way search tree with hash buckets)
As in my previous question, this module is coupled with its own testing framework.
As a symbol-table for a Unicode-capable programming language interpreter, I decided to combine the 3 types of ...
4
votes
1answer
297 views
Dealing with messy JSON API and UTF-8 encoding problems
I am using an API that returns a JSON object that has "encoded_polyline" fields that tend to result in errors like this one:
UnicodeEncodeError: 'utf-8' codec ...
6
votes
1answer
925 views
Removing accents from certain characters
I have a method that I am using to remove accents from certain characters. The problem is the massive slew of characters I am expected to work with. I have to, basically, remove accents from all Latin ...
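One widely used alternative to a hand-written table of characters is to decompose the text and drop the combining marks. The following is a minimal sketch of that technique in Python, not the asker's method; the name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD).

    Assumption: "accent" means any code point with a non-zero canonical
    combining class that NFD separates from its base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters such as "ø" have no canonical decomposition and pass through unchanged, so a small explicit mapping may still be needed for those.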
7
votes
1answer
109 views
Korean Romanization to Hangeul library
After a little work, I've finally ironed this out... Though could I get someone to go over this and see if there is a better way to present this library within the code, before I publish it?
Premise: ...
3
votes
1answer
90 views
Haskell requires more memory than Python when I read map from file. Why?
I have this simple code in Python:
...
4
votes
3answers
225 views
Listing all the chars in a given UnicodeBlock
I want to visually inspect all the characters that Java thinks are in any given UnicodeBlock. The following method, as far as I can tell, does the task. But, it sure feels like awful design.
...
7
votes
1answer
285 views
Serialization: Escape Input String
In JSON strings, characters can be escaped with \\.
Here is an iterator that can read such strings and convert the escaped characters to UTF-8
...
4
votes
1answer
67 views
Data-checking class supporting letters
I'm starting to learn OOP with PHP, and all I've learned so far is just by searching and reading. So I have this need to check input data for certain things like min of chars, max of chars, spaced or ...
10
votes
3answers
436 views
Mutable String Class
Here is a String class that I've made for fun in my spare time, for myself.
I have a few concerns about it, and am considering going immutable or a mix of the two ...
6
votes
1answer
85 views
Copy directories while changing Unicode filenames to ASCII
I created a short Perl 6 script, copyfnameascii.pl, to copy a file hierarchy I have, applying URL decoding to the names of folders and removing non-ASCII ...
3
votes
2answers
103 views
Create nice URL with diacritics removal
Please review my class. It uses iconv(), which is probably not the best solution, but I haven't found a better alternative for replacing unknown characters.
...
11
votes
3answers
2k views
Customised Java UTF-16
I have implemented a customized encoding mechanism for Java UTF-16. Does this implementation support all the characters?
...
3
votes
2answers
2k views
Custom encoding for BinaryReader
I have a file that seems to mix encoding in it. It seems like a Unicode encoded file, but the character length string is encoded like a UTF8 or similar. Here is an example:
...
7
votes
4answers
4k views
Function to convert ISO-8859-1 to UTF-8
I wrote this function last year to convert between the two encodings and just found it. It takes a text buffer and its size, then converts to UTF-8 if there's enough space.
What should be changed to ...
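The conversion described in this excerpt relies on every ISO-8859-1 byte mapping directly to the code point with the same value, so bytes below 0x80 are copied and the rest expand to two UTF-8 bytes. This is a minimal sketch in Python under that assumption, not the asker's function; the name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 bytes by hand.

    Each input byte is the code point U+0000..U+00FF with the same value.
    """
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # ASCII range: identical in both encodings.
            out.append(byte)
        else:
            # U+0080..U+00FF: two-byte form 110000xx 10xxxxxx.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)
```

Because each input byte produces at most two output bytes, a fixed-size output buffer is always large enough when it holds twice the input length.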
6
votes
1answer
14k views
Getting data correctly from <span> tag with beautifulsoup and regex
I am scraping an online shop page, trying to get the price mentioned in that page. In the following block the price is mentioned:
...
1
vote
2answers
556 views
node.js library for extracting words from a text
I'm looking for feedback on my library for extracting words from a text: https://npmjs.org/package/uwords
An extracted word is defined as a sequence of Unicode characters from Lu, Ll, Lt, Lm, Lo ...
7
votes
1answer
319 views
dir="auto" JavaScript shim for IE
Reason for script:
dir="auto" is an attribute value from the HTML 5 spec that currently has poor support in IE and Opera browsers. The project I am working on only ...
5
votes
2answers
397 views
isRTL.coffee library to determine if a text is of right-to-left direction
I just wrote this tiny library called isRTL.coffee to determine the direction of the text. Is there any better way of doing this?
...
0
votes
1answer
580 views
Ruby code to identify 2-byte characters
I just wrote this fuzzy snippet to identify any 2-byte character in my data file, which assumes only 1-byte characters (ANSI). Please review and suggest a better solution!
...
3
votes
4answers
210 views
String parsing with multiple delimiters
My data is in this format:
龍舟 龙舟 [long2 zhou1] /dragon boat/imperial boat/\n
And I want to return:
('龍舟', '龙舟', 'long2 zhou1', '/dragon boat/imperial boat/')
...
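The input and output shown in this excerpt can be matched with a single regular expression. This is a minimal sketch in Python, assuming every line has exactly the four fields shown and that the first two fields contain no whitespace; the names `ENTRY` and `parse_entry` are hypothetical.

```python
import re

# Two whitespace-free fields, a bracketed field, then a slash-delimited field.
ENTRY = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+(/.*/)\s*$")


def parse_entry(line: str) -> tuple:
    """Split one dictionary line into its four fields."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError("unrecognized entry: %r" % line)
    return match.groups()
```

Raising on a non-matching line, instead of returning a partial tuple, keeps malformed entries from passing silently into later processing.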
3
votes
1answer
604 views
Unicode parsing in PHP
Firstly, apologies if this is not the correct type of question for this site; I had it on Stack Overflow, but it was closed with a suggestion that I post here.
I’m in the process of converting from Latin 15 ...
3
votes
1answer
995 views
Simplify regular expression? (Converting Unicode fractions to TeX)
Background
I'm converting Unicode text to TeX for typesetting. In the input, I'm allowing simple fractions like ½ and ⅔ using single Unicode characters and complex fractions like ¹²³/₄₅₆ using ...
7
votes
1answer
211 views
Source code level portable C++ Unicode literals
Unfortunately, Windows console windows do not support stream I/O of international characters. For instance, in Windows 7, you can still do "chcp 65001" (sets the active code page to UTF-8), type "more",...
6
votes
2answers
670 views