Questions that deal with the various representations of characters and character sets, such as ASCII, UTF-8 and EBCDIC. These issues often come up when moving files between operating systems that end lines differently: with a carriage return, a line feed (newline), or both.
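As a small illustration of the line-ending problem, here is a minimal Python sketch that normalizes Windows CRLF and classic Mac OS CR line endings to Unix LF. The function name `to_unix_newlines` is hypothetical. It works on raw bytes, so it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1. UTF-16 files would need to be decoded first.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac OS) line endings to LF.

    Safe for ASCII-compatible encodings (UTF-8, ISO-8859-x, ...),
    but not for UTF-16/UTF-32 data.
    """
    # Replace CRLF first so the CR of each pair does not become an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    sample = b"line one\r\nline two\rline three\n"
    print(to_unix_newlines(sample))  # b'line one\nline two\nline three\n'
```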
1
vote
1 answer
29 views
How to find non-British non-ASCII non-LaTeX characters for pdftex?
I am debugging my TeX file by eliminating all technical flaws in the system. Neither the TeX community here nor I could find anything wrong in my document, so I think there may be something non-ASCII ...
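One way to hunt for such characters is to list every code point above U+007F along with its position. The Python sketch below does this. It assumes the .tex file is UTF-8, and any undecodable bytes are reported as U+FFFD. `non_ascii_chars` is a hypothetical helper name.

```python
import sys


def non_ascii_chars(path: str):
    """Yield (line, column, character, 'U+XXXX') for every non-ASCII character."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 0x7F:
                    yield lineno, col, ch, f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # Usage: python3 find_nonascii.py document.tex
    for path in sys.argv[1:]:
        for lineno, col, ch, cp in non_ascii_chars(path):
            print(f"{path}:{lineno}:{col}: {ch!r} {cp}")
```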
4
votes
2 answers
174 views
Replace “/U+[0-9A-Fa-f]{4}/” with proper unicode character in shell pipeline with sed eval flag
I am trying to properly visualize the characters that are listed in the /usr/include/X11/keysymdef.h file.
It has lines like:
#define XK_onethird 0x0ab0 /* U+2153 VULGAR FRACTION ONE THIRD *...
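As an alternative to a sed eval pipeline, a regular-expression substitution with a callback can convert each hex code into its character. The Python sketch below is one way to do it. Note that it widens the question's `{4}` to `{4,6}` so that code points above U+FFFF are matched whole. `replace_codepoints` is a hypothetical name.

```python
import re
import sys

# "U+" followed by 4 to 6 hex digits (wider than the question's {4}).
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def replace_codepoints(line: str) -> str:
    """Replace every U+XXXX reference in the line with the character itself."""
    return CODEPOINT.sub(lambda m: chr(int(m.group(1), 16)), line)


if __name__ == "__main__":
    # Usage: python3 expand.py /usr/include/X11/keysymdef.h
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for line in f:
                sys.stdout.buffer.write(replace_codepoints(line).encode("utf-8"))
```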
3
votes
2 answers
23 views
fold and text columns
Can fold be set to recognize characters instead of bytes? Traditional Chinese characters appear to be encoded in three bytes each (in UTF-8 at least), which means that if fold's -w is not a multiple ...
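A character-aware fold has to work on decoded characters and, for a terminal, on display columns, because CJK characters usually occupy two columns. The Python sketch below does both. It approximates width with `unicodedata.east_asian_width` and ignores zero-width combining marks. `char_width` and `fold_columns` are hypothetical names.

```python
import sys
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width: 2 for wide/fullwidth East Asian characters, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def fold_columns(text: str, width: int) -> list[str]:
    """Break text into pieces of at most `width` columns, never splitting a character."""
    pieces, current, used = [], "", 0
    for ch in text:
        w = char_width(ch)
        if current and used + w > width:
            pieces.append(current)
            current, used = "", 0
        current += ch
        used += w
    pieces.append(current)
    return pieces


if __name__ == "__main__":
    # Usage: python3 fold_columns.py WIDTH FILE...
    if len(sys.argv) > 2:
        width = int(sys.argv[1])
        for path in sys.argv[2:]:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    for piece in fold_columns(line.rstrip("\n"), width):
                        print(piece)
```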
1
vote
2 answers
37 views
How to determine the character encoding that a terminal uses in a C/C++ program?
I've noticed that SyncTERM uses a different character encoding than the default MacOS terminal emulator, and they're incompatible with one another. For example, say you want to print a block ...
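A program cannot normally ask the terminal emulator which encoding it uses. The usual approach is to trust the locale instead, which in C means calling `setlocale(LC_CTYPE, "")` and then `nl_langinfo(CODESET)`. The Python sketch below mirrors that idiom, and `locale_codeset` is a hypothetical name. It reports what LANG, LC_ALL or LC_CTYPE claim, and SyncTERM or any other emulator may not actually honour that.

```python
import locale


def locale_codeset() -> str:
    """Return the codeset named by the environment's locale, e.g. 'UTF-8'."""
    try:
        # Adopt LANG / LC_ALL / LC_CTYPE, as setlocale(LC_CTYPE, "") does in C.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unknown locale name: keep the default "C" locale
    try:
        return locale.nl_langinfo(locale.CODESET)  # POSIX systems only
    except AttributeError:
        return locale.getpreferredencoding(False)  # fallback, e.g. on Windows


if __name__ == "__main__":
    print(locale_codeset())
```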
1
vote
2 answers
45 views
Print a character having a codepoint
I have a list of codepoints like 0x13000, 0x1300A.
I have to print the corresponding Unicode characters from bash.
I've already tried to do it with other commands that I've found searching in the ...
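Each code point only needs to be turned into a character and written out in the terminal's encoding. The Python sketch below writes UTF-8 bytes directly, so the output does not depend on the locale. `chars_from_codepoints` is a hypothetical helper name. Whether the glyphs actually appear depends on having a font that covers them.

```python
import sys


def chars_from_codepoints(codepoints: list[str]) -> str:
    """Map strings such as '0x13000' to the characters they name."""
    # int(..., 16) accepts an optional 0x prefix.
    return "".join(chr(int(cp, 16)) for cp in codepoints)


if __name__ == "__main__":
    text = chars_from_codepoints(["0x13000", "0x1300A"])
    # Write UTF-8 bytes directly so the result does not depend on the locale.
    sys.stdout.buffer.write(text.encode("utf-8") + b"\n")
```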
12
votes
2 answers
898 views
Wget returning binary instead of html?
I am using wget to download a static html page. The W3C Validator tells me the page is encoded in UTF-8. Yet when I cat the file after download, I get a bunch of binary nonsense. I'm on Ubuntu, and ...
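One common cause of this symptom (not confirmed in the question) is that the server sent the page gzip-compressed with a `Content-Encoding: gzip` header, and wget saved the compressed bytes. Browsers and the validator decompress such responses transparently. If this is the cause, running `file` on the download would report gzip data. The Python sketch below checks for the gzip magic bytes and decompresses if they are present. `maybe_gunzip` is a hypothetical name.

```python
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(data: bytes) -> bytes:
    """Return the decompressed payload if `data` is gzip, else `data` unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


if __name__ == "__main__":
    # Usage: python3 gunzip_if_needed.py page.html > page-decoded.html
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(maybe_gunzip(f.read()))
```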
4
votes
2 answers
77 views
Find files by character encoding
I have a long-running Python script that failed to UTF-8-decode a file. The error message doesn't tell me which file it failed on, only that it couldn't decode byte 0x81 in position 194. I know which ...
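Python's `UnicodeDecodeError` carries the offset of the offending byte in its `start` attribute, so a small scan can name the file as well. The sketch below walks a directory and reports each file that is not valid UTF-8. `find_non_utf8` is a hypothetical name. The sketch reads whole files into memory, which is fine for moderate sizes.

```python
import os
import sys


def find_non_utf8(root: str):
    """Yield (path, offset, byte) for every file under `root` that is not UTF-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip it
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                yield path, err.start, data[err.start]


if __name__ == "__main__":
    # Usage: python3 find_non_utf8.py DIRECTORY...
    for root in sys.argv[1:]:
        for path, offset, byte in find_non_utf8(root):
            print(f"{path}: byte 0x{byte:02x} at position {offset}")
```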
1
vote
1 answer
32 views
How to convert question-mark characters into accented letters
I have a file containing accented letters.
With cat, I get question marks:
voltm�tre
With less, I get the accented letter shown as a byte code in angle brackets:
voltm<E8>tre
With vim, I get the accented letter ...
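The `<E8>` that less shows is the single byte 0xE8, which is è in ISO-8859-1 (and in Windows-1252). The file is therefore most likely Latin-1 rather than UTF-8, and `iconv -f ISO-8859-1 -t UTF-8` converts it from the shell. The Python sketch below does the same conversion, and `to_utf8` is a hypothetical helper. It leaves input that is already valid UTF-8 alone, to avoid double encoding.

```python
import sys


def to_utf8(data: bytes) -> bytes:
    """Return UTF-8 bytes, assuming non-UTF-8 input is ISO-8859-1."""
    try:
        data.decode("utf-8")
        return data  # already UTF-8: converting again would double-encode
    except UnicodeDecodeError:
        # Every byte is a valid ISO-8859-1 character, so this cannot fail.
        return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    # Usage: python3 to_utf8.py file.txt > file-utf8.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(to_utf8(f.read()))
```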
1
vote
1 answer
31 views
Linux Mint doesn't write Arabic letters
I installed Arabic fonts for Linux Mint and I can switch between Arabic and English, but it seems that Mint cannot write Arabic letters, for example when renaming a file or writing in any text editor, when ...
2
votes
1 answer
51 views
Why can't I convert a UTF-8 to MS-ANSI using iconv?
I am trying to convert a file from UTF-8 to MS-ANSI.
I use
iconv -f UTF8 -t MS-ANSI// < data.txt
but get
iconv: illegal input sequence at position 171359
when looking into this
dd if=...
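glibc's iconv can print "illegal input sequence" both for bytes that are not valid UTF-8 and for characters that have no equivalent in the target character set, so it helps to find out which case applies. The Python sketch below does that. It assumes MS-ANSI is glibc's alias for Windows-1252 (Python's `cp1252` codec), and `diagnose` is a hypothetical name. If the culprit turns out to be an unmappable character, the usual workarounds are the GNU iconv target suffixes `//TRANSLIT` and `//IGNORE`, or its `-c` option.

```python
import sys


def diagnose(data: bytes) -> str:
    """Explain why UTF-8 `data` cannot be converted to Windows-1252."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte {err.start}"
    try:
        text.encode("cp1252")
    except UnicodeEncodeError as err:
        ch = text[err.start]
        # iconv reports byte offsets, so convert the character index.
        offset = len(text[:err.start].encode("utf-8"))
        return f"U+{ord(ch):04X} at byte {offset} has no Windows-1252 equivalent"
    return "convertible"


if __name__ == "__main__":
    # Usage: python3 diagnose.py data.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {diagnose(f.read())}")
```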
3
votes
1 answer
27 views
When installing Linux what factors go into choosing the locale for the server?
When I am installing Linux (for the GB locale) I am presented with the option of choosing en_GB, en_GB.UTF-8 and en_GB.ISO-8859-15.
What factors go into making the choice? As far as I know the ...
0
votes
1 answer
25 views
Unreadable lozenge characters in tty1
Something is wrong with my Debian Jessie installation. Usually I use tty7 with the GUI and everything is fine there. In tty1, though, Polish characters (both typed ones and ones read from UTF-8 files) are represented ...
0
votes
2 answers
65 views
Unable to display Greek letters on Mutt 1.7
I recently installed mutt on Linux Mint 18 using apt.
I configured it and it worked great for all three of my accounts.
Then I realized I had mutt version 1.5 while version 1.7 is the latest one.
I ...
8
votes
2 answers
172 views
How to set fallback encoding to UTF-8 in Firefox?
I've written a Norwegian markdown document:
$ file brukerveiledning.md
brukerveiledning.md: UTF-8 Unicode text
I've converted it to HTML using the markdown command:
$ markdown > brukerveiledning....
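The markdown command outputs an HTML fragment without a charset declaration, so when Firefox opens the file it has to guess the encoding. Its fallback in that situation is typically a legacy encoding; this is likely, but the excerpt does not confirm it. Declaring the encoding inside the document avoids depending on the fallback setting at all. The Python sketch below wraps the fragment in a minimal page with `<meta charset="utf-8">`. `wrap_fragment` and the template are hypothetical.

```python
import html
import sys

TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_fragment(fragment: str, title: str = "") -> str:
    """Embed an HTML fragment in a page that declares UTF-8."""
    return TEMPLATE.format(title=html.escape(title), body=fragment)


if __name__ == "__main__":
    # Usage: python3 wrap.py fragment.html page.html
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as f:
            page = wrap_fragment(f.read())
        with open(sys.argv[2], "w", encoding="utf-8") as f:
            f.write(page)
```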
1
vote
3 answers
74 views
Copy sql file over ssh with accents
I'm trying to migrate one database from server A to server B.
The database is MySQL. It has some records with characters like ç, ã, é, ...
The database encoding is UTF8.
So on server A I ...
1
vote
0 answers
12 views
ISO8859-1 characters appear as question marks via Samba
I have a CIFS mount looking like this:
rw,cache=loose,credentials=/etc/x.txt,uid=33,gid=33,file_mode=0660,dir_mode=0770,nofail
I am trying to rm -r /storage/somedir and it complains about "No such ...
1
vote
1 answer
106 views
Webmin help page encoding : iso-8859-1 vs utf-8
Webmin is serving static help pages.
Webmin 1.47 used iso-8859-1 as its character encoding. This information is transmitted in the HTTP header
content type:"Text/html; ...
3
votes
2 answers
732 views
Convert binary encoding that head and Notepad can read to UTF-8
I have a CSV file whose character set is reported as binary, but I have to convert it to UTF-8 to process it in HDFS (Hadoop).
I used the following command to check the character set:
file -bi filename.csv
Output:
...
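One frequent explanation, assumed rather than confirmed here, is that the file is UTF-16. Notepad reads UTF-16 happily, but UTF-16 text is full of NUL bytes, and without a byte-order mark `file` may not recognize it as text. If that is the case, `iconv -f UTF-16LE -t UTF-8` (or `-f UTF-16` when a BOM is present) would convert it. The Python sketch below applies the same idea: it checks for a BOM, falls back to a crude NUL-position heuristic, and re-encodes as UTF-8. `guess_utf16` and `csv_to_utf8` are hypothetical names.

```python
import codecs


def guess_utf16(data: bytes):
    """Return a codec name if `data` looks like UTF-16, else None."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads and strips the BOM
    sample = data[:4096]
    if len(sample) >= 2:
        # ASCII-range text in UTF-16LE has a NUL at every odd offset,
        # in UTF-16BE at every even offset.
        if sample[1::2].count(0) > len(sample) // 4:
            return "utf-16-le"
        if sample[0::2].count(0) > len(sample) // 4:
            return "utf-16-be"
    return None


def csv_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 data as UTF-8; return anything else unchanged."""
    codec = guess_utf16(data)
    return data.decode(codec).encode("utf-8") if codec else data
```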
1
vote
4 answers
111 views
I have a file called “¬” and I am confused
I have found a file called ¬ on an old legacy Solaris server and I am confused about how to interact with it on the command line (bash 2.05).
¬ has the same function on the command line as the home ...
0
votes
1 answer
229 views
Why did this file not convert to UTF-8 when using iconv? [duplicate]
Versions: Linux 2.6; Bash 4.1.2; iconv 2.12
The ISO conversion returned no errors, yet the converted file still shows as US-ASCII.
Question
How can I transcode foobar.txt to UTF-8?
$> file -bi ...
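`file` reports US-ASCII whenever every byte is below 0x80, and such a file is already valid UTF-8, byte for byte. Converting it to UTF-8 therefore leaves it unchanged, and `file` keeps saying US-ASCII until a non-ASCII character appears. A short Python check of that claim (`is_ascii` is a hypothetical name):

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the data is plain US-ASCII."""
    return all(b < 0x80 for b in data)


sample = b"foo,bar\n"
assert is_ascii(sample)
# Reading ASCII bytes as ISO-8859-1 and re-encoding them as UTF-8
# yields exactly the same bytes: there is nothing for iconv to change.
assert sample.decode("iso-8859-1").encode("utf-8") == sample

# Only a non-ASCII character makes the two encodings differ.
assert "\u00e9".encode("iso-8859-1") == b"\xe9"
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"
```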
3
votes
2 answers
115 views
Problem with reading text file encoded in Western encoding (ISO-8859-1)
I'm having a problem with the encoding of an ISO-8859-1 text file (subtitles in Polish), which looks something like this:
Mieszka³ sam,|¿adnej ¿ony, dzieci.
It should be: "Mieszkał sam, żadnej ...
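The pattern fits text that is really ISO-8859-2 (Latin-2) being displayed as ISO-8859-1. Byte 0xB3 is ³ in Latin-1 but ł in Latin-2, and byte 0xBF is ¿ in Latin-1 but ż in Latin-2. Reading the file as ISO-8859-2 (for example with `iconv -f ISO-8859-2 -t UTF-8`) should therefore fix it. Polish subtitles are also often Windows-1250. That code page agrees with Latin-2 on ł and ż but not on ą, ś or ź, so try `cp1250` if those letters come out wrong. The Python sketch below repairs text that has already been garbled. `fix_latin2` and `read_subtitles` are hypothetical names.

```python
def fix_latin2(garbled: str) -> str:
    """Undo ISO-8859-2 text that was wrongly decoded as ISO-8859-1."""
    # Recover the original bytes, then decode them with the right code page.
    return garbled.encode("iso-8859-1").decode("iso-8859-2")


def read_subtitles(path: str, encoding: str = "iso-8859-2") -> str:
    """Read a subtitle file with an explicit legacy encoding (or 'cp1250')."""
    with open(path, encoding=encoding) as f:
        return f.read()
```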
1
vote
1 answer
51 views
store file with invalid characters
Some files we get from a customer could not be processed properly because they were declared as US-ASCII but contained invalid characters. In order to validate a software fix, I am trying to copy ...
0
votes
1 answer
41 views
Unexpected appearance of â in man
In a few man pages I've often seen the character â appear instead of an apostrophe, for example in man who:
-T, -w, --mesg
add userâs message status as +, - or ?
Why would that ...
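This is the typical signature of a UTF-8 right single quotation mark (U+2019, bytes E2 80 99) being shown by something that assumes ISO-8859-1. Byte 0xE2 is â, while 0x80 and 0x99 are invisible C1 control characters. This usually points to a mismatch between the locale man runs in and the encoding the terminal expects, though that is an inference the excerpt does not confirm. A Python demonstration (`as_latin1` is a hypothetical name):

```python
def as_latin1(text: str) -> str:
    """Show how UTF-8 text looks to a program that assumes ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


shown = as_latin1("user\u2019s")  # U+2019 RIGHT SINGLE QUOTATION MARK
# 'â' (0xE2) followed by two C1 control characters most terminals do not display.
assert shown == "user\u00e2\u0080\u0099s"
print(ascii(shown))
```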
1
vote
1 answer
309 views
How to convert GBK to UTF-8 in a mixed encoding directory?
Background :
You can skip this section if you are not interested.
I usually back up my cell phone's MicroSD card with the command sudo dd if=/dev/sdc1 of=~/Document/Cell\ Phone\ Files/...
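GBK-encoded names are almost never valid UTF-8, so one approach is to try UTF-8 first and fall back to GBK only for the names that fail. The Python sketch below works under that assumption; `decode_name` and `fix_directory` are hypothetical names. It works on raw bytes from `os.walk` and renames bottom-up, so directories are handled after their contents. It also refuses to overwrite existing entries. From the shell, `convmv -r -f gbk -t utf8 --notest` is an alternative.

```python
import os
import sys


def decode_name(raw: bytes) -> str:
    """Decode a file name that is either UTF-8 or, failing that, GBK."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("gbk")  # raises if it is not GBK either


def fix_directory(root: bytes) -> None:
    """Rename GBK-encoded entries below `root` to UTF-8."""
    # topdown=False visits children before their parent directory,
    # so renaming a directory never invalidates paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for raw in dirnames + filenames:
            new = decode_name(raw).encode("utf-8")
            if new == raw:
                continue  # already UTF-8
            src, dst = os.path.join(dirpath, raw), os.path.join(dirpath, new)
            if os.path.lexists(dst):
                print(f"skipping {src!r}: target exists")
                continue
            os.rename(src, dst)


if __name__ == "__main__":
    # Usage: python3 fix_gbk.py DIRECTORY...
    for arg in sys.argv[1:]:
        fix_directory(os.fsencode(arg))
```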
11
votes
2 answers
488 views
Which character encodings are supported by posix?
POSIX defines the behavior of tools such as grep, awk, sed, etc which work against text files.
Since these are text files, I think character encoding must be an issue.
Question:
What is the ...
2
votes
1 answer
161 views
Fish shell shows dark-grey “⎔ characters in prompt
I'm pretty certain this is some foible of my SSH client (RoyalTS for Windows) but, having just installed and changed to fish shell, my prompt is preceded by two dark-grey ⎠characters.
It doesn't ...
1
vote
1 answer
26 views
Understanding what is happening when I dump a terminal character sequence with Ctrl-v?
If I want to bind a key-mapping to a function or widget in zsh I have learnt that I first have to hit Ctrl+v - at a prompt, then enter the key sequence I want to use, then use the output in my key-...
3
votes
2 answers
2k views
UTF-8 characters are not displayed correctly in Debian
Short description of my problem:
I ran into an issue lately where I am unable to make bash/nano/irssi/etc display "special" UTF-8 characters like the German umlauts (äüö), the euro sign (€) and some ...
0
votes
0 answers
95 views
Special characters are not recognized even with UTF-8 in KDE
So, I have Manjaro KDE pre 16 and I have some character encoding issues. Characters like ã, á and similar are not recognised and are represented by the ? sign.
And all programs seem to use this encoding ...
0
votes
0 answers
262 views
Fix broken Windows Cyrillic filename encoding
I have the following folder:
$ ls -1 --color=never
??
??¨??_?¥¬®áâà æ¨®ë© ¬ â¥à¨ « ¨ ¢ë¡®àª¨
?१¥â 樨
??¨??_?á®¢ë¥ ¨áâ®ç¨ª¨
?ª§ ¬¥
??¨??_?§ ??¨?? £®á. íª§ ¬¥
_ТВиМС_13-14.zip
As you ...
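Parts of the listing look like Cyrillic names stored in CP866 (the DOS Cyrillic code page) and displayed as ISO-8859-1. For example, "¬ â¥à¨ « ¨ ¢ë¡®àª¨" is exactly "материал и выборки" in CP866. In that fragment, the two space-like characters inside the first word are byte 0xA0, which is "а" in CP866. Other fragments, such as "१" and "樨", do not fit this pattern, so some names may have been mangled more than once. Treat this as a hypothesis to test on a copy. The Python sketch below assumes the raw name bytes on disk are CP866. `recover` and `cp866_name_to_utf8` are hypothetical helpers.

```python
def recover(garbled: str) -> str:
    """Recover Cyrillic text whose CP866 bytes were displayed as ISO-8859-1."""
    return garbled.encode("iso-8859-1").decode("cp866")


def cp866_name_to_utf8(raw: bytes) -> bytes:
    """Convert a raw CP866 file name to UTF-8, leaving UTF-8 names untouched."""
    try:
        raw.decode("utf-8")
        return raw  # e.g. "_ТВиМС_13-14.zip" is already fine
    except UnicodeDecodeError:
        return raw.decode("cp866").encode("utf-8")


# Fragment of the second name in the listing (0xA0 bytes written as \u00a0).
fragment = "\u00ac\u00a0\u00e2\u00a5\u00e0\u00a8\u00a0\u00ab \u00a8 \u00a2\u00eb\u00a1\u00ae\u00e0\u00aa\u00a8"
assert recover(fragment) == "материал и выборки"
```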
1
vote
0 answers
63 views
certain characters get converted when adding to file
I have a script I am writing, that includes passing across a connection string to socat.
When I paste this into the command line it is fine, but when I open a file and paste it into there (pico ...
0
votes
1 answer
247 views
How can I get rid of the byte-order mark 
I have created an HTML web page, and when I run it there are some unwanted characters at the bottom of the page. I tried to get rid of them in vi using set nobomb, but they still appear in the ...
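In vim, `:set nobomb` only takes effect when the file is written again (for example `:set nobomb` followed by `:w`), and only for that one file. If the stray characters appear at the bottom of the page, the BOM may well come from a second file that is included or appended; this is an assumption. The Python sketch below strips UTF-8 BOMs wherever they occur in the bytes, and `strip_boms` is a hypothetical name. U+FEFF in the middle of text is rarely intentional, but note that this removes it unconditionally.

```python
import sys

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_boms(data: bytes) -> bytes:
    """Remove every UTF-8 byte-order mark, not just a leading one."""
    return data.replace(UTF8_BOM, b"")


if __name__ == "__main__":
    # Usage: python3 strip_boms.py FILE...  (rewrites the files in place)
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        cleaned = strip_boms(data)
        if cleaned != data:
            with open(path, "wb") as f:
                f.write(cleaned)
```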
1
vote
1 answer
595 views
Octals 302 240 together seem to correspond to non-breaking space
By looking at a particular line of a text file (say, the 1123rd, see below), it seems that there is a non-breaking space, but I am not sure:
$ cat myfile.csv | sed -n 1123p | cut -f2
Lisztes feher
$ ...
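Octal 302 240 is hex C2 A0, which is the UTF-8 encoding of U+00A0 NO-BREAK SPACE, so the guess is right. The Python sketch below confirms the decoding and replaces such characters with ordinary spaces. `replace_nbsp` is a hypothetical name, and the sample bytes are made up for illustration.

```python
import unicodedata

raw = bytes([0o302, 0o240])  # the bytes an octal dump shows as 302 240
assert raw == b"\xc2\xa0"
ch = raw.decode("utf-8")
assert ch == "\u00a0" and unicodedata.name(ch) == "NO-BREAK SPACE"


def replace_nbsp(data: bytes) -> bytes:
    """Replace UTF-8 no-break spaces with plain ASCII spaces."""
    return data.replace(b"\xc2\xa0", b" ")


print(replace_nbsp(b"Lisztes\xc2\xa0feher"))  # b'Lisztes feher'
```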
2
votes
1 answer
123 views
Why is this find command not returning filenames containing non-ASCII characters only?
I'm trying to determine the root cause of why this find command is not working; it shouldn't match the file called this_should_not_match below:
$ > find . -type f -name "*[^ -~]*"
./__º╚t
./...
4
votes
1 answer
279 views
Is it possible to convert linux salted sha512 password hash to LDAP format?
We have an LDAP server which stores passwords and other user data.
The server is not used for authentication of client machines though but only for authentication of client apps.
So users change their ...
2
votes
0 answers
140 views
how to copy a piece of a text file byte-by-byte to another text file? dd, head, or?
I need to grab the first lines of a long text file for some bugfixing on a smaller file (a Python script does not digest the large text file as intended). However, for the bugfixing to make any sense, ...
3
votes
3 answers
410 views
does head input > output copy all invisible characters to the new file?
I need to grab the first lines of a long text file for some bugfixing on a smaller file (a Python script does not digest the large text file as intended). However, for the bugfixing to make any sense, ...
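head works on bytes and does not reinterpret the encoding, so BOMs, carriage returns and other invisible characters in the copied lines should come through unchanged. Running `cmp` against the original should then report only that it reached EOF on the shorter file. For comparison, the Python sketch below makes the same byte-exact copy by opening both files in binary mode. `copy_head` is a hypothetical name.

```python
import itertools
import sys


def copy_head(src: str, dst: str, n: int) -> None:
    """Copy the first n lines of src to dst without decoding or altering bytes."""
    # Binary mode splits lines on b"\n" only; CR, BOM and NBSP bytes are kept.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(itertools.islice(fin, n))


if __name__ == "__main__":
    # Usage: python3 copy_head.py big.txt small.txt 1000
    if len(sys.argv) == 4:
        copy_head(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```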
2
votes
0 answers
49 views
Generating 'ASCII-art' banners with arrows
I recently discovered figlet which generates ASCII-art banners. Joy!
... but, alas, I want a banner with an arrow on it. Now...
$ figlet unicode → arrow
_ _ //\ ...
0
votes
0 answers
355 views
unzipping files with non-ASCII characters in the file names
On the Linux distribution that I use, I have been frustrated with the unzip command since every time I have unzipped files where the file names contain non-ASCII characters, the command mangles the ...
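ZIP archives record whether a member name is UTF-8 with general-purpose flag bit 11 (0x800). When the flag is absent, the name is just bytes in whatever code page the creating tool used, and Python's `zipfile` decodes such names as CP437. Re-encoding to CP437 recovers the original bytes, which can then be decoded with the right code page. The sketch below assumes you know or can guess that code page, for instance `cp866` for Russian archives or `gbk` for Chinese ones. `real_name` and `extract_all` are hypothetical helpers. On Python 3.11 and later, `zipfile.ZipFile(path, metadata_encoding=...)` does the same job directly.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: member name is UTF-8


def real_name(info: zipfile.ZipInfo, codepage: str) -> str:
    """Return the member name, re-decoded with `codepage` if it is not UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # zipfile decoded the raw bytes as CP437; undo that, then decode properly.
    return info.filename.encode("cp437").decode(codepage)


def extract_all(archive: str, dest: str, codepage: str) -> None:
    """Extract every member under its correctly decoded name."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            info.filename = real_name(info, codepage)
            zf.extract(info, dest)  # extract() still sanitizes the path
```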
1
vote
1 answer
39 views
USB LABEL in different language
If the USB LABEL (name) is in a different language (Hindi or Chinese), how do I find out which encoding it uses so that I can pass it to iconv? Or is there any way we can know the language which has been ...
5
votes
1 answer
32 views
Can I set both (in and out) charsets in `less`?
I can tell less to output characters in UTF-8:
export LESSCHARSET=UTF-8
But then it tries to read files as UTF-8 as well.
Can I tell it to read files as ISO-8859-2 (latin2) but display them as UTF-...
5
votes
1 answer
209 views
How to fix file name encoding
I scraped a site with wget.
That site is in German and some of its pages had Ü,ü,Ö,ö,Ä,ä,ß in the URL.
Now some files have a very weird name.
For example one file is called mirror.de/�%...
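Names like that are often percent-escapes from the URL, sometimes mixed with raw bytes, that were never decoded. The bytes underneath can be UTF-8 or ISO-8859-1 depending on the site. The Python sketch below works on raw file-name bytes: it undoes the percent-encoding, then tries UTF-8 before falling back to Latin-1. `decode_wget_name` is a hypothetical name, and the assumption that the site used one of these two encodings is unverified. The sketch only prints the proposed names and does not rename anything.

```python
import os
import sys
from urllib.parse import unquote_to_bytes


def decode_wget_name(raw: bytes) -> str:
    """Turn a percent-encoded file name into text, guessing UTF-8 then Latin-1."""
    data = unquote_to_bytes(raw)  # b"Gr%C3%BC" -> b"Gr\xc3\xbc"
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")  # never fails: every byte is valid


if __name__ == "__main__":
    # Usage: python3 wget_names.py DIRECTORY...  (prints proposed names only)
    for root in sys.argv[1:]:
        for name in os.listdir(os.fsencode(root)):
            print(repr(name), "->", ascii(decode_wget_name(name)))
```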
5
votes
2 answers
1k views
Specify encoding with libreoffice --convert-to csv
Excel files can be converted to CSV using:
$ libreoffice --convert-to csv --headless --outdir dir file.xlsx
Everything appears to work just fine. The encoding, though, is set to something wonky. ...
0
votes
0 answers
109 views
Iceweasel content encoding errors (caused by Debian 8.3 upgrade)
I recently updated my system from Debian Mate 8.2 + Iceweasel 38.5 to Debian Mate 8.3 + Iceweasel 38.6
Almost immediately after this, I noticed that my browsing was considerably slower but I thought it ...
1
vote
1 answer
144 views
Encoding of /proc/<pid>/cmdline files
I am thinking about working with the /proc/<pid>/cmdline files, but I couldn't find any documentation about the file encoding. The only piece of information I could find is located in the man ...
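The proc(5) man page describes cmdline as the arguments separated by NUL bytes, with a further NUL after the last one. The kernel stores whatever bytes the process was started with and applies no encoding, so decoding is up to the reader. The Python sketch below (Linux only) uses `os.fsdecode`, whose surrogateescape handling lets undecodable bytes round-trip. `read_cmdline` is a hypothetical name.

```python
import os


def read_cmdline(pid: int) -> list[str]:
    """Return the argument vector of a process (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        data = f.read()
    # Arguments are NUL-terminated raw bytes; the kernel applies no encoding.
    args = data.split(b"\0")
    if args and args[-1] == b"":
        args.pop()  # drop the empty piece after the final terminator
    # fsdecode uses the filesystem encoding with surrogateescape, so even
    # invalid bytes survive a round trip through os.fsencode.
    return [os.fsdecode(a) for a in args]


if __name__ == "__main__":
    print(read_cmdline(os.getpid()))
```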
0
votes
0 answers
42 views
info to get using last digits of ip address
What is the product/registered owners id based on the last digits of
IP#: 85ff:fec9:d41b
My facebook account shows a login from that IP# and I am accessing the owner info and login info for my own ...
5
votes
5 answers
806 views
Rename folder with odd characters
I have a folder on my Mac called "␀␀␀␀HFS+ Private Data". I'm trying to delete it but it contains a bunch of odd characters that are choking unlink, rm and mv, making it difficult to remove it and ...
13
votes
3 answers
730 views
How to convert an emoticon specified by a U+xxxxx code to utf-8?
Emoticons seem to be specified using a format of U+xxxxx
wherein each x is a hexadecimal digit.
For example, U+1F615 is the official Unicode Consortium code for the "confused face" 😕
As I am ...
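In Python the conversion is just `chr(int("1F615", 16)).encode("utf-8")`. The sketch below spells out the UTF-8 bit layout from RFC 3629 by hand, to show where the bytes F0 9F 98 95 come from. `parse_u_notation` and `utf8_encode` are hypothetical names.

```python
def parse_u_notation(s: str) -> int:
    """Parse 'U+1F615' into an integer code point."""
    if not s.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX, got {s!r}")
    return int(s[2:], 16)


def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 using the RFC 3629 bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


encoded = utf8_encode(parse_u_notation("U+1F615"))  # CONFUSED FACE
assert encoded == b"\xf0\x9f\x98\x95" == chr(0x1F615).encode("utf-8")
print(encoded.hex(" "))  # f0 9f 98 95
```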
2
votes
1 answer
5k views
How can I correctly decompress a ZIP archive of files with Hebrew names?
(Question self-migrated from superuser.com)
Someone sent me a ZIP file containing files with Hebrew names (and created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome ...
1
vote
0 answers
327 views
curl --data-urlencode and underscores
I've been using curl -XPOST to post some links to a Telegram channel via a bot api/key, the urls are in form of https://site/x/pre_encoded_string, where pre_encoded_string is in form (real samples) ...
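The underscore is one of RFC 3986's unreserved characters (letters, digits, `-`, `.`, `_`, `~`), so a conforming encoder leaves it alone. What does change when an already-encoded string is encoded again is the `%` sign, which becomes `%25`. Assuming that double encoding is what happens here, Python's `urllib.parse.quote` shows the effect; this illustrates the general rule, not curl's exact implementation. If the string is already encoded, passing it with plain `--data` instead of `--data-urlencode` should avoid encoding it twice.

```python
from urllib.parse import quote, unquote

pre_encoded = "pre_encoded%2Fstring"  # hypothetical sample value

# Underscores survive; the existing %2F is encoded a second time as %252F.
assert quote(pre_encoded, safe="") == "pre_encoded%252Fstring"

# Decoding first (or not encoding at all) avoids the double encoding.
assert quote(unquote(pre_encoded), safe="") == "pre_encoded%2Fstring"
```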