Questions about the various representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC. These issues often come up when moving files between operating systems that encode line endings with carriage returns and/or newline (line feed) characters.

1
vote
1answer
29 views

How to find non-British non-ASCII non-LaTeX characters for pdftex?

I am debugging my TeX file by eliminating every technical flaw in the system. Neither the TeX community here nor I can find anything wrong in my document, so I think there may be something non-ASCII ...
4
votes
2answers
174 views

Replace “/U+[0-9A-Fa-f]{4}/” with proper unicode character in shell pipeline with sed eval flag

I am trying to properly visualize the existing characters that are listed in the /usr/include/X11/keysymdef.h file. It has lines like: #define XK_onethird 0x0ab0 /* U+2153 VULGAR FRACTION ONE THIRD *...
3
votes
2answers
23 views

fold and text columns

Can fold be set to recognize characters instead of bytes? Traditional Chinese characters appear to be encoded in three bytes each (in UTF-8 at least), which means that if fold's -w is not a multiple ...
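The byte-versus-character mismatch described in this excerpt can be illustrated with a minimal Python sketch. Python, the sample string, and the helper name `fold_chars` are assumptions made for illustration only; this is not a replacement for fold and it ignores display width.

```python
def fold_chars(text: str, width: int) -> list[str]:
    """Split text into chunks of `width` characters (code points), not bytes."""
    return [text[i:i + width] for i in range(0, len(text), width)]

sample = "繁體中文"  # four Traditional Chinese characters (illustrative sample)

# Each of these characters takes three bytes in UTF-8.
char_count = len(sample)
byte_count = len(sample.encode("utf-8"))
print(char_count, byte_count)

# Folding by characters never splits a multi-byte sequence.
print(fold_chars(sample, 3))
```

Splitting the encoded bytes at a width that is not a multiple of three would cut a character in half, which is the situation the question describes.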
1
vote
2answers
37 views

How to determine the character encoding that a terminal uses in a C/C++ program?

I've noticed that SyncTERM uses a different character encoding than the default MacOS terminal emulator, and they're incompatible with one another. For example, say you want to print a block ...
1
vote
2answers
45 views

Print a character having a codepoint

I have a list of codepoints like 0x13000, 0x1300A. I have to print the corresponding Unicode characters from bash. I've already tried to do it with other commands that I've found searching in the ...
12
votes
2answers
898 views

Wget returning binary instead of html?

I am using wget to download a static html page. The W3C Validator tells me the page is encoded in UTF-8. Yet when I cat the file after download, I get a bunch of binary nonsense. I'm on Ubuntu, and ...
4
votes
2answers
77 views

Find files by character encoding

I have a long-running python script that failed to utf-8 decode a file. The error message doesn't tell me what file it failed on, only that it couldn't decode byte 0x81 in position 194. I know which ...
1
vote
1answer
32 views

How to convert interrogation mark character in accented letters

I have a file containing accented letters. With cat, I get question marks: voltm�tre. With less, I get the accented letters in brackets: voltm<E8>tre. With vim, I get the accented letter ...
1
vote
1answer
31 views

Linux Mint doesn't write Arabic letters

I installed Arabic fonts for Linux Mint and I can switch between Arabic and English, but it seems that Mint cannot write Arabic letters, for example renaming a file or writing in any text editor, when ...
2
votes
1answer
51 views

Why can't I convert a UTF-8 to MS-ANSI using iconv?

I am trying to convert a file from utf-8 to ms-ansi. I use iconv -f UTF8 -t MS-ANSI// < data.txt but get iconv: illegal input sequence at position 171359 when looking into this dd if=...
3
votes
1answer
27 views

When installing Linux what factors go into choosing the locale for the server?

When I am installing Linux (for the GB locale) I am presented with the option of choosing en_GB, en_GB.UTF-8 and en_GB.ISO-8859-15. What factors go into making the choice? As far as I know the ...
0
votes
1answer
25 views

Unreadable lozenge characters in tty1

Something wrong is going on in my Debian Jessie. Usually I use tty7 with GUI and everything is fine here. In tty1 though, Polish characters (both being typed and read from UTF-8 files) are represented ...
0
votes
2answers
65 views

Unable to display Greek letters on Mutt 1.7

I recently installed mutt on Linux Mint 18 using apt. I configured it and it worked great for all my three accounts. Then I realized I had mutt version 1.5 while version 1.7 is the latest one. I ...
8
votes
2answers
172 views

How to set fallback encoding to UTF-8 in Firefox?

I've written a Norwegian markdown document: $ file brukerveiledning.md brukerveiledning.md: UTF-8 Unicode text I've converted it to HTML using the markdown command: $ markdown > brukerveiledning....
1
vote
3answers
74 views

Copy sql file over ssh with accents

I'm trying to migrate one database from server A to server B. The database is MySQL. This database has some records with characters like ç, ã, é, ... The database encoding is UTF8. So on server A I ...
1
vote
0answers
12 views

ISO8859-1 characters appear as question marks via Samba

I have a CIFS mount looking like this: rw,cache=loose,credentials=/etc/x.txt,uid=33,gid=33,file_mode=0660,dir_mode=0770,nofail I am trying to rm -r /storage/somedir and it complains about "No such ...
1
vote
1answer
106 views

Webmin help page encoding : iso-8859-1 vs utf-8

Webmin is serving static help pages. Webmin 1.47 was using iso-8859-1 as its character encoding. This information is transmitted by the HTTP header content type:"Text/html; ...
3
votes
2answers
732 views

Convert binary encoding that head and Notepad can read to UTF-8

I have a CSV file which is in a binary character set, but I have to convert it to UTF-8 to process it in HDFS (Hadoop). I used the command below to check the character set: file -bi filename.csv Output : ...
1
vote
4answers
111 views

I have a file called “¬” and I am confused

I have found a file called ¬ on an old legacy Solaris server and I am confused about how to interact with it on the command line (bash 2.05). ¬ has the same function on the command line as the home ...
0
votes
1answer
229 views

Why did this file not convert to UTF-8 when using iconv? [duplicate]

Versions: Linux 2.6; Bash 4.1.2; iconv 2.12 The ISO conversion returned no errors, yet the converted file still shows as US-ASCII. Question How can I transcode foobar.txt to UTF-8? $> file -bi ...
3
votes
2answers
115 views

Problem with reading text file encoded in Western encoding (ISO-8859-1)

I'm having a problem with the encoding of an ISO-8859-1 text file (subtitles in Polish), which looks something like this: Mieszka³ sam,|¿adnej ¿ony, dzieci. It should be: "Mieszkał sam, żadnej ...
1
vote
1answer
51 views

store file with invalid characters

Some files we get from a customer could not be processed properly because they were declared as US-ASCII but contained invalid characters. In order to validate a software fix, I am trying to copy ...
0
votes
1answer
41 views

Unexpected appearance of â in man

In a few of the man pages I've often seen the character â appear instead of an apostrophe. For example, in man who: -T, -w, --mesg add userâs message status as +, - or ? Why would that ...
1
vote
1answer
309 views

How to convert GBK to UTF-8 in a mixed encoding directory?

Background: You can skip this section if you are not interested. I usually back up the MicroSD card of my cell phone with the command sudo dd if=/dev/sdc1 of=~/Document/Cell\ Phone\ Files/...
11
votes
2answers
488 views

Which character encodings are supported by posix?

POSIX defines the behavior of tools such as grep, awk, sed, etc., which work on text files. Since these are text files, I think the problem of character encoding arises. Question: What is the ...
2
votes
1answer
161 views

Fish shell shows dark-grey “⎔ characters in prompt

I'm pretty certain this is some foible of my SSH client (RoyalTS for Windows) but, having just installed and changed to fish shell, my prompt is preceded by two dark-grey ⎠characters. It doesn't ...
1
vote
1answer
26 views

Understanding what is happening when I dump a terminal character sequence with Ctrl-v?

If I want to bind a key-mapping to a function or widget in zsh I have learnt that I first have to hit Ctrl+v - at a prompt, then enter the key sequence I want to use, then use the output in my key-...
3
votes
2answers
2k views

UTF-8 characters are not displayed correctly in Debian

Short description of my problem: I ran into an issue lately where I am unable to make bash/nano/irssi/etc display "special" UTF-8 characters like the German umlauts (äüö), the euro sign (€) and some ...
0
votes
0answers
95 views

Special characters are not recognized even with UTF-8 in KDE

So, I have Manjaro KDE pre 16 and I have some character encoding issues. Characters like ã á and such are not recognised and are represented by the ? sign. And all programs seem to use this encoding ...
0
votes
0answers
262 views

Fix broken windows cyrillic filename encoding

I have the following folder: $ ls -1 --color=never ?? ??¨??_?¥¬®­áâà æ¨®­­ë© ¬ â¥à¨ « ¨ ¢ë¡®àª¨ ?१¥­â æ¨¨ ??¨??_?á­®¢­ë¥ ¨áâ®ç­¨ª¨ ?ª§ ¬¥­ ??¨??_?§ ??¨?? ­ £®á. íª§ ¬¥­ _ТВиМС_13-14.zip As you ...
1
vote
0answers
63 views

certain characters get converted when adding to file

I have a script I am writing, that includes passing across a connection string to socat. When I paste this into the command line it is fine, but when I open a file and paste it into there (pico ...
0
votes
1answer
247 views

How can I get rid of the byte-order mark 

I have created an HTML webpage. When I load it, there are some unwanted characters at the bottom of the page:  . I tried to get rid of them in vi using set nobomb, but they still appear in the ...
1
vote
1answer
595 views

Octals 302 240 together seem to correspond to non-breaking space

By looking at a particular line of a text file (say, the 1123th, see below), it seems that there is a non-breaking space, but I am not sure: $ cat myfile.csv | sed -n 1123p | cut -f2 Lisztes feher $ ...
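The reading in this question's title can be checked directly. A minimal Python sketch follows; Python is an assumption here, since the excerpt itself only shows shell commands, and the sketch assumes the file is UTF-8 encoded.

```python
import unicodedata

# The two bytes from the question title, written in octal.
raw = bytes([0o302, 0o240])

# Decode the pair as UTF-8 (assumption: the file really is UTF-8).
char = raw.decode("utf-8")

# Print the code point and its official Unicode name.
print(f"U+{ord(char):04X}", unicodedata.name(char))
```

Under that assumption, the two bytes decode to a single code point, U+00A0 NO-BREAK SPACE.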
2
votes
1answer
123 views

Why is this find command not returning filenames containing non-ASCII characters only?

I'm trying to determine the root cause of why this find command is not working; it shouldn't match the file called this_should_not_match below: $ > find . -type f -name "*[^ -~]*" ./__º╚t ./...
4
votes
1answer
279 views

Is it possible to convert linux salted sha512 password hash to LDAP format?

We have an LDAP server which stores passwords and other user data. The server is not used for authentication of client machines though but only for authentication of client apps. So users change their ...
2
votes
0answers
140 views

how to copy a piece of a text file byte-by-byte to another text file? dd, head, or?

I need to grab the first lines of a long text file for some bugfixing on a smaller file (a Python script does not digest the large text file as intended). However, for the bugfixing to make any sense, ...
3
votes
3answers
410 views

does head input > output copy all invisible characters to the new file?

I need to grab the first lines of a long text file for some bugfixing on a smaller file (a Python script does not digest the large text file as intended). However, for the bugfixing to make any sense, ...
2
votes
0answers
49 views

Generating 'ASCII-art' banners with arrows

I recently discovered figlet which generates ASCII-art banners. Joy! ... but, alas, I want a banner with an arrow on it. Now... $ figlet unicode → arrow _ _ //\ ...
0
votes
0answers
355 views

unzipping files with non-ASCII characters in the file names

On the Linux distribution that I use, I have been frustrated with the unzip command since every time I have unzipped files where the file names contain non-ASCII characters, the command mangles the ...
1
vote
1answer
39 views

USB LABEL in different language

If the USB LABEL (name) is in a different language (Hindi or Chinese), how do I find out which language it is encoded in so I can pass that to iconv, or is there any way we can know the language which has been ...
5
votes
1answer
32 views

Can I set both (in and out) charsets in `less`?

I can tell less to output characters in UTF-8: export LESSCHARSET=UTF-8 But then it tries to read files as UTF-8 as well. Can I tell it to read files as ISO-8859-2 (latin2) but display them as UTF-...
5
votes
1answer
209 views

How to fix file name encoding

I scraped a site with wget. The site is in German and some of its pages had Ü, ü, Ö, ö, Ä, ä, ß in the URL. Now some files have very weird names. For example, one file is called mirror.de/�%...
5
votes
2answers
1k views

Specify encoding with libreoffice --convert-to csv

Excel files can be converted to CSV using: $ libreoffice --convert-to csv --headless --outdir dir file.xlsx Everything appears to work just fine. The encoding, though, is set to something wonky. ...
0
votes
0answers
109 views

Iceweasel content encoding errors (caused by Debian 8.3 upgrade)

I recently updated my system from Debian Mate 8.2 + Iceweasel 38.5 to Debian Mate 8.3 + Iceweasel 38.6. Almost immediately after this, I noticed that my browsing was considerably slower but I thought it ...
1
vote
1answer
144 views

Encoding of /proc/<pid>/cmdline files

I am thinking about working with the /proc/<pid>/cmdline files, but I couldn't find any documentation about the file encoding. The only piece of information I could find is located in the man ...
0
votes
0answers
42 views

info to get using last digits of ip address

What is the product/registered owner's ID based on the last digits of IP#: 85ff:fec9:d41b? My Facebook account shows a login from that IP# and I am accessing the owner info and login info for my own ...
5
votes
5answers
806 views

Rename folder with odd characters

I have a folder on my Mac called "␀␀␀␀HFS+ Private Data". I'm trying to delete it but it contains a bunch of odd characters that are choking unlink, rm and mv, making it difficult to remove it and ...
13
votes
3answers
730 views

How to convert an emoticon specified by a U+xxxxx code to utf-8?

Emoticons seem to be specified using a format of U+xxxxx wherein each x is a hexadecimal digit. For example, U+1F615 is the official Unicode Consortium code for the "confused face" 😕 As I am ...
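As a hedged illustration of the conversion this question asks about, the Python sketch below parses the U+xxxxx notation and encodes the resulting code point as UTF-8. Python is an assumption (the question does not name a tool), and the helper name `codepoint_to_utf8` is hypothetical.

```python
def codepoint_to_utf8(notation: str) -> bytes:
    """Convert a 'U+1F615'-style string to its UTF-8 byte sequence."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"expected U+xxxx notation, got {notation!r}")
    code_point = int(notation[2:], 16)  # hex digits after the 'U+' prefix
    return chr(code_point).encode("utf-8")

encoded = codepoint_to_utf8("U+1F615")
print(encoded.hex(" "))         # f0 9f 98 95
print(encoded.decode("utf-8"))  # the "confused face" character itself
```

The four-byte result reflects that code points above U+FFFF need four bytes in UTF-8.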
2
votes
1answer
5k views

How can I correctly decompress a ZIP archive of files with Hebrew names?

(Question self-migrated from superuser.com) Someone sent me a ZIP file containing files with Hebrew names (and created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome ...
1
vote
0answers
327 views

curl --data-urlencode and underscores

I've been using curl -XPOST to post some links to a Telegram channel via a bot api/key. The URLs are of the form https://site/x/pre_encoded_string, where pre_encoded_string is of the form (real samples) ...