I want to decode URL encoding. Is there a built-in tool for this, or could anyone provide a sed script that does it?

I did search a bit through unix.stackexchange.com and on the internet but I couldn't find any command line tool for decoding url encoding.

I simply want to edit a text file in place so that:

  • %21 becomes !
  • %23 becomes #
  • %24 becomes $
  • %26 becomes &
  • %27 becomes '
  • %28 becomes (
  • %29 becomes )

And so on.


6 Answers


These Python 2 one-liners do what you want:

$ alias urldecode='python -c "import sys, urllib as ul; \
    print ul.unquote_plus(sys.argv[1])"'

$ alias urlencode='python -c "import sys, urllib as ul; \
    print ul.quote_plus(sys.argv[1])"'

Example

$ urldecode 'q+werty%3D%2F%3B'
q werty=/;

$ urlencode 'q werty=/;'
q+werty%3D%2F%3B
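
On systems where python is Python 3, those aliases fail, because the functions moved to urllib.parse and print became a function. A minimal sketch of the Python 3 equivalents follows; the function names urldecode and urlencode are illustrative and simply mirror the alias names.

```python
from urllib.parse import quote_plus, unquote_plus


def urldecode(text: str) -> str:
    """Decode %HH escapes and turn '+' into a space (form-style decoding)."""
    return unquote_plus(text)


def urlencode(text: str) -> str:
    """Encode reserved characters as %HH and spaces as '+'."""
    return quote_plus(text)


print(urldecode("q+werty%3D%2F%3B"))
print(urlencode("q werty=/;"))
```

If a literal + in the input must survive decoding, urllib.parse.unquote is probably the better choice, since it leaves + alone.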


There is a built-in function for that in the Python standard library. In Python 2, it's urllib.unquote.

decoded_url=$(python2 -c 'import sys, urllib; print urllib.unquote(sys.argv[1])' "$encoded_url")

Or to process a file:

python2 -c 'import sys, urllib; print urllib.unquote(sys.stdin.read())' <file >file.new &&
mv -f file.new file

In Python 3, it's urllib.parse.unquote.

decoded_url=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$encoded_url")

Or to process a file:

python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' <file >file.new &&
mv -f file.new file
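
The same write-then-rename pattern can be kept inside Python. This is a rough sketch under a few assumptions: the file is UTF-8 text, and the helper name decode_file_in_place is made up for illustration.

```python
import os
import tempfile
from urllib.parse import unquote


def decode_file_in_place(path: str) -> None:
    """Percent-decode a text file, replacing it only after a successful write."""
    with open(path, encoding="utf-8") as src:
        decoded = unquote(src.read())
    # Create the temporary file next to the target so os.replace stays on
    # one filesystem. Note that mkstemp creates it with mode 0600, so the
    # replaced file ends up with those permissions.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst:
            dst.write(decoded)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Unlike the print-based one-liners, this writes the decoded text as-is and does not append an extra trailing newline.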

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to process a file:

perl -pli -MURI::Escape -e '$_ = uri_unescape($_)' file

If you want to stick to POSIX portable tools, it's awkward, because the only serious candidate is awk, which doesn't parse hexadecimal numbers. See Using awk printf to urldecode text for examples with common awk implementations, including BusyBox.
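
To make the hexadecimal step concrete, here is a hand-rolled decoder in Python that does not rely on urllib. It is only a sketch: the name percent_decode is hypothetical, input is assumed to be UTF-8, and malformed escapes are left untouched.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode(text: str) -> str:
    """Decode %HH escapes by hand; malformed escapes pass through unchanged."""
    raw = text.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        pair = raw[i + 1:i + 3]
        is_escape = (
            raw[i] == 0x25  # ord("%")
            and len(pair) == 2
            and all(b in HEX_DIGITS for b in pair)
        )
        if is_escape:
            # This is the hex-to-number conversion that POSIX awk lacks.
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(raw[i])
            i += 1
    # Decode once at the end so multi-byte sequences are reassembled.
    return out.decode("utf-8", errors="replace")
```

Collecting bytes first and decoding at the end is what lets a sequence such as %C3%A9 come out as a single character.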


If you simply want to use sed, use the following command (note that & must be escaped in the replacement, because an unescaped & stands for the matched text):

sed -e 's/%21/!/g' -e 's/%23/#/g' -e 's/%24/$/g' -e 's/%26/\&/g' -e s/%27/"'"/g -e 's/%28/(/g' -e 's/%29/)/g'

But it is more convenient to create a sed script (say, sedscript):

#!/bin/sed -f
s/%21/!/g
s/%23/#/g
s/%24/$/g
s/%26/\&/g
s/%27/'/g
s/%28/(/g
s/%29/)/g

Then run sed -f sedscript <old >new, which writes the decoded text to new.


For convenience, a urlencode command is also available in the gridsite-clients package, which can be installed with sudo apt-get install gridsite-clients on Ubuntu/Debian systems.

NAME
       urlencode - convert strings to or from URL-encoded form

SYNOPSIS
       urlencode [-m|-d] string [string ...]

DESCRIPTION
       urlencode encodes strings according to RFC 1738.

       That is, characters A-Z a-z 0-9 . _ and - are passed through unmodified, but all other characters
       are represented as %HH, where HH is their two-digit upper-case hexadecimal ASCII  representation.
       For example, the URL http://www.gridpp.ac.uk/ becomes http%3A%2F%2Fwww.gridpp.ac.uk%2F

       urlencode  converts  each  character  in  all  the strings given on the command line. If multiple
       strings are given, they are concatenated with separating spaces before conversion.

OPTIONS
       -m     Instead of full conversion, do GridSite "mild URL encoding" in which A-Z a-z 0-9 . = - _ @
              and  / are passed through unmodified. This results in slightly more human-readable strings
              but the application must be prepared to create or simulate the directories implied by  any
              slashes.

       -d     Do  URL-decoding rather than encoding, according to RFC 1738. %HH and %hh strings are con‐
              verted and other characters are passed through unmodified, with the exception  that  +  is
              converted to space.

Example of decoding a URL:

$ urlencode -d "http%3a%2f%2funix.stackexchange.com%2f"
http://unix.stackexchange.com/

$ urlencode -d "Example: %21, %22, . . . , %29 etc"
Example: !, ", . . . , ) etc
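
Where the gridsite-clients package is not available, the behaviour described in the man page can be approximated in Python. This is a sketch based only on that description, not on the tool's source; the names rfc1738_encode and rfc1738_decode are hypothetical, and encoding non-ASCII input as UTF-8 is an assumption.

```python
from urllib.parse import unquote_plus

# Characters the man page says are passed through unmodified.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-"
)


def rfc1738_encode(text: str) -> str:
    """Encode every byte outside UNRESERVED as %HH with upper-case hex."""
    return "".join(
        chr(b) if b in UNRESERVED else f"%{b:02X}"
        for b in text.encode("utf-8")
    )


def rfc1738_decode(text: str) -> str:
    """Decode %HH and %hh escapes and convert '+' to a space."""
    return unquote_plus(text)


print(rfc1738_encode("http://www.gridpp.ac.uk/"))
print(rfc1738_decode("http%3a%2f%2funix.stackexchange.com%2f"))
```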
    
For tutorial on sed visit –  Pandya 2 days ago
    
This is a bad solution, because it requires hardcoding every character. This problem is exemplified by your code missing the often used %20 escape sequence. –  Overv yesterday
    
@Overv I've just Revised –  Pandya yesterday

And another Perl approach:

#!/usr/bin/env perl
use URI::Encode;
my $uri = URI::Encode->new( { encode_reserved => 0 } );
while (<>) {
    print $uri->decode($_);
}

You will need to install the URI::Encode module. On my Debian, I could simply run

sudo apt-get install liburi-encode-perl

Then, I ran the script above on a test file containing:

http://foo%21asd%23asd%24%26asd%27asd%28asd%29

The result was (I had saved the script as foo.pl):

$ ./foo.pl file
http://foo!asd#asd$&asd'asd(asd)

Using GNU awk 4.1.0

awk -iord '
RT {
  RT = chr(strtonum("0x" substr(RT, 2)))
}
{
  printf "%s", $0 RT
}
' RS=%..

Or golfed

awk -iord 'RT{RT=chr(strtonum("0x"substr(RT,2)))}{printf"%s",$0RT}' RS=%..

Perl one liner:

$ perl -pe 's/%([0-9A-Fa-f]{2})/chr hex $1/ge'

Example:

$ echo '%21%22' |  perl -pe 's/%([0-9A-Fa-f]{2})/chr hex $1/ge'
!"
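
For comparison, the same substitution can be written in Python with re.sub. The sketch below works on bytes rather than characters, because Python's chr() returns a code point, not a byte, which would garble multi-byte UTF-8 escapes. The name decode_escapes and the UTF-8 default are assumptions made for illustration.

```python
import re

# One percent sign followed by exactly two hexadecimal digits.
ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_escapes(text: str, encoding: str = "utf-8") -> str:
    """Replace each %HH escape with the byte it names, then decode the result."""
    raw = text.encode(encoding)
    decoded = ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode(encoding, errors="replace")


print(decode_escapes("%21%22"))
```

Anything that is not a percent sign followed by two hex digits is passed through unchanged.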
