
I'm new to Linux (and the shell). I need to decode the base64 text in an XML file using a Linux shell script. Could you please help me write a shell script that decodes the values of those tags whose attribute is encoding="base64"? The structure of my file is:

    <directory-entries>
        <entry dn="ads">
        <attr name="memberof">
        <value>CN=VPN-employee</value>
        <value encoding="base64">aGVsbG8gd29ybGQ=   </value>
<value encoding="base64">
Q049RmxvcHB5IC0g0LTQvtGB0YLRg9C/INC30LDQutGA0YvRgixPVT1EZXZpY2UgQ29udHJv
bCxPVT1Hcm91cHMsT1U90JHQkNCd0JosREM9aHEsREM9YmM=
    </value>
    <value encoding="base64">
Q049VVNCLdC00LjRgdC60LggLSDRgtC+0LvRjNC60L4g0YfRgtC10L3QuNC1LE9VPURldmlj
ZSBDb250cm9sLE9VPUdyb3VwcyxPVT3QkdCQ0J3QmixEQz1ocSxEQz1iYw==
    </value>
    </attr>
    </entry>
    </directory-entries>

The desired output is:

    <directory-entries>
        <entry dn="ads">
        <attr name="memberof">
        <value>CN=VPN-employee</value>
        <value encoding="base64">hello world  </value>
       <value encoding="base64"> decoded         </value>
       <value encoding="base64">    decoded         </value>
    </attr>
    </entry>
    </directory-entries>

I'm generating the XML from Active Directory using ldapsearch. The command I used to obtain this file is:

ldapsearch -h host -p 389 -D "CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc" -w password -s sub -B -E UTF-8 -X "(&(objectClass=organizationalPerson)(CN=*))" employeeID memberof > ldap_logins.xml

I don't know whether it is possible to decode the text while generating the XML file. Thank you in advance!

    
I don't have a complete answer, but a couple of hints. On the ldapsearch side, you can use the -t option to write "non-printable" values to temporary files rather than outputting them Base64-encoded. If you want to parse XML, check out XMLStarlet. Also, does the output need to be valid XML? Shouldn't the "encoding" attribute be dropped from the output? –  Stephen Kitt May 20 at 8:02
    
Thank you for the feedback. Yes, the output should be valid XML. I need the decoded values; the attribute itself can be dropped from the output –  Meruyert May 20 at 8:31
    
@Meruyert I've posted a proper answer using an XML parser called xmlstarlet. Please check whether it helps. –  shivams May 22 at 2:57

3 Answers

Compact Script

Assuming the xml is in file.xml, just do:

sed -r 's/("base64">)([[:graph:]]+)/\1'"$(grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d)"'/g' file.xml

This compact one-liner does the job. Let me break it down and explain.

Breakdown

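First I select the base64 string using grep and decode it (-P enables Perl-compatible regular expressions, and \K drops the "base64"> prefix from what grep prints):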

grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d

I could save this in a variable:

baseString=$(grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d)

Then use sed to replace the base64 with the decoded string saved in the variable:

sed -r 's/("base64">)([[:graph:]]+)/\1'"$baseString"'/g' file.xml
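One caveat with the substitution above: grep collects every base64 value in the file, so with more than one value base64 -d receives all of them at once and every match is replaced with the same combined (or failed) decoding. A minimal sketch that decodes each match on its own, assuming every encoded value sits on a single line as <value encoding="base64">DATA</value> (the wrapped values from the question are passed through unchanged) and that the XML arrives on stdin. The function name decode_inline_values is hypothetical.

```shell
#!/usr/bin/env bash
# decode_inline_values (hypothetical helper): copy stdin to stdout, decoding
# each base64 payload that sits on one line inside <value encoding="base64">.
# Assumption: values wrapped over several lines are passed through unchanged.
decode_inline_values() {
    # Keeping the regex in a variable avoids quoting problems with < and > in [[ ]]
    local re='^(.*encoding="base64">)[[:space:]]*([A-Za-z0-9+/=]+)[[:space:]]*(</value>.*)$'
    local line prefix payload suffix decoded
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            prefix=${BASH_REMATCH[1]}
            payload=${BASH_REMATCH[2]}
            suffix=${BASH_REMATCH[3]}
            # Decode only this element's payload, not every match in the file
            decoded=$(printf '%s' "$payload" | base64 -d)
            printf '%s%s%s\n' "$prefix" "$decoded" "$suffix"
        else
            printf '%s\n' "$line"
        fi
    done
}
```

Usage: decode_inline_values < file.xml > decoded.xml. Because it never builds a sed expression, a decoded value containing / or & cannot break the substitution; note that it does not XML-escape characters such as & or < in the decoded text.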
    
Thank you for your answer! The script works for cases where values do not have line breaks. I have line breaks in values. I've updated the structure of the file in the question, added more examples. Do you have any ideas how to deal with those line breaks? –  Meruyert May 20 at 10:57
    
Oh! Multi-line regexes are very tricky in bash. For such cases it is better to use a proper XML parser. However, I will try to provide a regex-based solution. Please wait. –  shivams May 20 at 11:29

I'll say what I always do: please NEVER use regular expressions to parse XML. It's bad news. XML allows many formatting variations, so semantically identical XML may or may not match a given regular expression. Simple things like line wrapping, self-closing (unary) tags, single versus double quotes around attribute values, etc.

This means you create brittle code, which one day might mysteriously break because of an upstream and perfectly valid change to your data flow.
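To make the point concrete, here is a small sketch that feeds three spellings of the same element to an ERE version of the grep pattern from the compact script (grep -E instead of grep -P, an assumption made so it runs without PCRE support). Single and double quotes around an attribute value are identical XML, and the wrapped form carries the same base64 payload as the question's file does, yet only one spelling matches. The helper count_hits is hypothetical.

```shell
#!/usr/bin/env bash
# count_hits (hypothetical helper): print how many lines of $1 match the
# pattern '"base64">' followed by non-blank characters.
count_hits() {
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that non-fatal
    printf '%s\n' "$1" | grep -cE '"base64">[[:graph:]]+' || true
}

double_quoted='<value encoding="base64">aGVsbG8gd29ybGQ=</value>'
single_quoted="<value encoding='base64'>aGVsbG8gd29ybGQ=</value>"   # identical XML
wrapped=$'<value encoding="base64">\n  aGVsbG8gd29ybGQ=\n</value>'  # same payload, wrapped

count_hits "$double_quoted"   # prints 1
count_hits "$single_quoted"   # prints 0
count_hits "$wrapped"         # prints 0
```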

For parsing your XML I would suggest using Perl and the quite excellent XML::Twig module.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;
use MIME::Base64;

# Called for each "value" element: if it has encoding="base64", decode its
# content, replace it (with the literal text "decoded" when the result contains
# characters outside a conservative ASCII set), and delete that attribute.
sub decode_value {
    my ( $twig, $value ) = @_;
    if (    $value->att('encoding')
        and $value->att('encoding') eq "base64" )
    {
        my $decoded_text = decode_base64( $value->text );
        if ( $decoded_text =~ m/[^\s\d\w\=\-\,\.]/ ) {
            $decoded_text = "decoded";
        }
        $value->set_text($decoded_text);
        $value -> del_att('encoding');

    }
}


# twig_handlers 'fire' a piece of code each time the parser finishes a 'value'
# element, passing that chunk of XML to the handler, which means you can do
# things like dynamic XML rewrites.
# pretty_print controls how the output XML is rendered; there is a variety of
# options, check the manpage.
my $twig = XML::Twig->new(
    pretty_print  => "indented",
    twig_handlers => { 'value' => \&decode_value, }
);
$twig->parsefile('your_xml_file');
$twig->print;

This will give:

<directory-entries>
  <entry dn="ads">
    <attr name="memberof">
      <value>CN=VPN-employee</value>
      <value>hello world</value>
      <value>decoded</value>
      <value>decoded</value>
    </attr>
  </entry>
</directory-entries>

You could alternatively transform $decoded_text like this:

$decoded_text =~ s/[^\s\w=,.\-]+/_/g;

(The URI::Escape module is worth a look here too, as it 'percent-encodes' text URL-style.)

Which would give instead:

  <value>CN=Floppy - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
  <value>CN=USB-_ - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
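If you would rather do this masking step in the shell, a rough counterpart is a C-locale sed that collapses every run of non-printable bytes into "_". This is a sketch under stated assumptions, not an exact port of the Perl substitution: it keeps all printable ASCII punctuation (the Perl class keeps only whitespace, word characters, =, comma, period and hyphen), and it works on raw bytes, which is why every byte of a UTF-8 Cyrillic character is masked. The function name mask_non_ascii is hypothetical.

```shell
#!/usr/bin/env bash
# mask_non_ascii (hypothetical helper): replace each run of bytes that are not
# printable ASCII with a single "_".
# LC_ALL=C makes sed treat the input as single bytes, so bytes >= 0x80
# (e.g. UTF-8 Cyrillic) fall outside [[:print:]].
mask_non_ascii() {
    LC_ALL=C sed 's/[^[:print:]]\{1,\}/_/g'
}
```

Usage: printf '%s' "$base64_value" | base64 -d | mask_non_ascii, where $base64_value is a hypothetical variable holding one value with its whitespace removed.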

But you might also find that Net::LDAP does what you need, querying the directory directly instead of post-processing the XML.

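#!/usr/bin/perl
use strict;
use warnings;

use Net::LDAP;

# new() returns undef on failure and leaves the reason in $@
my $ldap = Net::LDAP->new('host') or die "Cannot connect to LDAP server: $@";

# bind() takes the DN followed by named options, so the password must be
# passed as password => '...'
my $result = $ldap->bind(
    'CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc',
    password => 'password'
);
if ( $result->code ) { die "Bind failed: ", $result->error, "\n"; }

my $ldap_search = $ldap->search(
    base   => 'DC=hq,DC=bc',
    scope  => 'subtree',
    filter => '(&(objectClass=organizationalPerson)(CN=*))',
    attrs  => [ 'employeeID', 'memberOf' ],
);
if ( $ldap_search->code ) { die "Search failed: ", $ldap_search->error, "\n"; }

foreach my $entry ( $ldap_search->entries ) {
    print "dn:\t", $entry->dn(), "\n";
    foreach my $attr ( $entry->attributes ) {
        print "$attr:";
        foreach my $value ( $entry->get_value($attr) ) {
            next unless defined $value;
            # Mask values containing anything outside a conservative printable set
            # (the hyphen is escaped so it is not read as a character range)
            if ( $value =~ m/[^\s\w,=+\@'.()\-]/ ) { $value = "binary_data" }
            chomp($value);
            print "\t$value\n";
        }
    }
}
$ldap->unbind;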
    
Yes. Using an XML parser is always the only sane option. @Meruyert please use this solution (if it works fine) rather than my regex-based solution. –  shivams May 20 at 14:57
    
It is unclear which language you are using, @Sobrique. –  shivams May 20 at 15:00
    
Wow, that's impressive on my part. Amended answer to indicate that I do mean perl here ;) –  Sobrique May 20 at 15:02
    
Sorry for my ignorance. But I am really a new kid. Born in the era of Python, rather than Perl. Done a lot of bash but never touched Perl :/ Perhaps, I should be ashamed :| –  shivams May 20 at 15:03
    
Hardly. Perl and Python have very similar use cases. I'm crusty enough to pre-date python, and learned perl back when it was really the only option for extending shell scripting. Still like it though, not least because it remains pretty similar to shell, and very widely supported. –  Sobrique May 20 at 15:45

Here is a proper answer using xmlstarlet, a command-line tool for parsing and editing XML. First of all, install this package on your system. If you're on a Debian-based system, run:

sudo apt-get install xmlstarlet

Now,

  1. First we read the value of a base64-encoded string.
  2. Then we decode this string.
  3. Then we replace the corresponding tag's value with the decoded text.
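Step 2 is the fiddly part: the wrapped values carry line breaks and indentation that must be removed before decoding, and the tr character set must be quoted so the shell does not mangle it. A minimal sketch of just that step, using only tr and base64 from coreutils; the function name decode_b64_text is hypothetical.

```shell
#!/usr/bin/env bash
# decode_b64_text (hypothetical helper): read base64 text on stdin, delete all
# whitespace (spaces, tabs, CR/LF from wrapped values), then decode it.
decode_b64_text() {
    # Quoting '[:space:]' keeps the shell from treating it as a glob
    tr -d '[:space:]' | base64 -d
}
```

For example, printf '\n  aGVsbG8g\n  d29ybGQ=\n' | decode_b64_text prints hello world.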

Here is the complete script for that:

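#!/bin/bash

file=file.xml
# All base64-encoded <value> elements, in document order
xpath="(//value[@encoding='base64'])"

# Count the encoded values instead of hard-coding the loop bound
count=$(xmlstarlet sel -t -v "count($xpath)" "$file")

for i in $(seq "$count")
do
    # Select the i-th value, strip all whitespace (including line breaks), and decode it
    decodedString=$(xmlstarlet sel -t -v "${xpath}[$i]" "$file" | tr -d '[:space:]' | base64 -d)

    # Replace the element's text in place (-L edits the file directly)
    xmlstarlet ed -L -u "${xpath}[$i]" -v "$decodedString" "$file"
done

# Optional: drop the now-misleading encoding attributes afterwards
xmlstarlet ed -L -d "//value/@encoding" "$file"

The script counts the base64 values first, so it works for any number of elements. The encoding attributes are only removed after the loop, so the numbering stays stable while the values are being replaced.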
