I have a complex string of this form:

inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""

I need to get the first key together with its value. I want to use a Bash regex to extract the key, the value, and what remains of the string:

rkeyval="[[:space:]]*([_[:alnum:]]*?)[[:space:]]*=[[:space:]]*((.*?)[[:space:]]+([_[:alnum:]]+?[[:space:]]*=[[:space:]]*.*))"

if [[ $inp =~ $rkeyval ]]; then

  key=${BASH_REMATCH[1]}
  val=${BASH_REMATCH[3]}
  left=${BASH_REMATCH[4]}

  for i in $(seq 0 $(( ${#BASH_REMATCH[*]}-1 ))); do  
    echo -e "$i: \"${BASH_REMATCH[$i]}\""; 
  done; 
else
  echo "no match"
fi

This does not work. On my Mac with Bash 4.4, there is no match:

no match

On my Red Hat Linux, I get the following output:

0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2   key3= "last value""
3: "what' ever the value key2 = the value Nb.2  "
4: "key3= "last value""

I expect the following output:

0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2   key3= "last value""
3: "what' ever the value"
4: "key3= "last value""

In other words, the key would be in capture group 1 and the value in capture group 3.

This expression works on an online PHP regexp tester.

I want this to work in any Unix machine having an updated version of Bash.

I don't know why this does not work, or why the results differ from one platform to another, even though my regex respects the POSIX convention (or does it?). What am I doing wrong here?

2
  • If you "want this to work in any unix machine having an updated version of bash" then you don't want it to work on a Mac, which has Bash from over a decade ago. Jan 9, 2017 at 20:09
  • There is a bit of truth in that statement. However, Bash has supported regex since version 3.something, and my Mac has Bash 4.4.0, so Bash regex should work. If it really does not, let's at least find a general answer for Linux!
    – kaligne
    Jan 9, 2017 at 21:35

2 Answers

2

POSIX does not define *? for EREs, which Bash uses, instead specifying that:

The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.

Bash uses the system regcomp/regexec for regular-expression matching. Apple's libc presumably does not implement the behaviour you want for *?.
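
To see what a given machine actually does with *?, a small probe can help. This is only a sketch: probe_lazy_star is a hypothetical helper name, and the labels it prints are my own. It relies on the documented behaviour that [[ ... =~ ... ]] returns 2 when the pattern is rejected as syntactically incorrect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports how the local regex library treats "*?".
probe_lazy_star() {
  local re='^(a*?)(a*)$' rc
  # Record the exit status without tripping "set -e".
  [[ aaa =~ $re ]] && rc=0 || rc=$?
  case $rc in
    0) # Matched: an empty first group means "*?" behaved lazily.
       if [[ -z ${BASH_REMATCH[1]} ]]; then echo lazy; else echo greedy; fi ;;
    1) echo no-match ;;   # pattern accepted, but it did not match
    *) echo rejected ;;   # status 2: pattern not accepted at all
  esac
}

probe_lazy_star
```

Whatever it prints on one platform should not be relied on elsewhere, since the behaviour is undefined by POSIX.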

There is no standard way to recover non-greedy matching semantics from greedy ones, though in this case at least some of the non-greedy quantifiers are unnecessary (the first [_[:alnum:]]*?, for example). Otherwise, you need to transform the expression to match something else, or mutate the data in advance (and probably afterwards), to get the effect.
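
As an illustration of transforming the expression, here is a minimal sketch that uses only greedy quantifiers. It assumes that values never contain "=" and that "what remains" means everything from the next key onwards; split_first_pair and the variables key, val and left are names chosen for this example, not anything Bash provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: splits off the first "key = value" pair of $1.
# Sets the globals key, val and left; returns 1 if no key is found.
split_first_pair() {
  local s='[[:space:]]*' n='[_[:alnum:]]+'
  # A value may not contain "=" and must start and end with a non-space
  # character, so a greedy match cannot run past the next "name =".
  local re_more="^${s}(${n})${s}=${s}([^=[:space:]]([^=]*[^=[:space:]])?)[[:space:]]+(${n}${s}=.*)\$"
  # Fallback for the last pair, where no further key follows.
  local re_last="^${s}(${n})${s}=${s}([^[:space:]].*)?\$"
  if [[ $1 =~ $re_more ]]; then
    key=${BASH_REMATCH[1]} val=${BASH_REMATCH[2]} left=${BASH_REMATCH[4]}
  elif [[ $1 =~ $re_last ]]; then
    key=${BASH_REMATCH[1]} val=${BASH_REMATCH[2]} left=
  else
    return 1
  fi
}

inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""
rest=$inp
# Peel off one pair at a time until nothing is left.
while [[ -n $rest ]] && split_first_pair "$rest"; do
  printf '%s -> "%s"\n' "$key" "$val"
  rest=$left
done
```

Because each value is forbidden from containing "=" and from ending in whitespace, there is only one way for the pattern to match, so greedy and non-greedy matching would give the same result here.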

1

An asterisk already allows zero repetitions, so there is no need to add a ? to it.

So, would it be OK if each pair of parentheses captured a key or a value?

s='[[:space:]]*'        # spaces
n='[_[:alnum:]]+'       # a valid name (limited by spaces)
e="${s}=${s}"           # an equals sign (=) with optional spaces around it

rkeyval="${s}(${n})${e}([^=]*) (${n})${e}([^=]*) (${n})${e}(.*)"
#            1^^^^^    2^^^^^^ 3^^^^^    4^^^^^^ 5^^^^^    6^^^
echo "$rkeyval"

That will capture like this:

if [[ $inp =~ $rkeyval ]]; then

    i=0
    while ((i<${#BASH_REMATCH[@]})); do
        printf '%s: "%s"\n' "$((i))" "${BASH_REMATCH[i++]}";
    done
else
    echo "no match"
fi

Printing:

0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value"
3: "key2"
4: "the value Nb.2  "
5: "key3"
6: ""last value""

And the values you want (if I understand your code correctly) could be approximated by (edit to get a perfect match):

key="${BASH_REMATCH[1]}"
val="${BASH_REMATCH[@]:2:3}"
left="${BASH_REMATCH[@]:5:2}"
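
The slices above join array elements with single spaces and drop the "=" signs. If "what remains" is meant to be everything after the first value, one possible way to get it exactly is to cut it out of the full match with parameter expansion instead. This is only a sketch under that assumption, and it also assumes the first value does not already occur earlier in the string (for example inside the key).

```bash
#!/usr/bin/env bash
inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""

s='[[:space:]]*'        # spaces
n='[_[:alnum:]]+'       # a valid name
e="${s}=${s}"           # an equals sign with optional spaces around it
rkeyval="${s}(${n})${e}([^=]*) (${n})${e}([^=]*) (${n})${e}(.*)"

if [[ $inp =~ $rkeyval ]]; then
  key=${BASH_REMATCH[1]}
  val=${BASH_REMATCH[2]}
  # Drop the shortest prefix of the full match that ends with the first value ...
  left=${BASH_REMATCH[0]#*"$val"}
  # ... then strip the whitespace that separated it from the next key.
  left=${left#"${left%%[![:space:]]*}"}
  printf 'key:  "%s"\nval:  "%s"\nleft: "%s"\n' "$key" "$val" "$left"
else
  echo "no match"
fi
```

With the sample input this should print key1 as the key, "what' ever the value" as the value, and the untouched text starting at key2 as the remainder.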