Is there something like JavaScript's “split()” in the shell?

Question

It's very easy to use split() in JavaScript to break a string into an array.

What about shell script?

Say I want to do this:

$ script.sh var1_var2_var3

When the user gives a string such as var1_var2_var3 to script.sh, the script should convert that string into an array like

array=( var1 var2 var3 )
for name in ${array[@]}; do
    # some code
done

what shell are you using, with bash you can do IFS='_' read -a array <<< "${string}" — gwillie, yesterday
perl can do that too. It's not "pure" shell, but it's quite common. — Sobrique, yesterday
Technically tr, cut, and awk are part of writing shell scripts; tr and cut will do what you need; as for a built-in for bash? Not sure. — josten, yesterday
@Sobrique I am also unaware of the technical definition of "pure" shell, but there is node.js. — emory, 6 hours ago
I tend to work on 'is it probably installed on my linux box by default' and don't fret the minutiae :) — Sobrique, 4 hours ago

Stéphane Chazelas · Accepted Answer · 2015-09-09 06:59:45Z

Bourne/POSIX-like shells have a split+glob operator and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.

In fact, you invoked it by mistake when you did for name in ${array[@]} instead of for name in "${array[@]}". (Beware that invoking that operator by mistake like that is a source of many bugs and security vulnerabilities.)
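To make the difference concrete, here is a small bash sketch comparing the unquoted and quoted expansions. It assumes the default $IFS, and the array contents are made up for illustration:

```shell
# bash sketch; the sample values are hypothetical
array=('var 1' 'var2')

unquoted=0
for name in ${array[@]}; do     # split+glob is applied: 'var 1' becomes two words
  unquoted=$((unquoted + 1))
done

quoted=0
for name in "${array[@]}"; do   # exactly one iteration per array element
  quoted=$((quoted + 1))
done

printf 'unquoted: %d, quoted: %d\n' "$unquoted" "$quoted"   # unquoted: 3, quoted: 2
```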

That operator is configured with the $IFS special parameter, which tells it what characters to split on (though beware that space, tab and newline receive a special treatment there), and with the -f option, which disables (set -f) or enables (set +f) the glob part.

Also note that while the S in $IFS was originally (in the Bourne shell where $IFS comes from) for Separator, in POSIX shells, the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).

So to split on _:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator

for i in "${array[@]}"; do # loop over the array elements
  printf '%s\n' "$i"
done
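Since IFS and set -f stay in effect for the rest of the script, it can help to confine them to a function. The following is only a sketch for bash; split_on and the global variable array are names chosen for illustration, not something the shell provides:

```shell
# bash sketch; split_on and the global "array" are illustrative names
split_on() {
  local string="$1"
  local IFS="$2"            # the change to IFS is confined to this function
  local had_glob=1
  case $- in *f*) had_glob=0 ;; esac
  set -f                    # disable the glob part
  array=($string)           # invoke the split+glob operator
  if [ "$had_glob" -eq 1 ]; then
    set +f                  # re-enable globbing only if it was enabled before
  fi
}

split_on 'var1_*_var3' _
printf '<%s>\n' "${array[@]}"   # <var1>, <*> and <var3>, one per line
```

The * in the sample string stays literal because globbing is off while the split happens.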

To see the distinction between separator and delimiter, try on:

string='var1_var2_'

That will split it into var1 and var2 only (no extra empty element).
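A quick way to check this in bash (a sketch that saves and restores IFS around the split, and assumes globbing was enabled beforehand):

```shell
# bash sketch of the delimiter (terminator) behaviour
string='var1_var2_'
old_ifs=$IFS
IFS=_
set -f
array=($string)      # the trailing _ terminates var2, it does not start a new element
set +f
IFS=$old_ifs

echo "${#array[@]}"  # prints 2
printf '<%s>\n' "${array[@]}"
```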

So, to make it similar to JavaScript's split(), you'd need an extra step:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator

(note that it would split an empty $string into 1 (not 0) element, like JavaScript's split()).
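The extra step can be wrapped in a function to try the edge cases. This is a bash sketch; js_like_split is a made-up name, and the function assumes globbing was enabled before the call, since it turns it back on unconditionally:

```shell
# bash sketch; js_like_split and the global "array" are illustrative names
js_like_split() {
  local IFS=_               # delimit on _ inside this function only
  local temp="${1}_"        # add an extra delimiter
  set -f                    # disable the glob part
  array=($temp)             # invoke the split+glob operator
  set +f                    # assumes globbing was on before the call
}

js_like_split ''
echo "${#array[@]}"         # prints 1 (a single empty element)

js_like_split 'var1_var2_'
echo "${#array[@]}"         # prints 3 (var1, var2 and an empty element)
```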

To see the special treatment that tab, space and newline receive, compare:

IFS=' '; string=' var1  var2  '

(where you get var1 and var2) with

IFS='_'; string='_var1__var2__'

where you get: '', var1, '', var2, ''.
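The two cases can be compared side by side with a small helper. This is a bash sketch; show_split is a hypothetical function that prints the element count followed by each element in angle brackets, and it assumes globbing was enabled before the call:

```shell
# bash sketch; show_split is an illustrative helper
show_split() {
  local IFS="$1"
  local parts
  set -f
  parts=($2)                      # invoke the split+glob operator
  set +f
  printf '%d:' "${#parts[@]}"
  printf ' <%s>' "${parts[@]}"
  echo
}

show_split ' ' ' var1  var2  '    # 2: <var1> <var2>
show_split '_' '_var1__var2__'    # 5: <> <var1> <> <var2> <>
```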

Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. There, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:

array=(${(s:_:)string})

or to preserve the empty elements:

array=("${(@s:_:)string}")

Note that there s is for splitting, not delimiting (also with $IFS, a known POSIX non-conformance of zsh). It's different from JavaScript's split() in that an empty string is split into 0 (not 1) element.

A notable difference with $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
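bash has no equivalent of that zsh flag, but the contrast can still be illustrated there. The sketch below first splits with IFS=abc, then splits on the whole string abc with a parameter-expansion loop; split_on_string and the global parts array are made-up names, and the loop assumes a non-empty separator:

```shell
# bash sketch; split_on_string and "parts" are illustrative names
string='1abc2a3'

old_ifs=$IFS
IFS=abc; set -f
by_chars=($string)               # a, b and c are each a delimiter
set +f; IFS=$old_ifs

split_on_string() {              # assumes that $2 is not empty
  local s="$1" sep="$2"
  parts=()
  while [[ $s == *"$sep"* ]]; do
    parts+=("${s%%"$sep"*}")     # text before the first occurrence of sep
    s=${s#*"$sep"}               # drop that text together with the separator
  done
  parts+=("$s")
}
split_on_string "$string" abc

echo "${#by_chars[@]}"           # prints 5: 1, '', '', 2, 3
echo "${#parts[@]}"              # prints 2: 1, 2a3
```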

With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.

As a historical note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3....

Splitting on regexps

Now for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.

In the POSIX tool-chest, awk has a split operator that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).

split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))"
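If eval is not wanted, one possible variation is to have awk print one field per line and read the lines back in a loop. This is only a bash sketch: split_lines is a hypothetical helper, and the approach assumes that no field contains a newline character:

```shell
# bash sketch; split_lines is an illustrative helper and assumes that
# no field contains a newline character
split_lines() {
  awk 'BEGIN {
    n = split(ARGV[1], a, ARGV[2])
    for (i = 1; i <= n; i++) print a[i]
    exit
  }' "$@"
}

array=()
while IFS= read -r field; do
  array+=("$field")
done < <(split_lines 'a__b_+c' '[_+]+')

printf '<%s>\n' "${array[@]}"   # <a>, <b> and <c>, one per line
```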

The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.

Is there any reason for special treatments with tab, space and newline? — cuonglm, yesterday
@cuonglm, generally you want to split on words when the delimiters are blanks, in the case of non-blank delimiters (like to split $PATH on :) on the contrary, you generally want to preserve empty elements. Note that in the Bourne shell, all characters were receiving the special treatment, ksh changed that to have only the blank ones (only space, tab and newline though) treated specially. — Stéphane Chazelas, yesterday
Well, the recent added Bourne shell note surprised me. And for completing, should you add the note for zsh treatment with string contains 2 or more characters in ${(s:string:)var}? If added, I can delete my answer :) — cuonglm, yesterday
What do you mean by "Also note that the S in $IFS is for Delimiter, not Separator."? I understand the mechanics and that it ignores trailing separators but the S stands for Separator, not delimiter. At least, that's what my bash's manual says. — terdon♦, yesterday
@terdon, $IFS comes from the Bourne shell where it was separator, ksh changed the behaviour without changing the name. I mention that to stress that split+glob (except in zsh or pdksh) doesn't simply split anymore. — Stéphane Chazelas, yesterday

Gilles · Answer 2 · 2015-09-08 13:12:28Z

Yes, use IFS and set it to _. Then use read -a to store into an array (-r turns off backslash expansion). Note that this is specific to bash; ksh and zsh have similar features with slightly different syntax, and plain sh doesn't have array variables at all.

$ r="var1_var2_var3"
$ IFS='_' read -r -a array <<< "$r"
$ for name in "${array[@]}"; do echo "+ $name"; done
+ var1
+ var2
+ var3
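Applied to the script from the question, that could look like the sketch below (bash only). The default value is there only so that the example also runs without an argument. Because the IFS assignment is a prefix to read, it applies to that one command and the shell's own IFS is left alone:

```shell
# script.sh (sketch, bash only); the default value is hypothetical
string=${1-var1_var2_var3}

ifs_before=$IFS
IFS=_ read -r -a array <<< "$string"   # IFS=_ applies to this read only

for name in "${array[@]}"; do
  printf '+ %s\n' "$name"
done

[ "$IFS" = "$ifs_before" ] && echo 'IFS unchanged'
```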

From man bash:

read

-a aname

The words are assigned to sequential indices of the array variable aname, starting at 0. aname is unset before any new values are assigned. Other name arguments are ignored.

IFS

The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is ``<space><tab><newline>''.

Note that read stops at the first newline. Pass -d '' to read to avoid that, but in that case, there will be an extra newline at the end due to the <<< operator. You can remove it manually:

IFS='_' read -r -d '' -a array <<< "$r"
array[$((${#array[@]}-1))]=${array[$((${#array[@]}-1))]%?}
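Another way to avoid the extra newline, sketched here for bash, is to feed read a NUL-terminated copy of the string through process substitution instead of <<<. The value of r below is made up to contain a newline. As with other $IFS splitting, a trailing _ would not produce a final empty element:

```shell
# bash sketch; the value of r is hypothetical
r=$'var1_va\nr2_var3'

# printf appends a NUL, which read -d '' uses as its end-of-input delimiter
IFS=_ read -r -d '' -a array < <(printf '%s\0' "$r")

echo "${#array[@]}"             # prints 3
printf '<%s>\n' "${array[@]}"   # the second element keeps its embedded newline
```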

answered yesterday by fedorqui, edited yesterday by Gilles

That assumes $r doesn't contain newline characters or backslashes. Also note that it will only work in recent versions of the bash shell. – Stéphane Chazelas yesterday

@StéphaneChazelas good point. Yes, this is the "basic" case of a string. For the rest, everyone should go for your comprehensive answer. Regarding the versions of bash, read -a was introduced in bash 4, right? – fedorqui yesterday

sorry my bad, I thought <<< was added only recently to bash but it seems it's been there since 2.05b (2002). read -a is even older than that. <<< comes from zsh and is supported by ksh93 (and mksh and yash) as well but read -a is bash-specific (it's -A in ksh93, yash and zsh). – Stéphane Chazelas yesterday

@StéphaneChazelas is there any "easy" way to find when these changes happened? I say "easy" not to dig into the release files, maybe a page showing them all. – fedorqui yesterday

I look at change logs for that. zsh also has a git repository with history as far back as 3.1.5 and its mailing list is used for tracking changes as well. – Stéphane Chazelas yesterday
