
It's very easy to use split() in JavaScript to break a string into an array.

What about shell script?

Say I want to do this:

$ script.sh var1_var2_var3

When the user gives a string such as var1_var2_var3 to script.sh, the script should convert it into an array, like this:

array=( var1 var2 var3 )
for name in ${array[@]}; do
    # some code
done
    
what shell are you using, with bash you can do IFS='_' read -a array <<< "${string}" –  gwillie yesterday
    
perl can do that too. It's not "pure" shell, but it's quite common. –  Sobrique yesterday
Technically tr, cut, and awk are part of writing shell scripts; tr and cut will do what you need; as for a built-in for bash? Not sure. –  josten yesterday
    
@Sobrique I am also unaware of the technical definition of "pure" shell, but there is node.js. –  emory 6 hours ago
    
I tend to work on 'is it probably installed on my linux box by default' and don't fret the minutiae :) –  Sobrique 4 hours ago

2 Answers

up vote 24 down vote accepted

Bourne/POSIX-like shells have a split+glob operator and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.

In fact, you invoked it by mistake when you wrote for name in ${array[@]} instead of for name in "${array[@]}". (Beware that invoking that operator by mistake like this is a source of many bugs and security vulnerabilities.)
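
As a rough illustration of that mistake (assuming bash, and with a made-up count_words helper that is not part of the question), an array element containing a space is split in two when the expansion is left unquoted:

```bash
# count_words is a hypothetical helper used only for this illustration.
count_words() { echo "$#"; }

IFS=$' \t\n'                    # make sure $IFS has its default value for the demo
array=('two words' 'single')

count_words ${array[@]}         # unquoted: split+glob applies, prints 3
count_words "${array[@]}"       # quoted: elements are kept intact, prints 2
```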

That operator is configured with the $IFS special parameter, which tells it what characters to split on (though beware that space, tab and newline receive special treatment there), and with the -f option, which disables (set -f) or enables (set +f) the glob part.

Also note that while the S in $IFS was originally (in the Bourne shell where $IFS comes from) for Separator, in POSIX shells, the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).

So to split on _:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator

for i in "${array[@]}"; do # loop over the array elements
  printf '%s\n' "$i"
done
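
Since IFS and set -f stay in effect for the rest of the script, one option is to confine them to a function. This is only a minimal sketch, assuming bash (for local and arrays) and assuming globbing was enabled before the call; split_underscore is a hypothetical name:

```bash
# split_underscore is a hypothetical helper; it stores its result in the global "array".
split_underscore() {
  local IFS=_        # the IFS change is confined to this function
  set -f             # disable the glob part
  array=($1)         # unquoted on purpose: invoke the split+glob operator
  set +f             # re-enable globbing (assumes it was enabled before the call)
}

split_underscore 'var1_*_var3'
printf '<%s>\n' "${array[@]}"   # the * is kept literally because of set -f
```

With set -f in place, the middle element stays a literal * instead of being replaced by the file names in the current directory.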

To see the distinction between separator and delimiter, try on:

string='var1_var2_'

That will split it into var1 and var2 only (no extra empty element).

So, to make it similar to JavaScript's split(), you'd need an extra step:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator

(note that it would split an empty $string into 1 (not 0) element, like JavaScript's split()).
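
The extra-delimiter trick can be wrapped up as follows. It is a sketch only, assuming bash and a single non-whitespace delimiter character; js_split is a hypothetical name, and globbing is assumed to be enabled before the call:

```bash
# js_split is a hypothetical helper: $1 is the delimiter character, $2 the string.
js_split() {
  local IFS=$1
  local temp="$2$1"    # add an extra delimiter so a trailing empty field survives
  set -f               # disable the glob part
  array=($temp)        # invoke the split+glob operator
  set +f               # assumes globbing was enabled before the call
}

js_split _ 'var1_var2_'
echo "${#array[@]}"    # 3: var1, var2 and a trailing empty element
js_split _ ''
echo "${#array[@]}"    # 1: a single empty element
```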

To see the special treatments tab, space and newline receive, compare:

IFS=' '; string=' var1  var2  '

(where you get var1 and var2) with

IFS='_'; string='_var1__var2__'

where you get: '', var1, '', var2, ''.
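
To check those two cases side by side, something like the following can be used (assuming bash; count_fields is a hypothetical helper that just reports how many elements the split produced):

```bash
# count_fields is a hypothetical helper: $1 is the value for IFS, $2 the string to split.
count_fields() {
  local IFS=$1
  local fields
  set -f               # disable the glob part
  fields=($2)          # invoke the split+glob operator
  set +f               # assumes globbing was enabled before the call
  echo "${#fields[@]}"
}

count_fields ' ' ' var1  var2  '   # prints 2: runs of spaces count as one delimiter
count_fields '_' '_var1__var2__'   # prints 5: every _ delimits a (possibly empty) field
```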

Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. There, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:

array=(${(s:_:)string})

or to preserve the empty elements:

array=("${(@s:_:)string}")

Note that there, s is for splitting, not delimiting (zsh does the same with $IFS, a known POSIX non-conformance of zsh). It's different from JavaScript's split() in that an empty string is split into 0 (not 1) elements.

A notable difference with $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
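
The $IFS half of that difference can be checked with a quick sketch in bash. The zsh result mentioned in the comment follows from the description above and is not run here; the variable names are only for illustration:

```bash
string='1abc2b3'
old_ifs=$IFS             # assumes IFS was set (not unset) beforehand
IFS=abc; set -f          # split on a, b or c; disable the glob part
fields=($string)         # invoke the split+glob operator
IFS=$old_ifs; set +f     # assumes globbing was enabled beforehand

echo "${#fields[@]}"     # 5 elements: 1, '', '', 2, 3
# In zsh, ${(s:abc:)string} would instead split on the whole string abc.
```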

With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.

As a historical note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3....

Splitting on regexps

Now for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.

In the POSIX tool-chest, awk has a split operator that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).

split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))"

The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.

    
Is there any reason for special treatments with tab, space and newline? –  cuonglm yesterday
@cuonglm, generally you want to split on words when the delimiters are blanks, in the case of non-blank delimiters (like to split $PATH on :) on the contrary, you generally want to preserve empty elements. Note that in the Bourne shell, all characters were receiving the special treatment, ksh changed that to have only the blank ones (only space, tab and newline though) treated specially. –  Stéphane Chazelas yesterday
    
Well, the recent added Bourne shell note surprised me. And for completing, should you add the note for zsh treatment with string contains 2 or more characters in ${(s:string:)var}? If added, I can delete my answer :) –  cuonglm yesterday
What do you mean by "Also note that the S in $IFS is for Delimiter, not Separator."? I understand the mechanics and that it ignores trailing separators but the S stands for Separator, not delimiter. At least, that's what my bash's manual says. –  terdon yesterday
    
@terdon, $IFS comes from the Bourne shell where it was separator, ksh changed the behaviour without changing the name. I mention that to stress that split+glob (except in zsh or pdksh) doesn't simply split anymore. –  Stéphane Chazelas yesterday

Yes, use IFS and set it to _. Then use read -a to store into an array (-r turns off backslash expansion). Note that this is specific to bash; ksh and zsh have similar features with slightly different syntax, and plain sh doesn't have array variables at all.

$ r="var1_var2_var3"
$ IFS='_' read -r -a array <<< "$r"
$ for name in "${array[@]}"; do echo "+ $name"; done
+ var1
+ var2
+ var3

From man bash:

read

-a aname

The words are assigned to sequential indices of the array variable aname, starting at 0. aname is unset before any new values are assigned. Other name arguments are ignored.

IFS

The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is ``<space><tab><newline>''.

Note that read stops at the first newline. Pass -d '' to read to avoid that, but in that case, there will be an extra newline at the end due to the <<< operator. You can remove it manually:

IFS='_' read -r -d '' -a array <<< "$r"
array[$((${#array[@]}-1))]=${array[$((${#array[@]}-1))]%?}
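
To see the difference, here is a small sketch with a value that contains a newline (assuming bash; the sample string and the names short and last are made up for the illustration). Note that read -d '' returns a non-zero status when it reaches the end of input, hence the || true:

```bash
r=$'var1_line1\nline2_var3'

# Without -d '', read stops at the first newline: only var1 and line1 are stored.
IFS='_' read -r -a short <<< "$r"
echo "${#short[@]}"              # 2

# With -d '', read consumes the whole input but returns non-zero at end of input.
IFS='_' read -r -d '' -a array <<< "$r" || true
last=$(( ${#array[@]} - 1 ))
array[last]=${array[last]%?}     # drop the newline appended by <<<
echo "${#array[@]}"              # 3, and the middle element contains a newline
```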
    
That assumes $r doesn't contain newline characters or backslashes. Also note that it will only work in recent versions of the bash shell. –  Stéphane Chazelas yesterday
    
@StéphaneChazelas good point. Yes, this is the "basic" case of a string. For the rest, everyone should go for your comprehensive answer. Regarding the versions of bash, read -a was introduced in bash 4, right? –  fedorqui yesterday
sorry my bad, I thought <<< was added only recently to bash but it seems it's been there since 2.05b (2002). read -a is even older than that. <<< comes from zsh and is supported by ksh93 (and mksh and yash) as well but read -a is bash-specific (it's -A in ksh93, yash and zsh). –  Stéphane Chazelas yesterday
    
@StéphaneChazelas is there any "easy" way to find when these changes happened? I say "easy" not to dig into the release files, maybe a page showing them all. –  fedorqui yesterday
I look at change logs for that. zsh also has a git repository with history as far back as 3.1.5 and its mailing list is used for tracking changes as well. –  Stéphane Chazelas yesterday
