
It's very easy to use split() in JavaScript to break a string into an array.

What about shell script?

Say I want to do this:

$ script.sh var1_var2_var3

When the user passes a string such as var1_var2_var3 to script.sh, the script should convert that string into an array, like this:

array=( var1 var2 var3 )
for name in ${array[@]}; do
    # some code
done
    
What shell are you using? With bash you can do IFS='_' read -a array <<< "${string}" –  gwillie 2 days ago
    
perl can do that too. It's not "pure" shell, but it's quite common. –  Sobrique 2 days ago
1  
Technically tr, cut, and awk are part of writing shell scripts; tr and cut will do what you need; as for a built-in for bash? Not sure. –  josten 2 days ago
    
@Sobrique I am also unaware of the technical definition of "pure" shell, but there is node.js. –  emory yesterday
    
I tend to work on 'is it probably installed on my linux box by default' and don't fret the minutiae :) –  Sobrique yesterday

2 Answers

up vote 24 down vote accepted

Bourne/POSIX-like shells have a split+glob operator and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.

In fact, you invoked it by mistake when you wrote for name in ${array[@]} instead of for name in "${array[@]}". Beware that invoking that operator by mistake like this is a source of many bugs and security vulnerabilities.
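To make the difference concrete, here is a minimal bash sketch. The array contents and the counter names unquoted and quoted are made up for the illustration, and $IFS is set to its default value explicitly so that the result does not depend on the calling environment.

```bash
IFS=$' \t\n'                   # the default value, set explicitly for the demo
array=('var 1' 'var2')         # two elements; the first one contains a space

unquoted=0
for name in ${array[@]}; do    # unquoted: every element goes through split+glob
  unquoted=$((unquoted + 1))
done

quoted=0
for name in "${array[@]}"; do  # quoted: exactly one iteration per element
  quoted=$((quoted + 1))
done

echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted loop runs three times (var, 1 and var2) while the quoted loop runs twice, once per array element.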

That operator is configured with the $IFS special parameter, which tells it what characters to split on (though beware that space, tab and newline receive special treatment there), and with the -f option, which disables (set -f) or enables (set +f) the glob part.

Also note that while the S in $IFS was originally (in the Bourne shell where $IFS comes from) for Separator, in POSIX shells, the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).

So to split on _:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator

for i in "${array[@]}"; do # loop over the array elements
  printf '%s\n' "$i"
done
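Since IFS=_ and set -f stay in effect for the rest of the script, you may want to confine them. One possible way in bash is a small function; split_on is a hypothetical name, and the sketch assumes it is fine to return the result in a global variable called array.

```bash
# split_on SEPARATORS STRING
# Hypothetical helper: fills the global array "array" with the result.
split_on() {
  local IFS=$1            # restored automatically when the function returns
  local had_noglob=no
  case $- in *f*) had_noglob=yes ;; esac
  set -f                  # disable the glob part
  array=($2)              # invoke the split+glob operator
  if [ "$had_noglob" = no ]; then
    set +f                # re-enable globbing only if it was enabled before
  fi
}

split_on _ 'var1_*_var3'
printf '[%s]\n' "${array[@]}"
```

The * stays a literal element because globbing is off while the expansion happens, and both $IFS and the -f setting are back to their previous state afterwards.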

To see the distinction between separator and delimiter, try it on:

string='var1_var2_'

That will split it into var1 and var2 only (no extra empty element).

So, to make it similar to JavaScript's split(), you'd need an extra step:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator

Note that this splits an empty $string into 1 (not 0) element, like JavaScript's split().
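A quick way to check those element counts in bash is sketched below. The variable names plain, padded, empty and from_empty are only there for the illustration.

```bash
IFS=_                  # delimit on _
set -f                 # disable the glob part

string='var1_var2_'
plain=($string)        # the trailing _ only terminates var2
temp=${string}_
padded=($temp)         # the extra delimiter yields a final empty element

empty=''
temp=${empty}_
from_empty=($temp)     # a single empty element

echo "plain: ${#plain[@]}, padded: ${#padded[@]}, from empty: ${#from_empty[@]}"
```

In bash this reports 2, 3 and 1 elements, which matches what 'var1_var2_'.split('_') and ''.split('_') return in JavaScript for the last two.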

To see the special treatment that tab, space and newline receive, compare:

IFS=' '; string=' var1  var2  '

(where you get var1 and var2) with

IFS='_'; string='_var1__var2__'

where you get: '', var1, '', var2, ''.
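The comparison can be run as is in bash. In this sketch the array names blanks and underscores are arbitrary, and the square brackets in the output only make the empty elements visible.

```bash
set -f                               # disable the glob part

IFS=' '; string=' var1  var2  '
blanks=($string)                     # runs of spaces count as one delimiter

IFS='_'; string='_var1__var2__'
underscores=($string)                # every _ delimits a (possibly empty) element

printf 'IFS=space:'; printf ' [%s]' "${blanks[@]}"; echo
printf 'IFS=_    :'; printf ' [%s]' "${underscores[@]}"; echo
```

The first line of output lists two elements, the second one lists five, three of which are empty.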

Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. There, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:

array=(${(s:_:)string})

or to preserve the empty elements:

array=("${(@s:_:)string}")

Note that, there, s stands for splitting, not delimiting (that is also the case with $IFS, a known POSIX non-conformance of zsh). It's different from JavaScript's split() in that an empty string is split into 0 (not 1) element.

A notable difference with $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
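The $IFS half of that statement can be checked in bash with the sketch below (the sample string 1abc2 and the array name parts are made up). The zsh half is not exercised here since the block is meant for bash.

```bash
set -f                 # disable the glob part
IFS=abc                # a, b and c are three independent delimiters
string='1abc2'
parts=($string)        # 1, then two empty elements, then 2

echo "${#parts[@]} elements"
printf '[%s]' "${parts[@]}"; echo
```

Each of a, b and c terminates an element on its own, so bash gives four elements here, where splitting on the whole abc string would give two.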

With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.

As a historical note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3....

Splitting on regexps

Now for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.

In the POSIX tool-chest, awk has a split operator that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).

split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))"
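To see why the quote() step matters, here is the same approach applied to a string that contains a single quote, spaces and a *. The function is repeated so that the block runs on its own; the sample string and the intermediate variable generated are made up for the illustration, and the sketch assumes bash, an awk on the PATH and a default $IFS.

```bash
# Same split function as in the answer, repeated to keep the block standalone.
split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)   # turn each single quote into a quoted-safe form
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}

string="it's_+a * b"
generated=$(split "$string" '[_+]+')   # shell code, one quoted word per element
echo "generated:$generated"
eval "array=($generated)"
printf '[%s]\n' "${array[@]}"
```

Because every element comes back wrapped in single quotes, eval sees it's and a * b as two literal words: the embedded quote does not end the word and the * is not expanded.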

The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.

    
Is there any reason for special treatments with tab, space and newline? –  cuonglm 2 days ago
1  
@cuonglm, generally you want to split on words when the delimiters are blanks, in the case of non-blank delimiters (like to split $PATH on :) on the contrary, you generally want to preserve empty elements. Note that in the Bourne shell, all characters were receiving the special treatment, ksh changed that to have only the blank ones (only space, tab and newline though) treated specially. –  Stéphane Chazelas 2 days ago
    
Well, the recent added Bourne shell note surprised me. And for completing, should you add the note for zsh treatment with string contains 2 or more characters in ${(s:string:)var}? If added, I can delete my answer :) –  cuonglm 2 days ago
1  
What do you mean by "Also note that the S in $IFS is for Delimiter, not Separator."? I understand the mechanics and that it ignores trailing separators but the S stands for Separator, not delimiter. At least, that's what my bash's manual says. –  terdon 2 days ago
    
@terdon, $IFS comes from the Bourne shell where it was separator, ksh changed the behaviour without changing the name. I mention that to stress that split+glob (except in zsh or pdksh) doesn't simply split anymore. –  Stéphane Chazelas 2 days ago

Yes, use IFS and set it to _. Then use read -a to store into an array (-r turns off backslash expansion). Note that this is specific to bash; ksh and zsh have similar features with slightly different syntax, and plain sh doesn't have array variables at all.

$ r="var1_var2_var3"
$ IFS='_' read -r -a array <<< "$r"
$ for name in "${array[@]}"; do echo "+ $name"; done
+ var1
+ var2
+ var3
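One thing worth noting about that form: the IFS='_' assignment is a prefix to the read command, so it only applies while read runs. A small bash sketch to confirm it (the variable before is just for the check):

```bash
r='var1_var2_var3'
before=$IFS                          # remember the current value

IFS='_' read -r -a array <<< "$r"    # IFS=_ applies to this read only

if [ "$IFS" = "$before" ]; then
  echo 'IFS is unchanged after read'
fi
echo "${#array[@]} elements"
```

That makes it less intrusive than assigning IFS=_ on a line of its own, which stays in effect for the rest of the script.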

From man bash:

read

-a aname

The words are assigned to sequential indices of the array variable aname, starting at 0. aname is unset before any new values are assigned. Other name arguments are ignored.

IFS

The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is ``<space><tab><newline>''.

Note that read stops at the first newline. Pass -d '' to read to avoid that, but in that case, there will be an extra newline at the end due to the <<< operator. You can remove it manually:

IFS='_' read -r -d '' -a array <<< "$r"
array[$((${#array[@]}-1))]=${array[$((${#array[@]}-1))]%?}
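As an illustration with a value that does contain a newline (the sample value and the helper variable last are made up), the following bash sketch applies the same two steps. The || true is there because read reports a non-zero status when it reaches the end of the input without finding the delimiter, which would otherwise stop a script running under set -e.

```bash
r=$'var1_line one\nline two_var3'

# -d '' makes read consume the whole input instead of stopping at the newline.
IFS='_' read -r -d '' -a array <<< "$r" || true

# Remove the newline that <<< appended to the last element.
last=$(( ${#array[@]} - 1 ))
array[$last]=${array[$last]%?}

printf '[%s]\n' "${array[@]}"
```

The second element keeps its embedded newline, and the last element is var3 without a trailing newline.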
    
That assumes $r doesn't contain newline characters or backslashes. Also note that it will only work in recent versions of the bash shell. –  Stéphane Chazelas 2 days ago
    
@StéphaneChazelas good point. Yes, this is the "basic" case of a string. For the rest, everyone should go for your comprehensive answer. Regarding the versions of bash, read -a was introduced in bash 4, right? –  fedorqui 2 days ago
1  
sorry my bad, I thought <<< was added only recently to bash but it seems it's been there since 2.05b (2002). read -a is even older than that. <<< comes from zsh and is supported by ksh93 (and mksh and yash) as well but read -a is bash-specific (it's -A in ksh93, yash and zsh). –  Stéphane Chazelas 2 days ago
    
@StéphaneChazelas is there any "easy" way to find when these changes happened? I say "easy" not to dig into the release files, maybe a page showing them all. –  fedorqui 2 days ago
1  
I look at change logs for that. zsh also has a git repository with history as far back as 3.1.5 and its mailing list is used for tracking changes as well. –  Stéphane Chazelas 2 days ago
