This sort of thing is easier to do in awk or perl than in a shell script (although if you're using a sh like bash which supports arrays, it's a bit easier than if using a sh without arrays; you still have far more complications with quoting, globbing, and expansion where you don't want it in a shell script than you do in perl or awk).
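To make the quoting complication concrete, here's a minimal sketch (the filename is made up): an unquoted expansion word-splits a name containing a space, while a quoted one stays intact.

```shell
#!/bin/bash
# Sketch: why unquoted expansions complicate shell scripts.
# The filename is hypothetical.
f='two words.csv'
set -- $f        # unquoted: word-splits into "two" and "words.csv"
echo "unquoted: $# arguments"
set -- "$f"      # quoted: remains a single argument
echo "quoted: $# argument"
```

The first `set` ends up with two arguments; the second with one.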
For example:
#!/usr/bin/perl
use strict;
use warnings;
my $pathx='/path/to/my/files';
my $dh;
my @frames=();
# get list of .csv files from $pathx
opendir($dh, $pathx) || die "can't open directory '$pathx': $!\n";
my @csvfiles = grep { /\.csv$/ && -f "$pathx/$_" } readdir($dh);
closedir($dh);
foreach my $f (@csvfiles) {
my @fields=split(/\./,$f);
my $fn=$fields[@fields-2]; # perl array indices start from 0, not 1.
printf '%s<-read.csv("%s",header=TRUE)'."\n", $fn, "$pathx/$f";
# <snip> etc .etc. etc.
printf '%s_2y<-tail(%s_log,730)'."\n", $fn, $fn;
push @frames,"${fn}_2y";
}
print "df<-data.frame(", join(',',@frames), ")\n";
NOTE: you can use the File::Find module instead of a simple readdir() if you need directory recursion.
Sample output (with files a.csv, b.csv, and c.csv):
a<-read.csv("/path/to/my/files/a.csv",header=TRUE)
a_2y<-tail(a_log,730)
b<-read.csv("/path/to/my/files/b.csv",header=TRUE)
b_2y<-tail(b_log,730)
c<-read.csv("/path/to/my/files/c.csv",header=TRUE)
c_2y<-tail(c_log,730)
df<-data.frame(a_2y,b_2y,c_2y)
or with awk:
NOTE: awk doesn't have a join() function so I had to write one. awk doesn't have a readdir() function either, so it's easiest to just pipe the output of find into it (write a wrapper sh script to do that if necessary).
#!/usr/bin/awk -f
BEGIN {
FS="[./]";
delete A; # has side-effect of defining A as an array
};
# i and result aren't arguments to this function, they're local variables.
# in awk, extra whitespace separates function args from the declaration
# of local variable(s)
function join(array, sep,    i, result) {
result=array[1]; # awk array indices start from 1
for (i=2;i<=length(array);i++) result = result sep array[i];
return result;
};
# main code block, run on every input line
{
fn=$(NF-1);
printf "%s<-read.csv(\"%s\",header=TRUE)\n", fn, $0;
# <snip> etc .etc. etc.
printf "%s_2y<-tail(%s_log,730)\n", fn, fn;
A[length(A)+1] = sprintf("%s_2y",fn);
};
END {
print "df<-data.frame(" join(A,",") ")";
}
Save it as, e.g., myscript.awk, make it executable with chmod, and run it as:
find "${PATHX}" -maxdepth 1 -type f -name "*.csv" | ./myscript.awk
Output is identical to the perl version.
Finally, the same algorithm in bash:
#!/bin/bash
PATHX="/path/to/my/files"
declare -a frames=()
# get list of .csv files and store in array csvfiles.
# (the unquoted $(...) word-splits, so this assumes no whitespace in filenames)
csvfiles=( $(find "$PATHX" -maxdepth 1 -type f -name '*.csv' ) )
function join() {
local sep result i
sep="$1" ; shift
result="$1" ; shift
for i in "$@" ; do result="$result$sep$i" ; done
printf '%s' "$result"
}
for f in "${csvfiles[@]}" ; do
fn=$(basename "$f" '.csv')
printf "%s<-read.csv(\"%s\",header=TRUE)\n" "$fn" "$f";
# <snip> etc .etc. etc.
printf "%s_2y<-tail(%s_log,730)\n" "$fn" "$fn";
frames+=( "${fn}_2y" )
done
echo "df<-data.frame($( join ',' "${frames[@]}" ))"
This avoids a while read loop, which is almost always the worst possible way to process a series of lines in a shell script. Use awk or perl or sed or a for loop around an array - anything to avoid using a while read loop.
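As a hedged sketch of that array-plus-for pattern (bash assumed; the path is the same hypothetical one used above), a glob sidesteps the word-splitting that the unquoted $(find ...) substitution suffers:

```shell
#!/bin/bash
# Sketch: build the file list as an array and loop over it with `for`,
# avoiding a `while read` loop entirely. The path is hypothetical.
shopt -s nullglob                 # expand to nothing if there are no matches
PATHX="/path/to/my/files"
csvfiles=( "$PATHX"/*.csv )       # whitespace-safe, unlike unquoted $(find ...)

# For recursion, bash 4.4+ can slurp NUL-delimited find output instead:
#   mapfile -d '' csvfiles < <(find "$PATHX" -type f -name '*.csv' -print0)

for f in "${csvfiles[@]}"; do
    printf '%s\n' "$f"
done
```

Filenames containing spaces or newlines survive intact, since the glob results are never re-split.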