This sort of thing is easier to do in awk or perl than in a shell script (although if you're using a sh like bash which supports arrays, it's a bit easier than if using a sh without arrays; you still have far more complications with quoting, globbing, and expansion where you don't want it in a shell script than you do in perl or awk).
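To make the quoting complication concrete, here's a minimal sketch (the filename is made up): an unquoted expansion word-splits a name containing a space, while a quoted one stays intact.

```shell
#!/bin/bash
# Sketch: why unquoted expansions complicate shell scripts.
# The filename is hypothetical.
f='two words.csv'
set -- $f        # unquoted: word-splits into "two" and "words.csv"
echo "unquoted: $# arguments"
set -- "$f"      # quoted: remains a single argument
echo "quoted: $# argument"
```

The first `set` ends up with two arguments; the second with one.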
For example:
#!/usr/bin/perl
use strict;
use warnings;
my $pathx='/path/to/my/files';
my $dh;
my @frames=();
# get list of .csv files from $pathx
opendir($dh, $pathx) || die "can't open directory '$pathx': $!\n";
my @csvfiles = grep { /\.csv$/ && -f "$pathx/$_" } readdir($dh);
closedir($dh);
foreach my $f (@csvfiles) {
my @fields=split(/\./,$f);
my $fn=$fields[@fields-2]; # perl array indices start from 0, not 1.
printf '%s<-read.csv("%s",header=TRUE)'."\n", $fn, "$pathx/$f";
# <snip> etc .etc. etc.
printf '%s_2y<-tail(%s_log,730)'."\n", $fn, $fn;
push @frames,"${fn}_2y";
}
print "df<-data.frame(", join(',',@frames), ")\n";
NOTE: you can use the File::Find module instead of a simple readdir() if you need directory recursion.
Sample output (with files a.csv, b.csv, and c.csv):
a<-read.csv("/path/to/my/files/a.csv",header=TRUE)
a_2y<-tail(a_log,730)
b<-read.csv("/path/to/my/files/b.csv",header=TRUE)
b_2y<-tail(b_log,730)
c<-read.csv("/path/to/my/files/c.csv",header=TRUE)
c_2y<-tail(c_log,730)
df<-data.frame(a_2y,b_2y,c_2y)
or with awk:
NOTE: awk doesn't have a join() function so I had to write one. awk doesn't have a readdir() function either, so it's easiest to just pipe the output of find into it (write a wrapper sh script to do that if necessary).
#!/usr/bin/awk -f
BEGIN {
FS="[./]";
delete A; # has side-effect of defining A as an array
};
# i and result aren't arguments to this function, they're local variables.
# in awk, extra whitespace separates function args from the declaration
# of local variable(s)
function join(array, sep,    i, result) {
result=array[1]; # awk array indices start from 1
for (i=2;i<=length(array);i++) result = result sep array[i];
return result;
};
# main code block, run on every input line
{
fn=$(NF-1);
printf "%s<-read.csv(\"%s\",header=TRUE)\n", fn, $0;
# <snip> etc .etc. etc.
printf "%s_2y<-tail(%s_log,730)\n", fn, fn;
A[length(A)+1] = sprintf("%s_2y",fn);
};
END {
print "df<-data.frame(" join(A,",") ")";
}
Save it as, e.g., myscript.awk, make it executable with chmod, and run it as:
find "${PATHX}" -maxdepth 1 -type f -name "*.csv" | ./myscript.awk
Output is identical to the perl version.
Finally, the same algorithm in bash:
#!/bin/bash
PATHX="/path/to/my/files"
declare -a frames=()
# get list of .csv files and store in array csvfiles.
# (the unquoted $(...) word-splits, so this assumes no whitespace in filenames)
csvfiles=( $(find "$PATHX" -maxdepth 1 -type f -name '*.csv' ) )
function join() {
local sep result i
sep="$1" ; shift
result="$1" ; shift
for i in "$@" ; do result="$result$sep$i" ; done
printf '%s' "$result"
}
for f in "${csvfiles[@]}" ; do
fn=$(basename "$f" '.csv')
printf "%s<-read.csv(\"%s\",header=TRUE)\n" "$fn" "$f";
# <snip> etc .etc. etc.
printf "%s_2y<-tail(%s_log,730)\n" "$fn" "$fn";
frames+=( "${fn}_2y" )
done
echo "df<-data.frame($( join ',' "${frames[@]}" ))"
This avoids a while read loop, which is almost always the worst possible way to process a series of lines in a shell script. Use awk or perl or sed or a for loop around an array - anything to avoid using a while read loop.
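As a hedged sketch of that array-plus-for pattern (bash assumed; the path is the same hypothetical one used above), a glob sidesteps the word-splitting that the unquoted $(find ...) substitution suffers:

```shell
#!/bin/bash
# Sketch: build the file list as an array and loop over it with `for`,
# avoiding a `while read` loop entirely. The path is hypothetical.
shopt -s nullglob                 # expand to nothing if there are no matches
PATHX="/path/to/my/files"
csvfiles=( "$PATHX"/*.csv )       # whitespace-safe, unlike unquoted $(find ...)

# For recursion, bash 4.4+ can slurp NUL-delimited find output instead:
#   mapfile -d '' csvfiles < <(find "$PATHX" -type f -name '*.csv' -print0)

for f in "${csvfiles[@]}"; do
    printf '%s\n' "$f"
done
```

Filenames containing spaces or newlines survive intact, since the glob results are never re-split.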