File handling script

Question

This is my first bash script so I ask to you to say what you think.

The script must check every directory and if it finds a pdf file in this directory: move all the pdf files to parent directory, change the name of the files and remove directory.

So if I had this folder structure:

folder1
  file.pdf
  file.txt
folder2
  some.pdf
  another.pdf
folder3
  some.txt

the script must change it to:

 folder1.pdf
 folder2.pdf
 folder2_2.pdf
 folder3
   some.txt

Here is a script code:

#!/bin/bash

working_directory="$HOME/MEGAsync/downloads/books"

cd $working_directory

# for each directory in working directory
for D in `find . -type d`
do
    # skip '.'
    if [[ $D != "." ]]; then
        # get folder name 
        folder_name=${D##*/}
        # and cd to folder
        cd $D

        # number of pdf files in the current directory
        declare -i pdf_count=0
        # for each file in the current directory
        for f in *; do

            # get file without path
            file=$(basename "$f")
            extension="${file##*.}"
            filename="${file%.*}"

            if [[ $extension == 'pdf' ]]; then
                pdf_count=pdf_count+1
                if [[ pdf_count < 2 ]]; then
                    new_filename="${working_directory}/${folder_name}.pdf";
                else
                    new_filename="${working_directory}/${folder_name}_${pdf_count}.pdf";
                fi

                echo cp $working_directory/$folder_name/$f $new_filename
                cp $working_directory/$folder_name/$f $new_filename
            fi
        done

        # return to parent folder
        cd ..

        # delete directory if pdf-file was found
        if [[ $pdf_count != "0" ]]; then
            echo rm -R $working_directory/$folder_name
            rm -R $working_directory/$folder_name
        fi
    fi
done

rolfl · Answer 1 · 2014-07-26 15:22:26Z

For a first script, this is great, and I am impressed.

your structure is neat
you are using loops properly
you are using variable substitution which is 'advanced'

The script appears that it will work for all the standard cases, and, I have seen much worse in 'production' code.

So, as a beginner, well done.

What's next?

the edge cases are going to kill this script - spaces in file names.
the logging and error handling could be better
I recommend basename and dirname instead of the variable regexes you use:
- dirname: http://linux.die.net/man/1/dirname
- basename: http://linux.die.net/man/1/basename (note that basename path/to/file.pdf .pdf returns file)
consistency on for->do loop - you have do on the next line once, and then for ... ; do the next time. Either is fine, but be consistent.

All of the error-handling, logging, and the spaces-in-files can be solved easily by using a function for running commands. Also, the "$@" construct in bash is 'special'

Consider the bash function:

function runcmd {
    echo "$@"
    # Fail if the command fails...
    "$@" || exit 1
}

So, this function will print the command, and then will run it, and exit the script if the command fails.

Now, instead of structures like:

            echo cp $working_directory/$folder_name/$f $new_filename
            cp $working_directory/$folder_name/$f $new_filename

you can do:

            runcmd cp "$working_directory/$folder_name/$f" $new_filename

and you will still get the echo, you will get the new error handling, and you will handle spaces in names just fine.

janos · Answer 2 · 2014-07-27 08:57:32Z

As @rolfl said, the script will fail if the directories contain spaces. In any case, a for loop over the output of the find command is really not pretty. You can get around the issue by changing the outermost for loop to a while loop instead:

find . -type d | while read D; do
    # ...
done

And btw, don't use `cmd` style command substitution, the modern way is $(cmd).

Deeply nested code can be hard to read. I recommend to skip the "." directory this way instead:

find . -type d | while read D; do
    [[ $D == . ]] && continue
    # ...
done

That is, instead of nesting deeper under an if, just revert the condition and do a continue instead.

And you don't need to quote ..

Instead of this:

folder_name=${D##*/}

Better to get the folder name like this:

folder_name=$(basename "$D")

Instead of this:

cd "$D"
# do something ...
cd ..

I would recommend this:

pushd "$D"
# do something ...
popd

The end result is the same, with the difference that you don't need to specify where to go back, it will go back to the right place. It's more robust that way.

You could get the PDF count much easier:

pdf_count=$(ls "$D"/*.pdf | wc -l)

This is a lot faster too, instead of iterating over each file and extracting their extensions, matching, and counting one by one.

You could refactor that part of the code to never go inside $D, which is more robust. (It's very easy to make mistakes when you change the working directory in the middle of scripts.)

Finally, most of the comments are pointless, the code you commented already explains itself, doesn't it?

Thanks. The 'pushd' and 'popd' commands make console output. How can I make them to work saliently ? The only way I have found is to redirect output to /dev/null : 'opd > /dev/null'. Is it correct way ? — demas, 10 hours ago
True, they produce output, and I'm afraid redirecting with >/dev/null is the only way. But read the next point there, I think you can achieve what you want without changing directories at all, and that will be the most robust solution. — janos, 10 hours ago

asked	yesterday
viewed	47 times
active	today

current community

your communities

more stack exchange communities

File handling script

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged beginner bash file-system or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

File handling script

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged beginner bash file-system or ask your own question.

Related

Hot Network Questions