Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have some html data stored in text files right now. I recently decided to store the HTML data in the pgsql database instead of flat files. Right now, the 'entries' table contains a 'path' column that points to the file. I have added a 'content' column that should now store the data in the file pointed to by 'path'. Once that is complete, the 'path' column will be deleted. The problem that I am having is that the files contain apostrophes that throw my script out of whack. What can I do to correct this issue??

Here is the script

#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"

for f in $fileroot
do
psql $dbname $username -c "
  UPDATE entries
  SET content='`cat $f`'
  WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done

Note: The logic in the id=SELECT...FROM...WHERE path LIKE "" is not the issue. I have tested this with sample filenames in the pgsql environment.

The problem is that when I cat $f, any apostrophe in Edit: the contents of $f closes the SQL string, and I get a syntax error.

share|improve this question
add comment

2 Answers

up vote 2 down vote accepted

For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:

`sed "s/'/''/g" < "$f"`

to include the file contents instead of the cat, and for the second invocation in the LIKE where you appeared to intend to use the file name use:

${f/"'"/"''"/}

to include the literal string content of $f instead of executing it, and double the quotes. The ${varname/match/replace} expression is bash syntax and may not work in all shells; use:

`echo "$f" | sed "s/'/''/g"`

if you need to worry about other shells.


There are a bunch of other problems in that SQL.

  • You're trying to execute $f in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string.
  • Your subquery is also wrong, it lacks parentheses; (SELECT ...) not just SELECT.
  • Your LIKE expression is also probably not doing what you intended; you probably meant % instead of *, since % is the SQL wildcard.

If I also change backticks to $() (because it's clearer and easier to read IMO), fix the subquery syntax and add an alias to disambiguate the columns, and use a here-document instead passed to psql's stdin, the result is:

psql $dbname $username <<__END__
  UPDATE entries
  SET content=$(sed "s/'/''/g" < "$f")
  WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '$(echo "$f" | sed "s/'/''/g")');
__END__

The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on. If you aren't, change the regexp to escape apostrophes with \ instead of doubling them, and prefix the string with E, so O'Brien becomes E'O\'Brien'. In modern PostgreSQL it'd instead become 'O''Brien'.


In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.

For example, I'd write this as follows:

import os
import sys
import psycopg2

try:
        connstr = sys.argv[1]
        filename = sys.argv[2]
except IndexError as ex:
        print("Usage: %s connect_string filename" % sys.argv[0])
        print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
        sys.exit(1)


def load_file(connstr,filename):
        conn = psycopg2.connect(connstr)
        curs = conn.cursor()
        curs.execute("""
        UPDATE entries
        SET content = %s
        WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
        """, (filename, open(filename,"rb").read()))
        curs.close()

if __name__ == '__main__':
        load_file(connstr,filename)

Note the SQL wildcard % is doubled to escape it, so it results in a single % in the final SQL. That's because Python is using % as its format-specifier so a literal % must be doubled to escape it.

You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over the list of all file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql scripting; you have to use bash co-process as shown here ... and it isn't worth the hassle.

share|improve this answer
    
answered everything I could have possibly wanted to know. Thanks. –  parker.sikand Nov 18 '12 at 22:00
add comment

In the original post, I made it sound like there were apostrophes in the filename represented by $f. This was NOT the case, so a simple echo "$f" was able to fix my issue.

To make it more clear, the contents of my files were formatted as html snippets, typically something like <p>Blah blah <b>blah</b>...</p>. After trying the solution posted by Craig, I realized I had used single quotes in some anchor tags, and I did NOT want to change those to something else. There were only a few files where this violation occurred, so I just changed these to double quotes by hand. I also realized that instead of escaping the apostrophes, it would be better to convert them to &apos; Here is the final script that I ended up using:

dbname="myDB"
username="username"
fileroot="/path/to/files/*"

for f in $fileroot
do
psql $dbname $username << __END__
  UPDATE entries
  SET content='$(sed "s/'/\&apos;/g" < "$f")'
  WHERE id=(SELECT e.id FROM entries e WHERE path LIKE '%$(echo "$f")');
__END__
done

The format coloring on here might make it look like the syntax is incorrect, but I have verified that it is correct as posted.

share|improve this answer
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.