up vote 2 down vote favorite
share [fb]

To forgo reading the entire problem, my basic question is: Is there a function in PostgreSQL that escapes regular expression characters in a string? I've probed the documentation (but perhaps not thoroughly enough) and I was unable to find such a function.

Here is the full problem:

In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.

i.e. Name, Name (1), Name (2), Name (3), etc.

As it stands, I use the following code to find the next number to add in the series (written in plpgsql):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;

IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

("var_name" contains the name I'm trying to insert)

This works for now, but the problem lies in the WHERE statement:

WHERE a.name LIKE var_name || ' (%)'

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

Now we come to the real problem, which is that "var_name" could contain regular expression characters.

So, my question as stated above is: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

Thanks,

B.J.

link|improve this question

68% accept rate
feedback

2 Answers

up vote 1 down vote accepted

how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)$';

 max
-----
   4
(1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.

link|improve this answer
The database will be accessed by multiple users, but the additional filters in my query (omitted in my question) will limit it to consider rows only allowed to be used by one user at a time. Thank you for the help, and I'll accept your solution of splitting the name and using substring and char_length (literally represented by 12) to compare the 2 parts separately. – Benny Feb 28 at 20:35
feedback

Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:

name text not null,
number integer not null,
primary key (name, number)

It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.

If you like, you can create a view for this duty. Here's the data:

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

The view:

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

And the result:

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)
link|improve this answer
I considered this solution, but since inserting the data is really the only time the problem exists, the bulk of the work would be making all the display query changes. It just seems like modifying the insertion query would be a lot easier – Benny Feb 28 at 20:38
@Benny, You could use a view for that, if you like. Please see the amended answer. – Wayne Conrad Feb 28 at 20:52
feedback

Your Answer

 
or
required, but never shown

Not the answer you're looking for? Browse other questions tagged or ask your own question.