0

I expect this is an easy answer, but I can't figure out the right terms to search for.

I would like to add a comma in between any company name and "INC" (or, as it may be, "LLC", "LP", etc.). For example, I want 'COMPANY INC' to be 'COMPANY, INC' where 'COMPANY' could be any company name you might think of. It could have letters, numbers, but probably not special characters.

REGEXP_REPLACE("COMPANY A LP', '[a-zA-Z0-9] LP', ', LP') of course only returns 'COMPANY , LP'; how do I keep the A so that it returns COMPANY A, LP?

EDIT: There are also already valid records with the appropriate comma, i.e. COMPANY A, INC also already exists.

1

This seems to work:

test=> select REGEXP_REPLACE('COMPANY A LP', ' LP$', ', LP');
 regexp_replace
----------------
 COMPANY A, LP
(1 row)

This is even better:

test=> select REGEXP_REPLACE('COMPANY A LP', ' (INC|LP|LLC)$', ', \1');
 regexp_replace
----------------
 COMPANY A, LP
(1 row)

test=> select REGEXP_REPLACE('COMPANY A INC', ' (INC|LP|LLC)$', ', \1');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

(works for any of the suffixes you mentioned, in a single statement).

Or, if as per the comments, some of the records already have the comma:

test=> select REGEXP_REPLACE('COMPANY A, INC', '([^,]) (INC|LP|LLC)$', '\1, \2');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

test=> select REGEXP_REPLACE('COMPANY A INC', '([^,]) (INC|LP|LLC)$', '\1, \2');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

Another way to deal with that, if you are updating the table so that all the records are formatted uniformly, is to leave the ones that are not formatted incorrectly alone:

UPDATE xxx 
SET company_name=REGEXP_REPLACE(company_name, ' (INC|LP|LLC)$', ', \1') 
WHERE company_name ~ '[^,] (INC|LP|LLC)$';
3
  • Thanks, that's really close. I forgot to mention there are already records with the appropriate comma, i.e. COMPANY A, INC and COMPANY A INC both exists. – Dylan Hettinger Feb 7 '14 at 5:23
  • Great, thanks for the update. Can you tell me what the term is to describe the '\1' and '\2' technique? – Dylan Hettinger Feb 7 '14 at 16:58
  • 1
    In the PG docs it describes "parenthesized subexpression" (see postgresql.org/docs/current/static/…). In other implementations, eg perl, they are called capture groups. /1 refers to the text matched by the first pair of () in the RE, /2 by the second, etc – harmic Feb 7 '14 at 22:15

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.