Postgres regex: add comma

Question

I expect this is an easy answer, but I can't figure out the right terms to search for.

I would like to add a comma in between any company name and "INC" (or, as it may be, "LLC", "LP", etc.). For example, I want 'COMPANY INC' to be 'COMPANY, INC' where 'COMPANY' could be any company name you might think of. It could have letters, numbers, but probably not special characters.

REGEXP_REPLACE("COMPANY A LP', '[a-zA-Z0-9] LP', ', LP') of course only returns 'COMPANY , LP'; how do I keep the A so that it returns COMPANY A, LP?

EDIT: There are also already valid records with the appropriate comma, i.e. COMPANY A, INC also already exists.

harmic · Accepted Answer · 2014-02-07 06:14:00Z

This seems to work:

test=> select REGEXP_REPLACE('COMPANY A LP', ' LP$', ', LP');
 regexp_replace
----------------
 COMPANY A, LP
(1 row)

This is even better:

test=> select REGEXP_REPLACE('COMPANY A LP', ' (INC|LP|LLC)$', ', \1');
 regexp_replace
----------------
 COMPANY A, LP
(1 row)

test=> select REGEXP_REPLACE('COMPANY A INC', ' (INC|LP|LLC)$', ', \1');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

(works for any of the suffixes you mentioned, in a single statement).

Or, if as per the comments, some of the records already have the comma:

test=> select REGEXP_REPLACE('COMPANY A, INC', '([^,]) (INC|LP|LLC)$', '\1, \2');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

test=> select REGEXP_REPLACE('COMPANY A INC', '([^,]) (INC|LP|LLC)$', '\1, \2');
 regexp_replace
----------------
 COMPANY A, INC
(1 row)

Another way to deal with that, if you are updating the table so that all the records are formatted uniformly, is to leave the ones that are not formatted incorrectly alone:

UPDATE xxx 
SET company_name=REGEXP_REPLACE(company_name, ' (INC|LP|LLC)$', ', \1') 
WHERE company_name ~ '[^,] (INC|LP|LLC)$';

Thanks, that's really close. I forgot to mention there are already records with the appropriate comma, i.e. COMPANY A, INC and COMPANY A INC both exists. — Dylan Hettinger, Feb 7 '14 at 5:23
Great, thanks for the update. Can you tell me what the term is to describe the '\1' and '\2' technique? — Dylan Hettinger, Feb 7 '14 at 16:58
In the PG docs it describes "parenthesized subexpression" (see postgresql.org/docs/current/static/…). In other implementations, eg perl, they are called capture groups. /1 refers to the text matched by the first pair of () in the RE, /2 by the second, etc — harmic, Feb 7 '14 at 22:15

current community

your communities

more stack exchange communities

Postgres regex: add comma

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged regex postgresql postgresql-9.2 or ask your own question.

Linked

Hot Network Questions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged regex postgresql postgresql-9.2 or ask your own question.

Linked

Related