Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

In postgresql, I need to extract 3 specific alphanumeric characters from a field for a given column and then create the year from it. So if the value of the field is FUT,GRA,S12,1055559 or S11,1050830,PON, I need S11 or S12 and then to create 2011 or 2012. The Result can remain a text field.

I have used the following to get the S?? out: substring(my_field from E'(S\\d\\d),') but creating the year seems to be more challenging. I was hoping to use regexp_split_to_array but it does not seem to be working very well.

share|improve this question

2 Answers 2

You look like you're almost there -- you just have to append the '20' to the extracted '12'/'11':

UPDATED (PostgreSQL has no boundary \b (?!)):

SELECT '20'||substring(my_field from E'(?:^|\\W)S(\\d\\d)(?!\\w)') ....

-- NOTE: Old version. However there's no '\b' in this regex flavour.
SELECT '20'||substring(my_field from E'\\bS(\\d\\d)\\b') ....

You have to double-escape backslashes. Also, the use of the brackets in the regex means that the selected substring will be '12' or '11', not including the 'S'. The '20'||xxxx just concatenates the two strings.

The (?:^|\\W) makes sure the Sxx is either at the start of the string, or preceded by a non-word character (like space, or comma). This prevents the S13 in FUT,GRAS13,S12 from being matched.

The (?!\\w) makes sure the Sxx isn't followed by another word character (ie is not part of a larger word). This prevents S13 being matched in S132,GRA,S12.

(Note - I previously had \b which means "word boundary" but PostgreSQL regex doesn't support that, so the (?:^|\W) and (?!\w) are ways to simulate that.

share|improve this answer
    
AFAIK, there's no word boundary in PostgreSQL's regex, \b means backspace. –  M42 Jan 25 '12 at 9:36
    
that's right, see postgresql.org/docs/9.1/static/… –  araqnid Jan 25 '12 at 11:05
    
ahh, that's a shame. and lookahead is supported but not lookbehind -- will update the answer. cheers! –  mathematical.coffee Jan 26 '12 at 13:00

AFAIK, there's no word boundary in PostgreSQL's regex, \b means backspace.

I would do:

SELECT '20'||substring('FUT,GRA,S12,1055559' from E'\\WS(\\d{2})\\W')

where \W means: not a word character.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.