Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I need to get a substring from a path directory made of 2 numbers preceeded and followed by a '_'. The string is like:

'P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP'

and in this case I want to get the 43 following the 46.

The path is stored following the next rule, after the last backslash:

'\TablesPK_twoCharactersClassification_twoCharactersDocumentType_anything.ZIP'

I want to get the classification. The catch is that the table's primary key could be more than one field, although I know in each case how many fields are in the pk.

I got with something like this:

select substring(substring(substring('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){4}') from '[0-9]{2}')

But I would like something simpler.

Other cases:

'P:\pgdfecol\71698384737978\INFENTECONTROL\2011_03_46\2011_03_46_46_48_.ZIP'

(need the second 46)

'P:\pgdfecol\71698384737978\INFCONTABLE\2009_05_INBP\2009_05_INBP_22_28_.ZIP'

(need the 22 after INBP)

'P:\pgdfecol\71698384737978\INFOFICIAL\2007_06_MB\2007_06_MB_29_28_.ZIP'

(need the 29)

'P:\pgdfecol\71698384737978\ASOCIADOS\8010625\8010625_02_04_20110111.ZIP'

(02 after 8010625)

In the last case, the pk is only one field, so I've changed the sentence as:

select substring(substring(substring('P:\pgdfecol\71698384737978\ASOCIADOS\8010625\8010625_02_04_20110111.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){2}') from '[0-9]{2}')

For one Pk, I need the second set of ([^_]*_), for three the fourth, and so on..

select substring(substring(substring('P:\pgdfecol\71698384737978\ACTASCOMITE\ACRE123\ACRE123_17_11_.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){2}') from '[0-9]{2}')

(I get 17)

I'm using postgres 9.0.

share|improve this question
    
Would also be helpful to add a couple more example values with possible variations to make clear what you need. And add what your database has to say to SHOW standard_conforming_strings;, please. –  Erwin Brandstetter Oct 26 '12 at 4:14
    
Why is there no _ before .ZIP in 8010625_02_04_20110111.ZIP? –  Erwin Brandstetter Oct 26 '12 at 4:47
    
It's not mandatory. After the "TwoCharactersDocumentType" could be anything. The only thing sure is that after the table's primary key always goes the classification and document_type. –  nowxue Oct 26 '12 at 4:52
    
But it's always _ and digits between the two digits and .zip? I think I got it all in my solution now. –  Erwin Brandstetter Oct 26 '12 at 4:54
    
No, after 02_04 could go numbers or letters or more '', it's not sure. That's why I start filtering from the last backslash, because before the .zip could go anything. it follows thi "format" \TablesPK_twoCharactersClassification_twoCharactersDocumentType_anything_includi‌​ng_letters.ZIP. The classificafion and document type are always between "" lower hyphen. –  nowxue Oct 26 '12 at 5:01
show 2 more comments

1 Answer

up vote 1 down vote accepted

I am beginning to understand. Consider this test case:

WITH x(txt) AS ( VALUES
     ('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP')  -- 43
    ,('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_03_46\2011_03_46_46_48_.ZIP')  --need the second 46
    ,('P:\pgdfecol\71698384737978\INFCONTABLE\2009_05_INBP\2009_05_INBP_22_28_.ZIP') --need the 22 after INBP
    ,('P:\pgdfecol\71698384737978\INFOFICIAL\2007_06_MB\2007_06_MB_29_28_.ZIP')      --need the 29
    )
SELECT txt, substring(txt, '\\(?:[^_\\]+_){3}(\d\d)_[^\\]*\.(?:ZIP|zip)$')
FROM   x

(?:) .. non-capturing parenthesis
[^_\\].. character class with any character except \ and _
\d .. a digit, same as [0-9] effectively
+ .. 1 or more matches (greedy)
$ .. end of string
[_\d] .. character class with digits and _

The case with only one pk needs a different pattern. Use {1} instead of {3}.

Not sure why you escape the backslashes. In modern versions of PostgreSQL standard_conforming_strings is on by default so you don't need to escape backslashes in strings - but still in regular expression of course.

share|improve this answer
    
Hi, thanks for answering but it shows nothing to me, an empty string as result. I've escaped the backslash because it gives me an error "non-standar use of '\\' in string literal..." in pgAdmin 1.14. Besides, the primary key could be of any number of fields, and not only digits. –  nowxue Oct 26 '12 at 4:08
1  
@nowxue You need to disclose your version of PostgreSQL. Always. Add that to your question. I suspect you have an older version with standard_conforming_strings = off ... pgAdmin is only the GUI and irrelevant to the question. –  Erwin Brandstetter Oct 26 '12 at 4:11
    
No, you're right. The escaped backslashes are not needed in the string, but in the regex –  nowxue Oct 26 '12 at 4:15
    
Excellent!! I'm still getting an error in my Postgres when I try to execute the sentence without escaping the backslashes, and when I escape the backslash, I get nothing. But I've executed it in other server, and it worked. Thank you very much. Mine is Windows 7, Lc_collate and lc_type: Spanish_Colombia.1252. –  nowxue Oct 26 '12 at 5:28
1  
set standard_conforming_strings = on, and it worked! –  nowxue Oct 26 '12 at 5:48
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.