2

I'm having trouble structuring the output of a Netezza query. I'm working with web data and need to be able to count the distinct URL's that are receiving traffic. The issue is that most of the rows contain URL parameters that need to be stripped before counting.

This is what my URL's look like:

  • www.site.com/page?utm_source=xxx&utm_campaign=yyy
  • www.site.com/another?cmp_code=zzz&cmp_vendor=aaa
  • www.site.com/page?cmp_code=nnn&cmp_vendor=bbb
  • www.site.com/another
  • www.site.com/something

What I'd like to return is something like this:

  • www.site.com/page
  • www.site.com/another
  • www.site.com/page
  • www.site.com/another
  • www.site.com/something

My query is below. It runs without error, but its not producing the results that I'd like.

SELECT UID, TIMESTAMP, SUBSTR(FULL_URL, 1, INSTR(FULL_URL, '?', -1) AS MY_URL
FROM MY.TABLE
LIMIT 100;

This is the result

  • www.site.com/page
  • www.site.com/another
  • www.site.com/page
  • NULL
  • NULL

You can see here that if my URL contains a question mark, I'm getting the exact result that I want. If it does not contain URL parameters then it returns nothing at all.

I've also tried a regex approach, but this is currently giving me a Count field incorrect error.

SELECT UID, TIMESTAMP, REGEXP_REPACE(FULL_URL, (\.\+)(\/?)(\.\*), $1) AS MY_URL
FROM MY.TABLE
LIMIT 100;

I'm still new to Netezza and I'm not seeing the logic or operators that I need to accomplish this. Can anyone point me in the right direction?

2
  • 1
    try regexp_extract(full_url,'^[^?]+') Commented Dec 8, 2016 at 22:50
  • This does exactly what I need. Thanks for the assist!
    – Lenwood
    Commented Dec 8, 2016 at 23:02

1 Answer 1

2

This should work:

create temp table myTable ( Full_URL varchar(5000));
insert into myTable values('www.site.com/page?utm_source=xxx&utm_campaign=yyy');
insert into myTable values('www.site.com/another?cmp_code=zzz&cmp_vendor=aaa');
insert into myTable values('www.site.com/page?cmp_code=nnn&cmp_vendor=bbb');
insert into myTable values('www.site.com/another');
insert into myTable values('www.site.com/something');  

select regexp_replace(Full_URL,'\?.*','')
from myTable
0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.