Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a database including certain strings, such as '{TICKER|IBM}' to which I will refer as ticker-strings. My target is to count the amount of ticker-strings per day for multiple strings.

My database table 'tweets' includes the rows 'tweet_id', 'created at' (dd/mm/yyyy hh/mm/ss) and 'processed text'. The ticker-strings, such as '{TICKER|IBM}', are within the 'processed text' row.

At this moment, I have a working SQL query for counting one ticker-string (thanks to the help of other Stackoverflow-ers). What I would like to have is a SQL query in which I can count multiple strings (next to '{TICKER|IBM}' also '{TICKER|GOOG}' and '{TICKER|BAC}' for instance).

The working SQL query for counting one ticker-string is as follows:

SELECT d.date, IFNULL(t.count, 0) AS tweet_count
FROM all_dates AS d
LEFT JOIN (
    SELECT COUNT(DISTINCT tweet_id) AS count, DATE(created_at) AS date
    FROM tweets
    WHERE processed_text LIKE '%{TICKER|IBM}%'
    GROUP BY date) AS t
ON d.date = t.date

The eventual output should thus give a column with the date, a column with {TICKER|IBM}, a column with {TICKER|GOOG} and one with {TICKER|BAC}.

I was wondering whether this is possible and whether you have a solution for this? I have more than 100 different ticker-strings. Of course, doing them one-by-one is an option, but it is a very time-consuming one.

share|improve this question
    
The term you are looking for is 'pivot' (turning several rows in a few columns into one row with many columns). You should be able to find plenty of uses of it on Stackoverflow with a search or two –  Twelfth Jun 30 at 20:14
    
It is a little complicated, but I believe to generalize this query for all ticker symbols you need to create a pivot table query. Also, using wildcards, ex. '%{TICKER|IBM}%' will force a serial read on table tweets and impair performance. In this case I think you can use LEFT(processed_text, Length('{TICKER|')) = '{TICKER|' to find tickers. Her's an SO example of dynamic pivot tables in mysql: stackoverflow.com/questions/12598120/… –  ron tornambe Jun 30 at 20:39
add comment

2 Answers 2

If I understand correctly, you can do this with conditional aggregation:

SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG, coalesce(BAC, 0) AS BAC
FROM all_dates d LEFT JOIN
     (SELECT DATE(created_at) AS date,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
                   END) as IBM,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
                   END) as GOOG,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
                   END) as BAC
      FROM tweets
      GROUP BY date
     ) t
     ON d.date = t.date;
share|improve this answer
    
Thanks Gordon, this works perfectly! Typing in the names of the companies in the script costs way less time than doing all 100+ ticker-strings manually, so I really appreciate it!! –  Geoffrey Jun 30 at 20:57
add comment

I'd return the specified resultset like this, adding expressions to the SELECT list for each "ticker" I want returned as a separate column:

   SELECT d.date
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%' ),0) AS `cnt_ibm`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|GOOG}%'),0) AS `cnt_goog`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|BAC}%' ),0) AS `cnt_goog`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|...}%' ),0) AS `cnt_...`
     FROM all_dates d
     LEFT
     JOIN tweets t
       ON t.created_at >= d.date
      AND t.created_at < d.date + INTERVAL 1 DAY
    GROUP BY d.date

NOTES: The expressions within the SUM aggregates above are evaluated as booleans, so they return 1 (if true), 0 (if false), or NULL. I'd avoid wrapping the created_at column in a DATE() function, and use a range scan instead, especially if a predicate is added (WHERE clause) that restricts the values ofdatebeing returned fromall_dates`.

As an alternative, expressions like this will return an equivalent result:

     , SUM(IF(t.process_text LIKE '%{TICKER|IBM}%' ,1,0)) AS `cnt_ibm`
share|improve this answer
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.