Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Working on a simple time sheet webapp, I have created the following query (simplified - I actually have several mapped columns of a similar type to project_id) to generate test data:

INSERT INTO `entries` (`entry_id`, `user_id`, `project_id`, `date`, `comment`, `hours`)
VALUES
( null,
  0,
  (SELECT `project_id` FROM projects ORDER BY RAND() LIMIT 1),
  CURRENT_DATE(),
  '# TEST DATA #',
  (SELECT ROUND((0.25 + RAND() * (24 - 0.24)), 2))
);

Currently, I'm running this query inside a PHP loop.

This code is currently adequate for my needs now, as I can iterate 50 loops in 0.3037 seconds.

However, I fear that when it comes time to test large data sets (searching and report generation for rows > 1 000 000), I may run into problems creating them.

How can I optimize this algorithm? Should I consider using a stored procedure?

Or should I just not worry about it, and run the loop to longer iterations, more times?

My stored procedure attempt

Following loki2302's suggestion, I wrote this stored procedure:

DROP PROCEDURE IF EXISTS `create_test_entries`;

DELIMITER //

CREATE PROCEDURE `create_test_entries` (IN number INT)
LANGUAGE SQL
DETERMINISTIC
SQL SECURITY INVOKER
COMMENT 'Creates `number` of random test entries in the timesheet'
BEGIN
    DECLARE i INT;
    SET i = 0;

    WHILE i < number DO
        INSERT INTO `entries` (`entry_id`, `user_id`, `project_id`, `department_id`, `task_id`, `date`, `comment`, `hours`)
        VALUES
        ( null,
          0,
          (SELECT `project_id` FROM projects ORDER BY RAND() LIMIT 1),
          CURRENT_DATE(),
          '# TEST DATA #',
          (SELECT ROUND((0.25 + RAND() * (24 - 0.24)), 2))
        );
        SET i = i + 1;
    END WHILE;
END //

Comments?

share|improve this question
@loki2302 Added! Comments? – msanford Aug 8 '12 at 18:38
So, what's about your original question? Is this approach any faster? :-) – loki2302 Aug 8 '12 at 18:43
@loki2302 I've never run long queries like this before, so basically: is running this type of stored procedure a million times a bad idea, is there a more efficient way to do it, or anything else that jumps out? – msanford Aug 8 '12 at 18:56
Accessing database million times is always bad idea. If you need to have a million of entries, just make this SP generate you all these entries at once and then call it once. – loki2302 Aug 8 '12 at 18:59

1 Answer

up vote 2 down vote accepted

An answer for updated question.

Feels like (SELECT project_id FROM projects ORDER BY RAND() LIMIT 1) stands for "get random but still valid project_id". You could probably just iterate over all of your projects (with no randomness) and for every project add a random number of records. I don't think it affects your intention but this will probably work faster.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.