Armand Niculescu, BEng, MSM, is a 34 year old Art Director at Media Division. and he enjoys working with visual arts for film, web and print.

15 responses to “Using MySQL to generate daily sales reports with filled gaps”

  1. S. Karpfen

    Thanks for the very instructive example. It’s a very intuitive solution to build the calendar; once, one has the idea to do it…

  2. Alex Skakunov

    I like the way you created a chart out of a table and simple image ;]]]

  3. Ralf Neubauer

    The problem with calendar tables: they get out of date. As most people will pre-fill them some years into the future, this wil happen, when no-one even remembers the table and how to add dates. What is the best strategy to avoid this?

  4. Norman Avila

    Very illustrative and intutive, simple but great output, i will consider it to graph my weather data (including, missing data days) in my site

    thanks

  5. Philippe

    Very illustrative article.
    Best techniques are sometimes so simple and obvious that we do forget they exist !
    Thanks a lot for your article.

    But, there is a “but”. The output is buggy and, with data provided above you don’t obtain the good result.
    Instead, you obtain this :
    date_order, total_sales
    ’2009-08-16′, 0
    ’2009-08-17′, 1
    ’2009-08-18′, 12
    ’2009-08-19′, 0
    ’2009-08-20′, 0
    ’2009-08-21′, 1

    As you can see, the two first rows are not taken in account :
    (’2009-08-15 12:20:20′, ’1′, ’2′, ’123′),
    (’2009-08-15 12:20:20′, ’2′, ’2′, ’123′),
    Why ? Because of this part of the query : SELECT MIN(order_date) FROM orders.
    In fact, SQL returns “2009-08-15 12:20:20″ which is greater than “2009-08-15 00:00:00″ (the value for a date field only).
    Using the keyword “BETWEEN”, you say SQL retrieve values that are greater or equal then “2009-08-15 00:00:00″.
    So, the correct query would be :
    SELECT MIN(DATE(order_date)) FROM orders
    In integrality :
    SELECT calendar.datefield AS date_order,
    Ifnull(Sum(orders.quantity),0) AS total_sales
    FROM orders
    RIGHT JOIN calendar
    ON Date(orders.order_date) = calendar.datefield
    WHERE calendar.datefield BETWEEN (SELECT Min(Date(order_date))
    FROM orders) AND (SELECT Max(order_date)
    FROM orders)
    GROUP BY date_order
    And the result is :
    ’2009-08-15′, 4
    ’2009-08-16′, 0
    ’2009-08-17′, 1
    ’2009-08-18′, 12
    ’2009-08-19′, 0
    ’2009-08-20′, 0
    ’2009-08-21′, 1

    Have a nice day ;)

  6. Philippe

    To respond to ralph neubauer, why not create a stored proc who creates automatically a memory table with the dates needed for the job ?
    There is the source code I suggest for the stored proct (PL/SQL for MYsql )

    DELIMITER //

    DROP PROCEDURE IF EXISTS sales_by_period//

    CREATE PROCEDURE sales_by_period(IN start_date DATE, IN end_date DATE)
    COMMENT ‘Returns a dataset for sales between start_date and end_date’
    BEGIN
    — Drop and creates a temporary table in memory for the time of the session
    DROP TABLE IF EXISTS mem_calendar;
    CREATE TEMPORARY TABLE mem_calendar (
    datefield DATE
    ) ENGINE=MEMORY;

    — fills temporary table with dates
    SET @temp_date := start_date;
    WHILE @temp_date <= end_date DO
    INSERT INTO mem_calendar VALUES(@temp_date);
    SET @temp_date = ADDDATE(@temp_date, INTERVAL 1 DAY);
    END WHILE;

    — retieves and returns data
    SELECT
    mem_calendar.datefield AS date_order,
    IFNULL(SUM(orders.quantity),0) AS total_sales
    FROM orders
    RIGHT JOIN mem_calendar ON DATE(orders.order_date) = mem_calendar.datefield
    WHERE mem_calendar.datefield BETWEEN start_date AND end_date
    GROUP BY date_order;
    END //

    DELIMITER ;

    As you can see, instead of using a physical table I use a memory table wich is faster.
    But you need to activate the "memory" engine which is not activated by default (it seems to me…).
    With such a stored proc, you prevent every situation.

    Hope this helps.
    Regards,
    Philippe

  7. Philippe

    Hi Armand,

    “On what version of MySQL have you tested? My code works perfectly on MySQL 5.0.22, I just retested it.”
    Response : MySql 5.1.36
    I agree for the inconvenience of temporary tables so I used a table in memory to get better perfromances.
    Plus, using a stored proc permits to delimit the period to examine instead of querying for all records contained in orders table.
    This is another approach to extend the point of view and to give more ideas for members looking at your great article. That was just my intention.
    Regards.

  8. Mikolaj

    Hello
    I thinkyour exaple will miss first days if no orders ware made this days.
    I mean I yu have table orders filled with dates starting from 3rd of November 
    calendar.datefield BETWEEN (SELECT Min(Date(order_date))
    wil still miss 1st and 2nd of November.

    Regards
    MJ

  9. Mikolaj

    OK, so query should look like this:

    SELECT calendar.datefield AS date, 
           IFNULL(SUM(orders.quantity),0) AS total_sales
    FROM calendar LEFT JOIN orders ON (calendar.datefield = DATE(orders.order_date))
    WHERE (calendar.datefield BETWEEN ('2009-10-01') AND ('2009-10-31') 
    GROUP BY date;
    And one mote thing... If you would like to get sales info per product_id or customer_id, please remember to add condition to JOIN part and not into WHERE part like:
    SELECT calendar.datefield AS date, 
           IFNULL(SUM(orders.quantity),0) AS total_sales,
           customer_id
    FROM calendar LEFT JOIN orders ON (calendar.datefield = DATE(orders.order_date)) AND customer_id=1
    WHERE (calendar.datefield BETWEEN ('2009-10-01') AND ('2009-10-31') 
    GROUP BY date;
    otherwise you will not get all NULLs from the join.
    Regards
    MJ
    
    
    
    
    
  10. Jose

    Ok..Here’s my try..I think this problem is easier to solve if we think in terms of unix timestamps:
    I first create a “numbers” table, with a single field (id) that goes from 0 to <max number of dates that may get involved in the query>. The query then is this:

    select @row+numbers.id*86400 , orders.quantity  FROM (select @row:=(select UNIX_TIMESTAMP(min(order_date)) FROM orders), @end:=(select UNIX_TIMESTAMP(max
    (order_date)) FROM orders)) r , numbers LEFT JOIN orders ON UNIX_TIMESTAMP(order_date)=@row+numbers.id*86400 WHERE @row+numbers.id*86400 < @end;

    @row is initialized as the min date.The 86400 are the seconds in a day.The entries in the “numbers” table (0..1..2….n) makes the expression (@row+numbers.id*86400) jump one day at a time, which is then left joined to dates in the orders table.
    In the case you’d need hours, or minutes, instead of days, changing the constant should be enough.
    I’ve just crafted this query, so i dont know if it’s bug-free! 

  11. ipkwena

    Thanks for the article/tutorial. Its simplicity and level of detail makes it of particular relevance to a small project I am working on.