SQL Server > SQL Server Forums > Data Mining > Help for a beginner to mining (please!)

Answered Help for a beginner to mining (please!)

  • Thursday, April 19, 2012 12:59 PM
     
     

    Hi,

    I hope this is the correct forum and someone would be kind enough to assist.

    I have a simple, two table database of customers and orders, joined by a single key. e.g.

    tblCustomers (CASE):

      CustID    PK

      Gender

      WhereFound

    tblOrders (NESTED):

      OrderID    PK

      CustID      FK

      Sales Value

    I want to do a simple clustering based on the gender, wherefound and the sales value within the customers orders to understand if Males in Wales spend more on average (for example)

    I can flatten this data in to a single table, but I'm trying to learn how to run nested table queries. I'm using Visual Studio to execute these using the mining wizard... but I'm unable to work out the correct settings for Input and Predict for these variables.

    For instance - I know I need to set CustID in the Orders table as a key, but then it tries to use it as an input. I also know I want Sales Value discretizing, which I think I do by setting the column type in the wizard, but again I'm not sure I'm doing this correctly.

    Could anyone provide a "text" version of the checkboxes I need to set. Much appreciated if you can.

    Thanks,

    Jon


    • Edited by Jon at Work Thursday, April 19, 2012 1:57 PM
    •  

Answers

  • Thursday, April 19, 2012 9:52 PM
    Answerer
     
     Answered

    Hi Jon. You can use nested tables for a clustering model, but you wouldn't use CustId from Orders as the key, if Orders is the nested table. Instead, when you choose the Key from the nested table, select the column that you want to model.

    I know it's not intuitive when you're used to thinking of a key as a kind of relational connector, but remember, when you set up the data source view, you already defined how the case table and nested table are connected.

    http://msdn.microsoft.com/en-us/library/ms175659(v=SQL.100).aspx

    (Read the section, Using Nested Table Columns in a Mining Model)

    To set the nested table key

    1. In Specify Table Types dialog box, set [Customer] table as the Case table, and [Orders] table as the Nested table. (When I reproed this with AdventureWorks, I used Targeted Mailing as the case table, and vAssocSeqOrders as the nested table.)

    2. In Specify the Training Data dialog box, set the predictable and any input columns for the case table. For the nested table, choose your [Sales Value] column as the key. The logical key column is already defined for you.

    To change the discretization method

    Click on the Mining Structure tab, click the [Sales Value] column, and then change the Content property to Discretized. Then you can set the discretization options.

    Hope that helps.. nested tables are very useful but can be confusing. This tip from the dev team might help with some more examples:

    http://www.sqlserverdatamining.com/ssdm/Default.aspx?tabid=102&Id=48


    SQL Server UE, Data Mining

    • Proposed As Answer by koles Monday, April 23, 2012 12:23 PM
    • Marked As Answer by Jon at Work Monday, April 23, 2012 12:38 PM
    •  

All Replies

  • Thursday, April 19, 2012 9:52 PM
    Answerer
     
     Answered

    Hi Jon. You can use nested tables for a clustering model, but you wouldn't use CustId from Orders as the key, if Orders is the nested table. Instead, when you choose the Key from the nested table, select the column that you want to model.

    I know it's not intuitive when you're used to thinking of a key as a kind of relational connector, but remember, when you set up the data source view, you already defined how the case table and nested table are connected.

    http://msdn.microsoft.com/en-us/library/ms175659(v=SQL.100).aspx

    (Read the section, Using Nested Table Columns in a Mining Model)

    To set the nested table key

    1. In Specify Table Types dialog box, set [Customer] table as the Case table, and [Orders] table as the Nested table. (When I reproed this with AdventureWorks, I used Targeted Mailing as the case table, and vAssocSeqOrders as the nested table.)

    2. In Specify the Training Data dialog box, set the predictable and any input columns for the case table. For the nested table, choose your [Sales Value] column as the key. The logical key column is already defined for you.

    To change the discretization method

    Click on the Mining Structure tab, click the [Sales Value] column, and then change the Content property to Discretized. Then you can set the discretization options.

    Hope that helps.. nested tables are very useful but can be confusing. This tip from the dev team might help with some more examples:

    http://www.sqlserverdatamining.com/ssdm/Default.aspx?tabid=102&Id=48


    SQL Server UE, Data Mining

    • Proposed As Answer by koles Monday, April 23, 2012 12:23 PM
    • Marked As Answer by Jon at Work Monday, April 23, 2012 12:38 PM
    •  
  • Friday, April 20, 2012 7:38 AM
     
     

    Thank you ever so much Jeannine - I would never have thought to key sales !

    I've got to the stage of setting Sales Value as a Key (input is then auto-checked), then when I review the model settings on the Mining Structure tab, I'm not able to change the content property for Sales Value as it's locked in as a key?

    If I run it leaving this as a key, then I get a truncated set of variables with the top (say) 20 discrete sales value in it. What I was hoping for was say 10 clusters - one for each calculated sales "bucket" (range of values, e.g. £0 - £300, £300-£398, etc.) , with a breakdown of the splits in gender and wherefound for each of these buckets.

    I can do this through the excel add-in on a flat table ... so I roughly know what I want to get to, I just want to do it on a relational table.

    • Edited by Jon at Work Friday, April 20, 2012 8:46 AM Corrected column name
    •