Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

How can I Left Outer Join (I think it is Left Outer Join but I am not 100% sure) two data tables with the following tables and conditions while keeping all columns from both tables?

dtblA:

 col1  col2       anotherColumn1
   1    Hi            any1
   2    Bye           any1
   3    Later         any1
   4    Never         any1

dtblB:

 col1   anotherColumn2
   1      any2
   1      any2
   2      any2
   3      any2
   3      any2
   3      any2

dtblJoined:

 col1  col2   anotherColumn1    anotherColumn2
   1    Hi       any1               any2
   1    Hi       any1               any2
   2    Bye      any1               any2
   3    Later    any1               any2
   3    Later    any1               any2
   3    Later    any1               any2

In dtblA, col1 is has unique values. In dtblB, col1 is not required to have unique values. IF dbltA has 4 in col1 but dtblB does not have 4 then dtblJoined will not show that information.

I can use regular DataTable operations, LINQ, or whatever.

I tried this but it removes duplicates:

dtblA.PrimaryKey = new DataColumn[] {dtblA.Columns["col1"]}

DataTable dtblJoined = new DataTable();
dtblJoined.Merge(dtblA, false, MissingSchemaAction.AddWithKey);
dtblJoined.Merge(dtblB, false, MissingSchemaAction.AddWithKey);

EDIT 1:

This is close to I what I want but it only has columns from one of the tables ( found at this link ):

    dtblJoined = (from t1 in dtblA.Rows.Cast<DataRow>()
                  join t2 in dtblB.Rows.Cast<DataRow>() on t1["col1"] equals t2["col1"]
                  select t1).CopyToDataTable();

EDIT 2:

An answer from this link seems to work for me but I had to change it a bit as follows:

DataTable targetTable = dtblA.Clone();
var dt2Columns = dtblB.Columns.OfType<DataColumn>().Select(dc =>
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                   where targetTable.Columns.Contains(dc.ColumnName) == false
                   select dc;

targetTable.Columns.AddRange(dt2FinalColumns.ToArray());

var rowData = from row1 in dtblA.AsEnumerable()
                          join row2 in dtblB.AsEnumerable()
                          on row1["col1"] equals row2["col1"]
                          select row1.ItemArray.Concat(row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)).ToArray();

 foreach (object[] values in rowData)
      targetTable.Rows.Add(values);

I also found this link and I might try that out since it seems more concise.

share|improve this question
 
This is also natural join. Tell us what's supposed to happen when dtblA has '4' and dtblB does not. –  Shoe Jul 16 at 18:45
 
Just in case you missed these other posts on SO regarding the same issue Check here and here –  Rwiti Jul 16 at 18:45
 
updated to show dtblA with 4 –  Soenhay Jul 16 at 20:28

3 Answers

You could probably use LINQ and do something like this:

var dtblJoined = from dB in dtblB.AsEnumerable()
                 join dA in dtblA.AsEnumerable() on dA.col1 equals dB.col1 into dAB
                 from d in dAB.DefaultIfEmpty()
                 select new (col1 = dB.col1, ; col2 = (dB.col1 == dA.col1) ? dA.col2 : null);

This would return an IEnumerable as the result not a DataTable, but it should get you closer to what you are looking for I think. May need a little tweaking though.

share|improve this answer

This is simply an inner join between the 2 tables:

var query = (from x in a.AsEnumerable()
              join y in b.AsEnumerable() on x.Field<int>("col1") equals y.Field<int>("col1")
              select new { col1= y.Field<int>("col1"), col2=x.Field<int>("col2") }).ToList();

Produces:

col1 col2
1    Hi 
1    Hi 
2    Bye 
3    Later 
3    Later 
3    Later 
share|improve this answer
up vote 0 down vote accepted

Thanks all for your help. Here is what I came up with based on answers from this link:

    /// <summary>
    /// Joins the passed in DataTables on the colToJoinOn.
    /// <para>Returns an appropriate DataTable with zero rows if the colToJoinOn does not exist in both tables.</para>
    /// </summary>
    /// <param name="dtblWithUniqueValuesInColumn"></param>
    /// <param name="dtblWithExtraData"></param>
    /// <param name="colToJoinOn"></param>
    /// <returns></returns>
    /// <remarks>http://stackoverflow.com/questions/2379747/create-combined-datatable-from-two-datatables-joined-with-linq-c-sharp?rq=1</remarks>
    public static DataTable LeftOuterJoinTwoDataTablesOnOneColumn(DataTable dtblWithUniqueValuesInColumn, DataTable dtblWithExtraData, string colToJoinOn)
    {
        //Change column name to a temp name so the LINQ for getting row data will work properly.
        string strTempColName = colToJoinOn + "_2";
        if (dtblWithExtraData.Columns.Contains(colToJoinOn))
            dtblWithExtraData.Columns[colToJoinOn].ColumnName = strTempColName;

        //Get columns from dtblWithUniqueValuesInColumn
        DataTable dtblResult = dtblWithUniqueValuesInColumn.Clone();

        //Get columns from dtblWithExtraData
        var dt2Columns = dtblWithExtraData.Columns.OfType<DataColumn>().Select(dc => new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));

        //Get columns from dtblWithExtraData that are not in dtblWithUniqueValuesInColumn
        var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                              where !dtblResult.Columns.Contains(dc.ColumnName)
                              select dc;

        //Add the rest of the columns to dtblResult
        dtblResult.Columns.AddRange(dt2FinalColumns.ToArray());

        //No reason to continue if the colToJoinOn does not exist in both DataTables.
        if (!dtblWithUniqueValuesInColumn.Columns.Contains(colToJoinOn) || (!dtblWithExtraData.Columns.Contains(colToJoinOn) && !dtblWithExtraData.Columns.Contains(strTempColName)))
        {
            if (!dtblResult.Columns.Contains(colToJoinOn))
                dtblResult.Columns.Add(colToJoinOn);
            return dtblResult;
        }

        //get row data
        var rowData = from row1 in dtblWithUniqueValuesInColumn.AsEnumerable()
                      join row2 in dtblWithExtraData.AsEnumerable()
                      on row1[colToJoinOn] equals row2[strTempColName]
                      select row1.ItemArray.Concat(row2.ItemArray).ToArray();

        //Add row data to dtblResult
        foreach (object[] values in rowData)
            dtblResult.Rows.Add(values);

        //Change column name back to original
        dtblWithExtraData.Columns[strTempColName].ColumnName = colToJoinOn;

        //Remove extra column from result
        dtblResult.Columns.Remove(strTempColName);

        return dtblResult;
    }

Note that you can put LINQ code into a function and into a Library file then in the main code you can edit and continue.

EDIT 3:

This method now works correctly and it is still fast when the tables have 2000+ rows. Any recommendations/suggestions/improvements would be appreciated.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.