I am writing a method to create a folder tree. I am getting the data from a database and creating the list of nodes. Is there a way to get the method to execute quicker? Perhaps using parallel task library?

Each node looks like this:

public class Node : INode
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public string DisplayText { get; set; }
    public bool DefaultState { get; set; }
    public List<Node> ChildItems { get; set; }
}

The fastest implementation I have been able to come up with has been just two for loops using recursion:

/// <summary>
/// For loop incrementing: 00:00:28.2218157
/// </summary>
/// <param name="folderRepo"></param>
/// <param name="folders"></param>
private void GetChildFoldersIncrement(IRepository<Folder> folderRepo, List<Node> folders)
{
    for (var i = 0; i < folders.Count; i++)
    {
        var folderId = folders[i].Id;
        var childFolders = folderRepo.FindByExp(f => f.ParentFolderID == folderId).ToList();
        if (childFolders.Count == 0) continue;
        for (var j = 0; j < childFolders.Count; j++)
        {
            var node = new Node
            {
                Id = childFolders[j].FolderID,
                ParentId = childFolders[j].ParentFolderID,
                DisplayText = childFolders[j].FolderName,
                DefaultState = false,
                ChildItems = new List<Node>()
            };

            if (folders[i].ChildItems == null) folders[i].ChildItems = new List<Node>();
            folders[i].ChildItems.Add(node);
        }

        if (folders[i].ChildItems.Count > 0) GetChildFoldersIncrement(folderRepo, folders[i].ChildItems);
    }
}

I've attempted to optimize the loops by decrementing them but it was actually slightly slower:

/// <summary>
/// For loop decrement: 00:00:28.3496396
/// </summary>
/// <param name="folderRepo"></param>
/// <param name="folders"></param>
private void GetChildFoldersDecrement(IRepository<Folder> folderRepo, List<Node> folders)
{
    if (folders == null) return;
    var maxFolders = folders.Count();
    for (var i = maxFolders - 1; i >= 0; --i)
    {
        var folderId = folders[i].Id;
        var childFolders = folderRepo.FindByExp(f => f.ParentFolderID == folderId).ToList();
        var maxChildFolders = childFolders.Count();
        for (var j = maxChildFolders - 1; j >= 0; --j)
        {
            var node = new Node
            {
                Id = childFolders[j].FolderID,
                ParentId = childFolders[j].ParentFolderID,
                DisplayText = childFolders[j].FolderName,
                DefaultState = false,
                ChildItems = new List<Node>()
            };

            if (folders[i].ChildItems == null) folders[i].ChildItems = new List<Node>();
            folders[i].ChildItems.Add(node);
        }

        GetChildFoldersDecrement(folderRepo, folders[i].ChildItems);
    }
}
share|improve this question
    
sort by ParentFolderId, you'll have O(nln(n)) instead of O(n2) – GameAlchemist Feb 24 '15 at 22:52
2  
Decrementing loops is a micro-optimization and will not likely make any significant difference. You've also got a few other lines in there that look like micro-optimizations. Forget about them. Focus on reducing the number of nested loops. Also, what does FindByExp do? It looks like it's probably another loop. Do you have access to a performance profiler? – craftworkgames Feb 24 '15 at 23:23
    
instead of building your tree from the top down, have you thought of building it from the bottom up, this should reduce the number of loops since each folder can only have one parent. Once you find the parent you are done with that folder. – Matt Johnson Feb 25 '15 at 6:24
1  
Have you ever heard of a Closure table? dirtsimple.org/2010/11/… If you can add one to your database you'll be able to query without recursing. It's a classic space/time tradeoff but works wonders on an app I help develop - DAG of ~150k nodes with max depth of 8. Might be worth investigating. – RobH Feb 25 '15 at 11:47
    
What kind of database storage are you using? What is the ORM framework? The main issue is not in .NET performance, but in the number of calls you are performing to the underlying storage. You should aim to optimize that first. – almaz Feb 25 '15 at 17:35

Let's start from your code. The recursion can be implemented in a much clearer way if the method returns the collection of all child folders, instead of updating parent folder's ChildItems collection. In this case it could look as simple as

private IEnumerable<Node> GetChildFolderNodes(IRepository<Folder> folderRepo, int folderId)
{
    var childFolders = folderRepo.FindByExp(f => f.ParentFolderID == folderId).ToArray();

    return childFolders.Select(folder => new Node
    {
        Id = folder.FolderID,
        ParentId = folder.ParentFolderID,
        DisplayText = folder.FolderName,
        DefaultState = false,
        ChildItems = GetChildFolderNodes(folderRepo, folder.FolderID).ToList()
    });
}

There are a number of articles describing why a IRepository<T> is quite a bad idea when defined on top of ORM framework. Main reasons are leaked abstractions and abstraction on top of abstraction. And your example shows why - any optimization requires the knowledge of underlying implementation (Entity Framework).

Running this process in multiple threads may help, but will put additional load on the database server and thus is not scalable. Note that Entity Framework's DbContext is not thread-safe, so you cannot share the IRepository in multiple threads unless you spin up a new DbContext per request there (which I wouldn't recommend doing).

Assuming that you're using Entity Framework 6.0, FindByExp method returns IQueryable<Folder>, you can try asynchronous implementation of the data retrieval:

private async Task<List<Node>> GetChildFolderNodesAsync(Func<IRepository<Folder>> folderRepoFactory, int folderId)
{
    Folder[] childFolders;
    using (var folderRepo = folderRepoFactory())
    {
        childFolders = await folderRepo.FindByExp(f => f.ParentFolderID == folderId).ToArrayAsync();
    }

    return (await Task.WhenAll(childFolders.Select(async folder => new Node
    {
        Id = folder.FolderID,
        ParentId = folder.ParentFolderID,
        DisplayText = folder.FolderName,
        DefaultState = false,
        ChildItems = await GetChildFolderNodesAsync(folderRepoFactory, folder.FolderID)
    }))).ToList();
}

Note that in this case you would need to spin up a new DbContext per request (which I expressed as a repository factory).

And finally, assuming that this is a critical path in your application, the best way to optimize the code you're running is to reduce the number of round trips to database server, instead of trying to squeeze microseconds on a .NET layer.

For example, if your underlying DB engine is SQL Server, you can leverage the recursive CTE to load full folder hierarchy in one go. It can be optimized even further if CTE returns the depth level for folders, but I'll leave it for you as an exercise:

private List<Node> GetChildFolderNodesAsync(DbSet<Folder> folderSet, int folderId)
{
    var allFolders = folderSet.SqlQuery(@"WITH cte (FolderID, ParentFolderID, FolderName) AS
    (
        SELECT  FolderID, ParentFolderID, FolderName
        FROM    Folders
        WHERE   ParentFolderID = @p0
    UNION ALL
        SELECT  F.FolderID, F.ParentFolderID, F.FolderName
        FROM    Folders F
        JOIN    cte
        ON      F.ParentFolderID = cte.FolderID
    )
    SELECT  *
    FROM cte", folderId).AsNoTracking();

    //Converting all folders into nodes, but skipping the ChildItems property for now...
    var nodesByParent = allFolders
        .GroupBy(folder => folder.ParentFolderID)
        .ToDictionary(folders => folders.Key, folders => folders.Select(folder => new Node
        {
            Id = folder.FolderID,
            ParentId = folder.ParentFolderID,
            DisplayText = folder.FolderName,
            DefaultState = false,
        }).ToList());

    List<Node> childNodes;

    //Now that we have all nodes grouped by parent, we can look up and assign ChildItems for all nodes
    foreach (var node in nodesByParent.Values.SelectMany(nodes => nodes))
    {
        node.ChildItems = nodesByParent.TryGetValue(node.Id, out childNodes)
            ? childNodes
            : new List<Node>();
    }

    return nodesByParent.TryGetValue(folderId, out childNodes)
        ? childNodes
        : new List<Node>();
}
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.