Introduction

Usually web application or service that run in the Windows Azure Platform follow the architecture shown in the figure below. A web farm consisting of two or more web roles serve requests coming from clients. Azure provides the necessary load balancer infrastructure. This principle is called "scale out" because you can enhance performance by adding new instances of the web role. Relational data is stored in SQL Azure. As you might know every SQL Azure database is in fact a failover cluster consisting of three nodes. The problem is that currently SQL Azure does not offer a built-in scale-out mechanism (note that the SQL Azure team announced that there will be an out-of-the-box scale out solution called SQL Azure Federation in the future).

At the database layer you can implement scale-out by using the concept of "sharding" (note that there is an excellent white paper from Microsoft describing the different types of sharding). In this example we implement a custom LINQ provider that demonstrates the use of expression trees and TPL/PLINQ for running distributed queries on a sharded database. On top we put a custom WCF Data Service (=OData) to enable REST queries. Our goal is to do some rudimentary performance tests that compare a single 10GB SQL Azure database with 10 x 1GB SQL Azure databases (sharded).

The test scenario that we have chosen for this example is a real estate search engine. To keep things simple the database just consists of a single table with approx. 35 properties.

If you would like to rebuild the sample step by step for learning purposes you can follow the hands-on-lab shown in my blog at http://www.timecockpit.com/en/blogs/11-02-16/Custom_OData_Provider_for_Windows_Azure.aspx.

Building and Installing the Sample

You can download the attached sample solution and build it in Visual Studio 2010. You will need the Windows Azure SDK and Windows Azure Tools for Microsoft Visual Studio.

In order to run the example you have to perform the following steps:

Download and build the solution.
Create one large and ten small databases in SQL Azure (see image below).
Change the connection strings in ConnectionStrings.config (you find the file in the solution directory).
Run the project Samples.Sharding.DemoDataGenerator. This is a WPF application that generates demo data. Make sure to uncheck Work locally so that the data is generated in the cloud.
Now you are ready to run the unit tests in the project Samples.Sharding.Tests or launch the OData services.

Description

If you want to learn more about the sample I recommend starting with the standard WCF Data Service that is included in the solution. You can find it in DefaultRealEstateService.svc in the project Samples.Sharding.Service. Try to run it in your browser using the local development web server (if you like you could also add a cloud project and run it in the Windows Azure Compute Emulator or even deploy it to Windows Azure). Here is a sample OData query that you could try:

http://localhost:<YourPort>/DefaultRealEstateService.svc/RealEstate?$top=25&$filter=SizeOfParcel ge 200 and SizeOfParcel lt 1000 and HasBalcony eq true&$orderby=SizeOfBuildingArea desc

Behind the scenes the WCF Data Services runtime translates the query string into an expression tree. After that it uses the LINQ provider for SQL Server to translate the expression tree into T-SQL and run it in SQL Azure. For those of you who are not that familiar with WCF Data Services/OData I have added a sample query in the form of a unit test, too (see project Samples.Sharding.Tests):

Edit|Remove

[TestMethod]
public void TestEntityFrameworkProvider()
{
    using (var context = RealEstateEntities.Create())
    {
        Measure("Large", () => 
            (from re in context.RealEstate
                where re.Location == "New York City" && re.HasBalcony.Value
                orderby re.SizeOfGarden
                select re)
            .Take(25)
            .ToArray());
    }
}

[TestMethod] 
public void TestEntityFrameworkProvider() 
{ 
    using (var context = RealEstateEntities.Create()) 
    { 
        Measure("Large", () =>  
            (from re in context.RealEstate 
                where re.Location == "New York City" && re.HasBalcony.Value 
                orderby re.SizeOfGarden 
                select re) 
            .Take(25) 
            .ToArray()); 
    } 
}

You can find the core of the sharding logic in the file ShardingProvider.cs in the project Samples.Sharding.Provider. Our implementation of the custom LINQ provider is based on the IQToolkit. It takes the expression tree of the corresponding query and sends it to all shards in parallel using PLINQ. Note that we use .NET's visitor classes to analyize and manipulate the query's expression tree.

Edit|Remove

// Send query to all sharding databases in parallel
var result = this.ConnectionStrings
    .AsParallel()
    .WithDegreeOfParallelism(this.ConnectionStrings.Length)
    .SelectMany(connectionString =>
    {
        using (var context = this.ContextCreator(connectionString))
        {
            context.CommandTimeout = 300;
            var rewriter = new SwitchQueryable<TEntity>(Expression.Constant(this.EntityProvider(context)));
            var ex2 =
                Expression.Lambda<Func<IEnumerable<TEntity>>>(
                        Expression.Call(
                            methodInfoExpr.Method,
                            rewriter.Visit(methodInfoExpr.Arguments[0]),
                            methodInfoExpr.Arguments[1]));
            return ex2.Compile()().ToArray();
        }
    })
    .ToArray();

// Send query to all sharding databases in parallel 
var result = this.ConnectionStrings 
    .AsParallel() 
    .WithDegreeOfParallelism(this.ConnectionStrings.Length) 
    .SelectMany(connectionString => 
    { 
        using (var context = this.ContextCreator(connectionString)) 
        { 
            context.CommandTimeout = 300; 
            var rewriter = new SwitchQueryable<TEntity>(Expression.Constant(this.EntityProvider(context))); 
            var ex2 = 
                Expression.Lambda<Func<IEnumerable<TEntity>>>( 
                        Expression.Call( 
                            methodInfoExpr.Method, 
                            rewriter.Visit(methodInfoExpr.Arguments[0]), 
                            methodInfoExpr.Arguments[1])); 
            return ex2.Compile()().ToArray(); 
        } 
    }) 
    .ToArray();

You can immediately try the custom LINQ provider with sharding by running the corresponding unit test (see project Samples.Sharding.Tests). As you can see the LINQ query with and without sharding look nearly the same:

Edit|Remove

[TestMethod]
public void TestShardingProvider()
{
    using (var context = RealEstateEntities.Create())
    {
        Measure("Many small", () =>
            (from re in CreateQueryableRoot()
                where re.Location == "New York City" && re.HasBalcony.Value
                orderby re.SizeOfGarden
                select re)
            .Take(25)
            .ToArray());
    }
}

[TestMethod] 
public void TestShardingProvider() 
{ 
    using (var context = RealEstateEntities.Create()) 
    { 
        Measure("Many small", () => 
            (from re in CreateQueryableRoot() 
                where re.Location == "New York City" && re.HasBalcony.Value 
                orderby re.SizeOfGarden 
                select re) 
            .Take(25) 
            .ToArray()); 
    } 
}

Based on the LINQ provider the example contains an implementation of a rudimentary custom OData service (see project Samples.Sharding.Service). To keep the example simple it does not support references or data manipulations (if you want to know more about that topics I recommand reading this excellent series of blog posts by Alex James, a Program Manager working on the Data Services team at Microsoft). Here is a sample OData query that you could try:

http://localhost:<YourPort>/CustomRealEstateService.svc/RealEstate?$top=25&$filter=SizeOfParcel ge 200 and SizeOfParcel lt 1000 and HasBalcony eq true&$orderby=SizeOfBuildingArea desc

Performance Evaluation

If you compare the two OData services or unit tests manually you should already see that the sharding queries run much faster. We have published the OData service in Windows Azure and used the load testing tool LoadStorm to simulate load on both implementations. We defined three reference queries and let 10 to 50 concurrent users (step-up load testing scenario; each of them firing six queries per minute) do some queries. The result has been as expected: Because of the large database on the single SQL Azure server the standard OData service does not really scale. The first chart shows the number of users and the throughput. In the second chart you can see that from a certain number of concurrent users the response time gets greater then 35 seconds; as a result we see a lot of HTTP errors.

If we run the same load test against our sharding service the results look very different - of course. No errors and an average response time of approx. 3 seconds instead of more than 10 :-)

Critical Evaluation

Please keep in mind that this example is not production code! It should demonstrate the basic ideas of

Sharding,
Custom LINQ Providers and
Simple Custom OData Providers.

If you want to use a similar scenario in a real world project be prepared that you will have to write a lot more code. Typical issues that you will have to solve are:

Cluttered connection pools in case of a large number of databases (will be solved with the federation feature of SQL Azure that I have mentioned earlier)
Intelligent distribution of data across shards (i.e. finding a suitable sharding/federation key)
Intelligent distribution of queries across shards (e.g. don't query a shard if it cannot contain results because of the structure of your sharding/federation keys)
Query optimization using indexes (for simplicity this sample does not create any index except the PK)

I generally recommend that you use SQL Azure Federations as soon as this feature becomes available. However, there are situations in which you will not be able to use it (e.g. slightly different data models on different shards). I hope that this example will be useful for you in these cases.