Ok, let’s break it down:
- How are joins constructed between two tables on multiple databases?
(A code example here would be helpful).
This is pretty straightforward. SQL Objects have anywhere from a one to four part naming convention:
Servername.databasename.schemaname.tablename
If all your tables are on the same server on the same database , with the same owner/schema, you can just ignore the first three parts and use what you are most used to:
Select a.*,b.* from
tableA a inner join
tableB b on a.col1=b.col1
If one of your tables is in a different database and both use the default schema for their databases, then you simply add the database to the second table:
Select a.*,b.* from
tableA a inner join
databaseC..tableB b on a.col1 = b.col1
If you happen to be in a third database different from either of the ones you are querying you use both database names explicitly:
Select a.*,b.* from
databaseD..tableA a inner join
databaseC..tableB b on a.col1 = b.col1
If you end up using different schemas and/or owners you can add those in:
Select a.*,b.* from
databaseD.john.tableA a inner join
databaseC.accounting.tableB b on a.col1 = b.col1
And lastly, if you are very careful about it and have a very good reason, you can join a (usually small) table on another server:
Select a.* from
databaseD.john.TableA a inner join
ATLANTA.databaseC.accounting.tableB b on a.col1 = b.col1
- When is it time to move beyond a 1 database/1 server setup? How common is it to need to do this? Are there any special strategies for tracking which tables are in which database?
I’ll combine these two because they go together. You’re almost always generally fine to start off with the assumption that one database one server is enough until your design/business/technical constraints force you to use more.
So to answer your second question first, since you generally have a reason for having separate databases, it should be fairly obvious from knowing the design of your system where something is.
As to when/why it’s necessary to move beyond a single database. Usually it’s a mix of business rules, politics, and/or technical reasons.
For instance, where I work we have 16 databases spread across 4 servers. We have a MainDB, ImageDB, referencetableDB, HighvolumeTransactionDB, ReportingDB, StagingDB, ProcessingDB, ArchiveDB, FinancialDB. To give some examples of why they are different:
- FinancialDB, sensitive information
- Image DB, specific different storage and recovery requirements
- ReferenceDB, low transaction,
high read
- ReportingDB, very high read, needs to be restored/replicated to various other environments unlike a lot of the other data
- StagingDB, nothing permanent, just a beefed up tempdb that we have more control over
- MainDB, interfaces with all the other DBs but needs differential backups so...we split out the
- HighVolumeTransaction tables, (which are relatively transient), to their own DB so
as to keep the backup reasonable size.
- Archive, Lots of the same data from Main and Reporting, but with longer retention periods and harder hitting queries digging deep in the data. If this was still combined with Main/Reporting it would bog our system down.
• Does the application code need to know that one or more databases are spread across multiple servers? If not, at what level are the requests filtered?
In a broad sense, they probably do. At a minimum they need to know what server they are pointing at in the database connection string. Processing, Reporting, Main, etc.
From there, they do need a database context to execute under. Generally that would be the most used one for the application, maybe even the original one from the one database/one server days of the application. You CAN have the application explicitly switch database context on every call but that makes it very hard to adjust the database without changing the app.
The usual, (or at least, MY usual), approach is to always access through one or maybe two main databases.
Then create views into other databases as necessary combined with interfacing with the database through stored procedures.
So to illustrate:
Let’s say you want to get a Client’s demographic information, Sales data and Credit balance and that’s spread across three tables originally all in the MainDB.
So you write a call from your app:
Select c.ClientName, c.ClientAddress, s.totalSales,f.CreditBlance from
Clients c join Sales s on c.clientid = s.clientid inner join AccountReceivable f on
c.clientid=f.clientid where c.clientid = @clientid
Awesome. However, now anytime we change a columname, or rename/move a table, you have to update the app code. So instead we do two things:
Create Clients, Sales, AccountReceivables Views (you wouldn’t use Select * but I’m demoing here)
Use MainDB
GO
Create view v_Clients as select * from Clients
Create view v_Sales as select * from Sales
Create view v_AccountReceivable as select * from AccountReceivable
Go
Then we’d also create a stored procedure, spGetClientSalesAR
Create proc spGetClientSalesAR @clientID int
as
Select c.ClientName as ClientName,
c.ClientAddress as ClientAddress,
s.totalSales as TotalSales,
f.CreditBlance as CreditBalance
from
v_Clients c join v_Sales s
on c.clientid = s.clientid
inner join v_AccountReceivable f
on c.clientid=f.clientid
where c.clientid = @clientid
And have your app call that.
Now as long as I don’t change the interface on that stored proc, I can pretty much do anything I need to do to the backend database to scale up or out.
In the extreme, I could even make my old MainDB just a bunch of shelled stored procedures and views such that underneath those views we created looked like this:
Create view v_Clients as select * from ServerX.DatabaseY.dbo.Clients
Create view v_Sales as select * from ServerQ.DatabaseP.dbo.Sales
Create view v_AccountReceivable as select * from ServerJ.DatabaseK.dbo.AccountReceivable
And your app would never know the difference, (assuming fast pipes and well staged data among other things).
Obviously that’s extreme and I’d be lying if I said everything was planned this way, but using stored procedures/views even if you do it while refactoring will allow you a lot of flexibility as your app grows from its humble one database/one server beginning.