SQL Server database replication tutorial
Denny Cherry, Contributor
SQL Server replication, a software package included in Microsoft SQL Server, is used to move
data from server to server in a transitionally consistent state from one instance to another.
SQL Server replication can
- Be unidirectional or bidirectional
- Be transmitted on a schedule or in real time
- Involve moving data to a single instance or to multiple instances
Replication topology
Replication topology consists of three servers – the subscriber, publisher and distributor. The
subscribers are the servers receiving the data. The publisher is the server that holds the initial
set of data being made available to the subscribers. The distributor is the server that contains
many of the settings.
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
This was first published in October 2008
It also holds the data as it is moved from the publisher to the
subscribers.
There are three different replication techniques available. They are snapshot replication, merge
replication and transactional replication.
Snapshot replication is a single unidirectional push of data. When updated data is fed from the
publisher to the subscribers, all of the data is sent each time.
Merge replication is a bidirectional replication that transmits data either in real time, or on
a schedule. Merge replication is the only bidirectional replication technique available.
Transactional replication is unidirectional from the publisher to the subscribers. Data can be
sent on a schedule or in real time. As data is transmitted to the subscriber, all data changes are
processed in the order they were made on the publisher.
When configuring replication, you can select which objects to replicate. Each database object
you replicate is called an article. Articles can be tables, views, stored procedures, functions,
rules, data types, etc. When selecting database objects, I recommend selecting all of the dependent
objects,
 |
More tips on SQL Server replication and availability: |
|
|
|
 |
 |
although it's not required. And while it's not required to
select child tables, you will want to make sure to replicate any user-defined data types and rules
that are used by the tables. Data types can be moved manually to the subscriber, but they must
exist on the remote server or the table creation will fail.
When configuring articles, you can configure filters on the articles. These filters are
effectively WHERE clauses telling SQL Server replication to transmit only a subset of the data in
the table. You can use any column that exists in the table as part of the vertical filter.
Horizontal filters can be used too. Horizontal filters are a fancy way of saying that a WHERE
clause is applied to the article when data is being replicated and only the data which matches the
WHERE clause is transmitted to the subscribers.
When to use which topology
Your key decision here is selecting which replication topology to use. Selecting the wrong
topology will give you very unsatisfactory end results.
Use snapshot replication when you want an occasional data push done. Because all data is moved
each time the snapshot is pushed, it uses a good amount of bandwidth. Only use snapshot replication
over a slow WAN if the amount of data changes to be replicated is greater than the size of the
initial data set. In other words, if a large portion of the data is being updated over and over,
snapshot will probably be the right technique. If that's not the case, then it won't be.
Merge replication should be used when you need to transmit changes from the publisher to the
subscriber, as well as from the subscriber back to the publisher. When you have multiple
subscribers, in time the changes are replicated out from the publisher to all the subscribers in
the network.
The transactional technique is probably the most common form of replication. It is used to
transmit data changes in near real time (or on a schedule) to one or more subscribers.
Transactional replication is most often used to move data from server to server in real time.
No matter which SQL Server replication topology you use, each subscriber is completely
independent from the other subscribers. Subscribers can fall behind for a number of reasons,
including network congestion, disk I/O congestion and locking and blocking by user processes.
Because they are independent of each other, a single subscriber slowing down will not cause the
other subscribers to slow down.
Replication agents
Data is moved by replication agents and there are several replication agents set up for each
publication. The snapshot agent is used by all three replication topologies. When replication is
first set up, a snapshot of the articles is taken and uploaded to each of the subscribers. When
using merge replication or transactional replication, a log reader agent will run on the
distributor and capture the changes in the transaction log. These changes are then logged into the
distribution database.
When using transactional replication, the changes are read from the distribution database and
applied to the subscriber by the distribution agent. When using merge replication, the changes are
read from the distribution database using the merge agent. The merge agent also pulls data changes
from the subscriber and uploads them to the publisher for distribution out to the other
subscribers.
When setting up subscriptions, you have two options for setting up each subscriber. You can use
a push subscription or a pull subscription. A push subscription is when the distribution agent runs
on the server hosting the distribution database. A pull subscription is when the distribution agent
runs on the server hosting the subscription database.
The advantage of push subscriptions is that the subscription agents are centralized on the
distributor. This limits your management overhead and keeps the load of the distribution (or merge)
agent off of the subscriber. The advantage of pull subscriptions is that the workload of the
distribution agents is spread out across all of the subscribers. Pull subscriptions are recommended
for use when the distribution database is located on the same server or instance as the
publisher.
Replication gotchas
When setting up your distributor, you'll be prompted to select a folder for storing the
snapshots. When using all push subscriptions, this can be a local drive path. When using pull
subscriptions or a combination of push and pull subscriptions, you must specify a network share.
This network share should not be an administrative share, and the domain accounts running the SQL
Server agent on each of the subscribers need to have read and write access to the network
share.
When running all push subscriptions and you have more than 30 or so subscriptions, you'll see
timeout errors when the subscriptions attempt to start. The quick fix for this is to edit the job
that is the distribution or merge agent. Edit step 2 and change the job type to Operating System
command. Then put the full path and executable name of the replication command in front of the
existing parameters. The default path for SQL Server 2008 is C:\Program Files\Microsoft SQL
Server\100\COM (replace 100 with 80 for SQL 2000, and 90 for SQL 2005). When running a distribution
agent, use distrib.exe, and when running a merge agent, use replmerg.exe.
Troubleshooting SQL Server replication can be tricky. The agents don't give much error data by
default. You can adjust the amount of error data you receive by changing the -OutputVerboseLevel
switch in the job properties. By increasing the number from the default, more error data will be
logged to the job step. You can also stop the SQL Agent job that runs the agent, and run the
command from a DOS command prompt to easily see more error data.
When SQL Server replication has a lot of data to transmit, a decent amount of network bandwidth
is required in order to keep it up to date. If enough bandwidth is not available, then replication
will slowly fall further and further behind. If you have a low-bandwidth, low-latency network, it
can help to increase the number of threads being used by adding the –SubscriptionStreams switch (or
increasing the number if the switch is already there). If you have a high-latency network, then
increasing this setting may not increase performance because transactional integrity is maintained
between the streams.
ABOUT THE AUTHOR
Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's
over 175-million-user installation, one of the largest in the world. Denny's areas of expertise
include system architecture, performance tuning, replication and troubleshooting. He currently
holds several Microsoft certifications related to SQL Server and is a Microsoft
MVP.
Check out his blog: SQL Server with Mr.
Denny.
MEMBER FEEDBACK TO THIS TIP
Do you have a comment on this tip? Let
us know.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation