Tuesday, August 31, 2010

Replication Gotcha - Including An Article In Multiple Publications

When administering replication topologies, it's common to group articles into publications based on the roles that subscribers fulfill. Often you'll have multiple subscriber roles and therefore multiple publications, and in some cases a subset of articles is common between them. There's nothing to prevent you from adding the same article to more than one publication, but I wanted to point out how this can potentially lead to major performance problems with replication.

Let's start with a sample table:

CREATE TABLE [dbo].[ReplDemo]
(
    [ReplDemoID] [int] IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL ,
    [SomeValue] [varchar](20) NOT NULL ,
    CONSTRAINT [PK_ReplDemo] PRIMARY KEY CLUSTERED ( [ReplDemoID] ASC ) ON [PRIMARY]
)
ON [PRIMARY]
GO

Now let's pretend that we need this table replicated to two subscribers which have different roles. We'll create one publication for each role and add the table to both publications:

-- Adding the transactional publication
EXEC sp_addpublication @publication = N'ReplDemo Publication A',
    @description = N'Publication to demonstrate behavior when same article is in multiple publications',
    @sync_method = N'concurrent', @retention = 0, @allow_push = N'true',
    @allow_pull = N'true', @allow_anonymous = N'false',
    @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true',
    @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous',
    @allow_subscription_copy = N'false', @add_to_active_directory = N'false',
    @repl_freq = N'continuous', @status = N'active',
    @independent_agent = N'true', @immediate_sync = N'false',
    @allow_sync_tran = N'false', @autogen_sync_procs = N'false',
    @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1,
    @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false',
    @enabled_for_het_sub = N'false'
GO

EXEC sp_addpublication_snapshot @publication = N'ReplDemo Publication A',
    @frequency_type = 1, @frequency_interval = 0,
    @frequency_relative_interval = 0, @frequency_recurrence_factor = 0,
    @frequency_subday = 0, @frequency_subday_interval = 0,
    @active_start_time_of_day = 0, @active_end_time_of_day = 235959,
    @active_start_date = 0, @active_end_date = 0, @job_login = NULL,
    @job_password = NULL, @publisher_security_mode = 1
GO

-- Adding the transactional articles
EXEC sp_addarticle @publication = N'ReplDemo Publication A',
    @article = N'ReplDemo', @source_owner = N'dbo',
    @source_object = N'ReplDemo', @type = N'logbased', @description = N'',
    @creation_script = N'', @pre_creation_cmd = N'drop',
    @schema_option = 0x00000000080350DF,
    @identityrangemanagementoption = N'manual',
    @destination_table = N'ReplDemo', @destination_owner = N'dbo', @status = 8,
    @vertical_partition = N'false',
    @ins_cmd = N'CALL [dbo].[sp_MSins_dboReplDemo]',
    @del_cmd = N'CALL [dbo].[sp_MSdel_dboReplDemo]',
    @upd_cmd = N'SCALL [dbo].[sp_MSupd_dboReplDemo]'
GO

-- Adding the transactional publication
EXEC sp_addpublication @publication = N'ReplDemo Publication B',
    @description = N'Publication to demonstrate behavior when same article is in multiple publications',
    @sync_method = N'concurrent', @retention = 0, @allow_push = N'true',
    @allow_pull = N'true', @allow_anonymous = N'false',
    @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true',
    @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous',
    @allow_subscription_copy = N'false', @add_to_active_directory = N'false',
    @repl_freq = N'continuous', @status = N'active',
    @independent_agent = N'true', @immediate_sync = N'false',
    @allow_sync_tran = N'false', @autogen_sync_procs = N'false',
    @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1,
    @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false',
    @enabled_for_het_sub = N'false'
GO

EXEC sp_addpublication_snapshot @publication = N'ReplDemo Publication B',
    @frequency_type = 1, @frequency_interval = 0,
    @frequency_relative_interval = 0, @frequency_recurrence_factor = 0,
    @frequency_subday = 0, @frequency_subday_interval = 0,
    @active_start_time_of_day = 0, @active_end_time_of_day = 235959,
    @active_start_date = 0, @active_end_date = 0, @job_login = NULL,
    @job_password = NULL, @publisher_security_mode = 1
GO

-- Adding the transactional articles
EXEC sp_addarticle @publication = N'ReplDemo Publication B',
    @article = N'ReplDemo', @source_owner = N'dbo',
    @source_object = N'ReplDemo', @type = N'logbased', @description = N'',
    @creation_script = N'', @pre_creation_cmd = N'drop',
    @schema_option = 0x00000000080350DF,
    @identityrangemanagementoption = N'manual',
    @destination_table = N'ReplDemo', @destination_owner = N'dbo', @status = 8,
    @vertical_partition = N'false',
    @ins_cmd = N'CALL [dbo].[sp_MSins_dboReplDemo]',
    @del_cmd = N'CALL [dbo].[sp_MSdel_dboReplDemo]',
    @upd_cmd = N'SCALL [dbo].[sp_MSupd_dboReplDemo]'
GO

After creating the publications we create our subscriptions, take & apply the snapshot, and we're ready to start making changes, so we execute this simple insert statement:

INSERT INTO dbo.ReplDemo ( SomeValue )
VALUES ( 'Test' )

Here's the million-dollar question: how many times does this insert statement get added to the distribution database? To find out, we'll run the following statement on the distributor (after the log reader agent has done its work, of course):

SELECT  MSrepl_commands.xact_seqno ,
        MSrepl_commands.article_id ,
        MSrepl_commands.command_id ,
        MSsubscriptions.subscriber_id
FROM    distribution.dbo.MSrepl_commands AS [MSrepl_commands]
        INNER JOIN distribution.dbo.MSsubscriptions AS [MSsubscriptions]
            ON MSrepl_commands.publisher_database_id = MSsubscriptions.publisher_database_id
            AND MSrepl_commands.article_id = MSsubscriptions.article_id
        INNER JOIN distribution.dbo.MSarticles AS [MSarticles]
            ON MSsubscriptions.publisher_id = MSarticles.publisher_id
            AND MSsubscriptions.publication_id = MSarticles.publication_id
            AND MSsubscriptions.article_id = MSarticles.article_id
WHERE   MSarticles.article = 'ReplDemo'
ORDER BY MSrepl_commands.xact_seqno ,
        MSrepl_commands.article_id ,
        MSrepl_commands.command_id

Here's the output of the statement:

[Query results: two rows for the single insert statement, one per publication]

That's one row for each publication the table article is included in. Now imagine that an update statement affects 100,000 rows in the table. In this example that would turn into 200,000 rows that will be inserted into the distribution database and need to be cleaned up at a later date. It's not hard to see how this could lead to performance problems for tables that see a high volume of insert\update\delete activity.
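To gauge how much duplication has already accumulated in your own distribution database, a variation of the query above can count pending commands per publication. This is only a sketch built on the same distribution tables used earlier; depending on your topology you may need to scope the join by publisher database as well:

```sql
-- Sketch: count distribution commands per publication for one article.
-- Each publication gets its own article_id for the same source table,
-- so the counts below reveal how many copies of each command exist.
SELECT  MSarticles.publication_id ,
        COUNT(*) AS command_count
FROM    distribution.dbo.MSrepl_commands AS [MSrepl_commands]
        INNER JOIN distribution.dbo.MSarticles AS [MSarticles]
            ON MSrepl_commands.article_id = MSarticles.article_id
WHERE   MSarticles.article = 'ReplDemo'
GROUP BY MSarticles.publication_id
```

If the table lives in two publications, you'd expect to see two publication_id values with roughly matching counts.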

Workarounds
Two workarounds for this behavior come to mind:

  1. Modify data using stored procedures, then replicate both their schema and execution. This won't help for insert statements, and it's of little benefit if you're only updating\deleting a single row each time the procedure executes. It also assumes that all dependencies necessary for the stored procedure(s) to execute exist at the subscriber.
  2. Include each table article in only one publication. If you're creating publications from scratch, place table articles that would otherwise be included in multiple publications into their own distinct publication. If you're working with existing publications that already include the table article, subscribe only to the article(s) that you need rather than adding the article to another publication. (Subscribing to individual articles within a publication can get tricky - I'll demonstrate how to do this in a future post)
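For the first workaround, stored procedure execution can be published via sp_addarticle with @type set to 'proc exec'. The sketch below is illustrative only - the procedure name usp_UpdateReplDemo is hypothetical, and the procedure must already exist at both the publisher and the subscriber:

```sql
-- Sketch: publish the *execution* of a (hypothetical) procedure rather
-- than the row changes it makes. With @type = N'proc exec' the
-- distributor stores one command per execution instead of one command
-- per affected row per publication.
EXEC sp_addarticle @publication = N'ReplDemo Publication A',
    @article = N'usp_UpdateReplDemo',
    @source_owner = N'dbo',
    @source_object = N'usp_UpdateReplDemo',
    @type = N'proc exec'
GO
```

Remember the caveats from point 1 above: this buys you nothing for single-row changes, and the procedure's logic must produce the same results when executed independently at each subscriber.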

Tuesday, August 24, 2010

Replication Gotcha: Blank XML

Transactional replication in SQL Server 2005\2008 can handle the XML datatype just fine, with few exceptions - one in particular being when the XML value is blank. I'll save the argument about whether or not a blank (or empty string, if you prefer) value is well-formed XML for another day, because the point is that SQL Server allows it. Consider the following table:

CREATE TABLE [dbo].[XMLReplTest]
(
    [XMLReplTestID] [int] IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL ,
    [SomeXML] [xml] NOT NULL ,
    CONSTRAINT [PK_XMLReplTest] PRIMARY KEY CLUSTERED ( [XMLReplTestID] ASC ) ON [PRIMARY]
)
ON [PRIMARY]
GO

Execute the following statement and you'll see that SQL Server handles it just fine:

INSERT INTO dbo.XMLReplTest ( SomeXML )
VALUES ( '' )

Now let's add this table to a transactional replication publication:

-- Adding the transactional publication
EXEC sp_addpublication @publication = N'XML Replication Test',
    @description = N'Sample publication to demonstrate blank XML gotcha',
    @sync_method = N'concurrent', @retention = 0, @allow_push = N'true',
    @allow_pull = N'true', @allow_anonymous = N'false',
    @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true',
    @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous',
    @allow_subscription_copy = N'false', @add_to_active_directory = N'false',
    @repl_freq = N'continuous', @status = N'active',
    @independent_agent = N'true', @immediate_sync = N'false',
    @allow_sync_tran = N'false', @autogen_sync_procs = N'false',
    @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1,
    @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false',
    @enabled_for_het_sub = N'false'
GO

EXEC sp_addpublication_snapshot @publication = N'XML Replication Test',
    @frequency_type = 1, @frequency_interval = 0,
    @frequency_relative_interval = 0, @frequency_recurrence_factor = 0,
    @frequency_subday = 0, @frequency_subday_interval = 0,
    @active_start_time_of_day = 0, @active_end_time_of_day = 235959,
    @active_start_date = 0, @active_end_date = 0, @job_login = NULL,
    @job_password = NULL, @publisher_security_mode = 1
GO

-- Adding the transactional articles
EXEC sp_addarticle @publication = N'XML Replication Test',
    @article = N'XMLReplTest', @source_owner = N'dbo',
    @source_object = N'XMLReplTest', @type = N'logbased', @description = N'',
    @creation_script = N'', @pre_creation_cmd = N'drop',
    @schema_option = 0x00000000080350DF,
    @identityrangemanagementoption = N'manual',
    @destination_table = N'XMLReplTest', @destination_owner = N'dbo',
    @status = 8, @vertical_partition = N'false',
    @ins_cmd = N'CALL [dbo].[sp_MSins_dboXMLReplTest]',
    @del_cmd = N'CALL [dbo].[sp_MSdel_dboXMLReplTest]',
    @upd_cmd = N'SCALL [dbo].[sp_MSupd_dboXMLReplTest]'
GO

Assume we've created the publication, added a subscriber, taken & applied the snapshot, and we're ready to start changing data. Let's throw a monkey wrench into the works by executing the insert statement with the blank XML again and watch what happens to the log reader agent:

[Screenshot: the log reader agent fails with an error when it processes the blank XML value]

That's not a very nice error (or resolution)! I've been able to reproduce this behavior in SQL 2005 & 2008, but I have not tried it in 2008 R2. I've entered a Connect bug report, so hopefully this is fixed in a forthcoming cumulative update. In the meantime there is a simple workaround: add a check constraint. Since we're working with the XML datatype, the only option for checking length with a scalar function is DATALENGTH. The DATALENGTH of a blank XML value is 5, so we want to check that any inserted or updated value is greater than 5:

ALTER TABLE dbo.XMLReplTest ADD CONSTRAINT
    CK_XMLReplTest_SomeXML CHECK (DATALENGTH(SomeXML) > 5)
GO
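With the constraint in place, the blank value is rejected at the publisher before the log reader agent ever sees it. A quick sketch to verify the threshold and the new behavior (the exact wording of the constraint violation message will vary):

```sql
-- A blank XML value is stored internally as 5 bytes:
SELECT DATALENGTH(CAST('' AS xml)) AS BlankXmlBytes;  -- returns 5

-- This insert now fails at the publisher with a check constraint
-- violation (Msg 547) instead of breaking the log reader agent:
INSERT INTO dbo.XMLReplTest ( SomeXML )
VALUES ( '' )
```

Trading a replication outage for an up-front constraint error is a reasonable deal, but make sure your application handles the rejection gracefully.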

If you are affected by this behavior please consider taking a moment to go vote for it on Connect.

Thursday, August 19, 2010

Thoughts On The 2010 PASS BoD Elections Candidate List

Yesterday PASS published the list of candidates on the ballot for the 2010 Board of Directors election. Much to my surprise my friends Steve Jones (Blog | Twitter) and Jack Corbett (Blog | Twitter) did not make the final cut. The ensuing chatter on Twitter; comments on Steve & Jack's blog posts here, here, & here; discussion thread on SQLServerCentral; and blog posts from Andy Leonard (Blog | Twitter) here, here, and here echo the sentiment. After weighing the issue from both sides I offer my own 2 cents on the matter.

Random Thoughts

  • The nomination committee (hereafter referred to as the NomCom) stayed within the boundaries of a process they were given to follow. The BoD had the opportunity to modify the NomCom's list and did not. This was not a failure on the NomCom's part.
  • Kudos to Stuart Ainsworth (Blog | Twitter), member of the NomCom, for his post about the thought process that went into their decision.
  • Welcome to politics, where you're never going to be able to please everyone 100% of the time.
  • Is the outcry of support for Steve the general consensus of the community at large or just what we're hearing from the vocal minority?

My Message To The SQL Community

  • Regardless of what you feel about Steve's exclusion from the ballot don't forget that there are five great candidates who made the cut. I'd rather have to eliminate good candidates than have too few to pick from.
  • Be careful what you wish for. The SQL community cried foul when last year's ballot included a candidate who no one knew much about. Prior to this year's election PASS created an elections website which gave everyone insight into the process. Would you rather have a repeat of last year's election with almost no transparency?
  • Put up or shut up. At last year's Summit the entire BoD was available for an hour-long Q&A session. I was there, and if memory serves me correctly there were as many BoD members on stage answering questions as people who showed up to ask them. Things won't change if you take your ball and go home because you didn't get your way. Use this as motivation to dig in and fight for a change if you're not happy.
  • Don't give up on PASS. For as many things as the community criticizes there are just as many that PASS gets right.

My Suggestion To The PASS Board of Directors
Don't stonewall the community who elected you. This is your chance to show that you are open to listening to the people you represent. If I were in your shoes I'd do two things:

  1. Publish the yea\nay votes cast by each member for the final slate of candidates. Some of the current BoD members are up for re-election and I think the community at large would be interested in knowing where each member stands.
  2. Give the community the chance to prove that you're wrong. Just like you recently did for the Summit, hold a community choice vote for which of the candidates not selected should be on the final ballot. To ensure it's not a vocal minority who are upset set a minimum number of total votes (500? 1000?) that must be received in order for the results to be valid. Is this fair to the other 5 candidates? Sure, they're already through and can begin campaigning now; whoever gets picked as the community choice candidate will have the disadvantage of being a few weeks behind.

That's my opinion - take it for what it is. Agree? Disagree? Feel free to leave comments and let me know what you think.

Tuesday, August 10, 2010

The PASS SQLRally Logo Winner Is…

[Image: the winning SQLRally logo, Speedometer]

With 63% of the total votes, our winner is Speedometer! Congratulations to azzam on 99designs, and thank you to everyone else who submitted logos and took the time to vote for their favorite design.

Wednesday, August 4, 2010

Vote For The PASS SQLRally Logo

We received over 100 submissions for our 99designs contest to design the SQLRally logo. The SQLRally planning team, with help from the PASS Board of Directors, has narrowed the list down to 3 finalists. Now it's up to the SQL community to help pick the winner! Check out the 3 logos below, then head on over to the voting page and cast your vote for the best design. Voting is open until Monday, August 9, 3 PM Eastern\12 PM Pacific time. Don't wait, vote today!

[Images: the three finalist logos - Checkered Flag, Fast Lane, and Speedometer]