BCP | TroubleshootingSQL

I had recently run into an issue where the MaxBCPThreads value was set to a non-zero value and the Snapshot Agent was still not creating multiple snapshot files for the table in question which was over 70GB in size.

On further inspection, I did find that the issue was with the statistics distribution of the table in question was affecting the parallel file export by the snapshot agent. It has already been documented that the concurrent snapshot generation option prevents the use of parallel BCP threads by the Distribution Agent. This is documented on the ReplTalk blog on MSDN.

Now to find out why the parallel threads were not being generated. So I created a repro of the situation on my side with a transaction replication publication having three tables with the same schema options. The snapshot folder after the snapshot agent’s execution is as shown in the screenshot.

The tblNewArticle6 has 60398 rows, tblNewArticle3 has 100,000 rows and the tblNewArticle1 has 2 rows.

The MaxBCPThreads configured value for the snapshot agent is 6. Surprisingly, tblNewArticle6 has 12 BCP files and the tblNewArticle3 has only one BCP file.

Looking into a profiler trace and the verbose log for the Snapshot Agent, I found out the following information:

1. There is a thread which performs an exploration on what ranges the parallel BCP threads will be fetching. The application name in the profiler trace will show up as “Auxiliary bcp load ordering hint and partitioning resolution thread # for publication <publication name>”. See Screenshot 2 for more details.

2. You will also find dynamic SQL commands executing DBCC SHOW_STATISTICS against the tables in question.

In my case, tblNewArticle6 has a data distribution graph as shown in screenshot 3. It is clear from the screenshot that the tblNewArticle6 has multiple ranges available for the index id = 1. These were utilized by the Snapshot agent to generate 12 BCP files.

On looking into the Snapshot agent verbose log, I find find the following

Partitioning where clauses for article ‘tblnewArticle6’:

([SalesOrderNumber] is null) or ([SalesOrderNumber] <= convert(nvarchar(7), 0x53004F0035003000380034003400) collate SQL_Latin1_General_CP1_CI_AS)
([SalesOrderNumber] > convert(nvarchar(7), 0x53004F0035003000380034003400) collate SQL_Latin1_General_CP1_CI_AS) and ([SalesOrderNumber] <= convert(nvarchar(7), 0x53004F0035003300310039003500) collate SQL_Latin1_General_CP1_CI_AS)
…
…
([SalesOrderNumber] > convert(nvarchar(7), 0x53004F0037003300300037003600) collate SQL_Latin1_General_CP1_CI_AS)

The above ranges translated to 12 ranges which is the same number of files that I see in my snapshot folder.

Now looking into tblNewArticle3 which has 100,000 rows, I have a DBCC SHOW_STATISTICS output which only showed me two distinct ranges. Another point to note was that the primary key of the table has a non-clustered index defined on it and the clustered index key is a non primary key column. On increasing the number of rows in the table, I still find that the number of rows in the table, tblNewArticle3, I still found that the number of BCP files that were being generated was only 1. Then on additional investigation, I found that there is a small requirement of the clustered index to contain the primary key columns for the published article as well. This is evaluated in one of the system stored procedures for replication.

Some deep diving into the profiler traces, snapshot agent logs and debugging the stored procedures will help you arrive at the following conclusion:

Multiple BCP files will be created by the Snapshot Agent only if the following are true:

1. There are multiple distinct ranges available in the clustered index’s statistics histogram to create partitions which can be used by the BCP program to extract the data in parallel.

2. The clustered index must be defined on the primary key columns.

3. MaxBCPThreads value is set to a value other than 1.

If the above is true, then you will see multiple BCP files created by the Snapshot agent for the table of interest. Note that the above information is simplified to ensure that I comply with my NDA. After making the above changes, I see the following output in the repldata folder shown below in the screenshot:

Additional Reference:
Improving snapshot performance using MaxBCPThreads

Often on data recovery cases, we need to extract all the user table data into a new database. There are multiple options to perform this task:

1. Export/Import Wizard

2. Custom SSIS Task

3. Copy Database Wizard

4. BCP operations

5. SELECT INTO T-SQL commands

I find that the last option is the easiest. I use the following script to get all the T-SQL SELECT…INTO commands for all user tables in the old database:

 select 'select * into <new db>.'+b.name+'.['+a.name+'] from <old db>.'+b.name+'.['+a.name+']' from sys.objects a inner join sys.schemas b on a.schema_id = b.schema_id where a.type = 'U'

Note: You will have to replace the new database and old database names with the appropriate names. The above script only works for versions of SQL Server 2005 and above.

TroubleshootingSQL

Explaining the bits and bytes of data in an AI world

Tag Archives: BCP

MaxBCPThreads not causing parallel export

Generating SELECT INTO scripts for all user tables

How to remove unwanted Quotation Marks while importing a data file

TroubleshootingSQL

Explaining the bits and bytes of data in an AI world

Share this post:

Share this post:

Share this post: