Removing duplicate rows from a table

This seems to be a self-explanatory title and before you start raising your eyes about the need for a blog post on this, let me set the context. I am not going to teach how to do some thing which is mentioned in a KB Article:

How to remove duplicate rows from a table in SQL Server
http://support.microsoft.com/kb/139444

What I am going to cover in a post is “when you have inconsistencies in a SQL Server database table regarding duplicate entries in a column which has a Primary Key constraint defined on it, you need to be aware of a gotcha before you start tracking down the duplicate rows”.

For this I have table with the following definition:

create table t (c1 int primary key)

When I do a SELECT on the table, I get the following output:

c1


100

100

Yes, there are duplicate rows in the Primary Key column! And no, I am not going to show how I made that happen. There are some pretty smart people out there who already know this! Smile And for the rest who are still in the dark as to how this can be accomplished, I am not covering any undocumented stuff here!

A DBCC CHECKTABLE on the table yields the following error:

DBCC results for ‘t’.

Msg 2512, Level 16, State 2, Line 1

Table error: Object ID 85575343, index ID 1, partition ID 72057594038976512, alloc unit ID 72057594039894016 (type In-row data). Duplicate keys on page (1:192) slot 0 and page (1:192) slot 1.

There are 2 rows in 1 pages for object “t”.

CHECKTABLE found 0 allocation errors and 1 consistency errors in table ‘t’ (object ID 85575343).

repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (dbCorrupt.dbo.t).

 

When I use the query from the above KB Article to find the duplicate rows, I get back no records:

select c1,COUNT(*) as counts
from t
group by c1
having COUNT(*) > 1

In the above query, if you remove the HAVING clause, you will find that both the rows are returned with a counts column value of 1 each. This is because one of the values was changed to 100 without the knowledge of the Storage Engine.

When I execute the query below, I get back a single row:

select c1 from t where c1 = 100

If you look at the Scan Count from the Statistics IO output of the query, you will find that the Scan Count is 0. The logic behind this is explained on a previous post that I had written on the MSDN SQLServerFAQ blog some time back. This is because SQL Server doesn’t expect to retrieve more than 1 row when searching for a value belonging to a primary key column.

Now, I put all the contents of the table “t” into another table “t1” and the above query to find out duplicates returned the c1 = 100 having 2 records. Also, when I drop the primary key column on the table, I was able to retrieve the duplicate rows. The catch now is to determine which duplicate row needs to be deleted if the non-primary key columns of the table have different values. This requires understanding of the application business logic and tracing down the prior inserts done in the table which have been logged at a different layer. NEVER delete duplicate rows without understanding the repercussions of deleting a row from a production database table unless and until the duplicate rows have the same data. Eg. In the above table, I can use a DELETE TOP 1 command to delete from the table “t” because I only have one column in the table and both the duplicate rows have the same values.

To summarize, if your Primary Key column has duplicate values, then you need to either put all the rows into another table or remove the Primary Key constraint and then find out the duplicates. Just thought I would put this out for everyone in case you weren’t able to identify why duplicate rows were not showing up using a SELECT..GROUP BY.. HAVING query where as DBCC CHECK commands were reporting errors.

Removing primary transaction log file

Recently I was discussing about removing transaction log files on the #sqlhelp hashtag on Twitter. The question was whether we could remove an existing transaction log file after adding a new one. During the course of the discussion, I learnt that the file that was being removed was the primary transaction log file of the database.

When you attempt to remove the primary log file of a database from Management Studio or using ALTER DATABASE command, you will receive the following error:

Drop failed for LogFile ‘dbLogTest_log’.  (Microsoft.SqlServer.Smo)

The primary data or log file cannot be removed from a database. (Microsoft SQL Server, Error: 5020)

This is by-design. We do not allow the removal of the primary transaction log file of user databases. This was first introduced in SQL Server 2000.

Tibor (Blog), a SQL Server MVP, has already blogged about how to remove transaction log files using T-SQL commands here. So, I am not going to duplicate the post here again by providing the same T-SQL commands.

T-SQL Tuesday: Common SQL Server myths – Series I–Setup and Services

imageAs part of the Microsoft CSS group, we get to deal with a lot of production as well as non-production related issues. While we are troubleshooting such issues, we also end of dispelling some or the other myth that exists regarding SQL Server architecture.

So, this time when Sankar Reddy (Blog | Twitter), a SQL Server MVP, decided to host T-SQL Tuesday, a community blogging effort started by long time SQL Server MVP, Adam Machanic (Blog | Twitter), I decided to contribute a post on this month’s topic “Misconceptions in SQL Server”.

There are multiple posts by very eminent people in the SQL community that have debunked various SQL Server myths. If you are not a SQL Server specialist and buy these myths that have spread around in the community like wild fires, you would probably think that SQL Server and MS Access are the same. And as any good SQL Server DBA would know that this is not true! Since, T-SQL Tuesday is an effort to enhance and share community knowledge, I shall attempt to refute and clarify certain common myths that I deal with day-in and day-out during the course of my work with SQL CSS. My first series will be on

Series #1: SQL Server Services and Patching

Thumbs downIt is alright to change the SQL Service account and password from the Service Manager snap-in
This is one of the most common misconceptions that people across the world have in this area. The SQL Server Configuration Manager has been designed to do the following (other than a host of other configuration activities):

  1. Change SQL service account
  2. Update the password of the SQL service account
  3. Make configuration related changes like adding trace flags

If you have not used Configuration Manager to do any of the above and have not faced an issue, then you are lucky! But Lady Luck may not always be on your side! Sarcastic smileThe Configuration Manager (available from SQL Server 2005 onwards) does a host of other activities when you make a change to your service account like granting the necessary permissions required on the Registry for the SQL service account etc. which the Services Manager doesn’t do. Even in SQL Server 2000, service account changes should be done through Enterprise Manager as it grants the necessary permissions required to use the Full-text feature. As new features get added to the product, a lot of considerations w.r.t. ACLs, Registry permissions, encryption keys, etc. need to be taken into account while modifying SQL Service accounts. The Configuration Manager was designed keeping these considerations in mind. So, this is not a new recommendation but has been around for more than a decade.

The other option that you have is to use WMI and the SQL Server namespaces to programmatically change the SQL service account. I had posted an example of this on Technet. If you want to modify Reporting Services service accounts, then it would be a bit more complex because you would need to manage the encryption keys etc.


Thumbs downWindows Cleanup Utility can be used to uninstall SQL Server
All I have to say to this is NOT a good idea if you still want to be the DBA, Windows Cleanup Utility shouldn’t be used unless and until Microsoft has published the same (via an official blog post or KB Article) to be used for a certain scenario. Windows Cleanup Utility is a generic tool that uses Windows Installation GUIDs from the registry to annihilate the product’s existence from your box! Since, we use Windows Installer to install and patch SQL Server (true for SQL Server 2005 to SQL Server 2008 R2), the SQL component registry entries are centrifugal to the SQL’s existence on the box. If these are not cleaned up by our uninstaller program correctly, then you can have a server which behaves very goofily the next time you run a SQL installation program. The worst case scenarios that I have seen are:

  1. No further patching possible for existing instances of SQL on the box
  2. No further new installations of SQL Server on the box

You wouldn’t want to land up in either situation. In such situations, you might even have to rebuild the box! Not pretty when you have multiple instances running on the server! Surprised smile


Thumbs downDifferent nodes can have different builds for the same SQL instance
SQL Server clusters are not supported in scenarios where different nodes belonging to the same clustered instance are on different builds. The reason behind this is that are be code differences in the SQL builds that we release. So if you failover to a lower build of the SQL Server, then you can expect to see some known issues surfacing which do not occur on the other node/nodes.


Thumbs downSQL Server setup will automatically patch the newly added node
If you have recently added a node to the SQL instance, then you should patch it immediately to the current build of the SQL clustered instance that you added the node to, before putting that node into production. Till SQL Server 2005, there was no option of slip-streaming such installations. You can use the steps mentioned in the following KB Article in case you run into issues while patching the newly added node.


Thumbs downSQL Server services don’t go offline while patching the SQL instance
Another common misconception that makes applications connecting to the SQL Server instance unhappy Sad smile (read: application timeouts) if they are kept online during SQL setup. At a high level, I like to look at SQL Server setup as a two step process. The first step involves MSI (Windows Installer) actions, which are responsible for making the necessary registry changes, laying down the binaries and other Windows related configuration changes including .NET (if necessary). The second step is when the SQL Server instance is started up and configuration scripts (T-SQL) are executed to configure the SQL instance and make the necessary meta-date changes required for the system metadata residing in the SQL system databases. For the first step, the SQL instance needs to be offline as the system database files(like mssqlsystemresource) cannot be updated if the SQL instance has a lock on the files. When the SQL Service is started for configuring the instance, it is in single user mode with certain parameters, so that applications cannot connect to it during that time. It is always advisable to take a downtime during patching of SQL instances. Hot patching is still not available in SQL Server.


Thumbs downDeleting files from the C:\Windows\Installer folder will free up disk space
A mistake that I have seen so many people make, that I have lost count. Yes, it will free up your disk space but at a cost where it is possible that you might not be able to patch your existing applications which use Windows Installer for installation/patching. Surprisingly, this is a hidden folder and still people think it’s wise to delete files from that folder. We made it hidden for a reason!! As mentioned before in this blog post, that SQL installation uses Windows Installer. The files .MSI and .MSP) used during installation are cached in the Windows Installer folder. If you delete these files, then you will land up with various missing MSI/MSP files error. SQLSeverFAQ (Blog | Twitter), a SQL CSS MSDN blog, has multiple posts has step-by-step instructions when you run into such issues. Disclaimer: The steps mentioned in the posts are NOT for the faint-hearted! So, avoid Shift+Delete of such files!

Reference :
Part – 1 – SQL Server 2005 Patch Fails to install with an error Unable to install Windows Installer MSP File
SQL SERVER 2005 SP3 Install Fails with Error 1612- Unable to install Windows Installer MSP file

In case I have missed out any other common myth related to SQL Services and Setup, please feel free to leave a comment and I shall add it to the post. Leave your name and Twitter handle (if you have one) and I shall acknowledge the credit for the myth to you.

Sleeping session ID: Where art thou?

This has been an age old confusion ever since the Dynamic Management Views changed the way SQL Performance monitoring was done.

A SPID/Session ID will show up as Sleeping/Awaiting Command when it has no active work request that it needs to execute on the Database Engine. Consider the following scenario:

1. You connect to a SQL instance using Management Studio

2. You execute: SELECT @@SPID

3. You then open another SSMS query window and try and lookup the Session ID returned by the output of Step #2 using the DMV: sys.dm_exec_requests.

If you didn’t know of this behavior, then you will be in for a surprise! This is because the Session ID associated with the query in Step #2 has no active work to perform which is why the DMV didn’t report a row for that particular session. However, when you look up this session id using sys.dm_exec_connections or sys.sysprocesses. The reason sys.sysprocesses reports this SPID is because it doesn’t differentiate between a session with/without an active work request. It will report all SPIDs currently connected to the database engine.

The above behavior is expected and by-design.

Bob Dorr has mentioned about this in his post on the CSS SQL Escalation blog and also talks about how this can affect concurrency if such a session has open transactions:

How It Works: What is a Sleeping / Awaiting Command Session

Jet Provider for x64

All those who have blogged about the non-availability of x64 version of Jet (including me :)) would now have to eat their blog posts or at least publish addendums stating that there is now a x64 version of Jet Provider (oops it’s now called ACE) with the advent of Office 2010.

If you had an application which imports data from an Excel sheet into a SQL Server database table using a Jet Provider, then there are chances that you either moved to SSIS packages or kept on using a 32-bit version of SQL Server. Well now with Office 2010, you have the option of using the ACE provider for connecting 64-bit Office files.

Make sure that you read the end user agreement before you migrate everything off to use Office 2010 Data Access components.

Read more about it here: http://blogs.msdn.com/psssql/archive/2010/01/21/how-to-get-a-x64-version-of-jet.aspx

Addedum: April 4th, 2010. After my colleague, Evan pointed out the server side support policy for ACE.

Disclaimer: The ACE redistributable link does mention the following:

The Access Database Engine 2010 Redistributable is not intended:

  • As a replacement for the Jet OLEDB Provider in server-side applications.
  • To be used within a service program or web application that relies on a Windows service.