Category Archives: AlwaysOn

SQL Server AlwaysOn

SQL Server HA and DR solutions ?

I have been trying to figure out the best way to HA and DR protect SQL servers for some time now.

My customer is currently running SQL server 2008 R2 on Veritas cluster with Veritas Volume Manager as block level replication between primary and DR site.  The cluster has 4 physical nodes on each site with multiple instances for different system. Two of the hosts on each site has FusionIO flash cards for tempdb storage.  All other data is stored in a NetApp SAN connected through dual 8 GB FC HBA’s. Symantec NetBackup 7.6 is used for backups, it’s using the latest snapshot technology in NetApp and Vmware to make backups of databases in just seconds. Restore is also done in just minutes.

Upgrade to SQL 2012 is in the pipe and we are also looking into the possibility of making these servers virtual in Vmware. Load on the physical boxes is not that high and the performance of the Vmware environment would be sufficient. Vmware is running on HP blades, blade chassi is connected to NetApp through dual (or more) 10 GB Ethernet. Datastores in Vmware uses NFS protocol. Veritas is considered expensive and the goal is to get rid of it.

IOmeter och Crystal diskmark gives us almost the same disk performance between physical and virtual servers. We are looking into the possibility of using either InfinIO and/or IOTurbine in the Vmware hosts to get even better performance.

Last 8 months we have been testing AlwaysON to see if that is the solution for a new environment. AlwaysOn is very nice when used with non-shared storage and both synchronous and asynchronous replication between hosts and sites seems to be working very well. We setup a cluster with 3 plus 3 hosts on primary and DR site and created AG’s for both Sharepoint environments and other both big and small databases. We setup AG’s local to only one site and AG’s spanning both sites and we used a file share witness that could also be moved to DR site.

All worked well until we performed  a planned DR test where we failed over all AG’s to the DR site and then pulled the plug between the sites. Primary site cluster stayed up but DR site stopped because of lost quorum…. My mistake of course, had to manually change the quorum configuration after failing over. Have fixed that now with a powershell script that fails over all AG’s, resumes replication and changes quorum. So far so good.

But there is still a problem. As I said, one of the AG’s I have created is only local to the servers on the DR site, it’s a database only used by application installed on the DR site so we don’t need it on the primary site. Problem is that the cluster on the DR site goes down when we lose connectivity between the sites..  Microsoft  Cluster is designed this way to avoid a split brain situation, but it’s not what I want….

One way to solve this would be two have separate AlwaysOn clusters, but that’s expensive and it would be better to utilize the same servers for more than one thing.

Another way would be two have two FCSI clusters and put AlwaysON on top of that, but that would require shared storage for each FCSI on each site and in Vmware this requires FC or ISCSI. If I understand it correctly it could work in Windows 2012 and Vmware 5.5. But we are not there yet.

On top of this there is the problem with logins and jobs that need s to be synched between the servers. Having six servers and 4 AG’s makes this a bit complicated.

Jonathan Kehayias at SQLskills has written a small application to help out with this, but it’s still not enough. Check it out here

So, I have started to look into different solutions. I have found the following alternatives that we should consider:

  • Vmware native HA with SRM for DR protection
  • SIOS
  • DH2i
  • Veritas

What I’m considering right now is to just put the SQL servers in Vmware as is and use SRM for DR protection, it would give basically the same protection level as the Veritas solution. Block level snapshot replication in the NetApp instead of block level volume replication with Veritas. Failover between local host on primary would be slightly slower because  it would require the virtual machine to restarted from scratch on a new hosts, compared to Veritas where only SQL server has to startup. But Veritas has to take care of shared storage and unmounting and mounting diskgroups and that can also take some time. Failing to DR site is manual in Veritas and would be the same in Vmware, data loss would possibly be the same if the line between the sites is not fast enough for high peak load.

Using some other third-party clustering like SIOS or DH2i would maybe be better than Veritas, less complicated, possibly cheaper but something new that needs to be implemented.

With the protection that Vmware, NetApp and Symantec provides is it really neccesary to add AlwaysOn protection as well if we don’t need readable secondarys or backup offloadning ?

What’s your opinion ?

Replicate logins and jobs in #SQLServer #AlwaysOn environments

After working for a while with SQl 2012 and AlwaysOn it becomes clear that keeping logins, jobs and other uncontained objects in sync between different hosts is a problem.

Maybe not if your running a small cluster with only 2 servers and one or two AG’s, but when you have 5 servers and 5 AG’s it becomes a hazzel to keep control of what should be copied to each server.
I started out with home made script and synclogin scripts but it feels like I’m always forgetting something.

Today I found this:

SQLSkills and Jonathan Kehayias have created a SSMS add-in that lets you create SQLCMD script that can copy everything. Easy to use and a very good start.

So far it only creates a SQLCMD script that you will have to automate yourself, but Jonathan believes they will continue to develope it into a more automated utility.

That sounds very interesting.

DQS with AlwaysOn automatic failover

In the last blog post I described how to set up DQS in AlwaysOn.

In this post I will setup automatic failover of the database. The actual failover is handled by SQL server

availability groups, but there are a couple of steps that has to be done manually to be able to connect successfully

to the DQS database after a failover, automatic or manual.

I’m using the document as a guide for this setup:

The problem is the mapping of SID’s don’t get aligned correctly between the different hosts in the AG.

In manual or planned failover scenario it is easy to just run the following commands database failover has completed:





ALTER USER dqs_service WITH LOGIN=[##MS_dqs_service_login##]


ALTER USER dqs_service WITH LOGIN=[##MS_dqs_service_login##]

In unplanned automatic failover no one is around to run the command, this can solved by creating the following solution:

Create a database on each node, DQSFailoverStatus, and then a table called replica_role

USE DQSFailoverStatus;

CREATE TABLE [dbo].[replica_role]


[replica_role] [tinyint] NULL


INSERT INTO [dbo].[replica_role] ([replica_role])


SELECT [role]

FROM sys.dm_hadr_availability_replica_states

WHERE is_local = 1


Create a SQL agent Job on all nodes in the AG:

USE DQSFailoverStatus;


SET @last_role =


SELECT TOP 1 [replica_role]

FROM [dbo].[replica_role]


DECLARE @current_role TINYINT;

SET @current_role =



FROM sys.dm_hadr_availability_replica_states

WHERE is_local = 1


— Last time it was secondary, currently it is primary; run the commands
— to alter the authorization

IF (@last_role = 2 AND @current_role = 1)

ALTER USER dqs_service WITH LOGIN=[##MS_dqs_service_login##]

ALTER USER dqs_service WITH LOGIN=[##MS_dqs_service_login##]

USE DQSFailoverStatus;
UPDATE dbo.[replica_role] SET [replica_role] = @current_role;

Depending on your SLA requirements setup the agent job to run every 5-15 minutes.

If you have multiple AG’s on you installation the check script has to be modified slightly to recognaize the AG name where you DQS database is running, something like this:

FROM sys.dm_hadr_availability_replica_states states
join sys.dm_hadr_name_id_map map
on states.group_id = map.ag_id
WHERE states.is_local = 1
and map.ag_name =’IT’

DataQualityServices setup with AlwaysON

Setting up Data Quality Service with Always on is well described in this document:

I used that as guide when I installed DQS at a customer site today.

The procedure was very straight forward and I had no problems at all.

1. First I added DQS feature from the SQL 2012 installation media. I did not have to reboot or restart SQL server so that was easy.

2. I applied the SP1 and CU3 patches that where previously applied for the rest of the SQL server.

3. I repeated this on all the nodes where the AvailabilityGroup was setup to run.

4. After that I had to run the DQSInstaller.exe file in the …\MSSQL11.MSSQLSERVER\MSSQL\Binn folder on each node. This requires me to enter a Database Master Key, you must use the same key on all nodes and be careful not to forget it. It is need when a SQL upgrade takes place.

The DQSInstaller then created three databases.

5. Then I added a Windows account on the first node and gave it permissions to the DQS_MAIN database according to the document.

6. Then I deleted the new databases from the two other nodes in the cluster.

7. Now I could add the three databases as availability databases in the AG. And they where copied out to all nodes.

8. I then copied the Windows account to all three nodes. Keep in mind tha AlwaysOn doesn’t replicate any logins. I use a sp called dbs_CopyLogins, it can downloaded here:, is very useful since it copies the password correctly..

9. Then I installed the DQS client from the SQL distribution and patches it. When I started the Client I Could just enter the AG listener name and could successfully login.

Staffan Olofsson
MCSE: Data Platform

MasterDataServices setup in a AlwaysOn environment

I was told to install MDS (Master Data Services) in AlwaysOn environment with HA (high availability) and DR(disaster recovery) protection for a customer. This sounded like an easy task with some withe papers and guidelines available on the net.

I started on with this one from Microsoft which looked very promising:

But I soon realized this was maybe not the best solution since it required me to install IIS on each node in the AlwaysON cluster and it also required that all IIS installations where kept in sync.

The AlwaysOn environment at this customer site is setup with 5 nodes, three on primary site and two on DR site. It includes 3 Availability Groups which can failover between at least 3 of the different hosts. It uses NonShared Storage.

Installing IIS and all MDS services in this environment didn’t sound very appealing so I decided to use a different approach.

I created a new virtual server in Vmware and added it to the SRM (SiteRecoveryManager) setup, which means that it is replicated to the DR site and can be started up there in case of DR situation. On the primary site HA is accomplished through multiple hosts in the Vmware cluster.

After Installing MDS on the new virtual server, I run the Master Data Services Configuration Manager and pointed to the AlwaysOn AG listener that I had previously created on the new AG for the MDS database.

I now have HA and DR solution for MDS without having to mess up the SQL servers with IIS and MDS services, that feels like a much cleaner solution to me.

Staffan Olofsson
MCSE: Data Platform