Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Format-Volume Bug or Feature?

Format-Volume Bug or Feature?

Problem

We just recently added an additional node to our existing Availability Group cluster. 
Our team uses a PowerShell script to easily format multiple drives in a specific allocation unit size.

The issue happened when we formatted the drive on the new node. It is supposed to only format the drive on the server where it was ran. But for some reason, it also formatted the drive on the existing nodes.

This wiped out the drives on the existing nodes. The existing system went down and we have to restore everything from our backup.

We have to do a RCA (Root Cause Analysis) because it affected our production system.

Root Cause Analysis

We have to build a similar environment to simulate the issue.

To cut the story short, we were able to pin point the issue on Format-Volume command in PowerShell.

This particular issue only affects servers that are part of Windows cluster and have same drive letters on each node.

For SQL Server Always On:
  • This does NOT affect Failover Cluster Instance (FCI) since it is using shared storage.
  • This only affects the Availability Group configuration because each node have it's own storage and they can have same drive letters.
As a test we ran the Format-Volume to format the Z drive on node1 (of a 3 node cluster).

Format-Volume









It formatted the Z drive on of all the nodes instead of just formatting the drive on node1.

Solution

Open a case with Microsoft to check if this is a bug or if there is a different method that we can do to run the Format-Volume only on the node where it is executed. (At the time of this writing, we have not heard back from Microsoft.)

As a precautionary measure for our team, we only use this feature on the following situation:
  • Standalone servers
  • Always On Failover Cluster Instance servers
  • New server builds of Always On Availability Group

Get SQL Server Windows Cluster Information

Get SQL Server Windows Cluster Information

Problem

I have already identified that there is a SQL Server installed on a server. Using PowerShell, I wanted to know if the server is using windows clustered WITHOUT connecting the database instance. 

This is an option that system administrators since they have admin rights on the Operating System but don't have access to the SQL Server instance.

I need to know the following:
1. I need to know if the server is clustered or not.
2. If the server is clustered, I need to know why type of clustering is configured on the server. 
    SQL Server have 2 types of Windows Clustering:
  • Failover Cluster Instance
  • Availability Group
    You can also combine Failover Cluster Instance with Availability Group.
    I also wanted to know if this type of configuration is configured on the server.
3. If the server is clustered, I also need to know if the server is active or passive node

Solution

One of the requirements to be able to get windows clustering information is to ensure that you have FailoverClusters PowerShell module installed on the machine that you are using to run the PowerShell commands that are specified below.

Let us step through the requirements:
1. How to check if the server is clustered or not?
    Using HKEY_LOCAL_MACHINE check if there is a "Cluster" entry in the registry
PowerShell

Get-ItemProperty -Path Registry::HKEY_LOCAL_MACHINE\Cluster}
    To check a server remotely you need to use Invoke-Command
PowerShell

Invoke-Command -Computer $servername -ScriptBlock {Get-ItemProperty -Path Registry::HKEY_LOCAL_MACHINE\Cluster}
    If no value is returned, the SQL Server is standalone.
    If the command returns a value, then SQL Server is clustered using windows clustering.

2. How to check what type of clustering is configured on the server?
    This is where the FailoverClusters module will come-in handy.
    We need to get/check the following:
    a. Check WMIObject MSCluster_ResourceGroup for root\mscluster namespace
    b. Get-Cluster information of the server
    c. Get-ClusterResource information of the server

    Check root\mscluster namespace
PowerShell

Get-WMIObject -Class MSCluster_ResourceGroup -Namespace root\mscluster}
    To check a server remotely you need to use Invoke-Command
PowerShell

Invoke-Command -Computer $servername -ScriptBlock {Get-WMIObject -Class MSCluster_ResourceGroup -Namespace root\mscluster}
    Get the cluster and cluster resource information of the server
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource
    If both root\mscluster namespace and cluster resource does not return a value then the server is configured as a "single node" cluster.

    If both root\mscluster namespace and cluster resource returns a value, we need to figure out what type of SQL Server windows clustering is configured.

3. How to check if "Availability Group" is configured?
    Using the cluster resource information, check if there is a resource type called "SQL Server Availability Group".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object ResourceType -eq 'SQL Server Availability Group'
    If no results were returned then, the server is configured with "Failover Cluster Instance".
    If one result is returned, the server is configured with "Availability Group/Listener".
    If more than one result is returned, the server is configured with multiple "Availability Group/Listener".

4. How to check if "FCI with Availability Group" is configured?
    First you need to make sure that the server is configured with "Availability Group". 
    Using the cluster resource information, check if there is a resource type called "SQL Server Availability Group" or resource type called "Network Name" and resource name that starts with "SQL Network Name".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object { (($_.ResourceType -eq 'Network Name') -and ($_.Name -Like 'SQL Network Name *')) -or ($_.ResourceType -eq 'SQL Server Availability Group')  }
    If 2 different resource types are returned, the server is configured with "FCI with Availability Group".
    NOTE: Ensure that you filter the uniqueness of the returned resource type value.

5. How to check if the server is the active or passive node?
    Using the cluster resource information, get the OwnerNode for resource type called "SQL Server".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object ResourceType -eq 'SQL Server' | Select-Object OwnerNode
    If the returned OwnerNode value is the same as the $servername, then it is the "active" node, otherwise it is the passive node.

Now you can combine all of the commands above to create a function that will return the SQL Server Windows Cluster Information.

You can now download the Get-SQLServer-WindowsClusterInfo function from github.

Create External Data Source to Reference a Readable Secondary Replica of Always On Availability Group

Create External Data Source to Reference a Readable Secondary Replica of Always On Availability Group

Problem

We need to create an external table that uses and external data source. The external data source needs to query the (read_only) readable secondary node of an availability group.

Pretty simple, just follow the Microsoft document:

There are several requirements:
  1. The external data source needs to be configured as AlwayOn Availability Group with Read Only Routing List configured.  Configure read-only routing for an availability group - SQL Server Always On | Microsoft Docs
  2. CONNECTION_OPTIONS argument for External Data Source is only available on SQL Server 2019.
  3. Declare the Application_Intent=ReadOnly on CONNECTION_OPTIONS
  4. Run a query on external table and execute it multiple times. This will give you time to check where the session is being executed. 
  5. Verify the connection on both primary and secondary server. The session should be running in the secondary server.

Solution

First Try:

1. Configured/created the external data source
CREATE EXTERNAL DATA SOURCE [XDSName]
WITH (LOCATION=n'sqlserver://AGListenerName',
CONNECTION_OPTIONS='ApplicationIntent=ReadOnly',
CREDENTIAL=[dbuser])
GO
2. Configured/created external data table
CREATE EXTERNAL TABLE [XTable] ([column1] INT not null...)
WITH (LOCATION=N'[dbname].[schemaname].[tablename]',
DATA_SOURCE=[XDSName])
GO
3. Run a query against the external data table and repeated it 1000 times
SELECT * FROM [XTable]
GO 1000
4. Verify both the primary and secondary server. The query should run in secondary server.
 
With my first test, the query ran in the primary server. 

After doing some investigation, I found out that the "default database" of the credential that I used is pointed to the "master" database.

I modified "default database" of the credential I used to point to database name that is being replicated by the Availability Group. I tested again by running a query on the external data table multiple time, this time the session is running on the secondary node.

But we are not allowed to change the "default database" of the user.
The "default database" was changed back to "master"

Second Try:

1. Configured/created the external data source, but this time the database needs to be declared.
CREATE EXTERNAL DATA SOURCE [XDSName]
WITH (LOCATION=n'sqlserver://AGListenerName',
CONNECTION_OPTIONS='ApplicationIntent=ReadOnly; Database=dbname',
CREDENTIAL=[dbuser])
GO
2. Configured/created external data table
3. Run a query against the external data table and repeated it 1000 times
4. Verify both the primary and secondary server. The query should run in secondary server.

The initial test failed because I used a wrong keyword "DatabaseName". 

A co-worker of mine gave me a sample of his external data source that he has configured and already working against a secondary server.

After comparing the 2 definition, I found out that the keyword should be "Database" and not "DatabaseName"

I tested again and it worked because the "database" keyword is defined in the CONNECTION_OPTIONS which override the "default database" of the credential.

Allowed connection string keywords that can be used on CONNECTION_OPTIONS

Since there is no example on Microsoft's document, I submitted a pull request and added an example G. Create external data source to reference a readable secondary replica of Always On Availability Group