Showing posts with label AG.

SQL Server error 47106

Problem

I was trying to configure a read-scale Always On availability group on our test server, following the steps in https://www.mssqltips.com/sqlservertip/6905/sql-server-read-scale-always-on-availability-groups/, but joining the secondary replica failed with SQL Server error 47106.

Below is the error that I got:

TITLE: Microsoft SQL Server Management Studio
------------------------------
Failed to join the instance 'TESTSQL17A' to the availability group 'TESTSQL17RS'. (Microsoft.SqlServer.Management.HadrModel)
For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.47021.0&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476
------------------------------
ADDITIONAL INFORMATION:
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Cannot join availability group 'TESTSQL17RS'. Download configuration timeout. Please check primary configuration, network connectivity and firewall setup, then retry the operation.
Failed to join local availability replica to availability group 'TESTSQL17RS'.  The operation encountered SQL Server error 47106 and has been rolled back.  Check the SQL Server error log for more details.  When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 47106)
For help, click: https://docs.microsoft.com/sql/relational-databases/errors-events/mssqlserver-47106-database-engine-error
------------------------------
BUTTONS:
OK
------------------------------

Solution

I searched the internet for related issues and found this post: https://www.voltol.com/2020/02/sql-error-47106-when-trying-to-add.html

That article pointed me in the right direction: the port used by the read-scale availability group was being blocked by a firewall.

Below is the link of required firewall rules to access SQL Server: 

During the course of the investigation, I performed the following tasks:

1. Checked the port being used by the read-scale availability group. Availability groups communicate over the database mirroring endpoint, which typically listens on port 5022 (or 7022).

SQL

SELECT name, protocol_desc, port, state_desc
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING'

2. Using the PowerShell cmdlet Test-NetConnection, I tested the database mirroring port on both servers. I also included port 1433, since it is the default TCP port of a SQL Server default instance.

PowerShell

clear
Test-NetConnection -ComputerName TESTSQL17A -Port 1433
Test-NetConnection -ComputerName TESTSQL17A -Port 5022

Test-NetConnection -ComputerName TESTSQL17B -Port 1433
Test-NetConnection -ComputerName TESTSQL17B -Port 5022

The test from TESTSQL17B to TESTSQL17A failed on port 5022, but the connection on port 1433 succeeded.
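
The same check can be scripted so that each server and port pair gives a simple true/false answer. The following is a minimal sketch, not part of the original troubleshooting: Test-TcpPort is a hypothetical helper name, and it uses the .NET TcpClient class instead of Test-NetConnection so that a short timeout can be set. It only confirms that a TCP connection can be opened; it does not validate the mirroring endpoint itself.

```powershell
# Hypothetical helper (not from the original post): returns $true when a TCP
# connection to the given host and port can be opened within the timeout.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)] [string] $ComputerName,
        [Parameter(Mandatory)] [int]    $Port,
        [int] $TimeoutMs = 2000
    )

    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        $task = $client.ConnectAsync($ComputerName, $Port)
        # Wait() returns $false on timeout; a refused connection throws instead.
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    }
    catch {
        # Refused, unreachable, or unresolvable host: treat as closed.
        return $false
    }
    finally {
        $client.Dispose()
    }
}

# Example usage with the server names from this post (run it from each replica):
# foreach ($server in 'TESTSQL17A', 'TESTSQL17B') {
#     foreach ($port in 1433, 5022) {
#         [pscustomobject]@{
#             Server = $server
#             Port   = $port
#             Open   = Test-TcpPort -ComputerName $server -Port $port
#         }
#     }
# }
```

Because a blocked port is reported as $false whether the block comes from Windows Firewall or from another network control, a failure here says only that something in the path is stopping the connection.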








3. To ensure that the ports were open, I manually added firewall rules for ports 1433 and 5022 on both servers.

PowerShell

clear
New-NetFirewallRule -DisplayName "SQLServer default instance" -Direction Inbound -LocalPort 1433 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "SQLServer Database Mirroring" -Direction Inbound -LocalPort 5022 -Protocol TCP -Action Allow

# Outbound rules match on the remote port, i.e. the port on the other server that this server connects to.
New-NetFirewallRule -DisplayName "SQLServer default instance" -Direction Outbound -RemotePort 1433 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "SQLServer Database Mirroring" -Direction Outbound -RemotePort 5022 -Protocol TCP -Action Allow

The connection test on port 5022 still failed, even though both inbound and outbound firewall rules had been configured.

4. Contacted the systems administrators to check the ports and firewall rules. The server admin confirmed that the port and firewall rule settings were correct.

I then recalled getting an email about "Zero Trust Network Access" software being tested by our security team.

5. Contacted our security team and asked someone to check the servers that I was using for my testing.

The security admin confirmed that one of the servers was included in their "Zero Trust Network Access" testing. They opened port 5022 and verified that it was open on both servers.

The Test-NetConnection test on port 5022 succeeded after the change was implemented.

6. Read-scale Always On configuration was successful after the changes were made.

Some solutions might require working with other teams in your company. Awareness of events happening around you is also important.


Format-Volume Bug or Feature?

Problem

We recently added a node to our existing Availability Group cluster.
Our team uses a PowerShell script to format multiple drives with a specific allocation unit size.

The issue happened when we formatted the drive on the new node. The script was supposed to format only the drive on the server where it was run, but it also formatted the drive on the existing nodes.

This wiped out the drives on the existing nodes. The existing system went down, and we had to restore everything from backup.

We had to do an RCA (Root Cause Analysis) because the incident affected our production system.

Root Cause Analysis

We had to build a similar environment to reproduce the issue.

To cut the story short, we were able to pinpoint the issue to the Format-Volume command in PowerShell.

This particular issue only affects servers that are part of a Windows cluster and use the same drive letters on each node.

For SQL Server Always On:
  • This does NOT affect a Failover Cluster Instance (FCI), since it uses shared storage.
  • This only affects the Availability Group configuration, because each node has its own storage and the nodes can use the same drive letters.

As a test, we ran Format-Volume to format the Z drive on node1 (of a 3-node cluster).

Format-Volume

It formatted the Z drive on all of the nodes instead of formatting only the drive on node1.

Solution

We opened a case with Microsoft to find out whether this is a bug, or whether there is a different way to run Format-Volume so that it only affects the node where it is executed. (At the time of this writing, we have not heard back from Microsoft.)

As a precautionary measure, our team now uses Format-Volume only in the following situations:
  • Standalone servers
  • Always On Failover Cluster Instance servers
  • New server builds of Always On Availability Group

Get SQL Server Windows Cluster Information

Problem

I have already identified that SQL Server is installed on a server. Using PowerShell, I wanted to find out whether the server uses Windows clustering WITHOUT connecting to the database instance.

This is an option for system administrators, since they have admin rights on the operating system but may not have access to the SQL Server instance.

I need to know the following:
1. Whether the server is clustered or not.
2. If the server is clustered, what type of clustering is configured on the server.
    SQL Server has 2 types of Windows clustering:
  • Failover Cluster Instance
  • Availability Group
    You can also combine Failover Cluster Instance with Availability Group.
    I also wanted to know whether this combined configuration is in place on the server.
3. If the server is clustered, whether the server is the active or passive node.

Solution

To retrieve Windows clustering information, the FailoverClusters PowerShell module must be installed on the machine where you run the PowerShell commands below.

Let us step through the requirements:
1. How to check if the server is clustered or not?
    In HKEY_LOCAL_MACHINE, check whether there is a "Cluster" key in the registry.
PowerShell

Get-ItemProperty -Path Registry::HKEY_LOCAL_MACHINE\Cluster
    To check a server remotely, use Invoke-Command.
PowerShell

Invoke-Command -ComputerName $servername -ScriptBlock {Get-ItemProperty -Path Registry::HKEY_LOCAL_MACHINE\Cluster}
    If the key does not exist (Get-ItemProperty returns nothing and reports that the path cannot be found), SQL Server is standalone.
    If the command returns a value, SQL Server is clustered using Windows clustering.

2. How to check what type of clustering is configured on the server?
    This is where the FailoverClusters module comes in handy.
    We need to get/check the following:
    a. Check the WMI class MSCluster_ResourceGroup in the root\mscluster namespace
    b. Get-Cluster information of the server
    c. Get-ClusterResource information of the server

    Check the root\mscluster namespace
PowerShell

Get-WmiObject -Class MSCluster_ResourceGroup -Namespace root\mscluster
    To check a server remotely, use Invoke-Command.
PowerShell

Invoke-Command -ComputerName $servername -ScriptBlock {Get-WmiObject -Class MSCluster_ResourceGroup -Namespace root\mscluster}
    Get the cluster and cluster resource information of the server
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource
    If neither the root\mscluster namespace query nor the cluster resource query returns a value, the server is configured as a "single node" cluster.

    If both the root\mscluster namespace query and the cluster resource query return a value, we need to figure out what type of SQL Server Windows clustering is configured.

3. How to check if "Availability Group" is configured?
    Using the cluster resource information, check if there is a resource type called "SQL Server Availability Group".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object ResourceType -eq 'SQL Server Availability Group'
    If no results are returned, the server is configured with "Failover Cluster Instance".
    If one result is returned, the server is configured with "Availability Group/Listener".
    If more than one result is returned, the server is configured with multiple "Availability Group/Listener".

4. How to check if "FCI with Availability Group" is configured?
    First, make sure that the server is configured with "Availability Group".
    Using the cluster resource information, check for resources of type "SQL Server Availability Group", and for resources of type "Network Name" whose name starts with "SQL Network Name".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object { (($_.ResourceType -eq 'Network Name') -and ($_.Name -Like 'SQL Network Name *')) -or ($_.ResourceType -eq 'SQL Server Availability Group')  }
    If 2 different resource types are returned, the server is configured with "FCI with Availability Group".
    NOTE: Compare the distinct resource type values only, since several resources can share the same type.

5. How to check if the server is the active or passive node?
    Using the cluster resource information, get the OwnerNode of the resource whose type is "SQL Server".
PowerShell

Get-Cluster -Name $servername | Get-ClusterResource | Where-Object ResourceType -eq 'SQL Server' | Select-Object OwnerNode
    If the returned OwnerNode value is the same as $servername, the server is the "active" node; otherwise it is the "passive" node.

Now you can combine all of the commands above to create a function that will return the SQL Server Windows Cluster Information.
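
As a rough illustration of how the checks from steps 2 to 5 might be combined, here is a minimal sketch. It is not the Get-SQLServer-WindowsClusterInfo function mentioned in this post: the name Get-SqlClusterSummary, its parameters, and the output property names are all hypothetical. It takes the cluster resources as input so the decision logic can be tried without a cluster, and it assumes step 1 has already shown that the server is clustered.

```powershell
# Hypothetical sketch (not the author's function): classify a clustered SQL Server
# from objects that expose ResourceType, Name and OwnerNode properties,
# such as the output of Get-ClusterResource.
function Get-SqlClusterSummary {
    param(
        [Parameter(Mandatory)] [string] $ServerName,
        [object[]] $Resource = @()
    )

    # Cast to string so cluster objects and plain test objects compare the same way
    # (assumes the cluster types render as their names when converted to string).
    $agResources  = @($Resource | Where-Object { [string]$_.ResourceType -eq 'SQL Server Availability Group' })
    $fciNames     = @($Resource | Where-Object { [string]$_.ResourceType -eq 'Network Name' -and $_.Name -like 'SQL Network Name *' })
    $sqlResources = @($Resource | Where-Object { [string]$_.ResourceType -eq 'SQL Server' })

    if ($Resource.Count -eq 0) {
        $clusterType = 'Single node cluster'              # step 2: nothing returned
    }
    elseif ($agResources.Count -gt 0 -and $fciNames.Count -gt 0) {
        $clusterType = 'FCI with Availability Group'      # step 4: both resource types present
    }
    elseif ($agResources.Count -gt 0) {
        $clusterType = 'Availability Group'               # step 3: AG resources only
    }
    else {
        $clusterType = 'Failover Cluster Instance'        # step 3: no AG resources
    }

    # Step 5: active if this server owns a "SQL Server" resource.
    # Role stays $null when there is no such resource to compare against.
    $role = $null
    if ($sqlResources.Count -gt 0) {
        $owners = @($sqlResources | ForEach-Object { [string]$_.OwnerNode })
        if ($owners -contains $ServerName) { $role = 'Active' } else { $role = 'Passive' }
    }

    [pscustomobject]@{
        ServerName             = $ServerName
        ClusterType            = $clusterType
        AvailabilityGroupCount = $agResources.Count
        Role                   = $role
    }
}

# Example usage against a real cluster (requires the FailoverClusters module):
# $resources = Get-Cluster -Name $servername | Get-ClusterResource
# Get-SqlClusterSummary -ServerName $servername -Resource $resources
```

The owner comparison is a plain, case-insensitive string match, so $servername needs to be in the same form (short name versus fully qualified name) as the OwnerNode values the cluster reports.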

You can download the Get-SQLServer-WindowsClusterInfo function from GitHub.