Techieshelp.com

Exchange DAG Node Failure – Force Switchover With Queues

I had this issue with a client last week. The system was Exchange 2010 with a two-node Database Availability Group (DAG). One of the Exchange nodes had gone offline, and because the failure was catastrophic it was not coming back. I expected the second node to have taken over, but it had not. The mailbox database was down, and when I checked the replication status of the database copy on the second node, the copy queue length was 9223372036854775766.
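For reference, the replication status can be checked from the Exchange Management Shell along these lines. This is only a minimal sketch: database1 and dagnode2 are the example names used in the commands in this post, so substitute your own.

```powershell
# Show the state of the database copy on the surviving node.
# database1 and dagnode2 are this post's example names, not fixed values.
Get-MailboxDatabaseCopyStatus -Identity database1\dagnode2 |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
```

The CopyQueueLength value in that output is where the huge number showed up.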

Because of this, when I tried to force a failover I was greeted with the following error.

An Active Manager operation failed. Error The database action failed. Error: An error occurred while trying to validate the specified database copy for possible activation. Error: Database copy ‘Database1’ on server ‘dagnode2.domain.com’ has a copy queue length of 9223372036854725486 logs, which is too high to enable automatic recovery. You can use the Move-ActiveMailboxDatabase cmdlet with the -SkipLagChecks and -MountDialOverride parameters to move the database with loss. If the database isn’t mounted after successfully running Move-ActiveMailboxDatabase, use the Mount-Database cmdlet to mount the database.

I was pretty confident that no mail would be lost, as all my clients are in cached mode, so upon reconnecting through the CAS server they would sync their mail back up to the second mailbox server.

Upon running the command mentioned in the error, I was again greeted with red errors, this time stating that it could not start the Microsoft Exchange Search Indexing service on the failed node. That's because the node does not exist anymore, great.

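To get around this we need to add a few extra flags to the command from the error message: -SkipHealthChecks, -SkipActiveCopyChecks and -SkipClientExperienceChecks. Between them they tell Exchange to bypass the health checks on the passive copy, the check of the current active copy, and the search catalog (content index) check, none of which can pass while the other node is dead.

With this in mind we run the following command to force our node back into life even though the mailbox database is not fully synced.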

Move-ActiveMailboxDatabase database1 -ActivateOnServer dagnode2 -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT

Once this has run, your database will mount and clients will be able to connect. As mentioned, this works well for situations where you have a two-node DAG with one node down and the copy queue length does not allow automatic failover.
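To confirm the move actually worked, something like the following can be run afterwards. It is a sketch that assumes the same example database name (database1); the Mount-Database step is only needed if the database did not come up on its own, as the original error message suggests.

```powershell
# Check whether the database is mounted and on which server.
# The -Status switch is needed for the mount information to be populated.
Get-MailboxDatabase -Identity database1 -Status |
    Format-List Name, Mounted, MountedOnServer

# Only if Mounted still shows False after the move: mount it manually.
Mount-Database -Identity database1
```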

Remove a Copy Of DAG Mailbox Database

Once you are up and running again you will need to tidy up the failed DAG node. Start by removing the mailbox database copy from the failed server with this command.

Remove-MailboxDatabaseCopy -Identity database1\dagnode1 -Confirm:$False
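A quick way to confirm the copy is gone is to list the copies that remain for the database. Again, this is a sketch using the example name database1; only the copy on the surviving node should be listed.

```powershell
# List every remaining copy of the database and its status.
# The copy on the failed server (dagnode1 in this post) should no longer appear.
Get-MailboxDatabaseCopyStatus -Identity database1 |
    Format-List Name, Status
```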

Remove a Node From DAG

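We should also remove the node from the DAG completely with the following command. The -ConfigurationOnly switch is needed because the failed server is offline and cannot be contacted, so only the DAG configuration in Active Directory is updated.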

Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer DAGNODE1 -ConfigurationOnly
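Afterwards it is worth checking what the DAG and the underlying cluster think their membership is. The sketch below assumes the DAG is named DAG as in the command above, and that the surviving node runs Windows Server 2008 R2 so the FailoverClusters PowerShell module is available. That was not stated for the system in this post, so treat the cluster part as an assumption. Because -ConfigurationOnly does not touch the cluster itself, the dead node may still be listed there and need evicting by hand.

```powershell
# Servers still recorded as DAG members; DAGNODE1 should be gone.
Get-DatabaseAvailabilityGroup -Identity DAG | Format-List Name, Servers

# Assumption: Windows Server 2008 R2 with the FailoverClusters module.
Import-Module FailoverClusters

# Nodes the Windows cluster still knows about, with their state.
Get-ClusterNode

# If DAGNODE1 is still listed (and down), it can be evicted manually.
# Left commented out on purpose; check the Get-ClusterNode output first.
# Remove-ClusterNode -Name DAGNODE1 -Force
```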

This little lot took me a few hours to get fixed so hopefully this will save someone out there a lot of time.