Best Practice #1: If you are deploying jobs against databases in AlwaysOn Availability Groups, make sure both jobs exist on all replicas. So, in the event of a failover, the jobs can continue to work.
Best Practice #2: Also, as per best practice, if you are setting up this job, the first step in the job should be to run fn_hadr_is_primary_replica check.
Straight forward enough, I think.
Reviewing an AlwaysOn Availability Group configuration, my client asked me to explain the excessive number of logins failed messages on the primary and secondary. I suspected misconfigured jobs, and I was correct. Almost …
The best practice #2 was not followed, so it caused some of the failed login messages. But what was strange was the same job was also redirecting its calls from the secondary SQL Server Agent to the Primary Replica!
Here comes the banging my head against AlwaysOn Availability comes in, I checked everything I can think off:
- Read-only routing
- Linked servers
- Aliases
- Host file
- Open query
- Multi-server job
Nothing! I am like OMG, that is one messed up b**.
I was confounded; what made it worse, my client said this happened after they patched to SQL Server 2016 SP2 CU11. So, I went down a wrong rabbit hole for a while, but I was able to keep my Microsoft hat on and talking to colleagues I found what is happening behind the scenes. This is by design, if client connections fail on the secondary replica, they get redirected to the primary replica.
The question to ask, if SQL Server Agent a client connection? I know the rhetorical question. SQL Server Agent is a core SQL application, but it is an external application. Therefore, it is considered a client connection, and it makes calls using SQL Server native client. Which has built-in logic for backward compatibility for Database Mirroring to redirect connections!
I learned something new (rather old)!
Read up more details here.