Sometimes you spend several days (or nights) trying to solve a problem, only to find out your problem is caused by some inexplicable design decision you couldn’t possibly anticipate. In this case, I was investigating errors we’ve head on peak hours from our SQL Server database server, where we got “Timeout expired” errors on commit or rollback of database transactions, for the last two months.
Now, our database server probably doesn’t have enough capacity to deal with its load on peak hours, but still, we hadn’t had any errrors like this before, even at the busiest of times. Completion of transactions in SQL Server (and especially commits) have always been very fast on SQL Server, in my experience over the last 20 years.
When searching the internet, it turns out quite a few people have this same problem, but nobody knows exactly why. In the end, there’s exactly one place to find the surprising answer, provided by Matt Neerincx of the SQL Server team in a forum post:
The value used in the .NET SQLClient class for the client-side transaction timeout is the value set for the connection timeout of the database connection.
In his post, Matt explains that in this case they had no other values available in the class interface, so they decided to re-use the existing value for the connection timeout. Now, for me these two values are completely unrelated. Sometimes you want to set the connection timeout pretty low, so that un unavailable server, or incorrectly provided server name, is detected without a very long wait. I had done just that, by reducing our default connection timeout from 30 to 20 seconds, never suspecting this would have these drastic side-effects on the completion of transactions!
Ah well, problem solved, on to the next: This morning we had one of the infamous “Deadlock detected” errors in our ASP.NET web application. That’s another can of wurms, trying to make sense of the myriad of threadpool-related settings in ASP.NET.