Troubleshooting Methodology

I was recently working on a project for a client that involved hundreds of accounts. I would test my process on a couple of accounts, make sure that it worked, and then run the script against the entire list. However, before I moved the lot to the new location, my customer wanted his own account to be the trial balloon… he wanted to see that my process worked before okaying the move.

He is quite technical, so when little things go wrong, he does not usually call me. I truly appreciate that. It can also come back to bite both of us.

I sent him a message to let him know that his account was ready for testing. He replied a few minutes later that the data was intact, but there was an issue. I told him that I was going to make a couple of adjustments, and asked that he try after each one. When I told him I was ready, he told me that he had also tweaked something… and that the issue was now resolved for his account, and that I could go ahead with the bulk operation.

There have been very few times in my career when I received the message “Issue Resolved” from a customer and was not pleased with it. I know, there are less scrupulous consultants who are always happy to pad their bills, and for them a resolved issue means they have to turn off the timer. I am not like that, and this had nothing to do with that. I asked him for a phone call, and he was happy to call. I wanted to speak with him by phone because it is always delicate when you have to rap a client’s knuckles.

There are times when the only thing that matters is solving the problem. When your production servers go down and you are losing money for every minute of downtime, your first goal is to get the servers back up. Where things are less critical, it is often more important to know what worked, so you can properly adjust your methodology. That is why, when troubleshooting a problem, you make one change, and if it does not work, you undo the change before trying the next. It is the only way to be sure that you not only solve the problem, but can also document the solution, so that when it happens again you know how to resolve it. It is the scientific method, and it is the only way to troubleshoot that will reliably let you resolve an issue permanently, rather than fixing a problem once.
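For the script-minded, here is a minimal sketch of that loop in Python. It is an illustration, not the script I ran for this client: the `Change` class, the `find_fix` function, and the `settings` dictionary are all hypothetical names, and the "problem" is a stand-in check that assumes the issue goes away once the cache is disabled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Change:
    """One candidate fix, paired with the step that undoes it (hypothetical)."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def find_fix(changes: List[Change],
             is_fixed: Callable[[], bool],
             log: List[str]) -> Optional[str]:
    """Try each change on its own; undo it before trying the next one."""
    for change in changes:
        change.apply()
        if is_fixed():
            log.append(f"{change.name}: resolved the issue")
            return change.name
        # The change did not help, so put things back before moving on.
        change.revert()
        log.append(f"{change.name}: no effect, reverted")
    return None


# Stand-in environment: a dictionary of made-up settings.
settings = {"timeout": 5, "cache": True}


def set_value(key, value):
    """Return a function that sets settings[key] to value when called."""
    def _set():
        settings[key] = value
    return _set


changes = [
    Change("raise timeout", set_value("timeout", 30), set_value("timeout", 5)),
    Change("disable cache", set_value("cache", False), set_value("cache", True)),
]

log: List[str] = []
# Assumption for this example: the issue disappears when the cache is off.
fix = find_fix(changes, lambda: settings["cache"] is False, log)
```

Because the first change is reverted before the second is tried, the environment ends up with exactly one difference from where it started, and the log records which change that was. That log is the documentation.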

My client understood that, in his eagerness to move forward, his tweaks had interfered with my method, and he agreed to run tests against another account. We did; my solution worked, and we were able to run the operation for several hundred users, each of whom was likely oblivious that we were moving their accounts about.


There is a difference between fixing a problem and resolving one. Knowing what you did to fix an issue is important, so that the time you spent trying things does not need to be spent again down the road. Being able to document ‘The problem was resolved, and this is what fixed it’ is not only more complete than ‘It is fixed,’ it is also what your customer is paying you for. It is the difference between someone who prints business cards and says they fix computers, and a technician, administrator, or systems engineer. It shows that you are dedicated to solving your customer’s problems and earning their trust, rather than building a recurring source of revenue for yourself based on a problem that will almost certainly happen again.


2 responses to “Troubleshooting Methodology”

  1. This is the exact idea that MS support used to pride themselves on: having the answer to those weird, quirky problems that no one else ran into. The whole internal KB was based on that. If you see these weird behaviours on this OS and with this software, then this is the solution. X + Y = Z

    It speaks volumes that you, as a person, have the integrity to go the extra mile to document it because for a lot of people, documentation really is an afterthought.

    There are some companies that do a serious postmortem and deep dive into the why, and document the hell out of it so that if it happens again, those millions of dollars lost are minimized as much as possible.

    Then you have those other companies who don’t. They end up with no system state backup of their on-prem AD environment, can no longer promote any DC into the domain, and are truly screwed. All the troubleshooting they did 18 months ago isn’t documented anywhere; now they are in the middle of a Premier case with Microsoft, trying to remember everything that they already tried and wasting not only Premier hours but their own bandwidth as well… (True story of a customer environment. I was not doing the troubleshooting, just observing the mayhem.)

    It pays not only to take the time to troubleshoot correctly (instead of brute forcing a solution); it’s also 100% worth taking the time to document the solution. Even if it’s not for your customer, future you will appreciate it.
