Debugging Service Bus Issues in Azure / by James Heywood

So I had an interesting morning investigating an issue with the project I am working on at the moment.

Background

Briefly, we have built an identity solution integrating several client websites. The client uses Gigya for social sign-on, and we have put in an API and a service bus, fronted by a custom-built JavaScript SDK (written by yours truly in Backbone). The SDK manages the login and registration of users with Gigya and then hits endpoints in our API, which raise messages onto the service bus.

These messages are then picked up by a variety of worker roles which synchronise user data between several systems. These include the client's email campaign tool and, relevant to this post and the issue I was having, a Solr search index containing segmented user data which the client wishes to use for advertising and marketing purposes.

So we have a cloud-based solution which is robust, scales well, is easy to manage and is also open to extension (by creating further service bus topics or queues and the worker roles to handle and process these messages), nice!

FYI we are using Microsoft Azure as our platform, so this post is skewed towards it. However, the same concepts apply regardless of your platform, and all cloud development presents the same challenges when trying to debug issues and identify the cause of unexpected behaviour.

The Issue

My problem came about when trying to figure out why a particular message was not resulting in the relevant data appearing in our search index. Where was the problem? With the API, the bus, the topic, the message handler, or the search index itself? With so many parts it's a challenge just finding the right place to start, and it took a couple of hours to figure this out.

As it happens the problem turned out to be something trivial and once I understood this the fix was also trivial. As with most things it was the investigation that was costly, so just in case you are having an interesting time trying to figure out an issue with Azure development here is how I found the problem.

Resources

Firstly I needed a few things setting up and/or installing:

Windows Azure Management Portal

Obviously you are going to need access to the cloud resources in question; a quick request to the administrator sorted that out for me. When first using the Azure portal it can take a while to familiarise yourself with the iconography. The elephant for HDInsight is my personal favourite!

Once you play around with it for a while though it starts to make a bit more sense, although it is very information dense so take your time and digest it at your leisure, assuming you have the time. If you're trying to debug an issue you probably don't, in which case good luck!

Service Bus Explorer

As we are using a bus, and the issue manifested itself as the lack of an expected outcome from a message posted on the bus, I thought it prudent to take a peek at the messages on the bus. Just call me Columbo!

The reason for this was to check whether I had any messages in dead letter queues, and indeed whether the bus was set up correctly. This handy tool lets you peek at messages on the bus, including those that have failed to be handled and ended up in the dead letter queue. Just download the file, unzip and run the .exe in the bin folder. You'll need to add your connection settings to start looking around, which you can find in your worker role configuration in the Azure Portal.
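
If your worker role configuration stores the bus settings as a single connection string, a few lines of Python will pull out the pieces you need to type into the tool. This is only a sketch: it assumes the common SAS-style format (Endpoint, SharedAccessKeyName, SharedAccessKey), and the function names and the example namespace below are made up for illustration.

```python
from urllib.parse import urlparse


def parse_service_bus_connection_string(conn_str):
    """Split a SAS-style Service Bus connection string into its named parts."""
    parts = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        # Split on the first '=' only: base64 keys often end in '=' padding.
        name, _, value = segment.partition("=")
        parts[name] = value
    required = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey")
    missing = [name for name in required if name not in parts]
    if missing:
        raise ValueError("connection string is missing: " + ", ".join(missing))
    return parts


def namespace_host(parts):
    """Return the hostname portion of the sb:// endpoint."""
    return urlparse(parts["Endpoint"]).hostname


if __name__ == "__main__":
    # Hypothetical values; substitute the string from your own configuration.
    example = (
        "Endpoint=sb://example-ns.servicebus.windows.net/;"
        "SharedAccessKeyName=RootManageSharedAccessKey;"
        "SharedAccessKey=abc123=="
    )
    settings = parse_service_bus_connection_string(example)
    print(namespace_host(settings), settings["SharedAccessKeyName"])
```

Raising an error on a missing part is deliberate: a half-pasted connection string is an easy mistake to make, and it is nicer to find out here than in a connection dialog.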

Azure Management Studio

This piece of software gives a reasonable interface to manage your Azure configuration. In particular it allows you to review Azure storage content, which is what I was concerned with. When you run the installer it prompts you to visit an endpoint in Azure to download your Azure publish settings. Follow the link, upload the downloaded file to Azure Management Studio, and it sets up all of your Azure subscription details for you, nice!

The Investigation

Now that we have some weapons in our arsenal we are ready to do battle with the pesky issue. Below are the steps I undertook during my investigation.

  • In the portal, go to the Service Bus section and select the correct namespace. From here use either queues or, in my case, topics to ensure that the relevant topic is set up as expected. All seems well here, let's move on.
  • Next go to the Cloud Services section and select the relevant service name. From here click on Configure and ensure that the configuration of your worker role(s) is as expected. In my case this allowed me to check that I had the correct endpoints set up for my search index, which ruled out an issue with the handler connecting to the search index. You can also check things like connections to table storage, if you have any. This would have given me a clue early on in my investigation had I thought about it, but alas I was too busy scratching my head wondering where to look for helpful information at the time!
  • From here I made a note of the service bus settings (hostname, key name, key value) and then proceeded to start up Service Bus Explorer.
  • In Service Bus Explorer I chose File > Connect, added the details of my service bus and clicked OK to connect to the bus in question.
  • When this had loaded I drilled down into the topic in question, found the relevant subscription and spotted that there was a message sat in its dead letter queue. On retrieving this message I could see that processing it had failed, which pointed in the direction of my message handler.
  • The next step was to open up Azure Management Studio, drill down to the relevant storage account and then filter and query the WADLogsTable for an entry matching the role and timestamp of the dead lettered message I had just found on the bus.
  • Once I found this I had an exception message and stack trace, so at that point it was a simple case of figuring out the error and making the fix.
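
The WADLogsTable step is the fiddly one, so here is a rough sketch of building the filter for it. It leans on two things I believe hold for the diagnostics table: the PartitionKey is a "0" followed by a .NET tick count, and there is a Role column. The helper names, the role name and the five minute window are my own assumptions, and whether your storage tool accepts the filter string exactly as written is something to check for yourself.

```python
from datetime import datetime, timedelta

# .NET ticks count 100-nanosecond intervals since 0001-01-01 00:00:00.
DOTNET_EPOCH = datetime(1, 1, 1)


def to_dotnet_ticks(dt):
    """Convert a naive UTC datetime to .NET ticks using integer arithmetic."""
    delta = dt - DOTNET_EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10**7 + delta.microseconds * 10


def wad_logs_filter(role, around, window=timedelta(minutes=5)):
    """Build an OData filter for log rows from one role near a timestamp."""
    start = "0" + str(to_dotnet_ticks(around - window))
    end = "0" + str(to_dotnet_ticks(around + window))
    safe_role = role.replace("'", "''")  # OData escapes a quote by doubling it
    return (
        "PartitionKey ge '{0}' and PartitionKey le '{1}' "
        "and Role eq '{2}'"
    ).format(start, end, safe_role)


if __name__ == "__main__":
    # Hypothetical role name and timestamp taken from a dead lettered message.
    print(wad_logs_filter("SearchWorker", datetime(2014, 3, 10, 9, 42)))
```

Filtering on PartitionKey rather than the Timestamp column keeps the query to a narrow slice of the table, which matters once the logs have been accumulating for a while.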

The Fix

As it turns out I had simply forgotten to add some data to our development environment's table storage. A query issued by the message handler relied on this data; because the data was missing, the handler was throwing an exception and the message was being dead lettered on the bus.
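
For anyone wondering how a missing row turns into a dead lettered message, here is a toy simulation in plain Python. It is not our handler and it does not talk to Azure; every name in it is hypothetical. It assumes the usual Service Bus behaviour where a failed message is redelivered until the maximum delivery count (10 by default) is exceeded and is then moved to the dead letter queue with the reason MaxDeliveryCountExceeded. A handler could equally dead letter the message explicitly, which this sketch does not cover.

```python
MAX_DELIVERY_COUNT = 10  # Service Bus default; configurable per queue or subscription


class MissingReferenceData(Exception):
    """Raised when the handler cannot find the row it depends on."""


def handle(message, table):
    # Hypothetical handler: look up the row the message refers to.
    key = message["segment_id"]
    if key not in table:
        raise MissingReferenceData("no table storage row for segment %r" % key)
    return {"user": message["user"], "segment": table[key]}


def deliver(message, table, max_delivery_count=MAX_DELIVERY_COUNT):
    """Simulate redelivery: every failed attempt uses up one delivery."""
    last_error = None
    for _ in range(max_delivery_count):
        try:
            return ("completed", handle(message, table))
        except MissingReferenceData as exc:
            last_error = exc
    return (
        "dead-lettered",
        {
            "DeadLetterReason": "MaxDeliveryCountExceeded",
            "delivery_count": max_delivery_count,
            "last_error": str(last_error),
        },
    )


if __name__ == "__main__":
    msg = {"user": "u1", "segment_id": "s9"}
    print(deliver(msg, {}))                # row missing, ends up dead lettered
    print(deliver(msg, {"s9": "sports"}))  # row present, completes first time
```

The point of the toy is that the bus only tells you a message ran out of deliveries. The actual exception lives in your logs, which is why the trip to the WADLogsTable was needed.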

 

TL;DR

After all that it was developer error, doh!

 

I hope this post has helped shed some light on the fun of working in Azure and also introduced you to some of the tools at your disposal when trying to identify an issue.

Cheers,
James