Reviving ZoneBot: rebuilding using V4 of Microsoft Bot Framework
Zone head of .NET development Andy Butland shares his experience of rebooting a chatbot using the latest version of Microsoft’s framework…
A year or so ago, I spent a bit of time investigating the Microsoft Bot Framework, using it to develop ‘ZoneBot’, a chatbot that we have integrated into Slack so that people can access basic information about their colleagues (nothing too intrusive!). ZoneBot pulls information from our internal projects and timesheeting tool, Timezone, providing people’s contact details, schedule, location, role and whether or not they are on holiday.
Some months after, it stopped working — likely due to one of the various platform changes that had taken place. I’ve recently got it going again, updating from version 3 to 4 of the framework, and was interested to discover that quite a bit has changed between the two versions; so much so that it was actually easier to start over, pulling in the bits that could be reused from the older version.
In this article I’ll discuss how the bot has been put together using version 4 of the framework, and share some key parts of the code.
The first key change to be aware of is that the framework is now based on .NET Core, so there are a number of differences in how we set up the bot directly related to that, particularly around how services are registered and how configuration is used. All of this occurs in the Startup class.
The bot makes use of two service classes, one of which makes calls to the Timezone API, while the other is used to provide an API wrapper around the Zone digital glossary.
The former is defined by the following interface and DTOs:
The implementations make use of typed HTTP clients to make HTTP calls to retrieve the information requested, which allows us to configure a client with the necessary base address, standard headers etc that we read from configuration. The Timezone one looks like this:
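As a rough illustration of that shape, here is a minimal sketch of a service contract, a DTO and a typed-client implementation. The names `ITimezoneService`, `PersonDetails` and `GetPersonAsync`, the `people?name=` route and the use of `System.Text.Json` are all assumptions made for the example; the real Timezone API and the original implementation will differ.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical DTO: the real Timezone response shape is not shown in the article.
public class PersonDetails
{
    public string Name { get; set; }
    public string Role { get; set; }
    public string Location { get; set; }
    public bool IsOnHoliday { get; set; }
}

// Hypothetical service contract.
public interface ITimezoneService
{
    Task<PersonDetails> GetPersonAsync(string name, CancellationToken cancellationToken = default);
}

// Typed client: the HttpClient arrives already configured (base address,
// standard headers) through constructor injection.
public class TimezoneService : ITimezoneService
{
    private static readonly JsonSerializerOptions JsonOptions =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    private readonly HttpClient _client;

    public TimezoneService(HttpClient client) => _client = client;

    public async Task<PersonDetails> GetPersonAsync(string name, CancellationToken cancellationToken = default)
    {
        // Assumed relative route, resolved against the client's BaseAddress.
        var url = "people?name=" + Uri.EscapeDataString(name);

        using (var response = await _client.GetAsync(url, cancellationToken))
        {
            if (!response.IsSuccessStatusCode)
            {
                return null; // Leave it to the caller to report a failed lookup.
            }

            var json = await response.Content.ReadAsStringAsync();
            return JsonSerializer.Deserialize<PersonDetails>(json, JsonOptions);
        }
    }
}
```

Because the class only depends on an `HttpClient`, it can be exercised in a test by passing in a client built around a stub message handler.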
These services, and the IOptions<> parameter created from configuration, are registered in the Startup class. We register two things here with the .NET Core dependency injection framework: statically typed configuration and the service implementations themselves. We can then access both via constructor injection in the services or other components.
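A registration along these lines might look like the sketch below, written as an extension method that could be called from `Startup.ConfigureServices`. The `TimezoneOptions` class, the "Timezone" configuration section and the `X-Api-Key` header name are assumptions for illustration, and the service types are reduced to stubs to keep the example short.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// Hypothetical options class bound to an assumed "Timezone" configuration section.
public class TimezoneOptions
{
    public string BaseUrl { get; set; }
    public string ApiKey { get; set; }
}

// Stub contract and implementation, standing in for the real service.
public interface ITimezoneService { }

public class TimezoneService : ITimezoneService
{
    private readonly HttpClient _client;

    public TimezoneService(HttpClient client) => _client = client;
}

public static class ZoneBotServiceRegistration
{
    public static IServiceCollection AddZoneBotServices(
        this IServiceCollection services, IConfiguration configuration)
    {
        // 1. Statically typed configuration, injectable as IOptions<TimezoneOptions>.
        services.Configure<TimezoneOptions>(configuration.GetSection("Timezone"));

        // 2. The service implementation, registered as a typed HTTP client whose
        //    base address and headers come from the bound options.
        services.AddHttpClient<ITimezoneService, TimezoneService>((provider, client) =>
        {
            var options = provider.GetRequiredService<IOptions<TimezoneOptions>>().Value;
            client.BaseAddress = new Uri(options.BaseUrl);
            client.DefaultRequestHeaders.Add("X-Api-Key", options.ApiKey); // assumed header name
        });

        return services;
    }
}
```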
Also configured in the Startup class is the bot itself.
A bot of any sophistication will require some form of state or “memory”, such that it can recall information received from the user in previous interactions when responding to any given prompt. With ZoneBot I wanted to be able to ask a question like “Where is Fred Bloggs today?”, and then follow up by asking “What is he working on?”. To answer the second question the bot needs to ‘remember’ who the user is talking about. Similarly, I wanted to be able to ask “And what about Sally Smith?”. Here the bot needs to know the intent of the previous question to be able to answer the latest one.
The simplest solution is to hold this state in memory, which is configured as follows:
The StateBotAccessors is a simple wrapping class that we’ll expose to the bot itself, following the documented best practices:
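The sketch below shows one way the in-memory storage and an accessor wrapper of this kind could be wired up. It registers the state objects directly with the dependency injection container, which is an assumption on my part; the original code may instead have added the state through the bot options, as some early version 4 samples did. The `ConversationData` class and the name of its `LastRecognisedPerson` property are also hypothetical.

```csharp
using Microsoft.Bot.Builder;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical shape of the data remembered between turns.
public class ConversationData
{
    public string LastRecognisedIntent { get; set; }
    public string LastRecognisedPerson { get; set; }
}

// Simple wrapper exposing the conversation state and its property accessor.
public class StateBotAccessors
{
    public StateBotAccessors(ConversationState conversationState)
    {
        ConversationState = conversationState;
        ConversationDataAccessor =
            conversationState.CreateProperty<ConversationData>(ConversationDataName);
    }

    public static string ConversationDataName { get; } = "ZoneBot.ConversationData";

    public ConversationState ConversationState { get; }

    public IStatePropertyAccessor<ConversationData> ConversationDataAccessor { get; }
}

public static class ZoneBotStateRegistration
{
    public static IServiceCollection AddZoneBotState(this IServiceCollection services)
    {
        // MemoryStorage keeps everything in process memory, so state is lost on restart.
        services.AddSingleton<IStorage>(new MemoryStorage());
        services.AddSingleton(provider =>
            new ConversationState(provider.GetRequiredService<IStorage>()));
        services.AddSingleton<StateBotAccessors>();
        return services;
    }
}
```

Swapping `MemoryStorage` for a persistent `IStorage` implementation later should not require changes to the accessor class.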
After that the bot itself is registered, along with the LUIS natural language processing model service, using the standard method found in the samples made available by Microsoft.
In my first article about ZoneBot, I discussed the LUIS model created for natural language processing, where we set up the various intents (such as “who is”, “what is”, “what is the schedule for”) and entities (such as “person”) that a user will be asking the bot about in plain English. We train the model with various phrases, picking out the intents and entities, and the model then generalises from these in an attempt to respond appropriately to variations on the requests that it hasn’t already seen.
Other than needing a bit more training to improve performance, the model from the previous iteration of the bot was still working.
The bot itself is constructed as follows, such that it has access to the bot services (ie the LUIS model) and the external services that we previously registered:
The entry point is the OnTurnAsync method, which is called once for each incoming message.
The basic pattern is three steps: retrieve the conversation data, attempt to recognise and handle the instruction, and finally save any changes made to the conversation data back to the configured storage:
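A pared-down sketch of those three steps is shown below. To stay short it injects a `LuisRecognizer` and `ConversationState` directly rather than the wrapper classes described earlier, and it simply echoes the recognised intent instead of handling it; both simplifications, and the `ConversationData` class, are assumptions made for the example.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Builder.AI.Luis;
using Microsoft.Bot.Schema;

// Hypothetical shape of the data remembered between turns.
public class ConversationData
{
    public string LastRecognisedIntent { get; set; }
}

public class ZoneBot : IBot
{
    private readonly ConversationState _conversationState;
    private readonly IStatePropertyAccessor<ConversationData> _accessor;
    private readonly LuisRecognizer _recognizer;

    public ZoneBot(ConversationState conversationState, LuisRecognizer recognizer)
    {
        _conversationState = conversationState;
        _accessor = conversationState.CreateProperty<ConversationData>("ConversationData");
        _recognizer = recognizer;
    }

    public async Task OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken = default)
    {
        // Ignore non-message activities such as conversation updates.
        if (turnContext.Activity.Type != ActivityTypes.Message)
        {
            return;
        }

        // 1. Retrieve the conversation data, creating it on the first turn.
        var data = await _accessor.GetAsync(turnContext, () => new ConversationData(), cancellationToken);

        // 2. Recognise the instruction and handle it (here: just echo the intent).
        var result = await _recognizer.RecognizeAsync(turnContext, cancellationToken);
        var (intent, _) = result.GetTopScoringIntent();
        await turnContext.SendActivityAsync($"Recognised intent: {intent}", cancellationToken: cancellationToken);
        data.LastRecognisedIntent = intent;

        // 3. Save any changes back to the configured storage.
        await _accessor.SetAsync(turnContext, data, cancellationToken);
        await _conversationState.SaveChangesAsync(turnContext, false, cancellationToken);
    }
}
```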
To handle the instruction itself, we need to follow a different method to that used in version 3; we no longer use dialog attributes associated with methods to define how each intent is acted on. Instead, we first look at the instance of RecognizerResult we’ve got back from the request to the LUIS model. Using the GetTopScoringIntent() method we can retrieve the intent that the model has interpreted as being most likely what our user was asking about. The method also returns a confidence score, which we could check if we wanted to be more sure about the user’s intention before acting on it.
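If we did want to apply a confidence check, it could be as small as the helper sketched below. It takes the same (intent, score) tuple shape that GetTopScoringIntent() returns in the .NET SDK; the helper name and the 0.6 threshold are arbitrary choices for illustration, not values from the original bot.

```csharp
public static class IntentSelection
{
    // Returns the intent name only when the model's confidence meets the
    // threshold; otherwise returns null so the caller can ask for clarification.
    public static string AcceptIfConfident((string intent, double score) top, double minimumScore = 0.6)
    {
        return top.score >= minimumScore ? top.intent : null;
    }
}
```

In the bot this would be called as `IntentSelection.AcceptIfConfident(recognizerResult.GetTopScoringIntent())`.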
My first implementation of this was to create a large switch statement to conditionally act on the different intents, which worked fine, but probably wasn’t the cleanest technique to use. It meant I had a large bot class, or “god” class, which best practice suggests would be easier to manage if broken up into a number of smaller ones. A large switch statement with branching logic is often something that shouts strategy pattern as a better approach, so that’s what I went with in the end.
The first step for this is to instantiate the appropriate strategy for the supplied intent, which is done via a factory method that looks up a class in a dictionary that has a key of the intent name. An instance of the class is instantiated and returned to the calling code:
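A minimal sketch of such a factory follows. The base class `HandleIntentStrategy`, the two concrete strategies and the intent names "WhoIs" and "WhereIs" are hypothetical, and the strategies return a string synchronously purely to keep the example small; the real ones would work with the turn context and external services.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base type shared by all strategies.
public abstract class HandleIntentStrategy
{
    public abstract string Handle(string person);
}

public class WhoIsStrategy : HandleIntentStrategy
{
    public override string Handle(string person) => $"Looking up who {person} is.";
}

public class WhereIsStrategy : HandleIntentStrategy
{
    public override string Handle(string person) => $"Looking up where {person} is.";
}

public static class HandleIntentStrategyFactory
{
    // Maps the intent name returned by the language model to a strategy type.
    private static readonly Dictionary<string, Type> Strategies =
        new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase)
        {
            { "WhoIs", typeof(WhoIsStrategy) },
            { "WhereIs", typeof(WhereIsStrategy) },
        };

    // Returns a new strategy instance, or null when the intent is not recognised.
    public static HandleIntentStrategy Create(string intent)
    {
        if (intent == null || !Strategies.TryGetValue(intent, out var strategyType))
        {
            return null;
        }

        return (HandleIntentStrategy)Activator.CreateInstance(strategyType);
    }
}
```

Adding support for a new intent then means writing one new class and adding one dictionary entry, with no changes to the bot class itself.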
Following a check to confirm we’ve been able to retrieve and create a strategy implementation, we pass it to the constructor of the class HandleIntentContext. The constructor’s parameter is of the base type for each of the strategies, so it’ll accept any one we’ve resolved. The HandleMessage method then simply hands off to the chosen strategy.
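In outline, the context class can be as simple as the sketch below. `HandleIntentContext` and `HandleMessage` are the names used above; the base class, the single `Handle` method and its string-based signature are simplifying assumptions.

```csharp
using System;

// Hypothetical base type shared by all strategies.
public abstract class HandleIntentStrategy
{
    public abstract string Handle(string message);
}

public class WhoIsStrategy : HandleIntentStrategy
{
    public override string Handle(string message) => "Who is: " + message;
}

public class HandleIntentContext
{
    private readonly HandleIntentStrategy _strategy;

    // Typed to the base class, so any resolved strategy is accepted.
    public HandleIntentContext(HandleIntentStrategy strategy)
    {
        _strategy = strategy ?? throw new ArgumentNullException(nameof(strategy));
    }

    // Hands off to whichever strategy was supplied.
    public string HandleMessage(string message) => _strategy.Handle(message);
}
```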
The strategy implementation for the “who is” intent is shown below:
We’ve already established the intent of the user from the LUIS model at this point, but we still need to extract the entity, ie the person the user is referring to. I’ve created a couple of helper methods for this, defined on the abstract base strategy class.
One of the instances where we want to add some smarts to our bot is to recall who we are talking about if they aren’t specified in the most recent message. This is handled by the following method, which checks for the use of a third person pronoun (eg “he”, “she”) and, if found, looks up the value of the most recently recognised person and uses that:
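The idea can be sketched as follows. The pronoun list, the `PersonResolver` and `ConversationData` classes and the `LastRecognisedPerson` property name are assumptions for the example, and in the real bot the recognised person would come from the entities in the model's response rather than being passed in as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the data remembered between turns.
public class ConversationData
{
    public string LastRecognisedPerson { get; set; }
}

public static class PersonResolver
{
    private static readonly HashSet<string> Pronouns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "he", "she", "they", "him", "her", "them", "his", "their",
        };

    private static readonly char[] Separators = { ' ', ',', '.', '?', '!' };

    // Prefers a person named in the current message; otherwise falls back to the
    // last recognised person, but only if the message contains a pronoun.
    public static string ResolvePerson(string recognisedPerson, string message, ConversationData data)
    {
        if (!string.IsNullOrWhiteSpace(recognisedPerson))
        {
            return recognisedPerson;
        }

        // Match whole words so that "the" or "weather" do not count as "he".
        var words = message.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        return words.Any(word => Pronouns.Contains(word)) ? data.LastRecognisedPerson : null;
    }
}
```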
We then call out to the Timezone API, retrieve the response and, if successful, parse it into a message that’s returned to the user.
Finally, we save the last recognised intent and entity to the conversation data, such that it’s available for use in handling the next message from the user.
One last point worth noting is that we’ve defined one of our intents as a repeat instruction, with the LUIS model trained to recognise this intent by phrases like “And Sally Smith” and “What about Fred Bloggs”. In this case we know the entity from the message, but it’s the intent we need to look up from the previous conversation data. We can do that easily by reading the LastRecognisedIntent property that we’ve set in each strategy implementation.
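That lookup amounts to something like the sketch below. The intent name "Repeat" and the helper class are hypothetical; only the LastRecognisedIntent property comes from the bot as described.

```csharp
// Hypothetical shape of the data remembered between turns.
public class ConversationData
{
    public string LastRecognisedIntent { get; set; }
}

public static class RepeatIntent
{
    // Assumed name of the repeat intent in the language model.
    public const string Name = "Repeat";

    // For a repeat instruction, reuse the intent from the previous turn;
    // otherwise keep the intent recognised for this message.
    public static string Resolve(string recognisedIntent, ConversationData data)
    {
        return recognisedIntent == Name ? data.LastRecognisedIntent : recognisedIntent;
    }
}
```

If there is no previous turn, the result is null, so the caller still needs to handle the case where no strategy can be resolved.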