When starting to learn about Bots in the Microsoft ecosystem, you'll likely land, as I did, on Microsoft's How bots work article. I'm sure that, unlike me, you figured out some of the important subtleties in this article and particularly in the state diagram near the top:
However, unlike you, it took me a while to figure out some key messages in the article, especially as they relate to how Bots actually communicate, which is what we'll discuss in this post.
Now, truth be told, if your Bot is fully working, then how a Bot works might seem irrelevant. However, there are several reasons why it's useful to know how the things we rely on work under the covers. First of all, it's always useful in troubleshooting to know what's going on behind the scenes. In addition, in the case of Teams Bots, understanding how they work will help with some important use cases such as getting the members of a channel or sending a message to a user outside of a regular message-response cycle.
As an example, the ChitChattr Reminder and Quotatious Bots need to send messages on specific schedules rather than in direct response to a user's incoming message. This pattern is referred to in the world of Teams as "Proactive Messaging" (more info here) and it is clearly a topic a lot of people are having issues with (here are two separate examples just last week), so hopefully I'll be able to cover more about it in future posts. For now, let's return to the original topic.
As I mentioned above, if you're like me you might have glossed over the original image in the How bots work article and, instead of seeing the actual state diagram above, you might have imagined you saw something like this (I'm ignoring the ConversationUpdate part, and just talking about the message sending):
Now, interestingly enough, there is a use case that works like this in Teams, which is Task Invoke. However, for the most part, the interaction above is totally wrong, for two main reasons:
- In concept the communication is between the user and your Bot. In practice there's a very important third party in the mix: the Microsoft Bot Framework Services. If, for instance, you change the settings of your Bot in the Azure portal, such as the 'Messaging endpoint' (something you do a lot during development, especially if you're using NGrok), you'll notice that the change propagates quite quickly to subsequent calls to your Bot. Now, I haven't determined whether calls to your Bot are actually proxied through the Bot Framework Services or whether they're just used by the implementation (e.g. Teams) to look up details like this prior to an actual call to your Bot (I suspect it's the latter, for a few reasons), but either way they're an important part of the mix. The image below, for reference, shows how you change this setting in Azure for a Bot:
Aside from the missing party, it is also important to be aware that the conversation does NOT in reality take the form of a request and direct response. Rather, when a user sends a message to your Bot, what happens is that an HTTP POST is received by your Bot and then (at least in a normal success scenario), your Bot replies immediately with a 200-code response. This is basically just an acknowledgement - it's letting the user's client know that the message was received successfully. It's now up to you (i.e. to your Bot) to decide what to do:
- Not reply at all - not very nice, but technically acceptable
- "Reply" with a single message response
- "Reply" with a "working on it" signal indicating that an operation is in progress but will still take a while (a "Typing" indicator activity)
- "Reply" with multiple response messages to the user.
I imagine it's partly in order to enable all of these scenarios that the mechanism needs to be, effectively, asynchronous in nature, because, at this stage, your Bot will send its own HTTP POST message to a specified endpoint in order to send one or more responses. Each of these responses, while it appears to the user to be part of the same chat conversation, is actually a totally separate HTTP POST request, and each will get its own 200 response.
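To make the "acknowledge first, reply separately" idea a bit more concrete, here's a minimal sketch in Python. It is not Bot Framework code: `TurnResult`, `handle_incoming` and the trigger words "ignore me" and "slow" are all names I've made up for illustration. The only point is that the 200 acknowledgement and the zero-or-more outgoing activities are separate things.

```python
from dataclasses import dataclass, field


@dataclass
class TurnResult:
    """What a single incoming POST produces on the Bot's side (hypothetical type)."""
    ack_status: int                               # status returned for the incoming POST
    outbound: list = field(default_factory=list)  # activities to POST back separately


def handle_incoming(activity: dict) -> TurnResult:
    """Acknowledge the incoming activity, then decide what (if anything) to send back."""
    result = TurnResult(ack_status=200)           # the acknowledgement is always the same
    original_text = activity.get("text") or ""
    command = original_text.strip().lower()

    if command == "ignore me":
        return result                             # option 1: no reply at all
    if command == "slow":
        # options 3 and 4: a typing indicator followed by a later message
        result.outbound.append({"type": "typing"})
        result.outbound.append({"type": "message", "text": "Done, sorry for the wait."})
        return result
    # option 2: a single message in response
    result.outbound.append({"type": "message", "text": f"You said: {original_text}"})
    return result
```

In a real Bot each entry in `outbound` would become its own HTTP POST, and each of those would get its own 200 back.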
- Actually, I kind of lied above. I sincerely apologize for that, and I certainly try not to make a habit of it, but I initially said there were two reasons why my earlier state diagram was incorrect - there's actually a third reason (and maybe even more). Not only are the Microsoft Bot Framework Services missing from my picture, but so is the relevant Bot "Channel" at play (e.g. the Web Chat or Teams itself). Note that "Channel" here is not referring to a "Channel" in Microsoft Teams, but rather the Bot Channels that you register in Azure, being the different types of platforms your Bot supports - Teams, Facebook, Slack, etc. The Channel forms a fourth participant in the conversation. In the case of Teams, as an example, when your Bot sends its POST reply messages, the messages are not being sent to the end user's device. Such an approach might be fine for a support Bot on an ecommerce website, but if that happened in Teams, then if you sent a message on your desktop, you might not see the Bot's reply appear on your tablet, and the conversations would be inconsistent. Instead, the Bot sends its 'reply' message to a specially-designated Service endpoint. The original message Activity payload that your Bot receives contains a "serviceUrl", for example as shown below for Teams:
This Service Url tells the Bot what endpoint to call if / when it intends to send a response message to the user.
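As a rough illustration of how the "serviceUrl" gets used, the Python sketch below builds the callback URL and reply body from an incoming activity. The `/v3/conversations/{conversationId}/activities/{activityId}` path is the Bot Connector REST route for replying to an activity; the function name `build_reply` and the sample values in the usage note are my own placeholders, and the Bot Framework libraries normally do all of this for you.

```python
from urllib.parse import quote


def build_reply(incoming: dict, text: str) -> tuple[str, dict]:
    """Return (url, body) for a reply to `incoming`, addressed via its serviceUrl."""
    base = incoming["serviceUrl"].rstrip("/")      # tolerate a trailing slash
    conversation_id = quote(incoming["conversation"]["id"], safe="")
    activity_id = quote(incoming["id"], safe="")
    url = f"{base}/v3/conversations/{conversation_id}/activities/{activity_id}"
    body = {
        "type": "message",
        "text": text,
        "from": incoming["recipient"],             # the Bot was the recipient, now it's the sender
        "recipient": incoming["from"],             # the user was the sender, now the recipient
        "conversation": incoming["conversation"],
        "replyToId": incoming["id"],
    }
    return url, body
```

For example, an incoming activity with a serviceUrl of `https://service.example/teams/` (a made-up host), conversation id `19:abc@thread.v2` and activity id `1234` gives the URL `https://service.example/teams/v3/conversations/19%3Aabc%40thread.v2/activities/1234`.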
So, with everything we've learned above, let's put together a new diagram that (hopefully) better illustrates the flows between all the parties involved. Note that I'm not using a proper State diagram but something that I hope is simpler and more understandable, and I'm not showing the 200-code responses, only the important flows.
The numbering indicates (roughly) the flow of messages in the exchange, so let's step through them and see what's involved:
- The user sends a message to the Bot (in this case the word "Foo"). This is shown in the screenshot below, showing a browser instance of Teams with the F12 Developer tools open. I've marked with a "1" the various places to see this - in the chat window, in the call to the Teams backend, and in the payload of the call itself (hence three things shown as "1" in the image). We'll come back to "3" and "6" a bit later...
- Teams calls into the Microsoft Bot Framework Services, using the Azure Application ID defined in the manifest for your Teams App, in the Bot section, as shown below.
Teams gets the "Messaging endpoint" for your Bot from the Bot Framework Services, as shown in the sample screenshot below.
- In the meantime, while all the above AND the below are going on, the Teams client is making "polling" requests all the time from the client to the backend Teams services. It's doing this because any response that comes back from your Bot will NOT go directly to the client, as we said above, but will instead go to the Teams backend, and be stored there waiting for the client's next Poll request. You can see this as (3) in the earlier screenshot, and it will come up later in our discussion.
- Teams now makes a call on the user's behalf to your Bot endpoint, as shown in the NGrok sample screenshot below. Note the "serviceUrl" being sent by Teams to your Bot, so it knows where to call back to.
This call is deserialized via the Bot Framework libraries, in the case of a C# Bot, for example, into an Activity object, from which you can read the Text property, as shown:
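If you want to picture what that deserialization amounts to, here's a small Python sketch that reads the same few fields out of the raw POST body. `IncomingActivity` and `parse_activity` are hypothetical names of mine, not the Bot Framework's own Activity class, and a real Activity carries many more fields than the four shown.

```python
import json
from dataclasses import dataclass


@dataclass
class IncomingActivity:
    """A tiny stand-in for the fields of an Activity discussed in this post."""
    type: str
    text: str
    service_url: str
    conversation_id: str


def parse_activity(raw_body: str) -> IncomingActivity:
    """Turn the JSON body of the incoming POST into a typed object."""
    data = json.loads(raw_body)
    return IncomingActivity(
        type=data["type"],
        text=data.get("text") or "",       # not every activity type carries text
        service_url=data["serviceUrl"],    # where replies must be POSTed
        conversation_id=data["conversation"]["id"],
    )
```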
- Your Bot code will now send a response. If you're using C#, for instance, you'll simply call something like turnContext.SendActivityAsync, as in the example below:
Behind the scenes, you're calling into the Bot Framework libraries, which use an HttpClient internally to make the HTTP POST call to the calling application's Service Url. This is shown below in the code from the Conversation class in the BotBuilder DotNet Project. I've highlighted below where the POST HttpRequestMessage is created, and how it's sent to the internal HttpClient.
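For readers who don't have the .NET source to hand, the same idea can be sketched with Python's standard library. This is only my approximation of the shape of the request, not a port of the BotBuilder code: `build_post` is a hypothetical helper, and the bearer token is a placeholder parameter standing in for the authentication that the real libraries handle for you.

```python
import json
import urllib.request


def build_post(url: str, activity: dict, bearer_token: str) -> urllib.request.Request:
    """Create (but do not send) the POST that carries a reply activity."""
    payload = json.dumps(activity).encode("utf-8")   # the activity travels as a JSON body
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {bearer_token}",  # placeholder token, assumed here
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it, and the status that comes back is the separate 200 acknowledgement for that reply.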
- At this point, a "response" text message has been sent to Teams by your Bot. The Teams backend will store this, waiting for the next Poll request from the user's Teams client (web, desktop, mobile). The user's Teams client has been making these Poll requests periodically, as we showed with the call (3) above, but this time there's a response, in the form of a message from your Bot. If such a message does exist, Teams will render it in the user's client, as a message from the Bot.
All of this, of course, is transparent to the user - they just see it as "I sent a message to the Bot, and (hopefully) got a response", as shown below in the screenshot from Teams:
Now, in this example, I sent a garbage message to my Bot in the form of "boo" (or "Foo" in one of the screenshot samples above). The Bot in question uses LUIS to try to understand the incoming message, so my Bot, BEFORE sending the user's response message, makes a call to the LUIS API. In this example, it can't understand what "boo" means, so it returns a standard error message "Sorry, it looks like something went wrong". A summary of this conversation, showing the user's Chrome browser instance call on my desktop, the Bot's call to LUIS, from the Bot running locally on my development workstation, and the Bot's call back to Teams, appears in the Fiddler trace below. Note the "Process" column, indicating which application is making which call.
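The ordering in that trace (recognize first, reply second) can be sketched in a few lines of Python. The recognizer here is just a function passed in as a stand-in for the LUIS call, so nothing below reflects the real LUIS API; `reply_for` and the "SetReminder" intent in the usage note are names I've invented.

```python
from typing import Callable, Optional

FALLBACK = "Sorry, it looks like something went wrong"


def reply_for(text: str, recognize: Callable[[str], Optional[str]]) -> str:
    """Ask the recognizer for an intent BEFORE deciding what to send back."""
    intent = recognize(text)            # stand-in for the outbound call to LUIS
    if intent is None:
        return FALLBACK                 # nothing understood, e.g. for "boo"
    return f"Handling intent: {intent}"
```

Calling `reply_for("boo", lambda text: None)` returns the fallback message, while a recognizer that returns "SetReminder" produces "Handling intent: SetReminder".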
Well, that was quite a long post, especially for my first blog post in close to a decade! I hope it was useful in understanding some of what's going on behind the scenes with how Bots work, with a particular reference to Microsoft Teams.
[Update 2020-01-19] Following an interesting discussion on Stack Overflow, I thought it might be useful to add here that it's not required to use the hosted Microsoft Bot Framework Services in this mix - one can build and run a bot using the Microsoft Bot Framework that's entirely self-hosted, for example on premises, if that suits your needs. My thinking at the moment though is that it's better to leverage the Bot Framework Services, as they handle things like security and provide an easy jump if you want to make your bot work with third-party platforms (e.g. Teams, Facebook). For more on this topic, see the post here by Michael Richardson.