In this hands-on tutorial we will be building a simple automated triage application that uses Re:infer to categorise incoming emails in real time. We'll build an end-to-end workflow that can be used as a starting point for your own Re:infer automation, and take a detailed look at how to use the real-time Trigger API.
You need the following permissions in order to follow the tutorial. You can check your current permissions on your Account page.
|Organisation|Description|
|---|---|
|`reinfer-sandbox`|Contains the pre-labelled `reinfer-sandbox/integration-tutorial` dataset used in this tutorial.|
|Your development organisation|During your onboarding, you should have received access to an organisation that you can use as your development environment.|

Please contact firstname.lastname@example.org if you are unsure which organisation you can use for development, or if you require "View sources" and "View labels" permissions in `reinfer-sandbox`.
In this tutorial, we will be using pre-labelled data. In order to make a copy of the pre-labelled reinfer-sandbox/integration-tutorial dataset, create a new dataset in your development organisation using the "Copy an existing taxonomy" option. You can find instructions on how to do that here.
Since your new dataset contains labelled data, the model will start training immediately. You can track the model training status in the dataset status bar. Once done, performance metrics for each label will appear on the Validation page, and a new model version will appear on the Models page.
Design your application
Now that you are familiar with the prerequisites, let's start building our end-to-end workflow. In this section we will discuss the design of a simple automated triage application and its integration with Re:infer. In the next sections we will learn about the Trigger API that will drive our automation. Finally, we will build our application based on the design here, and test it using pre-labelled data.
We will target a typical email support use-case as the starting point for our design:
- An Outlook support mailbox receives a large number of customer emails daily.
- A triage team turns each email into a support ticket. This requires populating ticket fields with information from the email (e.g. a customer ID). Each ticket is then added to the appropriate workflow queue.
- The tickets in the workflow queues are continuously processed by a customer support team.
There are two automation opportunities here: the triage step and the processing step. This tutorial will demonstrate how to automate the triage step by using Re:infer to extract required fields from the email, and assign the email to a workflow queue.
While the processing step is not discussed in this tutorial, it relies on the data collected during the triage step, so you have the option of adding it after building the sample triage application.
Let's take a look at the diagram below, which sketches out the end-to-end design we will build:
Due to the live connection between the Exchange server and Re:infer, Re:infer can serve as a data source for your application. This way a separate connection between your application and the Exchange server is not needed. Your application will continuously poll Re:infer for new emails, and receive them together with their predicted labels and entities. (We assume that no users are working directly in the mailbox's inbox at the same time as your application is running; otherwise you would need to account for conflicts between your application and mailbox users).
Your application will query Re:infer and, for each email, check whether the required labels and entities are present in the API response. If yes, it will create a ticket in the appropriate workflow queue. If not, it will make a second API request to Re:infer to mark the email as a "no prediction" exception. Similarly, there should be a way for users processing the tickets to report miscategorised tickets so that the corresponding emails can be marked in Re:infer as a "wrong prediction" exception. (Both exception types will then be reviewed and labelled by the model maintainer in order to improve model performance).
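As a concrete illustration, the routing decision described above can be sketched in Python. The `create_ticket` and `mark_exception` callables below are hypothetical stand-ins for your ticketing system and for the Re:infer call that records an exception; they are assumptions for this sketch, not real API bindings:

```python
def process_result(result, create_ticket, mark_exception):
    """Route one trigger result according to the design above.

    `result` is one entry from a trigger fetch response (its shape is
    assumed for illustration). If any label was predicted, a ticket is
    created; otherwise the email is flagged as a "no prediction"
    exception so the model maintainer can review and label it later.
    """
    labels = [label["name"] for label in result.get("labels", [])]
    if labels:
        create_ticket(
            comment=result["comment"],
            labels=labels,
            entities=result.get("entities", []),
        )
    else:
        mark_exception(comment=result["comment"], reason="no prediction")
```

Passing the two actions in as callables keeps the routing logic testable independently of any particular ticketing system or HTTP client.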
Parts of the design (shown in the diagram with a dotted outline) will be out of scope for this tutorial. In a real-life scenario, these steps should of course not be skipped:
- We will be using existing data in the platform instead of setting up a live EWS connection.
- The data comes pre-labelled, so we won't need to train a model.
- We won't design a feedback loop for "wrong prediction" exceptions since the design depends on the capabilities of the system where tickets are processed.
The recommended option for getting email data into Re:infer is to use the Re:infer EWS connector, but other options are also available. Since we are using data that is already in the platform, setting up data ingestion is not part of this tutorial. You can learn more about all available data ingestion options here.
We would like to automate this process:
A triage team turns each email into a support ticket. This requires populating ticket fields with information from the email (e.g. a customer ID). Each ticket is then added to the appropriate workflow queue.
For the sake of this tutorial, let's assume that our workflow queues are "Renewal", "Cancellation", "Admin", and "Urgent". Emails concerning renewal, cancellation, and admin tasks (e.g. address changes) are supposed to go into the respective queues, while all urgent emails should go into the "Urgent" queue regardless of topic.
Let's also assume that each email can contain a customer ID (either in the subject or the body of the email). The customer ID needs to be extracted so that it can be used when creating a ticket from the email. But since customers sometimes forget to include the customer ID, we want to make this field optional so that a ticket can be created even if no customer ID is present.
In order to be able to categorise emails into the four workflow queues, the model has been trained to predict the labels "Renewal", "Cancellation", "Admin", and "Urgent". In order to extract the customer ID, a "Customer ID" entity has been configured. (Re:infer comes with many pre-built entity kinds; further entity kinds can be added based on the needs of your specific integration. You can see a list of currently available entities here, and learn about requesting new entity kinds here).
We can now come up with a mapping between the predicted label(s) received from Re:infer and the workflow queue the email should go into:
We made a few choices for the sake of the tutorial:
- In addition to the existing four workflow queues there is a special "Uncategorised" queue. If the model is not able to provide a prediction, we put the email there to be manually processed. Alternatively we could have picked an existing queue that should deal with all uncategorised emails, for example "Admin".
- If an email has more than one label from the set of ["Renewal", "Cancellation", "Admin"], it means that it contains multiple requests. We choose to put such emails into the "Uncategorised" queue, perhaps because we don't anticipate getting many of them. Alternatively we could have created a "Complex Requests" queue.
In a real-life scenario, you should base such decisions on the specific requirements of your use case.
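The mapping above can be written down as a small function. This is a sketch of the tutorial's own rules, not part of any Re:infer API:

```python
def pick_queue(predicted_labels):
    """Map the predicted label(s) of an email to a workflow queue.

    Rules from the design above:
    - "Urgent" always wins, regardless of other labels.
    - Exactly one of Renewal/Cancellation/Admin -> that queue.
    - No prediction, or multiple topic labels -> "Uncategorised".
    """
    labels = set(predicted_labels)
    if "Urgent" in labels:
        return "Urgent"
    topics = labels & {"Renewal", "Cancellation", "Admin"}
    if len(topics) == 1:
        return topics.pop()
    return "Uncategorised"
```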
In order to query a model for predictions you of course need to have a trained model. A model is trained by labelling some of the data you ingested. Since multiple hours of labelling are required in order to produce a model that performs well, we will be using pre-labelled data in this tutorial so that you won't need to train your own model.
In a real-life scenario, a model trainer should have good domain knowledge of the data. For example, the user of a support mailbox would be a good model trainer to label the data coming from that mailbox. The training needs to be done carefully in order to produce a model that performs well and is not biased. To that end, Re:infer provides training resources and offers hands-on training workshops.
Even a well-performing model will occasionally provide incorrect results, either by failing to predict a label, or by predicting the wrong label. One of the best ways to improve the model is to label the emails the model doesn't perform well on. For this reason, we want to have a feedback loop for such emails:
For each email, your application checks whether required labels and entities are present. If yes, it creates a ticket in the appropriate workflow queue. If not, it makes a second API request to Re:infer to mark the email as a "no prediction" exception. Similarly, there should be a way for users processing the tickets to report miscategorised tickets so that the corresponding emails can be marked in Re:infer as a "wrong prediction" exception.
Our design shows feedback loops for both types of exceptions.
Our design shows workflow queues in an abstract way. In reality, you might push the emails directly into a CRM platform, use a message broker such as Kafka, or even simply move the emails from the inbox folder into a subfolder. For the purposes of this tutorial, we will mock up the queues, but you are encouraged to develop your test integration end-to-end.
Understand Trigger API
In order to fetch incoming emails together with predicted labels and extracted entities, we will use the Trigger API. The Trigger API allows you to define a stream of comments based on a dataset, pinned model version, and optional comment filters, and to iterate through them in a stateful way. It is so named due to its intended use in applications that "trigger" downstream actions in an automated process. Go ahead and read the doc pages to make sure you understand how Triggers work.
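Conceptually, consuming a trigger is a fetch-then-advance loop: fetch a batch of results, act on them, then acknowledge the batch so that the next fetch returns new comments. The sketch below captures that pattern with the HTTP calls abstracted away as callables; the batch shape, including a `sequence_id` acknowledgement token, is an assumption for illustration, so refer to the Trigger API docs for the real schema:

```python
def consume_trigger(fetch, advance, handle):
    """Statefully iterate a trigger stream until it is exhausted.

    `fetch()` returns a batch dict with "results" and "sequence_id";
    `advance(sequence_id)` acknowledges the batch; `handle(result)`
    processes one result. Advancing only after the whole batch has been
    handled gives at-least-once delivery: a crash mid-batch means the
    batch is re-delivered on restart rather than lost.
    """
    while True:
        batch = fetch()
        if not batch["results"]:
            break  # nothing new for now; poll again later
        for result in batch["results"]:
            handle(result)
        advance(batch["sequence_id"])
```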
Each result in the Trigger response will contain a comment, a list of predicted labels, and a list of entities. This is passed as a JSON structure as seen below:
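A single result might look roughly like the sketch below. The exact field names and nesting are an assumption here, so refer to the Trigger API reference for the authoritative response schema:

```json
{
  "comment": {
    "id": "0123456789abcdef",
    "timestamp": "2021-01-01T12:00:00Z",
    "messages": [
      {
        "subject": { "text": "Cancellation of my contract" },
        "body": { "text": "Please cancel my contract. Customer ID: 1234567." }
      }
    ]
  },
  "labels": [
    { "name": ["Cancellation"], "probability": 0.8374786376953125 }
  ],
  "entities": [
    { "kind": "customer_id", "formatted_value": "1234567" }
  ]
}
```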
The following section explains how to correctly interpret the predicted labels in each trigger response.
The Trigger endpoint will return predicted labels together with a confidence score (a number between 0 and 1). For example, the snippet below would be for predictions of "Cancellation" and "Admin" with confidences of about 0.84 and 0.02 respectively:
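For illustration, such a pair of predictions might be represented roughly as follows (field names are assumed; the confidence values are the ones used in the worked example later in this section):

```json
"labels": [
  { "name": ["Cancellation"], "probability": 0.8374786376953125 },
  { "name": ["Admin"], "probability": 0.0164003014564514 }
]
```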
In order to correctly interpret such a result, you need to determine the minimum confidence score at which you will treat the prediction as saying "yes, the label applies". We call this number the confidence score threshold.
Precision and Recall
In order to understand confidence thresholds you should be familiar with the terms precision and recall. You can find an explanation of these terms on our support pages. Briefly, a high precision relates to a low false positive rate (i.e. your results are more likely to be accurate), and a high recall relates to a low false negative rate (i.e. you are less likely to miss relevant results).
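As a quick worked example of the two definitions (the counts below are invented purely for illustration):

```python
def precision_recall(tp, fp, fn):
    """precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Suppose the model predicted "Urgent" for 20 emails, of which 18 really
# were urgent (2 false positives), and it missed 6 urgent emails
# (6 false negatives):
precision, recall = precision_recall(tp=18, fp=2, fn=6)
# precision = 0.9 (few false alarms), recall = 0.75 (a quarter missed)
```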
The confidence threshold for a label corresponds to a specific precision-vs-recall tradeoff. Unless your model performs perfectly, high precision will sacrifice some recall, and conversely high recall will sacrifice some precision. These tradeoffs are visualised as a precision-recall curve for each label on the Validation page.
Using the interactive slider, you can quickly find your desired threshold: move the slider to the right to optimise for precision, or to the left to optimise for recall, until you find the precision and recall that match your application requirements. The displayed threshold value will be your desired threshold. If you want to learn more about the Validation page functionality, please see the support pages.
If you look through the Validation page, you might notice that the shape of the precision-recall curve is different for each label. This hints at how we will be picking thresholds: we will pick an individual threshold for each label. This is particularly important in automation applications, where the best possible performance must be ensured.
For our sample application, we will pick a balanced precision-vs-recall tradeoff for our "Renewal", "Cancellation", and "Admin" labels, and optimise for recall for the "Urgent" label (so that we are less likely to miss urgent emails). (Note that due to continuous improvements in the underlying ML performance, by the time you are doing this tutorial the precision and recall values at the picked threshold might be slightly different from what is shown here.)
We'll be using the following threshold values for the rest of the tutorial.
We can now apply the thresholds to our example. The "Cancellation" label applies, since 0.8374786376953125 > 0.619. The "Admin" label doesn't apply, since 0.0164003014564514 < 0.898.
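The same check can be expressed as a small filtering step. The thresholds dict below only contains the two labels from the worked example, and the prediction shape is simplified for the sketch:

```python
# Per-label thresholds picked on the Validation page (worked example values)
THRESHOLDS = {"Cancellation": 0.619, "Admin": 0.898}

def applied_labels(predictions, thresholds):
    """Keep only labels whose confidence reaches their own threshold.

    Labels without a configured threshold are treated as not applying.
    """
    return [
        p["name"]
        for p in predictions
        if p["probability"] >= thresholds.get(p["name"], float("inf"))
    ]

predictions = [
    {"name": "Cancellation", "probability": 0.8374786376953125},
    {"name": "Admin", "probability": 0.0164003014564514},
]
applied_labels(predictions, THRESHOLDS)  # -> ["Cancellation"]
```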
To make this process easier, the Trigger API allows you to specify your label thresholds in the trigger config. If thresholds are specified, only labels whose confidence scores are above their thresholds are returned.
In a real-life scenario, the target precision-recall performance will be decided by the combination of business requirements and historical model performance. For example, if a label historically achieved 85% precision at 55% recall, you may decide to invest additional time into training it up to 90% precision at 55% recall. You will then pin the new model version, pick new thresholds, and update the configuration of your application. You can find more about this process here.
Build your application
Having finalised our design, we are ready to start building out our application.
Pin your Model
Go to the Models page and pin the model by clicking on the "Save" toggle. Once the model is pinned, you can start accessing it via the API.
If you want to follow this part of the tutorial using a different labelled dataset, you should make sure that it's sufficiently labelled. In particular, a dataset with only a few labelled examples will produce a model that won't return predictions for the majority of comments.
This tutorial will show you how to configure a Trigger using the most common options. For a full list of options, see the Trigger API reference. For an overview of parameters required by prediction endpoints, see the Pinned Model How-To.
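As an illustration, a trigger configuration that pins a model version and sets the per-label thresholds from the worked example might look roughly like the sketch below. The field names and structure here are an assumption, not the authoritative schema; consult the Trigger API reference before use:

```json
{
  "trigger": {
    "name": "email-triage-tutorial",
    "model": {
      "version": 1,
      "label_thresholds": [
        { "name": ["Cancellation"], "threshold": 0.619 },
        { "name": ["Admin"], "threshold": 0.898 }
      ]
    }
  }
}
```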
Congratulations, you have completed the Re:infer automation tutorial. Of course, your own automation application may be different from what is covered here. Please contact email@example.com if you have questions.