This page describes how to interpret the labels and entities downloaded from the Re:infer platform for use in your application. It covers the labels and entities themselves; to learn where to find them in the downloaded data, check the documentation for your chosen download method.
A comment can have zero, one, or multiple predicted labels. The example below shows two predicted labels (Order and Order > Missing) together with their confidence scores. This format is used by most API routes. The exception is the Dataset Export route, which formats label names as strings instead of lists (to be consistent with the CSV export in the browser).
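As an illustrative sketch of the two formats (the field names `name` and `probability` and the confidence values here are assumptions for illustration, not confirmed API fields):

```python
# Illustrative sketch only: the field names ("name", "probability") and the
# confidence values are assumptions, not confirmed API fields.

# Most API routes: label names formatted as lists of hierarchy levels.
labels_list_format = [
    {"name": ["Order"], "probability": 0.84},
    {"name": ["Order", "Missing"], "probability": 0.79},
]

# Dataset Export route: label names formatted as strings, consistent
# with the CSV export in the browser.
labels_string_format = [
    {"name": "Order", "probability": 0.84},
    {"name": "Order > Missing", "probability": 0.79},
]

def to_string_name(name):
    """Convert a list-formatted label name to the string format."""
    return " > ".join(name) if isinstance(name, list) else name
```

With a helper like `to_string_name`, an application can normalise label names regardless of which route produced them.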
The Label object has the following format:
| Type | Description |
|------|-------------|
| array&lt;string&gt; or string | All API routes except Dataset Export: the name of the predicted label, formatted as a list of hierarchical labels. For instance, the label Parent Label > Child Label will have the format ["Parent Label", "Child Label"]. Dataset Export API route: the name of the predicted label, formatted as a single string with hierarchy levels separated by " > " (e.g. "Parent Label > Child Label"). |
| number | Confidence score. A number between 0.0 and 1.0. |
| number | Sentiment score. A number between -1.0 and 1.0. Only returned if sentiments are enabled in the dataset. |
Q: Which download methods provide labels?

A: The following download methods provide labels: the Re:infer API, CSV downloads, and the Re:infer command-line tool. Please take a look at the Downloading Data page for an overview of the available download methods, and the FAQ item below for a detailed comparison.
Q: What are the differences between the download methods?

A: The tables below explain the differences between the download methods. A description of labels on the Explore page in the Re:infer web UI is provided for comparison.
The Explore page, CSV download, Re:infer command-line tool, and the Export API endpoint provide the latest available predictions. Note that after a new model version has been trained, but before all predictions have been recalculated, you will see a mix of predictions from the latest and the previous model versions. These methods are aware of assigned labels and will show them either as assigned or with a confidence score of 1.
| Method | Assigned Labels | Predicted Labels |
|--------|-----------------|------------------|
| Explore page | The Explore page visually differentiates assigned labels from predicted labels. It does not report confidence scores for assigned labels. | The Explore page is designed to support the model training workflow, so it shows selected predicted labels that the user may want to pin. It will preferentially show labels that meet a balanced threshold (derived from the F-score for that label), but may also show labels with lower probability as suggestions, if the user is likely to want to pin them. |
| Export API | Returns assigned labels. | Returns all predicted labels (no threshold is applied). |
| CSV download | Returns a confidence score of 1 for assigned labels. Note that predicted labels may also have a score of 1 if the model is very confident. | Returns all predicted labels (no threshold is applied). |
| Re:infer CLI | If a comment has assigned labels, returns both assigned and predicted labels for that comment. | Returns all predicted labels (no threshold is applied). |
In contrast to the methods above, whose output can change as new model versions are trained, the Trigger API and Predict API routes return predictions from a specific model version. These routes behave as if you had downloaded a comment from the platform and then sent it for prediction against that model version; they are not aware of assigned labels.
| Method | Assigned Labels | Predicted Labels |
|--------|-----------------|------------------|
| Trigger API and Predict API | Not aware of assigned labels. | Return predicted labels with a confidence score above the provided label thresholds (or above the default value of 0.25 if no thresholds are provided). |
When designing an application that makes decisions on a per-verbatim basis, you will want to convert the confidence score of each label into a Yes-or-No answer. You can do that by determining the minimum confidence score at which you will treat the prediction as saying "yes, the label applies". We call this number the confidence score threshold.
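As a minimal sketch of this conversion (the field names and the threshold value are illustrative assumptions, not confirmed API fields):

```python
# Illustrative threshold value only; see "How to pick a confidence score
# threshold" below for why this should not simply equal your desired precision.
THRESHOLD = 0.7

def label_applies(prediction, threshold=THRESHOLD):
    """Convert a label's confidence score into a yes-or-no answer."""
    # "probability" is an assumed field name for the confidence score.
    return prediction["probability"] >= threshold

predictions = [
    {"name": "Order > Missing", "probability": 0.82},
    {"name": "Order > Cancelled", "probability": 0.41},
]
applied = [p["name"] for p in predictions if label_applies(p)]
# applied == ["Order > Missing"]
```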
How to pick a confidence score threshold
A common mistake is to set the threshold equal to the precision you would like to achieve ("I want the labels to be correct at least 70% of the time, so I will pick labels with confidence scores above 0.70"). To understand thresholds and how to pick them, please check the Confidence Thresholds section of the integration guide.
When it's impractical to hand-pick thresholds
Re:infer can automatically set sensible thresholds to fit your use case. This functionality is being actively improved; please contact us at email@example.com if you are interested in using it.
If you are exporting labels for use in an analytics application, it's important to decide whether to expose confidence scores to users. For users of business analytics applications, you should convert the confidence scores into presence or absence of the label using one of the approaches described in the Automation section. On the other hand, users of data science applications proficient in working with probabilistic data will benefit from access to raw confidence scores.
An important consideration is to make sure that all predictions in your analytics application are from the same model version. If you are upgrading your integration to fetch predictions from a new model version, all predictions will need to be reingested for the data to stay consistent.
A comment can have zero, one, or multiple predicted entities. The example below shows one predicted order_number entity. Note that, unlike labels, entities do not have associated confidence scores.
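For illustration, assuming hypothetical field names (`name`, and a `span` with character offsets — these are assumptions, not confirmed API fields), an entity's span can be used to locate its text in the comment:

```python
comment_text = "Hi team, could you check on order ORD-1234 for me?"

# Hypothetical entity object; the field names are illustrative assumptions,
# not confirmed API fields. Note that there is no confidence score.
entity = {
    "name": "order_number",
    "span": {"char_start": 34, "char_end": 42},
}

start = entity["span"]["char_start"]
end = entity["span"]["char_end"]
print(comment_text[start:end])  # -> ORD-1234
```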
The API returns entities in the following format:
| Type | Description |
|------|-------------|
| string | (Deprecated) Entity kind. |
| Span | An object containing the location of the entity in the comment. |
Q: Which download methods provide entities?

A: The following download methods provide entities: the Re:infer API and the Re:infer command-line tool. Please take a look at the Downloading Data overview to understand which method is suitable for your use case. Note that CSV downloads do not include entities.
Responses to prediction requests will also contain information about the model that was used to make the predictions.
| Type | Description |
|------|-------------|
| timestamp | When the model version was pinned. |
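This model information is useful for keeping downstream data consistent. As a sketch (the field names `version` and `time` are illustrative assumptions, not confirmed API fields), an application might record it alongside each prediction and check it before mixing data:

```python
# Hypothetical shape of the model information returned alongside predictions;
# the field names and values are illustrative assumptions.
model_info = {
    "version": 3,                           # model version that produced the predictions
    "time": "2023-05-01T12:00:00.000000Z",  # when this model version was pinned
}

def same_model(a, b):
    """Check that two responses used the same model version before mixing them."""
    return a["version"] == b["version"]
```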