
Get predictions for a pinned model

POST /api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict
Permissions required: View labels, View sources
Billable Operation

You will be charged 1 AI unit per created comment, or per updated comment (based on its unique ID) if its text was modified.

curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
  "documents": [
    {
      "messages": [
        {
          "body": {
            "text": "Hi Bob,\n\nCould you send me the figures for today?"
          },
          "from": "alice@company.com",
          "sent_at": "2020-01-09T16:34:45Z",
          "signature": {
            "text": "Thanks,\nAlice"
          },
          "subject": {
            "text": "Figures Request"
          },
          "to": [
            "bob@organisation.org"
          ]
        }
      ],
      "timestamp": "2013-09-12T20:01:20.000000+00:00",
      "user_properties": {
        "string:City": "London"
      }
    },
    {
      "messages": [
        {
          "body": {
            "text": "Alice,\n\nHere are the figures for today."
          },
          "from": "bob@organisation.org",
          "sent_at": "2020-01-09T16:44:45Z",
          "signature": {
            "text": "Regards,\nBob"
          },
          "subject": {
            "text": "Re: Figures Request"
          },
          "to": [
            "alice@company.com"
          ]
        }
      ],
      "timestamp": "2011-12-12T10:04:30.000000+00:00",
      "user_properties": {
        "string:City": "Bucharest"
      }
    }
  ],
  "threshold": 0.25
}'

Specify the model version to query for predictions in the request path. You can use the integer version number, or the special values live or staging to query the current Live or Staging model version.
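
As a minimal sketch of how the version segment of the route could be assembled, the Python helper below accepts an integer version or the special values live and staging and rejects anything else. The function name predict_url and its parameters are hypothetical and are not part of any official client.

```python
from urllib.parse import quote

# Special values accepted by the predict route, per the text above.
SPECIAL_VERSIONS = {"live", "staging"}


def predict_url(endpoint: str, project: str, dataset: str, version) -> str:
    """Build the predict route for an integer version or 'live'/'staging'."""
    is_int = isinstance(version, int) and not isinstance(version, bool)
    is_special = isinstance(version, str) and version in SPECIAL_VERSIONS
    if not (is_int or is_special):
        raise ValueError(f"unsupported model version: {version!r}")
    # Percent-encode the project and dataset names so they are safe in a path.
    path = "/".join(quote(part, safe="") for part in (project, dataset))
    return f"https://{endpoint}/api/v1/datasets/{path}/labellers/{version}/predict"
```

For example, predict_url("example.invalid", "project1", "collateral", "live") yields a URL ending in /labellers/live/predict.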

Request Format

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| documents | array<Comment> | yes | A batch of at most 4096 documents, in the format described in the Comment Reference. Larger batches are faster (per document) than smaller ones. |
| threshold | number | no | The confidence threshold to filter the label results by. A number between 0.0 and 1.0. 0.0 will include all results. Set to "auto" to use auto-thresholds. If not set, the default threshold of 0.25 will be used. |
| labels | array<Label> | no | A list of requested labels to be returned, optionally with label-specific thresholds. |
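
The limits in the table above (at most 4096 documents per request, a threshold between 0.0 and 1.0 or "auto") can be enforced on the client before sending. The sketch below is one way to do that in Python; build_request_bodies is a hypothetical helper name, and the documents are passed through unchanged.

```python
from typing import List, Optional, Union

MAX_BATCH_SIZE = 4096  # documented upper bound on documents per request


def build_request_bodies(
    documents: List[dict],
    threshold: Optional[Union[float, str]] = None,
) -> List[dict]:
    """Split documents into request bodies of at most MAX_BATCH_SIZE each."""
    if threshold is not None and threshold != "auto":
        if isinstance(threshold, str) or not 0.0 <= threshold <= 1.0:
            raise ValueError('threshold must be in [0.0, 1.0] or "auto"')
    bodies = []
    for start in range(0, len(documents), MAX_BATCH_SIZE):
        body = {"documents": documents[start:start + MAX_BATCH_SIZE]}
        if threshold is not None:
            # Omitting the field leaves the server default (0.25) in effect.
            body["threshold"] = threshold
        bodies.append(body)
    return bodies
```

Because larger batches are faster per document, the helper fills each batch to the limit rather than splitting evenly.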

Where Label has the following format:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| name | array<string> | yes | The name of the label to be returned, formatted as a list of hierarchical labels. For instance, the label "Parent Label > Child Label" will have the format ["Parent Label", "Child Label"]. |
| threshold | number | no | The confidence threshold to use for the label. If not specified, will default to the threshold specified at the top-level. |
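
To illustrate the hierarchical name format, the following Python sketch converts a display name such as "Parent Label > Child Label" into the Label object described above. The helper name label_request is hypothetical, and splitting on ">" is an assumption that only holds when label names themselves contain no ">" character.

```python
from typing import Optional


def label_request(display_name: str, threshold: Optional[float] = None) -> dict:
    """Convert 'Parent Label > Child Label' into a request Label object."""
    # Assumption: '>' only ever appears as the hierarchy separator.
    parts = [part.strip() for part in display_name.split(">")]
    if not all(parts):
        raise ValueError(f"empty label level in {display_name!r}")
    label = {"name": parts}
    if threshold is not None:
        # Without this field the top-level threshold applies to the label.
        label["threshold"] = threshold
    return label
```

A request body could then carry "labels": [label_request("Parent Label > Child Label", 0.5)] alongside the documents.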

Response Format

| Name | Type | Description |
| --- | --- | --- |
| status | string | ok if the request is successful, or error in case of an error. See the Overview to learn more about error responses. |
| predictions | array<array<Label>> | A list of array<Label> in the same order as the comments in the request, where each Label has the format described here. |
| entities | array<array<Entity>> | A list of array<Entity> in the same order as the comments in the request, where each Entity has the format described here. |
| label_properties | array<LabelProperty> | An array containing predicted label properties for this comment, where each LabelProperty has the format described here. |
| model | Model | Information about the model that was used to make the predictions, in the format described here. |
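
Because predictions and entities come back in the same order as the submitted documents, a client can line them up by position. The Python sketch below does this without inspecting the Label or Entity objects themselves, since their fields are defined elsewhere; pair_predictions is a hypothetical helper name.

```python
from typing import List, Tuple


def pair_predictions(documents: List[dict], response: dict) -> List[Tuple[dict, list, list]]:
    """Return (document, labels, entities) triples in request order."""
    if response.get("status") != "ok":
        raise RuntimeError(f"prediction request failed: {response}")
    predictions = response["predictions"]
    # Fall back to empty entity lists if the field is absent (an assumption).
    entities = response.get("entities") or [[] for _ in documents]
    if not (len(predictions) == len(entities) == len(documents)):
        raise ValueError("response arrays do not line up with the request")
    return list(zip(documents, predictions, entities))
```

The length check guards against silently pairing a prediction with the wrong document if the response is truncated.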

Get predictions for a pinned model for raw emails

POST /api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict-raw-emails
Permissions required: View labels, View sources
Billable Operation

You will be charged 1 AI unit per created comment, or per updated comment (based on the email's Message ID) if its text was modified.

curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict-raw-emails' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
  "documents": [
    {
      "raw_email": {
        "body": {
          "plain": "Hi Bob,\n\nCould you send me the figures for today?\n\nThanks,\nAlice"
        },
        "headers": {
          "parsed": {
            "Date": "Thu, 09 Jan 2020 16:34:45 +0000",
            "From": "alice@company.com",
            "Message-ID": "abcdef@company.com",
            "References": "<01234@company.com> <56789@company.com>",
            "Subject": "Figures Request",
            "To": "bob@organisation.org"
          }
        }
      },
      "user_properties": {
        "string:City": "London"
      }
    },
    {
      "raw_email": {
        "body": {
          "html": "<p>Alice,</p><p>Here are the figures for today.</p><p>Regards,<br/>Bob</p>"
        },
        "headers": {
          "raw": "Message-ID: 012345@company.com\nDate: Thu, 09 Jan 2020 16:44:45 +0000\nSubject: Re: Figures Request\nFrom: bob@organisation.org\nTo: alice@company.com"
        }
      },
      "user_properties": {
        "string:City": "Bucharest"
      }
    }
  ],
  "include_comments": false,
  "threshold": 0.25,
  "transform_tag": "generic.0.CONVKER5"
}'

Specify the model version to query for predictions in the request path. You can use the integer version number, or the special values live or staging to query the current Live or Staging model version.

Request Format

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| transform_tag | string | yes | A tag specifying how the raw data should be processed. |
| documents | array<Document> | yes | A batch of at most 4096 documents in the format described below. Larger batches are faster (per document) than smaller ones. |
| threshold | number | no | The confidence threshold to filter the label results by. A number between 0.0 and 1.0. 0.0 will include all results. Set to "auto" to use auto-thresholds. If not set, the default threshold of 0.25 will be used. |
| labels | array<Label> | no | A list of requested labels to be returned, optionally with label-specific thresholds. |
| include_comments | boolean | no | If set to true, the comments parsed from the emails will be returned in the response body. |

Where Document has the following format:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| raw_email | RawEmail | yes | Email data, in the format described here. |
| user_properties | map<string, string \| number> | no | Any user-defined metadata that applies to the comment. The format is described here. |

Note: Some user properties are generated based on the email content. If these conflict with uploaded user properties, the request will fail with 422 Unprocessable Entity.
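
The Document and request formats above can be assembled programmatically. The Python sketch below mirrors the structure of the first document in the curl example (plain body plus parsed headers); the helper names raw_email_document and raw_email_request are hypothetical, and only fields that appear in this section are used.

```python
from typing import Mapping, Optional, Sequence


def raw_email_document(
    plain_body: str,
    parsed_headers: Mapping[str, str],
    user_properties: Optional[Mapping[str, object]] = None,
) -> dict:
    """Build one Document with a plain-text body and parsed headers."""
    document = {
        "raw_email": {
            "body": {"plain": plain_body},
            "headers": {"parsed": dict(parsed_headers)},
        }
    }
    if user_properties:
        # Keys follow the "string:City" style shown in the example request.
        document["user_properties"] = dict(user_properties)
    return document


def raw_email_request(
    documents: Sequence[dict],
    transform_tag: str,
    include_comments: bool = False,
) -> dict:
    """Build the request body; transform_tag is a required field."""
    if not transform_tag:
        raise ValueError("transform_tag is required")
    return {
        "transform_tag": transform_tag,
        "documents": list(documents),
        "include_comments": include_comments,
    }
```

Since conflicting user properties cause a 422 Unprocessable Entity, it may be safest to upload only properties that cannot be derived from the email itself.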

Where Label has the following format:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| name | array<string> | yes | The name of the label to be returned, formatted as a list of hierarchical labels. For instance, the label "Parent Label > Child Label" will have the format ["Parent Label", "Child Label"]. |
| threshold | number | no | The confidence threshold to use for the label. If not specified, will default to the threshold specified at the top-level. |

Response Format

| Name | Type | Description |
| --- | --- | --- |
| status | string | ok if the request is successful, or error in case of an error. See the Overview to learn more about error responses. |
| comments | array<Comment> | A list of comments parsed from the uploaded raw emails, in the format described in the Comment Reference. Only returned if you set include_comments in the request. |
| predictions | array<array<Label>> | A list of array<Label> in the same order as the comments in the request, where each Label has the format described here. |
| entities | array<array<Entity>> | A list of array<Entity> in the same order as the comments in the request, where each Entity has the format described here. |
| label_properties | array<LabelProperty> | An array containing predicted label properties for this comment, where each LabelProperty has the format described here. |
| model | Model | Information about the model that was used to make the predictions, in the format described here. |

Note: For large requests, this endpoint may take longer to respond. You should increase your client timeout.
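
One way to apply a longer client timeout is sketched below using only the Python standard library. The helper names and the 120-second default are illustrative assumptions, not recommended values; the headers match those in the curl example.

```python
import json
import urllib.request


def make_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare a JSON POST request with bearer-token authentication."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_seconds: float = 120.0) -> dict:
    """Send the request, waiting up to timeout_seconds for the response."""
    # 120 seconds is an arbitrary example; tune it to your batch sizes.
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)
```

Separating request construction from sending keeps the timeout an explicit, per-call decision.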

Get predictions for a pinned model by comment id

POST /api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict-comments
Permissions required: View labels, View sources

curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<project>/<dataset_name>/labellers/<version>/predict-comments' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
  "threshold": 0.25,
  "uids": [
    "18ba5ce699f8da1f.0001",
    "18ba5ce699f8da1f.0002"
  ]
}'

Specify the model version to query for predictions in the request path. You can use the integer version number, or the special values live or staging to query the current Live or Staging model version.

Request Format

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| uids | array<string> | yes | A list of at most 4096 combined source IDs and comment IDs, each in the format source_id.comment_id. Sources don't need to belong to the current dataset, so you can request predictions of comments for a source in a different (or no) dataset. Larger lists are faster (per comment) than smaller ones. |
| threshold | number | no | The confidence threshold to filter the label results by. A number between 0.0 and 1.0. 0.0 will include all results. Set to "auto" to use auto-thresholds. If not set, the default threshold of 0.25 will be used. |
| labels | array<Label> | no | A list of requested labels to be returned, optionally with label-specific thresholds. |
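
The uids field combines a source ID and a comment ID with a dot. As a small Python sketch of building the request body from (source_id, comment_id) pairs, with the hypothetical helper names comment_uid and predict_comments_body:

```python
from typing import Iterable, Optional, Tuple

MAX_UIDS = 4096  # documented upper bound on uids per request


def comment_uid(source_id: str, comment_id: str) -> str:
    """Join the two IDs in the documented source_id.comment_id format."""
    return f"{source_id}.{comment_id}"


def predict_comments_body(
    pairs: Iterable[Tuple[str, str]],
    threshold: Optional[float] = None,
) -> dict:
    """Build a predict-comments request body from (source_id, comment_id) pairs."""
    uids = [comment_uid(source_id, comment_id) for source_id, comment_id in pairs]
    if len(uids) > MAX_UIDS:
        raise ValueError(f"at most {MAX_UIDS} uids are allowed per request")
    body = {"uids": uids}
    if threshold is not None:
        body["threshold"] = threshold
    return body
```

Passing the two pairs from the curl example reproduces its request body.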

Where Label has the following format:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| name | array<string> | yes | The name of the label to be returned, formatted as a list of hierarchical labels. For instance, the label "Parent Label > Child Label" will have the format ["Parent Label", "Child Label"]. |
| threshold | number | no | The confidence threshold to use for the label. If not specified, will default to the threshold specified at the top-level. |

Response Format

| Name | Type | Description |
| --- | --- | --- |
| status | string | ok if the request is successful, or error in case of an error. See the Overview to learn more about error responses. |
| predictions | array<Prediction> | A list of predictions in the format described below. |
| model | Model | Information about the model that was used to make the predictions, in the format described here. |

Where Prediction has the following format:

| Name | Type | Description |
| --- | --- | --- |
| uid | string | A combined source_id and comment_id in the format of source_id.comment_id. |
| labels | array<Label> | An array containing predicted labels for this comment, where Label has the format described here. |
| entities | array<Entity> | An array containing predicted entities for this comment, where Entity has the format described here. |
| label_properties | array<LabelProperty> | An array containing predicted label properties for this comment, where each LabelProperty has the format described here. |
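
Since each Prediction carries its own uid, results can be looked up by comment rather than by position. The Python sketch below indexes a response this way; the helper names are hypothetical, and splitting the uid at the first dot assumes that source IDs never contain a dot.

```python
from typing import Dict, Tuple


def predictions_by_uid(response: dict) -> Dict[str, dict]:
    """Index the Prediction objects of a successful response by their uid."""
    if response.get("status") != "ok":
        raise RuntimeError(f"prediction request failed: {response}")
    return {prediction["uid"]: prediction for prediction in response["predictions"]}


def split_uid(uid: str) -> Tuple[str, str]:
    """Split 'source_id.comment_id' into its two parts."""
    # Assumption: the source ID has no dots, so the first dot is the separator.
    source_id, comment_id = uid.split(".", 1)
    return source_id, comment_id
```

Comparing the returned keys with the requested uids is a quick way to spot comments that were not returned.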

Note: For large requests, this endpoint may take longer to respond. You should increase your client timeout.

Get model validation statistics

GET /api/v1/datasets/<project>/<dataset_name>/labellers/<version>/validation
Permissions required: View labels, View sources

curl -X GET 'https://<my_api_endpoint>/api/v1/datasets/project1/collateral/labellers/live/validation' \
    -H "Authorization: Bearer $REINFER_TOKEN"

This route returns statistics on how well a model is performing. The same statistics can be viewed on the Validation page. A model's statistics can be requested with its integer version number. You can also use the special values live and staging to retrieve statistics for the current Live or Staging model versions, or the special value latest for the most recently available model version.

Although this endpoint accepts both pinned and unpinned model versions, we recommend querying either pinned model versions or the special value latest, as statistics are not guaranteed to be available for unpinned model versions.

The response validation object contains the following fields:

| Name | Type | Description |
| --- | --- | --- |
| mean_average_precision_safe | float | Mean Average Precision score (between 0.0 and 1.0). This field will be null if MAP is unavailable. |
| num_labels | number | Number of labels in the taxonomy (at the time the model version was pinned). |
| labels | array<Label> | List of labels in the taxonomy (at the time the model version was pinned). Note that, as the response example demonstrates, parent labels are returned as a separate label in addition to being returned as a part of child labels. |
| num_reviewed_comments | number | Number of reviewed comments in the dataset (at the time the model version was pinned). |
| version | number | Model version. |
| num_amber_labels | number | Number of labels in amber warning state. |
| num_red_labels | number | Number of labels in red warning state. |
| dataset_score | number | Overall dataset score, between 0 and 100. |
| dataset_quality | string | One of "poor", "average", "good", "excellent", representing the overall dataset quality rank. Can be null if there is not enough data. |
| balance | float | A measure of the similarity between reviewed and unreviewed comments (between 0.0 and 1.0). Can be null if there is not enough data. |
| balance_quality | string | One of "poor", "average", "good", "excellent", representing the balance quality rank. Can be null if there is not enough data. |
| coverage | float | A fractional value of label coverage in the dataset (between 0.0 and 1.0). Can be null if there is not enough data. |
| coverage_quality | string | One of "poor", "average", "good", "excellent", representing the coverage quality rank. Can be null if there is not enough data. |
| all_labels_quality | string | One of "poor", "average", "good", "excellent", representing the all labels quality rank. Can be null if there is not enough data. |
| underperforming_labels_quality | string | One of "poor", "average", "good", "excellent", representing the underperforming labels quality rank. Can be null if there is not enough data. |

Where Label has the following format:

| Name | Type | Description |
| --- | --- | --- |
| name | string | The name of the label, formatted as a string. |
| parts | array<string> | The name of the label, formatted as a list of hierarchical labels. For instance, the label "Parent Label > Child Label" will have the format ["Parent Label", "Child Label"]. |
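
As an illustration of consuming the validation object, the Python sketch below collects the quality fields that fall below a chosen rank or are null for lack of data. The helper name below_quality is hypothetical, and ordering the ranks as poor < average < good < excellent is an assumption based on the order in which they are listed above.

```python
from typing import Dict, Optional

# Assumed ordering, from worst to best.
QUALITY_RANKS = ["poor", "average", "good", "excellent"]
QUALITY_FIELDS = [
    "dataset_quality",
    "balance_quality",
    "coverage_quality",
    "all_labels_quality",
    "underperforming_labels_quality",
]


def below_quality(validation: dict, minimum: str = "good") -> Dict[str, Optional[str]]:
    """Return quality fields ranked below `minimum`, or null (not enough data)."""
    floor = QUALITY_RANKS.index(minimum)
    flagged: Dict[str, Optional[str]] = {}
    for field in QUALITY_FIELDS:
        value = validation.get(field)
        if value is None:
            flagged[field] = None  # null means there is not enough data
        elif QUALITY_RANKS.index(value) < floor:
            flagged[field] = value
    return flagged
```

An empty result means every quality rank meets the chosen minimum.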