Evaluation Runs

Run an Evaluation Test Case
POST /v2/gen-ai/evaluation_runs

Retrieve Results of an Evaluation Run
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results

Retrieve Information About an Existing Evaluation Run
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}

Retrieve Results of an Evaluation Run Prompt
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}
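The paths above can be assembled programmatically. The following sketch builds the method, URL, and headers for each endpoint; the base URL and bearer-token scheme are assumptions here, so check your provider's authentication documentation before use.

```python
# Hypothetical request builder for the evaluation-runs endpoints.
# API_BASE and the Bearer auth header are assumptions, not part of the spec above.

API_BASE = "https://api.digitalocean.com"  # assumed base URL

def evaluation_run_request(token, run_uuid=None, results=False, prompt_id=None):
    """Return (method, url, headers) for an evaluation-runs call.

    No run_uuid -> POST (run a test case); otherwise GET the run,
    its results, or a single prompt's results.
    """
    url = f"{API_BASE}/v2/gen-ai/evaluation_runs"
    method = "POST" if run_uuid is None else "GET"
    if run_uuid is not None:
        url += f"/{run_uuid}"
        if results:
            url += "/results"
            if prompt_id is not None:
                url += f"/{prompt_id}"
    return method, url, {"Authorization": f"Bearer {token}"}

method, url, _ = evaluation_run_request("TOKEN", run_uuid="abc-123",
                                        results=True, prompt_id=7)
# GET .../v2/gen-ai/evaluation_runs/abc-123/results/7
```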
API Evaluation Metric

APIEvaluationMetric (object)

description (string, optional)

inverted (boolean, optional)
    If true, the metric is inverted, meaning that a lower value is better.

metric_name (string, optional)

metric_type (enum, optional)
    One of:
    "METRIC_TYPE_UNSPECIFIED"
    "METRIC_TYPE_GENERAL_QUALITY"
    "METRIC_TYPE_RAG_AND_TOOL"

metric_uuid (string, optional)

metric_value_type (enum, optional)
    One of:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

range_max (number, optional, format: float)
    The maximum value for the metric.

range_min (number, optional, format: float)
    The minimum value for the metric.
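The `inverted`, `range_min`, and `range_max` fields together determine how to compare raw metric values. A minimal sketch, assuming a client-side normalization that is not part of the API itself:

```python
# Illustrative only: score a raw value against an APIEvaluationMetric
# definition, honoring `inverted` (lower is better) and the declared range.
# The [0, 1] normalization is our own convention, not defined by the schema.

def normalized_score(metric: dict, value: float) -> float:
    lo = metric.get("range_min", 0.0)
    hi = metric.get("range_max", 1.0)
    # Clamp into the declared range, then scale to [0, 1].
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    # For inverted metrics a lower raw value is better, so flip the scale.
    return 1.0 - frac if metric.get("inverted") else frac

latency = {"metric_name": "latency_ms", "inverted": True,
           "range_min": 0.0, "range_max": 1000.0}
normalized_score(latency, 250.0)  # 0.75: low latency scores well
```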
API Evaluation Metric Result

APIEvaluationMetricResult (object)

error_description (string, optional)
    Error description if the metric could not be calculated.

metric_name (string, optional)
    Metric name.

metric_value_type (enum, optional)
    One of:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

number_value (number, optional, format: double)
    The value of the metric as a number.

reasoning (string, optional)
    Reasoning for the metric result.

string_value (string, optional)
    The value of the metric as a string.
API Evaluation Prompt

APIEvaluationPrompt (object)

ground_truth (string, optional)
    The ground truth for the prompt.

input (string, optional)

input_tokens (string, optional, format: uint64)
    The number of input tokens used in the prompt.

output (string, optional)

output_tokens (string, optional, format: uint64)
    The number of output tokens used in the prompt.

prompt_chunks (array of object, optional)
    The list of prompt chunks.

    chunk_usage_pct (number, optional, format: double)
        The usage percentage of the chunk.

    chunk_used (boolean, optional)
        Indicates if the chunk was used in the prompt.

    index_uuid (string, optional)
        The index UUID (Knowledge Base) of the chunk.

    source_name (string, optional)
        The source name for the chunk, e.g., the file name or document title.

    text (string, optional)
        Text content of the chunk.

prompt_id (number, optional, format: int64)
    Prompt ID.

prompt_level_metric_results (array of APIEvaluationMetricResult, optional)
    The metric results for the prompt.
API Evaluation Run

APIEvaluationRun (object)

agent_deleted (boolean, optional)
    Whether the agent has been deleted.

agent_name (string, optional)
    Agent name.

agent_uuid (string, optional)
    Agent UUID.

agent_version_hash (string, optional)
    Agent version hash.

agent_workspace_uuid (string, optional)
    Agent workspace UUID.

created_by_user_email (string, optional)

created_by_user_id (string, optional, format: uint64)

error_description (string, optional)
    The error description.

evaluation_run_uuid (string, optional)
    Evaluation run UUID.

evaluation_test_case_workspace_uuid (string, optional)
    Evaluation test case workspace UUID.

finished_at (string, optional, format: date-time)
    Run end time.

pass_status (boolean, optional)
    The pass status of the evaluation run based on the star metric.

queued_at (string, optional, format: date-time)
    Run queued time.

run_level_metric_results (array of APIEvaluationMetricResult, optional)

run_name (string, optional)
    Run name.

star_metric_result (APIEvaluationMetricResult, optional)

started_at (string, optional, format: date-time)
    Run start time.

status (enum, optional)
    Evaluation run statuses. One of:
    "EVALUATION_RUN_STATUS_UNSPECIFIED"
    "EVALUATION_RUN_QUEUED"
    "EVALUATION_RUN_RUNNING_DATASET"
    "EVALUATION_RUN_EVALUATING_RESULTS"
    "EVALUATION_RUN_CANCELLING"
    "EVALUATION_RUN_CANCELLED"
    "EVALUATION_RUN_SUCCESSFUL"
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
    "EVALUATION_RUN_FAILED"

test_case_description (string, optional)
    Test case description.

test_case_name (string, optional)
    Test case name.

test_case_uuid (string, optional)
    Test case UUID.

test_case_version (number, optional, format: int64)
    Test case version.
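A client polling a run needs to know when to stop. The sketch below infers the terminal states from the status enum above; treat that partition (which states are final) as an assumption rather than a documented guarantee:

```python
# Illustrative status handling for an APIEvaluationRun payload.
# TERMINAL is our reading of the status enum: states after which the
# run will no longer change.

TERMINAL = {"EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL",
            "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"}

def run_outcome(run: dict):
    """Return (finished, passed) for an evaluation run payload."""
    status = run.get("status", "EVALUATION_RUN_STATUS_UNSPECIFIED")
    if status not in TERMINAL:
        return False, None  # still queued/running: keep polling
    return True, run.get("pass_status")

run_outcome({"status": "EVALUATION_RUN_SUCCESSFUL", "pass_status": True})
# (True, True)
```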