Evaluation Runs

Run an Evaluation Test Case
POST /v2/gen-ai/evaluation_runs

Retrieve Results of an Evaluation Run
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results

Retrieve Information About an Existing Evaluation Run
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}

Retrieve Results of an Evaluation Run Prompt
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}
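The paths above can be assembled programmatically. The following sketch builds the method, URL, and headers for each endpoint; the base URL and bearer-token scheme are assumptions here, so check your provider's authentication documentation before use.

```python
# Hypothetical request builder for the evaluation-runs endpoints.
# API_BASE and the Bearer auth header are assumptions, not part of the spec above.

API_BASE = "https://api.digitalocean.com"  # assumed base URL

def evaluation_run_request(token, run_uuid=None, results=False, prompt_id=None):
    """Return (method, url, headers) for an evaluation-runs call.

    No run_uuid -> POST (run a test case); otherwise GET the run,
    its results, or a single prompt's results.
    """
    url = f"{API_BASE}/v2/gen-ai/evaluation_runs"
    method = "POST" if run_uuid is None else "GET"
    if run_uuid is not None:
        url += f"/{run_uuid}"
        if results:
            url += "/results"
            if prompt_id is not None:
                url += f"/{prompt_id}"
    return method, url, {"Authorization": f"Bearer {token}"}

method, url, _ = evaluation_run_request("TOKEN", run_uuid="abc-123",
                                        results=True, prompt_id=7)
# GET .../v2/gen-ai/evaluation_runs/abc-123/results/7
```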
API Evaluation Metric

APIEvaluationMetric (object)

description (string, optional)

inverted (boolean, optional)
    If true, the metric is inverted, meaning that a lower value is better.

metric_name (string, optional)

metric_type (enum, optional)
    One of:
    "METRIC_TYPE_UNSPECIFIED"
    "METRIC_TYPE_GENERAL_QUALITY"
    "METRIC_TYPE_RAG_AND_TOOL"

metric_uuid (string, optional)

metric_value_type (enum, optional)
    One of:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

range_max (number, optional, format: float)
    The maximum value for the metric.

range_min (number, optional, format: float)
    The minimum value for the metric.
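The `inverted`, `range_min`, and `range_max` fields together determine how to compare raw metric values. A minimal sketch, assuming a client-side normalization that is not part of the API itself:

```python
# Illustrative only: score a raw value against an APIEvaluationMetric
# definition, honoring `inverted` (lower is better) and the declared range.
# The [0, 1] normalization is our own convention, not defined by the schema.

def normalized_score(metric: dict, value: float) -> float:
    lo = metric.get("range_min", 0.0)
    hi = metric.get("range_max", 1.0)
    # Clamp into the declared range, then scale to [0, 1].
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    # For inverted metrics a lower raw value is better, so flip the scale.
    return 1.0 - frac if metric.get("inverted") else frac

latency = {"metric_name": "latency_ms", "inverted": True,
           "range_min": 0.0, "range_max": 1000.0}
normalized_score(latency, 250.0)  # 0.75: low latency scores well
```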
API Evaluation Metric Result

APIEvaluationMetricResult (object)

error_description (string, optional)
    Error description if the metric could not be calculated.

metric_name (string, optional)
    Metric name.

metric_value_type (enum, optional)
    One of:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

number_value (number, optional, format: double)
    The value of the metric as a number.

reasoning (string, optional)
    Reasoning for the metric result.

string_value (string, optional)
    The value of the metric as a string.
API Evaluation Prompt

APIEvaluationPrompt (object)

ground_truth (string, optional)
    The ground truth for the prompt.

input (string, optional)

input_tokens (string, optional, format: uint64)
    The number of input tokens used in the prompt.

output (string, optional)

output_tokens (string, optional, format: uint64)
    The number of output tokens used in the prompt.

prompt_chunks (array of object, optional)
    The list of prompt chunks.

    chunk_usage_pct (number, optional, format: double)
        The usage percentage of the chunk.

    chunk_used (boolean, optional)
        Indicates if the chunk was used in the prompt.

    index_uuid (string, optional)
        The index UUID (Knowledge Base) of the chunk.

    source_name (string, optional)
        The source name for the chunk, e.g., the file name or document title.

    text (string, optional)
        Text content of the chunk.

prompt_id (number, optional, format: int64)
    Prompt ID.

prompt_level_metric_results (array of APIEvaluationMetricResult, optional)
    The metric results for the prompt.
API Evaluation Run

APIEvaluationRun (object)

agent_deleted (boolean, optional)
    Whether the agent has been deleted.

agent_name (string, optional)
    Agent name.

agent_uuid (string, optional)
    Agent UUID.

agent_version_hash (string, optional)
    Agent version hash.

agent_workspace_uuid (string, optional)
    Agent workspace UUID.

created_by_user_email (string, optional)

created_by_user_id (string, optional, format: uint64)

error_description (string, optional)
    The error description.

evaluation_run_uuid (string, optional)
    Evaluation run UUID.

evaluation_test_case_workspace_uuid (string, optional)
    Evaluation test case workspace UUID.

finished_at (string, optional, format: date-time)
    Run end time.

pass_status (boolean, optional)
    The pass status of the evaluation run based on the star metric.

queued_at (string, optional, format: date-time)
    Run queued time.

run_level_metric_results (array of APIEvaluationMetricResult, optional)

run_name (string, optional)
    Run name.

star_metric_result (APIEvaluationMetricResult, optional)

started_at (string, optional, format: date-time)
    Run start time.

status (enum, optional)
    Evaluation run statuses. One of:
    "EVALUATION_RUN_STATUS_UNSPECIFIED"
    "EVALUATION_RUN_QUEUED"
    "EVALUATION_RUN_RUNNING_DATASET"
    "EVALUATION_RUN_EVALUATING_RESULTS"
    "EVALUATION_RUN_CANCELLING"
    "EVALUATION_RUN_CANCELLED"
    "EVALUATION_RUN_SUCCESSFUL"
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
    "EVALUATION_RUN_FAILED"

test_case_description (string, optional)
    Test case description.

test_case_name (string, optional)
    Test case name.

test_case_uuid (string, optional)
    Test case UUID.

test_case_version (number, optional, format: int64)
    Test case version.
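A client polling a run needs to know when to stop. The sketch below infers the terminal states from the status enum above; treat that partition (which states are final) as an assumption rather than a documented guarantee:

```python
# Illustrative status handling for an APIEvaluationRun payload.
# TERMINAL is our reading of the status enum: states after which the
# run will no longer change.

TERMINAL = {"EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL",
            "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"}

def run_outcome(run: dict):
    """Return (finished, passed) for an evaluation run payload."""
    status = run.get("status", "EVALUATION_RUN_STATUS_UNSPECIFIED")
    if status not in TERMINAL:
        return False, None  # still queued/running: keep polling
    return True, run.get("pass_status")

run_outcome({"status": "EVALUATION_RUN_SUCCESSFUL", "pass_status": True})
# (True, True)
```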