
Scenarios for Evaluations

Scenarios let you evaluate your LLMs and agents. This guide helps you choose the right format for your scenario based on the type of evaluation you want to run.

Formatting scenario data

The format of each input should match the format your model expects. The format of each result depends on the type of evaluation you want to run on the scenario.
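Before uploading a scenario, it can help to sanity-check that every data point has the shape your evaluation expects. The sketch below is a hypothetical helper (`validate_scenario` and `EXPECTED_RESULT_TYPE` are illustrative names, not part of any SDK) and assumes each data point is a Python dict with `input` and `result` keys, mirroring the JSON examples on this page.

```python
from typing import Any

# Hypothetical mapping from evaluation type to the Python type expected in
# each data point's "result" field, following the formats on this page.
EXPECTED_RESULT_TYPE = {
    "classification": str,  # a single category label
    "retrieval": list,      # one or more viable document IDs
    "generation": str,      # a reference response, or a placeholder string
}


def validate_scenario(rows: list[dict[str, Any]], eval_type: str) -> list[str]:
    """Return a list of problems found in the scenario rows (empty if none)."""
    expected = EXPECTED_RESULT_TYPE[eval_type]
    problems = []
    for i, row in enumerate(rows):
        # Every data point needs both an input and a result field.
        missing = {"input", "result"}.difference(row)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
            continue
        result = row["result"]
        if not isinstance(result, expected):
            problems.append(
                f"row {i}: result should be {expected.__name__}, "
                f"got {type(result).__name__}"
            )
        elif eval_type == "retrieval" and len(result) == 0:
            problems.append(f"row {i}: retrieval result needs at least one document ID")
    return problems


print(validate_scenario([{"input": "Where is my order?", "result": "doc-1"}], "retrieval"))
# ['row 0: result should be list, got str']
```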

Classification

In classification scenarios, the result is the expected category or label that the model should assign to the input. For example, a data point in a classification scenario could look like the following:

{
  "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
  "result": "rewards"
}

Here, "rewards" indicates that the input should be classified into the rewards category. See the Get started with Classification page for more on classification evaluations.
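Because a classification result is a single label, one simple way to score a model against such a scenario is exact-match accuracy. The following is a minimal sketch under that assumption; `classification_accuracy` and `toy_classifier` are hypothetical stand-ins, and the second data point is made-up illustration data, not part of the WebBizz example.

```python
from typing import Callable


def classification_accuracy(rows: list[dict], classify: Callable[[str], str]) -> float:
    """Fraction of data points whose predicted label exactly matches "result"."""
    if not rows:
        raise ValueError("scenario has no data points")
    correct = sum(1 for row in rows if classify(row["input"]) == row["result"])
    return correct / len(rows)


# Hypothetical scenario; the second data point is illustrative only.
scenario = [
    {
        "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
        "result": "rewards",
    },
    {"input": "Where is my order?", "result": "shipping"},
]


def toy_classifier(text: str) -> str:
    """Keyword rule standing in for a real model."""
    return "rewards" if "rewards" in text.lower() else "other"


print(classification_accuracy(scenario, toy_classifier))  # 0.5
```

Exact match is the strictest choice; if your labels have synonyms or casing differences, you would normalize them before comparing.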

Retrieval

For a retrieval evaluation, each result is a list of one or more viable document IDs that should be returned for the associated input, like the following:

{
  "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
  "result": ["35a4fd5b-453e-4ca6-9536-f20db7303344"]
}

See our Retrieval Testing guide for more details on setting up scenarios for retrieval evaluations.
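Since each result lists every document ID that counts as a correct retrieval, a common check is whether any of the top-k retrieved IDs appears in that list. This is a sketch of such a hit-at-k check, assuming your retriever returns an ordered list of document IDs; `hit_at_k` and `hit_rate_at_k` are hypothetical names, and the platform's own retrieval metrics may be defined differently.

```python
from typing import Callable


def hit_at_k(retrieved_ids: list[str], viable_ids: list[str], k: int) -> bool:
    """True if any of the top-k retrieved IDs is one of the viable IDs in "result"."""
    viable = set(viable_ids)
    return any(doc_id in viable for doc_id in retrieved_ids[:k])


def hit_rate_at_k(rows: list[dict], retrieve: Callable[[str], list[str]], k: int = 3) -> float:
    """Fraction of data points with at least one viable ID in the top k results."""
    hits = sum(hit_at_k(retrieve(row["input"]), row["result"], k) for row in rows)
    return hits / len(rows)


row = {
    "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
    "result": ["35a4fd5b-453e-4ca6-9536-f20db7303344"],
}
# Hypothetical retriever output, ordered from most to least relevant.
retrieved = ["doc-a", "35a4fd5b-453e-4ca6-9536-f20db7303344", "doc-b"]
print(hit_at_k(retrieved, row["result"], k=1))  # False
print(hit_at_k(retrieved, row["result"], k=2))  # True
```

Listing several viable IDs in `result` makes this check more forgiving: a retrieval counts as a hit if it surfaces any one of them.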

Generation

Evaluations of generative models can be either referenced or reference-free. Referenced evaluations compare the generative model's output to one or more references, and in such cases the result field should contain the reference(s). For example:

{
  "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
  "result": "With WebBizz Rewards, customers can earn points with each purchase and avail exclusive discounts."
}

When performing referenced evaluations, the model's outputs are compared against the reference in the result field. What the reference contains depends on your use case; it can range from hand-written responses to edited versions of the model's own outputs.
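To make the comparison concrete, here is one simple reference-based score: token-overlap F1 between the output and the best-matching reference. This is a swapped-in illustration, not necessarily how your evaluation platform compares outputs; `token_f1` and `best_reference_f1` are hypothetical helpers, and the `result` field is assumed to hold either a single reference string or a list of them.

```python
from __future__ import annotations

import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only, so punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(output: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against one reference."""
    out, ref = _tokens(output), _tokens(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(out) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_reference_f1(output: str, references: str | list[str]) -> float:
    """Score against the closest reference when "result" holds several."""
    refs = [references] if isinstance(references, str) else references
    return max(token_f1(output, ref) for ref in refs)


reference = ("With WebBizz Rewards, customers can earn points with each purchase "
             "and avail exclusive discounts.")
# Hypothetical model output to score.
output = "WebBizz Rewards lets customers earn points on every purchase."
print(round(best_reference_f1(output, reference), 2))  # 0.52
```

Lexical overlap rewards wording close to the reference, so it suits references written in the style you expect; paraphrased but correct answers will score lower.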

For reference-free evaluations, the result field is not used for comparison, so you can provide any placeholder value, e.g.

{
  "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
  "result": "<YOUR_PLACEHOLDER_STRING_HERE>"
}

Get started on setting up such scenarios with our Generation evaluation guide.
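If you have many inputs for a reference-free evaluation, you can generate the data points programmatically. The sketch below assumes the scenario is a list of `input`/`result` dicts, as in the JSON examples above, and fills every `result` with the same placeholder; `reference_free_rows` is a hypothetical helper.

```python
import json

PLACEHOLDER = "<YOUR_PLACEHOLDER_STRING_HERE>"


def reference_free_rows(inputs: list[str], placeholder: str = PLACEHOLDER) -> list[dict[str, str]]:
    """Wrap each input in a data point whose "result" is only a placeholder."""
    return [{"input": text, "result": placeholder} for text in inputs]


rows = reference_free_rows([
    "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
])
print(json.dumps(rows, indent=2))
```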

Simulations

In simulations, a scenario's input specifies the persona your simulation driver should assume, and its result specifies the expected behavior of the simulation target. For a worked example, see this section of the "Simulations" page.