Scenarios for Evaluations
Scenarios let you evaluate your LLMs and agents. This guide will help you decide the proper format for your scenario based on the type of evaluation you want to run.
Formatting scenario data
The format of your inputs should match the format expected by your model. The format of your results depends on the type of evaluation you want to run on the scenario.
Classification
In classification scenarios, the results correspond to the expected category or label that the model should assign to the input. For example, a point in a classification scenario could look like the following:
{
"input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
"result": "rewards"
}
Here, "rewards" indicates that the input should be classified into the "rewards" category. See the Get started with Classification page for more on classification evaluations.
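As a rough illustration of how such data points could be scored, the sketch below compares a classifier's predicted label with the expected result for each point and reports accuracy. The function names, the toy keyword classifier, and the second data point are hypothetical and only serve the example; they are not part of any platform API.

```python
from typing import Callable


def classification_accuracy(scenario: list[dict], classify: Callable[[str], str]) -> float:
    """Return the fraction of points whose predicted label equals the expected result."""
    if not scenario:
        raise ValueError("scenario must contain at least one data point")
    correct = sum(1 for point in scenario if classify(point["input"]) == point["result"])
    return correct / len(scenario)


# Hypothetical scenario: the first point comes from the example above,
# the second is invented for illustration.
scenario = [
    {
        "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
        "result": "rewards",
    },
    {"input": "Where is my order?", "result": "shipping"},
]


def keyword_classifier(text: str) -> str:
    # Toy stand-in for a real model: labels anything mentioning "rewards".
    return "rewards" if "rewards" in text.lower() else "other"


accuracy = classification_accuracy(scenario, keyword_classifier)
print(f"accuracy: {accuracy:.2f}")
```

The toy classifier gets the first point right and the second wrong, so half of the points match their expected labels.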
Retrieval
For a retrieval evaluation, each result is a list of one or more viable document IDs that should be returned for the associated input, like the following:
{
"input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
"result": ["35a4fd5b-453e-4ca6-9536-f20db7303344"]
}
See our Retrieval Testing guide for more details on setting up scenarios for retrieval evaluations.
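To make the role of the result list concrete, here is a minimal sketch that checks a retriever's returned IDs against the expected IDs for one data point. It reports both a hit flag (at least one expected ID was returned) and recall (the share of expected IDs returned); which of these matters is an assumption that depends on your use case. The function name and the second retrieved ID are hypothetical.

```python
def retrieval_scores(expected_ids: list[str], retrieved_ids: list[str]) -> dict:
    """Compare retrieved document IDs with the expected IDs from a scenario point."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("result must list at least one document ID")
    found = expected & set(retrieved_ids)
    return {
        "hit": bool(found),                     # any expected ID was retrieved
        "recall": len(found) / len(expected),   # share of expected IDs retrieved
    }


point = {
    "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
    "result": ["35a4fd5b-453e-4ca6-9536-f20db7303344"],
}

# Hypothetical retriever output: the expected ID plus an unrelated, made-up one.
retrieved = ["35a4fd5b-453e-4ca6-9536-f20db7303344", "hypothetical-other-doc-id"]

scores = retrieval_scores(point["result"], retrieved)
print(scores)
```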
Generation
Evaluation of generative models can be either referenced or reference-free. Referenced evaluations compare the generative model's output to one or more references; in such cases, the result field should contain the reference(s). For example:
{
"input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
"result": "With WebBizz Rewards, customers can earn points with each purchase and avail exclusive discounts."
}
When performing referenced evaluations, the reference in the result field will be compared against the model's outputs. The content of the reference depends on your use case and can vary from written responses to edited versions of the model's outputs.
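As one possible illustration of a referenced comparison, the sketch below scores a model output against the reference(s) using character-level similarity from Python's standard difflib module. This is a stand-in technique chosen for simplicity, not the metric any particular evaluation uses. It also assumes, hypothetically, that the result field may hold either a single string or a list of strings; the function names are invented for the example.

```python
from difflib import SequenceMatcher


def as_references(result) -> list[str]:
    """Normalize a result field to a list of reference strings (assumed shapes: str or list)."""
    return [result] if isinstance(result, str) else list(result)


def best_reference_similarity(output: str, result) -> float:
    """Return the highest similarity ratio (0.0 to 1.0) between the output and any reference."""
    references = as_references(result)
    if not references:
        raise ValueError("result must contain at least one reference")
    return max(
        SequenceMatcher(None, output.lower(), ref.lower()).ratio()
        for ref in references
    )


point = {
    "input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
    "result": "With WebBizz Rewards, customers can earn points with each purchase and avail exclusive discounts.",
}

# Hypothetical model output used only to exercise the comparison.
model_output = "Customers earn points with each purchase and get exclusive discounts."

score = best_reference_similarity(model_output, point["result"])
print(f"similarity to closest reference: {score:.2f}")
```

Surface similarity of this kind rewards wording overlap rather than correctness, so it is best read as a placeholder for whatever referenced metric your evaluation actually applies.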
For reference-free evaluations, the result field is not strictly necessary, so any placeholder value can be provided. For example:
{
"input": "Can you explain how the WebBizz Rewards loyalty program works and its benefits?",
"result": "<YOUR_PLACEHOLDER_STRING_HERE>"
}
Get started on setting up such scenarios with our Generation evaluation guide.
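If you are assembling a reference-free scenario from a plain list of inputs, a small helper along these lines could fill in the placeholder for you. The helper name is hypothetical, and the output is simply a list of data points in the same shape as the example above.

```python
import json

PLACEHOLDER = "<YOUR_PLACEHOLDER_STRING_HERE>"


def build_reference_free_scenario(inputs: list[str], placeholder: str = PLACEHOLDER) -> list[dict]:
    """Pair every input with the same placeholder result."""
    return [{"input": text, "result": placeholder} for text in inputs]


points = build_reference_free_scenario(
    ["Can you explain how the WebBizz Rewards loyalty program works and its benefits?"]
)
print(json.dumps(points, indent=2))
```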
Simulations
In simulations, scenarios are used to specify the persona your simulation driver should assume (input) and the expected behavior of the simulation target (result). For an in-context example, please see this section of the "Simulations" page.