Scenarios

What is a Scenario?

In Okareo, a scenario is a structured representation of a dataset used to test, evaluate, and improve large language models (LLMs). Each scenario is a collection of data points, with each point defined by an input and its expected result. These form the backbone of rigorous model evaluations and synthetic data generation efforts.

You can represent each data point as a JSON or dictionary object. Scenarios can be used for both human-authored test sets and machine-generated variants.
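As a minimal sketch of that representation, the snippet below builds two data points as Python dictionaries and serializes them to JSON. The key names `input` and `result` follow the wording above but are an assumption here, as are the example values; check the SDK reference for the exact field names Okareo expects.

```python
import json

# Each data point pairs an input with its expected result.
# The key names "input" and "result" are assumed for illustration.
scenario = [
    {"input": "How do I reset my password?", "result": "account_support"},
    {"input": "Where is my order?", "result": "order_tracking"},
]

# The same data points serialize directly to JSON and back.
as_json = json.dumps(scenario, indent=2)
round_tripped = json.loads(as_json)

print(as_json)
```

Because the structure is plain data, the same list works for hand-written examples and for examples produced by a generator.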

Tip: Create your own scenario and synthetic data in a Colab notebook.

Why Use Scenarios?

Scenarios unlock a wide range of capabilities:

  • Model Evaluation: Use scenarios to evaluate classification, retrieval, and generation models with Okareo’s built-in evaluation framework.
  • Synthetic Data Generation: Generate new test cases using scenario generators, expanding a few examples into robust datasets automatically.
  • Baseline and Golden Dataset Creation: Define benchmarks that enable apples-to-apples comparisons across different models or model versions.

Golden Datasets and Baselines

A critical use of scenarios is in establishing golden datasets and baselines:

  • Golden Datasets are curated sets of scenario examples that capture your ideal model behavior. They serve as a reference point for evaluating future models.
  • Baselines define the minimum expected performance and help you assess improvements over time or across different model options.
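The relationship between a golden dataset and a baseline can be sketched in plain Python. This is an illustration only and does not use Okareo's evaluation framework: the `accuracy` helper, the `candidate_model` stand-in, and the 0.75 threshold are all hypothetical.

```python
from typing import Callable, Dict, List

# Golden dataset: curated input/result pairs that capture ideal behavior.
golden_dataset: List[Dict[str, str]] = [
    {"input": "2 + 2", "result": "4"},
    {"input": "3 * 3", "result": "9"},
    {"input": "10 - 7", "result": "3"},
    {"input": "8 / 2", "result": "4"},
]

# Baseline: the minimum accuracy a model must reach (hypothetical value).
BASELINE_ACCURACY = 0.75


def accuracy(model: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Return the fraction of data points where the model output matches the expected result."""
    correct = sum(1 for point in dataset if model(point["input"]) == point["result"])
    return correct / len(dataset)


def candidate_model(prompt: str) -> str:
    """Toy stand-in for a real model; it answers one of the four prompts incorrectly."""
    canned = {"2 + 2": "4", "3 * 3": "9", "10 - 7": "3", "8 / 2": "5"}
    return canned.get(prompt, "")


score = accuracy(candidate_model, golden_dataset)
meets_baseline = score >= BASELINE_ACCURACY
print(f"accuracy={score:.2f}, meets baseline: {meets_baseline}")
```

Keeping the golden dataset fixed is what makes the score comparable from one model version to the next.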

Benefits

  • Data-Driven Decisions: Make informed choices about model deployment using consistent reference points.
  • Consistency: Ensure evaluations are standardized, repeatable, and reliable.
  • Rapid Iteration: Test and iterate quickly on new models or configurations using the same scenario framework.

Running Experiments with Scenarios

Scenarios support several kinds of experiments in Okareo:

  • Compare Models: Run multiple models against the same scenario set to benchmark performance under identical conditions.

  • Generate Variations: Use LLMs to automatically produce new examples from seed data. For example:

    okareo.generate_scenarios(source_scenario, name, number_examples, generation_type)

  • Evaluate and Iterate: Apply quality checks, review outcomes, and refine scenarios as part of an ongoing loop to optimize model performance.
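To make the "Compare Models" idea concrete, here is a small self-contained sketch that scores two models on the same scenario. It does not call the Okareo SDK; `keyword_model`, `negation_aware_model`, and `run_model` are hypothetical stand-ins for real models and a real evaluation run.

```python
from typing import Callable, Dict, List

# One shared scenario, so both models are tested under identical conditions.
scenario: List[Dict[str, str]] = [
    {"input": "I love this product", "result": "positive"},
    {"input": "This is terrible", "result": "negative"},
    {"input": "Not bad at all", "result": "positive"},
    {"input": "I hate waiting", "result": "negative"},
]

NEGATIVE_WORDS = {"terrible", "hate", "bad"}


def keyword_model(text: str) -> str:
    """Label text negative if it contains any negative keyword."""
    words = set(text.lower().split())
    return "negative" if words & NEGATIVE_WORDS else "positive"


def negation_aware_model(text: str) -> str:
    """Same as keyword_model, but treats the phrase 'not bad' as positive."""
    if "not bad" in text.lower():
        return "positive"
    return keyword_model(text)


def run_model(model: Callable[[str], str], data: List[Dict[str, str]]) -> float:
    """Return the model's accuracy over the scenario."""
    hits = sum(1 for point in data if model(point["input"]) == point["result"])
    return hits / len(data)


models = {
    "keyword_model": keyword_model,
    "negation_aware_model": negation_aware_model,
}
results = {name: run_model(model, scenario) for name, model in models.items()}

for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

Since both scores come from the same data points, any difference reflects the models rather than the test data.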

Getting Started

You can work with scenarios through the Okareo UI or the Python and TypeScript SDKs. A typical workflow has three steps:

  • Define Your Scenarios: Create a list of input and result pairs that capture expected model behavior.
  • Create Scenario Sets: Organize them into reusable sets using the Okareo SDK or UI.
  • Evaluate and Compare: Run models against these sets to monitor performance and track improvements over time.

Note: Try creating and generating scenarios yourself with the companion Jupyter notebook, scenarios.ipynb.