Introduction to Checks and Metrics

Okareo evaluations cover a range of input and output data types, so each evaluation type is measured with its own set of metrics. The following table summarizes the metrics used for each evaluation type.

Evaluation Type	Metrics
Generation	Checks
Simulation	Checks
Retrieval	Precision@k, Recall@k, MRR, MAP
Classification	Accuracy, Precision, Recall, F1
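For readers who want a refresher on the retrieval metrics in the table, here is a minimal sketch of how they are conventionally computed for ranked results. It uses the textbook definitions and is not Okareo's implementation. The function names, the document IDs, and the choice to divide Precision@k by k even when fewer than k results are returned are all assumptions made for illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# Illustrative only: textbook definitions, not Okareo's implementation.
# Each ranked list is assumed to contain no duplicate IDs.


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k positions that hold a relevant result."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of all relevant results that appear in the top k."""
    if k <= 0 or not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of the precision values at each rank where a relevant result appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


Query = Tuple[Sequence[Hashable], Set[Hashable]]


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """MRR: reciprocal rank averaged over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical data: two queries, each with a ranked result list and a relevant set.
queries: List[Query] = [
    (["d1", "d9", "d2", "d4"], {"d1", "d2", "d5"}),
    (["d7", "d8", "d3"], {"d3"}),
]
ranked, relevant = queries[0]
p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_2 = recall_at_k(ranked, relevant, 2)
mrr = mean_reciprocal_rank(queries)
mean_ap = mean_average_precision(queries)
```

Precision@k and Recall@k are computed per query, while MRR and MAP summarize a whole set of queries, which is why the last two functions take a list.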

While the metrics under Retrieval and Classification will be familiar to data scientists and machine learning practitioners, checks are unique to Okareo and are described in more detail below.
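For reference, the classification metrics named in the table can be sketched for the binary case as follows. This uses the standard confusion-matrix definitions and is not Okareo's implementation. The function name and the 0/1 label encoding are assumptions, and multi-class evaluations would need an averaging strategy that this sketch does not cover.

```python
from typing import Dict, Sequence


def binary_classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels encoded as 0 (negative) or 1 (positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
metrics = binary_classification_metrics(
    y_true=[1, 0, 1, 1, 0, 0],
    y_pred=[1, 0, 0, 1, 1, 0],
)
```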

What is a check?

In Okareo, a check is a mechanism for scoring a generative model's output. A check can be narrowly tailored to assess a particular behavior of your LLM.

With checks, you can answer behavioral questions like:

Did the check pass? Was the check's threshold exceeded?
In what situations did this check fail?
Did the check's result change between Version A and Version B of my model?
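To make these questions concrete, here is a minimal, self-contained sketch of the idea behind a check: a function scores each model output, and the score is compared to a threshold. It does not use the Okareo SDK. Every name in it (`CheckResult`, `run_check`, `conciseness_score`, the 20-word limit, the 0.5 threshold, and the sample outputs) is hypothetical, and treating a score at or above the threshold as a pass is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    scenario_input: str
    score: float
    passed: bool


def conciseness_score(model_output: str) -> float:
    """Hypothetical scorer: 1.0 up to 20 words, falling linearly to 0.0 at 40 words."""
    words = len(model_output.split())
    if words <= 20:
        return 1.0
    return max(0.0, 1.0 - (words - 20) / 20)


def run_check(
    score_fn: Callable[[str], float],
    outputs: Dict[str, str],
    threshold: float,
) -> List[CheckResult]:
    """Score every output; a result passes when its score meets the threshold (assumed)."""
    results = []
    for scenario_input, model_output in outputs.items():
        score = score_fn(model_output)
        results.append(CheckResult(scenario_input, score, score >= threshold))
    return results


def pass_rate(results: List[CheckResult]) -> float:
    """Share of scenarios that passed the check."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def failed_inputs(results: List[CheckResult]) -> List[str]:
    """Inputs for which the check failed."""
    return [r.scenario_input for r in results if not r.passed]


# Hypothetical outputs from two versions of a model on the same two scenarios.
version_a = {
    "Summarize the refund policy": "Refunds are issued within 30 days of purchase.",
    "Summarize the shipping policy": " ".join(["word"] * 45),  # 45-word rambling answer
}
version_b = {
    "Summarize the refund policy": "Refunds are issued within 30 days of purchase.",
    "Summarize the shipping policy": " ".join(["word"] * 15),  # 15-word answer
}

results_a = run_check(conciseness_score, version_a, threshold=0.5)
results_b = run_check(conciseness_score, version_b, threshold=0.5)

# Did the check pass, where did it fail, and did it change between versions?
failed_in_a = failed_inputs(results_a)
change = pass_rate(results_b) - pass_rate(results_a)
```

Keeping the per-scenario results, rather than only an aggregate pass rate, is what makes it possible to see which situations a check failed in and to compare two model versions scenario by scenario.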

Cookbook examples that showcase Okareo checks are available here:

Colab Notebook
TypeScript Cookbook (clone the okareo-cookbook repo and download the okareo-cli)

Okareo provides predefined checks that let you bootstrap your generative evaluations and simulations. Additionally, you can upload custom checks to Okareo to tailor your evaluations to your specific needs.
