
Overview

What problem are we solving?

AI/ML is becoming increasingly common in software development. Deterministic code, which always produces the same output given the same input, is relatively easy to test. Non-deterministic software components, which can produce different outputs given the same input, require new approaches.
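To make the distinction concrete, here is a small, self-contained TypeScript sketch (illustrative only, not Okareo code). A deterministic function can be tested with an exact-match assertion. A hypothetical stand-in for a non-deterministic component, `sampleGreeting`, picks one of several acceptable phrasings, so its test checks a property that every acceptable output should satisfy.

```typescript
// Deterministic: the same input always produces the same output,
// so an exact-match assertion is a complete test.
function add(a: number, b: number): number {
  return a + b;
}

// Hypothetical stand-in for a non-deterministic component such as an LLM:
// the same input can yield different, equally acceptable outputs.
// `rand` defaults to Math.random but can be injected to make tests repeatable.
function sampleGreeting(name: string, rand: () => number = Math.random): string {
  const templates = [`Hello, ${name}!`, `Hi ${name}, welcome back.`, `Good to see you, ${name}.`];
  return templates[Math.floor(rand() * templates.length)];
}

// Exact matching would be flaky here, so assert a property that every
// acceptable output must satisfy instead: the reply mentions the user.
function mentionsName(output: string, name: string): boolean {
  return output.includes(name);
}
```

Property-style checks like `mentionsName` are the simplest form of the "new approaches" non-deterministic components need. Real model evaluations use richer checks, but the idea is the same.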

Manual testing and manually monitored production feedback loops can be used to improve models. But doing so is arduous, time-consuming, and risky.

Enter Okareo. Our focus is on helping you establish reliable AI throughout your development lifecycle.

[Figure: Okareo diagram]

Getting Started

When seeking reliability, it is not hard for model evaluation to get complicated, fast. But let's start with something simple to get a feel for Okareo.

Okareo Platform Basics


Okareo provides SDK, CLI, and Notebook support for Python and TypeScript. If you prefer Python Notebooks, feel free to start with the examples below. This guide focuses on the CLI+SDK path, which is useful for local development and CI.

Sign Up and Get Your API Token

To use Okareo you will need an API Token, some data, and a model to test.

  1. If you haven't already, sign up for a free account with full access.
  2. Provision your API Token from the Home page or from Settings > API Token.
  3. We suggest making the token available in your environment as OKAREO_API_KEY:
export OKAREO_API_KEY="<YOUR_TOKEN>"
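To catch a missing token early, a flow can check the variable before creating the Okareo client. The helper below is a hypothetical sketch, not part of okareo-ts-sdk. It takes the environment as a parameter so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper (not part of okareo-ts-sdk): look up a required
// environment variable and fail fast with a readable error if it is unset or blank.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set. Export it before running, e.g. export ${name}="<YOUR_TOKEN>"`);
  }
  return value;
}
```

In a Node.js flow you would call requireEnv("OKAREO_API_KEY", process.env) and pass the result as api_key to new Okareo(...).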

Install the Okareo CLI

Okareo includes a simple CLI script runner that you can use with Python, TypeScript, and even YAML config to drive evaluation. To get you started quickly, we provide simple Python and TypeScript examples.

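Download the latest release of the Okareo CLI (okareo) for your development environment. The commands below fetch the macOS Apple silicon (darwin_arm64) build; pick the archive that matches your platform.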

curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_darwin_arm64.tar.gz
tar -xvf okareo_darwin_arm64.tar.gz

Add Okareo to your path after unpacking:

export PATH="$PATH:[LOCAL_PATH_WHERE_YOU_UNPACKED_OKAREO]"

Run okareo -v to verify your installation before moving to the next step.

Let's Run an Evaluation!

Step 1: Create an Okareo Project

Okareo projects are language-specific: either TypeScript or Python.

okareo init --language typescript

The init command creates a .okareo folder containing a config.yml file and a folder called flows. Evaluation and fine-tuning flows that you want to run from the CLI go in the flows folder and can be run individually or as a group.

Step 2: Create an Evaluation Flow

Everyone's AI/model evaluation needs are different. We have provided some common examples you can build on.

Here we will evaluate a function call. The goal is to determine whether the "model" (in this case a simple block of code) correctly interprets each request and responds with a function call that has the right name and a complete set of parameters. In this mock example, a failure could mean accidentally deleting a valid user.

Save the following script as function_eval.ts in your .okareo/flows folder created by the okareo init command.

// Save this flow as function_eval.ts and place it in your .okareo/flows folder
import { Okareo, RunTestProps, TestRunType, CustomModel } from "okareo-ts-sdk";

const main = async () => {
    try {
        const okareo = new Okareo({ api_key: process.env.OKAREO_API_KEY });

        // Look up the default "Global" project and stop early if it is missing.
        const project_id = (await okareo.getProjects()).find(p => p.name === 'Global')?.id;
        if (!project_id) {
            throw new Error("Could not find the 'Global' project.");
        }

        // Each row pairs a user request with the function call the model is expected to make.
        const seedData = [
            { input: "can you delete my account? my name is Bob", result: { name: "delete_account", parameter_definitions: { username: { value: "Bob", type: "str", required: true } } } },
            { input: "how do I make an account? I'm Alice", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
            { input: "how do I create an account?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
            { input: "my name is John. how do I create a project?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
        ];

        // Upload the seed data as a scenario set; the random suffix keeps the name unique across runs.
        const scenario: any = await okareo.create_scenario_set({
            name: `Function Call Demo Scenario - ${(Math.random() + 1).toString(36).substring(7)}`,
            project_id: project_id,
            seed_data: seedData,
        });

        // The "model" under test: simple keyword matching that emits one tool call.
        const function_call_model = {
            type: 'custom',
            invoke: async (input_value: string) => {
                const usernames = ["Alice", "Bob", "Charlie"];
                const out: { tool_calls: { name: string; parameters: { [key: string]: any } }[] } = { tool_calls: [] };
                const tool_call: { name: string; parameters: { [key: string]: any } } = { name: "unknown", parameters: {} };
                if (input_value.includes("delete")) {
                    tool_call.name = "delete_account";
                }
                if (input_value.includes("create")) {
                    tool_call.name = "create_account";
                }
                // Use the first known username that appears in the request.
                for (const username of usernames) {
                    if (input_value.includes(username)) {
                        tool_call.parameters["username"] = username;
                        break;
                    }
                }
                out.tool_calls.push(tool_call);
                return {
                    model_prediction: out,
                    model_input: input_value,
                    model_output_metadata: {},
                };
            }
        } as CustomModel;

        // Register the custom model with the project.
        const model = await okareo.register_model({
            name: 'Function Call Demo Model',
            project_id: project_id,
            models: function_call_model,
            update: true,
        });

        // Run the evaluation against the scenario with the four function-call checks.
        const eval_run: any = await model.run_test({
            name: 'Function Call Demo Evaluation',
            project_id: project_id,
            scenario_id: scenario.scenario_id,
            calculate_metrics: true,
            type: TestRunType.NL_GENERATION,
            checks: ["is_function_correct", "are_required_params_present", "are_all_params_expected", "do_param_values_match"],
        } as RunTestProps);

        console.log(`View the evaluation in the Okareo app: ${eval_run.app_link}`);
    } catch (e) {
        // JSON.stringify(e) prints "{}" for Error objects, so log the error itself.
        console.error(e);
    }
};

main();
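Before running the flow, it can help to predict which seed rows should pass. The sketch below runs locally with no SDK. It reimplements the flow's keyword-matching model and approximates the four checks with plain comparisons between each row's expected result and the model's tool call. These check semantics are an assumption, not Okareo's implementation; the function names only mirror the check names for readability, and the built-in checks may score differently.

```typescript
// Shapes taken from the seed data and the mock model in function_eval.ts.
interface ParamDef { value: string; type: string; required: boolean }
interface Expected { name: string; parameter_definitions: Record<string, ParamDef> }
interface ToolCall { name: string; parameters: Record<string, string> }

// Same keyword logic as the flow's custom model: a "create" match
// overrides "delete", and the first known username found is used.
function mockModel(input: string): ToolCall {
  const call: ToolCall = { name: "unknown", parameters: {} };
  if (input.includes("delete")) call.name = "delete_account";
  if (input.includes("create")) call.name = "create_account";
  const user = ["Alice", "Bob", "Charlie"].find((u) => input.includes(u));
  if (user !== undefined) call.parameters["username"] = user;
  return call;
}

// Assumed semantics for the four checks (approximations, not Okareo's code).
function isFunctionCorrect(call: ToolCall, exp: Expected): boolean {
  return call.name === exp.name;
}

function areRequiredParamsPresent(call: ToolCall, exp: Expected): boolean {
  return Object.entries(exp.parameter_definitions)
    .filter(([, def]) => def.required)
    .every(([key]) => call.parameters[key] !== undefined);
}

function areAllParamsExpected(call: ToolCall, exp: Expected): boolean {
  return Object.keys(call.parameters).every((key) => exp.parameter_definitions[key] !== undefined);
}

function doParamValuesMatch(call: ToolCall, exp: Expected): boolean {
  return Object.entries(exp.parameter_definitions).every(([key, def]) => call.parameters[key] === def.value);
}

// Run the mock model on one input and score it against the expected call.
function runChecks(input: string, exp: Expected): Record<string, boolean> {
  const call = mockModel(input);
  return {
    is_function_correct: isFunctionCorrect(call, exp),
    are_required_params_present: areRequiredParamsPresent(call, exp),
    are_all_params_expected: areAllParamsExpected(call, exp),
    do_param_values_match: doParamValuesMatch(call, exp),
  };
}
```

Under these assumed semantics, the second seed row ("how do I make an account? I'm Alice") comes back as unknown because the input says "make" rather than "create", so only the function-name check fails. The third and fourth rows name the right function but carry no recognized username, so the required-parameter and value checks fail.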

Step 3: Run your first flow

Let's run your first Okareo flow. Here we use the -f flag to run only the function_eval script. For a list of available commands, run okareo --help.

okareo run -f function_eval

Step 4: What to do next

Now that we have a working flow that establishes a scenario, registers a model and runs an evaluation, you are ready to start building and evaluating your AI-native capabilities. Also, don't hesitate to give us feedback. Learning is golden.

Next Steps:
  • Synthetic Scenario Generation: Learn about synthetically creating a behavior map of positive and negative scenarios that you can use to establish baseline metrics for your models.
  • Supported Models and Approaches: Okareo includes built-in support for a wide variety of model types and providers, including custom models.
  • Evaluation and Checks: Evaluations are driven through discrete Checks that provide specific metrics. Checks can be deterministic code or based on an AI judge. Learn more about built-in and custom checks for evaluation.
  • Fine-Tuning: Assembling and organizing data sets for fine-tuning is a native element of Okareo. Learn more from our Founding Data Scientist in this blog post on bootstrapping fine-tuning with Okareo.
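As a rough mental model of the Evaluation and Checks item above, a check can be seen as a function that scores one model output. The hypothetical `Check` type below is not the Okareo check API; it simply admits both kinds the list mentions: a deterministic rule, and a judge-based check whose `judge` function stands in for a call to an AI model. The judge is injected so the sketch can run offline with a stub.

```typescript
// Hypothetical shape (not the Okareo SDK): a check scores one model output.
type Check = (input: string, output: string) => Promise<number>;

// Deterministic check: plain code, same verdict every time.
const isNonEmpty: Check = async (_input, output) => (output.trim().length > 0 ? 1 : 0);

// Judge-based check: delegates the verdict to a judge (normally an LLM call).
type Judge = (prompt: string) => Promise<string>;

function makeJudgeCheck(criterion: string, judge: Judge): Check {
  return async (input, output) => {
    const prompt = `Criterion: ${criterion}\nInput: ${input}\nOutput: ${output}\nAnswer YES or NO.`;
    const verdict = await judge(prompt);
    // Treat any answer starting with "yes" (case-insensitive) as a pass.
    return verdict.trim().toUpperCase().startsWith("YES") ? 1 : 0;
  };
}

// Run every named check against one input/output pair.
async function evaluate(checks: Record<string, Check>, input: string, output: string): Promise<Record<string, number>> {
  const scores: Record<string, number> = {};
  for (const [name, check] of Object.entries(checks)) {
    scores[name] = await check(input, output);
  }
  return scores;
}
```

Keeping both kinds behind one signature means an evaluation loop does not need to know whether a score came from code or from a judge.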