Overview
What problem are we solving?
AI/ML is becoming increasingly common in software development. Deterministic code, which always produces the same output given the same input, is relatively easy to test. Non-deterministic software components, which can produce different outputs given the same input, require new approaches to testing.
Manual testing and manually monitored production feedback loops can be used to improve models, but doing so is arduous, time-consuming, and risky.
Enter Okareo. Our focus is on helping you establish reliable AI throughout your development lifecycle.
Getting Started
When seeking reliability, it is not hard for model evaluation to get complicated, fast. But let's start with something simple to get a feel for Okareo.
Okareo Basics
To use Okareo you will need an API Token, some data, and a model to test.
Okareo can evaluate a wide range of models. The process is very similar in each case, although the output and analytics can differ dramatically by model type.
The following is a general outline for how to evaluate a model. For specific examples or instructions, please refer to the guides and examples.
Step 1: Get an API Token
- If you haven't already, sign up for Okareo.
- Navigate to Settings > API Token and provision a token.
- We suggest making the token available in your environment as OKAREO_API_KEY:
export OKAREO_API_KEY="<YOUR_TOKEN>"
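In Python, for example, you can fail fast if the token is missing before making any calls. This is just an illustrative sketch — the load_okareo_key helper below is hypothetical, not part of the SDK:

```python
import os

def load_okareo_key(env=os.environ) -> str:
    """Fetch the Okareo API token from the environment, failing loudly if absent."""
    key = env.get("OKAREO_API_KEY")
    if not key:
        raise RuntimeError(
            "OKAREO_API_KEY is not set; provision a token in Settings > API Token"
        )
    return key
```

Failing at startup with a clear message is easier to debug than an authentication error deep inside an evaluation run.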
Step 2: Install Okareo
Okareo is just an easy pip, yarn, or npm install away. We also expose all of our capabilities via API if you prefer.
- pip
- yarn
- npm
pip install okareo
yarn -D add okareo-ts-sdk
npm install okareo-ts-sdk --save-dev
Step 3: Register a Model
Everyone's AI/Model evaluation needs are different. We have provided some common examples you can build on.
- Python
- Typescript
# Model endpoints can be custom-made by you,
# or you can use one of our premade endpoints for OpenAI, Cohere, Pinecone, Qdrant, and more.
model_under_test = okareo.register_model(
    name="Example Classifier",
    project_id="",
    model=<CUSTOM, OpenAI, Cohere, Pinecone, Qdrant, ...>,
)
// Model endpoints can be custom-made by you,
// or you can use one of our premade endpoints for OpenAI, Cohere, Pinecone, Qdrant, and more.
okareo.register_model(
  ModelUnderTest({
    name: "Example Classifier",
    tags: ["latest", "Example"],
    project_id: project_id,
    model: OpenAIModel({ // TCustomModel | TOpenAIModel | TCohereModel | TPineconeDB | TQDrant
      model_id: "gpt-3.5-turbo",
      temperature: 0.5,
      system_prompt_template: CLASSIFICATION_CONTEXT_TEMPLATE,
      user_prompt_template: USER_PROMPT_TEMPLATE,
      api_key: OPENAI_API_KEY | "string",
    }),
  })
);
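At its core, a custom model is just a callable that maps a scenario input to a prediction. As a minimal sketch of the kind of classifier you might wrap as a custom endpoint (the classify function and its labels are hypothetical, standing in for a real model call):

```python
def classify(text: str) -> str:
    """Toy keyword classifier standing in for a real model call."""
    text = text.lower()
    if "refund" in text or "return" in text:
        return "returns"
    if "price" in text or "cost" in text:
        return "pricing"
    return "general"
```

A custom model wrapper would invoke something like this once per scenario row and hand the prediction back to Okareo for scoring.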
Step 4: Create a Scenario
Okareo scenarios can be used as defined or as seeds for synthetically generating variations to stretch your model and discover edges.
- Python
- Typescript
# Define a collection of scenario data
scenario_set = ScenarioSetCreate(
    name="Scenario Name",
    number_examples=10,
    generation_type=ScenarioType.REPHRASE_INVARIANT,
    seed_data=[
        SeedData(
            input_={JSON} | "String" | Custom...,
            result={JSON} | "String" | Custom...,
        ),
        ...
    ],
)
# Create the Scenario Set in Okareo from the scenario data
scenario = okareo.create_scenario_set(scenario_set)
print(f"{scenario.app_link}")
// Define a collection of scenario data
// Create the Scenario Set in Okareo from the scenario data
const scenario: any = await okareo.create_scenario_set({
  name: "Scenario Name",
  project_id: project_id,
  number_examples: 10,
  generation_type: ScenarioType.REPHRASE_INVARIANT,
  seed_data: [
    SeedData({
      input: JSON | "String" | Custom...,
      result: JSON | "String" | Custom...,
    }),
    ...
  ],
});
console.log('Scenario App Link', scenario.app_link);
All Okareo primary objects (models, scenarios, and evaluations) can be accessed through the UI. Just print or share the .app_link property.
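Conceptually, seed data is a list of input/expected-result pairs, and REPHRASE_INVARIANT generation rephrases each input while keeping its result fixed. A rough sketch of that idea, with hypothetical labels and a placeholder expand helper (the real rephrasing is done by Okareo, not by string templating):

```python
# Hypothetical seeds for the example classifier: each input is paired with
# the label the model is expected to produce.
seeds = [
    ("Where is my refund?", "returns"),
    ("How much does the Pro plan cost?", "pricing"),
    ("Hi, just saying thanks!", "general"),
]

def expand(seed, n):
    """Stand-in for rephrase-invariant generation: n variations of the
    input, all sharing the seed's expected result."""
    text, label = seed
    return [(f"{text} (variation {i})", label) for i in range(n)]
```

The invariant part is the point: every synthetic variation inherits its seed's expected result, so the generated rows stretch the model's inputs without changing what a correct answer looks like.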
Step 5: Run an Evaluation
Okareo can handle the round-trip evaluation from the cloud. This makes it easy to run evaluations of any size or length from CI or from your local workspace.
- Python
- Typescript
evaluation = model_under_test.run_test_v2(
    name="Example Classifier Run",
    scenario=scenario,
    api_key=<MODEL_API_KEY>,  # This is based on the model you defined
    test_run_type=TestRunType.MULTI_CLASS_CLASSIFICATION,
    calculate_metrics=True,
)
print(f"{evaluation.app_link}")
const evaluation: TestRunItem = await okareo.run_test({
  name: "Example Classifier Run",
  project_id: project_id,
  scenario_id: scenario.scenario_id,
  calculate_metrics: true,
  type: TestRunType.MULTI_CLASS_CLASSIFICATION,
});
console.log('Evaluation App Link', evaluation.app_link);
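A multi-class classification evaluation ultimately scores the model's predictions against the expected results from your scenario. As a back-of-the-envelope sketch of the headline accuracy calculation (an illustration, not Okareo's actual implementation):

```python
def accuracy(pairs):
    """Fraction of (predicted, expected) pairs that agree."""
    if not pairs:
        return 0.0
    return sum(pred == expected for pred, expected in pairs) / len(pairs)
```

Okareo computes metrics like this for you when calculate_metrics is enabled, and surfaces them in the UI behind the evaluation's app_link.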