Get Started with Generations
With the availability of LLMs, generating content has become easier than ever. However, there are many dimensions along which generation can go wrong: bias, hallucination, misrepresentation, format, language, and many more.
What do you need?
You will need an environment for running Okareo. Both Typescript and Python SDKs are available. Please see the SDK sections for more on how to set up each.
Cookbook examples for this guide are available:
- Colab Notebook
- Typescript Cookbook (coming soon)
Example Generation Using OpenAI
In this example, we will use OpenAI to summarize text, then score the output and provide lexical and semantic comparisons based on metrics unique to evaluating LLM generation.
Download a complete Jupyter notebook - generation_eval.ipynb
Or, start from an Okareo cookbook for Typescript + Jest
Step 1: Setup Okareo and OpenAI
Make sure you have API keys for Okareo and OpenAI available. We suggest making the keys available through environment variables named `OKAREO_API_KEY` and `OPENAI_API_KEY`.
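For example, in Python you can read both keys from the environment; the variable names below match the snippets later in this guide:

```python
import os

# Read the keys from the environment rather than hard-coding them;
# a KeyError here means the variable has not been set.
OKAREO_API_KEY = os.environ["OKAREO_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```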
Step 2: Register the Generation Model
Register a Model: Models can be shared across evaluation runs. To make this easier, every model is referred to by name. This also means that model names must be unique and that, once defined, a model cannot be modified. Don't worry: it is just metadata, and you can create as many models as you need.
Set up the generation prompt that you will use with OpenAI.
- Python
- Typescript
# Simple generation prompt for use with OpenAI's GPT-3.5 Turbo model
# the {scenario_input} is replaced with each scenario input property in the evaluation run
USER_PROMPT_TEMPLATE = "{scenario_input}"

SUMMARIZATION_CONTEXT_TEMPLATE = """
You will be provided with text.
Summarize the text in 1 simple sentence.
"""
// Simple generation prompt for use with OpenAI's GPT-3.5 Turbo model
// the {scenario_input} is replaced with each scenario input property in the evaluation run
const USER_PROMPT_TEMPLATE: string = "{scenario_input}"

const SUMMARIZATION_CONTEXT_TEMPLATE: string = `
You will be provided with text.
Summarize the text in 1 simple sentence.
`
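At evaluation time, Okareo replaces the `{scenario_input}` placeholder with each scenario's input before calling the model. As a plain illustration of what the rendered user prompt looks like (the sample article string below is made up; the real substitution happens inside the evaluation run):

```python
USER_PROMPT_TEMPLATE = "{scenario_input}"

# Illustration only: render the prompt the way the evaluation run would
sample_article = "WebBizz offers same-day shipping on orders placed before noon."
rendered_prompt = USER_PROMPT_TEMPLATE.format(scenario_input=sample_article)
print(rendered_prompt)  # the article text becomes the user message
```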
Now, register the model with the context prompt that you will use for evaluation. Registered models can be reused across multiple scenarios or behavior checks.
- Python
- Typescript
# Register the model with Okareo so it can be reused across evaluation runs
from okareo import Okareo
from okareo.model_under_test import OpenAIModel

okareo = Okareo(OKAREO_API_KEY)

mut_name = "Example Generation Model"
model_under_test = okareo.register_model(
    name=mut_name,
    model=OpenAIModel(
        model_id="gpt-3.5-turbo",
        temperature=0,
        system_prompt_template=SUMMARIZATION_CONTEXT_TEMPLATE,
        user_prompt_template=USER_PROMPT_TEMPLATE,
    ),
)
// Register the model with Okareo so it can be reused across evaluation runs
import { Okareo, OpenAIModel } from "okareo-ts-sdk";

const okareo = new Okareo({ api_key: OKAREO_API_KEY });

const model_under_test = await okareo.register_model({
    name: "Example Generation Model",
    tags: ["OpenAI", "Example"],
    project_id: project_id, // the ID of your Okareo project
    models: {
        type: "openai",
        api_key: OPENAI_API_KEY,
        model_id: "gpt-3.5-turbo",
        temperature: 0.0,
        system_prompt_template: SUMMARIZATION_CONTEXT_TEMPLATE,
        user_prompt_template: USER_PROMPT_TEMPLATE
    } as OpenAIModel,
});
Step 3: Create a Scenario to Evaluate
Create Scenario: Scenarios are either uploaded or generated synthetically within Okareo. The example here demonstrates how to upload a JSONL file.
At scale, it is more common to check the JSONL file into source control and upload it directly from CI.
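Each line of the JSONL file is a standalone JSON object describing one scenario. In the Webbizz example below, each object provides an `input` (the text to summarize) and a `result` (a reference summary used by the comparison metrics); the values here are shortened placeholders rather than the real article content:

```jsonl
{"input": "WebBizz is dedicated to providing a seamless online shopping experience ...", "result": "WebBizz focuses on a smooth online shopping experience."}
{"input": "Protecting customer data is a top priority at WebBizz ...", "result": "WebBizz makes customer data security a priority."}
```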
- Python
- Typescript
import os
import tempfile

# Download 10 example articles and keep the first 3 as a scenario set
webbizz_articles = os.popen('curl https://raw.githubusercontent.com/okareo-ai/okareo-python-sdk/main/examples/webbizz_10_articles.jsonl').read()
temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, "webbizz_10_articles.jsonl")
with open(file_path, "w+") as file:
    lines = webbizz_articles.split('\n')
    # Use the first 3 json objects to make a scenario set with 3 scenarios
    for i in range(3):
        file.write(f"{lines[i]}\n")

scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name="Webbizz Articles Scenario")

# make sure to clean up the temp file
os.remove(file_path)

print(f"https://app.okareo.com/project/{scenario.project_id}/scenario/{scenario.scenario_id}")
const scenario: any = await okareo.upload_scenario_set({
    file_path: "./examples/webbizz_10_articles.jsonl",
    scenario_name: "Webbizz Articles Scenario",
    project_id: project_id
});
console.log(`Scenario: ${scenario.app_link}`);
Step 4: Evaluate the Scenario
Evaluation: Okareo has a built-in test harness for running evaluations directly in the cloud. This makes it easy to run quick checks or long-running tests from CI or from your local workspace.
- Python
- Typescript
# Evaluate the scenario and model combination and then get a link to the results on Okareo
from okareo_api_client.models.test_run_type import TestRunType

eval_name = "Example Generation"
evaluation = model_under_test.run_test(
    name=eval_name,
    scenario=scenario,
    api_key=OPENAI_API_KEY,
    test_run_type=TestRunType.NL_GENERATION,
    calculate_metrics=True,
)
print(f"See results in Okareo: {evaluation.app_link}")
// Evaluate the scenario and model combination and then get a link to the results on Okareo
import { Okareo, TestRunType } from "okareo-ts-sdk";

const evaluation: any = await model_under_test.run_test({
    name: "Example Generation",
    tags: ["Generation", "BUILD_ID"],
    model_api_key: OPENAI_API_KEY,
    project_id: project_id,
    scenario: scenario,
    calculate_metrics: true,
    type: TestRunType.NL_GENERATION,
});
console.log(`See results in Okareo: ${evaluation.app_link}`);
Step 5: Review Results
Results: Navigate to your latest evaluation, either within app.okareo.com or directly from the link printed by the example, to view the evaluation results.
Okareo automatically calculates metrics and provides an error matrix comparing expected to actual results for evaluations of type `NL_GENERATION`.
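If you want to act on the scores programmatically, for example to fail a CI build on a quality regression, the evaluation object returned by `run_test` also carries the computed metrics. The sketch below is a minimal example under stated assumptions: it assumes the Python `evaluation` object exposes `model_metrics` with a `mean_scores` entry and that a `coherence` score is among the calculated metrics; inspect your own evaluation object to confirm the exact shape and metric names, and treat the 4.0 threshold as purely illustrative:

```python
# Sketch: gate a CI run on a generation metric.
# ASSUMPTIONS: `model_metrics`, its "mean_scores" entry, the "coherence"
# metric name, and the 4.0 threshold are illustrative; verify the fields
# on your own evaluation object before relying on this.
metrics = evaluation.model_metrics.to_dict()
mean_scores = metrics.get("mean_scores", {})

coherence = mean_scores.get("coherence")
if coherence is not None and coherence < 4.0:
    raise SystemExit(f"Generation quality regression: coherence={coherence}")
```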