TypeScript SDK
Okareo has a rich set of APIs that you can explore through the API Guide. This SDK provides access to all of the Okareo API endpoints through the OpenAPI spec. It also provides convenience functions that make testing and development with the Okareo platform faster.
In addition to making model baseline evaluation available in development, you can use this SDK to drive automation, for example in CI/CD. The TypeScript library is transpiled to JavaScript, so the SDK can be used in any CommonJS project.
The SDK requires an API Token. Refer to the Okareo API Key guide for more information.
Overview
There are multiple ways to automate Okareo through TypeScript.
- Okareo CLI - Using the Okareo CLI directly lets you write TypeScript/JavaScript while keeping Okareo independent from the rest of your project. Refer to the Okareo SDK/CLI pages to learn more.
- Unit Testing - We find that models are usually part of a larger application context. When this is the case, it is beneficial to run model evaluations and scenario expansion as part of your general CI/CD process.
The Okareo cookbooks in the okareo-cookbook GitHub repository provide examples you can build from, including using the CLI directly, driving Okareo from Jest, and more.
SDK Installation
- NPM
- Yarn
npm install -D okareo-ts-sdk
yarn add -D okareo-ts-sdk
Using the Okareo Typescript SDK
Jest: Hello Projects!
The following Jest example creates an Okareo instance, requests a list of projects, and then verifies that more than zero projects are returned.
import { Okareo } from 'okareo-ts-sdk';

const OKAREO_API_KEY = process.env.OKAREO_API_KEY;

describe('Example', () => {
    test('Get All Projects', async () => {
        const okareo = new Okareo({ api_key: OKAREO_API_KEY });
        const projects: any[] = await okareo.getProjects();
        expect(projects.length).toBeGreaterThan(0);
    });
});
AI/LLM Evaluation Workflow
The following script synthetically transforms a set of direct requests into passive questions and then evaluates the core_app.getIntentContextTemplate(user, chat_history) context through OpenAI to determine whether the original intent is maintained. The number of synthetic examples created is 3 times the number of rows in the DIRECTED_INTENT seed data passed in.
import { Okareo, OpenAIModel, RunTestProps, ScenarioType, TestRunType, ClassificationReporter } from 'okareo-ts-sdk';

const OKAREO_API_KEY = process.env.OKAREO_API_KEY;
// project_id, DIRECTED_INTENT, OPENAI_API_KEY, core_app, user, and chat_history
// are assumed to be defined elsewhere in your application.

const main = async () => {
    try {
        const okareo = new Okareo({ api_key: OKAREO_API_KEY });
        const sData: any = await okareo.create_scenario_set({
            name: "Detect Passive Intent",
            project_id: project_id,
            number_examples: 3,
            generation_type: ScenarioType.TEXT_REVERSE_QUESTION,
            seed_data: DIRECTED_INTENT
        });
        const model_under_test = await okareo.register_model({
            name: "User Chat Intent - 3.5 Turbo",
            tags: ["TS-SDK", "Testing"],
            project_id: project_id,
            models: {
                type: "openai",
                model_id: "gpt-3.5-turbo",
                temperature: 0.5,
                system_prompt_template: core_app.getIntentContextTemplate(user, chat_history),
                user_prompt_template: `{scenario_input}`
            } as OpenAIModel
        });
        const eval_run: any = await model_under_test.run_test({
            name: "TS-SDK Classification",
            tags: ["Classification", "BUILD_ID"],
            model_api_key: OPENAI_API_KEY,
            project_id: project_id,
            scenario_id: sData.scenario_id,
            calculate_metrics: true,
            type: TestRunType.MULTI_CLASS_CLASSIFICATION,
        } as RunTestProps);
        const reporter = new ClassificationReporter({
            eval_run,
            error_max: 2, // allows for up to 2 errors
            metrics_min: {
                precision: 0.95,
                recall: 0.9,
                f1: 0.9,
                accuracy: 0.95
            },
        });
        reporter.log(); // logs a table to the console output with the report results
    } catch (error) {
        console.error(error);
    }
}
main();
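If this script runs in CI, the reporter's pass flag can gate the build. A minimal sketch (using process.exit to fail the job is a convention of this example, not an SDK requirement):
// inside main(), after reporter.log():
if (!reporter.pass) {
    process.exit(1); // fail the CI job when any threshold is missed
}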
TypeScript SDK and Okareo API
The Okareo TypeScript SDK is a set of convenience functions and wrappers for the Okareo REST API.
Reporters are currently only supported in TypeScript. If you are interested in Python support, please let us know.
Class Okareo
create_or_update_check
This uploads or updates a check with the specified name. If a check with that name already exists and the name is not shared with a predefined Okareo check, the existing check will be overwritten. Returns a detailed check response object.
There are two types of checks - Code (Deterministic) and Behavioral (Model Judge). Code-based checks are very fast and entirely predictable: they are code. Behavioral checks pass judgment based on inference, so they are slower and can be less predictable. However, they are occasionally the best way to express behavioral expectations. For example, "did the model expose private data?" is hard to analyze deterministically.
Code checks use Python. Okareo will generate the Python for you with the TypeScript okareo.generate_check SDK function. You can then pass the code result to okareo.create_or_update_check.
- Usage
- Result
- my_custom_check.py
// For code checks (e.g. deterministic)
okareo.create_or_update_check({
    name: string,
    description: string,
    check_config: {
        type: CheckOutputType.Score | CheckOutputType.PASS_FAIL,
        code_contents: <CHECK_PYTHON_CODE> // Python code that inherits from BaseCheck
    }
});

// For behavioral checks (e.g. prompt/judges)
okareo.create_or_update_check({
    name: string,
    description: string,
    check_config: {
        type: CheckOutputType.Score | CheckOutputType.PASS_FAIL,
        prompt_template: <CHECK_PROMPT> // The prompt describing the desired behavior
    }
});
/** EvaluatorDetailedResponse */
EvaluatorDetailedResponse: {
/**
* Id
* Format: uuid
*/
id?: string;
/**
* Project Id
* Format: uuid
*/
project_id?: string;
/** Name */
name?: string;
/**
* Description
* @default
*/
description?: string;
/** Requires Scenario Input */
requires_scenario_input?: boolean;
/** Requires Scenario Result */
requires_scenario_result?: boolean;
/**
* Output Data Type
* @default
*/
output_data_type?: string;
/**
* Code Contents
* @default
*/
code_contents?: string;
/**
* Time Created
* Format: date-time
*/
time_created?: string;
/** Warning */
warning?: string;
/** Check Config */
check_config?: Record<string, never>;
};
from typing import Union

from okareo.checks import CodeBasedCheck
# any other imports required for your check

class Check(CodeBasedCheck):
    @staticmethod
    def evaluate(
        model_output: str, scenario_input: str, scenario_result: str, metadata: dict
    ) -> Union[bool, int, float]:
        # Your code here
        output = ...
        return output
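As a concrete behavioral example, the prompt_template carries the judging instruction. The following sketch registers a pass/fail judge for the private-data question above; the check name and prompt are illustrative:
// Sketch: a behavioral (model judge) check with an illustrative name and prompt.
okareo.create_or_update_check({
    name: "no_private_data", // illustrative name
    description: "Fails when the model output appears to expose private data.",
    check_config: {
        type: CheckOutputType.PASS_FAIL,
        prompt_template: "Did the model output expose any private or personally identifiable data? Answer True or False."
    }
});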
delete_check
Deletes the check with the provided ID and name.
okareo.delete_check("<CHECK-UUID>", "<CHECK-NAME>")
/*
Check deletion was successful
*/
create_scenario_set
A scenario set is the Okareo unit of data collection. Any scenario can be used to drive a registered model or as a seed for synthetic data generation. Often both.
- Usage
- Details
- Result
import { Okareo, SeedData, ScenarioType } from "okareo-ts-sdk";

const okareo = new Okareo({ api_key: OKAREO_API_KEY });

okareo.create_scenario_set({
    name: "NAME OF SCENARIO",
    project_id: PROJECT_ID,
    number_examples: 1,
    generation_type: ScenarioType.SEED,
    seed_data: [
        SeedData({
            input: "Example input to be sent to the model",
            result: "Expected result from the model"
        }),
    ]
});
Takes a single argument ScenarioSetCreate
async create_scenario_set(props: components["schemas"]["ScenarioSetCreate"]): Promise<components["schemas"]["ScenarioSetResponse"]> {
//...
}
import { components } from "okareo-ts-sdk";
//components["schemas"]["ScenarioSetCreate"]
ScenarioSetCreate: {
/**
* Project Id
* Format: uuid
* @description ID for the project
*/
project_id?: string;
/**
* Name
* @description Name of the scenario set
*/
name: string;
/**
* Seed Data
* @description Seed data is a list of dictionaries, each with an input and result
*/
seed_data: components["schemas"]["SeedData"][];
/**
* Number Examples
* @description Number of examples
*/
number_examples: number;
/**
* @description Type of generation. Current supported scenario types are:<br />
* Seed: Seed data for a scenario set<br />
* Rephrase invariant: Results will be rephrased versions of inputs<br />
* Conditional: Results will be rephrased inputs represented in a conditional format<br />
* Text reverse question: The result will be the target question for the input<br />
* Text reverse label: The result will be the intent of the target question for the input
* @default SEED
*/
generation_type?: components["schemas"]["ScenarioType"];
/**
* @description Tone to use for scenario generation.
* @default Neutral
*/
generation_tone?: components["schemas"]["GenerationTone"];
};
import { components } from "okareo-ts-sdk";
// components["schemas"]["ScenarioSetResponse"]
/** ScenarioSetResponse */
ScenarioSetResponse: {
/**
* Scenario Id
* Format: uuid
*/
scenario_id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/**
* Time Created
* Format: date-time
*/
time_created: string;
/** Type */
type: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Name */
name?: string;
/**
* Seed Data
* @default []
*/
seed_data?: components["schemas"]["SeedData"][];
/**
* Scenario Count
* @default 0
*/
scenario_count?: number;
/**
* Scenario Input
* @default []
*/
scenario_input?: string[];
/**
* App Link
* @description This URL links to the Okareo webpage for this scenario set
* @default
*/
app_link?: string;
};
find_datapoints
Datapoints are accessible for research and analysis as part of CI or elsewhere. Datapoints can be filtered by a broad range of criteria. Typically some combination of time, feedback, and model is used, but many other dimensions are available.
- Usage
- Details
- Result
import { Okareo, DatapointSearch } from "okareo-ts-sdk";
const okareo = new Okareo({api_key:OKAREO_API_KEY});
const data: any = await okareo.find_datapoints(
DatapointSearch({
project_id: project_id,
mut_id: model_id,
})
);
async find_datapoints(props: components["schemas"]["DatapointSearch"]): Promise<components["schemas"]["DatapointListItem"][]> {
//...
}
import { components } from "okareo-ts-sdk";
// components["schemas"]["DatapointSearch"]
DatapointSearch: {
/**
* Tags
* @description Tags are strings that can be used to filter datapoints in the Okareo app
* @default []
*/
tags?: string[];
/**
* From Date
* Format: date-time
* @description Earliest date
* @default 2022-12-31T23:59:59.999999
*/
from_date?: string;
/**
* To Date
* Format: date-time
* @description Latest date
*/
to_date?: string;
/**
* Feedback
* @description Feedback is a 0 to 1 float value that captures user feedback range for related datapoint results
*/
feedback?: number;
/** Error Code */
error_code?: string;
/**
* Context Token
* @description Context token is a unique token to link various datapoints which originate from the same context
*/
context_token?: string;
/**
* Project Id
* @description Project ID
*/
project_id?: string;
/**
* Mut Id
* Format: uuid
* @description Model ID
*/
mut_id?: string;
/**
* Test Run Id
* Format: uuid
* @description Test run ID
*/
test_run_id?: string;
};
Returns an array of DatapointListItem objects
import { components } from "okareo-ts-sdk";
// components["schemas"]["DatapointListItem"]
DatapointListItem: {
/**
* Id
* Format: uuid
*/
id: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Input */
input?: Record<string, never> | unknown[] | string;
/**
* Input Datetime
* Format: date-time
*/
input_datetime?: string;
/** Result */
result?: Record<string, never> | unknown[] | string;
/**
* Result Datetime
* Format: date-time
*/
result_datetime?: string;
/** Feedback */
feedback?: number;
/** Error Message */
error_message?: string;
/** Error Code */
error_code?: string;
/**
* Time Created
* Format: date-time
*/
time_created?: string;
/** Context Token */
context_token?: string;
/**
* Mut Id
* Format: uuid
*/
mut_id?: string;
/**
* Project Id
* Format: uuid
*/
project_id?: string;
/**
* Test Run Id
* Format: uuid
*/
test_run_id?: string;
};
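Beyond model ID, any combination of DatapointSearch fields can narrow the query. A sketch that bounds the search to a date window (the dates and variable names are illustrative):
import { Okareo, DatapointSearch } from "okareo-ts-sdk";

const okareo = new Okareo({ api_key: OKAREO_API_KEY });
// Illustrative filter: one model's datapoints within a one-month window.
const recent: any = await okareo.find_datapoints(
    DatapointSearch({
        project_id: project_id,
        mut_id: model_id,
        from_date: "2024-01-01T00:00:00",
        to_date: "2024-02-01T00:00:00",
    })
);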
generate_check
Generates the contents of a .py file for implementing a CodeBasedCheck based on an EvaluatorSpecRequest. Pass the generated_code of this method's result to the create_or_update_check function to make the check available within Okareo.
- Usage
- Result
const check = await okareo.generate_check({
    project_id: "",
    description: "Return True if the model_output is at least 20 characters long, otherwise return False.",
    requires_scenario_input: false, // true if the check uses the scenario input
    requires_scenario_result: false, // true if the check uses the scenario result
    output_data_type: "bool", // pass/fail checks use 'bool'; score checks use 'int' or 'float'
});
/** EvaluatorGenerateResponse */
EvaluatorGenerateResponse: {
/** Name */
name?: string;
/** Description */
description?: string;
/** Requires Scenario Input */
requires_scenario_input?: boolean;
/** Requires Scenario Result */
requires_scenario_result?: boolean;
/** Output Data Type */
output_data_type?: string;
/** Generated Code */
generated_code?: string;
};
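Putting the two calls together, one possible flow, sketched here with an illustrative check name, is to generate the code and immediately upload it:
// Sketch: generate the Python for a code check, then upload it as a check.
const generated = await okareo.generate_check({
    project_id: project_id,
    description: "Return True if the model_output is at least 20 characters long, otherwise return False.",
    requires_scenario_input: false,
    requires_scenario_result: false,
    output_data_type: "bool",
});

await okareo.create_or_update_check({
    name: "output_length_check", // illustrative name
    description: generated.description,
    check_config: {
        type: CheckOutputType.PASS_FAIL,
        code_contents: generated.generated_code,
    },
});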
generate_scenario_set
Generate synthetic data based on a prior scenario. The seed scenario could be from a prior evaluation run, an upload, or statically defined.
- Usage
- Details
- Result
import { Okareo, ScenarioType } from "okareo-ts-sdk";

const okareo = new Okareo({ api_key: OKAREO_API_KEY });

const data: any = await okareo.generate_scenario_set({
    project_id: project_id,
    name: "EXAMPLE SCENARIO NAME",
    source_scenario_id: "SOURCE_SCENARIO_ID",
    number_examples: 2,
    generation_type: ScenarioType.REPHRASE_INVARIANT,
});
Takes a single argument, ScenarioSetGenerate.
async generate_scenario_set(props: components["schemas"]["ScenarioSetGenerate"]): Promise<components["schemas"]["ScenarioSetResponse"]> {
//...
}
import { components } from "okareo-ts-sdk";
// components["schemas"]["ScenarioSetGenerate"]
/** ScenarioSetGenerate */
ScenarioSetGenerate: {
/**
* Project Id
* Format: uuid
* @description ID for the project
*/
project_id?: string;
/**
* Source Scenario Id
* Format: uuid
* @description ID for the scenario set that the generated scenario set will use as a source
*/
source_scenario_id: string;
/**
* Name
* @description Name of the generated scenario set
*/
name: string;
/**
* Number Examples
* @description Number of examples to be generated for the scenario set
*/
number_examples: number;
/**
* @description Type of generation. Current supported scenario types are:<br />
* Seed: Seed data for a scenario set<br />
* Rephrase invariant: Results will be rephrased versions of inputs<br />
* Conditional: Results will be rephrased inputs represented in a conditional format<br />
* Text reverse question: The result will be the target question for the input<br />
* Text reverse label: The result will be the intent of the target question for the input
* @default REPHRASE_INVARIANT
*/
generation_type?: components["schemas"]["ScenarioType"];
/**
* @description Tone to use for scenario generation.
* @default Neutral
*/
generation_tone?: components["schemas"]["GenerationTone"];
};
import { components } from "okareo-ts-sdk";
//components["schemas"]["ScenarioSetResponse"]
/** ScenarioSetResponse */
ScenarioSetResponse: {
/**
* Scenario Id
* Format: uuid
*/
scenario_id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/**
* Time Created
* Format: date-time
*/
time_created: string;
/** Type */
type: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Name */
name?: string;
/**
* Seed Data
* @default []
*/
seed_data?: components["schemas"]["SeedData"][];
/**
* Scenario Count
* @default 0
*/
scenario_count?: number;
/**
* Scenario Input
* @default []
*/
scenario_input?: string[];
/**
* App Link
* @description This URL links to the Okareo webpage for this scenario set
* @default
*/
app_link?: string;
};
get_all_checks
Return the list of all available checks. The returned list includes both predefined Okareo checks and custom checks uploaded in association with your current organization.
- Usage
- Result
okareo.get_all_checks()
EvaluatorBriefResponse: {
/**
* Id
* Format: uuid
*/
id?: string;
/** Name */
name?: string;
/**
* Description
* @default
*/
description?: string;
/**
* Output Data Type
* @default
*/
output_data_type?: string;
/**
* Time Created
* Format: date-time
*/
time_created?: string;
/** Check Config */
check_config?: Record<string, never>;
};
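Checks are referenced by name when running tests and by ID in get_check, so a common convenience, sketched below with an illustrative check name, is looking a check up by name in the returned list:
const checks = await okareo.get_all_checks();
// Illustrative lookup: resolve a check's ID from its name.
const lengthCheck = checks.find((c: any) => c.name === "output_length_check");
console.log(lengthCheck?.id);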
get_check
Returns a detailed check response object. Useful if you have a check's ID and want to get more information about the check.
- Usage
- Result
okareo.get_check("<UUID-FOR-CHECK>")
EvaluatorDetailedResponse: {
/**
* Id
* Format: uuid
*/
id?: string;
/**
* Project Id
* Format: uuid
*/
project_id?: string;
/** Name */
name?: string;
/**
* Description
* @default
*/
description?: string;
/** Requires Scenario Input */
requires_scenario_input?: boolean;
/** Requires Scenario Result */
requires_scenario_result?: boolean;
/**
* Output Data Type
* @default
*/
output_data_type?: string;
/**
* Code Contents
* @default
*/
code_contents?: string;
/**
* Time Created
* Format: date-time
*/
time_created?: string;
/** Warning */
warning?: string;
/** Check Config */
check_config?: Record<string, never>;
};
run_test
Run a test directly from a registered model. This requires both a registered model and at least one scenario.
The run_test function is called on a registered model in the form model_under_test.run_test(...). If your model requires an API key to call, you will need to pass your key in the model_api_key parameter. Your API keys are not stored by Okareo.
Depending on size and complexity, model runs can take a long time to evaluate. Use scenarios appropriate in size to the task at hand.
- Classification
- Retrieval
- Generation
Read the Classification Overview to learn more about classification evaluations in Okareo.
// Classification evaluations return accuracy, precision, recall, and f1 scores.
const model_under_test = await okareo.register_model(...);
const test_run_response: any = await model_under_test.run_test({
name:"<YOUR_TEST_RUN_NAME>",
tags: [<OPTIONAL_ARRAY_OF_STRING_TAGS>],
project_id: project_id,
scenario_id:"<YOUR_SCENARIO_ID>",
model_api_key: "<YOUR_MODEL_API_KEY>", //Key for OpenAI, Cohere, Pinecone, QDrant, etc.,
calculate_metrics: true,
type: TestRunType.MULTI_CLASS_CLASSIFICATION,
} as RunTestProps);
/*
test_run_response: {
id:str,
project_id:str,
mut_id:str,
scenario_set_id:str,
name:str,
tags:Array[str],
type:'MULTI_CLASS_CLASSIFICATION',
start_time:Date,
end_time: Date,
test_data_point_count:int,
model_metrics: {
'weighted_average': {
'precision': float,
'recall': float,
'f1': float,
'accuracy': float
},
'scores_by_label': {
'label_1': {
'precision': float,
'recall': float,
'f1': float
},
...,
'label_N': {
'precision': float,
'recall': float,
'f1': float
},
}
},
error_matrix: [
{'label_1': [int, ..., int]},
...,
{'label_N': [int, ..., int]}
],
app_link: str
}
*/
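The metrics in the response can be read directly, for example to log or assert on the weighted averages. A small sketch against the shape above:
// Sketch: read the weighted-average metrics off the classification run.
const avg = test_run_response.model_metrics.weighted_average;
console.log(`accuracy=${avg.accuracy} precision=${avg.precision} f1=${avg.f1}`);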
Read the Retrieval Overview to learn more about retrieval evaluations in Okareo.
// Specify retrieval metrics and corresponding K values.
// Below, we use the same k_vals for all available metrics,
// but you can specify any subset of these metrics with
// different sets of K values to evaluate.
const k_max = 5;
const k_vals = [1, 2, 5, 7, 10];
const metrics_kwargs = {
"accuracy_at_k": k_vals,
"precision_recall_at_k": k_vals,
"ndcg_at_k": k_vals,
"mrr_at_k": k_vals,
"map_at_k": k_vals,
}
const model_under_test = await okareo.register_model(...);
const test_run_response: any = await model_under_test.run_test({
name:"<YOUR_TEST_RUN_NAME>",
project_id: project_id,
scenario_id:"<YOUR_SCENARIO_ID>",
type:TestRunType.INFORMATION_RETRIEVAL,
model_api_key: "<YOUR_MODEL_API_KEY>", //Key for OpenAI, Cohere, Pinecone, QDrant, etc.,
metrics_kwargs: metrics_kwargs,
} as RunTestProps);
/*
test_run_response: {
id:str,
project_id:str,
mut_id:str,
scenario_set_id:str,
name:str,
tags:Array[str],
type:'INFORMATION_RETRIEVAL',
start_time:Date,
end_time: Date,
test_data_point_count:int,
model_metrics: {
'Accuracy@k': {'1': float, ..., '5': float},
'Precision@k': {'1': float, ..., '5': float},
'Recall@k': {'1': float, ..., '5': float},
'NDCG@k': {'1': float, ..., '5': float},
'MRR@k': {'1': float, ..., '5': float},
'MAP@k': {'1': float, ..., '5': float},
'row_level_metrics': {
'<UUID-FOR-ROW-1>': {
'1': {'accuracy': float, 'precision': float, 'recall': float, 'mrr': float, 'ndcg': float, 'map': float},
...,
'5': {'accuracy': float, 'precision': float, 'recall': float, 'mrr': float, 'ndcg': float, 'map': float},
},
...,
'<UUID-FOR-ROW-N>': {
'1': {'accuracy': float, 'precision': float, 'recall': float, 'mrr': float, 'ndcg': float, 'map': float},
...,
'5': {'accuracy': float, 'precision': float, 'recall': float, 'mrr': float, 'ndcg': float, 'map': float},
}
}
},
error_matrix: [],
app_link: str
}
*/
To perform evaluations of generative models, you will need to specify your desired checks.
Read the Generation Overview to learn more about generation evaluations in Okareo.
const model_under_test = await okareo.register_model(...);
const test_run_response: any = await model_under_test.run_test({
model_api_key: "<YOUR_MODEL_API_KEY>", //Key for OpenAI, Cohere, Pinecone, QDrant, etc.,
name:"<YOUR_TEST_RUN_NAME>",
tags: [<OPTIONAL_ARRAY_OF_STRING_TAGS>],
project_id: project_id,
scenario_id:"<YOUR_SCENARIO_ID>",
calculate_metrics: true,
checks: ['CHECK_NAME_1', ..., 'CHECK_NAME_N'],
type: TestRunType.NL_GENERATION,
} as RunTestProps);
/*
test_run_response: {
id:str,
project_id:str,
mut_id:str,
scenario_set_id:str,
name:str,
tags:Array[str],
type:'NL_GENERATION',
start_time:Date,
end_time: Date,
test_data_point_count:int,
model_metrics: {
'mean_scores': {
'CHECK_NAME_1' : float,
...,
'CHECK_NAME_N': float,
},
'scores_by_row': [
{
'scenario_index': 1,
'test_id': "UUID-FOR-ROW-1",
'CHECK_NAME_1': float,
...,
'CHECK_NAME_N': float,
},
...,
{
'scenario_index': M,
'test_id': "UUID-FOR-ROW-M",
'CHECK_NAME_1': float,
...,
'CHECK_NAME_N': float,
}
]
},
error_matrix: [],
app_link: str
}
*/
ScenarioType
// import { ScenarioType } from "okareo-ts-sdk";
export declare enum ScenarioType {
COMMON_CONTRACTIONS = "COMMON_CONTRACTIONS",
COMMON_MISSPELLINGS = "COMMON_MISSPELLINGS",
CONDITIONAL = "CONDITIONAL",
LABEL_REVERSE_INVARIANT = "LABEL_REVERSE_INVARIANT",
NAMED_ENTITY_SUBSTITUTION = "NAMED_ENTITY_SUBSTITUTION",
NEGATION = "NEGATION",
REPHRASE_INVARIANT = "REPHRASE_INVARIANT",
ROUNDTRIP_INVARIANT = "ROUNDTRIP_INVARIANT",
SEED = "SEED",
TERM_RELEVANCE_INVARIANT = "TERM_RELEVANCE_INVARIANT",
TEXT_REVERSE_LABELED = "TEXT_REVERSE_LABELED",
TEXT_REVERSE_QUESTION = "TEXT_REVERSE_QUESTION"
}
Okareo has multiple synthetic data generators. We have provided details about each generator type below:
Common Contractions
ScenarioType.COMMON_CONTRACTIONS
Each input in the scenario will be shortened by 1 or 2 characters. For example, if the input is "What is a steering wheel?", the generated input could be "What is a steering whl?".
Common Misspellings
ScenarioType.COMMON_MISSPELLINGS
Common misspellings of the inputs will be generated. For example, if the input is "What is a receipt?", the generated input could be "What is a reciept?".
Conditional
ScenarioType.CONDITIONAL
Each input in the scenario will be rephrased as a conditional statement. For example, if the input is "What are the side effects of this medicine?", the generated input could be "Considering this medicine, what might be the potential side effects?".
Rephrase
ScenarioType.REPHRASE_INVARIANT
Rephrasings of the inputs will be generated. For example, if the input is "Neil Alden Armstrong was an American astronaut and aeronautical engineer who in 1969 became the first person to walk on the Moon", the generated input could be "Neil Alden Armstrong, an American astronaut and aeronautical engineer, made history in 1969 as the first individual to set foot on the Moon."
Reverse Question
ScenarioType.TEXT_REVERSE_QUESTION
Each input in the scenario will be rephrased as a question that the input should be the answer for. For example, if the input is "The first game of baseball was played in 1846.", the generated input could be "When was the first game of baseball ever played?".
Seed
ScenarioType.SEED
The simplest of all generators. It does nothing. A true NoOp.
Term Relevance
ScenarioType.TERM_RELEVANCE_INVARIANT
Each input in the scenario will be rephrased to only include the most relevant terms, where relevance is based on the list of inputs provided to the scenario. Parts of speech are then used to determine a valid ordering of the relevant terms. For example, if the inputs are all names of various milk teas, such as "Cool Sweet Honey Taro Milk Tea with Brown Sugar Boba", the generated input could be "Taro Milk Tea", since "Taro", "Milk", and "Tea" could be the most relevant terms.
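Any of these generators can be supplied as the generation_type when expanding an existing scenario. A sketch using the misspellings generator (the name and source ID are illustrative):
// Sketch: expand an existing scenario with common misspellings.
const misspelled = await okareo.generate_scenario_set({
    project_id: project_id,
    name: "Seed Questions - Misspelled", // illustrative name
    source_scenario_id: seed_scenario_id, // illustrative source scenario
    number_examples: 2,
    generation_type: ScenarioType.COMMON_MISSPELLINGS,
});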
get_scenario_sets
Return one or more scenario sets based on the project_id, or a specific project_id + scenario_id pair.
- Usage
- Details
- Result
import { Okareo, components } from "okareo-ts-sdk";
const okareo = new Okareo({api_key:OKAREO_API_KEY});
const project_id = "YOUR_PROJECT_ID";
const scenario_id = "YOUR_SCENARIO_ID";
const all_scenarios = await okareo.get_scenario_sets({ project_id });
// or
const specific_scenario = await okareo.get_scenario_sets({ project_id, scenario_id });
Takes two arguments, project_id and scenario_id.
project_id: string
scenario_id: string // not required
import { components } from "okareo-ts-sdk";
// components["schemas"]["ScenarioSetResponse"][]
/** ScenarioSetResponse */
ScenarioSetResponse: {
/**
* Scenario Id
* Format: uuid
*/
scenario_id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/**
* Time Created
* Format: date-time
*/
time_created: string;
/** Type */
type: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Name */
name?: string;
/**
* Seed Data
* @default []
*/
seed_data?: components["schemas"]["SeedData"][];
/**
* Scenario Count
* @default 0
*/
scenario_count?: number;
/**
* Scenario Input
* @default []
*/
scenario_input?: string[];
/**
* App Link
* @description This URL links to the Okareo webpage for this scenario set
* @default
*/
app_link?: string;
/** Warning */
warning?: string;
};
get_scenario_data_points
Return each of the scenario data points in a single scenario set.
- Usage
- Details
- Result
import { Okareo, components } from "okareo-ts-sdk";
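A minimal usage sketch:
const okareo = new Okareo({ api_key: OKAREO_API_KEY });
const scenario_points = await okareo.get_scenario_data_points("<YOUR_SCENARIO_ID>");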
async get_scenario_data_points(scenario_id: string): Promise<components["schemas"]["ScenarioDataPoinResponse"][]> {
//...
}
Takes a single argument, scenario_id.
scenario_id: string
import { components } from "okareo-ts-sdk";
// components["schemas"]["ScenarioDataPoinResponse"]
/** ScenarioDataPoinResponse */
ScenarioDataPoinResponse: {
/**
* Id
* Format: uuid
*/
id: string;
/** Input */
input: Record<string, never> | unknown[] | string;
/** Result */
result: Record<string, never> | unknown[] | string;
/**
* Meta Data
* Format: json-string
*/
meta_data?: string;
}
get_test_run
Return a previously run test. This is useful for "hill-climbing", where you look at a prior run, make changes, and re-run, or when you want to baseline the current run against the last.
- Usage
- Details
- Result
import { Okareo, components } from "okareo-ts-sdk";
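A minimal usage sketch:
const okareo = new Okareo({ api_key: OKAREO_API_KEY });
const prior_run = await okareo.get_test_run("<YOUR_TEST_RUN_ID>");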
async get_test_run(test_run_id: string): Promise<components["schemas"]["TestRunItem"]> {
//...
}
Takes a single argument, test_run_id.
test_run_id: string
import { components } from "okareo-ts-sdk";
// components["schemas"]["TestRunItem"]
/** TestRunItem */
TestRunItem: {
/**
* Id
* Format: uuid
*/
id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/**
* Mut Id
* Format: uuid
*/
mut_id: string;
/**
* Scenario Set Id
* Format: uuid
*/
scenario_set_id: string;
/** Name */
name?: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Type */
type?: string;
/**
* Start Time
* Format: date-time
*/
start_time?: string;
/**
* End Time
* Format: date-time
*/
end_time?: string;
/** Test Data Point Count */
test_data_point_count?: number;
/** Model Metrics */
model_metrics?: Record<string, never>;
/** Error Matrix */
error_matrix?: unknown[];
/**
* App Link
* @description This URL links to the Okareo webpage for this test run
* @default
*/
app_link?: string;
};
register_model
Register the model that you want to evaluate, test or collect datapoints from. Models must be uniquely named within a project namespace.
In order to run a test, you will need to register a model. The first time a model is registered, its attributes are persisted. If you register a model with the same name again, the existing model is returned; the definition is only updated if the update: true flag is passed.
- Usage
- Details
- Result
import { Okareo, CustomModel } from "okareo-ts-sdk";
const okareo = new Okareo({api_key:OKAREO_API_KEY});
const model_under_test = await okareo.register_model({
name: "Example Custom Model",
tags: ["Custom", "End-2-End"],
project_id: project_id,
models: {
type: "custom",
invoke: (input: string) => {
return {
actual: "Technical Support",
model_response: {
input: input,
method: "hard coded",
context: "Example context response",
}
}
}
} as CustomModel
});
The passed properties for the register function are based on the type of model being used. See the model types for more information.
async register_model(props: any): Promise<components["schemas"]["ModelUnderTestResponse"]> {
//...
}
interface BaseModel {
type?: string | undefined;
tags?: string[] | undefined;
}
export interface TCustomModel extends BaseModel {
invoke: Function;
}
export interface TCustomModelResponse {
actual: any | string;
response: any | string;
}
// components["schemas"]["ModelUnderTestResponse"]
/** ModelUnderTestResponse */
ModelUnderTestResponse: {
/**
* Id
* Format: uuid
*/
id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/** Name */
name: string;
/** Tags */
tags: string[];
/** Time Created */
time_created: string;
/** Datapoint Count */
datapoint_count?: number;
/**
* App Link
* @description This URL links to the Okareo webpage for this model
* @default
*/
app_link?: string;
};
Okareo has ready-to-run integrations with the following models and vector databases. Don't hesitate to reach out if you need another model.
OpenAI (LLM)
import { OpenAIModel } from 'okareo-ts-sdk';
interface OpenAIModel extends BaseModel {
type: "openai";
model_id: string;
temperature: number;
system_prompt_template: string;
user_prompt_template: string;
dialog_template: string;
tools?: unknown[];
}
Generation Model (LLM)
import { GenerationModel } from 'okareo-ts-sdk';
interface GenerationModel extends BaseModel {
type: "generation";
model_id: string;
temperature: number;
system_prompt_template: string;
user_prompt_template: string;
dialog_template: string;
tools?: unknown[];
}
The GenerationModel is a universal LLM interface that supports most model providers. Users can plug in different model names, including OpenAI, Anthropic, and Cohere models.
Example using Cohere model with GenerationModel:
import { GenerationModel } from 'okareo-ts-sdk';
const cohereModel: GenerationModel = {
type: "generation",
model_id: "command-r",
temperature: 0.7,
system_prompt_template: "You are a helpful assistant.",
};
Example with tools:
import { GenerationModel } from 'okareo-ts-sdk';
const tools = [
{
type: "function",
function: {
name: "get_current_weather",
description: "Get the current weather in a given location",
parameters: {
type: "object",
properties: {
location: {
type: "string",
description: "The city and state, e.g. San Francisco, CA"
},
unit: {
type: "string",
enum: ["celsius", "fahrenheit"]
}
},
required: ["location"]
}
}
}
];
const modelWithTools: GenerationModel = {
type: "generation",
model_id: "gpt-3.5-turbo-0613",
temperature: 0.7,
system_prompt_template: "You are a helpful assistant with access to weather information.",
tools: tools
};
In these examples, we're using the Cohere "command-r" model and the OpenAI "gpt-3.5-turbo-0613" model through the GenerationModel interface. The second example demonstrates how to include tools, which can be used for function calling capabilities.
Pinecone (VectorDB)
//import { TPineconeDB } from "okareo-ts-sdk";
export interface TPineconeDB extends BaseModel {
type?: string | undefined; // from BaseModel
tags?: string[] | undefined; // from BaseModel
index_name: string;
region: string;
project_id: string;
top_k: string;
}
QDrant (VectorDB)
//import { TQDrant } from "okareo-ts-sdk";
export interface TQDrant extends BaseModel {
type?: string | undefined; // from BaseModel
tags?: string[] | undefined; // from BaseModel
collection_name: string;
url: string;
top_k: string;
}
Custom Model
You can use the CustomModel object to define your own custom, provider-agnostic models.
//import { TCustomModel } from "okareo-ts-sdk";
export interface TCustomModel extends BaseModel {
invoke(input: string): {
actual: any | string;
model_response: {
input: any | string;
method: any | string;
context: any | string;
}
};
}
To use the CustomModel object, you will need to implement an invoke method that returns a ModelInvocation object. For example,
import { CustomModel, ModelInvocation } from "okareo-ts-sdk";
const my_custom_model: CustomModel = {
type: "custom",
invoke: (input: string) => {
// your model's invoke logic goes here
return {
model_prediction: ...,
model_input: input,
model_output_metadata: {
prediction: ...,
other_data_1: ...,
other_data_2: ...,
...,
},
tool_calls: ...
} as ModelInvocation
}
}
Where the ModelInvocation's inputs are defined as follows:
export interface ModelInvocation {
/**
* Prediction from the model to be used when running the evaluation,
* e.g. predicted class from classification model or generated text completion from
* a generative model. This would typically be parsed out of the overall model_output_metadata
*/
model_prediction?: Record<string, any> | unknown[] | string;
/**
* All the input sent to the model
*/
model_input?: Record<string, any> | unknown[] | string;
/**
* Full model response, including any metadata returned with model's output
*/
model_output_metadata?: Record<string, any> | unknown[] | string;
/**
* List of tool calls made during the model invocation, if any
*/
tool_calls?: any[];
}
The logic of your invoke method depends on many factors, chief among them the intended TestRunType of the CustomModel. Below, we highlight an example of how to use CustomModel for each TestRunType in Okareo.
- Classification
- Retrieval
- Generation
The following CustomModel classification example is taken from the custommodel.test.ts script. This model always returns "Technical Support" as the model_prediction.
const classificationModel: CustomModel = {
    type: "custom",
    invoke: (input: string) => {
        return {
            model_prediction: "Technical Support",
            model_input: input,
            model_output_metadata: {
                input: input,
                method: "hard coded",
                context: "Example context"
            }
        } as ModelInvocation
    }
};
Okareo natively supports Pinecone and QDrant models for retrieval. If you want to utilize a different model provider or database, you can use CustomModel to do so.
The following CustomModel retrieval example is taken from the custommodel.test.ts script. This example assigns random scores to a random subset of articleIds and returns the (id, score) pairs as the model's prediction.
import { CustomModel, ModelInvocation } from "okareo-ts-sdk";
const retrievalModel: CustomModel = {
    type: "custom",
    invoke: (input: string) => {
        const articleIds = ["Spring Saver", "Free Shipping", "Birthday Gift", "Super Sunday", "Top 10", "New Arrivals", "January", "July"];
        const scores = Array.from({ length: 5 }, () => ({
            id: articleIds[Math.floor(Math.random() * articleIds.length)], // Select a random ID for each score
            score: parseFloat(Math.random().toFixed(2)) // Generate a random score
        })).sort((a, b) => b.score - a.score); // Sort based on the score
        const parsedIdsWithScores = scores.map(({ id, score }) => [id, score]);
        return {
            model_prediction: parsedIdsWithScores,
            model_input: input,
            model_output_metadata: {
                input: input,
            }
        } as ModelInvocation
    }
};
Okareo natively supports most model providers for generation through GenerationModel. If you want to utilize a different model provider or endpoint, you can use CustomModel to do so.
The following snippet makes a POST request to a generic model provider that can be accessed via an API.
// API key from your desired model provider
const API_KEY = "<YOUR_API_KEY>";
// URL for the API endpoint that calls your model
const MODEL_URL = "<YOUR_MODEL_URL>";

const generationModel: CustomModel = {
    type: "custom",
    invoke: async (input: string) => {
        // format the input as messages as required by the API
        // here we assume messages are sent to the model as a list
        // i.e., [{'role': ..., 'content': ...}, ...]
        const messages = [{ "user": input }];
        const payload = {
            messages: messages
        };
        const headers = {
            "accept": "application/json",
            "content-type": "application/json",
            "Authorization": `Bearer ${API_KEY}`
        };
        const response = await fetch(MODEL_URL, {
            method: 'POST',
            headers: headers,
            body: JSON.stringify(payload)
        });
        const fullModelOutput = await response.json();
        const generatedResponse = fullModelOutput.messages[fullModelOutput.messages.length - 1].content;
        return {
            model_prediction: generatedResponse,
            model_input: input,
            model_output_metadata: fullModelOutput,
            tool_calls: ...,
        } as ModelInvocation
    }
};
MultiTurnDriver
A MultiTurnDriver allows you to evaluate a language model over the course of a full conversation. The MultiTurnDriver is made up of two pieces: a Driver and a Target.
The Driver is defined in your MultiTurnDriver, while your Target is defined as either a CustomMultiturnTarget or a GenerationModel.
// import { MultiTurnDriver, StopConfig } from "okareo-ts-sdk"
export interface MultiTurnDriver extends BaseModel {
    type: "driver";
    target: GenerationModel | CustomMultiturnTarget;
    driver_temperature?: number; // default 0.8
    max_turns?: bigint; // default 5
    repeats?: bigint; // default 1
    first_turn?: string; // default "target"
    stop_check: StopConfig;
}
Driver
The possible parameters for the Driver are:
driver_temperature: number = 1.0
max_turns: bigint = 5
repeats: bigint = 1
first_turn: string = "target"
stop_check: StopConfig
driver_temperature defines the temperature used in the model that simulates a user.
max_turns defines the maximum number of back-and-forth interactions that can occur in the conversation.
repeats defines how many times each row in a scenario will be run when a model is run with run_test. Since the Driver is non-deterministic, repeating the same row of a scenario can lead to different conversations.
first_turn defines whether the Target or the Driver sends the first message in the conversation.
stop_check defines when the conversation should stop. It requires a check name and a boolean value defining whether the conversation stops on a true or a false value returned from the check.
Target
A Target is either a GenerationModel or a CustomMultiturnTarget. Refer to GenerationModel for details on GenerationModel.
The only exception to the standard usage is that a system_prompt_template is required when using a MultiTurnDriver. The system_prompt_template defines the system prompt for how the Target should behave.
A CustomMultiturnTarget is defined in largely the same way as a CustomModel. The key difference is that the input is a list of messages in OpenAI's message format, as sketched below.
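A sketch of a CustomMultiturnTarget under that message-format assumption; the type tag and reply logic are illustrative stand-ins, not the SDK's exact contract:
// Sketch: a target that always answers the latest user message.
const customTarget = {
    type: "custom_target", // assumed tag; check the SDK typings for the exact value
    invoke: (messages: { role: string; content: string }[]) => {
        const lastUserMessage = messages[messages.length - 1].content;
        return {
            model_prediction: `You asked: ${lastUserMessage}`, // stand-in reply
            model_input: messages,
            model_output_metadata: {},
        } as ModelInvocation;
    }
} as CustomMultiturnTarget;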
Driver and Target Interaction
The Driver simulates user behavior, while the Target represents the AI model being tested. This setup allows for testing complex scenarios and evaluating the model's performance over extended conversations.
Setting up a scenario
Scenarios in MultiTurnDriver are crafted using SeedData, where the input field serves as a driver prompt. The driver prompt instructs the simulated user (Driver) on how to behave throughout the conversation, including specific questions to ask, responses to give, and even how to react to the model's function calls. This creates a controlled yet dynamic testing environment for evaluating the model's performance across realistic interaction patterns.
const seedData: SeedData[] = [
{
input: "You are interacting with a customer service agent. First, ask about WebBizz...",
result: "N/A",
},
// ... more seed data
];
Tools and Function Calling
The Target model can be equipped with tools, which are essentially functions the model can call. For instance:
const tools = [
{
type: "function",
function: {
name: "delete_account",
description: "Deletes the user's account",
// ... parameter details
},
}
];
These tools allow the model to perform specific actions, like deleting a user account in this case.
Mocking Tool Results
The driver prompt can be used to mock the results of tool calls. This is crucial for testing how the model responds to different outcomes without actually performing the actions. For example:
const input = `... If you receive any function calls, output the result in JSON format
and provide a JSON response indicating that the deletion was successful.`;
This prompt instructs the Driver to simulate a successful account deletion when the function is called.
Checks and Conversation Control
Checks are used to evaluate specific aspects of the conversation or to control its flow. For instance:
const stopCheck: StopConfig = {
check_name: "task_completion_delete_account",
stop_on: true,
};
This configuration stops the conversation when the account deletion task is completed.
Custom checks can be created to evaluate various aspects of the conversation:
okareo.create_or_update_check({
    name: 'task_completion_delete_account',
    description: "Check if the agent confirms account deletion",
    check_config: {
        type: CheckOutputType.PASS_FAIL,
        prompt_template: "..." // the judge prompt describing the desired behavior
    }
});
These checks can assess task completion, adherence to guidelines, or any other relevant criteria.
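Tying the pieces together, the following sketch registers a MultiTurnDriver whose Target is a GenerationModel and whose stop_check is the configuration above; the names and values are illustrative:
// Sketch: register a MultiTurnDriver with a GenerationModel Target.
const driver_model = await okareo.register_model({
    name: "Customer Service MultiTurn", // illustrative name
    project_id: project_id,
    models: {
        type: "driver",
        driver_temperature: 0.8,
        max_turns: 5,
        repeats: 1,
        first_turn: "target",
        stop_check: stopCheck, // defined above
        target: {
            type: "generation",
            model_id: "gpt-3.5-turbo",
            temperature: 0,
            system_prompt_template: "You are a customer service agent for WebBizz.", // required for Targets
        } as GenerationModel,
    } as MultiTurnDriver,
});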
upload_scenario_set
Batch upload JSONL-formatted data to create a scenario set. This is the most efficient method for pushing large data sets for tests and evaluations. (A sample file is sketched at the end of this section.)
- Usage
- Details
- Result
import { Okareo } from "okareo-ts-sdk";
const okareo = new Okareo({api_key:OKAREO_API_KEY});
const data: any = await okareo.upload_scenario_set(
{
file_path: "example_data/seed_data.jsonl",
scenario_name: "Uploaded Scenario Set",
project_id: project_id
}
);
Takes a single argument, UploadScenarioSetProps, which includes the project_id, scenario_name, and file_path.
async upload_scenario_set(props: UploadScenarioSetProps): Promise<components["schemas"]["ScenarioSetResponse"]> {
//...
}
export interface UploadScenarioSetProps {
project_id: string;
scenario_name: string;
file_path: string;
}
/** ScenarioSetResponse */
ScenarioSetResponse: {
/**
* Scenario Id
* Format: uuid
*/
scenario_id: string;
/**
* Project Id
* Format: uuid
*/
project_id: string;
/**
* Time Created
* Format: date-time
*/
time_created: string;
/** Type */
type: string;
/**
* Tags
* @default []
*/
tags?: string[];
/** Name */
name?: string;
/**
* Seed Data
* @default []
*/
seed_data?: components["schemas"]["SeedData"][];
/**
* Scenario Count
* @default 0
*/
scenario_count?: number;
/**
* Scenario Input
* @default []
*/
scenario_input?: string[];
/**
* App Link
* @description This URL links to the Okareo webpage for this scenario set
* @default
*/
app_link?: string;
};
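Each line of the uploaded .jsonl file is one seed row. A sketch of what example_data/seed_data.jsonl might contain, assuming the standard input/result seed shape:
{"input": "Example input to be sent to the model", "result": "Expected result from the model"}
{"input": "Another example input", "result": "Another expected result"}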
Reporters
The Okareo TypeScript SDK includes a set of reporters. Reporters allow you to get rapid feedback in CI or locally from the command line.
Reporters are convenience functions that interpret evaluations based on thresholds that you provide. Reporters are not persisted and do not alter the evaluation; they are simply conveniences for rapid summarization locally and in CI.
Singleton Evaluation Reporters
There are two categories of reporters. The singleton reporters are based on specific evaluation types and can report on each. You can set thresholds specific to classification, retrieval, or generation and the reporters will provide detailed pass/fail information. The second category provides trend information. The history reporter takes a list of evaluations along with a threshold instance and returns a table of results over time.
Class ClassificationReporter
The classification reporter takes the evaluated metrics and the confusion matrix and returns a pass/fail, a count of errors, and the specific metrics that fail.
By convention we define the reporter thresholds independently. This way we can re-use them in trend analysis and across evaluations.
Example input and response for a passing test:
- Input
- Response
import { ClassificationReporter } from "okareo-ts-sdk";
/*
... body of evaluation
*/
const eval_thresholds = {
error_max: 8,
metrics_min: {
precision: 0.95,
recall: 0.9,
f1: 0.9,
accuracy: 0.95
}
}
const reporter = new ClassificationReporter({
eval_run:classification_run,
...eval_thresholds,
});
reporter.log(); //provides a table of results
/*
// do something if it fails
if (!reporter.pass) { ... }
*/
interface ClassificationReporterResponse {
pass: boolean;
errors: number;
fail_metrics: {
min: {
[key: string]: {
metric: string,
value: number,
expected: number,
}
}
}
}
Response Example
/* Success */
{
pass: true,
errors: 0,
fail_metrics: { }
}
/* Failure */
{
pass: false,
errors: 6,
fail_metrics: {
precision: { metric: 'precision', value: 0.75, expected: 0.95 },
f1: { metric: 'f1', value: 0.7333333333333333, expected: 0.9 }
}
}
Class RetrievalReporter
The retrieval reporter provides a shortcut for metrics @k. Each metric can reference a different k value. The result of the report is always in summary form and only returns the metrics that miss their thresholds.
Example input and response for a failing test:
- Input
- Response
import { retrieval_reporter } from "okareo-ts-sdk";
/*
... body of evaluation
*/
const report = retrieval_reporter(
{
eval_run:data, // data from a retrieval run
metrics_min: {
'Accuracy@k': {
value: 0.96,
at_k: 3
},
'Precision@k': {
value: 0.5,
at_k: 1 // can use different k values by metric
},
'Recall@k': {
value: 0.8,
at_k: 2 // can use different k values by metric
},
'NDCG@k': {
value: 0.2,
at_k: 3
},
'MRR@k': {
value: 0.96,
at_k: 3
},
'MAP@k': {
value: 0.96,
at_k: 3
}
}
}
);
expect(report.pass).toBeTruthy(); // example report assertion
interface RetrievalReporterResponse {
pass: boolean;
errors: number;
fail_metrics: {
min: {
[key: string]: {
metric: string,
value: number,
expected: number,
k: number;
}
}
}
}
Response Example
/* Success */
{
pass: true,
errors: 0,
fail_metrics: { }
}
/* Failure */
{
pass: false,
errors: 52,
fail_metrics: {
'MRR@k': {
metric: 'MRR@k',
k: 3,
value: 0.8833333333333332,
expected: 0.99
},
'MAP@k': {
metric: 'MAP@k',
k: 3,
value: 0.8833333333333332,
expected: 0.99
}
}
}
Class GenerationReporter
The generation reporter takes an arbitrary list of metric name:value pairs and reports on results that did not meet the minimum threshold you define. Often these metrics are unique to your circumstance. Boolean values are treated as 0 or 1.
Example input and response for a failing test:
- Input
- Response
import { generation_reporter } from "okareo-ts-sdk";
/*
... body of evaluation
*/
const report = generation_reporter(
{
eval_run:data,
metrics_min: {
coherence: 4.9,
consistency: 3.2,
fluency: 4.7,
relevance: 4.3,
overall: 4.1
}
}
);
expect(report.pass).toBeTruthy(); // example report assertion
interface GenerationReporterResponse {
pass: boolean;
errors: number;
fail_metrics: {
min: {
[key: string]: {
metric: string,
value: number,
expected: number,
}
},
max: {
[key: string]: {
metric: string,
value: number,
expected: number,
}
},
pass_rate: {
[key: string]: {
metric: string,
value: number,
expected: number,
}
}
}
}
Response Example
/* Success */
{
pass: true,
errors: 0,
fail_metrics: { }
}
/* Failure */
{
pass: false,
errors: 24,
fail_metrics: {
coherence: { metric: 'coherence', value: 3.6041134314518946, expected: 4.9 },
fluency: { metric: 'fluency', value: 2.0248845922814245, expected: 4.7 }
}
}
History Reporter
The second category of reporter provides historical information based on a series of test runs. Like the singletons, each reporter analyzes a single evaluation type at a time. However the mechanism is shared across all types.
Class EvaluationHistoryReporter
The EvaluationHistoryReporter requires four inputs: the evaluation type, the list of evals, the assertions, and the number of runs to render. The type must be one of the Okareo TestRunType definitions. The assertions are shared with the singleton reporters.
By convention, we define the reporter thresholds independently; being able to re-use them between singleton reports and historical reports is one of the many reasons.
Classification Report
Retrieval Report
Generation Report
- Usage
const history_class = new EvaluationHistoryReporter({
type: TestRunType.MULTI_CLASS_CLASSIFICATION,
evals:[TEST_RUN_CLASSIFICATION as components["schemas"]["TestRunItem"], TEST_RUN_CLASSIFICATION as components["schemas"]["TestRunItem"]],
assertions: class_metrics,
last_n: 5,
});
history_class.log();
Exporting Reports for CI
Class JSONReporter
When using Okareo as part of a CI run, it is useful to export evaluations into a common location that can be picked up by the CI analytics.
By using JSONReporter.log([eval_run, ...]) after each evaluation, Okareo will collect the JSON results in ./.okareo/reports. The location can be controlled as part of the CLI with the -r LOCATION or --report LOCATION parameters. The output JSON is useful in CI for historical reference.
JSONReporter.log([eval_run, ...]) will output to the console unless the evaluation is initiated by the CLI.
- Usage
import { JSONReporter } from 'okareo-ts-sdk';
const reporter = new JSONReporter([eval_run]);
reporter.log();