
Terminology

Welcome to the Okareo Terminology page! Here, we provide a comprehensive list of key terms and concepts that are essential for understanding our platform and its capabilities in Agent simulation, LLM evaluation, and LLM Error Tracking. Whether you are a Developer, Product Manager, or Subject Matter Expert, familiarizing yourself with this terminology will help orient you within the Okareo ecosystem.

Why Terminology Matters

Why isn't this page in an appendix somewhere? In the rapidly evolving field of next-generation application experiences that use artificial intelligence, machine learning, and language models, precise language is crucial. However, many of the most common phrases are unclear or are used very broadly. Agents. Need we say more?

  • Clarity: Each term is defined in straightforward language, making it accessible to users of all backgrounds.
  • Context: We provide context around each term, explaining how it relates to Okareo's functionalities and the broader landscape of AI evaluation.
  • Collaboration: Understanding these terms fosters better communication among team members, enabling more effective collaboration on AI projects.
  • Current: If we missed something, let us know. We will add it. support@okareo.com

As you explore the terminology, feel free to refer back to this page whenever you encounter a term that needs clarification. Our goal is to empower you with the knowledge you need to navigate the Okareo platform confidently and effectively.

Okareo Terminology

  • Agent: Okareo considers any application utilizing a language model that interacts with external input an Agent. Based on this, not all Agents are Agentic. With the rise of large context windows, chain of thought, and specialized language models, many single-prompt applications that are not strictly Agentic still provide nuanced experiences that require unique tooling to evaluate and track errors.

  • Agentic: Autonomous decision-making entities capable of engaging in multiturn conversations and/or executing tasks through function calls are Agentic. These agentic systems leverage the capabilities of Large Language Models (LLMs) to understand user intent and respond appropriately, often forming part of a larger agent network where one primary agent delegates tasks to specialized subagents.

  • Evaluation: The process of assessing the performance of an LLM by comparing its outputs against expected results or benchmarks to ensure accuracy and reliability.

  • Model: The Okareo internal representation of the parameters necessary to call an inference endpoint. This includes the provider, the model type (GenAI, Classification, Embedding, etc.), any necessary model attributes (temperature, etc.), and even the system prompt. All models are versioned.

  • Prompt Versioning: When a prompt is associated with a model, by definition the prompt is automatically versioned. The benefit of this approach is that the model configuration data crucial to the prompt is stored with the prompt version. The same prompt used on two different versions of the same model may require different temperature and other settings. This approach keeps each distinct.
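The idea of storing model configuration alongside each prompt version can be sketched as a simple data structure. Note this is an illustrative sketch only: `PromptVersion` and its fields are hypothetical names, not the actual Okareo SDK types.

```python
from dataclasses import dataclass

# Hypothetical type for illustration; not the real Okareo schema.
@dataclass(frozen=True)
class PromptVersion:
    prompt: str          # the prompt text itself
    model: str           # the model version this prompt version is pinned to
    temperature: float   # configuration stored WITH the prompt version
    version: int

# The same prompt text, pinned to two model versions that need different settings.
v1 = PromptVersion("Summarize the support ticket.", "gpt-4o-2024-05-13", 0.2, 1)
v2 = PromptVersion("Summarize the support ticket.", "gpt-4o-2024-08-06", 0.7, 2)
```

Because the settings travel with the version, swapping model versions never silently reuses a temperature tuned for a different model.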

  • Simulation: Use of synthetic personas that drive interactions with a target agent for the purpose of evaluation. This involves generating variations of real user interactions to test how models respond under different conditions, allowing for robust offline evaluations and comparisons of model behavior before deployment. Simulations can be used to assess model readiness, identify potential issues, and refine performance through iterative testing.

  • Persona: The discrete prompt and objective used to drive a conversation with an agent in a simulation. Personas are stored as the input of a scenario.

  • Driver: A component of simulations that interprets a Persona and synthetically generates the appropriate next interaction with an agent in a simulation.

  • Target: The Okareo representation of the Agent in a simulation. The target may be in the form of an API endpoint or a prompt. Simulations are intentionally at-arms-length from the actual entity being interacted with. Simulations can interact with data pipelines, chatbots, email generators and much more.
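The Persona, Driver, and Target entries above fit together as a loop: the Driver reads the Persona, produces the next user turn, and sends it to the Target. The following is a minimal control-flow sketch with stand-in functions; in a real simulation the Driver uses an LLM to act out the persona, and the Target is an API endpoint or prompt, not the echo shown here.

```python
# Hypothetical sketch of the Persona -> Driver -> Target loop.
# Function names and the persona structure are illustrative, not Okareo APIs.

def driver_next_turn(persona: dict, history: list) -> str:
    # A real Driver generates the next turn from the persona's objective;
    # here we replay scripted turns to keep the sketch self-contained.
    user_turns_so_far = len([m for m in history if m["role"] == "user"])
    return persona["script"][user_turns_so_far]

def call_target(message: str) -> str:
    # Stand-in for the Target (an API endpoint or prompt) being simulated against.
    return f"echo: {message}"

persona = {
    "objective": "cancel a subscription",
    "script": ["Hi, I want to cancel my plan.", "Yes, cancel it now."],
}

history = []
for _ in range(len(persona["script"])):
    user_msg = driver_next_turn(persona, history)
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": call_target(user_msg)})
```

The resulting `history` is the multiturn transcript that an Evaluation then scores.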

  • Observability: The practice of collecting and analyzing telemetry data from LLM applications to understand their performance and behavior, which aids in evaluation and improvement. This is the functional underpinning of Error Tracking. However, as the name implies, it is passive logging.

  • Error Tracking: The process of monitoring and recording errors or anomalies in LLM outputs. This helps identify issues such as hallucinations, biases, gaps in factuality, or agentic errors, allowing teams to rapidly address the discovered issue or improve model performance.

  • Monitors: LLM traffic is automatically evaluated through monitors. Each monitor defines a specific set of conditions and the checks to use when the conditions are met. A single LLM completion may be part of more than one monitor. Alerts and notifications are defined through the combination of a monitor, a set of checks, and a notification channel.
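Conceptually, a monitor pairs a condition with checks and a notification channel, and each completion is tested against every monitor's condition. A toy sketch of that matching step, with all field names invented for illustration:

```python
# Hypothetical monitor definition; field names are illustrative, not the
# actual Okareo configuration schema.
monitor = {
    "name": "support-bot-quality",
    "condition": lambda record: record.get("route") == "/support",  # which traffic applies
    "checks": ["coherence", "fluency"],          # checks to run when the condition is met
    "notify": "slack:#llm-alerts",               # notification channel
}

completions = [
    {"route": "/support", "output": "Sure, I can help with that."},
    {"route": "/billing", "output": "Your invoice has been sent."},
]

# Only completions matching the monitor's condition are evaluated by its checks.
matched = [c for c in completions if monitor["condition"](c)]
```

A single completion can match several monitors, so the same traffic may be scored by multiple sets of checks.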

  • Checks: Specific metrics or criteria used to evaluate the output of an LLM. These can be standard checks (like coherence or fluency) or custom checks tailored to specific use cases.

  • Custom Checks: User-defined evaluation criteria that allow for tailored assessments of LLM outputs, enabling developers to focus on specific behaviors or requirements.
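A custom check is, at its core, a function over a model output that returns a score or pass/fail verdict. The signature below is a hedged sketch, not the exact Okareo Check interface; the PII pattern is deliberately simplistic.

```python
import re

# Hypothetical custom check; the signature is illustrative only.
def contains_no_pii(model_output: str) -> bool:
    """Pass if the output contains no obvious email address (a toy PII rule)."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", model_output) is None
```

A check like this can then be applied to every output in an evaluation run or attached to a monitor.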

  • Scenarios: Pairs of inputs and expected outputs used to test how well an LLM performs. Each scenario helps determine the accuracy of the model's responses.
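Scenarios as input/expected-result pairs can be pictured as seed data plus a comparison rule. The structure below illustrates the concept only; it is not the exact SDK schema, and the matching rule is a naive substring comparison.

```python
# Illustrative scenario seed data: each entry pairs an input with an expected result.
scenario_seeds = [
    {"input": "What is your refund window?", "result": "30 days"},
    {"input": "Do you ship to Canada?", "result": "yes"},
]

# A naive comparison rule for the sketch; real evaluations use Checks.
def matches_expected(model_output: str, expected: str) -> bool:
    return expected.lower() in model_output.lower()
```

Running a model over each `input` and applying a rule like this against `result` is the essence of a scenario-based evaluation.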

  • Synthetic Data Generation: The process of creating artificial data that mimics real-world data, used to evaluate LLMs under various scenarios and edge cases.
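To make the idea concrete, here is a toy sketch of producing variations from a single seed input. Real synthetic generation uses an LLM to rephrase and perturb seeds; the fixed templates below only illustrate the fan-out from one seed to many test cases.

```python
# Toy variation generator; templates stand in for LLM-driven rephrasing.
def generate_variations(seed_question: str) -> list:
    templates = [
        "{q}",
        "Quick question: {q}",
        "{q} Please answer briefly.",
    ]
    return [t.format(q=seed_question) for t in templates]

variations = generate_variations("Do you ship to Canada?")
```

Each variation becomes a scenario input, widening coverage beyond the original seed.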

  • AI CI/CD (Continuous Integration/Continuous Deployment): A set of practices that enable teams to deliver code changes more frequently and reliably. Okareo integrates LLM evaluation into CI/CD workflows to maintain high-quality outputs.
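A common pattern for wiring evaluation into CI/CD is a gate that fails the pipeline when evaluation scores drop below a threshold. The sketch below assumes hypothetical score names and a stand-in for the evaluation call; it shows the gating logic, not a specific Okareo API.

```python
# Hypothetical CI gate: fail the build if any evaluation metric falls
# below the threshold. Score names and values are illustrative.
def ci_gate(scores: dict, threshold: float = 0.8) -> bool:
    """Return True (pass) only if every metric meets the threshold."""
    failing = {name: v for name, v in scores.items() if v < threshold}
    return not failing

# In CI, `scores` would come from an evaluation run against a scenario set.
passed = ci_gate({"coherence": 0.91, "consistency": 0.87})
```

A test runner (e.g. pytest) asserting on `ci_gate` makes regressions in output quality block a deploy the same way failing unit tests do.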

  • Metrics: Quantitative measures used to assess the performance of LLMs, including reference-based metrics (like BLEU score) and reference-free metrics (like consistency and fluency). In Okareo metrics are produced through Checks.

  • Proxy: A service that facilitates the monitoring and evaluation of LLM requests by routing them through a unified endpoint. It allows developers to connect their inference endpoints to various LLM providers while automatically collecting performance metrics and error data, enabling real-time analysis and ensuring the reliability of interactions. The proxy can be self-hosted or accessed through https://proxy.okareo.com
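Routing through a unified endpoint typically means pointing an OpenAI-compatible client at the proxy's base URL instead of the provider's. The sketch below only builds the request object (no network call is made); the `/v1/chat/completions` path and header layout are assumptions based on the common OpenAI-compatible convention, not confirmed Okareo specifics.

```python
import json
import urllib.request

# Hypothetical request builder targeting the proxy's unified endpoint.
# The path and payload shape follow the common OpenAI-compatible convention
# and are assumptions, not documented Okareo behavior.
def build_proxy_request(prompt: str, api_key: str) -> urllib.request.Request:
    payload = {
        "model": "gpt-4o-mini",  # provider model, resolved behind the proxy
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://proxy.okareo.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_proxy_request("Hello!", "sk-test")
```

Because only the base URL changes, existing client code keeps working while the proxy collects metrics and error data on every request passing through it.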