
---
title: "Quickstart"
description: "Install Evalit, run an experiment, and find the winning prompt"
---

## 1. Install

Evalit isn’t on PyPI yet. Install from source:
```bash
git clone https://github.com/evalitdev/evalit.git
cd evalit
pip install .
```
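
To confirm the install worked (this assumes the distribution is named `evalit`, matching the repository):

```bash
pip show evalit             # prints the installed version and location
python -c "import evalit"   # verifies the package imports cleanly
```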

## 2. Create prompt versions

```python
from evalit import PromptManager

pm = PromptManager()

# V1 (control)
pm.create(
    name="meaning-of-life-prompt",
    template="Question: {question}\nAnswer:",
    version_message="Initial version."
)

# V2 (challenger)
pm.create(
    name="meaning-of-life-prompt",
    template="Please answer this question: {question}\nAnswer:",
    version_message="Added politeness."
)
```
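
Templates use standard Python `str.format` placeholders, so you can preview a rendered prompt with plain string formatting. This is a sketch only: it assumes the object returned by `pm.get` exposes the raw template string as a `.template` attribute, so adjust the attribute name to match your installed version.

```python
prompt_v2 = pm.get("meaning-of-life-prompt", version=2)

# Assumption: the returned prompt object exposes the raw template as `.template`.
print(prompt_v2.template.format(question="What is the meaning of life?"))
# Please answer this question: What is the meaning of life?
# Answer:
```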

## 3. Run an A/B test

```python
import random
from evalit import Experiment

def mock_llm(prompt: str) -> str:
    """Simulate an LLM: polite prompts succeed ~70% of the time, others ~40%."""
    if "please" in prompt.lower():
        return "The correct answer is 42." if random.random() > 0.3 else "I don't know."
    else:
        return "The correct answer is 42." if random.random() > 0.6 else "I don't know."

# 20 copies of the same question, each with the expected answer "42".
dataset = [
    {"id": f"q_{i}", "inputs": {"question": "What is the meaning of life?"}, "expected_output": "42"}
    for i in range(20)
]

variants = {
    "control": pm.get("meaning-of-life-prompt", version=1),
    "challenger": pm.get("meaning-of-life-prompt", version=2),
}

exp = Experiment(name="Politeness Test", variants=variants)
exp.run(dataset=dataset, llm_function=mock_llm, budget=100)
report = exp.analyze()
print(report["winner"])  # e.g., "challenger"
```
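
Because the mock answers polite prompts correctly about 70% of the time versus about 40% otherwise, the challenger should win most runs. You can sanity-check those rates directly; `success_rate` below is just a local helper for illustration, not part of Evalit:

```python
# Estimate each variant's success rate by sampling the mock directly.
def success_rate(prompt: str, trials: int = 10_000) -> float:
    hits = sum("42" in mock_llm(prompt) for _ in range(trials))
    return hits / trials

print(f"polite:  {success_rate('Please answer: ...'):.2f}")  # ~0.70
print(f"control: {success_rate('Question: ...'):.2f}")       # ~0.40
```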

## 4. Evaluate directly (optional)

```python
from evalit import Evaluator

# Each record is one graded example; outcome is binary (1 = success, 0 = failure).
data = [
    {"prompt_name": "control", "example_id": "1", "outcome": 1},
    {"prompt_name": "control", "example_id": "2", "outcome": 0},
    {"prompt_name": "challenger", "example_id": "1", "outcome": 1},
]

e = Evaluator()
e.fit(data)
print(e.get_scores())
print(e.predict_performance("control"))
```
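
The exact scoring model behind `get_scores` is up to the Evaluator, but on this toy dataset the raw per-prompt success rates are easy to compute by hand, which helps when eyeballing its output (plain Python, no Evalit API):

```python
from collections import defaultdict

# Per-prompt tallies: prompt_name -> [successes, trials].
totals = defaultdict(lambda: [0, 0])
for record in data:
    totals[record["prompt_name"]][0] += record["outcome"]
    totals[record["prompt_name"]][1] += 1

for name, (wins, n) in totals.items():
    print(f"{name}: {wins}/{n} = {wins / n:.2f}")
# control: 1/2 = 0.50
# challenger: 1/1 = 1.00
```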

## Next steps

- Explore the SDK: /sdk/python/overview
- Manage prompts: /sdk/python/prompt-manager
- Run experiments: /sdk/python/experiment
- Evaluate performance: /sdk/python/evaluator