Evaluation Playground

Interactive Model Testing

Select a benchmark question to test how your models query the Memanto memory layer, then evaluate the answer against the ground truth with a secondary LLM Judge.

Context Question

Choose a ground-truth question from the benchmarks. This sets the Agent ID.

Inference LLM

The model generating the initial answer.

API keys are not stored locally.

Judge LLM

The model evaluating the accuracy of the answer.

Ready to evaluate
Review the context question and click Send to test the agent.
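The two-step flow behind the Send button can be sketched as follows. This is a minimal illustration, not the actual Memanto client: the function names, the judge prompt format, and the stub models are all assumptions made for the example.

```python
# Sketch of the playground's evaluate loop: an inference LLM answers the
# benchmark question, then a judge LLM grades the answer against ground
# truth. All names here are illustrative, not the real Memanto API.

JUDGE_PROMPT = (
    "Question: {question}\n"
    "Ground truth: {truth}\n"
    "Candidate answer: {answer}\n"
    "Reply CORRECT or INCORRECT."
)

def evaluate(question, ground_truth, inference_llm, judge_llm):
    """Run the flow: generate an answer, then judge its accuracy."""
    answer = inference_llm(question)          # step 1: inference LLM
    verdict = judge_llm(JUDGE_PROMPT.format(  # step 2: judge LLM
        question=question, truth=ground_truth, answer=answer))
    return answer, verdict.strip().upper() == "CORRECT"

def stub_inference(question):
    # Toy model standing in for a real API-backed inference LLM.
    return "Paris"

def stub_judge(prompt):
    # Toy judge: approve if the candidate answer echoes the ground truth.
    fields = dict(line.split(": ", 1)
                  for line in prompt.splitlines() if ": " in line)
    ok = fields["Ground truth"] in fields["Candidate answer"]
    return "CORRECT" if ok else "INCORRECT"

answer, correct = evaluate("Capital of France?", "Paris",
                           stub_inference, stub_judge)
```

A real deployment would replace the stubs with wrappers around the selected Inference and Judge model APIs, keeping the keys in memory only.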