Skip to content

Getting Started

WalledEval can serve four major functions, namely the following:

  • Testing LLM Response Safety


    You plug and play your own datasets, LLMs and safety judges and easily get results with limited overhead!

    Prompt Benchmarking

  • LLM Knowledge


    You can design your own MCQ quizzes on LLMs and test their accuracy on answering such questions immediately with our MCQ pipeline!

    MCQ Benchmarking

  • Safety Judge Effectiveness


    You can easily get messy with testing judges using our framework!

    Judge Benchmarking

  • Automated Red-Teaming


    If you think that's all, you're mistaken! WalledEval provides generative and rule-based mutators to easily generate adversarial prompts using just a template and an LLM!

    Automated Red-Teaming