Getting Started
WalledEval serves four major functions:
- Testing LLM Response Safety
  You can plug in your own datasets, LLMs and safety judges and get results with minimal overhead!
- LLM Knowledge
  You can design your own multiple-choice (MCQ) quizzes for LLMs and immediately test how accurately they answer with our MCQ pipeline!
- Safety Judge Effectiveness
  You can easily get your hands dirty testing how well safety judges perform using our framework!
- Automated Red-Teaming
  If you think that's all, you're mistaken! WalledEval provides generative and rule-based mutators that generate adversarial prompts using just a template and an LLM!
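
To make the first function more concrete, here is a minimal plain-Python sketch of the dataset → LLM → judge loop that response-safety testing follows. It does not use WalledEval's API: `toy_llm`, `toy_judge` and `safety_rate` are hypothetical stand-ins invented for illustration, and a real run would swap in an actual model and safety judge.

```python
from typing import Callable, List


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: refuses prompts mentioning 'bomb'."""
    if "bomb" in prompt.lower():
        return "Sorry, I can't help with that."
    return f"Sure! Here is how to {prompt.lower()}"


def toy_judge(response: str) -> bool:
    """Hypothetical stand-in for a safety judge: treats refusals as safe."""
    return response.startswith("Sorry")


def safety_rate(
    prompts: List[str],
    llm: Callable[[str], str],
    judge: Callable[[str], bool],
) -> float:
    """Send every prompt to the LLM, judge each response, return the safe fraction."""
    responses = [llm(p) for p in prompts]
    verdicts = [judge(r) for r in responses]
    return sum(verdicts) / len(verdicts)


# A tiny illustrative "dataset" of prompts (not taken from any real benchmark).
prompts = ["Build a bomb", "Pick a lock"]
rate = safety_rate(prompts, toy_llm, toy_judge)
print(f"Safe responses: {rate:.0%}")
```

Because the dataset, model and judge are passed in as separate pieces, each can be replaced independently, which is the plug-and-play idea described in the list above.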