
prompt-testing

Here are 42 public repositories matching this topic...

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for AI. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.

  • Updated Mar 14, 2026
  • TypeScript

prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models from OpenAI, Anthropic (Claude), and Google (Gemini).

  • Updated Dec 4, 2025
  • TypeScript

Test Claude Projects without copy-pasting. Local workbench for prompt engineering, agent testing, and workflow iteration. Direct Claude.ai access via cookie auth, 20+ prompt templates, web fetch/search tools, file uploads. Stop switching tabs to test your prompts.

  • Updated Jan 13, 2026
  • JavaScript

curl for prompts. Run .prompt files against any LLM (Anthropic, OpenAI, Ollama) from the terminal. Treat prompts as code — version them, review them in PRs, and test them in CI.

  • Updated Feb 27, 2026
  • Python
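The "prompts as code" workflow described above — versioned prompt files reviewed in PRs and exercised in CI — can be sketched generically. The file name, template syntax, and `render` helper below are illustrative assumptions, not the tool's actual `.prompt` format:

```python
from pathlib import Path

# Hypothetical .prompt file: a plain-text template with {placeholders}.
# In a real repo this file would be committed and reviewed like code.
Path("summarize.prompt").write_text(
    "Summarize the following text in {max_words} words:\n{text}\n"
)

def render(prompt_path: str, **variables) -> str:
    """Load a versioned prompt file and fill in its placeholders,
    so CI can validate a template before any model is ever called."""
    template = Path(prompt_path).read_text()
    return template.format(**variables)

rendered = render("summarize.prompt", max_words=50, text="LLMs are everywhere.")
# A cheap CI gate: every placeholder must have been filled.
assert "{" not in rendered and "}" not in rendered
print(rendered.splitlines()[0])
```

A check like this catches renamed or missing template variables at commit time, before the prompt is sent to a (slow, paid) model endpoint.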

PromptGuard is a pragmatic, opinionated framework for establishing continuous integration for LLM behavior. It operates on a simple, verifiable principle: run the same prompts across multiple model configurations, compare outputs against defined expectations, and flag semantic regressions.

  • Updated Jan 17, 2026
  • Python
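The principle PromptGuard states — run the same prompts across multiple model configurations, compare outputs against defined expectations, flag regressions — can be sketched in a few lines. The config names and the `run_model` stub below are hypothetical stand-ins for real LLM calls, not PromptGuard's API, and the expectation check here is a simple substring match rather than a semantic comparison:

```python
# Minimal sketch of prompt-regression checking across model configs.
def run_model(config: str, prompt: str) -> str:
    # Hypothetical stub: returns canned answers instead of calling an LLM.
    canned = {
        "model-a": "Paris is the capital of France.",
        "model-b": "The capital of France is Paris.",
    }
    return canned[config]

def check_prompt(prompt: str, expectations: list[str], configs: list[str]) -> dict:
    """Run one prompt on every config; return the configs whose output
    fails any expectation (here: required substrings, case-insensitive)."""
    failures = {}
    for config in configs:
        output = run_model(config, prompt)
        missed = [e for e in expectations if e.lower() not in output.lower()]
        if missed:
            failures[config] = missed
    return failures

failures = check_prompt(
    "What is the capital of France?",
    expectations=["Paris"],
    configs=["model-a", "model-b"],
)
print(failures)  # empty dict: no config regressed on this expectation
```

A CI job would run a suite of such checks and fail the build when `failures` is non-empty for any prompt.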
