
prompt-testing

Here are 42 public repositories matching this topic...

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for AI. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.

  • Updated Mar 14, 2026
  • TypeScript

prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models from OpenAI, Anthropic (Claude), and Google (Gemini).

  • Updated Dec 4, 2025
  • TypeScript

Test Claude Projects without copy-pasting. Local workbench for prompt engineering, agent testing, and workflow iteration. Direct Claude.ai access via cookie auth, 20+ prompt templates, web fetch/search tools, file uploads. Stop switching tabs to test your prompts.

  • Updated Jan 13, 2026
  • JavaScript

curl for prompts. Run .prompt files against any LLM (Anthropic, OpenAI, Ollama) from the terminal. Treat prompts as code — version them, review them in PRs, and test them in CI.

  • Updated Feb 27, 2026
  • Python
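The "prompts as code" workflow described above — versioned prompt files reviewed in PRs and exercised in CI — can be sketched generically. The file name, template syntax, and `render` helper below are illustrative assumptions, not the tool's actual `.prompt` format:

```python
from pathlib import Path

# Hypothetical .prompt file: a plain-text template with {placeholders}.
# In a real repo this file would be committed and reviewed like code.
Path("summarize.prompt").write_text(
    "Summarize the following text in {max_words} words:\n{text}\n"
)

def render(prompt_path: str, **variables) -> str:
    """Load a versioned prompt file and fill in its placeholders,
    so CI can validate a template before any model is ever called."""
    template = Path(prompt_path).read_text()
    return template.format(**variables)

rendered = render("summarize.prompt", max_words=50, text="LLMs are everywhere.")
# A cheap CI gate: every placeholder must have been filled.
assert "{" not in rendered and "}" not in rendered
print(rendered.splitlines()[0])
```

A check like this catches renamed or missing template variables at commit time, before the prompt is sent to a (slow, paid) model endpoint.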

PromptGuard is a pragmatic, opinionated framework for establishing continuous integration for LLM behavior. It operates on a simple, verifiable principle: run the same prompts across multiple model configurations, compare outputs against defined expectations, and flag semantic regressions.

  • Updated Jan 17, 2026
  • Python
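The principle PromptGuard states — run the same prompts across multiple model configurations, compare outputs against defined expectations, flag regressions — can be sketched in a few lines. The config names and the `run_model` stub below are hypothetical stand-ins for real LLM calls, not PromptGuard's API, and the expectation check here is a simple substring match rather than a semantic comparison:

```python
# Minimal sketch of prompt-regression checking across model configs.
def run_model(config: str, prompt: str) -> str:
    # Hypothetical stub: returns canned answers instead of calling an LLM.
    canned = {
        "model-a": "Paris is the capital of France.",
        "model-b": "The capital of France is Paris.",
    }
    return canned[config]

def check_prompt(prompt: str, expectations: list[str], configs: list[str]) -> dict:
    """Run one prompt on every config; return the configs whose output
    fails any expectation (here: required substrings, case-insensitive)."""
    failures = {}
    for config in configs:
        output = run_model(config, prompt)
        missed = [e for e in expectations if e.lower() not in output.lower()]
        if missed:
            failures[config] = missed
    return failures

failures = check_prompt(
    "What is the capital of France?",
    expectations=["Paris"],
    configs=["model-a", "model-b"],
)
print(failures)  # empty dict: no config regressed on this expectation
```

A CI job would run a suite of such checks and fail the build when `failures` is non-empty for any prompt.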
