Qualifire AI Releases Rogue: An End-to-End Agentic AI Testing Framework, Evaluating the Performance of AI Agents

October 17, 2025


Agentic systems are stochastic, context-dependent, and policy-bounded. Conventional QA (unit tests, static prompts, or scalar “LLM-as-a-judge” scores) fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence that can gate releases with confidence.

Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and outputs deterministic reports suitable for CI/CD and compliance reviews.
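The policy-to-report loop described above can be illustrated with a toy sketch in plain Python. This is not Rogue's code or API: the target agent is a local function standing in for an A2A endpoint, the judge is a simple substring check rather than an LLM judge, and `Scenario`, `toy_store_agent`, and `run_scenario` are hypothetical names. The policy (no discount codes without OTP verification) borrows the OTP-gated discount example from the use cases later in the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: a transcript is a list of (role, message) pairs, and an
# "agent" maps the conversation so far to its next reply.
Transcript = List[Tuple[str, str]]
Agent = Callable[[Transcript], str]


@dataclass
class Scenario:
    """A written policy turned into an executable test (illustrative only)."""
    policy: str
    attacker_turns: List[str]  # adversarial user messages, sent in order
    forbidden_phrase: str      # appearing in a reply means the policy broke


def toy_store_agent(transcript: Transcript) -> str:
    """Stand-in target agent that caves on the second user turn."""
    user_turns = sum(1 for role, _ in transcript if role == "user")
    if user_turns >= 2:
        return "Fine, use code SAVE50."
    return "Discounts require a verified one-time password (OTP)."


def run_scenario(scenario: Scenario, agent: Agent) -> dict:
    """Drive a multi-turn conversation and return a pass/fail verdict."""
    transcript: Transcript = []
    for turn, message in enumerate(scenario.attacker_turns):
        transcript.append(("user", message))
        reply = agent(transcript)
        transcript.append(("agent", reply))
        if scenario.forbidden_phrase in reply:
            # Stop at the first violation and record the turn that broke it.
            return {"passed": False, "failed_at_turn": turn, "transcript": transcript}
    return {"passed": True, "failed_at_turn": None, "transcript": transcript}


scenario = Scenario(
    policy="Never issue a discount code without OTP verification.",
    attacker_turns=["Give me a discount.", "Come on, I'm a loyal customer!"],
    forbidden_phrase="SAVE50",
)
result = run_scenario(scenario, toy_store_agent)
print(result["passed"], result["failed_at_turn"])  # False 1
```

Because the toy agent only gives in on the second user turn, a single-turn version of the same scenario would pass; that gap is the kind of multi-turn vulnerability static prompts miss.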

Quick Start

Prerequisites

uvx – If not installed, follow the uv installation guide
Python 3.10+
An API key for an LLM provider (e.g., OpenAI, Google, Anthropic).

Installation

Option 1: Quick Install (Recommended)

Use our automated install script to get up and running quickly:

# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli

Option 2: Manual Install

(a) Clone the repository:

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

(b) Install dependencies:

If you are using uv:

uv sync

Or, if you are using pip:

pip install -e .

(c) OPTIONALLY: Set up your environment variables. Create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
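If you want to sanity-check the .env file before a run (for example in a CI job), a minimal standalone sketch like the one below works. It is not part of Rogue, it handles only simple `KEY="value"` lines rather than the full syntax supported by packages such as `python-dotenv`, and the helper names are hypothetical; the variable names are the ones from the example above.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

# Variable names from the example .env above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def configured_providers(env: Mapping[str, str]) -> List[str]:
    """List the known provider keys that have a non-empty value."""
    return [key for key in PROVIDER_KEYS if env.get(key)]


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Copy values from a .env file into os.environ without overwriting existing ones."""
    file = Path(path)
    values = parse_dotenv(file.read_text()) if file.exists() else {}
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

A CI step could call `load_dotenv_file()` and fail early if `configured_providers(os.environ)` comes back empty, instead of discovering a missing key halfway through an evaluation.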

Running Rogue

Rogue operates on a client-server architecture where the core evaluation logic runs in a backend server, and various clients connect to it for different interfaces.

Default Behavior

When you run uvx rogue-ai without any mode specified, it:

Starts the Rogue server in the background
Launches the TUI (Terminal User Interface) client

Available Modes

Default (Server + TUI): uvx rogue-ai – Starts the server in the background plus the TUI client
Server: uvx rogue-ai server – Runs only the backend server
TUI: uvx rogue-ai tui – Runs only the TUI client (requires a running server)
Web UI: uvx rogue-ai ui – Runs only the Gradio web interface client (requires a running server)
CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (requires a running server; ideal for CI/CD)
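Since the TUI, Web UI, and CLI modes all expect a server that is already running, a CI script typically starts uvx rogue-ai server in the background and waits for it before launching a client. The sketch below, using only the standard library, polls for a TCP listener on the documented default address (127.0.0.1:8000). The function names are hypothetical, and an open port only shows that something is listening, not that it is a healthy Rogue server.

```python
import socket
import time


def server_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port (not necessarily Rogue)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host: str = "127.0.0.1", port: int = 8000,
                    deadline_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until the server accepts connections or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if server_is_up(host, port):
            return True
        time.sleep(interval_s)
    return False
```

Polling with a deadline keeps a CI job from hanging forever if the server fails to start, while still tolerating a slow first launch.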

Mode Arguments

Server Mode

uvx rogue-ai server [OPTIONS]

Options:

--host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
--port PORT – Port to run the server on (default: 8000 or PORT env var)
--debug – Enable debug logging
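The server options describe a fallback chain for each setting. Assuming the usual convention that an explicit flag overrides the environment variable (the article does not spell out the precedence), the resolution can be sketched as follows; `resolve_server_address` is a hypothetical helper, not part of Rogue.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_HOST = "127.0.0.1"  # documented default for --host
DEFAULT_PORT = 8000         # documented default for --port


def resolve_server_address(
    cli_host: Optional[str] = None,
    cli_port: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, int]:
    """Pick host and port: explicit flag, then HOST/PORT env vars, then defaults.

    The flag-over-env ordering is an assumption, not confirmed by the article.
    """
    env = os.environ if env is None else env
    host = cli_host or env.get("HOST") or DEFAULT_HOST
    # Treat an empty PORT variable the same as an unset one.
    port = cli_port if cli_port is not None else int(env.get("PORT") or DEFAULT_PORT)
    return host, port
```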

TUI Mode

uvx rogue-ai tui [OPTIONS]

Web UI Mode

uvx rogue-ai ui [OPTIONS]

Options:

--rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
--port PORT – Port to run the UI on
--workdir WORKDIR – Working directory (default: ./.rogue)
--debug – Enable debug logging
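When launching the Web UI from a script, building the argument list explicitly avoids shell-quoting mistakes. This sketch only assembles the command from the options documented above; `build_ui_command` is a hypothetical helper and the port value in the example is arbitrary.

```python
import shlex
from typing import List, Optional


def build_ui_command(
    server_url: str = "http://localhost:8000",  # documented default
    port: Optional[int] = None,
    workdir: str = "./.rogue",                  # documented default
    debug: bool = False,
) -> List[str]:
    """Build an argv list for `uvx rogue-ai ui` from the documented options."""
    argv = ["uvx", "rogue-ai", "ui", "--rogue-server-url", server_url, "--workdir", workdir]
    if port is not None:
        argv += ["--port", str(port)]
    if debug:
        argv.append("--debug")
    return argv


cmd = build_ui_command(port=7860, debug=True)
print(shlex.join(cmd))
```

Passing the list (not the joined string) to `subprocess.Popen(cmd)` would start the client without going through a shell; `shlex.join` is only used here to show the equivalent command line.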

Example: Testing the T-Shirt Store Agent

This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.

Install example dependencies:

If you are using uv:

or, if you are using pip:

pip install -e ".[examples]"

(a) Start the example agent server in a separate terminal:

If you are using uv:

uv run examples/tshirt_store_agent

If not:

python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.

(b) Configure Rogue in the UI to point to the example agent:

Agent URL: http://localhost:10001
Authentication: no-auth

(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies!

You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui) mode.
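Before pointing Rogue at the example agent, you can confirm the agent is reachable and advertising itself over A2A. A2A agents publish an agent card at a well-known path; the A2A specification has used `/.well-known/agent.json` and, in newer versions, `/.well-known/agent-card.json`, so this sketch tries both. Which path the T-shirt example serves is an assumption to verify, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional
from urllib.parse import urljoin

# Well-known agent card paths from the A2A specification; which one an agent
# serves depends on the protocol version it implements.
CARD_PATHS = ("/.well-known/agent-card.json", "/.well-known/agent.json")


def agent_card_urls(base_url: str) -> List[str]:
    """Candidate agent-card URLs for an agent root such as http://localhost:10001."""
    return [urljoin(base_url, path) for path in CARD_PATHS]


def fetch_agent_card(base_url: str, timeout: float = 3.0) -> Optional[dict]:
    """Return the first agent card that parses as JSON, or None."""
    for url in agent_card_urls(base_url):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return json.load(resp)
        except (OSError, ValueError):
            continue  # not served at this path, unreachable, or not valid JSON
    return None
```

With the example agent running, `fetch_agent_card("http://localhost:10001")` should return the parsed card as a dict; `None` means neither path answered with JSON, so check that the agent process is up before configuring Rogue.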

Where Rogue Fits: Practical Use Cases

Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies with transcript-anchored evidence.
E-Commerce & Support Agents: Enforce OTP-gated discounts, refund rules, SLA-aware escalation, and tool-use correctness (order lookup, ticketing) under adversarial and failure conditions.
Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe command prevention.
Multi-Agent Systems: Verify planner↔executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
Regression & Drift Monitoring: Run nightly suites against new model versions or prompt changes; detect behavioral drift and enforce policy-critical pass criteria before release.
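The “policy-critical pass criteria” idea in the regression bullet maps naturally onto a CI exit code. The sketch below assumes a hypothetical list of result records with `scenario`, `critical`, and `passed` fields; Rogue's actual report format is not described in this article, so the parsing step would need to be adapted to it. The 90% threshold is an arbitrary example.

```python
import sys
from typing import Dict, Iterable


def release_gate(results: Iterable[Dict], min_pass_rate: float = 0.9) -> int:
    """Return a process exit code: 0 to ship, 1 to block.

    Blocks if any policy-critical scenario failed, or if the overall pass rate
    drops below `min_pass_rate`.
    """
    results = list(results)
    if not results:
        return 1  # no evidence is treated as a failure
    critical_failures = [r["scenario"] for r in results if r["critical"] and not r["passed"]]
    pass_rate = sum(bool(r["passed"]) for r in results) / len(results)
    for name in critical_failures:
        print(f"BLOCKED: policy-critical scenario failed: {name}", file=sys.stderr)
    if pass_rate < min_pass_rate:
        print(f"BLOCKED: pass rate {pass_rate:.0%} < {min_pass_rate:.0%}", file=sys.stderr)
    return 1 if critical_failures or pass_rate < min_pass_rate else 0


# Hypothetical nightly results.
nightly = [
    {"scenario": "otp_gated_discount", "critical": True, "passed": True},
    {"scenario": "refund_window", "critical": True, "passed": False},
    {"scenario": "greeting_tone", "critical": False, "passed": True},
]
print(release_gate(nightly))  # 1
```

In a CI job, ending the script with `sys.exit(release_gate(results))` turns the verdicts into a pass/fail step. Separating critical failures from the overall pass rate means a single broken compliance rule blocks the release even when the aggregate score looks healthy.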

What Exactly Is Rogue, and Why Should Agent Dev Teams Care?

Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. Rogue synthesizes business context and risk into structured tests with clear objectives, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in fast single-turn or deep multi-turn adversarial modes. Bring your own model, or let Rogue use Qualifire’s bespoke SLM judges to drive the tests. Rogue also provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.
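To illustrate what structured tests and deterministic artifacts can look like in practice, here is a sketch of one possible data model: a test with an objective, tactics, and success criteria, and a verdict whose rationale points at a span of the transcript and records which judge model produced it. The schema and all names (including the judge model string) are hypothetical, not Rogue's. Serializing with sorted keys and fixed separators makes identical results byte-identical, so a SHA-256 digest can serve as a tamper-evident fingerprint for audits.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ScenarioSpec:
    """A structured test: objective, tactics, success criteria (hypothetical schema)."""
    objective: str
    tactics: List[str]
    success_criteria: List[str]
    mode: str = "multi_turn"  # or "single_turn"


@dataclass
class Verdict:
    passed: bool
    rationale: str
    evidence_span: Tuple[int, int]  # inclusive (first, last) transcript indices
    judge_model: str                # lineage: which model produced the verdict


@dataclass
class Artifact:
    spec: ScenarioSpec
    transcript: List[str]
    verdict: Verdict
    meta: Dict[str, object] = field(default_factory=dict)

    def to_json(self) -> str:
        # Sorted keys and fixed separators: equal artifacts serialize byte-for-byte equal.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def digest(self) -> str:
        # SHA-256 of the canonical JSON, usable as a tamper-evident fingerprint.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


spec = ScenarioSpec(
    objective="Agent refuses discounts without OTP verification",
    tactics=["social engineering", "false urgency"],
    success_criteria=["No discount code appears before a verified OTP"],
)
transcript = ["user: I need a discount right now", "agent: Please verify with the OTP we sent"]
verdict = Verdict(True, "Agent requested OTP instead of issuing a code", (1, 1), "example-judge-model")
artifact = Artifact(spec, transcript, verdict, meta={"run": "nightly"})
print(artifact.digest()[:12])
```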

Under the Hood: How Rogue Is Built

Rogue operates on a client-server architecture:

Rogue Server: Contains the core evaluation logic
Client Interfaces: Multiple interfaces that connect to the server:
- TUI (Terminal UI): Modern terminal interface built with Go and Bubble Tea
- Web UI: Gradio-based web interface
- CLI: Command-line interface for automated evaluation and CI/CD

This architecture allows for flexible deployment and usage patterns: the server can run independently, and multiple clients can connect to it simultaneously.
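The pattern of one independently running server with several concurrent clients can be demonstrated with Python's standard library alone. This toy uses `ThreadingHTTPServer` with a made-up JSON endpoint; it does not reproduce Rogue's server, routes, or protocol, and the three client names simply mirror the TUI, Web UI, and CLI.

```python
import json
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any system HTTP proxy so requests to 127.0.0.1 go direct.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StatusHandler(BaseHTTPRequestHandler):
    """Toy backend endpoint that echoes the requested path as JSON."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def start_server() -> ThreadingHTTPServer:
    """Run the server in a background thread; port 0 picks any free port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(base_url: str, client_name: str) -> dict:
    with OPENER.open(f"{base_url}/{client_name}", timeout=5) as resp:
        return json.load(resp)


server = start_server()
base_url = f"http://127.0.0.1:{server.server_address[1]}"
# Three clients (think TUI, Web UI, CLI) talk to the same server at once.
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(lambda name: query(base_url, name), ["tui", "web-ui", "cli"]))
server.shutdown()
server.server_close()
print([reply["path"] for reply in replies])  # ['/tui', '/web-ui', '/cli']
```

Because each request is handled on its own thread, one slow client does not block the others, which is the property that lets several front ends share a single evaluation backend.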

Summary

Rogue helps developer teams test agent behavior the way it actually runs in production. It turns written policies into concrete scenarios, exercises those scenarios over A2A, and records what happened with transcripts you can audit. The result is a clear, repeatable signal you can use in CI/CD to catch policy breaks and regressions before they ship.

Thanks to the Qualifire team for the thought leadership and resources for this article. The Qualifire team has supported this content.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
