In this tutorial, we'll explore how to harness the power of an advanced AI agent, augmented with both Python execution and result-validation capabilities, to tackle complex computational tasks. By integrating LangChain's ReAct agent framework with Anthropic's Claude API, we build an end-to-end solution that generates Python code, executes it live, captures its outputs, maintains execution state, and automatically verifies results against expected properties or test cases. This seamless "write → run → validate" loop lets you develop robust analyses, algorithms, and simple ML pipelines with confidence in every step.
!pip install langchain langchain-anthropic langchain-core anthropic
We install the core LangChain framework along with the Anthropic integration and its core utilities, ensuring you have both the agent orchestration tools (langchain, langchain-core) and the Claude-specific bindings (langchain-anthropic, anthropic) available in your environment.
import os
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
import sys
import io
import re
import json
from typing import Dict, Any, List
We bring together everything needed to build our ReAct-style agent: OS access for environment variables, LangChain's agent constructors (create_react_agent, AgentExecutor), the Tool class for defining custom actions, the PromptTemplate for crafting the chain-of-thought prompt, and Anthropic's ChatAnthropic client for connecting to Claude. Standard Python modules (sys, io, re, json) handle I/O capture, regular expressions, and serialization, while typing provides type hints for clearer, more maintainable code.
class PythonREPLTool:
    def __init__(self):
        self.globals_dict = {
            '__builtins__': __builtins__,
            'json': json,
            're': re
        }
        self.locals_dict = {}
        self.execution_history = []

    def run(self, code: str) -> str:
        try:
            # Redirect stdout/stderr so we can capture everything the code prints
            old_stdout = sys.stdout
            old_stderr = sys.stderr
            sys.stdout = captured_output = io.StringIO()
            sys.stderr = captured_error = io.StringIO()

            execution_result = None

            try:
                # Expressions are evaluated so we can report their return value
                result = eval(code, self.globals_dict, self.locals_dict)
                execution_result = result
                if result is not None:
                    print(result)
            except SyntaxError:
                # Statements (assignments, defs, loops, ...) fall back to exec
                exec(code, self.globals_dict, self.locals_dict)

            output = captured_output.getvalue()
            error_output = captured_error.getvalue()

            sys.stdout = old_stdout
            sys.stderr = old_stderr

            self.execution_history.append({
                'code': code,
                'output': output,
                'result': execution_result,
                'error': error_output
            })

            response = f"**Code Executed:**\n```python\n{code}\n```\n\n"
            if error_output:
                response += f"**Errors/Warnings:**\n{error_output}\n\n"
            response += f"**Output:**\n{output if output.strip() else 'No console output'}"
            if execution_result is not None and not output.strip():
                response += f"\n**Return Value:** {execution_result}"

            return response

        except Exception as e:
            sys.stdout = old_stdout
            sys.stderr = old_stderr
            error_info = f"**Code Executed:**\n```python\n{code}\n```\n\n**Runtime Error:**\n{str(e)}\n**Error Type:** {type(e).__name__}"
            self.execution_history.append({
                'code': code,
                'output': '',
                'result': None,
                'error': str(e)
            })
            return error_info

    def get_execution_history(self) -> List[Dict[str, Any]]:
        return self.execution_history

    def clear_history(self):
        self.execution_history = []
This PythonREPLTool encapsulates a stateful in-process Python REPL: it executes arbitrary code (evaluating expressions or running statements), redirects stdout/stderr to record outputs and errors, and maintains a history of every execution. It returns a formatted summary, including the executed code, any console output or errors, and return values, providing clear, reproducible feedback for every snippet run inside our agent.
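As a quick illustration (not part of the original notebook; the snippet and variable names are hypothetical), you could exercise the REPL directly to confirm that state persists between calls and that both statements and expressions are handled:

# Hypothetical sanity check of the stateful REPL (illustrative only)
repl = PythonREPLTool()

# A statement: eval() raises SyntaxError, so it falls through to exec()
print(repl.run("total = sum(range(10))"))

# An expression: eval() succeeds and the return value (90) is reported
print(repl.run("total * 2"))

# Both snippets, with their outputs and any errors, are kept in the history
print(len(repl.get_execution_history()))  # -> 2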
class ResultValidator:
    def __init__(self, python_repl: PythonREPLTool):
        self.python_repl = python_repl

    def validate_mathematical_result(self, description: str, expected_properties: Dict[str, Any]) -> str:
        """Validate mathematical computations"""
        validation_code = f"""
# Validation for: {description}
validation_results = {{}}

# Get the last execution results
history = {self.python_repl.execution_history}
if history:
    last_execution = history[-1]
    print(f"Last execution output: {{last_execution['output']}}")

    # Extract numbers from the output
    import re
    numbers = re.findall(r'\d+(?:\.\d+)?', last_execution['output'])
    if numbers:
        numbers = [float(n) for n in numbers]
        validation_results['extracted_numbers'] = numbers

    # Validate expected properties
    for prop, expected_value in {expected_properties}.items():
        if prop == 'count':
            actual_count = len(numbers)
            validation_results['count_check'] = actual_count == expected_value
            print(f"Count validation: Expected {{expected_value}}, Got {{actual_count}}")
        elif prop == 'max_value':
            if numbers:
                max_val = max(numbers)
                validation_results['max_check'] = max_val <= expected_value
                print(f"Max value validation: {{max_val}} <= {{expected_value}} = {{max_val <= expected_value}}")
        elif prop == 'min_value':
            if numbers:
                min_val = min(numbers)
                validation_results['min_check'] = min_val >= expected_value
                print(f"Min value validation: {{min_val}} >= {{expected_value}} = {{min_val >= expected_value}}")
        elif prop == 'sum_range':
            if numbers:
                total = sum(numbers)
                min_sum, max_sum = expected_value
                validation_results['sum_check'] = min_sum <= total <= max_sum
                print(f"Sum validation: {{min_sum}} <= {{total}} <= {{max_sum}} = {{min_sum <= total <= max_sum}}")

validation_results
"""
        return self.python_repl.run(validation_code)

    def validate_data_analysis(self, description: str, expected_structure: Dict[str, str]) -> str:
        """Validate data analysis results"""
        validation_code = f"""
# Data Analysis Validation for: {description}
validation_results = {{}}

# Check if required variables exist in the global scope
required_vars = {list(expected_structure.keys())}
existing_vars = []

for var_name in required_vars:
    if var_name in globals():
        existing_vars.append(var_name)
        var_value = globals()[var_name]
        validation_results[f'{{var_name}}_exists'] = True
        validation_results[f'{{var_name}}_type'] = type(var_value).__name__

        # Type-specific validations
        if isinstance(var_value, (list, tuple)):
            validation_results[f'{{var_name}}_length'] = len(var_value)
        elif isinstance(var_value, dict):
            validation_results[f'{{var_name}}_keys'] = list(var_value.keys())
        elif isinstance(var_value, (int, float)):
            validation_results[f'{{var_name}}_value'] = var_value
        print(f"✓ Variable '{{var_name}}' found: {{type(var_value).__name__}} = {{var_value}}")
    else:
        validation_results[f'{{var_name}}_exists'] = False
        print(f"✗ Variable '{{var_name}}' not found")

print(f"\\nFound {{len(existing_vars)}}/{{len(required_vars)}} required variables")

# Additional structure validation
for var_name, expected_type in {expected_structure}.items():
    if var_name in globals():
        actual_type = type(globals()[var_name]).__name__
        validation_results[f'{{var_name}}_type_match'] = actual_type == expected_type
        print(f"Type check '{{var_name}}': Expected {{expected_type}}, Got {{actual_type}}")

validation_results
"""
        return self.python_repl.run(validation_code)
    def validate_algorithm_correctness(self, description: str, test_cases: List[Dict[str, Any]]) -> str:
        """Validate algorithm implementations with test cases"""
        validation_code = f"""
# Algorithm Validation for: {description}
validation_results = {{}}
test_results = []

test_cases = {test_cases}

for i, test_case in enumerate(test_cases):
    test_name = test_case.get('name', f'Test {{i+1}}')
    input_val = test_case.get('input')
    expected = test_case.get('expected')
    function_name = test_case.get('function')

    print(f"\\nRunning {{test_name}}:")
    print(f"Input: {{input_val}}")
    print(f"Expected: {{expected}}")

    try:
        if function_name and function_name in globals():
            func = globals()[function_name]
            if callable(func):
                if isinstance(input_val, (list, tuple)):
                    result = func(*input_val)
                else:
                    result = func(input_val)

                passed = result == expected
                test_results.append({{
                    'test_name': test_name,
                    'input': input_val,
                    'expected': expected,
                    'actual': result,
                    'passed': passed
                }})

                status = "✓ PASS" if passed else "✗ FAIL"
                print(f"Actual: {{result}}")
                print(f"Status: {{status}}")
            else:
                print(f"✗ ERROR: '{{function_name}}' is not callable")
        else:
            print(f"✗ ERROR: Function '{{function_name}}' not found")
    except Exception as e:
        print(f"✗ ERROR: {{str(e)}}")
        test_results.append({{
            'test_name': test_name,
            'error': str(e),
            'passed': False
        }})

# Summary
passed_tests = sum(1 for test in test_results if test.get('passed', False))
total_tests = len(test_results)
validation_results['tests_passed'] = passed_tests
validation_results['total_tests'] = total_tests
validation_results['success_rate'] = passed_tests / total_tests if total_tests > 0 else 0

print(f"\\n=== VALIDATION SUMMARY ===")
print(f"Tests passed: {{passed_tests}}/{{total_tests}}")
print(f"Success rate: {{validation_results['success_rate']:.1%}}")

test_results
"""
        return self.python_repl.run(validation_code)
This ResultValidator class builds on the PythonREPLTool to automatically generate and run bespoke validation routines, checking numerical properties, verifying data structures, or running algorithm test cases against the agent's execution history. By emitting Python snippets that extract outputs, compare them to expected criteria, and summarize pass/fail outcomes, it closes the loop on "execute → validate" within our agent's workflow.
python_repl = PythonREPLTool()
validator = ResultValidator(python_repl)
Here, we instantiate our interactive Python REPL tool (python_repl) and then create a ResultValidator tied to that same REPL instance. This wiring ensures any code you execute is immediately available for automated validation steps, closing the loop on execution and correctness checking.
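Before handing these objects to the agent, a small hypothetical smoke test (not part of the original tutorial) can confirm the wiring: run a computation through the REPL, then ask the validator to check the numbers it printed against expected properties:

# Hypothetical smoke test: compute something, then validate the printed numbers
python_repl.run(
    "primes = [p for p in range(2, 30) if all(p % d for d in range(2, p))]\n"
    "print(primes)"
)

# The validator re-reads the last execution's output and checks these properties
print(validator.validate_mathematical_result(
    "primes below 30",
    {"count": 10, "max_value": 29, "min_value": 2}
))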
python_tool = Tool(
    name="python_repl",
    description="Execute Python code and return both the code and its output. Maintains state between executions.",
    func=python_repl.run
)

validation_tool = Tool(
    name="result_validator",
    description="Validate the results of previous computations with specific test cases and expected properties.",
    func=lambda query: validator.validate_mathematical_result(query, {})
)
Here, we wrap our REPL and validation methods into LangChain Tool objects, assigning them clear names and descriptions. The agent can invoke python_repl to run code and result_validator to automatically check the last execution against your specified criteria.
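Because Tool simply wraps a callable, these objects can also be invoked outside the agent loop; the brief check below is hypothetical and only illustrates the call pattern:

# Illustrative direct calls to the wrapped tools (not from the original notebook)
print(python_tool.invoke("x = 21 * 2\nprint(x)"))

# The validation tool forwards its query with an empty property dict, so it
# simply reports the last execution's output and the numbers extracted from it
print(validation_tool.invoke("inspect the last computation"))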
prompt_template = """You are Claude, an advanced AI assistant with Python execution and result validation capabilities.

You can execute Python code to solve complex problems and then validate your results to ensure accuracy.

Available tools:
{tools}

Use this format:
Question: the input question you must answer
Thought: analyze what needs to be done
Action: {tool_names}
Action Input: [your input]
Observation: [result]
... (repeat Thought/Action/Action Input/Observation as needed)
Thought: I should validate my results
Action: [validation if needed]
Action Input: [validation parameters]
Observation: [validation results]
Thought: I now have the complete answer
Final Answer: [comprehensive answer with validation confirmation]

Question: {input}
{agent_scratchpad}"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["input", "agent_scratchpad"],
    partial_variables={
        "tools": "python_repl - Execute Python code\nresult_validator - Validate computation results",
        "tool_names": "python_repl, result_validator"
    }
)
The prompt template above frames Claude as a dual-capability assistant that first reasons ("Thought"), selects between the python_repl and result_validator tools to run code and check outputs, and then iterates until it has a validated solution. By defining a clear chain-of-thought structure with placeholders for the tool names and their descriptions, it guides the agent to: (1) break down the problem, (2) call python_repl to execute the necessary code, (3) call result_validator to confirm correctness, and finally (4) deliver a self-checked "Final Answer." This scaffolding enforces a disciplined "write → run → validate" workflow.
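To see exactly what text Claude will receive, you can render the template yourself; this preview is hypothetical (the sample question is not from the tutorial) and simply fills the two input variables:

# Hypothetical preview of the fully rendered ReAct prompt
rendered = prompt.format(
    input="Compute the sum of the first 20 square numbers.",
    agent_scratchpad=""
)
print(rendered)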
class AdvancedClaudeCodeAgent:
    def __init__(self, anthropic_api_key=None):
        if anthropic_api_key:
            os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key

        self.llm = ChatAnthropic(
            model="claude-3-opus-20240229",
            temperature=0,
            max_tokens=4000
        )

        self.agent = create_react_agent(
            llm=self.llm,
            tools=[python_tool, validation_tool],
            prompt=prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=[python_tool, validation_tool],
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=8,
            return_intermediate_steps=True
        )

        self.python_repl = python_repl
        self.validator = validator

    def run(self, query: str) -> str:
        try:
            result = self.agent_executor.invoke({"input": query})
            return result["output"]
        except Exception as e:
            return f"Error: {str(e)}"

    def validate_last_result(self, description: str, validation_params: Dict[str, Any]) -> str:
        """Manually validate the last computation result"""
        if 'test_cases' in validation_params:
            return self.validator.validate_algorithm_correctness(description, validation_params['test_cases'])
        elif 'expected_structure' in validation_params:
            return self.validator.validate_data_analysis(description, validation_params['expected_structure'])
        else:
            return self.validator.validate_mathematical_result(description, validation_params)

    def get_execution_summary(self) -> Dict[str, Any]:
        """Get a summary of all executions"""
        history = self.python_repl.get_execution_history()
        return {
            'total_executions': len(history),
            'successful_executions': len([h for h in history if not h['error']]),
            'failed_executions': len([h for h in history if h['error']]),
            'execution_details': history
        }
This AdvancedClaudeCodeAgent class wraps everything into a single, easy-to-use interface: it configures the Anthropic Claude client (using your API key), instantiates a ReAct-style agent with our python_repl and result_validator tools and the custom prompt, and sets up an executor that drives iterative "think → code → validate" loops. Its run() method lets you submit natural-language queries and returns Claude's final, self-checked answer; validate_last_result() exposes manual hooks for extra checks; and get_execution_summary() provides a concise report on every code snippet executed (how many succeeded, how many failed, and their details).
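Assuming a valid ANTHROPIC_API_KEY is available in the environment, a minimal interactive session (hypothetical, and separate from the batch examples below) might look like this:

# Hypothetical minimal session (requires a valid Anthropic API key)
agent = AdvancedClaudeCodeAgent(anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY"))

answer = agent.run("Compute the factorial of 12 and report the number of trailing zeros.")
print(answer)

# Optional manual follow-up check on whatever the agent last executed
print(agent.validate_last_result("factorial of 12", {"min_value": 1}))

# Inspect how many code snippets ran and how many failed
summary = agent.get_execution_summary()
print(summary["total_executions"], summary["failed_executions"])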
if __name__ == "__main__":
    API_KEY = "Use Your Own Key Here"

    agent = AdvancedClaudeCodeAgent(anthropic_api_key=API_KEY)

    print("🚀 Advanced Claude Code Agent with Validation")
    print("=" * 60)

    print("\n🔢 Example 1: Prime Number Analysis with Twin Prime Detection")
    print("-" * 60)
    query1 = """
    Find all prime numbers between 1 and 200, then:
    1. Calculate their sum
    2. Find all twin prime pairs (primes that differ by 2)
    3. Calculate the average gap between consecutive primes
    4. Identify the largest prime gap in this range
    After computation, validate that we found the correct number of primes and that all identified numbers are actually prime.
    """
    result1 = agent.run(query1)
    print(result1)

    print("\n" + "=" * 80 + "\n")

    print("📊 Example 2: Advanced Sales Data Analysis with Statistical Validation")
    print("-" * 60)
    query2 = """
    Create a comprehensive sales analysis:
    1. Generate sales data for 12 products across 24 months with realistic seasonal patterns
    2. Calculate monthly growth rates, yearly totals, and trend analysis
    3. Identify the top 3 performing products and the worst 3 performing products
    4. Perform correlation analysis between different products
    5. Create summary statistics (mean, median, standard deviation, percentiles)
    After analysis, validate the data structure, ensure all calculations are mathematically correct, and verify the statistical measures.
    """
    result2 = agent.run(query2)
    print(result2)

    print("\n" + "=" * 80 + "\n")

    print("⚙️ Example 3: Advanced Algorithm Implementation with Test Suite")
    print("-" * 60)
    query3 = """
    Implement and validate a comprehensive sorting and searching system:
    1. Implement quicksort, mergesort, and binary search algorithms
    2. Create test data with various edge cases (empty lists, single elements, duplicates, sorted/reverse-sorted)
    3. Benchmark the performance of the different sorting algorithms
    4. Implement a function to find the kth largest element using different approaches
    5. Test all implementations with comprehensive test cases including edge cases
    After implementation, validate each algorithm with multiple test cases to ensure correctness.
    """
    result3 = agent.run(query3)
    print(result3)

    print("\n" + "=" * 80 + "\n")

    print("🤖 Example 4: Machine Learning Model with Cross-Validation")
    print("-" * 60)
    query4 = """
    Build a complete machine learning pipeline:
    1. Generate a synthetic dataset with features and a target variable (classification problem)
    2. Implement data preprocessing (normalization, feature scaling)
    3. Implement a simple linear classifier from scratch (gradient descent)
    4. Split the data into train/validation/test sets
    5. Train the model and evaluate performance (accuracy, precision, recall)
    6. Implement k-fold cross-validation
    7. Compare results with different hyperparameters
    Validate the entire pipeline by ensuring mathematical correctness of gradient descent, proper data splitting, and realistic performance metrics.
    """
    result4 = agent.run(query4)
    print(result4)

    print("\n" + "=" * 80 + "\n")

    print("📋 Execution Summary")
    print("-" * 60)
    summary = agent.get_execution_summary()
    print(f"Total code executions: {summary['total_executions']}")
    print(f"Successful executions: {summary['successful_executions']}")
    print(f"Failed executions: {summary['failed_executions']}")

    if summary['failed_executions'] > 0:
        print("\nFailed execution details:")
        for i, execution in enumerate(summary['execution_details']):
            if execution['error']:
                print(f"  {i+1}. Error: {execution['error']}")

    print(f"\nSuccess rate: {(summary['successful_executions']/summary['total_executions']*100):.1f}%")
Finally, we instantiate the AdvancedClaudeCodeAgent with your Anthropic API key, run four illustrative example queries (covering prime-number analysis, sales data analytics, algorithm implementation, and a simple ML pipeline), and print each validated result. The script then gathers and displays a concise execution summary (total runs, successes, failures, and error details), demonstrating the agent's live "write → run → validate" workflow.
In conclusion, we have developed a versatile AdvancedClaudeCodeAgent capable of seamlessly blending generative reasoning with precise computational control. At its core, this agent doesn't just draft Python snippets; it runs them on the spot and checks their correctness against your specified criteria, closing the feedback loop automatically. Whether you're performing prime-number analyses, statistical data evaluations, algorithm benchmarking, or end-to-end ML workflows, this pattern ensures reliability and reproducibility.
Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 95k+ ML SubReddit and subscribe to our Newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.