Upstage's Groundedness Check service offers a robust API for verifying that AI-generated responses are firmly anchored in reliable source material. By submitting context–answer pairs to the Upstage endpoint, we can instantly determine whether the supplied context supports a given answer and receive a confidence assessment of that grounding. In this tutorial, we demonstrate how to use Upstage's core capabilities, including single-shot verification, batch processing, and multi-domain testing, to ensure that our AI systems produce factual and trustworthy content across diverse subject areas.
!pip install -qU langchain-core langchain-upstage
import os
import json
from typing import List, Dict, Any
from langchain_upstage import UpstageGroundednessCheck
os.environ["UPSTAGE_API_KEY"] = "Use Your API Key Here"
We install the latest LangChain core and Upstage integration packages, import the required Python modules for data handling and typing, and set our Upstage API key in the environment to authenticate all subsequent groundedness check requests.
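Hardcoding the key works for a quick experiment, but prompting for it at runtime keeps it out of the notebook itself. A minimal alternative sketch, using the standard-library getpass module (this variant is our own suggestion, not part of the original setup):

import os
from getpass import getpass

# Prompt for the key only if it is not already set in the environment
if not os.environ.get("UPSTAGE_API_KEY"):
    os.environ["UPSTAGE_API_KEY"] = getpass("Enter your Upstage API key: ")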
class AdvancedGroundednessChecker:
    """Advanced wrapper for Upstage Groundedness Check with batch processing and analysis"""

    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results = []

    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check groundedness for a single context-answer pair"""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response)
        }
        self.results.append(result)
        return result

    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Process multiple test cases"""
        batch_results = []
        for case in test_cases:
            result = self.check_single(case["context"], case["answer"])
            batch_results.append(result)
        return batch_results

    def _extract_confidence(self, response) -> str:
        """Map the raw verdict string to a coarse confidence level"""
        if hasattr(response, "lower"):
            text = response.lower()
            # Test the negative verdict first: 'grounded' is a substring of
            # 'not grounded', so the order of these checks matters
            if "not grounded" in text or "notgrounded" in text:
                return "low"
            elif "grounded" in text:
                return "high"
        return "medium"

    def analyze_results(self) -> Dict[str, Any]:
        """Analyze batch results"""
        total = len(self.results)
        # Count only affirmative verdicts; a bare substring test for 'grounded'
        # would also match negative verdicts such as 'notGrounded'
        grounded = sum(1 for r in self.results if r["confidence"] == "high")
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0
        }
checker = AdvancedGroundednessChecker()
The AdvancedGroundednessChecker class wraps Upstage's groundedness API in a simple, reusable interface that lets us run both single and batch context–answer checks while accumulating the results. It also includes helper methods to extract a confidence label from each response and compute overall accuracy statistics across all checks.
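Under the hood, each check is a single invoke call on the LangChain tool. For reference, a minimal direct call without the wrapper might look like the sketch below; the example strings are ours, and the exact verdict values (e.g., 'grounded' or 'notGrounded') depend on the langchain-upstage version:

from langchain_upstage import UpstageGroundednessCheck

gc = UpstageGroundednessCheck()
# The tool expects a dict with "context" and "answer" keys
verdict = gc.invoke({
    "context": "Paris is the capital of France.",
    "answer": "France's capital city is Paris."
})
print(verdict)  # a verdict string, e.g. 'grounded'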
print("=== Check Case 1: Peak Discrepancy ===")
result1 = checker.check_single(
context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
reply="Mauna Kea is 5,207.3 meters tall."
)
print(f"End result: {result1['grounded']}")
print("n=== Check Case 2: Right Info ===")
result2 = checker.check_single(
context="Python is a high-level programming language created by Guido van Rossum in 1991. It emphasizes code readability and ease.",
reply="Python was made by Guido van Rossum & focuses on code readability."
)
print(f"End result: {result2['grounded']}")
print("n=== Check Case 3: Partial Info ===")
result3 = checker.check_single(
context="The Nice Wall of China is roughly 13,000 miles lengthy and took over 2,000 years to construct.",
reply="The Nice Wall of China may be very lengthy."
)
print(f"End result: {result3['grounded']}")
print("n=== Check Case 4: Contradictory Info ===")
result4 = checker.check_single(
context="Water boils at 100 levels Celsius at sea degree atmospheric stress.",
reply="Water boils at 90 levels Celsius at sea degree."
)
print(f"End result: {result4['grounded']}")
We run four standalone groundedness checks, covering a factual height error, a correct statement, a vague partial match, and a contradictory claim, using the AdvancedGroundednessChecker. Printing each Upstage verdict illustrates how the service flags grounded versus ungrounded answers across these different scenarios.
print("n=== Batch Processing Instance ===")
test_cases = [
    {
        "context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
        "answer": "Romeo and Juliet was written by Shakespeare."
    },
    {
        "context": "The speed of light is approximately 299,792,458 meters per second.",
        "answer": "Light travels at about 300,000 kilometers per second."
    },
    {
        "context": "Earth has one natural satellite called the Moon.",
        "answer": "Earth has two moons."
    }
]
batch_results = checker.batch_check(test_cases)
for i, result in enumerate(batch_results, 1):
    print(f"Batch Test {i}: {result['grounded']}")
print("n=== Outcomes Evaluation ===")
evaluation = checker.analyze_results()
print(f"Whole checks carried out: {evaluation['total_checks']}")
print(f"Grounded responses: {evaluation['grounded_count']}")
print(f"Not grounded responses: {evaluation['not_grounded_count']}")
print(f"Groundedness charge: {evaluation['accuracy_rate']:.2%}")
print("n=== Multi-domain Testing ===")
domains = {
"Science": {
"context": "Photosynthesis is the method by which vegetation convert daylight, carbon dioxide, & water into glucose and oxygen.",
"reply": "Crops use photosynthesis to make meals from daylight and CO2."
},
"Historical past": {
"context": "World Battle II resulted in 1945 after the give up of Japan following the atomic bombings.",
"reply": "WWII resulted in 1944 with Germany's give up."
},
"Geography": {
"context": "Mount Everest is the best mountain on Earth, positioned within the Himalayas at 8,848.86 meters.",
"reply": "Mount Everest is the tallest mountain and is positioned within the Himalayas."
}
}
for area, test_case in domains.gadgets():
end result = checker.check_single(test_case["context"], test_case["answer"])
print(f"{area}: {end result['grounded']}")
We execute a series of batched groundedness checks on predefined test cases, print the individual Upstage judgments, and then compute and display overall accuracy metrics. We also run multi-domain validations in science, history, and geography to illustrate how Upstage handles groundedness across different subject areas.
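In longer-running batches, individual API calls can fail because of network issues or rate limits, so it can help to isolate failures instead of aborting the whole run. One possible hardening sketch (our own addition, not part of the original tutorial):

def safe_batch_check(checker, test_cases):
    """Like batch_check, but records per-case failures instead of raising."""
    results = []
    for case in test_cases:
        try:
            results.append(checker.check_single(case["context"], case["answer"]))
        except Exception as exc:  # e.g., network errors or rate limiting
            results.append({
                "context": case["context"],
                "answer": case["answer"],
                "grounded": f"error: {exc}",
                "confidence": "unknown"
            })
    return results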
def create_test_report(checker_instance):
    """Generate a detailed test report"""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": []
    }

    accuracy = report["summary"]["accuracy_rate"]
    if accuracy >= 0.9:  # comparison operator restored; it was lost in extraction
        report["recommendations"].append("High accuracy - system performing well")

    return report
print("n=== Ultimate Check Report ===")
report = create_test_report(checker)
print(f"Total Efficiency: {report['summary']['accuracy_rate']:.2%}")
print("Suggestions:", report["recommendations"])
print("n=== Tutorial Full ===")
print("This tutorial demonstrated:")
print("• Fundamental groundedness checking")
print("• Batch processing capabilities")
print("• Multi-domain testing")
print("• Outcomes evaluation and reporting")
print("• Superior wrapper implementation")
Finally, we define a create_test_report helper that compiles all of the collected groundedness checks into a summary report, complete with overall accuracy and tailored recommendations, and then print the final performance metrics along with a recap of the tutorial's key demonstrations.
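Because the report is a plain dictionary (and json is already imported above), persisting it for later auditing takes only a couple of lines; a small sketch, with a filename of our own choosing:

# Save the report to disk; default=str guards against non-serializable values
with open("groundedness_report.json", "w") as f:
    json.dump(report, f, indent=2, default=str)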
In conclusion, with Upstage's Groundedness Check at our disposal, we gain a scalable, domain-agnostic solution for real-time fact verification and confidence scoring. Whether we are validating isolated claims or processing large batches of responses, Upstage delivers clear grounded/not-grounded judgments and confidence signals that let us monitor accuracy rates and generate actionable quality reports. By integrating this service into our workflow, we can improve the reliability of AI-generated outputs and maintain rigorous standards of factual integrity across applications.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.