We start this tutorial by designing a modular deep research system that runs directly on Google Colab. We configure Gemini as the core reasoning engine, combine DuckDuckGo's Instant Answer API for lightweight web search, and orchestrate multi-round querying with deduplication and delay handling. We emphasize efficiency by limiting API calls, parsing concise snippets, and using structured prompts to extract key points, themes, and insights. Each component, from source collection to JSON-based analysis, lets us experiment quickly and adapt the workflow for deeper or broader research queries. Check out the FULL CODES here.
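Before walking through the full class, the multi-round pattern mentioned above, deduplication by URL plus a polite delay between search rounds, can be sketched in isolation. This is our own minimal illustration, not code from the tutorial; the `fetch` callable is a hypothetical stand-in for any search function:

```python
import time

def deduplicate_by_url(sources):
    """Keep only the first source seen for each URL, preserving order."""
    unique, seen_urls = [], set()
    for source in sources:
        if source['url'] not in seen_urls:
            unique.append(source)
            seen_urls.add(source['url'])
    return unique

def run_rounds(queries, fetch, delay=1.0):
    """Run one search per query, sleeping between rounds to respect rate limits."""
    all_sources = []
    for query in queries:
        all_sources.extend(fetch(query))
        time.sleep(delay)
    return deduplicate_by_url(all_sources)

# Stubbed fetch: repeated queries produce duplicate URLs that get filtered out
stub = lambda q: [{'url': f'https://example.com/{q}', 'snippet': q}]
print(run_rounds(['a', 'a', 'b'], stub, delay=0.0))
```

The same two-step shape (accumulate, then deduplicate) appears later inside `conduct_research`.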
import os
import json
import time
import requests
from typing import List, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re
We begin by importing essential Python libraries that handle system operations, JSON processing, web requests, and data structures. We also incorporate Google's Generative AI SDK and utilities, such as URL encoding, to ensure our research system operates smoothly.
@dataclass
class ResearchConfig:
    gemini_api_key: str
    max_sources: int = 10
    max_content_length: int = 5000
    search_delay: float = 1.0
class DeepResearchSystem:
    def __init__(self, config: ResearchConfig):
        self.config = config
        genai.configure(api_key=config.gemini_api_key)
        self.model = genai.GenerativeModel('gemini-1.5-flash')
    def search_web(self, query: str, num_results: int = 5) -> List[Dict[str, str]]:
        """Search the web using the DuckDuckGo Instant Answer API"""
        try:
            encoded_query = quote_plus(query)
            url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_redirect=1"
            response = requests.get(url, timeout=10)
            data = response.json()

            results = []
            if 'RelatedTopics' in data:
                for topic in data['RelatedTopics'][:num_results]:
                    if isinstance(topic, dict) and 'Text' in topic:
                        results.append({
                            'title': topic.get('Text', '')[:100] + '...',
                            'url': topic.get('FirstURL', ''),
                            'snippet': topic.get('Text', '')
                        })

            if not results:
                results = [{
                    'title': f"Research on: {query}",
                    'url': f"https://search.example.com/q={encoded_query}",
                    'snippet': f"General information and research about {query}"
                }]
            return results
        except Exception as e:
            print(f"Search error: {e}")
            return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]
    def extract_key_points(self, content: str) -> List[str]:
        """Extract key points using Gemini"""
        prompt = f"""
        Extract 5-7 key points from this content. Be concise and factual:

        {content[:2000]}

        Return as a numbered list:
        """
        try:
            response = self.model.generate_content(prompt)
            return [line.strip() for line in response.text.split('\n') if line.strip()]
        except:
            return ["Key information extracted from source"]
    def analyze_sources(self, sources: List[Dict[str, str]], query: str) -> Dict[str, Any]:
        """Analyze sources for relevance and extract insights"""
        analysis = {
            'total_sources': len(sources),
            'key_themes': [],
            'insights': [],
            'confidence_score': 0.7
        }

        all_content = " ".join([s.get('snippet', '') for s in sources])
        if len(all_content) > 100:
            prompt = f"""
            Analyze this research content for the query: "{query}"

            Content: {all_content[:1500]}

            Provide:
            1. 3-4 key themes (one line each)
            2. 3-4 important insights (one line each)
            3. Overall confidence (0.1-1.0)

            Format as JSON with keys: themes, insights, confidence
            """
            try:
                response = self.model.generate_content(prompt)
                text = response.text
                if 'themes' in text.lower():
                    analysis['key_themes'] = ["Theme extracted from analysis"]
                    analysis['insights'] = ["Insight derived from sources"]
            except:
                pass
        return analysis
    def generate_comprehensive_report(self, query: str, sources: List[Dict[str, str]],
                                      analysis: Dict[str, Any]) -> str:
        """Generate the final research report"""
        sources_text = "\n".join([f"- {s['title']}: {s['snippet'][:200]}"
                                  for s in sources[:5]])
        prompt = f"""
        Create a comprehensive research report on: "{query}"

        Based on these sources:
        {sources_text}

        Analysis summary:
        - Total sources: {analysis['total_sources']}
        - Confidence: {analysis['confidence_score']}

        Structure the report with:
        1. Executive Summary (2-3 sentences)
        2. Key Findings (3-5 bullet points)
        3. Detailed Analysis (2-3 paragraphs)
        4. Conclusions & Implications (1-2 paragraphs)
        5. Research Limitations

        Be factual, well-structured, and insightful.
        """
        try:
            response = self.model.generate_content(prompt)
            return response.text
        except Exception as e:
            return f"""
# Research Report: {query}

## Executive Summary
Research conducted on "{query}" using {analysis['total_sources']} sources.

## Key Findings
- Multiple perspectives analyzed
- Comprehensive information gathered
- Research completed successfully

## Analysis
The research process involved systematic collection and analysis of information related to {query}. Various sources were consulted to provide a balanced perspective.

## Conclusions
The research provides a foundation for understanding {query} based on available information.

## Research Limitations
Limited by API constraints and source availability.
"""
    def conduct_research(self, query: str, depth: str = "standard") -> Dict[str, Any]:
        """Main research orchestration method"""
        print(f"🔍 Starting research on: {query}")

        search_rounds = {"basic": 1, "standard": 2, "deep": 3}.get(depth, 2)
        sources_per_round = {"basic": 3, "standard": 5, "deep": 7}.get(depth, 5)

        all_sources = []
        search_queries = [query]

        if depth in ["standard", "deep"]:
            try:
                related_prompt = f"Generate 2 related search queries for: {query}. One line each."
                response = self.model.generate_content(related_prompt)
                additional_queries = [q.strip() for q in response.text.split('\n') if q.strip()][:2]
                search_queries.extend(additional_queries)
            except:
                pass

        for i, search_query in enumerate(search_queries[:search_rounds]):
            print(f"🔎 Search round {i+1}: {search_query}")
            sources = self.search_web(search_query, sources_per_round)
            all_sources.extend(sources)
            time.sleep(self.config.search_delay)

        unique_sources = []
        seen_urls = set()
        for source in all_sources:
            if source['url'] not in seen_urls:
                unique_sources.append(source)
                seen_urls.add(source['url'])

        print(f"📊 Analyzing {len(unique_sources)} unique sources...")
        analysis = self.analyze_sources(unique_sources[:self.config.max_sources], query)

        print("📝 Generating comprehensive report...")
        report = self.generate_comprehensive_report(query, unique_sources, analysis)

        return {
            'query': query,
            'sources_found': len(unique_sources),
            'analysis': analysis,
            'report': report,
            'sources': unique_sources[:10]
        }
We define a ResearchConfig dataclass to manage parameters like API keys, source limits, and delays, and then build a DeepResearchSystem class that integrates Gemini with DuckDuckGo search. We implement methods for web search, key-point extraction, source analysis, and report generation, allowing us to orchestrate multi-round research and produce structured insights in a streamlined workflow.
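One fragile spot worth noting: `analyze_sources` asks Gemini for JSON but only checks whether the word "themes" appears in the reply. A more robust parse, our own sketch rather than part of the tutorial, could extract the first JSON object from the model's text, tolerating markdown code fences and surrounding chatter:

```python
import json
import re

def parse_llm_json(text, fallback=None):
    """Best-effort extraction of the first JSON object from an LLM reply."""
    # Strip markdown code fences if the model wrapped its answer in one
    text = re.sub(r"```(?:json)?", "", text)
    # Grab the outermost {...} span and try to decode it
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return fallback
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback

reply = 'Here you go:\n```json\n{"themes": ["a"], "confidence": 0.8}\n```'
print(parse_llm_json(reply))  # {'themes': ['a'], 'confidence': 0.8}
```

With a helper like this, the themes and insights Gemini returns could be stored directly in the `analysis` dict instead of the placeholder strings used above.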
def setup_research_system(api_key: str) -> DeepResearchSystem:
    """Quick setup for Google Colab"""
    config = ResearchConfig(
        gemini_api_key=api_key,
        max_sources=15,
        max_content_length=6000,
        search_delay=0.5
    )
    return DeepResearchSystem(config)
We create a setup_research_system function that simplifies initialization in Google Colab by wrapping our configuration in ResearchConfig and returning a ready-to-use DeepResearchSystem instance with custom limits and delays.
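In practice, hard-coding the key as a string literal is best avoided. A small sketch of an alternative (our addition, assuming the key is stored under the name `GEMINI_API_KEY` as a Colab secret or environment variable):

```python
import os

def get_api_key(name="GEMINI_API_KEY"):
    """Read the key from Colab secrets when available, else the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except ImportError:
        return os.environ.get(name, "")

key = get_api_key()
# researcher = setup_research_system(key)  # once a real key is configured
```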
if __name__ == "__main__":
    API_KEY = "Use Your Own API Key Here"

    researcher = setup_research_system(API_KEY)

    query = "Deep Research Agent Architecture"
    results = researcher.conduct_research(query, depth="standard")

    print("="*50)
    print("RESEARCH RESULTS")
    print("="*50)
    print(f"Query: {results['query']}")
    print(f"Sources found: {results['sources_found']}")
    print(f"Confidence: {results['analysis']['confidence_score']}")

    print("\n" + "="*50)
    print("COMPREHENSIVE REPORT")
    print("="*50)
    print(results['report'])

    print("\n" + "="*50)
    print("SOURCES CONSULTED")
    print("="*50)
    for i, source in enumerate(results['sources'][:5], 1):
        print(f"{i}. {source['title']}")
        print(f"   URL: {source['url']}")
        print(f"   Preview: {source['snippet'][:150]}...")
        print()
We add a main execution block where we initialize the research system with our API key, run a query on "Deep Research Agent Architecture," and then display structured outputs. We print research results, a comprehensive report generated by Gemini, and a list of consulted sources with titles, URLs, and previews.
In conclusion, we see how the entire pipeline consistently transforms unstructured snippets into a structured, well-organized report. We successfully combine search, language modeling, and analysis layers to simulate a complete research workflow within Colab. By using Gemini for extraction, synthesis, and reporting, and DuckDuckGo for free search access, we create a reusable foundation for more advanced agentic research systems. This notebook provides a practical, technically detailed template that we can now expand with additional models, custom ranking, or domain-specific integrations, while still retaining a compact, end-to-end architecture.
Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes, and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.