
Building an Advanced PaperQA2 Research Agent with Google Gemini for Scientific Literature Analysis


In this tutorial, we walk through building an advanced PaperQA2 AI agent powered by Google's Gemini model, designed specifically for scientific literature analysis. We set up the environment in Google Colab/Notebook, configure the Gemini API, and integrate it seamlessly with PaperQA2 to process and query multiple research papers. By the end of the setup, we have an intelligent agent capable of answering complex questions, performing multi-question analyses, and conducting comparative research across papers, all while providing clear answers with evidence from the source documents. Check out the Full Codes here.

!pip install "paper-qa>=5" google-generativeai requests pypdf2 -q


import os
import asyncio
import tempfile
import requests
from pathlib import Path
from paperqa import Settings, ask, agent_query
from paperqa.settings import AgentSettings
import google.generativeai as genai


GEMINI_API_KEY = "Use Your Own API Key Here"
os.environ["GEMINI_API_KEY"] = GEMINI_API_KEY


genai.configure(api_key=GEMINI_API_KEY)
print("✅ Gemini API key configured successfully!")

We begin by installing the required libraries, including PaperQA2 and Google's Generative AI SDK, and then import the necessary modules for our project. We set our Gemini API key as an environment variable and configure it, ensuring the integration is ready to use. Check out the Full Codes here.
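
As an optional sanity check (not part of the original flow), we can confirm the key works before handing it to PaperQA2 by listing the Gemini models it can access with the same google-generativeai client we just configured; the filter on "generateContent" is only illustrative.

# Optional: verify the API key by listing generation-capable Gemini models.
# PaperQA2 itself talks to Gemini via the "gemini/..." model names configured later.
try:
    available = [
        m.name for m in genai.list_models()
        if "generateContent" in m.supported_generation_methods
    ]
    print(f"✅ Key verified; {len(available)} generation-capable models found, e.g. {available[:3]}")
except Exception as e:
    print(f"❌ Gemini API key check failed: {e}")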

def download_sample_papers():
    """Download sample AI/ML research papers for demonstration"""
    papers = {
        "attention_is_all_you_need.pdf": "https://arxiv.org/pdf/1706.03762.pdf",
        "bert_paper.pdf": "https://arxiv.org/pdf/1810.04805.pdf",
        "gpt3_paper.pdf": "https://arxiv.org/pdf/2005.14165.pdf"
    }

    papers_dir = Path("sample_papers")
    papers_dir.mkdir(exist_ok=True)

    print("📥 Downloading sample research papers...")
    for filename, url in papers.items():
        filepath = papers_dir / filename
        if not filepath.exists():
            try:
                response = requests.get(url, stream=True, timeout=30)
                response.raise_for_status()
                with open(filepath, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                print(f"✅ Downloaded: {filename}")
            except Exception as e:
                print(f"❌ Failed to download {filename}: {e}")
        else:
            print(f"📄 Already exists: {filename}")

    return str(papers_dir)


papers_directory = download_sample_papers()


def create_gemini_settings(paper_dir: str, temperature: float = 0.1):
    """Create optimized settings for PaperQA2 with Gemini models"""

    return Settings(
        llm="gemini/gemini-1.5-flash",
        summary_llm="gemini/gemini-1.5-flash",

        agent=AgentSettings(
            agent_llm="gemini/gemini-1.5-flash",
            search_count=6,
            timeout=300.0,
        ),

        embedding="gemini/text-embedding-004",

        temperature=temperature,
        paper_directory=paper_dir,

        answer=dict(
            evidence_k=8,
            answer_max_sources=4,
            evidence_summary_length="about 80 words",
            answer_length="about 150 words, but can be longer",
            max_concurrent_requests=2,
        ),

        parsing=dict(
            chunk_size=4000,
            overlap=200,
        ),

        verbosity=1,
    )

We download a set of well-known AI/ML research papers for our analysis and store them in a dedicated folder. We then create optimized PaperQA2 settings configured to use Gemini for all LLM and embedding tasks, fine-tuning parameters such as search count, evidence retrieval, and parsing for efficient and accurate literature processing. Check out the Full Codes here.
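
Because the constructor above is just a plain function returning a Settings object, it is easy to keep more than one retrieval profile around. The sketch below reuses only the fields already shown to build a slower, higher-recall variant; the specific numbers (search_count=10, evidence_k=15, chunk_size=3000, and so on) are illustrative choices, not recommendations from the PaperQA2 documentation.

def create_thorough_settings(paper_dir: str, temperature: float = 0.0):
    """Higher-recall variant of create_gemini_settings (illustrative values only)."""
    return Settings(
        llm="gemini/gemini-1.5-flash",
        summary_llm="gemini/gemini-1.5-flash",
        agent=AgentSettings(
            agent_llm="gemini/gemini-1.5-flash",
            search_count=10,   # consider more candidate papers per search
            timeout=600.0,     # allow longer end-to-end runs
        ),
        embedding="gemini/text-embedding-004",
        temperature=temperature,
        paper_directory=paper_dir,
        answer=dict(
            evidence_k=15,                 # gather more evidence chunks
            answer_max_sources=6,          # cite more sources in the answer
            evidence_summary_length="about 100 words",
            answer_length="about 250 words, but can be longer",
            max_concurrent_requests=2,     # stay friendly to free-tier rate limits
        ),
        parsing=dict(
            chunk_size=3000,  # smaller chunks for finer-grained retrieval
            overlap=200,
        ),
        verbosity=1,
    )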

class PaperQAAgent:
    """Advanced AI agent for scientific literature analysis using PaperQA2"""

    def __init__(self, papers_directory: str, temperature: float = 0.1):
        self.settings = create_gemini_settings(papers_directory, temperature)
        self.papers_dir = papers_directory
        print(f"🤖 PaperQA Agent initialized with papers from: {papers_directory}")

    async def ask_question(self, question: str, use_agent: bool = True):
        """Ask a question about the research papers"""
        print(f"\n❓ Question: {question}")
        print("🔍 Searching through research papers...")

        try:
            if use_agent:
                response = await agent_query(query=question, settings=self.settings)
            else:
                response = ask(question, settings=self.settings)

            return response

        except Exception as e:
            print(f"❌ Error processing question: {e}")
            return None

    def display_answer(self, response):
        """Display the answer with formatting"""
        if response is None:
            print("❌ No response received")
            return

        print("\n" + "="*60)
        print("📋 ANSWER:")
        print("="*60)

        answer_text = getattr(response, 'answer', str(response))
        print(f"\n{answer_text}")

        contexts = getattr(response, 'contexts', getattr(response, 'context', []))
        if contexts:
            print("\n" + "-"*40)
            print("📚 SOURCES USED:")
            print("-"*40)
            for i, context in enumerate(contexts[:3], 1):
                context_name = getattr(context, 'title', getattr(context, 'doc', f'Source {i}'))
                context_text = getattr(context, 'text', getattr(context, 'content', str(context)))
                print(f"\n{i}. {context_name}")
                print(f"   Text preview: {context_text[:150]}...")

    async def multi_question_analysis(self, questions: list):
        """Analyze multiple questions in sequence"""
        results = {}
        for i, question in enumerate(questions, 1):
            print(f"\n🔄 Processing question {i}/{len(questions)}")
            response = await self.ask_question(question)
            results[question] = response

            if response:
                print(f"✅ Completed: {question[:50]}...")
            else:
                print(f"❌ Failed: {question[:50]}...")

        return results

    async def comparative_analysis(self, topic: str):
        """Perform comparative analysis across papers"""
        questions = [
            f"What are the key innovations in {topic}?",
            f"What are the limitations of current {topic} approaches?",
            f"What future research directions are suggested for {topic}?",
        ]

        print(f"\n🔬 Starting comparative analysis on: {topic}")
        return await self.multi_question_analysis(questions)


async def basic_demo():
    """Demonstrate basic PaperQA functionality"""
    agent = PaperQAAgent(papers_directory)

    question = "What is the transformer architecture and why is it important?"
    response = await agent.ask_question(question)
    agent.display_answer(response)


print("🚀 Running basic demonstration...")
await basic_demo()


async def advanced_demo():
    """Demonstrate advanced multi-question analysis"""
    agent = PaperQAAgent(papers_directory, temperature=0.2)

    questions = [
        "How do attention mechanisms work in transformers?",
        "What are the computational challenges of large language models?",
        "How has pre-training evolved in natural language processing?"
    ]

    print("🧠 Running advanced multi-question analysis...")
    results = await agent.multi_question_analysis(questions)

    for question, response in results.items():
        print(f"\n{'='*80}")
        print(f"Q: {question}")
        print('='*80)
        if response:
            answer_text = getattr(response, 'answer', str(response))
            display_text = answer_text[:300] + "..." if len(answer_text) > 300 else answer_text
            print(display_text)
        else:
            print("❌ No answer available")


print("\n🚀 Running advanced demonstration...")
await advanced_demo()


async def research_comparison_demo():
    """Demonstrate comparative research analysis"""
    agent = PaperQAAgent(papers_directory)

    results = await agent.comparative_analysis("attention mechanisms in neural networks")

    print("\n" + "="*80)
    print("📊 COMPARATIVE ANALYSIS RESULTS")
    print("="*80)

    for question, response in results.items():
        print(f"\n🔍 {question}")
        print("-" * 50)
        if response:
            answer_text = getattr(response, 'answer', str(response))
            print(answer_text)
        else:
            print("❌ Analysis unavailable")
        print()


print("🚀 Running comparative research analysis...")
await research_comparison_demo()

We define a PaperQAAgent that uses our Gemini-tuned PaperQA2 settings to search papers, answer questions, and cite sources, with clear display helpers. We then run the basic, advanced multi-question, and comparative demos so we can interrogate the literature end-to-end and summarize findings efficiently. Check out the Full Codes here.

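One practical note: the top-level `await basic_demo()` style calls above work because Colab/Jupyter already runs an event loop. If you move this code into a plain Python script, you would drive the same coroutines with asyncio.run instead; the sketch below (the run_all and main names are only illustrative) shows one way to do that.

def main():
    """Script-friendly entry point: outside a notebook there is no running
    event loop, so the demos are driven with asyncio.run instead of await."""
    async def run_all():
        await basic_demo()
        await advanced_demo()
        await research_comparison_demo()
    asyncio.run(run_all())

# In a standalone script, uncomment:
# if __name__ == "__main__":
#     main()
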
def create_interactive_agent():
    """Create an interactive agent for custom queries"""
    agent = PaperQAAgent(papers_directory)

    async def query(question: str, show_sources: bool = True):
        """Interactive query function"""
        response = await agent.ask_question(question)

        if response:
            answer_text = getattr(response, 'answer', str(response))
            print(f"\n🤖 Answer:\n{answer_text}")

            if show_sources:
                contexts = getattr(response, 'contexts', getattr(response, 'context', []))
                if contexts:
                    print(f"\n📚 Based on {len(contexts)} sources:")
                    for i, ctx in enumerate(contexts[:3], 1):
                        ctx_name = getattr(ctx, 'title', getattr(ctx, 'doc', f'Source {i}'))
                        print(f"  {i}. {ctx_name}")
        else:
            print("❌ Sorry, I couldn't find an answer to that question.")

        return response

    return query


interactive_query = create_interactive_agent()


print("\n🎯 Interactive agent ready! You can now ask custom questions:")
print("Example: await interactive_query('How do transformers handle long sequences?')")


def print_usage_tips():
    """Print helpful usage tips"""
    tips = """
    🎯 USAGE TIPS FOR PAPERQA2 WITH GEMINI:

    1. 📝 Question Formulation:
       - Be specific about what you want to know
       - Ask about comparisons, mechanisms, or implications
       - Use domain-specific terminology

    2. 🔧 Model Configuration:
       - Gemini 1.5 Flash is free and reliable
       - Adjust temperature (0.0-1.0) for creativity vs precision
       - Use smaller chunk_size for better processing

    3. 📚 Document Management:
       - Add PDFs to the papers directory
       - Use meaningful filenames
       - Mix different types of papers for better coverage

    4. ⚡ Performance Optimization:
       - Limit concurrent requests on the free tier
       - Use smaller evidence_k values for faster responses
       - Cache results by saving the agent state

    5. 🧠 Advanced Usage:
       - Chain multiple questions for deeper analysis
       - Use comparative analysis for research evaluations
       - Combine with other tools for complete workflows

    📖 Example Questions to Try:
    - "Compare the attention mechanisms in BERT vs GPT models"
    - "What are the computational bottlenecks in transformer training?"
    - "How has pre-training evolved from word2vec to modern LLMs?"
    - "What are the key innovations that made transformers successful?"
    """
    print(tips)


print_usage_tips()
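
Several of these tips map directly onto knobs we already exposed. For example, the temperature tip simply means passing a different value to the PaperQAAgent constructor; the 0.0 and 0.7 values below are only illustrative.

# Illustrative: a precise agent for factual extraction vs. a more exploratory one.
precise_agent = PaperQAAgent(papers_directory, temperature=0.0)
creative_agent = PaperQAAgent(papers_directory, temperature=0.7)

# Example usage in the notebook:
# await precise_agent.ask_question("What BLEU scores does the Transformer paper report?")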


def save_analysis_results(results: dict, filename: str = "paperqa_analysis.txt"):
    """Save analysis results to a file"""
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("PaperQA2 Analysis Results\n")
        f.write("=" * 50 + "\n\n")

        for question, response in results.items():
            f.write(f"Question: {question}\n")
            f.write("-" * 30 + "\n")
            if response:
                answer_text = getattr(response, 'answer', str(response))
                f.write(f"Answer: {answer_text}\n")

                contexts = getattr(response, 'contexts', getattr(response, 'context', []))
                if contexts:
                    f.write(f"\nSources ({len(contexts)}):\n")
                    for i, ctx in enumerate(contexts, 1):
                        ctx_name = getattr(ctx, 'title', getattr(ctx, 'doc', f'Source {i}'))
                        f.write(f"  {i}. {ctx_name}\n")
            else:
                f.write("Answer: No response available\n")
            f.write("\n" + "="*50 + "\n\n")

    print(f"💾 Results saved to: {filename}")


print("✅ Tutorial complete! You now have a fully functional PaperQA2 AI Agent with Gemini.")

We create an interactive query helper that lets us ask custom questions on demand and optionally view the cited sources. We also print practical usage tips and add a saver that writes each Q&A, together with source names, to a results file, wrapping up the tutorial with a ready-to-use workflow.
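
To show how these last pieces fit together, here is a small optional example (the question texts and output filename are placeholders) that runs a short batch through multi_question_analysis and persists the answers with save_analysis_results.

async def run_and_save():
    # Hypothetical end-to-end workflow: batch a few questions, then save the results.
    agent = PaperQAAgent(papers_directory)
    results = await agent.multi_question_analysis([
        "What problem does the transformer architecture solve?",
        "How does BERT's pre-training objective differ from GPT-3's?",
    ])
    save_analysis_results(results, filename="transformer_notes.txt")

# In the notebook: await run_and_save()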

In conclusion, we successfully created a fully functional AI research assistant that combines the speed and versatility of Gemini with the robust paper-processing capabilities of PaperQA2. We can now interactively explore scientific papers, run targeted queries, and even perform in-depth comparative analyses with minimal effort. This setup not only enhances our ability to digest complex research but also streamlines the entire literature review process, letting us focus on insights rather than manual searching.


Check out the Full Codes here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
