
A Coding Guide Implementing ScrapeGraph and Gemini AI for an Automated, Scalable, Insight-Driven Competitive Intelligence and Market Analysis Workflow


In this tutorial, we demonstrate how to leverage ScrapeGraph's powerful scraping tools together with Gemini AI to automate the collection, parsing, and analysis of competitor information. By using ScrapeGraph's SmartScraperTool and MarkdownifyTool, users can extract detailed insights on product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then employs Gemini's advanced language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph ensures that the raw extraction is both accurate and scalable, allowing analysts to focus on strategic interpretation rather than manual data gathering.

%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn

We quietly upgrade or install the latest versions of the essential libraries: langchain-scrapegraph for advanced web scraping, langchain-google-genai for integrating Gemini AI, and the data-analysis tools pandas, matplotlib, and seaborn, so the environment is ready for a seamless competitive-intelligence workflow.

import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

We import the essential Python libraries for setting up a secure, data-driven pipeline: getpass and os manage credentials and environment variables, json handles serialized data, and pandas offers robust DataFrame operations. The typing module provides type hints for better code readability, while datetime records timestamps. Finally, matplotlib.pyplot and seaborn equip us with tools for creating insightful visualizations.

if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")


if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")

We check whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set; if not, the script securely prompts the user for their ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
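This prompt-or-reuse pattern can also be factored into a small helper. The sketch below is our own illustration (the helper name require_key is hypothetical, not part of the tutorial code): a key already present in the environment is returned immediately, and getpass is only invoked when the variable is missing.

```python
import getpass
import os


def require_key(var_name: str, prompt: str) -> str:
    """Return an API key from the environment, prompting only if it is absent."""
    value = os.environ.get(var_name)
    if not value:
        value = getpass.getpass(prompt)   # hidden interactive input
        os.environ[var_name] = value      # cache for later calls in this session
    return value
```

Because the helper writes the key back into os.environ, later cells in the same session never prompt twice for the same variable.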

from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser


smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credit = GetCreditsTool()


llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True
)

Here, we import and instantiate the ScrapeGraph tools, the SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool, for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model (low temperature, with system messages converted to human messages) to drive our analysis. We also bring in ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core to structure prompts and parse model outputs.

class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""

        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark as 'Not Available'.
        """

        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })

            markdown_content = markdownify.invoke({"website_url": url})

            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None
            }

            return competitor_data

        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp
            }

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""

        print(f"🔍 Starting competitive analysis for {len(competitors)} companies...")

        for i, competitor in enumerate(competitors, 1):
            print(f"📊 Analyzing {competitor['name']} ({i}/{len(competitors)})...")

            data = self.scrape_competitor_data(
                competitor['url'],
                competitor['name']
            )
            self.results.append(data)

        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:

            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations

            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}")
        ])

        clean_data = []
        for result in self.results:
            if result['success']:
                clean_data.append({
                    'company': result['company_name'],
                    'url': result['url'],
                    'data': result['scraped_data']
                })

        analysis_chain = analysis_prompt | llm | JsonOutputParser()

        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to the raw text reply if the model output is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })

        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats()
        }

    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis"""
        successful_scrapes = sum(1 for r in self.results if r['success'])
        failed_scrapes = len(self.results) - successful_scrapes

        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes / len(self.results) * 100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp
        }

    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files"""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        with open(f"{filename}.json", 'w') as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats()
            }, f, indent=2)

        df_data = []
        for result in self.results:
            if result['success']:
                df_data.append({
                    'Company': result['company_name'],
                    'URL': result['url'],
                    'Success': result['success'],
                    'Data_Length': len(str(result['scraped_data'])) if result['scraped_data'] else 0,
                    'Analysis_Date': result['analysis_date']
                })

        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)

        print(f"✅ Results exported to {filename}.json and {filename}.csv")

The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes detailed company information with the ScrapeGraph tools, compiles and cleans the results, and then leverages Gemini AI to generate structured competitive insights. It also tracks success rates and timestamps, and provides utility methods to export both raw and summarized data to JSON and CSV for easy downstream reporting and analysis.
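The success-rate bookkeeping in generate_summary_stats can be exercised in isolation. The sketch below is a standalone function (our own, not part of the tutorial class) that applies the same counting and formatting logic, which makes the percentage arithmetic easy to verify before running any scrapes:

```python
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Same counting logic as CompetitiveAnalyzer.generate_summary_stats."""
    successful = sum(1 for r in results if r["success"])
    total = len(results)
    return {
        "total_companies_analyzed": total,
        "successful_scrapes": successful,
        "failed_scrapes": total - successful,
        # Guard against division by zero when no scrapes have run yet
        "success_rate": f"{successful / total * 100:.1f}%" if total else "0%",
    }
```

For example, two successes out of three scrapes yields a success_rate of "66.7%", and an empty result list falls through to "0%" rather than raising ZeroDivisionError.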

def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors"""

    analyzer = CompetitiveAnalyzer()

    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]

    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)

    print("\n" + "=" * 80)
    print("🎯 COMPETITIVE ANALYSIS RESULTS")
    print("=" * 80)

    print("\n📊 Summary Statistics:")
    stats = results['summary_stats']
    for key, value in stats.items():
        print(f"   {key.replace('_', ' ').title()}: {value}")

    print("\n🔍 Strategic Analysis:")
    if isinstance(results['analysis'], dict):
        for section, content in results['analysis'].items():
            print(f"\n   {section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"     • {item}")
            else:
                print(f"     {content}")
    else:
        print(results['analysis'])

    analyzer.export_results("ai_saas_competitive_analysis")

    return results

The function above kicks off the competitive analysis by instantiating CompetitiveAnalyzer and defining the key AI/SaaS players to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.
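The report formatting relies on one small trick worth isolating: snake_case stat keys are turned into human-readable labels with replace and title. The helper below (format_stat_lines is our illustrative name, not from the tutorial) shows that transformation on its own:

```python
def format_stat_lines(stats: dict) -> list:
    """Render snake_case stat keys as title-cased report lines,
    mirroring the summary printout in run_ai_saas_analysis."""
    return [f"{key.replace('_', ' ').title()}: {value}" for key, value in stats.items()]
```

So a key like "success_rate" prints as "Success Rate", with the value appended unchanged.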

def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors"""

    analyzer = CompetitiveAnalyzer()

    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]

    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")

    return results

The function above sets up a CompetitiveAnalyzer to evaluate leading e-commerce platforms by scraping details from each website, generating strategic insights, and then exporting the findings to both JSON and CSV files under the name "ecommerce_competitive_analysis".

@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor social media presence and engagement strategies of competitors"""

    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing
        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}")
    ])

    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})

    # Avoid shadowing the imported `chain` decorator with the local runnable
    analysis_chain = social_media_prompt | llm
    analysis = analysis_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)

    return {
        "social_analysis": analysis,
        "raw_social_data": social_data
    }

Here, this chained function defines a pipeline for gathering and analyzing competitors' social media footprints: it uses ScrapeGraph's smart scraper to extract social media links and engagement elements, then feeds that data into Gemini with a prompt focused on presence, content strategy, and community tactics. Finally, it returns both the raw scraped information and the AI-generated, actionable social media insights in a single structured output.
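The collection loop inside the chain is deliberately error-tolerant: a failed URL records its error and the batch continues. That pattern can be demonstrated with any fetch callable; in the sketch below, smartscraper is replaced by a caller-supplied stand-in (gather_social_data and fetch are our hypothetical names), so this illustrates the pattern rather than performing a live scrape:

```python
from typing import Callable, Dict, List


def gather_social_data(urls: List[str], fetch: Callable[[str], dict]) -> List[Dict]:
    """Collect per-URL results, recording errors instead of aborting the batch."""
    social_data = []
    for url in urls:
        try:
            social_data.append({"url": url, "social_data": fetch(url)})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})
    return social_data
```

One blocked or unreachable site therefore costs a single entry with an "error" key, and every remaining URL is still scraped and analyzed.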

def check_credits():
    """Check available credits"""
    try:
        credits_info = credit.invoke({})
        print(f"💳 Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"⚠️  Could not check credits: {e}")
        return None

The function above calls the GetCreditsTool to retrieve and display your available ScrapeGraph API credits, printing the result (or a warning if the check fails) and returning the credit information, or None on error.

if __name__ == "__main__":
    print("🚀 Advanced Competitive Analysis Tool with Gemini AI")
    print("=" * 60)

    check_credits()

    print("\n🤖 Running AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()

    run_additional = input("\n❓ Run e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == 'y':
        print("\n🛒 Running E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()

    print("\n✨ Analysis complete! Check the exported files for detailed results.")

Finally, this last code block serves as the script's entry point: it prints a header, checks API credits, then kicks off the AI/SaaS competitor analysis (and optionally the e-commerce analysis) before signaling that all results have been exported.

In conclusion, integrating ScrapeGraph's scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive-intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini's language understanding turns that raw data into high-level strategic recommendations. As a result, businesses can rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention. By automating these steps, users gain speed and consistency, as well as the flexibility to extend their analysis to new competitors or markets as needed.


Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
