In this tutorial, we implement the BioCypher AI Agent, a tool for building, querying, and analyzing biomedical knowledge graphs using the BioCypher framework. By combining the strengths of BioCypher, a high-performance, schema-based interface for biological data integration, with the flexibility of NetworkX, this tutorial lets users simulate complex biological relationships such as gene-disease associations, drug-target interactions, and pathway involvements. The agent also includes capabilities for generating synthetic biomedical data, visualizing knowledge graphs, and performing intelligent queries, such as centrality analysis and neighbor detection.
!pip install biocypher pandas numpy networkx matplotlib seaborn
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import json
import random
from typing import Dict, List, Tuple, Any
We start by installing the essential Python libraries required for our biomedical graph analysis, including biocypher, Pandas, NumPy, NetworkX, Matplotlib, and Seaborn. These packages let us handle data, create and manipulate knowledge graphs, and visualize relationships. Once installed, we import all necessary modules to set up our development environment.
try:
    from biocypher import BioCypher
    from biocypher._config import config
    BIOCYPHER_AVAILABLE = True
except ImportError:
    print("BioCypher not available, using NetworkX-only implementation")
    BIOCYPHER_AVAILABLE = False
We try to import the BioCypher framework, which provides a schema-based interface for managing biomedical knowledge graphs. If the import succeeds, we enable BioCypher features; otherwise, we gracefully fall back to a NetworkX-only mode, ensuring that the rest of the analysis can still proceed without interruption.
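The availability check can also be made without importing the package at all. The following is a minimal sketch, not part of the tutorial's agent: `is_installed` is a hypothetical helper name, while `importlib.util.find_spec` is a standard-library call that returns `None` when a top-level module cannot be located.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# Decide on a backend name once and branch on it later.
backend = "biocypher" if is_installed("biocypher") else "networkx"
print(f"Using backend: {backend}")
```

Note that locating a module does not guarantee it imports cleanly, so the try/except pattern used in the tutorial remains the safer choice when the import itself might fail.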
class BiomedicalAIAgent:
    """Advanced AI Agent for biomedical knowledge graph analysis using BioCypher"""
    def __init__(self):
        if BIOCYPHER_AVAILABLE:
            try:
                self.bc = BioCypher()
                self.use_biocypher = True
            except Exception as e:
                print(f"BioCypher initialization failed: {e}")
                self.use_biocypher = False
        else:
            self.use_biocypher = False
        self.graph = nx.Graph()
        self.entities = {}
        self.relationships = []
        self.knowledge_base = self._initialize_knowledge_base()
    def _initialize_knowledge_base(self) -> Dict[str, List[str]]:
        """Initialize sample biomedical knowledge base"""
        return {
            "genes": ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "PIK3CA", "PTEN"],
            "diseases": ["breast_cancer", "lung_cancer", "diabetes", "alzheimer", "heart_disease"],
            "drugs": ["aspirin", "metformin", "doxorubicin", "paclitaxel", "imatinib"],
            "pathways": ["apoptosis", "cell_cycle", "DNA_repair", "metabolism", "inflammation"],
            "proteins": ["p53", "EGFR", "insulin", "hemoglobin", "collagen"]
        }
    def generate_synthetic_data(self, n_entities: int = 50) -> None:
        """Generate synthetic biomedical data for demonstration"""
        print("🧬 Generating synthetic biomedical data...")
        # One entity per knowledge-base item, keyed as "<type>_<name>"
        for entity_type, items in self.knowledge_base.items():
            for item in items:
                entity_id = f"{entity_type}_{item}"
                self.entities[entity_id] = {
                    "id": entity_id,
                    "type": entity_type,
                    "name": item,
                    "properties": self._generate_properties(entity_type)
                }
        entity_ids = list(self.entities.keys())
        # n_entities is the number of random relationship draws; self-pairs are skipped
        for _ in range(n_entities):
            source = random.choice(entity_ids)
            target = random.choice(entity_ids)
            if source != target:
                rel_type = self._determine_relationship_type(
                    self.entities[source]["type"],
                    self.entities[target]["type"]
                )
                self.relationships.append({
                    "source": source,
                    "target": target,
                    "type": rel_type,
                    "confidence": random.uniform(0.5, 1.0)
                })
We define the BiomedicalAIAgent class as the core engine for analyzing biomedical knowledge graphs using BioCypher. In the constructor, we check whether BioCypher is available and initialize it if possible; otherwise, we default to a NetworkX-only approach. We also set up our base structures, including an empty graph, a dictionary of entities, a list of relationships, and a predefined biomedical knowledge base. We then use generate_synthetic_data() to populate the agent with realistic biological entities, such as genes, diseases, drugs, and pathways, and simulate their interactions through randomly generated but biologically meaningful relationships.
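Because the relationships are drawn at random, two runs produce different graphs unless the generator is seeded. The sketch below is an illustration under assumptions rather than the agent's own code: `sample_relationships` and its `seed` parameter are hypothetical, and it uses a private `random.Random` instance so the global random state is left alone.

```python
import random
from typing import Dict, List


def sample_relationships(entity_ids: List[str], n: int, seed: int = 42) -> List[Dict]:
    """Draw up to n random source/target pairs reproducibly (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, independent of random.seed()
    edges = []
    for _ in range(n):
        source = rng.choice(entity_ids)
        target = rng.choice(entity_ids)
        if source != target:  # skip self-pairs, as the agent does
            edges.append({
                "source": source,
                "target": target,
                "confidence": rng.uniform(0.5, 1.0),
            })
    return edges


ids = ["genes_TP53", "diseases_breast_cancer", "drugs_imatinib"]
first = sample_relationships(ids, 10)
second = sample_relationships(ids, 10)
print(first == second)  # True: the same seed yields the same edges
```

Since self-pairs are discarded, the number of relationships returned can be smaller than `n`, which is also true of generate_synthetic_data().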
    def _generate_properties(self, entity_type: str) -> Dict[str, Any]:
        """Generate realistic properties for different entity types"""
        base_props = {"created_at": "2024-01-01", "source": "synthetic"}
        if entity_type == "genes":
            base_props.update({
                "chromosome": f"chr{random.randint(1, 22)}",
                "expression_level": random.uniform(0.1, 10.0),
                "mutation_frequency": random.uniform(0.01, 0.3)
            })
        elif entity_type == "diseases":
            base_props.update({
                "prevalence": random.uniform(0.001, 0.1),
                "severity": random.choice(["mild", "moderate", "severe"]),
                "age_of_onset": random.randint(20, 80)
            })
        elif entity_type == "drugs":
            base_props.update({
                "dosage": f"{random.randint(10, 500)}mg",
                "efficacy": random.uniform(0.3, 0.95),
                "side_effects": random.randint(1, 10)
            })
        return base_props
    def _determine_relationship_type(self, source_type: str, target_type: str) -> str:
        """Determine biologically meaningful relationship types"""
        relationships_map = {
            ("genes", "diseases"): "associated_with",
            ("genes", "drugs"): "targeted_by",
            ("genes", "pathways"): "participates_in",
            ("drugs", "diseases"): "treats",
            ("proteins", "pathways"): "involved_in",
            ("diseases", "pathways"): "disrupts"
        }
        # Try the pair in both orders before falling back to a generic label
        return relationships_map.get((source_type, target_type),
                                     relationships_map.get((target_type, source_type), "related_to"))
    def build_knowledge_graph(self) -> None:
        """Build knowledge graph using BioCypher or NetworkX"""
        print("🔗 Building knowledge graph...")
        if self.use_biocypher:
            # add_node/add_edge are the calls this tutorial uses; if the installed
            # BioCypher version does not expose them, the except branch takes over.
            try:
                for entity_id, entity_data in self.entities.items():
                    self.bc.add_node(
                        node_id=entity_id,
                        node_label=entity_data["type"],
                        node_properties=entity_data["properties"]
                    )
                for rel in self.relationships:
                    self.bc.add_edge(
                        source_id=rel["source"],
                        target_id=rel["target"],
                        edge_label=rel["type"],
                        edge_properties={"confidence": rel["confidence"]}
                    )
                print("✅ BioCypher graph built successfully")
            except Exception as e:
                print(f"BioCypher build failed, using NetworkX only: {e}")
                self.use_biocypher = False
        # The NetworkX graph is always built, since every query below relies on it
        for entity_id, entity_data in self.entities.items():
            self.graph.add_node(entity_id, **entity_data)
        for rel in self.relationships:
            self.graph.add_edge(rel["source"], rel["target"],
                                type=rel["type"], confidence=rel["confidence"])
        print(f"✅ NetworkX graph built with {len(self.graph.nodes())} nodes and {len(self.graph.edges())} edges")
    def intelligent_query(self, query_type: str, entity: str = None) -> Dict[str, Any]:
        """Intelligent querying system with multiple analysis types"""
        print(f"🤖 Processing intelligent query: {query_type}")
        if query_type == "drug_targets":
            return self._find_drug_targets()
        elif query_type == "disease_genes":
            return self._find_disease_associated_genes()
        elif query_type == "pathway_analysis":
            return self._analyze_pathways()
        elif query_type == "centrality_analysis":
            return self._analyze_network_centrality()
        elif query_type == "entity_neighbors" and entity:
            return self._find_entity_neighbors(entity)
        else:
            return {"error": "Unknown query type"}
    def _find_drug_targets(self) -> Dict[str, List[str]]:
        """Find potential drug targets"""
        drug_targets = {}
        for rel in self.relationships:
            # Only gene -> drug edges are counted; drug -> gene draws are ignored
            if (rel["type"] == "targeted_by" and
                    self.entities[rel["source"]]["type"] == "genes"):
                drug = self.entities[rel["target"]]["name"]
                target = self.entities[rel["source"]]["name"]
                if drug not in drug_targets:
                    drug_targets[drug] = []
                drug_targets[drug].append(target)
        return drug_targets
    def _find_disease_associated_genes(self) -> Dict[str, List[str]]:
        """Find genes associated with diseases"""
        disease_genes = {}
        for rel in self.relationships:
            if (rel["type"] == "associated_with" and
                    self.entities[rel["target"]]["type"] == "diseases"):
                disease = self.entities[rel["target"]]["name"]
                gene = self.entities[rel["source"]]["name"]
                if disease not in disease_genes:
                    disease_genes[disease] = []
                disease_genes[disease].append(gene)
        return disease_genes
    def _analyze_pathways(self) -> Dict[str, int]:
        """Analyze pathway connectivity"""
        pathway_connections = {}
        for rel in self.relationships:
            if rel["type"] in ["participates_in", "involved_in"]:
                if self.entities[rel["target"]]["type"] == "pathways":
                    pathway = self.entities[rel["target"]]["name"]
                    pathway_connections[pathway] = pathway_connections.get(pathway, 0) + 1
        # Most connected pathways first
        return dict(sorted(pathway_connections.items(), key=lambda x: x[1], reverse=True))
    def _analyze_network_centrality(self) -> Dict[str, Dict[str, float]]:
        """Analyze network centrality measures"""
        if len(self.graph.nodes()) == 0:
            return {}
        centrality_measures = {
            "degree": nx.degree_centrality(self.graph),
            "betweenness": nx.betweenness_centrality(self.graph),
            "closeness": nx.closeness_centrality(self.graph)
        }
        # Keep the five highest-scoring nodes for each measure
        top_nodes = {}
        for measure, values in centrality_measures.items():
            top_nodes[measure] = dict(sorted(values.items(), key=lambda x: x[1], reverse=True)[:5])
        return top_nodes
    def _find_entity_neighbors(self, entity_name: str) -> Dict[str, List[str]]:
        """Find neighbors of a specific entity"""
        neighbors = {"direct": [], "indirect": []}
        # Resolve the display name to an entity id (case-insensitive, first match wins)
        entity_id = None
        for eid, edata in self.entities.items():
            if edata["name"].lower() == entity_name.lower():
                entity_id = eid
                break
        if not entity_id or entity_id not in self.graph:
            return {"error": f"Entity '{entity_name}' not found"}
        direct_ids = set(self.graph.neighbors(entity_id))
        for neighbor in direct_ids:
            neighbors["direct"].append(self.entities[neighbor]["name"])
        # Indirect neighbors are two hops away and not already direct neighbors
        for direct_neighbor in direct_ids:
            for indirect_neighbor in self.graph.neighbors(direct_neighbor):
                if indirect_neighbor != entity_id and indirect_neighbor not in direct_ids:
                    neighbor_name = self.entities[indirect_neighbor]["name"]
                    if neighbor_name not in neighbors["indirect"]:
                        neighbors["indirect"].append(neighbor_name)
        return neighbors
    def visualize_network(self, max_nodes: int = 30) -> None:
        """Visualize the knowledge graph"""
        print("📊 Creating network visualization...")
        nodes_to_show = list(self.graph.nodes())[:max_nodes]
        subgraph = self.graph.subgraph(nodes_to_show)
        plt.figure(figsize=(12, 8))
        pos = nx.spring_layout(subgraph, k=2, iterations=50)
        # Color each node by its entity type
        node_colors = []
        color_map = {"genes": "red", "diseases": "blue", "drugs": "green",
                     "pathways": "orange", "proteins": "purple"}
        for node in subgraph.nodes():
            entity_type = self.entities[node]["type"]
            node_colors.append(color_map.get(entity_type, "gray"))
        nx.draw(subgraph, pos, node_color=node_colors, node_size=300,
                with_labels=False, alpha=0.7, edge_color="gray", width=0.5)
        plt.title("Biomedical Knowledge Graph Network")
        plt.axis('off')
        plt.tight_layout()
        plt.show()
We designed a set of intelligent functions within the BiomedicalAIAgent class to simulate real-world biomedical scenarios. We generate realistic properties for each entity type, define biologically meaningful relationship types, and build a structured knowledge graph using either BioCypher or NetworkX. To gain insights, we included functions for analyzing drug targets, disease-gene associations, pathway connectivity, and network centrality, along with a visual graph explorer that helps us intuitively understand the interactions between biomedical entities.
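To make the centrality and neighbor queries concrete, here is a minimal sketch on a three-node toy graph, assuming the same "<type>_<name>" node ids the agent uses. It relies only on standard NetworkX calls; the toy graph itself is illustrative and carries no real biomedical evidence.

```python
import networkx as nx

# Path graph: disease -- gene -- pathway
g = nx.Graph()
g.add_edge("diseases_breast_cancer", "genes_TP53")
g.add_edge("genes_TP53", "pathways_apoptosis")

degree = nx.degree_centrality(g)            # degree / (n - 1)
betweenness = nx.betweenness_centrality(g)  # normalized by default
closeness = nx.closeness_centrality(g)      # (n - 1) / sum of distances

hub = max(degree, key=degree.get)
print(hub)  # genes_TP53, the only node touching both others

# Direct and indirect neighbors via shortest-path distance (1 hop vs. 2 hops)
dist = nx.single_source_shortest_path_length(g, "diseases_breast_cancer", cutoff=2)
direct = sorted(n for n, d in dist.items() if d == 1)
indirect = sorted(n for n, d in dist.items() if d == 2)
print(direct, indirect)
```

Using shortest-path lengths with a cutoff is an alternative to the nested neighbor loops in _find_entity_neighbors(); both approaches separate one-hop from two-hop nodes, and the distance-based version extends naturally to larger radii.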
    def run_analysis_pipeline(self) -> None:
        """Run complete analysis pipeline"""
        print("🚀 Starting BioCypher AI Agent Analysis Pipeline\n")
        self.generate_synthetic_data()
        self.build_knowledge_graph()
        print("📈 Graph Statistics:")
        print(f"   Entities: {len(self.entities)}")
        print(f"   Relationships: {len(self.relationships)}")
        print(f"   Graph Nodes: {len(self.graph.nodes())}")
        print(f"   Graph Edges: {len(self.graph.edges())}\n")
        analyses = [
            ("drug_targets", "Drug Target Analysis"),
            ("disease_genes", "Disease-Gene Associations"),
            ("pathway_analysis", "Pathway Connectivity Analysis"),
            ("centrality_analysis", "Network Centrality Analysis")
        ]
        for query_type, title in analyses:
            print(f"🔍 {title}:")
            results = self.intelligent_query(query_type)
            self._display_results(results)
            print()
        self.visualize_network()
        print("✅ Analysis complete! AI Agent successfully analyzed biomedical data.")
    def _display_results(self, results: Dict[str, Any], max_items: int = 5) -> None:
        """Display analysis results in a formatted way"""
        if isinstance(results, dict) and "error" not in results:
            # Show at most max_items entries, truncating long lists and dicts to three items
            for key, value in list(results.items())[:max_items]:
                if isinstance(value, list):
                    print(f"  {key}: {', '.join(value[:3])}{'...' if len(value) > 3 else ''}")
                elif isinstance(value, dict):
                    print(f"  {key}: {dict(list(value.items())[:3])}")
                else:
                    print(f"  {key}: {value}")
        else:
            print(f"  {results}")
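    def export_to_formats(self) -> None:
        """Export knowledge graph to various formats"""
        if self.use_biocypher:
            # Placeholder: no BioCypher writer is called here, only status messages
            try:
                print("📤 Exporting BioCypher graph...")
                print("✅ BioCypher export completed")
            except Exception as e:
                print(f"BioCypher export failed: {e}")
        print("📤 Exporting NetworkX graph to formats...")
        graph_data = {
            "nodes": [{"id": n, **self.graph.nodes[n]} for n in self.graph.nodes()],
            "edges": [{"source": u, "target": v, **self.graph.edges[u, v]}
                      for u, v in self.graph.edges()]
        }
        try:
            with open("biomedical_graph.json", "w") as f:
                json.dump(graph_data, f, indent=2, default=str)
            # GraphML cannot store dict values, so flatten "properties" into a copy first
            export_graph = nx.Graph()
            for n, data in self.graph.nodes(data=True):
                flat = {k: v for k, v in data.items() if k != "properties"}
                flat.update(data.get("properties", {}))
                export_graph.add_node(n, **flat)
            export_graph.add_edges_from(self.graph.edges(data=True))
            nx.write_graphml(export_graph, "biomedical_graph.graphml")
            print("✅ Graph exported to JSON and GraphML formats")
        except Exception as e:
            print(f"Export failed: {e}")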
We wrap up the AI agent workflow with a streamlined run_analysis_pipeline() function that ties everything together, from synthetic data generation and graph construction to intelligent query execution and final visualization. This automated pipeline lets us examine biomedical relationships, analyze central entities, and understand how different biological concepts are interconnected. Finally, export_to_formats(), which is called separately from the pipeline, saves the resulting graph in both JSON and GraphML formats for further use, making our analysis shareable and easier to reproduce.
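One detail matters for the GraphML export: the GraphML writer in NetworkX accepts only scalar attribute values, so a node that carries a nested dictionary has to be flattened before writing. The following is a minimal sketch under that assumption; `flatten_node_attributes` is a hypothetical helper, and the toy graph stands in for the agent's graph.

```python
import os
import tempfile

import networkx as nx


def flatten_node_attributes(graph: nx.Graph, nested_key: str = "properties") -> nx.Graph:
    """Return a copy whose nested dict attribute is merged into top-level attributes."""
    flat = nx.Graph()
    for node, data in graph.nodes(data=True):
        attrs = {k: v for k, v in data.items() if k != nested_key}
        attrs.update(data.get(nested_key, {}))
        flat.add_node(node, **attrs)
    flat.add_edges_from(graph.edges(data=True))
    return flat


g = nx.Graph()
g.add_node("genes_TP53", type="genes", properties={"chromosome": "chr17"})
g.add_node("diseases_breast_cancer", type="diseases", properties={"severity": "severe"})
g.add_edge("genes_TP53", "diseases_breast_cancer", type="associated_with", confidence=0.9)

# Write to a temporary file and read it back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.graphml")
    nx.write_graphml(flatten_node_attributes(g), path)
    restored = nx.read_graphml(path)

print(restored.nodes["genes_TP53"]["chromosome"])  # chr17
```

Flattening into a copy leaves the in-memory graph untouched, so queries that expect the nested "properties" dictionary keep working after the export.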
if __name__ == "__main__":
    agent = BiomedicalAIAgent()
    agent.run_analysis_pipeline()
We conclude the tutorial by instantiating our BiomedicalAIAgent and running the full analysis pipeline. This entry point lets us execute all steps, including data generation, graph building, intelligent querying, visualization, and reporting, in a single, streamlined command, making it easy to explore biomedical knowledge using BioCypher.
In conclusion, through this advanced tutorial, we gain practical experience working with BioCypher to create scalable biomedical knowledge graphs and perform insightful biological analyses. The dual-mode support ensures that even if BioCypher is unavailable, the system gracefully falls back to NetworkX for full functionality. The ability to generate synthetic datasets, execute intelligent graph queries, visualize relationships, and export in multiple formats showcases the flexibility and analytical power of the BioCypher-based agent. Overall, this tutorial exemplifies how BioCypher can serve as a critical infrastructure layer for biomedical AI systems, making complex biological data both usable and insightful for downstream applications.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.