
A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL


In this tutorial, we explore how to leverage the PyBEL ecosystem to construct and analyze rich biological knowledge graphs directly within Google Colab. We begin by installing all necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer's disease-related pathway, showing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Alongside graph construction, we introduce advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this section, you will have a fully annotated BEL graph ready for downstream visualization and enrichment analyses, laying a solid foundation for interactive exploration of biological knowledge.

!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q


import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')


print("PyBEL Superior Tutorial: Organic Expression Language Ecosystem")
print("=" * 65)

We begin by installing PyBEL and its dependencies directly in Colab, ensuring that all necessary libraries, including NetworkX, Matplotlib, Seaborn, and Pandas, are available for our analysis. Once installed, we import the core modules and suppress warnings to keep the notebook clean and focused on the results.

print("n1. Constructing a Organic Data Graph")
print("-" * 40)


graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)


app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")




graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")


tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")


graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")


print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

We initialize a BELGraph with metadata for an Alzheimer's disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we assemble a compact network that captures key molecular interactions.
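The to_pickle and from_pickle helpers imported earlier are never called in the walkthrough. As an optional aside, here is a minimal sketch of persisting the annotated graph and reloading it, assuming these helpers accept a graph and a file path as in recent PyBEL releases (the ad_pathway.pkl filename is our own choice):

# Optional: persist the annotated BEL graph and reload it (filename is arbitrary)
to_pickle(graph, "ad_pathway.pkl")
restored = from_pickle("ad_pathway.pkl")
print(f"Reloaded graph: {restored.number_of_nodes()} nodes, {restored.number_of_edges()} edges")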

print("n2. Superior Community Evaluation")
print("-" * 30)


degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)


most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most related node: {most_central}")
print(f"Diploma centrality: {degree_centrality[most_central]:.3f}")

We compute degree, betweenness, and closeness centralities to quantify each node's importance within the graph. By identifying the most connected nodes, we gain insight into potential hubs that may drive disease mechanisms.
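As a small extension of our own (not part of the original walkthrough), the top-ranked nodes under each measure can be listed using only the dictionaries computed above:

# Rank the top three nodes for each centrality measure computed above
for label, scores in [("degree", degree_centrality),
                      ("betweenness", betweenness_centrality),
                      ("closeness", closeness_centrality)]:
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(f"Top nodes by {label} centrality:")
    for node, score in top:
        print(f"  {node}: {score:.3f}")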

print("n3. Organic Entity Classification")
print("-" * 35)


node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1


print("Node distribution:")
for func, count in node_types.items():
    print(f"  {func}: {count}")

We classify each node by its function, such as Protein or BiologicalProcess, and tally the counts. This breakdown helps us understand the composition of our network at a glance.

print("n4. Pathway Evaluation")
print("-" * 20)


proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']


print(f"Proteins in pathway: {len(proteins)}")
print(f"Organic processes: {len(processes)}")


edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1


print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f"  {rel}: {count}")

We separate all proteins and processes to measure the pathway's scope and complexity. Counting the different relationship types further reveals which interactions, such as increases or associations, dominate the model.

print("n5. Literature Proof Evaluation")
print("-" * 32)


citations = []
evidences = []
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])


print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")

We extract citation identifiers and evidence strings from each edge to evaluate how well our graph is grounded in published research. Summarizing total and unique citations lets us assess the breadth of supporting literature.
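Pandas is imported at the top of the notebook but never used in the walkthrough. As an optional sketch, the edge annotations can also be collected into a DataFrame for easier inspection (the column names below are our own choice):

# Optional: tabulate edge annotations with pandas for easier inspection
edge_records = []
for u, v, data in graph.edges(data=True):
    edge_records.append({
        "source": str(u),
        "target": str(v),
        "relation": data.get("relation"),
        "citation": str(data.get("citation", "")),
        "evidence": data.get("evidence", ""),
    })

evidence_df = pd.DataFrame(edge_records)
print(evidence_df.head())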

print("n6. Subgraph Evaluation")
print("-" * 22)


inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)


print(f"Irritation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")

We isolate the inflammation subgraph by collecting its direct neighbors, yielding a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
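An equivalent shortcut, sketched here as our own addition, uses NetworkX's ego_graph to collect a node plus its radius-1 neighborhood, assuming it handles the BELGraph like any other MultiDiGraph:

# Alternative sketch: ego_graph collects a node plus its immediate neighborhood
inflammation_ego = nx.ego_graph(graph, inflammation, radius=1, undirected=True)
print(f"Ego graph around inflammation: {inflammation_ego.number_of_nodes()} nodes, {inflammation_ego.number_of_edges()} edges")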

print("n7. Superior Graph Querying")
print("-" * 28)


try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        print(f"Shortest path length: {len(paths[0])-1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")


apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")

We enumerate simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows which factors may trigger cell death.
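For a single concrete route (our own addition, not part of the original tutorial), NetworkX can also report one shortest directed path:

# Report one concrete shortest route from APP to apoptosis, if any exists
try:
    route = nx.shortest_path(graph, app, apoptosis)
    print(" -> ".join(str(node) for node in route))
except nx.NetworkXNoPath:
    print("No directed path from APP to apoptosis")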

print("n8. Knowledge Export and Visualization")
print("-" * 35)


adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]


plt.figure(figsize=(12, 8))


plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Community Graph")


plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")


plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")


plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)


plt.tight_layout()
plt.show()

We prepare adjacency matrices and node labels for downstream use and generate a multi-panel figure showing the network structure, centrality distribution, node-type proportions, and edge-type counts. These visualizations bring our BEL graph to life and support deeper biological interpretation.
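Seaborn is imported above but not used in the multi-panel figure. As an optional follow-on sketch, the adjacency matrix prepared earlier can be rendered as a heatmap (the figure size and colormap are our own choices):

# Optional: render the adjacency matrix prepared above as a Seaborn heatmap
plt.figure(figsize=(6, 5))
sns.heatmap(adj_matrix.toarray(),
            xticklabels=node_labels, yticklabels=node_labels,
            cmap="Blues", cbar_kws={"label": "edge count"})
plt.title("Adjacency Matrix of the BEL Graph")
plt.tight_layout()
plt.show()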

In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how easily one can construct a curated knowledge graph of Alzheimer's disease interactions, perform network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for mining literature evidence and prepared data structures for compelling visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrating additional omics data, running enrichment tests, or coupling the graph with machine-learning workflows.


Check out the Codes here. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
