

Image by Author
If you train models beyond a single notebook, you have probably hit the same headaches: you tweak five knobs, rerun training, and by Friday you can't remember which run produced the "good" ROC curve or which data slice you used. Weights & Biases (W&B) gives you a paper trail of metrics, configs, plots, datasets, and models, so you can answer what changed with evidence, not guesswork.
Below is a practical tour. It is opinionated, light on ceremony, and geared toward teams who want a clean experiment history without building their own platform. Call it a no-fluff walkthrough.
# Why W&B at All?
Notebooks turn into experiments. Experiments multiply. Soon you are asking: Which run used that data slice? Why is today's ROC curve higher? Can I reproduce last week's baseline?
W&B gives you one place to:
- Log metrics, configs, plots, and system stats
- Version datasets and models with artifacts
- Run hyperparameter sweeps
- Share dashboards without screenshots
You can start tiny and layer in features as needed.
# Setup in 60 Seconds
Start by installing the library and logging in with your API key. If you do not have one yet, you can find it in your W&B account settings.
pip install wandb
wandb login  # paste your API key once


Image by Author
// Minimal Sanity Check
import wandb, random, time
wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()
Now you should see something like this:


Image by Author
Now let's get to the useful bits.
# Tracking Experiments Properly
// Log Hyperparameters and Metrics
Treat wandb.config as the single source of truth for your experiment's knobs. Give metrics clear names so charts auto-group.
cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])
# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})
# log a final summary
run.summary["best_val_auc"] = best_auc
A few tips:
- Use namespaces like train/loss or val/auc to group charts automatically
- Add tags like "lr-finder" or "fp16" so you can filter runs later
- Use run.summary[...] for one-off results you want to see on the run card
// Log Images, Confusion Matrices, and Custom Plots
wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})
You can also log any Matplotlib figure:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})
// Version Datasets and Models With Artifacts
Artifacts answer questions like, "Which exact files did this run use?" and "What did we train on?" No more final_final_v3.parquet mysteries.
import wandb
run = wandb.init(project="kdn-mlops")
# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw")  # or add_file("path")
run.log_artifact(raw)
# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download()  # folder path pinned to a hash
Log your model the same way:
import torch
import wandb
run = wandb.init(project="kdn-mlops")
model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)
model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)
Now the lineage is clear: this model came from that data, under this code commit.
// Tables for Evaluations and Error Analysis
wandb.Table is a lightweight dataframe for results, predictions, and slices.
table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})
Filter the table in the UI to find failure patterns (e.g., short reviews, rare classes, and so on).
// Hyperparameter Sweeps
Define a search space in YAML, launch agents, and let W&B coordinate.
# sweep.yaml
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 1e-5, max: 1e-2}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}
Start the sweep:
wandb sweep sweep.yaml  # returns a SWEEP_ID
wandb agent <entity>/<project>/<SWEEP_ID>  # run one or more agents
Your training script should read wandb.config for lr, batch, and so on. The dashboard shows the top trials, parallel coordinates, and the best config.
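For reference, the training side can be as small as this. A minimal sketch: train_one_epoch is a hypothetical stand-in for your own loop, and the config keys mirror the YAML above.
import wandb
def main():
    run = wandb.init(project="kdn-mlops")  # the sweep agent injects the sampled values into wandb.config
    lr, batch, dropout = wandb.config.lr, wandb.config.batch, wandb.config.dropout
    for epoch in range(10):
        val_auc = train_one_epoch(lr=lr, batch_size=batch, dropout=dropout)  # stand-in for your training loop
        wandb.log({"epoch": epoch, "val/auc": val_auc})  # name must match the sweep's metric
if __name__ == "__main__":
    main()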
# Drop-In Integrations
Pick the one you use and keep moving.
// PyTorch Lightning
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger
logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)
// Keras
import wandb
from wandb.keras import WandbCallback
wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])
// Scikit-learn
from sklearn.metrics import roc_auc_score
wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})
# Model Registry and Staging
Think of the registry as a named shelf for your best models. You push an artifact once, then manage aliases like staging or production so downstream code can pull the right one without guessing file paths.
run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")
registry = wandb.sdk.artifacts.model_registry.ModelRegistry()
entry = registry.push(art, name="sentiment-classifier")
entry.aliases.add("staging")
Flip the alias when you promote a new build. Consumers always read sentiment-classifier:production.
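On the consuming side, downstream code can resolve the alias at load time. A minimal sketch, assuming the registered model is reachable from the same project and contains the resnet18.pt file logged earlier:
import wandb
run = wandb.init(project="kdn-mlops", job_type="inference")
model_art = run.use_artifact("sentiment-classifier:production", type="model")
model_dir = model_art.download()  # local folder holding resnet18.pt
# load the weights from model_dir and serve as usual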
# Reproducibility Checklist
- Configs: Store every hyperparameter in wandb.config
- Code and commit: Use wandb.init(settings=wandb.Settings(code_dir=".")) to snapshot code, or rely on CI to attach the git SHA
- Environment: Log requirements.txt or the Docker tag and include it in an artifact (see the sketch after the seed helper)
- Seeds: Log them and set them
Minimal seed helper:
def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)
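For the environment item on the checklist, one lightweight option is to attach the dependency file to its own artifact. A sketch, assuming requirements.txt sits at the project root:
import wandb
run = wandb.init(project="kdn-mlops", job_type="env-snapshot")
env = wandb.Artifact("python-env", type="environment", description="pinned deps for training runs")
env.add_file("requirements.txt")  # or add a file recording your Docker image tag
run.log_artifact(env)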
# Collaboration and Sharing Without Screenshots
Add notes and tags so teammates can search. Use Reports to stitch charts, tables, and commentary into a link you can drop in Slack or a PR. Stakeholders can follow along without opening a notebook.
# CI and Automation Tips
- Run wandb agent on training nodes to execute sweeps from CI
- Log a dataset artifact after your ETL job; training jobs can depend on that version explicitly
- After evaluation, promote model aliases (staging → production) in a small post-step
- Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP (sketched below)
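As a small illustration of that last point, a CI entry script can group every run from one pipeline under a shared name. A sketch: GITHUB_SHA is just an example of whatever identifier your CI exposes, and WANDB_API_KEY is assumed to arrive as a secret in the environment.
import os
import wandb
# Group all runs launched by this pipeline so they appear together in the UI
os.environ.setdefault("WANDB_RUN_GROUP", "ci-" + os.environ.get("GITHUB_SHA", "local")[:7])
run = wandb.init(project="kdn-mlops", job_type="train")  # reads WANDB_API_KEY from the environment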
# Privacy and Reliability Tips
- Use private projects by default for teams
- Use offline mode for air-gapped runs. Train as usual, then run wandb sync later:
export WANDB_MODE=offline
- Do not log raw PII. If needed, hash IDs before logging (see the sketch after this list).
- For large files, store them as artifacts instead of attaching them to wandb.log.
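For the PII point, a minimal hashing helper could look like this (a sketch; the 16-character truncation is an arbitrary choice):
import hashlib
def hash_id(raw_id: str) -> str:
    # One-way hash so runs keep a stable join key without storing the raw identifier
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()[:16]
# e.g. inside an evaluation loop:
# table.add_data(hash_id(record.user_id), record.pred, record.true)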
# Common Snags (and Quick Fixes)
- "My run didn't log anything." The script may have crashed before wandb.finish() was called. Also, check that you have not set WANDB_DISABLED=true in your environment.
- Logging feels slow. Log scalars at every step, but save heavy assets like images or tables for the end of an epoch. You can also pass commit=False to wandb.log() and batch several logs together.
- Seeing duplicate runs in the UI? If you are restarting from a checkpoint, set id and resume="allow" in wandb.init() to continue the same run (sketched below).
- Experiencing mystery data drift? Put every dataset snapshot into an Artifact and pin your runs to specific versions.
# Pocket Cheatsheet
// 1. Start a Run
wandb.init(project="proj", config=cfg, tags=["baseline"])
// 2. Log Metrics, Images, or Tables
wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})
// 3. Version a Dataset or Model
art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)
// 4. Consume an Artifact
path = run.use_artifact("name:latest").download()
// 5. Run a Sweep
wandb sweep sweep.yaml && wandb agent <entity>/<project>/<SWEEP_ID>
# Wrapping Up
Start small: initialize a run, log a few metrics, and push your model file as an artifact. When that feels natural, add a sweep and a short report. You will end up with reproducible experiments, traceable data and models, and a dashboard that explains your work without a slideshow.
Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.