
Weights & Biases: A KDnuggets Crash Course


Image by Author

 

If you train models beyond a single notebook, you have probably hit the same headaches: you tweak five knobs, rerun training, and by Friday you can't remember which run produced the "good" ROC curve or which data slice you used. Weights & Biases (W&B) gives you a paper trail of metrics, configs, plots, datasets, and models, so you can answer what changed with evidence, not guesswork.

Below is a practical tour. It is opinionated, light on ceremony, and geared toward teams who want a clean experiment history without building their own platform. Call it a no-fluff walkthrough.

 

Why W&B at All?

 
Notebooks become experiments. Experiments multiply. Soon you are asking: Which run used that data slice? Why is today's ROC curve higher? Can I reproduce last week's baseline?

W&B gives you a place to:

  • Log metrics, configs, plots, and system stats
  • Version datasets and models with artifacts
  • Run hyperparameter sweeps
  • Share dashboards without screenshots

You can start tiny and layer on features as needed.

 

Setup in 60 Seconds

 
Start by installing the library and logging in with your API key. If you don't have one yet, you can find it here.

pip install wandb
wandb login  # paste your API key once

 

Image by Author

 

// Minimal Sanity Check

import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()

 

Now you should see something like this:

 

Image by Author

 

Now let's get to the useful bits.

 

Tracking Experiments Properly

 

// Log Hyperparameters and Metrics

Treat wandb.config as the single source of truth for your experiment's knobs. Give metrics clear names so charts auto-group.

cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])

# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})

# log a final summary
run.summary["best_val_auc"] = best_auc

 

A few tips:

  • Use namespaces like train/loss or val/auc to group charts automatically
  • Add tags like "lr-finder" or "fp16" so you can filter runs later
  • Use run.summary[...] for one-off results you want to see on the run card

 

// Log Images, Confusion Matrices, and Custom Plots

wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})

 

You can also save any Matplotlib plot:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})

 

// Version Datasets and Models With Artifacts

Artifacts answer questions like, "Which exact files did this run use?" and "What did we train?" No more final_final_v3.parquet mysteries.

import wandb

run = wandb.init(project="kdn-mlops")

# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw")  # or add_file("path")
run.log_artifact(raw)

# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download()  # folder path pinned to a hash

 

Log your model the same way:

import torch
import wandb

run = wandb.init(project="kdn-mlops")

model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)

model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)

 

Now the lineage is clear: this model came from that data, under this code commit.
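Artifact versions are also content-addressed: re-logging identical files reuses the existing version, and any byte change produces a new one. Here is the idea in miniature (illustrative hashing only, not W&B's actual digest scheme):

```python
import hashlib

def digest(data: bytes) -> str:
    # Content digest: identical bytes always map to the same "version"
    return hashlib.sha256(data).hexdigest()[:12]

v1 = digest(b"id,text\n1,great movie\n")
v1_again = digest(b"id,text\n1,great movie\n")
v2 = digest(b"id,text\n1,great movie\n2,terrible\n")

print(v1 == v1_again)  # True: same content, same version
print(v1 == v2)        # False: changed content, new version
```

This is why pinning a run to imdb_reviews:v3 is reproducible: the version name resolves to a fixed set of file digests.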

 

// Tables for Evaluations and Error Analysis

wandb.Table is a lightweight dataframe for results, predictions, and slices.

table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})

 

Filter the table in the UI to find failure patterns (e.g., short reviews, rare classes, etc.).
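You can prototype the same slicing offline before logging anything. A sketch with illustrative names, where `results` holds (text, pred, true) rows from your eval loop:

```python
def error_rate_by_length(results, bucket=50):
    # Group rows into text-length buckets and compute the error rate per bucket
    stats = {}
    for text, pred, true in results:
        b = (len(text) // bucket) * bucket
        errs, total = stats.get(b, (0, 0))
        stats[b] = (errs + int(pred != true), total + 1)
    return {b: errs / total for b, (errs, total) in sorted(stats.items())}

results = [
    ("meh", 1, 0),                           # short review, wrong
    ("ok", 0, 0),                            # short review, right
    ("a long, detailed review " * 4, 1, 1),  # long review, right
]
print(error_rate_by_length(results))  # -> {0: 0.5, 50: 0.0}
```

If a bucket stands out, log those rows as their own wandb.Table slice so the chart is waiting for your teammates.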

 

// Hyperparameter Sweeps

Define a search space in YAML, launch agents, and let W&B coordinate.

# sweep.yaml
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 1e-5, max: 1e-2}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}

 

Start the sweep:

wandb sweep sweep.yaml  # returns a SWEEP_ID
wandb agent <entity>/<project>/<SWEEP_ID>  # run 1+ agents

 

Your training script should read wandb.config for lr, batch, and so on. The dashboard shows top trials, parallel coordinates, and the best config.

 

Drop-In Integrations

 
Pick the one you use and keep moving.

 

// PyTorch Lightning

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)

 

// Keras

import wandb
from wandb.keras import WandbCallback

wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])

 

// Scikit-learn

from sklearn.metrics import roc_auc_score

wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})

 

Model Registry and Staging

 
Think of the registry as a named shelf for your best models. You push an artifact once, then manage aliases like staging or production so downstream code can pull the right one without guessing file paths.

run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")
# Link the artifact into the registry under a stable name, with a staging alias
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])

 

Flip the alias when you promote a new build. Consumers always read sentiment-classifier:production.

 

Reproducibility Checklist

 

  • Configs: Store every hyperparameter in wandb.config
  • Code and commit: Use wandb.init(settings=wandb.Settings(code_dir=".")) to snapshot code, or rely on CI to attach the git SHA
  • Environment: Log requirements.txt or the Docker tag and include it in an artifact
  • Seeds: Log them and set them

Minimal seed helper:

def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)

 

Collaboration and Sharing Without Screenshots

 
Add notes and tags so teammates can search. Use Reports to stitch charts, tables, and commentary into a link you can drop in Slack or a PR. Stakeholders can follow along without opening a notebook.

 

CI and Automation Tips

 

  • Run wandb agent on training nodes to execute sweeps from CI
  • Log a dataset artifact after your ETL job; training jobs can depend on that version explicitly
  • After evaluation, promote model aliases (staging → production) in a small post-step
  • Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP
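To make the first and last bullets concrete, here is a hedged sketch of a CI job that runs a sweep agent. It assumes GitHub Actions; the workflow file name, secret name, and SWEEP_ID variable are all illustrative:

```yaml
# .github/workflows/sweep-agent.yml (illustrative sketch)
name: sweep-agent
on: workflow_dispatch

jobs:
  agent:
    runs-on: ubuntu-latest
    env:
      WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}  # stored as a repo secret
      WANDB_RUN_GROUP: ci-${{ github.sha }}        # group related runs
    steps:
      - uses: actions/checkout@v4
      - run: pip install wandb
      # SWEEP_ID comes from a prior `wandb sweep sweep.yaml` step or a repo variable
      - run: wandb agent --count 5 ${{ vars.SWEEP_ID }}
```

The --count flag caps how many trials this agent runs, so a CI job finishes instead of polling forever.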

 

Privacy and Reliability Tips

 

  • Use private projects by default for teams
  • Use offline mode for air-gapped runs. Train normally, then wandb sync later
export WANDB_MODE=offline

 

  • Don't log raw PII. If needed, hash IDs before logging.
  • For large files, store them as artifacts instead of attaching them to wandb.log.
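For the PII bullet, here is a minimal sketch of hashing IDs before they reach wandb.log; the salt value and truncation length are illustrative choices, not W&B requirements:

```python
import hashlib

SALT = "per-project-secret-salt"  # illustrative; keep it out of version control

def pseudonymize(raw_id: str) -> str:
    # One-way hash: the dashboard sees a stable token, never the raw ID
    return hashlib.sha256((SALT + raw_id).encode("utf-8")).hexdigest()[:16]

row_id = pseudonymize("user-12345")
# wandb.log({"eval/row_id": row_id})  # log the token, not the raw ID
```

The same input always yields the same token, so you can still join rows across runs without exposing the underlying identifier.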

 

Common Snags (and Quick Fixes)

 

  • "My run didn't log anything." The script may have crashed before wandb.finish() was called. Also check that you haven't set WANDB_DISABLED=true in your environment.
  • Logging feels slow. Log scalars at every step, but save heavy assets like images or tables for the end of an epoch. You can also pass commit=False to wandb.log() and batch several logs together.
  • Seeing duplicate runs in the UI? If you're restarting from a checkpoint, set id and resume="allow" in wandb.init() to continue the same run.
  • Experiencing mystery data drift? Put every dataset snapshot into an Artifact and pin your runs to explicit versions.

 

Pocket Cheatsheet

 

// 1. Start a Run

wandb.init(project="proj", config=cfg, tags=["baseline"])

 

// 2. Log Metrics, Images, or Tables

wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})

 

// 3. Version a Dataset or Model

art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)

 

// 4. Consume an Artifact

path = run.use_artifact("name:latest").download()

 

// 5. Run a Sweep

wandb sweep sweep.yaml && wandb agent <entity>/<project>/<SWEEP_ID>

 

Wrapping Up

 
Start small: initialize a run, log a few metrics, and push your model file as an artifact. When that feels natural, add a sweep and a short report. You'll end up with reproducible experiments, traceable data and models, and a dashboard that explains your work without a slideshow.
 
 

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and currently works in data science applied to human mobility. He is a part-time content creator focused on data science and technology, and writes on all things AI, covering the application of the ongoing explosion in the field.
