Stress Testing FastAPI Application – KDnuggets

By Jules Jackson

August 15, 2025

Image by Author

# Introduction

Stress testing is essential for understanding how your application behaves under heavy load. For machine learning-powered APIs, it is especially important because model inference can be CPU-intensive. By simulating a large number of users, we can identify performance bottlenecks, determine the capacity of our system, and ensure reliability.

In this tutorial, we will be using:

FastAPI: A modern, fast (high-performance) web framework for building APIs with Python.
Uvicorn: An ASGI server to run our FastAPI application.
Locust: An open-source load testing tool. You define user behavior with Python code, and swarm your system with hundreds of simultaneous users.
Scikit-learn: For our example machine learning model.

# 1. Project Setup and Dependencies

Set up the project structure and install the required dependencies.

Create a requirements.txt file and add the following Python packages:

fastapi==0.115.12
locust==2.37.10
numpy==2.3.0
pandas==2.3.0
pydantic==2.11.5
scikit-learn==1.7.0
uvicorn==0.34.3
orjson==3.10.18

Open your terminal, create a virtual environment, and activate it. The activation command below is for Windows; on macOS or Linux, use source venv/bin/activate instead.

python -m venv venv
venv\Scripts\activate

Install all the Python packages using the requirements.txt file.

pip install -r requirements.txt

# 2. Building the FastAPI Application

In this section, we will create one file for training the regression model, one for the Pydantic models, and one for the FastAPI application.

The ml_model.py file handles the machine learning model. It uses a singleton pattern to ensure only one instance of the model is loaded. The model is a Random Forest Regressor trained on the California housing dataset. If a pre-trained model (model.pkl and scaler.pkl) doesn't exist, it trains and saves a new one.

app/ml_model.py:

import os
import threading

import joblib
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class MLModel:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every MLModel() call, so only set up once
        if not hasattr(self, "initialized"):
            self.model = None
            self.scaler = None
            self.model_path = "model.pkl"
            self.scaler_path = "scaler.pkl"
            self.feature_names = None
            self.initialized = True
            self.load_or_create_model()

    def load_or_create_model(self):
        """Load existing model or create a new one using California housing dataset"""
        if os.path.exists(self.model_path) and os.path.exists(self.scaler_path):
            self.model = joblib.load(self.model_path)
            self.scaler = joblib.load(self.scaler_path)
            housing = fetch_california_housing()
            self.feature_names = housing.feature_names
            print("Model loaded successfully")
        else:
            print("Creating new model...")
            housing = fetch_california_housing()
            X, y = housing.data, housing.target
            self.feature_names = housing.feature_names

            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )

            self.scaler = StandardScaler()
            X_train_scaled = self.scaler.fit_transform(X_train)

            self.model = RandomForestRegressor(
                n_estimators=50,  # Reduced for faster predictions
                max_depth=8,  # Reduced for faster predictions
                random_state=42,
                n_jobs=1,  # Single thread for consistency
            )
            self.model.fit(X_train_scaled, y_train)

            joblib.dump(self.model, self.model_path)
            joblib.dump(self.scaler, self.scaler_path)

            X_test_scaled = self.scaler.transform(X_test)
            score = self.model.score(X_test_scaled, y_test)
            print(f"Model R² score: {score:.4f}")

    def predict(self, features):
        """Make prediction for house price"""
        features_array = np.array(features).reshape(1, -1)
        features_scaled = self.scaler.transform(features_array)
        prediction = self.model.predict(features_scaled)[0]
        # The dataset's target is in units of $100,000; convert to dollars
        return prediction * 100000

    def get_feature_info(self):
        """Get information about the features"""
        return {
            "feature_names": list(self.feature_names),
            "num_features": len(self.feature_names),
            "description": "California housing dataset features",
        }

# Initialize model as singleton
ml_model = MLModel()
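
The double-checked locking used in `__new__` can be tried in isolation. The following is a minimal sketch using only the standard library; `Singleton`, `load_count`, and `create_in_threads` are hypothetical names introduced for illustration, and the counter merely stands in for the expensive model load.

```python
import threading


class Singleton:
    """Hypothetical stand-in for MLModel, without the model loading."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # First check avoids taking the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Second check guards against two threads racing past the first.
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every Singleton() call, so guard one-time setup.
        if not hasattr(self, "initialized"):
            self.load_count = 0
            self.initialized = True
            self.load_count += 1  # stands in for the expensive model load


def create_in_threads(n):
    """Construct the singleton from n threads and collect the results."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(Singleton()))
        for _ in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


first = Singleton()
second = Singleton()
instances = create_in_threads(8)

print(first is second)
print(all(obj is first for obj in instances))
print(first.load_count)
```

Every call returns the same object, and the one-time setup runs once. Note that a singleton like this is per process: when Uvicorn is started with several workers, each worker process holds its own instance.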

The pydantic_models.py file defines the Pydantic models for request and response data validation and serialization.

app/pydantic_models.py:

from typing import List

from pydantic import BaseModel, Field

class PredictionRequest(BaseModel):
    features: List[float] = Field(
        ...,
        description="List of 8 features: MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude",
        min_length=8,
        max_length=8,
    )

    model_config = {
        "json_schema_extra": {
            "examples": [
                {"features": [8.3252, 41.0, 6.984, 1.024, 322.0, 2.556, 37.88, -122.23]}
            ]
        }
    }
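
To see what the length constraint does in practice, here is a small sketch that validates a few payloads outside FastAPI. It assumes Pydantic v2 (as pinned in requirements.txt); the `is_valid` helper and the sample payloads are hypothetical and only serve the illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class PredictionRequest(BaseModel):
    # Same constraint as the request model: exactly 8 float features.
    features: List[float] = Field(..., min_length=8, max_length=8)


def is_valid(payload):
    """Return True if the payload passes validation, False otherwise."""
    try:
        PredictionRequest(**payload)
        return True
    except ValidationError:
        return False


good = {"features": [8.3252, 41.0, 6.984, 1.024, 322.0, 2.556, 37.88, -122.23]}
too_short = {"features": [8.3252, 41.0, 6.984]}
wrong_type = {"features": ["a"] * 8}

print(is_valid(good), is_valid(too_short), is_valid(wrong_type))
```

When the model is used as a FastAPI request body, payloads that fail this validation are rejected with a 422 response before the endpoint function runs.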

app/main.py: This file is the core FastAPI application, defining the API endpoints.

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from fastapi.responses import ORJSONResponse

from .ml_model import ml_model
from .pydantic_models import (
    PredictionRequest,
)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Pre-load the model
    _ = ml_model.get_feature_info()
    yield

app = FastAPI(
    title="California Housing Price Prediction API",
    version="1.0.0",
    description="API for predicting California housing prices using Random Forest model",
    lifespan=lifespan,
    default_response_class=ORJSONResponse,
)

@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "message": "Service is operational"}

@app.get("/model-info")
async def model_info():
    """Get information about the ML model"""
    try:
        feature_info = await asyncio.to_thread(ml_model.get_feature_info)
        return {
            "model_type": "Random Forest Regressor",
            "dataset": "California Housing Dataset",
            "features": feature_info,
        }
    except Exception:
        raise HTTPException(
            status_code=500, detail="Error retrieving model information"
        )

@app.post("/predict")
async def predict(request: PredictionRequest):
    """Make house price prediction"""
    if len(request.features) != 8:
        raise HTTPException(
            status_code=400,
            detail=f"Expected 8 features, got {len(request.features)}",
        )
    try:
        prediction = ml_model.predict(request.features)
        return {
            "prediction": float(prediction),
            "status": "success",
            "features_used": request.features,
        }
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception:
        raise HTTPException(status_code=500, detail="Prediction error")

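Key points:

lifespan manager: Ensures the ML model is loaded during application startup.
asyncio.to_thread: This is crucial because scikit-learn's predict method is CPU-bound (synchronous). Running it in a separate thread prevents it from blocking FastAPI's asynchronous event loop, allowing the server to handle other requests concurrently. In the listing above, asyncio.to_thread wraps the get_feature_info call in /model-info, while /predict calls ml_model.predict directly; applying the same wrapper to the prediction call would keep inference off the event loop as well.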
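
The effect of `asyncio.to_thread` can be observed with a small standard-library experiment. This is a sketch under stated assumptions: `blocking_inference` is a hypothetical stand-in for a blocking call such as a model prediction, and it uses `time.sleep`, which releases the GIL, so it shows the best case. CPU-bound pure-Python work that holds the GIL would benefit less.

```python
import asyncio
import time


def blocking_inference(duration):
    """Hypothetical stand-in for a blocking call such as model.predict."""
    time.sleep(duration)  # sleeping releases the GIL, unlike pure-Python CPU work
    return "done"


async def count_ticks(stop_event):
    """Count how often the event loop gets a turn until stop_event is set."""
    ticks = 0
    while not stop_event.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def run(offload):
    stop_event = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop_event))
    await asyncio.sleep(0)  # let the ticker start
    if offload:
        result = await asyncio.to_thread(blocking_inference, 0.2)
    else:
        result = blocking_inference(0.2)  # blocks the event loop
    stop_event.set()
    ticks = await ticker
    return result, ticks


_, ticks_blocked = asyncio.run(run(offload=False))
_, ticks_offloaded = asyncio.run(run(offload=True))
print(ticks_blocked, ticks_offloaded)
```

The ticker plays the role of other requests waiting for the event loop. When the blocking call runs directly, the ticker barely advances; when it is offloaded, the ticker keeps running for the whole duration of the call.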

Endpoints:

/health: A simple health check.
/model-info: Provides metadata about the ML model.
/predict: Accepts a list of features and returns a house price prediction.

run_server.py: Contains the script used to run the FastAPI application with Uvicorn.

import uvicorn

if __name__ == "__main__":

    uvicorn.run("app.main:app", host="localhost", port=8000, workers=4)

All the files and configurations are available in the GitHub repository: kingabzpro/Stress-Testing-FastAPI

# 3. Writing the Locust Stress Test

Now, let's create the stress test script using Locust.

tests/locustfile.py: This file defines the behavior of simulated users.

import json
import logging
import random

from locust import HttpUser, between, task

# Reduce logging to improve performance
logging.getLogger("urllib3").setLevel(logging.WARNING)

class HousingAPIUser(HttpUser):
    # Each user waits 0.5 to 2 seconds between tasks
    wait_time = between(0.5, 2)

    def generate_random_features(self):
        """Generate random but realistic California housing features"""
        return [
            round(random.uniform(0.5, 15.0), 4),  # MedInc
            round(random.uniform(1.0, 52.0), 1),  # HouseAge
            round(random.uniform(2.0, 10.0), 2),  # AveRooms
            round(random.uniform(0.5, 2.0), 2),  # AveBedrms
            round(random.uniform(3.0, 35000.0), 0),  # Population
            round(random.uniform(1.0, 10.0), 2),  # AveOccup
            round(random.uniform(32.0, 42.0), 2),  # Latitude
            round(random.uniform(-124.0, -114.0), 2),  # Longitude
        ]

    @task(1)
    def model_info(self):
        """Test model info endpoint"""
        with self.client.get("/model-info", catch_response=True) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Model info failed: {response.status_code}")

    @task(3)
    def single_prediction(self):
        """Test single prediction endpoint"""
        features = self.generate_random_features()

        with self.client.post(
            "/predict", json={"features": features}, catch_response=True, timeout=10
        ) as response:
            if response.status_code == 200:
                try:
                    data = response.json()
                    if "prediction" in data:
                        response.success()
                    else:
                        response.failure("Invalid response format")
                except json.JSONDecodeError:
                    response.failure("Failed to parse JSON")
            elif response.status_code == 503:
                response.failure("Service unavailable")
            else:
                response.failure(f"Status code: {response.status_code}")

Key points:

Each simulated user will wait between 0.5 and 2 seconds between executing tasks.
Creates realistic random feature data for the prediction requests.
Tasks are weighted 1 to 3, so on average each user makes one model_info request for every three single_prediction requests.
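
The 1-to-3 task weighting can be illustrated with a short simulation. This is a sketch of weighted random selection in general, not Locust's internal implementation; `TASK_WEIGHTS` and `simulate_task_picks` are hypothetical names, with the weights copied from the task decorators in the locustfile.

```python
import random
from collections import Counter

# Weights copied from the task decorators in the locustfile.
TASK_WEIGHTS = {"model_info": 1, "single_prediction": 3}


def simulate_task_picks(n_picks, seed=0):
    """Draw n_picks task names with probability proportional to weight."""
    rng = random.Random(seed)
    names = list(TASK_WEIGHTS)
    weights = list(TASK_WEIGHTS.values())
    return Counter(rng.choices(names, weights=weights, k=n_picks))


counts = simulate_task_picks(10_000)
share = counts["single_prediction"] / sum(counts.values())
print(f"single_prediction share: {share:.2f}")
```

Over many picks, the share of prediction requests settles close to 0.75, which is why the /predict endpoint dominates the request counts in the results.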

# 4. Running the Stress Test

To evaluate the performance of your application under load, begin by starting your asynchronous machine learning application in one terminal, using the run_server.py script:

python run_server.py

Model loaded successfully
INFO:     Started server process [26216]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Open your browser and navigate to http://localhost:8000/docs. Use the interactive API documentation to test your endpoints and ensure they are functioning correctly.
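
As a scripted alternative to clicking through the docs page, a quick smoke test can be written with the standard library alone. This is a minimal sketch: `build_predict_request` and `send` are hypothetical helpers, and `BASE_URL` assumes the server is reachable at http://localhost:8000. The example payload is the one from the request model's schema example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the server from run_server.py is up


def build_predict_request(features, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


example = [8.3252, 41.0, 6.984, 1.024, 322.0, 2.556, 37.88, -122.23]
req = build_predict_request(example)
print(req.get_method(), req.full_url)
# With the server running: print(send(req))
```

Building the request is kept separate from sending it, so the payload can be inspected without a running server.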

Open a new terminal window, activate the virtual environment, and navigate to your project's root directory to run Locust with the web UI:

locust -f tests/locustfile.py --host http://localhost:8000

Access the Locust web UI at http://localhost:8089 in your browser.

In the Locust web UI, set the total number of users to 500, the spawn rate to 10 users per second, and run it for one minute.
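
It helps to work out what these settings imply before reading the results. The arithmetic below is a rough sketch: it takes the 0.5 to 2 second wait described in section 3, assumes waits are uniformly distributed, and treats response time as negligible, so the throughput figure is an upper bound rather than a prediction. The function names are hypothetical.

```python
def ramp_up_seconds(users, spawn_rate):
    """Seconds needed to start all users at spawn_rate users per second."""
    return users / spawn_rate


def max_requests_per_second(users, min_wait, max_wait, response_time=0.0):
    """Upper bound on throughput if every user waits, on average,
    the midpoint of [min_wait, max_wait] plus response_time per request."""
    mean_cycle = (min_wait + max_wait) / 2 + response_time
    return users / mean_cycle


ramp = ramp_up_seconds(users=500, spawn_rate=10)
full_load = 60 - ramp  # seconds left at full load in a one-minute run
ceiling = max_requests_per_second(users=500, min_wait=0.5, max_wait=2.0)
print(ramp, full_load, ceiling)  # prints: 50.0 10.0 400.0
```

With these numbers, 50 of the 60 seconds are spent ramping up, so only the last 10 seconds run at the full 500 users. A longer run time gives a more representative picture of steady-state behavior.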

During the test, Locust will display real-time statistics, including the number of requests, failures, and response times for each endpoint.
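
When reading response-time statistics, the median and the high percentiles tell different stories. The sketch below uses a made-up sample of response times (not results from this test) to show how a single slow request shifts the 95th percentile while leaving the median almost untouched. Locust computes its own percentiles, and its method may differ from the one used here.

```python
import statistics


def summarize(response_times_ms):
    """Return median, 95th percentile, and max of response times in ms."""
    ordered = sorted(response_times_ms)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20, method="inclusive")[-1]
    return {
        "median": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
    }


# Made-up sample: mostly fast responses with a slow tail.
sample = [12, 14, 15, 15, 16, 18, 20, 22, 25, 400]
summary = summarize(sample)
print(summary)
```

For an inference API, the tail percentiles are usually the more useful signal, since they show what the slowest requests experience under load.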

Once the test is complete, click on the Charts tab to view interactive graphs showing the number of users, requests per second, and response times.

To run Locust without the web UI and automatically generate an HTML report, use the following command:

locust -f tests/locustfile.py --host http://localhost:8000 --users 500 --spawn-rate 10 --run-time 60s --headless --html report.html

After the test finishes, an HTML report named report.html will be saved in your project directory for later review.

# Final Thoughts

Our app can handle a large number of users because we are using a simple machine learning model. The results show that the model-info endpoint has a higher response time than the prediction endpoint, which is impressive. This is the best-case scenario for testing your application locally before pushing it to production.

If you would like to experience this setup firsthand, please visit the kingabzpro/Stress-Testing-FastAPI repository and follow the instructions in the documentation.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
