Deploying Small Language Fashions at Scale with AWS IoT Greengrass and Strands Brokers

December 19, 2025

30

Fashionable producers face an more and more complicated problem: implementing clever decision-making techniques that reply to real-time operational knowledge whereas sustaining safety and efficiency requirements. The quantity of sensor knowledge and operational complexity calls for AI-powered options that course of data domestically for instant responses whereas leveraging cloud sources for complicated duties.The business is at a vital juncture the place edge computing and AI converge. Small Language Fashions (SLMs) are light-weight sufficient to run on constrained GPU {hardware} but highly effective sufficient to ship context-aware insights. Not like Massive Language Fashions (LLMs), SLMs match throughout the energy and thermal limits of commercial PCs or gateways, making them very best for manufacturing unit environments the place sources are restricted and reliability is paramount. For the aim of this weblog submit, assume a SLM has roughly 3 to fifteen billion parameters.

This weblog focuses on Open Platform Communications Unified Structure (OPC-UA) as a consultant manufacturing protocol. OPC-UA servers present standardized, real-time machine knowledge that SLMs operating on the edge can eat, enabling operators to question tools standing, interpret telemetry, or entry documentation immediately—even with out cloud connectivity.

AWS IoT Greengrass allows this hybrid sample by deploying SLMs along with AWS Lambda features on to OPC-UA gateways. Native inference ensures responsiveness for safety-critical duties, whereas the cloud handles fleet-wide analytics, multi-site optimization, or mannequin retraining beneath stronger safety controls.

This hybrid strategy opens prospects throughout industries. Automakers might run SLMs in car compute models for pure voice instructions and enhanced driving expertise. Vitality suppliers might course of SCADA sensor knowledge domestically in substations. In gaming, SLMs might run on gamers’ gadgets to energy companion AI in video games. Past manufacturing, increased training establishments might use SLMs to supply customized studying, proofreading, analysis help and content material technology.

On this weblog, we are going to take a look at find out how to deploy SLMs to the sting seamlessly and at scale utilizing AWS IoT Greengrass.

The answer makes use of AWS IoT Greengrass to deploy and handle SLMs on edge gadgets, with Strands Brokers offering native agent capabilities. The providers used embrace:

AWS IoT Greengrass: An open-source edge software program and cloud service that permits you to deploy, handle and monitor system software program.
AWS IoT Core: Service enabling you to attach IoT gadgets to AWS cloud.
Amazon Easy Storage Service (S3): A extremely scalable object storage which helps you to to retailer and retrieve any quantity of information.
Strands Brokers: A light-weight Python framework for operating multi-agent techniques utilizing cloud and native inference.

We exhibit the agent capabilities within the code pattern utilizing an industrial automation situation. We offer an OPC-UA simulator which defines a manufacturing unit consisting of an oven and a conveyor belt in addition to upkeep runbooks because the supply of the commercial knowledge. This answer will be prolonged to different use circumstances by utilizing different agentic instruments.The next diagram exhibits the high-level structure:

AWS IoT Greengrass workflow for edge-based language model deployment using Strands Agents and Ollama

Consumer uploads a mannequin file in GPT-Generated Unified Format (GGUF) format to an Amazon S3 bucket which AWS IoT Greengrass gadgets have entry to.
The gadgets within the fleet obtain a file obtain job. S3FileDownloader element processes this job and downloads the mannequin file to the system from the S3 bucket. The S3FileDownloader element can deal with massive file sizes, sometimes wanted for SLM mannequin information that exceed the native Greengrass element artifact dimension limits.
The mannequin file in GGUF format is loaded into Ollama when Strands Brokers element makes the primary name to Ollama. GGUF is a binary file format used for storing LLMs. Ollama is a software program which masses the GGUF mannequin file and runs inference. The mannequin title is specified within the recipe.yaml file of the element.
The consumer sends a question to the native agent by publishing a payload to a tool particular agent subject in AWS IoT MQTT dealer.
After receiving the question, the element leverages the Strands Brokers SDK‘s model-agnostic orchestration capabilities. The Orchestrator Agent perceives the question, causes concerning the required data sources, and acts by calling the suitable specialised brokers (Documentation Agent, OPC-UA Agent, or each) to collect complete knowledge earlier than formulating a response.
If the question is expounded to an data that may be discovered within the documentation, Orchestrator Agent calls Documentation Agent.
Documentation Agent finds the knowledge from the offered paperwork and returns it to Orchestrator Agent.
If the question is expounded to present or historic machine knowledge, Orchestrator Agent will name OPC-UA Agent.
OPC-UA Agent makes a question to the OPC-UA server relying on the consumer question and returns the info from server to Orchestrator Agent.
Orchestrator Agent kinds a response primarily based on the collected data. Strands Brokers element publishes the response to a tool particular agent response subject in AWS IoT MQTT dealer.
The Strands Brokers SDK allows the system to work with domestically deployed basis fashions by way of Ollama on the edge, whereas sustaining the choice to change to cloud-based fashions like these in Amazon Bedrock when connectivity is on the market.
AWS IAM Greengrass service position gives entry to the S3 useful resource bucket to obtain fashions to the system.
AWS IoT certificates hooked up to the IoT factor permits Strands Brokers element to obtain and publish MQTT payloads to AWS IoT Core.
Greengrass element logs the element operation to the native file system. Optionally, AWS CloudWatch logs will be enabled to observe the element operation within the CloudWatch console.

Earlier than beginning this walkthrough, guarantee you could have:

On this submit, you’ll:

Deploy Strands Brokers as an AWS IoT Greengrass element.
Obtain SLMs to edge gadgets.
Check the deployed agent.

Part deployment

First, let’s deploy the StrandsAgentGreengrass element to your edge system.Clone the Strands Brokers repository:

git clone https://github.com/aws-solutions-library-samples/guidance-for-deploying-ai-agents-to-device-fleets-using-aws-iot-greengrass.git
cd guidance-for-deploying-ai-agents-to-device-fleets-using-aws-iot-greengrass

Use Greengrass Improvement Equipment (GDK) to construct and publish the element:

To publish the element, you want to modify the area and bucket values in gdk-config.json file. The advisable artifact bucket worth is greengrass-artifacts. GDK will generate a bucket in greengrass-artifacts-– format, if it doesn’t exist already. You’ll be able to consult with Greengrass Improvement Equipment CLI configuration file documentation for extra data. After modifying the bucket and area values, run the next instructions to construct and publish the element.

gdk element construct
gdk element publish

The element will seem within the AWS IoT Greengrass Elements Console. You’ll be able to consult with Deploy your element documentation to deploy the element to your gadgets.

After the deployment, the element will run on the system. It consists of Strands Brokers, an OPC-UA simulation server and pattern documentation. Strands Brokers makes use of Ollama server because the SLM inference engine. The element has OPC-UA and documentation instruments to retrieve the simulated real-time knowledge and pattern tools manuals for use by the agent.

If you wish to check the element in an Amazon EC2 occasion, you should utilize IoTResources.yaml Amazon CloudFormation template to deploy a GPU occasion with crucial software program put in. This template additionally creates sources for operating Greengrass. After the deployment of the stack, a Greengrass Core system will seem within the AWS IoT Greengrass console. The CloudFormation stack will be discovered beneath supply/cfn folder within the repository. You’ll be able to learn find out how to deploy a CloudFormation stack in Create a stack from the CloudFormation console documentation.

Downloading the mannequin file

The element wants a mannequin file in GGUF format for use by Ollama because the SLM. You have to copy the mannequin file beneath /tmp/vacation spot/ folder within the edge system. The mannequin file title have to be mannequin.gguf, in case you use the default ModelGGUFName parameter within the recipe.yaml file of the element.

For those who don’t have a mannequin file in GGUF format, you may obtain one from Hugging Face, for instance Qwen3-1.7B-GGUF. In a real-world utility, this is usually a fine-tuned mannequin which solves particular enterprise issues in your use case.

(Non-obligatory) Use S3FileDownloader to obtain mannequin information

To handle mannequin distribution to edge gadgets at scale, you should utilize the S3FileDownloader AWS IoT Greengrass element. This element is especially invaluable for deploying massive information in environments with unreliable connectivity, because it helps automated retry and resume capabilities. Because the mannequin information will be massive, and system connectivity just isn’t dependable in lots of IoT use circumstances, this element will help you to deploy fashions to your system fleets reliably.

After deploying S3FileDownloader element to your system, you may publish the next payload to issues//obtain subject by utilizing AWS IoT MQTT Check Consumer. The file shall be downloaded from the Amazon S3 bucket and put into /tmp/vacation spot/ folder within the edge system:

{
    "jobId": "filedownload",
    "s3Bucket": "",
    "key":"mannequin.gguf"
}

For those who used the CloudFormation template offered within the repository, you should utilize the S3 bucket created by this template. Consult with the output of the CloudFormation stack deployment to view the title of the bucket.

Testing the native agent

As soon as the deployment is full and the mannequin is downloaded, we are able to check the agent by way of the AWS IoT Core MQTT Check Consumer. Steps:

Subscribe to issues//# subject to view the response of the agent.
Publish a check question to the enter subject issues//agent/question:

{
    "question": "What's the standing of the conveyor belt?"
}

You need to obtain responses on a number of subjects:
1. Ultimate response subject (issues//agent/response) which comprises the ultimate response of the Orchestrator Agent:

{
    "question": "What's the standing of the oven?",
    "response": "The oven is presently working at 802.2°F (barely above the setpoint of 800.0°F), with heating lively...",
    "timestamp": 1757677413.6358254,
    "standing": "success"
}

1. Sub-agent responses (issues//agent/subagent) which comprises the response from middleman brokers similar to OPC-UA Agent and Documentation Agent:

{
    "agent": "opc manufacturing unit",
    "question": "Get present oven standing",
    "response": "**Oven Standing Report:**n- **Present Temperature:** 802.2°F...",
    "timestamp": 1757677323.443954
}

The agent will course of your question utilizing the native SLM and supply responses primarily based on each the OPC-UA simulated knowledge and the tools documentation saved domestically.For demonstration functions, we use the AWS IoT Core MQTT check shopper as a simple interface to speak with the native system. In manufacturing, Strands Brokers can run absolutely on the system itself, eliminating the necessity for any cloud interplay.

Monitoring the element

To watch the element’s operation, you may join remotely to your AWS IoT Greengrass system and examine the element logs:

sudo tail -f /greengrass/v2/logs/com.strands.agent.greengrass.log

It will present you the real-time operation of the agent, together with mannequin loading, question processing, and response technology. You’ll be able to be taught extra about Greengrass logging system in Monitor AWS IoT Greengrass logs documentation.

Go to AWS IoT Core Greengrass console to delete the sources created on this submit:

Go to Deployments, select the deployment that you just used for deploying the element, then revise the deployment by eradicating the Strands Brokers element.
You probably have deployed S3FileDownloader element, you may take away it from the deployment as defined within the earlier step.
Go to Elements, select the Strands Brokers element and select ‘Delete model’ to delete the element.
You probably have created S3FileDownloader element, you may delete it as defined within the earlier step.
For those who deployed the CloudFormation stack to run the demo in an EC2 occasion, delete the stack from AWS CloudFormation console. Observe that the EC2 occasion will incur hourly costs till it’s stopped or terminated.
For those who don’t want the Greengrass core system, you may delete it from Core gadgets part of Greengrass console.
After deleting Greengrass Core system, delete the IoT certificates hooked up to the core factor. To search out the factor certificates, go to AWS IoT Issues console, select the IoT factor created on this information, view the Certificates tab, select the hooked up certificates, select Actions, then select Deactivate and Delete.

On this submit, we confirmed find out how to run a SLM domestically utilizing Ollama built-in by way of Strands Brokers on AWS IoT Greengrass. This workflow demonstrated how light-weight AI fashions will be deployed and managed on constrained {hardware} whereas benefiting from cloud integration for scale and monitoring. Utilizing OPC-UA as our manufacturing instance, we highlighted how SLMs on the edge allow operators to question tools standing, interpret telemetry, and entry documentation in actual time—even with restricted connectivity. The hybrid mannequin ensures vital selections occur domestically, whereas complicated analytics and retraining are dealt with securely within the cloud.This structure will be prolonged to create a hybrid cloud-edge AI agent system, the place edge AI brokers (utilizing AWS IoT Greengrass) seamlessly combine with cloud-based brokers (utilizing Amazon Bedrock). This allows distributed collaboration: edge brokers handle real-time, low-latency processing and instant actions, whereas cloud brokers deal with complicated reasoning, knowledge analytics, mannequin refinement, and orchestration.

Concerning the authors

Ozan Cihangir is a Senior Prototyping Engineer at AWS Specialists & Companions Group. He helps clients to construct progressive options for his or her rising expertise initiatives within the cloud.

Luis Orus is a senior member of the AWS Specialists & Companions Group, the place he has held a number of roles – from constructing high-performing groups at international scale to serving to clients innovate and experiment shortly by way of prototyping.

Amir Majlesi leads the EMEA prototyping crew inside AWS Specialists & Companions Group. He has intensive expertise in serving to clients speed up cloud adoption, expedite their path to manufacturing and foster a tradition of innovation. By means of fast prototyping methodologies, Amir allows buyer groups to construct cloud native purposes, with a deal with rising applied sciences similar to Generative & Agentic AI, Superior Analytics, Serverless and IoT.

Jaime Stewart centered his Options Architect Internship inside AWS Specialists & Companions Group round Edge Inference with SLMs. Jaime presently pursues a MSc in Synthetic Intelligence.

Previous articleLimX Dynamics unveils TRON 2 shape-shifting limbed robotic

Next articleRobotic Speak Episode 138 – Robots within the surroundings, with Stefano Mintchev

Deploying Small Language Fashions at Scale with AWS IoT Greengrass and Strands Brokers

Part deployment

Downloading the mannequin file

(Non-obligatory) Use S3FileDownloader to obtain mannequin information

Testing the native agent

Monitoring the element

Concerning the authors

Make This Dongle to Play VRC Professional with a Actual RC Transmitter

Recreation Too Loud and Browser Too Quiet? Construct This Desktop Quantity Mixer

This Robotic Seems to be Like a Cyberpunk Caterpillar

LEAVE A REPLY Cancel reply

Most Popular

High 10 MCP Servers for AI Builders in 2026

Chinese language County Expands Drone Community To Cowl 964 Miles

Vodafone Thought increase broadband infra throughout India

Gentle guided system delivers uniform nanoliter droplets on chip

Recent Comments

ABOUT US

POPULAR POSTS

High 10 MCP Servers for AI Builders in 2026

Chinese language County Expands Drone Community To Cowl 964 Miles

Vodafone Thought increase broadband infra throughout India

POPULAR CATEGORY