New Amazon Bedrock service tiers help you match AI workload performance with cost

November 19, 2025

Today, Amazon Bedrock introduces new service tiers that give you more control over your AI workload costs while maintaining the performance levels your applications need.

I work with customers building AI applications, and I’ve seen firsthand how different workloads require different performance and cost trade-offs. Many organizations running AI workloads face challenges balancing performance requirements with cost optimization. Some applications need rapid response times for real-time interactions, while others can process data more gradually. With these challenges in mind, today we’re announcing additional pricing options that give you more flexibility in matching your workload requirements with cost optimization.

Amazon Bedrock now offers three service tiers for workloads: Priority, Standard, and Flex. Each tier is designed to match specific workload requirements. Applications have varying response time requirements based on the use case. Some applications, such as financial trading systems, demand the fastest response times; others need rapid response times to support business processes like content generation; and applications such as content summarization can process data more gradually.

The Priority tier processes your requests ahead of other tiers, providing preferential compute allocation for mission-critical applications like customer-facing chat-based assistants and real-time language translation services, though at a premium price point. The Standard tier provides consistent performance at regular rates for everyday AI tasks, ideal for content generation, text analysis, and routine document processing. For workloads that can handle longer latency, the Flex tier offers a more cost-effective option with lower pricing, which is well suited for model evaluations, content summarization, and multistep analysis and agentic workflows.

You can now optimize your spending by matching each workload to the most appropriate tier. For example, if you’re running a customer service chat-based assistant that needs quick responses, you can use the Priority tier to get the fastest processing times. For content summarization tasks that can tolerate longer processing times, you can use the Flex tier to reduce costs while maintaining reliable performance. For most models that support the Priority tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to the Standard tier.
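
To make the throughput figure concrete, here is a rough back-of-the-envelope sketch. It assumes the 25% figure means 25% more output tokens per second, and it treats "up to" as a best case. The baseline rate of 100 tokens per second and the 1,000-token response are made-up numbers for illustration, not measured values.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating a response, ignoring time to first token."""
    return output_tokens / tokens_per_second


BASELINE_OTPS = 100.0    # hypothetical baseline rate, not a measured value
BEST_CASE_UPLIFT = 0.25  # the "up to 25%" figure, read as a best case
RESPONSE_TOKENS = 1000   # hypothetical response length

baseline_s = generation_seconds(RESPONSE_TOKENS, BASELINE_OTPS)
faster_s = generation_seconds(RESPONSE_TOKENS, BASELINE_OTPS * (1 + BEST_CASE_UPLIFT))

print(f"Baseline: {baseline_s:.1f} s, best case: {faster_s:.1f} s")
```

Under these assumptions, a 25% higher token rate shortens generation time by 20% rather than 25%, because generation time is inversely proportional to the token rate.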

Check the Amazon Bedrock documentation for an up-to-date list of models supported for each service tier.

Choosing the right tier for your workload

Here is a mental model to help you choose the right tier for your workload.

Category	Recommended service tier	Description
Mission-critical	Priority	Requests are handled ahead of other tiers. Lower latency responses for user-facing apps (for example, customer service chat assistants, real-time language translation, interactive AI assistants)
Business-standard	Standard	Responsive performance for important workloads (for example, content generation, text analysis, routine document processing)
Business-noncritical	Flex	Cost-efficient for less urgent workloads (for example, model evaluations, content summarization, multistep agentic workflows)
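
The table can be captured as a small lookup, sketched below. The category keys and the `recommended_tier` helper are names chosen for this sketch, not part of any AWS SDK. The tier values are the three strings listed in this post's API example; treating "default" as the value for the regular-rate tier is an assumption based on that list.

```python
# Workload categories from the table, mapped to the service_tier strings
# listed in the post's API example: "priority", "default", "flex".
# Assumption: "default" selects the regular-rate tier.
TIER_BY_CATEGORY = {
    "mission-critical": "priority",
    "business-standard": "default",
    "business-noncritical": "flex",
}


def recommended_tier(category: str) -> str:
    """Return the service_tier value for a workload category.

    Unknown categories fall back to "default"; that fallback is a
    choice made for this sketch, not documented behavior.
    """
    return TIER_BY_CATEGORY.get(category.strip().lower(), "default")


print(recommended_tier("Mission-critical"))
print(recommended_tier("Business-noncritical"))
```

Keeping the mapping in one place means a workload can be moved to another tier by changing a single entry rather than every call site.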

Start by reviewing your current usage patterns with application owners. Next, identify which workloads need immediate responses and which ones can process data more gradually. You can then begin routing a small portion of your traffic through different tiers to test performance and cost benefits.
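
One way to route a small portion of traffic is a deterministic percentage split, sketched below. This is an illustrative approach rather than a Bedrock feature: the `tier_for_request` helper, the request ID format, and the 5% default are all assumptions. The returned string is meant to be passed as the `service_tier` value on each call.

```python
import hashlib


def tier_for_request(request_id: str, trial_tier: str = "flex",
                     baseline_tier: str = "default",
                     trial_percent: int = 5) -> str:
    """Deterministically send about trial_percent% of requests to trial_tier."""
    if not 0 <= trial_percent <= 100:
        raise ValueError("trial_percent must be between 0 and 100")
    # Hash the ID so the same request always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return trial_tier if bucket < trial_percent else baseline_tier


# Check the observed split on synthetic request IDs.
sample = [tier_for_request(f"req-{i}") for i in range(10_000)]
print(f"{sample.count('flex') / len(sample):.1%} of requests routed to flex")
```

Hashing the request ID instead of drawing a random number keeps the assignment stable across retries, which makes it easier to compare latency and cost between the two groups.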

The AWS Pricing Calculator helps you estimate costs for different service tiers by entering your expected workload for each tier. You can estimate your budget based on your specific usage patterns.

To monitor your usage and costs, you can use the AWS Service Quotas console or turn on model invocation logging in Amazon Bedrock and observe the metrics with Amazon CloudWatch. These tools provide visibility into your token usage and help you track performance across different tiers.

You can start using the new service tiers today. You choose the tier on a per-API call basis. Here is an example using the ChatCompletions OpenAI API, but you can pass the same service_tier parameter in the body of InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream APIs (for supported models):

from openai import OpenAI

# Point the OpenAI SDK at the Amazon Bedrock endpoint.
client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

completion = client.chat.completions.create(
    model="openai.gpt-oss-20b-1:0",
    messages=[
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    service_tier="priority"  # options: "priority" | "default" | "flex"
)

print(completion.choices[0].message)

To learn more, check out the Amazon Bedrock User Guide or contact your AWS account team for detailed planning assistance.

I’m looking forward to hearing how you use these new pricing options to optimize your AI workloads. Share your experience with me online on social networks or connect with me at AWS events.

— seb