In e-commerce, delivering fast, relevant search results helps customers discover products quickly and accurately, improving satisfaction and increasing sales. OpenSearch is a distributed search engine that provides advanced search capabilities, including full-text and faceted search, customizable analyzers and tokenizers, and autocomplete, to help customers quickly find the products they want. It scales to handle millions of products, catalogs, and traffic surges. Amazon OpenSearch Service is a managed service that lets users build search workloads that balance search quality, performance at scale, and cost. Designing and sizing an Amazon OpenSearch Service cluster appropriately is required to meet these demands.
While general sizing guidelines for OpenSearch Service domains are covered in detail in the OpenSearch Service documentation, in this post we specifically focus on T-shirt sizing OpenSearch Service domains for e-commerce search workloads. T-shirt sizing simplifies complex capacity planning by categorizing workloads into sizes like XS, S, M, L, and XL based on key workload parameters such as data volume and query concurrency. For e-commerce search, where data growth is moderate and read-heavy queries predominate, this approach provides a flexible, scalable way to allocate resources without overprovisioning or underestimating needs.
How OpenSearch Service stores indexes and performs queries
E-commerce search platforms handle vast amounts of data, yet daily data ingestion is typically relatively small and incremental, reflecting catalog changes, price updates, inventory status, and user actions like clicks and reviews. Efficiently managing this data and organizing it per OpenSearch Service best practices is crucial to achieving optimal performance. The workload is read-heavy, consisting of user queries with advanced filtering and faceting, especially during sales events or seasonal spikes that require elasticity in compute and storage resources.
You ingest product and catalog updates (inventory, listings, pricing) into OpenSearch using bulk APIs or real-time streaming. You index data into logical indexes. How you create and organize indexes in e-commerce has a significant impact on search, scalability, and flexibility. The approach depends on the size, diversity, and operational needs of the catalog. Small to medium-sized e-commerce platforms commonly use a single, comprehensive product index that stores all product information along with the product category. Additional indexes may exist for orders, users, reviews, and promotions, depending on search requirements and data separation needs. Large, diverse catalogs may split products into category-specific indexes for tailored mappings and scaling. You split each index into primary shards, each storing a portion of the documents. To ensure high availability and increase query throughput, you configure each primary shard with at least one replica shard stored on different data nodes.
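As a minimal sketch of this setup, the following snippet uses the opensearch-py client to create a single comprehensive product index with one replica per primary shard, then applies an incremental catalog update through the _bulk API. The endpoint, credentials, index name, and field mappings are illustrative assumptions, not values from this post.

```python
# A minimal sketch using the opensearch-py client; the endpoint, credentials,
# index name, and field mappings are illustrative assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-mydomain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),  # use IAM/SigV4 signing in production
    use_ssl=True,
)

# A single comprehensive product index: two primary shards, one replica each.
client.indices.create(
    index="products",
    body={
        "settings": {"number_of_shards": 2, "number_of_replicas": 1},
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "category": {"type": "keyword"},  # keyword type enables faceting
                "price": {"type": "float"},
                "in_stock": {"type": "boolean"},
            }
        },
    },
)

# Incremental catalog updates through the _bulk API (action/document pairs).
client.bulk(body=[
    {"index": {"_index": "products", "_id": "sku-123"}},
    {"name": "Blue running shoes", "category": "footwear",
     "price": 79.99, "in_stock": True},
])
```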

Diagram 1. How primary and replica shards are distributed among nodes
This diagram shows two indexes (Products and Reviews), each split into two primary shards with one replica. OpenSearch distributes these shards across cluster nodes to ensure that primary and replica shards for the same data don't reside on the same node. OpenSearch runs search requests using a scatter-gather mechanism. When an application submits a request, any node in the cluster can receive it. This receiving node becomes the coordinating node for that specific query. The coordinating node determines which indexes and shards can serve the query. It forwards the query to either primary or replica shards, orchestrates the different phases of the search operation, and returns the response. This process ensures efficient distribution and execution of search requests across the OpenSearch cluster.

Diagram 2. Tracing a search query: "blue running shoes"
This diagram walks through how a search query, for example "blue running shoes", flows through your OpenSearch Service domain (a query sketch follows the steps below).
- Request: The application sends the search for "blue running shoes" to the domain. One data node acts as the coordinating node.
- Scatter: The coordinator broadcasts the query to either the primary or replica shard for each of the shards in the 'Products' index (Nodes 1, 2, and 3 in this case).
- Gather: Each data node searches its local shard(s) for "blue running shoes" and returns its own top results (for example, Node 1 returns its best matches from P0).
- Final results: The coordinator merges these partial lists, sorts them into a single definitive list of the most relevant shoes, and sends the result back to the app.
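The sketch below shows what such a query might look like using the opensearch-py client from the earlier snippet; the "products" index, its fields, and the facet aggregation are carried-over assumptions rather than the exact query behind the diagram.

```python
# Sketch of the walkthrough query against the assumed "products" index,
# combining full-text matching, an in-stock filter, and a category facet.
response = client.search(
    index="products",
    body={
        "query": {
            "bool": {
                "must": {"match": {"name": "blue running shoes"}},
                "filter": [{"term": {"in_stock": True}}],
            }
        },
        # Terms aggregation produces the category facet counts for the sidebar.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
        "size": 10,
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["name"])
```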
Understanding T-shirt sizing for an e-commerce OpenSearch Service cluster
Storage planning
Storage impacts both performance and cost. OpenSearch Service offers two main storage options based on query latency requirements and data persistence needs. Selecting the appropriate storage type for a managed OpenSearch Service domain improves performance and optimizes the cost of the domain. You can choose between Amazon Elastic Block Store (EBS) volumes and instance storage volumes (local storage) for your data nodes.
Amazon EBS gp3 volumes offer high throughput, while local NVMe SSD volumes, for example on the r8gd, i3, or i4i instance families, offer low latency, fast indexing performance, and high-speed storage, making them ideal for scenarios where real-time data updates and high search throughput are critical for search operations. For search workloads that require a balance between performance and cost, instances backed with EBS gp3 SSD volumes provide a reliable option. This SSD storage delivers input/output operations per second (IOPS) well suited for general-purpose search workloads, and it lets you provision additional IOPS and storage as needed.
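As one illustration of choosing gp3 EBS storage at domain creation, here is a hedged boto3 sketch; the domain name, engine version, instance choices, and volume numbers are placeholders, and the Iops and Throughput fields show where extra gp3 capacity could be provisioned.

```python
# Hedged boto3 sketch: provisioning a domain with gp3 EBS volumes. Domain name,
# engine version, instance choices, and volume numbers are placeholders.
import boto3

opensearch = boto3.client("opensearch")
opensearch.create_domain(
    DomainName="ecommerce-search",
    EngineVersion="OpenSearch_2.17",
    ClusterConfig={"InstanceType": "r8g.xlarge.search", "InstanceCount": 2},
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": 100,   # GiB per data node
        "Iops": 3000,        # gp3 baseline; provision more if the workload needs it
        "Throughput": 125,   # MiB/s baseline, also adjustable
    },
)
```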
When sizing an OpenSearch cluster, start by estimating total storage needs based on catalog size and anticipated growth. For example, if the catalog contains 500,000 stock keeping units (SKUs) averaging 50 KB each, the raw data sums to about 25 GB. The size of the raw data, however, is only one aspect of the storage requirements. Also consider the replica count, indexing overhead (10%), Linux reserved space (5%), and OpenSearch Service reserved space (20%, up to 20 GB) per instance when calculating the required storage.
In summary, if you have 25 GB of data at any given time and want one replica, the minimum storage requirement is closer to 25 * 2 * 1.1 / 0.95 / 0.8 = 72.5 GB. This calculation can be generalized as follows:
Storage requirement = Source data * (1 + number of replicas) * 1.45
This helps ensure disk space headroom on all data nodes, preventing shard failures and maintaining search performance. Provisioning storage slightly beyond this minimum is recommended to accommodate future growth and cluster rebalancing.
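This arithmetic is easy to codify. The small helper below is a sketch of the calculation above (10% indexing overhead, 5% Linux reserve, 20% service reserve, which together work out to roughly the 1.45 multiplier):

```python
# Sketch of the storage estimate: replicas, 10% indexing overhead,
# 5% Linux reserved space, and 20% service reserved space (about 1.45x total).
def min_storage_gb(raw_data_gb: float, replicas: int = 1) -> float:
    return raw_data_gb * (1 + replicas) * 1.1 / 0.95 / 0.8

# The 500,000-SKU example: 25 GB of raw data with one replica.
print(round(min_storage_gb(25), 1))  # ~72.4 GB, matching the ~72.5 GB above
```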
Data nodes:
For search workloads, compute-optimized instances (C8g) are well suited for central processing unit (CPU)-intensive operations like nested queries and joins, while general-purpose instances like M8g offer a better balance between CPU and memory. Memory-optimized instances (R8g, R8gd) are recommended for memory-intensive operations like k-NN search, where a larger memory footprint is required. In large, complex deployments, compute-optimized instances like C8g or general-purpose M8g handle CPU-intensive tasks, providing efficient query processing and balanced resource allocation. Their balance between CPU and memory makes them ideal for managing complex search operations in large-scale data processing. For extremely large search workloads (tens of TB) where latency is not a primary concern, consider using the new Amazon OpenSearch Service writable warm storage, which supports write operations on warm indexes.
| Instance class | Best for users who… | Examples (AWS) | Characteristics |
| General purpose | have moderate search traffic and want a well-balanced, entry-level setup | M family (M8g) | Balanced CPU and memory, EBS storage. Good starting point for small to medium-sized catalogs. |
| Compute optimized | have high queries per second (QPS) search traffic, or queries involving scoring scripts or complex filtering | C family (C8g) | High CPU-to-memory ratio. Ideal for CPU-bound workloads like many concurrent queries. |
| Memory optimized | work with large catalogs, need fast aggregations, or cache a lot in memory | R family (R8g) | More memory per core. Holds large indexes in memory to speed up searches and aggregations. |
| Storage optimized | update inventory frequently or have so much data that disk access slows things down | I family (I3, I4g), Im4gn | NVMe SSD and local SSD storage. Best for I/O-heavy operations like constant indexing or large product catalogs hitting disk frequently. |
Cluster manager nodes:
For production workloads, we strongly recommend adding dedicated cluster manager nodes to increase cluster stability and offload cluster management tasks from the data nodes. To choose the right instance type for your cluster manager nodes, review the service recommendations based on the OpenSearch version and the number of shards in the cluster.
Sharding strategy
Once storage requirements are understood, you can look at the indexing strategy. You create shards in OpenSearch Service to distribute an index evenly across the nodes in a cluster. AWS recommends a single product index with category facets for simplicity, or partitioning indexes by category for large or distributed catalogs. The size and number of shards per index play a crucial role in OpenSearch Service performance and scalability. The right configuration ensures balanced data distribution, avoids hot spotting, and minimizes coordination overhead on nodes for use cases that prioritize query speed and data freshness.
For read-heavy workloads like e-commerce, where search latency is the key performance goal, keep shard sizes between 10-30 GB. To achieve this, calculate the number of primary shards by dividing your total index size by your target shard size. For example, if you have a 300 GB index and want 20 GB shards, configure 15 primary shards (300 GB ÷ 20 GB = 15 shards). Monitor shard sizes using the _cat/shards API and adjust the shard count during reindexing if shards grow beyond the optimal range.
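A short sketch of this sizing step, reusing the hypothetical opensearch-py client from earlier to spot-check live shard sizes with the _cat/shards API:

```python
import math

# Derive the primary shard count from total index size and target shard size.
index_size_gb, target_shard_gb = 300, 20
primary_shards = math.ceil(index_size_gb / target_shard_gb)
print(primary_shards)  # 15, as in the example above

# Spot-check live shard sizes with the _cat/shards API
# (reusing the hypothetical opensearch-py client from earlier).
for shard in client.cat.shards(index="products", format="json"):
    print(shard["shard"], shard["prirep"], shard["store"])
```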
Add replica shards to improve search query throughput and fault tolerance. The minimum recommendation is one replica; you can add more replicas for high query throughput requirements. In OpenSearch Service, a shard processes operations like querying single-threaded, meaning one thread handles a shard's tasks at a time. Replica shards can serve read requests, distributing them across multiple threads and nodes and enabling parallel processing.
T-shirt sizing for an e-commerce workload
In an OpenSearch T-shirt sizing table, each size label (XSmall, Small, Medium, Large, XLarge) represents a generalized cluster scale category that can help teams translate technical requirements into simple, actionable capacity planning. Each size lets architects quickly align their catalog size, storage requirements, shard planning, CPU, and AWS instance choices with the cluster resources provisioned, making it easier to scale infrastructure as the business grows.
By referring to this table, teams can select the category closest to their current workload and use the T-shirt size as a starting point while continuing to refine the configuration as they monitor and optimize real-world performance. For example, XSmall suits small catalogs with hundreds of thousands of products and minimal search traffic. Small clusters are designed for growing catalogs with millions of SKUs, supporting moderate query volumes and scaling up during busy periods. Medium corresponds to mid-size e-commerce operations handling millions of products and higher search demands, while Large fits large online businesses with tens of millions of SKUs, requiring robust infrastructure for fast, reliable search. XLarge is intended for major marketplaces or global platforms with twenty million or more SKUs, enormous data storage needs, and massive concurrent usage.
| T-shirt size | Number of products | Catalog size | Storage needed | Primary shard count | Active shard count | Data node instance type | Cluster manager node instance type |
| XSmall | 500K | 50 GB | 145 GB | 2 | 4 | [2] r8g.xlarge | [3] m8g.large |
| Small | 2M | 200 GB | 580 GB | 8 | 16 | [2] c8g.4xlarge | [3] m8g.large |
| Medium | 5M | 500 GB | 1.45 TB | 20 | 40 | [2] c8g.8xlarge | [3] m8g.large |
| Large | 10M | 1 TB | 2.9 TB | 40 | 80 | [4] c8g.8xlarge | [3] m8g.large |
| XLarge | 20M | 2 TB | 5.8 TB | 80 | 160 | [4] c8g.16xlarge | [3] m8g.large |
- T-shirt size: Represents the scale of the cluster, ranging from XS up to XL for high-volume workloads.
- Number of products: The estimated count of SKUs in the e-commerce catalog, which drives the data volume.
- Catalog size: The total estimated disk size of all indexed product data, based on typical SKU document size.
- Storage needed: The actual storage required after accounting for replicas and overhead, ensuring enough room for safe and efficient operation.
- Primary shard count: The number of main index shards chosen to balance parallel processing and resource management.
- Active shard count: The total number of live shards (primaries with one replica), indicating how many shards must be distributed for availability and performance.
- Data node instance type: The recommended instance type to use for data nodes, chosen for memory, CPU, and disk throughput.
- Cluster manager node instance type: The recommended instance type for lightweight, dedicated cluster manager nodes, which maintain cluster stability and coordination.
Scaling strategies for e-commerce workloads
E-commerce platforms frequently face challenges with unpredictable traffic surges and growing product catalogs. To address these challenges, OpenSearch Service automatically publishes critical performance metrics to Amazon CloudWatch, enabling users to detect when individual nodes reach resource limits. Signals to watch include CPU utilization exceeding 80%, JVM memory pressure above 75%, frequent garbage collection pauses, and thread pool rejections.
OpenSearch Service also provides robust scaling options that maintain consistent search performance across varying workload demands. Use the vertical scaling strategy to upgrade instance types from smaller to larger configurations, such as m6g.large to m6g.2xlarge. While vertical scaling triggers a blue/green deployment, scheduling these changes during off-peak hours minimizes the impact on operations.
Use the horizontal scaling strategy to add more data nodes that distribute indexing and search operations. This approach proves particularly effective when scaling for traffic growth or a growing dataset. In domains with dedicated cluster manager nodes, adding data nodes proceeds smoothly without triggering a blue/green deployment. CloudWatch metrics guide horizontal scaling decisions through thread pool rejections across nodes, indexing latency, and cluster-wide load patterns. Although the process requires shard rebalancing and may temporarily affect performance, it effectively distributes the workload across the cluster.
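A hedged boto3 sketch of scaling out data nodes on an existing domain follows; the domain name, instance types, and node counts are placeholders rather than recommendations.

```python
# Hedged boto3 sketch of horizontal scaling on an existing domain; the domain
# name, instance types, and node counts are placeholders, not recommendations.
import boto3

opensearch = boto3.client("opensearch")
opensearch.update_domain_config(
    DomainName="ecommerce-search",
    ClusterConfig={
        "InstanceType": "c8g.8xlarge.search",
        "InstanceCount": 6,               # scaled out from 4 data nodes
        "DedicatedMasterEnabled": True,   # dedicated cluster manager nodes
        "DedicatedMasterType": "m8g.large.search",
        "DedicatedMasterCount": 3,
    },
)
```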
Temporary replicas provide a flexible way to manage high-traffic periods. By increasing replica shards through the _settings API, read throughput can be boosted when needed. This approach offers a dynamic response to changing traffic patterns without requiring more substantial infrastructure changes.
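For example, here is a sketch of temporarily raising and later lowering the replica count through the _settings API, again using the hypothetical opensearch-py client and "products" index from earlier:

```python
# Sketch: raise replicas ahead of a peak event, then dial back afterwards
# (hypothetical opensearch-py client and "products" index from earlier).
client.indices.put_settings(
    index="products",
    body={"index": {"number_of_replicas": 2}},  # boost read throughput
)
# ...after the traffic spike subsides:
client.indices.put_settings(
    index="products",
    body={"index": {"number_of_replicas": 1}},
)
```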
For more information on scaling an OpenSearch Service domain, refer to How do I scale up or scale out an OpenSearch Service domain?
Monitoring and operational best practices
Monitoring key CloudWatch performance metrics is essential to keeping an OpenSearch Service domain well optimized. One key factor is keeping CPU utilization on data nodes under 80% to prevent query slowdowns. Another is keeping JVM memory pressure below 75% on data nodes to prevent garbage collection (GC) pauses that can affect search response time. OpenSearch Service publishes these metrics to CloudWatch at 1-minute intervals, and users can create alarms on these metrics to receive alerts for production workloads. Refer to the recommended CloudWatch alarms for OpenSearch Service.
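As one hedged example, the boto3 sketch below creates an alarm on JVM memory pressure at the 75% threshold mentioned above; the domain name, account ID, and SNS topic ARN are placeholders.

```python
# Hedged boto3 sketch of an alarm on JVM memory pressure; domain name,
# account ID, and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="opensearch-jvm-memory-pressure",
    Namespace="AWS/ES",  # OpenSearch Service metrics publish under AWS/ES
    MetricName="JVMMemoryPressure",
    Dimensions=[
        {"Name": "DomainName", "Value": "ecommerce-search"},
        {"Name": "ClientId", "Value": "123456789012"},
    ],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=3,
    Threshold=75.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```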
Monitor p95 query latency to identify slow queries and optimize performance. Another important indicator is thread pool rejections: a high number of thread pool rejections can result in failed search requests, affecting user experience. By continuously monitoring these CloudWatch metrics, users can proactively scale resources, optimize queries, and prevent performance bottlenecks.
Conclusion
In this post, we showed how to right-size Amazon OpenSearch Service domains for e-commerce workloads using a T-shirt sizing approach. We explored key factors including storage optimization, sharding strategies, scaling techniques, and essential Amazon CloudWatch metrics for monitoring performance.
To build a performant search experience, start with a smaller deployment and iterate as your business scales. Get started with these five steps:
- Evaluate your workload requirements in terms of storage, search throughput, and search performance
- Select your initial T-shirt size based on your product catalog size and traffic patterns
- Deploy the recommended sharding strategy for your catalog scale
- Load test your cluster using OpenSearch Benchmark and iterate until performance requirements are met
- Configure Amazon CloudWatch monitoring and alarms, then continue to monitor your production domain
About the authors

