
Bridging data silos: cross-bounded-context querying with Vanguard's Operational Read-only Data Store (ORDS) using Amazon Redshift


Are you modernizing your legacy batch processing systems? At Vanguard, we faced significant challenges with our legacy mainframe system that limited our ability to deliver modern, personalized customer experiences. Our centralized database architecture created performance bottlenecks and made it difficult to scale services independently for our millions of personal and institutional investors.

In this post, we show you how we modernized our data architecture using Amazon Redshift as our Operational Read-only Data Store (ORDS). You'll learn how we transitioned to a cloud-native, domain-driven architecture while preserving critical batch processing capabilities. We show how this solution enabled us to create logically isolated data domains while maintaining cross-domain analytics capabilities, all while adhering to the principles of bounded contexts and distributed data ownership.

Background and challenges

As financial needs continue to evolve, Vanguard is committed to delivering adaptable, best-in-class experiences that foster long-lasting client relationships. This commitment spans from enhancing the personal investor journey to delivering personalized mobile dashboards and connecting institutional clients with advanced advice offerings.

To elevate customer experience and drive digital transformation, Vanguard has embraced domain-driven design principles. This approach focuses on creating autonomous teams, fostering faster innovation, and building a data mesh architecture. Central to this transformation is the Personal Investor team's mainframe modernization effort, transitioning from a legacy system to a cloud-based, distributed data architecture organized around bounded contexts: distinct business domains that manage their own data. As part of this shift, each microservice now manages its own local data store using Amazon Aurora PostgreSQL-Compatible Edition or Amazon DynamoDB. This approach enables domain-level data ownership and operational autonomy.

Vanguard's existing mainframe system, built on a centralized Db2 database, enables cross-domain data access and integration but also introduces several architectural challenges. Although batch processes can join data across multiple bounded contexts, using SQL joins and database operations to integrate information from various sources, this tight coupling creates significant risks and operational issues.

Challenges with the centralized database approach include:

  • Resource contention: Processes from one domain can negatively impact other domains due to shared compute resources, leading to performance degradation across the system.
  • Lack of domain isolation: Changes in one bounded context can have unintended ripple effects across other domains, increasing the risk of system-wide failures.
  • Scalability constraints: The centralized architecture creates bottlenecks as load increases, making it difficult to scale individual components independently.
  • High coupling: Tight integration between domains makes it challenging to modify or upgrade individual components without affecting the entire system.
  • Limited fault tolerance: Issues in one domain can cascade across the entire system due to shared infrastructure and data dependencies.

To address these architectural challenges, we chose to use Amazon Redshift as our Operational Read-only Data Store (ORDS). The Amazon Redshift architecture separates compute and storage, which enables us to create multi-cluster architectures with a separate endpoint for each domain and independent scaling of compute and storage resources. Our solution uses the data sharing capabilities of Amazon Redshift to create logically isolated data domains while retaining the ability to perform cross-domain analytics when needed.

Key benefits of the Amazon Redshift solution include:

  1. Resource isolation: Each domain can be assigned dedicated Amazon Redshift compute resources, so one domain's workload doesn't impact others.
  2. Independent scaling: Domains can scale their compute resources independently based on their specific needs.
  3. Controlled data sharing: Amazon Redshift's data sharing feature enables secure and governed cross-domain data access without tight coupling, maintaining clear domain boundaries.

Let's explore the different alternatives we evaluated before selecting ORDS with Amazon Redshift as our optimal approach.

Alternatives explored

We adopted ORDS as our optimal solution after conducting a comprehensive evaluation of available options. This section outlines our decision-making process and examines the alternatives we considered during our assessment.

Operational Read-only Data Store (ORDS):

In our evaluation, we found that using Amazon Redshift for ORDS provides a robust solution for handling data across different business areas. It excels at managing large volumes of data from multiple sources, providing fast access to replicated data for batch processes that require cross-bounded-context data, and combining information using familiar SQL queries. The solution particularly shines at handling high-volume reads from our data sources.

Advantages:

  • Works well with relational data
  • Excels at real-time access to data from multiple business areas
  • Improves performance of batch jobs dealing with large data volumes
  • Stores data in a familiar table format, accessible via SQL
  • Enforces clear data ownership, with each business area responsible for its data
  • Provides a scalable architecture that reduces the risk of a single point of failure

Disadvantages:

  • Requires additional data validation during loading processes to maintain data uniqueness
  • Needs careful management of primary key constraints, since Amazon Redshift optimizes for analytical performance
  • May require additional monitoring and controls compared to traditional RDBMS systems
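The data-validation point stems from the fact that Amazon Redshift records primary key and unique constraints but does not enforce them, so duplicate rows can survive a load. A minimal sketch of one common deduplication pattern during publishing, with the table and column names invented for illustration:

```sql
-- Hypothetical staging table populated by the ingestion pipeline.
-- Redshift treats any PRIMARY KEY here as informational only,
-- so duplicates must be removed explicitly.
CREATE TABLE account_stage (
    account_id  BIGINT,
    status      VARCHAR(16),
    updated_at  TIMESTAMP
);

-- Publish only the most recent row per account_id from staging
-- into the consumer-facing table.
INSERT INTO account
SELECT account_id, status, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY account_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM account_stage
)
WHERE rn = 1;
```

Running a pass like this in the load path is one way to provide the uniqueness guarantees that a traditional RDBMS would otherwise enforce for you.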

The following are the other alternatives we evaluated:

Bulk APIs:

We found that Bulk APIs provide an approach for handling large volumes of data.

Advantages:

  • Near-real-time access to bulk data through a single request
  • Autonomous teams have control over access patterns
  • Efficient batch processing of large datasets with multi-record retrieval

Disadvantages:

  • Each product team needs to create its own bulk API
  • If you need data from different areas, you have to combine it yourself
  • The team providing the API must make sure it can handle a large volume of requests
  • You might need to use multiple APIs to get all the data you want
  • If you're retrieving data in chunks (pagination), you might miss some records if the data changes between requests

While Bulk APIs offer powerful capabilities, we found they require substantial team coordination and careful implementation to be effective.

Data Lake:

Our evaluation showed that data lakes can effectively combine information from different parts of our business. They excel at processing large amounts of data at once, providing search capabilities through unified data formats, and managing large volumes of diverse and complex data.

Advantages:

  • Handles massive data volumes efficiently
  • Supports multiple data formats and structures
  • Enables complex analytics and data science workloads
  • Provides cost-effective storage options
  • Accommodates both structured and unstructured data

Disadvantages:

  • May not provide real-time, high-speed data access
  • Requires additional effort with complex data structures, especially those with many interconnected parts
  • Needs specific strategies to organize data into a simple, flat structure
  • Demands significant data governance and management
  • Requires specialized skills for effective implementation

While data lakes excel at big-picture analysis of large datasets, they weren't optimal for our real-time data needs and complex data relationships.

S3 Export/Exchange:

In our analysis, we found that S3 Export/Exchange provides a way to share data between different business areas using file storage. This approach effectively handles large volumes of data and allows easy filtering of records using data frames.

Advantages:

  • Provides simple, cost-effective data storage
  • Supports high-volume data transfers
  • Enables easy data filtering capabilities
  • Offers flexible access control
  • Facilitates cross-Region data sharing

Disadvantages:

  • Not suitable for real-time data needs
  • Requires additional processing to convert data into a usable table format
  • Demands significant data preparation effort
  • Lacks immediate data consistency
  • Needs additional tools for data transformation

While S3 Export/Exchange works well for sharing large datasets between teams, it didn't meet our requirements for fast, real-time access or immediately usable data formats.

The following table provides a high-level comparison of the different data integration solutions we considered for our modernization efforts. It outlines where each solution is most appropriate to use and when it might not be the best choice:

  • Bulk APIs. When to use: real-time operational data is needed; fetching specific subsets of data. When not to use: many bounded contexts are involved.
  • Data Lake. When to use: processing large amounts of data at once; many bounded contexts. When not to use: real-time data access is needed; structured, transactional data processing.
  • ORDS. When to use: near-real-time access across multiple bounded contexts; large-volume batch processing. When not to use: within a single bounded context.
  • S3 Export/Exchange. When to use: few bounded contexts; handling large volumes of data; a point-in-time export is sufficient. When not to use: real-time data needs; many bounded contexts.

Table 1: Data Integration Solutions Comparison

Based on our comparison, we found ORDS to be the optimal solution for our needs, particularly when our batch processes require access to data from multiple bounded contexts in real time. Our implementation efficiently handles large volumes of data, significantly improving the performance of our batch jobs. We chose ORDS because it stores data in a familiar table format, accessible via SQL, making it straightforward and efficient for our teams to use.

The architecture also aligns with our domain-driven design principles by enforcing clear data ownership, where each bounded context retains responsibility for its own data management. This approach gives us both scalability and reliability, reducing the risk of a single point of failure.

Amazon Redshift: Powering Vanguard's ORDS Solution

Amazon Redshift serves as the backbone of our ORDS implementation, offering several critical features that support our modernization goals:

Data Sharing

Our solution uses the robust data sharing capabilities of Amazon Redshift, available on both provisioned Redshift RA3 instances and Redshift Serverless. This functionality gave us instant, secure, and live data access without copies, maintaining transactional consistency across our environment. The flexibility of same-account, cross-account, and cross-Region data sharing has been particularly valuable for our distributed architecture.

High Performance

We've achieved significant performance improvements through Amazon Redshift's efficient query processing and data retrieval capabilities. The system effectively handles our complex data needs while maintaining strong performance across varied workloads and data volumes.

Multi-Availability Zone Support

Our implementation benefits from Amazon Redshift's Multi-AZ support, which maintains high availability and reliability for our critical operations. This feature minimizes downtime without requiring extensive setup and significantly reduces our risk of data loss.

Familiar Interface

The relational environment of Amazon Redshift, similar to traditional databases like Amazon RDS and IBM Db2, has enabled a smooth transition for our teams. This familiarity has accelerated adoption and improved productivity, as our teams can apply their existing SQL expertise. By centralizing data from multiple business areas in ORDS using Amazon Redshift, we maintain consistent, efficient, and secure data access across our product teams. This setup is particularly valuable for our batch processing that requires data from various parts of the business, offering a blend of performance, reliability, and ease of use.

Operational Read-only Data Store (ORDS) using Amazon Redshift

Here's how our ORDS architecture implements Amazon Redshift data sharing to solve these challenges:


Figure 1: Vanguard's ORDS Architecture using Amazon Redshift Data Sharing

Amazon Redshift Ingestion Pattern:

We used Amazon Redshift's zero-ETL functionality to integrate data and enable real-time analytics directly on operational data, which helped reduce complexity and maintenance overhead. To complement this capability and to meet our comprehensive compliance requirements, which necessitate full transaction replication, we implemented additional data ingestion pipelines.

Our data ingestion strategy for Amazon Redshift employs different AWS services depending on the source. For Amazon Aurora PostgreSQL databases, we use AWS Database Migration Service (AWS DMS) to replicate data directly into Amazon Redshift. For data from Amazon DynamoDB, we use Amazon Kinesis to stream the data into Amazon Redshift, where it lands in materialized views. These views are then processed further to generate tables for end users.

This approach allows us to efficiently ingest data from our operational data stores while meeting both analytical needs and compliance requirements.
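As an illustration of the streaming path, Redshift's streaming ingestion can land Kinesis records in an auto-refreshing materialized view. A minimal sketch, with the IAM role ARN, stream name, and object names all hypothetical placeholders:

```sql
-- Map a Kinesis Data Streams source into Redshift
-- (the IAM role and stream name below are placeholders).
CREATE EXTERNAL SCHEMA kinesis_src
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Landing materialized view; Redshift refreshes it automatically
-- as new records arrive on the stream.
CREATE MATERIALIZED VIEW txn_stream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       sequence_number,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_src."transaction-stream";
```

Downstream jobs can then flatten `payload` into end-user tables, matching the step described above where materialized views are processed further for consumers.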

Amazon Redshift Data Sharing:

We used Amazon Redshift's data sharing feature to effectively decouple our data producers from consumers, allowing each group to operate within their own boundaries while maintaining a unified, simplified, and governed mechanism for data sharing.

Our implementation follows a clear process: once data is ingested and available in Amazon Redshift table format, we create views for consumers to access the data. We then establish data shares and grant access to these views to consumer Amazon Redshift data warehouses for batch processing. In our environment with multiple bounded contexts, we've established a collaborative model where consumers work with various producer teams to access data from different data shares, each created per bounded context.

This access remains strictly read-only. When consumers need to update or write new data that falls outside their bounded context, they must use APIs or other designated mechanisms for those operations. This approach has proven effective for our organization, promoting clear data ownership and governance while enabling flexible data access across organizational boundaries. It simplified our data management and ensured each team can operate independently while still sharing data effectively.
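The producer-side and consumer-side steps described above map to a handful of data sharing statements. A hedged sketch, with every share, schema, and namespace identifier invented for illustration:

```sql
-- Producer warehouse (e.g., the Account bounded context):
CREATE DATASHARE account_share;
ALTER DATASHARE account_share ADD SCHEMA account_views;
ALTER DATASHARE account_share ADD ALL TABLES IN SCHEMA account_views;

-- Grant the share to a consumer namespace (placeholder GUID).
GRANT USAGE ON DATASHARE account_share
TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';

-- Consumer warehouse: surface the share as a local, read-only
-- database that batch jobs can query (placeholder GUID).
CREATE DATABASE account_db
FROM DATASHARE account_share
OF NAMESPACE 'ffffffff-1111-2222-3333-444444444444';
```

Because the consumer database is read-only by design, writes that cross a bounded context still have to go through the owning team's APIs, which is exactly the governance boundary this model is meant to preserve.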

Example: a Vanguard cross-bounded-context use case

Disclaimer: This is provided for reference purposes only and does not represent an actual example.

Let's look at a practical example: our brokerage account statement generation process. This cross-bounded-context batch process requires integrating data from multiple sources, accessing hundreds of tables and processing large volumes of data monthly. The challenge was to create an efficient, cost-effective solution that minimizes data replication while maintaining data accessibility. ORDS proved ideal for this use case, because it provides data from multiple bounded contexts without replication, offers near-real-time access, and enables easy data aggregation using SQL queries in Amazon Redshift.

The following diagram shows how we implemented this solution:


Figure 2: Cross-Bounded-Context Example for Brokerage Account Statement Generation

We need the following bounded contexts to generate brokerage statements for millions of our clients.

  1. Account:
    • Details: Includes information about the client's brokerage accounts, such as account numbers, types, and statuses.
    • Holdings and Positions: Provides current holdings and positions within the account, detailing the securities owned, their quantities, and current market values.
    • Balance Information: Contains the balance information for the account, including cash balances, margin balances, and total account value.
  2. Client Profile:
    • Personal Information: Information about the client, such as their name, date of birth, and Social Security number.
    • Contact Information: Includes the client's email address, physical address, and phone numbers.
  3. Transaction History:
    • Transaction Records: A comprehensive record of transactions associated with the account, including buys, sells, transfers, and dividends.
    • Transaction Details: Each transaction record includes details such as transaction date, type, quantity, price, and associated fees.
    • Historical Data: Historical records of transactions over time, providing a complete view of the account's activity.

Through this architecture, we efficiently generate accurate and comprehensive brokerage account statements by consolidating data from these bounded contexts, meeting both our clients' needs and regulatory requirements.
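With each bounded context exposed as a shared, read-only database, a statement-generation batch job could assemble its inputs in a single cross-database query. A hypothetical sketch (all database, view, and column names are invented for illustration):

```sql
-- Gather one month of statement inputs by joining the Account,
-- Client Profile, and Transaction History shared databases.
SELECT cp.client_name,
       acct.account_number,
       txn.transaction_date,
       txn.transaction_type,
       txn.quantity,
       txn.price,
       txn.fees
FROM account_db.account_views.accounts      AS acct
JOIN clientprofile_db.profile_views.clients AS cp
  ON cp.client_id = acct.client_id
JOIN transaction_db.txn_views.transactions  AS txn
  ON txn.account_id = acct.account_id
WHERE txn.transaction_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND txn.transaction_date <  DATE_TRUNC('month', CURRENT_DATE);
```

Each joined database here would correspond to a datashare owned by a different producer team, so the query crosses bounded contexts without copying any data into the consumer warehouse.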

Business Outcome

Our journey with the Operational Read-only Data Store (ORDS) and Amazon Redshift has enhanced our client experience (CX) through improved data management and accessibility. By transitioning from our mainframe system to a cloud-based, domain-driven architecture, we have empowered our autonomous teams and established a resilient batch architecture.

This shift facilitates efficient cross-domain data access, maintains high-quality data consistency, and provides scalability. Our ORDS implementation, supported by Amazon Redshift, offers near-real-time access to large data volumes, ensuring high performance, reliability, and cost-effectiveness. This modernization effort aligns with our mission to deliver exceptional, personalized client experiences and sustain long-lasting client relationships.

Call to Action

If you're facing similar challenges with your batch processing systems, we encourage you to explore how an Operational Read-only Data Store (ORDS) can transform your data architecture. Start by assessing your current system's limitations and identifying opportunities for improvement through domain-driven design and cloud-based solutions. Consider how this approach can help you manage large volumes of data from multiple sources, provide fast access to replicated data for batch processes, and support high-volume reads from various data sources.

Take the next step by conducting a proof of concept (POC) to evaluate ORDS's effectiveness in achieving efficient cross-domain data access, improving the performance of batch jobs, and maintaining clear data ownership within your business domains. By implementing this solution, you can enhance your data management capabilities, reduce operational risks, and drive innovation within your organization. Embrace this opportunity to elevate your data architecture and deliver exceptional customer experiences.

Conclusion

Our transition to a cloud-native, domain-driven architecture with ORDS using Amazon Redshift has successfully transformed our batch processing capabilities in the AWS Cloud. This modernization effort has significantly enhanced the performance, reliability, and scalability of our batch operations while maintaining seamless data access and integration across different business domains.

The strategic adoption of ORDS has harnessed the potential of cross-domain data access in a distributed environment, providing us with a robust solution for near-real-time data access and efficient batch processing. This transformation has empowered us to better meet the demands of the digital age, delivering superior customer experiences and reinforcing our commitment to innovation in the financial services industry.


About the authors

Malav Shah


Malav is a Domain Architect in Vanguard's Personal Investor Technology division, with over a decade of experience in cloud-native solutions. He focuses on architecting and designing scalable systems, and contributes hands-on through development and proof-of-concept work. Malav holds multiple AWS certifications, including AWS Certified Solutions Architect and AWS Certified AI Practitioner.

Timothy Dickens


Timothy is a Senior Architect at Vanguard, specializing in advanced data streaming designs, AI, real-time data access, and analytics. With expertise in AWS services like Redshift, DynamoDB, and Aurora PostgreSQL, Timothy excels at creating robust distributed architectures that drive innovation and efficiency. Passionate about leveraging cutting-edge technologies, Timothy is dedicated to delivering trustworthy, actionable data that empowers confident, timely decision-making.

Priyadharshini Selvaraj


Priyadharshini is a data architect with AWS Professional Services, bringing over a decade of expertise in helping customers navigate their data journeys. She focuses on data migration and modernization initiatives, specializing in data lakes, data warehouses, and distributed processing using Apache Spark. As an expert in generative AI and agentic architectures, Priyadharshini enables customers to harness cutting-edge AI technologies for business transformation. Beyond her technical pursuits, she practices yoga, plays piano, and enjoys hobby baking, bringing balance to her professional life.

Naresh Rajaram


Naresh is a seasoned Solutions Architect with over 20 years of experience, with a primary focus on cloud computing and artificial intelligence. Specializing in enterprise-scale AI implementations and cloud architecture, he helps customers develop and deploy advanced AI solutions, with a particular focus on autonomous AI systems and agent-based architectures. His expertise spans designing cutting-edge AI infrastructures using Amazon Bedrock, Amazon Bedrock AgentCore, and cloud-native AI services, along with pioneering work in agentic AI applications and autonomous systems.

© 2025 The Vanguard Group, Inc. All rights reserved.
