
Bridging data silos: cross-bounded-context querying with Vanguard's Operational Read-only Data Store (ORDS) using Amazon Redshift


Are you modernizing your legacy batch processing systems? At Vanguard, we faced significant challenges with our legacy mainframe system that limited our ability to deliver modern, personalized customer experiences. Our centralized database architecture created performance bottlenecks and made it difficult to scale services independently for our millions of personal and institutional investors.

In this post, we show you how we modernized our data architecture using Amazon Redshift as our Operational Read-only Data Store (ORDS). You'll learn how we transitioned to a cloud-native, domain-driven architecture while preserving critical batch processing capabilities. We show how this solution enabled us to create logically isolated data domains while maintaining cross-domain analytics capabilities, all while adhering to the principles of bounded contexts and distributed data ownership.

Background and challenges

As financial needs continue to evolve, Vanguard is committed to delivering adaptable, best-in-class experiences that foster long-lasting client relationships. This commitment spans from enhancing the personal investor journey to delivering personalized mobile dashboards and connecting institutional clients with advanced advice offerings.

To elevate customer experience and drive digital transformation, Vanguard has embraced domain-driven design principles. This approach focuses on creating autonomous teams, fostering faster innovation, and building a data mesh architecture. Central to this transformation is the Personal Investor team's mainframe modernization effort, transitioning from a legacy system to a cloud-based, distributed data architecture organized around bounded contexts: distinct business domains that manage their own data. As part of this shift, each microservice now manages its own local data store using Amazon Aurora PostgreSQL-Compatible Edition or Amazon DynamoDB. This approach enables domain-level data ownership and operational autonomy.

Vanguard's existing mainframe system, built on a centralized Db2 database, enables cross-domain data access and integration but also introduces several architectural challenges. Although batch processes can join data across multiple bounded contexts, using SQL joins and database operations to integrate information from various sources, this tight coupling creates significant risks and operational issues.

Challenges with the centralized database approach include:

  • Resource contention: Processes from one domain can negatively impact other domains due to shared compute resources, leading to performance degradation across the system.
  • Lack of domain isolation: Changes in one bounded context can have unintended ripple effects across other domains, increasing the risk of system-wide failures.
  • Scalability constraints: The centralized architecture creates bottlenecks as load increases, making it difficult to scale individual components independently.
  • High coupling: Tight integration between domains makes it challenging to modify or upgrade individual components without affecting the entire system.
  • Limited fault tolerance: Issues in one domain can cascade across the entire system due to shared infrastructure and data dependencies.

To address these architectural challenges, we chose to use Amazon Redshift as our Operational Read-only Data Store (ORDS). The Amazon Redshift architecture separates compute and storage, which enables us to create multi-cluster architectures with a separate endpoint for each domain and independent scaling of compute and storage resources. Our solution uses the data sharing capabilities of Amazon Redshift to create logically isolated data domains while retaining the ability to perform cross-domain analytics when needed.

Key benefits of the Amazon Redshift solution include:

  1. Resource isolation: Each domain can be assigned dedicated Amazon Redshift compute resources, so one domain's workload doesn't impact others.
  2. Independent scaling: Domains can scale their compute resources independently based on their specific needs.
  3. Controlled data sharing: Amazon Redshift's data sharing feature enables secure and governed cross-domain data access without tight coupling, maintaining clear domain boundaries.

Let's explore the different alternatives we evaluated before selecting ORDS with Amazon Redshift as our optimal approach.

Alternatives explored

We adopted ORDS as our optimal solution after conducting a comprehensive evaluation of available options. This section outlines our decision-making process and examines the alternatives we considered during our assessment.

Operational Read-only Data Store (ORDS):

In our evaluation, we found that using Amazon Redshift for ORDS provides a robust solution for handling data across different business areas. It excels at managing large volumes of data from multiple sources, providing fast access to replicated data for batch processes that require cross-bounded-context data, and combining information using familiar SQL queries. The solution particularly shines at handling high-volume reads from our data sources.

Advantages:

  • Works well with relational data
  • Excels at real-time access to data from multiple business areas
  • Improves performance of batch jobs dealing with large data volumes
  • Stores data in a familiar table format, accessible via SQL
  • Enforces clear data ownership, with each business area responsible for its data
  • Provides a scalable architecture that reduces the risk of a single point of failure

Disadvantages:

  • Requires additional data validation during loading processes to maintain data uniqueness
  • Needs careful management of primary key constraints, since Amazon Redshift optimizes for analytical performance
  • May require additional monitoring and controls compared to traditional RDBMS systems
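The data-validation point stems from the fact that Amazon Redshift records primary key and unique constraints but does not enforce them, so duplicate rows can survive a load. A minimal sketch of one common deduplication pattern during publishing, with the table and column names invented for illustration:

```sql
-- Hypothetical staging table populated by the ingestion pipeline.
-- Redshift treats any PRIMARY KEY here as informational only,
-- so duplicates must be removed explicitly.
CREATE TABLE account_stage (
    account_id  BIGINT,
    status      VARCHAR(16),
    updated_at  TIMESTAMP
);

-- Publish only the most recent row per account_id from staging
-- into the consumer-facing table.
INSERT INTO account
SELECT account_id, status, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY account_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM account_stage
)
WHERE rn = 1;
```

Running a pass like this in the load path is one way to provide the uniqueness guarantees that a traditional RDBMS would otherwise enforce for you.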

The following are the other alternatives we evaluated:

Bulk APIs:

We found that Bulk APIs provide an approach for handling large volumes of data.

Advantages:

  • Near-real-time access to bulk data through a single request
  • Autonomous teams have control over access patterns
  • Efficient batch processing of large datasets with multi-record retrieval

Disadvantages:

  • Each product team needs to create its own bulk API
  • If you need data from different areas, you have to combine it yourself
  • The team providing the API must make sure it can handle a large volume of requests
  • You might need to use multiple APIs to get all the data you want
  • If you're retrieving data in chunks (pagination), you might miss some records if the data changes between requests

While Bulk APIs offer powerful capabilities, we found they require substantial team coordination and careful implementation to be effective.

Data Lake:

Our evaluation showed that data lakes can effectively combine information from different parts of our business. They excel at processing large amounts of data at once, providing search capabilities through unified data formats, and managing large volumes of diverse and complex data.

Advantages:

  • Handles massive data volumes efficiently
  • Supports multiple data formats and structures
  • Enables complex analytics and data science workloads
  • Provides cost-effective storage options
  • Accommodates both structured and unstructured data

Disadvantages:

  • May not provide real-time, high-speed data access
  • Requires additional effort with complex data structures, especially those with many interconnected parts
  • Needs specific strategies to organize data into a simple, flat structure
  • Demands significant data governance and management
  • Requires specialized skills for effective implementation

While data lakes excel at big-picture analysis of large datasets, they weren't optimal for our real-time data needs and complex data relationships.

S3 Export/Exchange:

In our analysis, we found that S3 Export/Exchange provides a way to share data between different business areas using file storage. This approach effectively handles large volumes of data and allows easy filtering of records using data frames.

Advantages:

  • Provides simple, cost-effective data storage
  • Supports high-volume data transfers
  • Enables easy data filtering capabilities
  • Offers flexible access control
  • Facilitates cross-Region data sharing

Disadvantages:

  • Not suitable for real-time data needs
  • Requires additional processing to convert data into a usable table format
  • Demands significant data preparation effort
  • Lacks immediate data consistency
  • Needs additional tools for data transformation

While S3 Export/Exchange works well for sharing large datasets between teams, it didn't meet our requirements for fast, real-time access or immediately usable data formats.

The following table provides a high-level comparison of the different data integration solutions we considered for our modernization efforts. It outlines where each solution is most appropriate to use and when it might not be the best choice:

  • Bulk APIs. When to use: real-time operational data is needed; fetching specific subsets of data. When not to use: many bounded contexts are involved.
  • Data Lake. When to use: processing large amounts of data at once; many bounded contexts. When not to use: real-time data access is needed; structured, transactional data processing.
  • ORDS. When to use: near-real-time access across multiple bounded contexts; large-volume batch processing. When not to use: within a single bounded context.
  • S3 Export/Exchange. When to use: few bounded contexts; handling large volumes of data; a point-in-time export is sufficient. When not to use: real-time data needs; many bounded contexts.

Table 1: Data Integration Solutions Comparison

Based on our comparison, we found ORDS to be the optimal solution for our needs, particularly when our batch processes require access to data from multiple bounded contexts in real time. Our implementation efficiently handles large volumes of data, significantly improving the performance of our batch jobs. We chose ORDS because it stores data in a familiar table format, accessible via SQL, making it straightforward and efficient for our teams to use.

The architecture also aligns with our domain-driven design principles by enforcing clear data ownership, where each bounded context retains responsibility for its own data management. This approach gives us both scalability and reliability, reducing the risk of a single point of failure.

Amazon Redshift: Powering Vanguard's ORDS Solution

Amazon Redshift serves as the backbone of our ORDS implementation, offering several critical features that support our modernization goals:

Data Sharing

Our solution uses the robust data sharing capabilities of Amazon Redshift, available on both provisioned Redshift RA3 instances and Redshift Serverless. This functionality gave us instant, secure, and live data access without copies, maintaining transactional consistency across our environment. The flexibility of same-account, cross-account, and cross-Region data sharing has been particularly valuable for our distributed architecture.

High Performance

We've achieved significant performance improvements through Amazon Redshift's efficient query processing and data retrieval capabilities. The system effectively handles our complex data needs while maintaining strong performance across varied workloads and data volumes.

Multi-Availability Zone Support

Our implementation benefits from Amazon Redshift's Multi-AZ support, which maintains high availability and reliability for our critical operations. This feature minimizes downtime without requiring extensive setup and significantly reduces our risk of data loss.

Familiar Interface

The relational environment of Amazon Redshift, similar to traditional databases like Amazon RDS and IBM Db2, has enabled a smooth transition for our teams. This familiarity has accelerated adoption and improved productivity, as our teams can apply their existing SQL expertise. By centralizing data from multiple business areas in ORDS using Amazon Redshift, we maintain consistent, efficient, and secure data access across our product teams. This setup is particularly valuable for our batch processing that requires data from various parts of the business, offering a blend of performance, reliability, and ease of use.

Operational Read-only Data Store (ORDS) using Amazon Redshift

Here's how our ORDS architecture implements Amazon Redshift data sharing to solve these challenges:


Figure 1: Vanguard's ORDS Architecture using Amazon Redshift Data Sharing

Amazon Redshift Ingestion Pattern:

We used Amazon Redshift's zero-ETL functionality to integrate data and enable real-time analytics directly on operational data, which helped reduce complexity and maintenance overhead. To complement this capability and to meet our comprehensive compliance requirements, which necessitate full transaction replication, we implemented additional data ingestion pipelines.

Our data ingestion strategy for Amazon Redshift employs different AWS services depending on the source. For Amazon Aurora PostgreSQL databases, we use AWS Database Migration Service (AWS DMS) to replicate data directly into Amazon Redshift. For data from Amazon DynamoDB, we use Amazon Kinesis to stream the data into Amazon Redshift, where it lands in materialized views. These views are then processed further to generate tables for end users.

This approach allows us to efficiently ingest data from our operational data stores while meeting both analytical needs and compliance requirements.
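As an illustration of the streaming path, Redshift's streaming ingestion can land Kinesis records in an auto-refreshing materialized view. A minimal sketch, with the IAM role ARN, stream name, and object names all hypothetical placeholders:

```sql
-- Map a Kinesis Data Streams source into Redshift
-- (the IAM role and stream name below are placeholders).
CREATE EXTERNAL SCHEMA kinesis_src
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Landing materialized view; Redshift refreshes it automatically
-- as new records arrive on the stream.
CREATE MATERIALIZED VIEW txn_stream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       sequence_number,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_src."transaction-stream";
```

Downstream jobs can then flatten `payload` into end-user tables, matching the step described above where materialized views are processed further for consumers.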

Amazon Redshift Data Sharing:

We used Amazon Redshift's data sharing feature to effectively decouple our data producers from consumers, allowing each group to operate within their own boundaries while maintaining a unified, simplified, and governed mechanism for data sharing.

Our implementation follows a clear process: once data is ingested and available in Amazon Redshift table format, we create views for consumers to access the data. We then establish data shares and grant access to these views to consumer Amazon Redshift data warehouses for batch processing. In our environment with multiple bounded contexts, we've established a collaborative model where consumers work with various producer teams to access data from different data shares, each created per bounded context.

This access remains strictly read-only. When consumers need to update or write new data that falls outside their bounded context, they must use APIs or other designated mechanisms for those operations. This approach has proven effective for our organization, promoting clear data ownership and governance while enabling flexible data access across organizational boundaries. It simplified our data management and ensured each team can operate independently while still sharing data effectively.
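The producer-side and consumer-side steps described above map to a handful of data sharing statements. A hedged sketch, with every share, schema, and namespace identifier invented for illustration:

```sql
-- Producer warehouse (e.g., the Account bounded context):
CREATE DATASHARE account_share;
ALTER DATASHARE account_share ADD SCHEMA account_views;
ALTER DATASHARE account_share ADD ALL TABLES IN SCHEMA account_views;

-- Grant the share to a consumer namespace (placeholder GUID).
GRANT USAGE ON DATASHARE account_share
TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';

-- Consumer warehouse: surface the share as a local, read-only
-- database that batch jobs can query (placeholder GUID).
CREATE DATABASE account_db
FROM DATASHARE account_share
OF NAMESPACE 'ffffffff-1111-2222-3333-444444444444';
```

Because the consumer database is read-only by design, writes that cross a bounded context still have to go through the owning team's APIs, which is exactly the governance boundary this model is meant to preserve.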

Example: a Vanguard cross-bounded-context use case

Disclaimer: This is provided for reference purposes only and does not represent an actual example.

Let's look at a practical example: our brokerage account statement generation process. This cross-bounded-context batch process requires integrating data from multiple sources, accessing hundreds of tables and processing large volumes of data monthly. The challenge was to create an efficient, cost-effective solution that minimizes data replication while maintaining data accessibility. ORDS proved ideal for this use case, because it provides data from multiple bounded contexts without replication, offers near-real-time access, and enables easy data aggregation using SQL queries in Amazon Redshift.

The following diagram shows how we implemented this solution:


Figure 2: Cross-Bounded-Context Example for Brokerage Account Statement Generation

We need the following bounded contexts to generate brokerage statements for millions of our clients.

  1. Account:
    • Details: Includes information about the client's brokerage accounts, such as account numbers, types, and statuses.
    • Holdings and Positions: Provides current holdings and positions within the account, detailing the securities owned, their quantities, and current market values.
    • Balance Information: Contains the balance information for the account, including cash balances, margin balances, and total account value.
  2. Client Profile:
    • Personal Information: Information about the client, such as their name, date of birth, and Social Security number.
    • Contact Information: Includes the client's email address, physical address, and phone numbers.
  3. Transaction History:
    • Transaction Records: A comprehensive record of transactions associated with the account, including buys, sells, transfers, and dividends.
    • Transaction Details: Each transaction record includes details such as transaction date, type, quantity, price, and associated fees.
    • Historical Data: Historical records of transactions over time, providing a complete view of the account's activity.

Through this architecture, we efficiently generate accurate and comprehensive brokerage account statements by consolidating data from these bounded contexts, meeting both our clients' needs and regulatory requirements.
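With each bounded context exposed as a shared, read-only database, a statement-generation batch job could assemble its inputs in a single cross-database query. A hypothetical sketch (all database, view, and column names are invented for illustration):

```sql
-- Gather one month of statement inputs by joining the Account,
-- Client Profile, and Transaction History shared databases.
SELECT cp.client_name,
       acct.account_number,
       txn.transaction_date,
       txn.transaction_type,
       txn.quantity,
       txn.price,
       txn.fees
FROM account_db.account_views.accounts      AS acct
JOIN clientprofile_db.profile_views.clients AS cp
  ON cp.client_id = acct.client_id
JOIN transaction_db.txn_views.transactions  AS txn
  ON txn.account_id = acct.account_id
WHERE txn.transaction_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND txn.transaction_date <  DATE_TRUNC('month', CURRENT_DATE);
```

Each joined database here would correspond to a datashare owned by a different producer team, so the query crosses bounded contexts without copying any data into the consumer warehouse.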

Business Outcome

Our journey with the Operational Read-only Data Store (ORDS) and Amazon Redshift has enhanced our client experience (CX) through improved data management and accessibility. By transitioning from our mainframe system to a cloud-based, domain-driven architecture, we have empowered our autonomous teams and established a resilient batch architecture.

This shift facilitates efficient cross-domain data access, maintains high-quality data consistency, and provides scalability. Our ORDS implementation, supported by Amazon Redshift, offers near-real-time access to large data volumes, ensuring high performance, reliability, and cost-effectiveness. This modernization effort aligns with our mission to deliver exceptional, personalized client experiences and sustain long-lasting client relationships.

Call to Action

If you're facing similar challenges with your batch processing systems, we encourage you to explore how an Operational Read-only Data Store (ORDS) can transform your data architecture. Start by assessing your current system's limitations and identifying opportunities for improvement through domain-driven design and cloud-based solutions. Consider how this approach can help you manage large volumes of data from multiple sources, provide fast access to replicated data for batch processes, and support high-volume reads from various data sources.

Take the next step by conducting a proof of concept (POC) to evaluate ORDS's effectiveness in achieving efficient cross-domain data access, improving the performance of batch jobs, and maintaining clear data ownership within your business domains. By implementing this solution, you can enhance your data management capabilities, reduce operational risks, and drive innovation within your organization. Embrace this opportunity to elevate your data architecture and deliver exceptional customer experiences.

Conclusion

Our transition to a cloud-native, domain-driven architecture with ORDS using Amazon Redshift has successfully transformed our batch processing capabilities in the AWS Cloud. This modernization effort has significantly enhanced the performance, reliability, and scalability of our batch operations while maintaining seamless data access and integration across different business domains.

The strategic adoption of ORDS has harnessed the potential of cross-domain data access in a distributed environment, providing us with a robust solution for near-real-time data access and efficient batch processing. This transformation has empowered us to better meet the demands of the digital age, delivering superior customer experiences and reinforcing our commitment to innovation in the financial services industry.


About the authors

Malav Shah


Malav is a Domain Architect in Vanguard's Personal Investor Technology division, with over a decade of experience in cloud-native solutions. He focuses on architecting and designing scalable systems, and contributes hands-on through development and proof-of-concept work. Malav holds multiple AWS certifications, including AWS Certified Solutions Architect and AWS Certified AI Practitioner.

Timothy Dickens


Timothy is a Senior Architect at Vanguard, specializing in advanced data streaming designs, AI, real-time data access, and analytics. With expertise in AWS services like Redshift, DynamoDB, and Aurora PostgreSQL, Timothy excels at creating robust distributed architectures that drive innovation and efficiency. Passionate about leveraging cutting-edge technologies, Timothy is dedicated to delivering trustworthy, actionable data that empowers confident, timely decision-making.

Priyadharshini Selvaraj


Priyadharshini is a data architect with AWS Professional Services, bringing over a decade of expertise in helping customers navigate their data journeys. She focuses on data migration and modernization initiatives, specializing in data lakes, data warehouses, and distributed processing using Apache Spark. As an expert in generative AI and agentic architectures, Priyadharshini enables customers to harness cutting-edge AI technologies for business transformation. Beyond her technical pursuits, she practices yoga, plays piano, and enjoys hobby baking, bringing balance to her professional life.

Naresh Rajaram


Naresh is a seasoned Solutions Architect with over 20 years of experience, with a primary focus on cloud computing and artificial intelligence. Specializing in enterprise-scale AI implementations and cloud architecture, he helps customers develop and deploy advanced AI solutions, with a particular focus on autonomous AI systems and agent-based architectures. His expertise spans designing cutting-edge AI infrastructures using Amazon Bedrock, Amazon Bedrock AgentCore, and cloud-native AI services, along with pioneering work in agentic AI applications and autonomous systems.

© 2025 The Vanguard Group, Inc. All rights reserved.
