Covestro Deutschland AG, headquartered in Leverkusen, Germany, is a world leader in high-performance polymer materials and components. Since its spin-off from Bayer AG in 2015, Covestro has established itself as a key player in the chemical industry, with 48 production sites worldwide, €14.4 billion in 2023 revenue, and 17,500 employees. Covestro's core business focuses on developing innovative, sustainable solutions for products used in many aspects of daily life. The company supplies materials for the mobility, building and living, and electrical and electronics sectors, as well as sports and leisure, cosmetics, health, and the chemical industry itself. Its products, such as polycarbonates, polyurethanes, coatings, adhesives, and specialty elastomers, are critical components in the automotive, construction, electronics, and medical device industries.
To support this global operation and diverse product portfolio, Covestro adopted a robust data management solution. In this post, we show you how Covestro transformed its data architecture by implementing Amazon DataZone and the AWS Serverless Data Lake Framework (SDLF), transitioning from a centralized data lake to a data mesh architecture. Through this strategic shift, teams can share and consume data while maintaining high quality standards through a consolidated data marketplace and business metadata glossary. The result: streamlined data access, better data quality, and stronger governance at scale that producer and consumer teams use to run data and analytics workloads, enabling over 1,000 data pipelines and achieving a 70% reduction in time-to-market.
Business and data challenges
Prior to the transformation, Covestro operated a centralized data lake managed by a single data platform team that handled all data engineering tasks. This centralized approach created several challenges: bottlenecks in project delivery because of limited engineering resources, difficult prioritization of use cases, and inefficient data sharing processes. The setup often resulted in unnecessary data duplication, which in turn slowed time-to-market for new analytics initiatives, increased costs, and limited the ability of business units to act quickly on insights. The lack of visibility into data assets created significant operational challenges:
- Teams couldn't find existing datasets, often recreating data already stored elsewhere
- No clear understanding of data lineage or quality metrics
- Difficulty identifying who owned specific data assets or whom to contact for access
- Absence of metadata and documentation about available datasets
- Departments shared little knowledge about how they were using data
 
These visibility issues, combined with the lack of unified access controls, led to:
- Siloed data initiatives across departments
- Reduced trust in data quality
- Inefficient use of resources
- Delayed project timelines
- Missed opportunities for cross-functional collaboration and insights
 
A strategic solution: Why Amazon DataZone and SDLF?
The challenges Covestro faced reflect deeper structural limitations of centralized data architectures. As Covestro scaled, the central data team increasingly became a bottleneck, and lack of domain context led to fragmented quality, inconsistent standards, and poor collaboration. Instead of centralizing control, a data mesh gives ownership to the teams who generate and understand the data, while keeping governance and interoperability consistent across the organization. This makes it well suited for Covestro's environment, which requires agility, scalability, and cross-team collaboration.
The AWS Serverless Data Lake Framework (SDLF) answers these challenges by providing a robust foundation for data mesh architectures. Traditional data lake implementations often centralize data ownership and governance, but with the flexible design of SDLF, organizations can build decentralized data domains that align with modern data mesh principles. The framework gives domain-oriented teams the infrastructure, security controls, and operational patterns needed to own and manage their data products independently, while maintaining consistent governance across the organization. Through its modular architecture and infrastructure as code templates, SDLF accelerates the creation of domain-specific data products, so Covestro's teams can deploy standardized yet customizable data pipelines. This approach supports the key pillars of data mesh: domain-oriented decentralization, data as a product, self-serve infrastructure, and federated governance, giving Covestro a practical path to overcome the limitations of traditional centralized architectures.
Amazon DataZone complements the data mesh implementation through a unified experience for discovering and accessing data across decentralized domains. As a data management service, Amazon DataZone helps organizations catalog, discover, share, and govern data across organizational boundaries. It provides a central governance layer where organizations can establish data sharing agreements, manage access controls, and enable self-service data access while supporting security and compliance. While teams use SDLF to build and operate domain-specific data products, Amazon DataZone enhances them with a searchable catalog enriched with metadata, business context, and usage policies, making data products easier to find, trust, and reuse.
Through the sharing capabilities of Amazon DataZone, domain teams can share their data products with other domains while maintaining granular access controls and governance policies, enabling cross-domain collaboration and data reuse. Domain teams can publish their SDLF-managed datasets to an Amazon DataZone catalog, so authorized users across the organization can discover and access them. Through the governance capabilities built into Amazon DataZone, organizations can implement standardized data sharing workflows, check data quality, and enforce consistent access controls across their distributed data landscape, strengthening the data mesh architecture with robust data governance and democratization capabilities. Together, SDLF and Amazon DataZone provide Covestro with a comprehensive solution for implementing a modern data mesh architecture, enabling autonomous data domains to operate with consistent governance, seamless data sharing, and enterprise-wide data discovery.
Solution architecture and implementation
The following architecture illustrates the high-level design of the data mesh solution. The implementation is built on AWS services to create a robust, scalable, and governed data mesh that serves multiple business domains across the Covestro organization.

Data domain foundation: Serverless Data Lake Framework
A key pillar of the implementation is the Serverless Data Lake Framework (SDLF), which provides the foundational infrastructure and security needed to support data mesh strategies. SDLF delivers the core building blocks for data domains, such as Amazon S3 storage layers, built-in encryption with AWS KMS, IAM-based access control, and infrastructure as code (IaC) automation. Using these components, Covestro can deploy decentralized, domain-owned data products quickly while maintaining consistent governance across the enterprise.
The framework uses Amazon Simple Storage Service (Amazon S3) as the primary data storage layer, delivering virtually unlimited scalability and eleven nines of durability for diverse data assets. The S3 bucket architecture follows AWS Well-Architected principles, implementing a multi-tiered structure with distinct raw, staging, and analytics data zones. This layered approach helps each business domain maintain data sovereignty: every domain owns and controls its data, while accessibility patterns stay consistent organization-wide.
Security is a fundamental aspect of Covestro's data mesh implementation. SDLF automatically implements encryption at rest and in transit across data storage and processing components. AWS Key Management Service (AWS KMS) provides centralized key management, while carefully crafted AWS Identity and Access Management (IAM) roles enable resource isolation.
Data processing with AWS Glue
AWS Glue serves as the cornerstone of the data processing and transformation capabilities, offering serverless extract, transform, and load (ETL) services that automatically scale based on workload demands.
Covestro's pre-existing centralized data lake was fed by more than 1,000 ingestion pipelines interacting with a variety of source systems. To support the migration of existing ingestion and processing pipelines, Covestro developed reusable blueprints that incorporated the development and security standards defined for the data mesh. Covestro introduced standardized patterns that teams can deploy across multiple domains while retaining the flexibility needed for domain-specific requirements. These blueprints support diverse source systems, from traditional databases like Oracle, SQL Server, and MySQL to modern software as a service (SaaS) applications such as SAP C4C.
They also developed specialized blueprints for processing, standardizing, and cleaning ingested raw data. These blueprints store processed data in Apache Iceberg format, automatically saving metadata in the AWS Glue Data Catalog and providing built-in capabilities to handle schema evolution seamlessly.
Covestro relies on SDLF to quickly configure and deploy the blueprints as AWS Glue jobs inside each domain. With SDLF, teams deploy a data pipeline through a YAML configuration file, and the orchestration and management mechanisms of SDLF handle the rest. The solution includes comprehensive monitoring capabilities built on Amazon DynamoDB, providing real-time visibility into data pipeline health and performance metrics; when teams deploy a pipeline through SDLF, the system automatically integrates it with the monitoring setup.
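To illustrate the configuration-driven approach, the following is a minimal sketch of what a validated pipeline definition might capture after the YAML file is loaded. The field names and allowed zones are hypothetical, not the actual SDLF schema.

```python
# Hypothetical sketch of an SDLF-style pipeline definition check.
# Field names (pipeline_name, domain, source, target_zone) are illustrative.
REQUIRED_FIELDS = {"pipeline_name", "domain", "source", "target_zone"}

def validate_pipeline_config(config: dict) -> list[str]:
    """Return a list of validation errors for a pipeline definition."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - config.keys())]
    if config.get("target_zone") not in {"raw", "staging", "analytics"}:
        errors.append("target_zone must be one of: raw, staging, analytics")
    return errors

# Example definition, as it might look after loading the YAML file
pipeline = {
    "pipeline_name": "sap_c4c_ingest",
    "domain": "sales",
    "source": {"type": "saas", "system": "sap_c4c"},
    "target_zone": "raw",
}

print(validate_pipeline_config(pipeline))  # []
```

Validating definitions up front is what lets a framework like SDLF turn a short declarative file into a fully wired Glue job and monitoring entry.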
Data quality with AWS Glue Data Quality
To achieve data reliability across domains, Covestro extended SDLF to incorporate AWS Glue Data Quality into its data processing pipelines. This integration enables automated data quality checks as part of the standard data processing workflow. Thanks to the configuration-driven design of SDLF, data producers can implement quality controls either by using recommended rules, which are automatically generated through data profiling, or by applying their own domain-specific rules.
The integration gives data teams the flexibility to define quality expectations while maintaining consistency in how quality checks are applied at the pipeline level. The solution logs quality evaluation results, providing visibility into the data quality metrics for each data product. These components are illustrated in the following figure.

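As a simplified stand-in for how such checks behave, the sketch below evaluates two common rule types over in-memory rows. In an actual pipeline these would be expressed as Glue Data Quality rules and evaluated by the service; the column names and data here are illustrative.

```python
# Simplified stand-in for two common data quality rule types:
# completeness and allowed-value checks. Real pipelines would let
# AWS Glue Data Quality evaluate equivalent rules.

def is_complete(rows: list[dict], column: str) -> bool:
    """Pass when the column has no missing values."""
    return all(row.get(column) is not None for row in rows)

def column_values_in(rows: list[dict], column: str, allowed: set) -> bool:
    """Pass when every value of the column is in the allowed set."""
    return all(row.get(column) in allowed for row in rows)

rows = [
    {"order_id": 1, "region": "EMEA"},
    {"order_id": 2, "region": "APAC"},
    {"order_id": 3, "region": None},   # missing region fails both checks
]

results = {
    "complete(order_id)": is_complete(rows, "order_id"),
    "complete(region)": is_complete(rows, "region"),
    "region in {EMEA, APAC}": column_values_in(rows, "region", {"EMEA", "APAC"}),
}
print(results)
```

Logging a per-rule pass/fail map like `results` is what gives each data product its quality metrics.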
Enterprise-ready access control with AWS Lake Formation
AWS Lake Formation integration with the Data Catalog provides the security and access control layer that makes the data mesh implementation enterprise-ready. Through Lake Formation, Covestro implemented fine-grained access controls that respect domain boundaries while enabling controlled cross-domain data sharing.
The service's integration with IAM means that Covestro can implement role-based access patterns that align with its organizational structure, so users can access the data they need while maintaining appropriate security boundaries.
Data democratization with Amazon DataZone
Amazon DataZone functions as the heart of the data mesh implementation. Deployed in a dedicated AWS account, it provides the data governance, discovery, and sharing capabilities that were missing in the earlier centralized approach. DataZone offers a unified, searchable catalog enriched with business context, automated access controls, and standardized sharing workflows that enable true data democratization across the organization.
Through Amazon DataZone, Covestro established a comprehensive data catalog that helps business users across different domains find, understand, and request access to data assets without requiring deep technical expertise. The business glossary functionality supports consistent data definitions across domains, eliminating the confusion that often arises when different teams use different terminology for the same concepts.
Data product owners can use the integration of Amazon DataZone with AWS Lake Formation to grant or revoke cross-domain access to data, streamlining the data sharing process while supporting security and compliance requirements.
Managing cross-domain data pipeline dependencies
When implementing Covestro's data mesh architecture on AWS, one of the most significant challenges was orchestrating data pipelines across multiple domains. The core question to address was: how can Data Domain A determine when a required dataset from Data Domain B has been refreshed and is ready for consumption?
In a data mesh architecture, domains maintain ownership of their data products while enabling consumption by other domains. This distributed model creates complex dependency chains where downstream pipelines must wait for upstream data products to finish processing before execution can begin.
To handle this cross-domain dependency coordination, Covestro extended SDLF with a custom dependency checker component that operates through both shared and domain-specific parts.
The shared components consist of two centralized Amazon DynamoDB tables located in a hub AWS account: one collecting successful pipeline execution logs from the domains, and another aggregating pipeline dependencies across the entire data mesh.
The domains deploy local components such as a dependency-tracking Amazon DynamoDB table and an AWS Step Functions state machine. The state machine checks prerequisites using the centralized execution logs and integrates seamlessly as the first step in every SDLF-deployed pipeline, without additional configuration. The following diagram shows the process.

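The prerequisite check itself can be sketched as follows: compare the latest successful run of each upstream pipeline, as recorded in the central execution-log table, against the current run's cutoff time. The table layout, pipeline names, and timestamps here are illustrative, not the actual SDLF schema.

```python
# Minimal sketch of the dependency check that runs as the first step of an
# SDLF-deployed pipeline. The dict stands in for the central DynamoDB
# execution-log table; names and timestamps are illustrative.
from datetime import datetime, timezone

# pipeline id -> timestamp of its last successful run
execution_log = {
    "domain_b.orders_refresh": datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc),
    "domain_b.customers_refresh": datetime(2024, 4, 30, 6, 0, tzinfo=timezone.utc),
}

def prerequisites_ready(dependencies: list, since: datetime) -> dict:
    """For each upstream pipeline, True if it succeeded at or after `since`."""
    return {
        dep: dep in execution_log and execution_log[dep] >= since
        for dep in dependencies
    }

cutoff = datetime(2024, 5, 1, 0, 0, tzinfo=timezone.utc)
status = prerequisites_ready(
    ["domain_b.orders_refresh", "domain_b.customers_refresh"], cutoff
)
print(status)
# {'domain_b.orders_refresh': True, 'domain_b.customers_refresh': False}
```

If any entry is False, the state machine can wait and retry instead of running the downstream transformation on stale inputs.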
To prevent circular dependencies that could deadlock the distributed orchestration system, Covestro implemented a detection mechanism using Amazon Neptune. DynamoDB Streams automatically replicate dependency changes from domain tables to the central registry, triggering an AWS Lambda function that uses the Gremlin graph traversal language (through pygremlin) to construct, update, and analyze a directed acyclic graph (DAG) of the pipeline relationships. Native Gremlin traversals detect circular dependencies and send automated notifications, as illustrated in the following diagram. This process continuously updates the graph to reflect any new pipeline dependencies or changes across the data mesh.

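The cycle check reduces to a standard graph problem. As a plain-Python stand-in for the Gremlin traversal on Neptune, the sketch below runs a depth-first search with node coloring over the aggregated dependency edges; the pipeline names are illustrative.

```python
# Plain-Python stand-in for the cycle detection the Lambda function performs
# with Gremlin on Neptune. Edges map a pipeline to its upstream dependencies.

def has_cycle(edges: dict) -> bool:
    """DFS-based cycle detection over a dependency graph (node -> upstreams)."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {node: WHITE for node in edges}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for dep in edges.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                return True               # back edge: cycle found
            if color.get(dep, WHITE) == WHITE and dep in edges and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in edges)

acyclic = {"a.p1": ["b.p2"], "b.p2": ["c.p3"], "c.p3": []}
cyclic = {"a.p1": ["b.p2"], "b.p2": ["a.p1"]}
print(has_cycle(acyclic), has_cycle(cyclic))  # False True
```

Running this on every dependency change, before the edge is accepted into the central registry, is what keeps the pipeline graph a true DAG.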
Operational excellence through infrastructure as code
Infrastructure as code (IaC) practices using AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK) significantly improve the operational efficiency of the data mesh implementation. The infrastructure code is version-controlled in GitHub repositories, providing full traceability and collaboration capabilities for data engineering teams. A dedicated deployment account uses AWS CodePipeline to orchestrate consistent deployments across multiple data mesh domains.
The centralized deployment model makes sure that infrastructure changes follow a standardized continuous integration and deployment (CI/CD) process, where code commits trigger automated pipelines that validate, test, and deploy infrastructure components to the appropriate domain accounts. Each data domain resides in its own separate set of AWS accounts (dev, qa, prod), and the centralized deployment pipeline respects these boundaries while enabling controlled infrastructure provisioning.
IaC enables the data mesh to scale horizontally when onboarding new domains, helping maintain consistent security, governance, and operational standards across the entire environment. Covestro provisions new domains quickly using proven templates, accelerating time-to-value for business teams.
Business impact and technical outcomes
The implementation of the data mesh architecture using Amazon DataZone and SDLF has delivered significant, measurable benefits across Covestro's organization:
Accelerated data pipeline development
- 70% reduction in time-to-market for new data products through standardized blueprints
- Successful migration of more than 1,000 data pipelines to the new architecture
- Automated pipeline creation without manual coding requirements
- Standardized development and sharing approach across domains
 
Enhanced data governance and quality
- Comprehensive business glossary implementation supporting consistent terminology
- Automated data quality checks integrated into pipelines
- End-to-end data lineage visibility across domains
- Standardized metadata management through Apache Iceberg integration
 
Improved data discovery and access
- Self-service data discovery portal through Amazon DataZone
- Streamlined cross-domain data sharing with appropriate security controls
- Reduced data duplication through improved visibility of existing assets
- Efficient management of cross-domain pipeline dependencies
 
Operational efficiency
- Reduced central data team bottlenecks through domain-oriented ownership
- Lower operational overhead through automated deployment processes
- Improved resource utilization through elimination of redundant data processing
- Enhanced monitoring and troubleshooting capabilities
 
The new infrastructure has fundamentally transformed how Covestro's teams interact with data, enabling business domains to operate autonomously while upholding enterprise-wide standards for quality and governance. This has created a more agile, efficient, and collaborative data ecosystem that supports both current needs and future growth.
What's next
As Covestro's data platform continues to evolve, the focus is now on helping domain teams effectively build data products for cross-domain analytics. In parallel, Covestro is actively working to improve data transparency with data lineage in Amazon DataZone through OpenLineage, supporting more comprehensive data traceability across a diverse set of processing tools and formats.
Conclusion
In this post, we showed you how Covestro transformed its data architecture by transitioning from a centralized data lake to a data mesh architecture, and how this foundation will prove invaluable in supporting its journey toward becoming a more data-driven organization. The experience demonstrates how modern data architectures, when properly implemented with the right tools and frameworks, can transform business operations and unlock new opportunities for innovation.
This implementation serves as a blueprint for other enterprises looking to modernize their data infrastructure while maintaining security, governance, and scalability. It shows that with careful planning and the right technology choices, organizations can successfully transition from centralized to distributed data architectures without compromising on control or quality.
For more on Amazon DataZone, see the Getting Started guide. To learn about SDLF, see Deploy and manage a serverless data lake on the AWS Cloud by using infrastructure as code.
About the authors