
SAP data ingestion and replication with AWS Glue zero-ETL


Organizations increasingly want to ingest and gain faster access to insights from SAP systems without maintaining complex data pipelines. AWS Glue zero-ETL with SAP now supports data ingestion and replication from SAP data sources such as Operational Data Provisioning (ODP) managed SAP Business Warehouse (BW) extractors, Advanced Business Application Programming (ABAP) Core Data Services (CDS) views, and other non-ODP data sources. Zero-ETL data replication and schema synchronization writes extracted data to AWS services such as Amazon Redshift, Amazon SageMaker lakehouse, and Amazon S3 Tables, alleviating the need for manual pipeline development. This creates a foundation for AI-driven insights when used with AWS services such as Amazon Q and Amazon Quick Suite, where you can use natural language queries to analyze SAP data, create AI agents for automation, and generate contextual insights across your enterprise data landscape.

In this post, we show how to create and monitor a zero-ETL integration with various ODP and non-ODP SAP sources.

Solution overview

The key component of SAP integration is the AWS Glue SAP OData connector, which is designed to work with SAP data structures and protocols. The connector provides connectivity to ABAP-based SAP systems and adheres to SAP security and governance frameworks. Key features of the AWS SAP connector include:

  • Uses the OData protocol for data extraction from various SAP NetWeaver systems
  • Managed replication for complex SAP data models such as BW extractors (for example, 2LIS_02_ITM) and CDS views (for example, C_PURCHASEORDERITEMDEX)
  • Handles both ODP and non-ODP entities using SAP change data capture (CDC) technology

The SAP connector works with both AWS Glue Studio and AWS managed replication with zero-ETL. Self-managed replication in AWS Glue Studio provides full control over data processing units, replication frequencies, price-performance tuning, page size, data filters, destinations, file formats, data transformation, and writing your own code with your chosen runtime. AWS managed data replication in zero-ETL removes the burden of custom configuration and provides an AWS managed alternative, supporting replication frequencies between 15 minutes and 6 days. The following solution architecture demonstrates the approaches for ingesting ODP and non-ODP SAP data using zero-ETL from various SAP sources and writing to Amazon Redshift, SageMaker lakehouse, and S3 Tables.

Change data capture for ODP sources

SAP ODP is a data extraction framework that enables incremental data replication from SAP source systems to target systems. The ODP framework allows applications (subscribers) to request data from supported objects, such as BW extractors, CDS views, and BW objects, in an incremental manner.

AWS Glue zero-ETL data ingestion begins with a full initial load of entity data to establish the baseline dataset in the target system. After the initial full load is complete, SAP provisions a delta queue called the Operational Delta Queue (ODQ), which captures data changes, including deletions. The delta token is sent to the subscriber during the initial load and persisted within the zero-ETL internal state management system.

The incremental processing retrieves the last saved delta token from the state store, then sends a delta change request to SAP with this token using the OData protocol. The system processes the returned INSERT/UPDATE/DELETE operations through the SAP ODQ mechanism and receives a new delta token from SAP, even in scenarios where no records have changed. This new token is persisted in the state management system after successful ingestion. In error scenarios, the system preserves the existing delta token state, enabling retries without data loss.
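The token lifecycle can be sketched in a few lines. This is a minimal illustration, not the actual zero-ETL implementation: `DeltaTokenStore` stands in for the internal state management system, and the `fetch_delta` callable stands in for the SAP OData delta request.

```python
# Minimal sketch of ODP delta-token handling (hypothetical names; the real
# zero-ETL state store and SAP OData calls are internal to AWS Glue).

class DeltaTokenStore:
    """In-memory stand-in for the zero-ETL internal state management system."""
    def __init__(self):
        self._tokens = {}

    def get(self, entity):
        return self._tokens.get(entity)

    def save(self, entity, token):
        self._tokens[entity] = token


def incremental_sync(entity, store, fetch_delta, apply_changes):
    """Run one incremental cycle: request changes since the last token,
    apply them, and persist the new token only on success."""
    token = store.get(entity)                        # last persisted delta token
    changes, new_token = fetch_delta(entity, token)  # OData delta request
    # If applying changes raises, the old token is left in place so the
    # cycle can be retried without losing changes still queued in the ODQ.
    apply_changes(changes)                           # INSERT/UPDATE/DELETE on the target
    store.save(entity, new_token)                    # checkpoint only after success
    return len(changes)
```

Note that a new token is checkpointed even when `changes` is empty, matching the behavior described above.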

The following screenshot illustrates a successful initial load followed by four incremental data ingestions on the SAP system.

Change data capture for non-ODP sources

Non-ODP structures are OData services that aren't ODP enabled. These are APIs, functions, views, or CDS views that are exposed directly without the ODP framework. Data is extracted using this mechanism; however, incremental data extraction depends on the nature of the object. If the object contains, for example, a "last modified date" field, that field is used to track changes and support incremental data extraction.

AWS Glue zero-ETL provides out-of-the-box incremental data extraction for non-ODP OData services, provided the entity includes a field to track changes (last modified date or time). For such SAP services, zero-ETL provides two approaches for data ingestion: timestamp-based incremental processing and full load.

Timestamp-based incremental processing

Timestamp-based incremental processing uses customer-configured timestamp fields in zero-ETL to optimize the data extraction process. The zero-ETL system establishes a starting timestamp that serves as the foundation for subsequent incremental processing operations. This timestamp, called the watermark, is crucial for maintaining data consistency. The query construction mechanism builds OData filters based on timestamp comparisons. These queries extract records that were created or modified since the last successful processing run. The system's watermark management functionality tracks the highest timestamp value from each processing cycle and uses it as the starting point for subsequent runs. The zero-ETL system performs an upsert on the target using the configured primary keys, which correctly handles updates while maintaining data integrity. After each successful target system update, the watermark timestamp is advanced, creating a reliable checkpoint for future processing cycles.
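A minimal sketch of this mechanism follows. The entity and timestamp field names are illustrative, and the filter shown is an assumption about the general OData `$filter` shape rather than the exact queries zero-ETL emits.

```python
# Sketch of timestamp-based incremental extraction: build an OData $filter
# from the current watermark, then advance the watermark to the highest
# timestamp seen in the batch. Names are illustrative only.

from datetime import datetime

def build_delta_query(entity_set, ts_field, watermark):
    """Return an OData query selecting records changed after the watermark."""
    literal = watermark.strftime("datetime'%Y-%m-%dT%H:%M:%S'")
    return f"{entity_set}?$filter={ts_field} gt {literal}"

def advance_watermark(records, ts_field, watermark):
    """New watermark = highest timestamp in this batch (unchanged if empty)."""
    if not records:
        return watermark
    return max(watermark, max(r[ts_field] for r in records))
```

Advancing the watermark only after a successful target upsert is what makes each cycle a reliable checkpoint.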

However, the timestamp-based approach has a limitation: it can't track physical deletions because SAP systems don't maintain deletion timestamps. In scenarios where timestamp fields are either unavailable or not configured, the system falls back to a full load with upsert processing.

Full load

The full load approach serves as both a standalone approach and a fallback mechanism when timestamp-based processing isn't feasible. This method involves extracting the complete entity dataset during each processing cycle, making it suitable for scenarios where change tracking isn't available or required. The extracted dataset is upserted into the target system. The upsert processing logic handles both new record insertions and updates to existing records.
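The upsert semantics can be illustrated with a small sketch: keyed on the configured primary keys, new keys are inserted and existing keys are overwritten. The function and field names here are hypothetical.

```python
# Sketch of full-load upsert semantics: every extracted record is merged
# into the target keyed on its primary key -- new keys are inserted,
# existing keys are overwritten. Purely illustrative.

def upsert(target, records, key_fields):
    """Merge records into target (a dict keyed by primary-key tuple)."""
    inserted = updated = 0
    for rec in records:
        key = tuple(rec[k] for k in key_fields)
        if key in target:
            updated += 1
        else:
            inserted += 1
        target[key] = rec
    return inserted, updated
```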

When to choose incremental or full load

The timestamp-based incremental processing approach offers optimal performance and resource utilization for large datasets with frequent updates. Data transfer volumes are reduced by the selective transfer of only changed records, cutting network traffic. This optimization directly translates into lower operational costs. The full load with upsert enables data synchronization in scenarios where incremental processing isn't feasible.

Together, these approaches form a complete solution for zero-ETL integration with non-ODP SAP structures, addressing the varying requirements of enterprise data integration scenarios. Organizations using these approaches should evaluate their specific use cases, data volumes, and performance requirements when choosing between the two. The following diagram illustrates the SAP data ingestion workflow.

Flowchart diagram showing a data replication process. Starts with 'Entity Selected for Replication' at the top, flows to 'Initial Snapshot' step, then branches based on a decision 'Entity supports ODP?' into three paths: left path shows 'ODP Setup' leading to 'ODP Incremental Processing', middle path shows 'Timestamp based Incremental Setup' leading to 'Timestamp based Incremental Processing', and right path shows 'Full Load Setup' leading to 'Full Load Processing'. Each processing path includes an 'Integration Active?' decision point that loops back if yes, or flows to 'Error Recovery' at the bottom if no. The diagram uses rounded rectangles for processes, diamonds for decisions, and arrows showing flow direction.

Observing SAP zero-ETL integrations

AWS Glue maintains state management, logs, and metrics using Amazon CloudWatch Logs. For instructions to configure observability, refer to Monitoring an integration. Make sure AWS Identity and Access Management (IAM) roles are configured for log delivery. The integration is monitored from both source ingestion and writing to the chosen target.

Monitoring source ingestion

The integration of AWS Glue zero-ETL with CloudWatch provides monitoring capabilities to track and troubleshoot data integration processes. Through CloudWatch, you can access detailed logs, metrics, and events that help identify issues, monitor performance, and maintain the operational health of your SAP data integrations. Let's look at a few examples of success and error scenarios.
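For example, failed ingestions can be isolated with a CloudWatch Logs filter pattern on the messageType field. The sketch below is a minimal illustration: the log group name passed to `fetch_failures` is a placeholder for whatever you configured during log delivery setup, and calling it requires AWS credentials.

```python
import json

def failure_filter_pattern():
    """CloudWatch Logs JSON filter pattern matching failed ingestions."""
    return '{ $.messageType = "IngestionFailed" }'

def failed_entities(events):
    """Extract (tableName, errorCode) pairs from raw log events."""
    out = []
    for event in events:
        msg = json.loads(event["message"])
        if msg.get("messageType") == "IngestionFailed":
            details = msg.get("details", {})
            out.append((details.get("tableName"), details.get("errorCode")))
    return out

def fetch_failures(log_group):
    """Pull matching events with boto3 (requires AWS credentials)."""
    import boto3
    logs = boto3.client("logs")
    resp = logs.filter_log_events(
        logGroupName=log_group,  # placeholder: your configured delivery group
        filterPattern=failure_filter_pattern(),
    )
    return failed_entities(resp["events"])
```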

Scenario 1: Missing permissions in your role

This error occurred during a data integration process in AWS Glue when attempting to access SAP data. The connection encountered a CLIENT_ERROR with a 400 Bad Request status code, indicating that the role is missing permissions:

{
    "eventTimestamp": 1755031897157,
    "integrationArn": "arn:aws:glue:us-east-2:012345678901:integration:1da4dccd-96ce-4661-8ef1-bf216623d65f",
    "sourceArn": "arn:aws:glue:us-east-2:012345678901:connection/SAPOData-sap-glue-dev",
    "level": "ERROR",
    "messageType": "IngestionFailed",
    "details": {
        "loadType": "",
        "errorMessage": "You do not have the necessary permissions to access the glue connection. Make sure that you have the correct IAM permissions to access AWS Glue resources.",
        "errorCode": "CLIENT_ERROR"
    }
}

Scenario 2: Broken delta links

The CloudWatch log indicates an issue with missing delta tokens during data synchronization from SAP to AWS Glue. The error occurs when attempting to access the SAP sales document item table FactsOfCSDSLSDOCITMDX through the OData service. The absence of delta tokens, which are needed for incremental data loading and change tracking, resulted in a CLIENT_ERROR (400 Bad Request) when the system tried to open the data extraction API RODPS_REPL_ODP_OPEN:

{
    "eventTimestamp": 1760700305466,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:f62e1971-092c-46a3-ba88-d32f4c6cd649",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/SAPOData-sap-glue-dev",
    "level": "ERROR",
    "messageType": "IngestionFailed",
    "details": {
        "tableName": "/sap/opu/odata/sap/Z_C_SALESDOCUMENTITEMDEX_SRV/FactsOfCSDSLSDOCITMDX",
        "loadType": "",
        "errorMessage": "Received an error from SAPOData: Could not open data access via extraction API RODPS_REPL_ODP_OPEN. Status code 400 (Bad Request).",
        "errorCode": "CLIENT_ERROR"
    }
}

Scenario 3: Client errors on SAP data ingestion

This CloudWatch log shows a client exception scenario where the SAP entity EntityOf0VENDOR_ATTR can't be located or accessed through the OData service. This CLIENT_ERROR occurs when the AWS Glue connector attempts to parse the response from the SAP system but fails, due to either the entity being non-existent in the source SAP system or the SAP instance being temporarily unavailable:

{
    "eventTimestamp": 1752676327649,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:9f1acbc0-599f-47d2-8e84-e9779976af59",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/SAPOData-sap-glue-dev",
    "level": "ERROR",
    "messageType": "IngestionFailed",
    "details": {
        "tableName": "/sap/opu/odata/sap/ZVENDOR_ATTR_SRV/EntityOf0VENDOR_ATTR",
        "loadType": "",
        "errorMessage": "Data read from source failed for entity /sap/opu/odata/sap/ZVENDOR_ATTR_SRV/EntityOf0VENDOR_ATTR using connector SAPOData; ErrorMessage: Glue connector returned client exception. The response from the connector application could not be parsed.",
        "errorCode": "CLIENT_ERROR"
    }
}

Monitoring target write

Zero-ETL employs different monitoring mechanisms depending on the target system. For Amazon Redshift targets, it uses the svv_integration system view, which provides detailed information about integration status, job execution, and data movement statistics. When working with SageMaker lakehouse targets, zero-ETL tracks integration states through the zetl_integration_table_state table, which maintains metadata about synchronization status, timestamps, and execution details. Additionally, you can use CloudWatch logs to monitor integration progress, capturing information about successful commits, metadata updates, and potential issues during the data writing process.

Scenario 1: Successful processing on SageMaker lakehouse target

The CloudWatch logs show successful data synchronization activity for the plant table using CDC mode. The first log entry (IngestionCompleted) confirms the successful completion of the ingestion process at timestamp 1757221555568, with a last sync timestamp of 1757220991999. The second log (IngestionTableStatistics) provides detailed statistics of the data modifications, showing that during this CDC sync 300 new records were inserted, 8 records were updated, and 2 records were deleted in the target database gluezetl. This level of detail helps in monitoring the volume and types of changes being propagated to the target system.

{
    "eventTimestamp": 1757221555568,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:b7a1c69a-e180-4d27-b71d-5fcf196d9d2d",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/mam301",
    "targetArn": "arn:aws:glue:us-east-1:012345678901:database/gluezetl",
    "level": "VERBOSE",
    "messageType": "IngestionCompleted",
    "details": {
        "tableName": "plant",
        "loadType": "CDC",
        "message": "Successfully completed ingestion",
        "lastSyncedTimestamp": 1757220991999,
        "consumedResourceUnits": "10"
    }
}

{
    "eventTimestamp": 1757222506936,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:b7a1c69a-e180-4d27-b71d-5fcf196d9d2d",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/mam301",
    "targetArn": "arn:aws:glue:us-east-1:012345678901:database/gluezetl",
    "level": "INFO",
    "messageType": "IngestionTableStatistics",
    "details": {
        "tableName": "plant",
        "loadType": "CDC",
        "insertCount": 300,
        "updateCount": 8,
        "deleteCount": 2
    }
}
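Statistics events like the one above can also be aggregated programmatically. The following is a minimal sketch that tallies change counts per table from already-parsed log event payloads (the event dicts mirror the JSON structure shown above):

```python
from collections import defaultdict

def tally_statistics(events):
    """Sum insert/update/delete counts per table from
    IngestionTableStatistics events (parsed JSON payloads)."""
    totals = defaultdict(lambda: {"insertCount": 0, "updateCount": 0, "deleteCount": 0})
    for msg in events:
        if msg.get("messageType") != "IngestionTableStatistics":
            continue  # skip IngestionCompleted, IngestionFailed, etc.
        details = msg["details"]
        table = totals[details["tableName"]]
        for key in ("insertCount", "updateCount", "deleteCount"):
            table[key] += details.get(key, 0)
    return dict(totals)
```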

Scenario 2: Metrics on Amazon SageMaker lakehouse target

The zetl_integration_table_state table in SageMaker lakehouse provides a view of integration status and data modification metrics. In this example, the table shows a successful integration for an SAP CDS view table with integration ID 62b1164f-5b85-45e4-b8db-9aa7ab841e98 in the testdb database. The record indicates that at timestamp 1733000485999, 10 records were processed during the most recent ingestion (recent_ingestion_record_count: 10), with no updates or deletions (both counts at 0). This table serves as a monitoring tool, offering a centralized view of integration states and detailed statistics about data modifications, making it easy to track and verify data synchronization activities in the lakehouse.

+---+--------------------------------------+-----------------+--------------------------------------------------------------+-------------+--------+------------------------+-------------------------------+----------------------------+----------------------------+----------------------------+
| # | integration_id                       | target_database | table_name                                                   | table_state | reason | last_updated_timestamp | recent_ingestion_record_count | recent_insert_record_count | recent_update_record_count | recent_delete_record_count |
+---+--------------------------------------+-----------------+--------------------------------------------------------------+-------------+--------+------------------------+-------------------------------+----------------------------+----------------------------+----------------------------+
| 2 | 62b1164f-5b85-45e4-b8db-9aa7ab841e98 | testdb          | _sap_opu_odata_sap_zcds_po_scl_new_srv_factsofzmmpurordsldex | SUCCEEDED   |        | 1733000485999          | 10                            | 0                          | 0                          | 0                          |
+---+--------------------------------------+-----------------+--------------------------------------------------------------+-------------+--------+------------------------+-------------------------------+----------------------------+----------------------------+----------------------------+

Scenario 3: Redshift monitoring system uses two views to track zero-ETL integration status

svv_integration provides a high-level overview of the integration status, showing that integration ID 03218b8a-9c95-4ec2-81ad-dd4d5398e42a has successfully replicated 18 tables with no failures, and the last checkpoint was at transaction sequence 1761289852999.

+--------------------------------------+-----------------+----------+-----------------+-------------+------------------------------------------+-------------------------+---------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+
| integration_id                       | target_database | source   | state           | current_lag | last_replicated_checkpoint               | total_tables_replicated | total_tables_failed | creation_time | refresh_interval | source_database | is_history_mode | query_all_states | truncatecolumns | accept_invchars |
+--------------------------------------+-----------------+----------+-----------------+-------------+------------------------------------------+-------------------------+---------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+
| 03218b8a-9c95-4ec2-81ad-dd4d5398e42a | test_case       | GlueSaaS | CdcRefreshState | 771754      | {"txn_seq":"1761289852999","txn_id":"0"} | 18                      | 0                   | 22:54.7       | 0                |                 | FALSE           | FALSE            | FALSE           | FALSE           |
+--------------------------------------+-----------------+----------+-----------------+-------------+------------------------------------------+-------------------------+---------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+

svv_integration_table_state offers table-level monitoring details, showing the status of individual tables within the integration. In this case, the SAP material group text entity table is in the Synced state, with its last replication checkpoint matching the integration checkpoint (1761289852999). The table currently shows 0 rows and 0 size, suggesting it's newly created.

+--------------------------------------+-----------------+-------------+-------------------------------------------------------------+-------------+------------------------------------------+--------+------------------------+------------+------------+-----------------+
| integration_id                       | target_database | schema_name | table_name                                                  | table_state | table_last_replicated_checkpoint         | reason | last_updated_timestamp | table_rows | table_size | is_history_mode |
+--------------------------------------+-----------------+-------------+-------------------------------------------------------------+-------------+------------------------------------------+--------+------------------------+------------+------------+-----------------+
| 03218b8a-9c95-4ec2-81ad-dd4d5398e42a | test_case       | public      | /sap/opu/odata/sap/ZMATL_GRP_1_SRV/EntityOf0MATL_GRP_1_TEXT | Synced      | {"txn_seq":"1761289852999","txn_id":"0"} |        | 23:03.8                | 0          | 0          | FALSE           |
+--------------------------------------+-----------------+-------------+-------------------------------------------------------------+-------------+------------------------------------------+--------+------------------------+------------+------------+-----------------+

Together, these views provide a comprehensive solution for monitoring both overall integration health and individual table synchronization status in Amazon Redshift.
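The same views can also be polled programmatically. The following sketch assumes the redshift_connector Python driver and valid cluster credentials; the SQL targets only the svv_integration and svv_integration_table_state views described above, and the helper names are hypothetical.

```python
# Sketch: poll Redshift zero-ETL monitoring views from Python.
# Assumes the redshift_connector driver and an open connection;
# all names besides the two system views are illustrative.

INTEGRATION_HEALTH_SQL = (
    "SELECT integration_id, state, total_tables_replicated, total_tables_failed "
    "FROM svv_integration"
)

TABLE_STATE_SQL = (
    "SELECT table_name, table_state, table_rows "
    "FROM svv_integration_table_state "
    "WHERE integration_id = %s"
)

def unsynced_table_count(rows):
    """Count tables not yet in the Synced state from TABLE_STATE_SQL rows."""
    return sum(1 for _name, state, _rows in rows if state != "Synced")

def fetch_table_states(conn, integration_id):
    """Run TABLE_STATE_SQL against an open redshift_connector connection."""
    cur = conn.cursor()
    cur.execute(TABLE_STATE_SQL, (integration_id,))
    return cur.fetchall()
```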

Prerequisites

In the following sections, we walk through the steps required to set up an SAP connection and use that connection to create a zero-ETL integration. Before implementing this solution, you must have the following in place:

  • An SAP account
  • An AWS account with administrator access
  • An S3 Tables target, with the S3 table bucket sap_demo_table_bucket associated as a location of the database
  • AWS Glue Data Catalog settings updated with the IAM policy required for fine-grained access control of the Data Catalog for zero-ETL
  • An IAM role named zero_etl_bulk_demo_role, to be used by zero-ETL to access data from your SAP account
  • A secret named zero_etl_bulk_demo_secret in AWS Secrets Manager to store SAP credentials

Create a connection to the SAP instance

To set up a connection to your SAP instance and provide data access, complete the following steps:

  1. On the AWS Glue console, in the navigation pane under Data catalog, choose Connections, then choose Create connection.
  2. For Data sources, select SAP OData, then choose Next.

  3. Enter the SAP instance URL.
  4. For IAM service role, choose the role zero_etl_bulk_demo_role (created as a prerequisite).
  5. For Authentication Type, choose the authentication type that you're using for SAP.
  6. For AWS Secret, choose the secret zero_etl_bulk_demo_secret (created as a prerequisite).
  7. Choose Next.

  8. For Name, enter a name, such as sap_demo_conn.
  9. Choose Next.

Create the zero-ETL integration

To create the zero-ETL integration, complete the following steps:

  1. On the AWS Glue console, in the navigation pane under Data catalog, choose Zero-ETL integrations, then choose Create zero-ETL integration.
  2. For Data source, select SAP OData, then choose Next.

  3. Choose the connection name and IAM role that you created in the previous step.
  4. Choose the SAP objects you want in your integration. Non-ODP objects are configured for either full load or incremental load, and ODP objects are automatically configured for incremental ingestion.
    1. For full load, leave Incremental update field set to No timestamp field selected.

    2. For incremental load, choose the edit icon for Incremental update field and choose a timestamp field.

    3. For ODP entities that provide a delta token, the incremental update field is pre-selected, and no customer action is necessary.

      When creating a new integration using the same SAP connection and entity in the data filter, you won't be able to select an incremental update field different from the first integration's.
  5. For Target details, choose sap_demo_table_bucket (created as a prerequisite).
  6. For Target IAM role, choose sap_demo_role (created as a prerequisite).
  7. Choose Next.

  8. In the Integration details section, for Name, enter sap-demo-integration.
  9. Choose Next.

  10. Review the details and choose Create and launch integration.

The newly created integration shows as Active within about a minute.

Clean up

To clean up your resources, complete the following steps. This process permanently deletes the resources created in this post; back up important data before proceeding.

  1. Delete the zero-ETL integration sap-demo-integration.
  2. Delete the S3 Tables target bucket sap_demo_table_bucket.
  3. Delete the Data Catalog connection sap_demo_conn.
  4. Delete the Secrets Manager secret zero_etl_bulk_demo_secret.

Conclusion

You can now transform your SAP data analytics without the complexity of traditional ETL processes. With AWS Glue zero-ETL, you can gain immediate access to your SAP data while maintaining its structure across S3 Tables, SageMaker lakehouse, and Amazon Redshift. Your teams can use ACID-compliant storage with time travel capabilities, schema evolution, and concurrent reads/writes at scale, while keeping data in cost-effective cloud storage. The solution's AI capabilities through Amazon Q and SageMaker can help your business create on-demand data products, run text-to-SQL queries, and deploy AI agents using Amazon Bedrock and Quick Suite.

To learn more, refer to the following resources:

Ready to modernize your SAP data strategy? Explore AWS Glue zero-ETL and enrich your organization's data analytics capabilities.


About the authors

Shashank Sharma

Shashank is an Engineering Leader with over 15 years of experience delivering data integration and replication solutions for first-party and third-party databases and SaaS for enterprise customers. He leads engineering for AWS Glue Zero-ETL and Amazon AppFlow.

Parth Panchal

Parth is an experienced Software Engineer with over 10 years of development experience, specializing in AWS Glue zero-ETL and SAP data integration solutions. He excels at diving deep into complex data replication challenges, delivering scalable solutions while maintaining high standards for performance and reliability.

Diego Lombardini

Diego is an experienced Enterprise Architect with over 20 years' experience across SAP technologies, specializing in SAP innovation and data and analytics. He has worked both as a partner and as a customer, giving him a complete perspective on what it takes to sell, implement, and run systems and organizations. He's passionate about technology and innovation, focusing on customer outcomes and delivering business value.

Abhijeet Jangam

Abhijeet is a Data and AI leader with 20 years of SAP techno-functional experience leading strategy and delivery across multiple industries. With dozens of SAP implementations behind him, he brings broad functional process knowledge together with deep technical expertise in application development, data engineering, and integrations.
