
Introducing AWS Glue 5.1 for Apache Spark


AWS Glue is a serverless, scalable data integration service that makes it easy to discover, prepare, move, and integrate data from multiple sources. AWS recently announced AWS Glue 5.1, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 5.1 upgrades the Spark engine to Apache Spark 3.5.6, giving you a newer Spark release together with newer dependent libraries so you can develop, run, and scale your data integration workloads and get insights faster.

In this post, we describe what's new in AWS Glue 5.1, key highlights of Spark and related libraries, and how you can get started with AWS Glue 5.1.

What’s new in AWS Glue 5.1

The following updates are included in AWS Glue 5.1:

Runtime and library upgrades

AWS Glue 5.1 upgrades the runtime to Spark 3.5.6, Python 3.11, and Scala 2.12.18 with new improvements from the open source versions. AWS Glue 5.1 also updates support for open table format libraries to Apache Hudi 1.0.2, Apache Iceberg 1.10.0, and Delta Lake 3.3.2 so you can solve advanced use cases around performance, cost, governance, and privacy in your data lakes.

Support for new Apache Iceberg features

AWS Glue 5.1 adds support for Apache Iceberg materialized views and Apache Iceberg format version 3.0. AWS Glue 5.1 also adds support for data writes into Iceberg and Hive tables with Spark-native fine-grained access control with AWS Lake Formation.

Apache Iceberg materialized views are especially helpful when you need to accelerate frequently run queries on large data sets by pre-computing expensive aggregations. To learn more about Apache Iceberg materialized views, refer to Introducing Apache Iceberg materialized views in AWS Glue Data Catalog.

Apache Iceberg format version 3.0 is the latest Iceberg format version defined in the Iceberg Table Spec. The following features are supported:

Create an Iceberg V3 format table

To create an Iceberg V3 format table, set the format-version table property to 3 when creating the table. The following is a sample PySpark script (replace amzn-s3-demo-bucket with your S3 bucket name):

from pyspark.sql import SparkSession

s3bucket = "amzn-s3-demo-bucket" 
database = "glue51_blog_demo" 
table_name = "iceberg_v3_table_demo"

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.defaultCatalog", "glue_catalog")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.type", "glue")
    .config("spark.sql.catalog.glue_catalog.warehouse", f"s3://{s3bucket}/{database}/{table_name}/")
    .getOrCreate()
)

spark.sql(f"CREATE DATABASE IF NOT EXISTS {database}")

# Create Iceberg table with V3 format-version
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {database}.{table_name} (
        id int,
        name string,
        age int,
        created_at timestamp
    ) USING iceberg
    TBLPROPERTIES (
        'format-version'='3',
        'write.delete.mode'='merge-on-read'
    )
""")

To migrate from the V2 format to V3, use ALTER TABLE ... SET TBLPROPERTIES to update the format-version. The following is a sample PySpark script:

spark.sql(f"ALTER TABLE {database}.{table_name} SET TBLPROPERTIES ('format-version'='3')")

You cannot roll back from V3 to V2, so you must verify that all your Iceberg clients support the Iceberg V3 format version before upgrading. Once upgraded, older versions cannot correctly read newer format versions, because Iceberg table format versions are not forward-compatible.
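Before and after migrating, you can confirm a table's current format version by listing its properties. The following is a minimal sketch that assumes the SparkSession, database, and table from the earlier example and requires a running Spark session with the Iceberg catalog configured:

```python
# List table properties and inspect the current format-version,
# so you can confirm the table state before migrating clients to V3.
props = spark.sql(f"SHOW TBLPROPERTIES {database}.{table_name}")
props.filter("key = 'format-version'").show()
```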

Create a table with Row Lineage tracking enabled

To create a table with Row Lineage tracking enabled, set the table property row-lineage to true. The following is a sample PySpark script:

# Create Iceberg table with row lineage tracking
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {database}.{table_name} (
        id int,
        name string,
        age int,
        created_at timestamp
    ) USING iceberg
    TBLPROPERTIES (
        'format-version'='3',
        'row-lineage'='true',
        'write.delete.mode'='merge-on-read'
    )
""")

In tables with Row Lineage tracking enabled, row IDs are managed at the metadata level to track row changes over time and support auditing.
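You can inspect the lineage fields through Iceberg's metadata columns. The following sketch assumes the table created above and that your Iceberg client exposes the `_row_id` and `_last_updated_sequence_number` metadata columns defined in the V3 table spec; it requires a running Spark session with the Iceberg catalog configured:

```python
# Query row lineage metadata columns alongside the data columns.
# _row_id is a stable per-row identifier; _last_updated_sequence_number
# records the snapshot sequence in which the row was last modified.
spark.sql(f"""
    SELECT id, name, _row_id, _last_updated_sequence_number
    FROM {database}.{table_name}
""").show()
```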

Extended support for AWS Lake Formation permissions

Fine-grained access control with Lake Formation has been supported through native Spark DataFrames and Spark SQL for read operations since Glue 5.0. Glue 5.1 extends fine-grained access control to write operations.

Full-Table Access (FTA) control in Apache Spark was introduced for Apache Hive and Iceberg tables in Glue 5.0. Glue 5.1 extends FTA support to Apache Hudi tables and Delta Lake tables.

S3A by default

AWS Glue 5.1 uses S3A as the default S3 connector. This change aligns with the recent Amazon EMR adoption of S3A as the default connector and brings enhanced performance and advanced features to Glue workloads. For more details about the S3A connector's capabilities and optimizations, see Optimize Amazon EMR runtime for Apache Spark with EMR S3A.

Note that when migrating from Glue 5.0 to Glue 5.1, if neither spark.hadoop.fs.s3a.endpoint nor spark.hadoop.fs.s3a.endpoint.region is set, the default Region used by S3A is us-east-2. This may cause issues. To mitigate issues caused by this change, set the spark.hadoop.fs.s3a.endpoint.region Spark configuration when using the S3A file system in AWS Glue 5.1.
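For example, a job whose data lives in us-west-2 could pin the S3A Region explicitly when building the session. This is a minimal sketch; the Region and bucket path are placeholder assumptions, not values from this post:

```python
from pyspark.sql import SparkSession

# Pin the S3A endpoint Region explicitly so the connector does not
# fall back to the us-east-2 default when no endpoint is configured.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.endpoint.region", "us-west-2")
    .getOrCreate()
)

df = spark.read.parquet("s3a://amzn-s3-demo-bucket/input/")
```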

Dependent library upgrades

AWS Glue 5.1 upgrades the runtime to Spark 3.5.6, Python 3.11, and Scala 2.12.18 with upgraded dependent libraries.

The following table lists dependency upgrades:

| Dependency | Version in AWS Glue 5.0 | Version in AWS Glue 5.1 |
|---|---|---|
| Spark | 3.5.4 | 3.5.6 |
| Hadoop | 3.4.1 | 3.4.1 |
| Scala | 2.12.18 | 2.12.18 |
| Hive | 2.3.9 | 2.3.9 |
| EMRFS | 2.69.0 | 2.73.0 |
| Arrow | 12.0.1 | 12.0.1 |
| Iceberg | 1.7.1 | 1.10.0 |
| Hudi | 0.15.0 | 1.0.2 |
| Delta Lake | 3.3.0 | 3.3.2 |
| Java | 17 | 17 |
| Python | 3.11 | 3.11.14 |
| boto3 | 1.34.131 | 1.40.61 |
| AWS SDK for Java | 2.29.52 | 2.35.5 |
| AWS Glue Data Catalog Client | 4.5.0 | 4.9.0 |
| EMR DynamoDB Connector | 5.6.0 | 5.7.0 |

The following are database connector (JDBC driver) upgrades:

| Driver | Connector version in AWS Glue 5.0 | Connector version in AWS Glue 5.1 |
|---|---|---|
| MySQL | 8.0.33 | 8.0.33 |
| Microsoft SQL Server | 10.2.0 | 10.2.0 |
| Oracle Database | 23.3.0.23.09 | 23.3.0.23.09 |
| PostgreSQL | 42.7.3 | 42.7.3 |
| Amazon Redshift | redshift-jdbc42-2.1.0.29 | redshift-jdbc42-2.1.0.29 |

The following are Spark connector upgrades:

| Connector | Connector version in AWS Glue 5.0 | Connector version in AWS Glue 5.1 |
|---|---|---|
| Amazon Redshift | 6.4.0 | 6.4.2 |
| OpenSearch | 1.2.0 | 1.2.0 |
| MongoDB | 10.3.0 | 10.3.0 |
| Snowflake | 3.0.0 | 3.1.1 |
| BigQuery | 0.32.2 | 0.32.2 |
| Azure Cosmos | 4.33.0 | 4.33.0 |
| Azure SQL | 1.3.0 | 1.3.0 |
| Vertica | 3.3.5 | 3.3.5 |

Get started with AWS Glue 5.1

You can start using AWS Glue 5.1 through AWS Glue Studio, the AWS Glue console, the latest AWS SDK, and the AWS Command Line Interface (AWS CLI).

To start using AWS Glue 5.1 jobs in AWS Glue Studio, open the AWS Glue job and, on the Job Details tab, choose the version Glue 5.1 – Supports Spark 3.5, Scala 2, Python 3.

To start using AWS Glue 5.1 in an AWS Glue Studio notebook or an interactive session through a Jupyter notebook, set 5.1 in the %glue_version magic:
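For example, run the magic in a notebook cell before starting the session:

```
%glue_version 5.1
```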

The following output shows that the session is set to use AWS Glue 5.1:

Setting Glue model to: 5.1

Spark Troubleshooting with Glue 5.1

To accelerate Apache Spark troubleshooting and job performance optimization for your Glue 5.1 ETL jobs, you can use the newly launched Apache Spark troubleshooting agent. Traditional Spark troubleshooting requires extensive manual analysis of logs, performance metrics, and error patterns to identify root causes and optimization opportunities. The agent simplifies this process through natural language prompts, automated workload analysis, and intelligent code recommendations. The agent has three main components: an MCP-compatible AI assistant in your development environment for interaction, the MCP proxy for AWS that handles secure communication between your client and the MCP server, and an Amazon SageMaker Unified Studio managed MCP server (preview) that provides specialized Spark troubleshooting and upgrade tools for Glue 5.1 jobs.

To set up the agent, follow the instructions to configure the resources and MCP configuration: Setup for Apache Spark Troubleshooting agent. Then you can launch your preferred MCP client and use conversation to interact with the troubleshooting tools.

The following is a demonstration of how you can use the Apache Spark troubleshooting agent with Kiro CLI to debug a Glue 5.1 job run.

For more information and video walkthroughs on how to use the Apache Spark troubleshooting agent, refer to Apache Spark Troubleshooting agent for Amazon EMR.

Conclusion

In this post, we discussed the key features and benefits of AWS Glue 5.1. You can create new AWS Glue jobs on AWS Glue 5.1 or migrate your existing AWS Glue jobs to benefit from the improvements.

We would like to thank the numerous engineers and leaders who helped build Glue 5.1 to provide customers with a performance-optimized Spark runtime and deliver new capabilities.


About the authors

Chiho Sugimoto

Chiho is a Cloud Support Engineer on the AWS Big Data Support team. She is passionate about helping customers build data lakes using ETL workloads. She loves planetary science and enjoys studying the asteroid Ryugu on weekends.

Noritaka Sekiyama

Noritaka is a Principal Big Data Architect on the AWS Analytics product team. He is responsible for designing new features in AWS products, building software artifacts, and providing architecture guidance to customers. In his spare time, he enjoys cycling on his road bike.

Peter Tsai

Peter is a Software Development Engineer at AWS, where he enjoys solving challenges in the design and performance of the AWS Glue runtime. In his leisure time, he enjoys hiking and cycling.

Bo Li

Bo Li is a Senior Software Development Engineer on the AWS Glue team. He is dedicated to designing and building end-to-end solutions to address customers' data analytics and processing needs with cloud-based, data-intensive, and generative AI technologies.

Kartik Panjabi

Kartik is a Software Development Manager on the AWS Glue team. His team builds generative AI solutions for data integration and distributed systems for data integration.

Peter Manastyrny

Peter is a Product Manager focusing on data processing and data integration workloads at AWS. He is working on making AWS Glue the best tool for building and running complex integrated data pipelines.
