We’re excited to announce the Public Preview of Apache Iceberg™ support in Databricks, unlocking the full Apache Iceberg and Delta Lake ecosystems with Unity Catalog. This Preview introduces two new capabilities to Unity Catalog. First, you can now read and write Managed Iceberg tables using Databricks or external Iceberg engines via Unity Catalog’s Iceberg REST Catalog API. Powered by Predictive Optimization, these tables automatically run advanced table operations, including Liquid Clustering, to deliver out-of-the-box fast query performance and storage efficiency. Managed Iceberg tables are also integrated with advanced features across the Databricks platform, including DBSQL, Mosaic AI, Delta Sharing, and Materialized Views. Second, as part of Lakehouse Federation, Unity Catalog now lets you seamlessly access and govern Iceberg tables managed by foreign catalogs such as AWS Glue, Hive Metastores, and Snowflake Horizon Catalog.
With these new capabilities, you can connect to Unity Catalog from any engine and access all of your data, across catalogs and regardless of format, breaking data silos and resolving ecosystem incompatibilities. In this blog, we will cover:
- Identifying the new data silos
- Using Unity Catalog as a fully open Iceberg catalog
- Extending UC governance to the entire Lakehouse
- Our vision for the future of open table formats
The new data silos
New data silos have emerged alongside two foundational components of the Lakehouse: open table formats and data catalogs. Open table formats enable ACID transactions on data stored in object storage. Delta Lake and Apache Iceberg, the two leading open table formats, developed connector ecosystems across a wide range of open source frameworks and commercial platforms. However, most popular platforms adopted only one of the two standards, forcing customers to choose engines when choosing a format.
Catalogs introduce additional challenges. A core responsibility of a catalog is managing a table’s current metadata files across writers and readers. However, some catalogs restrict which engines are allowed to write them. Even if you manage to store all of your data in a format supported by all of your engines, you may still not be able to use your engine of choice because it cannot connect to your catalog. This vendor lock-in forces customers to fragment data discovery and governance across disparate catalogs.
Over the next two sections, we will cover how Unity Catalog uses open standards and catalog federation to resolve format and catalog incompatibilities.
A Fully Open Iceberg Catalog
Unity Catalog breaks format silos through open standards. Now in Public Preview, you can use Databricks and external engines to write Iceberg tables managed by Unity Catalog. Managed Iceberg tables are fully open to the entire Iceberg ecosystem via Unity Catalog’s implementation of the Iceberg REST Catalog APIs. The REST Catalog is an open API specification that provides a standard interface for interacting with Iceberg tables. Unity Catalog was an early adopter of the REST Catalog, first launching support in 2023. This Preview builds on that foundation. Now, virtually any Iceberg client compatible with the REST spec, such as Apache Spark™, Apache Flink, or Trino, can read and write to Unity Catalog.
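To make this concrete, here is a minimal sketch of pointing an external Apache Spark cluster at Unity Catalog’s Iceberg REST Catalog. The workspace host, token, catalog name, and endpoint path below are placeholder assumptions; check the Databricks documentation for the exact values for your workspace.

```python
from pyspark.sql import SparkSession

# Sketch: register Unity Catalog's Iceberg REST Catalog endpoint as an
# Iceberg catalog named "uc". Host, token, and warehouse are placeholders.
spark = (
    SparkSession.builder.appName("uc-iceberg-rest")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-host>/api/2.1/unity-catalog/iceberg-rest")
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    .config("spark.sql.catalog.uc.warehouse", "<uc-catalog-name>")
    .getOrCreate()
)

# Reads and writes now go through Unity Catalog, which enforces governance
# and commits table metadata on Spark's behalf.
spark.sql("CREATE TABLE IF NOT EXISTS uc.demo.events (id BIGINT, ts TIMESTAMP)")
spark.sql("INSERT INTO uc.demo.events VALUES (1, current_timestamp())")
spark.table("uc.demo.events").show()
```

The catalog properties (`type`, `uri`, `token`, `warehouse`) are standard Iceberg REST client settings, so the same pattern applies to other REST-compatible engines.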
We plan to store all of our data in an open format and want a single catalog that can connect to all of the tools we use. Unity Catalog allows us to write Iceberg tables that are fully open to any Iceberg client, unlocking the entire Lakehouse ecosystem and future-proofing our architecture.
— Hen Ben-Hemo, Data Platform Architect
With Managed Iceberg, you can bring Unity Catalog governance to the Iceberg ecosystem, even with OSS tools like PyIceberg that don’t natively support authorization. Unity Catalog lets you build data pipelines that span the entire Lakehouse ecosystem. For example, Apache Iceberg offers a universal sink connector for writing from Kafka to Iceberg tables. You can use Kafka Connect to write Iceberg tables to Unity Catalog and downstream use Databricks’ best-in-class price-performance for ETL, data warehousing, and machine learning.
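As a hedged illustration of server-side governance, here is what reading a Managed Iceberg table with PyIceberg might look like; access control happens in Unity Catalog behind the REST API, not in the client. The host, token, and table identifiers are placeholders.

```python
from pyiceberg.catalog import load_catalog

# Sketch: PyIceberg has no built-in authorization layer, so Unity Catalog
# enforces access control behind the REST API. Identifiers are placeholders.
catalog = load_catalog(
    "uc",
    type="rest",
    uri="https://<workspace-host>/api/2.1/unity-catalog/iceberg-rest",
    token="<personal-access-token>",
    warehouse="<uc-catalog-name>",
)

# Loading and scanning succeed only if the caller has been granted the
# corresponding privileges on the table in Unity Catalog.
table = catalog.load_table("demo.events")
print(table.scan(limit=5).to_arrow())
```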
All Managed Tables automatically deliver best-in-class read performance and storage optimization using Predictive Optimization. Predictive Optimization automatically expires old snapshots, deletes unreferenced files, and incrementally clusters your data using Liquid Clustering. In our Kafka example, this prevents the performance degradation commonly caused by the proliferation of small files. You can keep your Iceberg tables healthy and performant without the hassle of managing table maintenance yourself.
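As an illustrative sketch (the catalog and schema names are placeholders, and the `USING ICEBERG` clause follows the Public Preview documentation pattern, so verify it against the docs), creating a Managed Iceberg table from a Databricks notebook requires no maintenance configuration at all:

```python
# Sketch (Databricks notebook): create a Managed Iceberg table. Names are
# placeholders; USING ICEBERG is the preview syntax for managed Iceberg.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.clickstream (
        user_id BIGINT,
        url     STRING,
        ts      TIMESTAMP
    ) USING ICEBERG
""")

# No OPTIMIZE or VACUUM jobs to schedule: Predictive Optimization expires
# snapshots, removes unreferenced files, and applies Liquid Clustering
# automatically as the table grows.
```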
Managed Iceberg tables are integrated with the Databricks platform, allowing you to leverage these tables with advanced platform features such as DBSQL, Mosaic AI, Delta Sharing, and Materialized Views. Beyond Databricks, Unity Catalog supports a partner ecosystem to securely land data in Iceberg using external tools. For example, Redpanda ingests streaming data produced to Kafka topics through Unity Catalog’s Iceberg REST Catalog API:
With Unity Catalog Managed Iceberg Tables and the Iceberg REST Catalog, Redpanda can now stream the largest, most demanding Kafka workloads directly into Iceberg tables that are optimized by Unity Catalog, unlocking out-of-the-box discoverability and fast query performance on arbitrary streams. With push-button configuration, all real-time streaming data is now fully available to the Iceberg ecosystem, so customers can be confident that their architecture is built to last, no matter how their stack evolves.
— Matthew Schumpert, Head of Product, Platform
We’re excited to have the following launch partners on board: Atlan, Buf, CelerData, ClickHouse, dbt Labs, dltHub, Fivetran, Informatica, PuppyGraph, Redpanda, RisingWave, StreamNative, and more.
The Lakehouse Catalog
With Unity Catalog, you can interoperate not only across table formats, but also across catalogs. Now also in Public Preview, you can seamlessly query and govern Iceberg tables managed by external catalogs such as AWS Glue, Hive Metastores, and Snowflake Horizon Catalog. Extending Hive Metastore and AWS Glue Federation, these connectors let you mount entire catalogs within Unity Catalog, creating a unified interface for data discovery and governance.
Federation provides a seamless way to leverage Unity Catalog’s advanced features on Iceberg tables managed by foreign catalogs. You can apply Databricks’ fine-grained access controls, lineage, and auditing to all of your data, across catalogs and regardless of format.
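As a hypothetical sketch, once a foreign catalog is mounted it can be queried, and joined with native tables, like any other Unity Catalog object; the catalog, schema, and table names below are placeholders.

```python
# Sketch (Databricks notebook): "glue_federated" is a foreign catalog mounted
# through Lakehouse Federation; "main" is a native Unity Catalog catalog.
orders = spark.table("glue_federated.sales.orders")   # Iceberg table in Glue
customers = spark.table("main.analytics.customers")   # Delta table in UC

# One governed query across both catalogs and both formats; Unity Catalog
# applies access controls, lineage, and auditing to each table it touches.
(orders.join(customers, "customer_id")
       .groupBy("region")
       .count()
       .show())
```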
Unity Catalog allows Rippling ML engineers and data scientists to seamlessly access Iceberg tables in existing OLAP warehouses with zero copy. This helps us lower costs, create consistent sources of truth, and reduce data refresh latency, all while maintaining high standards for data access and privacy across the entire data lifecycle.
— Albert Strasheim, Chief Technology Officer
With federation, Unity Catalog can govern the entirety of your Lakehouse – across all of your tables, AI models, files, notebooks, and dashboards.
The Future of Table Formats
Unity Catalog is pushing the industry closer to realizing the simplicity, flexibility, and lower costs of the open data lakehouse. At Databricks, we believe we can advance the industry even further – with a single, unified open table format. Delta Lake and Apache Iceberg share much of the same design, but subtle differences cause large incompatibilities for customers. To resolve these shared problems, the Delta and Apache Iceberg communities are aligning concepts and contributions, unifying the Lakehouse ecosystem.
Iceberg v3 is a major step toward this vision. Iceberg v3 includes key features such as Deletion Vectors, the Variant data type, Row IDs, and geospatial data types that share identical implementations with Delta Lake. These improvements let you move data and delete files between formats easily, without rewriting petabytes of data.
In future Delta Lake and Apache Iceberg releases, we want to build on this foundation so that Delta and Iceberg clients can use the same metadata and thus share tables directly. With these investments, customers can realize the original goal of the open data lakehouse – a fully integrated platform for data and AI on a single copy of data.
Managed and Foreign Iceberg tables are now available in Public Preview. Check out our documentation to get started! Replay our announcements from Data + AI Summit, June 9-12, 2025, to learn more about our newest Iceberg features and the future of open table formats.