HomeBig DataIntroducing SQL Saved Procedures in Databricks

Introducing SQL Saved Procedures in Databricks


Managing repetitive SQL duties—like cleansing information, updating enterprise guidelines, or operating batch logic will be tedious and error-prone when you’re copying and pasting code. 

Now, SQL Saved Procedures in Databricks allow you to retailer that logic as soon as, run it everytime you want, and maintain it ruled below Unity Catalog. 

Whether or not you are cleansing up information earlier than analytics, updating tables based mostly on enterprise standards, or shifting workloads from a legacy enterprise information warehouse, saved procedures make the method less complicated, extra constant, and simpler to keep up.

Databricks helps open requirements and interoperability, avoiding proprietary or vendor-specific implementations. SQL Saved Procedures observe the ANSI/PSM SQL normal and will probably be contributed to open supply Apache Spark™.

Procedures are extensively utilized in administrative duties, information administration, and ETL workflows—particularly in enterprise information warehouses (EDWs). For purchasers shifting from EDWs to Databricks, present saved procedures will be migrated with out rewriting, making the transition less complicated. And as all the time, the very best information warehouse is a lakehouse.

For certainly one of our essential use circumstances round buyer segmentation, we leveraged SQL Saved Procedures with DBSQL to attain higher efficiency, scalability, and price effectivity. Being conversant in SQL helped us implement and deploy the answer to manufacturing in a really brief time. Utilizing Saved Procedures has enabled us to handle complicated logic extra successfully whereas preserving the general structure streamlined and maintainable. —SambaSiva Rao, Sr. Information Engineer/Architect, ClicTechnologies

Overview of SQL Saved Procedures

What are Saved Procedures?

Throughout information processing workflows, prospects can battle to keep up consistency and efficiency of repetitive duties and complicated logic. Saved procedures are a fantastic method in these circumstances, guaranteeing information is processed in a constant, standardized means, and efficiency is perfect. 

For information cleansing duties, procedures can apply transformations similar to changing inconsistent date codecs right into a standardized construction, trimming main and trailing whitespaces from textual content fields, and changing or correcting faulty values. This ensures that your information is ready for downstream evaluation. See the detailed ETL instance beneath.

On the info administration aspect, saved procedures can effectively replace desk values based mostly on outlined enterprise guidelines—similar to flagging outdated data, recalculating fields, or synchronizing information throughout associated tables. By encapsulating these operations into procedures, groups can guarantee constant execution, scale back guide intervention, and enhance information high quality at scale. See the detailed information administration instance beneath, utilizing saved procedures to replace a loyalty/membership program.

So what are Procedures? They’re pre-compiled collections of SQL statements that permit a person to handle their SQL logic right into a single, reusable unit. Procedures are saved in Unity Catalog, which means they’re ruled and absolutely encapsulate permissions. When a saved process is named, the database executes these pre-defined operations, providing the advantage of enhanced safety, simplified upkeep of complicated workloads, and the potential for improved efficiency.

What’s supported in Public Preview?

There are 5 core instructions that help procedures: CREATE, CALL, DESCRIBE, SHOW, and DROP.

  • CREATE PROCEDURE: Outline and retailer a brand new saved process inside Unity Catalog. It specifies the process’s identify, parameters (if any), and the SQL statements to be executed when the process is named.
  • CALL PROCEDURE: Execute a beforehand created saved process; passing in any required parameters.
  • DESCRIBE PROCEDURE: Return the essential metadata about an present process, similar to its identify and parameters. With EXTENDED, the outline will embody extra metadata together with proprietor, datetime it was created, safety kind, and many others.
  • SHOW PROCEDURES: Checklist all saved procedures out there inside the present catalog schema.
  • DROP PROCEDURE: Delete an present process from storage. 

When making a process, you should utilize a number of parameter varieties to regulate the enter and output. 

  • `IN` parameter: Used to cross values right into a process as enter. For instance, you would possibly cross in a buyer ID to retrieve or course of solely that buyer’s information. The process can learn however not modify these values.
  • `OUT` parameter: Used for returning values from a process, after being assigned. For instance, you would possibly cross in a buyer ID, and return their account standing or computed gross sales complete, for additional processing exterior the process.
  • `INOUT` parameter: Serves a twin goal, permitting a price to be handed right into a process, modified inside it, and the up to date worth returned. It capabilities as each an enter and an output. 

These parameters will be assigned to:

  • Native variables, declared inside a script/process
  • Session variables, declared inside the session, and even exterior of the script/process

The logic encapsulated inside a SQL Saved Process is constructed on prime of SQL Scripting. A saved process will be thought of as a reusable script with parameters, ruled by Unity Catalog. You may examine Scripting in these two introductory blogs:

Nested and recursive process calls are supported, which means that prospects can manage their items of labor or enterprise logic conveniently into separate procedures, making the whole SQL execution move extra modular. This improves readability and upkeep.

How are Procedures totally different from Capabilities?

Procedures are grouped with Capabilities in Unity Catalog within the UI. Nonetheless, procedures and capabilities, whereas letting you reuse SQL logic, serve totally different functions.

A perform is used to return a price or a desk. It should be used inside a SQL question and can’t embody dynamic SQL or procedural logic. A process, against this, is used to execute a sequence of SQL statements. It will probably embody management move, variables, loops, and dynamic SQL utilizing IDENTIFIER and EXECUTE IMMEDIATE. You name a process as a standalone command, sometimes to carry out a activity or workflow.

Examples of utilizing SQL Saved Procedures

Now that we’ve lined the capabilities of SQL Saved Procedures, let’s discover some examples to show their worth and the issues they assist resolve.

You should utilize this pocket book to observe alongside – it comprises all examples from this publish, in addition to information preparation instructions.

ETL – Information Cleansing: Making ready Silver or Gold layer tables

In the event you observe the standard medallion structure, you realize shifting information from Bronze to Silver (or Silver to Gold) can require cleansing, remodeling, aggregating, and formatting information. Saved Procedures are nice for managing repetitive processes like these inside an ETL workflow. 

On this ETL situation, a Process is used to:

  • Load information from uncooked right into a truth desk
  • Choose information based mostly on date vary and origin of sale (net, cellular, point-of-sale, 3p-vendor)
  • Clear and format the info by changing dates to a selected format and eradicating whitespaces
  • As soon as clear, loading the info right into a ‘clear’ desk
  • Add a log file based mostly on timestamp, to-from dates, and origin of sale
  • Utilizing the Process, search for the mobileApp gross sales for June 2025

Procedures like this assist standardize information merchandise. Any person of this process will produce information in the identical construction, whatever the date vary or point-of-sale. This can be a major advantage of reusing code. Code reuse will naturally be much less error inclined as the identical logic will get executed each time.

Information Administration: Replace a desk based mostly on enterprise standards

Information administration is the observe of guaranteeing your information is correct, constant, and effectively accessed—qualities which can be important for any group aiming to make data-driven selections. With out sturdy information administration, even essentially the most superior analytics or reporting efforts will be undermined by unreliable or inconsistent info. 

Let’s look at one instance discovered throughout business industries the place it is not uncommon for a enterprise to determine a loyalty program to supply prospects with advantages based mostly on their tier. Airways have frequent flier packages and most retail franchises have rewards packages, and many others. As prospects fly extra with the identical airline or buy extra gadgets from the identical franchise, prospects earn extra advantages.

Right here is an instance of how Saved Procedures can be utilized to handle and replace a normal retail loyalty program. There are two procedures used to handle the client loyalty tiers, one for updating a selected buyer loyalty tier for the offered customerID, and the opposite one which updates the client loyalty tier for all prospects from a offered nation.

Now let’s use the created process to replace buyer loyalty tiers for purchasers from Serbia, Germany and Canada, after which test the up to date data:

The earlier question produces the next end result:

By encapsulating the replace tier logic into respective procedures, we’re avoiding code duplication whereas additionally eradicating complexity from the caller, who solely must invoke the process with the suitable process parameters.

What’s subsequent?

With SQL Saved Procedures now in DBSQL, prospects can proceed emigrate legacy enterprise information warehouse workloads to the lakehouse. Based mostly on buyer suggestions, there are a number of key capabilities we need to handle as we transfer towards GA:

  • SQL Saved Procedures in Apache Spark™: Help for all SQL Procedures in open supply
  • Help for SQL SECURITY DEFINER: Enable prospects to execute process logic with the permissions of the process creator (definer)

Prospects who need to share suggestions or requests associated to SQL Scripting and Procedures can achieve this via this type.

Two different essential SQL constructs are carefully tied to Saved Procedures: Short-term Tables and Multi-Assertion (multi-table) Transactions. Each of those efforts are presently in Non-public Preview and accepting nominations. If you’re , attain out to your Databricks account staff.

Whether or not you’re an present Databricks person or migrating from one other Information Warehouse, SQL Saved Procedures are a functionality it is best to use to simplify the way you handle complicated SQL workflows. Get began with SQL Saved Procedures by studying the Databricks documentation.

To study extra about Databricks SQL, go to our web site or learn the documentation. You may also take a look at the product tour for Databricks SQL. If you wish to migrate your present warehouse to a high-performance, serverless information warehouse with a fantastic person expertise and decrease complete value, then Databricks SQL is the answer — attempt it at no cost.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments