Today, Databricks announces support for the ANSI SQL/PSM scripting standard!
SQL Scripting is now available in Databricks, bringing procedural logic such as looping and control flow directly into the SQL you already know. Scripting in Databricks is based on open standards and fully compatible with Apache Spark™.
For SQL-first users, this makes it easier to work directly on the Lakehouse while benefiting from Databricks' scalability and AI capabilities.
If you already use Databricks, you will find SQL scripting especially useful for building administrative logic and ELT tasks. Key features include (a short sketch combining several of them follows the list):
- Scoped local variables
- Native exception handling based on symbolic error conditions
- IF-THEN-ELSE and CASE support
- Multiple loop constructs, including FOR loops over queries
- Loop control with ITERATE and LEAVE
- Dynamic SQL execution via EXECUTE IMMEDIATE
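As a minimal sketch of how several of these constructs fit together (the catalog, schema, and table names main.demo.daily_counts and main.demo.audit_log are purely illustrative):

```sql
BEGIN
  -- Scoped local variable
  DECLARE row_total INT DEFAULT 0;

  -- FOR loop over a query, combined with IF-THEN-ELSE, ITERATE, and LEAVE
  count_loop: FOR r AS SELECT part, cnt FROM main.demo.daily_counts DO
    IF r.cnt IS NULL THEN
      ITERATE count_loop;              -- skip rows with no count
    END IF;
    SET row_total = row_total + r.cnt;
    IF row_total > 1000000 THEN
      LEAVE count_loop;                -- stop once a threshold is reached
    END IF;
  END FOR count_loop;

  -- Dynamic SQL via EXECUTE IMMEDIATE with a positional parameter marker
  EXECUTE IMMEDIATE
    'INSERT INTO main.demo.audit_log VALUES (current_timestamp(), ?)'
    USING row_total;
END;
```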
Enough with the feature list; let's walk through some real examples. You can use this notebook to follow along.
Data management
Administrative tasks and data cleanup are a constant in enterprise data management: necessary, routine, and impossible to avoid. You will need to clean up historical records, standardize mixed formats, apply new naming conventions, rename columns, widen data types, and add column masks. The more you can automate these tasks, the more reliable and manageable your systems will be over time. One common example: enforcing case-insensitive behavior for all STRING columns in a table.
Let's walk through how SQL scripting can make this kind of schema management repeatable and simple.
Schema management: make all STRING columns in a table case-insensitive
In this example, we want to apply a new policy for string sorting and comparison to every applicable column in a table called employees. We will use a standard collation, UTF8_LCASE, to ensure that sorting and comparing values in this table is always case-insensitive. Applying this standard lets users benefit from the performance advantages of collations and simplifies their code, since they no longer need to apply LOWER() in their queries.
We will use widgets to specify which table and collation to apply. Using the information schema, we will then find all existing columns of type STRING in that table and change their collation. We will collect the column names into an array. Finally, we will gather fresh statistics for the altered columns, all in one script.
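A simplified sketch of that flow is below. It hardcodes hr.employees and UTF8_LCASE where the real script would read them from widgets, and it assumes the current catalog's information_schema is in scope:

```sql
BEGIN
  DECLARE altered_columns ARRAY<STRING> DEFAULT CAST(array() AS ARRAY<STRING>);
  DECLARE stats_stmt STRING;

  -- Find every STRING column in the target table and change its collation
  FOR col AS
    SELECT column_name
    FROM information_schema.columns
    WHERE table_schema = 'hr'
      AND table_name   = 'employees'
      AND data_type    = 'STRING'
  DO
    EXECUTE IMMEDIATE
      'ALTER TABLE hr.employees ALTER COLUMN ' || col.column_name ||
      ' TYPE STRING COLLATE UTF8_LCASE';
    SET altered_columns = array_append(altered_columns, col.column_name);
  END FOR;

  -- Refresh statistics for the altered columns in one pass
  IF array_size(altered_columns) > 0 THEN
    SET stats_stmt = 'ANALYZE TABLE hr.employees COMPUTE STATISTICS FOR COLUMNS ' ||
                     array_join(altered_columns, ', ');
    EXECUTE IMMEDIATE stats_stmt;
  END IF;
END;
```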
A natural extension of the above script is to apply it to all tables in a schema and to refresh views so they pick up the collation change.
Data cleansing: fix grammar in free-form text fields
Is there any topic more common in the world of data than 'dirty data'? Data from different systems, devices, and people will inevitably contain variations or errors that need to be corrected. If the data is not cleaned up, you may get wrong results and miss an important insight. You can expect a garbage response if you feed garbage into an LLM.
Let's look at an example featuring the bane of every publication, including this blog: typos. We have a table with free-text entries in a column called description. The issues in the text, which include spelling and grammar errors, will be obvious to anyone who knows English. Leaving the data in this state will undoubtedly lead to problems later when analyzing or comparing the text. Let's fix it with SQL Scripting! First, we find the tables holding a column with this name from the information schema. Then we fix any spelling errors using ai_fix_grammar(). This function is non-deterministic, so we use MERGE to achieve our goal.
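One possible shape for this, assuming the tables live in a schema called reviews and each table has an id key (both assumptions for illustration):

```sql
BEGIN
  -- Find every table in the schema that has a free-text 'description' column
  FOR tbl AS
    SELECT table_schema, table_name
    FROM information_schema.columns
    WHERE table_schema = 'reviews'
      AND column_name  = 'description'
  DO
    -- ai_fix_grammar() is non-deterministic, so compute the correction once
    -- in the MERGE source and write back that single result
    EXECUTE IMMEDIATE
      'MERGE INTO ' || tbl.table_schema || '.' || tbl.table_name || ' AS t
       USING (SELECT id, ai_fix_grammar(description) AS fixed FROM ' ||
              tbl.table_schema || '.' || tbl.table_name || ') AS s
       ON t.id = s.id
       WHEN MATCHED AND t.description <> s.fixed
         THEN UPDATE SET t.description = s.fixed';
  END FOR;
END;
```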
An interesting improvement would be to let ai_classify() deduce whether a column contains free-form text from the column name or from sample data. SQL Scripting makes administrative tasks and cleaning up messy data efficient and simple.
ETL
Customers use SQL for ETL today. Why? Because SQL supports a powerful set of data transformation capabilities, including joins, aggregations, and filtering, with intuitive syntax, making pipeline code easy for any Data Engineer to extend, update, and maintain. Now, with SQL Scripting, customers can simplify previously complex approaches or tackle more complex logic with pure SQL.
Updating multiple tables
Anyone who sells physical products will have a process for tracking sales and monitoring shipments. A typical data management pattern is to model multiple tables that track transactions, shipments, deliveries, and returns. Transaction tracking is business critical, and like any critical process, it requires handling unexpected values. With SQL Scripting, it is easy to use a conditional CASE statement to route transactions into their appropriate table and, if an error is encountered, to catch the exception.
In this example, we consider a raw transactions table whose rows need to be routed into a known set of target tables based on the event type. If the script encounters an unknown event, a user-defined exception is raised. A session variable tracks how far the script got before it completed or encountered an exception.
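A sketch of that pattern is below. The raw_transactions, shipments, deliveries, returns, and etl_errors tables, along with the payload and processed columns, are assumed for illustration:

```sql
-- Session variable that survives the script, tracking the last routed row
DECLARE OR REPLACE VARIABLE last_processed_id BIGINT DEFAULT 0;

BEGIN
  -- User-defined condition for event types we do not recognize
  DECLARE unknown_event CONDITION FOR SQLSTATE '45000';

  -- On an unknown event, record where we stopped; last_processed_id still
  -- holds the id of the last successfully routed row
  DECLARE EXIT HANDLER FOR unknown_event
    BEGIN
      INSERT INTO etl_errors
      VALUES (current_timestamp(),
              'stopped after id ' || CAST(last_processed_id AS STRING));
    END;

  FOR txn AS SELECT id, event_type, payload FROM raw_transactions WHERE processed = false DO
    CASE txn.event_type
      WHEN 'shipment' THEN
        INSERT INTO shipments VALUES (txn.id, txn.payload);
      WHEN 'delivery' THEN
        INSERT INTO deliveries VALUES (txn.id, txn.payload);
      WHEN 'return' THEN
        INSERT INTO returns VALUES (txn.id, txn.payload);
      ELSE
        SIGNAL unknown_event SET MESSAGE_TEXT = 'Unknown event type: ' || txn.event_type;
    END CASE;
    SET last_processed_id = txn.id;   -- resolves to the session variable
  END FOR;
END;
```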
This example script could be extended with an outer loop that keeps polling for more data. With SQL Scripting, you have the power and flexibility to manage and update data across your data estate and to control the flow of data processing efficiently.
Stay tuned to the Databricks blog and the SQL sessions at the upcoming Data + AI Summit, as we prepare to launch support for Temp Tables, SQL Stored Procedures, and more!
What to do next
Whether you are an existing Databricks user doing routine maintenance or orchestrating a large-scale migration, SQL Scripting is a capability you should take advantage of. SQL Scripting is described in detail in SQL Scripting | Databricks Documentation.
You can try these examples directly in this SQL Scripting Notebook. For more details, stay tuned for Part 2 of this series, which dives into SQL Scripting constructs and how to use them.