
Making Every Search Rewarding: How Ibotta Transformed Offer Discovery With Databricks


At Ibotta, our mission is to Make Every Purchase Rewarding. Helping our users (whom we call Savers) discover and activate relevant offers through our direct-to-consumer (D2C) app, browser extension, and website is a critical part of this mission. Our D2C platform helps millions of shoppers earn cash back on their everyday purchases, whether they’re unlocking grocery deals, earning bonus rewards, or planning their next trip. Through the Ibotta Performance Network (IPN), we also power white-label cashback programs for some of the biggest names in retail, including Walmart and Dollar General, helping over 2,600 brands reach more than 200 million consumers with digital offers across partner ecosystems.

Behind the scenes, our Data and Machine Learning teams power critical experiences like fraud detection, offer recommendation engines, and search relevance to make the Saver journey personalized and secure. As we continue to scale, we need data-driven, intelligent systems that support every interaction at every touchpoint.

Across D2C and the IPN, search plays a pivotal role in engagement and must keep pace with our business scale, evolving offer content, and changing Saver expectations.

In this post, we’ll walk through how we significantly refined our D2C search experience: from an ambitious hackathon project to a robust production feature now benefiting millions of Savers.

We believed our search could better keep up with our Savers

User search behavior has evolved from simple keywords to incorporating natural language, misspellings, and conversational phrases. Modern search systems must bridge the gap between what users type and what they actually mean, interpreting context and relationships to deliver relevant results even when query terms don’t exactly match the content.

At Ibotta, our original homegrown search system at times struggled to keep pace with the evolving expectations of our Savers, and we recognized an opportunity to refine it.

The key areas of opportunity we saw included:

  • Improving semantic relevance: Focusing on understanding Saver intent over exact keyword matches to connect them with the right offers.
  • Enhancing understanding: Interpreting the full nuance and context of user queries to provide more comprehensive and truly relevant results.
  • Increasing flexibility: More rapidly integrating new offer types and adapting to changing Saver search patterns to keep our discovery experience rewarding.
  • Boosting discoverability: We wanted more robust tools to ensure specific types of offers or key promotions were consistently visible across a wide range of relevant search queries.
  • Accelerating iteration and optimization: Enabling faster, impactful improvements to the search experience through real-time adjustments and performance tuning.

We believed the system could better keep pace with changing offer content, search behaviors, and evolving Saver expectations. We saw opportunities to increase the value for both our Savers and our brand partners.

From hackathon to production: reimagining search with Databricks

Addressing the limitations of our legacy search system required a focused effort. This initiative gained significant momentum during an internal hackathon, where a cross-functional team, including members from Data, Engineering, Marketing Analytics, and Machine Learning, came together with the idea to build a modern, alternative search system using Databricks Vector Search, which some members had learned about at the Databricks Data + AI Summit.

In just three days, our team developed a working proof of concept that delivered semantically relevant search results. Here’s how we did it:

  1. Collected offer content from multiple sources in our Databricks catalog
  2. Created a Vector Search endpoint and index with the Python SDK (sketched after this list)
  3. Used pay-per-token embedding endpoints with five different models (BGE large, GTE large, GTE small, a multilingual open-source model, and a Spanish-language-specific model)
  4. Connected everything to our website for a live demo
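
For reference, here is a minimal sketch of step 2 using the Databricks Vector Search Python SDK. The endpoint, catalog, table, and column names are illustrative placeholders, not our actual configuration:

```python
# A minimal sketch of creating a Vector Search endpoint and a Delta Sync
# index with the Python SDK. All names here are illustrative placeholders.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Provision an endpoint to serve the index
client.create_endpoint(name="offer-search-poc", endpoint_type="STANDARD")

# Build a Delta Sync index that embeds offer text with a pay-per-token
# foundation model endpoint (here, BGE large)
index = client.create_delta_sync_index(
    endpoint_name="offer-search-poc",
    index_name="main.search.offers_index",
    source_table_name="main.search.offers",
    pipeline_type="TRIGGERED",
    primary_key="offer_id",
    embedding_source_column="offer_text",
    embedding_model_endpoint_name="databricks-bge-large-en",
)
```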

The hackathon project won first place and generated strong internal buy-in and momentum to transition the prototype into a production system. Over the course of a few months, and with close collaboration from the Databricks team, we transformed our prototype into a robust, full-fledged production search system.

From proof of concept to production

Moving the hackathon proof of concept to a production-ready system required careful iteration and testing. This phase was critical not just for technical integration and performance tuning, but also for evaluating whether our anticipated system improvements would translate into positive changes in Saver behavior and engagement. Given search’s critical role and deep integration across internal systems, we opted for the following approach: we modified a key internal service that called our original search system, replacing those calls with requests directed to the Databricks Vector Search endpoint, while building in robust, graceful fallbacks to the legacy system.
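
Conceptually, the call-site swap looked like the sketch below: try the Vector Search index first and degrade gracefully to the legacy engine on any failure. The index names and the legacy_search helper are hypothetical, not our actual service code:

```python
# A minimal sketch of swapping search calls with a graceful fallback.
# Index names and legacy_search are hypothetical.
import logging
from databricks.vector_search.client import VectorSearchClient

logger = logging.getLogger(__name__)

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="offer-search-poc",
    index_name="main.search.offers_index",
)

def search_offers(query: str, num_results: int = 20) -> list[dict]:
    """Query Vector Search, falling back to the legacy engine on failure."""
    try:
        response = index.similarity_search(
            query_text=query,
            columns=["offer_id", "offer_text"],
            num_results=num_results,
        )
        # Zip column names from the manifest onto each returned row
        cols = [c["name"] for c in response["manifest"]["columns"]]
        return [dict(zip(cols, row)) for row in response["result"]["data_array"]]
    except Exception:
        logger.exception("Vector Search failed; falling back to legacy search")
        return legacy_search(query, num_results)  # hypothetical legacy client
```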

Most of our early work centered on understanding how the new system would actually perform with real Savers.

In the first month, we ran a test with a small percentage of our Savers that didn’t achieve the engagement results we had hoped for. Engagement decreased, particularly among our most active Savers, indicated by a drop in clicks, unlocks (when Savers express interest in an offer), and activations.

However, the Vector Search solution offered significant benefits, including:

  • Faster response times
  • A simpler mental model
  • Greater flexibility in how we indexed data
  • New abilities to adjust thresholds and change embedding text

Satisfied with the system’s underlying technical performance, we saw its greater flexibility as the key advantage needed to iteratively improve search result quality and overcome the disappointing engagement results.

Building a semantic evaluation framework

Following our initial test results, relying solely on A/B testing for search iterations was clearly inefficient and impractical. The number of variables influencing search quality was immense: embedding models, text combinations, hybrid search settings, Approximate Nearest Neighbor (ANN) thresholds, reranking options, and many more.

To navigate this complexity and accelerate our progress, we decided to establish a robust evaluation framework. This framework needed to be uniquely tailored to our specific business needs and capable of predicting real-world user engagement from offline performance metrics.

Our framework was designed around a synthetic evaluation environment that tracked over 50 online and offline metrics. Offline, we monitored standard information retrieval metrics like Mean Reciprocal Rank (MRR) and precision@k to measure relevance. Crucially, this was paired with online real-world engagement signals such as offer unlocks and click-through rates. A key decision was implementing an LLM-as-a-judge. This allowed us to label data and assign quality scores to both online query-result pairs and offline outputs. This approach proved essential for rapid iteration based on reliable metrics and for accumulating the labeled data necessary for future model fine-tuning.
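
To make the offline side concrete, here is a minimal sketch of the two retrieval metrics mentioned, MRR and precision@k; in our setup, the relevance labels would come from the LLM-as-a-judge scores, though the handling below is illustrative:

```python
# A minimal sketch of the offline relevance metrics named above;
# label handling is illustrative, not our production code.

def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    return sum(doc_id in relevant_ids for doc_id in ranked_ids[:k]) / k

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank across a labeled query set."""
    return sum(reciprocal_rank(ids, rel) for ids, rel in runs) / len(runs)
```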

Along the way, we leaned into several components of the Databricks Data Intelligence Platform, including:

  • Mosaic AI Vector Search: Used to power high-precision, semantically rich search results for evaluation tests.
  • MLflow patterns and LLM-as-a-judge: Provided the patterns to evaluate model outputs and implement our data labeling process.
  • Model Serving Endpoints: Efficient deployment of models directly from our catalog.
  • AI Gateway: To secure and govern our access to third-party models via API.
  • Unity Catalog: Ensured the organization, management, and governance of all datasets used across the evaluation framework.

This robust framework dramatically increased our iteration velocity and confidence. We conducted over 30 distinct iterations, systematically testing major variable changes in our Vector Search solution (see the query sketch after this list), including:

  • Different embedding models (foundational, open-weights, and third-party via API)
  • Various text combinations to feed into the models
  • Different query modes (ANN vs. hybrid)
  • Testing different columns for hybrid text search
  • Adjusting thresholds for vector similarity
  • Experimenting with separate indexes for different offer types
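
As an example of the query-side variables, here is a hedged sketch of the two query modes and a similarity threshold, reusing the illustrative index names from earlier:

```python
# A minimal sketch of sweeping query mode and similarity threshold.
# Index and column names are illustrative placeholders.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="offer-search-poc",
    index_name="main.search.offers_index",
)

# Pure ANN semantic search, with a score cutoff to drop weak matches
ann_results = index.similarity_search(
    query_text="greek yogurt",
    columns=["offer_id", "offer_text"],
    num_results=20,
    query_type="ANN",
    score_threshold=0.6,  # one of the thresholds we tuned
)

# Hybrid search blends vector similarity with keyword matching
hybrid_results = index.similarity_search(
    query_text="greek yogurt",
    columns=["offer_id", "offer_text"],
    num_results=20,
    query_type="HYBRID",
)
```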

The evaluation framework transformed our development process, allowing us to make data-driven decisions rapidly and validate potential improvements with high confidence before exposing them to users.

The search for the best off-the-shelf model

Following the initial broad test that showed disappointing engagement results, we shifted our focus to exploring the performance of specific models identified as promising during our offline evaluation. We selected two third-party embedding models for production testing, accessed securely through AI Gateway. We conducted short, iterative tests in production (lasting a few days) with these models.

Satisfied with the initial results, we proceeded to run a longer, more comprehensive production test comparing our leading third-party model and its optimized configuration against the legacy system. This test yielded mixed results. While we saw overall improvements in engagement metrics and successfully eliminated the negative impacts seen previously, the gains were modest, mostly single-digit percentage increases. These incremental benefits weren’t compelling enough to fully justify a complete replacement of our existing search technology.

More troubling, however, was the insight gained from our granular analysis: while performance significantly improved for certain search queries, others saw worse results compared to our legacy solution. This inconsistency presented a significant architectural dilemma. We faced the unappealing choice of either implementing a complex traffic-splitting system to route queries based on predicted performance (an approach that would require maintaining two distinct search experiences and introduce a new, complicated layer of rule-based routing management) or accepting the limitations.

This was a critical juncture. While we had seen enough promise to keep going, we needed more significant improvements to justify fully replacing our homegrown search system. This led us to begin fine-tuning.

Fine-tuning: customizing model behavior

While the third-party embedding models explored previously showed technical promise and modest improvements in engagement, they also presented critical limitations that were unacceptable for a long-term solution at Ibotta. These included:

  1. Inability to train embedding models on our proprietary offer catalog
  2. Difficulty evolving models alongside business and content changes
  3. Uncertainty regarding long-term API availability from external providers
  4. The need to establish and manage new external business relationships
  5. Network calls to these providers weren’t as performant as self-hosted models

The clear path forward was to fine-tune a model specifically tailored to Ibotta’s data and the needs of our Savers. This was made possible by the millions of labeled search interactions we had gathered from real users via our LLM-as-a-judge process within our custom evaluation framework. This high-quality production data became our training gold.

We then embarked on a methodical fine-tuning process, leveraging our offline evaluation framework extensively.

Key elements were (see the training sketch after this list):

  • Infrastructure: We used AI Runtime with A10s in a serverless environment, and Databricks ML Runtime for sophisticated hyperparameter sweeping.
  • Model selection: We selected a BGE family model over GTE; it demonstrated stronger performance in our offline evaluations and proved more efficient to train.
  • Dataset engineering: We built numerous training datasets, including generating synthetic training data, ultimately settling on:
    • One positive result (a verified good match from real searches)
    • ~10 negative examples per positive, combining:
      • 3-4 “hard negatives” (LLM-labeled, human-verified inappropriate matches)
      • “In-batch negatives” (a sampling of results from unrelated search terms)
  • Hyperparameter optimization: We systematically swept parameters like learning rate, batch size, duration, and negative sampling strategies to find optimal configurations.
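
The post does not include our training code; the sketch below shows what contrastive fine-tuning with hard and in-batch negatives can look like for a BGE-family model using the open-source sentence-transformers library. The base model, data, and names are illustrative assumptions, not our actual pipeline:

```python
# A minimal sketch of contrastive fine-tuning with hard and in-batch
# negatives via sentence-transformers. NOT Ibotta's actual pipeline;
# the base model and example data are illustrative.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # a BGE-family base model

# Each example pairs a query with one verified positive and a hard negative.
# MultipleNegativesRankingLoss additionally treats every other positive in
# the batch as an "in-batch negative" automatically.
train_examples = [
    InputExample(texts=[
        "greek yogurt",                 # query
        "Brand A Greek Yogurt, 32 oz",  # judge-labeled positive
        "Brand B Dish Soap, 16 oz",     # hard negative
    ]),
    # ... millions of labeled query-offer pairs from the judge pipeline
]

loader = DataLoader(train_examples, shuffle=True, batch_size=64)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(loader, loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("finetuned-bge-offer-search")
```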

After numerous iterations and evaluations within the framework, our top-performing fine-tuned model beat our best third-party baseline by 20% in synthetic evaluation. These compelling offline results provided the confidence needed to accelerate our next production test.

Search that drives results and revenue

The technical rigor and iterative process paid off. We engineered a search solution specifically optimized for Ibotta’s unique offer catalog and user behavior patterns, delivering results that exceeded our expectations and offering the flexibility needed to evolve alongside our business. Based on these strong results, we accelerated migration onto Databricks Vector Search as the foundation for our production search system.

In our final production test, using our own fine-tuned embedding model, we saw the following improvements:

  • 14.8% more offer unlocks in search.
    This measures users selecting offers from search results, indicating improved result quality and relevance. More unlocks are a leading indicator of downstream redemptions and revenue.
  • 6% increase in engaged users.
    This shows a greater share of users finding value and taking meaningful action within the search experience, contributing to improved conversion, retention, and lifetime value.
  • 15% increase in engagement on bonuses.
    This reflects improved surfacing of high-value, brand-sponsored content, translating directly to better performance and ROI for our brand and retail partners.
  • 72.6% decrease in searches with zero results.
    This significant reduction means fewer frustrating experiences and a major improvement in semantic search coverage.
  • 60.9% fewer users encountering searches returning no results.
    This highlights the breadth of impact, showing that a large portion of our user base is now consistently finding results, improving the experience across the board.

Beyond user-facing gains, the new system delivered on performance. We saw 60% lower latency in our search system, attributable to Vector Search query performance and the fine-tuned model’s lower overhead.

Leveraging the flexibility of this new foundation, we also built powerful enhancements like Query Transformation (enriching vague queries) and Multi-Search (fanning out generic terms). The combination of a highly relevant core model, improved system performance, and intelligent query enhancements has resulted in a search experience that’s smarter, faster, and ultimately more rewarding.

Query Transformation

One challenge with embedding models is their limited understanding of niche keywords, such as emerging brands. To address this, we built a query transformation layer that dynamically enriches search terms in flight based on predefined rules.

For example, if a user searches for an emerging yogurt brand the embedding model might not recognize, we can transform the query to add “Greek yogurt” alongside the brand name before sending it to Vector Search. This provides the embedding model with crucial product context while preserving the original text for hybrid search.
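
A minimal sketch of this rule-based enrichment is below; the rule table and brand names are hypothetical examples, not our actual rule set:

```python
# A minimal sketch of rule-based, in-flight query enrichment.
# The rule table and brand names are hypothetical.
ENRICHMENT_RULES = {
    "yobrand": "Greek yogurt",   # emerging brand -> product context
    "fizzco": "sparkling water",
}

def transform_query(query: str) -> str:
    """Append product context for known niche terms, preserving the
    original text so hybrid keyword matching still sees the brand name."""
    enriched = query
    lowered = query.lower()
    for term, context in ENRICHMENT_RULES.items():
        if term in lowered and context.lower() not in lowered:
            enriched = f"{enriched} {context}"
    return enriched

# transform_query("YoBrand vanilla") -> "YoBrand vanilla Greek yogurt"
```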

This capability also works hand in hand with our fine-tuning process. Successful transformations can be used to generate training data; for instance, including the original brand name as a query and the related yogurt products as positive results in a future training run helps the model learn these specific associations.

Multi-Search

For broad, generic searches like “baby,” Vector Search might initially return a limited number of candidates, potentially filtered down further by targeting and budget management. To address this and increase result diversity, we built a multi-search capability that fans out a single search term into multiple related searches.

Instead of just searching for “baby,” our system automatically runs parallel searches for terms like “baby food,” “baby clothes,” “baby medicine,” “baby diapers,” and so on. Thanks to the low latency of Vector Search, we can execute multiple searches in parallel without increasing the overall response time to the user. This provides a wider and more diverse set of relevant results for wide-ranging category searches.
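
A minimal sketch of the fan-out pattern follows. The expansion table is illustrative, and search_offers is the hypothetical helper from the fallback sketch earlier in this post:

```python
# A minimal sketch of fanning a generic term out into parallel searches.
# EXPANSIONS is illustrative; search_offers is the hypothetical helper
# defined earlier, returning rows parsed into dicts keyed by column name.
from concurrent.futures import ThreadPoolExecutor

EXPANSIONS = {
    "baby": ["baby food", "baby clothes", "baby medicine", "baby diapers"],
}

def multi_search(query: str, num_results: int = 20) -> list[dict]:
    """Run a generic term and its related expansions in parallel,
    then merge and de-duplicate the result sets."""
    terms = [query] + EXPANSIONS.get(query.lower(), [])
    with ThreadPoolExecutor(max_workers=len(terms)) as pool:
        result_sets = list(pool.map(lambda t: search_offers(t, num_results), terms))

    seen, merged = set(), []
    for results in result_sets:
        for offer in results:
            if offer["offer_id"] not in seen:
                seen.add(offer["offer_id"])
                merged.append(offer)
    return merged
```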

Lessons Learned

Following the successful final production test and the full rollout of Databricks Vector Search to our user base (delivering positive engagement results, increased flexibility, and powerful search tools like Query Transformation and Multi-Search), this project journey yielded several valuable lessons:

  1. Start with a proof of concept: The initial hackathon approach allowed us to quickly validate the core idea with minimal upfront investment.
  2. Measure what matters to you: Our tailored 50-metric evaluation framework was critical; it gave us confidence that improvements observed offline would translate into business impact, enabling us to avoid repeated live testing until solutions were truly promising.
  3. Don’t jump straight to fine-tuning: We learned the value of thoroughly evaluating off-the-shelf models and exhausting those options before investing in the greater effort required for fine-tuning.
  4. Collect data early: Starting to label data from our second experiment ensured a rich, proprietary dataset was ready when fine-tuning became necessary.
  5. Collaboration accelerates progress: Close partnership with Databricks engineers and researchers, sharing insights on Vector Search, embedding models, LLM-as-a-judge patterns, and fine-tuning approaches, significantly accelerated our progress.
  6. Recognize cumulative impact: Each individual optimization, even seemingly minor, contributed significantly to the overall transformation of our search experience.

What’s next

With our fine-tuned embedding model now live across all direct-to-consumer (D2C) channels, we next plan to explore scaling this solution to the Ibotta Performance Network (IPN). This will bring improved offer discovery to millions more shoppers across our publisher network. As we continue to collect labeled data and refine our models through Databricks, we believe we’re well positioned to evolve the search experience alongside the needs of our partners and the expectations of their customers.

This journey from a hackathon project to a production system proved that rapidly reimagining a core product experience is achievable with the right tools and support. Databricks was instrumental in helping us move fast, fine-tune effectively, and ultimately make every search more rewarding for our Savers.
