Socure is one of the leading providers of digital identity verification and fraud solutions. Its predictive analytics platform applies artificial intelligence (AI) and machine learning (ML) techniques to process both online and offline intelligence, including government-issued documents, contact information (email, phone, address), personal identifiers (DOB, SSN), and device or network data (IP, velocity) to verify identities accurately and in real time.
Socure ID+ is an identity verification platform that uses multiple Socure offerings such as KYC, SIGMA, eCBSV, Phone Risk, and more. It has two environments, focused on proof of concept (POC) and live customers. The Data Science (DS) environment is designed for the POC or proof of value (POV) stage. In this environment, customers provide datasets via SFTP, which are processed by Socure’s data scientists through an internal endpoint. The data undergoes ML-based scoring and other intelligence calculations depending on the selected modules, and the processed results are stored in Amazon Simple Storage Service (Amazon S3) in Delta open table format. In the Production (Prod) environment, customers can verify identities either in real time through live endpoints or via a batch processing interface.
Socure’s data science environment includes a streaming pipeline called Transaction ETL (TETL), built on OSS Apache Spark running on Amazon EKS. TETL ingests and processes data volumes ranging from small to large datasets while maintaining high-throughput performance.
The primary purpose of this pipeline is to give data scientists a flexible environment to run POC workloads for customers.
Data scientists:
- trigger ingestion of POC datasets, ranging from small batches to large-scale volumes.
- consume the processed outputs written by the pipeline for analysis and model development.
- share the results with Socure’s customers.
The following diagram shows the Transaction ETL (TETL) architecture.

This pipeline directly supports customer POCs, ensuring that the right data is available for experimentation, validation, and demonstration. As such, it is a critical link between raw data and customer-facing results, making its reliability and performance essential for delivering value. In this post, we show how Socure achieved a 50% cost reduction by migrating the TETL streaming pipeline from self-managed Spark to Amazon EMR Serverless.
Motivation
As data volumes have scaled by 10x, several challenges such as latency and data reliability have emerged that directly impact the customer experience:
- Performance issues caused by inefficient autoscaling, leading to latency increases of up to 5x
- High operational cost of maintaining an OSS Spark environment on EKS
Additionally, we identified other significant issues:
- Resource constraints caused by instance provisioning limits, forcing the use of smaller nodes. This leads to frequent Spark executor out of memory (OOM) failures under heavy loads, increasing job latency and delaying data availability.
- Performance bottlenecks with Delta Lake, where large batch operations such as OPTIMIZE compete for resources and slow down streaming workloads.
During this migration, we also took the opportunity to transition to AWS Graviton, enabling additional cost efficiencies as explained in this post.
With these two primary drivers, we began exploring an alternative architecture using Amazon EMR. We had already done extensive benchmarking of several identity verification related batch workloads on different EMR platforms, and we concluded that Amazon EMR Serverless (EMR-S) offers a path to reduce operational cost, improve reliability, and better handle large-scale batch and streaming workloads, addressing both customer-facing issues and platform-level inefficiencies.
The new pipeline architecture
The data processing pipeline follows a two-stage architecture in which streaming data from Amazon Kinesis Data Streams first flows into the raw layer, which parses incoming records into large JSON blobs, applies encryption, and stores the results in append-only Delta tables. The processed layer consumes data from these raw Delta tables, performs decryption, transforms the data into a flattened, wide structure with proper field parsing, applies individual encryption to personally identifiable information (PII) fields, and writes the refined data to separate append-only Delta tables for downstream consumption.
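To make the processed-layer transform concrete, here is a minimal pure-Python sketch of the flatten-and-encrypt step. In production this logic runs as Spark Structured Streaming; the field names and the `encrypt_pii` helper below are hypothetical stand-ins for illustration only.

```python
import json

# Hypothetical set of PII fields that receive individual encryption.
PII_FIELDS = {"ssn", "dob"}

def encrypt_pii(value: str) -> str:
    # Placeholder for field-level encryption; the real pipeline would call a
    # managed encryption service. Reversing the string is purely illustrative.
    return value[::-1]

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten a nested JSON blob into a wide, flat structure,
    encrypting PII fields individually along the way."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = encrypt_pii(str(value)) if key in PII_FIELDS else value
    return flat

# A raw-layer record: one large JSON blob (field names are illustrative).
raw_blob = json.dumps({"identity": {"ssn": "123-45-6789", "dob": "1990-01-01"},
                       "device": {"ip": "203.0.113.7"}})
processed = flatten(json.loads(raw_blob))
# processed -> {"identity_ssn": "9876-54-321", "identity_dob": "10-10-0991",
#               "device_ip": "203.0.113.7"}
```

The key design point is that non-PII fields pass through untouched for analysis, while each PII field is encrypted individually rather than encrypting the whole record, so downstream consumers can read non-sensitive columns directly.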
The following diagram shows the before/after TETL architecture we implemented, transitioning from OSS Spark on EKS to Spark on EMR Serverless.

Benchmarking
We benchmarked end-to-end pipeline performance across OSS Spark on EKS and EMR Serverless. The evaluation focused on latency and cost under comparable resource configurations.
Resource Configuration
EKS (OSS Spark):
- Min 30 executors
- Max 90 executors
- 14 GB memory / 2 cores per executor
EMR Serverless:
- Min 10 executors
- Max 30 executors
- 27 GB memory / 4 cores per executor
- Effectively ~60 executors when normalized for the 2x memory and cores, sized to mitigate the OOM issues described earlier.
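The EMR Serverless side of this configuration could be expressed at application creation time roughly as follows. This is a hedged sketch, not our exact setup: the application name, release label, and derived capacity totals are placeholder assumptions.

```shell
# Hypothetical sketch: create a Graviton (ARM64) EMR Serverless Spark application
# with pre-initialized capacity matching the minimum executor count, and a
# maximum-capacity cap bounding autoscaling at roughly 30 executors plus a driver
# (30 x 4 vCPU + 4 vCPU = 124 vCPU; 31 x 27 GB = 837 GB).
aws emr-serverless create-application \
  --name tetl-streaming \
  --type SPARK \
  --release-label emr-7.1.0 \
  --architecture ARM64 \
  --initial-capacity '{
    "DRIVER":   {"workerCount": 1,  "workerConfiguration": {"cpu": "4vCPU", "memory": "27GB"}},
    "EXECUTOR": {"workerCount": 10, "workerConfiguration": {"cpu": "4vCPU", "memory": "27GB"}}
  }' \
  --maximum-capacity '{"cpu": "124vCPU", "memory": "837GB"}'
```

Pre-initialized capacity keeps the minimum worker pool warm for the streaming workload, while the maximum-capacity cap plays the role that the max-executor setting did on EKS.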
Observations
- Autoscaling efficiency: EMR Serverless scaled down effectively, to 20 workers on average over the weekend (low-traffic days), resulting in costs up to 12% lower than on weekdays.
- Executor sizing: Larger executors on EMR Serverless prevented OOM failures and improved stability under load.
Definitions
- Cost: The service cost for both raw and processed jobs, taken from AWS Cost Explorer.
- Latency: End-to-end latency measures the time from Socure ID+ event generation until data arrives in the processed Delta table, calculated as Inserted Date minus Event Date.
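As a concrete example of the latency definition, the per-record computation is simply the inserted timestamp minus the event timestamp (the timestamp format and values here are illustrative):

```python
from datetime import datetime

def end_to_end_latency_seconds(event_date: str, inserted_date: str) -> float:
    """Latency = Inserted Date - Event Date, in seconds (ISO-8601 inputs)."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return (datetime.strptime(inserted_date, fmt)
            - datetime.strptime(event_date, fmt)).total_seconds()

# A record generated by Socure ID+ at 10:00:00 that lands in the processed
# Delta table at 10:02:30 has 150 seconds of end-to-end latency.
latency = end_to_end_latency_seconds("2024-05-01T10:00:00", "2024-05-01T10:02:30")
print(latency)  # 150.0
```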
Results
The values in the following table represent the percentage improvements observed when running on EMR Serverless compared to EKS.
| Metric | Low Traffic (Weekend) | Average Traffic (Weekday) |
| --- | --- | --- |
| Record Count | ~1M | ~5M |
| Min Latency (best case) | 73.3% | 69.2% |
| Avg Latency (representative workload) | 51.0% | 47.9% |
| Max Latency (worst case) | 12.3% | 34.7% |
| Total Cost | 57.1% | 45.2% |
Note: Even with a conservative 40% cost reduction applied to the EKS environment to account for Graviton, EMR-S remains approximately 15% cheaper.

The benchmarking results clearly demonstrate that EMR Serverless outperforms OSS Spark on EKS for our end-to-end pipeline workloads. By moving to EMR Serverless, we achieved:
- Improved performance: Average latency reduced by more than 50%, with consistently lower min and max latencies.
- Cost efficiency: Overall pipeline execution costs dropped by more than half.
- Scalability: Autoscaling optimized resource utilization, further reducing cost during off-peak periods.
- Reduced operational overhead: The fully managed, serverless nature of EMR-S eliminates the need to maintain EKS and OSS Spark.
Conclusion
In this post, we showed how Socure’s transition to EMR Serverless not only resolved critical issues around cost, reliability, and latency, but also provided a more scalable and sustainable architecture for serving customer POCs effectively, enabling us to deliver results to customers faster and strengthening our position for potential custom contracts.
About the authors

