Mohammad Abu Sheikh is reshaping the AI landscape in the MENA region, driving a shift from passive consumption to sovereign innovation. As CEO of CNTXT AI and founder of a $10 million AI fund, he has led three successful exits and secured over a billion dollars in funding. His work is laying the foundation for an AI ecosystem rooted in language, culture, and data sovereignty.
We saw the abundance of underutilized data in this part of the world. A lot of problems in scaling AI came from the lack of data readiness, which ultimately meant a lack of AI readiness. That's why we started CNTXT AI.
Initially, we were solving the same problems we faced while building LocAI... We saw these challenges firsthand working with AI71, TII and G42 (IIAI). As we helped these entities solve these problems, the vision got clearer and the business just kept growing.
You've played a key role in building the largest Arabic digital library for AI training. What were some of the biggest challenges in doing so, and how did you overcome them?
Quality was one of the biggest challenges. Another was the limited availability of high-quality Arabic data online: Arabic is significantly underrepresented. Only a small portion of Arabic-language content has been digitized, and just 3 to 5% of all online content is in Arabic. That's almost nothing. We overcame that problem by deploying data labelers, annotators, and data scientists to digitize, create, and curate the data ourselves.
CNTXT AI operates at the intersection of culture and computation. How do you balance cutting-edge AI innovation with the goal of building culturally relevant solutions for the MENA region?
We build culturally grounded models from the ground up. From infrastructure to final product, culture is embedded from the very beginning; it's not something we add later. We design, innovate, and build with specific cultures, dialects, and needs in mind from day one. Arabic is one language, but it carries many dialects and cultural contexts across the region, so we build local products for local countries. And we do that by working with local annotators, people on the ground, in their own countries.
You've also co-founded LocAI and lead the SMPL AI Fund. How do these ventures complement the mission of CNTXT AI?
LocAI is the application layer: the part people actually interact with. It sits right on top of the data and infrastructure built by CNTXT AI. That's what made it successful: it transforms AI foundations provided by CNTXT AI into real-world solutions people can use.
SMPL AI, on the other hand, is about giving back to the community. It focuses on investing in early-stage startups and helping build the regional AI ecosystem. We share the tools and lessons we've learned from building AI ourselves, so founders can grow faster and avoid common pitfalls.
Munsit has been called the most accurate Arabic speech recognition model in the world. What drove the development of this model, and why now?
What drove the development of this model was simple: the need.
We always build out of necessity. We looked at the market and saw the landscape was ripe: government agencies and private clients were all asking for a solution like this.
The existing models just weren't up to the task. Most are built on English tech and then adapted. They aren't designed for Arabic from the ground up, and certainly not for the specific problems we're solving.
So we decided to build our own. It's Arabic first, by design.
The research behind Munsit introduces a weakly supervised learning approach. Can you explain what that means and why it was essential for training Arabic ASR at scale?
Annotation is expensive. So we had to move beyond traditional methods that depend on large amounts of manual transcription. Weakly supervised learning helped us scale without having to label every audio file by hand, which is especially important for Arabic, a language with limited data and many different dialects.
Instead of using professionally transcribed audio, we started with 30,000 hours of unlabeled Arabic speech. We built an annotation pipeline that generates transcripts, then filters and cleans the best ones using automated checks. This gave us a high-quality 15,000-hour dataset, all without human transcription.
This approach made it possible to train our model from scratch, capturing the richness of spoken Arabic across real-life situations, quickly and cost-effectively. Without this method, building an Arabic ASR system at this scale would have taken years and millions in manual effort.
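Editor's illustration: the interview does not describe the pipeline's internals, so the following is only a minimal sketch of one common way to filter machine-generated transcripts. It assumes two hypothetical recognizers produce a transcript for each audio segment and keeps a segment only when the two roughly agree. The `Segment` type, the `max_disagreement` threshold of 0.2, and the sample data are all assumptions made for illustration, not details of Munsit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    audio_id: str
    hours: float
    hyp_a: str  # transcript from hypothetical recognizer A
    hyp_b: str  # transcript from hypothetical recognizer B


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def clean(text):
    """Collapse repeated whitespace; a stand-in for real text normalization."""
    return " ".join(text.split())


def disagreement(a, b):
    """Fraction of words that differ between two transcripts (0.0 = identical)."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a and not tokens_b:
        return 1.0  # two empty outputs carry no usable label
    return edit_distance(tokens_a, tokens_b) / max(len(tokens_a), len(tokens_b))


def build_dataset(segments, max_disagreement=0.2):
    """Keep segments whose two machine transcripts agree closely enough."""
    kept = []
    for seg in segments:
        a, b = clean(seg.hyp_a), clean(seg.hyp_b)
        if disagreement(a, b) <= max_disagreement:
            kept.append((seg.audio_id, a, seg.hours))
    return kept


# Placeholder tokens stand in for real Arabic transcripts.
segments = [
    Segment("a1", 1.0, "w1 w2 w3 w4 w5", "w1 w2 w3 w4 w5"),
    Segment("a2", 2.0, "w1 w2 w3 w4 w5", "w1 x  w3 w4 w5"),
    Segment("a3", 1.5, "w1 w2 w3", "x y z"),
    Segment("a4", 0.5, "", ""),
]
kept = build_dataset(segments)
kept_ids = [audio_id for audio_id, _, _ in kept]
kept_hours = sum(hours for _, _, hours in kept)
```

In this toy run, only segments whose transcripts mostly agree survive, which mirrors the general idea of shrinking a large unlabeled pool into a smaller, cleaner training set without human transcription.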
Munsit outperformed models from OpenAI, Microsoft, and Meta across multiple benchmarks. What does this achievement say about the future of Arabic AI innovation?
The future of Arabic AI is in our hands, and that's exactly what this achievement proves. We can't afford to rely on technologies we don't own or depend on third parties who don't prioritize our region.
Munsit shows that we can build world-class AI, from the region, for the region, using local talent to solve local problems. It's a clear signal that the next wave of Arabic AI innovation will come from within.
How do you see Munsit evolving in future versions, and what are the next frontiers for Arabic voice AI at CNTXT?
You'll just have to wait and see. What I can say is that we have a fresh, new suite of Arabic-first AI solutions on the way, all powered by Munsit and other models we're currently building at CNTXT AI. This is just the beginning.
You often speak about the importance of "sovereign AI." What does that term mean to you, and why is it critical for the Gulf and broader MENA region?
To me, sovereign AI means having full ownership and control over the data, infrastructure, and models that shape our future. It's critical because we need to own our own fate, and that starts with data.
Data sovereignty is everything. Data is precious, and we need to make sure it stays in our hands.
We can't afford to hand over our future and sit idle while others build the technology for us. The future of AI in this region will come from this region. That's exactly what we're working toward.
How do you see CNTXT AI shaping the AI ecosystem in the Middle East over the next five years?
By enabling true AI readiness. We go in, understand what companies and governments need, build the data and AI strategies, and then help them build, test, deploy and scale.
If data is the new oil, then unstructured data is unrefined oil: full of potential but useless until processed. That's why we've built CNTXT AI to help organizations clean, structure, and activate their data. Because that's where real AI transformation begins.
From your vantage point as both an entrepreneur and investor, what advice would you give to other founders building AI startups in emerging markets?
Start now. Move quickly. Fail fast, learn faster, and keep iterating.
Most importantly, build for real problems. Stay close to the ground: listen to users, not just the hype. In emerging markets, relevance and adaptability are key.
Thank you for the great interview. Readers who wish to learn more should visit CNTXT AI.