Introduction
Running Large Language Models (LLMs) and other open-source models locally offers significant advantages for developers. This is where Ollama shines. Ollama simplifies the process of downloading, setting up, and running these powerful models on your local machine, giving you greater control, enhanced privacy, and reduced costs compared to cloud-based alternatives.
While running models locally offers immense benefits, integrating them with cloud-based projects or sharing them for broader access can be a challenge. This is precisely where Clarifai Local Runners come in. Local Runners let you expose your locally running Ollama models via a public API endpoint, allowing seamless integration with any project, anywhere, effectively bridging the gap between your local environment and the cloud.
In this post, we'll walk through how to run open-source models using Ollama and expose them with a public API using Clarifai Local Runners. This makes your local models accessible globally while still running entirely on your machine.
Local Runners Explained
Local Runners let you run models on your own machine, whether that's a laptop, workstation, or on-prem server, while exposing them through a secure, public API endpoint. You don't need to upload the model to the cloud. The model stays local but behaves as if it's hosted on Clarifai.
Once initialized, the Local Runner opens a secure tunnel to Clarifai's control plane. Any requests to your model's Clarifai API endpoint are routed to your machine, processed locally, and returned to the caller. From the outside, it functions like any other hosted model. Internally, everything runs on your hardware.
Local Runners are especially useful for:
- Fast local development: Build, test, and iterate on models in your own environment without deployment delays. Inspect traffic, test outputs, and debug in real time.
- Using your own hardware: Take advantage of local GPUs or custom hardware setups. Let your machine handle inference while Clarifai manages routing and API access.
- Private and offline data: Run models that rely on local files, internal databases, or private APIs. Keep everything on-prem while still exposing a usable endpoint.
Local Runners give you the flexibility of local execution along with the reach of a managed API, all without giving up control over your data or environment.
Expose Local Ollama Models via a Public API
This section walks you through the steps to get your Ollama model running locally and accessible via a Clarifai public endpoint.
Prerequisites
Before we begin, ensure you have:
- Ollama installed and running on your machine
- Python installed, so you can use the Clarifai SDK and CLI
- A Clarifai account with your User ID and a Personal Access Token (PAT)
Step 1: Install Clarifai and Log In
First, install the Clarifai Python SDK:
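The SDK is distributed as the `clarifai` package on PyPI and also installs the `clarifai` CLI used in the following steps:

```bash
pip install --upgrade clarifai
```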
Next, log in to Clarifai to configure your context. This links your local environment to your Clarifai account, allowing you to manage and expose your models.
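The login command prompts for your credentials and stores them in a local context:

```bash
clarifai login
```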
Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the Clarifai documentation.
Step 2: Set Up Your Local Ollama Model for Clarifai
Next, you'll prepare your local Ollama model so it can be accessed by Clarifai's Local Runners. This step sets up the necessary files and configuration to expose your model through a public API endpoint using Clarifai's platform.
Use the following command to initialize the setup:
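Run the initialization from a new project directory, using the Ollama toolkit flag:

```bash
clarifai model init --toolkit ollama
```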
This generates three key files inside your project directory:
- `model.py`
- `config.yaml`
- `requirements.txt`
These define how Clarifai will communicate with your locally running Ollama model.
You can also customize the command with the following options:
- `--model-name`: Name of the Ollama model you want to serve. This pulls from the Ollama model library (defaults to `llama3:8b`).
- `--port`: The port where your Ollama model is running (defaults to `23333`).
- `--context-length`: Sets the model's context length (defaults to `8192`).
For example, to use the `gemma:2b` model with a 16K context length on port `8008`, run:
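A sketch of that command using the options above; here 16K is assumed to mean 16,384 tokens:

```bash
clarifai model init --toolkit ollama \
  --model-name gemma:2b \
  --port 8008 \
  --context-length 16384
```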
After this step, your local model is ready to be exposed using Clarifai Local Runners.
Step 3: Start the Clarifai Local Runner
Once your local Ollama model is configured, the next step is to start Clarifai's Local Runner. This exposes your local model to the internet through a secure Clarifai endpoint.
Navigate into the model directory and run:
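Assuming the directory name generated in the previous step, starting the runner looks roughly like this:

```bash
cd ollama-model-upload
clarifai model local-runner
```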
Once the runner starts, you'll receive a public Clarifai URL. This URL is your gateway to accessing your locally running Ollama model from anywhere. Requests made to this Clarifai endpoint are securely routed to your local machine, where your Ollama model processes them.
Running Inference on Your Exposed Model
With your Ollama model running locally and exposed via a Clarifai Local Runner, you can now send inference requests to it from anywhere using the Clarifai SDK or an OpenAI-compatible endpoint.
Inference Using the OpenAI-Compatible Endpoint
Set your Clarifai PAT as an environment variable:
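Replace the placeholder below with the PAT from your Clarifai account settings:

```bash
export CLARIFAI_PAT="YOUR_PERSONAL_ACCESS_TOKEN"
```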
Then, you can use the OpenAI client to send requests:
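Below is a minimal sketch using the official `openai` Python package. The base URL follows Clarifai's OpenAI-compatible endpoint pattern, and the `model` value is a placeholder; substitute your own model's Clarifai URL once the runner is up:

```python
import os

from openai import OpenAI

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint,
# authenticating with your Clarifai PAT.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],
)

response = client.chat.completions.create(
    # Placeholder model URL; use the URL of your own exposed model here.
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {"role": "user", "content": "Explain what Clarifai Local Runners do in one sentence."},
    ],
)
print(response.choices[0].message.content)
```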
For multimodal inference, you can include image data:
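A sketch that reuses the client from the previous snippet and passes an image URL in the standard OpenAI message format; this assumes the Ollama model you are serving is vision-capable:

```python
response = client.chat.completions.create(
    # Placeholder model URL; use your own model's Clarifai URL.
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://samples.clarifai.com/metro-north.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```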
Inference with the Clarifai SDK
You can also use the Clarifai Python SDK for inference. The model URL can be obtained from your Clarifai account.
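A rough sketch, assuming the generated model class exposes a `predict` method that accepts a `prompt` argument; check the generated `1/model.py` for the exact method names and parameters:

```python
import os

from clarifai.client import Model

# Placeholder model URL; copy the real one from your Clarifai account.
model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat=os.environ["CLARIFAI_PAT"],
)

# Assumes a text-in, text-out predict method as in the Ollama toolkit
# template; adjust to match your model's actual signature.
response = model.predict(prompt="Summarize the benefits of running models locally.")
print(response)
```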
Customizing the Ollama Model Configuration
The `clarifai model init --toolkit ollama` command generates the following model file structure:
ollama-model-upload/
├── 1/
│   └── model.py
├── config.yaml
└── requirements.txt
You can customize the generated files to adjust how your model works:
- `1/model.py` – Customize this to tailor your model's behavior, implement custom logic, or optimize performance.
- `config.yaml` – Define settings such as compute requirements, especially useful when deploying to dedicated compute with Compute Orchestration (see the illustrative sketch after this list).
- `requirements.txt` – List any Python packages your model requires.
This setup gives you full control over how your Ollama model is exposed and used via the API. Refer to the Clarifai documentation for more details.
Conclusion
Running open-source models locally with Ollama gives you full control over privacy, latency, and customization. With Clarifai Local Runners, you can expose these models via a public API without relying on centralized infrastructure. This setup makes it easy to plug local models into larger workflows or agentic systems, while keeping compute and data fully under your control. If you want to scale beyond your machine, check out Compute Orchestration to deploy models on dedicated GPU nodes.