I Tried to Build an Image Captioning App With OpenAI Codex CLI

April 21, 2025

OpenAI Codex CLI is an open-source command-line tool that brings the power of OpenAI’s latest reasoning models directly to your terminal. Think of it as a lightweight AI coding assistant that lives in your shell: it can read your code, modify files, and even execute commands in your project environment. This means you can ask it to build features, fix bugs, or explain unfamiliar code without leaving your development workflow. In short, it’s chat-driven development: you interact with Codex in natural language and it responds with code edits or command results, effectively giving you ChatGPT-level reasoning plus the ability to run code and see results in real time. Sounds intriguing, right? In this article, I’ll show you how to access it and use it for your own queries.

Key Features of OpenAI Codex CLI

OpenAI Codex CLI comes with several powerful features that make it a handy companion for developers. One of the biggest advantages of Codex CLI is that it runs entirely on your local machine. Your source code and files stay in your environment and aren’t uploaded wholesale to a cloud service. Only your prompts and high-level context (like summarized diffs or relevant snippets) are sent to the OpenAI API for generating responses. Because the CLI is open source and works locally, it gives you privacy and control by design: your workflow and code remain private. This makes Codex CLI especially appealing for codebases that you can’t or don’t want to share, while still leveraging powerful AI assistance.

By integrating directly into the terminal, Codex CLI fits naturally into a developer’s day-to-day work. You can chat with the AI assistant right next to your git commands, text editor, and build tools, which means less context-switching compared to using a separate chat interface. The tool is designed for quick iteration: ask a question or give an instruction, let it propose or apply a change, run the code, and repeat, all in one place.

Here are some of the highlights:

1. Zero-Setup Installation

Codex CLI is extremely easy to get running. All you need is Node.js and an OpenAI API key: a single command like npm install -g @openai/codex installs the CLI globally, with no other setup required. There’s no complex configuration or environment fiddling; bring your API key and it “just works”. (You can even update to the latest version at any time with a simple codex --upgrade command.)

2. Terminal-Native Design

Codex runs entirely in your terminal, so it feels like a natural extension of your shell environment. You can invoke it from your project directory and have it interact with your local files and tools. This terminal-native approach means you don’t have to switch to a browser or GUI, which is ideal for maintaining flow and context while coding. The CLI provides an interactive chat-like interface in text, so you see the AI’s responses (like code diffs or command outputs) right in the console.

3. Multimodal Inputs

Unlike plain text-only tools, Codex CLI accepts multimodal inputs: you can pass not just text prompts, but also images such as screenshots or diagrams to guide the assistant. For example, you can drag a screenshot of an error message or a UI sketch into the terminal, and Codex can interpret it and act on it. This is a distinctive capability that lets the AI use visual information to generate or edit code accordingly. Under the hood, it uses vision-enabled models to understand images, enabling use cases like debugging from a screenshot of a stack trace or building a layout from a wireframe.

4. Rich Approvals Workflow

Codex CLI gives you fine-grained control over what it can do autonomously through a rich approval system. You can choose between three modes (Suggest, Auto Edit, Full Auto) that determine whether the AI’s proposed code changes or commands are auto-executed or require your confirmation. This flexible workflow lets you decide how hands-on you want to be: you can start conservatively (manual approvals for everything) and dial up to full automation for repetitive tasks. We’ll dive deeper into these modes in the next section, but the key point is that Codex won’t make changes you’re uncomfortable with; you’re always in charge of approvals.

5. Local Execution and Privacy

All code execution and file editing happens on your machine, within your project’s environment. Aside from the model queries, nothing is sent out; the CLI doesn’t upload your codebase to OpenAI. This keeps your code largely private: only the prompts and context needed for each model query leave your machine. You can use Codex CLI on proprietary or sensitive code knowing that the tool isn’t uploading your repository wholesale. Even when using the most autonomous mode, Codex runs in a sandboxed environment with no network access, ensuring any actions it takes stay local to your system. In short, you get the benefits of an AI pair programmer without giving up privacy or security.

Codex CLI Modes that You Should Know

GitHub Link: openai/codex

A standout feature of Codex CLI is its approval workflow: essentially, you decide how much freedom the AI has to make changes or run commands. There are three approval modes: Suggest, Auto Edit, and Full Auto. Each mode strikes a different balance between automation and user oversight, so you can choose what fits your comfort level for the task at hand. Here’s an overview of how they work:

1. Suggest Mode (Default)

This is the most conservative mode, ideal for when you want to carefully review everything. The AI can read your project files and suggest code edits or terminal commands, but it won’t apply changes or execute anything without your explicit approval. Essentially, Codex will interact with you like an expert advisor: it might propose a patch diff for a bug fix or suggest a shell command to run tests, and then ask for your confirmation. Use Suggest mode for safe exploration, e.g. learning a new codebase or doing a code review, where you want to see recommendations but apply them manually.

2. Auto Edit Mode

In Auto Edit, Codex is allowed to automatically apply code changes (it can edit and write files on its own) but still must ask before running any shell commands. This mode is great for tasks like refactoring or making repetitive edits across a codebase. You get the efficiency of the AI directly modifying code for you, while keeping a checkpoint of control before any program execution. For example, Codex might rewrite a function in multiple files and save the changes directly, but if it wants to run your test suite or start the dev server, it’ll pause and ask for your go-ahead. Auto Edit mode is a balance: faster coding iterations, yet you still supervise side effects like commands.

3. Full Auto Mode

Full Auto gives the AI the most autonomy. Codex can read and write files and also execute shell commands on its own without stopping for approval. In this mode, it becomes a truly automated agent: you can ask it to perform a complex task and then sit back while it works through the steps. To keep things safe, Full Auto runs in a restricted sandbox: all commands are executed with network access disabled and scoped to your project directory (it can’t wander outside or access the internet). This mode is ideal for longer tasks where you trust the AI to iterate, for instance, fixing a broken build or prototyping a new feature while you take a short break. Of course, you should use Full Auto with caution: it’s powerful, but you’ll want to make sure you’ve backed up or version-controlled your code (the CLI will actually warn you if you’re not in a git repo when starting Auto Edit or Full Auto).

Comparison of Modes

The differences between the three modes are summarized in the table below, along with typical use cases for each:

Mode	What the Agent Can Do	When to Use (Use Cases)
Suggest (default)	Read any files in your repo; propose edits and shell commands (your approval is required to apply or execute them)	Safe exploration of codebases, code reviews, and learning a new project’s structure, where you want full control over changes.
Auto Edit	Read and modify files (edits are applied automatically); propose shell commands (execution still requires approval)	Refactoring code or making bulk edits while keeping an eye on side effects. Great for repetitive changes where manual file editing is tedious but you still want to approve any commands.
Full Auto	Read, write, and execute commands autonomously (all actions auto-approved); runs in a sandbox (no network, confined to the project directory)	Large or time-consuming tasks like fixing all tests in a broken build or scaffolding a new app from scratch. Useful when you want to delegate execution entirely to the AI (e.g. rapid prototyping).

In practice, you can choose the mode that makes sense for your situation. By default, if you just run codex it starts in Suggest mode. To explicitly choose a mode, you can launch the CLI with a flag: for example, use --auto-edit or --full-auto to start in those modes. There’s also an interactive command (/mode) to toggle modes during a session. This way, you might begin in Suggest mode to see what Codex plans to do, then switch to Auto Edit once you’re comfortable with its suggestions, and maybe kick into Full Auto for the final stretch of a task. The important thing is that you control the level of autonomy at all times.
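As a rough illustration of the launch flags described in this section, the sketch below wraps them in a small shell helper. The function name codex_mode_flag is hypothetical, and the flags are only the ones named in this article; it is worth checking codex --help on your installed version, since flag names may differ between releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a mode name to the launch flag named in this article.
# Suggest is the default mode, so it needs no flag (prints an empty line).
codex_mode_flag() {
  case "$1" in
    suggest)   printf '%s\n' "" ;;
    auto-edit) printf '%s\n' "--auto-edit" ;;
    full-auto) printf '%s\n' "--full-auto" ;;
    *)         printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example (assumes codex is installed and on PATH):
#   codex $(codex_mode_flag auto-edit)
```

Keeping the mapping in one place makes it easy to start conservatively and change a single word when you are ready to grant more autonomy.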

System Requirements for Codex CLI

Before installing Codex CLI, make sure your development environment meets the minimum requirements. The tool is cross-platform, but currently works best on Unix-like systems. Here are the minimum and recommended specs:

Requirement	Minimum	Recommended
Operating System	macOS 12+ or Ubuntu 20.04+/Debian 10+ (Linux); Windows 11 via WSL2	Latest OS updates (latest macOS or LTS Linux release; Windows with latest WSL2) for best compatibility.
Node.js	22 (or newer)	Latest LTS version of Node.js (>= 22) for stability.
Git (optional)	2.23+ (if using version control features)	Latest Git available (optional, but recommended for full functionality like PR helpers).
Memory (RAM)	4 GB minimum	8 GB or more (for smoother performance on large tasks).

Codex CLI has been tested on macOS and Linux. Windows users can run it via WSL2 (Windows Subsystem for Linux) since native Windows support is still experimental. You’ll also need an OpenAI API key (from your OpenAI account) to authenticate the CLI; we’ll cover that next. Aside from these, no other special hardware is required; if you can run modern Node.js, you’re likely good to go.

Note: It’s recommended to have your project under source control (git) when using Codex CLI, especially for the Auto modes. While Git isn’t strictly required to run the CLI, having version control will allow you to easily review changes and roll back if needed. In fact, Codex will remind you with a warning if you try to use Auto Edit or Full Auto in a directory that’s not a git repo.
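If you would like to run the same kind of check yourself before launching an Auto mode, a minimal sketch follows. The function name in_git_repo is hypothetical; it relies only on the standard git rev-parse --is-inside-work-tree command and is not how Codex itself performs its check.

```bash
#!/usr/bin/env bash
# Return 0 if the given directory (default: current directory) is inside a git work tree.
in_git_repo() {
  [ "$(git -C "${1:-.}" rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]
}

# Warn (on stderr) when the current directory is not under version control.
if ! in_git_repo; then
  echo "Warning: not a git repository; consider running 'git init' first." >&2
fi
```

Running git init and making an initial commit before handing control to the agent gives you a clean baseline to diff against or roll back to.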

How to Use OpenAI Codex CLI?

Step 1: Install Node.js

Download Node.js v22+ from nodejs.org.
Install using default settings.
Verify installation:

bash
node --version  # Should show v22+
npm --version   # Should show v10+
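To turn the version check above into something a setup script can act on, here is a minimal sketch. The function name node_version_ok is hypothetical, and the default minimum of 22 is taken from the requirements listed in this article.

```bash
#!/usr/bin/env bash
# Return 0 if a version string such as "v22.3.0" has a major version >= the minimum.
# $1: version string as printed by `node --version`; $2: minimum major (default 22).
node_version_ok() {
  local version="${1#v}"        # strip the leading "v"
  local major="${version%%.*}"  # keep everything before the first dot
  local min="${2:-22}"
  # A non-numeric major makes the comparison fail, so the function returns non-zero.
  [ "$major" -ge "$min" ] 2>/dev/null
}

# Example (assumes node is on PATH):
#   node_version_ok "$(node --version)" || echo "Node.js 22+ is required" >&2
```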

Step 2: Install Codex CLI

bash
npm install -g @openai/codex

Troubleshooting: If you see permission denied errors:
- Windows: Run PowerShell as Administrator.
- Linux/macOS: Use sudo npm install -g @openai/codex (not recommended; fix npm permissions instead).

Step 3: Set OpenAI API Key

For PowerShell (Windows):

PowerShell

$env:OPENAI_API_KEY = "your-api-key-here"

To make it permanent (takes effect in new terminal sessions):

PowerShell

setx OPENAI_API_KEY "your-api-key-here"

For Git Bash/MINGW64:

bash
export OPENAI_API_KEY="your-api-key-here"

To make it permanent, add it to ~/.bash_profile:

bash
nano ~/.bash_profile  # Add "export OPENAI_API_KEY=..."
source ~/.bash_profile
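Before moving on, it can help to confirm that the key is actually visible to your shell. The sketch below is one way to do that in bash; the function name require_api_key is hypothetical, and it only checks that the variable is non-empty, not that the key is valid.

```bash
#!/usr/bin/env bash
# Return 0 if OPENAI_API_KEY is set to a non-empty value; otherwise explain and fail.
require_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it before running codex." >&2
    return 1
  fi
}

# Example (assumes codex is installed and on PATH):
#   require_api_key && codex
```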

Step 4: Fix “sh.exe” Errors (Windows Only)

Install Git for Windows from git-scm.com.
During installation:
- Select “Use Git and Unix tools in the Command Prompt”.
- Enable “Enable symbolic links”.
Restart your terminal.

Step 5: Run Codex

Interactive Mode

Run interactively:

codex

Hands-on: Using OpenAI Codex CLI to Build a Game and an Image Captioning App

Task 1: Basic Prompt Execution

I started with a simple task: asking Codex to write 2–3 sentences about myself. The CLI responded quickly and accurately, producing coherent, grammatically sound output in just seconds. It demonstrated strong prompt understanding and fluency, even with minimal input.

Task 2: Image Captioning App with OpenAI Model

Next, I tried building a more complex application: an image captioning tool where users upload an image and receive a descriptive caption generated by an OpenAI model. While Codex provided a decent starting point, the code was outdated: it referenced deprecated code and was missing key components for file handling and model integration. I had to step in and update the code myself. (I’ve included a screenshot for reference.) This highlighted a limitation: for newer or less-documented APIs, Codex might fall back on older patterns or incomplete implementations.

Error with Codex CLI

Task 3: Tetris Game with Python and Pygame

Output

For the final task, I asked Codex to build a Tetris game using Python and Pygame. This time, it nailed it. The code was well-structured, fully functional, and required no major edits. The game ran smoothly and included all the core mechanics: block movement, rotation, line clearing, and scoring. It was a solid demonstration of Codex’s ability to handle interactive, graphics-based projects when working with well-established libraries like Pygame.

Use Cases for Codex CLI

Codex CLI can supercharge your development workflow across several common tasks:

Bug Fixing: When you hit a bug or failing test, use Suggest mode to ask things like “Why is the login function throwing an error?” Codex analyzes the code, spots issues (like a wrong variable or missing check), and suggests fixes. You review and approve the patch. For trickier issues, Full Auto mode lets Codex fix multiple failures by iteratively running tests and applying changes. You still verify the results, but it handles the heavy lifting.
Code Refactoring: Refactoring across files, like switching from callbacks to async/await, can be tedious. In Auto Edit mode, Codex can apply consistent changes throughout your codebase. For example, say “Refactor the API routes to async/await,” and it’ll handle the file edits, pausing only if needed. You supervise the changes via diffs, letting Codex do the grunt work while you oversee quality.
Learning a New Codebase: Just cloned a repo? Use Suggest mode to ask, “What does the Scheduler class do?” or “How does authentication work?” Codex reads the code and explains it in plain language, helping you navigate unfamiliar projects quickly. You can request summaries, understand module responsibilities, and explore functionality without making changes.
Prototyping and Scaffolding: Want to kickstart a new project or feature? Full Auto mode can generate code and set everything up. Ask it to “Create a simple TODO web app in Flask,” and it’ll generate files, install dependencies, and run the app automatically. For new features like “Add CSV export to this CLI tool,” Codex writes and integrates the code, giving you a working baseline to build on.

Codex CLI acts like an AI pair programmer, helping with everything from mundane edits to complex automation. You control how hands-on or autonomous it is, depending on the task.

Conclusion

With the OpenAI Codex CLI, developers gain a friendly AI partner right in the terminal, one that can reason about code and handle the mechanics of editing and running it. I’ve covered what Codex CLI is and how it works, from its zero-effort installation to the clever approval modes that keep you in control. You’ve seen how to get started and run some basic commands, and how it can help in real-world use cases like fixing bugs, refactoring, learning codebases, and prototyping new ideas. In essence, Codex CLI brings the ChatGPT experience into your development environment, turning natural language instructions into working code, all while you remain in charge. It’s an exciting tool that points toward the future of AI-assisted software development: fast, flexible, and built with developer empowerment in mind. Give it a try on your next project!

Hi, I’m Pankaj Singh Negi – Senior Content Editor | Passionate about storytelling and crafting compelling narratives that transform ideas into impactful content. I love reading about technology revolutionizing our lifestyle.
