Published on: February 16, 2026
Some time ago, my app crashed mid-workout at the gym. I uploaded the crash report, gave my AI agent some context, and went back to my set. By the time I finished, there was a pull request waiting for me. I reviewed it, merged it, and had a fixed TestFlight build on my device shortly after, without ever opening Xcode.
That kind of turnaround is only possible because of the delivery pipeline I’ve built around agentic engineering. And that’s what this post is about. Know that this post doesn’t introduce anything revolutionary in terms of how I work. But it’s a setup that works well for me, and I think that right now it’s important for folks to get some insight into what others are doing instead of seeing yet another “I SHIP TONS OF AI CODE” post.
I’m hoping to be a little more balanced than that…
Agentic engineering (aka vibe coding) is growing in popularity by the day. More and more developers are letting AI agents handle large parts of their iOS projects, and honestly, I get it. It’s incredibly productive. But it comes with a real risk: when you hand off the coding to an agent, quality and architecture can degrade fast if you don’t have the right guardrails in place.
In this post, I want to walk you through the pipeline I use to make sure that even though I do agentic engineering, my product quality stays solid (yes, it involves me reading the code and sometimes tweaking by hand).
We’ll cover setting up your local environment, why planning mode matters, automated PR reviews with Cursor’s BugBot, running CI builds and tests with Bitrise, and the magic of having TestFlight builds land on your device almost immediately after merging.
If you’re interested in my broader thoughts on balancing AI and quality, you might enjoy my earlier post on the importance of human touch in AI-driven development.
Setting up your local environment for agentic engineering
Everything starts locally. Before you even think about CI/CD or automated reviews, you need to make sure that your AI agent knows how to write code the way you want it written. The most important tool for this is an agents.md file (or your editor’s equivalent, like Cursor rules).
Think of agents.md as a coding standards document for your agent. It tells the agent what language features to prefer, how to structure code, and what conventions to follow. Here’s an example of what mine looks like for an iOS project:
## Swift code conventions
- Use 2-space indentation
- Prefer SwiftUI over UIKit unless explicitly targeting UIKit
- Target iOS 26 and Swift 6.2
- Use async/await over completion handlers
- Prefer structured concurrency over unstructured tasks
## Architecture
- Use MVVM with Observable for view models
- Keep views thin; move logic into view models or dedicated services
- Never put networking code directly in a view
## Testing
- Write tests for all new logic using Swift Testing
- Run tests before creating a pull request
- Prefer testing behavior over implementation details
Add this file to the root of your Xcode project, and Xcode 26.3’s agent will pick up your rules too.
This file is just a starting point. The thing is, your agents.md is a living document. Every time the agent does something you don’t like, you add a rule. Every time you find a pattern that works well, you codify it. I update mine constantly.
For example, I might notice my agent creating new networking helper classes instead of using the APIClient I already had. So I can add a rule: “Always use the existing APIClient for network requests. Never create new networking helpers.” From that moment on, the agent should honor my preferences and use existing code instead of adding new code.
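To make that rule a bit more concrete, here is a minimal sketch of what “use the existing APIClient” can look like in code. Everything in it is an assumption for illustration: the shape of the APIClient protocol, StubAPIClient, StubError, Workout, WorkoutService, and the "/workouts/recent" path are hypothetical and not taken from a real project. The only point is that new code receives the shared client instead of building its own networking helper.

```swift
import Foundation

// Hypothetical stand-in for the networking layer the app already has.
protocol APIClient: Sendable {
  func get(_ path: String) async throws -> Data
}

// Illustrative error type used by the stub below.
enum StubError: Error {
  case noResponse(String)
}

// A stub client so the sketch runs without touching the network.
struct StubAPIClient: APIClient {
  let responses: [String: Data]

  func get(_ path: String) async throws -> Data {
    guard let data = responses[path] else {
      throw StubError.noResponse(path)
    }
    return data
  }
}

struct Workout: Decodable, Equatable {
  let id: Int
  let name: String
}

// New code receives the shared client; it does not create its own
// URLSession wrapper or networking helper.
struct WorkoutService {
  let client: any APIClient

  func recentWorkouts() async throws -> [Workout] {
    let data = try await client.get("/workouts/recent")
    return try JSONDecoder().decode([Workout].self, from: data)
  }
}
```

Injecting the client like this also makes it easy for the agent (and for you) to swap in a stub when writing tests.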
Beyond rules, you can also equip your agent with skills. A skill is a standalone Markdown file that teaches the agent about a specific topic in depth. Where agents.md sets broad rules and conventions, a skill usually contains detailed patterns for how to structure things like SwiftUI navigation, handle Swift Concurrency safely, or work with Core Data. Xcode 26.3 even has an MCP (you can roughly think of that as a predecessor of skills) that can help agents find documentation, best practices, and more.
Your local environment is the foundation. Everything that comes after (PR reviews, CI, TestFlight) depends on the agent producing reasonable code in the first place.
Planning before building
This is the step that, in my opinion, carries a ton of value but is easy to skip.
If you use Cursor (or a similar tool), you probably have access to a planning mode. Instead of letting the agent jump straight into writing code, you ask it to make a plan first. The agent outlines what it intends to do (which files it will change, what approach it will take, what tradeoffs it’s considering), and you review that plan before giving the green light.
The difference between “fire off a prompt and hope for the best” and “review a plan, then execute” is huge. When you review the plan, you catch bad architectural decisions before they become bad code. You can steer the agent toward the right approach without having to undo a bunch of work.
Planning can also make it more obvious when the agent has misunderstood you. For example, if your prompt doesn’t resolve all ambiguity up front, the agent might confidently assume you meant one thing while you meant another. A funny example is “persist this data on device”, where the agent assumes “write to UserDefaults” when you meant “create SwiftData models”. You can often catch these things in planning mode and fix the agent’s trajectory.
In practice, my workflow looks like this: I describe what I want in planning mode, the agent proposes an approach, I give feedback or approve, and only then does the agent switch to implementation. Going through planning first can feel slow, but I usually find that it makes the output so much better that it’s 100% worth it.
For example, when I wanted to add a streaks feature to Maxine, the agent proposed creating an entirely new data model and view model from scratch. In the plan review, I noticed it was going to duplicate logic I already had in my workout history queries. I steered it toward reusing that existing data layer, and the result was cleaner and more maintainable. Without the planning step, I would have ended up with redundant code that I’d have to clean up later.
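As a rough illustration of what “reuse the existing data layer” can mean for a feature like streaks, here is a small sketch that derives a streak from workout history instead of storing a separate streak model. This is not Maxine’s actual code: WorkoutRecord and currentStreak(from:today:calendar:) are hypothetical names, and the sketch assumes the history query already returns one record per completed workout.

```swift
import Foundation

// Hypothetical: the kind of record an existing workout history query returns.
struct WorkoutRecord {
  let completedAt: Date
}

// Derives the current streak from existing history, so no separate
// streak model has to be stored or kept in sync.
func currentStreak(
  from history: [WorkoutRecord],
  today: Date,
  calendar: Calendar
) -> Int {
  // Collapse all workouts to the calendar day they happened on.
  let workoutDays = Set(history.map { calendar.startOfDay(for: $0.completedAt) })

  var day = calendar.startOfDay(for: today)
  var streak = 0

  // Walk backwards one day at a time until a day without a workout.
  while workoutDays.contains(day) {
    streak += 1
    guard let previous = calendar.date(byAdding: .day, value: -1, to: day) else {
      break
    }
    day = previous
  }
  return streak
}
```

Passing the calendar and the current date in as parameters keeps the function deterministic, which makes it straightforward to cover with tests.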
Automated PR reviews with BugBot
Once the agent has written code and I’ve done a quick check to review the changes, I run the code on my device to make sure things look and feel right. Once I sign off, the agent can make a PR on my repo. If the agent is running in the cloud, I skip this step entirely and the agent will make a PR immediately when it thinks it’s done.
This is where BugBot comes in. BugBot is part of Cursor’s ecosystem and it automatically reviews your pull requests. It looks for logic issues, edge cases, and unintended changes that I might miss during a quick scan. It can even push fixes directly to the PR branch.
BugBot has been invaluable in my process because even though I do my own PR review, the whole point of agentic engineering is to let the agent handle as much as possible. My goal is to kick off a prompt, quickly eyeball the result, run it on my device, and move on. BugBot acts as an automated safety net that catches what I might not.
Let me give you two examples from Maxine. The first is about edge cases. Maxine recovers your workout if the app crashes. BugBot flagged that there was a possible scenario where, if the user tapped “start workout” before the recovery completed, the app would attempt to start a Watch workout twice. Honestly, I considered this scenario nearly impossible in practice, but the code allowed it. Instead of relying on what I couldn’t realistically test, BugBot added safeguards to make sure this path was handled properly. That’s exactly the kind of thing I’d never catch during a quick eyeball review.
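To show the general shape of such a safeguard, here is a minimal sketch of a guard against starting a workout twice. It is not the fix BugBot wrote and not Maxine’s code: WorkoutSessionCoordinator, its states, and watchStartCount (a stand-in for the real call to the Watch) are all hypothetical.

```swift
// Hypothetical coordinator; names are illustrative and not taken from Maxine.
final class WorkoutSessionCoordinator {
  enum State: Equatable {
    case idle
    case recovering
    case active
  }

  private(set) var state: State = .idle
  // Counts how often a Watch workout was started (stand-in for the real call).
  private(set) var watchStartCount = 0

  // Called on launch when a crashed workout might need to be restored.
  func beginRecovery() {
    guard state == .idle else { return }
    state = .recovering
  }

  // Called when recovery finishes, with or without a workout to resume.
  func finishRecovery(foundWorkout: Bool) {
    guard state == .recovering else { return }
    if foundWorkout {
      startWatchWorkout()
    } else {
      state = .idle
    }
  }

  // Returns false when the tap is ignored because recovery is still
  // running or a workout is already active.
  @discardableResult
  func userTappedStart() -> Bool {
    guard state == .idle else { return false }
    startWatchWorkout()
    return true
  }

  private func startWatchWorkout() {
    state = .active
    watchStartCount += 1
  }
}
```

Modeling the flow as explicit states means the “tap during recovery” path is handled by construction, even if it is nearly impossible to trigger by hand.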
The second is about unintended changes. I once had a PR where I had left behind a few orphaned debugging properties. BugBot flagged them as “probably not part of this change”: the PR description the agent had written didn’t mention them (because I did the debugging myself), and no code actually referenced these properties. BugBot removed them. Small thing, but it’s the kind of cleanup that keeps your codebase tidy when you’re moving fast and reviewing quickly.
Running builds and tests with Bitrise
Even though the agent runs tests locally before I ever see the code, I want a second layer of confidence. That’s where CI comes in. I use Bitrise for this, but the same workflow concepts apply to Xcode Cloud, GitHub Actions, or any CI provider that can run xcodebuild.
This step is even more important for my cloud-based agents because those don’t get access to xcodebuild at all.
I have two Bitrise workflows set up for my projects, each triggered by different events.
The test workflow (runs on every PR)
The first workflow is a test-only pipeline that triggers whenever a pull request is opened or updated. The steps are minimal:
- Clone the repository
- Resolve Swift packages
- Run the test suite with xcodebuild test
That’s it. No archiving, no signing, no uploading. The only job of this workflow is to answer one question: do the tests still pass? If something the agent wrote (or something BugBot fixed) breaks a test, I know before I merge. And I can tell an agent to go fix whatever Bitrise reported.
I set this up as a trigger on pull requests targeting my main branch. Bitrise picks up the PR automatically, runs the workflow, and reports the result back as a GitHub status check. If it’s red, I don’t merge.
The release workflow (runs on merge to main)
The second workflow triggers when something is pushed to main, which in practice means when a PR is merged. This one does significantly more:
- Clone the repository
- Resolve Swift packages
- Run the full test suite
- Archive the app with release signing
- Upload the build to App Store Connect
The test step might feel redundant since we already tested on the PR, but I like having it here as a final safety net. Merges can occasionally introduce issues (especially if multiple PRs land close together), and I’d rather catch that before uploading a broken build.
The archive and upload steps use Bitrise’s built-in steps for Xcode archiving and App Store Connect deployment. You set up your signing certificates and provisioning profiles once in Bitrise’s code signing tab, and from that point on, every merge produces a signed build that goes straight to TestFlight.
Why tests matter even more with AI
Having a solid test suite is probably the most impactful thing you can do for agentic engineering. Your tests act as a contract. They tell the agent what correct behavior looks like, and they catch regressions in CI even if the agent’s local run somehow missed something. Better tests mean more confidence, which means you can let the agent handle more.
By the time I actually hit “merge” on a pull request, the code has been through local tests by the agent, my own quick review, BugBot’s automated review, and a green Bitrise build. That’s a lot of confidence for very little manual effort.
The magic of fast TestFlight feedback
This is where everything I’ve written about so far comes together. Because the release workflow uploads every merge to App Store Connect automatically, every single merge to main results in a TestFlight build, with no manual intervention required. You don’t open Xcode, you don’t archive locally, nothing. You merge, and a few minutes later there’s a new build in TestFlight. This closes the loop from “I had an idea” to “I have a build on my device” with minimal friction.
When you’re testing your app in the field and you notice something you want to tweak (a layout that feels off, a label that’s unclear, a flow that’s clunky), you can often just tell your agent what to fix. If the change is simple enough and you’re good at prompting and planning, you can have a new build on your device surprisingly quickly: through your local planning, through the PR, through Bitrise, and onto your device via TestFlight.
Let’s go back to the example from the intro of the post…
During one of my workouts with Maxine, the app crashed. Right there in the gym, I pulled up Cursor, uploaded the crash report that TestFlight gave me, added some context about what I was doing in the app, and kicked off a prompt. Then I just resumed my workout.
By the time I was done, there was a PR waiting for me. The fix wasn’t perfect (I had to nudge a few things), but the bulk of the work was done. I merged it, Bitrise picked it up, and I had a new TestFlight build shortly after. All while I was focused on my workout, not on debugging.
That’s what happens when every piece of the pipeline is automated. The agent writes the fix, BugBot reviews it, Bitrise tests and builds it, and TestFlight delivers it. Your job is to steer, not to crank.
Summary
Agentic engineering doesn’t mean giving up on quality. It means building the right guardrails so you can move fast without breaking things.
The pipeline I use looks like this: a well-maintained agents.md and AI skills set the foundation locally. Planning mode ensures the agent’s approach is sound before it writes a line of code. BugBot catches issues in pull requests that I might miss. Bitrise runs tests on every PR and archives plus uploads on every merge to main. And TestFlight delivers the result to my device automatically.
Each piece reinforces the others. Without good local setup, the agent writes worse code. Without planning, it makes bad architectural decisions. Without BugBot and Bitrise, bugs slip through. Without automatic TestFlight uploads, the feedback loop is too slow to be useful.
To be clear: this pipeline doesn’t catch everything. An agent can still write code that passes all tests but is architecturally questionable, and BugBot won’t always flag it. You still need to review and think critically. But the combination of all these layers seriously cuts down the risk of shipping something broken, and that’s the point. It’s about reducing risk, not eliminating it.
If you’re prototyping or just exploring an idea, you probably don’t need all of this right away. But the moment you have real users depending on your app, this kind of pipeline pays for itself. Set it up once, iterate on your agents.md as you go, and you’ll be able to move fast without sacrificing the quality your users expect.

