
From Python to Prompts: Becoming an AI-First Developer

By Sahar Carmel

Published May 25, 2025.


As part of the DevSecNext AI series, Jit hosted Sahar Carmel, Principal AI Engineer at Flare, for an inside look at what it really takes to become an “AI-first” developer. With nearly a decade of experience in AI and machine learning, Sahar was hands-on with copilots and agents long before they were mainstream. In this session, he walks through a radical shift in workflow: from writing code line by line to orchestrating prompts, tokens, and memory banks. His goal is to become not just a 10x engineer, but a 100x one.




In this guest post, Sahar shares his perspective on what it truly means to become an AI-first developer.

The Mindset Shift: From Programming to Prompting

Let me start with a bold claim: most of my code is now written by large language models.

In my daily workflow, coding means communicating with AI models in natural language, not wrangling syntax. “Programming in English” has replaced programming in Python or TypeScript. Copilots have become compilers—not deterministic ones like GCC, but probabilistic systems that require context and strategic input to generate consistent results.

This shift comes with a new skill set: understanding how large language models (LLMs) work under the hood. Knowing what a token is, and how models interpret sequences, is essential. In this post I’ll break down tokenization, context limits, and why your copilot sometimes fails at counting characters: it isn’t working with words or characters, but with tokens whose segmentation can be surprising (yes, even a double space can become its own token).
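To make that concrete, here’s a quick sketch using the open-source tiktoken tokenizer (one encoder among many; exact splits vary by model) to show what the model actually sees:

```python
# A minimal look at tokenization. Assumes the open-source tiktoken library
# (pip install tiktoken); cl100k_base is the encoding used by several
# OpenAI chat models, and other models split text differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = [
    "hello world",
    "hello  world",  # note the double space: it changes the token split
    "strawberry",    # sub-word tokens are why character counting trips models up
]

for text in samples:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```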

Context is King

At the core of effective AI development lies context management. Short-term memory—the immediate context window the model sees during a chat session—is everything. While LLMs come pre-trained on a massive corpus of internet data (long-term memory), they need precise, timely information to be effective in specific tasks.

LLMs have no memory unless I give them one. That’s not a bug—it’s leverage.

This means that every coding session is a clean slate unless I explicitly pass context. So I build memory banks: structured prompts that teach the model about the codebase, tech stack, naming conventions, product goals, and what I’m currently working on.

This typically includes:

  • Product and technical context

  • Design patterns

  • Active feature work

  • Progress summaries

  • System messages and instructions
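Here’s a minimal sketch of how I wire that up; the file names and layout are my own convention, not a standard:

```python
# A memory bank as a folder of markdown files that gets prepended to every
# new session. File names here are illustrative; adapt them to your project.
from pathlib import Path

MEMORY_BANK = Path("memory-bank")
FILES = [
    "product-context.md",   # what the product does and why
    "tech-context.md",      # stack and versions (e.g. "we are on React 18")
    "design-patterns.md",   # conventions the codebase follows
    "active-feature.md",    # what I'm working on right now
    "progress.md",          # summary of what's done and what's next
]

def build_context() -> str:
    """Concatenate the memory bank into one block for the system prompt."""
    sections = []
    for name in FILES:
        path = MEMORY_BANK / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(sections)

# Seed a fresh session: the model starts every chat with zero memory, so
# this block is what turns a blank slate into an onboarded teammate.
system_prompt = "You are my pair programmer.\n\n" + build_context()
```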

With tools like Cline (an open-source coding agent) and Claude’s Artifacts, I build persistent knowledge structures that copilots can reference across sessions. This solves a common frustration: starting a new chat and having to re-teach the assistant everything. Instead, the memory bank acts like a persistent onboarding doc, shared across tools, teams, and time.

This short-term memory isn’t a nice-to-have—it’s my working memory, my local cache. Without it, I’d waste time re-explaining React 18 to a model stuck in 17. Once seeded properly, my copilot doesn’t feel like a chatbot—it feels like a teammate who’s been onboarded.

And yes, context windows are limited. Around 200K tokens on high-end models, and less on others. When a session loops or stalls, I don’t fight it—I open a new one, reload the memory, and keep going.

Parallelism and Elasticity

One of the biggest advantages in my workflow is the ability to run multiple LLM agents in parallel. While I focus on a core feature, copilots handle documentation, refactors, CI/CD config, or even task decomposition. I can explore several solutions in parallel and choose the best—something no human can do at scale.

If I’m stuck waiting for an agent, I launch another. We can now develop in parallel. Humans are lazy. LLMs are obedient.
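Mechanically, there’s nothing exotic about fanning out. A sketch with the official openai Python client (my assumption here; any provider SDK works the same way) looks like this:

```python
# Fan several side tasks out to an LLM concurrently while I work on the
# core feature. Assumes `pip install openai` and OPENAI_API_KEY in the
# environment; the model name is illustrative.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def run_agent(task: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    tasks = [
        "Draft docstrings for the module pasted below: ...",
        "Propose a refactor plan for the attached function: ...",
        "Write a CI job that runs lint and tests on every PR.",
    ]
    # All three prompts run concurrently; I pick the results I like.
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    for task, result in zip(tasks, results):
        print(f"--- {task[:40]} ---\n{result}\n")

asyncio.run(main())
```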

I also encourage developers to stop limiting themselves to languages they already know. With copilots, switching to a new language or framework is as simple as describing the task clearly. In my view, AI-first engineers don’t specialize by language—they specialize in clarity of thought.

Choosing the Right Models: Thinkers vs. Executors

Not all LLMs are created equal. I divide them into two types:

  • Thinking models like Claude 3 or GPT-4, which are better at high-level planning and reasoning

  • Execution models like GPT-3.5, which are faster and cheaper for generating actual code

I often start with planning: diagrams, architecture design, or exploratory work. For that I use Claude or GPT-4 in “Ask” mode, because they tend to be slower, smarter, and more context-aware. Then, when I’m ready to build, I switch to GPT-3.5 in “Act” mode to execute; it’s fast, cheap, and surprisingly capable if you feed it the right input.
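Stripped down to code, the thinker/executor split looks something like this; the model names and prompts are illustrative, not a prescription:

```python
# Two-phase workflow: a "thinking" model plans, a cheaper "execution"
# model writes the code. Assumes the openai client again.
from openai import OpenAI

client = OpenAI()

THINKER = "gpt-4"           # slower, smarter: plans and reasons
EXECUTOR = "gpt-3.5-turbo"  # fast and cheap: generates the actual code

def plan(task: str) -> str:
    resp = client.chat.completions.create(
        model=THINKER,
        messages=[
            {"role": "system", "content": "Produce a step-by-step implementation plan. No code."},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

def execute(task: str, steps: str) -> str:
    resp = client.chat.completions.create(
        model=EXECUTOR,
        messages=[
            {"role": "system", "content": "Write the code for the given plan. Code only."},
            {"role": "user", "content": f"Task: {task}\n\nPlan:\n{steps}"},
        ],
    )
    return resp.choices[0].message.content

task = "Add cursor-based pagination to the /users endpoint"
print(execute(task, plan(task)))
```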

Tools like Cursor and Cline let me fluidly switch between these modes.

Tools of the Trade

I prefer tools that expose what’s happening under the hood. I want to know exactly what’s in the prompt, what’s getting dropped, and when memory is getting compressed. That transparency matters.

I use a mix of tools, depending on the task:

  • Cline: an open-source agent that plugs directly into my API keys, offering full transparency and control

  • Claude: best for planning, diagrams, and in-depth reasoning

  • Copilot: occasionally useful, though limited by rate caps and less control

  • Cursor: solid for integrated IDE workflows, but lacks transparency around context optimization

Supercharging with MCP (Model Context Protocol)

One of the most powerful things I’ve added to my workflow is MCP servers. These let agents pull in real-world context: GitHub PR comments, browser states, package documentation, Notion pages, Postgres schemas, you name it.

Once you wire this up, LLMs stop being assistants and start acting like independent contributors. You’re not just building with code—you’re building with context-aware agents that can act on your behalf.

Now my agents can:

  • QA frontend apps by interacting with the browser

  • Resolve PR comments using GitHub metadata

  • Load and follow real-time docs for any library

  • Even use my authenticated browser session to take actions (yes, it shops for me)

One of my go-to MCP servers is Context7, which fetches up-to-date documentation for packages like uv (which I prefer over pip).
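To show how small the server side of this can be, here’s a minimal custom MCP server using the official Python SDK (the tool below is a toy stand-in for illustration, not how Context7 is built):

```python
# A minimal MCP server sketch using the official Python SDK (pip install mcp).
# It exposes one toy tool an agent can call; real servers like Context7
# expose documentation lookup through the same mechanism.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-context")

@mcp.tool()
def read_progress_summary() -> str:
    """Return the current progress summary from the memory bank."""
    path = Path("memory-bank/progress.md")
    return path.read_text() if path.exists() else "No progress file yet."

if __name__ == "__main__":
    # Runs over stdio by default, so a copilot (Cline, Claude, Cursor)
    # can connect to it as a local MCP server.
    mcp.run()
```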

Building with and for Agents

At some point, I realized the off-the-shelf copilots weren’t enough. So I started building my own: a prompt-engineered companion shaped by my patterns, my projects, my workflows.

It’s not complicated—just structured prompts, a consistent memory bank, and a few open-source hooks. But the performance difference is wild. These agents now feel like pairing with someone who’s been on my team for years.

That’s where I think this is heading. Everyone will have their own stack of copilots tuned to their domain, language, and style.

Final Thoughts

The shift to AI-first development isn’t theoretical anymore—it’s tactical, it’s personal, and it’s already reshaping how I build software every day. Once you stop treating copilots as tools and start treating them as teammates, everything changes. The bottlenecks aren’t in the code—they’re in how we prompt, how we manage context, and how we scale ourselves. The future isn’t about writing better code—it’s about designing better systems of thought. And if you’re willing to make that leap, you won’t just move faster. You’ll build in ways you didn’t think were possible.