
When should you fire your AI Coding Assistant?

Coding Assistants in Action: WindSurf, Cursor and Roo’s Approach to Complex Feature Implementation …and how Generative AI changes Software Engineering

Diagram explaining when to use AI coding assistants, direct LLM usage, or manual coding. Highlights how AI handles routine coding while the profession splits between system thinkers and pure coders, widening the skill gap.

In this Part 2 of our “What it takes to make AI Native Analytics work in the real world” series (Part 1 here), we’ll take an in-depth look at how LLMs are changing software engineers’ daily work. We’ll explore the practical differences between WindSurf, Cursor and Roo, identifying when each tool excels, when direct LLM interaction proves more effective, and when traditional manual coding remains the superior choice.

The section titled “When to Fire Your AI Assistant: The Hidden Architecture of Cursor and Windsurf” was originally published by Avigad Oron, Ask-Y’s Head of Technology here. Please follow Avigad if you’d like to get notified about his next posts.

Software Engineering represents one of the domains most profoundly transformed by LLMs, and we believe Analytics will follow a similar trajectory. This article investigates how generative AI is reshaping Software Engineering practices, setting the stage for our next piece where we’ll extrapolate these lessons to predict the coming disruption in Analytics workflows.

Summary graphic showing when to use AI coding assistants, direct LLM usage, or manual coding, along with notes on how AI is changing the coding profession and widening the skills gap.

In software development, a coding task often arrives with predefined parameters established by product managers and architects. This includes a relatively clear scope of work, product requirements, system architecture design, and specifications for how the component will function and interface with other systems.

Before coding begins, ambiguities are usually researched and resolved through experiments, planning sessions and documentation. The coding process itself is essentially the technical implementation phase where developers transform these predetermined specifications into functional code.

LLMs excel at coding tasks by leveraging more than simple pattern matching. Recent interpretability research shows these models utilize sophisticated computational circuits with parallel processing pathways.

Diagram showing how large language models process input code patterns through interconnected circuits of syntax, logic, state, and flow to generate new code, highlighting semantic relationships and parallel processing.
Diagram explaining how LLMs transform code using pattern recognition, improved naming, and two processing levels—syntax handling structure and semantics interpreting meaning.

Their architecture allows them to represent programming concepts at multiple levels of abstraction simultaneously—from syntax patterns to semantic relationships—while planning several tokens ahead.

This structured nature of software development, with its defined syntax and established patterns, maps well to how LLMs internally process information. It enables them to generate coherent code by activating interconnected feature networks representing programming knowledge, and they even demonstrate the ability to maintain consistent variable naming and logical flow across functions.

The structured approach outlined in these steps creates an ideal environment for LLMs to excel at code development:

Flowchart showing a five-step structured approach for code development: analyzing existing code, designing new components, managing dependencies and UI, improving state management and security, and finally testing, optimizing, and deploying.

Index and analyze existing code: LLMs excel when they can reference existing patterns in codebases. This provides concrete examples of integration points, naming conventions, and architecture decisions they can follow rather than inventing solutions.

Design and build new components: Component-based architectures (React, Angular, microservices) give LLMs clear boundaries to work within. When building a speech transcription component, for example, the LLM can focus on specific inputs/outputs without reengineering the entire system (a sketch of such a component’s contract follows after this list).

Manage dependencies and implement UI changes: Package managers, import systems, and declarative UI frameworks create predictable patterns LLMs can recognize and implement. They can add speech input controls following existing UI conventions without reinventing interface design principles.

Enhance state management and security: Established patterns like Redux, Context API, or authentication libraries provide templates LLMs can apply, reducing ambiguity around how to maintain application state during speech processing or secure transcribed content.

Test, optimize, and deploy: Automated testing frameworks and CI/CD pipelines give LLMs immediate feedback on code correctness, allowing for systematic improvement through clear signals.
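
To make the component-boundary point concrete, here is a minimal sketch of what a speech transcription component’s contract might look like in a React/TypeScript codebase. The component name, props, and use of the browser Web Speech API are illustrative assumptions, not code from the article:

```typescript
import React, { useRef, useState } from "react";

// Hypothetical contract for a speech transcription component.
// The narrow props interface is what gives an LLM clear boundaries:
// it only needs to produce text and report errors, not reengineer the app.
interface SpeechTranscriberProps {
  language: string;                      // e.g. "en-US"
  onTranscript: (text: string) => void;  // the consumer decides what to do with the text
  onError?: (error: Error) => void;
}

export function SpeechTranscriber({ language, onTranscript, onError }: SpeechTranscriberProps) {
  const [listening, setListening] = useState(false);
  const recognitionRef = useRef<any>(null);

  const start = () => {
    // Browser Web Speech API; a production version would feature-detect more carefully.
    const SpeechRecognition =
      (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    if (!SpeechRecognition) {
      onError?.(new Error("Speech recognition is not supported in this browser"));
      return;
    }
    const recognition = new SpeechRecognition();
    recognition.lang = language;
    recognition.onresult = (event: any) =>
      onTranscript(event.results[event.results.length - 1][0].transcript);
    recognition.onerror = (event: any) => onError?.(new Error(String(event.error)));
    recognition.onend = () => setListening(false);
    recognition.start();
    recognitionRef.current = recognition;
    setListening(true);
  };

  const stop = () => recognitionRef.current?.stop();

  return (
    <button onClick={listening ? stop : start}>
      {listening ? "Stop dictation" : "Start dictation"}
    </button>
  );
}
```

Because the rest of the application only sees language, onTranscript, and onError, an LLM can implement or modify this component without touching anything else.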

LLMs thrive in coding tasks because modern software development naturally breaks problems into discrete, well-defined units with established patterns and interfaces.

With strongly-typed languages, linters, compiler errors, and comprehensive documentation, LLMs don’t need to innovate fundamental technologies or resolve ambiguous requirements. They can instead focus on implementation within proven frameworks, making them particularly effective at translating specific requirements into working code despite the underlying complexity.

If you have never tried extensive, production-worthy coding with LLMs, here is an example of how much it can accelerate your process when used proficiently. The chart below shows the time it would typically take to implement a feature like “Speech to text” in a complex application (times are real, based on our experience):

Bar chart comparing development time for a speech-to-text feature: roughly 40 hours without prior experience, 8–16 hours with prior experience, about 6 hours using an LLM directly, and about 3 hours using a coding assistant.
  • A skilled engineer without prior experience in speech-to-text: approximately 1 week
  • An engineer with relevant prior experience: 1-2 days
  • Using a leading LLM directly (like Claude Sonnet 3.7): about 6 hours
  • Using a dedicated coding assistant like WindSurf or Cursor: approximately 3 hours

This dramatic acceleration in development time illustrates the power of AI-assisted coding and provides a useful reference point for understanding what AI-native analytics platforms must achieve in their own domain.

Note: this is not a “Windsurf vs Cursor vs…” comparison; we use several assistants and they’re all great. Looking across coding assistants gives us a more complete understanding of how they approach complex workflows and when to use assistants vs LLMs vs manual code.

When to Fire Your AI Assistant: The Hidden Architecture of Cursor and Windsurf

…and how it informs the new coding paradigm for Software Engineers

This section was originally published by Avigad Oron, Ask-Y’s Head of Technology here. Please follow Avigad if you’d like to get notified about his next posts.

“Why is Windsurf failing at this cross-component change?” “Would Claude do better for this complex business logic?” “Should I just write this code myself?” These are questions every software engineer has—or should have—as we navigate the new reality of AI-augmented development. In my engineering practice, 80% of my code now comes from AI—but the critical insight isn’t that AI can write code, it’s understanding when to use which tool and why they succeed or fail in different scenarios.

Diagram showing that 80% of coding is done using AI coding assistants or LLMs directly, while 20% is done through manual coding, with a note emphasizing the importance of knowing when to use each tool.

Beneath the surface, Cursor, Windsurf, and Roo Code operate on multi-model architectures that create both opportunities and limitations. Each uses different approaches to solve the fundamental challenge of context management. Cursor provides granular control with its reasoning and apply models, making it ideal for precise modifications but requiring explicit context guidance. Windsurf excels with automatic context detection in standard codebases, handling pattern replication efficiently while sometimes struggling with novel approaches. Roo Code offers transparent token management and targeted precision that gives developers more direct control over the AI interaction.

When these tools reach their limits—for complex business logic, cross-component changes, or high-criticality features—I switch to direct engagement with Claude 3.7 or GPT o3, each bringing distinct strengths to different coding challenges. In this article, I’ll share the architectural insights that transformed my workflow, demonstrating how understanding the reasoning/apply model separation helps bridge the context gap that often leads to integration failures and bugs. You’ll learn practical strategies for selecting the right tool based on context requirements, pattern complexity, and business criticality—moving beyond simple prompting to true AI collaboration.

Diagram comparing coding assistant limits with direct LLM approaches, showing challenges like complex business logic and high-criticality features, and highlighting the need for manual context curation to bridge the context gap.

Perhaps most importantly, we’ll explore how this shift isn’t democratizing development as expected, but instead highlighting the value of systems thinking, integration expertise, and debugging across boundaries—skills that remain firmly in the human domain despite AI’s rapid advancement.

What AI has revealed is that writing code was never our true value; it was just one time-consuming task in our palette. As technical implementation details are increasingly automated, the profession’s core skills are now in the spotlight: managing complexity, making judicious trade-offs, and controlling the rich context required for creating harmonious systems. The differentiator now is how effectively we can communicate context to these AI tools—while focusing our own efforts on the architectural vision and integration expertise that have always been the hallmarks of exceptional engineering.

Context and Multi-Model Architecture: Why Understanding It Matters

The effectiveness of my AI-augmented workflow stems from understanding the technical architecture of AI-driven IDEs (Integrated Development Environments) like Cursor, which employ a multi-model approach. The core challenge in AI-assisted tasks is context management—a problem that remains mostly unsolved.

The fundamental limitation is that no single model today can practically both fully understand your entire codebase and efficiently implement precise changes. Such a model would need very strong general reasoning and planning capabilities, a very precise ability to change code, and a very large context window. Even with the ability to handle huge context windows, the actual task is too complex, and existing frontier LLMs are too slow, expensive, and inaccurate to be practical. This creates a context gap that engineers must learn to bridge.

Diagram showing that no single AI model can both fully understand a codebase and implement precise code changes; one requires large context and reasoning, the other needs precise editing but limited understanding, with a note that combining both would be too slow, expensive, and inaccurate.

The Multi-Model Architecture

Tools like Cursor address this limitation through a specialized architecture:

  1. Primary Reasoning Model (Claude 3.5/3.7 Sonnet): This “thinking” model interprets requirements and plans code changes. While it has strong reasoning capabilities, it doesn’t directly write code.
  2. Apply Model: A separate, fine-tuned model optimized for implementing precise code changes to existing files. It is a fine-tuned Llama 3 70B; it is faster and more accurate for code changes, but has critical limitations: a much smaller context window and little understanding of the broader codebase.
  3. Context System: The AI IDE retrieves and selects relevant code sections for the prompts using vectorized semantic search and pattern-based search techniques. The system manages:
  • Vector embeddings of files: Code is converted to mathematical vectors that capture semantic meaning. For example, a function calculate_total_price(items, tax_rate) and compute_final_cost(products, tax_percentage) would have similar vector representations despite different naming.
  • Relevance scoring: Files are ranked by multiple factors. For example, recently edited files receive higher priority, as do files with stronger semantic matches to the query (a toy sketch of this ranking follows below).
  • Context window optimization: Cursor selectively includes code fragments to maximize relevance while minimizing token usage. For instance, it prioritizes recently modified sections while excluding irrelevant parts of the codebase.
Diagram showing Cursor’s multi-model architecture: a Context System that handles vector embeddings, relevance scoring, and context window optimization, and two models beneath it—a Primary Reasoning Model for planning changes and an Apply Model optimized for precise code edits with a smaller context window.
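
To make the context system less abstract, here is a toy sketch of how such relevance scoring and context window packing might work, combining semantic similarity with recency. This is an illustration under stated assumptions (invented weights, field names, and heuristics), not Cursor’s actual implementation, which is not public:

```typescript
// Toy sketch of context selection; real context systems are far more sophisticated.
interface IndexedFile {
  path: string;
  embedding: number[];   // vector representation of the file's code
  lastEditedMs: number;  // timestamp of the most recent edit
  tokenCount: number;    // how many tokens this file would consume in the prompt
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Rank files by a blend of semantic match and recency, then greedily pack
// the highest-scoring files until the token budget is exhausted.
function selectContext(
  queryEmbedding: number[],
  files: IndexedFile[],
  tokenBudget: number,
  nowMs: number = Date.now()
): IndexedFile[] {
  const scored = files
    .map((file) => {
      const semantic = cosineSimilarity(queryEmbedding, file.embedding);
      const ageHours = (nowMs - file.lastEditedMs) / 3_600_000;
      const recency = 1 / (1 + ageHours); // recently edited files score higher
      return { file, score: 0.7 * semantic + 0.3 * recency }; // invented weights
    })
    .sort((a, b) => b.score - a.score);

  const selected: IndexedFile[] = [];
  let usedTokens = 0;
  for (const { file } of scored) {
    if (usedTokens + file.tokenCount > tokenBudget) continue;
    selected.push(file);
    usedTokens += file.tokenCount;
  }
  return selected;
}
```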

Coming back to the key insight that transformed my workflow: the model that understands your task is not the same model that writes your code. I developed strategies to manage this context disconnect, where the reasoning model might comprehend the full requirements while the apply model operates with limited information, and the existing context systems are unable to bridge the gap.

This architectural split forces me to explicitly manage context handoffs between models. For example, I once asked Cursor to update a data grid component that consumed a filtering service, but the apply model created incompatible filter parameters because it couldn’t see the service implementation, resulting in runtime errors. Understanding this limitation, I now include specific interface examples and explicitly reference related implementation details in my prompts, compensating for the context gap.
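
Concretely, the kind of detail I now paste into such prompts is the consumed service’s contract. The interfaces below are a hypothetical reconstruction of that data grid scenario, invented to illustrate the idea rather than taken from the actual project:

```typescript
// Hypothetical filtering-service contract included in the prompt so the apply
// model sees the exact shape the data grid must satisfy.
export interface FilterCriteria {
  field: string;
  operator: "eq" | "gt" | "lt" | "contains";
  value: string | number;
}

export interface GridRow {
  id: string;
  [column: string]: unknown;
}

export interface FilterService {
  // The grid must pass FilterCriteria[], not an ad-hoc object. This is exactly
  // the kind of mismatch the apply model produced when it could not see this file.
  applyFilters(criteria: FilterCriteria[]): Promise<GridRow[]>;
}
```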

Selecting the Right Tool

Understanding the multi-model architecture guides me in deciding when to use each AI coding approach. The key insight is matching your development task to the right tool based on context requirements, pattern complexity, and business criticality.

When I Use AI Agent IDEs (60% of the Code I Generate with AI)

I leverage tools like Cursor and Windsurf when technical constraints align with their architecture:

  • Well-Defined Context Tasks: Changes limited to specific components where I can explicitly identify all relevant code. Example: Adding a form validation function to a single component where all validation logic is contained within that file.
  • Pattern Replication: Implementing established software patterns that ensure code quality. Example: Creating a new Redux reducer that follows the same structure as existing reducers in the codebase (see the sketch after this list).
  • Rich Example Availability: When similar implementations exist that the apply model can reference. Example: Adding a new API endpoint that follows the same error handling, authentication, and response formatting as existing endpoints.
  • Lower Business Criticality: Components where bugs would have minimal system impact. Example: Implementing UI improvements to an admin dashboard that’s only used internally and doesn’t affect customer-facing functionality.
  • Simple Features in Standard Systems: I trust IDEs like Windsurf, for example, to identify the correct context and write the code when the feature is simple and not at the core of the system, or when the application is very standard, like a templated user registry React app. Example: Adding a “forgot password” feature to a standard authentication flow using common libraries and patterns.
Diagram listing scenarios where AI coding assistants are most effective, including well-defined context tasks, pattern replication, rich example availability, lower business criticality, and simple features in standard systems, each with a short example.
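
As a concrete illustration of pattern replication (the Redux reducer example above), here is the kind of reducer an assistant can reliably produce when the codebase already contains reducers with the same shape. The slice, state fields, and action names are invented for illustration:

```typescript
// Hypothetical "notifications" reducer mirroring the shape of existing reducers.
interface Notification {
  id: string;
  message: string;
  read: boolean;
}

interface NotificationsState {
  items: Notification[];
  loading: boolean;
  error: string | null;
}

const initialState: NotificationsState = { items: [], loading: false, error: null };

type NotificationsAction =
  | { type: "notifications/fetchStart" }
  | { type: "notifications/fetchSuccess"; payload: Notification[] }
  | { type: "notifications/fetchFailure"; payload: string }
  | { type: "notifications/markRead"; payload: string };

export function notificationsReducer(
  state: NotificationsState = initialState,
  action: NotificationsAction
): NotificationsState {
  switch (action.type) {
    case "notifications/fetchStart":
      return { ...state, loading: true, error: null };
    case "notifications/fetchSuccess":
      return { ...state, loading: false, items: action.payload };
    case "notifications/fetchFailure":
      return { ...state, loading: false, error: action.payload };
    case "notifications/markRead":
      return {
        ...state,
        items: state.items.map((n) =>
          n.id === action.payload ? { ...n, read: true } : n
        ),
      };
    default:
      return state;
  }
}
```

Because the structure (state shape, action naming, immutable updates) is already established in the codebase, the assistant only has to replicate a pattern rather than invent one.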

My prompts are engineered to overcome the apply model’s limitations. For example:

Diagram showing an engineered coding prompt with highlighted references: explicit file reference, pattern reference, data source reference, and styling reference, demonstrating how to structure detailed prompts for LLM-assisted coding.

What makes this effective:

  • Direct file and function references (@header.component.html) create clear anchoring points
  • Implementation examples that the ‘apply model’ can follow
  • Clear code pattern and styling references maintain code, behavioral, and visual consistency
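
Since the original prompt is shown only as an image, here is a hypothetical prompt in the same spirit. The file names and component details are invented; only the @header.component.html reference comes from the bullet above:

```text
Add a language-selector dropdown to @header.component.html.
Follow the same pattern as the theme selector in @theme-selector.component.ts
(standalone component, output event, service call in ngOnInit).
Populate the options from LanguageService.getAvailableLanguages()
(see @language.service.ts for the return type).
Reuse the existing .header-dropdown styles from @header.component.scss;
do not add new CSS classes.
```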

Comparing AI Coding Assistants: Context Management and Performance

Between the tools, Windsurf excels at identifying the right context for medium-sized codebases with standard implementations, making it ideal for simple features in templated apps, though somewhat slower. Cursor works more efficiently for simpler tasks where I can precisely define the context and can be very fast when used correctly. Roo Code offers precise targeting with transparent context management that gives developers more control over their interactions.

Comparison chart showing strengths and ideal use cases for three AI coding assistants: WindSurf, Cursor, and Roo Code, highlighting differences in context identification, task efficiency, precision, and developer control.

Context Management Capabilities

Windsurf uses intelligent automatic context detection that works particularly well in standardized projects. For instance, when adding a new React component, it can automatically identify similar components and project structures without explicit prompting.

Cursor offers granular control that excels when you need precision. For complex functions, you can explicitly select specific files and components to ensure Cursor has exactly the right context, preventing assumptions based on unrelated code.

Roo Code provides “point-and-shoot precision” with its right-click interaction model, allowing you to quickly add specific code blocks while maintaining transparent token usage.

Model Control and Transparency

All three tools use multi-model architectures, but with different approaches:

Windsurf abstracts away model details with its apply model, focusing on delivering a clean experience that writes changes to disk before approval, letting you see results in your dev server in real time.

Cursor provides access to both reasoning models (Claude 3.5/3.7 Sonnet) and specialized apply models for implementing changes, though with less transparency about token usage.

Roo Code stands out by displaying token consumption explicitly when using custom APIs and models, giving developers full control over which LLM is used and clear visibility into how context is constructed and how it affects costs and model limits.

Table comparing Cursor, WindSurf, and Roo Code across context detection, code application, interaction speed, and model control, highlighting differences such as manual vs automatic context, fast vs moderate apply models, and transparent token usage.

For standard applications with familiar patterns, Windsurf’s automatic context detection provides a smoother experience. For smaller, well-defined tasks requiring more control, Cursor’s keyboard shortcuts and power features excel. When transparency and explicit context management are priorities, especially for complex modifications, Roo Code’s right-click workflow and token visibility offer the most control.

When I Code Directly With LLMs (40% of my AI-Generated Code)

I bypass integrated tools and go directly to Claude 3.7 or GPT o3 when working against the constraints of AI IDEs:

  • Complex Business Logic: Tasks requiring deep reasoning about system behavior. Example: Implementing a context ranking algorithm that accounts for user history and data dependencies and uses multiple services.
  • Cross-Component Changes: Modifications spanning multiple interdependent files. Example: Refactoring an authentication system to support both OAuth and passwordless login that requires coordinated changes to backend APIs, database schemas, and frontend components.
  • Novel Patterns: Creating new approaches without existing examples. Example: Designing a custom state management solution for a specific performance optimization that doesn’t follow Redux or other established patterns in the codebase.
  • Context-Heavy Tasks: When complete system understanding is crucial. Example: Rewriting a data processing pipeline that interacts with multiple third-party services and needs to maintain transaction integrity across system boundaries.
  • High Business Criticality: When bugs would significantly impact users or the business. Example: Implementing code commit functionality in a developer application where errors could directly impact everything the user does and cause compliance issues.
Diagram listing situations where developers code directly with LLMs, including complex business logic, cross-component changes, novel architectural patterns, context-heavy tasks, and high business criticality examples.

For these tasks, I handle planning and architecture myself, then create a prompt for each code component with detailed specifications and manually curated context. I provide complete code files and references as context so the LLM can write each component properly, then manually integrate and test the results.

Example prompt showing how to instruct an LLM to add an auto-select toggle in an Angular TypeScript codebase, including file paths, BehaviorSubject binding, service method integration, and detailed toggle logic.

While slower, this approach ensures system control and prevents costly bugs that often emerge from letting AI handle the entire implementation.

Claude or GPT – Who Is the Better Code Developer?

Understanding each model’s strengths helps you choose the right one:

For precise implementation with complex code integration, I use GPT o3 because it:

  • Follows explicit instructions with high accuracy (e.g., “insert this function here and leave the rest untouched”)
  • Handles complex merges and keeps critical elements intact when rewriting files (like combining two data‑import scripts without losing a single flag)
  • Produces consistent code with fewer unexpected changes (diffs stay small and predictable)

For tasks requiring deeper reasoning, I use Claude 3.7, as it better:

  • Understands complex coding problems and architectural implications (it can spot why a new handler might break existing auth flow)
  • Implements intricate business logic with fewer errors (think multi‑currency order routing or layered discount rules)
  • Provides clearer reasoning about potential issues (it walks through edge cases and explains how to patch them)

Old School Manual Coding: The Other 20% of the Code

This is not a negligible percentage. It’s not easy to go back to ‘manual’, but I often find it is the better option, for several reasons:

  • I need to write a critical capability, like introducing a new data structure, and I’m unable to fully specify it to the LLM—it’s just easier to start writing it. Example: Creating a custom caching mechanism for frequently accessed datasets that optimizes memory usage based on specific access patterns of our analysts (a simplified sketch follows after this list).
  • I need to tune details that LLMs tend to get wrong, like HTML and CSS. Example: Fine-tuning the data studio’s dashboard layout to ensure visualizations resize correctly when analysts are working with split screens or comparing multiple datasets side-by-side.
  • I work on changes that require surgical modifications in several files, and it’s harder to define the exact changes and all the change points. Example: Adding comprehensive error tracking across the data pipeline UI that requires inserting consistent error handling in dozens of component files while maintaining the existing state management approach.
  • I implement some code as part of my definition for the LLM and end up writing it. Example: Starting to outline a custom drag-and-drop interface for analysts to build data transformation workflows and finding it faster to implement the core interaction logic myself than explain all the nuanced behaviors needed.
Diagram showing the 20% of coding tasks done manually, including critical capabilities, HTML and CSS fine-tuning, surgical code modifications, and feature implementation requiring precise control.
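
As an illustration of the first point above, here is roughly the kind of access-pattern-aware cache I mean. This is a simplified sketch with invented names and an invented eviction heuristic, not the production implementation:

```typescript
// Simplified sketch of a cache that weights access frequency and recency
// when deciding which datasets to evict under memory pressure.
interface CacheEntry<T> {
  value: T;
  sizeBytes: number;
  hits: number;
  lastAccessMs: number;
}

export class DatasetCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private usedBytes = 0;

  constructor(private readonly maxBytes: number) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.hits += 1;
    entry.lastAccessMs = Date.now();
    return entry.value;
  }

  set(key: string, value: T, sizeBytes: number): void {
    this.remove(key); // replace any existing entry for this key
    while (this.usedBytes + sizeBytes > this.maxBytes && this.entries.size > 0) {
      this.evictLowestScore();
    }
    this.entries.set(key, { value, sizeBytes, hits: 1, lastAccessMs: Date.now() });
    this.usedBytes += sizeBytes;
  }

  private remove(key: string): void {
    const entry = this.entries.get(key);
    if (entry) {
      this.usedBytes -= entry.sizeBytes;
      this.entries.delete(key);
    }
  }

  // Score blends frequency and recency; the entry with the lowest score is evicted.
  private evictLowestScore(): void {
    let worstKey: string | null = null;
    let worstScore = Infinity;
    const now = Date.now();
    for (const [key, entry] of this.entries) {
      const ageMinutes = (now - entry.lastAccessMs) / 60_000;
      const score = entry.hits / (1 + ageMinutes); // invented heuristic
      if (score < worstScore) {
        worstScore = score;
        worstKey = key;
      }
    }
    if (worstKey !== null) this.remove(worstKey);
  }
}
```

The interesting decisions (how to weight frequency versus recency for our analysts’ access patterns) are exactly the parts that were hard to specify to an LLM up front.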

The Real Bottlenecks Remain Human

The shift to AI-generated code reveals an important truth: pure code production was never the true bottleneck in software development. Despite predictions that 99% of code will soon be AI-written, the most challenging aspects of software engineering remain distinctly human domains. System design and integration between components require holistic thinking that current AI struggles to replicate—for instance, understanding how a new notification service impacts both database load and user experience across mobile and web platforms. Bug investigation demands forensic reasoning across system boundaries, like tracing an intermittent payment processing failure through APIs, databases, and third-party integrations. Handling edge cases requires anticipating scenarios not explicitly defined, such as managing connectivity issues during multi-step workflows for remote users.

This reality has transformed our hiring focus as well: we now prioritize candidates who demonstrate systems thinking and debugging prowess over routine technical skills. Our technical interviews have evolved from coding exercises and algorithm puzzles to scenarios like “explain how you’d diagnose this cross-service data inconsistency” or “design a system that gracefully handles these competing requirements.” The most valuable engineers aren’t those who write the cleanest functions, but those who can articulate trade-offs, anticipate integration challenges, and bridge technical capabilities with business requirements—skills that remain firmly in the human domain despite AI’s rapid advancement.

The Widening Talent Gap

Counterintuitively, AI-augmented development is magnifying rather than diminishing the value gap between engineers. As routine coding tasks become automated, the profession is splitting into two tiers: those who excel at system thinking, debugging complex interactions, and understanding product needs versus those who primarily coded without these broader skills. Top engineers now leverage AI to rapidly test architectural hypotheses while focusing their expertise on system integration, behavior prediction, and edge case management. Their value comes from superior product intuition, cross-functional communication, and the ability to bridge technical and business concerns. Meanwhile, engineers who historically relied on coding proficiency without developing these systems-level capabilities find themselves with diminishing comparative advantage, regardless of how well they can prompt AI tools.

The mythical “10x engineer” concept is becoming more pronounced, not less, in the AI era. AI tools don’t transform average engineers into exceptional ones—they amplify existing differences. A senior engineer who understands architectural patterns can use AI to implement robust solutions quickly, while less experienced developers generate seemingly functional code that introduces subtle bugs they can’t identify. “Vibe coding”—prompting for solutions without understanding what’s generated—creates a dangerous productivity illusion with hard limits. As systems grow, these engineers become trapped debugging AI-introduced issues they lack the mental models to diagnose, widening the performance gap between the most and least effective team members.

Making the Product Work in the Real World: The Final Frontier

Perhaps the most persistent challenge in software development—and the area most resistant to AI automation—is making systems work reliably with users in unpredictable real-world environments. Handling various user needs, unexpected behaviors, real-world system complexities, and resolving conflicts between competing requirements all require iteration, time, judgment, communication and management skills, prioritization intuitions, and adaptive problem-solving that current AI systems still cannot match. Product managers and engineers who excel at bridging this gap between theoretical implementation and practical reality are becoming increasingly valuable, as their expertise addresses the final mile problem that separates functioning code from successful products.

Graphic showing three major challenges in software development: adapting to diverse user needs, handling real-world system complexity, and resolving conflicting product requirements.

Bridging The Context Gap for Data Teams with Ask-y

Our experience with AI code assistants revealed a fundamental truth: the context gap remains the primary challenge in AI-augmented workflows. This insight directly shaped our approach to Ask-y, our Multi-agent solution built specifically for data teams.

Just as software engineers’ value wasn’t in writing code but in systems thinking and integration, data professionals’ core value lies in their domain knowledge, their intuitions about the data, the numbers, and the business, and their judgment—not in implementing data connections, applying statistical methods, or writing queries. Ask-y leverages our Joint Associative Memory (JAM) architecture to bridge this context gap, learning which elements of a data environment matter most for different analytical tasks. When an analyst explores campaign performance patterns, for instance, JAM doesn’t just surface similar analyses but learns which contextual elements—data structure, business constraints, validation methods—were most predictive of successful outcomes.

Diagram illustrating the Joint Associative Memory workflow: data environment context feeds into JAM architecture, which learns the relevance of memories and context, resulting in an enhanced analyst workflow.

Unlike black-box approaches, Ask-y maintains complete transparency, giving data teams full control over both the context and the resulting components. This keeps human judgment central while removing technical barriers that slow hypothesis testing and exploration. The result is an environment where data professionals focus on methodological decisions and interpretation while Ask-y manages the LLMs that handle the technical tasks—preserving the analyst’s essential role while dramatically accelerating their ability to translate insights into business value.

…in Part 1 of this series, we explored “What it takes to make AI Native Analytics work in the real world”.

… in our next article we will look at how analytics workflows call for different scaffolding around LLMs to effectively leverage the benefits of generative AI and how Ask-y sees that changing the jobs of Analysts.

Link to the article on LinkedIn.