
When should you fire your AI Coding Assistant?

Coding Assistants in Action: WindSurf, Cursor and Roo’s Approach to Complex Feature Implementation …and how Generative AI changes Software Engineering

Diagram explaining when to use AI coding assistants, direct LLM usage, or manual coding. Highlights how AI handles routine coding while the profession splits between system thinkers and pure coders, widening the skill gap.


In this Part 2 of our “What it takes to make AI Native Analytics work in the real world” series (Part 1 here), we’ll take an in-depth look at how LLMs are changing software engineers’ daily work. We’ll explore the practical differences between WindSurf, Cursor and Roo, identifying when each tool excels, when direct LLM interaction proves more effective, and when traditional manual coding remains the superior choice.

The section titled “When to Fire Your AI Assistant: The Hidden Architecture of Cursor and Windsurf” was originally published by Avigad Oron, Ask-Y’s Head of Technology, here. Please follow Avigad if you’d like to get notified about his next posts.

Software Engineering represents one of the domains most profoundly transformed by LLMs, and we believe Analytics will follow a similar trajectory. This article investigates how generative AI is reshaping Software Engineering practices, setting the stage for our next piece where we’ll extrapolate these lessons to predict the coming disruption in Analytics workflows.

Infographic explaining when to use different coding approaches: AI coding assistants, direct LLM usage, or manual coding. It highlights the profession shift as AI takes over routine coding, splitting roles between systems thinkers and pure coders, and widening the "10x engineer" skill gap.

In software development, a coding task often arrives with predefined parameters established by product managers and architects. This includes a relatively clear scope of work, product requirements, system architecture design, and specifications for how the component will function and interface with other systems.

Before coding begins, ambiguities are usually researched and resolved through experiments, planning sessions and documentation. The coding process itself is essentially the technical implementation phase where developers transform these predetermined specifications into functional code.

LLMs excel at coding tasks by leveraging more than simple pattern matching. Recent interpretability research shows these models utilize sophisticated computational circuits with parallel processing pathways.

Diagram showing how LLMs generate code by processing input patterns into logic, state, flow, and syntax patterns to produce generated code, highlighting semantic relationships and parallel processing.

Their architecture allows them to represent programming concepts at multiple levels of abstraction simultaneously—from syntax patterns to semantic relationships—while planning several tokens ahead.

Diagram of LLM neural representation for code generation showing syntax level (token processing), semantic level (meaning), and planning level (multi-token) working together to maintain consistent variable naming, logical flow, and programming knowledge.

This structured nature of software development, with its defined syntax and established patterns, maps well to how LLMs internally process information. It enables them to generate coherent code by activating interconnected feature networks that represent programming knowledge, even maintaining consistent variable naming and logical flow across functions.

The structured approach outlined in these steps creates an ideal environment for LLMs to excel at code development:

Flowchart illustrating a structured approach for code development with five steps: index and analyze existing code, design and build new components, manage dependencies and implement UI, enhance state management and security, and test, optimize, and deploy.

Index and analyze existing code: LLMs excel when they can reference existing patterns in codebases. This provides concrete examples of integration points, naming conventions, and architecture decisions they can follow rather than inventing solutions.

Design and build new components: Component-based architectures (React, Angular, microservices) give LLMs clear boundaries to work within. When building a speech transcription component, for example, the LLM can focus on specific inputs/outputs without reengineering the entire system (see the interface sketch after these steps).

Manage dependencies and implement UI changes: Package managers, import systems, and declarative UI frameworks create predictable patterns LLMs can recognize and implement. They can add speech input controls following existing UI conventions without reinventing interface design principles.

Enhance state management and security: Established patterns like Redux, Context API, or authentication libraries provide templates LLMs can apply, reducing ambiguity around how to maintain application state during speech processing or secure transcribed content.

Test, optimize, and deploy: Automated testing frameworks and CI/CD pipelines give LLMs immediate feedback on code correctness, allowing for systematic improvement through clear signals.
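
To make the component-boundary idea in the second step concrete, here is a minimal TypeScript sketch of the interface a speech transcription component might expose; the names and shapes are illustrative assumptions, not a prescribed API.

```typescript
// Hypothetical contract for a speech-to-text component: the LLM only needs
// to satisfy these inputs/outputs, not understand the rest of the system.
export interface TranscriptionRequest {
  audio: Blob;              // captured audio from the existing recording control
  languageHint?: string;    // optional BCP-47 code, e.g. "en-US"
}

export interface TranscriptionResult {
  text: string;             // the transcribed text handed to the consuming form
  confidence: number;       // 0..1, lets the UI decide whether to ask for review
}

export interface SpeechTranscriber {
  transcribe(request: TranscriptionRequest): Promise<TranscriptionResult>;
}
```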

LLMs thrive in coding tasks because modern software development naturally breaks problems into discrete, well-defined units with established patterns and interfaces.

With strongly-typed languages, linters, compiler errors, and comprehensive documentation, LLMs don’t need to innovate fundamental technologies or resolve ambiguous requirements. They can instead focus on implementation within proven frameworks, making them particularly effective at translating specific requirements into working code despite the underlying complexity.

If you have never tried extensive, production-worthy coding with LLMs, here’s an example of how it can accelerate your process when used proficiently. The chart below shows the time it would typically take to implement a feature like “Speech to text” in a complex application (times are real, based on our experience):

Bar chart showing speech-to-text feature development timeline: 40 hours without assistance, 8–16 hours with prior experience, 6 hours using LLM directly, and 3 hours using a coding assistant.
  • A skilled engineer without prior experience in speech-to-text: approximately 1 week
  • An engineer with relevant prior experience: 1-2 days
  • Using a leading LLM directly (like Claude Sonnet 3.7): about 6 hours
  • Using a dedicated coding assistant like WindSurf or Cursor: approximately 3 hours

This dramatic acceleration in development time illustrates the power of AI-assisted coding and provides a useful reference point for understanding what AI-native analytics platforms must achieve in their own domain.

Note: this is not a “Windsurf vs Cursor vs…” comparison; we use several assistants and they’re all great. Looking across coding assistants gives us a more complete understanding of how they approach challenges in complex workflows and when to use assistants vs LLMs vs manual code.

When to Fire Your AI Assistant: The Hidden Architecture of Cursor and Windsurf

…and how it informs the new coding paradigm for Software Engineers

This section was originally published by Avigad Oron, Ask-Y’s Head of Technology here. Please follow Avigad if you’d like to get notified about his next posts.

“Why is Windsurf failing at this cross-component change?” “Would Claude do better for this complex business logic?” “Should I just write this code myself?” These are questions every software engineer has—or should have—as we navigate the new reality of AI-augmented development. In my engineering practice, 80% of my code now comes from AI—but the critical insight isn’t that AI can write code, it’s understanding when to use which tool and why they succeed or fail in different scenarios.

Diagram showing coding approach split: 80% using coding assistants or direct LLMs, 20% manual coding, with note emphasizing the importance of knowing when and why to use each tool.

Beneath the surface, Cursor, Windsurf, and Roo Code operate on multi-model architectures that create both opportunities and limitations. Each uses different approaches to solve the fundamental challenge of context management. Cursor provides granular control with its reasoning and apply models, making it ideal for precise modifications but requiring explicit context guidance. Windsurf excels with automatic context detection in standard codebases, handling pattern replication efficiently while sometimes struggling with novel approaches. Roo Code offers transparent token management and targeted precision that gives developers more direct control over the AI interaction.

When these tools reach their limits—for complex business logic, cross-component changes, or high-criticality features—I switch to direct engagement with Claude 3.7 or GPT o3, each bringing distinct strengths to different coding challenges. In this article, I’ll share the architectural insights that transformed my workflow, demonstrating how understanding the reasoning/apply model separation helps bridge the context gap that often leads to integration failures and bugs. You’ll learn practical strategies for selecting the right tool based on context requirements, pattern complexity, and business criticality—moving beyond simple prompting to true AI collaboration.

Diagram explaining the context gap in AI-assisted coding, showing coding assistant limits with complex logic, cross-component changes, critical features, and novel patterns, contrasted with direct LLM approaches like Claude or GPT using manual context curation and detailed prompts.

Perhaps most importantly, we’ll explore how this shift isn’t democratizing development as expected, but instead highlighting the value of systems thinking, integration expertise, and debugging across boundaries—skills that remain firmly in the human domain despite AI’s rapid advancement.

What AI has revealed is that writing code was never our true value; it was just one time-consuming task in our palette. As technical implementation details are increasingly automated, the profession’s core skills are now in the spotlight: managing complexity, making judicious trade-offs, and controlling the rich context required for creating harmonious systems. The differentiator now is how effectively we can communicate context to these AI tools—while focusing our own efforts on the architectural vision and integration expertise that have always been the hallmarks of exceptional engineering.

Context and Multi-Model Architecture: Why Understanding It Matters

The effectiveness of my AI-augmented workflow stems from understanding the technical architecture of AI-driven IDEs (Integrated Development Environments) like Cursor, which employ a multi-model approach. The core challenge in AI-assisted tasks is context management—a problem that remains mostly unsolved.

The fundamental limitation is that no single model today can practically both fully understand your entire codebase and efficiently implement precise changes. Such a model would require very strong general reasoning and planning capabilities, a very precise ability to change code, and a very large context window. Even with the ability to handle huge context windows, the actual task is too complex, and existing frontier LLMs are too slow, expensive, and inaccurate to be practical. This creates a context gap that engineers must learn to bridge.

Diagram showing the split in AI model capabilities: models can either fully understand a codebase with large context and strong reasoning or implement precise changes with smaller context and limited understanding, but no single model can do both effectively.

The Multi-Model Architecture

Tools like Cursor address this limitation through a specialized architecture:

  1. Primary Reasoning Model (Claude 3.5/3.7 Sonnet): This “thinking” model interprets requirements and plans code changes. While it has strong reasoning capabilities, it doesn’t directly write code.
  2. Apply Model: A separate, fine-tuned model optimized for implementing precise code changes to existing files. It is a fine-tuned Llama3-70B that is faster and more accurate for code changes, but it has critical limitations: a smaller context window and limited reasoning.
  3. Context System: AI IDEs retrieve and select relevant code sections for the prompts using vectorized semantic search and pattern-based search techniques. The system manages:
  • Vector embeddings of files: Code is converted to mathematical vectors that capture semantic meaning. For example, the functions calculate_total_price(items, tax_rate) and compute_final_cost(products, tax_percentage) would have similar vector representations despite different naming.
  • Relevance scoring: Files are ranked by multiple factors. For example: Recently edited files receive higher priority, as do files with stronger semantic matches to the query.
  • Context window optimization: Cursor selectively includes code fragments to maximize relevance while minimizing token usage. For instance, it prioritizes recently modified sections while excluding irrelevant parts of the codebase.
Diagram of Cursor's multi-model architecture showing three components: a Context System (vector embeddings, relevance scoring, token efficiency, selective code fragment inclusion), a Primary Reasoning Model (Claude 3.5/3.7 Sonnet for interpreting requirements and planning changes with strong reasoning but no direct coding), and an Apply Model (fine-tuned Llama3-70B optimized for precise code changes with smaller context window and limited reasoning).
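
As an illustration of the retrieval step described above, here is a minimal TypeScript sketch of how a context system might rank candidate files by combining semantic similarity with a recency boost under a token budget; it reflects the general approach, as an assumption, not Cursor’s actual implementation.

```typescript
interface IndexedFile {
  path: string;
  embedding: number[];     // precomputed vector for the file's content
  lastEditedMs: number;    // timestamp of the most recent edit
  tokenCount: number;      // rough token estimate for budgeting
}

// Cosine similarity between the query embedding and a file embedding.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Rank files by semantic match plus a small recency boost, then greedily
// pack the highest-scoring ones into the available context window.
function selectContext(query: number[], files: IndexedFile[], tokenBudget: number): IndexedFile[] {
  const now = Date.now();
  const scored = files
    .map(f => ({
      file: f,
      score: cosine(query, f.embedding) + 0.1 * Math.exp(-(now - f.lastEditedMs) / 3_600_000),
    }))
    .sort((a, b) => b.score - a.score);

  const selected: IndexedFile[] = [];
  let used = 0;
  for (const { file } of scored) {
    if (used + file.tokenCount > tokenBudget) continue;
    selected.push(file);
    used += file.tokenCount;
  }
  return selected;
}
```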

Coming back to the key insight that transformed my workflow: the model that understands your task is not the same model that writes your code. I developed strategies to manage this context disconnect, where the reasoning model might comprehend the full requirements but the apply model operates with limited information, and the existing context systems are unable to bridge the gap.

This architectural split forces me to explicitly manage context handoffs between models. For example, I once asked Cursor to update a data grid component that consumed a filtering service, but the apply model created incompatible filter parameters because it couldn’t see the service implementation, resulting in runtime errors. Understanding this limitation, I now include specific interface examples and explicitly reference related implementation details in my prompts, compensating for the context gap.
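
To make that failure mode concrete, here is a hedged TypeScript sketch; the service shape and parameter names are hypothetical, but the mismatch mirrors the incident described above: the apply model never saw the service’s contract.

```typescript
// What the filtering service (unseen by the apply model) actually expects.
interface FilterParams {
  field: string;
  operator: 'eq' | 'contains' | 'gt';
  value: unknown;
}

declare function applyFilters(filters: FilterParams[]): Promise<unknown[]>;

// What the apply model plausibly generates when it only sees the grid component:
// the property names don't match the service, so the call fails at runtime
// (or, with strict typing, refuses to compile).
const guessedFilters = [{ column: 'status', op: 'eq', val: 'active' }];
// applyFilters(guessedFilters); // type error: 'column' does not exist in FilterParams
```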

Selecting the Right Tool

Understanding the multi-model architecture guides me on when to use each AI coding approach. The key insight is matching your development task to the right tool based on context requirements, pattern complexity, and business criticality.

When I Use AI Agent IDEs (60% of the Code I Generate with AI)

I leverage apps like Cursor and Windsurf when technical constraints align with their architecture:

  • Well-Defined Context Tasks: Changes limited to specific components where I can explicitly identify all relevant code. Example: Adding a form validation function to a single component where all validation logic is contained within that file.
  • Pattern Replication: Implementing established software patterns that ensure code quality. Example: Creating a new Redux reducer that follows the same structure as existing reducers in the codebase (see the sketch after this list).
  • Rich Example Availability: When similar implementations exist that the apply model can reference. Example: Adding a new API endpoint that follows the same error handling, authentication, and response formatting as existing endpoints.
  • Lower Business Criticality: Components where bugs would have minimal system impact. Example: Implementing UI improvements to an admin dashboard that’s only used internally and doesn’t affect customer-facing functionality.
  • Simple Features in Standard Systems: I trust IDEs like Windsurf, for example, to identify the correct context and write the code when the feature is simple and not at the core of the system, or when the application is very standard, like a templated user-registry React app. Example: Adding a “forgot password” feature to a standard authentication flow using common libraries and patterns.
Infographic titled "When I Use AI Coding Assistants (60% of my AI-Generated Code)" listing five use cases: Well-Defined Context Tasks (e.g., form validation for one component), Pattern Replication (e.g., new Redux reducer), Rich Example Availability (e.g., API endpoint with similar error handling), Lower Business Criticality (e.g., UI improvements to admin dashboard), and Simple Features in Standard Systems (e.g., forgot password in auth flow).
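
For the pattern-replication case above, here is a hedged sketch of the kind of reducer an assistant can produce by mirroring existing ones; the slice name and actions are illustrative assumptions.

```typescript
// Hypothetical "notifications" reducer mirroring the structure of existing reducers:
// same state-shape conventions, same action-handling style, no new patterns invented.
interface NotificationsState {
  items: string[];
  unreadCount: number;
}

type NotificationsAction =
  | { type: 'notifications/add'; payload: string }
  | { type: 'notifications/markAllRead' };

const initialState: NotificationsState = { items: [], unreadCount: 0 };

export function notificationsReducer(
  state: NotificationsState = initialState,
  action: NotificationsAction,
): NotificationsState {
  switch (action.type) {
    case 'notifications/add':
      return { ...state, items: [...state.items, action.payload], unreadCount: state.unreadCount + 1 };
    case 'notifications/markAllRead':
      return { ...state, unreadCount: 0 };
    default:
      return state;
  }
}
```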

My prompts are engineered to overcome the apply model’s limitations, for example:

Diagram titled "Engineered Prompt Example" showing a coding instruction with highlighted references. Explicit file reference marked in red (@header.component.html), pattern reference in green (@notifications.component.ts), data source reference in blue (@landing-page.component.html), and styling reference in purple (@landing-page.component.scss). Demonstrates how structured prompts guide AI coding assistants with context-specific instructions.

What makes this effective:

  • Direct file and function references (@header.component.html) create clear anchoring points
  • Implementation examples that the ‘apply model’ can follow
  • Clear code pattern and styling references maintain code, behavioral, and visual consistency
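
Since the original example above is an image, here is a hypothetical prompt written in the same spirit; the task and wording are a reconstruction of the structure the image describes, not the exact prompt: “In @header.component.html, add a notifications icon next to the existing menu, following the toggle pattern used in @notifications.component.ts. Pull the announcement data already rendered in @landing-page.component.html, and match the spacing and color variables defined in @landing-page.component.scss. Do not modify any other templates.”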

Comparing AI Coding Assistants: Context Management and Performance

Between the tools, Windsurf excels at identifying the right context for medium-sized codebases with standard implementations, making it ideal for simple features in templated apps, though somewhat slower. Cursor works more efficiently for simpler tasks where I can precisely define the context and can be very fast when used correctly. Roo Code offers precise targeting with transparent context management that gives developers more control over their interactions.

Comparison chart titled "AI Coding Assistants: Choosing the Right Tool," showing three options: WindSurf (strengths: excellent context identification, medium-sized codebases, standard implementations; ideal for simple features in templated apps), Cursor (strengths: fast, efficient for well-defined tasks, precise control; ideal for simpler tasks with defined context), and Roo Code (strengths: precise targeting, transparent context management, developer control; ideal for fine-grained control and transparency).

Context Management Capabilities

Windsurf uses intelligent automatic context detection that works particularly well in standardized projects. For instance, when adding a new React component, it can automatically identify similar components and project structures without explicit prompting.

Cursor offers granular control that excels when you need precision. For complex functions, you can explicitly select specific files and components to ensure Cursor has exactly the right context, preventing assumptions based on unrelated code.

Roo Code provides “point-and-shoot precision” with its right-click interaction model, allowing you to quickly add specific code blocks while maintaining transparent token usage.

Model Control and Transparency

All three tools use multi-model architectures, but with different approaches:

Windsurf abstracts away model details with its apply model, focusing on delivering a clean experience that writes changes to disk before approval, letting you see results in your dev server in real time.

Cursor provides access to both reasoning models (Claude 3.5/3.7 Sonnet) and specialized apply models for implementing changes, though with less transparency about token usage.

Roo Code stands out by displaying token consumption explicitly when using custom APIs and models, giving developers full control and visibility over which LLM is used, how context is constructed, and how that affects costs and model limits.

Comparison table of AI coding assistants Cursor, WindSurf, and Roo Code showing differences in context detection, code application, interaction speed, and model control.

For standard applications with familiar patterns, Windsurf’s automatic context detection provides a smoother experience. For smaller, well-defined tasks requiring more control, Cursor’s keyboard shortcuts and power features excel. When transparency and explicit context management are priorities, especially for complex modifications, Roo Code’s right-click workflow and token visibility offer the most control.

When I Code Directly With LLMs (40% of my AI-Generated Code)

I bypass integrated tools and go directly to Claude 3.7 or GPT o3 when working against the constraints of AI IDEs:

  • Complex Business Logic: Tasks requiring deep reasoning about system behavior. Example: Implementing a context ranking algorithm that accounts for user history and data dependencies and uses multiple services.
  • Cross-Component Changes: Modifications spanning multiple interdependent files. Example: Refactoring an authentication system to support both OAuth and passwordless login that requires coordinated changes to backend APIs, database schemas, and frontend components.
  • Novel Patterns: Creating new approaches without existing examples. Example: Designing a custom state management solution for a specific performance optimization that doesn’t follow Redux or other established patterns in the codebase.
  • Context-Heavy Tasks: When complete system understanding is crucial. Example: Rewriting a data processing pipeline that interacts with multiple third-party services and needs to maintain transaction integrity across system boundaries.
  • High Business Criticality: When bugs would significantly impact users or the business. Example: Implementing code commit functionality in a developer application where errors could directly impact everything the user does and cause compliance issues.
Visual guide on when to code directly with LLMs, highlighting use cases such as implementing complex business logic, handling cross-component changes, designing novel patterns, rewriting context-heavy data pipelines, and managing high business criticality features.

For these tasks, I handle planning and architecture myself, then create detailed prompts for each code component with precise specifications and manually curated context. I provide complete code files and references as context to get individual code components written properly by the LLM, then manually integrate and test them.

Example of a direct LLM coding prompt in Angular TypeScript, showing structured instructions for adding an auto-select toggle, binding it to a state model, updating session logic, and implementing functions with clear merge-ready code.
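
The original prompt is shown as an image above; as a hedged illustration only, here is a small TypeScript sketch of the kind of merge-ready change such a prompt asks for: an auto-select toggle bound to a session state model. The class, fields, and behavior are assumptions drawn from the image description, not the actual code (Angular decorators and templates omitted).

```typescript
// Hypothetical fragment: an auto-select toggle bound to a session state model.
export interface SessionState {
  autoSelectEnabled: boolean;
  selectedItemIds: string[];
}

export class AutoSelectToggleComponent {
  constructor(private readonly state: SessionState) {}

  // Bound to the toggle control in the template.
  onToggle(enabled: boolean): void {
    this.state.autoSelectEnabled = enabled;
    if (enabled) {
      this.applyAutoSelection();
    }
  }

  // Update session logic: auto-select items when the toggle is on.
  private applyAutoSelection(): void {
    // In the real component this would read candidate items from a service;
    // here it is left as an illustrative placeholder.
    this.state.selectedItemIds = [];
  }
}
```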

While slower, this approach ensures system control and prevents costly bugs that often emerge from letting AI handle the entire implementation.

Claude or GPT – Who Is the Better Code Developer?

Understanding each model’s strengths helps you choose the right one:

For precise implementation with complex code integration, I use GPT o3 because it:

  • Follows explicit instructions with high accuracy (e.g., “insert this function here and leave the rest untouched”)
  • Handles complex merges and keeps critical elements intact when rewriting files (like combining two data‑import scripts without losing a single flag)
  • Produces consistent code with fewer unexpected changes (diffs stay small and predictable)

For tasks requiring deeper reasoning, I use Claude 3.7, as it better:

  • Understands complex coding problems and architectural implications (it can spot why a new handler might break existing auth flow)
  • Implements intricate business logic with fewer errors (think multi‑currency order routing or layered discount rules)
  • Provides clearer reasoning about potential issues (it walks through edge cases and explains how to patch them)

Old School Manual Coding: The Other 20% of the Code

This is not a negligible percentage. It’s not easy to go back to ‘manual’, but I often find this is the better option for different reasons:

  • I need to write a critical capability, like introducing a new data structure, and I’m unable to fully specify it to the LLM—it’s just easier to start writing it. Example: Creating a custom caching mechanism for frequently accessed datasets that optimizes memory usage based on specific access patterns of our analysts (a sketch follows this list).
  • I need to tune details that LLMs tend to get wrong, like HTML and CSS. Example: Fine-tuning the data studio’s dashboard layout to ensure visualizations resize correctly when analysts are working with split screens or comparing multiple datasets side-by-side.
  • I work on changes that require surgical modifications in several files, and it’s harder to define the exact changes and all the change points. Example: Adding comprehensive error tracking across the data pipeline UI that requires inserting consistent error handling in dozens of component files while maintaining the existing state management approach.
  • I implement some code as part of my definition for the LLM and end up writing it. Example: Starting to outline a custom drag-and-drop interface for analysts to build data transformation workflows and finding it faster to implement the core interaction logic myself than explain all the nuanced behaviors needed.
Manual coding examples covering 20% of development tasks, including critical capabilities like custom caching, HTML/CSS fine-tuning, surgical modifications for error tracking, and implementation within definition such as drag-and-drop interfaces.
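
As a hedged illustration of the first bullet above, here is a small TypeScript sketch of an access-pattern-aware cache; the eviction heuristic and names are assumptions, and the point is simply that this kind of tuning is often easier to write by hand than to fully specify for an LLM.

```typescript
// Hypothetical cache that evicts the entry with the fewest recent hits
// once a memory budget (approximated here by entry count) is exceeded.
class DatasetCache<V> {
  private entries = new Map<string, { value: V; hits: number }>();

  constructor(private readonly maxEntries: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.hits += 1;
    return entry.value;
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      // Evict the least-used entry rather than the oldest one.
      let coldest: string | undefined;
      let fewestHits = Infinity;
      for (const [k, e] of this.entries) {
        if (e.hits < fewestHits) {
          fewestHits = e.hits;
          coldest = k;
        }
      }
      if (coldest !== undefined) this.entries.delete(coldest);
    }
    this.entries.set(key, { value, hits: 0 });
  }
}
```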

The Real Bottlenecks Remain Human

The shift to AI-generated code reveals an important truth: pure code production was never the true bottleneck in software development. Despite the likely prediction that soon 99% of code will be AI-written, the most challenging aspects of software engineering remain distinctly human domains. System design and integration between components require holistic thinking that current AI struggles to replicate—for instance, understanding how a new notification service impacts both database load and user experience across mobile and web platforms. Bug investigation demands forensic reasoning across system boundaries, like tracing an intermittent payment processing failure through APIs, databases, and third-party integrations. Handling edge cases requires anticipating scenarios not explicitly defined, such as managing connectivity issues during multi-step workflows for remote users.

This reality has transformed our hiring focus as well: we now prioritize candidates who demonstrate systems thinking and debugging prowess over routine technical skills. Our technical interviews have evolved from coding exercises and algorithm puzzles to scenarios like “explain how you’d diagnose this cross-service data inconsistency” or “design a system that gracefully handles these competing requirements.” The most valuable engineers aren’t those who write the cleanest functions, but those who can articulate trade-offs, anticipate integration challenges, and bridge technical capabilities with business requirements—skills that remain firmly in the human domain despite AI’s rapid advancement.

The Widening Talent Gap

Counterintuitively, AI-augmented development is magnifying rather than diminishing the value gap between engineers. As routine coding tasks become automated, the profession is splitting into two tiers: those who excel at system thinking, debugging complex interactions, and understanding product needs versus those who primarily coded without these broader skills. Top engineers now leverage AI to rapidly test architectural hypotheses while focusing their expertise on system integration, behavior prediction, and edge case management. Their value comes from superior product intuition, cross-functional communication, and the ability to bridge technical and business concerns. Meanwhile, engineers who historically relied on coding proficiency without developing these systems-level capabilities find themselves with diminishing comparative advantage, regardless of how well they can prompt AI tools.

The mythical “10x engineer” concept is becoming more pronounced, not less, in the AI era. AI tools don’t transform average engineers into exceptional ones—they amplify existing differences. A senior engineer who understands architectural patterns can use AI to implement robust solutions quickly, while less experienced developers generate seemingly functional code that introduces subtle bugs they can’t identify. “Vibe coding”—prompting for solutions without understanding what’s generated—creates a dangerous productivity illusion with hard limits. As systems grow, these engineers become trapped debugging AI-introduced issues they lack the mental models to diagnose, widening the performance gap between the most and least effective team members.

Making the Product Work in the Real World: The Final Frontier

Perhaps the most persistent challenge in software development—and the area most resistant to AI automation—is making systems work reliably with users in unpredictable real-world environments. Handling various user needs, unexpected behaviors, real-world system complexities, and resolving conflicts between competing requirements all require iteration, time, judgment, communication and management skills, prioritization intuitions, and adaptive problem-solving that current AI systems still cannot match. Product managers and engineers who excel at bridging this gap between theoretical implementation and practical reality are becoming increasingly valuable, as their expertise addresses the final mile problem that separates functioning code from successful products.

Key challenges in software development include adapting to changing user needs, managing real-world system complexity, and resolving requirement conflicts between competing product demands.

Bridging The Context Gap for Data Teams with Ask-y

Our experience with AI code assistants revealed a fundamental truth: the context gap remains the primary challenge in AI-augmented workflows. This insight directly shaped our approach to Ask-y, our Multi-agent solution built specifically for data teams.

Just as software engineers’ value wasn’t in writing code but in systems thinking and integration, data professionals’ core value lies in their domain knowledge, their intuitions about data, numbers, and the business, and their judgment—not in implementing data connections, statistical methods or writing queries. Ask-y leverages our Joint Associative Memory (JAM) architecture to bridge this context gap, learning which elements of a data environment matter most for different analytical tasks. When an analyst explores campaign performance patterns, for instance, JAM doesn’t just surface similar analyses but learns which contextual elements—data structure, business constraints, validation methods—were most predictive of successful outcomes.

Joint Associative Memory (JAM) diagram illustrating process flow: Data Environment Context feeds into JAM Architecture, which learns relevance of memories and context, leading to Enhanced Analyst Workflow. © Ask-Y

Unlike black-box approaches, Ask-y maintains complete transparency, giving data teams full control over both the context and the resulting components. This keeps human judgment central while removing technical barriers that slow hypothesis testing and exploration. The result is an environment where data professionals focus on methodological decisions and interpretation while Ask-y manages the LLMs that handle the technical tasks—preserving the analyst’s essential role while dramatically accelerating their ability to translate insights into business value.

…in Part 1 of this series, we explored “What it takes to make AI Native Analytics work in the real world”.

… in our next article we will look at how analytics workflows call for different scaffolding around LLMs to effectively leverage the benefits of generative AI and how Ask-y sees that changing the jobs of Analysts.

Link to the article on LinkedIn.