
The software development landscape is witnessing the emergence of a new category of AI tools that goes far beyond traditional coding assistants. While tools like GitHub Copilot and ChatGPT excel at providing code suggestions and completions, autonomous coding agents represent a fundamental shift toward truly independent development workflows.

Understanding Asynchronous Coding Agents

Traditional AI coding assistants are reactive – they respond to prompts and provide suggestions, but require constant human input to orchestrate the overall workflow. In contrast, asynchronous coding agents operate with complete autonomy once given a high-level goal.

These agents share several key characteristics:

  • Cloud-based execution: They work in remote environments without requiring constant access to your local setup
  • Autonomous task completion: Given a high-level objective, they plan and execute entire workflows independently
  • Minimal human intervention: Developers interact primarily at task assignment and result review stages
  • Comprehensive output: They deliver complete solutions including code, documentation, tests, and deployment-ready artifacts

The value proposition is transformative: “It’s an agentic experience; it’s completely asynchronous to you. You could be doing one task, and Copilot could be executing on five others, and that’s really the value at the end.”

This represents a shift from synchronous collaboration (constant prompting) to asynchronous delegation (assign and review), fundamentally changing how developers can structure their workflows and multiply their productivity.

Why This Comparison Matters

As the coding agent ecosystem rapidly evolves, developers face the challenge of choosing the right tools for their workflows. We compared these agents in a real-world scenario – setting up comprehensive test coverage for a Next.js application – to evaluate not just their technical capabilities, but their practical value in everyday development. With 99% of developers building AI applications now exploring or developing AI agents, and with projections that AI agents will automate 15-50% of routine business tasks by 2027, the question is not whether these tools will become standard, but which ones will best serve your specific development needs.

The Challenge

For this comparison, we selected a common but substantial development task: adding comprehensive test coverage to a Next.js application that had none. The repository was a greenfield project with just a few screens and a few dozen components, making it an ideal candidate for establishing a testing foundation. The timing was ideal for setting up a test harness: the task was substantial enough to be meaningful, yet manageable enough that coding agents could realistically succeed.

The Task: “We don’t have any test coverage in this repository. We need to add a test suite.”

Note that we deliberately kept the prompt open-ended without providing a precise definition of what the test suite should include. This gave the coding agents creative space to propose their own plans and demonstrate their ability to interpret requirements and make architectural decisions autonomously.

Success Criteria:

  • Complete test setup and configuration
  • Representative test cases covering different types of components and utilities
  • Functional test runner (npm test should work and tests should pass)
  • Proper documentation and project structure

In all cases, we requested that the agent first propose a comprehensive plan before execution, allowing us to evaluate both their planning capabilities and final implementation quality.
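To make the success criteria concrete, a minimal Jest setup for a Next.js project – the kind of configuration we expected the agents to produce – might look roughly like the sketch below. File names and options here are illustrative, not any agent's actual output.

// jest.config.js -- minimal illustrative sketch, not any agent's actual output
const nextJest = require('next/jest')

// next/jest loads the project's Next.js and .env configuration into the test environment
const createJestConfig = nextJest({ dir: './' })

module.exports = createJestConfig({
  testEnvironment: 'jsdom',                         // browser-like DOM for component tests
  setupFilesAfterEnv: ['<rootDir>/jest.setup.js'],  // e.g. imports '@testing-library/jest-dom'
})

// package.json (excerpt): "scripts": { "test": "jest" } so that `npm test` runs the suite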

The Contenders

We tested five coding agents representing different approaches to autonomous development:

Google Jules

Google’s autonomous coding agent powered by Gemini 2.5 Pro. Operates asynchronously by cloning repositories into secure Google Cloud VMs, allowing developers to assign tasks and focus elsewhere while Jules works independently. Currently in public beta with free access (up to 5 tasks daily).

OpenAI Codex

OpenAI’s coding agent that generates solutions in its web interface before creating GitHub pull requests. Operates with restricted internet access by default for security, requiring manual intervention for package installations and test execution.

GitHub Copilot Agent

GitHub’s native coding agent that works directly within its ecosystem. Uses third-party models (including Anthropic’s models for coding tasks) rather than relying solely on first-party AI, and offers seamless PR creation with comprehensive documentation and automated testing capabilities.

Cursor Agent

Cursor’s web-based coding agent that extends their IDE capabilities to provide asynchronous development workflows with strong integration back to the Cursor development environment. Like GitHub Copilot, it utilizes third-party AI models rather than proprietary ones.

Claude Code (Bonus)

Anthropic’s command-line coding agent that runs locally on the developer’s machine. While not strictly an “asynchronous” agent since it requires local execution permissions, it offers autonomous planning and implementation with comprehensive workflow capabilities that make it worth including in our comparison.

Tool-by-Tool Analysis

Google Jules: Fast but Minimal

Jules failed on our first attempt, getting stuck in an indefinite processing state for several hours. When we retried the task the following day, however, it completed successfully and delivered results quickly.

What Worked:

  • Fast Execution: Once functional, Jules produced code changes remarkably quickly
  • Seamless Integration: Easy branch creation in GitHub through the UI with a simple button click
  • Clean Implementation: Basic Jest setup with proper configuration

The Minimal Approach: Jules took a very conservative approach, implementing only the bare essentials:

  • Simple Jest testing infrastructure
  • Single test file covering a utility function
  • No component tests or multi-layer coverage

Test Results:

> jest

 PASS  lib/__tests__/utils.test.ts
  cn
    ✓ should merge tailwind classes (4 ms)
    ✓ should handle conditional classes (1 ms)
    ✓ should override conflicting classes

Test Suites: 1 passed, 1 total
Tests:       3 passed, 3 total
Time:        0.453 s
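For context, the utility behind this output is typically the small cn class-name helper that wraps clsx and tailwind-merge in many Next.js projects, and a test producing results like the above might look roughly like this sketch (illustrative, not Jules's verbatim code):

// lib/__tests__/utils.test.ts -- illustrative sketch, not Jules's verbatim output
// Assumes cn() is the common clsx + tailwind-merge helper
import { cn } from '../utils'

describe('cn', () => {
  it('should merge tailwind classes', () => {
    expect(cn('px-2', 'py-1')).toBe('px-2 py-1')
  })

  it('should handle conditional classes', () => {
    // falsy values are dropped by clsx
    expect(cn('base', false && 'hidden', 'text-sm')).toBe('base text-sm')
  })

  it('should override conflicting classes', () => {
    // tailwind-merge keeps the last conflicting utility
    expect(cn('p-2', 'p-4')).toBe('p-4')
  })
})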

Reliability Concerns: The initial failure and the need to retry a day later suggest potential stability issues that teams should consider for critical workflows.

Verdict: Jules shows promise with its speed and clean implementation, but the minimal approach may not meet comprehensive testing needs. The reliability issues we experienced highlight the importance of having fallback plans when using bleeding-edge autonomous agents.

OpenAI Codex: Functional but Limited

Codex delivered a working but minimal solution. The agent successfully set up Jest testing infrastructure and created basic test cases, but required significant manual intervention to complete the task.

Key Issues:

  • Internet Access Limitations: By default, Codex operates without internet access after initial setup, preventing automatic package installation
  • Dependency Management: Installed testing libraries as dependencies instead of devDependencies
  • Incomplete Artifacts: Missing package-lock.json updates in the commit
  • Test Output: Cluttered test output without optimization

Manual Intervention Required: We had to enable internet access specifically to allow npm install for the new testing dependencies.
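The dependency and lockfile issues were easier to resolve manually: moving the testing packages from dependencies to devDependencies and committing the regenerated package-lock.json fixes both. With npm this is a two-line re-install; the package names below are typical for a Jest + Testing Library stack, not necessarily the exact ones Codex chose.

# move the testing packages into devDependencies and refresh package-lock.json
npm uninstall jest jest-environment-jsdom @testing-library/react @testing-library/jest-dom
npm install --save-dev jest jest-environment-jsdom @testing-library/react @testing-library/jest-dom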

Test Results:

PASS  lib/utils.test.ts
PASS  lib/conversationalFormLLM.test.ts
PASS  components/theme-toggle.test.tsx
Test Suites: 3 passed, 3 total
Tests:       5 passed, 5 total
Time:        1.388 s

Verdict: Codex provides a functional foundation but requires more hand-holding compared to other agents. The security-first approach with restricted internet access, while prudent, creates friction in practical workflows.

GitHub Copilot Agent: The Comprehensive Solution

Copilot Agent delivered the most polished and comprehensive solution, showcasing the advantages of native GitHub integration and access to broader tooling capabilities.

Standout Features:

  • Rich Test Coverage: Generated 52 test cases across 5 test suites covering components, utilities, and API layers
  • Complete Documentation: Added TESTING.md with comprehensive setup and usage instructions
  • Proper Configuration: Correctly configured dependencies, test scripts, and project structure
  • Performance Optimization: The full 52-test run completed in well under a second despite the much broader coverage
  • Seamless Integration: Automatic PR creation within GitHub ecosystem

Test Results:

PASS  __tests__/components/theme-toggle.test.tsx
PASS  __tests__/lib/conversationalFormLLM.test.ts
PASS  __tests__/lib/api/conversationalFormLLM-api.test.ts
PASS  __tests__/types/form.test.ts
PASS  __tests__/lib/utils.test.ts
Test Suites: 5 passed, 5 total
Tests:       52 passed, 52 total
Time:        0.647 s
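To give a sense of what the component-layer coverage looks like, a theme-toggle test in this style usually follows the standard React Testing Library pattern. The sketch below assumes a ThemeToggle component that renders a single button inside a next-themes provider; it is an illustration, not Copilot's verbatim code.

// __tests__/components/theme-toggle.test.tsx -- illustrative sketch, not the agent's verbatim output
// Assumes a ThemeToggle component that renders a single button and uses next-themes
import { render, screen } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import { ThemeProvider } from 'next-themes'
import { ThemeToggle } from '@/components/theme-toggle'

describe('ThemeToggle', () => {
  const renderToggle = () =>
    render(
      <ThemeProvider attribute="class">
        <ThemeToggle />
      </ThemeProvider>
    )

  it('renders a toggle button', () => {
    renderToggle()
    expect(screen.getByRole('button')).toBeInTheDocument()
  })

  it('can be clicked without crashing', async () => {
    const user = userEvent.setup()
    renderToggle()
    await user.click(screen.getByRole('button'))
    expect(screen.getByRole('button')).toBeInTheDocument()
  })
})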

GitHub Integration Advantage: Working directly within GitHub provided significant workflow benefits – the agent could automatically create PRs, had access to repository context, and integrated seamlessly with existing development workflows.

Claude Code: The Strategic Planner

Claude Code demonstrated sophisticated planning capabilities, presenting a comprehensive testing roadmap that extended far beyond the immediate requirements.

Strategic Approach: Rather than just implementing basic test coverage, Claude Code presented a multi-phase plan covering:

  • Phase 1: Unit and component testing (what was implemented)
  • Future Phases: API route testing, integration tests, and end-to-end testing
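To illustrate the later end of that roadmap, an end-to-end check in a Next.js project is typically only a few lines with a tool such as Playwright. The sketch below is a generic example of that future phase, not something Claude Code delivered.

// e2e/home.spec.ts -- generic Playwright sketch for a future end-to-end phase (not agent output)
import { test, expect } from '@playwright/test'

test('home page renders', async ({ page }) => {
  await page.goto('http://localhost:3000')          // assumes the Next.js app is running locally
  await expect(page).toHaveTitle(/.+/)              // the page sets some title
  await expect(page.locator('body')).toBeVisible()  // and the document body is rendered
})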

Implementation Quality: The delivered solution was comparable to Copilot Agent in terms of code quality and test coverage, with clean, maintainable test cases.

Test Results:

PASS  components/theme-toggle.test.tsx
PASS  lib/conversationalFormLLM.test.ts
PASS  lib/utils.test.ts
Test Suites: 3 passed, 3 total
Tests:       24 passed, 24 total
Time:        0.821 s

Local Execution Trade-offs: While requiring local permissions and manual PR creation, Claude Code’s local execution enabled deeper repository analysis and more contextual decision-making.

Notable: When we asked Copilot Agent to review Claude Code’s PR, it had no significant comments or suggestions – indicating there were no obvious errors.

Cursor Agent: Strong Integration with Trade-offs

Cursor Agent delivered results comparable to Copilot Agent while providing excellent developer experience features that bridge web-based and local development.

Key Strengths:

  • Comprehensive Implementation: Test coverage and structure similar to Copilot Agent’s solution
  • Enhanced Documentation: Generated extensive documentation (possibly over-documented, adding two extra Markdown files)
  • Seamless Integration: Strong connection to Cursor IDE with one-click code import capabilities
  • User Experience: Intuitive web interface for PR management and review

Cost Consideration: Required enabling a $50 usage limit beyond the base plan, indicating additional per-use charges that teams should factor into adoption decisions.

Development Workflow: The tight integration with Cursor IDE provides a compelling workflow where developers can seamlessly move between web-based agent work and local development.

Conclusions

While our testing was limited to a single repository and use case, it surfaced three agents that delivered strong implementations with comprehensive coverage and professional execution: GitHub Copilot Agent, Claude Code, and Cursor Agent.

Although our test narrowed the field in this particular case, we encourage maintaining an experimental mindset. One experiment alone can’t provide definitive conclusions, but it serves as a valuable example of how to begin evaluating and adopting one or more coding agents in your own projects.

Since providers are rapidly evolving and competing, results will naturally vary over time. That’s why we recommend establishing periodic routines to evaluate how different coding agents handle low-risk, easy-to-validate tasks within your project context.

The good news? Running these experiments is relatively simple. It just requires adopting a mindset of continuous improvement and treating coding agents as part of your team’s evolving engineering toolkit.

The coding agent landscape is evolving rapidly, with each tool bringing unique strengths to autonomous development. Rather than replacing human developers, these agents are emerging as powerful force multipliers that enable teams to:

  • Scale Development Capacity: Handle routine tasks asynchronously while developers focus on architecture and complex problem-solving
  • Improve Code Quality: Establish and propagate best practices, documentation standards, and testing methodologies across codebases more consistently and rapidly than manual approaches alone
  • Accelerate Time-to-Market: Reduce setup overhead and boilerplate implementation time when applied to the right type of task (low-risk, low-to-medium complexity)
