The article demonstrates how OTTO is transforming its software development from traditional coding to AI-assisted engineering: Spec-driven development, Git-controlled architecture (ADRs, C4 models), and an ecosystem of specialized AI agents (Copilot, LLMs) are creating more efficient development processes for microservices. The focus is shifting from code creation to system design, precise specifications, and AI-assisted review, resulting in significant improvements in quality, speed, and scalability.
In the past, software developers who wrote code were constantly stuck poring over documentation, Stack Overflow, and similar sites. Today, those tabs are usually closed – not because we suddenly know every library by heart, but because the way we develop software has fundamentally changed.
In the logistics domain, we design and operate event-driven microservices on AWS: Lambda, API Gateway, DynamoDB, SNS, SQS, Kafka, and S3 – completely serverless, optimized for runtime, and closely integrated with the AWS SDKs. To illustrate the scale of this architectural challenge: Through just one of our synchronous HTTP APIs, we process several billion updates every month. These are rudimentarily validated in the API (Do we recognize the ID? Is the value within the valid range?) and acknowledged immediately with qualified feedback. The data then flows asynchronously into our underlying domain logic and is processed further in a matter of milliseconds.
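The synchronous validation step described above can be pictured as a small check in the API handler. This is a minimal sketch with invented field names (`shipment_id`, `status_code`) and an in-memory lookup, not our actual implementation:

```python
# Simplified sketch of the synchronous API validation described above.
# Field names and the valid range are illustrative, not the real schema.
KNOWN_IDS = {"shp-001", "shp-002"}   # in reality: a fast lookup, e.g. DynamoDB
VALID_RANGE = range(0, 100)          # "Is the value within the valid range?"

def validate_update(update: dict) -> tuple[bool, str]:
    """Return (accepted, feedback) for one incoming update."""
    if update.get("shipment_id") not in KNOWN_IDS:
        return False, "unknown shipment_id"
    if update.get("status_code") not in VALID_RANGE:
        return False, "status_code out of range"
    return True, "accepted"          # the update is then forwarded asynchronously
```

Accepted updates receive immediate, qualified feedback and only then flow on into the asynchronous domain logic.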
How we scale this system landscape and operate it in a fail-safe manner could fill an entire article on cloud architecture. But this post focuses on something else: how we, as an engineering organization, develop these services.
What began as pure AI software development has evolved into true AI-assisted engineering – and along the way, AI has shifted from tool to partner.
The first real change was subtle. We worked in the classic pair programming setup: two developers, one problem, one screen. At some point, a third partner joined us. The pair became a mob – human, human, AI.
The countless browser tabs for SDK documentation disappeared because it was simply more efficient to type the question directly into the Copilot chat. The cognitive disruption caused by constant context switching was eliminated. That alone noticeably accelerated our development speed and significantly shortened time-to-market.
With GitHub Copilot and Agent Mode – which is a standard at our company – we’ve entered what is often called “vibe coding” today, marking the next stage of AI software development. We work with the AI in an iterative and exploratory manner, selecting the LLM (ChatGPT, Gemini, Claude, etc.) directly within the IDE based on the task at hand: one model excels at system design, while another is better suited for writing scripts.
At first, we had a clear rule: blindly adopting AI results is not an option. Nothing has changed in that regard – our company guidelines are unambiguous on this point. However, with increasingly powerful models, our focus is shifting noticeably: from writing code ourselves to critical review. The decisive lever here lies not in the code, but in the way we communicate our requirements to the AI.
At some point, vibe coding reached its limits. The realization: AI is only as good as the context we provide it. This is where spec-driven development comes in. For us, this means more than just “first a ticket, then code.” It is the connecting element between architecture, code, and AI.
To make these standards scalable, we solved a long-standing problem: architectural decisions used to be scattered across Confluence – without versioning, without integration into the review process. Today, our architecture is entirely in Git. It is no longer documentation about the system, but part of the system itself.
A crucial factor here: All these documents – especially our Architecture Decision Records (ADRs), C4 models, and ACCs – follow a strict structure and a fixed nomenclature. Only when terms and formats remain consistent can the AI process the context without room for misinterpretation.
Our architecture rests on five pillars: the Constitution, ADRs, C4 models, ACCs, and service specifications.
At the heart of our architecture repository lies basis-constitution.md: our binding constitution. It defines global architecture and security standards, such as “serverless-first” and compliance with the OWASP Top 10. Its most important rule: Write the specification before you write code. Each service constitution may expand upon this constitution, but must never weaken it.
Every significant decision – whether it’s migrating from SNS to Kafka or switching technologies – is documented in Git as Markdown following a strict template. Thanks to this consistent, structured format of the ADRs, the AI agent accurately interprets the technical context and trade-offs. Every principle in the Constitution refers directly to these underlying ADRs – transparent for both developers and the AI.
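An ADR in this style might look like the following sketch. The headings and the ADR number are illustrative, not our actual template:

```markdown
# ADR-0042: Migrate event distribution from SNS to Kafka

## Status
Accepted

## Context
SNS fan-out reached its limits for ordered, replayable event streams ...

## Decision
We publish domain events to Kafka topics partitioned by shipment ID ...

## Consequences
+ Replay and ordering guarantees per partition
- Additional operational effort for the Kafka setup
```

Because every ADR follows the same section structure, both humans and agents can locate context and trade-offs without ambiguity.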
We visualize our architecture using C4 and have automated this process at the context, container, and component levels using PlantUML and our pipeline. In doing so, we deliberately distinguish between the current state and our vision: the diagrams show not only where we are now, but also where we want to go. (Recommended reading: how we generally use C4 models at OTTO is covered in this tech blog article.)
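A context-level diagram in this setup could be sketched in PlantUML roughly like this, using the C4 macros from the PlantUML standard library. The system names and relations are illustrative:

```plantuml
@startuml
' C4 context level via the PlantUML standard library
!include <C4/C4_Context>

System_Ext(shop, "Shop System", "Sends shipment updates")
System(logistics, "Logistics Service", "Validates and processes updates")
SystemDb(store, "Event Store", "Persisted domain events")

Rel(shop, logistics, "Pushes updates", "HTTPS")
Rel(logistics, store, "Appends events", "async")
@enduml
```

Since the diagram source lives in Git next to the ADRs, the pipeline can regenerate the images on every change.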
Not every stakeholder wants to read diagrams. ACCs translate technical systems into business value: value proposition, quality requirements, stakeholders. Thanks to this standardized template, the AI agent knows exactly where to find the technical context for each service.
Before any code is written, features are described in detailed service specifications: user stories, acceptance criteria, data models, and tasks. The sequence is mandatory: tests first, then implementation. Only on this structured foundation does the AI generate production-ready code – without having to guess.
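The mandatory “tests first, then implementation” sequence can be illustrated with a minimal pytest-style example. The acceptance criterion and the function name are invented for illustration:

```python
# Step 1: the spec's acceptance criterion becomes a test BEFORE any implementation.
# Invented example criterion: "A delivery date in the past is rejected."
from datetime import date

def test_rejects_past_delivery_date():
    assert is_valid_delivery_date(date(2000, 1, 1), today=date(2024, 1, 1)) is False

def test_accepts_future_delivery_date():
    assert is_valid_delivery_date(date(2024, 6, 1), today=date(2024, 1, 1)) is True

# Step 2: only now is the implementation written (or generated) to satisfy the tests.
def is_valid_delivery_date(candidate: date, today: date) -> bool:
    return candidate >= today
```

The tests encode the specification; the generated implementation is accepted only when they pass.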
Building on this structured foundation, we no longer work solely with a generic assistant, but with a small ecosystem of specialized agents.
An Architecture Agent validates architectural decisions against the Constitution, ADRs, C4 models, ACCs, AWS Well-Architected Pillars, and domain boundaries. Building on this, an Implementation Agent implements spec-driven development and guides the process from Specify through Clarify and Plan to Implementation. A Review Agent checks the generated code against specifications, plans, architecture, and security requirements before a PR is created. And a general Story Agent can formulate new Jira stories from existing knowledge or improve existing stories.
The benefit lies not only in automation, but in the clear division of responsibilities. Architecture, implementation, review, and story formulation speak the same language while operating in specialized roles. This increases quality without making the workflow more cumbersome, and it allows knowledge to be reused and scaled more effectively.
Figure: A look inside our architecture repository: Constitutions, ADRs, C4 models, and ACCs exist as versioned code in Git
How does this work in practice? In every service repository, the agents access the same structured knowledge base via 'copilot-instructions.md', the Constitution, C4 models, and ACCs. This not only provides spec-driven development with better inputs, but also enables the agents themselves to perform architecture, implementation, and review within a shared set of rules.
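A 'copilot-instructions.md' in this spirit might point the agents at the shared artifacts. The file paths below are illustrative, not our actual repository layout:

```markdown
# Copilot instructions (illustrative sketch)

- Read `architecture/basis-constitution.md` before proposing any design change.
- Check the ADRs under `architecture/adr/` for prior decisions and trade-offs.
- Consult the C4 models under `architecture/c4/` for system and component boundaries.
- Follow the service specification in `specs/` – tests first, then implementation.
```

Because every service repository carries the same kind of entry point, each agent starts from the same rules regardless of which service it is working on.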
Our set of rules is clear: When fundamental changes are made – for example, to infrastructure via Terraform or to domain boundaries in the context of Domain-Driven Design – the corresponding architectural artifacts must be updated. This ensures that the architecture remains not only documented but also consistent with the actual implementation.
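Such a rule can even be enforced mechanically. A hypothetical pre-merge check (not our actual pipeline) could compare the changed files of a pull request:

```python
# Hypothetical pre-merge check: if Terraform files changed, an architecture
# artifact (ADR, C4 model, or ACC) must change in the same pull request.
ARCH_PREFIXES = ("architecture/",)   # illustrative repository layout

def needs_architecture_update(changed_files: list[str]) -> bool:
    """True if infrastructure changed without any architecture artifact changing."""
    touches_infra = any(f.endswith(".tf") for f in changed_files)
    touches_arch = any(f.startswith(ARCH_PREFIXES) for f in changed_files)
    return touches_infra and not touches_arch

# In CI, changed_files would come from e.g. `git diff --name-only main...HEAD`.
```

A failing check need not block the merge outright; as described below for Copilot, the agent can draft the missing artifact instead.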
GitHub Copilot can act as a reviewer in this process. If a specification or documentation is missing, the AI doesn’t simply block the process; ideally, it generates a draft directly. As a result, governance doesn’t become a hindrance, but an integral part of the development flow.
A live incident was detected due to anomalies in API behavior: elevated 4xx error rates were observed in the AWS API Gateway. Our goal was to enhance monitoring and receive proactive alerts through existing channels should this issue recur. No further context was available at first.
From this point on, the specialized agents worked together.
The result: from live incident to production deployment, including documentation, the process took less than 30 minutes – and all code was generated by AI.
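The monitoring piece the agents produced could look roughly like the following sketch, which builds the parameters for a CloudWatch alarm on the API Gateway `4XXError` metric and routes it to an existing SNS channel. API name, threshold, and topic ARN are illustrative placeholders, not the actual generated code:

```python
# Sketch: CloudWatch alarm on elevated API Gateway 4XX rates, alerting via SNS.
# API name, threshold, and topic ARN are illustrative placeholders.
def build_4xx_alarm(api_name: str, sns_topic_arn: str, threshold: int = 100) -> dict:
    return {
        "AlarmName": f"{api_name}-elevated-4xx",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "4XXError",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "Statistic": "Sum",
        "Period": 300,                    # 5-minute windows
        "EvaluationPeriods": 2,           # alert only if the anomaly persists
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # existing alerting channel
    }

# boto3.client("cloudwatch").put_metric_alarm(**build_4xx_alarm(...)) would create it.
```

Keeping the alarm definition as code means the Review Agent can check it against the specification like any other change.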
Figure: AI-powered workflow for responding to a live incident
While we have established architecture governance using Git, the next level of automation is already on the horizon: no longer a single general-purpose agent, but a small ecosystem of specialized agents that are already being tested in practice.
One agent possesses comprehensive architectural knowledge and can reliably integrate ADRs, C4 models, and ACCs. Another specializes in formulating clean Jira stories and specifications from domain-specific input or improving existing stories. Yet another can derive the technical plan based on this, while taking our architectural guidelines into account.
The Model Context Protocol (MCP) is a potential integration layer for connecting these agents to our knowledge sources in a structured way. For me, however, the decisive factor is not the individual technology, but rather that the agents access the right sources in a networked manner and, by working together, deliver better results than a general-purpose assistant.
The actual work then takes place in the background via the networked context:
Figure: AI agent architecture with an orchestrator, architecture agent, code agent, review agent, and Jira integration
The key takeaway: the era of simply writing code is over. AI software development is becoming the new standard – and is evolving into true AI-assisted engineering.
Figure: No trust in AI code without safeguards
Our job as tech experts is to set clear guidelines that ensure the code we generate delivers real, scalable business value for OTTO. AI doesn’t make us obsolete – it makes us more effective. The technology is ready. Now it’s our turn.