The article demonstrates how OTTO is transforming its software development from traditional coding to AI-assisted engineering: Spec-driven development, Git-controlled architecture (ADRs, C4 models), and an ecosystem of specialized AI agents (Copilot, LLMs) are creating more efficient development processes for microservices. The focus is shifting from code creation to system design, precise specifications, and AI-assisted review, resulting in significant improvements in quality, speed, and scalability.
In the past, software developers who wrote code were constantly stuck poring over documentation, Stack Overflow, and similar sites. Today, those tabs are usually closed – not because we suddenly know every library by heart, but because the way we develop software has fundamentally changed.
In the logistics domain, we design and operate event-driven microservices on AWS: Lambda, API Gateway, DynamoDB, SNS, SQS, Kafka, and S3 – completely serverless, optimized for runtime, and closely integrated with the AWS SDKs. To illustrate the scale of this architectural challenge: Through just one of our synchronous HTTP APIs, we process several billion updates every month. These are rudimentarily validated in the API (Do we recognize the ID? Is the value within the valid range?) and acknowledged immediately with qualified feedback. The data then flows asynchronously into our underlying domain logic and is processed further in a matter of milliseconds.
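The synchronous validation step described above can be pictured as a small check in the API handler. This is a minimal sketch with invented field names (`shipment_id`, `status_code`) and an in-memory lookup, not our actual implementation:

```python
# Simplified sketch of the synchronous API validation described above.
# Field names and the valid range are illustrative, not the real schema.
KNOWN_IDS = {"shp-001", "shp-002"}   # in reality: a fast lookup, e.g. DynamoDB
VALID_RANGE = range(0, 100)          # "Is the value within the valid range?"

def validate_update(update: dict) -> tuple[bool, str]:
    """Return (accepted, feedback) for one incoming update."""
    if update.get("shipment_id") not in KNOWN_IDS:
        return False, "unknown shipment_id"
    if update.get("status_code") not in VALID_RANGE:
        return False, "status_code out of range"
    return True, "accepted"          # the update is then forwarded asynchronously
```

Accepted updates receive immediate, qualified feedback and only then flow on into the asynchronous domain logic.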
How we scale this system landscape and operate it in a fail-safe manner could fill an entire article on cloud architecture. But this post focuses on something else: how we, as an engineering organization, develop these services.
What began as pure AI software development has evolved into true AI-assisted engineering – and along the way, AI has shifted from tool to partner.
The first real change was subtle. We worked in the classic pair programming setup: two developers, one problem, one screen. At some point, a third partner joined us. The pair became a mob – human, human, AI.
The countless browser tabs for SDK documentation disappeared because it was simply more efficient to type the question directly into the Copilot chat. The cognitive disruption caused by constant context switching was eliminated. That alone noticeably accelerated our development speed and significantly shortened time-to-market.
With GitHub Copilot and Agent Mode – which is a standard at our company – we’ve entered what is often called “vibe coding” today, marking the next stage of AI software development. We work with the AI in an iterative and exploratory manner, selecting the LLM (ChatGPT, Gemini, Claude, etc.) directly within the IDE based on the task at hand: one model excels at system design, while another is better suited for writing scripts.
At first, we had a clear rule: blindly adopting AI results is not an option. Nothing has changed in that regard – our company guidelines are unambiguous on this point. However, with increasingly powerful models, our focus is shifting noticeably: from writing code ourselves to critical review. The decisive lever here lies not in the code, but in the way we communicate our requirements to the AI.
At some point, vibe coding reached its limits. The realization: AI is only as good as the context we provide it. This is where spec-driven development comes in. For us, this means more than just “first a ticket, then code.” It is the connecting element between architecture, code, and AI.
To make these standards scalable, we solved a long-standing problem: architectural decisions used to be scattered across Confluence – without versioning, without integration into the review process. Today, our architecture is entirely in Git. It is no longer documentation about the system, but part of the system itself.
A crucial factor here: All these documents – especially our Architecture Decision Records (ADRs), C4 models, and ACCs – follow a strict structure and a fixed nomenclature. Only when terms and formats remain consistent can the AI process the context without room for misinterpretation.
Our architecture rests on five pillars: the Constitution, ADRs, C4 models, ACCs, and service specifications.
At the heart of our architecture repository lies basis-constitution.md: our binding constitution. It defines global architecture and security standards, such as “serverless-first” and compliance with the OWASP Top 10. Its most important rule: Write the specification before you write code. Each service constitution may expand upon this constitution, but must never weaken it.
Every significant decision – whether it’s migrating from SNS to Kafka or switching technologies – is documented in Git as Markdown following a strict template. Thanks to this consistent, structured format of the ADRs, the AI agent accurately interprets the technical context and trade-offs. Every principle in the Constitution refers directly to these underlying ADRs – transparent for both developers and the AI.
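An ADR in this style might look like the following sketch. The headings and the ADR number are illustrative, not our actual template:

```markdown
# ADR-0042: Migrate event distribution from SNS to Kafka

## Status
Accepted

## Context
SNS fan-out reached its limits for ordered, replayable event streams ...

## Decision
We publish domain events to Kafka topics partitioned by shipment ID ...

## Consequences
+ Replay and ordering guarantees per partition
- Additional operational effort for the Kafka setup
```

Because every ADR follows the same section structure, both humans and agents can locate context and trade-offs without ambiguity.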
We visualize our architecture using C4 and have automated this process at the context, container, and component levels using PlantUML and our pipeline. In doing so, we deliberately distinguish between the current state and our vision: the diagrams show not only where we are now, but also where we want to go. (Recommended reading: how we generally use C4 models at OTTO is covered in this tech blog article.)
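A context-level diagram in this setup could be sketched in PlantUML roughly like this, using the C4 macros from the PlantUML standard library. The system names and relations are illustrative:

```plantuml
@startuml
' C4 context level via the PlantUML standard library
!include <C4/C4_Context>

System_Ext(shop, "Shop System", "Sends shipment updates")
System(logistics, "Logistics Service", "Validates and processes updates")
SystemDb(store, "Event Store", "Persisted domain events")

Rel(shop, logistics, "Pushes updates", "HTTPS")
Rel(logistics, store, "Appends events", "async")
@enduml
```

Since the diagram source lives in Git next to the ADRs, the pipeline can regenerate the images on every change.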
Not every stakeholder wants to read diagrams. ACCs translate technical systems into business value: value proposition, quality requirements, stakeholders. Thanks to this standardized template, the AI agent knows exactly where to find the technical context for each service.
Before any code is written, features are described in detailed service specifications: user stories, acceptance criteria, data models, and tasks. The sequence is mandatory: tests first, then implementation. Only on this structured foundation does the AI generate production-ready code – without having to guess.
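The mandatory “tests first, then implementation” sequence can be illustrated with a minimal pytest-style example. The acceptance criterion and the function name are invented for illustration:

```python
# Step 1: the spec's acceptance criterion becomes a test BEFORE any implementation.
# Invented example criterion: "A delivery date in the past is rejected."
from datetime import date

def test_rejects_past_delivery_date():
    assert is_valid_delivery_date(date(2000, 1, 1), today=date(2024, 1, 1)) is False

def test_accepts_future_delivery_date():
    assert is_valid_delivery_date(date(2024, 6, 1), today=date(2024, 1, 1)) is True

# Step 2: only now is the implementation written (or generated) to satisfy the tests.
def is_valid_delivery_date(candidate: date, today: date) -> bool:
    return candidate >= today
```

The tests encode the specification; the generated implementation is accepted only when they pass.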
Building on this structured foundation, we no longer work solely with a generic assistant, but with a small ecosystem of specialized agents.
An Architecture Agent validates architectural decisions against the Constitution, ADRs, C4 models, ACCs, AWS Well-Architected Pillars, and domain boundaries. Building on this, an Implementation Agent implements spec-driven development and guides the process from Specify through Clarify and Plan to Implementation. A Review Agent checks the generated code against specifications, plans, architecture, and security requirements before a PR is created. And a general Story Agent can formulate new Jira stories from existing knowledge or improve existing stories.
The benefit lies not only in automation, but in the clear division of responsibilities. Architecture, implementation, review, and story formulation speak the same language while operating in specialized roles. This increases quality without making the workflow more cumbersome, and it allows knowledge to be reused and scaled more effectively.
Figure: A look inside our architecture repository: Constitutions, ADRs, C4 models, and ACCs exist as versioned code in Git
How does this work in practice? In every service repository, the agents access the same structured knowledge base via 'copilot-instructions.md', the Constitution, C4 models, and ACCs. This not only provides spec-driven development with better inputs, but also enables the agents themselves to perform architecture, implementation, and review within a shared set of rules.
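A 'copilot-instructions.md' in this spirit might point the agents at the shared artifacts. The file paths below are illustrative, not our actual repository layout:

```markdown
# Copilot instructions (illustrative sketch)

- Read `architecture/basis-constitution.md` before proposing any design change.
- Check the ADRs under `architecture/adr/` for prior decisions and trade-offs.
- Consult the C4 models under `architecture/c4/` for system and component boundaries.
- Follow the service specification in `specs/` – tests first, then implementation.
```

Because every service repository carries the same kind of entry point, each agent starts from the same rules regardless of which service it is working on.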
Our set of rules is clear: When fundamental changes are made – for example, to infrastructure via Terraform or to domain boundaries in the context of Domain-Driven Design – the corresponding architectural artifacts must be updated. This ensures that the architecture remains not only documented but also consistent with the actual implementation.
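Such a rule can even be enforced mechanically. A hypothetical pre-merge check (not our actual pipeline) could compare the changed files of a pull request:

```python
# Hypothetical pre-merge check: if Terraform files changed, an architecture
# artifact (ADR, C4 model, or ACC) must change in the same pull request.
ARCH_PREFIXES = ("architecture/",)   # illustrative repository layout

def needs_architecture_update(changed_files: list[str]) -> bool:
    """True if infrastructure changed without any architecture artifact changing."""
    touches_infra = any(f.endswith(".tf") for f in changed_files)
    touches_arch = any(f.startswith(ARCH_PREFIXES) for f in changed_files)
    return touches_infra and not touches_arch

# In CI, changed_files would come from e.g. `git diff --name-only main...HEAD`.
```

A failing check need not block the merge outright; as described below for Copilot, the agent can draft the missing artifact instead.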
GitHub Copilot can act as a reviewer in this process. If a specification or documentation is missing, the AI doesn’t simply block the process; ideally, it generates a draft directly. As a result, governance doesn’t become a hindrance, but an integral part of the development flow.
A live incident was detected due to anomalies in API behavior: elevated 4xx error rates were observed in the AWS API Gateway. Our goal was to enhance monitoring and receive proactive alerts through existing channels should this issue recur. No further context was available at first.
From this point on, the specialized agents worked together.
The result: from live incident to production deployment, including documentation, the process took less than 30 minutes – and all code was generated by AI.
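The monitoring piece the agents produced could look roughly like the following sketch, which builds the parameters for a CloudWatch alarm on the API Gateway `4XXError` metric and routes it to an existing SNS channel. API name, threshold, and topic ARN are illustrative placeholders, not the actual generated code:

```python
# Sketch: CloudWatch alarm on elevated API Gateway 4XX rates, alerting via SNS.
# API name, threshold, and topic ARN are illustrative placeholders.
def build_4xx_alarm(api_name: str, sns_topic_arn: str, threshold: int = 100) -> dict:
    return {
        "AlarmName": f"{api_name}-elevated-4xx",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "4XXError",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "Statistic": "Sum",
        "Period": 300,                    # 5-minute windows
        "EvaluationPeriods": 2,           # alert only if the anomaly persists
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # existing alerting channel
    }

# boto3.client("cloudwatch").put_metric_alarm(**build_4xx_alarm(...)) would create it.
```

Keeping the alarm definition as code means the Review Agent can check it against the specification like any other change.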
Figure: AI-powered workflow for responding to a live incident
While we have established architecture governance using Git, the next level of automation is already on the horizon: no longer a single general-purpose agent, but a small ecosystem of specialized agents that are already being tested in practice.
One agent possesses comprehensive architectural knowledge and can reliably integrate ADRs, C4 models, and ACCs. Another specializes in formulating clean Jira stories and specifications from domain-specific input or improving existing stories. Yet another can derive the technical plan based on this, while taking our architectural guidelines into account.
The Model Context Protocol (MCP) is a potential integration layer for connecting these agents to our knowledge sources in a structured way. For me, however, the decisive factor is not the individual technology, but rather that the agents access the right sources in a networked manner and, by working together, deliver better results than a general-purpose assistant.
The actual work then takes place in the background via the networked context:
Figure: AI agent architecture with an orchestrator, architecture agent, code agent, review agent, and Jira integration
The key takeaway: the era of simply writing code is over. AI software development is becoming the new standard – and is evolving into true AI-assisted engineering.
Figure: No trust in AI code without safeguards
Our job as tech experts is to set clear guidelines that ensure the code we generate delivers real, scalable business value for OTTO. AI doesn’t make us obsolete – it makes us more effective. The technology is ready. Now it’s our turn.