
December 03, 2018

OTTO goes AWS - Part 2

Architecture

Part 2: Experiment "Decentralized Operation"

One of the key questions during the AWS migration was whether all services that had previously been provided by a central Operations / Platform Engineering team could be successfully decentralized. Previously, we had a traditional hosting contract with a provider who managed the infrastructure for us in the form of provisioned VMs. In OTTO E-Commerce, several Platform Teams (P teams) developed and operated infrastructure services on top of these VMs for the Development / Feature Teams (F teams).

For example, OTTO operated a Mesos platform for the microservices of the F teams, as well as several MongoDB clusters that served as databases for the individual F teams. Both were developed and operated by these central P teams. The same applied to many services needed for development, such as an LDAP server for authentication and Jenkins for deployments. With the migration to AWS, the need for a central operations team at OTTO has been eliminated for many business services.

Take databases as an example. When using high-level services offered by AWS, such as DynamoDB, the F teams can get started immediately without having to worry about operational tasks such as updates and maintenance. Pretty much any service that was previously managed centrally has a counterpart in AWS. Instead of Jenkins, for example, teams can use CodePipeline in conjunction with CodeBuild to deploy their services. If that is not enough for a team, it can run Jenkins itself on EC2 or use an entirely different solution. AWS' Shared Responsibility Model makes it easy to see what this change means for AWS customers.

How much more central do you need?

A noble goal of the internal migration team was to decentralize as many of the services that were previously managed centrally at OTTO as possible. To achieve this and further strengthen team autonomy, we chose an account structure from the outset that allowed each team to fully manage at least two AWS accounts (one for Live and one for Nonlive) and deploy services into them. However, as the project progressed, it became clear that full decentralization does not make sense everywhere. As far as databases, VMs and deployment tools are concerned, it was easy to argue that these are now up to the development teams. The requirements and preferences of the teams differ too much, and the central P teams had already found it difficult to meet the wishes of the developers.
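The account structure described above can be sketched as a small data model. This is only an illustration: the team names and the alias scheme `<team>-<stage>` are assumptions made for the example, not OTTO's actual naming, and the article only states that each team manages at least one Live and one Nonlive account.

```python
from dataclasses import dataclass

# The two stages named in the article; teams may own more accounts than these.
STAGES = ("live", "nonlive")


@dataclass(frozen=True)
class Account:
    team: str
    stage: str

    @property
    def alias(self) -> str:
        # Hypothetical naming scheme, e.g. "search-live".
        return f"{self.team}-{self.stage}"


def accounts_for(team: str) -> list[Account]:
    """Return the minimum set of accounts a single team fully manages."""
    return [Account(team, stage) for stage in STAGES]


def account_plan(teams: list[str]) -> dict[str, list[str]]:
    """Map every team to the aliases of its own accounts."""
    return {team: [a.alias for a in accounts_for(team)] for team in teams}


if __name__ == "__main__":
    # "search" and "checkout" are made-up team names.
    for team, aliases in account_plan(["search", "checkout"]).items():
        print(team, "->", ", ".join(aliases))
```

Keeping Live and Nonlive in separate accounts per team means that permissions and costs are separated by account boundary rather than by convention inside a shared account.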

For other services, such as administration of the DNS root zones, maintenance and further development of cross-team processes (account creation, user creation), and the authentication server, it still makes sense to have a cross-team unit that takes care of such things. If these tasks were decentralized, every team would incur the same effort even though the requirements do not differ between teams. No development team can handle such tasks on the side, and the related work would always compete with functional features for priority. A dedicated 'Service Integration' team was therefore created within the project. There is also a team that takes care of overarching security aspects, advises the other teams on these topics, and uses checks in the accounts (e.g. the CIS benchmarks) to ensure that basic rules such as encryption and non-accessibility of internal services from the outside are adhered to.
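To make the idea of rule checks in the accounts more concrete, here is a minimal sketch of such a check. It is not the CIS benchmark and not OTTO's tooling: the `Resource` type, its fields and the sample inventory are hypothetical, and a real check would read resource metadata from the AWS APIs instead of a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical, simplified view of a resource in a team account.
    account: str
    name: str
    encrypted_at_rest: bool
    publicly_accessible: bool


def find_violations(resources: list[Resource]) -> list[tuple[str, str, str]]:
    """Return (account, resource, reason) for every broken basic rule."""
    findings = []
    for r in resources:
        if not r.encrypted_at_rest:
            findings.append((r.account, r.name, "not encrypted at rest"))
        if r.publicly_accessible:
            findings.append((r.account, r.name, "reachable from outside"))
    return findings


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    inventory = [
        Resource("search-live", "orders-table", True, False),
        Resource("search-nonlive", "debug-bucket", False, True),
    ]
    for account, name, reason in find_violations(inventory):
        print(f"{account}/{name}: {reason}")
```

Because the rules are the same for every team, running one shared check across all accounts avoids each team having to build and maintain its own.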

Even though the teams are now responsible for operations and the associated incident management themselves, there is still a small central on-call team where the alerting and communication strands converge in the event of a technical problem. The teams themselves are responsible for the alerting process, i.e. for sending alarms or warnings to a central monitoring system. The same applies to troubleshooting, since the know-how about the application and infrastructure is available in the development teams.
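The split of responsibilities in alerting can be sketched as follows: the team owns the thresholds and decides when something is a warning or an alarm, and only the resulting event is handed to the central monitoring system. Everything here is an assumption made for the example, including the `Alarm` fields, the error-rate thresholds and the JSON payload; the article does not describe the actual interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    team: str
    service: str
    severity: str  # "warning" or "alarm"
    message: str


def evaluate(team: str, service: str, error_rate: float,
             warn_at: float = 0.01, alarm_at: float = 0.05) -> Optional[Alarm]:
    """Apply team-owned thresholds; return None when nothing needs reporting."""
    if error_rate >= alarm_at:
        severity = "alarm"
    elif error_rate >= warn_at:
        severity = "warning"
    else:
        return None
    return Alarm(team, service, severity, f"error rate {error_rate:.1%}")


def to_payload(alarm: Alarm) -> str:
    """Serialize the event in the form a central system could ingest."""
    return json.dumps(asdict(alarm), sort_keys=True)


if __name__ == "__main__":
    # "search" and "suggest" are made-up team and service names.
    alarm = evaluate("search", "suggest", error_rate=0.02)
    if alarm is not None:
        print(to_payload(alarm))
```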

How closely does this tie me to the cloud service provider?

In terms of using the services offered by AWS, we made a conscious decision in our area. Instead of building our applications in such a way that they could also be moved to other clouds or even a local data center without much additional migration effort, we opted to take full advantage of the cloud and to use managed services from AWS wherever it makes sense.

We address the protection of sensitive customer data and strategic business data by consistently encrypting all data, both in the stored state ('at rest') and during transfer ('in transit'), even within the private network segments. To get the maximum out of the cloud in terms of operation, flexibility and cost, we have therefore consciously accepted the commitment to the service provider. A later migration (in whole or in part) to other cloud providers is of course still possible, since they offer similar concepts and services and we have already done a lot of preliminary work with the Lhotse project by breaking down our monolith into hundreds of small microservices that can be deployed quickly. However, if you were to build your services in advance so that you could run them anywhere with minimal migration effort, you would have to rely on abstraction layers that you manage yourself (and thus replicate AWS). The operating expenses, especially the central ones, would then be at the same level as before. The flexibility to try something out quickly would also disappear, since the services would first have to be provided by a central team, which can quickly become a bottleneck.

Nevertheless, we have set up our applications and especially the communication channels between our applications in such a way that we can also seamlessly integrate other cloud services or classically hosted services. There will be more on this in an upcoming blog article on 'Inter-Backend Communication'. Thus, despite leveraging the advantages of AWS, the strengths of various other cloud providers can still be leveraged.

Cultural changes

In conclusion, the biggest challenges were not technical but rather cultural and organizational. Because they no longer depend on central P teams, the F teams are free to choose the services they want to use. On the other hand, the F teams now also have the obligation to operate these services themselves. Even though the development teams at OTTO are very similar in composition and work according to the same technological and methodological principles, their reactions to these new tasks differed widely.

One of the main tasks of the central migration team at OTTO was therefore to prepare the teams for these new tasks and to point out the advantages as well as the disadvantages, which were often what employees saw first. Through the AWS migration, the developers have grown much closer to their operations colleagues and in many cases have integrated them into their F teams. In the process, both sides can learn from each other and broaden their scope. There will be more on this exciting topic soon in the third part of this series.

Conclusion

A few months after completing the migration, we can say that our experiment with decentralized operation has largely been a success. We have not managed to fully decentralize operations, but with the exception of a few centralized services, the F teams now independently manage their applications from development to operation, with everything that entails. As expected, the teams also differ in how they use the cloud services on offer. Most teams use the managed services offered by AWS and thus save operating expenses, while a few teams with special requirements manage a small part of their services themselves and thus accept higher operating expenses.


Written by

Techblogger
