OTTO's history with data centers is long and full of anecdotes. When I joined OTTO in 2010, most cloud providers were still in a fledgling state, and cloud computing was far from state-of-the-art for running heavy workloads. Given the immaturity of the technology and the legal restrictions on using cloud providers, the company was simply not in a position to adopt one. Cloud providers had started offering virtual machines on demand, but the cost of running such machines 24/7 was high compared to running virtual machines in an on-premises data center.
When we decided to build an independent webshop for otto.de (Lhotse), we opted for a microservice architecture that shared nothing (but ops). This architecture allowed us to run workloads on separate compute instances, though we still ran them in a single data center to ensure low latency for requests across the different verticals. The journey became truly exciting when we decided to set up a microservice platform that, through orchestration, automatically runs multiple containers and creates pools for load balancing.
Running a cluster management system such as Kubernetes or Mesos pays off when you can run multiple services in the same cluster. Assuming that not all services face identical workload patterns, you can run fewer servers than the sum of all microservices' maximum workloads would require. In the real world, the daily load at otto.de follows a predictable course – medium load during the day, high load in the evenings, minimal load at night. As a result, we were forced to size our clusters to handle the daily load peaks at all times, with no flexibility at all. A cluster sized for maximum peak is hardly cost-efficient, especially if one's provider does not offer pay-per-use models. In my experience, a shared cluster for multiple services also requires an experienced, well-skilled cluster operations team: installing updates or reining in a service that has suddenly decided to run wild often affects various other services, which can lead to a major outage.
Moving otto.de into AWS fixed many of these issues; on the other hand, product teams are now obliged to run their services on their own – responsibility has shifted from a central operations team to the individual product teams.
The same approach was used when OTTO launched its platform (aka DeepSea), welcoming partners to offer products on otto.de. The DeepSea teams build their own cloud-native software in team-managed AWS accounts, preferring data exchange via event-sourcing patterns. There are several options for implementing this in AWS, including RESTful interfaces with feeds as well as different queuing solutions. The generally accepted main pattern is to use AWS SNS for pushing messages and AWS SQS for polling them. This way of communicating imposes few constraints and preserves each team's technological independence.
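To illustrate the SNS-push/SQS-poll pattern, here is a minimal sketch of wiring an SQS queue to an SNS topic with boto3. The topic and queue names, ARNs, and account ID are hypothetical; the access-policy helper is a plain function, while the actual AWS calls (which require credentials) are shown as comments.

```python
import json

# Hypothetical ARNs for illustration; real values come from your AWS account.
TOPIC_ARN = "arn:aws:sns:eu-central-1:123456789012:order-events"
QUEUE_ARN = "arn:aws:sqs:eu-central-1:123456789012:order-events-consumer"

def queue_policy_for_topic(queue_arn: str, topic_arn: str) -> dict:
    """Build an SQS access policy that lets exactly one SNS topic deliver messages."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }

# Wiring it together with boto3 (sketch; needs AWS credentials):
#
#   import boto3
#   sns = boto3.client("sns")
#   sqs = boto3.client("sqs")
#   queue_url = sqs.create_queue(QueueName="order-events-consumer")["QueueUrl"]
#   sqs.set_queue_attributes(
#       QueueUrl=queue_url,
#       Attributes={"Policy": json.dumps(queue_policy_for_topic(QUEUE_ARN, TOPIC_ARN))},
#   )
#   sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs", Endpoint=QUEUE_ARN)
```

Because the publishing team only knows the topic and each consuming team owns its queue and subscription, teams stay decoupled – exactly the technological independence described above.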
Adopting a one-vendor strategy has advantages with regard to sharing knowledge, seamless onboarding of partners, competitive discounts, and easy-to-use technical solutions. On the other hand, you have to deal with a lack of flexibility, the collective decision-making power of teams is restricted, and it's never advisable to depend on just one single supplier. We have a strong, trust-based relationship with our supplier AWS: we often discuss how to handle specific issues, and the AWS customer team always offers first-rate advice and helpful solutions. But the problem remains the same: AWS is just one among many suppliers of cloud computing services in an ever-growing market – and as the market grows, the gap between what AWS offers and what other cloud providers offer is getting smaller.
In fact, most software development teams at OTTO use AWS as their underlying infrastructure. Besides the huge IT landscapes comprising otto.de and DeepSea, various other departments use Microsoft Azure or Google Cloud Platform (GCP) to implement their services. The differences are not always evident or significant, but departments independently decide which provider to use by comparing the available solutions and picking the one that best fits their requirements. OTTO is multi-cloud at company level, but single-cloud within individual departments.
Multi-cloud does not make your life easier in software development; it entails greater complexity and increases the number of specialized skills required to run services properly. Keeping your product cloud-agnostic is not a decision I would recommend: you lose the benefits of the quality services a cloud provider offers. Using high-level, specialized services conserves your own resources and lets teams invest their time in creating added value. Put simply, we earn money by delivering required features, not by updating database clusters. Running workloads in containers can help, but from an overall perspective you definitely want to use high-level services to get rid of time-consuming and often unrewarding routine tasks. To give an example: why run a self-hosted queue on AWS when AWS offers several managed queuing solutions? Even with their limitations, it's often easier to use a managed service than to install, update, monitor, bug-fix, and ensure constant availability for a self-hosted solution.
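The queue example above can be made concrete: consuming from managed SQS is little more than a polling loop, with no broker to install or patch. The sketch below separates the pure response-parsing step (testable without AWS) from the boto3 loop, which is shown as a commented sketch; the queue URL and `handle` function are hypothetical placeholders.

```python
def extract_bodies(receive_response: dict) -> list[str]:
    """Pull the message bodies out of an SQS ReceiveMessage response dict.

    SQS omits the "Messages" key entirely when the queue is empty,
    so we default to an empty list.
    """
    return [m["Body"] for m in receive_response.get("Messages", [])]

# A minimal long-polling consumer against the real service
# (sketch; requires AWS credentials and an existing queue):
#
#   import boto3
#   sqs = boto3.client("sqs")
#   while True:
#       resp = sqs.receive_message(
#           QueueUrl=queue_url,            # hypothetical
#           MaxNumberOfMessages=10,
#           WaitTimeSeconds=20,            # long polling cuts empty receives
#       )
#       for msg in resp.get("Messages", []):
#           handle(msg["Body"])            # your domain logic
#           sqs.delete_message(QueueUrl=queue_url,
#                              ReceiptHandle=msg["ReceiptHandle"])
```

Everything a self-hosted broker would demand – clustering, storage, upgrades, availability – is the provider's problem; the team's code is reduced to receive, process, delete.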
First, we have reached a sound and reliable level of experience with AWS, which allows us to face the complexity of adding a second provider. With a wider range of possible solutions, our product teams become more flexible when implementing their products. I think it's a benefit to be able to choose the technical solution that best matches a product's requirements.
From a company perspective, I support the notion of being less dependent on just one single vendor. We accept the extra complexity that comes with it, even though employees then need a solid understanding of both cloud providers. Since the other cloud providers have caught up technologically, we find similar and comparable services offered across vendors. Differences exist, but to identify them you have to examine the services in greater detail. From a technological angle, cloud providers offer comparable specialized services in all areas. Technically, communication between cloud providers can be achieved using REST-oriented APIs or messaging services such as AWS SNS/SQS or GCP Pub/Sub. All cloud providers offer container runtimes, serverless frameworks, and databases to store data.
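One way to keep cross-provider communication comparable is to serialize domain events identically regardless of where they are published. The sketch below builds one provider-neutral event envelope (the envelope fields are illustrative, not an OTTO standard); the provider-specific publish calls for AWS SNS (boto3) and GCP Pub/Sub (google-cloud-pubsub) are shown as commented sketches with hypothetical topic names.

```python
import json
import uuid
from datetime import datetime, timezone

def build_event(event_type: str, payload: dict) -> bytes:
    """Serialize a domain event the same way for every provider."""
    envelope = {
        "id": str(uuid.uuid4()),                              # unique event id
        "type": event_type,                                   # e.g. "OrderPlaced"
        "occurredAt": datetime.now(timezone.utc).isoformat(), # UTC timestamp
        "payload": payload,                                   # domain data
    }
    return json.dumps(envelope).encode("utf-8")

# Publishing the same bytes is then one call per provider (sketches):
#
#   AWS SNS (boto3):
#     boto3.client("sns").publish(
#         TopicArn=topic_arn,  # hypothetical
#         Message=build_event("OrderPlaced", data).decode("utf-8"))
#
#   GCP Pub/Sub (google-cloud-pubsub):
#     from google.cloud import pubsub_v1
#     pubsub_v1.PublisherClient().publish(
#         topic_path,          # hypothetical
#         build_event("OrderPlaced", data))
```

Only the thin publish adapter differs per provider; consumers on either side parse the same JSON envelope, which keeps teams free to pick AWS or GCP without bilateral format negotiations.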
We support teams by helping them select a single provider. Our strategy is not to have teams use two or more cloud providers simultaneously – that would increase internal product complexity without speeding up team processes at all. If a team has sensible reasons for going multi-cloud, we will accept that. We intend to achieve a balanced use of cloud providers and will start experimenting with GCP alongside AWS.
We feel that a new journey has begun and we are curious as to where it will take us.