December 17, 2025

Learning to Rank Is Rocket Science: How Clojure Accelerates Our Machine Learning with Deep Neural Networks

Development

In e-commerce search, decisions made in seconds determine whether customers stay or leave. At OTTO, we rely on learning to rank: instead of fixed rules, our models learn from millions of real interactions which products are truly relevant.

In this article, we explain why Clojure is crucial to our production stack in the field of machine learning — and how Clojure and Polylith help us run complex machine learning pipelines in a stable, efficient, and maintainable way.

From Deep Space 1 to Learning to Rank at OTTO

For years, Gradient Boosted Decision Trees (GBDTs) were the undisputed champions when it came to producing precise rankings for our customers. They are reliable, interpretable, and remarkably capable. Yet with our Deep Neural Network, we were able to outperform our champion.

Our goal: take our customers’ search experience to the next level, efficiently and cost-effectively. To do this, we use Clojure, a programming language that makes it easy for developers to maintain and extend code. A language that’s fun to use.

What is Clojure? A programming language based on Lisp. Lisp (short for “List Processor”) is a language developed in 1958 by John McCarthy at MIT. It is known for its unique syntax, powerful metaprogramming capabilities, and strong influence on the development of other programming languages.

Lisp has already been used in space. More precisely, Lisp was an important part of the software used on Deep Space 1 (DS1). The system for autonomous control of the spacecraft (Remote Agent) was written in Lisp.

Figure: The Deep Space 1 spacecraft in space with a galaxy in the background. Source: NASA (https://science.nasa.gov/mission/deep-space-1/)

A remarkable aspect of using Lisp on DS1 was the ability to run a Read–Eval–Print Loop (REPL) on the spacecraft. This proved extremely valuable when debugging issues in space, as engineers were able to inspect and adjust code on the spacecraft, solving problems “in flight.”

Python is the first-class citizen for machine learning. Nevertheless, we deliberately chose the Lisp dialect Clojure to implement our Deep Neural Network–based Learning to Rank service. That may sound unusual at first. But Deep Space 1 already demonstrated that Lisp can operate complex, fault-tolerant systems in one of the most demanding environments there is, and that makes Lisp an ideal tool for AI and autonomous systems. That’s more than enough reason for us to explore it further.

What makes Clojure special?

Clojure is a dynamic, functional programming language that operates on the Java Virtual Machine (JVM). It was designed to combine the strengths of functional programming with the reliability and rich ecosystem of the JVM.

The language is based on Lisp, one of the most important languages in the history of computer science. Its innovative ideas and concepts have profoundly shaped many other languages and continue to do so today, showing that good ideas are timeless: still relevant and valuable decades later.

Figure: Comic sketch about Lisp. Source: xkcd
Comic in three panels: A person stands in front of a computer and says that Lisp is over 50 years old and has a timeless appeal. In the second panel, the same figure sits at the computer and wonders whether new generations will keep rediscovering the language. In the third panel, a person holds up a stack of drawn parentheses and says: “These are your father’s parentheses. Elegant weapons for a more civilized age.” Source: https://xkcd.com/297/

Released in 2007 by its creator, Rich Hickey, Clojure was developed with an emphasis on simplicity, clarity, and pragmatism. Hickey highlighted features like immutability, higher-order functions, and metaprogramming, all while keeping the language practical and approachable.

Beyond its elegance, Clojure offers many advantages:

• Concurrency: It was designed from the ground up for parallelism and concurrency, allowing it to take full advantage of modern hardware.

• Immutability: By default, Clojure’s data structures are immutable, keeping application state clear and traceable.

• JVM Interop: Running on the JVM gives Clojure direct access to the huge and mature ecosystem of Java libraries. This means developers can utilize proven solutions for things like database access, networking, and more.

• REPL-driven development: Clojure's REPL (Read-Eval-Print Loop) offers an interactive, incremental development style. This allows engineers to write, test, and debug code on a running application, which provides fast feedback and makes prototyping more efficient.

• Functional programming: The foundation of Clojure is its use of pure functions, computations that operate without side effects. This improves testability, enables parallelization, and leads to better software design.
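The immutability and pure-function points above can be illustrated with a minimal sketch. The product map and the function name are hypothetical and chosen only for illustration; they are not taken from our codebase.

```clojure
;; Hypothetical product record; all names here are illustrative.
(def product {:id 1 :title "Sneaker" :price-cents 10000})

;; assoc returns a new map; `product` itself is never modified.
(def renamed (assoc product :title "Running Sneaker"))

(defn apply-discount
  "Pure function: the result depends only on the arguments, and
  nothing outside the function is changed."
  [product percent]
  (update product :price-cents #(quot (* % (- 100 percent)) 100)))

(def discounted (apply-discount product 20))
```

Because `product` keeps its original value after both calls, any thread can read it without locks or defensive copies.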

Why Clojure for Machine Learning?

In the world of Machine Learning (ML), Python and R are the leading languages. They offer an easy entry point, numerous tools, and extensive documentation, which simplifies exploratory and prototyping work. However, in our business environment, we must focus more on operational stability and the cost efficiency of the solution. That is why we rely on Clojure and benefit from the following advantages:

• (Cost-)Efficient Data Pipelines and Modular Feature Engineering: Transformations in our data pipelines are inherently functional. Large datasets must be manipulated, filtered, and aggregated to obtain new data. Most operations are independent and can be processed as a stream. This leads to better parallelization, improved testability, and reduced memory usage.

• Operational Resilience: Running on the JVM means we gain excellent performance in terms of runtime speed and memory footprint. The JVM's comprehensive suite of application monitoring tools allows us to accurately diagnose and eliminate both performance and memory-related issues.
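To make the "filter, transform, aggregate as a stream" idea more concrete, here is a minimal sketch using Clojure transducers. The event fields (`:query`, `:clicked?`) and the function names are assumptions made for this example; our real feature engineering is more involved.

```clojure
;; Hypothetical interaction events; field names are illustrative.
(def events
  [{:query "sneaker" :product-id 1 :clicked? true}
   {:query "sneaker" :product-id 2 :clicked? false}
   {:query "jacket"  :product-id 3 :clicked? true}
   {:query "sneaker" :product-id 4 :clicked? true}])

;; A transducer describes the steps independently of the data source,
;; so no intermediate collections are built between filter and map.
(def clicked-queries
  (comp (filter :clicked?)
        (map :query)))

(defn clicks-per-query
  "Aggregates click counts per query in a single pass over `events`."
  [events]
  (transduce clicked-queries
             (completing (fn [counts query]
                           (update counts query (fnil inc 0))))
             {}
             events))
```

Since each step is a pure function of a single item, the steps can be unit-tested in isolation and reused over any source of events.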

How we operate our Neural Network

Our architecture is built around a PostgreSQL database containing all queries and their products. At serving time, Clojure microservices access this data to rank incoming search results (retrievals) based on their relevance scores.

The scores are generated by predictions from our Deep Neural Network. It operates on relevance signals, interaction data of our customers, as well as the product data. Our machine learning pipelines are built as Clojure jobs and run on AWS. They prepare data for model training and prediction.
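The serving step described above boils down to ordering retrieved products by pre-computed scores. The following is a simplified sketch, not our service code: the score map stands in for the rows looked up in PostgreSQL, and the names `rank-retrievals` and `scores-for-query` are hypothetical.

```clojure
(defn rank-retrievals
  "Orders product ids by descending relevance score. `scores` is a map
  from product id to score, standing in for the rows stored per query;
  ids without a score are treated as 0.0."
  [scores product-ids]
  (sort-by #(get scores % 0.0) > product-ids))

;; Hypothetical pre-computed scores for one query.
(def scores-for-query {"p1" 0.2, "p2" 0.9, "p3" 0.5})

(def ranked (rank-retrievals scores-for-query ["p1" "p2" "p3" "p4"]))
```

Because the expensive model prediction happens in the batch pipeline, the request path only needs a lookup and a sort.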

Figure: Architecture overview of our learning-to-rank pipeline
Diagram of a machine learning pipeline for learning to rank. On the left, product data from Kafka and interaction data from BigQuery flow into two Clojure jobs, each of which stores its results in an S3 bucket. These outputs are processed by further jobs until a final job hands the data over to a Postgres database. On the right, an API service accesses the stored relevance scores and serves them for customer requests. A legend marks jobs in green and services in blue.

Our ETL pipelines are designed around the UNIX pipe concept. Each pipeline is composed of small, specialized jobs that each handle a single task, such as reading raw data from Kafka or transforming it. The output of each job is written to S3 as a file, which then serves as the input for subsequent jobs.

From Data Blocks to Data Streams: Streaming Pipelines with core.async

Inspired by UNIX pipes as well, our jobs treat data as a stream. Individual items flow through a series of processing steps like reading, filtering, and mapping. We use core.async channels to connect these steps, creating a continuous data flow.

Figure: Streaming architecture of a Clojure job
Diagram of a streaming job for product data: Data arriving from Kafka is read by several parallel threads, then transformed in a map phase, and finally written to an S3 bucket by further threads. For each phase, the graphic shows several blue thread blocks and red queue elements. A legend marks threads in blue and queued items in red.

Clojure channels let us process data in parallel, with multiple threads working on the same channel. By tuning thread counts per step, we use CPU cores efficiently.
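As a rough sketch of this idea, the job below connects a reader step, a parallel map step, and a collecting step with core.async channels. It assumes `org.clojure/core.async` is on the classpath; `enrich`, `run-job`, the buffer size of 16, and the product fields are hypothetical, and an in-memory collection replaces Kafka and S3.

```clojure
(require '[clojure.core.async :as async])

(defn enrich
  "Hypothetical per-item transformation."
  [product]
  (assoc product :title-length (count (:title product))))

(defn run-job
  "Reads `products` on one thread, transforms them with `n-workers`
  parallel workers, and collects the results. The channel buffers
  bound how many items are queued between the steps."
  [products n-workers]
  (let [in  (async/chan 16)
        out (async/chan 16)]
    ;; Reader step: in a real job this would consume from Kafka.
    (async/thread
      (doseq [p products]
        (async/>!! in p))
      (async/close! in))
    ;; Map step: n-workers parallel workers between the two channels.
    (async/pipeline n-workers out
                    (comp (filter :available?) (map enrich))
                    in)
    ;; Writer step: here we simply collect everything into a vector.
    (async/<!! (async/into [] out))))
```

`async/pipeline` closes the output channel once the input channel is closed and drained, so the job ends on its own when the source is exhausted. The worker count per step is the knob to tune against the available CPU cores.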

The ->> threading macro connects functions in a clean pipeline, improving code clarity and data flow traceability.

Figure: Clojure code with pipeline functions
Colorful source code in the Clojure programming language on a dark background.
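The code in the figure is only available as an image, so here is a separate, minimal sketch of how `->>` reads in practice. The function `top-titles` and the product fields are hypothetical.

```clojure
(defn top-titles
  "Returns the titles of the `n` best-scored available products."
  [products n]
  (->> products
       (filter :available?)   ; keep only available products
       (sort-by :score >)     ; highest score first
       (take n)               ; cut off after n products
       (map :title)))         ; project to the title
```

Each line receives the result of the previous one as its last argument, so the code reads top to bottom in the same order in which the data flows.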

Schema-based Data Storage with Protobuf

Our jobs process files in Protobuf format compressed with lz4 or zstandard, which results in the following advantages:

• Self-documenting data: Protobuf is a schema-based format and therefore documents the content of each record. When requirements change, we must update the schema and ensure the description remains up to date.

• Built-in validation: Schemas enforce correct data types and mandatory fields. Jobs validate these constraints automatically when processing files.

• Higher efficiency: By using the binary format, our jobs spend much less time on string processing and need significantly less time for reading and writing files.

• Streamable format: Protobuf's streaming capability lets us read and write files record-by-record, keeping memory usage low even with large datasets.
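One common way to read and write serialized records one at a time is length-prefixed framing, where every record is preceded by its size. The sketch below illustrates that principle with plain `java.io` streams and raw byte arrays. It is not the Protobuf API and not necessarily the framing our jobs use: the 4-byte length prefix, the function names, and the absence of compression are simplifying assumptions.

```clojure
(import '[java.io DataInputStream DataOutputStream EOFException])

(defn write-records!
  "Writes each byte array in `records` as a 4-byte length followed by
  the payload. In a real job the payload would be a serialized message."
  [^DataOutputStream out records]
  (doseq [^bytes record records]
    (.writeInt out (alength record))
    (.write out record 0 (alength record))))

(defn read-records
  "Returns a lazy seq of byte arrays read from `in`. Only the record
  currently being consumed has to be held in memory."
  [^DataInputStream in]
  (lazy-seq
    (when-let [len (try (.readInt in)
                        (catch EOFException _ nil))]
      (let [buf (byte-array len)]
        (.readFully in buf)
        (cons buf (read-records in))))))
```

Because `read-records` is lazy, a consumer that processes the sequence item by item never needs the whole file in memory, which is the property the last bullet describes.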

We used to share code through libraries, resulting in complex dependency chains. For this project, we wanted to simplify dependency management significantly. So we decided to give the monorepo approach a try and use Polylith to organize the project and manage dependencies.

Figure: Code sharing with Polylith
Diagram contrasting two architectural approaches. On the left, the classic library approach: several individual components such as “logging”, “aws”, and “pipeline-io” form loosely connected libraries that individual jobs (job1, job2, job3) access separately. On the right, the monorepo approach: all jobs and libraries live as clearly separated directories in one shared repository, visualized by stacked modules inside one large block. A legend marks repositories, directories, business jobs, and libraries.

The benefits we have realized:

• Directories instead of separate projects: We organize libraries, jobs, and services as "building blocks" in separate directories within a single repository. Dependencies are just references to building block directories. This eliminates cross-repository management overhead.

• Transparent dependencies: Because applications always work with the current shared code, no library-style versioning is needed, so an application can never depend on an outdated version of our shared code. Furthermore, development tooling finds all code references automatically, allowing us to refactor safely and confidently.

• Safety during changes: When we change functionality, we run all tests in the monorepo and immediately validate all affected components.

• Automated CI/CD with GitHub Actions: Polylith automatically determines which projects have changed and require a rebuild. Once identified, a single pipeline runs all tests, creates artifacts, and deploys them to development and production environments.
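To show what "dependencies are just references to building block directories" can look like, here is a sketch of a project's `deps.edn` following the usual Polylith workspace layout (`components/`, `bases/`, `projects/`). The brick names are borrowed from the diagram above; the exact paths, the `poly/` prefixes, and the Clojure version are assumptions for illustration, not a copy of our configuration.

```clojure
;; projects/job1/deps.edn (hypothetical)
{:deps {;; Shared building blocks, referenced by directory, not by version
        poly/logging     {:local/root "../../components/logging"}
        poly/aws         {:local/root "../../components/aws"}
        poly/pipeline-io {:local/root "../../components/pipeline-io"}
        ;; The job's own entry point
        poly/job1        {:local/root "../../bases/job1"}
        ;; External libraries are still versioned as usual
        org.clojure/clojure {:mvn/version "1.12.0"}}}
```

Only external libraries carry a version number here. The shared building blocks are always taken from the same commit as the job itself.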

Summary

True, we're not launching rockets here. But Lisp and Clojure's proven advantages remain relevant today. They power efficient machine learning pipelines, enable fast iteration cycles, and deliver predictable costs.

If you're pursuing a similar direction, these topics are worth exploring:

• Clojure, a data-centric, simple, and performant language
• Monorepo, for clear source code organization
• Protobuf, for documentation, validation, and efficient processing

And if you want to delve deeper into the basics of learning to rank, you will find the right introduction in our previous articles:

• Learning to Rank – To the Moon and Back (-propagation): Deep Neural Networks That Learn to Rank What You Love
• Part 1: Introduction in Learning to Rank for E-Commerce Search
• Part 2: Data Collection for Learning to Rank

The journey continues for us as well. 🚀 We are planning personalized rankings that need to be calculated in real time and can no longer be pre-calculated. Clojure provides the foundation to meet this challenge effectively.