As soon as your technical team reaches a certain size, you usually start to divide your platform into specialized servers (aka microservices) that are individually more manageable by a dedicated small team. But those different services still need to smoothly work together to deliver your business goals.
In this article, I describe different approaches and why I think Apache Pulsar is an ideal choice to build a modern orchestration engine at scale based on what we learnt building Zenaton. If you are interested to follow this work, please subscribe here.
Problem: booking a flight + hotel
Let’s consider a typical “flight+hotel booking” process with the following steps:
- Process payment
- Book flight
- Book hotel
- Notify customer
Each of those steps is processed by a different service and may (and will) fail for technical (network issues, system failures…) or business (fraud detection, inventory not available anymore…) reasons. As often, the main difficulties in an actual implementation would come from those issues and the needs to respond gracefully to them (retry, reimbursement, notifications, booking cancellations..).
A first implementation that comes to mind is to have a coordinator that calls each system and ensure that each step is orchestrated according to business needs. For example a simple controller sequentially requesting each service through HTTP calls can do the trick. But It’s a brittle implementation as any issue into only one service will propagate to the entire process.
Also when you start having a lot of services and processes, it becomes an issue to keep track of who is using a particular service, making it difficult to update. Also each orchestrator needs to know each service to be able to connect directly (which servers? Which APIs?).
The current practice is to implement an event-driven architecture. Each service is connected to a message bus (hello Kafka!), subscribes to some messages and publishes others. For example, the payment service will subscribe to the “CheckoutTravelCart” command message and produce a new “PaymentProcessed” event. This latter will be catch by the flight booking service to trigger a “FlightBooked” event, catch by the hotel booking service to trigger the “HotelBooked” event, and so on.
This approach is seen as more decoupled as you do not have synchronous interactions between services. If a particular service is down, the process will pause and will resume as soon as the broken service is fixed. Nevertheless, this architecture is far from being a silver bullet as the definition of your business processes is actually distributed through events subscriptions and publishing, making it hard to update and really hard to have a clear understanding of the business situation without adding a dedicated service that will actually record and monitor each event.
Actually, you can extend the event-driven architecture by adding an orchestrator service. This service will be in charge of triggering commands based on the history of past events. This orchestrator maintains a durable state for each business process, making it easy both to implement sophisticated processes and to have a source of truth for each process state. Moreover, each service is even more decoupled as it does not need anymore to be aware of other service’s events. They can behave as simple asynchronous services receiving their own commands and publishing their own events only.
The main downside of this architecture is that it’s actually a lot of work to implement — you need:
- a job manager system. This job manager is responsible for processing commands. A client does the interface between the job manager and the service and ensures to send the results back to the event bus. In case of failure, the job manager should handle the retries gracefully. Ideally it should also provide a monitoring tool showing the state of each job. This monitoring tool should let an administrator cancel or retry a job. Indeed a simple failed job can block an entire complex workflow, so it’s essential to provide the tooling to fix any issue that may occur.
- An orchestration system able to receive an event and decide which command to trigger next based on the workflow definitions and the current workflow state. This work is called a “decision” and can be delegated to a special decision service. To ensure consistency, the orchestrator must guarantee that there is never more than one decision processed in parallel for the same process. A Timer service should also be added to handle time-sensitive action (eg. “wait until next Monday 9am”).
Here is a more detailed technical view of requirements :
This architecture was actually implemented at Zenaton:
- Event Bus and Command Queues were a managed RabbitMQ cluster
- Databases were large managed Postgresql instances (with hundreds of millions of lines)
- Orchestrator and the Timer were in-house services developed in Elixir
- Decision Service was in node.js (as workflows were defined in node.js).
Despite having not that much customers, we were struggling with scalability issues:
- Orchestrator was a in-memory stateful service, making it hard to scale horizontally
- Postgresql databases became painfully slow when calculating metrics
- RabbitMQ can not scale horizontally (but we did not reach the limit)
- Overall the platform was already too complex
In the next section, I’ll show how Apache Pulsar is actually a much better platform to build this type of architecture.
Using Apache Pulsar
Pulsar is sometimes presented as the “next-generation Kafka”. It’s already used in production at — very — large scale at Yahoo or Tencent. It has an impressive set of features:
- unified messaging system that handle both streams and message patterns
- horizontal scaling by use of Zookeeper and Bookkeeper, and stateless brokers
- functions able to do stateful computations
- Delayed messages
- ability to SQL-query structure data from Pulsar using Presto
- natively multi-tenant
- native Geo-replication
Let’s see how we can take benefit of these features to reach this simple architecture :
Unified messaging systems
An ideal event-driven architecture requires both a streaming platform for managing events and a queuing system for job processing. Of course, it’s possible to have only one or the other — at Zenaton we had a unique RabbitMQ cluster — but you make your life more complicated.
Apache Pulsar proposes a unified messaging/streaming system. It means you do not need to operate 2 different clusters and can use Pulsar to provide command queues and the event bus.
Horizontal Scaling of Storage
By leveraging Bookkeeper, Pulsar provides you out of the box an horizontal scaling of your storage. Bookkeeper is used to store your messages permanently. Of course you can define a limit in retention. You can also set up a tiered storage to offload long-term storage to S3 or other providers.
Using Pulsar, you do not need additional databases.
Functions And Stateful Computations
As described in Pulsar documentation, functions are lightweight compute processes that
- consume messages from one or more Pulsar topics,
- apply a user-supplied processing logic to each message,
- publish the results of the computation to another topic
Pulsar lets you “deploy” functions in its own cluster and ensure they are up and running. Pulsar also provides an API to retrieve and store a state in the embedded Bookeeper cluster.
Functions can be used to deploy our orchestrator, our decision service and even clients (if services can be reached through gRPC or http) directly inside the Pulsar cluster.
> Note: as of summer 2020, it’s not possible to configure a function to guarantee than not more than one decision is running at a given time. It should be possible in next release 2.7. Also, stateful functions are still in developer preview. You can expect some bugs, but hopefully they will be soon fixed.
Delayed messages can be used in the job manager to manage retries of failed jobs, but can also be used to manage delays in workflows.
By using this feature, you do not need a Timer service.
Ability to SQL-Query Data From Pulsar
As Pulsar provides a way to query data stored in Bookkeeper, it’s quite easy to set up an API that a dashboard can use to display what happens in workflows. Cherry on the cake, this API will include the data offloaded to tiered storage as S3.
> Note: as of summer 2020, it’s not possible to add indexes to these data. So some requests may be long. Again this should change in coming releases of Pulsar.
Based on my experience at Zenaton, Apache Pulsar appears to be an ideal platform to build a lean but powerful enterprise-grade event-driven workflow orchestration platform.
That's what I'm doing at Infintic. Infinitic proposes a "workflow as code" pattern, that lets you describe workflows as if you were on a single infallible machine! If you are not familiar with this pattern, please read code is the best DSL for building workflows.
You can have amore complete description on how it actually works by looking under the hood of an event-driven workflow engine.
I'm building in public, if you are interested in this work please subscribe here, and even better please contact me to discuss your use case.