Cloud-Native Event-Driven Transaction Management
This article explains how to design transaction management for event-driven, cloud-native microservices.
A legacy application usually has a single monolithic database, where ACID transactions are easy to maintain. ACID stands for:
- A — Atomicity: A transaction is an atomic unit. Either all the instructions within a transaction execute successfully, or none of them do.
- C — Consistency: A transaction can only bring the database from one valid state to another; data is in a consistent state both when a transaction starts and when it ends.
- I — Isolation: The intermediate state of a transaction is invisible to other transactions. This keeps concurrent transactions from interfering with one another.
- D — Durability: Changes that have been committed to the database remain committed, even after a failure.
Thanks to these ACID properties, a monolithic application and its single database can manage transactions easily.
When you decompose an application into cloud-native services, or develop a new cloud-native service, data access management becomes complex because of the polyglot persistence principle. Adopting this principle ensures that the microservices are loosely coupled and can be deployed and managed independently of one another. If multiple services access the same data, you need to coordinate that access across the services. Polyglot persistence adds a further obstacle to transaction management: each microservice can use a different database, because a modern application stores diverse kinds of data and a single type of database is not always the best fit.
For some cloud-native services, a NoSQL database might offer a more convenient data model and much better performance and scalability. Similarly, for a search microservice you might consider Elasticsearch, and for a graph-related store you might use a graph database such as Neo4j. In a nutshell, one system might use several types of databases. Polyglot persistence provides benefits such as scalability, manageability, and high availability, but it also introduces distributed data management challenges.
The real challenges of using polyglot persistence in a cloud-native service are:
- Implementing a business transaction across services
- Retrieving data from multiple services
Let’s analyze how these challenges impact your cloud-native services.
The first challenge is implementing a business transaction that maintains consistency across services. Consider an eCommerce application that consists of hundreds of cloud-native services covering business cases such as orders, customers, inventory, and the catalog.
In this example, I use three cloud-native services (Customer Service, Order Service, and Inventory Service) to illustrate transaction management.
- Customer Service: This microservice maintains customer information.
- Order Service: This microservice manages orders.
- Inventory Service: This microservice manages the inventory; a new order is not confirmed when the available stock is less than the requested quantity.
In a traditional monolithic eCommerce application, the Order service can simply use an ACID transaction to check inventory availability and confirm the order.
In the cloud-native service architecture, the Customer, Order, and Inventory tables are aligned to their services, as shown in the following diagram:
The Order Service cannot access the Inventory Service's tables directly; it can reach that data only through the Inventory Service's APIs or channels. When decomposing a monolithic system into self-encapsulated services such as the Customer, Order, and Inventory services, transactions can break apart: a local transaction in the monolith becomes distributed across multiple services. The sequence diagram below shows how the customer order transaction is handled in a monolithic eCommerce system using a local transaction.
The user logs in to the eCommerce system, and after authentication the system creates a session. When the user places an order, the system creates a local ACID transaction that updates multiple database tables. If any step fails, the transaction rolls back.
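This monolithic check-and-confirm flow can be sketched as a single local transaction. The following is a minimal sketch using Python's built-in sqlite3 module; the schema, table names, and `place_order` function are illustrative assumptions, not part of any real eCommerce system:

```python
import sqlite3

# In a monolith, checking inventory and confirming the order happen
# inside ONE local ACID transaction on a single database.
def place_order(conn, product_id, qty):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            row = conn.execute(
                "SELECT stock FROM inventory WHERE product_id = ?",
                (product_id,)).fetchone()
            if row is None or row[0] < qty:
                raise ValueError("insufficient stock")  # triggers rollback
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE product_id = ?",
                (qty, product_id))
            conn.execute(
                "INSERT INTO orders (product_id, qty, status) "
                "VALUES (?, ?, 'CONFIRMED')", (product_id, qty))
        return True
    except ValueError:
        return False  # the rollback already undid any partial changes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, "
             "qty INTEGER, status TEXT)")
conn.execute("INSERT INTO inventory VALUES (1, 5)")
conn.commit()

place_order(conn, 1, 3)   # succeeds: stock drops from 5 to 2
place_order(conn, 1, 10)  # fails atomically: nothing changes
```

Because both tables live in the same database, atomicity, rollback, and isolation come for free; this is exactly what is lost when each table moves behind its own service.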
Two-Phase Commit in Cloud-Native Services
In the cloud-native services architecture, the Order Service could potentially coordinate with the Inventory Service through a distributed transaction using two-phase commit (2PC). 2PC is a standardized protocol that ensures a database commit happens in two separate phases:
- Prepare phase
- Commit Phase
Let me explain how you can use 2PC in a cloud-native services architecture with the Customer, Order, and Inventory services.
In the prepare phase, the Customer, Order, and Inventory services prepare to commit and notify the coordinator that they are ready to complete the transaction. In the commit phase, the transaction coordinator issues either a commit or a rollback command to all the services. The diagrams below show the 2PC implementation for a customer order.
In the eCommerce example above, when a customer creates an order, the coordinator (or orchestrator) creates a global transaction with all the context information. It interacts with the Order Service to create an order, and the Order Service replies to the coordinator once order creation is complete. The coordinator then asks the Inventory Service whether the requested product ID is in stock, and the Inventory Service replies that stock is available. The coordinator tells the Order Service to confirm the order and, at the same time, tells the Inventory Service to update the stock. If any service fails to process its request at any point, the coordinator aborts the transaction and begins the rollback process.
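The coordinator's prepare/commit/rollback logic can be sketched in a few lines. This is a minimal in-memory sketch; the `Participant` and `Coordinator` classes and their method names are illustrative assumptions, not a real 2PC library:

```python
# Minimal 2PC sketch: phase 1 collects votes, phase 2 commits or rolls back.
class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "INIT"

    def prepare(self, tx_id):
        # Phase 1: validate and lock local resources, then vote "yes".
        self.state = "PREPARED"
        return True

    def commit(self, tx_id):
        self.state = "COMMITTED"

    def rollback(self, tx_id):
        self.state = "ROLLED_BACK"


class Coordinator:
    def run(self, tx_id, participants):
        # Phase 1: ask every participant to prepare.
        if all(p.prepare(tx_id) for p in participants):
            # Phase 2: everyone voted yes, so commit everywhere.
            for p in participants:
                p.commit(tx_id)
            return "COMMITTED"
        # Any "no" vote (or failure) aborts: roll back everywhere.
        for p in participants:
            p.rollback(tx_id)
        return "ROLLED_BACK"


order = Participant("OrderService")
inventory = Participant("InventoryService")
result = Coordinator().run("tx-1", [order, inventory])  # "COMMITTED"
```

Note that between `prepare` and `commit`, every participant holds its locks and waits on the coordinator; that blocking window is precisely the drawback discussed below.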
The main benefit of 2PC is very strong consistency:
- The prepare and commit phases guarantee that the transaction is atomic: it either completes entirely or not at all.
- 2PC provides read-write isolation: changes are not visible until the coordinator performs the commit.
The disadvantages of 2PC: although you can solve the transaction problem with 2PC, it is not a recommended approach for cloud-native architectures because:
- 2PC is synchronous (blocking): it locks resources in all the participating cloud-native services until the entire transaction completes, which can become a bottleneck for the whole system.
- This approach is slow, because the threads of all the participating microservices are blocked.
- The coordinator (or orchestrator) is a single point of failure: all of the system's transactions depend on the coordinator's availability.
- The CAP theorem (Consistency, Availability, and Partition tolerance) forces you to choose between availability and strong consistency when a network partition occurs; in my experience, availability is usually the better choice for cloud-native systems.
- Many modern databases, such as NoSQL stores, do not support 2PC.
Transactions with Events
In an event-driven architecture, a microservice publishes an event whenever it executes a command, and related cloud-native services subscribe to those events.
You can use events to implement transactions that span multiple participating services. A business transaction is completed through multiple steps, and each step publishes an event that triggers the next step.
Below, I show how to implement transactions using event-driven messaging and event sourcing, with the same use case as in the 2PC example.
The microservices publish and subscribe to events via an event broker and an event store: each service publishes an event to the event broker, and the other services subscribe to events as they are published. The following steps complete a place-order transaction.
Step 1: The customer places an order, and the Order Service initiates an order confirmation transaction (Begin Transaction).
Step 2: The Order Service publishes an event to check the inventory, passing the product ID.
Step 3: The Inventory Service subscribes to the event from the event broker and checks the stock level for the product. If stock is available, then:
Step 4: The Inventory Service reserves the stock and publishes an event.
Step 5: The Order Service subscribes to the event and confirms the order.
Step 6: The Order Service publishes an event to update the stock.
Step 6.1: The Order Service publishes an event confirming the order.
Step 7: The Inventory Service subscribes to the event and updates the stock level.
Step 8: The Customer Service subscribes to the confirmed-order event and updates the customer's details.
Step 9: End Transaction.
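The steps above can be sketched with a minimal in-memory event broker. The broker class, topic names, payload shapes, and handlers below are illustrative assumptions, not a real messaging API:

```python
# Minimal sketch of the event-driven order flow: each published event is
# persisted to the event store and fans out to the subscribed handlers.
class EventBroker:
    def __init__(self):
        self.subscribers = {}   # event name -> list of handlers
        self.event_store = []   # every published event is persisted

    def subscribe(self, event, handler):
        self.subscribers.setdefault(event, []).append(handler)

    def publish(self, event, payload):
        self.event_store.append((event, payload))
        for handler in self.subscribers.get(event, []):
            handler(payload)


broker = EventBroker()
stock = {"p-1": 10}   # Inventory Service's local data
orders = {}           # Order Service's local data

def check_inventory(p):   # Steps 3-4: check and reserve stock, publish result
    if stock[p["product_id"]] >= p["qty"]:
        broker.publish("StockReserved", p)

def confirm_order(p):     # Steps 5-6: confirm the order, request a stock update
    orders[p["order_id"]] = "CONFIRMED"
    broker.publish("UpdateStock", p)

def update_stock(p):      # Step 7: apply the reservation to the stock level
    stock[p["product_id"]] -= p["qty"]

broker.subscribe("CheckInventory", check_inventory)
broker.subscribe("StockReserved", confirm_order)
broker.subscribe("UpdateStock", update_stock)

# Steps 1-2: the Order Service begins the transaction.
broker.publish("CheckInventory", {"order_id": "o-1", "product_id": "p-1", "qty": 4})
```

In a real system the broker would be Kafka, RabbitMQ, or similar, the handlers would run in separate services, and delivery would be asynchronous; the chain of publish/subscribe hops is the same.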
Here, each service updates its own database and publishes an event, and the event broker saves each event in the event store. These transactions do not adhere to ACID properties; instead, they follow eventual consistency. In this entire transaction, atomicity is still very important, and the event store plays a key role in managing it. For example, for order creation, the service must store the order in the Order Service database and publish an event to the event broker, and these two operations should happen atomically. If the service fails after completing only one of the tasks, the transaction becomes inconsistent. To avoid this inconsistency, you need to maintain an event store table that records every event that occurs in the whole transaction.
With event sourcing, the event table persists every event in the transaction. If the transaction fails partway through, a service can reconstruct its state from the event store, as shown in the diagram below, where each service publishes and subscribes to events via the event broker.
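Reconstructing state from the event store amounts to replaying the persisted events in order. A minimal sketch, assuming hypothetical `StockAdded` and `StockReserved` event shapes:

```python
# Event sourcing sketch: current state is derived by folding over the
# persisted event log, so a crashed service can rebuild itself on restart.
def rebuild_inventory(event_store, product_id):
    """Replay persisted events to reconstruct the current stock level."""
    stock = 0
    for event, payload in event_store:
        if payload.get("product_id") != product_id:
            continue
        if event == "StockAdded":
            stock += payload["qty"]
        elif event == "StockReserved":
            stock -= payload["qty"]
    return stock

event_store = [
    ("StockAdded",    {"product_id": "p-1", "qty": 10}),
    ("StockReserved", {"product_id": "p-1", "qty": 4}),
    ("StockReserved", {"product_id": "p-2", "qty": 1}),
]

rebuild_inventory(event_store, "p-1")  # 10 - 4 = 6
```

Because the log, not the table, is the source of truth, no partially applied update is ever lost: whatever was durably appended to the event store can always be replayed.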
One way of achieving event-driven transactions is to combine the Saga pattern with CQRS.
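The core idea of a saga is that each local step has a compensating action that undoes it if a later step fails, replacing the rollback that a distributed ACID transaction would have provided. A minimal orchestration-style sketch, with hypothetical step names:

```python
# Saga sketch: run each step's action; on failure, run the compensating
# actions of the completed steps in reverse order.
def run_saga(steps):
    """steps: list of (action, compensation) pairs. Returns True on success."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
        return True
    except Exception:
        for compensation in reversed(done):
            compensation()  # semantic undo, not a database rollback
        return False


log = []

def reserve_stock():
    raise RuntimeError("no stock")  # simulate the inventory step failing

ok = run_saga([
    (lambda: log.append("order created"), lambda: log.append("order cancelled")),
    (reserve_stock, lambda: None),
])
# ok is False and log shows the order was created, then compensated.
```

Unlike 2PC, nothing is locked between steps; consistency is eventual, and the compensations are ordinary business operations (cancel the order, release the reservation) rather than low-level rollbacks.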