TiKV joined the CNCF in 2018 and has since graduated in terms of project maturity.
Read this article to learn everything you need to know about TiKV.
The name of the project is inspired by titanium ('Ti'), as the creators had the element's properties in mind. 'KV' stands for 'key-value,' a nod to the class of databases it belongs to. If you are wondering how to pronounce it, it's 'Tai-K-V.'
The Product
TiKV is designed for scalability, very low latency, and simplicity. It is a distributed key-value database that, unlike many of its counterparts, has no heavy dependency on an existing distributed file system. The solution aims to provide functionality similar to Google Spanner, HBase, and F1.
The image above depicts the major components of the solution. Here is a quick description of each:
The previous section covered the TiKV components. These components need to communicate with each other to perform their designated tasks, and that is where Protocol Buffers come in. However, because Rust lacked solid gRPC support at the time, TiKV defines the protocol format explicitly itself.
Each communication happens by exchanging one or more messages, where the message format is Header + Payload.
Here, the Payload contains the Protocol Buffers data, and its length is specified in the Header. So, the message is read according to the Header's details and decoded accordingly.
The Header is a 16-byte sequence with the following format: | 0xdaf4 (magic value, 2 bytes) | 0x01 (version, 2 bytes) | msg_len (4 bytes) | msg_id (8 bytes) |
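To make the layout concrete, here is a minimal sketch of a decoder for that 16-byte header in Rust. The struct and function names are hypothetical, and big-endian field encoding is an assumption made purely for illustration:

```rust
/// Hypothetical representation of the 16-byte message header described above.
struct MsgHeader {
    msg_len: u32, // length of the Protocol Buffers payload that follows
    msg_id: u64,  // identifier used to pair a response with its request
}

const MSG_MAGIC: u16 = 0xdaf4;
const MSG_VERSION: u16 = 0x01;

/// Decodes a 16-byte header; assumes big-endian field encoding.
fn decode_header(buf: &[u8; 16]) -> Result<MsgHeader, String> {
    let magic = u16::from_be_bytes([buf[0], buf[1]]);
    let version = u16::from_be_bytes([buf[2], buf[3]]);
    if magic != MSG_MAGIC {
        return Err(format!("bad magic value: {magic:#06x}"));
    }
    if version != MSG_VERSION {
        return Err(format!("unsupported version: {version}"));
    }
    let msg_len = u32::from_be_bytes(buf[4..8].try_into().unwrap());
    let msg_id = u64::from_be_bytes(buf[8..16].try_into().unwrap());
    Ok(MsgHeader { msg_len, msg_id })
}
```

The receiver would then read msg_len more bytes and hand them to the Protocol Buffers decoder for the expected message type.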
The kvproto project implements the TiKV interaction-specific protocol. Another project, tipb, contains the protocol for the push-down computations.
So, if you want to understand the TiKV architecture better, these files from kvproto can help:
Now, if you want to use the TiKV API from an external project, you must use the following files:
raft_cmdpb.proto: when you just need the basic key-value (KV) feature.
Raft is a popular consensus algorithm that decomposes consensus into relatively independent sub-problems, which makes fault-tolerant systems easier to build. TiKV uses this consensus algorithm to achieve strong consistency in a distributed setting, and the Raft module is independent, so you can reuse it in your own project as required.
The algorithm was chosen because it is simple to implement, production-ready, highly practical, and easy to migrate. In fact, the Raft module in TiKV was ported directly from etcd.
Wondering how to use Raft in your TiKV project? Check these steps:
Before walking through the steps, note that the Raft storage needs to provide six functions:

1. It returns the storage-specific details related to HardState & ConfState (the initial state).
2. It returns the log entries within a given low-to-high index range.
3. It fetches the term of the log entry at the corresponding log index.
4. It fetches the index of the first log entry currently available.
5. It fetches the index of the last log entry currently available.
6. It builds the current snapshot and returns it.
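In Rust, these six functions map onto a storage trait roughly like the sketch below. This is a simplified rendering with placeholder types; the real trait in the raft-rs crate uses the Raft protobuf types, and its signatures differ between versions:

```rust
// Placeholder types standing in for the Raft protobuf messages and error type.
struct HardState;
struct ConfState;
struct Entry;
struct Snapshot;
struct StorageError;

/// Simplified view of the storage interface that Raft queries.
trait RaftStorage {
    /// 1. Returns the persisted HardState and ConfState.
    fn initial_state(&self) -> Result<(HardState, ConfState), StorageError>;
    /// 2. Returns the log entries in the [low, high) index range.
    fn entries(&self, low: u64, high: u64) -> Result<Vec<Entry>, StorageError>;
    /// 3. Returns the term of the entry at the given log index.
    fn term(&self, index: u64) -> Result<u64, StorageError>;
    /// 4. Returns the index of the first log entry currently available.
    fn first_index(&self) -> Result<u64, StorageError>;
    /// 5. Returns the index of the last log entry currently available.
    fn last_index(&self) -> Result<u64, StorageError>;
    /// 6. Builds the current snapshot and returns it.
    fn snapshot(&self) -> Result<Snapshot, StorageError>;
}
```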
Step 1: Define and set up the Raft storage properties (the storage interface described above).
Step 2: Create a raw node object and pass it the initial state details (the configuration and the storage).
In the config data, the periodic ticks, e.g., election_tick and heartbeat_tick, are the most essential, as Raft's steps are driven by them. The leader tracks a heartbeat elapse; when it reaches the heartbeat_tick threshold, heartbeats are sent to the followers and the elapse is reset.
At every follower's end, the election elapse is checked against election_tick. When that threshold is reached without hearing from the leader, a new election is initiated.
Step 3: After the raw node's creation, its Tick interface is called repeatedly at the tick period (say, every 100 ms).
Step 4: When TiKV Raft needs to write data, it simply calls the Propose interface, passing arbitrary binary-formatted data as its parameter. How that data is handled depends entirely on external logic.
Step 5: When membership needs to be modified, the propose_conf_change interface is used. For this, the raw node is called with a ConfChange object, which describes whether a particular node is to be added or removed.
Step 6: When functions like Tick and Propose are called, Raft may set the concerned raw node's state to Ready, which can mean any of three things:

Once that Ready status has been handled by the application, it informs Raft through the Advance function so that the node can move on to its next state. The sketch below ties all six steps together.
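Here is a minimal sketch of those six steps written against the raft-rs crate. Treat it as an outline only: the exact constructor and method signatures vary across raft-rs versions, and real code would persist the Ready contents instead of skipping over them. The pending_write helper is hypothetical.

```rust
use raft::{storage::MemStorage, Config, RawNode};
use std::{thread, time::Duration};

/// Hypothetical source of writes coming from the application layer.
fn pending_write() -> Option<Vec<u8>> {
    None
}

fn run_raft_node() -> Result<(), raft::Error> {
    // Steps 1 & 2: define the storage and create a raw node from it plus a config.
    let config = Config {
        id: 1,
        election_tick: 10, // a follower starts an election after ~10 unanswered ticks
        heartbeat_tick: 3, // the leader sends heartbeats every 3 ticks
        ..Default::default()
    };
    let storage = MemStorage::new();
    let logger = slog::Logger::root(slog::Discard, slog::o!());
    let mut node = RawNode::new(&config, storage, &logger)?;

    loop {
        // Step 3: drive Raft's logical clock once per tick period (say, 100 ms).
        thread::sleep(Duration::from_millis(100));
        node.tick();

        // Step 4: when there is data to write, replicate it through the group.
        if let Some(data) = pending_write() {
            node.propose(vec![], data)?;
        }

        // Step 5 would call node.propose_conf_change(...) with a ConfChange object.

        // Step 6: handle whatever Raft marked as Ready, then tell it via Advance.
        if node.has_ready() {
            let ready = node.ready();
            // ... persist entries and hard state, send messages, apply committed entries ...
            node.advance(ready);
        }
    }
}
```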
TiKV drives Raft through a Rust event-loop library called Mio. Here's the process:
The same steps are followed for all Raft groups. Each of these groups is independent and corresponds to a particular Region.
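Conceptually, the event loop registers a base tick timeout and, every time it fires, drives each Region's independent Raft group. The sketch below deliberately uses a plain thread loop instead of Mio's actual API, and the Region-group type is a hypothetical stand-in:

```rust
use std::{collections::HashMap, thread, time::Duration};

// Hypothetical stand-in: one Raft state machine (RawNode, apply state, ...) per Region.
struct RegionRaftGroup {}

impl RegionRaftGroup {
    fn tick(&mut self) { /* forward to the underlying raw node */ }
    fn handle_ready(&mut self) { /* persist, apply, send messages */ }
}

fn drive_all_regions(mut groups: HashMap<u64, RegionRaftGroup>) {
    loop {
        // Base tick: the event loop wakes up on a fixed timeout (e.g. 100 ms)...
        thread::sleep(Duration::from_millis(100));
        // ...and every independent Raft group, one per Region, is ticked and drained.
        for group in groups.values_mut() {
            group.tick();
            group.handle_ready();
        }
    }
}
```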
When the process begins, there is just one TiKV Region, covering the range (-inf, +inf). The first split happens as data arrives and the Region reaches the 64 MB threshold. Subsequent Regions are also created by splitting (upon reaching the threshold) again and again, according to the Split Key.
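As a rough illustration of that threshold rule (the types and the function here are hypothetical; TiKV's real split checker is considerably more involved, and PD takes part in the decision):

```rust
/// 64 MB split threshold mentioned above.
const REGION_SPLIT_SIZE: u64 = 64 * 1024 * 1024;

/// Hypothetical Region descriptor; empty keys stand for -inf / +inf.
struct Region {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
    approximate_size: u64,
}

/// Returns the key to split at once a Region outgrows the threshold.
fn check_split(region: &Region, candidate_split_key: Vec<u8>) -> Option<Vec<u8>> {
    if region.approximate_size >= REGION_SPLIT_SIZE {
        // The real implementation scans the data to pick a key near the middle.
        Some(candidate_split_key)
    } else {
        None
    }
}
```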
The TiKV roadmap includes a merge process too, but it is yet to be implemented. In this process, contrary to the above, adjacent Regions holding too little data are merged to form a bigger Region.
As mentioned previously, the Placement Driver (PD) forms the brain of a TiKV cluster. It is responsible for ensuring the cluster's consistency and availability.
Now, since it is the central point of control, it could also become a single point of failure. To avoid this, several PD servers can be started, with one of them elected as the group's leader. The leader is selected through etcd's election mechanism and is therefore fair.
This elected leader handles external communication and provides services to clients. When it becomes unresponsive due to failure or other reasons, another leader is elected through the same election process.
When one leader goes down, another takes its place. But what about consistency? How can we ensure that the new leader has the latest, consistent data?
For this, PD keeps its data in etcd. Being a distributed, consistent KV store, etcd can guarantee the data's consistency. The new PD leader simply fetches this data from etcd.
Previous versions used an external etcd service. Current versions, however, embed etcd inside PD, which makes the setup faster, simpler to deploy, and better performing.
At present, the main responsibilities of the Placement Driver include:
TiKV takes its inspiration from Google's Percolator and Xiaomi's Themis when it comes to performing and handling transactions. However, the TiKV model is a bit different and more optimized. Here is where it differs:
Here’s how a typical transaction functions in TiKV:
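Since the flow follows a Percolator-style two-phase commit, a generic client-side sketch looks like the code below. The trait and method names here are hypothetical, not TiKV's actual API; they only illustrate the prewrite-then-commit shape, with timestamps handed out by a central service (PD's role in TiKV):

```rust
/// Hypothetical key-value mutation within a transaction.
struct Mutation {
    key: Vec<u8>,
    value: Vec<u8>,
}

/// Hypothetical transaction client illustrating a Percolator-style flow.
trait TxnClient {
    /// Obtain a monotonically increasing timestamp (PD hands these out in TiKV).
    fn get_timestamp(&self) -> u64;
    /// Phase 1: lock every key and write tentative values at start_ts.
    fn prewrite(&self, mutations: &[Mutation], primary: &[u8], start_ts: u64) -> Result<(), String>;
    /// Phase 2: commit the primary key, then the secondaries, at commit_ts.
    fn commit(&self, keys: &[Vec<u8>], start_ts: u64, commit_ts: u64) -> Result<(), String>;
}

/// Runs one transaction; assumes a non-empty mutation set.
fn run_txn(client: &dyn TxnClient, mutations: Vec<Mutation>) -> Result<(), String> {
    let start_ts = client.get_timestamp();
    let primary = mutations[0].key.clone(); // one key is chosen as the primary
    client.prewrite(&mutations, &primary, start_ts)?;

    let commit_ts = client.get_timestamp();
    let keys: Vec<Vec<u8>> = mutations.into_iter().map(|m| m.key).collect();
    client.commit(&keys, start_ts, commit_ts)
}
```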
This concept matches the coprocessor mechanism used in HBase. However, the TiKV coprocessor is not loaded dynamically; it is built in statically. Also, the coprocessor's major role is to serve TiDB on top of TiKV in scenarios such as Split checks and push-down computations. Here is how it helps TiKV:
Here’s how this process will proceed:
Wondering how TiKV serves a GET or PUT request in general, or how the replicas are updated? Let us answer that in this section.
TiKV supports three types of such operations: simple (raw KV), push-down, and transactional. Which one is used depends on the nature of the request. However, in the end, each of them is handled as a simple KV operation (at present).
See how a simple KV operation for the PUT query proceeds:
The process for a GET request is the same. This implies that a request is processed only after it has been replicated by a majority of Raft nodes, which is how the distributed system maintains data linearizability.
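Condensed into code, the leader-side path of a PUT can be sketched as below; the types are hypothetical stand-ins, and the real TiKV code path is more layered:

```rust
/// Hypothetical stand-ins for the pieces involved in serving a PUT.
struct RaftCommand(Vec<u8>);
struct RegionLeader;

impl RegionLeader {
    /// Proposes a command to the Region's Raft group (stub).
    fn propose(&mut self, _cmd: RaftCommand) -> Result<(), String> {
        Ok(())
    }
}

/// Outline of how a simple PUT flows through a Region, as described above.
fn handle_put(leader: &mut RegionLeader, key: Vec<u8>, value: Vec<u8>) -> Result<(), String> {
    // 1. Encode the PUT as a Raft command for the Region that owns `key`.
    let cmd = RaftCommand([b"put ".as_slice(), &key, b" ", &value].concat());
    // 2. Propose it; Raft replicates the log entry to the followers.
    leader.propose(cmd)?;
    // 3. The entry counts as committed once a majority of replicas have persisted it.
    // 4. Each peer applies the committed entry to its local RocksDB instance.
    // 5. Only then does the leader answer the client; a GET follows the same path.
    Ok(())
}
```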
TiKV's Raft may also let followers serve reads, or give the leader a lease so it can serve reads without going through the Raft replicated log. These proposed ideas aim to optimize TiKV's read performance.
TiKV maintains multiple copies (replicas) of each Region's data across stores in order to keep that data safe.
These replicas are placed on different stores as peers of each other. When a Region has insufficient replicas, more are added; when a Region has more replicas than needed, some are removed.
The Raft Membership Change component of TiKV takes care of modifying the replicas and completing the above two processes. However, when and how a Region is modified is controlled by PD.
To simplify: PD is the manager, and Membership Change is the actor.
Now, see the steps of an example process to understand what exactly happens:
Note: Up to this point, only the Region's meta has been updated with the new replica's information. Later, if the Leader finds that the new Follower has no data, it sends it a snapshot.
Note 2: The Raft paper describes a different procedure for Membership Change than how TiKV and etcd actually deploy it. In the paper, the Region meta is modified as soon as the 'Propose' command runs. However, neither TiKV nor etcd adds the peer details to the Region meta before the log is applied.
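The proposal for the add-replica case is itself just another entry going through Raft: a ConfChange describing the new peer. Here is a hedged sketch against the raft-rs crate; field and method names can differ between versions and protobuf codecs:

```rust
use raft::eraftpb::{ConfChange, ConfChangeType};

/// Builds a ConfChange that asks the Raft group to add a new peer,
/// then proposes it through the raw node.
fn propose_add_peer(
    node: &mut raft::RawNode<raft::storage::MemStorage>,
    new_peer_id: u64,
) -> Result<(), raft::Error> {
    let mut cc = ConfChange::default();
    cc.set_change_type(ConfChangeType::AddNode);
    cc.set_node_id(new_peer_id);
    // Only after this entry is committed and applied does the Region meta
    // actually gain the new peer; the snapshot, if needed, is sent later.
    node.propose_conf_change(vec![], cc)
}
```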
TiKV is a reliable distributed database solution that is striving to achieve more in the future, as is clear from its roadmap. Even at the present stage, the project is considered mature by the CNCF and can be used in commercial projects.
This article should have given you a better understanding of it. So, if its architecture and processes look reasonable and performance-oriented to you, this solution can form an efficient part of your next project. For example, a TiKV-on-Kubernetes deployment could lead to innovative results.