Citation: Yang Zhenkun, Yang Chuanhui, Han Fusheng, Wang Guoping, Yang Zhifeng, Cheng Xiaojun. Architecture and Technology of OceanBase Distributed Relational Database[J]. Journal of Computer Research and Development, 2024, 61(3): 540-554. DOI: 10.7544/issn1000-1239.202330835
Relational databases are key information infrastructure in today’s society. The Internet and digitization have brought high concurrency and massive data volumes, and the centralized architectures of traditional relational databases leave their processing power and storage capacity stretched. OceanBase is a distributed relational database built on commodity PC servers. It provides online horizontal scalability, automatic lossless disaster recovery from data center failures, and high-ratio data compression, and it has been used in finance, government affairs, telecommunication systems, the Internet industry, and other sectors. We introduce the architecture and several key technologies of OceanBase, including distributed transaction processing, the LSM-tree-based storage system, and the distributed SQL optimizer. We also explain in detail the high availability and data consistency of OceanBase, which ensure that the recovery point objective (RPO) is 0 and the recovery time objective (RTO) is less than 8 seconds, and we describe OceanBase’s native multi-tenant design, which provides multiple independent database services within a single cluster. Comparative experiments on the Sysbench and TPC-H benchmarks show that 1) in stand-alone mode, the performance of OceanBase is 1.27 times to over 2 times that of MySQL; 2) in single-master mode, it is 1.25 times to nearly 2 times that of MySQL; 3) in multi-master mode, it is 1.09 to 3.1 times that of MySQL, and for complex OLAP queries, it is 6 to 327 times that of MySQL.