Fault Tolerant Hash Join for Distributed Systems

Arsen Nasibullin

doi:10.32603/2071-2340-2022-4-68-82

Arsen Nasibullin Saint Petersburg State University, 28 Universitetskiy pr., Stary Peterhof, 198504, Saint Petersburg, Russia

DOI: https://doi.org/10.32603/2071-2340-2022-4-68-82

Keywords: Distributed systems, Fault Tolerance, Hash Join, Replication

Abstract

Nowadays, enterprises are inclined to deploy data processing and analytical applications from well-equipped mainframes with highly available hardware components to commodity computers. Commodity machines are less reliable than expensive mainframes. Applications deployed on commodity clusters have to deal with failures that occur frequently. Mostly, these applications perform complex client queries with aggregation and join operations. The longer a query executes, the more it suffers from failures. It causes the entire work has to be re-executed. This paper presents a fault tolerant hash join (FTHJ) algorithm for distributed systems implemented in Apache Ignite. The FTHJ achieves fault tolerance by using a data replication mechanism, materializing intermediate computations. To evaluate FTHJ, we implemented the baseline, unreliable hash join algorithm. Experimental results show that FTHJ takes at least 30% less time to recover and complete join operation when a failure occurs during the execution. This paper describes how we reached a compromise between executing recovery tasks for the least amount of time and using additional resources.

Author Biography

Arsen Nasibullin, Saint Petersburg State University, 28 Universitetskiy pr., Stary Peterhof, 198504, Saint Petersburg, Russia

PhD candidate, Postgraduate graduate of the Faculty of Mathematics and Mechanics, Saint Petersburg State University, nevskyarseny@yandex.ru

References

A. S. Tanenbaum and M. V. Steen, Distributed Systems: Principles and Paradigms, Upper Saddle River, NJ, USA: Prentice-Hall, 2006.

B. Catania and L. Jain, “Advanced Query Processing: An Introduction,” in Intelligent Systems Reference Library, vol. 36, pp. 1–13, 2012; doi:10.1007/978-3-642-28323-9_1

A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE Trans. Dependable Secur. Comput., vol. 1, no. 1, pp. 11–33, 2004; doi:10.1109/tdsc.2004.2

G. Graefe, “Query evaluation techniques for large databases,” ACM Comput. Surv., vol. 25, no. 2, pp. 73–169, 1993; doi:10.1145/152610.152611

C. Barthels, I. Muller, T. Schneider, G. Alonso, and T. Hoefler, “Distributed join algorithms on thousands of cores,” in Proc. of the VLDB Endowment, vol. 10, no. 5, pp. 517–528, 2017; doi:10.14778/3055540.3055545

C. Kim et al., “Sort vs. hash revisited: Fast join implementation on modern multi-core cpus,” in Proc. of the VLDB Endowment, vol. 2, no. 2, pp. 1378–1389, 2009; doi:10.14778/1687553.1687564

Volt Inc., “VOLT ACTIVE DATA,” in voltdb.com, 2022. [Online]. Available: https://voltdb.com/

The Apache Software Foundation, “Apache hadoop,” in apache.org, 2019. [Online]. Available: https://hadoop.apache.org/

The Apache Software Foundation, “Apache ignite,” apache.org, 2022. [Online]. Available: https://ignite.apache.org/

The Apache Software Foundation, “Apache spark,” apache.org, 2020. [Online]. Available: https://spark.apache.org/

A. Nasibullin and B. Novikov, “Fault tolerant distributed hash join in relational databases,” in 5th Conf. on software engineering and information management (SEIM-2020), Aahen, Germany, pp. 11–17, no. 2691, 2020.

D. A. Schneider and D. J. DeWitt, “A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment,” ACM SIGMOD Record, vol. 18, no. 2, pp. 110–121, 1989.

S. Blanas et al., “A comparison of join algorithms for log processing in mapreduce,” in Proc. of the 2010 ACM SIGMOD International Conference on Management of data, pp. 975–986, 2010.

K. Shim, “Mapreduce algorithms for big data analysis,” in Int. Workshop on Databases in Networked Information Systems, pp. 44–48, 2013.

D. Van Hieu, S. Smanchat, and P. Meesad, “Mapreduce join strategies for key-value storage,” in 2014 11th Int. Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 164–169, 2014.

Citus Data, Microsoft Company, “Citusdb,” in citusdata.com, 2022. [Online]. Available: https://citusdata.com/

M. T. Ozsu and P. Valduriez, ¨ Principles of Distributed Database Systems, Springer Publishing Company Inc., 2011.

J. Gray et al. “The recovery manager of the system r database manager,” ACM Comput. Surv., vol. 13, no. 2, pp. 223–242, 1981; doi:10.1145/356842.356847

M. Saadoon et al., “Fault tolerance in big data storage and processing systems: A review on challenges and solutions,” Ain Shams Engineering Journal, vol. 13, no. 2, p. 101538, 2022; doi:10.1016/j.asej.2021.06.024

N. Ayari, D. Barbaron, L. Lefevre, and P. Primet, “Fault tolerance for highly available internet services: concepts, approaches, and issues,” IEEE Communications Surveys & Tutorials, vol. 10, no. 2, pp. 34–46, 2008.

D. F. Bacon et al., “Spanner: Becoming a sql system,” in Proc. of the 2017 ACM International Conference on Management of Data, pp. 331–343, 2017.

Amazon Web Services Inc., “Aws rds,” in amazon.com, 2022. [Online]. Available: aws.amazon.com/rds/

ClickHouse Inc., “Clickhouse,” in clickhouse.com, 2022. [Online]. Available: https://clickhouse.com/

The Apache Software Foundation, “Apache hive,” apache.org, 2022. [Online]. Available: https://hive.apache.org/

D. Playfair, A. Trehan, B. McLarnon, and D. S. Nikolopoulos, “Big data availability: Selective partial checkpointing for in-memory database queries,” in 2016 IEEE International Conference on Big Data (Big Data), pp. 2785–2794, 2016.

T. Chen and K. Taura, “A selective checkpointing mechanism for query plans in a parallel database system,” in 2013 IEEE Int. Conf. on Big Data, pp. 237–245, 2013.

MongoDB, Inc., “Mongo db,” in mongodb.com, 2021. [Online]. Available: https://www.mongodb.com/

TPC, “Tpc-h benchmark,” in tpc.org, 2022. [Online]. Available: http://www.tpc.org/tpch/

H. Shin, I. Lee, and G. S. Choi, “Bucket-sorted hash join,” Journal of Information Science & Engineering, vol. 36, no. 1, pp. 171–190, 2020.

H. Gao and N. Sakharnykh, “Scaling joins to a thousand gpus. in Bordawekar,” in Int. Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2021, Copenhagen, Denmark, August 16, 2021, pp. 55–64, 2021.

C. Yang, C. Yen, C. Tan, and S. Madden, “Osprey: Implementing mapreducestyle fault tolerance in a shared-nothing distributed database,” in Proc. of the 26th Int. Conf. on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, pp. 657–668, 2010; doi:10.1109/ICDE.2010.5447913

J. A. Quiane-Ruiz, C. Pinkel, J. Schad, and J. Dittrich, “Rafting mapreduce: Fast recovery on the raft,” in 2011 IEEE 27th Int. Conf. on Data Engineering, pp. 589–600, 2011.

F. Wang, J. Qiu, J. Yang, B. Dong, X. Li, and Y. Li, “Hadoop high availability through metadata replication,” in Proc. of the First Int. Workshop on Cloud Data Management, CloudDB ’09, pp. 37–44, 2009; doi:10.1145/1651263.1651271