DISTRIBUTED APPLICATION CHECKPOINTING FOR REPLICATED STATE MACHINES

Celikel, Niyazi; Ovatman, Tolga

doi:10.12694/scpe.v22i1.1840

Celikel N. O., Ovatman T.

SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, vol.22, no.1, pp.67-79, 2021 (ESCI)

Publication Type: Article / Full Article
Volume: 22 Issue: 1
Publication Date: 2021
DOI: 10.12694/scpe.v22i1.1840
Journal Name: SCALABLE COMPUTING-PRACTICE AND EXPERIENCE
Indexes Covering the Journal: Emerging Sources Citation Index (ESCI), Scopus, Applied Science & Technology Source, Compendex, Computer & Applied Sciences
Page Numbers: pp.67-79
Affiliated with Istanbul Technical University: Yes

Abstract

Application checkpointing is a widely used recovery mechanism that periodically saves an application's state so that it can be restored after a failure. In this study we investigate the utilisation of distributed checkpointing for replicated state machines. Conventionally, for replicated state machines, checkpointing information is either replicated in full on every replica or stored separately in a single instance. Applying distributed checkpointing provides a means to adjust the level of fault tolerance of the checkpointing approach at the expense of recovery time. We use a local cluster and a cloud environment to examine the effects of distributed checkpointing on a simple state machine example and compare the results with the conventional approaches. As expected, distributed checkpointing improves memory consumption and allows different levels of fault tolerance to be used, while performing worse in terms of recovery time.
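
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chunking of the checkpoint, the round-robin placement, and the names `place_chunks`, `recoverable`, and `copies` are all assumptions made for illustration. The checkpoint is split into chunks, and each chunk is stored on `copies` of the replicas; setting `copies` equal to the number of replicas corresponds to the conventional fully replicated case.

```python
def place_chunks(num_chunks, num_replicas, copies):
    """Assign each checkpoint chunk to `copies` distinct replicas, round-robin.

    Hypothetical placement scheme, used only to illustrate the trade-off.
    Returns a dict mapping replica index -> list of chunk indices it stores.
    """
    if not 1 <= copies <= num_replicas:
        raise ValueError("copies must be between 1 and num_replicas")
    placement = {replica: [] for replica in range(num_replicas)}
    for chunk in range(num_chunks):
        for offset in range(copies):
            placement[(chunk + offset) % num_replicas].append(chunk)
    return placement


def recoverable(placement, failed, num_chunks):
    """Return True if every chunk survives on at least one non-failed replica."""
    surviving = set()
    for replica, chunks in placement.items():
        if replica not in failed:
            surviving.update(chunks)
    return surviving == set(range(num_chunks))


def chunks_per_replica(placement):
    """Largest number of chunks any single replica has to hold in memory."""
    return max(len(chunks) for chunks in placement.values())
```

Under these assumptions, a lower `copies` value means each replica holds fewer chunks, but fewer simultaneous failures can be survived, and a recovering replica has to collect chunks from several peers instead of reading one complete local copy. That last point is one plausible reason for the longer recovery time the abstract reports.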