Remus: high availability via asynchronous virtual machine replication
In NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (2008), pp. 161-174.
Abstract
Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.
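The approach in the last sentence of the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Remus implementation: the `Primary`, `Backup`, and `run` names are hypothetical, VM state is modelled as a plain dictionary, and asynchrony is simulated by letting the backup apply each checkpoint one epoch late, so the primary always runs slightly ahead of the replicated state.

```python
from dataclasses import dataclass, field

CHECKPOINTS_PER_SECOND = 40               # upper rate quoted in the abstract
EPOCH_MS = 1000 / CHECKPOINTS_PER_SECOND  # interval between checkpoints


@dataclass
class Backup:
    """Hypothetical backup host holding the last replicated state."""
    state: dict = field(default_factory=dict)

    def apply(self, delta: dict) -> None:
        self.state.update(delta)


@dataclass
class Primary:
    """Hypothetical protected VM: key/value 'memory' plus a dirty set."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)

    def write(self, key, value) -> None:
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self) -> dict:
        """Copy only the state changed since the previous checkpoint."""
        delta = {k: self.state[k] for k in self.dirty}
        self.dirty.clear()
        return delta


def run(primary: Primary, backup: Backup, epochs):
    """Run one list of writes per epoch; return the delta still in flight."""
    in_flight = None  # checkpoint captured but not yet applied on the backup
    for writes in epochs:
        # The primary keeps executing while the previous delta is in flight.
        for key, value in writes:
            primary.write(key, value)
        if in_flight is not None:
            backup.apply(in_flight)
        in_flight = primary.checkpoint()
    return in_flight


if __name__ == "__main__":
    p, b = Primary(), Backup()
    pending = run(p, b, [[("a", 1)], [("a", 2), ("b", 3)]])
    print(EPOCH_MS, p.state, b.state, pending)
```

In this model the backup lags the primary by exactly one epoch, and only changed keys are sent per checkpoint. A real system would also need failure detection and a way to keep externally visible output consistent with the replicated state, which the sketch leaves out.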