Fault-tolerant cluster management.
Li, Ming.
Record Type:
Language materials, printed : Monograph/item
Title/Author:
Fault-tolerant cluster management.
Author:
Li, Ming.
Description:
208 p.
Notes:
Adviser: Yuval Tamir.
Contained By:
Dissertation Abstracts International 67-07B.
Subject:
Computer Science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3226042
ISBN:
9780542796708
Thesis (Ph.D.)--University of California, Los Angeles, 2006.
Cost-effective high performance can be achieved using clusters of Commercial Off-The-Shelf (COTS) computers interconnected by high-speed networks. When clusters are used for critical applications and/or in hostile environments, the required system reliability can only be achieved using fault tolerance techniques that allow the system to continue to operate correctly despite component failure. Cluster management middleware (CMM) is a software layer that sits above the operating system controlling individual nodes and below the applications. The CMM schedules tasks on a cluster, controls access to shared resources, provides for task submission and monitoring, and coordinates the cluster's fault tolerance mechanisms. Reliable operation of the cluster requires reliable, continuous operation of the management middleware.
LDR
:03185nam 2200289 a 45
001
974322
005
20110929
008
110929s2006 eng d
020
$a
9780542796708
035
$a
(UMI)AAI3226042
035
$a
AAI3226042
040
$a
UMI
$c
UMI
100
1
$a
Li, Ming.
$3
559294
245
1 0
$a
Fault-tolerant cluster management.
300
$a
208 p.
500
$a
Adviser: Yuval Tamir.
500
$a
Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 3906.
502
$a
Thesis (Ph.D.)--University of California, Los Angeles, 2006.
520
$a
Cost-effective high performance can be achieved using clusters of Commercial Off-The-Shelf (COTS) computers interconnected by high-speed networks. When clusters are used for critical applications and/or in hostile environments, the required system reliability can only be achieved using fault tolerance techniques that allow the system to continue to operate correctly despite component failure. Cluster management middleware (CMM) is a software layer that sits above the operating system controlling individual nodes and below the applications. The CMM schedules tasks on a cluster, controls access to shared resources, provides for task submission and monitoring, and coordinates the cluster's fault tolerance mechanisms. Reliable operation of the cluster requires reliable, continuous operation of the management middleware.
520
$a
This dissertation focuses on the key challenges in building highly reliable CMM. The system is based on centralized decision making. However, unlike most other cluster middleware, the manager is protected by Byzantine fault-tolerant state machine replication and by the ability to restore the management service to full functionality and full fault tolerance following arbitrary single faults. To this end, we use a low-cost fault-tolerant replication mechanism coupled with on-line self-diagnosis and reconfiguration. The robust replicated manager is coupled with less aggressive fault tolerance mechanisms for dealing with less critical system components and with a fault-tolerant system bootstrapping mechanism. A fault-tolerant cluster designed to operate autonomously must include a highly reliable trusted hardcore to control critical functions such as the initiation of a node reset. We describe the functionality required from this trusted hardcore and its interactions with the replicated cluster manager.
520
$a
The result of this work is a carefully balanced, integrated set of efficient, practical techniques for aggressive fault tolerance. These techniques allow a highly reliable system to be built using mostly standard COTS hardware and software components. This is demonstrated in an operational system, called Ghidrah, that has been built at UCLA. This dissertation includes a preliminary performance evaluation of Ghidrah and validation of the fault tolerance mechanisms by fault injection experiments.
590
$a
School code: 0031.
650
4
$a
Computer Science.
$3
626642
690
$a
0984
710
2 0
$a
University of California, Los Angeles.
$3
626622
773
0
$t
Dissertation Abstracts International
$g
67-07B.
790
$a
0031
790
1 0
$a
Tamir, Yuval,
$e
advisor
791
$a
Ph.D.
792
$a
2006
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3226042
Items
Inventory Number: W9132552
Location Name: Electronic resources
Item Class: 11. Online reading_V
Material type: E-book
Call number: EB W9132552
Usage Class: Normal
Loan Status: On shelf
No. of reservations: 0