DocumentCode :
2107571
Title :
Temperature based fault forecasting in computer clusters
Author :
Haider, Shahid ; Ansari, N.R.
Author_Institution :
Dept. of Comput., Shaheed Zulfikar Ali Bhutto Inst. of Sci. & Technol., Islamabad, Pakistan
fYear :
2012
fDate :
13-15 Dec. 2012
Firstpage :
69
Lastpage :
77
Abstract :
Clusters and Grids have one thing in common: both are used to achieve High Performance in Computing. The scope of a Cluster is relatively narrow compared to a Grid, as Clusters are homogeneous while Grids are heterogeneous. Another emerging area in High Performance Computing (HPC) is Cloud computing, which can be considered a further extension of Grid computing. Apart from the other issues that exist in Clusters, Grids and Clouds, one problem is common to all of them: Fault Tolerance and Handling. Fault Tolerance is the technique, or set of techniques, used when hardware, software, network and other types of problems arise during the operation and execution of Clusters, Grids and Clouds. In this research we have focused on fault identification and forecasting from the Cluster point of view and have tried to establish a technique that forecasts faults in Cluster-based environments on the basis of temperature. Nodes continuously receive and monitor the temperature of their attached devices from temperature sensors and check it against the temperature threshold values of those devices. If device temperatures are within the threshold range, the machine is placed (rated) in the Green zone. If temperatures are approaching the threshold values, the machine is placed in the Orange zone, which indicates that it may or may not crash on the basis of temperature. When devices have crossed their temperature threshold values, the machine is placed in the Red zone, which indicates that it is likely to fail at any time due to the failure of one or more hardware devices.
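The Green/Orange/Red rating described in the abstract can be sketched as a simple threshold classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how close a reading must be to count as "approaching", so the hypothetical `APPROACH_MARGIN_C` (a fixed margin in degrees Celsius) is assumed, and the rule that a machine takes the zone of its worst device is also an assumption. All names (`Zone`, `DeviceReading`, `device_zone`, `machine_zone`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    GREEN = "green"    # temperature well within the threshold range
    ORANGE = "orange"  # approaching the threshold: may or may not crash
    RED = "red"        # threshold crossed: likely to fail at any time


# ASSUMPTION: a reading within this many degrees Celsius at or below the
# device threshold counts as "approaching". The abstract gives no value.
APPROACH_MARGIN_C = 5.0


@dataclass
class DeviceReading:
    name: str           # hypothetical device label, e.g. "cpu0" or "disk0"
    temp_c: float       # latest sensor reading in degrees Celsius
    threshold_c: float  # device-specific temperature threshold


def device_zone(r: DeviceReading, margin: float = APPROACH_MARGIN_C) -> Zone:
    """Classify one device reading against its own threshold."""
    if r.temp_c > r.threshold_c:
        return Zone.RED      # threshold crossed
    if r.temp_c >= r.threshold_c - margin:
        return Zone.ORANGE   # within `margin` of the threshold (inclusive)
    return Zone.GREEN


# Severity order used to pick the worst zone among a machine's devices.
_SEVERITY = {Zone.GREEN: 0, Zone.ORANGE: 1, Zone.RED: 2}


def machine_zone(readings: list[DeviceReading],
                 margin: float = APPROACH_MARGIN_C) -> Zone:
    """Rate a machine by its worst device (ASSUMED aggregation rule)."""
    if not readings:
        raise ValueError("no sensor readings for this machine")
    zones = (device_zone(r, margin) for r in readings)
    return max(zones, key=_SEVERITY.__getitem__)


if __name__ == "__main__":
    node = [
        DeviceReading("cpu0", 62.0, 85.0),  # 62 < 80 -> GREEN
        DeviceReading("disk0", 48.0, 50.0), # 48 >= 45 -> ORANGE
    ]
    print(machine_zone(node))  # Zone.ORANGE
```

Rating a machine by its worst device mirrors the abstract's statement that a machine is likely to fail if one or more of its hardware devices fails; a real deployment would also need per-device thresholds taken from vendor specifications rather than the illustrative values above.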
Keywords :
fault tolerant computing; grid computing; temperature; temperature sensors; cloud computing; cluster computing; computer cluster; fault handling; fault identification; fault tolerance; grid computing; high performance computing; temperature based fault forecasting; temperature sensor; temperature threshold value; Cluster; Distributed Systems; Fault Forecasting; Fault Tolerance; Grid;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multitopic Conference (INMIC), 2012 15th International
Conference_Location :
Islamabad
Print_ISBN :
978-1-4673-2249-2
Type :
conf
DOI :
10.1109/INMIC.2012.6511446
Filename :
6511446