Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults

Philadelphia, PA (2006)

Abstract
The Solaris 10 operating system includes a number of new features for predictive self-healing. One such feature lets the fault management software diagnose memory errors and drive automatic memory page retirement (MPR), which is intended to reduce the impact on system reliability, availability, and serviceability (RAS) of permanent memory faults that generate either correctable or uncorrectable errors. The MPR technique allows memory pages suffering from correctable errors, and relocatable clean pages suffering from uncorrectable errors, to be removed from use in the virtual memory system without interrupting user applications. It also allows relocatable dirty pages associated with uncorrectable errors to be isolated with limited impact on the affected user processes, avoiding an outage of the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify the reduction that this operating system self-healing technique can achieve in system interruptions, yearly downtime, and number of services introduced by permanent hardware faults, for typical low-end and mid-range server systems. The results show that deploying the MPR capability can significantly improve all three of these system RAS metrics.
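The page-handling policy described in the abstract can be sketched as a small decision function. This is an illustrative sketch only, not Solaris code: the names `ErrorKind`, `Action`, `Page`, and `mpr_action` are hypothetical, and the `NOT_COVERED` outcome is an assumption that marks the cases (such as non-relocatable pages with uncorrectable errors) that the abstract does not describe.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorKind(Enum):
    CORRECTABLE = auto()
    UNCORRECTABLE = auto()


class Action(Enum):
    # Page removed from use without interrupting user applications.
    RETIRE_TRANSPARENTLY = auto()
    # Page isolated; only the affected user processes are impacted.
    ISOLATE_AFFECTED_PROCESSES = auto()
    # Assumption: the abstract does not say what happens in this case.
    NOT_COVERED = auto()


@dataclass(frozen=True)
class Page:
    relocatable: bool  # hypothetical flag: page contents can be moved
    dirty: bool        # hypothetical flag: page modified since last write-back


def mpr_action(page: Page, error: ErrorKind) -> Action:
    """Map a faulty page to the outcome described in the abstract."""
    if error is ErrorKind.CORRECTABLE:
        # Pages with correctable errors are retired transparently.
        return Action.RETIRE_TRANSPARENTLY
    if page.relocatable and not page.dirty:
        # Clean relocatable pages with uncorrectable errors are also
        # retired without interrupting applications.
        return Action.RETIRE_TRANSPARENTLY
    if page.relocatable and page.dirty:
        # Dirty relocatable pages are isolated; the impact is limited to
        # the affected processes rather than the whole system.
        return Action.ISOLATE_AFFECTED_PROCESSES
    return Action.NOT_COVERED


if __name__ == "__main__":
    for page in (Page(True, False), Page(True, True), Page(False, False)):
        for kind in ErrorKind:
            print(page, kind.name, "->", mpr_action(page, kind).name)
```

Keeping the unspecified cases as an explicit `NOT_COVERED` value, rather than guessing an outcome, keeps the sketch within what the abstract states.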
Keywords
fault management software,mid-range server system,memory error,permanent memory fault,hardware fault,memory page retirement,fault tolerant computing,automatic memory page retirement,operating systems (computers),predictive self-healing,operating system,system ras metrics,memory error diagnosis,solaris 10 operating system,program diagnostics,system reliability,system interruption,uncorrectable error,system ras,paged storage,system availability,virtual memory system,system serviceability,hardware faults,entire system,memory management,fault management,reliability,sun,operating systems,error correction,field experiment,hardware,virtual memory,availability