Testing and Mitigation of Soft Errors
Dr. Charles H. Recchia
Chair
IEEE Reliability Society Boston Chapter
Abstract
This seminar covers testing and design mitigation of soft errors in high-performance computation. It draws on experience from product development of large-scale systems and on industrial conference publications. Architectural features for soft error resilience will be addressed together with accelerated testing for faults and their associated errors during verification of single event upset (SEU) tolerance features. The presentation will also examine how workload-dependent cache-line residency affects architectural vulnerability, along with single-, double- and triple-bit error detection and correction strategies. It will also consider the limitations of checkpointing for large-system resilience and their implications for scaling. The presentation will also cover aspects of the JEDEC JESD89A standard that pertain to accelerated testing for soft errors in semiconductor devices. System-size scaling implications and the associated mitigation strategies will also be discussed. The talk also includes a survey of key papers from the IEEE SELSE Workshop and of neutron-beam testing at facilities such as LANSCE at Los Alamos. Finally, the talk reviews software reliability presentations given at the IEEE Reliability Society Boston Chapter monthly meetings held at MIT Lincoln Laboratory.
Biography
Charles H. Recchia has held technology development, reliability engineering and management positions at Intel Corporation, MKS Instruments, Saint-Gobain and Raytheon Integrated Defense Systems. He currently works at M/A-COM Technology Solutions. He earned a Ph.D. in Experimental Solid State Physics from Ohio State University and an MBA from Babson College, and has held visiting academic appointments at Wittenberg University and Worcester Polytechnic Institute. He is an author on three semiconductor technology patents and more than 20 peer-reviewed publications. He has served on technical program committees for the IEEE IRPS conference and the SELSE workshops. He has also conducted ASQ Reliability Division (RD) webinars on Bayesian methods in reliability engineering. A Senior Member of the IEEE with multiple years of advisory committee service, Dr. Recchia currently serves as Chair of the IEEE Reliability Society Boston Chapter and as a member of the IEEE Reliability Society AdCom.