77TB of research data lost due to HPE software update

Kyoto University lost 77TB of critical research data from its supercomputer after Hewlett Packard Enterprise (HPE) released a software update that caused a script to malfunction and delete backup data. As a result, days of work are gone, and a significant part of the erased data is lost forever.

Kyoto University lost about 34 million files from 14 research groups, generated between December 14 and 16, according to The Stack. GizChina reported that the university could not restore the data of four groups from backup, so that data is permanently lost. Kyoto specialists initially believed the university had lost up to 100TB, but the damage turned out to be limited to 77TB.

HPE pushed an update that broke a script meant to delete log files older than ten days. Instead of deleting only the old log files stored with the backups on the high-capacity storage system, the script erased all files from the backup, wiping out 77TB of critical research data.
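The actual script has not been published, so the following is only a minimal Python sketch of the kind of cleanup job described above, written defensively. The function name `purge_old_logs`, the `*.log` pattern, and the dry-run default are assumptions made for illustration, not details of HPE's program.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=10, dry_run=True, now=None):
    """Delete *.log files directly inside log_dir that are older than max_age_days.

    Returns the paths that were deleted (or, in dry-run mode, would be deleted).
    Hypothetical helper for illustration only.
    """
    # Refuse an empty or missing path rather than silently falling back
    # to the current directory or some other default location.
    if not log_dir:
        raise ValueError("log_dir must be a non-empty path")
    root = Path(log_dir).resolve()
    if not root.is_dir():
        raise ValueError(f"not a directory: {root}")
    # Never operate on the top of a filesystem.
    if root == Path(root.anchor):
        raise ValueError("refusing to purge a filesystem root")

    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age threshold in seconds

    selected = []
    # Non-recursive and pattern-restricted: only log files in this one directory.
    for path in sorted(root.glob("*.log")):
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            selected.append(path)
            if not dry_run:
                path.unlink()
    return selected
```

Running it first with `dry_run=True` lists what would be removed without touching anything. The point of the sketch is that the deletion is limited by an explicit directory, a file pattern, and an age check, so a missing or wrong path raises an error instead of widening the scope of the delete.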

HPE admitted that its software update caused the problem and took full responsibility for it.

“From 5:32 p.m. on December 14, 2021 to 12:43 p.m. on December 16, 2021, due to a fault in the program that backs up the storage of the supercomputer system (manufactured by Japan Hewlett Packard), the supercomputer system [malfunctioned],” reads a statement from HPE translated by Google. “As a result, an accident occurred in which some data from the high capacity storage (/LARGE0) was inadvertently deleted. […] The backup log of the past which was originally useless due to a problem of reckless modification of the program and its application procedure in the repair function of the backup program by Japan Hewlett Packard, the supplier of the supercomputer system. The process of deleting files malfunctioned as the process of deleting files under directory /LARGE0.”

The team has suspended the backup process on the supercomputer. Kyoto University plans to resume backups by the end of January, after resolving the software and script issue and taking steps to prevent a recurrence.