Recovering from filesystem corruption in btrfs
When the physical disk has not failed but something in the btrfs log tree (journal) or checksum tree is corrupted or inconsistent, and the filesystem refuses to mount, the following procedure is recommended:
First, make a backup of the volume.
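A minimal sketch of such a backup, assuming GNU dd is available and that the image is written to a different, healthy disk. The function name image_device and the paths in the usage comment are placeholders, not part of any btrfs tool:

```shell
#!/bin/sh
# image_device SOURCE IMAGE
# Copy a block device (or a file) byte for byte into an image file.
# conv=noerror,sync keeps going after a read error and pads the failed
# block with zeros, so offsets inside the image stay aligned. A final
# partial block is padded to the 1 MiB block size in the same way.
image_device() {
    dd if="$1" of="$2" bs=1M conv=noerror,sync status=none
}

# Hypothetical usage (as root, with the filesystem unmounted):
#   image_device /dev/sdc1 /mnt/backup/sdc1.img
```

On a filesystem that spans several devices, each member device would need its own image.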
After that, try to mount the volume in read-only recovery mode:
$ mount -t btrfs -o ro,usebackuproot /dev/sda2 /home
If that fails, check syslog (or run dmesg) for btrfs errors:
[ 74.926506] Btrfs loaded
[ 74.927393] BTRFS: device fsid 4e90ec15-e6f5-470d-96be-677f654a5c79 devid 2 transid 691061 /dev/sdc1
[ 77.439765] BTRFS info (device sdc1): disk space caching is enabled
[ 77.440620] BTRFS: failed to read the system array on sdc1
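If there are messages relating to the log tree (not in the example above), then reset the log tree on the affected device. In current btrfs-progs the command is btrfs rescue zero-log; older releases shipped it as the standalone tool btrfs-zero-log:
$ btrfs rescue zero-log /dev/sdc1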
If syslog shows problems regarding the chunk tree, then btrfs rescue chunk-recover may be of use: it replaces the chunk blocks with new ones that should work (but may lose some data).
Each disk has multiple copies of the superblock, and they are very unlikely to all get corrupted at the same time. If it does happen, they can be recovered with the rescue command:
$ btrfs rescue super-recover -v /dev/sde1
All Devices:
Device: id = 3, name = /dev/sdc1
Device: id = 1, name = /dev/sda2
Device: id = 2, name = /dev/sde1
Before Recovering:
[All good supers]:
device name = /dev/sdc1
superblock bytenr = 65536
device name = /dev/sdc1
superblock bytenr = 67108864
device name = /dev/sdc1
superblock bytenr = 274877906944
device name = /dev/sda2
superblock bytenr = 65536
device name = /dev/sda2
superblock bytenr = 67108864
device name = /dev/sda2
superblock bytenr = 274877906944
[All bad supers]:
device name = /dev/sde1
superblock bytenr = 65536
device name = /dev/sde1
superblock bytenr = 67108864
device name = /dev/sde1
superblock bytenr = 274877906944
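The choice between the rescue commands described above can be sketched as a small filter over the kernel log. This is only an illustration: the function name suggest_rescue is made up, the keyword patterns are assumptions rather than a complete list of kernel messages, and btrfs rescue zero-log is the current btrfs-progs spelling of btrfs-zero-log. The function only prints suggestions and never runs them:

```shell
#!/bin/sh
# suggest_rescue DEVICE
# Read kernel log lines on stdin and print the rescue command that the
# text associates with each kind of complaint. Nothing is executed.
# The keyword patterns below are assumptions; real messages vary.
suggest_rescue() {
    dev=$1
    log=$(grep -i 'btrfs' || true)   # keep only btrfs-related lines
    if printf '%s\n' "$log" | grep -qi 'log tree'; then
        echo "btrfs rescue zero-log $dev"
    fi
    if printf '%s\n' "$log" | grep -qi 'chunk tree'; then
        echo "btrfs rescue chunk-recover $dev"
    fi
    if printf '%s\n' "$log" | grep -qiE 'super ?block'; then
        echo "btrfs rescue super-recover -v $dev"
    fi
}

# Hypothetical usage:
#   dmesg | suggest_rescue /dev/sdc1
```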
After those, try btrfsck (called btrfs check in current btrfs-progs), possibly with the options -s1 or -s2, which make it read a backup copy of the superblock. If the volume is still not mountable, then try btrfsck --repair.
The command btrfsck --repair --init-extent-tree may be necessary if the extent tree is corrupted. If there is corruption in the checksums, try --init-csum-tree.
Treat every --repair run as a last resort: it is not recommended, because it might write changes to the disk that destroy data.
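The escalation described above can be written down as an ordered list, so that the destructive steps are never reached by accident. The sketch below only prints the commands; the function name check_plan is made up, btrfs check is used as the current name of btrfsck, and the read-only runs against superblock copies 1 and 2 assume the usual numbering of copies from 0 to 2:

```shell
#!/bin/sh
# check_plan DEVICE
# Print the check commands from the text in escalating order of risk.
# Nothing is executed; run the lines by hand once a backup exists.
check_plan() {
    dev=$1
    echo "btrfs check --readonly $dev"
    for copy in 1 2; do
        # Same read-only check, but starting from a backup superblock.
        echo "btrfs check --readonly -s $copy $dev"
    done
    # Everything below writes to the disk and may destroy data.
    echo "btrfs check --repair $dev"
    echo "btrfs check --repair --init-extent-tree $dev"
    echo "btrfs check --repair --init-csum-tree $dev"
}

# Hypothetical usage:
#   check_plan /dev/sdc1
```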
Generic tools might also be useful. For example, the tool testdisk is able to scan disks and find lost partition tables, including ones with btrfs partitions.
Restoring files from a broken btrfs filesystem
If it is simply impossible to mount a btrfs filesystem, the command btrfs restore can be used to fetch files from within the damaged btrfs partition. By default the command gets all files from the root volume.
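A minimal sketch of the default case, assuming the target directory already exists on a different, healthy filesystem. The function names and the paths in the usage comment are placeholders; -D is the dry-run flag of btrfs restore and -v makes it list each file:

```shell
#!/bin/sh
# restore_preview DEVICE TARGET_DIR
# Dry run: only list the files that would be recovered.
restore_preview() {
    btrfs restore -D -v "$1" "$2"
}

# restore_all DEVICE TARGET_DIR
# Copy every recoverable file from the unmounted device into TARGET_DIR.
restore_all() {
    btrfs restore -v "$1" "$2"
}

# Hypothetical usage:
#   restore_preview /dev/sdc1 /mnt/rescue
#   restore_all     /dev/sdc1 /mnt/rescue
```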
Sometimes a plain restore isn’t enough. For example, in Ubuntu the /home directory is by default a separate btrfs subvolume. To fetch files from there, the correct subvolume root must be given with the -r option. You might also not want to restore every possible file but only one particular directory; for that, a path filter can be defined with the --path-regex option.
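When the object id of the wanted subvolume is not known, btrfs restore can list the tree roots it finds on the device with its -l option. A small sketch, with a made-up function name; the ids it prints are only candidates for -r and still have to be matched to a subvolume by hand:

```shell
#!/bin/sh
# list_roots DEVICE
# Print the tree roots found on the unmounted device.
list_roots() {
    btrfs restore -l "$1"
}

# Hypothetical usage:
#   list_roots /dev/sdc1
```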
To fetch all files from /home/otto/Kuvat on a system where the @home subvolume object id is 258, the task can be accomplished with the command:
$ btrfs restore -i -vvvv -r 258 --path-regex "^/(otto(|/Kuvat(|/.*)))$" /dev/sdc1 .
...
Restoring ./otto/Kuvat/2015/08/09/IMG_2888.JPG
Restoring ./otto/Kuvat/2015/08/09/IMG_2889.JPG
Restoring ./otto/Kuvat/2015/08/09/IMG_2890.JPG
Restoring ./otto/Kuvat/2015/08/09/IMG_2891.JPG
Restoring ./otto/Kuvat/2015/08/09/IMG_2892.JPG
Restoring ./otto/Kuvat/2015/08/09/IMG_2893.JPG
Found objectid=18094703, key=18094702
Done searching /otto/Kuvat/2015/08/09
Found objectid=18094632, key=18094631
Done searching /otto/Kuvat/2015/08
Found objectid=8304076, key=8304075
Done searching /otto/Kuvat/2015
Found objectid=272, key=271
Done searching /otto/Kuvat
Found objectid=258, key=257
Done searching /otto
Found objectid=257, key=256
Done searching
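The regular expression in the command above has to match not only the wanted directory and its contents but also every parent directory on the way down, which is why it is nested. A sketch of a helper that builds such a pattern from a path relative to the subvolume root; the function name build_path_regex is made up, and directory names containing regex metacharacters are not escaped:

```shell
#!/bin/sh
# build_path_regex PATH
# Print a pattern for --path-regex that matches PATH, each of its
# parent directories, and everything below PATH.
build_path_regex() {
    rest=${1#/}                 # drop the leading slash
    rest=${rest%/}              # drop a trailing slash, if any
    body=${rest%%/*}            # first path component
    closers=""
    while [ "$rest" != "${rest#*/}" ]; do   # while a slash remains
        rest=${rest#*/}                     # step into the next level
        body="$body(|/${rest%%/*}"          # "nothing more, or this child"
        closers="$closers)"
    done
    printf '^/(%s(|/.*)%s)$\n' "$body" "$closers"
}

build_path_regex /otto/Kuvat
# prints: ^/(otto(|/Kuvat(|/.*)))$
```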
Detecting data corruption
If btrfs detects errors, they are logged to syslog. Btrfs also maintains per-device error counters, which on healthy drives should all be zero:
$ btrfs device stats /mnt
[/dev/sda2].write_io_errs 0
[/dev/sda2].read_io_errs 0
[/dev/sda2].flush_io_errs 0
[/dev/sda2].corruption_errs 0
[/dev/sda2].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
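Because a healthy filesystem reports only zeros, the check is easy to automate. A sketch, assuming the two-column output format shown above; the function name nonzero_stats is made up:

```shell
#!/bin/sh
# nonzero_stats
# Read "btrfs device stats" output on stdin, print every counter that
# is not zero, and return 1 if at least one such counter was found.
nonzero_stats() {
    awk '$2 != 0 { print; found = 1 } END { exit found ? 1 : 0 }'
}

# Hypothetical usage:
#   btrfs device stats /mnt | nonzero_stats || echo "errors recorded"
```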
Btrfs automatically calculates CRC-32C checksums (the default algorithm) for both data and metadata blocks and verifies them every time a block is read. If data corruption is detected, btrfs logs an error to syslog. If RAID 1 is enabled, btrfs also automatically repairs the corrupted block by overwriting it with the correct duplicate. The same verification can be run over the whole filesystem on demand with btrfs scrub.
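A sketch of a manual scrub run on a mounted filesystem, followed by a look at the error counters. The function name and the mount point in the usage comment are placeholders; -B keeps btrfs scrub in the foreground until it has finished:

```shell
#!/bin/sh
# scrub_and_report MOUNTPOINT
# Verify all checksums on the mounted filesystem, then print the
# per-device error counters so that new errors become visible.
scrub_and_report() {
    btrfs scrub start -B "$1"
    btrfs device stats "$1"
}

# Hypothetical usage (as root):
#   scrub_and_report /mnt
```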
Checking disk health
Disks that support the SMART standard are able to report their health status. The GNOME tool Disks provides a very easy way to access SMART data: open Disks, select a device and choose ‘Show SMART Data & Self-Tests’.
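On a machine without a desktop, the same data can be read on the command line. The sketch below assumes the smartmontools package, which the text above does not mention, is installed; the function names and the device in the usage comment are placeholders:

```shell
#!/bin/sh
# smart_health DEVICE
# Print the drive's overall SMART health assessment.
smart_health() {
    smartctl -H "$1"
}

# smart_report DEVICE
# Print all SMART information, including attributes and self-test logs.
smart_report() {
    smartctl -a "$1"
}

# Hypothetical usage (as root):
#   smart_health /dev/sda
```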