Repairing Corrupt File Systems or File System Log Devices

Technote (FAQ)


Question

Repairing Corrupt File Systems or File System Log Devices

Answer

This document covers the use of the fsck and logform commands to repair inconsistencies or corruption in file systems and associated log devices. This document applies to AIX Versions 4 and 5.
Fixes for the fsck and logform commands


The fsck command is used to detect and repair inconsistencies in the structure of the journaled file system (JFS) and the enhanced journaled file system (JFS2). It does not repair corrupted file contents; it repairs only the file system structure itself. The basic syntax is:

     fsck [options] /dev/<LVname>
OR
     fsck [options] /<fsname>

A file system MUST be unmounted before fsck can repair it. The fsck command requires the file system to stay in a consistent state while it performs its checks. If fsck is run while the file system is mounted, it runs in read-only mode and does not correct any inconsistencies it finds. Users writing to the mounted file system during the check can also cause fsck to report corruption that does not exist.
For this reason, errors reported by fsck while the file system is mounted may not be meaningful.
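Because fsck only repairs an unmounted file system, it can help to confirm the mount state before starting. The following is a minimal sketch, not an IBM-supplied tool: the function name is_mounted is hypothetical, and it assumes the AIX mount output layout in which local mounts leave the node column blank, so the device is the first field and the mount point the second. It reads the mount listing on standard input so it can be tried on saved output.

```shell
# Hypothetical helper: succeed (exit 0) if the device or mount point given as $1
# appears in `mount` output read from stdin.
# Assumes the AIX layout for local mounts: field 1 = device, field 2 = mount point.
is_mounted() {
    awk -v d="$1" '$1 == d || $2 == d { f = 1 } END { exit (f ? 0 : 1) }'
}
```

Usage:
     mount | is_mounted /dev/lv00 && echo "/dev/lv00 is mounted; unmount it before running fsck"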
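Commonly used fsck options:

-f    performs a fast check; file systems that were unmounted cleanly are skipped
-p    fixes minor problems automatically, without prompting the user
-y    answers yes to every question, giving fsck permission to correct every problem found
-n    answers no to every question, so problems are reported but not corrected

If no file system is named, fsck checks the file systems that have check = true in their /etc/filesystems stanza.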

See the man page for fsck for more information.
The -y flag saves time, but use it with care. For example, if fsck cannot read a block because a disk is missing, it asks whether to clear that block. If the disk is missing only because of an adapter failure, answering yes (or running with -y, which answers yes to everything) can discard data that would otherwise have been recoverable.

Examples of fsck use:
     fsck /dev/lv00
     fsck -y /data
     fsck -p /dev/lv00
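Given the caution about -y, one cautious pattern is to run fsck -n first, read the report, and only then run fsck interactively. The sketch below is hypothetical (the function name fsck_preview and the default report path are illustrative), and it assumes that a nonzero exit status from the -n pass means fsck found something worth reviewing; see the fsck man page for what each exit value means.

```shell
# Hypothetical helper: run fsck -n (report only, answers "no" to every question)
# and save its output so problems can be reviewed before anything is changed.
# $1 = device or mount point; $2 = report file (default path is illustrative).
fsck_preview() {
    target=$1
    report=${2:-/tmp/fsck_preview.log}
    fsck -n "$target" > "$report" 2>&1
    rc=$?
    echo "fsck -n $target exited with status $rc; report in $report"
    return "$rc"
}
```

Usage, running the interactive repair only when the preview reports a problem:
     fsck_preview /dev/lv00 || fsck /dev/lv00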

The logform command formats a logical volume for use as a log device. The log device records transactions describing changes to file system metadata, so that after a crash the log can be replayed to bring the file system back to a consistent state. The logform command is destructive: it wipes out all data in the logical volume, so running it by mistake against a file system logical volume destroys all data in that file system. Run logform only on CLOSED logical volumes. If the log device is open because a mounted file system is using it, unmount that file system before running logform against the log device.
Run the following and confirm that the LV STATE column for the log device shows closed (for example, closed/syncd):
     lsvg -l <VGname>
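The check above can be scripted as a guard in front of logform. This is a minimal sketch with a hypothetical function name, lv_is_closed; it assumes the usual lsvg -l column layout in which LV STATE is the sixth whitespace-separated field (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT), and it reads the listing on standard input.

```shell
# Hypothetical helper: succeed only if the logical volume named in $1 is listed
# in `lsvg -l` output (read from stdin) with an LV STATE starting with "closed".
# Assumes LV STATE is the 6th whitespace-separated field of each LV line.
lv_is_closed() {
    awk -v lv="$1" '
        $1 == lv { found = 1; state = $6 }
        END { exit ((found && (state ~ /^closed/)) ? 0 : 1) }'
}
```

Usage (datavg is a hypothetical volume group name):
     lsvg -l datavg | lv_is_closed loglv00 && logform /dev/loglv00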

Messages such as the following may indicate a corrupt log device:
     failure replaying log
     media is not formatted or format is not correct

Examples of logform use:
     logform /dev/loglv00
     logform: destroy /dev/loglv00 (y)?y


If you receive one of the following errors from the fsck or mount commands, the problem may be a corrupted superblock.
     fsck: Not an AIX4 file system
     fsck: Not an AIXV4 file system
     fsck: Not a recognized file system type
     0506-342 The superblock is dirty.  Run a full fsck to fix.
     mount: invalid argument
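When many file systems are checked at once, a simple text match on the messages listed above can flag which ones deserve a closer look at the superblock. This is a hypothetical triage sketch (the function name suspect_superblock is illustrative); it matches only the message texts quoted in this document and does not prove that the superblock is damaged.

```shell
# Hypothetical triage helper: succeed if fsck/mount output read from stdin
# contains one of the messages listed above that may indicate a corrupted
# primary superblock. A match is only a hint, not a diagnosis.
suspect_superblock() {
    grep -q \
        -e 'Not an AIX4 file system' \
        -e 'Not an AIXV4 file system' \
        -e 'Not a recognized file system type' \
        -e '0506-342' \
        -e 'mount: invalid argument'
}
```

Usage:
     fsck -n /dev/lv00 2>&1 | suspect_superblock && echo "/dev/lv00: check the superblock"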

With the file system unmounted, copy the backup superblock over the primary superblock using the command that matches the file system type (replace lv00 with the name of the logical volume):

     dd count=1 bs=4k skip=31 seek=1 if=/dev/lv00 of=/dev/lv00     (JFS)
     dd count=1 bs=4k skip=15 seek=8 if=/dev/lv00 of=/dev/lv00     (JFS2, AIX Version 5 only)

Alternatively, fsck -p works for both JFS and JFS2:

     fsck -p /dev/lv00

After the copy completes, check the integrity of the file system:
     fsck /dev/lv00
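With bs=4k, skip and seek count 4 KB blocks, so the JFS command copies the block at byte offset 31 × 4096 = 126976 over the one at 4096, and the JFS2 command copies offset 15 × 4096 = 61440 over offset 8 × 4096 = 32768. The sketch below rehearses the same copy on a scratch file instead of a real logical volume, which can be a safe way to confirm the skip/seek values before touching a device. The function name, image size, and marker text are hypothetical; only the block numbers come from the commands above. Note conv=notrunc: it is needed here only because the target is a regular file, which dd may otherwise truncate.

```shell
# Hypothetical rehearsal of the superblock dd copy on a scratch image file.
# $1 = skip (backup superblock block), $2 = seek (primary superblock block),
# both counted in 4 KB blocks as in the bs=4k commands above.
rehearse_sb_copy() {
    skip=$1 seek=$2
    img=$(mktemp) || return 1
    # Scratch image: 32 zero-filled 4 KB blocks (blocks 0..31).
    dd if=/dev/zero of="$img" bs=4096 count=32 2>/dev/null
    # Plant a marker where the backup superblock would live.
    printf 'BACKUP-SB' | dd of="$img" bs=4096 seek="$skip" conv=notrunc 2>/dev/null
    # The copy itself; conv=notrunc keeps dd from truncating the regular file.
    dd count=1 bs=4096 skip="$skip" seek="$seek" if="$img" of="$img" conv=notrunc 2>/dev/null
    # Print the first 9 bytes of the primary slot; the marker should be there.
    dd if="$img" bs=1 skip=$((seek * 4096)) count=9 2>/dev/null
    echo
    rm -f "$img"
}
```

Usage:
     rehearse_sb_copy 31 1     (JFS offsets)
     rehearse_sb_copy 15 8     (JFS2 offsets)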
In many cases, copying the backup superblock to the primary superblock will recover the file system. If this does not work, you will have to recreate the file system and restore the data from a backup.

Under normal circumstances it is not possible to unmount /, /usr, /tmp, and /var, and therefore /dev/hd8 (the primary rootvg log device) cannot be closed so that these file systems can be checked or fixed. This can only be done in maintenance mode.
1. Boot the machine into maintenance mode, access the rootvg volume group, and start a shell before mounting the file systems. If you need assistance with this, contact your AIX support center.
2. Run the following commands to check the primary file systems with fsck:

     fsck /dev/hd4
     fsck /dev/hd2
     fsck /dev/hd3
     fsck /dev/hd9var
     fsck /dev/hd1

   Other fsck options as outlined previously can also be used, where appropriate.
3. Format the default jfslog for the rootvg JFS file systems with:
     /usr/sbin/logform /dev/hd8
   Answer y when asked whether you want to destroy the log.
4. Type exit to leave the shell. The primary file systems mount automatically.
5. Shut down and reboot, with the key switch (on systems that have one) in the Normal position:
     sync;sync;sync;shutdown -Fr
You can also run fsck on any user-created file systems in rootvg, if needed. This can typically be done in normal mode.
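Step 2 of the maintenance-mode procedure can be run as a loop from the maintenance shell. The following is a minimal sketch with a hypothetical function name, fsck_rootvg; it runs fsck without flags on the same five logical volumes listed in step 2 (so fsck still prompts for each repair) and collects any device for which fsck returned a nonzero exit status, so those can be reviewed before formatting /dev/hd8 in step 3.

```shell
# Hypothetical helper for step 2: fsck each primary rootvg file system in turn
# and list the devices for which fsck returned a nonzero exit status.
fsck_rootvg() {
    failed=""
    for lv in hd4 hd2 hd3 hd9var hd1; do
        echo "=== fsck /dev/$lv"
        fsck "/dev/$lv" || failed="$failed /dev/$lv"
    done
    if [ -n "$failed" ]; then
        echo "fsck reported problems on:$failed"
        return 1
    fi
    echo "all primary file systems passed"
}
```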