Repairing Corrupt File Systems or File System Log Devices
Technote (FAQ)
Question
Repairing Corrupt File Systems or File System Log Devices
Answer
This document covers the use of the fsck and logform commands to repair
inconsistencies or corruption in file systems and associated log devices. This
document applies to AIX Versions 4 and 5.
Fixes for the fsck and logform commands
The fsck command is used to detect and repair
inconsistencies in the journaled file system (JFS) and the enhanced journaled file
system (JFS2) structure. It is not intended to correct problems with corrupt
data, but only the file system structure itself. The basic syntax is:
fsck [options] /dev/<LVname>
OR
fsck [options] /<fsname>
Before running fsck to repair a file system, the file system MUST be unmounted;
otherwise no changes are made. The fsck command requires the file system to be
in a consistent state while performing its checks. If fsck is run while the
file system is mounted, it runs in read-only mode and does not correct any
inconsistencies it finds. Writes by users to the mounted file system during the
check may cause fsck to report corruption that does not exist. For this reason,
any errors reported by fsck while the file system is mounted may not be
relevant.
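Because fsck only repairs an unmounted file system, it can help to confirm the
mount state before deciding how to invoke it. The following is a minimal sketch,
not an AIX-supplied tool: the function names is_mounted and fsck_mode are
hypothetical, and the input is assumed to be the text printed by the mount
command, supplied on standard input.

```shell
# is_mounted TARGET
# Reads a mount listing on standard input and succeeds if any
# whitespace-separated field equals TARGET (a device or a mount point).
is_mounted() {
    awk -v target="$1" '
        { for (i = 1; i <= NF; i++) if ($i == target) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# fsck_mode TARGET
# Prints "read-only" if TARGET appears in the listing (fsck would only report),
# or "repair" if it does not (fsck would be able to make changes).
fsck_mode() {
    if is_mounted "$1"; then
        echo "read-only"
    else
        echo "repair"
    fi
}
```

For example, mount | fsck_mode /data prints read-only when /data appears in the
listing, which is a cue to unmount the file system before attempting a repair.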
Various options for fsck exist:
-f performs a fast check; file systems with check = true entries in /etc/filesystems are checked
-p fixes minor problems without interaction from user
-y gives permission to correct every problem found
-n indicates not to correct any problems
See the man page for fsck for more information.
While the -y flag is certainly time-saving, be careful with this option. If
fsck cannot read a block, for example because a disk is missing, it will ask to
clear the block. If the disk is missing only because of an adapter failure, you
may be removing recoverable data by responding yes or by giving the fsck
command explicit permission to fix everything it finds.
Examples of fsck use:
fsck /dev/lv00
fsck -y /data
fsck -p /dev/lv00
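Given the caution about -y above, one conservative way to combine these flags is
to look before repairing. The sketch below only prints a suggested sequence and
runs nothing. The ordering is an assumption drawn from the flag descriptions,
not a procedure prescribed by this document, and fsck_plan is a hypothetical
helper name.

```shell
# fsck_plan TARGET
# Prints (does not run) a cautious sequence of fsck invocations.
# Assumption: the file system named by TARGET has already been unmounted.
fsck_plan() {
    target=$1
    echo "fsck -n $target"   # report problems, correct nothing
    echo "fsck -p $target"   # fix minor problems without prompting
    echo "fsck $target"      # interactive: answer each prompt individually
}
```

Running fsck_plan /dev/lv00 lists the three commands for review. The -y flag is
left out deliberately so that each clear-block prompt is answered by a person.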
The logform command formats a logical volume for use
as a log device, which stores transactional information about file system
metadata changes and can be used to roll back incomplete operations if the
machine crashes. The logform command
is destructive; it wipes out all data in the logical volume. Accidentally
running this on a file system completely destroys all file system data.
The logform command should only be run on CLOSED
logical volumes. If a log device is open due to its use by a mounted file
system, the file system should be unmounted prior to running logform against the log device.
Run the following to ensure that the log device is closed:
lsvg -l <VGname>
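The state of each logical volume can be picked out of the lsvg -l listing with a
short filter. This is a sketch under an assumption about the listing layout:
that the columns are LV NAME, TYPE, LPs, PPs, PVs, LV STATE, and MOUNT POINT,
with states written like closed/syncd or open/syncd. The helper names lv_state
and lv_is_closed are hypothetical.

```shell
# lv_state LVNAME
# Reads `lsvg -l <VGname>` output on standard input and prints the LV STATE
# column (assumed to be the sixth field) for the named logical volume.
lv_state() {
    awk -v lv="$1" '$1 == lv { print $6 }'
}

# lv_is_closed LVNAME
# Succeeds only if the state reported for LVNAME starts with "closed/".
lv_is_closed() {
    case "$(lv_state "$1")" in
        closed/*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

A typical use would be lsvg -l <VGname> | lv_is_closed loglv00, proceeding to
logform only when the check succeeds.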
Here are some examples of messages you might receive that would indicate a
corrupt log device:
failure replaying log
media is not formatted or format is not correct
Examples of logform use:
logform /dev/loglv00
logform: destroy /dev/loglv00 (y)?y
If you receive one of the following errors from the fsck or mount commands, the
problem may be a corrupted superblock.
fsck: Not an AIX4 file system
fsck: Not an AIXV4 file system
fsck: Not a recognized file system type
0506-342 The superblock is dirty. Run a full fsck to fix.
mount: invalid argument
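When triaging saved command output, the messages quoted in this document can be
sorted into the two problem areas it describes. The following is only a sketch:
classify_fs_error is a hypothetical helper, the patterns are the message texts
listed in this document, and a match is a hint about where to look, not a
diagnosis.

```shell
# classify_fs_error MESSAGE
# Prints "log-device", "superblock", or "unknown" depending on which of the
# messages quoted in this document appears in MESSAGE.
classify_fs_error() {
    case "$1" in
        *"failure replaying log"*|\
        *"media is not formatted or format is not correct"*)
            echo "log-device" ;;
        *"Not an AIX4 file system"*|\
        *"Not an AIXV4 file system"*|\
        *"Not a recognized file system type"*|\
        *"The superblock is dirty"*|\
        *"mount: invalid argument"*)
            echo "superblock" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, classify_fs_error "fsck: Not an AIXV4 file system" prints
superblock.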
The backup superblock can be copied over the primary superblock via one of these
commands:
dd count=1 bs=4k skip=31 seek=1 if=/dev/lv00 of=/dev/lv00 (JFS)
dd count=1 bs=4k skip=15 seek=8 if=/dev/lv00 of=/dev/lv00 (JFS2) (Version 5 only)
fsck -p /dev/lv00 (works for both JFS and JFS2)
Once the copy is complete, check the integrity of the file system by issuing:
fsck /dev/lv00
In many cases, copying the backup superblock to the primary superblock will
recover the file system. If this does not work, you will have to recreate the
file system and restore the data from a backup.
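The dd arguments are easier to trust once the offsets are spelled out. With
bs=4k, skip=31 reads the 4 KB block at byte offset 31 x 4096 = 126976, and seek=1
writes it at byte offset 4096. For the JFS2 form, skip=15 and seek=8 correspond
to byte offsets 61440 and 32768. The sketch below rehearses the JFS form on a
scratch file instead of a logical volume. The scratch file, the BACKUP-SB marker
text, and conv=notrunc are assumptions added for the rehearsal; conv=notrunc is
included because a regular file opened for output would otherwise be truncated
before it could be read.

```shell
#!/bin/sh
# Rehearse the JFS superblock copy on a scratch file, never on a real device.
img=$(mktemp)

# Create 32 zero-filled 4 KB blocks (blocks 0 through 31).
dd if=/dev/zero of="$img" bs=4k count=32 2>/dev/null

# Plant a recognizable marker at the start of block 31 (stand-in for the backup).
printf 'BACKUP-SB' | dd of="$img" bs=4k seek=31 conv=notrunc 2>/dev/null

# Same geometry as the JFS command in this document: copy block 31 onto block 1.
dd count=1 bs=4k skip=31 seek=1 if="$img" of="$img" conv=notrunc 2>/dev/null

# Read the first 9 bytes of block 1 (byte offset 4096) to confirm the copy.
copied=$(dd if="$img" bs=1 skip=4096 count=9 2>/dev/null)
echo "block 1 now starts with: $copied"

rm -f "$img"
```

The rehearsal only demonstrates which blocks the skip and seek values address; it
says nothing about whether a given backup superblock is itself intact.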
Under normal circumstances it is not possible to unmount /, /usr, /tmp, and
/var, and therefore not possible to close /dev/hd8 (the primary rootvg log
device), so that they can be checked or fixed. This can only be done in
maintenance mode.
1. Boot the machine into maintenance mode, access the rootvg volume group, and
start a shell prior to mounting the file systems. If you need assistance with
this, contact your AIX support center.
2. Run the following commands to fsck the primary file systems:
fsck /dev/hd4
fsck /dev/hd2
fsck /dev/hd3
fsck /dev/hd9var
fsck /dev/hd1
Other fsck options as outlined previously can also be
used, where appropriate.
3. Format the default jfslog for the rootvg JFS file
systems with:
/usr/sbin/logform /dev/hd8
Answer y when asked if you want to destroy the log.
4. Type exit to exit from
the shell. The primary file systems will automatically mount.
5. Shut down and reboot with the key in the normal position:
sync;sync;sync;shutdown -Fr
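The commands in steps 2 and 3 can be listed in order for review before they are
typed in the maintenance shell. The sketch below prints them and executes
nothing; rootvg_repair_plan is a hypothetical helper name, and the device names
are the ones given in the steps above.

```shell
# rootvg_repair_plan
# Prints (does not run) the fsck and logform commands from steps 2 and 3.
rootvg_repair_plan() {
    # Step 2: check each primary rootvg file system.
    for lv in hd4 hd2 hd3 hd9var hd1; do
        echo "fsck /dev/$lv"
    done
    # Step 3: reformat the rootvg log device (destructive to the log contents).
    echo "/usr/sbin/logform /dev/hd8"
}
```

Printing the plan first keeps the destructive logform step visible and last in
the sequence, after every file system check.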
You can also run fsck on any user-created file systems in rootvg, if needed.
This can typically be done in normal mode.