df says disk is full, but it is not

df says the disk is full, but du tells a different story. How can I investigate further?



Missing diskspace in Linux
Today I had a problem with a server that had run out of disk space, and I learned something new in the process.
df -h told me that 100% was in use, which was about 29GB on that server.
But when I checked the root partition with du -shx /, I got only about 9GB of used space.
So off I went, checking where the space could have gone:

Reserved Space for root
Your filesystem has reserved some space that only root can write to, so that critical system processes don't fall over when normal users run out of disk space. That's why you may see something like 124G of 130G used, but zero available. Perhaps the files you deleted brought the utilisation down to this point, but not below the threshold for normal users.
If this is your situation and you're desperate, you may be able to alter the amount of space reserved for root. To reduce it to 1% (the default is 5%), your command would be:
# tune2fs -m 1 /dev/sda3
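To see how much is currently reserved before changing anything, you can read it back from the superblock. A minimal sketch, assuming an ext2/3/4 filesystem on /dev/sda3 (substitute your own device):
# tune2fs -l /dev/sda3 | grep -i 'reserved block count'
# tune2fs -l /dev/sda3 | grep -i 'block size'
Multiply the reserved block count by the block size to get the reserved space; on a 29GB filesystem with the default 5%, that works out to roughly 1.5GB that df will never show as available to normal users.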



“Hidden” directories under mountpoints
“du” will not see space used by files located in a path that is later mounted over by another filesystem. For example, you may have files in /home/ on your root partition, but have later mounted a separate partition on /home; those files are then hidden behind the new mountpoint.
To check for such files without unmounting anything in use, I did the following:
mount --bind / /mnt
du -shx /mnt
If “du” had given me a different result now, I would have known that the files were hidden under one of my mountpoints. But sadly, they were not. I was starting to run out of options.
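For reference, a slightly fuller sketch of the same bind-mount check, assuming /mnt is free to use as a temporary mountpoint:
mount --bind / /mnt          # expose the root filesystem without the other mounts stacked on top
du -shx /mnt                 # usage of the root filesystem itself
du -shx /mnt/home            # anything stored in /home on the root fs, hidden under the real /home mount
umount /mnt                  # clean up when done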
Deleted files
  • If a process has a file open and you then delete it with rm thefile, the file is removed from the filesystem's directory structure, but its inode and data blocks are not freed until the process closes/releases the file. This is something I love about Linux/POSIX systems, since it means processes do not need to lock my files and I keep full control over them, as opposed to other operating systems (Windows). But I thought that when you delete an open file, there is no easy way of knowing which deleted files are “held back” by processes. There is!
  • lsof | grep deleted quickly gave me a list of files that had been deleted but were still held open by processes, along with their sizes. In my case, a deleted log file 17GB in size was held open by Asterisk. So I reloaded the Asterisk logger module, and the disk space was available again. Now only 9GB was “in use” according to df -h (see the sketch below for an alternative way to list such files).

  • The operating system won't release disk space for deleted files which are still open. If you've deleted (say) one of Apache's log files, you'll need to restart Apache in order to free the space.
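If grepping for the word “deleted” feels fragile, lsof can also select unlinked-but-open files directly. A sketch, assuming a reasonably recent lsof (the /var mountpoint is just an example):
lsof +L1                     # open files with a link count below 1, i.e. deleted but still held open
lsof +aL1 /var               # the same, ANDed with a filesystem so only files under the /var mount are shown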


Inode usage
I know “du” does not account for filesystem overhead such as the inode tables. But according to dumpe2fs /dev/sdx1, my inode size * inode count came to only about 700MB.
So that was not it.
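For reference, this is roughly how that overhead can be checked on an ext filesystem. A sketch, assuming the device is /dev/sdx1 as above; the numbers are purely illustrative:
# dumpe2fs -h /dev/sdx1 | egrep -i 'inode count|inode size'
Inode count:              2867200
Inode size:               256
With these example values, 2,867,200 inodes * 256 bytes per inode comes to roughly 700MB consumed by the inode tables, regardless of how many inodes are actually in use.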




No space left on device – running out of Inodes
You may get “No space left on device” errors even though the partition is not nearly full. If you ever run into such trouble, most likely you have too many small or 0-sized files on your disk: while you have enough disk space, you have exhausted all available inodes. Below is the solution to this problem.
1. check available disk space to ensure that you still have some
$ df

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvda             33030016  10407780  22622236  32% /
tmpfs                   368748         0    368748   0% /lib/init/rw
varrun                  368748        56    368692   1% /var/run
varlock                 368748         0    368748   0% /var/lock
udev                    368748       108    368640   1% /dev
tmpfs                   368748         0    368748   0% /dev/shm
2. check available Inodes
$ df -i

Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda            2080768 2080768       0  100% /
tmpfs                  92187       3   92184    1% /lib/init/rw
varrun                 92187      38   92149    1% /var/run
varlock                92187       4   92183    1% /var/lock
udev                   92187    4404   87783    5% /dev
tmpfs                  92187       1   92186    1% /dev/shm

If you have IUse% at or near 100%, then a huge number of small files is the reason for the “No space left on device” errors.
3. find those little bastards
$ for i in /*; do echo $i; find $i |wc -l; done
This command will list the top-level directories and the number of files in each. Once you see a directory with an unusually high number of files (or the command just hangs on it for a long time), repeat the command for that directory to see exactly where the small files are (see also the alternative one-liner after these steps).
$ for i in /home/*; do echo $i; find $i |wc -l; done
4. once you have found the suspect – just delete the files
$ sudo rm -rf /home/bad_user/directory_with_lots_of_empty_files
You’re done. Check the results with the df -i command again. You should see something like this:
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda            2080768  284431 1796337   14% /
tmpfs                  92187       3   92184    1% /lib/init/rw
varrun                 92187      38   92149    1% /var/run
varlock                92187       4   92183    1% /var/lock
udev                   92187    4404   87783    5% /dev
tmpfs                  92187       1   92186    1% /dev/shm
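As an alternative to the per-directory loop in step 3, the following one-liner counts files per containing directory in a single pass. A sketch, assuming GNU find; -xdev keeps it on the root filesystem:
$ sudo find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head
The first column is the number of files and the second is the directory holding them, so the top entries are your inode hogs.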


Why is space not being freed from disk after deleting a file in Red Hat Enterprise Linux?

Environment

Red Hat Enterprise Linux (RHEL)

Issue

  • Why is space not being freed from disk after deleting a file in Red Hat Enterprise Linux?
  • When deleting a large file or files, the file is deleted successfully but the size of the filesystem does not reflect the change.
  • I've deleted some files but the amount of free space on the filesystem has not changed.
  • The OS was holding several very large log files open, some as large as ~30G. The files had previously been deleted, but only stopping and restarting the jvm/java process released the disk space. The lsof command shows the following output before restarting the java process:

    COMMAND     PID      USER   FD      TYPE    DEVICE   SIZE/OFF       NODE NAME
    : 
    java      49097    awdmw   77w      REG     253,6 33955068440    1283397 /opt/jboss/jboss-eap-5/jboss-as/server/all/log/server.log (deleted)
    
  • When you perform a df, the storage shows 90+% utilized; however, there is not actually that much data written to that space.

Resolution

Graceful shutdown of relevant process
First, obtain a list of deleted files which are still held open by applications:

$ /usr/sbin/lsof | grep deleted
ora    25575 data   33u   REG      65,65  4294983680   31014933 /oradata/DATAPRE/file.dbf (deleted)
The lsof output shows the process with pid 25575 has kept file /oradata/DATAPRE/file.dbf open with file descriptor (fd) number 33.
After a file has been identified, free the space it is using by shutting down the affected process. If a graceful shutdown does not work, issue the kill command against the PID to stop it forcefully.
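A sketch of that workflow, using the pid from the lsof output above (the service name is hypothetical and depends on what actually owns the file):
$ ps -fp 25575                            # confirm which process and command line hold the file open
# systemctl restart example-db.service    # hypothetical unit name, on systemd-based releases; restart the owning service
# kill 25575                              # last resort if no graceful restart is available
$ df -h /oradata                          # confirm the space has been released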
Truncate File Size
Alternatively, it is possible to de-allocate the space consumed by an in-use file by forcing the system to truncate it via the proc file system. This is an advanced technique and should only be carried out when the administrator is certain that it will cause no adverse effects on running processes. Applications may not be designed to deal elegantly with this situation and may produce inconsistent or undefined behavior when files that are in use are abruptly truncated in this manner.
$ echo > /proc/pid/fd/fd_number
For example, from the lsof output above:
$ file /proc/25575/fd/33
/proc/25575/fd/33: broken symbolic link to `/oradata/DATAPRE/file.dbf (deleted)'
$ echo > /proc/25575/fd/33
The same cause explains why the du and df commands can report different disk usage.
To add up the size (in bytes) of the deleted files that are still held open, use the command below:
# lsof -Fn -Fs |grep -B1 -i deleted | grep ^s | cut -c 2- | awk '{s+=$1} END {print s}'
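The total is printed as a raw byte count. If GNU coreutils' numfmt is available, the same pipeline can be made human-readable (a sketch):
# lsof -Fn -Fs | grep -B1 -i deleted | grep ^s | cut -c 2- | awk '{s+=$1} END {print s}' | numfmt --to=iec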

Root Cause

On Linux or Unix systems, deleting a file via rm or through a file manager application will unlink the file from the file system's directory structure; however, if the file is still open (in use by a running process) it will still be accessible to this process and will continue to occupy space on disk. Therefore such processes may need to be restarted before that file's space will be cleared up on the filesystem.

Diagnostic Steps

Log Reaper will allow you to visualize and quickly narrow down the lsof data to exactly the subset you want to see.

  1. Run the lsof command to list deleted files. The command below identifies log files that have been deleted but whose space the OS is still holding on to.
  2. # lsof | grep log | grep deleted | sort -k 7n
    The output is now sorted by file size (column 7); select the file at the bottom (make sure it is an old log file) and note the associated process id and file descriptor.
  3. The other option is to empty/truncate the file. This de-allocates the space consumed by the file by forcing the system to truncate it via the proc file system.
  4. IMPORTANT: It is CRITICAL to ensure the log file you want to truncate is not in use and is an old log file, because running this on the wrong file could cause issues. One way to check is to run tail -f filename on the file; since the file has been deleted, this will report that the file is not found in almost all cases, which is a good sanity check, and files with a date stamp in the name are generally old files.
  5. Run the command below to empty/truncate the file:
    echo > /proc/pid/fd/fd_number
    In the command above, pid is the process id and fd_number is the file descriptor associated with the log file. So for our example, pid is 6784 and fd_number is 306.
    The command will look like:
    echo > /proc/6784/fd/306
  6. This will empty the log file and release the space. Run df -hP to confirm.
  7. Why this happens:
    • Deleting a file via rm or a file manager only unlinks it from the file system's directory structure; if the file is still open (in use by a running process), it remains accessible to that process and continues to occupy space on disk, as described in the Root Cause section above.
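Putting the steps together, a compact end-to-end sketch using the example pid 6784 and fd 306 from step 5:
# lsof | grep log | grep deleted | sort -k 7n   # find the largest deleted-but-open log files
# ls -l /proc/6784/fd/306                       # the symlink should point at the deleted log file
# : > /proc/6784/fd/306                         # truncate to zero bytes; same effect as echo > without writing a newline
# df -hP                                        # confirm the space was released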
