When is a disk space problem not a disk space problem?

A co-worker setup an Ansible playbook to update some packges but it kept erroring out. The error that Ansible reported from “yum” was “No space left on device“. He had jumped onto the system and saw that this partition had plenty of space left so asked if I could look into it.

I got on and confirmed that when I ran a simple “yum update” it showed this:

[[email protected] ~]# echo n | yum update

Loaded plugins: product-id, rhnplugin, search-disabled-repos, security, subscription-manager

[Errno 28] No space left on device: ‘/var/run/rhsm/cert.pid’

This system is receiving updates from RHN Classic or RHN Satellite.

Could not create lock at /var/run/yum.pid: [Errno 28] No space left on device: ‘/var/run/yum.pid’

Hmm, no disk space still. Looking at the “df /var” output looks good:

[[email protected] ~]# df /var

Filesystem           1K-blocks   Used Available Use% Mounted on

/dev/mapper/rootvg-varlv

                       2514736 914948   1468716  39% /var

Suspecting other resource issues I checked the inode availability using “df -i:

[[email protected] ~]# df -i /var

Filesystem           Inodes  IUsed IFree IUse% Mounted on

/dev/mapper/rootvg-varlv

                     163840 163840     0  100% /var

A ha! No inodes left. I’ll let you use your favorite search engine to look up details, but an easy way to think of “inodes” is as space on the first few pages of a book dedicated to being the “table of contents.” If you have a book with a few chapters, you only need a single page for the table of contents (the inodes). If you have a book with lots of chapters and sub-chapters, you might need a lot of pages (more inodes). By default Unix systems have a forumla on how much of the filesystem to dedicate to being “inodes” and how much is left for actual data storage. Usually this is fine for most systems.

To find them we want to look for directories which have chewed up the 163K files:

for i in /var/*; do echo $i; find $i |wc -l; done

This pointed to the “/var/spool/app01/” directory – it has over 160K small files.  The owner of the system was able to clean up some old files there and the “yum update” worked as expected.

It’s possible to override the inode settings when the filesystem is formatted, so if you know this ahead a time you can do this. If you run into this after the fact, the usual resolution is to backup the data, reformat the filesystem with more inodes allocated, then restore from backup.

SELinux and NFS $HOME directories

Recently we re-installed a common server with RHEL-7 and that went well.  But after a couple days I noticed that I was unable to login with my personal ssh key but I had before. It was a minor annoyance and didn’t pursue it … until today.

It turns out that the /home/ directory on this system is an NFS mount, and in RHEL-7 we have set SELinux to default to enforcing.  There is an SELinux boolean flag, “use_nfs_home_dirs” that needed to be set to “1” (true).  Running the “setsebool -P use_nfs_home_dirs 1” on this system was the fix and now we/I can resume logging in with the SSH key instead of typing in my passwordeach time.

Some were reluctant to fix this as they always typed in their password. While typing in your password over the SSH login connection is encrypted, but it does present the possibility that your password could get copied given a compromised endpoint, plus we are trying to use longer passwords so typing this in multiple times per day was frustrating and slowed workflow.  Using SSH keys eliminates this risk and provides for other features such as scheduled/scripted command execution and file transfers.