Tag: Linux

  • A home-lab friendly container orchestration option: RKE2 with Rancher

    Recently I ran across a GitLab blog post by “clemenko” on setting up RKE2 (a SUSE-backed Kubernetes distribution) with Rancher (a management GUI) and a few other handy plugins.

    The post deploys a small three-node cluster (VMs in the cloud), but I was able to set up a single-node RKE2+Rancher install on one machine in my home lab.

  • Journalctl and following output starting at a specific time.

    I was triaging an issue on my Fedora 41 system (Fedora is the upstream for RHEL) and had to watch for new entries in “journalctl”. Tailing the output wasn’t helpful: the default tail output started well before the “current time”, and by the time I had clicked the appropriate buttons in the GUI, the output was unusable.

    I ended up using the “--since=XXX” option with the “date” command like this:

    # journalctl --since="$(date +"%Y-%m-%d %H:%M:%S")" --no-tail --follow


    • $(date +"%Y-%m-%d %H:%M:%S") : The current date/time, captured when the command starts. Handy to run in one window, then kick off the failing command in another.
    • --since="XXX" : Show only entries at or after this time. Besides a full timestamp, it also accepts ‘yesterday’ and other parsable date strings.
    • --no-tail : Show all matching stored lines instead of only the last 10, even in follow mode.
    • --follow : Keep running and print new log entries as they arrive.
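    The same idea can be wrapped in a small shell function so the timestamp is captured at the moment you start watching. This is a sketch: the function names now_stamp and follow_from_now are hypothetical, and passing extra arguments (such as -u sshd.service) through to journalctl is an assumption about how you might want to filter.

```shell
# Hypothetical helper: print the current local time in a format that
# journalctl --since accepts ("YYYY-MM-DD HH:MM:SS").
now_stamp() {
    date +"%Y-%m-%d %H:%M:%S"
}

# Hypothetical wrapper: follow only journal entries logged after "now".
# Extra arguments (for example: -u sshd.service) are passed to journalctl.
follow_from_now() {
    since="$(now_stamp)"
    echo "Following journal entries since ${since} (Ctrl-C to stop)" >&2
    journalctl --since="${since}" --no-tail --follow "$@"
}
```

    Run follow_from_now in one terminal, then reproduce the failure in another. systemd's time parser also understands the keyword “now”, so journalctl --since=now --follow should behave similarly.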

    The “journalctl” command has been available in many Linux distributions for years, but it still feels new enough that greybeards like me occasionally struggle with it.

  • File date details

    Recently I rebuilt my primary workstation and was restoring files from backup. I have a few copies built up over time and was trying to determine which specific files I wanted to keep. The output of ls was cumbersome, sometimes listing the year and other times listing just the month and date.

    By convention, ls shows the year instead of the time of day when a file’s timestamp is more than six months in the past (or in the future), so scanning this list was annoying as it swapped between older files (Sep 29 2017) and newer files (Nov 23 20:53).

    Of course there was a flag for ls to handle this…

    This answer on https://unix.stackexchange.com was spot-on for what I was looking for:

    In short, the ls -l --time-style=long-iso flags keep the format consistent: 2021-08-22 12:00, 2020-12-16 05:04, or 2022-10-21 04:12.

    Viewing my backup archive directory now shows things consistently:

    $ ls -al --time-style=long-iso
    total 24
    drwxr-xr-x 2 999 999    0 2022-11-13 14:30 .
    drwxr-xr-x 2 999 999    0 2022-11-16 17:21 ..
    -rwxr-xr-x 1 999 999 3478 2021-10-16 01:30 BackupToNAS.2021-10-16_0130.log
    -rwxr-xr-x 1 999 999 4388 2021-10-16 01:30 BackupToNAS.2021-10-16_0130.log-scriptlog
    -rwxr-xr-x 1 999 999 2986 2021-11-13 10:17 BackupToNAS.2021-11-13_1010.log
    -rwxr-xr-x 1 999 999 4753 2021-11-13 10:17 BackupToNAS.2021-11-13_1010.log-scriptlog
    drwxr-xr-x 2 999 999    0 2022-11-14 06:04 dan
    drwxr-xr-x 2 999 999    0 2022-11-14 06:03 dan.old
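    To see the difference without digging through a backup archive, the sketch below creates two files with different timestamps in a throwaway directory and lists them with the default style, long-iso, and full-iso. It assumes GNU coreutils (for touch -d, --time-style, and the TIME_STYLE variable); the file names are made up for the demo.

```shell
# Scratch directory with one old and one brand-new file (GNU touch -d assumed).
demo_dir="$(mktemp -d)"
touch -d "2017-09-29 08:15" "$demo_dir/old.log"
touch "$demo_dir/new.log"

echo "Default style (year or time of day, depending on file age):"
ls -l "$demo_dir"

echo "long-iso style (always YYYY-MM-DD HH:MM):"
ls -l --time-style=long-iso "$demo_dir"

echo "full-iso style (adds seconds, fractional seconds, and UTC offset):"
ls -l --time-style=full-iso "$demo_dir"

# GNU ls reads its default --time-style from TIME_STYLE, so exporting it
# (for example in ~/.bashrc) makes long-iso the default in this shell.
export TIME_STYLE=long-iso

# Clean up afterwards with: rm -r "$demo_dir"
```

    Setting TIME_STYLE once avoids retyping the flag, while the explicit --time-style option still wins for one-off listings.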

  • Buildup

    Thanks to “https://www.reddit.com/user/MaricxX/” for this photo – it demonstrates how small glitches over time can add up if they aren’t addressed rapidly – or better yet, not allowed to start in the first place.

    Layers of paint: a cross section showing deformation from imperfections magnified with each layer (credit: MaricxX from Reddit, https://www.reddit.com/user/MaricxX/)

    At a previous job it was common to take our Windows virtual machine templates and power them on once a month to patch the OS and apply the latest security configurations. We had been doing the same with our Red Hat Linux images, but a couple of years ago I converted our process so that each month we built those VM templates fresh from an ISO using a HashiCorp Packer script and VMware Workstation.

    This monthly fresh build ensured that we always knew how to build the VM templates in the event of a disaster, and that our build process contained exactly what we planned and advertised (through our team Git repository). When new requirements arrived from the InfoSec team or other sources that could only be readily addressed during the initial build phase, we added those steps to the Packer config file, then tested and built new templates.

    With the prevalence of new worms and other highly effective infection vectors, my fear was that a piece of malware would get onto a template and then be replicated automatically each time a new system was built from it. And there were many times when we started the monthly patching process only to find that a couple of the Windows templates had been left running since the previous month’s patch effort. There is no telling what might have crawled onto these unmanaged systems in the meantime, just waiting for us to build new systems from them.

    While the paint analogy doesn’t map perfectly onto the IT world, it matches closely enough to make the risk of replicating and amplifying a small defect easy to understand. I still prefer a freshly built template with its minimal layers of paint, confident that it contains only the bits we intended.

  • When is a disk space problem not a disk space problem?

    A co-worker set up an Ansible playbook to update some packages, but it kept erroring out. The error that Ansible reported from “yum” was “No space left on device”. He had logged into the system, saw that the partition had plenty of free space, and asked me to look into it.

    I logged in and confirmed that a simple “yum update” failed the same way:

    [root@linux5237 ~]# echo n | yum update
    Loaded plugins: product-id, rhnplugin, search-disabled-repos, security, subscription-manager
    [Errno 28] No space left on device: '/var/run/rhsm/cert.pid'
    This system is receiving updates from RHN Classic or RHN Satellite.
    Could not create lock at /var/run/yum.pid: [Errno 28] No space left on device: '/var/run/yum.pid'

    Hmm, still no disk space. But the “df /var” output looks fine:

    [root@linux5237 ~]# df /var
    Filesystem           1K-blocks   Used Available Use% Mounted on
                           2514736 914948   1468716  39% /var

    Suspecting another resource issue, I checked inode availability using “df -i”:

    [root@linux5237 ~]# df -i /var
    Filesystem           Inodes  IUsed IFree IUse% Mounted on
                         163840 163840     0  100% /var

    Aha! No inodes left. I’ll let you use your favorite search engine to look up the details, but an easy way to think of “inodes” is as the first few pages of a book dedicated to the “table of contents.” Every file and directory needs its own entry (one inode), no matter how small it is. A book with a few chapters needs only a single page for its table of contents; a book with lots of chapters and sub-chapters might need many pages (more inodes). When a filesystem is created, the formatting tool uses a formula to decide how much of it to dedicate to inodes and how much to leave for actual data storage. That default is fine for most systems, but a huge number of tiny files can use up every inode while plenty of data space remains.
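    Because df reports inode usage separately from block usage, it is worth checking both whenever “No space left on device” shows up on a filesystem that looks half empty. The function below is a hypothetical sketch (the name check_inodes and the 90% default threshold are assumptions); it uses df -P -i so each filesystem stays on one line even with long device names.

```shell
# Hypothetical helper: report inode usage on the filesystem holding PATH and
# return 1 when it reaches THRESHOLD percent (default 90).
check_inodes() {
    path="${1:-/}"
    threshold="${2:-90}"
    # -P keeps each filesystem on one line even with long device names;
    # -i reports inodes instead of 1K blocks. Field 5 is IUse%, e.g. "100%".
    df -P -i "$path" | awk -v limit="$threshold" '
        NR == 2 {
            pct = $5
            sub(/%/, "", pct)
            if (pct == "-") pct = 0   # some filesystems do not report inodes
            printf "%s: %s%% of inodes used (%s free)\n", $6, pct, $4
            exit (pct + 0 >= limit + 0 ? 1 : 0)
        }'
}
```

    On the system above, check_inodes /var would have reported 100% with 0 free and returned 1, which makes it easy to drop into a monitoring script.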

    To find the culprit, look for the directories that have consumed those ~163K inodes:

    for d in /var/*/; do printf '%s\t%s\n' "$(find "$d" -xdev | wc -l)" "$d"; done | sort -n

    This pointed to the “/var/spool/app01/” directory, which held over 160K small files. The owner of the system cleaned up some old files there, and “yum update” then worked as expected.
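    With GNU coreutils 8.22 or later, du --inodes can produce the same per-directory count in one pass. In the sketch below, top_inode_dirs is a hypothetical name, -x keeps du on one filesystem, and a scratch tree stands in for /var.

```shell
# Hypothetical helper: rank the immediate subdirectories of a tree by inode
# count. --inodes counts inodes instead of blocks, -x stays on one filesystem.
top_inode_dirs() {
    du --inodes -x --max-depth=1 "${1:-/var}" 2>/dev/null | sort -n | tail -n "${2:-10}"
}

# Demo on a scratch tree: "busy" holds 50 empty files, "quiet" holds 1.
demo_root="$(mktemp -d)"
mkdir "$demo_root/busy" "$demo_root/quiet"
i=1
while [ "$i" -le 50 ]; do
    : > "$demo_root/busy/file$i"
    i=$((i + 1))
done
: > "$demo_root/quiet/only"

top_inode_dirs "$demo_root"
```

    The busiest directory lands near the bottom of the sorted list, just above the total for the tree itself.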

    It’s possible to override the inode settings when the filesystem is formatted, so if you know about a workload like this ahead of time you can plan for it. If you run into this after the fact, the usual resolution is to back up the data, reformat the filesystem with more inodes allocated, then restore from backup.
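    As a rough illustration, assume /var is an ext3/ext4 filesystem on a roughly 2.5 GiB volume, created by mke2fs with its common default of one inode per 16384 bytes. That works out to 163840 inodes, which lines up with the df -i output above. The sketch prints how the count grows with smaller ratios; the volume size is an assumption and the device path in the comments is hypothetical.

```shell
# Sketch, assuming ext3/ext4 created by mke2fs, which allocates one inode
# per "bytes-per-inode" of space (16384 is a common default ratio).
fs_bytes=$((2560 * 1024 * 1024))   # assumed size: a 2.5 GiB /var volume

for ratio in 16384 8192 4096; do
    echo "bytes-per-inode=${ratio}: $((fs_bytes / ratio)) inodes"
done

# Reformatting with a smaller ratio (or an explicit count) DESTROYS the data
# on the volume, so back up first. /dev/vg00/lv_var is a hypothetical device.
#   mkfs.ext4 -i 4096 /dev/vg00/lv_var      # one inode per 4 KiB
#   mkfs.ext4 -N 655360 /dev/vg00/lv_var    # request roughly this many inodes
```

    A smaller ratio trades some data capacity for inode table space, which suits spool directories full of tiny files.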

  • SELinux and NFS $HOME directories

    Recently we re-installed a shared server with RHEL-7, and that went well. But after a couple of days I noticed that I could no longer log in with my personal SSH key, although it had worked before. It was a minor annoyance and I didn’t pursue it … until today.

    It turns out that the /home/ directory on this system is an NFS mount, and in RHEL-7 we have set SELinux to default to enforcing. There is an SELinux boolean, “use_nfs_home_dirs”, that needed to be set to “1” (true). Running “setsebool -P use_nfs_home_dirs 1” on this system was the fix, and now we can resume logging in with SSH keys instead of typing in a password each time.
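    When rebuilding several hosts, the check-then-set sequence can be scripted. This is a sketch: the function names are hypothetical, getsebool and setsebool -P are the standard SELinux utilities, and the parsing assumes getsebool’s usual “name --> on|off” output. The wrapper needs to run as root.

```shell
# Hypothetical helper: read getsebool-style lines ("name --> on") on stdin
# and succeed only if the named boolean is on.
sebool_is_on() {
    awk -v want="$1" '$1 == want && $3 == "on" { found = 1 }
                      END { exit (found ? 0 : 1) }'
}

# Hypothetical wrapper: enable use_nfs_home_dirs only if it is currently off.
ensure_nfs_home_dirs() {
    if getsebool use_nfs_home_dirs | sebool_is_on use_nfs_home_dirs; then
        echo "use_nfs_home_dirs is already on"
    else
        # -P writes the change to the policy so it survives a reboot.
        setsebool -P use_nfs_home_dirs 1
    fi
}
```

    Checking first keeps the script idempotent, since setsebool -P rewrites the policy store and can take a few seconds.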

    Some were reluctant to fix this because they always typed in their password anyway. Although a password typed over an SSH connection is encrypted in transit, it could still be captured on a compromised endpoint. We are also moving to longer passwords, so typing one in multiple times per day was frustrating and slowed our workflow. Using SSH keys eliminates this risk and enables other features such as scheduled/scripted command execution and file transfers.