The Urban Legend of mandb on RHEL

Aka “The Case of mandb’s Missing Daily Scheduled Job”

Lately I’ve been revisiting a number of fundamental RHEL OS parts that I’ve used regularly for, I’ll just call it, “a while” now. I’m doing it with a beginner’s mind, seeking to “kind of trust BUT definitely verify” the current textbooks being published, especially when it comes to old-school Linux tools: the tried-and-true, the everybody-knows-that stuff. So anytime I encounter a declarative statement in a book, like this one from the latest RHCSA guide: “…when you use the man -k command, the mandb database is consulted. This database is automatically created through a scheduled job”, I want to know if that’s still really true, and I ask myself whether it even matches my own experience with the latest and greatest RHEL version I’m working on. In any case, I then take a first-principles approach to make sure I can SEE where and how this is all happening on my actual system.

For the uninitiated, the mandb program is built to “create or update the manual page index caches”. This particular mandb rabbit hole ended up being sorta long to traverse, but it serves, in this writer’s humble opinion, as a flat-out fantastic reminder of what answering the bedrock questions of “how it all really works” for yourself can teach you about discovering your operating system. (Also, this post is not just about operating systems.)

By the way, I should preface the rest of what follows by reemphasizing that gerund, “discovering”. I realize that some advanced readers might arrive immediately at or within a few troubleshooting steps of the solution, where I took several others in between. A counterpoint to that sentiment would be that already knowing something is great, but knowing how to know something is perhaps just as important (and intellectually rewarding). What Nobel Prize-winning physicist Richard Feynman called “the pleasure of finding things out.”

The TL;DR is that mandb as a standalone program is not part of RHEL’s automatic manpage update puzzle anymore. What it effectively did has been replaced by macros that live on the RHEL system itself in /usr/lib/rpm/macros.d/macros.systemd but which are invoked by dnf, according to its source code: specifically the dnf.spec file, which contains the line %systemd_post dnf-makecache.timer. That in turn invokes at least one of the two related “man-db-cache” systemd units, via what ultimately gets logged by auditd as a /usr/bin/systemctl start man-db-cache-update. This turns out to be super helpful, because it means that the manpage cache and index get updated each and every time dnf installs a package. The upshot is that what you would think of as a classic mandb run now only happens on, say, a normal RHEL 9 system boot; then whenever you install (or uninstall) a package via dnf; and then apparently before shutdown. Basically. If all of that sounds obscure, it is. But it’s how it actually works.

One of the first principles I follow when I am discovering things about software in general: There is no magic. Computers are, generally speaking, deterministic. Operating systems do precisely what their software developers and users tell them to do, whether the humans meant for them to do those precise things or not. And because of that, on any open source distro at least, there is almost always human-readable or decodable-from-binary text related to whatever you are looking for, somewhere on the filesystem.

So anyway, let’s deconstruct this statement, which I already quoted: “…when you use the man -k command, the mandb database is consulted. This database is automatically created through a scheduled job”.

The first of those two sentences is true. Run man -k curl and you’ll get an answer back. So far, so good.

[root@rhel9 ~]# man -k curl
curl (1)             - transfer a URL
[root@rhel9 ~]#

The second sentence is no longer true, at all, and it didn’t “feel” true when I first read it. It certainly used to be that you’d install a program on RHEL and then have to either manually run mandb or wait for mandb to run on a schedule (usually via a daily cron job) to update the index, all so that you could read and search your latest cool program’s manpages. But this hasn’t been the observable fact of the matter for a while now. Yet many docs (including de facto “official” ones) continue to namecheck mandb like it’s still back there cranking away in some dark corner of your root volume. In actual fact, today if you dnf install wget -y and then man -k wget, its manpages are already there for you.

With all that said, the disappearing act of mandb’s daily scheduled job (which still seems to haunt your OS environment) makes a whole lot of sense when you unravel it. It’s just not terribly well documented (if it’s definitively documented anywhere). If any of this is laid out clearly in man sections 1, 5, or 8, I could not find it. And for their part, the new textbooks are probably all getting mandb totally wrong nowadays, probably because nobody’s been keeping up with the technical review on this one of a gazillion RHEL topics. But hey, that’s what keeps us humble and hungry to keep learning (and relearning) what we thought we already knew.
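
One way to see this for yourself is to watch the index file rather than take anyone’s word for it. What follows is a minimal sketch, not a definitive test: it assumes the man-db index lives at /var/cache/man/index.db (the usual location, but an assumption here) and that a GNU-style stat is available. The names mtime_of and INDEX are purely illustrative.

```shell
#!/bin/sh
# Print a file's modification time in epoch seconds, or 0 if it does not exist.
mtime_of() {
    if [ -e "$1" ]; then
        stat -c %Y "$1"
    else
        echo 0
    fi
}

# Assumed location of the man-db index cache; override INDEX to look elsewhere.
INDEX=${INDEX:-/var/cache/man/index.db}

before=$(mtime_of "$INDEX")
# ... install something here, e.g. `dnf install wget -y`, then wait a moment ...
after=$(mtime_of "$INDEX")

if [ "$after" -gt "$before" ]; then
    echo "index was rewritten"
else
    echo "index unchanged"
fi
```

Because the cache update is started as a separate systemd job, the timestamp may only change a few seconds after the install returns, so a short pause before the second reading is probably wise.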

#####

[root@rhel9 ~]# systemctl show man-db-cache-update.service -p Requires,Wants,Before,After
Requires=sysinit.target system.slice
Wants=
Before=shutdown.target
After=basic.target sysinit.target systemd-journald.socket system.slice local-fs.target
[root@rhel9 ~]#


#####

[root@rhel9 ~]# systemctl list-units | grep dnf
  dnf-makecache.timer                                                                      loaded active     waiting   dnf makecache --timer
[root@rhel9 ~]#

#####

[root@rhel9 ~]# grep -B 2 -A 2 cron /etc/sysconfig/man-db

# Set this to "no" to disable daily man-db update run by
# /etc/cron.daily/man-db.cron
CRON="yes"

[root@rhel9 ~]#

# Plot twist! `/etc/cron.daily/man-db.cron` doesn’t actually exist by default anymore!
 
[root@rhel9 ~]# stat /etc/cron.daily/man-db.cron
stat: cannot statx '/etc/cron.daily/man-db.cron': No such file or directory
[root@rhel9 ~]#
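
To rule out the cron script simply having moved, the same check can be widened to the other usual cron directories. This is a small sketch under the assumption that any such script would still be named man-db-something; find_mandb_cron is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# List any man-db cron scripts found in the given directories.
# Prints nothing when there are none.
find_mandb_cron() {
    for dir in "$@"; do
        for f in "$dir"/man-db*; do
            # An unmatched glob stays literal, so test that the path exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}

find_mandb_cron /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly /etc/cron.d
```

Empty output here, together with the stat failure above, suggests that the CRON="yes" setting in /etc/sysconfig/man-db has nothing left to switch on or off.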

#####

# Relevant snippet of dnf.spec [https://github.com/rpm-software-management/dnf/blob/master/dnf.spec]:

%post
%systemd_post dnf-makecache.timer

# Translation: this macro, `%systemd_post`, is called to run the dnf-makecache.timer whenever dnf installs a package. It wasn’t always this way. Which is a nice change, but probably the opposite of obvious if you’re simply looking at the filesystem or the, ahem, manpages for any of this.
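
Since scriptlets live in RPM packaging metadata rather than anywhere obvious on the filesystem, one way to cross-check where the `systemctl start man-db-cache-update` call comes from is to ask rpm what each installed package carries. The sketch below is hedged accordingly: show_scriptlets is an illustrative helper, and the idea that the man-db package may ship file triggers of its own is an assumption worth verifying, not something established above.

```shell
#!/bin/sh
# Print the scriptlets and file triggers that an installed package carries.
show_scriptlets() {
    pkg=$1
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found; run this on a RHEL-family system"
        return 0
    fi
    if ! rpm -q "$pkg" >/dev/null 2>&1; then
        echo "package $pkg is not installed"
        return 0
    fi
    echo "== $pkg: scriptlets =="
    rpm -q --scripts "$pkg" || true
    echo "== $pkg: file triggers =="
    # Older rpm versions may not know this option, hence the fallback.
    rpm -q --filetriggers "$pkg" || true
}

show_scriptlets dnf
show_scriptlets man-db
```

Grepping that output for man-db-cache-update should show which package’s scriptlet actually issues the command.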

#####

# Relevant snippet of said macro in `/usr/lib/rpm/macros.d/macros.systemd`:

%systemd_post() \
%{expand:%%{?__systemd_someargs_%#:%%__systemd_someargs_%# systemd_post}} \
if [ $1 -eq 1 ] && [ -x "/usr/lib/systemd/systemd-update-helper" ]; then \
    # Initial installation \
    /usr/lib/systemd/systemd-update-helper install-system-units %{?*} || : \
fi \
%{nil}

# This is what it ends up looking like in the audit log, as shown by `ausearch`, anytime it occurs on a Red Hat Enterprise Linux 9 system:

type=EXECVE msg=audit(1713317136.446:3364): argc=4 a0="/usr/bin/systemd-run" a1="/usr/bin/systemctl" a2="start" a3="man-db-cache-update"

# Full log is here:

----
time->Wed Apr 17 01:25:36 2024
type=PROCTITLE msg=audit(1713317136.446:3364): proctitle=2F7573722F62696E2F73797374656D642D72756E002F7573722F62696E2F73797374656D63746C007374617274006D616E2D64622D63616368652D757064617465
type=PATH msg=audit(1713317136.446:3364): item=1 name="/lib64/ld-linux-x86-64.so.2" inode=148674748 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1713317136.446:3364): item=0 name="/usr/bin/systemd-run" inode=67512497 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:bin_t:s0 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1713317136.446:3364): cwd="/"
type=EXECVE msg=audit(1713317136.446:3364): argc=4 a0="/usr/bin/systemd-run" a1="/usr/bin/systemctl" a2="start" a3="man-db-cache-update"
type=SYSCALL msg=audit(1713317136.446:3364): arch=c000003e syscall=59 success=yes exit=0 a0=55bf12df4f60 a1=55bf12dfc340 a2=55bf12dfc0d0 a3=55bf12dfc6e0 items=2 ppid=89764 pid=89765 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=16 comm="systemd-run" exe="/usr/bin/systemd-run" subj=unconfined_u:unconfined_r:rpm_script_t:s0-s0:c0.c1023 key="mandb-cmd"

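The PROCTITLE field in that log is just the command line in hex, with NUL bytes between the arguments. Running ausearch with -i should interpret it for you, but in the spirit of there being no magic, here is a small sketch that decodes it by hand. decode_proctitle is an illustrative name, and the code sticks to plain POSIX shell features.

```shell
#!/bin/sh
# Decode an auditd PROCTITLE hex string into a readable command line.
# Arguments are NUL-separated in the raw value, so byte 00 becomes a space.
decode_proctitle() {
    # Split the hex string into two-character bytes, one per word.
    for byte in $(printf '%s' "$1" | sed 's/../& /g'); do
        if [ "$byte" = "00" ]; then
            printf ' '
        else
            # POSIX printf understands octal escapes, so convert hex to octal.
            printf "\\$(printf '%03o' "0x$byte")"
        fi
    done
    printf '\n'
}

decode_proctitle 2F7573722F62696E2F73797374656D642D72756E002F7573722F62696E2F73797374656D63746C007374617274006D616E2D64622D63616368652D757064617465
# prints: /usr/bin/systemd-run /usr/bin/systemctl start man-db-cache-update
```

The decoded string matches the a0 through a3 arguments in the EXECVE record, which is a handy sanity check that both records describe the same call.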
A tiny part of the challenge in tracing this man-db cache update back to dnf’s source code, which I suspected all along but had a bit of a time nailing down, is that only the short-lived calling PID (ppid=89764 above), not the calling program’s name, is logged. That meant I had to put a watch on a ps -ef --forest to confirm that dnf was indeed the owner of that PID when it was doing a package install.
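
An alternative to eyeballing ps -ef --forest is to walk the parent chain of a given PID directly. This is only a sketch, assuming a Linux /proc filesystem; ancestry is a hypothetical helper name, and it has to run while the process is still alive, which is exactly the hard part with a short-lived scriptlet.

```shell
#!/bin/sh
# Print "PID NAME" for a process and then each of its ancestors, read from /proc.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        # Move up to the parent; the loop stops once PPid reaches 0.
        pid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$pid/status")
    done
}

# Demonstrate on the current shell.
ancestry "$$"
```

To catch the caller in the act, something like ancestry "$(pgrep -n systemd-run)" in a tight loop during a dnf install would be one approach, since pgrep -n picks the newest matching process.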

Again, there is no magic.

Fun list of programs, in no particular order, that I got to use in figuring all of this out:

  • auditctl
  • ausearch
  • dnf
  • find
  • journalctl
  • grep
  • man
  • ps
  • rpm
  • strace
  • systemctl