[Spotlight] Some Linux system administration notes

As I’d recently looked into the health of the system which runs this forum, and have from time to time contributed to support threads about disk use or memory use, I thought I’d run through some of my commonly used commands.

I usually work from a console, and ssh into a remote machine, so I like everything to be textual and static.

When first arriving on a machine, I’ll use

uptime
to check the CPU load and time since reboot. CPU load is presented as three numbers: the 1, 5 and 15 minute averages of the number of processes ready to use CPU time (on Linux the count also includes processes in uninterruptible sleep, typically waiting on disk). I’m happy if that’s no larger than the number of CPUs (which I find with egrep Hz /proc/cpuinfo) or preferably rather less.
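The load-versus-CPU-count comparison can be scripted. What follows is only a sketch: the helper name load_ok is made up for illustration, and it assumes a Linux /proc/loadavg plus the nproc command from GNU coreutils rather than the egrep approach.

```shell
#!/bin/sh
# load_ok is a hypothetical helper: it succeeds (exit status 0) when the
# load average in $1 is no larger than the CPU count in $2.
load_ok() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load <= cpus) }'
}

# Assumption: /proc/loadavg exists (Linux) and nproc is installed.
# The first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    cpus=$(nproc)
    if load_ok "$load1" "$cpus"; then
        echo "load $load1 is within $cpus CPU(s)"
    else
        echo "load $load1 exceeds $cpus CPU(s)"
    fi
fi
```

awk does the comparison because load averages are fractional and plain sh arithmetic only handles integers.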

top
is commonly used to get an overview of activity. It’s an endlessly refreshing display so I tend to use
top -bin2
instead. I’d usually only use this to find out which processes are using CPU, or possibly which processes are using memory.

free
or
free -h
gives a snapshot of memory use. I tend to look only at how much swap space is in use: preferably rather less than 10%.
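For a single number to compare against that 10% rule of thumb, the swap figures can be read straight from /proc/meminfo. This is a sketch under assumptions: swap_used_percent is a made-up helper name, and it relies on the SwapTotal and SwapFree lines that Linux provides there.

```shell
#!/bin/sh
# swap_used_percent is a hypothetical helper: it reads /proc/meminfo-style
# text on standard input and prints the share of swap in use as a whole
# number (rounded down), or 0 when no swap is configured.
swap_used_percent() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { unused = $2 }
        END {
            if (total > 0) {
                printf "%d\n", (total - unused) * 100 / total
            } else {
                print 0
            }
        }
    '
}

# Assumption: a Linux /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_percent < /proc/meminfo)%"
fi
```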

vmstat 5 5
gives a series of snapshots of machine activity. I use it mostly for the columns si and so, which indicate active paging in and out of swap. I like that to be zero. Also the bi and bo columns indicate disk activity. Ideally one would have a record of what’s normal on any given machine, in order to know what’s abnormal. I used to run cron jobs on a 10 minute schedule to collect these sorts of textual snapshots - to be looked at only when there’s a problem. I’d rotate those logs daily and keep a month’s worth (by the simple expedient of using the day number in the filename.)
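A cron-driven collector along those lines might look like the sketch below. It is not the script I actually ran: the function names, the script path in the comment and the log directory are all assumptions for illustration. The day number in the filename is what limits the history to about a month.

```shell
#!/bin/sh
# Sketch of a snapshot collector meant to be run from cron every 10 minutes.
# snapshot_file, take_snapshot and the default directory are hypothetical.

# Build the log path from a directory and a day-of-month number (01..31).
snapshot_file() {
    printf '%s/snapshot-%s.log\n' "$1" "$2"
}

# Append one timestamped snapshot to today's file.
take_snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    out=$(snapshot_file "$dir" "$(date +%d)")
    # A file carrying today's day number that was last written more than a
    # day ago is left over from last month, so empty it before appending.
    if [ -n "$(find "$out" -mtime +0 2>/dev/null)" ]; then
        : > "$out"
    fi
    {
        echo "=== $(date)"
        uptime
        free
        vmstat 5 5
        df -h /
    } >> "$out" 2>&1
}

# Only collect when asked to, e.g. from a crontab entry such as:
#   */10 * * * * /usr/local/sbin/snapshot.sh run /var/log/snapshots
if [ "${1:-}" = "run" ]; then
    take_snapshot "${2:-/var/log/snapshots}"
fi
```

Each run takes about 20 seconds because of vmstat 5 5, which fits comfortably inside a 10 minute schedule.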

df -h /
shows the amount of free disk space. Discourse, our forum software, insists on 5G free before embarking on an upgrade, which it prompts for approximately monthly.
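To check that threshold ahead of time, something like the following sketch would do. The helper name has_free_gib is made up, and treating "5G" as 5 GiB is my assumption; Discourse's own check may count differently.

```shell
#!/bin/sh
# has_free_gib is a hypothetical helper: it succeeds when the available
# space ($1, in KiB) is at least the required amount ($2, in whole GiB).
has_free_gib() {
    [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# df -P gives the POSIX one-line-per-filesystem layout and -k reports KiB;
# field 4 of the second line is the available space.
avail_kib=$(df -Pk / | awk 'NR == 2 { print $4 }')

# Assumption: "5G" is taken to mean 5 GiB.
if has_free_gib "$avail_kib" 5; then
    echo "enough room: $avail_kib KiB available on /"
else
    echo "too tight: only $avail_kib KiB available on /"
fi
```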

netstat -l
is one view of the network sockets listening for connections. I prefer
lsof -n -i
if it’s installed. We should see only the processes we expect - so familiarity is key.

/var/discourse/launcher enter app
Discourse is shipped and installed as a complete OS image in docker. With this command we can open a shell within that image, for example to look at processes, memory use, perhaps disk use. It’s also possible to run database commands: either queries or sometimes tidying-up, or (hopefully rarely) repairs.

Within the docker image, then, we could wonder which processes are running under the user ‘discourse’
ps fu -u discourse

Or which processes are using most memory:
ps aux | sort -n -k4 | tail

We exit the docker container with
exit

We talked a bit about monitoring or checking connectivity to the outside world. Ideally, we connect meaningfully to a service we actually care about: perhaps connect to wherever our backups are. But among the lower level checks we can easily do:
ping -c2 g.co
tests both name resolution and connectivity
ping -c2 8.8.8.8
or
ping -c2 1.1.1.1
test just connectivity. But they say little about the quality of a connection: one might need to transfer some data for that.
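Those two kinds of ping can be combined to tell a name-resolution problem apart from a general loss of connectivity. This is a rough sketch only: diagnose and check_net are made-up names, and the hosts are just the ones mentioned above.

```shell
#!/bin/sh
# diagnose and check_net are hypothetical helpers.

# Turn two ping exit statuses into a one-line verdict.
# $1 = status from pinging an IP address, $2 = status from pinging a hostname.
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "no connectivity"
    elif [ "$2" -ne 0 ]; then
        echo "connectivity ok, but the name lookup (or that host) is failing"
    else
        echo "connectivity and name resolution ok"
    fi
}

# Ping an address and a name, then report. Not called automatically,
# because it needs network access and takes a few seconds.
check_net() {
    ip_status=0
    name_status=0
    ping -c2 8.8.8.8 >/dev/null 2>&1 || ip_status=$?
    ping -c2 g.co >/dev/null 2>&1 || name_status=$?
    diagnose "$ip_status" "$name_status"
}
```

Running check_net after loading these functions prints one of the three verdicts. As with the bare pings, it says nothing about the quality of the connection.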

As noted,
df -h /
tells us how much disk space remains. To see how much is used, and where, I tend to pipe du into sort and then tail:
du -kx / | sort -n | tail -33
To answer a specific question about how much space is used by the base OS, how much by Docker, and how much by the forum data itself, I used:
du -csh /usr /swapfile /var/lib/apt /var/log /var/cache/ /boot/ /*bin /lib | sort -h
du -hsxc /var/discourse/ /var/lib/docker/
du -h /var/discourse/shared/standalone/backups/default

We can ask docker what it’s doing:
docker ps
and how it’s using space:
docker system df
and what volumes (if any) it’s looking after:
docker volume ls
and what images it knows about:
docker image ls
docker image ls -a
Sometimes one is advised to do some dramatic clean up action like this:
docker system prune --all --volumes --force

As an aside, systemd journals a lot of data and when space is tight we don’t need it - we only need it after or during an incident. So we can be very aggressive if we just need to reclaim space right now:
journalctl --disk-usage
journalctl --rotate
journalctl --vacuum-time=1s

Oh, we also mentioned small Linux distributions. It turns out Microsoft’s “CBL-Mariner” does weigh in at about CD size: 676 MByte. (It’s not a full end-user distribution, it’s minimal.)

Debian’s netinst image is about 300 MByte and up, depending on architecture, but it’s meant to be a stepping stone to a fully functional installation.

I have in the past used the very small Tiny Core Linux (21 MByte and up), when I needed something akin to a thin client.

This is brilliant - thanks.

A really straightforward set of useful everyday commands for understanding and looking at CLI fundamentals.

Now to the man pages for definition on some of those switches… :grin:

Thanks @Marsh - happy to have been usefully informative! This was a write-up of an off-the-cuff Spotlight at one of our Wednesday zooms. We give ourselves a 15 to 25 min slot to talk about anything which might be of interest.

Nice.

Will have to check out more of them.

Cheers :slight_smile: