
Expanding a partition-backed ZFS special device

The spinning disk pool on my home server uses a mirrored special device (for storing metadata and small blocks, see also this blog post at Klara Systems) based on two NVMe SSDs. Because my home server only has two M.2 slots and I wanted to have a pure SSD ZFS pool as well, I partitioned the SSDs. Each SSD has a partition for the SSD pool and one for the special device of the storage pool (which uses a mirror of spinning disks).

Note: This isn’t really a recommended production setup, as you are basically hurting the performance of both the special device and the SSD pool. But for my home server this works fine. For example, I use the special device’s small-blocks functionality to store previews of the photos I keep on my Nextcloud server. This makes scrolling through the Memories app’s timeline a breeze, even though the full-size photos are stored on the spinning disks.
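For reference: whether a dataset’s small blocks land on the special device is controlled by the special_small_blocks property. A minimal sketch, with a made-up dataset name (the value must be a power of two and no larger than the dataset’s recordsize):

zfs set special_small_blocks=32K storage/nextcloud-previews
zfs get special_small_blocks,recordsize storage/nextcloud-previews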

Today, I noticed that the special device had filled up and, given that there was still some unpartitioned space on the SSDs, I wondered if I could just expand the partition (using parted) used by the special device and then have the ZFS pool recognise the extra space. I have expanded partition-backed ZFS pools before, e.g. after upgrading the SSD in my laptop, but I had never tried this with a special device.

After some experimentation, I can tell you: this works.

Here is how I tested this on a throw-away file-backed zpool. First create four test files: two for the actual mirror pool and two that I’ll add as a special device.

for i in {0..3} ; do truncate -s 1G file$i.raw ; done
ls -lh
total 4.0K
-rw-rw-r-- 1 lennart lennart 1.0G Mar 11 12:46 file0.raw
-rw-rw-r-- 1 lennart lennart 1.0G Mar 11 12:46 file1.raw
-rw-rw-r-- 1 lennart lennart 1.0G Mar 11 12:46 file2.raw
-rw-rw-r-- 1 lennart lennart 1.0G Mar 11 12:46 file3.raw

Create a regular mirror pool:

zpool create testpool mirror $(pwd)/file0.raw $(pwd)/file1.raw
zpool list -v testpool
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                   960M   146K   960M        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M   104K   960M        -         -     0%  0.01%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE

Add the special device to the zpool:

zpool add testpool special mirror $(pwd)/file2.raw $(pwd)/file3.raw
zpool list -v testpool
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  1.88G   190K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M   190K   960M        -         -     0%  0.01%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  mirror-1                 960M      0   960M        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file2.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file3.raw     1G      -      -        -         -      -      -      -    ONLINE

I wasn’t sure whether I could just truncate the backing files for the special device to a larger size while they were part of the pool, so I detached them one by one, grew each file to 2GB with truncate, and then reattached it:

zpool detach testpool /tmp/tests/file2.raw
zpool list -v testpool
truncate -s 2G file2.raw
zpool attach testpool $(pwd)/file3.raw /tmp/tests/file2.raw
zpool list -v testpool

NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  1.88G   194K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M   104K   960M        -         -     0%  0.01%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  /tmp/tests/file3.raw       1G    90K   960M        -         -     0%  0.00%      -    ONLINE
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  1.88G   278K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M  97.5K   960M        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  mirror-1                 960M   180K   960M        -         -     0%  0.01%      -    ONLINE
    /tmp/tests/file3.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file2.raw     2G      -      -        -         -      -      -      -    ONLINE

And for the second “disk”:

zpool detach testpool /tmp/tests/file3.raw
zpool list -v testpool
truncate -s 2G file3.raw
zpool attach testpool $(pwd)/file2.raw /tmp/tests/file3.raw
zpool list -v testpool

NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  1.88G   218K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M    66K   960M        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  /tmp/tests/file2.raw       2G   152K   960M        -        1G     0%  0.01%      -    ONLINE
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  1.88G   296K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M    54K   960M        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  mirror-1                 960M   242K   960M        -        1G     0%  0.02%      -    ONLINE
    /tmp/tests/file2.raw     2G      -      -        -        1G      -      -      -    ONLINE
    /tmp/tests/file3.raw     2G      -      -        -         -      -      -      -    ONLINE

And now, here comes the magic. Time to expand the pool and see if the special device will grow to 2GB:

zpool online -e testpool $(pwd)/file2.raw $(pwd)/file3.raw
zpool list -v testpool
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool                  2.88G   226K  2.87G        -         -     0%     0%  1.00x    ONLINE  -
  mirror-0                 960M    48K   960M        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file0.raw     1G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file1.raw     1G      -      -        -         -      -      -      -    ONLINE
special                       -      -      -        -         -      -      -      -         -
  mirror-1                1.94G   178K  1.94G        -         -     0%  0.00%      -    ONLINE
    /tmp/tests/file2.raw     2G      -      -        -         -      -      -      -    ONLINE
    /tmp/tests/file3.raw     2G      -      -        -         -      -      -      -    ONLINE

Yay! It worked! So, for my actual storage pool, I ended up doing the following:

  • Given that the partitions used by the special device are located at the end of the SSDs, it was easy to expand them using parted’s resizepart command.
  • Run zpool online -e storage partname-1 partname-2 (so no detach/attach was needed here); see the sketch below.
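Spelled out, with placeholder device names and partition numbers standing in for my actual setup (double-check the partition number and end position before resizing), the procedure looks roughly like this:

parted /dev/nvme0n1 resizepart 2 100%   # grow the special-device partition on the first SSD
parted /dev/nvme1n1 resizepart 2 100%   # same for the second SSD
zpool online -e storage partname-1 partname-2
zpool list -v storage                   # the special mirror should now show the extra space

Alternatively, zpool set autoexpand=on storage should let the pool pick up grown partitions automatically, but the explicit zpool online -e works regardless of that setting.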

Don’t forget to clean up the testpool:

zpool destroy testpool
rm -r /tmp/tests


Fixing delayed syncing with Linux Nextcloud client

Recently, I noticed that changed files were not picked up by the Nextcloud client as fast as before. As a result I sometimes missed a file (or changes in a file) on my laptop that had been created on my desktop PC.

Today, I tried to run tail and got the following message:

tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling

This made me realise that the problem with the delayed client sync could be related to the inotify system for monitoring file changes.

It turns out there is a maximum to the number of inotify file watches:

sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 65536

Given that on my desktop the Nextcloud client syncs about 235 GB with my personal Nextcloud server and about 10 GB with two servers for work (including several Git repositories), I could imagine that 65536 watches are not enough. Indeed, manually increasing that number made file syncs more or less instantaneous again:

sysctl -n -w fs.inotify.max_user_watches=524288
sysctl fs.inotify
524288
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 524288

To make these changes permanent I added the following lines at the bottom of /etc/sysctl.conf:

grep -A1 inotify /etc/sysctl.conf
# Increase the number of inotify watches so that the Nextcloud client
# keeps syncing without delay
fs.inotify.max_user_watches=524288

Side note: to apply the changes in the /etc/sysctl.conf file use sudo sysctl -p.
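To see how many watches are actually in use before raising the limit, you can count the inotify entries in /proc. A rough sketch (each line starting with “inotify” in a process’s fdinfo corresponds to one watch; sudo is needed to see other users’ processes):

sudo sh -c 'find /proc/[0-9]*/fdinfo -type f 2>/dev/null | xargs grep -h "^inotify" 2>/dev/null | wc -l'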


Don’t install Citrix’s AppProtection!

Today I returned to my desktop computer after having been away for a couple of days, during which I only used my laptop. Both machines run Ubuntu Linux, currently version 23.04.

The project I was going to work on required R, but for some reason R didn’t start. No error message, only an exit status of 1, and I was back at the shell prompt. Running R --version worked fine, but Rscript failed in the same way as regular R. I tried running R --vanilla and starting R as a different user; nothing helped.

Time to dig deeper. Make sure all Ubuntu packages are up to date. Reinstall the r-base and r-base-core packages, check whether those packages had been recently updated (no), check whether the package versions were identical to the ones on my laptop (where R worked fine). Nothing…

Maybe there is a problem with a (missing) dynamic library? I have had problems with that in the (distant) past, so it is worth a shot:

$ ldd /usr/bin/Rscript
	linux-vdso.so.1 (0x00007ffcac7d4000)
	/usr/local/lib/AppProtection/libAppProtection.so (0x00007f01b9e00000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f01b9bfb000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f01ba204000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f01ba1ff000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f01ba22f000)
	libX11.so.6 => /lib/x86_64-linux-gnu/libX11.so.6 (0x00007f01ba0c1000)
	libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f01ba095000)
	libXi.so.6 => /lib/x86_64-linux-gnu/libXi.so.6 (0x00007f01ba081000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f01b9991000)
	libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x00007f01ba07b000)
	libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f01ba073000)
	libXext.so.6 => /lib/x86_64-linux-gnu/libXext.so.6 (0x00007f01ba05c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f01b98a8000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f01ba038000)
	libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f01b9893000)
	libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x00007f01ba02b000)

That’s strange: I usually don’t have stuff installed in /usr/local. What is this AppProtection library doing there? And then it hit me: I recently (the last time I had used my desktop PC) had to install Citrix’s ICA client to do some remote desktop work for one of my clients. When I installed that package I was asked something about installing some sort of app protection. I had selected “yes”… With a feature named like that I should have known better…

Anyway, time to see if this shared library was indeed the problem. I moved the /usr/local/lib/AppProtection/ directory out of the way and tried to start R. All was fine and dandy again! Except that even an ls command now gave an error:

$ ls /usr/local/lib/AppProtection
ERROR: ld.so: object '/usr/local/lib/AppProtection/libAppProtection.so' from /etc/ld.so.preload cannot be preloaded (cannot open shared object file): ignored.
ls: cannot access '/usr/local/lib/AppProtection': No such file or directory
ERROR: ld.so: object '/usr/local/lib/AppProtection/libAppProtection.so' from /etc/ld.so.preload cannot be preloaded (cannot open shared object file): ignored.

Apparently (obviously?), something still tried to preload the library, even though it no longer existed. It turns out this was done in the file /etc/ld.so.preload:

/usr/local/lib/AppProtection/libAppProtection.so

Given that this was the only content of that file, I opted to just delete it. Finally, my system is back in a working state.
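If you run into the same thing, the check and cleanup boil down to something like this (a sketch; the .bak name is arbitrary, and only remove /etc/ld.so.preload if the Citrix entry is all it contains):

$ cat /etc/ld.so.preload                                                  # what is being preloaded system-wide?
$ sudo mv /usr/local/lib/AppProtection /usr/local/lib/AppProtection.bak   # move the library out of the way
$ sudo rm /etc/ld.so.preload                                              # drop the now-dangling preload entry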

Conclusion: be more careful when installing stuff from external sources and definitely don’t install anything you don’t really need, like Citrix’s App Protection.

P.S. It turns out this was also the reason why the Mattermost app wasn’t running successfully any more.


LXD container snapshots, ZFS snapshots and moving containers

Here, we investigate the behaviour of LXD when moving containers between LXD cluster nodes, with a focus on various types of (filesystem) snapshots.

LXD containers can be snapshotted by LXD itself, but when one uses a ZFS storage backend, one can also use a tool like Sanoid to snapshot a container’s filesystem. When moving an LXD container from one LXD cluster node to another, one of course wants those filesystem snapshots to move along as well. Spoiler: this isn’t always the case.

Let’s create a test container on my home LXD cluster (which uses ZFS as default storage backend), starting on node wiske2:

lxc launch ubuntu:22.04 snapmovetest --target=wiske2

Check the container is running:

lxc list snapmovetest

+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+
|     NAME     |  STATE  |         IPV4          |                   IPV6                    |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+
| snapmovetest | RUNNING | 192.168.10.158 (eth0) | 2a10:3781:782:1:216:3eff:fed5:ef48 (eth0) | CONTAINER | 0         | wiske2   |
+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+

Now, let’s use LXD to create two snapshots:

lxc snapshot snapmovetest "Test1"
sleep 10
lxc snapshot snapmovetest "Test2"

Check the snapshots have been made:

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

At the ZFS level:

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                 24.7M   192G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

All is fine! Now, let’s move the container to node wiske3:

lxc stop snapmovetest
lxc move snapmovetest snapmovetest --target=wiske3
lxc list snapmovetest

+--------------+---------+------+------+-----------+-----------+----------+
|     NAME     |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+------+------+-----------+-----------+----------+
| snapmovetest | STOPPED |      |      | CONTAINER | 2         | wiske3   |
+--------------+---------+------+------+-----------+-----------+----------+

Check the snapshots:

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

At the ZFS level:

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                  749M   202G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

So far so good: snapshots taken with the native LXD toolchain get moved. Now let’s manually create a ZFS snapshot:

zfs snapshot rpool/lxd/containers/snapmovetest@manual_zfs_snap
zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                                USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                   749M   202G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1     60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2     60K      -      748M  -
rpool/lxd/containers/snapmovetest@manual_zfs_snap     0B      -      748M  -

Now move the container back to node wiske2:

lxc move snapmovetest snapmovetest --target=wiske2
lxc list snapmovetest

+--------------+---------+------+------+-----------+-----------+----------+
|     NAME     |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+------+------+-----------+-----------+----------+
| snapmovetest | STOPPED |      |      | CONTAINER | 2         | wiske2   |
+--------------+---------+------+------+-----------+-----------+----------+

What happened to the snapshots?

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                  749M   191G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

Somehow, the ZFS-level snapshot has been removed… I guess this part of the LXD manual should be written in bold (emphasis mine):

LXD assumes that it has full control over the ZFS pool and dataset. Therefore, you should never maintain any datasets or file system entities that are not owned by LXD in a ZFS pool or dataset, because LXD might delete them.

Consequently, in an LXD cluster one shouldn’t use Sanoid to snapshot ZFS-backed LXD container filesystems. Instead, use LXD’s built-in automatic snapshot capabilities (see the snapshots.expiry and snapshots.schedule options).
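As a sketch of what that could look like for the test container (the schedule and expiry values are just examples, not recommendations):

lxc config set snapmovetest snapshots.schedule "0 2 * * *"   # snapshot daily at 02:00
lxc config set snapmovetest snapshots.expiry 2w              # keep each snapshot for two weeks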

Clean up:

lxc delete snapmovetest


Upgrading nodejs to the latest LTS release on Ubuntu 21.10

Today I upgraded the Bash language server (to v3.0.3), after which I noticed that it stopped working. When loading a .bash file, the language server didn’t load and told me to look in the error output for more information. In Emacs, the errors of the Bash language server can be found in the *bash-ls::stderr* buffer, which showed me:

/home/lennart/.emacs.d/.cache/lsp/npm/bash-language-server/lib/node_modules/bash-language-server/node_modules/vscode-jsonrpc/lib/common/linkedMap.js:40
        return this._head?.value;
                          ^

SyntaxError: Unexpected token '.'
    at wrapSafe (internal/modules/cjs/loader.js:915:16)
    at Module._compile (internal/modules/cjs/loader.js:963:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Module.require (internal/modules/cjs/loader.js:887:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (/home/lennart/.emacs.d/.cache/lsp/npm/bash-language-server/lib/node_modules/bash-language-server/node_modules/vscode-jsonrpc/lib/common/api.js:37:21)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)

I re-ran lsp-install-server, which pointed out that I had nodejs v12.22.5 installed and the language server required v14 or higher.

Time to figure out how to install a newer nodejs version on my Ubuntu 21.10 machine. It turns out that v12 is no longer maintained. The current LTS version of nodejs is v16. Here I found instructions on how to install a given version of nodejs on Ubuntu. For v16, this boils down to running

curl -sL https://deb.nodesource.com/setup_16.x | sudo bash -

The script that this command fetches (and executes as root) is quite elaborate, but in the end it simply creates the file /etc/apt/sources.list.d/nodesource.list, with the following contents:

deb [signed-by=/usr/share/keyrings/nodesource.gpg] https://deb.nodesource.com/node_16.x impish main
deb-src [signed-by=/usr/share/keyrings/nodesource.gpg] https://deb.nodesource.com/node_16.x impish main

After that, a simple apt upgrade didn’t suffice. The nodejs upgrade was held back because of a dependency problem. Even an explicit upgrade of the nodejs package didn’t work:

$ sudo apt upgrade nodejs
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libnode72 : Conflicts: nodejs-legacy
E: Broken packages

So, I resorted to a full apt dist-upgrade, which worked fine. After that, I reopened a Bash script and the language server loaded without complaints.
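A quick sanity check that the upgrade actually took (the exact patch release will vary):

node --version   # should now report a v16.x release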


Getting SMART information from a Seagate Expansion Portable drive

A couple of days ago, I bought myself a 5TB Seagate Expansion Portable drive. This is a 2.5″ external spinning hard disk that connects over USB. In a review on a well-known Dutch website for IT enthusiasts, I read that, inside, the drive consists of an ST5000LM000 hard drive and a USB-to-SATA chip (in contrast to other manufacturers like WD, which solder the USB connector directly onto the drive’s circuit board).

After connecting the drive to my computer (that currently runs Ubuntu 21.10), I wanted to see what I could learn about the drive in terms of SMART information. So I tried:

$ sudo smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: scsi error unsupported field in scsi command

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Trying the suggested -T option didn’t help, so I played around with the -d option that I had used in the past when connecting to hard drives behind RAID controllers. That looked better:

$ sudo smartctl -a /dev/sda -T conservative -d sat,auto
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               Seagate
Product:              Expansion HDD
Revision:             1901
Compliance:           SPC-4
User Capacity:        5.000.981.077.504 bytes [5,00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Logical Unit id:      0x3e41434334313346
Serial number:        00000000NACC413F
Device type:          disk
Local Time is:        Thu May 19 21:54:04 2022 CEST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

No Self-tests have been logged

The drive reports the correct size, but also says there is no SMART support. In fact, using -d scsi gave identical output. Because there should only be this USB-to-SATA translation layer, I figured I should somehow be able to get the SMART commands to work. Looking through the smartmontools website, I came across this article that explains the “SAT with UAS” situation. It seems that the high-speed UAS driver disables SAT transfers in certain cases. The workaround is to tell the kernel to use the older usb-storage driver instead of the uas driver. With the lsusb command I identified the vendor and product ID of the drive:

$ lsusb | grep -i seagate
Bus 004 Device 012: ID 0bc2:2037 Seagate RSS LLC Expansion HDD

Next, I made sure to unmount and disconnect the drive and then instructed the kernel to use the old driver for this device:

$ echo "0x0bc2:0x2037:u" | sudo tee /sys/module/usb_storage/parameters/quirks

and reconnected the drive. I verified in the kernel logs that the usb-storage driver was indeed used:

May 19 22:08:30 barabas kernel: usb 4-3.3: new SuperSpeed USB device number 12 using xhci_hcd
May 19 22:08:30 barabas mtp-probe[983206]: checking bus 4, device 12: "/sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.3/usb4/4-3/4-3.3"
May 19 22:08:30 barabas mtp-probe[983206]: bus: 4, device: 12 was not an MTP device
May 19 22:08:30 barabas kernel: usb 4-3.3: New USB device found, idVendor=0bc2, idProduct=2037, bcdDevice=19.01
May 19 22:08:30 barabas kernel: usb 4-3.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
May 19 22:08:30 barabas kernel: usb 4-3.3: Product: Expansion HDD
May 19 22:08:30 barabas kernel: usb 4-3.3: Manufacturer: Seagate
May 19 22:08:30 barabas kernel: usb 4-3.3: SerialNumber: 00000000NACC413F
May 19 22:08:30 barabas kernel: usb 4-3.3: UAS is ignored for this device, using usb-storage instead
May 19 22:08:30 barabas kernel: usb-storage 4-3.3:1.0: USB Mass Storage device detected
May 19 22:08:30 barabas kernel: usb-storage 4-3.3:1.0: Quirks match for vid 0bc2 pid 2037: 800000
May 19 22:08:30 barabas kernel: scsi host6: usb-storage 4-3.3:1.0

Notice the “UAS is ignored” message. And lo and behold, smartctl now works and shows all relevant information:

$ sudo smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 2.5 5400
Device Model:     ST5000LM000-2U8170
Serial Number:    WCJ6AG24
LU WWN Device Id: 5 000c50 0e0939684
Firmware Version: 0001
User Capacity:    5.000.981.078.016 bytes [5,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu May 19 22:30:52 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x71) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 827) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   067   065   006    Pre-fail  Always       -       5367808
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       3765
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0 (86 255 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   069   040    Old_age   Always       -       31 (Min/Max 29/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       23
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       0 (7 164 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1723222
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3644586

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Now that I know how to do this, let’s undo the “use usb-storage driver instead of the uas driver” workaround (alternatively, a reboot should also work, but who wants that?):

$ echo "" | sudo tee /sys/module/usb_storage/parameters/quirks

I will use this drive as a backup drive while I am travelling, so the aim of this post is not only to inform you as my reader(s), but also to remind my future self of how I did this. Now I only need to remember to check the SMART values every once in a while :-).
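Should I ever want the workaround to survive reboots instead, a modprobe option ought to do it (untested on my side; the file name is arbitrary, and if usb-storage is built into the kernel the same quirks value would have to go on the kernel command line instead):

$ echo "options usb-storage quirks=0bc2:2037:u" | sudo tee /etc/modprobe.d/seagate-expansion.conf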


Using the Lenovo Thunderbolt 3 Essential Dock with Linux

Today the Lenovo Thunderbolt 3 Essential dock I had ordered just before the new year arrived. My current laptop is a 6th Gen Lenovo Thinkpad X1 Carbon, which actually has two Thunderbolt 3 (TB3) ports. For a while the cable mess on my desk had been bothering me, and a dock looked like a good way to get rid of both the clutter and the fact that I had to plug in a power cable, an HDMI cable, a USB cable and a network cable. The latter was especially tricky, because my laptop doesn’t have a dedicated ethernet port, but instead has a dongle that plugs into one of the TB3 ports. Of course, I didn’t want to end up on the road without the dongle, so double-checking that I had it in my bag was a regular worry.

With the TB3 Essential dock all this should be over. The dock is pretty well equipped:

  • 1 ethernet port (1Gbps)
  • 1 HDMI 2.0 port
  • 1 DisplayPort 1.4
  • 2 USB A 3.0 1.5Gbps ports
  • 2 USB C 10Gbps ports (no video support)
  • 1 3.5 mm audio jack.

The one thing that remained to be seen, of course, was whether all those ports would also work under Linux. I had read some promising reports about other TB3-based docks from Lenovo, so I decided to order it.

After connecting the various cables to the dock came the moment of truth: I plugged the TB3 cable into the laptop. And… the display connected to the HDMI port lit up. So far, so good! No USB functionality, however. Time to dive into the BIOS, because I remembered having seen some settings there, including some security-related ones.

In the BIOS I couldn’t change the TB3 security setting because, as the help message explained, the graphics memory was set to 512MB. I’m not sure why this is an issue, but I looked up the graphics RAM setting and reduced it to 256MB. Next, I went back to the TB settings and set the security level to the first option: “No security”, followed by a reboot. Now, everything worked. That is one step in the right direction.

However, I wasn’t willing to forgo all security, so I went back to the BIOS settings and set the security level to “Secure Connection” (according to this blog post at dell.com this is security level 2, or SL2). I rebooted and indeed: no USB. So I went to Ubuntu’s Thunderbolt settings, where I had to press the ‘Unlock’ button in the top right corner, after which I could click a button to authorise the dock. After that, all connections worked again. From the command line, the boltctl utility can be used to see more information on the connected Thunderbolt devices. This is the output for an unauthorised device:

$ boltctl list
 ● Lenovo Thunderbolt 3 Essential Dock
   ├─ type:          peripheral
   ├─ name:          Thunderbolt 3 Essential Dock
   ├─ vendor:        Lenovo
   ├─ uuid:          00b00089-417d-0801-ffff-ffffffffffff
   ├─ status:        connected
   │  ├─ domain:     ca030000-0070-6f08-2382-4312b0238921
   │  └─ authflags:  none
   ├─ connected:     Mon 04 Jan 2021 16:26:45 UTC
   └─ stored:        Mon 04 Jan 2021 16:20:57 UTC
      ├─ policy:     iommu
      └─ key:        no

And this is the output after clicking the “Authorise” button in the Ubuntu settings:

$ boltctl list
 ● Lenovo Thunderbolt 3 Essential Dock
   ├─ type:          peripheral
   ├─ name:          Thunderbolt 3 Essential Dock
   ├─ vendor:        Lenovo
   ├─ uuid:          00b00089-417d-0801-ffff-ffffffffffff
   ├─ status:        authorized
   │  ├─ domain:     ca030000-0070-6f08-2382-4312b0238921
   │  └─ authflags:  none
   ├─ authorized:    Mon 04 Jan 2021 16:30:07 UTC
   ├─ connected:     Mon 04 Jan 2021 16:26:45 UTC
   └─ stored:        Mon 04 Jan 2021 16:20:57 UTC
      ├─ policy:     iommu
      └─ key:        yes (new)

Nice!

However, my joy was short-lived, because after disconnecting and reconnecting the dock, the USB ports had stopped working again and the device had to be authorised again. This seemed tedious, so I read further in the boltctl man page and found that there is a way to enroll a device permanently. So that is what I did. First I removed the dock’s UUID from the database:

$ boltctl forget 00b00089-417d-0801-ffff-ffffffffffff

And then added it again, this time with the auto policy:

$ boltctl enroll --policy auto 00b00089-417d-0801-ffff-ffffffffffff
 ● Lenovo Thunderbolt 3 Essential Dock
   ├─ type:          peripheral
   ├─ name:          Thunderbolt 3 Essential Dock
   ├─ vendor:        Lenovo
   ├─ uuid:          00b00089-417d-0801-ffff-ffffffffffff
   ├─ dbus path:     /org/freedesktop/bolt/devices/00b00089_417d_0801_ffff_ffffffffffff
   ├─ status:        authorized
   │  ├─ domain:     ca030000-0070-6f08-2382-4312b0238921
   │  ├─ parent:     ca030000-0070-6f08-2382-4312b0238921
   │  ├─ syspath:    /sys/devices/pci0000:00/0000:00:1d.0/0000:05:00.0/0000:06:00.0/0000:07:00.0/domain0/0-0/0-1
   │  └─ authflags:  secure
   ├─ authorized:    Mon 04 Jan 2021 22:33:08 UTC
   ├─ connected:     Mon 04 Jan 2021 22:32:43 UTC
   └─ stored:        Mon 04 Jan 2021 16:20:57 UTC
      ├─ policy:     auto
      └─ key:        yes

As you can see from the boltctl list output below, the policy is now set to auto (instead of the previous iommu):

$ boltctl list
 ● Lenovo Thunderbolt 3 Essential Dock
   ├─ type:          peripheral
   ├─ name:          Thunderbolt 3 Essential Dock
   ├─ vendor:        Lenovo
   ├─ uuid:          00b00089-417d-0801-ffff-ffffffffffff
   ├─ status:        authorized
   │  ├─ domain:     ca030000-0070-6f08-2382-4312b0238921
   │  └─ authflags:  secure
   ├─ authorized:    Mon 04 Jan 2021 22:41:46 UTC
   ├─ connected:     Mon 04 Jan 2021 22:41:45 UTC
   └─ stored:        Mon 04 Jan 2021 16:20:57 UTC
      ├─ policy:     auto
      └─ key:        yes

Now I can disconnect and reconnect the dock without problems :-).

I also tested the 3.5mm audio port, which worked (at least for listening; I didn’t have a headset with a microphone at hand). Same for the two USB A and the two USB C ports. Finally, I tested the DisplayPort and that worked too. In fact, connecting my 3440×1440 screen via both HDMI and DP to the dock worked fine. The Ubuntu display settings showed a three-monitor setup: two 3440×1440 screens and the laptop’s own screen at 2560×1440.

So, in conclusion: the Lenovo Thunderbolt 3 Essential dock is fully supported under Ubuntu 20.04.

Thanks to this blog post at FunnelFiasco.com for pointing me to the boltctl utility!


And… we’re back!

The title of this blog post refers to two things: first of all, it has been more than 4 years since my last post here. Incredible… Looking back, I think “too busy” is the main reason for this. Both work and private life have eaten up a lot of my ‘spare’ time. So, as many irregular bloggers do, I hereby declare that I will try to post more frequently :-).

The second reason the title of this post is appropriate is that the site has been down for several months (since April this year). This was because I made the stupid mistake of wanting to do too many things at the same time. Let me explain.

The webserver that served this blog was a KVM virtual machine running on my home server. The VM was running a version of Ubuntu that was EOL. Moreover, I wasn’t happy with the fact that I had used the Ubuntu WordPress package instead of the official WordPress installation method: it meant depending on the Ubuntu packagers to update WordPress, which didn’t happen often enough for my liking. With the official installation method you can configure WordPress to (semi-)automatically update itself, which is much better from a security perspective.

I had already migrated my other KVM VMs to LXD containers, with those providing web servers now running behind an Apache reverse proxy setup. I wanted the same for this last remaining VM.

On top of that, the site wasn’t properly configured to use HTTPS (I think I had a self-signed certificate, but no automatic redirection of HTTP to HTTPS). Thanks to Let’s Encrypt there is no excuse for not running a proper HTTPS server, so users can browse your site without any eavesdropping.

So, one fine weekend in April I bit the bullet. But I bit off too much by doing it all at once: install Apache in a fresh LXD container, put it behind a reverse proxy, install the latest WordPress, restore the files from the old VM, all the while trying to implement an HTTP-to-HTTPS redirect.

Of course, that didn’t work out. The Debian/Ubuntu configuration of WordPress is different from just editing the default WordPress config file: Ubuntu ships its own config file in /etc/, which is included by the one from WordPress. Also, it turned out that simply redirecting HTTP to HTTPS wouldn’t work, because I got ‘mixed content’ warnings all over the place. To fix those, I had to do a search-replace of HTTP to the secure variant in the site’s database and add some extra lines to the reverse proxy config (more details hopefully soon in a separate post). And then it turned out I had forgotten to enable Apache’s mod_rewrite in the LXD container… Before you know it, you lose sight of what you did, when and why: which config file you changed, what you disabled, what you enabled, etc. Especially because all the digging, reading and fixing had to be spread across multiple evenings.
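For the database part, a tool like wp-cli can do the heavy lifting of the HTTP-to-HTTPS search-replace. A sketch with a placeholder domain (run it from the WordPress directory, and do a dry run first):

wp search-replace 'http://blog.example.org' 'https://blog.example.org' --dry-run
wp search-replace 'http://blog.example.org' 'https://blog.example.org'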

Nevertheless, it was a good learning experience :-). Don’t start this kind of upgrade if you don’t have ample time for follow-up, do it in steps, and make a proper plan so that dependencies between the steps are clear and you have a functional setup after each step.

Looking back, the following would have been a much better battle plan than “let’s go, this shouldn’t be hard, the migrations of other VMs/services went fine”:

  • Put the VM behind the reverse proxy.
  • Transition to Let’s Encrypt SSL certificates and set up HTTP-to-HTTPS redirection.
  • Configure the container with the upgraded WordPress installation and migrate the data from the old VM.

There might have been some issues with outdated PHP versions in the old VM, but those could probably have been mitigated with some extra Ubuntu upgrades (focusing only on keeping the web server alive) or a PPA with updated PHP packages. All the while I could have at least kept the VM running (only disabling it while I actually worked on the upgrade). And, of course, I could have restored a backup of the VM while trying and documenting the upgrade path, but every time I nailed an issue, I expected it to be the last one.

Keep on learning!


Fixing Emacs tramp mode when using zsh

Today I finally took some time to fix a long-standing problem: when trying to edit a file on a remote host using Emacs tramp mode, long time-outs occurred when typing the remote file name (after hitting C-x C-f). These time-outs were so long and occurred after each key press that tramp was effectively useless.

After some digging (e.g. excluding helm as the problem source) I found this entry in the Emacs Wiki, which basically told me that using zsh (the Z shell) on the remote host could be the culprit. Indeed, adding

[[ $TERM == "dumb" ]] && unsetopt zle && PS1='$ ' && return

at the top of my ~/.zshrc file solved the problem instantly. If the terminal is of the dumb type (which is the case for tramp), this line disables the Z shell line editor (zle), replaces the shell prompt with a very simple one (a $ followed by a space) and stops processing the rest of .zshrc.


Installing parted during Ubuntu installation

When installing Ubuntu (I guess a regular Debian installation won’t be any different), I sometimes want to manually create or change partitions (by jumping to another terminal, e.g. using Alt-F2) before doing the actual install. My preferred tool for that is parted; however, on regular Ubuntu installation images (at least the server variety), parted isn’t available from the console by default.

Today I noticed that (at least on today’s daily image of Ubuntu 16.04 Xenial), a udeb file for parted is available. This is how you install it:

udpkg -i /cdrom/pool/main/p/parted/parted-udeb_3.2-15_amd64.udeb

after which you can use parted to your heart’s content.
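For example, to inspect a disk and create a fresh GPT layout (the device name and layout are purely illustrative; double-check before pointing this at a real disk):

parted /dev/sda print                                    # show the current partition table
parted --script /dev/sda mklabel gpt                     # create a new GPT label
parted --script /dev/sda mkpart primary ext4 1MiB 100%   # one partition spanning the disk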

For more information on udebs see the Debian Installer Internals documentation.

