OpenSSH connections with post-quantum key exchange through WireGuard tunnel

The other day, I set up a machine using a fairly standard Fedora 43 Server installation with a WireGuard VPN tunnel to another machine running a fairly standard Arch Linux installation (of course).

When I tried to SSH from the Arch machine to the Fedora machine, the OpenSSH client would hang.

Verbose post-quantum OpenSSH key exchange

It was weird, but I thought: well, we used to teach that stuff several years ago as part of the Security of Information and Communication Systems course, so it should be possible to figure out what is going on.

I started debugging the issue by enabling verbose mode:

ssh -v my-fedora-machine

(...)
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: mlkem768x25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Key exchange hangs waiting for the reply. Could it be some unreadable file on the server? I went down the list of the usual suspects and disabled SELinux (hello, setenforce 0, my old friend) before even looking at the audit log. (Sorry, Major Hayden, I know I shouldn't have done that.) As you might guess, given that the length of this blog post goes beyond this paragraph, disabling SELinux did not help.

Legacy pre-quantum key exchange comes to the rescue

Despite the OpenSSH major version being 10 on both sides, I wondered if there could be a case of some subtle key exchange incompatibility. After all, the post-quantum ML-KEM key exchange algorithm (mlkem768x25519-sha256) is somewhat new. Just in case, I tried using a non-post-quantum key exchange algorithm instead:

ssh -v -oKexAlgorithms=curve25519-sha256 my-fedora-machine

(...)
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:0ru7bD+izhNW+qTNFkxqHtDoiyDRNLUHHvvuF0O0I84
(...)

That works, hm, very interesting!

I would have almost left it at that and tried it again after OpenSSH version updates (or whenever I remembered). However, some hours later, I randomly noticed the SSH session hanging on large command outputs, such as running dmesg. Wait, I know how to solve that, but could it be that post-quantum OpenSSH key exchange also fails for the same underlying reason?
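
To get a feeling for why the two key exchanges could behave differently when large packets get lost, it helps to compare how many bytes of key material each side has to send. The following is a back-of-the-envelope sketch, not something taken from OpenSSH itself: the function names are made up for illustration, and the sizes are those of an X25519 public key (32 bytes), an ML-KEM-768 encapsulation key (1184 bytes), and an ML-KEM-768 ciphertext (1088 bytes). SSH packet framing, the host key, and the signature in the server's reply are deliberately left out, so the real messages are somewhat larger.

```shell
#!/bin/sh
# Rough key exchange payload sizes in bytes. Only the key material is counted;
# SSH framing, the host key, and the signature come on top of these numbers.

X25519_PUBLIC_KEY=32        # X25519 public key
MLKEM768_ENCAPS_KEY=1184    # ML-KEM-768 encapsulation (public) key
MLKEM768_CIPHERTEXT=1088    # ML-KEM-768 ciphertext

# Bytes of key material the client sends in its key exchange init message.
kex_client_init_size() {
    case "$1" in
        curve25519-sha256)     echo "$X25519_PUBLIC_KEY" ;;
        mlkem768x25519-sha256) echo $(($MLKEM768_ENCAPS_KEY + $X25519_PUBLIC_KEY)) ;;
        *) echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

# Bytes of key material the server sends back in its reply.
kex_server_reply_size() {
    case "$1" in
        curve25519-sha256)     echo "$X25519_PUBLIC_KEY" ;;
        mlkem768x25519-sha256) echo $(($MLKEM768_CIPHERTEXT + $X25519_PUBLIC_KEY)) ;;
        *) echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

for alg in curve25519-sha256 mlkem768x25519-sha256; do
    printf '%-24s client init: %4d  server reply: %4d\n' \
        "$alg" "$(kex_client_init_size "$alg")" "$(kex_server_reply_size "$alg")"
done
```

With curve25519-sha256, both directions carry 32 bytes of key material, while with mlkem768x25519-sha256 each direction carries more than a kilobyte. Once the framing, host key, and signature are added, those packets get much closer to the size where the MTU starts to matter.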

WireGuard interface MTU setting

Of course, I forgot to reduce the maximum transmission unit, that is, add the MTU = 1280 setting in the WireGuard interface config. (Whether the MTU value of 1280 is optimal is debatable at best, but it has worked for me thus far.) Because of the missing setting, the large packets aren't properly fragmented and cannot pass through.
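
The arithmetic behind the MTU choice can be sketched as follows. This is an illustration under stated assumptions rather than a recommendation: the helper names are made up, the WireGuard overhead is taken as 60 bytes over IPv4 and 80 bytes over IPv6 (outer IP header, UDP header, and the WireGuard data header with its authentication tag), and the TCP payload size assumes 20-byte TCP headers without options.

```shell
#!/bin/sh
# Per-packet overhead WireGuard adds on the outer link, in bytes:
# outer IP header (20 for IPv4, 40 for IPv6) + UDP header (8)
# + WireGuard data header and authentication tag (32).
wg_overhead() {
    case "$1" in
        4) echo $((20 + 8 + 32)) ;;
        6) echo $((40 + 8 + 32)) ;;
        *) echo "IP version must be 4 or 6" >&2; return 1 ;;
    esac
}

# Largest tunnel MTU that still fits into one packet on the outer link.
# Arguments: outer link MTU, outer IP version.
wg_max_tunnel_mtu() {
    echo $(($1 - $(wg_overhead "$2")))
}

# Size on the outer link of a full-sized packet coming from the tunnel.
# Arguments: tunnel MTU, outer IP version.
wg_outer_packet_size() {
    echo $(($1 + $(wg_overhead "$2")))
}

# Largest TCP payload per packet inside the tunnel (no TCP options).
# Arguments: tunnel MTU, inner IP version.
tcp_mss() {
    case "$2" in
        4) echo $(($1 - 20 - 20)) ;;
        6) echo $(($1 - 40 - 20)) ;;
        *) echo "IP version must be 4 or 6" >&2; return 1 ;;
    esac
}

echo "max tunnel MTU on a 1500-byte link, IPv6 outside: $(wg_max_tunnel_mtu 1500 6)"
echo "outer packet size for tunnel MTU 1280, IPv6 outside: $(wg_outer_packet_size 1280 6)"
echo "TCP payload per packet for tunnel MTU 1280, IPv4 inside: $(tcp_mss 1280 4)"
```

With a tunnel MTU of 1280, even the IPv6 case stays at 1360 bytes on the outer link, which leaves some room on paths whose MTU is below the usual 1500 bytes. 1280 is also the smallest MTU that IPv6 allows on a link, which is one reason it is a common conservative choice.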

I edited the config to have:

[Interface]
MTU = 1280

After restarting the wg-quick@.service, the setting is applied:

ip link

(...)
3: wgbtw: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none
(...)
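
One way to check whether packets of a given size actually make it through the tunnel is to ping with fragmentation prohibited. The sketch below assumes the iputils version of ping found on most Linux distributions (its -M do option sets the don't-fragment flag), assumes IPv4 inside the tunnel, and reuses my-fedora-machine as the host name; the helper names are made up for illustration.

```shell
#!/bin/sh
# ICMP echo payload size that produces an IPv4 packet of exactly the given
# size: 20 bytes of IPv4 header and 8 bytes of ICMP header are subtracted.
ping_payload_for_packet_size() {
    echo $(($1 - 20 - 8))
}

# Send three pings of the given total packet size with fragmentation
# prohibited. Arguments: host name, total IPv4 packet size in bytes.
probe_packet_size() {
    ping -c 3 -M do -s "$(ping_payload_for_packet_size "$2")" "$1"
}

# Usage (not run here, since it needs the tunnel to be up):
#   probe_packet_size my-fedora-machine 1280
#   probe_packet_size my-fedora-machine 1420
```

If the larger size times out or fails with an error while the smaller one gets replies, packets of the larger size are not making it through the tunnel.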

Does the original post-quantum key exchange algorithm also work after reducing the interface MTU? Sure enough, it does. OpenSSH with post-quantum cryptography now goes past the expecting SSH2_MSG_KEX_ECDH_REPLY message:

ssh -v -oKexAlgorithms=mlkem768x25519-sha256 my-fedora-machine

(...)
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: mlkem768x25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:0ru7bD+izhNW+qTNFkxqHtDoiyDRNLUHHvvuF0O0I84
(...)

Reducing the MTU is a good practice in general, but, as you can see, it is possible to forget it and still have a VPN tunnel that works reasonably well with pre-quantum key exchange and small amounts of text in command outputs. (In fact, the recommendation to do so appears in the ArchWiki page about WireGuard, where pages lean toward being concise rather than exhaustive.) To put it plainly, the issue in this particular case is that the missing WireGuard interface MTU setting is hardly the first suspect when OpenSSH key exchange fails to complete.

Related issues elsewhere

Is this behavior of post-quantum cryptographic algorithms well known, both in OpenSSH and elsewhere?

Well, yeah, sort of: both ControlPlane and Sophos wrote about the large ClientHello message with ML-KEM having issues with firewalls and requiring a reduced MTU setting. The former also mentions OpenSSH explicitly.

As explained above, the issue is not limited to firewalls, but is also relevant for WireGuard (and possibly other VPNs), in which case the interface MTU setting isn't the obvious suspect that comes to mind. Finally, this behavior of OpenSSH through WireGuard is unlikely to be limited to Linux either; the FreeBSD 15.0-RELEASE announcement mentions:

OpenSSH has been upgraded to 10.0p2 which includes support for quantum-resistant key agreement by default.

It should be reasonable to expect the same behavior on FreeBSD 15.0-RELEASE and newer as well.
