BGP (clevertor)
BGP neighbour ASes (and how many routes we import from each)
As of 2022-11-17
[root@clevertor]:~ # birdc show route all | grep as_path | cut -d ' ' -f 2 | sed 's/^/AS/' | sort | uniq -c
2 AS112
6 AS201701
4 AS204911
10 AS206356
30 AS206813
4 AS208727
2 AS209894
4 AS212989
8 AS213106
12 AS250
104 AS29670
1076 AS3320
10 AS44194
1687228 AS47147 # AS-ANX // ANEXIA Internetdienstleistungs GmbH
12 AS48777
6 AS49009
15 AS50472
6 AS60729
10 AS64475
424487 AS8560 # IONOS-AS // 1&1 IONOS SE
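The pipeline above takes the first AS number on every BGP.as_path line (the neighbour the route was learned from) and counts how often each one occurs. A minimal Python sketch of the same counting, assuming the output of birdc show route all has been captured as text. The function name count_neighbour_as and the sample input are illustrative only, not part of the setup:

```python
from collections import Counter


def count_neighbour_as(birdc_output: str) -> Counter:
    """Count routes per neighbour AS (first AS on each BGP.as_path line)."""
    counts = Counter()
    for line in birdc_output.splitlines():
        if "as_path" not in line:
            continue
        # Line looks like "\tBGP.as_path: 47147 24679"; the first AS after
        # the colon is the neighbour the route was received from.
        _, _, path = line.partition(":")
        hops = path.split()
        if hops:  # skip routes with an empty AS path
            counts["AS" + hops[0]] += 1
    return counts


# Illustrative input modelled on the birdc output format, not real data.
sample = """
\tBGP.as_path: 47147 24679
\tBGP.as_path: 47147 24679
\tBGP.as_path: 8560 3320
\tBGP.as_path:
"""
for asn, n in sorted(count_neighbour_as(sample).items()):
    print(n, asn)
```

Unlike the shell pipeline, this sketch skips lines with an empty AS path instead of counting them under a bare "AS" label.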
Route to leintor
(As of 2022-11-17)
This is what the route looks like:
[root@clevertor]:~ # host leintor.e.ffh.zone
leintor.e.ffh.zone has address 81.3.6.94
leintor.e.ffh.zone has IPv6 address 2a02:790:1:ff::1001
[root@clevertor]:~ # ip route get 81.3.6.94 from 45.12.203.1
81.3.6.94 from 45.12.203.1 via 185.1.74.35 dev vlan2000 table cix uid 0
cache
[root@clevertor]:~ # birdc show route for 81.3.6.94 all
BIRD 2.0.7 ready.
Table master4:
81.3.0.0/18 unicast [cix1_ip4 2022-11-16 from 185.1.74.1] * (100) [AS24679i]
via 185.1.74.35 on vlan2000
Type: BGP univ
BGP.origin: IGP
BGP.as_path: 47147 24679
BGP.next_hop: 185.1.74.35
BGP.med: 0
BGP.local_pref: 100
BGP.community: (47147,1500) (47147,2000) (47147,2101) (47147,2301) (47147,2401) (47147,2702) (65101,1001) (65102,1000) (65103,276) (65104,150)
BGP.large_community: (6695, 1000, 1) (57555, 1900, 0) (57555, 1901, 200) (57555, 1901, 201)
unicast [cix2_ip4 2022-11-16 from 185.1.74.2] (100) [AS24679i]
via 185.1.74.35 on vlan2000
Type: BGP univ
BGP.origin: IGP
BGP.as_path: 47147 24679
BGP.next_hop: 185.1.74.35
BGP.med: 0
BGP.local_pref: 100
BGP.community: (47147,1500) (47147,2000) (47147,2101) (47147,2301) (47147,2401) (47147,2702) (65101,1001) (65102,1000) (65103,276) (65104,150)
BGP.large_community: (6695, 1000, 1) (57555, 1900, 0) (57555, 1901, 200) (57555, 1901, 201)
This means packets towards leintor should be handed to AS47147 (Anexia) as the next hop. traceroute confirms this:
[root@clevertor]:~ # traceroute -s 45.12.203.1 leintor.e.ffh.zone
traceroute to leintor.e.ffh.zone (81.3.6.94), 30 hops max, 60 byte packets
1 anexia.w19.community-ix.de (185.1.74.35) 0.792 ms 1.203 ms 1.186 ms
2 cr01.h.as24679.net (80.81.192.230) 15.764 ms 15.765 ms 15.701 ms
3 ar11.h.as24679.net (195.47.229.39) 16.072 ms 16.414 ms 16.207 ms
4 leintor.e.ffh.zone (81.3.6.94) 15.927 ms 15.990 ms 15.962 ms
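Both BGP candidates in the birdc output cover leintor's address with the prefix 81.3.0.0/18 and name 185.1.74.35 as next hop, which is also the first traceroute hop. A small sketch for double-checking such a prefix lookup by hand, using only Python's standard ipaddress module. The helper name longest_match and the additional default route in the candidate list are assumptions made for illustration:

```python
import ipaddress


def longest_match(address, prefixes):
    """Return the most specific prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    matching = [
        net for net in map(ipaddress.ip_network, prefixes) if addr in net
    ]
    return max(matching, key=lambda net: net.prefixlen, default=None)


# 81.3.0.0/18 is taken from the birdc output; 0.0.0.0/0 is a hypothetical
# less specific route added only to show that the /18 wins.
print(longest_match("81.3.6.94", ["0.0.0.0/0", "81.3.0.0/18"]))
```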
Disclaimer: Do not use MTR!
mtr claims that the route goes via AS56382 (vserver.site) and then AS50629 (lwlcom):
[root@clevertor]:~ # mtr -w -z -r -4 -a 45.12.203.1 leintor.e.ffh.zone -c 1
Start: 2022-11-17T22:51:57+0100
HOST: clevertor Loss% Snt Last Avg Best Wrst StDev
1. AS56382 ae0-666.mx240.ffm2.vserver.site 0.0% 1 8.6 8.6 8.6 8.6 0.0
2. AS50629 ae0-502.cr10.fra02.lwlcom.net 0.0% 1 8.6 8.6 8.6 8.6 0.0
3. AS50629 ae8-100.cr11.fra01.lwlcom.net 0.0% 1 1.1 1.1 1.1 1.1 0.0
4. AS50629 ae1-100.cr10.fra01.lwlcom.net 0.0% 1 0.9 0.9 0.9 0.9 0.0
5. AS??? cr01.h.as24679.net 0.0% 1 16.0 16.0 16.0 16.0 0.0
6. AS24679 ar11.h.as24679.net 0.0% 1 16.3 16.3 16.3 16.3 0.0
7. AS24679 leintor.e.ffh.zone 0.0% 1 16.1 16.1 16.1 16.1 0.0
This turned out to be a bug in mtr:
https://github.com/traviscross/mtr/issues/250
(mtr seems to use the correct source address, but it sends the packets out of the wrong interface.)
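To catch this kind of mismatch, the kernel's routing decision can be compared with the first hop that a trace actually reports. A rough sketch, assuming the one-line ip route get output format shown earlier. parse_route_get and first_hop_matches are hypothetical helper names:

```python
import re


def parse_route_get(line):
    """Extract next hop and outgoing device from an `ip route get` line."""
    via = re.search(r"\bvia (\S+)", line)
    dev = re.search(r"\bdev (\S+)", line)
    return {
        "via": via.group(1) if via else None,  # None for on-link routes
        "dev": dev.group(1) if dev else None,
    }


def first_hop_matches(route_line, first_hop_ip):
    """True if the first hop seen by a trace is the kernel's next hop."""
    return parse_route_get(route_line)["via"] == first_hop_ip


# Line copied from the `ip route get` output above.
route = "81.3.6.94 from 45.12.203.1 via 185.1.74.35 dev vlan2000 table cix uid 0"
print(parse_route_get(route))
print(first_hop_matches(route, "185.1.74.35"))  # first traceroute hop
```

The comparison needs the first hop as an IP address, so a trace that only prints host names has to be resolved first.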
Speedtests
2022-11-21:
lon.speedtest.clouvider.net (Tx)
[root@clevertor]:~ # iperf3 -4 -B 45.12.203.1 -c lon.speedtest.clouvider.net -t 60 -P 15 -R -i 0
Connecting to host lon.speedtest.clouvider.net, port 5201
Reverse mode, remote host lon.speedtest.clouvider.net is sending
...
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 3.51 GBytes 503 Mbits/sec 1401 sender
[ 5] 0.00-60.00 sec 3.51 GBytes 502 Mbits/sec receiver
[ 7] 0.00-60.00 sec 3.37 GBytes 483 Mbits/sec 1093 sender
[ 7] 0.00-60.00 sec 3.37 GBytes 482 Mbits/sec receiver
[ 9] 0.00-60.00 sec 3.21 GBytes 459 Mbits/sec 1413 sender
[ 9] 0.00-60.00 sec 3.20 GBytes 458 Mbits/sec receiver
[ 11] 0.00-60.00 sec 3.05 GBytes 437 Mbits/sec 1468 sender
[ 11] 0.00-60.00 sec 3.05 GBytes 437 Mbits/sec receiver
[ 13] 0.00-60.00 sec 3.07 GBytes 439 Mbits/sec 1594 sender
[ 13] 0.00-60.00 sec 3.07 GBytes 439 Mbits/sec receiver
[ 15] 0.00-60.00 sec 3.05 GBytes 437 Mbits/sec 1267 sender
[ 15] 0.00-60.00 sec 3.05 GBytes 436 Mbits/sec receiver
[ 17] 0.00-60.00 sec 3.24 GBytes 463 Mbits/sec 1439 sender
[ 17] 0.00-60.00 sec 3.23 GBytes 463 Mbits/sec receiver
[ 19] 0.00-60.00 sec 2.85 GBytes 409 Mbits/sec 1230 sender
[ 19] 0.00-60.00 sec 2.85 GBytes 408 Mbits/sec receiver
[ 21] 0.00-60.00 sec 3.57 GBytes 511 Mbits/sec 1252 sender
[ 21] 0.00-60.00 sec 3.56 GBytes 510 Mbits/sec receiver
[ 23] 0.00-60.00 sec 3.27 GBytes 468 Mbits/sec 1156 sender
[ 23] 0.00-60.00 sec 3.27 GBytes 468 Mbits/sec receiver
[ 25] 0.00-60.00 sec 3.13 GBytes 448 Mbits/sec 1666 sender
[ 25] 0.00-60.00 sec 3.12 GBytes 447 Mbits/sec receiver
[ 27] 0.00-60.00 sec 3.22 GBytes 461 Mbits/sec 1466 sender
[ 27] 0.00-60.00 sec 3.21 GBytes 460 Mbits/sec receiver
[ 29] 0.00-60.00 sec 2.73 GBytes 390 Mbits/sec 1632 sender
[ 29] 0.00-60.00 sec 2.72 GBytes 390 Mbits/sec receiver
[ 31] 0.00-60.00 sec 2.82 GBytes 404 Mbits/sec 1296 sender
[ 31] 0.00-60.00 sec 2.82 GBytes 403 Mbits/sec receiver
[ 33] 0.00-60.00 sec 2.49 GBytes 356 Mbits/sec 1214 sender
[ 33] 0.00-60.00 sec 2.48 GBytes 355 Mbits/sec receiver
[SUM] 0.00-60.00 sec 46.6 GBytes 6.67 Gbits/sec 20587 sender
[SUM] 0.00-60.00 sec 46.5 GBytes 6.66 Gbits/sec receiver
iperf Done.
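The [SUM] line can be cross-checked against the per-stream rows. A minimal sketch that adds up the bitrates from the iperf3 summary text. It assumes all per-stream rows are reported in Mbits/sec, as in the output above, and the function name sum_stream_bitrates is illustrative:

```python
import re

# Matches per-stream summary rows such as
# "[ 5] 0.00-60.00 sec 3.51 GBytes 503 Mbits/sec 1401 sender".
# The [SUM] row is skipped because its ID is not numeric.
ROW = re.compile(
    r"\[\s*(\d+)\]\s+\S+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
    r"\s+(?:\d+\s+)?(sender|receiver)"
)


def sum_stream_bitrates(text, role="sender"):
    """Sum the Mbits/sec values of all per-stream rows for the given role."""
    return sum(
        float(m.group(2)) for m in ROW.finditer(text) if m.group(3) == role
    )


# A few rows copied from the test above.
sample = """
[ 5] 0.00-60.00 sec 3.51 GBytes 503 Mbits/sec 1401 sender
[ 5] 0.00-60.00 sec 3.51 GBytes 502 Mbits/sec receiver
[ 7] 0.00-60.00 sec 3.37 GBytes 483 Mbits/sec 1093 sender
[ 9] 0.00-60.00 sec 3.21 GBytes 459 Mbits/sec 1413 sender
[SUM] 0.00-60.00 sec 46.6 GBytes 6.67 Gbits/sec 20587 sender
"""
print(sum_stream_bitrates(sample), "Mbits/sec")
```

Adding up all 15 sender rows of the test above by hand gives 6668 Mbits/sec, which is consistent with the reported 6.67 Gbits/sec.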
lon.speedtest.clouvider.net (Rx)
[root@clevertor]:~ # iperf3 -4 -B 45.12.203.1 -c lon.speedtest.clouvider.net -t 60 -P 15 -i 0
Connecting to host lon.speedtest.clouvider.net, port 5201
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 1.78 GBytes 254 Mbits/sec 572 sender
[ 5] 0.00-60.00 sec 1.77 GBytes 254 Mbits/sec receiver
[ 7] 0.00-60.00 sec 2.36 GBytes 338 Mbits/sec 1283 sender
[ 7] 0.00-60.00 sec 2.36 GBytes 337 Mbits/sec receiver
[ 9] 0.00-60.00 sec 1.84 GBytes 263 Mbits/sec 785 sender
[ 9] 0.00-60.00 sec 1.83 GBytes 263 Mbits/sec receiver
[ 11] 0.00-60.00 sec 1.71 GBytes 245 Mbits/sec 739 sender
[ 11] 0.00-60.00 sec 1.71 GBytes 245 Mbits/sec receiver
[ 13] 0.00-60.00 sec 2.09 GBytes 299 Mbits/sec 1376 sender
[ 13] 0.00-60.00 sec 2.09 GBytes 299 Mbits/sec receiver
[ 15] 0.00-60.00 sec 2.14 GBytes 307 Mbits/sec 1085 sender
[ 15] 0.00-60.00 sec 2.14 GBytes 306 Mbits/sec receiver
[ 17] 0.00-60.00 sec 1.98 GBytes 284 Mbits/sec 1205 sender
[ 17] 0.00-60.00 sec 1.98 GBytes 283 Mbits/sec receiver
[ 19] 0.00-60.00 sec 2.27 GBytes 325 Mbits/sec 1081 sender
[ 19] 0.00-60.00 sec 2.27 GBytes 325 Mbits/sec receiver
[ 21] 0.00-60.00 sec 2.21 GBytes 316 Mbits/sec 957 sender
[ 21] 0.00-60.00 sec 2.21 GBytes 316 Mbits/sec receiver
[ 23] 0.00-60.00 sec 2.11 GBytes 302 Mbits/sec 988 sender
[ 23] 0.00-60.00 sec 2.10 GBytes 301 Mbits/sec receiver
[ 25] 0.00-60.00 sec 1.91 GBytes 274 Mbits/sec 564 sender
[ 25] 0.00-60.00 sec 1.91 GBytes 273 Mbits/sec receiver
[ 27] 0.00-60.00 sec 1.84 GBytes 263 Mbits/sec 781 sender
[ 27] 0.00-60.00 sec 1.84 GBytes 263 Mbits/sec receiver
[ 29] 0.00-60.00 sec 1.72 GBytes 246 Mbits/sec 689 sender
[ 29] 0.00-60.00 sec 1.71 GBytes 245 Mbits/sec receiver
[ 31] 0.00-60.00 sec 2.01 GBytes 287 Mbits/sec 789 sender
[ 31] 0.00-60.00 sec 2.00 GBytes 287 Mbits/sec receiver
[ 33] 0.00-60.00 sec 1.82 GBytes 261 Mbits/sec 860 sender
[ 33] 0.00-60.00 sec 1.82 GBytes 261 Mbits/sec receiver
[SUM] 0.00-60.00 sec 29.8 GBytes 4.26 Gbits/sec 13754 sender
[SUM] 0.00-60.00 sec 29.7 GBytes 4.26 Gbits/sec receiver
iperf Done.
lon.speedtest.clouvider.net (Traceroute)
[root@clevertor]:~ # traceroute -s 45.12.203.1 lon.speedtest.clouvider.net
traceroute to lon.speedtest.clouvider.net (5.180.211.133), 30 hops max, 60 byte packets
1 * * *
2 ae0-0.bbr01.anx25.fra.de.anexia-it.net (144.208.208.143) 1.236 ms 1.231 ms 1.192 ms
3 ffm-b5-link.ip.twelve99.net (62.115.14.116) 0.992 ms 1.144 ms 0.954 ms
4 ffm-bb1-link.ip.twelve99.net (62.115.114.88) 1.258 ms ffm-bb2-link.ip.twelve99.net (62.115.114.90) 1.326 ms ffm-bb1-link.ip.twelve99.net (62.115.114.88) 1.471 ms
5 prs-bb1-link.ip.twelve99.net (62.115.123.13) 10.199 ms 10.172 ms prs-bb2-link.ip.twelve99.net (62.115.122.138) 10.910 ms
6 ldn-bb1-link.ip.twelve99.net (62.115.135.24) 16.826 ms * ldn-bb4-link.ip.twelve99.net (62.115.133.238) 15.319 ms
7 ldn-b3-link.ip.twelve99.net (62.115.120.75) 16.576 ms 16.498 ms ldn-b3-link.ip.twelve99.net (62.115.122.181) 15.354 ms
8 clouvider-ic337427-ldn-b3.ip.twelve99-cust.net (62.115.154.43) 16.958 ms 16.898 ms 14.737 ms
9 h185-42-222-17.reverse.clouvider.net (185.42.222.17) 15.032 ms 15.735 ms 15.620 ms
10 185.245.80.45 (185.245.80.45) 16.409 ms 19.955 ms 19.837 ms
11 94.154.158.21 (94.154.158.21) 35.027 ms 34.940 ms 34.539 ms
12 * * *
13 5.180.211.133 (5.180.211.133) 14.940 ms 14.830 ms 15.226 ms
Distribution of softirqs across the CPUs
Test conditions
TODO: This test should be repeated. It was performed using the vmxnet3 driver for the virtual NICs of sn02 and clevertor. We have noticed that vmxnet3 only seems to make use of one of the Rx queues:
[root@clevertor]:~ # ethtool -S ens20 | grep -i "Rx Queue" -A 4
Rx Queue#: 0
LRO pkts rx: 87324
LRO byte rx: 132204579
ucast pkts rx: 2371899
ucast bytes rx: 556133012
--
Rx Queue#: 1
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 2
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 3
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 4
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 5
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 6
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
--
Rx Queue#: 7
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
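Checking every queue by eye gets tedious when the test is repeated. A sketch that parses the ethtool -S text and lists only the Rx queues that received unicast packets. It assumes the vmxnet3 statistics layout shown above (a "Rx Queue#: N" header followed by counter lines), and active_rx_queues is a hypothetical helper name:

```python
import re


def active_rx_queues(ethtool_output):
    """Return {queue number: ucast pkts rx} for queues with traffic."""
    packets = {}
    queue = None
    for line in ethtool_output.splitlines():
        header = re.match(r"\s*Rx Queue#:\s*(\d+)", line)
        if header:
            queue = int(header.group(1))
            continue
        counter = re.match(r"\s*ucast pkts rx:\s*(\d+)", line)
        if counter and queue is not None:
            packets[queue] = int(counter.group(1))
    return {q: n for q, n in packets.items() if n > 0}


# Queues 0 and 1 copied from the output above.
sample = """
Rx Queue#: 0
LRO pkts rx: 87324
LRO byte rx: 132204579
ucast pkts rx: 2371899
ucast bytes rx: 556133012
--
Rx Queue#: 1
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
"""
print(active_rx_queues(sample))
```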

Test
[root@sn02]:~ # cat /proc/softirqs | grep NET_RX; sleep 10; cat /proc/softirqs | grep NET_RX
NET_RX: 444152123 444864750 318602372 282814552 715704096 14246170 21869027 22622646 22875175 24139240 24413337 24628322 1163353563 56883115 65401651 66196755 71835329 78190059 74964839 74105994 43763250 1879390129 433031072 357154977
NET_RX: 444175136 444886221 318621208 282825827 715720719 14247796 21872928 22626024 22876508 24142666 24416872 24629679 1163361372 56891050 65408727 66201332 71841311 78197288 74971750 74113723 43765303 1879394368 433052575 357173257
[root@sn02]:~ # python3
...
import numpy as np
fn = lambda a: np.array(list(map(int, filter(lambda x: x!="", a.split(" ")))))
a = fn("444152123 444864750 318602372 282814552 715704096 14246170 21869027 22622646 22875175 24139240 24413337 24628322 1163353563 56883115 65401651 66196755 71835329 78190059 74964839 74105994 43763250 1879390129 433031072 357154977")
b = fn("444175136 444886221 318621208 282825827 715720719 14247796 21872928 22626024 22876508 24142666 24416872 24629679 1163361372 56891050 65408727 66201332 71841311 78197288 74971750 74113723 43765303 1879394368 433052575 357173257")
b - a
array([23013, 21471, 18836, 11275, 16623, 1626, 3901, 3378, 1333,
3426, 3535, 1357, 7809, 7935, 7076, 4577, 5982, 7229,
6911, 7729, 2053, 4239, 21503, 18280])
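The same difference can be computed without numpy and expressed as each CPU's share of the sample. A sketch assuming two NET_RX lines from /proc/softirqs taken a few seconds apart. net_rx_delta is a hypothetical helper name:

```python
def net_rx_delta(before, after):
    """Per-CPU increase of the NET_RX counters between two samples."""
    a = [int(x) for x in before.split()[1:]]  # drop the "NET_RX:" label
    b = [int(x) for x in after.split()[1:]]
    if len(a) != len(b):
        raise ValueError("samples cover a different number of CPUs")
    return [y - x for x, y in zip(a, b)]


# First four CPUs of the two samples above.
before = "NET_RX: 444152123 444864750 318602372 282814552"
after = "NET_RX: 444175136 444886221 318621208 282825827"
delta = net_rx_delta(before, after)
total = sum(delta)
for cpu, d in enumerate(delta):
    print(f"CPU{cpu}: {d} softirqs ({d / total:.1%} of the sample)")
```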
Analysis with perf

Distribution across the ksoftirqds
It looks as if the load is not distributed evenly there:

Zooming into the ksoftirqds
ksoftirqd/21 and ksoftirqd/12 perform vmxnet3_poll work (i.e. on the NIC). The other ksoftirqds apparently do other things.
CPU load before the test:

CPU load during the test:

Test:

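#!/bin/sh
# Builds a local WireGuard test link: wgtest1 lives in the network
# namespace "container", wgtest2 stays in the host namespace.
set -x
# Clean up leftovers from a previous run (these fail harmlessly on the first run)
ip netns exec container ip link del wgtest1
ip netns del container
ip link del wgtest2
# Create the namespace and both WireGuard interfaces
ip netns add container
ip link add wgtest1 type wireguard
ip link add wgtest2 type wireguard
# Static key pairs for the two test interfaces
PRIV_KEY1="eHv2ZoAgJTD1B+XdtHwrhAlatiRWDVhH70MDHOFwwEU="
PUB_KEY1=$(echo "$PRIV_KEY1" | wg pubkey)
PRIV_KEY2="cJAUU/ox+gyW5C3Gw69tkexwKJY2i7Gbrv77I/bzZVA="
PUB_KEY2=$(echo "$PRIV_KEY2" | wg pubkey)
# printf instead of "echo -n", which is not portable in /bin/sh
printf '%s' "$PRIV_KEY1" | wg set wgtest1 private-key /proc/self/fd/0
printf '%s' "$PRIV_KEY2" | wg set wgtest2 private-key /proc/self/fd/0
# Peer the two interfaces with each other over the loopback address
wg set wgtest1 listen-port 52821 peer "$PUB_KEY2" allowed-ips 192.168.121.0/24 endpoint '[::1]:52822'
wg set wgtest2 listen-port 52822 peer "$PUB_KEY1" allowed-ips 192.168.121.0/24 endpoint '[::1]:52821'
# Move wgtest1 into the namespace, then bring up and address both ends
ip link set wgtest1 netns container
ip netns exec container ip link set wgtest1 up
ip netns exec container ip a a 192.168.121.1/24 dev wgtest1
ip link set wgtest2 up
ip a a 192.168.121.2/24 dev wgtest2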