Leap/SLES 15.5 – strange compatibility problem between tcpdump, libpcap and arping from iputils

My readers know that I presently work again with virtual networks. A part of my studies is related to ARP and routes on veth devices with VLAN-interfaces. I follow the packet transfer across VLANs with tcpdump, which itself depends on libpcap. On Leap 15 the relevant package is: libpcap1. ARP commands were generated by the arping command, which gets available after an installation of the package “iputils“.

This worked perfectly on a laptop. I know for a presentation had to use another system with the same kernelversion and virtual networking support. There I got strange messages for ARP packets passing a veth endpoint’s main device (in my example: veth2V) on their way to a VLAN interface (“veth2V.30) on the same veth endpoint:

netns2:~ # tcpdump -n -e -i any -v
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

15:15:36.334692 veth2V B   ifindex 2 46:b9:81:00:00:1e ethertype ARP (0x0806), length 52: Unknown Hardware (36461) (len 0), Unknown Protocol (0x0000) (len 1), Unknown (2048) 
        0x0000:  8e6d 0000 0001 0800 0604 0001 46b9 8b4b  .m..........F..K
        0x0010:  8e6d c0a8 0501 ffff ffff ffff c0a8 0502  .m..............
15:15:36.334692 veth2V.30 B   ifindex 4 46:b9:8b:4b:8e:6d ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.2 (ff:ff:ff:ff:ff:ff) tell 192.168.5.1, length 28
15:15:36.334720 veth2V.30 Out ifindex 4 d2:85:39:c8:43:fc ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply 192.168.5.2 is-at d2:85:39:c8:43:fc, length 28
15:15:36.334721 veth2V Out ifindex 2 d2:85:81:00:00:1e ethertype ARP (0x0806), length 52: Unknown Hardware (17404) (len 0), Unknown Protocol (0x0000) (len 1), Unknown (2048) 
        0x0000:  43fc 0000 0001 0800 0604 0002 d285 39c8  C.............9.
        0x0010:  43fc c0a8 0502 46b9 8b4b 8e6d c0a8 0501  C.....F..K.m....

I had never seen similar messages in comparable experiments with veths before. And these messages about “Unknown Hardware”. In addition the length of the Ethernet packets were wrong. I did not get such errors on my laptop where I had prepared the setups of the virtual VLANs.

It took me some time to find the difference between the systems: iputils as well as tcpdump on both systems came from the Network:Utilites-repository “https://download.opensuse.org/repositories/network:/utilities/15.5/”.

However, libpcap1 on the presentation system came from the main SLES 15.5 OSS repository in version 1.10.1-150400.1.7. On my laptop I had instead fetched the library from the Network:Utilites-repository, too.

Changing libpcap1 to the present version 1.10.4-lp155.92.1 from the Network:Utilites-repository led to correct tcpdump information:

netns2:~ # tcpdump -n -e -i any -v
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

15:38:44.899645 veth2V B   ifindex 2 f2:5b:23:ba:16:8a ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.2 (ff:ff:ff:ff:ff:ff) tell 192.168.5.1, length 28
15:38:44.899645 veth2V.30 B   ifindex 4 f2:5b:23:ba:16:8a ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.2 (ff:ff:ff:ff:ff:ff) tell 192.168.5.1, length 28
15:38:44.899667 veth2V.30 Out ifindex 4 9a:88:24:0c:f9:99 ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply 192.168.5.2 is-at 9a:88:24:0c:f9:99, length 28
15:38:44.899669 veth2V Out ifindex 2 9a:88:24:0c:f9:99 ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply 192.168.5.2 is-at 9a:88:24:0c:f9:99, length 28

Interestingly, even if one changed both tcpdump, iputils and libpcap1 to the main Leap /SLES 15.5 repository the problem would come up, too.

So, there seems to be something severely wrong with libpcap1 of the main Leap /SLES 15.5 repositories.

 

WordPress 6.5, a bug in plugin Kadence Importer, defunct FooGallery pages – and some lessons for WP error finding

My wife still administrates Web-sites for customers. Some of them are based on WordPress. After an upgrade to WP6.5 some pages of a customer web-site went down in the middle of last week.

In addition previewing of authors on these specific pages did not work any longer.

A standard WP error message regarding a critical error of the few affected pages and messages from Browser developer tools that the pages were found to be in “Quirks mode” were not really helpful. Especially as the header of the pages in question consistently were correct!

The common property of all these pages was:

They included relatively large FooGalleries with image numbers beyond 35.

As the customer had to participate in a fair pretty soon and wanted to prepare even more pages with large Foogalleries, he and my wife got into a time pressure situation. Both asked me for help. Well, it took me some time to find the culprit. But I was stupid enough not to follow three simple reasonable rules for such cases:

  • Check for PHP errors first!
  • Systematically deactivate or even delete unnecessary plugins afterward.
  • Never perform some changes in parallel when trying to isolate faulty SW and errors.

Instead I was misguided by experimenting with the FooGalleries first.

As a first measure we performed all the steps recommended by Foogallery for problems (see this site). No effect.

Then I found that if one emptied the thumbnail cache completely the web pages with the large galleries could be loaded just once (either by a page preview or a call from a browser via a VPN). The second call crashed the page however. From this experience I focused on a caching problem first. Which obviously existed …

Afterward I had a look at a reference test site with WP version 6.0.3. There we also had some problem in the beginning.

Concentrating on potential cache handling problems we after a while thought that we had a major problem with a certain Statify method to count visitor numbers in case of caching plugins. From a time when the website owner had installed the W3TC-plugin, the settings of Statify have remained such that a Javascript based evaluation of site visits had been activated without nonce check. Meanwhile W3TC was removed due to some major problems with this plugin in WP and PHP-upgrade phases. But the Statify settings had remained the same. On our test site this appeared to have helped immediately. But I had overlooked that my wife in the meantime also had reduced the number of plugins in parallel to a minimum.

Again a substantial mistake in a process of finding errors! Never work with multiple people at different investigation fronts in parallel without a coordination of the changes in a step-wise row! And do not let pressure from customers affect you.

Correcting the Statify method for counting visitors on the main web-site did not prove to be a remedy in the long run. Some test pages which had not worked before now ran flawlessly. The stupid thing was that for other major pages a cache problem remained and after some time some pages became defunct again. But only some!

On the customer’s website the following plugins were installed: Kadence theme, Statify, FooGallery, FooBox Image Lightbox, Pinnacle Toolkit, Kadence blocks, Kadence Importer.

To make a long story short: Kadence Importer (as of present version V2.1.1) has a major PHP bug. Had I run the website in debug mode from the beginning I would have found it and saved me from hours of testing. And I am now pretty sure that the problem had existed for a longer time – undetected. So even returning to a backup before the upgrade would not have helped.

Deactivating and deleting this plugin, which is not needed on productive WP sites anyway (!), removed all problems. I cannot go into details here why this stupid plugin created caching problems. It is bad – just do not use it!

Lessons learned

  1. Do not let the customer play with WP-plugins. Mistrust any plugins which basically are made for advertisement.
  2. Strictly cling to a step-wise strategy for isolating faulty SW after upgrades. Do not let the pressure of a customer make you deviate from your planned sequence of steps.
  3. First look for PHP errors in WP debug mode.
  4. Have a test site available and compare the plugin status and settings in detail.
  5. Deactivate and remove all unnecessary plugins in a systematic manner.
  6. Keep track of special settings of WP plugins, in particular with respect to other caching plugins. And do not let the customer fiddle with such parameters undocumented.
  7. And, of course: Make a full backup of yours site before upgrades. Deactivate any automatic upgrade.

I hope this helps other WP admins.

 

More fun with veth, network namespaces, VLANs – VI – veth subdevices for private 802.1q VLANs

My present post series on veth devices and Linux network namespaces continues articles I have written in 2017 about virtual networking and veth based connections. The last two posts of the present series

were a kind of excursion: We studied the effects of routes on ARP-requests in network namespaces where two or more L2-segments of a LAN terminated – with all IPs in both L2-segments belonging to one and the same IP subnet. So far, the considered segments in this series were standard L2-segments, not VLAN segments.

Despite being instructive, scenarios with multiple standard L2-segments attached to a namespace (or virtualized host) are a bit academic. However, when you replace the virtual L2-segments by private virtual VLAN segments we come to more relevant network configurations. A namespace or a virtualized host with multiple attached VLAN segments is not academic. Even without forwarding and even if the namespace got just one IP. We speak of a virtual “multi-homed” namespace (or virtualized host) connected to multiple VLAN segments.

What could be relevant examples?

  • Think of a virtual host that administers other guest systems in various private VLANs of a virtualization server! To achieve this we need routes in the multi-homed host, but no forwarding.
  • On the other side think of a namespace/host where multiple VLANs terminate and where groups of LAN segments must be connected to each other or to the outer world. In this case we would combine private VLANs with forwarding; the namespace would become a classic router for virtual hosts loacted in different VLANs.

The configuration of virtual private VLAN scenarios based on Linux veth and bridge devices is our eventual objective. In forthcoming posts we will consider virtual VLANs realized with the help of veth subdevices. Such VLANs follow the IEEE 802.1q standard for the Link layer. And regarding VLAN-aware bridges we may refer to RFC5517. A Linux (virtual) bridge fulfills almost all of the requirements of this RFC.

This very limited set of tools gives us a choice between many basic options of setting up a complex VLAN landscape. Different options to configure Linux bridge ports and/or VLAN aware veth endpoints in namespaces extends the spectrum of decisions to be made. An arangement of multiple cascading bridges may define a complex hierarchcal topology. But there are other factors, too. Understanding the influence of routes and of Linux kernel parameters on the path and propagation of ARP and ICMP requests in virtual LANs with VLANs and multi-homed namespaces (or hosts) can become crucial regarding both functionality and weak points of different solutions.

In this post we start with briefly discussing different basic options of VLAN creation. We then look closer at configuration variants for veth VLAN aware interfaces inside a namespace. The results will prepares us for two very simple VLAN scenarios that will help us to investigate the impact of routes and kernel parameters in the next two posts. In later posts I will turn to the port configuration of Linux bridges for complex VLAN solutions.

Private VLANs and veth endpoints

We speak of private VLANs because we want to separate packet traffic between different groups of hosts within a LAN-segment – on the Link layer. This requires that the packets get marked and processed according to these marks. In a 802.1q compliant VLAN the packets get VLAN tags, which identify a VLAN ID [VID] in special data sections of Ethernet packets. By analyzing these tags by and in VLAN aware devices we can filter packets and thus subdivide the original LAN segment into different VLAN-segments. Private VLAN segments form a Link layer broadcast domain (here: Ethernet broadcast domain) of their own.

The drawing below shows tagged packets (diamond like shape) moving along one and the same veth connection line. The colors represent a certain VID.

We see some VLAN aware network devices on both sides. They send and receive packets with certain tags only. A veth endpoint can be supplemented with such VLAN aware subdevices. The device in the middle may instead handle untagged packets only. But we could also have something like this:

Here on the right side a typical “trunk interface” transports all kinds of tagged and untagged packets into a target space – which could be a namespace or the inner part of a bridge. We will see that such a trunk interface is realized by a standard veth endpoint.

Basic VLAN variants based on veths and Linux bridges

Linux bridge ports for attached VLAN aware or unaware devices can be configured in various ways to tag or de-tag packets. Via VID-related filters the bridge can establish and control allowed or disallowed connections between its ports, all on the Link layer. Based on veths and Linux bridges there are four basic ways to set up to (virtual) VLAN solutions:

  1. We can completely rely on the tagging and traffic separation abilities of Linux bridges. All namespaces and hosts would be connected to the bridge via veth lines and devices that transport untagged packets, only. The bridge then defines the VLANs of a LAN-segment.
  2. We can start with VLAN separation already in the multi-homed namespace/host via VLAN aware sub-devices of veth endpoints and connect respective VLAN-aware devices or a trunk device of a veth peer device to the bridge. These options require different settings on bridge ports.
  3. We can combine approaches (1) and (2).
  4. We can connect different namespaces directly via VLAN aware veth peer devices.

Continue reading