NIC goes dark when Proxmox kernel loads after GPU install (works again if GPU removed) - eviltoast

Like the title says. I installed a GPU, and everything POSTs and boots fine. The lights on the Ethernet port are lit and will stay lit indefinitely (I assume) if I leave it at the kernel select screen.

But as soon as I load a kernel, the lights go dark. It also is not shown as an active client on my gateway, so it’s not working at all.

I’ve tried lots of commands I’ve found to force it up, and it looks to me like the NIC assigned to vmbr0 is correct, but I just can’t get it to work.

If I remove the GPU, it immediately works again. NIC stays up after the kernel loads and I can access the web UI as normal.

root@prox:~#
root@prox:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:33 brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master vmbr0 state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:32 brd ff:ff:ff:ff:ff:ff
4: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:32 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 scope global vmbr0
       valid_lft forever preferred_lft forever

root@prox:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.3/24
        gateway 192.168.1.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0

iface enp2s0 inet manual

source /etc/network/interfaces.d/*

root@prox:~# service network restart
Failed to restart network.service: Unit network.service not found.

  • barsquid@lemmy.world
    5 months ago

    I read through your screenshot. The ip output shows enp3s0 but the config refers to enp2s0; I think this might be it.
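
    If you want to check for this kind of mismatch without eyeballing the two outputs, here is a rough sketch (my own, not a Proxmox tool). It assumes the config only names interfaces on iface and bridge-ports lines and that every live interface has an entry under /sys/class/net. missing_ifaces is a made-up helper name. It prints each name the config mentions that the kernel does not currently have.

    ```shell
    #!/bin/sh
    # Hypothetical helper: list interface names that a config file references
    # but that have no entry in the kernel's interface directory.
    #   $1: path to an interfaces file (e.g. /etc/network/interfaces)
    #   $2: directory with one entry per interface (default: /sys/class/net)
    missing_ifaces() {
        cfg=$1
        netdir=${2:-/sys/class/net}
        # Collect names from "iface NAME ..." and "bridge-ports NAME ..." lines;
        # "bridge-ports none" is a keyword, not an interface, so skip it.
        awk '$1 == "iface" { print $2 }
             $1 == "bridge-ports" { for (i = 2; i <= NF; i++) if ($i != "none") print $i }' "$cfg" |
            sort -u |
            while read -r name; do
                [ -e "$netdir/$name" ] || printf '%s\n' "$name"
            done
    }

    # Usage on the host (run as root):
    #   missing_ifaces /etc/network/interfaces
    ```

    Going by the names I read out of the screenshot, this would flag enp2s0. A bridge that has not been brought up yet would also be flagged, so treat the output as a hint rather than a verdict.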

    • nemanin@lemmy.world (OP)
      5 months ago

      Ohhh. In that last line. I wasn’t even looking at that, I assumed the block above that was setting up the primary NIC…

      I’ll see if changing that interface name does it…

      • 🐍🩶🐢@lemmy.world
        5 months ago

        I changed my settings to name NICs by MAC address instead of by enumeration, as I got sick of the names changing when I would add/remove PCI devices.

          • 🐍🩶🐢@lemmy.world
            5 months ago

            I am not at home, but what I did was change the 99-default.link file. I found this from the two pages below:
            https://wiki.debian.org/NetworkInterfaceNames#CUSTOM_SCHEMES_USING_.LINK_FILES
            https://wiki.debian.org/NetworkInterfaceNames

            Basically, by doing this, your NICs will be forcibly named using the MAC address:

            # /etc/systemd/network/99-default.link
            [Match]
            OriginalName=*

            [Link]
            NamePolicy=mac
            MACAddressPolicy=persistent
            

            Afterwards, you will need to reboot and then update your network config file to use the correct names. I don’t ever change the network config with the GUI in Proxmox, as it has wrecked it too many times. I will update this reply again later with some more information on what to do.

          • 🐍🩶🐢@lemmy.world
            5 months ago

            Sorry, didn’t make it home until today and not sure if you get notifications on edits. You will need a monitor and keyboard hooked up to your server as you will not have ssh access until the network config is “fixed”. I would do the below with the GPU removed, so you know 100% that your networking config is correct before mucking about further.

            Step 1 - Create 99-default.link file

            Create /etc/systemd/network/99-default.link with the contents below.

            # SPDX-License-Identifier: MIT-0
            #
            # This config file is installed as part of systemd.
            # It may be freely copied and edited (following the MIT No Attribution license).
            #
            # To make local modifications, one of the following methods may be used:
            # 1. add a drop-in file that extends this file by creating the
            #    /etc/systemd/network/99-default.link.d/ directory and creating a
            #    new .conf file there.
            # 2. copy this file into /etc/systemd/network or one of the other paths checked
            #    by systemd-udevd and edit it there.
            # This file should not be edited in place, because it'll be overwritten on upgrades.

            [Match]
            OriginalName=*

            [Link]
            NamePolicy=mac
            MACAddressPolicy=persistent
            

            Step 2 - Reboot and find new name of NIC that will be based on MAC

            I forget if you have to reboot, but I am going to assume so. At this point, you can get the new name of your NIC and fix your network config.

            1. ip link should list all of your NICs, both real and virtual. Here is what mine looks like for reference, with the MAC obfuscated:
            1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
                link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
            2: enxAABBCCDDEEFF: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
                link/ether AA:BB:CC:DD:EE:FF brd ff:ff:ff:ff:ff:ff
            3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
                link/ether AA:BB:CC:DD:EE:FF brd ff:ff:ff:ff:ff:ff
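
            If you want to know the new name before rebooting, it can be predicted from the MAC. As far as I know, systemd's MAC-based scheme for Ethernet is just enx followed by the twelve hex digits in lowercase (the uppercase above is only my obfuscation). A small sketch, where mac_to_enx is a hypothetical helper and the MAC is the one shown for the second NIC in the original post:

            ```shell
            #!/bin/sh
            # Hypothetical helper: predict the MAC-based name of an Ethernet NIC.
            # Assumes the usual systemd scheme: "enx" + MAC without colons, lowercased.
            # (Wireless devices use a different prefix, so this is Ethernet only.)
            mac_to_enx() {
                printf 'enx%s\n' "$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')"
            }

            mac_to_enx a8:a1:59:be:f2:33   # prints enxa8a159bef233
            ```

            Treat the result as a guess until ip link confirms it after the reboot.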
            

            Step 3 - Fix your network config and restart networking

            You will need to edit your /etc/network/interfaces file so the correct card is used.

            1. Make a copy of /etc/network/interfaces, just in case you mess something up.
            2. sudo vim /etc/network/interfaces (or whatever text editor makes you happy). It will need to look something like the below. I have to have DHCP turned on for mine, so your config likely uses static. Really all you need to do is change wherever it says enp yada yada to the enxAABBCCDDEEFF you identified above.
             source /etc/network/interfaces.d/*
            
             auto lo
             iface lo inet loopback
            
             iface enxAABBCCDDEEFF inet manual
            
             auto vmbr0
             iface vmbr0 inet dhcp
             #iface vmbr0 inet static
             #address 192.168.5.100/20
             #gateway 192.168.0.1
                 bridge-ports enxAABBCCDDEEFF
                 bridge-stp off
                 bridge-fd 0
            
            3. Restart your networking service. You shouldn’t need to reboot: sudo systemctl restart networking.service
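
            The backup-and-edit part of Step 3 can also be scripted. This is only a sketch under assumptions: rename_iface is a made-up helper, the names in the usage line are placeholders, and it only replaces the old name where it stands as a whole whitespace-separated word (so enp2s0 would not touch enp2s0f1). It keeps a .bak copy next to the file.

            ```shell
            #!/bin/sh
            # Hypothetical helper: replace one interface name with another in a
            # config file, keeping the original as FILE.bak.
            #   $1: config file   $2: old interface name   $3: new interface name
            rename_iface() {
                file=$1 old=$2 new=$3
                # Back up first; stop if the copy fails.
                cp -p "$file" "$file.bak" || return 1
                # Match the old name only when bounded by whitespace or line start/end,
                # and write the result back over the original file.
                sed -E "s/(^|[[:space:]])$old([[:space:]]|\$)/\\1$new\\2/g" "$file.bak" > "$file"
            }

            # Usage on the host (run as root, then restart networking as in step 3):
            #   rename_iface /etc/network/interfaces enp2s0 enxaabbccddeeff
            ```

            I would still open the file afterwards and read it over before restarting anything.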

            Step 4 - Profit?

            Hopefully at this point you have network access again. Check the below, do some ping tests, and if it doesn’t work, double check that you edited the interfaces file correctly.

            1. sudo systemctl status networking.service will show you if anything went wrong and hopefully show that everything is working correctly
            2. ip -br addr show should show that the interface is up now.
            lo               UNKNOWN        127.0.0.1/8 ::1/128
            enxAABBCCDDEEFF  UP
            vmbr0            UP             192.168.5.100/20 
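
            The same check can be done from a script by reading the operstate files the kernel exposes under /sys/class/net. A rough sketch: not_up is a hypothetical helper, and the interface names in the usage comment are the placeholders from above.

            ```shell
            #!/bin/sh
            # Hypothetical helper: print every named interface whose operstate is not "up".
            #   $1: directory with one sub-directory per interface (normally /sys/class/net)
            #   remaining arguments: interface names to check
            not_up() {
                netdir=$1
                shift
                for name in "$@"; do
                    # An interface with no operstate file is reported as "missing".
                    state=$(cat "$netdir/$name/operstate" 2>/dev/null || echo missing)
                    [ "$state" = up ] || printf '%s %s\n' "$name" "$state"
                done
            }

            # Usage on the host:
            #   not_up /sys/class/net enxaabbccddeeff vmbr0
            ```

            No output means everything you listed reports up. It says nothing about whether the IP config is right, so the ping tests are still worth doing.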
            

            At this point, if all is well, I would reboot anyway, just to make sure. If you add any GPUs, SATA drives, or other PCI devices, disable/enable Wi-Fi/BT in the BIOS, or do anything else that changes the PCI numbering, you don’t have to worry about your NIC name changing.