Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose SRIOV information #230

Closed
wants to merge 1 commit into from
Closed

Conversation

ffromani
Copy link
Collaborator

@ffromani ffromani commented Feb 24, 2021

Expose informations about SRIOV devices. After some design iterations, we fixed these goals:

  • model SRIOV devices like GPUs, so with a top-level package.
  • to shape the API, we start checking what the k8s SRIOV operator needs
  • to further refine later, we will check what other projects like the k8s node feature discovery consume
  • the goal is to enable such projects to switch to ghw if/when they want.

There are few more noteworthy items in this PR:

  • due to lack of resources, initial support is for linux only. Nothing linux-specific should have sneaked in the API. Should the current API design make unnecessarily hard to add support on other platforms (e.g. windows) this should be treated as bug
  • the code prefers Physical Functions for discovery, but we acknowledge that forcing consumers to traverse the physical functions to learn about virtual functions is awkward, so the API provides shallow references to the Virtual Functions for the sake of practicality.

Fixes: #92

Signed-off-by: Francesco Romani fromani@redhat.com

@ffromani
Copy link
Collaborator Author

ffromani commented Feb 24, 2021

Example output, PF device:

lssriov is the same code shown in the example added to the README (see)

./lssriov 0000:05:00.0 | jq
{
  "driver": "igb",
  "interfaces": [
    "enp5s0f0"
  ],
  "physfn": {
    "max_vf_num": 7,
    "vfs": [
      {
        "id": 0,
        "pci_address": "0000:05:10.0"
      },
      {
        "id": 1,
        "pci_address": "0000:05:10.4"
      },
      {
        "id": 2,
        "pci_address": "0000:05:11.0"
      },
      {
        "id": 3,
        "pci_address": "0000:05:11.4"
      }
    ]
  }
}

@ffromani
Copy link
Collaborator Author

ffromani commented Feb 24, 2021

Example output, VF device:

lssriov is the same code shown in the example added to the README (see)

/lssriov 0000:05:10.0 | jq
{
  "driver": "igbvf",
  "interfaces": [
    "enp5s0f0v0"
  ],
  "virtfn": {
    "parent_pci_address": "0000:05:00.0"
  }
}

README.md Outdated
*This API is PROVISIONAL! ghw will try hard to not make breaking changes to this API, but still users are advice this new API is not
declared stable yet. We expect to declare it stable with ghw version 1.0.0*

SRIOV (Single-Root Input/Output Virtualization) is a class of PCI devices that ghw treats explicitely, like gpus; but unlike
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/treats/handles/

@ffromani
Copy link
Collaborator Author

We are converging about which information we should expose about SRIOV devices. The current set of attributes represents what other established components, for example the k8s SRIOV operator exposes . Please note that in case of ghw, the same amount of data is split between the pci, net package and this new SRIOV addition. I think this is a fair enough representation which expands nicely the current layout of ghw packages, and provides a convenient enough API.
Of course, reviews and feedback are welcome to futher improve and to fill any gaps that may have escaped investigation up until now.

The biggest problem however is how to represent the SRIOV devices proper.
At best of knowledge, all the SRIOV devices are also pci devices. Is not possible, in the foreseeable feature, to have SRIOV devices which aren't also PCI devices. This seems to suggest a is-a relationship between SRIOV and PCI.
So should SRIOV devices be presented as subclasses of PCI devices?

Should SRIOV be a subpackage of the pci package (like in the current PR?)
Or should SRIOV be independent, same-level package, mimicing the relationship we currently have between gpu and pci packages?

@ffromani ffromani force-pushed the net-sriov-info branch 2 times, most recently from bbe0fb9 to 8b33ae8 Compare March 18, 2021 11:12
@ffromani
Copy link
Collaborator Author

We are converging about which information we should expose about SRIOV devices. The current set of attributes represents what other established components, for example the k8s SRIOV operator exposes . Please note that in case of ghw, the same amount of data is split between the pci, net package and this new SRIOV addition. I think this is a fair enough representation which expands nicely the current layout of ghw packages, and provides a convenient enough API.
Of course, reviews and feedback are welcome to futher improve and to fill any gaps that may have escaped investigation up until now.

The biggest problem however is how to represent the SRIOV devices proper.
At best of knowledge, all the SRIOV devices are also pci devices. Is not possible, in the foreseeable feature, to have SRIOV devices which aren't also PCI devices. This seems to suggest a is-a relationship between SRIOV and PCI.
So should SRIOV devices be presented as subclasses of PCI devices?

Should SRIOV be a subpackage of the pci package (like in the current PR?)
Or should SRIOV be independent, same-level package, mimicing the relationship we currently have between gpu and pci packages?

Had a chat with @jaypipes and the my takeaways are:

  • we agreed about modeling SRIOV devices like GPUs, so with a top-level package.
  • we agreed that the k8s SRIOV operator is a good testbed for this API - if this new API (+ other enhancemenets we did/are doing in ghw) is enough to fill their use case, our API is good enough.
  • we will watch the k8s node feature discovery them and, if possible and convenient, enable them to consume ghw instead of writing their own code.

@ffromani ffromani force-pushed the net-sriov-info branch 2 times, most recently from 2f4aee4 to b146139 Compare March 29, 2021 15:25
@ffromani ffromani changed the title WIP: RFC: expose SRIOV information expose SRIOV information Mar 29, 2021
@ffromani
Copy link
Collaborator Author

ok, test failure is unexpected. I'll have a look ASAP.

@ffromani
Copy link
Collaborator Author

ok, test failure is unexpected. I'll have a look ASAP.

weird, can't reproduce inside a xenial container:

FROM ubuntu:xenial

RUN apt update
RUN apt install -y curl git make gcc

RUN curl -sL -o /usr/local/bin/gimme https://github.com/travis-ci/gimme/master/gimme && chmod 0755 /usr/local/bin/gimme
RUN gimme 1.16

WORKDIR /go/src/github.com/jaypipes/ghw

# Force the go compiler to use modules.
ENV GO111MODULE=on
ENV GOPROXY=direct

# go.mod and go.sum go into their own layers.
COPY go.mod .
COPY go.sum .

COPY . .

CMD /bin/bash

@ffromani
Copy link
Collaborator Author

same with go1.14.15 (yep, copypasta here #230 (comment))

@ffromani ffromani force-pushed the net-sriov-info branch 3 times, most recently from 9b6c782 to f88d4a5 Compare March 31, 2021 06:45
@ffromani
Copy link
Collaborator Author

ffromani commented Mar 31, 2021

Interesting. We had apparently-random CI failures on xenial only.
I added multiple tests consuming the pci package, and the second/third (can't pinpoint further) was always failing regardless of the order of the tests. The code looks innocent and the stacktrace run deep into the pcidb package. This change seems to fix everything

	// debug CI failures on travis on xenial
	if err := os.Setenv("PCIDB_DISABLE_NETWORK_FETCH", "1"); err != nil {
		t.Fatalf("Canoot set PCIDB_DISABLE_NETWORK_FETCH")
	}

Now I wonder if the default behaviour of pcidb should be reversed: leave network alone unless explicitely requested. @jaypipes any thoughts? worth a issue on the pcidb project for further discussion?

ffromani added a commit to ffromani/ghw that referenced this pull request Apr 7, 2021
Should the pcidb pkg fail to find a local pci database, it optionally
fetches an updated copy from the network.
Even though this is unlikely to happen, and even though this is a nice
feature to have, this is enabled by default and can lead to
hard-to-debug issues.

we seen this first with failing CI on xenial in travis (jaypipes#230 (comment))
It's possible the ultimate cause is out of date test VM, or any CI
failure; but it is also true this a symptom of a deeper issue:
test environments should be tightly controlled in order to be reproducible.

Fetching data from the network, even though  from a safe and trusted location,
voids this assumption.
So we need to force this off in the test path.

Unfortunately this patch has its own drawback: it changes the pcidb
default behaviour, so test code and production code have different
preferred code paths.
To really fix this behaviour completely we should probably file a pcidb
issue (+ PR) to change its defaults.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the net-sriov-info branch 3 times, most recently from 0f62884 to 62a092d Compare April 7, 2021 16:00
@ffromani
Copy link
Collaborator Author

ffromani commented Apr 7, 2021

Interesting. We had apparently-random CI failures on xenial only.
I added multiple tests consuming the pci package, and the second/third (can't pinpoint further) was always failing regardless of the order of the tests. The code looks innocent and the stacktrace run deep into the pcidb package. This change seems to fix everything

	// debug CI failures on travis on xenial
	if err := os.Setenv("PCIDB_DISABLE_NETWORK_FETCH", "1"); err != nil {
		t.Fatalf("Canoot set PCIDB_DISABLE_NETWORK_FETCH")
	}

Now I wonder if the default behaviour of pcidb should be reversed: leave network alone unless explicitely requested. @jaypipes any thoughts? worth a issue on the pcidb project for further discussion?

Well this still have some merits, turns out it was in turn a symptom. The real issue -with is fix- is captured in 62a092d

@ffromani ffromani force-pushed the net-sriov-info branch 2 times, most recently from 080eb0b to 1b41001 Compare April 12, 2021 05:39
@ffromani
Copy link
Collaborator Author

depends on #247 - intentionally NOT fixing the conflict to reflect we should not move forward until we discuss #247

@ffromani ffromani force-pushed the net-sriov-info branch 3 times, most recently from ea52d44 to 4715e44 Compare June 25, 2021 15:01
@ffromani ffromani changed the title expose SRIOV information WIP: expose SRIOV information Sep 29, 2021
@ffromani
Copy link
Collaborator Author

back to WIP: need to finish the rebase, and I'd like to land also #281 before to finish this PR.

@ffromani ffromani changed the title WIP: expose SRIOV information expose SRIOV information Nov 4, 2021
@ffromani
Copy link
Collaborator Author

ffromani commented Nov 4, 2021

All the deps of this PR have been merged! removing the WIP status
...but we had some bitrot. I'll take care ASAP.

@ffromani
Copy link
Collaborator Author

@jaypipes at last! all the dependencies of this PR have been merged, CI is green and it's ready for a new review round!

@ffromani
Copy link
Collaborator Author

Fixes: #92

@ffromani
Copy link
Collaborator Author

Example output (long):

# ./ghwc -h

          __
 .-----. |  |--. .--.--.--.
 |  _  | |     | |  |  |  |
 |___  | |__|__| |________|
 |_____|

Discover hardware information.

https://github.com/jaypipes/ghw

Usage:
  ghwc [flags]
  ghwc [command]

Available Commands:
  baseboard   Show baseboard information for the host system
  bios        Show BIOS information for the host system
  block       Show block storage information for the host system
  chassis     Show chassis information for the host system
  cpu         Show CPU information for the host system
  gpu         Show graphics/GPU information for the host system
  help        Help about any command
  memory      Show memory information for the host system
  net         Show network information for the host system
  pci         Show information about PCI devices on the host system
  product     Show product information for the host system
  sriov       Show SRIOV devices information for the host system
  topology    Show topology information for the host system
  version     Display the version of gofile

Flags:
      --debug           Enable or disable debug mode
  -f, --format string   Output format.
                        Choices are 'json','yaml', and 'human'. (default "human")
  -h, --help            help for ghwc
      --pretty          When outputting JSON, use indentation

Use "ghwc [command] --help" for more information about a command.

# ./ghwc sriov
sriov (2 phsyical 8 virtual devices)
 physical function  [affined to NUMA node 0]@0000:05:00.0 -> driver: 'igb' class: 'Network controller' vendor: 'Intel Corporation' product: 'I350 Gigabit Network Connection' with 4/7 virtual functions
 physical function  [affined to NUMA node 1]@0000:05:00.1 -> driver: 'igb' class: 'Network controller' vendor: 'Intel Corporation' product: 'I350 Gigabit Network Connection' with 4/7 virtual functions

# ./ghwc sriov -f yaml
sriov:
  physical_functions:
  - address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "0"
    interfaces:
    - enp5s0f0
    max_vf_num: 7
    pci:
      address: "0000:05:00.0"
      class:
        id: "02"
        name: Network controller
      driver: igb
      product:
        id: "1521"
        name: I350 Gigabit Network Connection
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: AOC-SGP-i4
      vendor:
        id: "8086"
        name: Intel Corporation
    vfs:
    - address:
        Bus: "05"
        Device: "10"
        Domain: "0000"
        Function: "0"
      id: 0
      interfaces:
      - enp5s0f0v0
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "0"
      pci:
        address: "0000:05:10.0"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "10"
        Domain: "0000"
        Function: "4"
      id: 1
      interfaces:
      - enp5s0f0v1
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "0"
      pci:
        address: "0000:05:10.4"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "11"
        Domain: "0000"
        Function: "0"
      id: 2
      interfaces:
      - enp5s0f0v2
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "0"
      pci:
        address: "0000:05:11.0"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "11"
        Domain: "0000"
        Function: "4"
      id: 3
      interfaces:
      - enp5s0f0v3
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "0"
      pci:
        address: "0000:05:11.4"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
  - address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "1"
    interfaces:
    - enp5s0f1
    max_vf_num: 7
    pci:
      address: "0000:05:00.1"
      class:
        id: "02"
        name: Network controller
      driver: igb
      product:
        id: "1521"
        name: I350 Gigabit Network Connection
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: AOC-SGP-i4
      vendor:
        id: "8086"
        name: Intel Corporation
    vfs:
    - address:
        Bus: "05"
        Device: "10"
        Domain: "0000"
        Function: "1"
      id: 0
      interfaces:
      - enp5s0f1v0
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "1"
      pci:
        address: "0000:05:10.1"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "10"
        Domain: "0000"
        Function: "5"
      id: 1
      interfaces:
      - enp5s0f1v1
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "1"
      pci:
        address: "0000:05:10.5"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "11"
        Domain: "0000"
        Function: "1"
      id: 2
      interfaces:
      - enp5s0f1v2
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "1"
      pci:
        address: "0000:05:11.1"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
    - address:
        Bus: "05"
        Device: "11"
        Domain: "0000"
        Function: "5"
      id: 3
      interfaces:
      - enp5s0f1v3
      parent_address:
        Bus: "05"
        Device: "00"
        Domain: "0000"
        Function: "1"
      pci:
        address: "0000:05:11.5"
        class:
          id: "02"
          name: Network controller
        driver: igbvf
        product:
          id: "1520"
          name: I350 Ethernet Controller Virtual Function
        programming_interface:
          id: "00"
          name: unknown
        revision: "0x01"
        subclass:
          id: "00"
          name: Ethernet controller
        subsystem:
          id: "0000"
          name: unknown
        vendor:
          id: "8086"
          name: Intel Corporation
  virtual_functions:
  - address:
      Bus: "05"
      Device: "10"
      Domain: "0000"
      Function: "0"
    id: 0
    interfaces:
    - enp5s0f0v0
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "0"
    pci:
      address: "0000:05:10.0"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "10"
      Domain: "0000"
      Function: "4"
    id: 1
    interfaces:
    - enp5s0f0v1
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "0"
    pci:
      address: "0000:05:10.4"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "11"
      Domain: "0000"
      Function: "0"
    id: 2
    interfaces:
    - enp5s0f0v2
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "0"
    pci:
      address: "0000:05:11.0"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "11"
      Domain: "0000"
      Function: "4"
    id: 3
    interfaces:
    - enp5s0f0v3
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "0"
    pci:
      address: "0000:05:11.4"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "10"
      Domain: "0000"
      Function: "1"
    id: 0
    interfaces:
    - enp5s0f1v0
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "1"
    pci:
      address: "0000:05:10.1"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "10"
      Domain: "0000"
      Function: "5"
    id: 1
    interfaces:
    - enp5s0f1v1
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "1"
    pci:
      address: "0000:05:10.5"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "11"
      Domain: "0000"
      Function: "1"
    id: 2
    interfaces:
    - enp5s0f1v2
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "1"
    pci:
      address: "0000:05:11.1"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation
  - address:
      Bus: "05"
      Device: "11"
      Domain: "0000"
      Function: "5"
    id: 3
    interfaces:
    - enp5s0f1v3
    parent_address:
      Bus: "05"
      Device: "00"
      Domain: "0000"
      Function: "1"
    pci:
      address: "0000:05:11.5"
      class:
        id: "02"
        name: Network controller
      driver: igbvf
      product:
        id: "1520"
        name: I350 Ethernet Controller Virtual Function
      programming_interface:
        id: "00"
        name: unknown
      revision: "0x01"
      subclass:
        id: "00"
        name: Ethernet controller
      subsystem:
        id: "0000"
        name: unknown
      vendor:
        id: "8086"
        name: Intel Corporation

Add support to report SRIOV devices.
Much like GPU devices, support is added using a new top-level package.

SRIOV devices are either Physical Functions or Virtual functions.
The preferred representation for ghw is Physical Functions, whose
dependent devices will be Virtual Functions; however, for the sake of
practicality, the API also exposes soft references to Virtual Functions,
so consumers of the API can access them directly and not navigating
the parent devices.

This patch also adds support in `ghwc`, to report the sriov information,
and in the `snapshot` package, to make sure to capture all the files
in sysfs that ghw cares about.

Last but not least, lacking access to suitable non-linux systems,
support is provided only on linux OS, even though the API tries hard
not to be linux-specific.

Resolves: jaypipes#92
Signed-off-by: Francesco Romani <fromani@redhat.com>
Copy link
Owner

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fromanirh, the more I look at this patch, the more I think these changes belong in the pkg/pci package and not as a new pkg/sriov package. The fact is, SR-IOV is specific to PCI Express and therefore should really be a set of additional attributes on the existing pkg/pci.Device struct.

See inline for more explanation.

@@ -1115,6 +1115,67 @@ information
`ghw.TopologyNode` struct if you'd like to dig deeper into the NUMA/topology
subsystem

### SRIOV

*This API is PROVISIONAL! ghw will try hard to not make breaking changes to this API, but still users are advice this new API is not
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend removing this. :) The interfaces we publish are specific to a Git tag or SHA1 per Go package versioning and semver behaviour.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem! will remove.

"virtfn*",
}

ignoreSet := newKeySet(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to create a new keySet typedef. We can just use map[string]bool here and where you're doing the ignoreSet.Contains call below, just do:

if _, ignored := ignoreSet[globbedEntry]; ignored {
     continue
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing, will simplify

}
}

func findNetworks(ctx *context.Context, devPath string) []string {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are they network names or are they network interface names? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they are actually network interface names. Will clarify the function name.

Comment on lines +19 to +24
type Device struct {
Interfaces []string `json:"interfaces"`
// the PCI address where the SRIOV instance can be found
Address *pciaddr.Address `json:"address"`
PCI *pci.Device `json:"pci"`
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new pkg/sriov package and pkg/sriov.Device structI think we should just modify thepkg/pci.Device` struct to read SR-IOV (and network interface) information when the PCI device is a PF or VF.

We should add a new Interface struct to pkg/net:

// Interface represents a physical or virtual network interface on the host
type Interface struct {
    // Name is the name of the interface on the host
    Name string `json:"name"`
}

We can then modify the pkg/pci.Device struct to add the following:

...
import (
...
    "github.com/jaypipes/ghw/pkg/net"
...
)
...
type Device struct {
...
    // NetworkInterfaces is the collection of network interfaces housed on this
    // PCI device.
    NetworkInterfaces []net.Interface `json:"network_interfaces,omitempty"`
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea. I'd like to process the concept a bit more to fully grok it but doesn't look bad at all. However this also mean that we aim to absorb all the PCI device type (= gpu) into the PCI package in the long run?
Or it's just SRIOV which is not special enough? (again no strong feeling, but wondering about the general direction)

}

type PhysicalFunction struct {
Device
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here, we'd inherit from pkg/pci.Device instead.

}

type VirtualFunction struct {
Device
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.


type VirtualFunction struct {
Device
ID int `json:"id"`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the index of the VF within the PF? If so, should we call this Index?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it should be called probably Index and I should clarify it with a code comment.

Comment on lines +47 to +48
// Address of the (parent) Physical Function this Virtual Function pertains to.
ParentAddress *pciaddr.Address `json:"parent_address,omitempty"`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just store a pointer to the parent pkg/pci.Device struct instead?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, could we just use a single Function struct to describe all of the SR-IOV functions, both physical and virtual?

// Function describes an SR-IOV physical or virtual function. Physical functions
// will have no Parent Function struct pointer and will have one or more Function
// structs in the Functions field.
type Function struct {
    Device // All Functions are PCI Devices
    // Parent contains a pointer to the parent physical function.
    // Will be empty when this is a physical function
    Parent *Function `json:"parent,omitempty"`
    // MaxVFs contains the maximum number of supported virtual
    // functions for this physical function
    MaxVFs int `json:"max_vfs"
    // Functions contains the physical function's virtual functions
    Functions []*Function `json:"functions"`
}

You could create a simple IsPhysical utility method on pkg/pci.Function:

// IsPhysical returns true if the PCIe function is a physical function, false
// if it is a virtual function
func (f *Function) IsPhysical() bool {
    return f.Parent == nil
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting angle, I don't have again strong feelings about this idea. With respect the current approach, we lose something, we gain something. I'll file a separate PR so we can compare the two approaches, and I'll close the one we like less (with rationale of course).

@ffromani
Copy link
Collaborator Author

sorry for the delay, caused by winter holidays plus crazy january. I'll resume work here ASAP, let's try this variation!

@ffromani
Copy link
Collaborator Author

@fromanirh, the more I look at this patch, the more I think these changes belong in the pkg/pci package and not as a new pkg/sriov package. The fact is, SR-IOV is specific to PCI Express and therefore should really be a set of additional attributes on the existing pkg/pci.Device struct.

See inline for more explanation.

(at LONG last I finally got some time to resume working on this PR!)
I don't have strong feelings here, so I'm fine modeling SRIOV devices as extra attributes of PCI devices.
The only minor question that this would arise is, however, related to GPU devices which sit in their own module. Do we want to absorb GPU devices in the PCI devices someday in the future?

Again, I don't have strong feelings here, but this (perceived?) inconsistency makes me think.
Going to address your comments now, thanks for your review!

@ffromani
Copy link
Collaborator Author

ffromani commented May 1, 2022

at long last, the alternate implementation is available: #315

@ffromani
Copy link
Collaborator Author

we agreed to move forward with #315

@ffromani ffromani closed this Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grab information on VFs for SR-IOV PFs
2 participants