NVIDIA NetQ Datasheet


Datasheet

NVIDIA NetQ

Holistic and real-time visibility, troubleshooting, and monitoring.

As cloud-scale networking becomes the enterprise norm, so does complexity. Network operators must manage constant change across multiple network layers, and polling-based legacy network management tools simply cannot adapt.

Network operators often struggle with operational challenges such as disruptions caused by maintenance and configuration changes, and even simple misconfigurations can significantly increase their workload. Furthermore, business networks are usually large and complex, so the tasks a network administrator must perform can quickly overwhelm manual efforts. Addressing these challenges requires a shift to both modern networking and modern operational tools.

NVIDIA® NetQ is a highly scalable, modern network operations tool that provides actionable visibility for the NVIDIA Spectrum Ethernet platform, including NVIDIA® Cumulus® Linux and SONiC (software for open networking in the cloud), as well as NVIDIA data processing units (DPUs).

NetQ is built to accelerate NVIDIA platforms, including NVIDIA EGX, NVIDIA DGX POD and NVIDIA OVX SuperPOD, and AI solution stacks such as NVIDIA AI Enterprise and NVIDIA LaunchPad. NetQ uses fabric-wide telemetry data to provide real-time visibility into, and troubleshooting of, the overlay and underlay networks, delivering the following benefits:

> Network-outage prevention through validation and functional testing with network continuous integration/continuous delivery (CI/CD), in conjunction with the NVIDIA Air platform.
> Rapid root-cause detection using network telemetry data, including NVIDIA What Just Happened® (WJH) data from NVIDIA switches, reducing mean time to innocence.
> Fabric-wide latency and buffer occupancy analysis across all paths of a 4-tuple or 5-tuple flow to identify congestion points that affect application performance.
> A network-wide telemetry database, accessible through the GUI, CLI, API, and plug-ins (such as Grafana), to optimize network usage.
> Multiple event notification integrations (Syslog, PagerDuty, Slack, email, and generic webhook).

Key Features

> Validations
> Trace
> WJH
> Flow telemetry analysis
> RoCE monitoring
> Events and threshold-crossing alerts
> Notification channels
> Software upgrade management
> Snapshot and compare
> Topology
> Microservices architecture
> High-availability clustering
> APIs for integration

Proof Points

> Simplifies scaling of Cumulus Linux
> Reduces mean time to innocence
> Reduces opex
> Cuts downtime
> Increases productivity
> Simplifies upgrades
> Reduces security risks
> Maximizes value of network infrastructure

NVIDIA NetQ | Datasheet | 1

As part of the NVIDIA Spectrum platform, NetQ is tested and validated with NVIDIA’s full portfolio of Ethernet networking technology, including NVIDIA BlueField® DPUs. An end-to-end, switch-to-host solution, NetQ is critical for powering accelerated workloads and delivers the high performance and innovative feature set needed to supercharge cloud-native applications at scale.

[Figure 1 diagram: NetQ Agents on NVIDIA switches and hosts (Cumulus Linux, SONiC, and DOCA on DPUs) send telemetry to the NVIDIA NetQ Server (On-Premise Telemetry Aggregator (OPTA), analytics applications and messaging system, NoSQL database, API gateway, and notification system), which integrates with orchestration and operations tools: PagerDuty, Syslog, Slack, email, webhook, REST API, Grafana, CLI, and GUI.]

Figure 1: NetQ real-time telemetry data collection and deep analytics.

Protect Network Integrity With Validations and CI/CD

Network configuration changes and software upgrades can generate numerous trouble tickets because they often cannot be tested before they are deployed in production. When problems occur, large amounts of data are collected and stored across multiple tools, which makes it difficult to correlate events and resolve issues. NetQ can be used as the functional test platform for network CI/CD in conjunction with NVIDIA Air.

Customers benefit from testing new configurations with NetQ in the NVIDIA Air environment (the “digital twin”) and fixing errors before deploying them to the production network (the “physical twin”). In the physical production network, NetQ validations provide insight into the live state of the network, shorten troubleshooting times, and prevent network issues such as MTU mismatches, VLAN misconfigurations, and more.
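To make the MTU mismatch example concrete, the following Python sketch shows the kind of cross-device consistency check a validation performs: it compares the MTU configured on both ends of every link and flags links whose ends disagree. This is not NetQ's implementation or API; the `Link` type, the `find_mtu_mismatches` function, and the hostnames, interface names, and MTU values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A point-to-point link between two interfaces (hypothetical data model)."""
    a_host: str
    a_port: str
    b_host: str
    b_port: str


def find_mtu_mismatches(links, mtu_by_port):
    """Return (link, a_mtu, b_mtu) for every link whose endpoints disagree.

    mtu_by_port maps (hostname, interface) -> configured MTU.
    """
    mismatches = []
    for link in links:
        a_mtu = mtu_by_port.get((link.a_host, link.a_port))
        b_mtu = mtu_by_port.get((link.b_host, link.b_port))
        # Skip links with an unreported endpoint instead of guessing a default.
        if a_mtu is None or b_mtu is None:
            continue
        if a_mtu != b_mtu:
            mismatches.append((link, a_mtu, b_mtu))
    return mismatches


# Hypothetical two-uplink leaf with one misconfigured spine port.
links = [
    Link("leaf01", "swp51", "spine01", "swp1"),
    Link("leaf01", "swp52", "spine02", "swp1"),
]
mtus = {
    ("leaf01", "swp51"): 9216, ("spine01", "swp1"): 9216,
    ("leaf01", "swp52"): 9216, ("spine02", "swp1"): 1500,
}

for link, a, b in find_mtu_mismatches(links, mtus):
    print(f"{link.a_host}:{link.a_port} ({a}) <-> {link.b_host}:{link.b_port} ({b})")
# leaf01:swp52 (9216) <-> spine02:swp1 (1500)
```

Skipping endpoints with no reported MTU keeps the check from raising false alarms on incomplete data; a real validation would more likely report those endpoints separately.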

Rapid Root Cause Detection

NetQ greatly reduces mean time to innocence by pinpointing and isolating faults caused by network state changes. Working hand in hand with Cumulus Linux and SONiC, NetQ enables organizations to validate network state, both during regular operations and for post-mortem diagnostic analysis. NetQ provides both a CLI, for on-box interaction during troubleshooting, and a robust GUI that presents the network visually as a high-level dashboard.

The NetQ trace capability verifies paths, and NetQ uses the resulting information to discover misconfigurations along all hops simultaneously, speeding time to resolution. NetQ trace lets users view all of the paths between devices to find potential problems.
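As a rough illustration of viewing all of the paths between two devices, the sketch below enumerates every minimum-hop path in a topology graph using breadth-first search, a simple model of the equal-cost paths found in a leaf-spine fabric. It walks only a hypothetical adjacency map; it does not consult forwarding tables or reproduce how NetQ trace works internally, and `all_shortest_paths` and the device names are illustrative.

```python
from collections import deque


def all_shortest_paths(adjacency, src, dst):
    """Enumerate every minimum-hop path from src to dst.

    adjacency maps each device name to an iterable of neighbor names.
    Returns a list of paths, each a list of device names.
    """
    # BFS records each node's hop distance and all predecessors on a shortest path.
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                preds[nbr].append(node)
    if dst not in dist:
        return []

    # Walk the predecessor lists backward from dst to rebuild each path.
    paths = []

    def backtrack(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for prev in preds[node]:
            backtrack(prev, [node] + suffix)

    backtrack(dst, [])
    return paths


# Hypothetical two-leaf, two-spine fabric.
topology = {
    "leaf01": ["spine01", "spine02"],
    "leaf02": ["spine01", "spine02"],
    "spine01": ["leaf01", "leaf02"],
    "spine02": ["leaf01", "leaf02"],
}

for path in all_shortest_paths(topology, "leaf01", "leaf02"):
    print(" -> ".join(path))
# leaf01 -> spine01 -> leaf02
# leaf01 -> spine02 -> leaf02
```

Keeping every predecessor at the same BFS depth, rather than just the first one found, is what lets the sketch report all equal-cost paths instead of a single route.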


NetQ Agents running on switches and hosts monitor events in real time, such as changes to interface state, BGP neighbors, MAC addresses, and routes, providing a single source of truth for events across the data center. These events can be viewed in the NetQ CLI and GUI and forwarded to multiple third-party notification services, such as PagerDuty or Slack.

Deploy Reliable Networking with WJH and Flow Telemetry

WJH is a hardware-accelerated telemetry feature available on NVIDIA Spectrum switches that streams detailed, contextual telemetry data for analysis. WJH provides real-time visibility into problems in the network, such as hardware packet drops caused by misconfigurations, buffer congestion, ACLs, or layer 1 problems.

NetQ collects WJH telemetry data from the switches and extends its GUI and CLI functionality to WJH. When WJH capabilities are combined with NetQ, packet drops can be identified anywhere in the fabric, improving network reliability by:

> Viewing current or historical drop information, including the reason for each drop.
> Identifying problematic flows or endpoints and pinpointing exactly where communication is failing in the network.
> Including contextual WJH drop information in NetQ trace output.

The gRPC Network Management Interface (gNMI) can also be used to collect WJH data from the NetQ Agent.
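The drop analysis described above amounts to grouping drop records by location and reason. The Python sketch below assumes a hypothetical `DropEvent` record (it is not the WJH or NetQ data schema, and the reason strings are illustrative rather than actual WJH drop reasons) and shows how counting drops per switch, port, and reason points to where communication is failing and which endpoints are affected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DropEvent:
    """One drop record (hypothetical fields, not the WJH/NetQ schema)."""
    switch: str
    port: str
    reason: str
    src_ip: str
    dst_ip: str


def drop_hotspots(events, top_n=3):
    """Count drops per (switch, port, reason) and return the most frequent."""
    counts = Counter((e.switch, e.port, e.reason) for e in events)
    return counts.most_common(top_n)


def affected_flows(events, switch):
    """Return sorted, distinct (src_ip, dst_ip) pairs dropped on one switch."""
    return sorted({(e.src_ip, e.dst_ip) for e in events if e.switch == switch})


# Illustrative records only.
events = [
    DropEvent("leaf01", "swp1", "buffer congestion", "10.1.1.10", "10.1.2.20"),
    DropEvent("leaf01", "swp1", "buffer congestion", "10.1.1.11", "10.1.2.20"),
    DropEvent("spine01", "swp3", "ACL deny", "10.1.1.10", "10.1.3.30"),
]

for (switch, port, reason), count in drop_hotspots(events):
    print(f"{switch}:{port} {reason} drops={count}")
# leaf01:swp1 buffer congestion drops=2
# spine01:swp3 ACL deny drops=1

print(affected_flows(events, "leaf01"))
# [('10.1.1.10', '10.1.2.20'), ('10.1.1.11', '10.1.2.20')]
```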

WJH is always on, detecting packet drops, latency, and congestion events; flow telemetry, by contrast, provides on-demand analysis of specific application flows. NetQ, working with Cumulus Linux, samples packets matching a 4-tuple or 5-tuple application flow, then analyzes and reports per-switch latency (minimum, maximum, and average) and buffer occupancy along the flow's path. The NetQ GUI reports all possible paths, the paths in use, and per-path details, showing the minimum, maximum, and average latency and the buffer occupancy on each switch.
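The per-switch figures described above can be illustrated with a short Python sketch that summarizes sampled packets of one flow by switch (minimum, maximum, and average latency plus peak buffer occupancy) and picks the hop with the highest average latency as a candidate congestion point. The `Sample` record, its units, and the heuristic in `likely_congestion_point` are assumptions for illustration, not NetQ's data model or algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One sampled packet of the flow as seen at a switch (hypothetical format)."""
    switch: str
    latency_ns: int
    buffer_bytes: int


def per_switch_stats(samples):
    """Summarize latency (min/max/avg) and peak buffer occupancy per switch."""
    by_switch = defaultdict(list)
    for s in samples:
        by_switch[s.switch].append(s)
    stats = {}
    for switch, group in by_switch.items():
        latencies = [s.latency_ns for s in group]
        stats[switch] = {
            "min_ns": min(latencies),
            "max_ns": max(latencies),
            "avg_ns": mean(latencies),
            "max_buffer_bytes": max(s.buffer_bytes for s in group),
        }
    return stats


def likely_congestion_point(stats):
    """Return the switch with the highest average latency on the path."""
    return max(stats, key=lambda sw: stats[sw]["avg_ns"])


# Illustrative samples for a three-hop path.
samples = [
    Sample("leaf01", 900, 2_048),
    Sample("leaf01", 1_100, 4_096),
    Sample("spine02", 5_000, 65_536),
    Sample("spine02", 7_000, 131_072),
    Sample("leaf02", 1_000, 1_024),
]

stats = per_switch_stats(samples)
print(likely_congestion_point(stats))  # spine02
```

Ranking hops by average rather than maximum latency keeps a single outlier sample from dominating; either choice is a heuristic.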

By combining WJH with flow telemetry analysis, network operators can proactively identify the root cause of server and application issues and inform the server or application administrator about a possible outage or performance impact.

NetQ Components and Deployment Options

NetQ Components

> NetQ Agents run on Cumulus Linux and SONiC switches and on other certified Linux systems, such as Ubuntu®, Red Hat®, and CentOS hosts. NetQ Agents capture network data and other state information in real time and transmit the data to the NetQ Server.
> NetQ Server consists of the telemetry data collection software (the on-premises telemetry aggregator, or OPTA), data analytics applications, and the database. The NetQ applications and database can be deployed on premises or consumed as a cloud-based service.


NetQ on Customer Premises

In this deployment option, all NetQ components are deployed on customer premises. Deployments can span a single site or multiple sites.

> Single-site deployment: NetQ Agents running on switches and hosts collect data and transmit it to the NetQ OPTA, which hosts the NetQ applications and database.
> Multi-site deployment: The NetQ Agents at each site collect data from the switches and hosts and transmit it to the local OPTA. The OPTAs then transmit the data to a common NetQ applications server for processing and storage.

For high availability, the OPTAs and the applications with their storage can be deployed as clusters.

NetQ as a Cloud Service

NetQ as a cloud service is similar to the multi-site deployment: the OPTAs run on premises at the customer site and connect securely to the multi-tenant NetQ cloud service operated and maintained by NVIDIA.

Ready to Get Started?

To learn more about NetQ, visit:

/en-us/networking/ethernet-switching/netq

© 2023 NVIDIA Corporation and affiliates. All rights reserved. NVIDIA, the NVIDIA logo, NetQ, Spectrum, and Cumulus Linux are trademarks and/or registered trademarks of NVIDIA Corporation and affiliates in the U.S. and other countries. All other trademarks and copyrights are the property of their respective owners. 2705480. MAR23

