Getting familiar with Auvidea NVIDIA Jetson TX2 NX

Hardware Overview

Jetson TX2 NX module

Jetson TX2 NX Module

Specifications:

AI performance: 1.33 TFLOPS
GPU: NVIDIA Pascal™ architecture GPU with 256 CUDA cores
CPU: Dual-core NVIDIA Denver 2 64-bit CPU and quad-core ARM Cortex-A57 complex
RAM: 4 GB 128-bit LPDDR4, 1600 MHz, 51.2 GB/s
Storage: 16 GB eMMC 5.1 flash storage

For more details, see the Jetson TX2 NX Module product page.

Auvidea board

The JNX30M carrier board looks like this:

JNX30M carrier board

For more details, see the Auvidea product page.

GBEOS

Content

  • Linux For Tegra: l4t-32.7.1
  • Linux kernel: 4.9.253-l4t-r32.7
  • NVIDIA JetPack: 4.6.1
  • NVIDIA CUDA: 10.2.300
  • k3s: v1.22.6

Using or building AI/ML image containers for the Jetson TX2 NX

Since the GBEOS distribution aligns with Linux for Tegra r32.7.1 and JetPack 4.6.1, the container images provided by NVIDIA can be used at runtime, or as a starting point for building new images. You can find the image catalog at the NVIDIA NGC website.

Compatible images contain the r32.7.1 string in their tag. For example, a compatible TensorFlow 2.7.0 image is l4t-tensorflow:r32.7.1-tf2.7-py3.
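The tag rule above can be checked in a script before pulling an image. This is only a minimal sketch: the helper name is_l4t_compatible is hypothetical, and the nvcr.io/nvidia/ registry prefix is assumed from the pod manifests used later in this guide.

```shell
# Release string that GBEOS aligns with (from the Content list above).
L4T_RELEASE="r32.7.1"

# Hypothetical helper: succeeds when the image tag is the given L4T release
# or starts with "<release>-".
is_l4t_compatible() {
    image="$1"
    release="$2"
    tag="${image##*:}"              # keep what follows the last ':'
    case "$tag" in
        "$release"|"$release"-*) return 0 ;;
        *) return 1 ;;
    esac
}

image="nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3"
if is_l4t_compatible "$image" "$L4T_RELEASE"; then
    echo "compatible: $image"
else
    echo "not compatible with $L4T_RELEASE: $image"
fi
```

The check is deliberately strict (the tag must begin with the release string), which is narrower than "contains" but avoids accidental matches.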

Post-installation steps

Root password

The root user has no password. Set a password to protect administrator access to the box. Complete the following steps.

  1. Open a terminal on your workstation and connect to the box using SSH:

    ssh root@<box_ip_address>
    
  2. After the connection is established, run the passwd command, and enter a new password.

    Expected output:

    New password:******
    Retype new password:*****
    passwd: password updated successfully
    

Power mode

The Jetson TX2 NX module supports several power modes. The default power mode provides the maximum-performance profile.

The nvpmodel tool sets the preferred power mode.

The default power model is 0. Run the nvpmodel -q command to display the current power model.

NVPM WARN: fan mode is not set!
NV Power Mode: MAXN
0
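If the power model needs to be checked from a script, the numeric ID can be extracted from that output. The sketch below assumes the output format shown above (the ID alone on its last line); the helper name power_model_id is hypothetical.

```shell
# Hypothetical helper: read `nvpmodel -q` output on stdin and print the
# numeric model ID (the last line made only of digits).
power_model_id() {
    grep -E '^[0-9]+$' | tail -n 1
}

# Only runs on a device where nvpmodel is installed.
if command -v nvpmodel >/dev/null 2>&1; then
    id="$(nvpmodel -q 2>/dev/null | power_model_id)"
    if [ "$id" = "0" ]; then
        echo "Power model 0 (MAXN) is active"
    else
        echo "Power model ${id:-unknown} is active"
    fi
fi
```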

This model delivers the highest performance level. All CPU cores are enabled: the four Cortex-A57 cores and the two Denver 2 cores, as shown in the output of the lscpu command:

Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 6
  On-line CPU(s) list:  0-5
Vendor ID:              ARM
  Model name:           Cortex-A57
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s):          1
    Stepping:           r1p3
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Denver 2
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           0x0
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Cortex-A57
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 3
    Socket(s):          1
    Stepping:           r1p3
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
  L1d:                  128 KiB (6 instances)
  L1i:                  192 KiB (6 instances)
  L2:                   2 MiB (2 instances)

Note: Power model 0 sets the maximum frequency of every core to its nominal maximum, so all cores can run at full speed. (To check the maximum frequency of a core, run cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq.) However, without a cooling fan, this power model may lead to overheating under heavy load.
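To check every core rather than cpu0 only, the same sysfs file can be read in a loop. This is a small sketch; cpufreq reports values in kHz, and the helper name khz_to_mhz is hypothetical.

```shell
# Convert a cpufreq value in kHz to MHz with one decimal (2035200 -> 2035.2).
khz_to_mhz() {
    awk -v khz="$1" 'BEGIN { printf "%.1f\n", khz / 1000 }'
}

# Print the maximum frequency of each core that exposes cpufreq.
for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
    [ -r "$f" ] || continue                 # skip when cpufreq is not exposed
    cpu="${f#/sys/devices/system/cpu/}"     # strip the sysfs prefix
    cpu="${cpu%%/*}"                        # keep only "cpuN"
    echo "$cpu: $(khz_to_mhz "$(cat "$f")") MHz"
done
```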

To set a different power model, run the nvpmodel -m <number> command. For example, to set power model 2, run nvpmodel -m 2.

Power model 2 also enables all the cores, but balances power consumption by capping the clock frequencies at lower values, as you can see in the lscpu output:

Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 6
  On-line CPU(s) list:  0-5
Vendor ID:              ARM
  Model name:           Cortex-A57
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s):          1
    Stepping:           r1p3
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Denver 2
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           0x0
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Cortex-A57
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 3
    Socket(s):          1
    Stepping:           r1p3
    CPU max MHz:        2035.2000
    CPU min MHz:        345.6000
    BogoMIPS:           62.50
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
  L1d:                  128 KiB (6 instances)
  L1i:                  192 KiB (6 instances)
  L2:                   2 MiB (2 instances)

To help investigate the different configurations, you can download and use the following tool:

curl -LOJ https://raw.githubusercontent.com/piaoling199/TX2-notes/master/sources/jetson_clocks.sh && chmod +x jetson_clocks.sh

To display the current clock frequency configuration, run ./jetson_clocks.sh --show.

Expected output:

SOC family:tegra186  Machine:lanai-3636
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=806400
cpu1: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=345600
cpu2: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=345600
cpu3: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1881600
cpu4: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1420800
cpu5: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1728000
GPU MinFreq=114750000 MaxFreq=1300500000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=0
Can't access Fan!

The minimum, maximum, and current frequencies differ because the frequencies are adjusted on demand. This is reasonable for limiting power consumption and heat dissipation, given that the board operates in a fanless box (as the last output line indicates), but it can make the performance of some workloads less deterministic.
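To spot the cores that are currently running below their maximum, the --show output can be filtered. The sketch below assumes the line format shown above (cpuN: followed by key=value pairs); the helper name cpus_below_max is hypothetical.

```shell
# Hypothetical helper: read `jetson_clocks.sh --show` output on stdin and
# print the CPUs whose CurrentFreq is below MaxFreq.
cpus_below_max() {
    awk '/^cpu[0-9]+:/ {
        max = ""; cur = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "MaxFreq")     max = kv[2]
            if (kv[1] == "CurrentFreq") cur = kv[2]
        }
        if (max != "" && cur != "" && cur + 0 < max + 0) {
            sub(":", "", $1)            # "cpu0:" -> "cpu0"
            print $1
        }
    }'
}

# Only runs where the script has been downloaded.
if [ -x ./jetson_clocks.sh ]; then
    ./jetson_clocks.sh --show | cpus_below_max
fi
```

An empty result means every core is already running at its maximum frequency.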

Executing the tool without any parameter configures the clocks for best performance. Run the following commands:

./jetson_clocks.sh
./jetson_clocks.sh --show

The expected output shows that the CPU and GPU minimum, maximum, and current frequencies are now all set to their nominal maximum values:

SOC family:tegra186  Machine:lanai-3636
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu1: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu2: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu3: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu4: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu5: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Can't access Fan!

Kubernetes scheduling optimizations

TX2 NX Specifics

The TX2 NX comes with two core clusters:

  • 4 x ARM Cortex-A57 cores
  • 2 x NVIDIA Denver 2 cores

Which cores to use can be decided per workload type. One good practice is to use the Cortex-A57 cores for general-purpose workloads and to reserve the Denver cores for workloads that require exclusive access to a CPU. To this end, the kernel command line excludes cores 1 and 2 (the Denver cores) from the general scheduler through the isolcpus=1-2 kernel parameter. These cores are then used only by processes that explicitly request them.
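The isolation setting can be confirmed on a running box by reading the kernel command line. This is a minimal sketch; the helper name isolcpus_value is hypothetical.

```shell
# Hypothetical helper: print the value of isolcpus= found in a kernel
# command line string; fails when the parameter is absent.
isolcpus_value() {
    for arg in $1; do                   # split the command line on whitespace
        case "$arg" in
            isolcpus=*) echo "${arg#isolcpus=}"; return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ]; then
    isolated="$(isolcpus_value "$(cat /proc/cmdline)")" || isolated=""
    echo "Isolated cores: ${isolated:-none}"
fi
```

Outside Kubernetes, a process can be placed on an isolated core explicitly, for example with taskset -c 1 <command>, assuming the taskset utility from util-linux is installed on the box.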

This maps onto the following Kubernetes QoS classes:

  • BestEffort
  • Guaranteed

Kubernetes (k3s) CPU manager configuration

The CPU manager policy is preconfigured to static and assigns cores 0, 3, 4, and 5 to the BestEffort QoS class. Cores 1 and 2 (the Denver cores) are reserved for pods in the Guaranteed QoS class, which therefore get exclusive access to these cores.

Using the Guaranteed QoS class

Check the initial state:

cat /var/lib/kubelet/cpu_manager_state

The output shows that all cores are available to the CPU manager:

{"policyName":"static","defaultCpuSet":"0-5","checksum":1946793818}

Create a pod with the BestEffort QoS class:

cat > l4t-be.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: l4t-be
spec:
  containers:
    - name: l4t
      image: nvcr.io/nvidia/l4t-base:r32.7.1
      # bash exits at once without a TTY, so sleep keeps the container running
      command: ["/bin/bash", "-c", "sleep infinity"]
EOF

Apply the pod:

kubectl apply -f l4t-be.yaml

The l4t-be pod is now running in the BestEffort QoS class:

kubectl get po l4t-be -oyaml | grep qosClass

Expected output:

  qosClass: BestEffort

All CPU cores remain available to the CPU manager:

cat /var/lib/kubelet/cpu_manager_state

Expected output:

{"policyName":"static","defaultCpuSet":"0-5","checksum":1946793818}

Create a pod with a Guaranteed QoS class:

cat > l4t-st.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: l4t-st
spec:
  containers:
    - name: l4t
      image: nvcr.io/nvidia/l4t-base:r32.7.1
      # bash exits at once without a TTY, so sleep keeps the container running
      command: ["/bin/bash", "-c", "sleep infinity"]
      resources:
        limits:
          cpu: 1
          memory: 1Gi
        requests:
          cpu: 1
          memory: 1Gi
EOF

Apply the pod:

kubectl apply -f l4t-st.yaml

The l4t-st pod is now running in the Guaranteed QoS class:

kubectl get po l4t-st -oyaml | grep qosClass

Expected output:

  qosClass: Guaranteed

The CPU manager state has been updated:

cat /var/lib/kubelet/cpu_manager_state

Expected output:

{"policyName":"static","defaultCpuSet":"0,2-5","entries":{"2e317c46-9ee8-46c0-a4d6-ef550e54acb2":{"l4t":"1"}},"checksum":710351396}

The available cpuset is now "defaultCpuSet":"0,2-5": core 1 has been removed from the "defaultCpuSet":"0-5" list seen before deploying the l4t-st pod. The "entries":{"2e317c46-9ee8-46c0-a4d6-ef550e54acb2":{"l4t":"1"}} field shows that core 1 has been allocated to the l4t container of the l4t-st pod.
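To follow these changes from a script, the shared pool can be extracted from the state file and expanded into individual core numbers. This is a rough sketch that relies on the single-line JSON layout shown above and on text tools only; the helper names default_cpuset and expand_cpuset are hypothetical.

```shell
# Hypothetical helper: read the CPU manager state JSON on stdin and print
# the value of defaultCpuSet.
default_cpuset() {
    sed -n 's/.*"defaultCpuSet":"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: expand a cpuset such as "0,2-5" into "0 2 3 4 5".
expand_cpuset() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        seq "$first" "${last:-$first}"  # a single core is a range of one
    done | tr '\n' ' ' | sed 's/ $//'
}

state=/var/lib/kubelet/cpu_manager_state
if [ -r "$state" ]; then
    shared="$(default_cpuset < "$state")"
    echo "Shared pool: $(expand_cpuset "$shared")"
fi
```

With the state shown above, the shared pool would contain cores 0, 2, 3, 4, and 5, confirming that core 1 is no longer available to other pods.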

References