Getting familiar with Auvidea NVIDIA Jetson TX2 NX
Hardware Overview
Jetson TX2 NX module
Specifications:
AI performance | 1.33 TFLOPS |
GPU | NVIDIA Pascal™ Architecture GPU with 256 CUDA cores |
CPU | Dual-core NVIDIA Denver 2 64-bit CPU and quad-core ARM A57 Complex |
RAM | 4GB 128-bit LPDDR4, 1600 MHz - 51.2 GB/s |
Storage | 16GB eMMC 5.1 Flash Storage |
For more details, see the Jetson TX2 NX Module product page.
Auvidea board
The JNX30M carrier board looks like this:
For more details, see the Auvidea product page.
GBEOS
Content
- Linux For Tegra: l4t-32.7.1
- Linux kernel: 4.9.253-l4t-r32.7
- NVIDIA JetPack: 4.6.1
- NVIDIA CUDA: 10.2.300
- k3s: v1.22.6
Using or building AI/ML image containers for the Jetson TX2 NX
Since the GBEOS distribution aligns with Linux for Tegra r32.7.1 and JetPack 4.6.1, the container images provided by NVIDIA can be used at runtime, or as a starting point for building new images. You can find the image catalog at the NVIDIA NGC website.
Compatible images contain the r32.7.1 string in their tag. For example, a compatible TensorFlow 2.7.0 image is l4t-tensorflow:r32.7.1-tf2.7-py3.
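The tag rule above can be checked mechanically before pulling an image. The following is a minimal sketch; the function name l4t_tag_matches is hypothetical and it assumes the release string appears directly after the colon in the tag, as in the example above.

```shell
# Hypothetical helper: succeeds when an image tag carries the given L4T release.
# Assumes tags of the form <name>:<release> or <name>:<release>-<suffix>.
l4t_tag_matches() {
    tag="$1"        # for example l4t-tensorflow:r32.7.1-tf2.7-py3
    release="$2"    # for example r32.7.1
    case "$tag" in
        *:"$release"|*:"$release"-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `l4t_tag_matches "l4t-tensorflow:r32.7.1-tf2.7-py3" r32.7.1 && echo compatible` prints `compatible`, while a tag built for another release does not match.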
Post-installation steps
Root password
The root user has no password. Set a password to protect administrator access to the box. Complete the following steps.
- Open a terminal on your workstation and connect to the box using SSH:
ssh root@<box_ip_address>
- After the connection is established, run the passwd command and enter a new password. Expected output:
New password:******
Retype new password:*****
passwd: password updated successfully
Power mode
The Jetson TX2 NX module supports several power modes. The default power mode provides the highest-performance profile.
The nvpmodel tool sets the preferred power mode.
The default power model is 0. Run the nvpmodel -q command to display the current power model.
NVPM WARN: fan mode is not set!
NV Power Mode: MAXN
0
This model brings the best performance level. All CPU cores are enabled: the four Cortex-A57 cores as well as the two Denver cores, as shown in the output of the lscpu command:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: ARM
Model name: Cortex-A57
Model: 3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Stepping: r1p3
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Denver 2
Model: 0
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
Stepping: 0x0
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Cortex-A57
Model: 3
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 1
Stepping: r1p3
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
L1d: 128 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 2 MiB (2 instances)
Note: Power model 0 sets the maximum frequency of every core to its nominal maximum, so all cores can run at full speed. (You can check the maximum frequency of a core by running cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq.) However, without a cooling fan, this power model may lead to overheating under heavy load.
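The check mentioned in the note can be extended to every core. The sketch below is an illustration rather than part of GBEOS: the function name print_max_freqs is hypothetical, and the directory argument exists only so the function can be pointed at a different location. cpuinfo_max_freq reports kHz, so the value is divided by 1000 to get MHz.

```shell
# Print "<cpu> <max MHz>" for every core found under a sysfs CPU directory.
# The directory defaults to /sys/devices/system/cpu.
print_max_freqs() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
        [ -r "$f" ] || continue
        cpu="${f#"$cpu_root"/}"    # strip the directory prefix
        cpu="${cpu%%/*}"           # keep only "cpuN"
        # cpuinfo_max_freq is reported in kHz
        awk -v cpu="$cpu" '{ printf "%s %.1f\n", cpu, $1 / 1000 }' "$f"
    done
}
```

Running `print_max_freqs` on the box prints one line per core, which makes it easy to compare the caps applied by different power models.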
To set a different power model, run the nvpmodel -m <number> command. For example, to set power model 2, run nvpmodel -m 2.
Power model 2 enables all the cores and balances power consumption by capping the clock frequencies at lower values, as you can see in the lscpu output:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: ARM
Model name: Cortex-A57
Model: 3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Stepping: r1p3
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Denver 2
Model: 0
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
Stepping: 0x0
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Cortex-A57
Model: 3
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 1
Stepping: r1p3
CPU max MHz: 2035.2000
CPU min MHz: 345.6000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
L1d: 128 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 2 MiB (2 instances)
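After switching power models, it can be useful to confirm which one is active from a script. The following sketch parses nvpmodel -q output in the format shown earlier (a "NV Power Mode:" line followed by the model number on the next line). The function name current_power_model is hypothetical, and the parsing assumes that output layout.

```shell
# Read `nvpmodel -q` output on stdin and print "<model number> <mode name>".
# Assumes the model number is on the line right after "NV Power Mode: <name>".
current_power_model() {
    awk '/^NV Power Mode:/ {
        name = $NF
        if ((getline id) > 0) print id, name
    }'
}
```

On the box, `nvpmodel -q | current_power_model` prints `0 MAXN` for the default power model shown above.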
To help investigate the different configurations, you can download and use the following tool:
curl -LOJ https://raw.githubusercontent.com/piaoling199/TX2-notes/master/sources/jetson_clocks.sh && chmod +x jetson_clocks.sh
To visualize the current configuration for the clock frequencies, run ./jetson_clocks.sh --show
Expected output:
SOC family:tegra186 Machine:lanai-3636
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=806400
cpu1: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=345600
cpu2: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=345600
cpu3: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1881600
cpu4: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1420800
cpu5: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=1728000
GPU MinFreq=114750000 MaxFreq=1300500000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=0
Can't access Fan!
This output shows that the minimum, maximum, and current frequencies differ because frequencies are adjusted on demand. This is reasonable for limiting power consumption and therefore heat dissipation (the board operates in a fanless box, as the last output line indicates), but in some cases it can make the performance of certain workloads less deterministic.
Executing the tool without any parameter configures the clocks for best performance. Run the following commands:
./jetson_clocks.sh
./jetson_clocks.sh --show
The expected output shows that the clock frequencies have been set to their nominal maximum values:
SOC family:tegra186 Machine:lanai-3636
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu1: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu2: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu3: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu4: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu5: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Can't access Fan!
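The difference between the two outputs above is that, once the clocks are pinned, MinFreq equals MaxFreq for every CPU core. The sketch below lists the cores for which that is not yet the case. The function name unpinned_cpus is hypothetical, and the parsing assumes the line format shown in the outputs above.

```shell
# Read `jetson_clocks.sh --show` output on stdin and list the CPU cores
# whose MinFreq differs from MaxFreq (cores that are still free to scale).
unpinned_cpus() {
    awk '/^cpu[0-9]+:/ {
        min = ""; max = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^MinFreq=/) min = substr($i, 9)   # text after "MinFreq="
            if ($i ~ /^MaxFreq=/) max = substr($i, 9)   # text after "MaxFreq="
        }
        if (min != max) { sub(/:$/, "", $1); print $1 }
    }'
}
```

Running `./jetson_clocks.sh --show | unpinned_cpus` prints nothing when all CPU cores are pinned to their maximum frequency.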
Kubernetes scheduling optimizations
TX2 NX Specifics
The TX2 NX comes with two core clusters:
- 4 x ARM Cortex-A57 cores
- 2 x NVIDIA Denver cores
Which cores a workload uses can be decided per workload type. A good practice is to use the Cortex-A57 cores for general-purpose workloads and reserve the Denver cores for workloads that require exclusive access to a CPU. To this end, the kernel command line excludes cores 1 and 2 (the Denver cores) from the scheduler by using the kernel parameter isolcpus=1-2. These cores are used only by processes that explicitly request them.
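To confirm the isolation on a running system, the kernel command line can be inspected. This is a minimal sketch: the function name isolated_cpus is hypothetical, and it reads /proc/cmdline unless a command line is passed as an argument.

```shell
# Print the value of the isolcpus= parameter from a kernel command line,
# or nothing if the parameter is absent.
isolated_cpus() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}
```

On the box, `isolated_cpus` is expected to print `1-2`, matching the kernel parameter described above.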
This split maps onto two of the Kubernetes QoS classes:
- BestEffort
- Guaranteed
Kubernetes (K3S) cpu manager configuration
The CPU manager policy is preconfigured to static and assigns cores 0, 3, 4, and 5 to the BestEffort QoS class. Cores 1 and 2 (the Denver cores) are reserved for pods in the Guaranteed QoS class, which get exclusive access to these cores.
Using the Guaranteed QoS class
Check the initial state:
cat /var/lib/kubelet/cpu_manager_state
The output shows that all cores are available to the CPU manager:
{"policyName":"static","defaultCpuSet":"0-5","checksum":1946793818}
Create a pod with the BestEffort QoS class:
cat > l4t-be.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: l4t-be
spec:
  containers:
  - name: l4t
    image: nvcr.io/nvidia/l4t-base:r32.7.1
    command: ["/bin/bash"]
EOF
Apply the pod:
kubectl apply -f l4t-be.yaml
The l4t-be pod is now running in the BestEffort QoS class:
kubectl get po l4t-be -oyaml | grep qosClass
Expected output:
qosClass: BestEffort
All CPU cores remain available to the CPU manager:
cat /var/lib/kubelet/cpu_manager_state
Expected output:
{"policyName":"static","defaultCpuSet":"0-5","checksum":1946793818}
Create a pod with a Guaranteed QoS class:
cat > l4t-st.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: l4t-st
spec:
  containers:
  - name: l4t
    image: nvcr.io/nvidia/l4t-base:r32.7.1
    command: ["/bin/bash"]
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
EOF
Apply the pod:
kubectl apply -f l4t-st.yaml
The l4t-st pod is now running in the Guaranteed QoS class:
kubectl get po l4t-st -oyaml | grep qosClass
qosClass: Guaranteed
The cpu manager has been updated:
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0,2-5","entries":{"2e317c46-9ee8-46c0-a4d6-ef550e54acb2":{"l4t":"1"}},"checksum":710351396}
The available CPU set is now "defaultCpuSet":"0,2-5". Core 1 has been removed from the "defaultCpuSet":"0-5" list that you saw before deploying the l4t-st pod. The "entries":{"2e317c46-9ee8-46c0-a4d6-ef550e54acb2":{"l4t":"1"}} field shows that core 1 has been allocated to the l4t container of the l4t-st pod.
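The shared pool can also be read from the state file in a script. The sketch below extracts the defaultCpuSet value and expands a cpuset string such as 0,2-5 into individual core numbers. Both function names are hypothetical, and the sed expression assumes the single-line JSON layout shown above.

```shell
# Extract the defaultCpuSet value from a cpu_manager_state document on stdin.
default_cpuset() {
    sed -n 's/.*"defaultCpuSet":"\([^"]*\)".*/\1/p'
}

# Expand a cpuset string such as "0,2-5" into one core number per line.
expand_cpuset() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single core ("0") has no upper bound, so reuse the lower one.
        seq "$first" "${last:-$first}"
    done
}
```

For example, `expand_cpuset "$(default_cpuset < /var/lib/kubelet/cpu_manager_state)"` lists the cores still in the shared pool. With the state shown above, core 1 is missing from that list.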