Node tuning
Manage node-level tuning with the Node Tuning Operator.
Creating a simple TuneD profile for setting sysctl settings
To set node-level tuning on the nodes in your hosted cluster, you can use the Node Tuning Operator. In HyperShift, node tuning is configured by creating ConfigMaps that contain Tuned objects and referencing those ConfigMaps in your NodePools.
-
Create a ConfigMap which contains a valid Tuned manifest and reference it in a NodePool. The example Tuned manifest below defines a profile which sets vm.dirty_ratio to 55 on Nodes which contain the Node label tuned-1-node-label with any value.

Save the ConfigMap manifest in a file called tuned-1.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-1
  namespace: clusters
data:
  tuning: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: tuned-1
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Custom OpenShift profile
          include=openshift-node
          [sysctl]
          vm.dirty_ratio="55"
        name: tuned-1-profile
      recommend:
      - priority: 20
        profile: tuned-1-profile
NOTE: If no labels are added to an entry in the spec.recommend section of the Tuned spec, NodePool-based matching is assumed, so the highest-priority profile in the spec.recommend section is applied to Nodes in the pool. More fine-grained Node-label-based matching is still possible by setting a label value in Tuned.spec.recommend.match, but be aware that Node labels will not persist during an upgrade unless NodePool.spec.management.upgradeType is set to InPlace.

Create the ConfigMap in the management cluster:
oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-1.yaml
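Optionally, you can confirm that the ConfigMap exists in the management cluster before referencing it from a NodePool; for example:

oc --kubeconfig="$MGMT_KUBECONFIG" get configmap tuned-1 -n clusters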
Reference the ConfigMap in the NodePool's spec.tuningConfig field, either by editing an existing NodePool or creating a new NodePool. In this example we assume there is only one NodePool, called nodepool-1, containing 2 Nodes.

apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
  ...
  name: nodepool-1
  namespace: clusters
...
spec:
  ...
  tuningConfig:
  - name: tuned-1
status:
...
NOTE: You may reference the same ConfigMap in multiple NodePools. In HyperShift, the Node Tuning Operator (NTO) appends a hash of the NodePool name and namespace to the name of the Tuned objects to distinguish them. Outside of this case, be careful not to create multiple TuneD profiles of the same name in different Tuned objects for the same hosted cluster.
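You can verify which tuning ConfigMaps a NodePool references by reading its spec.tuningConfig field directly; a minimal example, assuming the nodepool-1 NodePool in the clusters namespace shown above:

oc --kubeconfig="$MGMT_KUBECONFIG" get nodepool nodepool-1 -n clusters -o jsonpath='{.spec.tuningConfig}'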
-
Now that the ConfigMap containing a Tuned manifest has been created and referenced in a NodePool, the Node Tuning Operator will sync the Tuned objects into the hosted cluster. You can check which Tuneds are defined and which profiles are set for each Node.
List the Tuned objects in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Tuneds -n openshift-cluster-node-tuning-operator
Example output:
NAME       AGE
default    7m36s
rendered   7m36s
tuned-1    65s
List the Profiles in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Profiles -n openshift-cluster-node-tuning-operator
Example output:
NAME                  TUNED             APPLIED   DEGRADED   AGE
nodepool-1-worker-1   tuned-1-profile   True      False      7m43s
nodepool-1-worker-2   tuned-1-profile   True      False      7m14s
As we can see, both worker nodes in the NodePool have the tuned-1-profile applied. Note that if no custom profiles are created, the openshift-node profile will be applied by default.
-
To confirm the tuning was applied correctly, we can start a debug shell on a Node and check the sysctl values:
oc --kubeconfig="$HC_KUBECONFIG" debug node/nodepool-1-worker-1 -- chroot /host sysctl vm.dirty_ratio
Example output:
vm.dirty_ratio = 55
Applying tuning which requires kernel boot parameters
You can also use the Node Tuning Operator for more complex tuning that requires setting kernel boot parameters. As an example, the following steps create a NodePool with huge pages reserved.
-
Create the following ConfigMap, which contains a Tuned object manifest for creating 50 hugepages of size 2M.

Save this ConfigMap manifest in a file called tuned-hugepages.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-hugepages
  namespace: clusters
data:
  tuning: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: hugepages
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Boot time configuration for hugepages
          include=openshift-node
          [bootloader]
          cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50
        name: openshift-node-hugepages
      recommend:
      - priority: 20
        profile: openshift-node-hugepages
NOTE: The .spec.recommend.match field is intentionally left blank. In this case, this Tuned object is applied to all Nodes in the NodePool where this ConfigMap is referenced. It is advised to group Nodes with the same hardware configuration into the same NodePool; otherwise, TuneD operands might calculate conflicting kernel parameters for two or more nodes sharing the same NodePool.

Create the ConfigMap in the management cluster:
oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-hugepages.yaml
-
Create a new NodePool manifest YAML file, customize the NodePool's upgrade type, and reference the previously created ConfigMap in the spec.tuningConfig section before creating it in the management cluster.

Create the NodePool manifest and save it in a file called hugepages-nodepool.yaml:

NODEPOOL_NAME=hugepages-nodepool
INSTANCE_TYPE=m5.2xlarge
NODEPOOL_REPLICAS=2

hypershift create nodepool aws \
  --cluster-name $CLUSTER_NAME \
  --name $NODEPOOL_NAME \
  --node-count $NODEPOOL_REPLICAS \
  --instance-type $INSTANCE_TYPE \
  --render > hugepages-nodepool.yaml
Edit hugepages-nodepool.yaml. Set .spec.management.upgradeType to InPlace, and set .spec.tuningConfig to reference the tuned-hugepages ConfigMap you created.

apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
  name: hugepages-nodepool
  namespace: clusters
...
spec:
  management:
    ...
    upgradeType: InPlace
  ...
  tuningConfig:
  - name: tuned-hugepages
NOTE: Setting .spec.management.upgradeType to InPlace is recommended to avoid unnecessary Node recreations when applying the new MachineConfigs. With the Replace upgrade type, Nodes will be fully deleted and new nodes will replace them when applying the new kernel boot parameters that are calculated by the TuneD operand.

Create the NodePool in the management cluster:
oc --kubeconfig="$MGMT_KUBECONFIG" create -f hugepages-nodepool.yaml
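You can watch the new NodePool from the management cluster while its Nodes are provisioned; a minimal example, assuming the hugepages-nodepool NodePool in the clusters namespace:

oc --kubeconfig="$MGMT_KUBECONFIG" get nodepool hugepages-nodepool -n clusters -w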
-
After the Nodes become available, the containerized TuneD daemon will calculate the required kernel boot parameters based on the applied TuneD profile. After the Nodes become Ready and reboot once to apply the generated MachineConfig, you can verify that the Tuned profile is applied and that the kernel boot parameters have been set.

List the Tuned objects in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Tuneds -n openshift-cluster-node-tuning-operator
Example output:
NAME                 AGE
default              123m
hugepages-8dfb1fed   1m23s
rendered             123m
List the Profiles in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Profiles -n openshift-cluster-node-tuning-operator
Example output:
NAME                          TUNED                      APPLIED   DEGRADED   AGE
nodepool-1-worker-1           openshift-node             True      False      132m
nodepool-1-worker-2           openshift-node             True      False      131m
hugepages-nodepool-worker-1   openshift-node-hugepages   True      False      4m8s
hugepages-nodepool-worker-2   openshift-node-hugepages   True      False      3m57s
Both worker nodes in the new NodePool have the openshift-node-hugepages profile applied.
-
To confirm the tuning was applied correctly, we can start a debug shell on a Node in the new NodePool and check /proc/cmdline:
oc --kubeconfig="$HC_KUBECONFIG" debug node/hugepages-nodepool-worker-1 -- chroot /host cat /proc/cmdline
Example output:
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-... hugepagesz=2M hugepages=50
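As an additional, optional check (not part of the procedure above), you can verify that the kernel actually reserved the huge pages by inspecting /proc/meminfo on the same Node; with hugepagesz=2M and hugepages=50, HugePages_Total should report 50:

oc --kubeconfig="$HC_KUBECONFIG" debug node/hugepages-nodepool-worker-1 -- chroot /host grep -i HugePages /proc/meminfo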
How to debug Node Tuning issues
If you face issues with Node Tuning, first check the ValidTuningConfig condition in the NodePool that references your Tuned config. This condition reports any issue that may prevent the configuration from being loaded.
- lastTransitionTime: "2023-03-06T14:30:35Z"
  message: ConfigMap "tuned" not found
  observedGeneration: 2
  reason: ValidationFailed
  status: "False"
  type: ValidTuningConfig
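One way to inspect this condition from the management cluster (assuming the nodepool-1 NodePool in the clusters namespace used earlier) is with a jsonpath filter:

oc --kubeconfig="$MGMT_KUBECONFIG" get nodepool nodepool-1 -n clusters -o jsonpath='{.status.conditions[?(@.type=="ValidTuningConfig")]}'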
If the NodePool condition shows no issues, the configuration has been loaded and propagated to the NodePool. You can then check the status of the relevant Profile custom resource in your HostedCluster. Its conditions show whether the configuration has been applied successfully and whether there are any outstanding warnings or errors. An example can be seen below.
status:
  bootcmdline: ""
  conditions:
  - lastTransitionTime: "2023-03-06T14:22:14Z"
    message: The TuneD daemon profile not yet applied, or application failed.
    reason: Failed
    status: "False"
    type: Applied
  - lastTransitionTime: "2023-03-06T14:22:14Z"
    message: 'TuneD daemon issued one or more error message(s) during profile application.
      TuneD stderr: ERROR tuned.daemon.controller: Failed to reload TuneD: Cannot
      load profile(s) ''tuned-1-profile'': Cannot find profile ''openshift-node-notexistin''
      in ''[''/etc/tuned'', ''/usr/lib/tuned'']''.'
    reason: TunedError
    status: "True"
    type: Degraded
  tunedProfile: tuned-1-profile
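To retrieve the full Profile status for a specific Node, you can fetch its Profile object from the hosted cluster; a minimal example, assuming a Node named nodepool-1-worker-1:

oc --kubeconfig="$HC_KUBECONFIG" get profile nodepool-1-worker-1 -n openshift-cluster-node-tuning-operator -o yaml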