bm_sno
Agent-based bare metal OCP SNO deployment via iDRAC Redfish APIs.
This role is included by the reproducer role when
cifmw_bm_sno: true. It performs an agent-based installation on a
physical bare metal host managed via iDRAC Redfish APIs. The workflow generates
a self-contained agent ISO on the Zuul controller, pushes it to the target
host’s iDRAC via Redfish VirtualMedia, and waits for the host to self-install.
Privilege escalation
Bare metal deployment requires privilege escalation for /etc/hosts
management and running the ISO HTTP server via podman.
Network architecture
Three routed isolated networks (no shared L2 domain required):
| Network | Purpose |
|---|---|
| BMC management | iDRAC interfaces; controller reaches iDRAC via routing |
| BMO provision | Node’s 1st NIC, OS interface IP; VirtualMedia boot |
| Controller | Zuul controller; serves the agent ISO to iDRAC |
A 2nd NIC on the node carries isolated MetalLB networks for RHOSO EDPM services (ctlplane, internalapi, storage, tenant) via VLANs.
The api and *.apps DNS names resolve directly to the node’s BMO
provision IP via /etc/hosts entries managed by the role.
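As a rough illustration of what those entries look like, the sketch below builds the lines for one node IP. Because /etc/hosts has no wildcard support, *.apps has to be expanded into explicit host names; the function name and the default list of route hosts are assumptions made for this example, not the set the role actually writes.

```python
def hosts_entries(node_ip, cluster, base_domain,
                  app_hosts=("console-openshift-console", "oauth-openshift")):
    """Return /etc/hosts lines pointing the cluster names at one node IP.

    app_hosts is a hypothetical list of route hosts under *.apps.
    """
    suffix = f"{cluster}.{base_domain}"
    names = [f"api.{suffix}", f"api-int.{suffix}"]
    # /etc/hosts cannot express wildcards, so each *.apps name is listed.
    names += [f"{host}.apps.{suffix}" for host in app_hosts]
    return [f"{node_ip} {name}" for name in names]


for line in hosts_entries("192.168.10.50", "ocp", "example.com"):
    print(line)
```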
Parameters
Required (typically set in the scenario’s vars.yaml)
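| Parameter | Type | Description |
|---|---|---|
| cifmw_bm_agent_cluster_name | str | OpenShift cluster name |
| cifmw_bm_agent_base_domain | str | Base domain for the cluster |
| cifmw_bm_agent_machine_network | str | BMO provision network CIDR |
| cifmw_bm_agent_node_ip | str | Node IP on the BMO provision network |
| cifmw_bm_agent_node_iface | str | RHCOS interface name on the BMO provision network |
| cifmw_bm_agent_bmc_host | str | iDRAC hostname or IP on the BMC management network |
| cifmw_bm_nodes | list | Single-element list with mac and root_device (see the example vars.yaml) |
| cifmw_bm_agent_openshift_version | str | OCP version (e.g. 4.18.3) |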
Optional (have defaults or are auto-discovered)
| Parameter | Type | Default | Description |
|---|---|---|---|
| cifmw_bm_agent_release_image | str | | Alternative to version: extract openshift-install from a release image |
| | int | | Port for the podman HTTP server that serves the agent ISO (only a privileged port may accept external traffic on Zuul controllers) |
| | int | | Total seconds before the installer times out (split between bootstrap and install phases) |
| cifmw_manage_secrets_pullsecret_file | str | ~/pull-secret | Path to the pull secret JSON file |
| cifmw_bmc_credentials_file | str | ~/secrets/idrac_access.yaml | Path to a YAML file with username and password keys |
| cifmw_bm_agent_enable_usb_boot | bool | | Allow the role to automatically enable GenericUsbBoot in BIOS |
| | str | auto-discovered | UEFI device path for the Virtual Optical Drive; auto-discovered from UEFI boot options if omitted |
| cifmw_bm_agent_core_password | str | — | Set a password for the core user |
| cifmw_bm_agent_live_debug | bool | | Patch the agent ISO with password, autologin, and systemd debug shell on tty6 |
| | list | | Extra NIC names to disable IPv4/IPv6 on during agent-based install. Prevents overlapping-subnet validation failures when multiple NICs share a native VLAN. |
| cifmw_bm_agent_lvms_partition | dict | | When set, creates an Ignition partition at install time to cap CoreOS rootfs growth and leave unallocated space for the LVMS StorageClass. Keys: device, rootfs_mib, size_mib, label |
Secrets management
The bare metal path requires two secret files:
BMC credentials
A YAML file at cifmw_bmc_credentials_file (default ~/secrets/idrac_access.yaml)
with the following structure:
username: root
password: <idrac-password>
Pull secret
The OCP pull secret JSON at cifmw_manage_secrets_pullsecret_file
(default ~/pull-secret).
Task files
The agent-based deployment is composed of reusable task files under
tasks/:
| Task file | Description |
|---|---|
| | Main orchestrator: validates variables, generates ISO, serves it via HTTP, manages VirtualMedia, waits for install completion |
| bm_power_on | Idempotent power-on via Redfish with POST wait (retries 30x at 10s intervals) |
| bm_power_off | Idempotent force power-off via Redfish with confirmation wait |
| bm_check_usb_boot | Reads |
| bm_ensure_usb_boot | Wraps |
| bm_eject_vmedia | Ejects VirtualMedia from the iDRAC Virtual Optical Drive |
| bm_discover_vmedia_target | Discovers or validates the UEFI device path for VirtualMedia, clears pending iDRAC config jobs, and sets a one-time boot override |
| | Patches the agent ISO ignition with core password, autologin, and debug shell on tty6 (used when cifmw_bm_agent_live_debug is set) |
| | Generates a MachineConfig manifest to set the core user password hash post-install |
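The idempotent power handling described in the table can be sketched as follows. This is a minimal illustration, not the role's implementation: the system resource path is the usual iDRAC one but is an assumption here, get and post are hypothetical callables standing in for authenticated Redfish requests, and the loop only polls PowerState rather than reproducing the role's POST wait.

```python
import time

# Assumed iDRAC system resource; not taken from the role itself.
SYSTEM_PATH = "/redfish/v1/Systems/System.Embedded.1"


def ensure_power(get, post, want, retries=30, delay=10, sleep=time.sleep):
    """Drive the host to `want` ("On" or "Off").

    Returns False if the host was already there, True if a reset was sent.
    """
    if get(SYSTEM_PATH)["PowerState"] == want:
        return False  # already in the desired state, nothing to do
    reset_type = "On" if want == "On" else "ForceOff"
    post(f"{SYSTEM_PATH}/Actions/ComputerSystem.Reset",
         {"ResetType": reset_type})
    for _ in range(retries):
        if get(SYSTEM_PATH)["PowerState"] == want:
            return True
        sleep(delay)
    raise TimeoutError(f"host did not reach PowerState={want}")


# In-memory stand-ins for the two HTTP calls, for demonstration only.
state = {"PowerState": "Off"}


def fake_get(path):
    return dict(state)


def fake_post(path, body):
    state["PowerState"] = "On" if body["ResetType"] == "On" else "Off"


print(ensure_power(fake_get, fake_post, "On", sleep=lambda s: None))
print(ensure_power(fake_get, fake_post, "On", sleep=lambda s: None))
```

Passing the transport in as callables keeps the state check and the retry loop testable without a BMC.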
openshift-install acquisition
The openshift-install binary is obtained automatically via one of two
methods, depending on which variable is set:
- By version (cifmw_bm_agent_openshift_version): downloads the tarball from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/<version>/openshift-install-linux.tar.gz and extracts it.
- By release image (cifmw_bm_agent_release_image or OPENSHIFT_RELEASE_IMAGE env var): runs oc adm release extract --command=openshift-install against the image.
If the binary already exists in the working directory it is reused.
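The choice between the two methods can be sketched as a small function. The function name and return shape are hypothetical, and the precedence used when both variables are set (version first) is an assumption; the source does not state it.

```python
import os

MIRROR = "https://mirror.openshift.com/pub/openshift-v4/clients/ocp"


def installer_source(version=None, release_image=None, env=None):
    """Return ("download", url) or ("extract", argv) for openshift-install."""
    env = os.environ if env is None else env
    if version:
        # Assumption: an explicit version wins over a release image.
        return ("download",
                f"{MIRROR}/{version}/openshift-install-linux.tar.gz")
    image = release_image or env.get("OPENSHIFT_RELEASE_IMAGE")
    if image:
        return ("extract",
                ["oc", "adm", "release", "extract",
                 "--command=openshift-install", image])
    raise ValueError("set a version or a release image")


print(installer_source(version="4.18.3"))
```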
Deployment workflow
1. Validate required variables
2. Ensure GenericUsbBoot is enabled in BIOS (auto-enable with power cycle if allowed)
3. Power off the host
4. Generate SSH keys, template install-config.yaml and agent-config.yaml
5. Optionally generate an LVMS partition MachineConfig into openshift/manifests
6. Acquire openshift-install binary (see above) and run openshift-install agent create image to build the agent ISO
7. Optionally patch the ISO for discovery-phase console access
8. Serve the ISO via a root podman httpd container (rootless podman cannot use privileged ports)
9. Eject any existing VirtualMedia, then insert the agent ISO
10. Discover the Virtual Optical Drive UEFI path and set a one-time boot override
11. Power on the host
12. Verify BIOS GenericUsbBoot is enabled after POST
13. Add /etc/hosts entries for api/api-int and *.apps domains
14. Wait for bootstrap and install to complete
15. Copy kubeconfig and kubeadmin-password to the dev-scripts-compatible auth directory
16. Eject VirtualMedia and stop the HTTP server
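The VirtualMedia and boot-override steps map onto a short sequence of Redfish requests. The sketch below only builds that sequence as data. The resource paths are common iDRAC defaults and are assumptions here, as are the function name and the example ISO URL; the role's real tasks also handle idempotency and pending config jobs, which this omits.

```python
def boot_from_iso_requests(
        iso_url, uefi_target,
        vmedia="/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD",
        system="/redfish/v1/Systems/System.Embedded.1"):
    """Return (method, path, json_body) tuples in the order they are sent."""
    return [
        # Clear whatever is mounted, then attach the agent ISO.
        ("POST", f"{vmedia}/Actions/VirtualMedia.EjectMedia", {}),
        ("POST", f"{vmedia}/Actions/VirtualMedia.InsertMedia",
         {"Image": iso_url}),
        # One-time override: boot the given UEFI device path once.
        ("PATCH", system, {"Boot": {
            "BootSourceOverrideTarget": "UefiTarget",
            "UefiTargetBootSourceOverride": uefi_target,
            "BootSourceOverrideEnabled": "Once",
        }}),
        ("POST", f"{system}/Actions/ComputerSystem.Reset",
         {"ResetType": "On"}),
    ]


# Placeholder values; the UEFI path is discovered at run time by the role.
for method, path, body in boot_from_iso_requests(
        "http://192.0.2.10/agent.x86_64.iso", "<uefi-device-path>"):
    print(method, path, body)
```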
Molecule tests
bm_redfish scenario
The bm_redfish Molecule scenario validates the bare metal Redfish task files
(bm_power_on, bm_power_off, bm_check_usb_boot, bm_ensure_usb_boot,
bm_eject_vmedia, bm_discover_vmedia_target) against a stateful Python
mock iDRAC server that simulates Redfish API responses over HTTPS.
The mock server (molecule/bm_redfish/files/mock_idrac.py) provides:
- Stateful GET/POST/PATCH handlers for power, BIOS, VirtualMedia, boot override, and job queue Redfish endpoints
- A /test/reset admin endpoint to set mock state between test cases
- A /test/state endpoint to query current mock state for assertions
- Self-signed TLS certificates generated during prepare.yml
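To show the idea behind a stateful mock with reset and state endpoints, here is a much-reduced sketch that dispatches requests against an in-memory dictionary. It is not the contents of mock_idrac.py: the class name, state keys, HTTP methods chosen for the /test endpoints, and status codes are all assumptions, and HTTPS serving is left out.

```python
import copy

# Hypothetical initial state; the real mock tracks more than this.
DEFAULT_STATE = {"PowerState": "Off", "GenericUsbBoot": "Disabled",
                 "Inserted": False}
RESET_ACTION = ("/redfish/v1/Systems/System.Embedded.1"
                "/Actions/ComputerSystem.Reset")


class MockIdrac:
    """Dispatch (method, path, body) against mutable in-memory state."""

    def __init__(self):
        self.state = copy.deepcopy(DEFAULT_STATE)

    def handle(self, method, path, body=None):
        if (method, path) == ("POST", "/test/reset"):
            # Start from defaults, then apply the overrides a test asks for.
            self.state = {**DEFAULT_STATE, **(body or {})}
            return 200, dict(self.state)
        if (method, path) == ("GET", "/test/state"):
            return 200, dict(self.state)
        if (method, path) == ("POST", RESET_ACTION):
            on = body["ResetType"] == "On"
            self.state["PowerState"] = "On" if on else "Off"
            return 204, None
        return 404, {"error": f"unhandled {method} {path}"}


mock = MockIdrac()
mock.handle("POST", "/test/reset", {"PowerState": "On"})
mock.handle("POST", RESET_ACTION, {"ResetType": "ForceOff"})
print(mock.handle("GET", "/test/state"))
```

Resetting to a known state before each case is what lets one server instance back both the idempotent and the state-changing scenarios.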
Test coverage:
| Test file | Scenarios |
|---|---|
| | Already off (idempotent), On -> Off |
| | Already on (idempotent), Off -> On |
| | Enabled (succeeds), Disabled (expected failure) |
| | Already enabled (no cycle), Disabled + auto-enable (BIOS change + cycle), Disabled + no auto-enable (expected failure) |
| | Inserted (ejects), Not inserted (idempotent) |
| | Auto-discover, user-provided valid path, user-provided invalid path (expected failure) |
Examples
Minimal vars.yaml for a bare metal SNO deployment:
cifmw_bm_sno: true
cifmw_bm_agent_cluster_name: ocp
cifmw_bm_agent_base_domain: example.com
cifmw_bm_agent_machine_network: "192.168.10.0/24"
cifmw_bm_agent_node_ip: "192.168.10.50"
cifmw_bm_agent_node_iface: eno12399np0
cifmw_bm_agent_bmc_host: idrac.mgmt.example.com
cifmw_bm_agent_openshift_version: "4.18.3"
cifmw_bm_agent_enable_usb_boot: true
cifmw_bm_nodes:
  - mac: "b0:7b:25:xx:yy:zz"
    root_device: /dev/sda
Local debugging on an autoheld Zuul node
When a Zuul job is held (autohold), you can SSH into the Zuul controller
and iterate on the deployment without re-provisioning SNO from scratch.
1. Prepare the environment
Edit ~/configs/zuul_vars.yaml to skip SNO re-provisioning and OpenStack
cleanup (there is nothing to clean up before the first RHOSO deployment):
cifmw_cleanup_architecture: false
reuse_ocp: true
run_cleanup: false
2. Run the playbook
From the ci-framework-jobs checkout on the Zuul controller:
cd ~/src/gitlab.cee.redhat.com/ci-framework/ci-framework-jobs
ansible-playbook playbooks/baremetal/run-sno-bm.yaml \
--flush-cache \
-e@/home/zuul/configs/default-vars.yaml \
-e@/home/zuul/src/gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/scenarios/test/test-tool-versions.yaml \
-e@/home/zuul/src/gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/scenarios/uni/default-vars.yaml \
-e@/home/zuul/src/gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/scenarios/baremetal/vaf/rhel-vars.yaml \
-e@/home/zuul/configs/networking_defintion.yaml \
-e@/home/zuul/configs/nmstate_config.yaml \
-e@/home/zuul/configs/scenario-vars.yaml \
-e@/home/zuul/configs/secrets.yaml \
-e@/home/zuul/configs/vars.yaml \
-e@/home/zuul/configs/zuul_vars.yaml
With reuse_ocp: true, run-sno-bm.yaml will:
1. Copy the SNO kubeconfig from dev-scripts/ocp/<cluster>/auth/ to ~/.kube/config and oc login as kubeadmin with --insecure-skip-tls-verify (agent-based installer uses self-signed certs)
2. Generate openshift-login-params.yml via the openshift_login role
3. Write a static inventory mapping controller-0 to localhost
4. Run deploy-edpm-reuse.yaml instead of reproducer.yml, which skips OCP provisioning and goes straight to architecture deployment
3. Subsequent iterations
Once the first EDPM deployment succeeds, set cifmw_cleanup_architecture
back to true so that cleanup-architecture.sh tears down the previous
OpenStack deployment before re-applying:
cifmw_cleanup_architecture: true
reuse_ocp: true
run_cleanup: false
4. Quick OCP and agent/SNO SSH access
The SNO kubeconfig and kubeadmin password live in the dev-scripts auth directory:
export KUBECONFIG=~/src/github.com/openshift-metal3/dev-scripts/ocp/<cluster>/auth/kubeconfig
oc login -u kubeadmin \
-p "$(cat ~/src/github.com/openshift-metal3/dev-scripts/ocp/<cluster>/auth/kubeadmin-password)" \
--insecure-skip-tls-verify=true
oc get nodes
For SSH access to the SNO host:
ssh -i ~/ci-framework-data/artifacts/agent-install/agent_ssh_key \
core@<cluster>.<cifmw_bm_agent_base_domain>
Replace <cluster> with the value of cifmw_bm_agent_cluster_name (e.g.
sno).
To SSH into the agent-install appliance, use -i ci-framework-data/artifacts/cifmw_ocp_access_key.
You can also get autologin and debug shell on tty6 of the agent with:
cifmw_bm_agent_core_password: changeme
cifmw_bm_agent_live_debug: true
LVMS partition
By default CoreOS expands its rootfs partition to fill the entire disk
at first boot. To reserve space for the LVMS (Logical Volume Manager
Storage) StorageClass, set cifmw_bm_agent_lvms_partition with at
least the device key. The role injects a MachineConfig manifest into
the agent ISO that creates a labeled partition via Ignition — before
growfs runs — so CoreOS rootfs stops at rootfs_mib and the
remainder is available for LVMS.
cifmw_bm_agent_lvms_partition:
  device: /dev/disk/by-path/pci-0000:65:00.0-scsi-0:3:111:0
  rootfs_mib: 150000 # ~150 GB for CoreOS (minimum 25000)
  size_mib: 0 # 0 = rest of disk
  label: lvmstorage # partition label
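The constraints in that example (device required, rootfs_mib of at least 25000, size_mib of 0 meaning the rest of the disk) can be checked up front with something like the sketch below. The function name is hypothetical, and the fallback values simply mirror the example above; the role's real defaults are not stated here.

```python
MIN_ROOTFS_MIB = 25000  # minimum given in the example's comment

# Assumed fallbacks, copied from the example values rather than the role.
ASSUMED_DEFAULTS = {"rootfs_mib": 150000, "size_mib": 0,
                    "label": "lvmstorage"}


def normalize_lvms_partition(cfg):
    """Validate an LVMS partition dict and fill in the optional keys."""
    if not cfg.get("device"):
        raise ValueError("device is required")
    out = {**ASSUMED_DEFAULTS, **cfg}
    if out["rootfs_mib"] < MIN_ROOTFS_MIB:
        raise ValueError(f"rootfs_mib must be >= {MIN_ROOTFS_MIB}")
    if out["size_mib"] < 0:
        raise ValueError("size_mib must be 0 (rest of disk) or positive")
    return out


print(normalize_lvms_partition({"device": "/dev/sda"}))
```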
After OCP is installed, create an LVMCluster CR that targets the
partition by label:
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: lvmstorage
        deviceSelector:
          paths:
            - /dev/disk/by-partlabel/lvmstorage
        thinPoolConfig:
          name: thin-pool
          overprovisionRatio: 10
          sizePercent: 90