Version: 1.14

Contrast Runtime

The Contrast runtime is responsible for starting pods as confidential virtual machines. This works by specifying the runtime class to be used in a pod spec and by registering the runtime class with the API server. The RuntimeClass resource defines a name for referencing the class and a handler used by the container runtime (containerd) to identify the class.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  # This name is used by pods in the runtimeClassName field
  name: contrast-cc-abcdef
# This name is used by the
# container runtime interface implementation (containerd)
handler: contrast-cc-abcdef

Confidential pods that are part of a Contrast deployment need to specify the same runtime class in the runtimeClassName field, so Kubernetes uses the Contrast runtime instead of the default containerd / runc handler.

apiVersion: v1
kind: Pod
spec:
  runtimeClassName: contrast-cc-abcdef
  # ...

Node-level components

The runtime consists of additional software components that need to be installed and configured on every SEV-SNP-enabled/TDX-enabled worker node. This installation is performed automatically by the node-installer DaemonSet.

Runtime components

Containerd shim

The handler field in the Kubernetes RuntimeClass instructs containerd not to use the default runc implementation. Instead, containerd invokes a custom plugin called containerd-shim-contrast-cc-v2. This shim is described in more detail in the upstream source repository and in the containerd documentation.
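As a rough illustration of how a shim name such as containerd-shim-contrast-cc-v2 relates to a containerd runtime type, the sketch below follows containerd's naming convention for v2 shims, where the last two dot-separated segments of the runtime type form the binary name. The runtime type string io.containerd.contrast-cc.v2 is an assumption made for this example and isn't taken from this page.

```python
def shim_binary_name(runtime_type: str) -> str:
    """Derive the shim binary name that containerd looks up for a runtime type.

    Follows the v2 shim naming convention:
    "io.containerd.<name>.<version>" -> "containerd-shim-<name>-<version>".
    """
    parts = runtime_type.split(".")
    if len(parts) < 2:
        raise ValueError(f"invalid runtime type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


# Assumed runtime type for the Contrast handler (illustrative only).
print(shim_binary_name("io.containerd.contrast-cc.v2"))
```

The same rule maps the default io.containerd.runc.v2 runtime type to containerd-shim-runc-v2.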

Virtual machine manager (VMM)

The containerd shim uses a virtual machine manager (VMM) to create a confidential virtual machine for every pod. On bare metal, Contrast uses QEMU. The appropriate files are installed on every node by the node-installer.

Pod-VM image

Every pod-VM starts with the same guest image. The root filesystem is read-only and integrity-protected using dm-verity. The verity hash of the root filesystem is passed via the kernel command line and is thus part of the launch measurement of the pod-VM. The image contains the guest components of Kata Containers and Contrast: the Kata Containers agent that creates the sandbox and containers within the guest, the Contrast image puller that retrieves and verifies container images, the Contrast secure mount service that can set up an ephemerally encrypted volume for image storage, and the Contrast initdata processor that receives initialization data from the runtime.

Initdata processor

After completing the boot process, the pod-VM runs the initdata processor. This program verifies the initdata document that was annotated to the pod and provisioned by the runtime. If the initdata document matches expectations, the initdata processor writes the policy to an in-memory filesystem. The Kata agent is started only after the initdata processor has finished.

Contrast image puller

In addition to the Kata agent, every pod-VM also starts Contrast's image puller. For each container scheduled to run in the pod, the Kata agent requests the image puller to pull and mount the corresponding container image.

The image puller verifies the checksums of both the image manifest and all subsequently pulled image layers against the provided digest. For this reason, unpinned images aren't supported.
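The digest check described above can be sketched as follows. This is a minimal illustration, not the image puller's actual code: the function names is_pinned and verify_blob are hypothetical, and only SHA-256 digests in the OCI form sha256:<hex> are handled.

```python
import hashlib
import re

# OCI-style SHA-256 digest, for example "sha256:e3b0c442...".
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def is_pinned(image_ref: str) -> bool:
    """Return True if the image reference carries a digest (name@sha256:...)."""
    _, sep, digest = image_ref.rpartition("@")
    return bool(sep) and _DIGEST_RE.fullmatch(digest) is not None


def verify_blob(blob: bytes, digest: str) -> None:
    """Raise ValueError unless the blob hashes to the expected digest."""
    if _DIGEST_RE.fullmatch(digest) is None:
        raise ValueError(f"unsupported or malformed digest: {digest!r}")
    actual = "sha256:" + hashlib.sha256(blob).hexdigest()
    if actual != digest:
        raise ValueError(f"digest mismatch: expected {digest}, got {actual}")
```

A tag such as `:latest` can point to different content over time, so there is no fixed value to compare the manifest against. A pinned reference supplies that value, and the manifest in turn lists the digest of each layer.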

Contrast secure image store

To reduce memory requirements, Contrast supports mounting an ephemeral volume into each pod-VM. Before first use, the underlying block device provided by Kubernetes is LUKS-encrypted and integrity-protected with keys generated inside the pod-VM. These keys are never persisted or transferred outside of the pod-VM, meaning the volume is only usable from within the specific pod it's attached to, and only for the duration of that pod's lifetime.

warning

It's not possible to detect a specific kind of replay attack by the host system, in which the host replaces the contents of the encrypted block device with a previously captured snapshot of the disk. Since the snapshot consists purely of ciphertext generated by the pod-VM, it reads as correctly integrity-protected and the pod-VM can't distinguish it from fresh data.

Importantly, this doesn't allow a malicious host to decrypt the ciphertext stored on the device or to inject arbitrary data.

The trade-off between reduced resource requirements and a weakened security posture must be evaluated on a per-use-case basis. See the secure image store how-to for details on how to configure the feature for your use-case.

Node installer DaemonSet

The RuntimeClass resource above registers the runtime with the Kubernetes API. The node-level installation is carried out by the Contrast node-installer DaemonSet that ships with every Contrast release.

Once deployed, the installer performs the following steps on each node:

  • Install the Contrast containerd shim (containerd-shim-contrast-cc-v2)
  • Install cloud-hypervisor or QEMU as the virtual machine manager (VMM)
  • Install an IGVM file or separate firmware and kernel files for pod-VMs of this class
  • Install a read-only root filesystem disk image for the pod-VMs of this class
  • Back up any existing containerd configuration in the format <containerd-path>/<config-name>.<time>.bak
  • Reconfigure containerd by adding a runtime plugin that corresponds to the handler field of the Kubernetes RuntimeClass
  • Restart containerd to make it aware of the new plugin
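The backup step in the list above can be sketched as follows. This is an illustration only, not the node-installer's implementation: the function name backup_config is hypothetical, and the timestamp layout is an assumption, since the page only specifies the <config-name>.<time>.bak pattern.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_path: Path) -> Path:
    """Copy a config file to <config-name>.<time>.bak in the same directory."""
    # Assumption: the <time> component is a UTC timestamp in this layout.
    timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    backup_path = config_path.with_name(f"{config_path.name}.{timestamp}.bak")
    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(config_path, backup_path)
    return backup_path
```

Keeping the backup next to the original means a modified containerd configuration can be restored by copying the .bak file back and restarting containerd.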

Kubernetes RuntimeClass

Kubernetes can be extended to use more than one container runtime with RuntimeClass objects. The Container Runtime Interface (CRI) implementation, for example containerd, dispatches pod management API calls to the appropriate RuntimeClass. RuntimeClass implementations are usually based on an OCI runtime, such as runc, runsc or crun. Contrast uses the Kata Containers runtime with added confidential computing capabilities.

Kata Containers is an OCI runtime that runs pods in VMs. The pod VM spawns an agent process that accepts management commands from the Kata runtime running on the host. Kata Containers was originally designed to isolate the guest from the host, but it can also run pods in confidential VMs (CVMs) to shield pods from their underlying infrastructure. In confidential mode, the guest agent is configured with an Open Policy Agent (OPA) policy to authorize API calls from the host. This policy also contains checksums for the expected container images. It's derived from Kubernetes resource definitions and its checksum is included in the attestation report.