
Troubleshooting

This section provides guidance on diagnosing and resolving issues in your Contrast deployment.

  • Logging: How to capture useful logs from your Contrast deployment.

contrast generate returns errors

Some workload configurations are known to be insecure or incompatible with Contrast. If such a configuration is detected during policy generation, an error is logged and the command fails.

Images with VOLUME declarations but without a Kubernetes mount

During contrast generate, an error like the following is printed and the process returns with a non-zero exit code:

level=ERROR msg="The following volumes declared in image config don't have corresponding Kubernetes mounts: [\"/data\"]"

This error indicates that the container image declares a VOLUME, but no Kubernetes volume is mounted at that path (/data in the example). Since it's not clearly specified whether, or what, a container runtime is supposed to mount in that case, every declared volume needs a corresponding explicit Kubernetes volume mount. Depending on the needs of the application, this can be either an emptyDir or a Contrast-managed persistent volume.
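For the /data example above, a minimal sketch of such a mount could look like the following; the container and volume names are placeholders for your own deployment, and an emptyDir is shown, but a Contrast-managed persistent volume works the same way:

spec:
  containers:
    - name: app                     # placeholder container name
      image: example.com/app:1.0    # image that declares VOLUME /data
      volumeMounts:
        - name: data
          mountPath: /data          # must match the path from the error message
  volumes:
    - name: data
      emptyDir: {}                  # or a Contrast-managed persistent volume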

Pod fails to start

If the Coordinator or a workload pod fails to start at all, it can be helpful to look at the pod's events during the startup process using the events command:

kubectl -n <namespace> events --for pod/<coordinator-pod-name>
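Alternatively, kubectl describe shows the same events alongside the pod's spec and status, which can help correlate the failure with the configuration:

kubectl -n <namespace> describe pod/<coordinator-pod-name>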

Example output:

LAST SEEN   TYPE      REASON   OBJECT              MESSAGE
32m         Warning   Failed   Pod/coordinator-0   kubelet  Error: failed to create containerd task: failed to create shim task: "CreateContainerRequest is blocked by policy: ...

A common error, as in this example, is that the container creation was blocked by the policy. Potential reasons are a modification of the deployment YAML without updating the policies afterward, or a version mismatch between Contrast components.

Regenerating the policies

To ensure there isn't a mismatch between Kubernetes resource YAML and the annotated policies, rerun

contrast generate

on your deployment. If any of the policy annotations change, re-deploy with the updated policies.
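As a sketch, assuming your deployment files are collected in a resources/ directory (adjust the path and filenames to your own layout):

contrast generate resources/
kubectl apply -f resources/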

Pin container images

When generating the policies, Contrast downloads the images specified in your deployment YAML and includes their cryptographic identity in the policies. If the image tag is moved to another container image after the policy has been generated, the image downloaded at deploy time differs from the one at generation time, and policy enforcement won't allow the container to be started in the pod VM.

To ensure the correct image is always used, pin the container image to a fixed sha256:

image: ubuntu:22.04@sha256:19478ce7fc2ffbce89df29fea5725a8d12e57de52eb9ea570890dc5852aac1ac

This way, the same image is still pulled even if the container tag (22.04) is later moved to another image.
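To look up the digest for a tag, a registry tool can be used; crane is shown here as one assumed option. For a pod that's already running, the resolved digest can also be read from its status:

crane digest ubuntu:22.04
kubectl -n <namespace> get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].imageID}'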

Validate Contrast components match

A version mismatch between Contrast components can cause policy validation or attestation to fail. Each Contrast runtime is identifiable by the (shortened) measurement value that's part of its runtime class name.

First, analyze which runtime class is currently installed in your cluster by running

kubectl get runtimeclasses

This should give you output similar to the following:

NAME                               HANDLER                            AGE
contrast-cc-aks-clh-snp-7173acb5   contrast-cc-aks-clh-snp-7173acb5   23h
kata-cc-isolation                  kata-cc                            45d

The output shows that one Contrast runtime class is installed (as well as the runtime class provided by the AKS CoCo preview, which isn't used by Contrast).

Next, check whether the pod that won't start has the correct runtime class configured, and that the Coordinator uses the exact same runtime class:

kubectl -n <namespace> get -o=jsonpath='{.spec.runtimeClassName}' pod/<pod-name>
kubectl -n <namespace> get -o=jsonpath='{.spec.runtimeClassName}' pod/<coordinator-pod-name>

The output should list the runtime class the pod is using:

contrast-cc-aks-clh-snp-7173acb5
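To spot mismatches across a namespace at a glance, a plain kubectl query (no Contrast-specific tooling assumed) can list the runtime class of every pod:

kubectl -n <namespace> get pods -o custom-columns=NAME:.metadata.name,RUNTIMECLASS:.spec.runtimeClassName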

Version information about the CLI currently in use can be obtained via the --version flag:

contrast --version
contrast version v1.XX.X

container image versions:
ghcr.io/edgelesssys/contrast/coordinator:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/initializer:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/service-mesh-proxy:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/node-installer-microsoft:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/node-installer-kata:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/node-installer-kata-gpu:v1.XX.X@sha256:...
ghcr.io/edgelesssys/contrast/tardev-snapshotter:3.2.0.azl5@sha256:...

reference values for AKS-CLH-SNP platform:
runtime handler: contrast-cc-aks-clh-snp-7173acb5
- launch digest: 6cf7f93545210549c25e4efde6878deabfb5357da1a50b0fc9126e1218d182402a5ba2400d708a3d054ba96d663a2918
default SNP TCB:
  bootloader: 3
  tee: 0
  snp: 8
  microcode: 115
genpolicy version: 3.2.0.azl5

reference values for K3s-QEMU-TDX platform:
...

Check the output for the section with the platform you are using, for example AKS-CLH-SNP or K3s-QEMU-TDX. The runtime handler must match the runtime class name of the pod that won't start.

Contrast attempts to pull the wrong image reference

Containerd versions before v2.0.0 have a bug that can lead to pulling image references that differ from those in the PodSpec. The policy failure contains a line starting with allow_create_container_input at the very top. This is the request received from the runtime and subject to policy enforcement. The JSON contains a list of annotations nested under .OCI.Annotations. Verify that the value for the annotation key io.kubernetes.cri.image-name corresponds to an image in your PodSpec. If it doesn't, you need to remove that image entirely from the affected node, for example with crictl:

crictl rmi $IMAGE

An upstream backport that fixes the bug is pending: https://github.com/containerd/containerd/pull/11644.
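To find the exact reference to remove, you can first list the images containerd knows about on the affected node; the image name below is a hypothetical example:

crictl images                                # list images known to the runtime on this node
crictl rmi registry.example.com/app:v1.2.3   # remove the stale reference reported in the policy failure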

VM runs out of memory

Since pod VMs are statically sized, misconfigured memory limits can easily lead to the VM running out of memory. Setting the right memory limits is even more important on bare metal, where the image layers need to be stored in guest memory, too. An error message like the following indicates that the VM doesn't have enough space to pull images:

LAST SEEN   TYPE      REASON   OBJECT                        MESSAGE
2m31s       Warning   Failed   Pod/my-pod-76dc84fc75-6xn7s   Error: failed to create containerd task: failed to create shim task: failed to handle layer: hasher sha256: failed to unpack [...] No space left on device (os error 28)

This error can be resolved by increasing the memory limit of the containers; see the Workload deployment guide.
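As a sketch, the limit goes on the container spec; the container name, image, and 2Gi value below are arbitrary examples, and the right size depends on your image layers and workload:

containers:
  - name: my-app                    # placeholder container name
    image: example.com/my-app:1.0
    resources:
      limits:
        memory: 2Gi                 # example value; on bare metal this also has to fit the image layers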

Connection to Coordinator fails

Connections from the CLI to the Coordinator may fail for a variety of reasons. If the error happens during the attested TLS handshake, it's usually reported as an error message of the following form: rpc error: code = <GRPC ERROR CODE> desc = connection error: desc = "<DESCRIPTION>". The following table explains the cause of each error and suggests further debugging steps.

Description | Cause | Next steps
transport: authentication handshake failed: EOF | Connection was closed before the Coordinator could send a certificate. | Check the load balancer.
received context error while waiting for new LB policy update: context deadline exceeded | The Coordinator didn't send attestation documents before the deadline. | Check the Coordinator logs for issuer problems.
transport: authentication handshake failed: remote error: tls: internal error | The Coordinator failed to issue attestation documents. | Check the Coordinator logs for issuer problems.
transport: authentication handshake failed: no valid attestation document certificate extensions found | The Coordinator served an unexpected certificate. | Check whether the remote end is the Coordinator on port 1313; compare the versions of Coordinator and CLI.
transport: authentication handshake failed: tls: first record does not look like a TLS handshake | The Coordinator didn't serve TLS. | Check whether the remote end is the Coordinator on port 1313.
transport: Error while dialing: dial tcp <host:port>: connect: connection refused | The Coordinator port is closed. | Check connectivity to the Coordinator; check Coordinator readiness; check that the load balancer points to Coordinator port 1313.
transport: authentication handshake failed: [...] validator tdx-0 failed: validating report data: quote field MR_CONFIG_ID is [...]. Expect [...] | Wrong Coordinator policy hash. | Compare the versions of Coordinator and CLI.
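For the connectivity-related rows, a few generic checks can narrow things down; the namespace and the coordinator-0 pod name below are assumptions based on the examples in this section:

kubectl -n <namespace> get pods             # is the Coordinator pod (for example coordinator-0) ready?
kubectl -n <namespace> get svc              # does the load balancer service expose port 1313?
kubectl -n <namespace> logs coordinator-0   # check the Coordinator logs for issuer problems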