Introduction To Kubernetes Part 4 - Pod Security Standards

Sangam Biradar
Sep 14, 2022


PodSecurityPolicy (PSP) is dead: it was deprecated in Kubernetes v1.21 and removed in v1.25. It is replaced by a new built-in admission controller, Pod Security Admission, which enforces the Pod Security Standards.

Pod Security Standards

The Pod Security Standards define three policies that cover the security spectrum:

  • Privileged: Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
  • Baseline: Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
  • Restricted: Heavily restricted policy, following current Pod hardening best practices.

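Built-in Pod Security admission enforcement

To check whether the PodSecurity admission plugin is available, print the kube-apiserver help text from inside the API server pod (in this example the control-plane node is named controlplane):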

kubectl exec -it kube-apiserver-controlplane -n kube-system -- kube-apiserver -h | grep enable-admission-plugins

General syntax:
kubectl exec -it kube-apiserver-<master-node> -n kube-system -- kube-apiserver -h | grep enable-admission-plugins

The --enable-admission-plugins entry lists the admission plugins that are enabled by default; PodSecurity is in that list:

 --admission-control strings              Admission is divided into two phases. In the first phase, only mutating admission plugins run. In the second phase, only validating admission plugins run. The names in the below list may represent a validating plugin, a mutating plugin, or both. The order of plugins in which they are passed to this flag does not matter. Comma-delimited list of: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. (DEPRECATED: Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.)


 --enable-admission-plugins strings       admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook.

Alternative option: run the Pod Security admission webhook independently, for example on clusters where the built-in PodSecurity plugin is not available:

git clone https://github.com/kubernetes/pod-security-admission.git
cd pod-security-admission/webhook
make certs
kubectl apply -k .

Add Pod Security labels at the namespace level

apiVersion: v1
kind: Namespace
metadata:
  name: my-baseline-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
    # We are setting these to our _desired_ `enforce` level.
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: latest

---
apiVersion: v1
kind: Namespace
metadata:
  name: my-restricted-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit:  restricted
    pod-security.kubernetes.io/audit-version: latest

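Depending on the Pod Security labels set on the namespace, a manifest that violates the policy is rejected (enforce), triggers a warning returned to the user (warn), and/or is recorded with an annotation in the audit log (audit):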

kubectl apply -f resource.yaml --namespace=my-baseline-namespace
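The label keys follow the pattern pod-security.kubernetes.io/<mode> and pod-security.kubernetes.io/<mode>-version. A minimal Python sketch that builds such a label set from a {mode: level} mapping. The helper name pss_labels is hypothetical and used for illustration only; it is not part of any Kubernetes client library.

```python
# Hypothetical helper: builds Pod Security labels for a namespace.
MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")


def pss_labels(levels, version="latest"):
    """Map {mode: level} to pod-security.kubernetes.io/* namespace labels."""
    labels = {}
    for mode, level in levels.items():
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        labels[f"pod-security.kubernetes.io/{mode}"] = level
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return labels


# Enforce baseline, but warn and audit against restricted.
print(pss_labels({"enforce": "baseline", "warn": "restricted", "audit": "restricted"}))
```

A common pattern is to enforce a looser level while warning and auditing against a stricter one, so you can see what would break before tightening enforcement.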

PodTemplate Resources

Audit and Warn modes are also checked on resource types that embed a PodTemplate (enumerated below), but enforce mode only applies to actual pod resources.

Since users do not create pods directly in the typical deployment model, the warning mechanism is only effective if it can also warn on templated pod resources. Similarly, for audit it is useful to tie the audited violation back to the requesting user, so audit will also apply to templated pod resources. In the interest of supporting mutating admission controllers, policies will only be enforced on actual pods.

Templated pod resources include:

v1 ReplicationController
v1 PodTemplate
apps/v1 ReplicaSet
apps/v1 Deployment
apps/v1 StatefulSet
apps/v1 DaemonSet
batch/v1 CronJob
batch/v1 Job
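The mode rule above can be summarized in a small lookup. This is a sketch of the documented behaviour, not the admission controller's real code, and the function name checked_modes is hypothetical.

```python
# The templated kinds listed above, keyed by (apiVersion, kind).
TEMPLATED_KINDS = {
    ("v1", "ReplicationController"),
    ("v1", "PodTemplate"),
    ("apps/v1", "ReplicaSet"),
    ("apps/v1", "Deployment"),
    ("apps/v1", "StatefulSet"),
    ("apps/v1", "DaemonSet"),
    ("batch/v1", "CronJob"),
    ("batch/v1", "Job"),
}


def checked_modes(api_version, kind):
    """Return the Pod Security modes that evaluate a resource of this type."""
    if (api_version, kind) == ("v1", "Pod"):
        return {"enforce", "warn", "audit"}
    if (api_version, kind) in TEMPLATED_KINDS:
        # Templates are only warned about and audited; enforce waits for the Pod.
        return {"warn", "audit"}
    return set()


print(sorted(checked_modes("apps/v1", "Deployment")))  # → ['audit', 'warn']
```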

Let's learn about the Pod Security Standards Baseline profile: a minimally restrictive policy that prevents known privilege escalations and allows the default (minimally specified) Pod configuration.

disallow capabilities

  • Adding capabilities beyond the allowed list is disallowed. The fields spec.containers[*].securityContext.capabilities.add, spec.initContainers[*].securityContext.capabilities.add, and spec.ephemeralContainers[*].securityContext.capabilities.add must be unset or contain only the following capabilities:

          - AUDIT_WRITE
          - CHOWN
          - DAC_OVERRIDE
          - FOWNER
          - FSETID
          - KILL
          - MKNOD
          - NET_BIND_SERVICE
          - SETFCAP
          - SETGID
          - SETPCAP
          - SETUID
          - SYS_CHROOT
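To make the rule concrete, here is a minimal Python sketch that flags added capabilities outside the allowed list. It assumes the manifest has already been parsed into a dict (for example with yaml.safe_load). The function name is hypothetical, and this is not the PodSecurity plugin's actual implementation.

```python
# Baseline allowed list, copied from the policy text above.
BASELINE_ALLOWED_CAPS = {
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
    "MKNOD", "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP",
    "SETUID", "SYS_CHROOT",
}


def disallowed_capabilities(pod):
    """Return (container name, capability) pairs that Baseline would reject."""
    spec = pod.get("spec", {})
    found = []
    for key in ("containers", "initContainers", "ephemeralContainers"):
        for c in spec.get(key, []):
            caps = (c.get("securityContext") or {}).get("capabilities") or {}
            for cap in caps.get("add") or []:
                if cap not in BASELINE_ALLOWED_CAPS:
                    found.append((c["name"], cap))
    return found


# badpod06 from this article, as a dict.
badpod06 = {
    "spec": {
        "initContainers": [{"name": "initcontainer01",
                            "securityContext": {"capabilities": {"add": ["NET_RAW"]}}}],
        "containers": [{"name": "container01",
                        "securityContext": {"capabilities": {"add": ["SYS_ADMIN"]}}}],
    }
}
print(disallowed_capabilities(badpod06))
# → [('container01', 'SYS_ADMIN'), ('initcontainer01', 'NET_RAW')]
```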
    

NET_RAW is granted by default by many container runtimes. It is there to allow ICMP traffic (for example, ping) between containers. In addition to ICMP traffic, however, this capability lets an application craft raw packets (such as ARP and DNS packets), which gives an attacker a lot of freedom to carry out network-related attacks. NET_RAW is not on the Baseline allowed list, so the following pod is rejected:

apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW

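Note: a scanner may also flag these example pods because their containers do not set resource limits, which could let them starve other processes. That is a separate concern from the Pod Security Standards.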


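From the capabilities(7) man page:

       CAP_NET_RAW
              * Use RAW and PACKET sockets;
              * bind to any address for transparent proxying.

The next pod is also rejected: SETGID is on the allowed list, but NET_RAW is not.
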
apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW
        - SETGID

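Run your containers with privilege escalation turned off to prevent escalating privileges through setuid or setgid binaries. The following pod violates Baseline twice: its init container adds NET_RAW and its app container adds SYS_ADMIN.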

apiVersion: v1
kind: Pod
metadata:
  name: badpod06
spec:
  initContainers:
  - name: initcontainer01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN

CAP_SYS_ADMIN is the most privileged capability and should always be avoided

Capabilities permit certain named root actions without giving full root access. They are a more fine-grained permissions model, and all capabilities should be dropped from a pod, with only those required added back.

There are a large number of capabilities, and CAP_SYS_ADMIN covers a wide range of privileged operations among them. Enabling it is effectively equivalent to granting root.

disallow-host-namespaces

  • Host namespaces (the process ID namespace, the inter-process communication namespace, and the network namespace) allow access to shared information and can be used to elevate privileges. Pods should not be allowed access to host namespaces. This policy ensures that fields which make use of these host namespaces are unset or set to false.

Sharing the host namespaces is disallowed. The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to false.

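Expressed as a Kyverno validation pattern (the =(field) equality anchor means "if the field is present, its value must match"):

          spec:
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"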

Don't set hostPID to true

Sharing the host's PID namespace allows visibility of processes on the host, potentially leaking information such as environment variables and configuration.

apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  hostPID: true
  containers:
  - name: container01
    image: dummyimagename

Don't set hostIPC to true

Sharing the host's IPC namespace allows container processes to communicate with processes on the host.

Removing namespaces from pods reduces isolation and allows the processes in the pod to perform tasks as if they were running natively on the host.

This circumvents the protection models that containers are based on, and it should only be done with absolute certainty (for example, for low-level observation of other containers).

apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  hostIPC: true
  containers:
  - name: container01
    image: dummyimagename

Don't set hostNetwork to true

Sharing the host's network namespace permits processes in the pod to communicate with processes bound to the host's loopback adapter.

apiVersion: v1
kind: Pod
metadata:
  name: badpod03
spec:
  hostNetwork: true
  containers:
  - name: container01
    image: dummyimagename

Instead, leave all three fields unset or set them to false:

spec:
  hostPID: false
  hostIPC: false
  hostNetwork: false
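A minimal Python sketch of the same check, assuming a pod manifest already parsed into a dict. The function name is hypothetical; the sketch only mirrors the rule that the three fields must be unset or false.

```python
HOST_NAMESPACE_FIELDS = ("hostPID", "hostIPC", "hostNetwork")


def host_namespace_violations(pod):
    """Return the host-namespace fields that are set to true in the pod spec."""
    spec = pod.get("spec", {})
    # Unset (missing) and false both pass; only a true value is a violation.
    return [field for field in HOST_NAMESPACE_FIELDS if spec.get(field, False)]


# badpod03 from this article, as a dict.
badpod03 = {"spec": {"hostNetwork": True,
                     "containers": [{"name": "container01", "image": "dummyimagename"}]}}
print(host_namespace_violations(badpod03))  # → ['hostNetwork']
```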

disallow host path

HostPath volumes let Pods use host directories and volumes in containers. Using host resources can be used to access shared data or escalate privileges and should not be allowed. This policy ensures no hostPath volumes are in use.

HostPath volumes are forbidden. The field spec.volumes[*].hostPath must be unset.

A volume can be declared in a pod's Kubernetes YAML manifest. You specify .spec.volumes along with .spec.containers[*].volumeMounts to define what kind of volume it is and where to mount it inside the container. Here's an example of a pod that mounts the host's /etc/udev directory at /data inside the container (which violates this policy), followed by a Deployment that uses an emptyDir volume instead (which passes).


apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    volumeMounts:
      - name: udev
        mountPath: /data
  volumes:
  - name: udev
    hostPath:
      path: /etc/udev
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment02
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        volumeMounts:
          - name: temp
            mountPath: /scratch
      volumes:
      - name: temp
        emptyDir: {}
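Because a Deployment keeps its pod spec under spec.template.spec, a checker first has to locate the pod spec. The Python sketch below assumes parsed dicts, uses hypothetical function names, and covers only the templated kinds listed earlier in this article.

```python
def pod_spec_of(obj):
    """Return the pod spec of a Pod or of a workload's embedded pod template."""
    kind = obj["kind"]
    if kind == "Pod":
        return obj["spec"]
    if kind == "PodTemplate":
        return obj["template"]["spec"]
    if kind == "CronJob":
        return obj["spec"]["jobTemplate"]["spec"]["template"]["spec"]
    # Deployment, ReplicaSet, StatefulSet, DaemonSet, Job, ReplicationController
    return obj["spec"]["template"]["spec"]


def host_path_volumes(pod_spec):
    """Return the names of volumes that use a hostPath source."""
    return [v["name"] for v in pod_spec.get("volumes", []) if "hostPath" in v]


badpod01 = {"kind": "Pod", "spec": {
    "containers": [{"name": "container01", "image": "dummyimagename"}],
    "volumes": [{"name": "udev", "hostPath": {"path": "/etc/udev"}}]}}
gooddeployment02 = {"kind": "Deployment", "spec": {"template": {"spec": {
    "containers": [{"name": "container01", "image": "dummyimagename"}],
    "volumes": [{"name": "temp", "emptyDir": {}}]}}}}
print(host_path_volumes(pod_spec_of(badpod01)))          # → ['udev']
print(host_path_volumes(pod_spec_of(gooddeployment02)))  # → []
```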

There are a few ways to protect against potential misconfigurations relating to HostPath volumes.

Scope the HostPath volume to a specific directory.

If a HostPath volume is unavoidable, set spec.volumes[*].hostPath.path to the narrowest directory the workload actually needs. Otherwise, avoid HostPath volumes altogether.

Ensure the HostPath volume is read only.

When mounting the volume you can set it to read only mode.

volumeMounts:
    - mountPath: /var/log/host
      name: test-volume
      readOnly: true

Bonus Points: Use a container optimized OS like Google’s Container Optimized OS or AWS’s Bottlerocket, which include read only root filesystems by default.

Restrict access to HostPath volumes through a Pod Security admission controller.


disallow host ports

Access to host ports allows potential snooping of network traffic and should not be allowed, or at minimum restricted to a known list. This policy ensures the hostPort field is unset or set to 0.

The hostPort setting applies to Kubernetes containers. The container port is exposed to the external network at <hostIP>:<hostPort>, where hostIP is the IP address of the Kubernetes node where the container is running and hostPort is the port requested by the user.

We recommend that you do not specify a hostPort for a pod unless it is absolutely necessary. Binding a pod to a hostPort limits the number of places the pod can be scheduled, because each <hostIP, hostPort, protocol> combination must be unique.

If you do not specify hostIP and protocol explicitly, Kubernetes uses 0.0.0.0 as the default hostIP and TCP as the default protocol. The port is then bound on all of the node's interfaces, which exposes it to the internet if the node is publicly reachable.


apiVersion: v1
kind: Pod
metadata:
  name: influxdb
spec:
  containers:
    - name: influxdb
      image: influxdb
      ports:
        - containerPort: 8086
          hostPort: 8086

The hostPort feature allows you to expose a single container port on the host IP. Using hostPort to expose an application outside the Kubernetes cluster has drawbacks similar to those of hostNetwork: the host IP can change when the pod is rescheduled onto a different node, two containers using the same hostPort cannot be scheduled on the same node, and the usage of hostPort is considered a privileged operation on OpenShift.

What is hostPort used for? For example, the NGINX-based Ingress controller can be deployed as a set of containers running on top of Kubernetes. These containers can be configured to use hostPorts 80 and 443 to allow inbound traffic on these ports from outside the Kubernetes cluster.
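A minimal Python sketch of the "unset or 0" rule, assuming a parsed pod manifest. The function name is hypothetical and the sketch is illustrative, not the built-in controller's logic.

```python
def nonzero_host_ports(pod):
    """Return (container name, hostPort) pairs where hostPort is set and non-zero."""
    spec = pod.get("spec", {})
    found = []
    for key in ("containers", "initContainers", "ephemeralContainers"):
        for c in spec.get(key, []):
            for port in c.get("ports") or []:
                host_port = port.get("hostPort", 0)  # unset behaves like 0
                if host_port != 0:
                    found.append((c["name"], host_port))
    return found


# The influxdb pod from this article, as a dict.
influxdb = {"spec": {"containers": [{
    "name": "influxdb", "image": "influxdb",
    "ports": [{"containerPort": 8086, "hostPort": 8086}]}]}}
print(nonzero_host_ports(influxdb))  # → [('influxdb', 8086)]
```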

disallow-host-ports-range

Access to host ports allows potential snooping of network traffic and should not be allowed, or at minimum restricted to a known list. This policy ensures the hostPort field is set to one in the designated list.

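This is a custom restriction rather than one the built-in PodSecurity admission controller can enforce, because the built-in controller only accepts a hostPort that is unset or 0. In this example, the only permitted hostPorts are in the range 5000-6000.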

apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    ports:
    - name: admin
      containerPort: 8000
      hostPort: 5555
      protocol: TCP
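A Python sketch of the range variant, assuming parsed manifests. The 5000-6000 bounds come from the example above. The function name is hypothetical, and this kind of rule would come from an external policy engine rather than the built-in controller.

```python
def host_ports_outside_range(pod, low=5000, high=6000):
    """Return (container name, hostPort) pairs outside the permitted range."""
    spec = pod.get("spec", {})
    found = []
    for key in ("containers", "initContainers", "ephemeralContainers"):
        for c in spec.get(key, []):
            for port in c.get("ports") or []:
                host_port = port.get("hostPort", 0)
                # Unset / 0 means no host port is bound, so it is allowed.
                if host_port != 0 and not (low <= host_port <= high):
                    found.append((c["name"], host_port))
    return found


goodpod02 = {"spec": {"containers": [{
    "name": "container01", "image": "dummyimagename",
    "ports": [{"name": "admin", "containerPort": 8000, "hostPort": 5555, "protocol": "TCP"}]}]}}
print(host_ports_outside_range(goodpod02))  # → []
```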

disallow-host-process

Windows pods offer the ability to run HostProcess containers, which enable privileged access to the Windows node. Privileged access to the host is disallowed in the Baseline policy. HostProcess containers were introduced as an alpha feature in Kubernetes v1.22. This policy ensures the hostProcess field, if present, is set to false.

HostProcess containers are disallowed. The fields spec.securityContext.windowsOptions.hostProcess, spec.containers[*].securityContext.windowsOptions.hostProcess, spec.initContainers[*].securityContext.windowsOptions.hostProcess, and spec.ephemeralContainers[*].securityContext.windowsOptions.hostProcess must either be undefined or set to false.

   securityContext:
      windowsOptions:
        hostProcess: false

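The following pod violates the policy because its container sets hostProcess to true (HostProcess pods also require hostNetwork: true):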

apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  hostNetwork: true
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      windowsOptions:
        hostProcess: true
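A minimal Python sketch that checks every location where hostProcess can appear, assuming a parsed pod manifest. The function name and the returned path format are hypothetical.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")


def host_process_violations(pod):
    """List the places where windowsOptions.hostProcess is set to true."""
    spec = pod.get("spec", {})
    found = []
    pod_opts = (spec.get("securityContext") or {}).get("windowsOptions") or {}
    if pod_opts.get("hostProcess", False):
        found.append("spec.securityContext")
    for key in CONTAINER_KEYS:
        for c in spec.get(key, []):
            opts = (c.get("securityContext") or {}).get("windowsOptions") or {}
            if opts.get("hostProcess", False):
                found.append(f"spec.{key}[name={c['name']}]")
    return found


badpod01 = {"spec": {"hostNetwork": True, "containers": [{
    "name": "container01", "image": "dummyimagename",
    "securityContext": {"windowsOptions": {"hostProcess": True}}}]}}
print(host_process_violations(badpod01))  # → ['spec.containers[name=container01]']
```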

disallow-privileged-containers

Privileged mode disables most security mechanisms and must not be allowed. This policy ensures Pods do not call for privileged mode.

Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged, spec.initContainers[*].securityContext.privileged, and spec.ephemeralContainers[*].securityContext.privileged must be unset or set to false.

Set privileged to false (or leave it unset):

apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      privileged: false
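The same rule as a short Python sketch over a parsed manifest. The function name is hypothetical; unset and false both pass.

```python
def privileged_containers(pod):
    """Return the names of containers that request privileged mode."""
    spec = pod.get("spec", {})
    return [
        c["name"]
        for key in ("containers", "initContainers", "ephemeralContainers")
        for c in spec.get(key, [])
        if (c.get("securityContext") or {}).get("privileged", False)
    ]


goodpod02 = {"spec": {"containers": [{
    "name": "container01", "image": "dummyimagename",
    "securityContext": {"privileged": False}}]}}
print(privileged_containers(goodpod02))  # → []
```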

disallow-proc-mount

The default /proc masks are set up to reduce attack surface and should be required. This policy ensures nothing but the default procMount can be specified. Note that deviating from the Default procMount also requires enabling the ProcMountType feature gate.

Changing the proc mount from the default is not allowed. The fields spec.containers[*].securityContext.procMount, spec.initContainers[*].securityContext.procMount, and spec.ephemeralContainers[*].securityContext.procMount must be unset or set to Default.

Don't set procMount to Unmasked; leave it unset or set it to Default:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment02
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        securityContext:
          procMount: Default
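A Python sketch of the procMount rule, assuming the pod template spec has already been pulled out of the parsed Deployment. The function name is hypothetical.

```python
def non_default_proc_mounts(pod_spec):
    """Return (container name, procMount) pairs that deviate from Default."""
    found = []
    for key in ("containers", "initContainers", "ephemeralContainers"):
        for c in pod_spec.get(key, []):
            # An unset procMount behaves like Default.
            proc_mount = (c.get("securityContext") or {}).get("procMount", "Default")
            if proc_mount != "Default":
                found.append((c["name"], proc_mount))
    return found


# Pod template spec of gooddeployment02 above, as a dict.
template_spec = {"containers": [{"name": "container01", "image": "dummyimagename",
                                 "securityContext": {"procMount": "Default"}}]}
print(non_default_proc_mounts(template_spec))  # → []
```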

disallow selinux

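SELinux options can be used to escalate privileges and should be restricted. This policy ensures that seLinuxOptions.type, if set, is one of the allowed container types, and that the seLinuxOptions.user and seLinuxOptions.role fields are unset. It applies to spec.securityContext and to every container's securityContext.

Don't set seLinuxOptions.type to spc_t (the "super privileged container" type).

Instead, leave it unset or set it to container_t, container_kvm_t, or container_init_t:

securityContext:
  seLinuxOptions:
    type: container_t

securityContext:
  seLinuxOptions:
    type: container_kvm_t

securityContext:
  seLinuxOptions:
    type: container_init_t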
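A minimal Python sketch of the SELinux rule, assuming a parsed pod spec. The allowed types are the ones listed in this article. The function name is hypothetical, and the sketch does not claim to reproduce the admission controller's exact checks.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")
# Allowed seLinuxOptions.type values as listed in this article ("" means unset).
ALLOWED_SELINUX_TYPES = {"", "container_t", "container_init_t", "container_kvm_t"}


def selinux_violations(pod_spec):
    """Return (owner, field) pairs whose seLinuxOptions break the Baseline rule."""
    contexts = [("pod", pod_spec.get("securityContext") or {})]
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            contexts.append((c["name"], c.get("securityContext") or {}))
    found = []
    for owner, sc in contexts:
        opts = sc.get("seLinuxOptions") or {}
        if (opts.get("type") or "") not in ALLOWED_SELINUX_TYPES:
            found.append((owner, "type"))
        # A custom SELinux user or role is never allowed.
        for field in ("user", "role"):
            if opts.get(field):
                found.append((owner, field))
    return found


spc_pod = {"containers": [{"name": "container01",
                           "securityContext": {"seLinuxOptions": {"type": "spc_t"}}}]}
print(selinux_violations(spc_pod))  # → [('container01', 'type')]
```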

restrict-apparmor-profiles

On supported hosts, the 'runtime/default' AppArmor profile is applied by default. The default policy should prevent overriding or disabling the policy, or restrict overrides to an allowed set of profiles. This policy ensures Pods do not specify any other AppArmor profiles than runtime/default or localhost/*.

Specifying other AppArmor profiles is disallowed.

The annotation container.apparmor.security.beta.kubernetes.io if defined must not be set to anything other than runtime/default or localhost/*.

Set it to runtime/default:

metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/container01: runtime/default

or to a profile loaded on the node (localhost/<profile_name>):

metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/container01: localhost/foo
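A Python sketch that validates the AppArmor annotations of a parsed pod's metadata.annotations map. The function name is hypothetical, and the sketch only covers the annotation form described here.

```python
PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def apparmor_violations(annotations):
    """Return (container, profile) pairs that use a disallowed AppArmor profile."""
    found = []
    for key, profile in annotations.items():
        if not key.startswith(PREFIX):
            continue  # not an AppArmor annotation
        if profile != "runtime/default" and not profile.startswith("localhost/"):
            found.append((key[len(PREFIX):], profile))
    return found


print(apparmor_violations({PREFIX + "container01": "unconfined"}))
# → [('container01', 'unconfined')]
print(apparmor_violations({PREFIX + "container01": "localhost/foo"}))  # → []
```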

restrict-seccomp

The seccomp profile must not be explicitly set to Unconfined. This policy, requiring Kubernetes v1.19 or later, ensures that seccomp is unset or set to RuntimeDefault or Localhost.

Explicitly setting the seccomp profile to Unconfined is disallowed. The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type, spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type must be unset or set to RuntimeDefault or Localhost.

Set it to RuntimeDefault:

securityContext:
  seccompProfile:
    type: RuntimeDefault

or use a profile stored on the node with Localhost:

securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: operator/default/profile1.json
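For Baseline, only an explicit Unconfined is rejected. A minimal Python sketch, assuming a parsed pod spec. The function name is hypothetical.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")


def unconfined_seccomp(pod_spec):
    """Return owners ("pod" or a container name) whose seccomp type is Unconfined."""
    contexts = [("pod", pod_spec.get("securityContext") or {})]
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            contexts.append((c["name"], c.get("securityContext") or {}))
    return [
        owner
        for owner, sc in contexts
        if (sc.get("seccompProfile") or {}).get("type") == "Unconfined"
    ]


pod_spec = {
    "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{"name": "container01",
                    "securityContext": {"seccompProfile": {"type": "Unconfined"}}}],
}
print(unconfined_seccomp(pod_spec))  # → ['container01']
```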

restrict-sysctls

Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed "safe" subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node. This policy ensures that only those "safe" subsets can be specified in a Pod.

Setting sysctls outside the allowed set is disallowed. The field spec.securityContext.sysctls must be unset or use only the following names: kernel.shm_rmid_forced, net.ipv4.ip_local_port_range, net.ipv4.ip_unprivileged_port_start, net.ipv4.tcp_syncookies, and net.ipv4.ping_group_range.

For example:

securityContext:
  sysctls:
  - name: net.ipv4.ip_unprivileged_port_start
    value: "2048"

or

securityContext:
  sysctls:
  - name: net.ipv4.ip_local_port_range
    value: "31000 60999"

or

securityContext:
  sysctls:
  - name: kernel.shm_rmid_forced
    value: "1"

or

securityContext:
  sysctls:
  - name: net.ipv4.tcp_syncookies
    value: "0"

or

securityContext:
  sysctls:
  - name: net.ipv4.ping_group_range
    value: "0 2147483647"
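A Python sketch of the safe-sysctl check over a parsed pod spec. The allowed names are copied from the list above (newer Kubernetes releases may allow more). The function name is hypothetical.

```python
# Safe sysctl names as listed in this article.
SAFE_SYSCTLS = {
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.ip_unprivileged_port_start",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ping_group_range",
}


def unsafe_sysctls(pod_spec):
    """Return the names of sysctls that are not on the safe list."""
    sysctls = (pod_spec.get("securityContext") or {}).get("sysctls") or []
    return [s["name"] for s in sysctls if s["name"] not in SAFE_SYSCTLS]


pod_spec = {"securityContext": {"sysctls": [
    {"name": "net.ipv4.ip_unprivileged_port_start", "value": "2048"},
    {"name": "kernel.msgmax", "value": "65536"},
]}}
print(unsafe_sysctls(pod_spec))  # → ['kernel.msgmax']
```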

Let's learn about the Pod Security Standards Restricted profile: a heavily restricted policy that follows current Pod hardening best practices.

disallow-capabilities-strict - Pod Security Standards (Restricted)

Adding capabilities other than NET_BIND_SERVICE is disallowed. In addition, all containers must explicitly drop ALL capabilities.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment01
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        securityContext:
          capabilities:
            drop:
            - ALL
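A minimal Python sketch of the stricter capability rule (drop ALL, add back only NET_BIND_SERVICE), assuming the pod template spec has been extracted from the parsed Deployment. The function name and message strings are hypothetical.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")


def restricted_capability_violations(pod_spec):
    """Return (container name, problem) pairs under the Restricted capability rule."""
    found = []
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            caps = (c.get("securityContext") or {}).get("capabilities") or {}
            # Every container must explicitly drop ALL capabilities...
            if "ALL" not in (caps.get("drop") or []):
                found.append((c["name"], "drop ALL missing"))
            # ...and may add back only NET_BIND_SERVICE.
            for cap in caps.get("add") or []:
                if cap != "NET_BIND_SERVICE":
                    found.append((c["name"], f"add {cap}"))
    return found


# Pod template spec of gooddeployment01 above, as a dict.
template_spec = {"containers": [{"name": "container01", "image": "dummyimagename",
                                 "securityContext": {"capabilities": {"drop": ["ALL"]}}}]}
print(restricted_capability_violations(template_spec))  # → []
```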

disallow privilege escalation - Pod Security Standards (Restricted)

Privilege escalation, such as via set-user-ID or set-group-ID file mode, should not be allowed. This policy ensures the allowPrivilegeEscalation field is explicitly set to false (allowPrivilegeEscalation: false) in spec.containers[*], spec.initContainers[*], and spec.ephemeralContainers[*].

apiVersion: v1
kind: Pod
metadata:
  name: goodpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      allowPrivilegeEscalation: false
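Unlike most Baseline checks, Restricted treats an unset value as a violation here. A Python sketch over a parsed pod spec. The function name is hypothetical.

```python
def privilege_escalation_violations(pod_spec):
    """Return containers that do not explicitly set allowPrivilegeEscalation: false."""
    return [
        c["name"]
        for key in ("containers", "initContainers", "ephemeralContainers")
        for c in pod_spec.get(key, [])
        # Unset counts as a violation: Restricted requires an explicit false.
        if (c.get("securityContext") or {}).get("allowPrivilegeEscalation") is not False
    ]


goodpod01 = {"containers": [{"name": "container01", "image": "dummyimagename",
                             "securityContext": {"allowPrivilegeEscalation": False}}]}
print(privilege_escalation_violations(goodpod01))  # → []
```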

require run as non root user - Pod Security Standards (Restricted)

Containers must be required to run as non-root users. This policy ensures runAsUser is either unset or set to a number greater than zero.

Running as root is not allowed. The fields spec.securityContext.runAsUser, spec.containers[*].securityContext.runAsUser, spec.initContainers[*].securityContext.runAsUser, and spec.ephemeralContainers[*].securityContext.runAsUser must be unset or set to a number greater than zero.

apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      runAsUser: 0

Set runAsUser to a non-zero UID instead (here at the pod level, so it applies to every container):

apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
  securityContext:
    runAsUser: 1
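A Python sketch that flags an explicit UID 0 at the pod or container level, assuming a parsed pod spec. The function name is hypothetical, and unset values pass, as the rule above states.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")


def root_uid_owners(pod_spec):
    """Return owners ("pod" or a container name) that set runAsUser to 0."""
    contexts = [("pod", pod_spec.get("securityContext") or {})]
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            contexts.append((c["name"], c.get("securityContext") or {}))
    return [owner for owner, sc in contexts if sc.get("runAsUser") == 0]


badpod02 = {"containers": [{"name": "container01", "securityContext": {"runAsUser": 0}}]}
goodpod02 = {"containers": [{"name": "container01"}], "securityContext": {"runAsUser": 1}}
print(root_uid_owners(badpod02))   # → ['container01']
print(root_uid_owners(goodpod02))  # → []
```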

require run as nonroot

Containers must be required to run as non-root users. This policy ensures runAsNonRoot is set to true. (This note concerns Kyverno: a known issue prevents a policy such as this one that uses anyPattern from being persisted properly in Kubernetes 1.23.0-1.23.2.)

Running as root is not allowed. Either the field spec.securityContext.runAsNonRoot must be set to true, or the fields spec.containers[*].securityContext.runAsNonRoot, spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot must be set to true.

set runAsNonRoot as true

apiVersion: v1
kind: Pod
metadata:
  name: goodpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
  securityContext:
    runAsNonRoot: true
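The either/or rule can be expressed as: a container may leave runAsNonRoot unset only if the pod level sets it to true, and no container may set it to false. A Python sketch under those assumptions, with a hypothetical function name:

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")


def run_as_non_root_ok(pod_spec):
    """Return True if every container is guaranteed to run with runAsNonRoot: true."""
    pod_level = (pod_spec.get("securityContext") or {}).get("runAsNonRoot")
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            value = (c.get("securityContext") or {}).get("runAsNonRoot")
            if value is False:
                return False  # an explicit false always fails
            if value is None and pod_level is not True:
                return False  # unset is only fine when the pod level is true
    return True


goodpod01 = {"containers": [{"name": "container01", "image": "dummyimagename"}],
             "securityContext": {"runAsNonRoot": True}}
print(run_as_non_root_ok(goodpod01))  # → True
```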

Restrict Seccomp (Strict)

In the Restricted profile, the seccomp profile must not be explicitly set to Unconfined and, additionally, must not be left unset. This policy, requiring Kubernetes v1.19 or later, ensures that seccomp is set to RuntimeDefault or Localhost. (This note concerns Kyverno: a known issue prevents a policy such as this one that uses anyPattern from being persisted properly in Kubernetes 1.23.0-1.23.2.)

The seccomp profile must be set explicitly. The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type, spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type must be set to RuntimeDefault or Localhost. The container-level fields may be left unset only when spec.securityContext.seccompProfile.type is set.

apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
  securityContext:
    seccompProfile:
      localhostProfile: operator/default/profile1.json
      type: Localhost

or

apiVersion: v1
kind: Pod
metadata:
  name: goodpod03
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      seccompProfile:
        type: RuntimeDefault
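A Python sketch of the strict seccomp rule over a parsed pod spec: a container-level value may be omitted only when the pod level sets an allowed type. The function name is hypothetical.

```python
CONTAINER_KEYS = ("containers", "initContainers", "ephemeralContainers")
ALLOWED_TYPES = {"RuntimeDefault", "Localhost"}


def seccomp_restricted_ok(pod_spec):
    """Return True if every container ends up with RuntimeDefault or Localhost."""
    pod_sc = pod_spec.get("securityContext") or {}
    pod_type = (pod_sc.get("seccompProfile") or {}).get("type")
    if pod_type is not None and pod_type not in ALLOWED_TYPES:
        return False
    for key in CONTAINER_KEYS:
        for c in pod_spec.get(key, []):
            c_sc = c.get("securityContext") or {}
            c_type = (c_sc.get("seccompProfile") or {}).get("type")
            if c_type is None:
                if pod_type is None:
                    return False  # neither level sets a profile
            elif c_type not in ALLOWED_TYPES:
                return False
    return True


goodpod02 = {"containers": [{"name": "container01"}],
             "securityContext": {"seccompProfile": {
                 "localhostProfile": "operator/default/profile1.json", "type": "Localhost"}}}
goodpod03 = {"containers": [{"name": "container01",
                             "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}}}]}
print(seccomp_restricted_ok(goodpod02), seccomp_restricted_ok(goodpod03))  # → True True
```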

restrict volume types

In addition to restricting HostPath volumes, the restricted pod security profile limits usage of non-core volume types to those defined through PersistentVolumes. This policy blocks any other type of volume other than those in the allow list.

Only the following types of volumes may be used: configMap, csi, downwardAPI, emptyDir, ephemeral, persistentVolumeClaim, projected, and secret.
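A minimal Python sketch of the volume allow list, assuming a parsed pod spec in which each volume has a name plus one source key. The function name is hypothetical.

```python
ALLOWED_VOLUME_SOURCES = {
    "configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
    "persistentVolumeClaim", "projected", "secret",
}


def disallowed_volume_sources(pod_spec):
    """Return (volume name, source type) pairs not on the Restricted allow list."""
    found = []
    for volume in pod_spec.get("volumes", []):
        # Every key except "name" is treated as the volume source type.
        for source in sorted(set(volume) - {"name"} - ALLOWED_VOLUME_SOURCES):
            found.append((volume["name"], source))
    return found


pod_spec = {"volumes": [
    {"name": "udev", "hostPath": {"path": "/etc/udev"}},
    {"name": "temp", "emptyDir": {}},
]}
print(disallowed_volume_sources(pod_spec))  # → [('udev', 'hostPath')]
```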

For more, see kubernetes.io/docs/concepts/security/pod-se..
