kOps addons ¶

kOps supports two types of addons:

Managed addons, which are configurable through the cluster spec
Static addons, which are manifest files that are applied as-is

Managed addons ¶

The following addons are managed by kOps. They are upgraded following the kOps and Kubernetes lifecycle and configured based on your cluster spec. Where applicable, kOps considers both the configuration of the addon itself and other settings in the cluster spec.

Available addons ¶

AWS Load Balancer Controller ¶

Introduced
kOps 1.20

AWS Load Balancer Controller offers additional functionality for provisioning ELBs.

spec:
  awsLoadBalancerController:
    enabled: true
    cpuRequest: "100m"
    cpuLimit: "200m"
    memoryRequest: "200Mi"
    memoryLimit: "500Mi"

Though the AWS Load Balancer Controller can integrate the AWS WAF and Shield services with your Application Load Balancers (ALBs), kOps disables those capabilities by default.

Introduced
kOps 1.24

You can enable use of either or both of the WAF and WAF Classic services by including the following fields in the cluster spec:

spec:
  awsLoadBalancerController:
    enabled: true
    enableWAF: true
    enableWAFv2: true
    cpuRequest: "100m"
    cpuLimit: "200m"
    memoryRequest: "200Mi"
    memoryLimit: "500Mi"

Note that the controller will only succeed in associating one WAF with a given ALB at a time, despite it accepting both the "alb.ingress.kubernetes.io/waf-acl-id" and "alb.ingress.kubernetes.io/wafv2-acl-arn" annotations on the same Ingress object.
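As a minimal sketch of how one of these annotations might be used, the following Ingress associates a WAFv2 web ACL with its ALB. The Ingress name, backend Service, and ARN are placeholders, and the example assumes an IngressClass named alb is available in the cluster.

```yaml
# Hypothetical Ingress; name, backend Service, and ARN are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    # Set either this WAFv2 annotation or waf-acl-id (WAF Classic), not both,
    # since only one WAF can be associated with an ALB at a time.
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:us-east-1:111122223333:regional/webacl/example/00000000-0000-0000-0000-000000000000
spec:
  ingressClassName: alb   # assumes an IngressClass named "alb" exists
  defaultBackend:
    service:
      name: example-service
      port:
        number: 80
```

This annotation should only take effect when enableWAFv2 is set in the cluster spec, since kOps disables the WAF integrations by default.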

You can enable use of Shield Advanced by including the following fields in the cluster spec:

spec:
  awsLoadBalancerController:
    enabled: true
    enableShield: true

Support for the WAF and Shield services in kOps is currently beta, meaning that the accepted configuration and the AWS resources involved may change.

Read more in the official documentation.

Cluster autoscaler ¶

Introduced
kOps 1.19

Cluster autoscaler can be enabled to automatically adjust the size of the Kubernetes cluster.

spec:
  clusterAutoscaler:
    enabled: true
    expander: least-waste
    balanceSimilarNodeGroups: false
    emitPerNodegroupMetrics: false
    awsUseStaticInstanceList: false
    scaleDownUtilizationThreshold: 0.5
    skipNodesWithCustomControllerPods: true
    skipNodesWithLocalStorage: true
    skipNodesWithSystemPods: true
    newPodScaleUpDelay: 0s
    scaleDownDelayAfterAdd: 10m0s
    scaleDownUnneededTime: 10m0s
    scaleDownUnreadyTime: 20m0s
    image: <the latest supported image for the specified kubernetes version>
    cpuRequest: "100m"
    memoryRequest: "300Mi"

Read more about cluster autoscaler in the official documentation.

Expander strategies ¶

Cluster autoscaler supports several different expander strategies.

Priority Expander configuration ¶

Introduced
kOps 1.26

The priority expander requires additional configuration through a ConfigMap, as described in its documentation.

When expander: priority is defined, kOps will create this ConfigMap based on the InstanceGroup spec. You can change the priority of each instance group by adding the following to the InstanceGroup spec.

spec:
  autoscale: true
  autoscalePriority: 100

If autoscalePriority is not set, it will default to 0.
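As an illustration, here is a sketch of two complete InstanceGroup objects with different priorities. The group names, cluster name, machine types, and sizes are hypothetical. The priority expander is expected to prefer the group with the higher value, so the first group would be scaled up before the second.

```yaml
# Hypothetical instance groups; names, sizes, and machine types are placeholders.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-preferred
  labels:
    kops.k8s.io/cluster: example.k8s.local
spec:
  role: Node
  machineType: t3.large
  minSize: 1
  maxSize: 10
  autoscale: true
  autoscalePriority: 100   # higher value, preferred by the expander
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-fallback
  labels:
    kops.k8s.io/cluster: example.k8s.local
spec:
  role: Node
  machineType: t3.xlarge
  minSize: 0
  maxSize: 10
  autoscale: true
  autoscalePriority: 50    # lower value, used after the preferred group
```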

If you need a more complex configuration, e.g. regular expressions for matching InstanceGroups, you can provide your own custom configuration. If this is configured, the priorities set on the InstanceGroup specs are ignored.

clusterAutoscaler:
  customPriorityExpanderConfig:
    100:
    - .*foo.*
    50:
    - .*bar.*
    0:
    - .*

Disable ¶

If you want to manage the priority expander ConfigMap outside of kOps, you can disable the ConfigMap creation by adding the following to the Cluster spec:

clusterAutoscaler:
  createPriorityExpanderConfig: false
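If you manage the ConfigMap yourself, it might look roughly like the sketch below. The ConfigMap name and the priorities key follow the upstream cluster autoscaler priority expander documentation. The kube-system namespace is an assumption about where the autoscaler runs, and the regular expressions are placeholders.

```yaml
# Sketch of a self-managed priority expander ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system   # assumed: must match the autoscaler's namespace
data:
  # Each key is a priority; each value is a list of regular expressions
  # matched against node group names. Higher priorities are preferred.
  priorities: |-
    100:
      - .*foo.*
    50:
      - .*bar.*
```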

Disabling cluster autoscaler for a given instance group ¶

Introduced
kOps 1.20

You can disable the autoscaler for a given instance group by adding the following to the instance group spec.

spec:
  autoscale: false

Cert-manager ¶

Introduced	Minimum K8s Version
kOps 1.20	k8s 1.16

Cert-manager handles X.509 certificates for your cluster.

spec:
  certManager:
    enabled: true
    defaultIssuer: yourDefaultIssuer

Warning: cert-manager only supports one installation per cluster. If you are already running cert-manager, you need to either remove that installation before enabling this addon, or mark cert-manager as not being managed by kOps (see below). As long as you are using v1 versions of the cert-manager resources, it is safe to remove existing installs and replace them with this addon.
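The issuer named in defaultIssuer has to exist before cert-manager can use it. The following is a minimal sketch of such an issuer, assuming that defaultIssuer refers to a cluster-scoped ClusterIssuer. A self-signed issuer is used here only to keep the example short.

```yaml
# Hypothetical issuer matching the defaultIssuer value used above.
# Assumption: defaultIssuer refers to a ClusterIssuer, not a namespaced Issuer.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: yourDefaultIssuer
spec:
  selfSigned: {}
```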

Self-provisioned cert-manager ¶

Introduced	Minimum K8s Version
kOps 1.20.2	k8s 1.16

The following cert-manager configuration allows provisioning cert-manager externally and allows all dependent plugins to be deployed. Please note that addons might run into errors until cert-manager is deployed.

spec:
  certManager:
    enabled: true
    managed: false

DNS nameserver configuration for cert-manager pod ¶

Introduced	Minimum K8s Version
kOps 1.23.3	k8s 1.16

An optional list of DNS nameserver IP addresses for the cert-manager pod to use. This is useful if you have both a public and a private DNS zone for the same domain: it ensures that cert-manager can always access ingress or DNS01 challenge TXT records.

You can set pod DNS nameserver configuration for cert-manager like so:

spec:
  certManager:
    enabled: true
    nameservers:
      - 1.1.1.1
      - 8.8.8.8

Enabling dns-01 challenges ¶

Introduced
kOps 1.25.0

Cert-manager may be granted the necessary IAM privileges to solve dns-01 challenges by adding a list of hosted zone IDs. This requires external permissions for service accounts to be enabled.

spec:
  certManager:
    enabled: true
    hostedZoneIDs:
    - ZONEID
  iam:
    useServiceAccountExternalPermissions: true
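With these permissions in place, an issuer can use the Route 53 dns-01 solver. The following ClusterIssuer is a sketch using the cert-manager v1 API. The issuer name, email address, region, and ACME server are illustrative assumptions, and ZONEID stands for the same hosted zone ID as in the cluster spec.

```yaml
# Hypothetical ACME issuer solving dns-01 challenges through Route 53.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01              # placeholder issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
    - dns01:
        route53:
          region: us-east-1            # placeholder region
          hostedZoneID: ZONEID         # same zone as in hostedZoneIDs
```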

Read more about cert-manager in the official documentation.

Karpenter ¶

Introduced
kOps 1.24

The Karpenter addon enables Karpenter-managed InstanceGroups.

spec:
  karpenter:
    enabled: true

See more details on how to configure Karpenter in the kOps Karpenter docs and the official documentation.
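As a rough sketch, a Karpenter-managed InstanceGroup might look like the following. This assumes the manager: Karpenter field described in the kOps Karpenter docs. The group name, cluster name, and machine type are placeholders, and the exact set of supported fields may differ between kOps versions.

```yaml
# Hypothetical Karpenter-managed instance group.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: karpenter-nodes
  labels:
    kops.k8s.io/cluster: example.k8s.local
spec:
  manager: Karpenter   # assumed field; hands node provisioning to Karpenter
  role: Node
  machineType: t3.medium
```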

Metrics server ¶

Introduced
kOps 1.19

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.

spec:
  metricsServer:
    enabled: true

Read more about Metrics Server in the official documentation.

Secure TLS ¶

Introduced
kOps 1.20

By default, the API server will not verify the metrics server TLS certificate. To enable TLS verification, set the following in the cluster spec:

spec:
  certManager:
    enabled: true
  metricsServer:
    enabled: true
    insecure: false

This requires that cert-manager is installed in the cluster.

Node local DNS cache ¶

Introduced	Minimum K8s Version
kOps 1.18	k8s 1.15

NodeLocal DNSCache can be enabled if you are using CoreDNS. It improves cluster DNS performance by running a DNS caching agent on cluster nodes as a DaemonSet.

memoryRequest and cpuRequest for the node-local-dns pods can also be configured. If not set, they will be configured by default to 5Mi and 25m respectively.

If forwardToKubeDNS is enabled, kube-dns will be used as the default upstream.

spec:
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
      memoryRequest: 5Mi
      cpuRequest: 25m
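The forwardToKubeDNS option mentioned above is not part of that example. A minimal sketch of enabling it, assuming the field sits alongside the other nodeLocalDNS settings, could look like this:

```yaml
spec:
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
      # Assumed placement: forward cache misses to the cluster DNS service.
      forwardToKubeDNS: true
```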

Node termination handler ¶

Introduced
kOps 1.19

Node Termination Handler ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, and EC2 instance rebalance recommendations. If not handled, your application code may not stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down.

spec:
  nodeTerminationHandler:
    cpuRequest: 200m
    enabled: true
    enableRebalanceMonitoring: true
    enableSQSTerminationDraining: true
    managedASGTag: "aws-node-termination-handler/managed"
    prometheusEnable: true
    webhookURL: "https://hooks.slack.com/services/YOUR/SLACK/URL"

Queue Processor Mode ¶

Introduced
kOps 1.21

If enableSQSTerminationDraining is not false, Node Termination Handler will operate in Queue Processor mode. In addition to the events mentioned above, Queue Processor mode allows Node Termination Handler to take care of ASG Scale-In, AZ-Rebalance, Unhealthy Instances, EC2 Instance Termination via the API or Console, and more. kOps will provision the necessary infrastructure: an SQS queue, EventBridge rules, and ASG Lifecycle hooks. managedASGTag can be configured with Queue Processor mode to distinguish resource ownership between multiple clusters.

The kOps CLI requires additional IAM permissions to manage the requisite EventBridge rules and SQS queue:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "events:DeleteRule",
        "events:ListRules",
        "events:ListTargetsByRule",
        "events:ListTagsForResource",
        "events:PutEvents",
        "events:PutRule",
        "events:PutTargets",
        "events:RemoveTargets",
        "events:TagResource",
        "sqs:CreateQueue",
        "sqs:TagQueue",
        "sqs:DeleteQueue",
        "sqs:GetQueueAttributes",
        "sqs:ListQueues",
        "sqs:ListQueueTags"
      ],
      "Resource": "*"
    }
  ]
}

Warning: If you switch between the two operating modes on an existing cluster, the old resources have to be manually deleted. For IMDS to Queue Processor, this means deleting the Kubernetes NTH DaemonSet. For Queue Processor to IMDS, this means deleting the Kubernetes NTH Deployment and the AWS resources: the SQS queue, EventBridge rules, and ASG Lifecycle hooks.
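Since Queue Processor mode is used whenever enableSQSTerminationDraining is not false, selecting IMDS mode means setting the field to false explicitly. A minimal sketch:

```yaml
spec:
  nodeTerminationHandler:
    enabled: true
    # Explicitly false selects IMDS mode; any other setting, including
    # leaving the field unset, results in Queue Processor mode.
    enableSQSTerminationDraining: false
```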

Node Problem Detector ¶

Introduced
kOps 1.22

Node Problem Detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a daemon that runs on each node, detects node problems, and reports them to the API server.

spec:
  nodeProblemDetector:
    enabled: true
    memoryRequest: 32Mi
    cpuRequest: 10m

Pod Identity Webhook ¶

Introduced
kOps 1.23

When using IAM roles for Service Accounts (IRSA), Pods require an additional token to authenticate with the AWS API. In addition, the SDK requires specific environment variables to be set to make use of these tokens. This addon will mutate Pods configured to use IRSA so that users do not need to do this themselves.

All ServiceAccounts configured with AWS privileges in the Cluster spec will automatically be mutated to assume the configured role.

spec:
  certManager:
    enabled: true
  podIdentityWebhook:
    enabled: true

The EKS annotations on ServiceAccounts are typically not necessary, as kOps will configure the webhook with all ServiceAccount-to-role mappings configured in the Cluster spec. If you need a specific configuration, however, you may annotate the ServiceAccount, overriding the kOps configuration.
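The two approaches can be sketched as follows. The first document shows a ServiceAccount-to-policy mapping in the Cluster spec, and the second shows a ServiceAccount annotated with the standard EKS role annotation to override it. The ServiceAccount name, namespace, policy, and role ARN are placeholders, and IRSA has further prerequisites that are not shown here.

```yaml
# Cluster spec: grant a hypothetical ServiceAccount an AWS managed policy.
spec:
  iam:
    useServiceAccountExternalPermissions: true
    serviceAccountExternalPermissions:
    - name: example-app
      namespace: default
      aws:
        policyARNs:
        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
---
# Optional override: point the same ServiceAccount at a specific role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-app
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-role
```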

Read more about Pod Identity Webhook in the official documentation.

Snapshot controller ¶

Introduced	Minimum K8s Version
kOps 1.21	k8s 1.20

Snapshot controller implements the volume snapshot features of the Container Storage Interface (CSI).

You can enable the snapshot controller by adding the following to the cluster spec:

spec:
  snapshotController:
    enabled: true

Note that the in-tree volume drivers do not support this feature. If you are running a cluster on AWS, you can enable the EBS CSI driver by adding the following:

spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
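Once both the snapshot controller and the EBS CSI driver are enabled, snapshots are requested through the standard Kubernetes snapshot API. The following sketch assumes an existing PersistentVolumeClaim named data that is provisioned by the EBS CSI driver. All object names are hypothetical.

```yaml
# Snapshot class backed by the EBS CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshots
driver: ebs.csi.aws.com
deletionPolicy: Delete   # remove the EBS snapshot when the object is deleted
---
# Snapshot of an existing claim (assumed to be named "data").
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: ebs-snapshots
  source:
    persistentVolumeClaimName: data
```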

Self-managed aws-ebs-csi-driver ¶

Introduced
kOps 1.25

The following configuration allows for a self-managed aws-ebs-csi-driver. Please note that if you’re using Amazon EBS volumes, you must install the Amazon EBS CSI driver. If the Amazon EBS CSI plugin is not installed, then volume operations will fail.

If IRSA is not enabled, the control plane will have the permissions to provision nodes, and the self-managed controllers should run on the control plane. If IRSA is enabled, kOps will create the respective AWS IAM Role, assign the policy, and establish a trust relationship allowing the ServiceAccount to assume the IAM Role. To configure Pods to assume the given IAM roles, enable the Pod Identity Webhook. Without this webhook, you need to modify your Pod specs yourself for your Pod to assume the defined roles.

spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
      managed: false

Custom addons ¶

Static addons are configured with spec.addons. Each entry points to a manifest that the control plane can read.

spec:
  addons:
  - manifest: https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml

Starting with kOps 1.36, that file can be a regular Kubernetes resource manifest. For example, the manifest above installs the Gateway API CRDs. kOps applies the resources as-is.

It can also be a kOps Addons index. Use this when one entry in spec.addons should point at multiple versioned manifests. The addon management docs describe versioned addon indexes in more detail.

Here is a minimal Addons index that installs two addons.

kind: Addons
metadata:
  name: example
spec:
  addons:
  - name: foo.addons.org.io
    version: 0.0.1
    selector:
      k8s-addon: foo.addons.org.io
    manifest: foo.addons.org.io/v0.0.1.yaml
  - name: bar.addons.org.io
    version: 0.0.1
    selector:
      k8s-addon: bar.addons.org.io
    manifest: bar.addons.org.io/v0.0.1.yaml

In this example, the file structure should look like this:

addon.yaml
  foo.addons.org.io
    v0.0.1.yaml
  bar.addons.org.io
    v0.0.1.yaml

The YAML files in the foo/bar folders can be any Kubernetes resource manifest. Typically this file structure would be pushed to S3 or another supported backend and then referenced from spec.addons. If master nodes need access to an S3 bucket containing addon manifests, add IAM policies using spec.additionalPolicies, like so:

spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject"
          ],
          "Resource": ["arn:aws:s3:::my-kops-addons/*"]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket"
          ],
          "Resource": ["arn:aws:s3:::my-kops-addons"]
        }
      ]

The masters will poll for changes in the bucket and keep the addons up to date.
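Putting this together, referencing the index from the cluster spec could look like the sketch below. It assumes the file structure above was uploaded to the root of the my-kops-addons bucket used in the policy example.

```yaml
spec:
  addons:
  # Assumed location: addon.yaml at the root of the example bucket.
  - manifest: s3://my-kops-addons/addon.yaml
```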