Kubernetes Blog Note

Kubernetes Blog

This is the official homepage of Kubernetes, a container orchestration system for automating the deployment, scaling, and management of containerized applications. Kubernetes is a project maintained by the Cloud Native Computing Foundation, and the site offers comprehensive documentation on it, including details about running stateless and stateful applications, batch jobs, and CI/CD workflows. It provides guides, tutorials, reference material, API documentation, and community engagement initiatives to help users get started with Kubernetes and manage cloud-based applications effectively.

Thread Of Notes

From Kubernetes Dashboard to Headlamp: Understanding the Transition

The Kubernetes Dashboard, once a primary visual interface for Kubernetes, has been archived. It served as an important onramp for many users, simplifying cluster visibility and resource inspection. Headlamp now carries this legacy forward, building on the Dashboard's foundation. It offers a clear visual interface while incorporating modern Kubernetes usage patterns. Headlamp provides multi-cluster visibility, application-centric views through Projects, and extensibility via plugins. The transition aims to honor the user-centered legacy of Dashboard and offer a growing UI solution. Many familiar workflows from Kubernetes Dashboard are retained in Headlamp, ensuring continuity and ease of use. Headlamp expands capabilities by allowing multi-cluster management from a single interface, reducing friction for distributed environments. Projects within Headlamp offer application-centered views, grouping related resources for better understanding and troubleshooting. The platform is also extensible through plugins, such as the Flux plugin for GitOps workflows or an AI Assistant for guidance. Headlamp offers flexible deployment options, usable as an in-cluster tool or a desktop application. Understanding current Dashboard usage, including clusters, namespaces, and authentication, aids a smooth transition to Headlamp.

Reconciling the Past: Correcting Records for Unfixed Kubernetes CVEs

The Kubernetes project is correcting its CVE records to make them more accurate. A review found discrepancies in older records, some of which incorrectly list fixed versions. The Kubernetes Security Response Committee will correct these records on June 1, 2026. After the correction, vulnerability scanners may begin reporting issues they did not previously detect. The post gives technical details about three unfixed vulnerabilities: CVE-2020-8561, CVE-2020-8562, and CVE-2021-25740. CVE-2020-8554, also unfixed, will receive a standardized version number format. The updates allow scanners to report these vulnerabilities correctly and make clear that administrators must keep mitigations in place. The vulnerabilities remain unfixed because fixing them would disrupt core Kubernetes functionality. Each one has specific mitigations that administrators should implement to secure their clusters, which matters because the vulnerabilities are architectural in nature. The project emphasizes a "secure by configuration" approach to managing these risks, and presents the record updates as a sign of a maturing security ecosystem that favors transparency and accurate risk assessment.

Announcing etcd 3.7.0-beta.0

SIG-Etcd has released the first beta of etcd v3.7.0, a significant update for the distributed database. This version introduces RangeStream, a feature designed to improve handling of large result sets, enhancing latency and memory management. The release also includes refactoring and cleanup of legacy components and interfaces, improving overall performance. The developers encourage users to test the beta and report any issues found in the etcd repository. A key highlight is the removal of the last vestiges of etcd v2store, completing the transition to v3store. This transition may introduce breaking changes, particularly for users not on v3.6.11, so feedback is requested regarding any problems encountered. This beta release also incorporates updates to bbolt and raft libraries. Furthermore, the release timeline is linked to the End of Life (EOL) for etcd v3.4, which will cease updates after May. The community is prepared to release an additional security patch for v3.4 if required, before its ultimate deprecation. Users are urged to upgrade from v3.4. Future betas are planned, potentially with further protobuf refactoring leading to release candidates and the final version in June or early July. Feedback is actively solicited through GitHub issues, the Kubernetes Slack channel, and the etcd-dev mailing list.

Kubernetes v1.36: New Metric for Route Sync in the Cloud Controller Manager

This article, originally misdated, now reflects a May 15, 2026, publication date. Kubernetes v1.36 introduces a new alpha metric, `route_controller_route_sync_total`, for the Cloud Controller Manager's route controller. This metric tracks route sync operations with the cloud provider, aiding in monitoring the `CloudControllerManagerWatchBasedRoutesReconciliation` feature gate. This feature, introduced in v1.35, switches the route controller to a watch-based approach. This change reduces API calls by only reconciling routes when nodes change. To test the new feature, compare the metric's behavior with the feature gate disabled and enabled. With the feature gate disabled, the counter increments at a fixed interval. Conversely, with the feature enabled, the counter only increments upon node changes. This difference is most noticeable in stable clusters with infrequent node modifications. Feedback can be provided through Kubernetes Slack, a GitHub issue, and the SIG Cloud Provider community page. Further details are available in KEP-5237.
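The comparison described above can be sketched as reading the counter from two scrapes and looking at the difference. This is a minimal sketch, assuming the metric is exposed in the usual Prometheus text format; the sample text, its help string, and the `counter_total` helper are hypothetical and only the metric name comes from the post.

```python
METRIC = "route_controller_route_sync_total"

def counter_total(metrics_text, name=METRIC):
    """Sum all samples of one counter found in Prometheus text-format output."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line.startswith(name):
            continue  # skips blank lines, '# HELP' / '# TYPE' comments, other metrics
        rest = line[len(name):]
        if rest.startswith("{"):
            # Drop the label set; the value follows the closing brace.
            rest = rest[rest.rindex("}") + 1:]
        elif not rest.startswith(" "):
            continue  # a different metric that merely shares the prefix
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total

# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP route_controller_route_sync_total hypothetical help text
# TYPE route_controller_route_sync_total counter
route_controller_route_sync_total 7
route_controller_route_sync_total_other 3
"""

if __name__ == "__main__":
    first = counter_total(SAMPLE)
    second = counter_total(SAMPLE)  # in practice, a later scrape
    print("increase between scrapes:", second - first)
```

In a stable cluster, an increase of zero between scrapes would be consistent with the watch-based behavior, while a steadily growing value would match the fixed-interval behavior.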

Kubernetes v1.36: Mixed Version Proxy Graduates to Beta

The Mixed Version Proxy (MVP) enhances Kubernetes cluster upgrades by safely routing requests for unknown resources to newer API servers, preventing 404 errors. Initially introduced as an Alpha feature in Kubernetes 1.28, MVP is now moving to Beta in version 1.36 and will be enabled by default. MVP addresses the issue of API servers with differing versions during upgrades, where requests for new resources might fail on older servers. Instead of the incorrect 404, the request is proxied to a server capable of handling it. The Beta version of MVP uses aggregated discovery instead of the StorageVersion API for determining peer capabilities, improving functionality. This update also includes peer-aggregated discovery, providing clients with a unified view of all available APIs. To enable MVP, API servers require the --peer-ca-file flag, along with --peer-advertise-ip and --peer-advertise-port if needed. With kubeadm, you can include those flags in your ClusterConfiguration file to streamline the process. Users are encouraged to test MVP in staging environments and provide feedback to the SIG API Machinery as part of the 1.36 upgrade.
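As a rough illustration of the flags named above, the sketch below assembles the kube-apiserver arguments, adding the two advertise flags only when values are supplied. The helper name and the example path, IP, and port are hypothetical; only the three flag names come from the post.

```python
def peer_proxy_flags(ca_file, advertise_ip=None, advertise_port=None):
    """Build the kube-apiserver flags mentioned in the post for peer proxying."""
    flags = [f"--peer-ca-file={ca_file}"]
    # The advertise flags are optional and only added when a value is given.
    if advertise_ip is not None:
        flags.append(f"--peer-advertise-ip={advertise_ip}")
    if advertise_port is not None:
        flags.append(f"--peer-advertise-port={advertise_port}")
    return flags

if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for flag in peer_proxy_flags("/path/to/peer-ca.crt", "10.0.0.10", 6443):
        print(flag)
```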

Kubernetes v1.36: Deprecation and removal of Service ExternalIPs

The `.spec.externalIPs` field in Kubernetes Services, initially designed for non-cloud load-balancer functionality, is now deprecated due to security vulnerabilities identified in CVE-2020-8554. This field allows specifying additional IP addresses a Service responds to, but it has inherent security risks because it assumes trust among all users. Kubernetes 1.21 already recommended disabling `.spec.externalIPs`, and an admission controller was introduced to enforce this. Alternatives, like manually managed LoadBalancer services or non-cloud load balancer controllers such as MetalLB, offer better security and control. MetalLB allows administrators to control IP address assignments, mitigating security concerns. The Gateway API also provides a secure solution, giving administrators control over the IP through a Gateway resource. Kubernetes 1.36 officially deprecated `.spec.externalIPs` and started issuing warnings about its usage. Kube-proxy support for the feature will be disabled in a future release, with full removal planned in subsequent versions. Users are encouraged to migrate away from this insecure feature.
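Before migrating, it helps to know which Services still set the field. The following is a minimal sketch, assuming Service objects have already been loaded as plain dictionaries (for example from JSON output); the helper name and the sample objects are hypothetical.

```python
def services_using_external_ips(services):
    """Return (namespace, name, ips) for every Service with a non-empty spec.externalIPs."""
    flagged = []
    for svc in services:
        ips = (svc.get("spec") or {}).get("externalIPs") or []
        if ips:
            meta = svc.get("metadata") or {}
            flagged.append((meta.get("namespace", "default"), meta.get("name", ""), list(ips)))
    return flagged

# Hypothetical Service objects, for illustration only.
SERVICES = [
    {"metadata": {"name": "web", "namespace": "shop"},
     "spec": {"type": "ClusterIP", "externalIPs": ["192.0.2.10"]}},
    {"metadata": {"name": "db", "namespace": "shop"},
     "spec": {"type": "ClusterIP"}},
]

if __name__ == "__main__":
    for namespace, name, ips in services_using_external_ips(SERVICES):
        print(f"{namespace}/{name} still uses externalIPs: {ips}")
```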

Kubernetes v1.36: Advancing Workload-Aware Scheduling

Kubernetes v1.35 introduced workload-aware scheduling improvements, including the Workload API and basic gang scheduling for identical Pods. Kubernetes v1.36 refines this architecture by separating the Workload API (static template) from the new PodGroup API (runtime state). This separation streamlines the kube-scheduler, enabling it to directly read PodGroup information for enhanced performance. A new PodGroup scheduling cycle allows atomic processing of workloads, evaluating entire groups as a unified operation to prevent deadlocks. If a valid placement is found and group constraints are met, Pods are bound together; otherwise, the entire group is considered unschedulable and retries later. This forms the foundation for gang scheduling, ensuring all-or-nothing placement for strict workload requirements. Topology-aware scheduling in v1.36 enables defining topology constraints on PodGroups, co-locating Pods within specific physical or logical domains to reduce network latency. This involves generating, evaluating, and scoring candidate placements based on scheduling constraints. Workload-aware preemption is introduced to support the PodGroup scheduling cycle, preempting Pods from multiple Nodes simultaneously to make space for an entire PodGroup. It treats the PodGroup as a single preemptor unit, with PodGroup priority and disruptionMode fields controlling preemption behavior. Finally, v1.36 integrates Dynamic Resource Allocation (DRA) with the Workload API, allowing PodGroups to request and share specialized hardware resources through ResourceClaims. These advancements lay a robust foundation for building advanced workload scheduling capabilities in future Kubernetes releases.
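The all-or-nothing idea can be shown with a toy model. This is not the kube-scheduler's algorithm; it is a sketch that assumes a single resource (whole CPU cores) and a simple first-fit heuristic, and the function and node names are hypothetical. The point is only that a failed attempt binds nothing.

```python
def place_group(pod_cpu_requests, node_free_cpu):
    """Return {pod_index: node_name} if every pod fits, otherwise None.

    Placement is tried on a copy of the free capacity, so a failed attempt
    leaves the real state untouched (nothing is partially bound).
    """
    free = dict(node_free_cpu)
    assignment = {}
    # Largest requests first; a heuristic that can miss some valid placements.
    order = sorted(range(len(pod_cpu_requests)), key=lambda i: -pod_cpu_requests[i])
    for idx in order:
        need = pod_cpu_requests[idx]
        node = next((n for n in sorted(free) if free[n] >= need), None)
        if node is None:
            return None  # one pod does not fit, so the whole group is unschedulable
        free[node] -= need
        assignment[idx] = node
    return assignment

if __name__ == "__main__":
    print(place_group([2, 2, 2], {"node-a": 4, "node-b": 2}))
    print(place_group([2, 2, 2], {"node-a": 4}))
```

In the first call every Pod finds room, so the group is placed as a unit; in the second, two Pods would fit but the third would not, so the function returns `None` and the group would be retried later.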

Kubernetes v1.36: PSI Metrics for Kubernetes Graduates to GA

Pressure Stall Information (PSI) has been part of the Linux kernel since 2018, providing high-fidelity signals for identifying resource saturation before it leads to outages. Unlike traditional utilization metrics, PSI quantifies stalled tasks and lost time across CPU, memory, and I/O. With Kubernetes v1.36, a stable interface for observing resource contention at node, pod, and container levels is now available. PSI offers cumulative totals of stalled time and moving averages (10s, 60s, 300s) to distinguish between transient spikes and sustained resource pressure. Performance testing by SIG Node on high-density workloads (80+ pods) showed that PSI is ready for production. Kubelet overhead, measured by toggling the KubeletPSI feature gate, had a negligible impact on resource usage: the kubelet's collection logic runs within its standard housekeeping cycles and consumed less than 0.1 cores, or 2.5% of total node capacity (the quoted percentages correspond to a 4-core node). For kernel overhead, enabling PSI in the Linux kernel (psi=1 vs psi=0) produced a consistent delta of 0.037 to 0.125 cores (0.925% to 3.125% of node capacity) under heavy load. The kubelet process, as the primary collector, also kept CPU usage low, with spikes not exceeding 0.25 cores (6.25%) for more than a second. Improvements in v1.36 include smarter metric emission: the kubelet now detects OS-level PSI support via cgroup configurations before reporting, which prevents misleading zero-valued metrics. To use PSI, nodes must run Linux kernel 4.20+, use cgroup v2, and have PSI enabled at the OS level (CONFIG_PSI=y, no psi=0 boot parameter). PSI metrics are generally available in v1.36 and require no feature gate opt-in. Users can scrape the /metrics/cadvisor endpoint or query the Summary API. PSI is a Linux-kernel feature and is not available on Windows nodes.
Proxying to the Kubelet's HTTP API via the control plane's API server allows real-time pressure data from the Summary API but is a privileged operation.
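To make the moving averages and cumulative totals concrete, the sketch below parses pressure data in the line format the Linux kernel uses for its pressure files (a `some` or `full` line with `avg10`, `avg60`, `avg300`, and `total` fields). It reads the kernel-level format, not the kubelet's metrics output, and the sample values, the `classify` helper, and its threshold are hypothetical.

```python
def parse_psi(text):
    """Parse lines like 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    parsed = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        fields = dict(pair.split("=", 1) for pair in pairs)
        parsed[kind] = {
            "avg10": float(fields["avg10"]),    # % of time stalled, last 10 seconds
            "avg60": float(fields["avg60"]),    # last 60 seconds
            "avg300": float(fields["avg300"]),  # last 300 seconds
            "total_us": int(fields["total"]),   # cumulative stall time in microseconds
        }
    return parsed

def classify(sample, threshold=10.0):
    """Rough transient-versus-sustained label; the threshold is an arbitrary assumption."""
    if sample["avg10"] >= threshold and sample["avg300"] >= threshold:
        return "sustained"
    if sample["avg10"] >= threshold:
        return "transient"
    return "calm"

# Hypothetical readings, for illustration only.
SAMPLE = """\
some avg10=12.50 avg60=3.20 avg300=0.80 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""

if __name__ == "__main__":
    for kind, sample in parse_psi(SAMPLE).items():
        print(kind, classify(sample), sample)
```

A high 10-second average next to a low 300-second average suggests a short spike, while high values in both windows suggest pressure that has persisted.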

Kubernetes v1.36: Moving Volume Group Snapshots to GA

Kubernetes v1.36 introduces General Availability (GA) for volume group snapshots, a feature that was previously an Alpha and then Beta enhancement. This functionality leverages extension APIs to enable crash-consistent snapshots of multiple volumes simultaneously. The system groups PersistentVolumeClaim objects using label selectors, allowing for the restoration of workloads to a consistent recovery point. This feature is exclusively supported for CSI volume drivers, offering a significant advantage for applications utilizing multiple volumes that require write order consistency. Previously, individual volume snapshots could lead to inconsistencies if taken at different times, particularly for multi-volume applications. Group snapshots eliminate the need for manual application quiescence, providing crash consistency across all volumes in the group without tedious, sequential individual snapshots. Kubernetes manages group snapshots through three custom API kinds: VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. These CRDs, now promoted to v1 in the GA release, allow users to request group snapshots, track their provisioned resources, and define their creation policies, respectively. The GA release brings enhanced stability, bug fixes, and improved restoreSize reporting based on feedback from prior beta versions. To use this feature, users must label their PersistentVolumeClaims to be grouped and then define a VolumeGroupSnapshot object with a selector matching these labels, along with a VolumeGroupSnapshotClass. For restoration, new PersistentVolumeClaims are created from individual VolumeSnapshot objects that are part of a larger VolumeGroupSnapshot. Storage vendors can add support by implementing new group controller services and RPCs within their CSI drivers.
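The label-based grouping can be sketched as follows. This is an illustrative model, not the snapshot controller's code: it treats the selector as simple equality matching on labels, and the API group string, the spec field names, and all object names are assumptions carried over from the earlier beta API shape that should be checked against the v1 reference.

```python
def matches(labels, match_labels):
    """True when every selector key is present with the same value."""
    return all(labels.get(key) == value for key, value in match_labels.items())

def select_pvcs(pvcs, match_labels):
    """Names of the PersistentVolumeClaims that would fall into the group."""
    return [pvc["name"] for pvc in pvcs if matches(pvc.get("labels", {}), match_labels)]

def group_snapshot_manifest(name, namespace, class_name, match_labels):
    """Build a VolumeGroupSnapshot object as a plain dict (field names are assumptions)."""
    return {
        "apiVersion": "groupsnapshot.storage.k8s.io/v1",  # assumed group; v1 per the GA note
        "kind": "VolumeGroupSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeGroupSnapshotClassName": class_name,
            "source": {"selector": {"matchLabels": dict(match_labels)}},
        },
    }

# Hypothetical claims, for illustration only.
PVCS = [
    {"name": "data", "labels": {"group": "my-app"}},
    {"name": "logs", "labels": {"group": "my-app"}},
    {"name": "scratch", "labels": {"group": "other"}},
]

if __name__ == "__main__":
    selector = {"group": "my-app"}
    print(select_pvcs(PVCS, selector))
    print(group_snapshot_manifest("demo-group-snapshot", "demo", "demo-class", selector))
```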

Kubernetes v1.36: More Drivers, New Features, and the Next Era of DRA

Dynamic Resource Allocation (DRA) in Kubernetes v1.36 introduces significant advancements, extending its capabilities beyond specialized hardware to native resources like CPU and memory. Driver support for various hardware types, including networking, is expanding, making DRA a more hardware-agnostic solution. Several key features have graduated, enhancing scheduling flexibility and cluster utilization. The Prioritized list feature enables fallback preferences for device requests, improving resource allocation efficiency. Extended resource support allows a gradual transition to DRA by enabling resource requests via traditional extended resources. Partitionable devices provide native DRA support for dynamically carving physical hardware into smaller, logical instances. Device taints empower administrators to manage hardware more effectively by preventing faulty devices from being allocated or reserving specific hardware. Device binding conditions improve scheduling reliability by delaying Pod commitment until external resources are fully prepared. Resource health status exposes device health information directly in Pod status, aiding in quick identification and reaction to hardware failures. New alpha features include ResourceClaim support for workloads, optimizing large-scale AI/ML by managing shared resources across PodGroups. Node allocatable resources integrate CPU and memory allocation under the DRA umbrella, allowing for fine-grained performance tuning. DRA resource availability visibility provides administrators with real-time device capacity information for better planning. Deterministic device selection allows drivers to influence scheduling through lexicographical ordering. Discoverable device metadata in containers provides a standard protocol for drivers to expose device attributes to containers. 
The future roadmap focuses on maturing existing features, enhancing performance, scalability, and integration with workload-aware and topology-aware scheduling, with a strong emphasis on migrating users from Device Plugin to DRA.
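The fallback behavior of the Prioritized list feature can be pictured with a toy model: walk an ordered list of acceptable device types and take the first one that is currently available. This is a sketch of the idea only, not the DRA API or the scheduler's allocation logic; the function name and the device type names are hypothetical.

```python
def pick_device(preferences, available):
    """Return the first preferred device type with free capacity, or None."""
    for device_type in preferences:
        if available.get(device_type, 0) > 0:
            return device_type
    return None  # no alternative can be satisfied right now

if __name__ == "__main__":
    # Hypothetical preference order and free device counts, for illustration only.
    preferences = ["large-accelerator", "small-accelerator"]
    print(pick_device(preferences, {"large-accelerator": 0, "small-accelerator": 3}))
```

Ordering the alternatives lets a request succeed on less preferred hardware instead of waiting for the first choice to free up.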