This article builds upon an old post I wrote many years ago: Kubernetes PVC stuck in Terminating state. That post covered the symptoms and quick fixes. This one is for platform engineers and Kubernetes operators who want to understand why resources like PVCs get stuck in Terminating, how Kubernetes handles deletion internally, and what it really means when a finalizer hangs around.
What Are Finalizers and Why Do They Matter?
In Kubernetes, deleting a resource is a two-phase operation. When a user runs kubectl delete, the object is not immediately removed from etcd. Instead, Kubernetes sets a deletionTimestamp and, if finalizers are present, waits for them to be cleared before actually removing the resource from the API server.
Finalizers are strings listed in the metadata.finalizers array. Each one signals that a controller must perform cleanup logic before the object can be deleted. This ensures consistency and is critical when external resources (cloud volumes, DNS records, firewall rules) are involved.
metadata:
  finalizers:
    - example.com/cleanup-hook
Until this list is empty, Kubernetes will not fully delete the object. This behavior is central to the garbage collection process and the reliability of resource teardown.
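A quick way to see whether an object is in this half-deleted state is to read both fields directly. This is just a sketch using the same placeholder claim name used later in this post:

# Print the deletionTimestamp and finalizers of a (hypothetical) PVC named my-claim;
# an empty deletionTimestamp means no deletion has been requested yet
kubectl get pvc my-claim -o jsonpath='{.metadata.deletionTimestamp}{"  "}{.metadata.finalizers}{"\n"}'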
Deletion Flow Internals
Here’s what actually happens under the hood:
- The user requests deletion (e.g. kubectl delete pvc my-claim)
- Kubernetes sets metadata.deletionTimestamp but leaves the object in etcd
- If metadata.finalizers is non-empty, deletion is paused
- Each controller responsible for a finalizer must reconcile the object, complete its cleanup, then remove its string from the list
- Once the list is empty, the object is garbage collected
Visual Flow
[kubectl delete] → [deletionTimestamp set]
↓
[finalizers exist?] — No → resource deleted
↓
Yes
↓
[Controller reconciles → does cleanup → removes finalizer]
↓
[All finalizers removed?] — No → Wait
↓
Yes
↓
[Object deleted from etcd]
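You can watch this flow end to end on a throwaway object. The sketch below uses the hypothetical example.com/cleanup-hook finalizer from earlier on a disposable ConfigMap (finalizer-demo is a made-up name), with you playing the role of the controller that removes the finalizer:

# Create a throwaway ConfigMap and give it a hypothetical finalizer
kubectl create configmap finalizer-demo
kubectl patch configmap finalizer-demo -p '{"metadata":{"finalizers":["example.com/cleanup-hook"]}}'

# Request deletion: the object gets a deletionTimestamp but is not removed
kubectl delete configmap finalizer-demo --wait=false
kubectl get configmap finalizer-demo -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'

# Acting as the "controller": clear the finalizer, and garbage collection finishes
kubectl patch configmap finalizer-demo -p '{"metadata":{"finalizers":null}}'
kubectl get configmap finalizer-demo   # now returns NotFound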
PVCs and the kubernetes.io/pvc-protection Finalizer
This finalizer is added by the PVC Protection Controller, a core Kubernetes controller
responsible for ensuring that a PVC isn’t deleted while it’s still in use by a Pod. It’s a guardrail that prevents accidental data loss.
To view it on a PVC:
kubectl get pvc my-claim -o yaml
You’ll see:
metadata:
  finalizers:
    - kubernetes.io/pvc-protection
As long as any Pod references that PVC, even if the Pod is Terminating, Kubernetes won’t remove the finalizer. This also applies if the Pod’s deletion is delayed due to a finalizer or node unavailability.
Why Finalizers Hang Around (and PVCs Get Stuck)
If the controller responsible for a finalizer crashes or is unavailable, it can’t remove its entry.
As a result, the resource stays in Terminating indefinitely. For PVCs, common culprits include:
- Pods still referencing the PVC
- Nodes being unresponsive (Pod can’t be torn down)
- CSI driver failing to detach/unmount volumes
- Stale VolumeAttachment objects lingering
To debug:
# Find referencing Pods
kubectl get pods --all-namespaces -o json | jq -r '
.items[] |
select(.spec.volumes[]?.persistentVolumeClaim.claimName=="my-claim") |
"\(.metadata.namespace)/\(.metadata.name)"'
# Check VolumeAttachments
kubectl get volumeattachments
# Describe PVC for recent events
kubectl describe pvc my-claim
vSphere CSI: Finalizers and Cleanup Flow
I’m going to use the vSphere CSI driver as my real-world example for looking at a CSI driver in Kubernetes, as it’s the one I’ve spent the most time troubleshooting.
The vSphere CSI driver uses the external-attacher/csi-vsphere-vmware-com finalizer on PersistentVolume (PV) objects. This finalizer ensures that the CSI external-attacher completes the necessary cleanup before the PV is deleted.
If this finalizer remains on a PV, it can prevent the PV from being fully deleted, especially if the corresponding VolumeAttachment object still exists. In such cases, manual intervention may be required to remove the finalizer and delete the PV.
For example, in Issue #266, a user encountered a situation where a PV couldn’t be deleted due to the lingering finalizer. The recommended workaround involved manually detaching the disk, removing the finalizer from the VolumeAttachment and PV, and then deleting the PV.
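As a rough sketch, that workaround boils down to the commands below. The object names are placeholders, and you should only do this after confirming (for example in vCenter) that the backing disk is genuinely no longer attached to any VM; otherwise you risk exactly the data-loss scenario the finalizer exists to prevent.

# Placeholders: csi-0123abcd = stale VolumeAttachment, pvc-1234-abcd = the PV name
# 1. Clear the finalizer on the stale VolumeAttachment, then delete it
kubectl patch volumeattachment csi-0123abcd --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete volumeattachment csi-0123abcd

# 2. Clear the finalizer on the PV, then delete the PV
kubectl patch pv pvc-1234-abcd --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-1234-abcd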
Example: vSphere CSI Log on Failed Volume Unmap
E0324 04:21:58.987894 nestedpendingoperations.go:301]
Operation for "{volumeName:kubernetes.io/csi/csi.vsphere.vmware.com^pvc-1234...}" failed.
Error: "UnmapVolume.UnmapBlockVolume failed: blkUtil.DetachFileDevice failed."
This log line shows a failed volume unmap operation, one reason PVC deletion might hang. These issues are common with block-mode volumes and can often be resolved by forcing detach or recycling the node.
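If you suspect a stale attachment, you can narrow the VolumeAttachment list down to the affected PV before deciding whether to force a detach. The PV name below is a placeholder:

# List any VolumeAttachment still pointing at a given PV, with its node and attach status
kubectl get volumeattachments -o json | jq -r '
  .items[] |
  select(.spec.source.persistentVolumeName=="pvc-1234-abcd") |
  "\(.metadata.name)  node=\(.spec.nodeName)  attached=\(.status.attached)"'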
Some Tips and Ideas
- Never remove finalizers blindly; they exist for a reason. Manual removal is only valid after confirming that the cleanup they guard has already happened or is no longer needed.
- Make sure Pods terminate cleanly (sensible termination grace periods, preStop hooks, and working health probes) so volumes are unmounted and PVCs can detach properly.
- Monitor VolumeAttachments with alerts if they remain after PVCs are deleted.
- Build automation to identify stuck resources using kubectl get all -o json piped into custom jq scripts, as in the sketch below.
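A minimal sketch of that kind of check, flagging anything that has a deletionTimestamp but is still hanging around (note that kubectl get all only covers a subset of resource kinds, so extend the list to whatever you care about):

# Flag namespaced objects that are stuck in deletion, with their remaining finalizers
kubectl get all,pvc --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.deletionTimestamp != null) |
  "\(.metadata.namespace)/\(.kind)/\(.metadata.name) deleting since \(.metadata.deletionTimestamp), finalizers: \(.metadata.finalizers // [])"'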
Conclusion
Finalizers play a critical role in Kubernetes’ safety and consistency guarantees. They ensure cleanup happens before resource deletion, but when mismanaged, or if a controller crashes, they can leave resources like PVCs hanging indefinitely.
By understanding the internals of how finalizers interact with deletion, controllers, and etcd, you gain the power to confidently debug and resolve these issues in complex environments. And with CSI drivers like vSphere, knowing the exact role and behavior of both PVC finalizers and custom CRD finalizers is key to long-term platform resilience.
Regards