Example errors:
0/1 nodes available: insufficient memory
0/1 nodes available: insufficient cpu

Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  11m (x2550 over 2d19h)  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 Insufficient cpu.
  Warning  FailedScheduling  65s (x199 over 2d19h)   default-scheduler  0/5 nodes are available: 1 Insufficient memory, 1 node(s) had volume node affinity conflict, 4 Insufficient cpu.
$ kubectl get pods | grep Pending
prod-us-west-2-elasticsearch--1   0/1   Pending   0   2d19h
prod-us-w-2-fluentd-24x           0/1   Pending   0   2d19h
prod-us-eus-0                     0/2   Pending   0   2d19h
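These messages appear in the Events section of the pod description (kubectl describe pod). As a quick way to list every Pending pod across the cluster, you can also use a field selector; this is standard kubectl syntax, and no cluster-specific names are assumed:

$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending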
This issue happens when the cluster does not have enough free resources to satisfy your workload's resource request, so the scheduler cannot place the pod on any node.
Troubleshooting Steps:
1) Determine requested resources
2) Have you requested too many resources?
1) Determine requested resources
To determine the resources your workload requests, first extract its YAML.
• Which resource's YAML you need to extract may depend on your workload, but most commonly you can get the YAML of the pod that reports the problem.
• In that YAML, check whether resource requests are made in the containers section, under resources.
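For example, you can extract the YAML of the pod reporting the problem with the following (the pod name is a placeholder):

$ kubectl get pod <pod-name> -o yaml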
A simplified YAML that makes a large memory request (100Gi) might look like this, for example:
apiVersion: v1
kind: Pod
metadata:
  name: too-much-mem
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    name: broken-pods-too-much-mem-container
    resources:
      requests:
        memory: "100Gi"
If no resource requests are in the YAML, a default request may be applied; what that default is depends on other configuration in the cluster (for example, a LimitRange in the namespace). See the Kubernetes resource management docs for more information.
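As a quick check for such defaults (standard kubectl commands; the namespace is a placeholder), you can list any LimitRange objects in the cluster and inspect the one for your namespace:

$ kubectl get limitrange --all-namespaces
$ kubectl describe limitrange -n <namespace>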
If no explicit request is made and the cluster is out of resources, you will likely have no available nodes at all. At this point, consider solution A.
2) Have you requested too many resources?
If you have made a resource request, then there are two possibilities:
- Your resource request cannot fit on any node in the cluster
- Your resource request could fit on a node in the cluster, but the suitable nodes already have workloads running on them that block yours from being scheduled
Step 1 should have shown you whether you are explicitly requesting resources. Once you know what those resources are, you can compare them to the resources available on each node.
If you are able, run:
➜ kubectl describe nodes
➜ kubectl top nodes
The output, under 'Capacity:', 'Allocatable:', and 'Allocated resources:', will show you the resources
available on each node, e.g.:
$ kubectl describe nodes
[...]
Capacity:
cpu: 4
ephemeral-storage: 61255492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2038904Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 56453061334
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1936504Ki
pods: 110
[...]
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (18%) 0 (0%)
memory 140Mi (7%) 340Mi (17%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
You should be able to compare these figures with the resources you requested to determine why your request was not met, and decide accordingly whether to set up autoscaling or provision a larger node.
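To make the comparison easier, you can pull out just the requests of the pending pod, and list the workloads already occupying a candidate node. The pod and node names below are placeholders; jsonpath output and field selectors are standard kubectl features:

$ kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources.requests}'
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>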
Solutions:
A) Set up autoscaling
B) Provision appropriately sized nodes
A) Set up autoscaling
The details will vary depending on your platform, but the general principle is that you have legitimately used up all your resources and need more nodes to take the load.
Note that this solution will not work if:
- Your nodes are unavailable for other reasons (for example, a 'runaway' workload is consuming all the resources it finds), because you will see this error again as soon as the new resources are consumed.
- Your workload cannot fit on any node in the cluster.
Your platform's autoscaling documentation (for example, the Kubernetes Cluster Autoscaler, or your cloud provider's node-pool autoscaling) is a good place to start.
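As one illustrative sketch only: on GKE, autoscaling for an existing node pool can be enabled with gcloud (the cluster and node-pool names are placeholders, and other platforms use different mechanisms):

$ gcloud container clusters update <cluster-name> --enable-autoscaling --node-pool=<pool-name> --min-nodes=1 --max-nodes=5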
B) Provision appropriately sized nodes
The details of this will vary according to your platform. You will need to add a node (or set of nodes) whose available resources exceed what your workload is requesting.
Also, your scheduler may 'actively' or 'intelligently' move workloads around to make them all 'fit' onto the given nodes. In such cases, you may need to significantly over-provision node sizes to reliably accommodate your workload.
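Once the new nodes have joined the cluster, you can confirm that their allocatable resources exceed the workload's request, for example with standard kubectl custom-columns output:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory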
Check Resolution
If the error is no longer seen in the workload description in Kubernetes, then this particular issue has been resolved.
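For example, the pod should no longer appear as Pending, and the FailedScheduling events should stop accumulating (the pod name is a placeholder):

$ kubectl get pods | grep Pending
$ kubectl describe pod <pod-name>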
When you ask Kubernetes to run your workload, the scheduler tries to find a node that can fulfil its requirements. For more background, see the Kubernetes resource management docs.