0/1 nodes available: insufficient cpu, insufficient memory


 

Example errors:

0/1 nodes available: insufficient memory
0/1 nodes available: insufficient cpu

Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  11m (x2550 over 2d19h)  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 Insufficient cpu.
  Warning  FailedScheduling  65s (x199 over 2d19h)   default-scheduler  0/5 nodes are available: 1 Insufficient memory, 1 node(s) had volume node affinity conflict, 4 Insufficient cpu.

$ kubectl get pods | grep Pending
prod-us-west-2-elasticsearch--1    0/1     Pending     0    2d19h
prod-us-w-2-fluentd-24x            0/1     Pending     0    2d19h
prod-us-eus-0                      0/2     Pending     0    2d19h

 

This error occurs when the Kubernetes scheduler cannot find a node with enough free CPU or memory to satisfy your workload's resource request.


Troubleshooting Steps: 

  1. Determine requested resources
  2. Have you requested too many resources?

1) Determine requested resources

To determine your workload's requested resources, first extract its YAML.

Which resource's YAML you need depends on your setup, but most commonly you can get the YAML of the pod that reports the problem (for example, with kubectl get pod <pod-name> -o yaml).

In that YAML, check whether resource requests are made under resources in the containers section.


A simplified YAML that makes a large memory request (100Gi) might look like this, for example:

apiVersion: v1
kind: Pod
metadata:
  name: too-much-mem
spec:
  containers:
    - command:
        - sleep
        - "3600"
      image: busybox
      name: broken-pods-too-much-mem-container
      resources:
        requests:
          memory: "100Gi"

 

A default request may be applied if no resource requests appear in the YAML; its value depends on other cluster configuration (for example, a LimitRange in the namespace). See here for more information.


If no request is made and you are out of resources, you will likely have no available nodes. At this point, you need to consider solution A.
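The extraction step above can be sketched in code. The helper below is a rough, simplified stand-in for Kubernetes' quantity parsing (it handles only the common suffixes), applied to a pod spec shaped like the example above; the cpu figure is added here for illustration:

```python
# Sketch: extract per-container resource requests from a pod spec dict.
# parse_quantity is a simplified stand-in for Kubernetes' canonical
# quantity parsing; it covers only the common binary/decimal suffixes.

SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}

def parse_quantity(q: str) -> float:
    """Convert a quantity string (e.g. '100Gi', '750m') to a base unit."""
    if q.endswith("m"):                     # milli-CPU, e.g. '750m' -> 0.75 cores
        return float(q[:-1]) / 1000
    for suffix, factor in SUFFIXES.items():  # longer suffixes checked first
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)

def container_requests(pod_spec: dict) -> dict:
    """Sum the resource requests across all containers in a pod spec."""
    totals: dict = {}
    for container in pod_spec.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        for resource, quantity in requests.items():
            totals[resource] = totals.get(resource, 0) + parse_quantity(quantity)
    return totals

pod_spec = {
    "containers": [
        {"name": "broken-pods-too-much-mem-container",
         "resources": {"requests": {"memory": "100Gi", "cpu": "500m"}}}
    ]
}
print(container_requests(pod_spec))   # {'memory': 107374182400.0, 'cpu': 0.5}
```

An empty result from container_requests signals the "no explicit request" case described above.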


2) Have you requested too many resources?

If you have made a resource request, then there are two possibilities:

  • Your resource request cannot fit on any node in the cluster.
  • Your resource request can fit on a node in the cluster, but those nodes already run workloads that block yours from being scheduled.

Step 1 should have shown you whether you are explicitly requesting resources. Once you know what those resources are, you can compare them to the resources available on each node.

If you are able, run:

$ kubectl describe nodes

$ kubectl top nodes

Under ‘Capacity:’, ‘Allocatable:’, and ‘Allocated resources:’, the describe output shows the resources available on each node (kubectl top nodes shows current usage instead, and requires the metrics-server add-on), e.g.:

  

$ kubectl describe nodes
[...]
Capacity:
  cpu:                4
  ephemeral-storage:  61255492Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2038904Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  56453061334
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1936504Ki
  pods:               110
[...]
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                750m (18%)  0 (0%)
  memory             140Mi (7%)  340Mi (17%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)


You should be able to compare these to the resources you requested, determine why your request was not met, and decide whether to autoscale or to provision a larger node accordingly.
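That comparison can be sketched roughly, using the figures from the output above. The quantity parser is simplified to the common suffixes, and real scheduling also weighs taints, affinity, and volume constraints, so treat this only as an illustration of the arithmetic:

```python
# Sketch: does a pod's request fit in a node's remaining headroom?
# Figures are taken from the 'Allocatable:' and 'Allocated resources:'
# sections of the `kubectl describe nodes` output above.

UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def to_base(q: str) -> float:
    """Simplified Kubernetes quantity parser (common suffixes only)."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000          # milli-CPU -> cores
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)

def fits(node_allocatable: dict, node_allocated: dict, pod_requests: dict) -> bool:
    """True if every requested resource fits in the node's free headroom."""
    for resource, request in pod_requests.items():
        free = to_base(node_allocatable[resource]) - to_base(node_allocated.get(resource, "0"))
        if to_base(request) > free:
            return False
    return True

allocatable = {"cpu": "4", "memory": "1936504Ki"}   # from 'Allocatable:'
allocated   = {"cpu": "750m", "memory": "140Mi"}    # from 'Allocated resources:'

print(fits(allocatable, allocated, {"memory": "100Gi"}))              # False: ~1.8Gi free
print(fits(allocatable, allocated, {"cpu": "2", "memory": "512Mi"}))  # True
```

A False for every node in the cluster corresponds to the FailedScheduling events shown at the top of this page.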


Solutions:

A) Set up autoscaling

B) Provision appropriately sized nodes


A) Set up autoscaling

The details of this will vary depending on your platform, but generally, the principle is that you have legitimately used up all your resources, and you need more nodes to take the load.

Note that this solution will not work if:

  • Your nodes are unavailable for other reasons (for example, a ‘runaway’ workload is consuming every resource it finds); in that case you will see this error again as soon as the new resources are consumed.
  • Your workload cannot fit on any node in the cluster.

Some potentially helpful links for achieving this:


B) Provision appropriately sized nodes

The details of this will vary according to your platform. You will need to add a node (or set of nodes) whose allocatable resources exceed what your workload requests.

Also, your workload scheduler may ‘actively’ or ‘intelligently’ move workloads to make them all ‘fit’ onto the given nodes. In these cases, you may need to significantly over-provision node sizes to reliably accommodate your workload.
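The point about node size, as opposed to total cluster capacity, can be illustrated with a toy first-fit check (a deliberate simplification of the scheduler's real scoring): a request must fit on a single node, so three 2-core nodes cannot host a 3-core pod even though 6 cores exist in aggregate.

```python
# Toy sketch: a request must fit on ONE node; aggregate capacity is
# not enough. First-fit here stands in for the scheduler's scoring.

def first_fit(node_free_cpus, request_cpu):
    """Return the index of the first node the request fits on, or None."""
    for i, free in enumerate(node_free_cpus):
        if request_cpu <= free:
            return i
    return None

nodes = [2.0, 2.0, 2.0]            # free cores per node (6 cores in total)
print(first_fit(nodes, 3.0))       # None: no single node has 3 free cores
print(first_fit(nodes, 1.5))       # 0: fits on the first node
```

This is why solution B asks for a node sized above the workload's request, not merely more nodes.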


Check Resolution

If the error is no longer seen in the workload description in Kubernetes, then this particular issue has been resolved.

When you ask Kubernetes to run your workload, the scheduler tries to find a node that can fulfil all of its requirements; see the Kubernetes resource management docs for more background.

 
