Proxmox - For stubborn clusters (quorum challenges)

 


Here are concrete, command-based options for resolving the "cluster not ready - no quorum?" error, with a short note on what each command is for. They follow standard Proxmox VE cluster management practice.

Step-by-step commands

  1. Check current cluster status and node list

  • Purpose: confirm quorum state and identify node IDs/names.

  • Commands:

    • pvecm status

      shows total votes, quorum, and node list

    • pvecm nodes

      lists node IDs and names in the cluster

  • Interpretation: ensure enough online voters exist to reach quorum before attempting join or delnode operations.
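The status check above can be scripted. The sketch below pulls the Quorate, Expected votes, and Total votes fields out of pvecm status output. The function name quorum_summary is hypothetical, and the field labels are assumed to match what your corosync version prints.

```shell
# quorum_summary: read `pvecm status` output on stdin and print
# "quorate=<Yes|No> expected=<n> total=<n>".
# Hypothetical helper; assumes the usual "Label:   value" layout.
quorum_summary() {
    awk -F':[ \t]*' '
        /^Quorate:/        { q = $2 }
        /^Expected votes:/ { e = $2 }
        /^Total votes:/    { t = $2 }
        END { printf "quorate=%s expected=%s total=%s\n", q, e, t }
    '
}

# Usage on a cluster node:
#   pvecm status | quorum_summary
```

If total is lower than expected, at least one voter is missing; whether the cluster is still quorate depends on whether the remaining votes form a majority.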

  2. Verify network connectivity and time synchronization

  • Purpose: ensure all cluster nodes can communicate and have synchronized clocks.

  • Commands:

    • ping -c 3 <other-node-IP> or ping -c 3 <hostname>

    • timedatectl status

    • systemctl status corosync

  • Notes: if a firewall sits between the nodes, allow corosync traffic (UDP 5405-5412 by default on current Proxmox VE releases) and SSH (TCP 22) between all cluster nodes. This is a checklist rather than a single command, but it rules out common causes of lost quorum.
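To run the connectivity check against several peers at once, a small loop helps. This is a minimal sketch: check_peers and the PING override variable are hypothetical names, and the addresses in the usage lines are placeholders.

```shell
# check_peers: ping each host given as an argument and report the result.
# Returns non-zero if any peer is unreachable.
# PING is a hypothetical override hook (defaults to the real ping) so the
# loop can be tried without touching the network.
check_peers() {
    rc=0
    for host in "$@"; do
        if "${PING:-ping}" -c 3 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (addresses are placeholders):
#   check_peers 192.0.2.11 192.0.2.12
#   timedatectl status | grep -i 'synchronized'
#   systemctl is-active corosync pve-cluster
```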

  3. If a node is offline but should remain part of the cluster later

  • Purpose: bring the cluster back to a majority without removing required nodes.

  • Commands (run on any online node; Proxmox VE clusters have no master node):

    • pvecm status

    • Compare "Total votes" with "Expected votes" in the output. If an offline node should stay in the cluster for redundancy, do not run delnode against it; restore its power or network connectivity instead so that its vote counts again.
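Whether the remaining nodes can hold quorum is simple arithmetic: a strict majority of the expected votes, floor(N/2) + 1, must be online. A minimal sketch, assuming one vote per node; the function names are hypothetical.

```shell
# votes_needed <expected>: majority threshold for a given number of
# expected votes, i.e. floor(expected / 2) + 1.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum <expected> <online>: succeeds when enough votes are online.
has_quorum() {
    [ "$2" -ge "$(votes_needed "$1")" ]
}

votes_needed 3   # three-node cluster: prints 2
votes_needed 2   # two-node cluster:   prints 2
```

A three-node cluster therefore survives one node being offline, while a two-node cluster does not.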

  4. Remove a node from a healthy cluster

  • Purpose: safely remove a node when the cluster is quorate.

  • Commands (on a surviving, quorate node):

    • Identify the node name to remove (from pvecm nodes output)

    • pvecm delnode <nodename>

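    • The documented syntax of pvecm delnode takes only the node name; there is no documented --force option. If the removal fails, power off the node being removed, confirm that the remaining nodes are still quorate, and then run the standard removal again:

      • pvecm status

      • pvecm delnode <nodename>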

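  • Clean up leftover configuration once the node has been removed:

    • On a remaining cluster node: rm -rf /etc/pve/nodes/<nodename> (this also deletes any VM/CT configuration files still stored under that directory, so move them first if you need them)

    • If you still have access to the removed node, you can also wipe its cluster files locally; this is typically unnecessary if the node is being retired, but the node must not rejoin the cluster network with its old cluster configuration.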
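Before deleting a node directory, it is worth checking that no guest configuration is left in it. The sketch below lists any remaining VM and container config files; guest_configs_left is a hypothetical helper, and its optional second argument exists only so the function can be tried on a scratch directory.

```shell
# guest_configs_left <node> [base_dir]
# Lists VM (qemu-server) and container (lxc) config files still stored
# under a node's directory. base_dir defaults to /etc/pve/nodes.
guest_configs_left() {
    node="$1"
    base="${2:-/etc/pve/nodes}"
    find "$base/$node/qemu-server" "$base/$node/lxc" \
        -maxdepth 1 -name '*.conf' 2>/dev/null | sort
}

# Usage: only delete the directory when nothing is listed.
#   if [ -z "$(guest_configs_left <nodename>)" ]; then
#       rm -rf /etc/pve/nodes/<nodename>
#   fi
```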

  5. For stubborn 2-node clusters (quorum challenges)

  • Scenario: a two-node cluster where one node is down or unreachable. With two expected votes, quorum requires both, so the surviving node loses quorum and /etc/pve becomes read-only.

  • Approaches:

    • If the surviving node must operate standalone temporarily, you can adjust the expected votes (only if you understand the risk):

      • pvecm expected 1

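    • If pvecm expected 1 fails with CS_ERR_INVALID_PARAM, the expected votes value stays at its previous setting (usually 2). Restore connectivity first, or confirm with pvecm status that the other node is really offline, before trying again.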

  • Important: changes to quorum in a two-node setup reduce redundancy and are not recommended for long-term production use.
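To avoid lowering expected votes on a cluster that is actually healthy, the command can be wrapped in a guard. This is a sketch under the assumption that pvecm status prints a "Quorate:" line; force_single_vote is a hypothetical wrapper name.

```shell
# force_single_vote: lower expected votes to 1, but only when the local
# node currently reports no quorum. Hypothetical wrapper; it relies on
# the "Quorate:" line of `pvecm status`.
force_single_vote() {
    if pvecm status 2>/dev/null | grep -Eq '^Quorate:[[:space:]]*Yes'; then
        echo "cluster is quorate; leaving expected votes unchanged"
        return 0
    fi
    echo "no quorum; running: pvecm expected 1"
    pvecm expected 1
}

# Usage on the surviving node:
#   force_single_vote
```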

  6. After removal or recovery

  • Verify cluster health again

    • pvecm status

    • pvecm nodes

  • In the Proxmox GUI, refresh the Datacenter view to ensure the cluster reflects the current state.
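The final check can also be automated by testing whether a node name still appears in the membership list. A minimal sketch, assuming pvecm nodes prints its usual "Nodeid Votes Name" columns; node_listed is a hypothetical helper.

```shell
# node_listed <name>: read `pvecm nodes` output on stdin and succeed
# when <name> appears in the third (Name) column.
node_listed() {
    awk -v n="$1" '$3 == n { found = 1 } END { exit !found }'
}

# Usage after a removal:
#   if pvecm nodes | node_listed <nodename>; then
#       echo "node is still a cluster member"
#   fi
```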

Notes and safeguards

  • Always migrate or shut down VMs/CTs on a node before removing it, to avoid data loss or service disruption.

  • Maintain backups of VMs/containers and cluster configuration before performing removal or quorum changes.

  • If Ceph is used, ensure Ceph health (ceph -s) remains HEALTH_OK or is rebalanced safely after node removal.

  • Time synchronization is critical; ensure NTP is consistently configured across all cluster nodes.


