Proxmox - For stubborn clusters (quorum challenges)

 


Here are concrete commands you can use to address the "cluster not ready - no quorum?" error, with short notes explaining the intent of each. These are standard steps from Proxmox cluster management practice.

Step-by-step commands

  1. Check current cluster status and node list

  • Purpose: confirm quorum state and identify node IDs/names.

  • Commands:

    • pvecm status

      shows total votes, quorum, and node list

    • pvecm nodes

      lists node IDs and names in the cluster

  • Interpretation: ensure enough online voters exist to reach quorum before attempting join or delnode operations.
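The check in step 1 can be scripted. The following is a minimal sketch, assuming the `pvecm status` output contains a line of the form `Quorate: Yes` or `Quorate: No`; the function name `is_quorate` is made up for illustration and reads the status text on stdin.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the cluster is quorate from `pvecm status` output.
# Reads the status text on stdin, so it can be tried on saved output too.
is_quorate() {
    # Split on the colon plus padding; succeed only when the value starts with "Yes".
    awk -F': *' '/^Quorate:/ { found = ($2 ~ /^Yes/) } END { exit found ? 0 : 1 }'
}
```

A possible use on a node: `pvecm status | is_quorate && echo "quorate" || echo "NOT quorate"`.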

  2. Verify network connectivity and time synchronization

  • Purpose: ensure all cluster nodes can communicate and have synchronized clocks.

  • Commands:

    • ping -c 3 <other-node-IP> or ping -c 3 <hostname>

    • timedatectl status

    • systemctl status corosync

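  • Notes: if a firewall sits between the nodes, allow cluster traffic. On current releases corosync uses UDP ports 5405-5412 by default, and the nodes also need SSH (TCP 22) to each other. This is a checklist rather than a single command; working through it rules out the most common causes of lost quorum.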
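The connectivity and clock checks in step 2 can be wrapped in two small helpers. This is only a sketch: `check_peers` and `clock_synced` are hypothetical names, the peer addresses are whatever you pass in, and the clock check assumes a systemd version whose `timedatectl show` exposes the `NTPSynchronized` property.

```shell
#!/usr/bin/env bash
# Sketch: ping each peer given as an argument and report the result.
# Returns non-zero if at least one peer did not answer.
check_peers() {
    local peer failed=0
    for peer in "$@"; do
        if ping -c 3 -W 2 "$peer" >/dev/null 2>&1; then
            echo "OK      $peer"
        else
            echo "FAILED  $peer"
            failed=1
        fi
    done
    return "$failed"
}

# Succeeds if systemd reports the clock as NTP-synchronized.
clock_synced() {
    [ "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)" = "yes" ]
}
```

For example, `check_peers <other-node-IP> <hostname>` followed by `clock_synced || echo "clock not synchronized"`; `systemctl status corosync` still needs to be read by eye.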

  3. If a node is offline but should remain part of the cluster later

  • Purpose: bring the cluster back to a majority without removing required nodes.

  • Commands (run on any online node; Proxmox VE clusters have no dedicated master):

    • pvecm status

    • Compare "Total votes" with "Quorum" in the output. If an offline node is meant to stay in the cluster for redundancy, do not force a delnode; restore its connectivity instead, and the cluster regains quorum once a majority of the expected votes is online again.
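The majority rule behind this step is simple arithmetic: with one vote per node, quorum needs a strict majority of the expected votes, that is floor(expected / 2) + 1. A small sketch follows; the helper names `quorum_needed` and `has_majority` are illustrative only.

```shell
#!/usr/bin/env bash
# Votes needed for quorum with the given number of expected votes:
# a strict majority, i.e. floor(expected / 2) + 1.
quorum_needed() {
    local expected=$1
    echo $(( expected / 2 + 1 ))
}

# Succeeds if the online votes still form a majority of the expected votes.
has_majority() {
    local online=$1 expected=$2
    [ "$online" -ge "$(quorum_needed "$expected")" ]
}
```

With three nodes, `quorum_needed 3` prints 2, so one node can be offline. With two nodes, `quorum_needed 2` also prints 2, which is why a two-node cluster loses quorum as soon as either node goes down.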

  4. Remove a node from a healthy cluster

  • Purpose: safely remove a node when the cluster is quorate.

  • Commands (on a surviving, quorate node):

    • Identify the node name to remove (from pvecm nodes output)

    • pvecm delnode <nodename>

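    • Power off the node being removed before running delnode, and make sure it does not come back on the network with its old cluster configuration.

    • As far as the pvecm man page documents, delnode takes only the node name; --force is an option of pvecm add, not of delnode. If delnode fails because the cluster has lost quorum, restore quorum first (see the two-node step) and then retry the standard command:

      • pvecm delnode <nodename>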

  • Clean up residual configuration for the removed node (only after delnode has succeeded and any guest configs you still need have been moved to another node):

    • On one remaining cluster node (/etc/pve is shared across the cluster): rm -rf /etc/pve/nodes/<nodename>

    • If you still have access to the removed node, you can also wipe its cluster files locally; this is typically unnecessary when removal is clean.
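Before running the rm -rf above, it may be worth confirming that no guest configs are left under the node's directory. The sketch below assumes the usual layout of `qemu-server/<vmid>.conf` and `lxc/<vmid>.conf` under `/etc/pve/nodes/<nodename>`; the function names and the optional second argument (a base path, present only so the check can be tried on a scratch directory) are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: list guest config files still stored under a node's directory.
# $1 = node name, $2 = base path (defaults to /etc/pve/nodes).
leftover_guests() {
    local node=$1 base=${2:-/etc/pve/nodes}
    find "$base/$node/qemu-server" "$base/$node/lxc" \
        -maxdepth 1 -name '*.conf' 2>/dev/null
}

# Succeeds only if no guest configs remain for the node.
safe_to_clean() {
    [ -z "$(leftover_guests "$@")" ]
}
```

A possible guard: `safe_to_clean <nodename> && rm -rf /etc/pve/nodes/<nodename>`. If `leftover_guests <nodename>` prints anything, move or back up those files first.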

  5. For stubborn 2-node clusters (quorum challenges)

  • Scenario: two-node cluster where one node is down or unreachable. With two expected votes, quorum requires both, so the surviving node alone cannot become quorate.

  • Approaches:

    • If the surviving node must operate standalone temporarily, you can adjust the expected votes (only if you understand the risk):

      • pvecm expected 1

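    • If pvecm expected 1 fails with CS_ERR_INVALID_PARAM, the value was not applied and the expected votes stay at their previous value (usually 2). This typically indicates that corosync still counts more votes than the value requested, for example because the other node is still a member. Check pvecm status and restore connectivity first rather than retrying.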

  • Important: changes to quorum in a two-node setup reduce redundancy and are not recommended for long-term production use.
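Since the expected-votes change is meant to be temporary, it helps to note the previous value before lowering it. A minimal sketch, assuming the `pvecm status` output contains an `Expected votes:` line; `expected_votes` is a hypothetical helper that reads the status text on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the "Expected votes" value from `pvecm status` output (stdin).
expected_votes() {
    # "+ 0" turns the field into a plain number and drops stray padding.
    awk -F': *' '/^Expected votes:/ { print $2 + 0; exit }'
}

# Possible use on the surviving node (commented out so nothing runs by accident):
#   prev=$(pvecm status | expected_votes)   # remember the old value, usually 2
#   pvecm expected 1                        # temporary standalone operation
#   pvecm expected "$prev"                  # put the old value back if needed
```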

  6. After removal or recovery

  • Verify cluster health again

    • pvecm status

    • pvecm nodes

  • In the Proxmox GUI, refresh the Datacenter view to ensure the cluster reflects the current state.
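The final verification can also be scripted by checking whether a node name still shows up in the membership list. The sketch below assumes the three-column `Nodeid Votes Name` layout of `pvecm nodes`, with an optional `(local)` marker after the name; `node_listed` is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given node name appears in `pvecm nodes` output (stdin).
# Only rows whose first column looks like a node ID (decimal or 0x-hex) are considered.
node_listed() {
    awk -v name="$1" '
        $1 ~ /^(0x)?[0-9a-fA-F]+$/ && $3 == name { found = 1 }
        END { exit !found }
    '
}
```

After a removal, `pvecm nodes | node_listed <nodename>` should fail; after a recovery, it should succeed for every node that is supposed to be back.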

Notes and safeguards

  • Always migrate or shut down VMs/CTs on a node before removing it, to avoid data loss or service disruption.

  • Maintain backups of VMs/containers and cluster configuration before performing removal or quorum changes.

  • If Ceph is used, ensure Ceph health (ceph -s) remains HEALTH_OK or is rebalanced safely after node removal.

  • Time synchronization is critical; ensure NTP is consistently configured across all cluster nodes.
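For the Ceph safeguard above, a removal script can refuse to continue unless the cluster reports HEALTH_OK. This is a sketch under the assumption that the first word of the `ceph health` output is the health state; `ceph_healthy` is a hypothetical helper that reads that output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the first word of `ceph health` output (stdin) is HEALTH_OK.
ceph_healthy() {
    local state _rest
    read -r state _rest || true   # tolerate output without a trailing newline
    [ "$state" = "HEALTH_OK" ]
}
```

A possible use: `ceph health | ceph_healthy || echo "Ceph is not HEALTH_OK; hold off on node removal"`.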

If you can share:

  • The exact outputs of pvecm status and pvecm nodes from your environment

  • Whether the target node is online or offline

  • Your Proxmox version and cluster topology (2-node vs multi-node)

I can tailor a precise, minimal command sequence for your situation and provide a fail-safe rollback plan.

