[[ch-pacemaker]]
== Integrating DRBD with Pacemaker clusters

indexterm:[Pacemaker]Using DRBD in conjunction with the Pacemaker
cluster stack is arguably DRBD's most frequently found use
case. Pacemaker is also one of the applications that make DRBD
extremely powerful in a wide variety of usage scenarios.

[[s-pacemaker-primer]]
=== Pacemaker primer

Pacemaker is a sophisticated, feature-rich, and widely deployed
cluster resource manager for the Linux platform. It comes with a rich
set of documentation. In order to understand this chapter, reading the
following documents is highly recommended:

* http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf[Clusters
  From Scratch], a step-by-step guide to configuring high availability
  clusters;
* http://www.clusterlabs.org/doc/crm_cli.html[CRM CLI (command line
  interface) tool], a manual for the CRM shell, a simple and intuitive
  command line interface bundled with Pacemaker;
* http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-intro-pacemaker.html[Pacemaker
  Configuration Explained], a reference document explaining the
  concept and design behind Pacemaker.


[[s-pacemaker-crm-drbd-backed-service]]
=== Adding a DRBD-backed service to the cluster configuration

This section explains how to enable a DRBD-backed service in a
Pacemaker cluster.

NOTE: If you are employing the DRBD OCF resource agent, it is
recommended that you defer DRBD startup, shutdown, promotion, and
demotion _exclusively_ to the OCF resource agent. That means that you
should disable the DRBD init script:

----------------------------
chkconfig drbd off
----------------------------
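
On distributions that use systemd rather than SysV init scripts, the
equivalent step would be along the following lines. This is a sketch
that assumes your DRBD packages install a unit named +drbd.service+;
verify the unit name on your system before relying on it.

----------------------------
# Assumption: the DRBD unit is called drbd.service on this system
systemctl disable drbd.service
----------------------------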

The +ocf:linbit:drbd+ OCF resource agent provides Master/Slave
capability, allowing Pacemaker to start and monitor the DRBD resource
on multiple nodes, and to promote and demote it as needed. You must,
however, understand that the +drbd+ RA disconnects and detaches all
DRBD resources it manages on Pacemaker shutdown, and also upon
enabling standby mode for a node.


IMPORTANT: The OCF resource agent which ships with DRBD belongs to the
+linbit+ provider, and hence installs as
+/usr/lib/ocf/resource.d/linbit/drbd+. There is a legacy resource
agent that ships as part of the OCF resource agents package, which
uses the +heartbeat+ provider and installs into
+/usr/lib/ocf/resource.d/heartbeat/drbd+. The legacy OCF RA is
deprecated and should no longer be used.

In order to enable a DRBD-backed configuration for a MySQL database in
a Pacemaker CRM cluster with the +drbd+ OCF resource agent, you must
create both the necessary resources, and Pacemaker constraints to
ensure your service only starts on a previously promoted DRBD
resource. You may do so using the +crm+ shell, as outlined in the
following example:

.Pacemaker configuration for DRBD-backed MySQL service
----------------------------
crm configure
crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
                    params drbd_resource="mysql" \
                    op monitor interval="29s" role="Master" \
                    op monitor interval="31s" role="Slave"
crm(live)configure# ms ms_drbd_mysql drbd_mysql \
                    meta master-max="1" master-node-max="1" \
                         clone-max="2" clone-node-max="1" \
                         notify="true"
crm(live)configure# primitive fs_mysql ocf:heartbeat:Filesystem \
                    params device="/dev/drbd/by-res/mysql" \
                      directory="/var/lib/mysql" fstype="ext3"
crm(live)configure# primitive ip_mysql ocf:heartbeat:IPaddr2 \
                    params ip="10.9.42.1" nic="eth0"
crm(live)configure# primitive mysqld lsb:mysqld
crm(live)configure# group mysql fs_mysql ip_mysql mysqld
crm(live)configure# colocation mysql_on_drbd \
                      inf: mysql ms_drbd_mysql:Master
crm(live)configure# order mysql_after_drbd \
                      inf: ms_drbd_mysql:promote mysql:start
crm(live)configure# commit
crm(live)configure# exit
bye
----------------------------

After this, your configuration should be enabled. Pacemaker now
selects a node on which it promotes the DRBD resource, and then starts
the DRBD-backed resource group on that same node.
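
To check the result, you might query the cluster and DRBD once the
commit has gone through. The following is a minimal sketch that assumes
the resource names used in the example above; the exact output layout
depends on your Pacemaker and DRBD versions.

----------------------------
# One-shot overview of cluster nodes and resources
crm_mon -1

# Review the committed cluster configuration
crm configure show

# Show the local/peer role of the DRBD resource named "mysql"
drbdadm role mysql
----------------------------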

[[s-pacemaker-fencing]]
=== Using resource-level fencing in Pacemaker clusters

This section outlines the steps necessary to prevent Pacemaker from
promoting a +drbd+ Master/Slave resource when its DRBD replication link
has been interrupted. This keeps Pacemaker from starting a service
with outdated data and causing an unwanted "time warp" in the
process.

In order to enable any resource-level fencing for DRBD, you must add
the following lines to your resource configuration:

[source,drbd]
----------------------------
resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
}
----------------------------

You will also have to make changes to the +handlers+ section depending
on the cluster infrastructure being used:

* Heartbeat-based Pacemaker clusters can employ the configuration
  outlined in <<s-pacemaker-fencing-dopd>>.
* Both Corosync- and Heartbeat-based clusters can use the
  functionality explained in <<s-pacemaker-fencing-cib>>.

IMPORTANT: It is absolutely vital to configure at least two
independent cluster communications channels for this functionality to
work correctly. Heartbeat-based Pacemaker clusters should define at
least two cluster communication links in their +ha.cf+ configuration
files. Corosync clusters should list at least two redundant rings in
+corosync.conf+.

[[s-pacemaker-fencing-dopd]]
==== Resource-level fencing with +dopd+

indexterm:[dopd]In Heartbeat-based Pacemaker clusters, DRBD can
use a resource-level fencing facility named the _DRBD outdate-peer
daemon_, or +dopd+ for short.


[[s-dopd-heartbeat-config]]
===== Heartbeat configuration for +dopd+

To enable +dopd+, you must add these lines to your indexterm:[ha.cf
(Heartbeat configuration file)]+/etc/ha.d/ha.cf+ file:

[source,drbd]
----------------------------
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
----------------------------

You may have to adjust +dopd+'s path according to your preferred
distribution. On some distributions and architectures, the correct
path is +/usr/lib64/heartbeat/dopd+.

After you have made this change and copied +ha.cf+ to the peer node,
put Pacemaker in maintenance mode and run `/etc/init.d/heartbeat
reload` to have Heartbeat re-read its configuration file. Afterwards,
you should be able to verify that you now have a running +dopd+
process.

NOTE: You can check for this process either by running `ps ax | grep
dopd` or by issuing `killall -0 dopd`.


[[s-dopd-drbd-config]]
===== DRBD Configuration for +dopd+

Once +dopd+ is running, add these items to your DRBD resource
configuration:

[source,drbd]
----------------------------
resource <resource> {
    handlers {
        fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        ...
    }
    disk {
        fencing resource-only;
        ...
    }
    ...
}
----------------------------

As with +dopd+, your distribution may place the +drbd-peer-outdater+
binary in +/usr/lib64/heartbeat+ depending on your system
architecture.

Finally, copy your +drbd.conf+ to the peer node and issue
`drbdadm adjust <resource>` to reconfigure your resource and apply
your changes.

[[s-dopd-test]]
===== Testing +dopd+ functionality

To test whether your +dopd+ setup is working correctly, interrupt the
replication link of a configured and connected resource while
Heartbeat services are running normally. You may do so simply by
physically unplugging the network link, but that is fairly
invasive. Instead, you may insert a temporary +iptables+ rule to drop
incoming DRBD traffic to the TCP port used by your resource.
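
A minimal sketch of such a rule follows. It assumes that the resource
replicates over TCP port 7788 and that the rule belongs in the +INPUT+
chain; substitute the port actually configured for your resource.

----------------------------
# Drop incoming DRBD replication traffic (port 7788 is an assumption)
iptables -I INPUT -p tcp --dport 7788 -j DROP

# Remove the rule again once the test is complete
iptables -D INPUT -p tcp --dport 7788 -j DROP
----------------------------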

After this, you will be able to observe the resource
<<s-connection-states,connection state>> change from
indexterm:[connection state]indexterm:[Connected (connection state)]
+Connected+ to indexterm:[connection state]indexterm:[WFConnection
(connection state)]+WFConnection+. Allow a few seconds to pass, and
you should see the <<s-disk-states,disk state>> become indexterm:[disk
state]indexterm:[Outdated (disk state)]+Outdated/DUnknown+. That is
what +dopd+ is responsible for.

Any attempt to switch the outdated resource to the primary role will
fail after this.
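
The states described above can be queried with +drbdadm+ on the node
whose data has been outdated. The following sketch assumes a resource
named +mysql+; replace it with the name of your own resource.

----------------------------
# Connection state; should report WFConnection while the link is down
drbdadm cstate mysql

# Local/peer disk state; should report Outdated/DUnknown
drbdadm dstate mysql

# Promotion should be refused while the local data is outdated
drbdadm primary mysql
----------------------------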

When restoring network connectivity (either by plugging the
physical link back in or by removing the temporary +iptables+ rule you inserted
previously), the connection state will change to +Connected+, and then
promptly to +SyncTarget+ (assuming changes occurred on the primary node
during the network interruption). Then you will be able to observe a
brief synchronization period, and finally, the previously outdated
resource will be marked as indexterm:[disk state]indexterm:[UpToDate
(disk state)]+UpToDate+ again.


[[s-pacemaker-fencing-cib]]
==== Resource-level fencing using the Cluster Information Base (CIB)

In order to enable resource-level fencing for Pacemaker, you will have
to set two options in +drbd.conf+:

[source,drbd]
----------------------------
resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    ...
  }
  ...
}
----------------------------

Thus, if the DRBD replication link becomes disconnected, the
+crm-fence-peer.sh+ script contacts the cluster manager, determines the
Pacemaker Master/Slave resource associated with this DRBD resource,
and ensures that the Master/Slave resource no longer gets promoted on
any node other than the currently active one. Conversely, when the
connection is re-established and DRBD completes its synchronization
process, then that constraint is removed and the cluster manager is
free to promote the resource on any node again.
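
While the replication link is down, you can look for the constraint
that the handler placed in the CIB. The sketch below assumes that the
constraint ID begins with +drbd-fence-by-handler+; this prefix is an
assumption about the +crm-fence-peer.sh+ version in use, so check the
script shipped with your DRBD packages if nothing matches.

----------------------------
# List any location constraint added by the fence-peer handler
# (the ID prefix is an assumption; see the handler script)
crm configure show | grep drbd-fence-by-handler
----------------------------

After resynchronization completes and +crm-unfence-peer.sh+ has run,
the same command should no longer produce any output.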

[[s-pacemaker-stacked-resources]]
=== Using stacked DRBD resources in Pacemaker clusters

Stacked resources allow DRBD to be used for multi-level redundancy in
multiple-node clusters, or to establish off-site disaster recovery
capability. This section describes how to configure DRBD and Pacemaker
in such configurations.

[[s-pacemaker-stacked-dr]]
==== Adding off-site disaster recovery to Pacemaker clusters

This configuration scenario involves a two-node high availability
cluster in one site, plus a separate node that is typically housed
off-site. The third node acts as a disaster recovery node and is a
standalone server. The following illustration describes the concept.

.DRBD resource stacking in Pacemaker clusters
image::drbd-resource-stacking-pacemaker-3nodes[scaledwidth="100%"]

In this example, +alice+ and +bob+ form a two-node Pacemaker cluster,
whereas +charlie+ is an off-site node not managed by Pacemaker.

To create such a configuration, you would first configure and
initialize DRBD resources as described in <<s-three-nodes>>. Then,
configure Pacemaker with the following CRM configuration:

[source,drbd]
----------------------------
primitive p_drbd_r0 ocf:linbit:drbd \
	params drbd_resource="r0"

primitive p_drbd_r0-U ocf:linbit:drbd \
	params drbd_resource="r0-U"

primitive p_ip_stacked ocf:heartbeat:IPaddr2 \
	params ip="192.168.42.1" nic="eth0"

ms ms_drbd_r0 p_drbd_r0 \
	meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true" globally-unique="false"

ms ms_drbd_r0-U p_drbd_r0-U \
	meta master-max="1" clone-max="1" \
        clone-node-max="1" master-node-max="1" \
        notify="true" globally-unique="false"

colocation c_drbd_r0-U_on_drbd_r0 \
        inf: ms_drbd_r0-U ms_drbd_r0:Master

colocation c_drbd_r0-U_on_ip \
        inf: ms_drbd_r0-U p_ip_stacked

colocation c_ip_on_r0_master \
        inf: p_ip_stacked ms_drbd_r0:Master

order o_ip_before_r0-U \
        inf: p_ip_stacked ms_drbd_r0-U:start

order o_drbd_r0_before_r0-U \
        inf: ms_drbd_r0:promote ms_drbd_r0-U:start
----------------------------

Assuming you created this configuration in a temporary file named
+/tmp/crm.txt+, you may import it into the live cluster configuration
with the following command:

----------------------------
crm configure < /tmp/crm.txt
----------------------------

This configuration will ensure that the following actions occur in the
correct order on the +alice+/ +bob+ cluster:

. Pacemaker starts the DRBD resource +r0+ on both cluster nodes, and
  promotes one node to the Master (DRBD Primary) role.

. Pacemaker then starts the IP address 192.168.42.1, which the stacked
  resource is to use for replication to the third node. It does so on
  the node it has previously promoted to the Master role for the +r0+
  DRBD resource.

. On the node which now has the Primary role for +r0+ and also the
  replication IP address for +r0-U+, Pacemaker now starts the
  +r0-U+ DRBD resource, which connects and replicates to the off-site
  node.

. Pacemaker then promotes the +r0-U+ resource to the Primary role too,
  so it can be used by an application.

Thus, this Pacemaker configuration ensures that there is not only full
data redundancy between cluster nodes, but also to the third, off-site
node.

NOTE: This type of setup is usually deployed together with
<<s-drbd-proxy,DRBD Proxy>>.

[[s-pacemaker-stacked-4way]]
==== Using stacked resources to achieve 4-way redundancy in Pacemaker clusters

In this configuration, a total of three DRBD resources (two unstacked,
one stacked) are used to achieve 4-way storage redundancy. This means
that in a 4-node cluster, up to three nodes can fail while the service
remains available.

Consider the following illustration to explain the concept.

.DRBD resource stacking in Pacemaker clusters
image::drbd-resource-stacking-pacemaker-4nodes[scaledwidth="100%"]

In this example, +alice+, +bob+, +charlie+, and +daisy+ form two
two-node Pacemaker clusters. +alice+ and +bob+ form the cluster named
+left+ and replicate data using a DRBD resource between them, while
+charlie+ and +daisy+ do the same with a separate DRBD resource, in a
cluster named +right+. A third, stacked DRBD resource connects the two
clusters.

NOTE: Due to limitations in the Pacemaker cluster manager as of
Pacemaker version 1.0.5, it is not possible to create this setup in a
single four-node cluster without disabling CIB validation, which is an
advanced process not recommended for general-purpose use. It is
anticipated that this will be addressed in future Pacemaker releases.

To create such a configuration, you would first configure and
initialize DRBD resources as described in <<s-three-nodes>> (except
that the remote half of the DRBD configuration is also stacked, not
just the local cluster). Then, configure Pacemaker with the following
CRM configuration, starting with the cluster +left+:

[source,drbd]
----------------------------
primitive p_drbd_left ocf:linbit:drbd \
	params drbd_resource="left"

primitive p_drbd_stacked ocf:linbit:drbd \
	params drbd_resource="stacked"

primitive p_ip_stacked_left ocf:heartbeat:IPaddr2 \
	params ip="10.9.9.100" nic="eth0"

ms ms_drbd_left p_drbd_left \
	meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true"

ms ms_drbd_stacked p_drbd_stacked \
	meta master-max="1" clone-max="1" \
        clone-node-max="1" master-node-max="1" \
        notify="true" target-role="Master"

colocation c_ip_on_left_master \
        inf: p_ip_stacked_left ms_drbd_left:Master

colocation c_drbd_stacked_on_ip_left \
        inf: ms_drbd_stacked p_ip_stacked_left

order o_ip_before_stacked_left \
        inf: p_ip_stacked_left ms_drbd_stacked:start

order o_drbd_left_before_stacked_left \
        inf: ms_drbd_left:promote ms_drbd_stacked:start

----------------------------

Assuming you created this configuration in a temporary file named
+/tmp/crm.txt+, you may import it into the live cluster configuration
with the following command:

----------------------------
crm configure < /tmp/crm.txt
----------------------------

After adding this configuration to the CIB, Pacemaker will execute the
following actions:

. Bring up the DRBD resource +left+ replicating between +alice+ and
  +bob+, promoting the resource to the Master role on one of these nodes.

. Bring up the IP address 10.9.9.100 (on either +alice+ or +bob+,
  depending on which of these holds the Master role for the resource
  +left+).

. Bring up the DRBD resource +stacked+ on the same node that holds the
  just-configured IP address.

. Promote the stacked DRBD resource to the Primary role.

Now, proceed on the cluster +right+ by creating the following
configuration:

[source,drbd]
----------------------------
primitive p_drbd_right ocf:linbit:drbd \
	params drbd_resource="right"

primitive p_drbd_stacked ocf:linbit:drbd \
	params drbd_resource="stacked"

primitive p_ip_stacked_right ocf:heartbeat:IPaddr2 \
	params ip="10.9.10.101" nic="eth0"

ms ms_drbd_right p_drbd_right \
	meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true"

ms ms_drbd_stacked p_drbd_stacked \
	meta master-max="1" clone-max="1" \
        clone-node-max="1" master-node-max="1" \
        notify="true" target-role="Slave"

colocation c_drbd_stacked_on_ip_right \
        inf: ms_drbd_stacked p_ip_stacked_right

colocation c_ip_on_right_master \
        inf: p_ip_stacked_right ms_drbd_right:Master

order o_ip_before_stacked_right \
        inf: p_ip_stacked_right ms_drbd_stacked:start

order o_drbd_right_before_stacked_right \
        inf: ms_drbd_right:promote ms_drbd_stacked:start
----------------------------

After adding this configuration to the CIB, Pacemaker will execute the
following actions:

. Bring up the DRBD resource +right+ replicating between +charlie+ and
  +daisy+, promoting the resource to the Master role on one of these
  nodes.

. Bring up the IP address 10.9.10.101 (on either +charlie+ or +daisy+,
  depending on which of these holds the Master role for the resource
  +right+).

. Bring up the DRBD resource +stacked+ on the same node that holds the
  just-configured IP address.

. Leave the stacked DRBD resource in the Secondary role (due to
  +target-role="Slave"+).

[[s-pacemaker-floating-peers]]
=== Configuring DRBD to replicate between two SAN-backed Pacemaker clusters

This is a somewhat advanced setup usually employed in split-site
configurations. It involves two separate Pacemaker clusters, where
each cluster has access to a separate Storage Area Network (SAN). DRBD
is then used to replicate data stored on that SAN, across an IP link
between sites.

Consider the following illustration to describe the concept.

.Using DRBD to replicate between SAN-based clusters
image::drbd-pacemaker-floating-peers[scaledwidth="100%"]

Which of the individual nodes in each site currently acts as the DRBD
peer is not explicitly defined -- the DRBD peers
<<s-floating-peers,are said to _float_>>; that is, DRBD binds to
virtual IP addresses not tied to a specific physical machine.


NOTE: This type of setup is usually deployed together with
<<s-drbd-proxy,DRBD Proxy>> and/or <<s-truck-based-replication,truck
based replication>>.

Since this type of setup deals with shared storage, configuring and
testing STONITH is absolutely vital for it to work properly.

[[s-pacemaker-floating-peers-drbd-config]]
==== DRBD resource configuration

To enable your DRBD resource to float, configure it in +drbd.conf+ in
the following fashion:

[source,drbd]
----------------------------
resource <resource> {
  ...
  device /dev/drbd0;
  disk /dev/sda1;
  meta-disk internal;
  floating 10.9.9.100:7788;
  floating 10.9.10.101:7788;
}
----------------------------

The +floating+ keyword replaces the +on <host>+ sections normally
found in the resource configuration. In this mode, DRBD identifies
peers by IP address and TCP port, rather than by host name. It is
important to note that the addresses specified must be virtual cluster
IP addresses, rather than physical node IP addresses, for floating to
function properly. As shown in the example, in split-site
configurations the two floating addresses can be expected to belong to
two separate IP networks -- it is thus vital for routers and firewalls
to properly allow DRBD replication traffic between the nodes.

[[s-pacemaker-floating-peers-crm-config]]
==== Pacemaker resource configuration

A DRBD floating peers setup, in terms of Pacemaker configuration,
involves the following items (in each of the two Pacemaker clusters
involved):

* A virtual cluster IP address.

* A master/slave DRBD resource (using the DRBD OCF resource agent).

* Pacemaker constraints ensuring that resources are started on the
  correct nodes, and in the correct order.

To configure a resource named +mysql+ in a floating peers
configuration in a 2-node cluster, using the replication address
+10.9.9.100+, configure Pacemaker with the following +crm+ commands:

----------------------------
crm configure
crm(live)configure# primitive p_ip_float_left ocf:heartbeat:IPaddr2 \
                    params ip=10.9.9.100
crm(live)configure# primitive p_drbd_mysql ocf:linbit:drbd \
                    params drbd_resource=mysql
crm(live)configure# ms ms_drbd_mysql p_drbd_mysql \
                    meta master-max="1" master-node-max="1" \
                         clone-max="1" clone-node-max="1" \
                         notify="true" target-role="Master"
crm(live)configure# order drbd_after_left \
                      inf: p_ip_float_left ms_drbd_mysql
crm(live)configure# colocation drbd_on_left \
                      inf: ms_drbd_mysql p_ip_float_left
crm(live)configure# commit
crm(live)configure# exit
bye
----------------------------

After adding this configuration to the CIB, Pacemaker will execute the
following actions:

. Bring up the IP address 10.9.9.100 (on either +alice+ or +bob+).
. Bring up the DRBD resource according to the IP address configured.
. Promote the DRBD resource to the Primary role.

Then, in order to create the matching configuration in the other
cluster, configure _that_ Pacemaker instance with the following
commands:

----------------------------
crm configure
crm(live)configure# primitive p_ip_float_right ocf:heartbeat:IPaddr2 \
                    params ip=10.9.10.101
crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
                    params drbd_resource=mysql
crm(live)configure# ms ms_drbd_mysql drbd_mysql \
                    meta master-max="1" master-node-max="1" \
                         clone-max="1" clone-node-max="1" \
                         notify="true" target-role="Slave"
crm(live)configure# order drbd_after_right \
                      inf: p_ip_float_right ms_drbd_mysql
crm(live)configure# colocation drbd_on_right \
                      inf: ms_drbd_mysql p_ip_float_right
crm(live)configure# commit
crm(live)configure# exit
bye
----------------------------

After adding this configuration to the CIB, Pacemaker will execute the
following actions:

. Bring up the IP address 10.9.10.101 (on either +charlie+ or
  +daisy+).
. Bring up the DRBD resource according to the IP address configured.
. Leave the DRBD resource in the Secondary role (due to
  +target-role="Slave"+).

[[s-pacemaker-floating-peers-site-fail-over]]
==== Site fail-over

In split-site configurations, it may be necessary to transfer services
from one site to another. This may be a consequence of a scheduled
transition, or of a disastrous event. In case the transition is a
normal, anticipated event, the recommended course of action is this:

* Connect to the cluster on the site about to relinquish resources,
  and change the affected DRBD resource's +target-role+ attribute from
  +Master+ to +Slave+. This shuts down any resources depending on
  the Primary role of the DRBD resource and demotes it; DRBD itself
  continues to run, ready to receive updates from a new Primary.

* Connect to the cluster on the site about to take over resources, and
  change the affected DRBD resource's +target-role+ attribute from
  +Slave+ to +Master+. This will promote the DRBD resources, start any
  other Pacemaker resources depending on the Primary role of the DRBD
  resource, and replicate updates to the remote site.

* To fail back, simply reverse the procedure.
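
With the +crm+ shell, the attribute changes described above could look
like the following sketch. It assumes that the master/slave resource is
named +ms_drbd_mysql+ in both clusters, as in the earlier floating
peers example; adjust the name to match your configuration.

----------------------------
# On the site relinquishing resources
crm resource meta ms_drbd_mysql set target-role Slave

# On the site taking over resources
crm resource meta ms_drbd_mysql set target-role Master
----------------------------

Run the first command and confirm that the resource has been demoted
before running the second one on the other site.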

In the event of a catastrophic outage on the active site, it can
be expected that the site is offline and no longer replicating to the
backup site. In such an event:

* Connect to the cluster on the still-functioning site, and
  change the affected DRBD resource's +target-role+ attribute from
  +Slave+ to +Master+. This will promote the DRBD resources, and start
  any other Pacemaker resources depending on the Primary role of the
  DRBD resource.

* When the original site is restored or rebuilt, you may connect the
  DRBD resources again, and subsequently fail back using the reverse
  procedure.
