
XEN ZFS storage driver domain

Contents

  • Intro
  • Preparations
    • Switch from pygrub to pvgrub
      • In domU (Debian Jessie only)
      • In dom0 (Debian Jessie only)
    • Setup zfshost
      • Installation
      • PCI export
      • Setup xenstore
      • Install ZFS on Linux
      • Creating the tank pool
      • Switch to sysvinit
  • Patch xendomains init scripts
  • Moving existing domU data
    • Install netcat on dom0 and zfshost
    • zvol topdir for XEN domUs
    • Creating zvol for domU swap
    • LVM lv to zvol
  • Appendix
    • Attaching volumes to domains (and dom0)
    • Detaching volumes from domains (and dom0)

Intro¶

The storage driver domain in this howto is called zfshost.

I have two small 40GB Intel 320 SSDs in MD RAID1, used as an LVM VG (Volume Group) called vg_raid1. This VG contains four LVs (root and swap for both dom0 and zfshost).

The LVM LVs (Logical Volumes) zfshost-disk and zfshost-swap are used as the boot and swap disks for the Debian-based ZFS storage driver domain.
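
For reference, the LVs for zfshost could be created from dom0 roughly like this (a sketch only; the sizes are my own example values):

root@dom0:~ # lvcreate -L 10G -n zfshost-disk vg_raid1
root@dom0:~ # lvcreate -L 1G -n zfshost-swap vg_raid1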

The actual ZFS storage disks are handled entirely by the storage driver domain; they sit on a SATA controller that is exported to the zfshost domU with PCI export.

The SATA disks on the exported SATA controller are in a pool called tank.

Why not FreeBSD?

  • NFSv4 and Kerberos do not work well enough
  • No pure PV mode (only HVM), which causes issues with PCI passthrough
  • Resizing a zvol requires a domU restart

Preparations¶

Switch from pygrub to pvgrub¶

Pygrub cannot be used with disks served directly from a storage driver domain, because pygrub runs in dom0 itself. Instead, all domUs that use pygrub must be switched to pvgrub.

In domU (Debian Jessie only)¶

Change from grub-legacy to pvgrub (based on grub2)

# apt-get install grub-xen
# mv /boot/grub/menu.lst /root/
# update-grub

In dom0 (Debian Jessie only)¶

Make sure the package grub-xen-host is installed first, then apply the following diff to the domU's config file

--- a/xen/<domU-name>.cfg
+++ b/xen/<domU-name>.cfg
@@ -8,7 +8,7 @@
 #


-bootloader = '/usr/lib/xen-4.4/bin/pygrub'
+kernel = '/usr/lib/grub-xen/grub-x86_64-xen.bin'

 vcpus       = '1'
 memory      = '1024'
@@ -17,7 +17,6 @@ memory      = '1024'
 #
 #  Disk device(s).
 #
-root        = '/dev/xvda2 ro'
 disk        = [
                   'phy:/dev/vg_raid1/<domU-name>-disk,xvda2,w',
                   'phy:/dev/vg_raid1/<domU-name>-swap,xvda1,w',

For 32-bit domUs use /usr/lib/grub-xen/grub-i386-xen.bin.

Shut down the domU and restart it

# xl shutdown <domU-name>
  - (wait until it is down)
# xl create /etc/xen/<domU-name>.cfg -c

Setup zfshost¶

Installation¶

Install Debian Jessie as a Xen PV guest on LVM LVs from dom0 (e.g. /dev/vg_raid1/zfshost-disk and /dev/vg_raid1/zfshost-swap).
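
The disk section of /etc/xen/zfshost.cfg then ends up looking roughly like this sketch (device names follow the layout above; the exact lines depend on how the guest was installed):

disk        = [
                  'phy:/dev/vg_raid1/zfshost-disk,xvda2,w',
                  'phy:/dev/vg_raid1/zfshost-swap,xvda1,w',
              ]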

The disks that will be managed by ZFS are connected to a SATA controller exported to the domU with PCI export.

PCI export¶

Find the PCI ID of the SATA/SAS controller to export; in my case (on an HP MicroServer Gen8)

# lspci | fgrep AHCI
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)

Add the following to the end of /etc/xen/zfshost.cfg to export the pci device

pci         = [ '00:1f.2' ]

Hand the device over to the dom0 xen-pciback module

# echo xen-pciback >> /etc/modules
# modprobe xen-pciback
# xl pci-assignable-add 00:1f.2

For automatic handling of xl pci-assignable-add at reboot, see setup-pci-passthrough.
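
If you do not want to follow that guide, one common alternative (my own sketch, not taken from the linked page) is to let xen-pciback grab the device as soon as the module loads, via a modprobe option in dom0:

# /etc/modprobe.d/xen-pciback.conf in dom0
options xen-pciback hide=(0000:00:1f.2)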

Setup xenstore¶

root@zfshost:~ # RUNLEVEL=1 apt-get install --no-install-recommends xen-utils-4.4

The Xen utilities are normally used in a dom0; when they are used in a storage driver domain, the services that are only needed in a dom0 should be disabled

root@zfshost:~ # systemctl disable xen.service
root@zfshost:~ # systemctl disable xendomains.service

Mount /proc/xen

root@zfshost:~ # mount -t xenfs xenfs /proc/xen

Also add the /proc/xen mount to /etc/rc.local, plus a xenstore-write call announcing that the storage domain is online (dom0 will wait for this key later).

mount -t xenfs xenfs /proc/xen
xenstore-write /local/domain/`xenstore-read domid`/data/storage-online 1

exit 0
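
Once zfshost is running you can check the flag from dom0 (a quick sanity check; the expected output is the value 1 written above):

root@dom0:~ # xenstore-read /local/domain/$(xl domid zfshost)/data/storage-online
1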

Install ZFS on Linux¶

Follow the ZoL Debian guide.
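
Afterwards, a quick sanity check that the ZFS kernel module loads (a sketch; no pool exists yet at this point):

root@zfshost:~ # modprobe zfs
root@zfshost:~ # zpool status
no pools available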

Creating the tank pool¶

Set up the disks with a GPT label (without adding any partitions); you can use gdisk for this.

Create the pool with ashift=12 for Advanced Format disks (4k sector size); this will automatically partition the disks as well:

root@zfshost:~ # zpool create -o ashift=12 tank mirror sda sdb

As an alternative to the sd[a-z] names you can use "disk by-id" names (see /dev/disk/by-id/).
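
For example (the device names below are placeholders; use whatever /dev/disk/by-id/ shows for your disks):

root@zfshost:~ # zpool create -o ashift=12 tank mirror \
      /dev/disk/by-id/ata-WDC_WD20EFRX-<id1> \
      /dev/disk/by-id/ata-WDC_WD20EFRX-<id2>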

After this the pool should be up and running (Note that I use "disk by-id" names)

root@zfshost:~ # zpool status
  pool: tank
 state: ONLINE
  scan: resilvered 240K in 0h0m with 0 errors on Sat Jul 11 23:58:12 2015
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EFRX-.....                    ONLINE       0     0     0
            ata-WDC_WD20EFRX-.....                    ONLINE       0     0     0

errors: No known data errors

Set the pool to autoexpand in case you add larger disks to the mirror later

root@zfshost:~ # zpool set autoexpand=on tank

Switch to sysvinit¶

At reboot the pool will fail to import with this error

zpool[231]: cannot import 'tank': no such pool or dataset

The reason is that the ZFS services and tasks start in the wrong order.

I could not find a reliable solution for this on a system with systemd (ZoL version 0.6.4-1.2-1), so I fell back to sysvinit instead:

root@zfshost:~ # apt-get install --purge -y sysvinit-core

Fix getty startup on hvc0 by patching /etc/inittab

--- a/inittab
+++ b/inittab
-1:2345:respawn:/sbin/getty 38400 tty1
-2:23:respawn:/sbin/getty 38400 tty2
-3:23:respawn:/sbin/getty 38400 tty3
-4:23:respawn:/sbin/getty 38400 tty4
-5:23:respawn:/sbin/getty 38400 tty5
-6:23:respawn:/sbin/getty 38400 tty6
+1:2345:respawn:/sbin/getty 38400 hvc0
+#2:23:respawn:/sbin/getty 38400 tty2
+#3:23:respawn:/sbin/getty 38400 tty3
+#4:23:respawn:/sbin/getty 38400 tty4
+#5:23:respawn:/sbin/getty 38400 tty5
+#6:23:respawn:/sbin/getty 38400 tty6
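
After switching init systems and editing /etc/inittab, reboot the storage domain and verify that the pool now imports cleanly and that a getty answers on the Xen console:

root@zfshost:~ # reboot
  - (wait, then from dom0)
root@dom0:~ # xl console zfshost
  - (a login prompt here confirms getty on hvc0; log in and run)
root@zfshost:~ # zpool status tank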

Patch xendomains init scripts¶

The following patch adds storage driver domain support to the xendomains init script and its defaults file

--- a/default/xendomains
+++ b/default/xendomains
@@ -58,3 +58,7 @@ XENDOMAINS_AUTO=/etc/xen/auto
 #
 XENDOMAINS_STOP_MAXWAIT=300

+# If using a storage domain its name should be supplied. The storage
+# domain will be started first and no other domains will start before it
+# is fully online.
+XENDOMAINS_STORAGE_DOM_NAME="zfshost"
diff --git a/init.d/xendomains b/init.d/xendomains
index 5fd5a5d..1ac35db 100755
--- a/init.d/xendomains
+++ b/init.d/xendomains
@@ -150,10 +150,38 @@ do_start_auto()
   done
 }

+start_storage()
+{
+       log_action_begin_msg "Starting Storage domain $XENDOMAINS_STORAGE_DOM_NAME"
+
+        out=$(xen create --quiet --defconfig "/etc/xen/${XENDOMAINS_STORAGE_DOM_NAME}.cfg" 2>&1 1>/dev/null)
+        case "$?" in
+          0)
+            log_action_end_msg 0
+            ;;
+          *)
+            log_action_end_msg 1
+            echo "$out"
+            ;;
+        esac
+
+       sleep 5
+       stor_dom=$(xen domid $XENDOMAINS_STORAGE_DOM_NAME)
+
+       log_action_begin_msg "Waiting for storage to come online (forever)."
+       until xenstore-exists /local/domain/${stor_dom}/data/storage-online
+       do
+               sleep 2
+       done
+       log_action_end_msg  0
+}
+
 do_start() 
 {
   declare -A domains

+  [ -n "$XENDOMAINS_STORAGE_DOM_NAME" ] && start_storage
+
   do_start_restore
   do_start_auto
 }
@@ -183,7 +211,7 @@ do_stop_shutdown()
 {
   while read id name rest; do
     log_action_begin_msg "Shutting down Xen domain $name ($id)"
-    xen shutdown $id 2>&1 1>/dev/null
+    xen shutdown --wait $id 2>&1 1>/dev/null
     log_action_end_msg $?
   done < <(/usr/lib/xen-common/bin/xen-init-list)
   while read id name rest; do
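
Assuming the patch above is saved on dom0 as e.g. /root/xendomains-storage.patch (the file name is just an example), it can be applied from /etc:

root@dom0:~ # cd /etc && patch -p1 < /root/xendomains-storage.patch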

Moving existing domU data¶

Install netcat on dom0 and zfshost¶

# apt-get install netcat-openbsd

zvol topdir for XEN domUs¶

Create a topdir for xen domU storage with lz4 compression

zfs create -o compression=lz4 tank/xen

Creating zvol for domU swap¶

The block size should match the VM's system page size (for 64-bit Linux it is 4k).

Example

root@zfshost:~ # zfs create -b 4k \
      -V <size>G \
      -o com.sun:auto-snapshot=false \
      tank/xen/<domU-name>-swap

LVM lv to zvol¶

WARNING: Transferring data the way it is done in this chapter is very fast, but it puts high stress on ZFS. When I tested this on a storage domU with only 5GB of RAM it resulted in a kernel panic because the system ran out of memory. Raising /proc/sys/vm/min_free_kbytes to 128MB solved the problem for me.
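
To apply the setting immediately on the running storage domain (same value as the sysctl.conf entry below):

root@zfshost:~ # sysctl -w vm.min_free_kbytes=128000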

To make the setting persistent across reboots, add the following to /etc/sysctl.conf

# Make sure ZFS does not take all memory when stressed
vm.min_free_kbytes = 128000

Create a zvol for the domU disk (the non-swap volume), matching the size of the existing LV

root@zfshost:~ # zfs create -V <existing-lv-size>G tank/xen/<domU-name>-disk

Start netcat on zfshost

root@zfshost:~ # nc -l 2222 > /dev/zvol/tank/xen/<domU-name>-disk

Stop domU

root@dom0:~ # xl shutdown <domU-name>

Send data from dom0

root@dom0:~ # nc zfshost 2222 < /dev/vg_raid1/<domU-name>-disk
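
Optionally, verify the copy before switching the domU over, e.g. by comparing checksums on both ends (this only matches if the zvol and the LV are exactly the same size):

root@dom0:~ # md5sum /dev/vg_raid1/<domU-name>-disk
root@zfshost:~ # md5sum /dev/zvol/tank/xen/<domU-name>-disk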

Patch the domU's .cfg file

--- a/xen/<domU-name>.cfg
+++ b/xen/<domU-name>.cfg
@@ -18,8 +18,8 @@ memory      = '512'
 #  Disk device(s).
 #
 disk        = [
-                  'phy:/dev/vg_raid1/<domU-name>-disk,xvda2,w',
-                  'phy:/dev/vg_raid1/<domU-name>-swap,xvda1,w',
+                  'phy:/dev/zvol/tank/xen/<domU-name>-disk,xvda2,w,backend=zfshost',
+                  'phy:/dev/zvol/tank/xen/<domU-name>-swap,xvda1,w,backend=zfshost',
               ]

Start the domU and attach to the console. In pvgrub, add fsck.mode=force as a kernel parameter.

root@dom0:~ # xl create /etc/xen/<domU-name>.cfg -c

In the domU

root@domU:~ # mkswap /dev/xvda1
root@domU:~ # swapon -a

Appendix¶

Attaching volumes to domains (and dom0)¶

Example: attaching a zvol to dom0 as /dev/xvdc1

root@dom0:~ # xl block-attach Domain-0 'format=raw,backendtype=phy,backend=zfshost,vdev=xvdc1,target=/dev/zvol/tank/xen/dom0'

Detaching volumes from domains (and dom0)¶

xl block-list does not work with disks served from a storage driver domain; instead you need to look up the <DevId> in xenstore with xenstore-ls

After finding the right <DevId>, volumes can be detached as usual with xl block-detach <Domain> <DevId>

Example for dom0

root@dom0:~ # xenstore-ls | fgrep -C2 /dev/zvol/tank/xen/dom0
      51745 = ""
       frontend = "/local/domain/0/device/vbd/51745"
       params = "/dev/zvol/tank/xen/dom0"
       script = "/etc/xen/scripts/block"
       frontend-id = "0"

root@dom0:~ # xl block-detach Domain-0 51745

