
Alpine Linux as a Xen Storage Driver Domain

Contents

  • Alpine Linux as a Xen Storage Driver Domain
  • Various references
    • Dom0 work
      • domU's boot disk
      • dom0 BIOS
      • PCI Passthru
      • domU configuration file
      • Handover PCI Device to dom0 xen-pciback module
      • Verify PCI Passthru
      • Start domU
    • domU work
      • Disks visible?
      • Partition the disks
      • RAID and LVM configuration
      • Mountpoints etc
      • Running the setup-alpine
      • Modules configuration
      • MDADM config
      • Store filesystem
      • Update GRUB
      • Fix initfs
      • Time to halt
    • Back to dom0
      • Fix dom0's domU config file
      • Start domU
      • Add udev
      • Add normal user
      • Add sudo
      • Set minimum free memory
      • Reboot to confirm udev
      • Time to set started flag
      • Fix autostart of domU
        • Edit the /etc/conf.d/xendomains file
        • Fix the /etc/init.d/xendomains file to handle the storage domain correctly
        • Store the LBU state and reboot to verify
    • ZFS
      • Add packages
      • Option A: Create a pool called tank and add a mirrored SLOG
      • Option B: Create the tank pool without the SLOG
      • ZFS Cron job
      • ZFS boot services
      • ZFS Basic Config
      • ZFS Xen Related
      • Update Your system
      • Add swap to the zfshost
      • ZFS volumes for a new domU
        • On the zfshost
        • On the dom0
      • Useful commands

Alpine Linux as a Xen Storage Driver Domain¶

Guide to configuring a Xen Storage Driver Domain based on Alpine 3.11. The dom0 is set up as described in Alpine Dom0 V3.8 and later upgraded in accordance with Alpine dom0 upgrade.

The aim of this domU is to serve ZFS, hence its name: zfshost.

To do this I have the following extra hardware:

  • SSD Root/SLOG Samsung SM863 120GB 2.5 inch 7mm SSD, 2 pieces so I can mirror them.
  • HDD Tank WD Red 6TB NAS HDD 3.5" 6Gb/s Intellipower WD60EFRX, 2 pieces so I can mirror them

Each SSD disk will be partitioned as below:

  • 30GB for RAID and LVM. Will contain the root for zfshost (and some unused space in the LVM VG).
  • 10GB partition without a filesystem. Will contain the SLOG.
  • The rest will be left unpartitioned, for increased endurance and future use.

The HDD disks will not be partitioned at all. ZFS will handle them straight up.

The end result will have /boot mounted from dom0 (virtual disk), while the rest is mounted from the local domU disks.

Mount Point     "From where"    Disk
/boot           dom0 USB-Stick  LVM LV zfshost-boot in vg_domU (in dom0)
/               domU SSD        LVM LV lv_root in vg_zfshost
SLOG            domU SSD        Two raw partitions
zfs pool disks  domU HDD        Whole disks

Various references¶

Here are various references I have been looking at:

  1. HP Microserver Gen8
  2. Alpine dom0
  3. Alpine domU
  4. XEN Storage Domain Driver
  5. ZFS On Linux
  6. ZFS Administration Intent Log
  7. PCI Passthrough
  8. Installing ZFS and setting Pool
  9. ZFS Compression
  10. Creating a ZFS File System Hierarchy

Dom0 work¶

In Alpine Dom0 V3.8 we created a very basic Xen dom0 server, which was only prepared for its domU guests. Now we need to add the specific parts related to this ZFS domU, namely:

  • Virtual boot disk
  • domU configuration file
  • PCI pass-through so we can access the required domU disks (SSD and HDD)

domU's boot disk¶

We need to create and prepare the boot disk for the domU

dom0 # lvcreate -n zfshost-boot -L 512M vg_domU
dom0 # apk add e2fsprogs
dom0 # mkfs.ext4 /dev/vg_domU/zfshost-boot

dom0 BIOS¶

We will use ZFS and mdadm and let these systems handle the RAID part, hence we need to deactivate hardware-based RAID in the BIOS, as well as enable AHCI mode.

  • Disable hardware based RAID in BIOS
  • Enable AHCI mode in BIOS

PCI Passthru¶

Now we need to find out which PCI device our SSD disks are connected to. The default lspci applet (part of busybox) does not provide enough information, so we need to install a more feature-rich implementation. In my case, I will look for the hotplug SATA devices. If you have a PCI controller board instead, check which chipset it uses and search for that in the lspci output:

dom0 # apk add pciutils
dom0 # lspci -k
...
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
        Subsystem: Hewlett-Packard Company Device 330d
        Kernel driver in use: pciback
...
dom0 #

Verify that xen_pciback is listed in /etc/modules; if it is not, add it.

dom0 # grep xen_pciback /etc/modules
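
If the grep prints nothing, one simple way to add it and make it survive a reboot (this dom0 uses lbu, as elsewhere in this guide) is:

dom0 # echo "xen_pciback" >> /etc/modules
dom0 # lbu commit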

Make sure that the module xen_pciback is loaded (running modprobe again is harmless if it already is):

dom0 # modprobe xen_pciback

domU configuration file¶

Ok, time to create the domU configuration file.

  • Observe that the MAC address has to be unique among the dom0 and all domUs. A tool such as random_mac.py can help you with this.
  • Make sure that you specify the correct PCI device to pass through to domU.
  • The cdrom points to the installer image which was prepared in the dom0 installation

If you do it manually, please start the MAC address with 00:16:3E followed by a combination that is unique for you, for instance 00:16:3e:AA:AA:01 or 00:16:3e:BE:EF:01, or something similarly unique within your network.
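
If you prefer a one-liner over a separate tool, something like the following sketch (using the busybox od, tr and sed applets that ship with Alpine) prints a random address with the Xen 00:16:3e prefix:

dom0 # echo "00:16:3e:$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../&:/g;s/:$//')"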

dom0 # cat > /etc/xen/zfshost.cfg
#####
##### zfshost domU
#####
vcpus       = '1'
memory      = '8096'
maxmem      = '8096'
kernel      = "/domU_installer/vmlinuz-lts"
ramdisk     = "/domU_installer/initramfs-lts"
extra       = "alpine_dev=hdc:iso9660 modules=loop,squashfs,sd-mod,usb-storage console=hvc0"
disk        = [
                  'file://domU_installer/alpine-extended-3.11.2-x86_64.iso,hdc:cdrom,r',
                  'phy:/dev/vg_domU/zfshost-boot,xvda1,w',
              ]
name        = 'zfshost'
## ENSURE MAC ADDRESS IS UNIQ!!!
vif         = [ 'mac=<Unique MAC Address>,bridge=br0' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
pci         = [ '00:1f.2' ]
CTRL-D
dom0 #

If You have several PCI devices You want to pass through to the domU, then the line pci = [ '00:1f.2' ] above should be changed to pci = [ '00:1f.2', 'XX:YY.C', 'XX:YY.C', ... ], where each XX:YY.C has to be changed to the proper PCI bus address.

Handover PCI Device to dom0 xen-pciback module¶

To configure automatic PCI handover on every reboot, we need to modify /etc/conf.d/xen-pci and add your device to the list of devices.

dom0 # vi /etc/conf.d/xen-pci
...
DEVICES="00:1f.2"
...

If You have several PCI devices then the DEVICES line should look like below:

DEVICES="00:1f.2 XX:YY.C ..."

And lastly, add the service with rc-update so it is executed on every boot.

dom0 # rc-update add xen-pci
 * service xen-pci added to runlevel default
dom0 # lbu commit
dom0 #

Verify PCI Passthru¶

Reboot dom0 to verify that the PCI Passthru is working.

dom0 # reboot

And when the system is up and running, verify:

dom0 # xl pci-assignable-list
0000:00:1f.2
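
If the device does not show up here, it can also be handed over by hand for the current boot (this is, in essence, what the xen-pci service configured above does at startup):

dom0 # xl pci-assignable-add 00:1f.2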

Start domU¶

First confirm that the installation directory is mounted.

dom0 # mount /domU_installer

Then it is time to start the installation; to do this we simply start the domU:

dom0 # xl create /etc/xen/zfshost.cfg -c

To get back to the dom0 environment from the domU console, you press CTRL+]

domU work¶

Disks visible?¶

Did the PCI Passthru work? Let's check:

# dmesg | grep logical\ blocks
[    5.242266] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    5.718972] sd 1:0:0:0: [sdb] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    6.194013] sd 2:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[    6.667206] sd 3:0:0:0: [sdd] 234441648 512-byte logical blocks: (120 GB/112 GiB)

Yes, in my case the PCI Passthru worked just fine.

  • sdc : (Optional) This disk will have a RAID, and a SLOG partition created at this stage.
  • sdd : (Optional) This disk will have a RAID, and a SLOG partition created at this stage.
  • sdc & sdd RAID partitions are mirrored, with LVM on top, which will contain the root filesystem (/)
  • sda & sdb will not be touched at this stage. (Data disks for ZFS)

Partition the disks¶

Partitioning of the disks (sdc and sdd) is done using fdisk for instance. These disks will contain the root volume under RAID/LVM control in the first partition, while the second partition will contain the SLOG under ZFS control.

# fdisk /dev/sdc
# fdisk /dev/sdd

The result should be something like this. It is best to align the partitions to 4096-sector boundaries.

Device  Boot StartCHS    EndCHS        StartLBA     EndLBA    Sectors  Size Id Type
/dev/sdc1    0,65,2      1023,254,63       4096   70332415   70328320 33.5G da Unknown
/dev/sdc2    1023,254,63 1023,254,63   70332416   93775871   23443456 11.1G da Unknown

/dev/sdd1    0,65,2      1023,254,63       4096   70332415   70328320 33.5G da Unknown
/dev/sdd2    1023,254,63 1023,254,63   70332416   93775871   23443456 11.1G da Unknown

We use partition type da (non-fs data), as this is the type recommended for use with mdadm (see Partition_Types), and it also works well for the SLOG partition.
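
If you prefer a scripted, non-interactive alternative to fdisk, the same layout can be fed to sfdisk. This is just a sketch: sfdisk is not part of busybox (so it assumes apk add sfdisk), and the start/size values are taken from the example table above and must be adjusted to your disks:

# apk add sfdisk
# printf '4096,70328320,da\n70332416,23443456,da\n' | sfdisk /dev/sdc
# printf '4096,70328320,da\n70332416,23443456,da\n' | sfdisk /dev/sdd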

RAID and LVM configuration¶

We are missing the MD RAID and LVM packages, so let's install them:

# apk add mdadm lvm2

and create the raid device

# mdadm --zero-superblock /dev/sdc1
# mdadm --zero-superblock /dev/sdd1
# mdadm --create --bitmap=internal md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1

If you want to see the result of the mirroring, or follow the synchronization, check /proc/mdstat:

# cat /proc/mdstat

If it takes a long time and you want to watch its progress...

# watch cat /proc/mdstat

and let's create the new LVM devices for this domU. If this is not the first attempt, just answer y to any confirmation question.

# pvcreate -ff -y /dev/md/md0
# vgcreate vg_zfshost /dev/md/md0
# lvcreate -y -n lv_root -L 10G vg_zfshost
# apk add e2fsprogs
# mkfs.ext4 /dev/vg_zfshost/lv_root
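
A quick sanity check that the PV, VG and LV look as expected:

# pvs
# vgs
# lvs vg_zfshost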

Mountpoints etc¶

Time to configure the mountpoints for root and boot, as well as mount them. We will put them under /mnt for the installation process.

Boot (/mnt/boot) will be mounted from a virtual disk provided by dom0. One good advantage with this is that you can look at (read troubleshoot) the domU's boot disk while you are in dom0.

Root will be mounted from the domU's local SSD disks.

# mount /dev/vg_zfshost/lv_root /mnt
# mkdir /mnt/boot
# mount /dev/xvda1 /mnt/boot

Running the setup-alpine¶

Finally, time to configure (setup) the actual alpine part

Key things to remember

  • Answer none to the last questions (disks, config storage and apk cache)
  • Which disk(s) would you like to use? (or '?' for help or 'none') [none]
  • Enter where to store configs ('floppy', 'usb' or 'none') [none]:
  • Enter apk cache directory (or '?' or 'none') [/var/cache/apk]: none

# setup-alpine
Available keyboard layouts:
af     be     cn     fi     hu     jp     lt     my     ro     tj
al     bg     cz     fo     ie     ke     lv     ng     rs     tm
am     br     de     fr     il     kg     ma     nl     ru     tr
ara    brai   dk     gb     in     kr     md     no     se     tw
at     by     dz     ge     iq     kz     me     ph     si     ua
az     ca     ee     gh     ir     la     mk     pk     sk     us
ba     ch     epo    gr     is     latam  ml     pl     sy     uz
bd     cm     es     hr     it     lk     mt     pt     th
Select keyboard layout [none]: us
Available variants: us-alt-intl us-altgr-intl us-chr us-colemak us-dvorak-alt-intl us-dvorak-classic us-dvorak-intl us-dvorak-l us-dvorak-r us-dvorak us-dvp us-euro us-hbs us-intl us-mac us-olpc2 us-rus us-workman-intl us-workman us
Select variant []: us
 * Caching service dependencies ... [ ok ]
 * Setting keymap ... [ ok ]
Enter system hostname (short form, e.g. 'foo') [localhost]: zfshost
Available interfaces are: eth0.
Enter '?' for help on bridges, bonding and vlans.
Which one do you want to initialize? (or '?' or 'done') [eth0]
Ip address for eth0? (or 'dhcp', 'none', '?') [dhcp] 192.168.1.19/24
Gateway? (or 'none') [none] 192.168.1.1
Configuration for eth0:
  type=static
  address=192.168.1.19
  netmask=255.255.255.0
  gateway=192.168.1.1
Do you want to do any manual network configuration? [no]
DNS domain name? (e.g 'bar.com') [] example.com
DNS nameserver(s)? [] 8.8.8.8
Changing password for root
New password:
Retype password:
passwd: password for root changed by root
Which timezone are you in? ('?' for list) [UTC] Australia/Melbourne
 * Starting busybox acpid ... [ ok ]
 * Starting busybox crond ... [ ok ]
HTTP/FTP proxy URL? (e.g. 'http://proxy:8080', or 'none') [none]

Available mirrors:
1) dl-cdn.alpinelinux.org
...
19) http://mirror.aarnet.edu.au
...
36) mirrors.shu.edu.cn

r) Add random from the above list
f) Detect and add fastest mirror from above list
e) Edit /etc/apk/repositories with text editor

Enter mirror number (1-36) or URL to add (or r/f/e/done) [f]: 19
Added mirror mirror.aarnet.edu.au
Updating repository indexes... done.
Which SSH server? ('openssh', 'dropbear' or 'none') [openssh]
 * service sshd added to runlevel default
 * Caching service dependencies ... [ ok ]
ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519
 * Starting sshd ... [ ok ]
Which NTP client to run? ('busybox', 'openntpd', 'chrony' or 'none') [chrony]
 * service chronyd added to runlevel default
 * Caching service dependencies ... [ ok ]
 * Starting chronyd ... [ ok ]
Available disks are:
  sda   (6001.2 GB ATA      WDC WD60EFRX-68L)
  sdb   (6001.2 GB ATA      WDC WD60EFRX-68L)
  dm-0  (10.7 GB  )
  dm-1  (2.1 GB  )
Which disk(s) would you like to use? (or '?' for help or 'none') [none]
Enter where to store configs ('floppy', 'usb' or 'none') [none]:
Enter apk cache directory (or '?' or 'none') [/var/cache/apk]: none
zfshost:~#

Workaround: If the above fails, or You need to re-run setup-alpine for some reason, You must tear down and bring up the network manually before re-running the setup-alpine script:

# ifdown eth0
# ifup eth0

Modules configuration¶

Confirm that the required modules (xen-pcifront, raid1, and the LVM modules dm-mod and dm-snapshot) are in /etc/modules; if not, add them.

# vi /etc/modules
xen-pcifront
dm-mod
dm-snapshot
raid1

MDADM config¶

Save the RAID configuration to enable MDADM to load the proper configuration at boot time.

# mdadm --detail --scan >> /etc/mdadm.conf
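
It does not hurt to eyeball the result; the file should now contain an ARRAY line describing md0:

# cat /etc/mdadm.conf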

Store filesystem¶

Time to install domU (zfshost) to the filesystem on /mnt (which points to lv_root)

We will use the -m sys (write the system to disk), -r (RAID) and -L (LVM) parameters

# setup-disk -m sys -r -L /mnt
Installing system on /dev/vg_zfshost/lv_root:
/mnt/boot is device /dev/xvda1
100% ############################################==> initramfs: creating /boot/initramfs-lts
/boot is device /dev/xvda1
You might need fix the MBR to be able to boot

Update GRUB¶

We need to create a GRUB boot stanza

# mkdir /mnt/boot/grub
# cat > /mnt/boot/grub/grub.cfg
set timeout=2
set default=0
menuentry "alpine" {
    linux /boot/vmlinuz-lts modules=ext4 console=hvc0 root=/dev/vg_zfshost/lv_root
    initrd /boot/initramfs-lts
}
CTRL-D

Fix initfs¶

We need to make sure that the disks are accessible during early boot, hence the Xen PCI driver must be loaded by the initramfs.

1: Add the features xenpci, lvm and raid to mkinitfs.conf, if they are not already there. In the example below I simply appended them to the default list.

# vi /mnt/etc/mkinitfs/mkinitfs.conf
features="ata base ide scsi usb virtio ext4 xenpci lvm raid"

2: Re-generate the initramfs

# mkinitfs -c /mnt/etc/mkinitfs/mkinitfs.conf -b /mnt `uname -r`

Time to halt¶

Time to halt this newly installed system, and go back to dom0 for some changes.

# halt

Back to dom0¶

Fix dom0's domU config file¶

First we need to add the kernel for the domU (grub-x86_64-xen.bin); OK, not really a kernel, but a bootloader compiled to be loadable as one.

dom0 # apk add grub-xenhost

then we need to update the domU configuration file to use the newly added kernel, as well as remove the cdrom.

dom0 # cat > /etc/xen/zfshost.cfg
####
#### zfshost domU
####
vcpus       = '1'
memory      = '8096'
maxmem      = '8096'
kernel      = "/usr/lib/grub-xen/grub-x86_64-xen.bin"
disk        = [
                  'phy:/dev/vg_domU/zfshost-boot,xvda1,w',
              ]
name        = 'zfshost'
## ENSURE MAC ADDRESS IS UNIQ!!!
vif         = [ 'mac=<Unique MAC Address>,bridge=br0' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
pci         = [ '00:1f.2' ]
CTRL-D
dom0 #

And lastly we need to make these changes restart safe

dom0 # lbu commit

Start domU¶

Finally time to start the newly created domU, and see if it all works.

dom0 # xl create /etc/xen/zfshost.cfg -c

Add udev¶

Add udev, so we get proper disk names (UUID)

# apk add eudev zfs-udev
# setup-udev

Add normal user¶

As good security practice dictates, we should not use the root account for normal operations, so we need to create a normal user

# adduser <username>

Add sudo¶

For security reasons, and as good practice, let's install sudo and allow the just-created user to use it. Use visudo to remove the comment marker from the line: #%wheel ALL=(ALL) ALL.

# apk add sudo
# visudo
# adduser <username> wheel
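
To verify the setup, switch to the new user and check that sudo grants root (sudo will ask for the user's password):

# su - <username>
$ sudo whoami
root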

Set minimum free memory¶

To avoid running out of memory during high-volume data transfers, we specify how much memory the kernel should keep free as a minimum.

# vi /etc/sysctl.d/local.conf
# Make sure ZFS does not take all memory when stressed
vm.min_free_kbytes = 128000
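
The value is picked up automatically at the next boot; to apply it right away you can load the file by hand (busybox sysctl supports -p) and read the setting back:

# sysctl -p /etc/sysctl.d/local.conf
# sysctl vm.min_free_kbytes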

Reboot to confirm udev¶

Time to reboot again, to verify udev is working

# reboot

This might take a while (a few minutes), so be patient. You can check the progress from dom0 with the following command:

dom0 # xl list

and log back in again

dom0 # xl console zfshost

or (the command below should attach to the default session)

dom0 # tmux attach-session

Verify that your /dev/disk directory is populated

# ls -l /dev/disk
total 0
drwxr-xr-x    2 root     root           400 Jan  6 22:55 by-id
drwxr-xr-x    2 root     root            80 Jan  7 22:52 by-partuuid
drwxr-xr-x    2 root     root            60 Jan  6 22:55 by-uuid
#

Time to set started flag¶

Time now to create a started flag in the Xen Store (xenstore), which dom0 can check when deciding whether it is time to start the other domUs (zfshost starts first).

First we need to install the Xen package, which provides the xenstore tools used below

# apk add xen

Then we add the script

# cat > /etc/init.d/zfs-ok-informdom0
#!/sbin/openrc-run
description="Add a flag (1) to xenstore which can be read by dom0 to determine if zfshost is running or not"
depend()
{
        after syslog xendriverdomain
        before zfs-share
}
start()
{
        ebegin "Inform dom0"
        xenstore-write /local/domain/`xenstore-read domid`/data/storage-online 1
        eend $? "Failed to inform dom0"
}
CTRL-D

# chmod a+x /etc/init.d/zfs-ok-informdom0
# rc-update add zfs-ok-informdom0
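
After the next start of the domU, you can check from dom0 that the flag really ends up in xenstore, for example:

dom0 # xenstore-read /local/domain/$(xl domid zfshost)/data/storage-online
1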

Fix autostart of domU¶

Time to make this domU start automatically on reboot.

Let's stop the domU!

# halt
.
.
.
dom0 # xl list

Repeat the xl list command above until the domU is gone from the list. It might take a while (a few minutes) before it disappears.

And on the dom0 we create the autostart link; remember, do not forget the lbu commit command.

dom0 # ln -s /etc/xen/zfshost.cfg /etc/xen/auto
dom0 # rc-update add xendomains
dom0 # lbu commit

Let's not stop all domains in parallel; it is safer to stop them one by one, so we disable this option.

dom0 # vi /etc/conf.d/xendomains
PARALLEL_SHUTDOWN=no
#

Now we need to patch how the various domains are started and stopped, since the storage domain has to be started before all other domains, and stopped after all others have stopped.

First, we need to add the two xendomains files that are about to be patched to lbu management (if they are not already tracked there). The files to be patched are:

  • /etc/init.d/xendomains
  • /etc/conf.d/xendomains

If these two files are not visible in the lbu ls command, then please add them with the lbu add command.

dom0 # lbu ls | grep xendomains
dom0 # lbu add /etc/init.d/xendomains
dom0 # lbu add /etc/conf.d/xendomains
dom0 # lbu commit

Edit the /etc/conf.d/xendomains file¶

Add the following lines to the end of /etc/conf.d/xendomains

dom0 # cat >> /etc/conf.d/xendomains

# If using a storage domain its name should be supplied. The storage
# domain will be started first and no other domains will start before it
# is fully online.
XENDOMAINS_STORAGE_DOM_NAME="zfshost"
CTRL-D

Fix the /etc/init.d/xendomains file to handle the storage domain correctly¶

Please download and apply this patch

xendomains-storage-domU.patch download

--- a/init.d/xendomains
+++ b/init.d/xendomains
@@ -120,6 +120,42 @@
 	esac
 }
 
+start_storage() {
+	einfo "Starting Xen storage domain from ${AUTODIR:=/etc/xen/auto}"
+
+	# Create storage domain.
+	want_usleep=
+	for dom in $(ls "${AUTODIR:=/etc/xen/auto}/"${XENDOMAINS_STORAGE_DOM_NAME}.cfg 2>/dev/null | sort); do
+		name=$(get_domname ${dom})
+		if ! is_running ${name} ; then
+			if [ -n "$want_usleep" ]; then
+				usleep ${XENDOMAINS_CREATE_USLEEP:=5000000}
+			else
+				want_usleep=1
+			fi
+			ebegin "  Starting domain ${name}"
+			$startdom "${name}" "${dom}"
+			eend $?
+		else
+			einfo "  Not starting domain ${name} - already running"
+		fi
+	done
+	#
+	# Lets wait until storage domain is fully running
+	# zfshost domain stores a 1 in data/storage-online when it is fully up.
+	#
+	# Sleep 5 to ensure we get a domain id
+	sleep 5
+
+	stor_dom=$(xl domid $XENDOMAINS_STORAGE_DOM_NAME)
+	einfo "Waiting for storage domain to come online (forever)"
+	until $(xenstore-exists /local/domain/${stor_dom}/data/storage-online)
+	do
+	    sleep 2
+	done
+	einfo "Done Xen Starting storage domain from ${AUTODIR:=/etc/xen/auto}"
+}
+
 start() {
 	set_dom_cmd
 	checkpath --directory --mode 755 /var/run/xen
@@ -127,6 +163,10 @@
 	einfo "Starting Xen domains from ${AUTODIR:=/etc/xen/auto}"
 
 	$initconsole
+	# If Storage Domain is definied, start this domain first.
+        if [ -n "$XENDOMAINS_STORAGE_DOM_NAME" ]; then
+                start_storage
+        fi
 
 	# Create all domains with config files in AUTODIR.
 	want_usleep=
@@ -157,11 +197,13 @@
 	if yesno "$PARALLEL_SHUTDOWN"; then
 		for dom in $DOMAINS ; do
 			name=$(get_domname ${dom})
-			if is_running ${name} ; then
-				ebegin "  Asking domain ${name} to shutdown in the background..."
-				xl shutdown -w ${name} >/dev/null &
-			else
-				einfo "  Not stopping domain ${name} - not running"
+			if [ "n${name}" != "n${XENDOMAINS_STORAGE_DOM_NAME}" ]; then
+				if is_running ${name} ; then
+					ebegin "  Asking domain ${name} to shutdown in the background..."
+					xl shutdown -w ${name} >/dev/null &
+				else
+					einfo "  Not stopping domain ${name} - not running"
+				fi
 			fi
 		done
 		einfo "  Waiting for shutdown of domains that are still running"
@@ -170,14 +212,27 @@
 	else
 		for dom in $DOMAINS ; do
 			name=$(get_domname ${dom})
-			if is_running ${name} ; then
-				ebegin "  Waiting for domain ${name} to shutdown"
-				xl shutdown -w ${name} >/dev/null
-				eend $?
-			else
-				einfo "  Not stopping domain ${name} - not running"
+			if [ "n${name}" != "n${XENDOMAINS_STORAGE_DOM_NAME}" ]; then
+				if is_running ${name} ; then
+					ebegin "  Waiting for domain ${name} to shutdown"
+					xl shutdown -w ${name} >/dev/null
+					eend $?
+				else
+					einfo "  Not stopping domain ${name} - not running"
+				fi
 			fi
 		done
+	fi
+
+	# If Storage Domain is definied, stop this domain last.
+	if [ -n "$XENDOMAINS_STORAGE_DOM_NAME" ]; then
+		if is_running ${XENDOMAINS_STORAGE_DOM_NAME} ; then
+			ebegin "  Waiting for storage domain ${XENDOMAINS_STORAGE_DOM_NAME} to shutdown"
+			xl shutdown -w ${XENDOMAINS_STORAGE_DOM_NAME} >/dev/null
+			eend $?
+		else
+			einfo "  Not stopping storage domain ${XENDOMAINS_STORAGE_DOM_NAME} - not running"
+		fi
 	fi
 
 	$closeconsole

Apply it in the dom0

dom0# cd /etc
dom0# patch -p1 < .../xendomains-storage-domU.patch

Store the LBU state and reboot to verify¶

And finally, execute the lbu commit command to make this reboot safe, and then reboot to verify that all is working as intended.

dom0 # lbu commit
dom0 # reboot

Then, after dom0 is up and running again, check that the zfshost domain is up and running.

dom0 # xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     2     r-----      17.3
zfshost                                      1  8096     1     -b----      10.6

ZFS¶

Time to change this plain Alpine domU to a proper ZFS file server.

Start the console for this zfshost.

dom0 # xl console zfshost

Or use the tmux attach-session command.

Add packages¶

Add the ZFS packages to the domU, then load the zfs kernel module and confirm that it is loaded.

# apk add parted
# apk add zfs zfs-$(uname -r | rev | cut -d'-' -f1 | rev)
...
# modprobe zfs
# lsmod | grep zfs
zfs                  3760128  0
zunicode              335872  1 zfs
zlua                  176128  1 zfs
zcommon                90112  1 zfs
znvpair                94208  2 zfs,zcommon
zavl                   16384  1 zfs
icp                   311296  1 zfs
spl                   122880  5 zfs,icp,znvpair,zcommon,zavl

NOTE: If the kernel was upgraded by the apk add ... command, then a reboot is necessary before running modprobe zfs (because the files in the /lib/modules directory have changed).

Option A: Create a pool called tank and add a mirrored SLOG¶

NOTE: If You are not using a SLOG, go to the next section.

Let's add the tank pool (tank is the general pool for ZFS volumes).

Just as a recap, we have the following disks and partitions to play with (extracts from dmesg and fdisk -l):

dmesg | grep logical\ blocks
...
[    5.242266] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    5.718972] sd 1:0:0:0: [sdb] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
...
fdisk -l 2>/dev/null | grep "^/"
...
/dev/sdc2    1023,254,63 1023,254,63   58605120   78156224   19551105 9546M da Unknown
/dev/sdd2    1023,254,63 1023,254,63   58605120   78156224   19551105 9546M da Unknown
...

OK, now we need to check what these disk devices are called in the /dev/disk/by-id directory.

# ls -l /dev/disk/by-id | awk '{print $(NF-2), $(NF-1), $NF}'
total 0 total 0
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214 -> ../../sdd
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part1 -> ../../sdd1
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part2 -> ../../sdd2
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793 -> ../../sdc
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part1 -> ../../sdc1
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part2 -> ../../sdc2
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT -> ../../sda
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT-part1 -> ../../sda1
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT-part9 -> ../../sda9
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H -> ../../sdb
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H-part1 -> ../../sdb1
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H-part9 -> ../../sdb9
wwn-0x50014ee265419796 -> ../../sda
wwn-0x50014ee265419796-part1 -> ../../sda1
wwn-0x50014ee265419796-part9 -> ../../sda9
wwn-0x50014ee2bae31e07 -> ../../sdb
wwn-0x50014ee2bae31e07-part1 -> ../../sdb1
wwn-0x50014ee2bae31e07-part9 -> ../../sdb9
wwn-0x5002538c4045ab3f -> ../../sdd
wwn-0x5002538c4045ab3f-part1 -> ../../sdd1
wwn-0x5002538c4045ab3f-part2 -> ../../sdd2
wwn-0x5002538c4045aecc -> ../../sdc
wwn-0x5002538c4045aecc-part1 -> ../../sdc1
wwn-0x5002538c4045aecc-part2 -> ../../sdc2

Using the /dev/disk/by-id names makes life much easier if you need to replace a disk, or move the disks around a bit.

NOTE: The option -o ashift=12 below is for disks with a physical sector size of 4096 bytes (2^12 = 4096), which covers almost all modern disks. If Your disks have a physical sector size of 512 bytes (2^9 = 512, older disks), then the argument to ashift should be 9 instead of 12. One way to check the physical sector size of Your disks is with the following command:

# parted --list 2>/dev/null | egrep "^Disk /|^Sector"
...
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Disk /dev/sdd: 3001GB
Sector size (logical/physical): 512B/4096B
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
...
#

Here the 4096B (at the end of the lines starting with Sector) shows that these disks have a physical sector size of 4096 bytes. (If any line ends with 512B/512B, then the physical sector size is 512 bytes for those disks...)

Now, let's create the tank pool; it should go on the sda and sdb disks. (Yes, the whole disks! ZFS is capable of handling this correctly.)

# zpool create -o ashift=12 tank mirror \
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT \
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H

and the SLOG should be on the SLOG partitions sdc2 and sdd2 (which should be on fast SSD disks...).

# zpool add tank log mirror \
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part2 \
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part2
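
At this point zpool status should show one mirrored data vdev plus a mirrored log vdev for the pool:

# zpool status tank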

Option B: Create the tank pool without the SLOG¶

For each wanted ZFS pool, do as in the previous section, but skip the last command (zpool add tank log mirror ...).

ZFS Cron job¶

Add a cron job to automatically check for, and repair, bit rot.

# cat > /etc/periodic/weekly/zfs_scrub
#!/bin/sh
/usr/sbin/zpool scrub tank
CTRL-D
# chmod a+x /etc/periodic/weekly/zfs_scrub
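
The script will only run if busybox crond is active (setup-alpine started and enabled it earlier) and the weekly run-parts entry is present in root's crontab; a quick check:

# rc-service crond status
# crontab -l | grep periodic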

ZFS boot services¶

We need to configure ZFS to automatically mount tank on boot.

# rc-update add zfs-import boot
# rc-update add zfs-mount
# rc-update add zfs-zed
# rc-update add zfs-share

ZFS Basic Config¶

Set the tank pool to autoexpand=on, in case larger disks are added in the future

# zpool set autoexpand=on tank

Set default compression on top level (sub levels will inherit)

# zfs set compression=lz4 tank

To see the status do the following

# zpool get autoexpand
NAME          PROPERTY    VALUE   SOURCE
tank          autoexpand  on      local
# zfs get compression
NAME          PROPERTY     VALUE     SOURCE
tank          compression  lz4       local

ZFS Xen Related¶

Create a top-level dataset for Xen domU storage; as an example we use the name xen here.

# zfs create tank/xen

We also need to be able to use zfshost disks as disks for other domUs. To enable this we need xendriverdomain. Start it manually, and configure it to start automatically on boot.

# service xendriverdomain start
# rc-update add xendriverdomain
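
To confirm that the backend daemon is running and will come up on the next boot:

# rc-service xendriverdomain status
# rc-update show | grep xendriverdomain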

Update Your system¶

Get Your system into sync with possible changes in the Alpine repository

# apk update
fetch http://ftp.acc.umu.se/mirror/alpinelinux.org/v3.11/main/x86_64/APKINDEX.tar.gz
v3.11.2-51-g7cf8ea7952 [http://ftp.acc.umu.se/mirror/alpinelinux.org/v3.11/main]
OK: 5371 distinct packages available
# apk upgrade
OK: 642 MiB in 204 packages
#

Add swap to the zfshost¶

There seem to be quite a few trouble reports about swap on a zvol with ZFS on Linux, so we had better avoid using a zvol for swap for the time being. It seems stable on FreeBSD, but they recommend a much "simpler" setup than most ZFS-on-Linux howtos do for swap on a zvol.

NOTE: For the swap, we need to make the block size match the VM's system page size, which You can find with the command getconf PAGESIZE (to be used with the -b option below). Then we need to disable automatic snapshots for the swap (com.sun:auto-snapshot=false).

The FreeBSD way

zfs create -V 2G -o org.freebsd:swap=on -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none <pool name>/swap

For Linux and our case that would be

zfs create -V 8G -b 4k -o com.sun:auto-snapshot=false -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none <pool name>/zfshost-swap

But using checksum=off totally takes away the only reason to put swap on ZFS in the first place: bit rot protection. So I would suggest not turning off checksums.

We need to put the swap on the /tank/xen/zfshost-swap area

Then to complete the setup

# mkswap -f /dev/zvol/tank/xen/zfshost-swap
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=e84d527d-5bbb-4075-a377-6c7258c24633

# echo "/dev/zvol/tank/xen/zfshost-swap none swap sw,discard 0 0" >> /etc/fstab

# swapon -a

# rc-update add swap boot

# cat /proc/swaps 
Filename                  Type            Size    Used    Priority
/dev/zd0                  partition       8388604 0       -2

# cat /proc/meminfo 
MemTotal:        8112416 kB
MemFree:         7962448 kB
...
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Dirty:                 8 kB
...
#

NOTE: For maximum stability during memory pressure, though, it is probably much wiser to put the swap on an LVM LV.

# lvcreate -n zfshost-swap -L 8G vg_domU

and do the rest as above, but with /dev/vg_domU/zfshost-swap instead of /dev/zvol/tank/xen/zfshost-swap.

ZFS volumes for a new domU¶

When we want to create a new domU, we need to do a bit of work on both the zfshost and the dom0.

On the zfshost¶

On zfshost we will need to create the relevant domU disks. (We are using the domU DNS, which will provide the DNS service, as an example of a domU.)

NOTE: For the swap, we need to make the block size match the VM's system page size, which You can find with the command getconf PAGESIZE (to be used with the -b option below). Then we need to disable automatic snapshots for the swap (com.sun:auto-snapshot=false), and set the following other attributes according to https://github.com/zfsonlinux/zfs/wiki/FAQ#using-a-zvol-for-a-swap-device.

# getconf PAGESIZE
4096
# zfs create -V 2G tank/xen/<domU-Service>-disk
# zfs create -V 512M -b 4k \
    -o logbias=throughput \
    -o sync=always \
    -o primarycache=metadata \
    -o com.sun:auto-snapshot=false \
    tank/xen/<domU-Service>-swap

With the DNS server as an example:

# getconf PAGESIZE
4096
# zfs create -V 2G tank/xen/dns-disk
# zfs create -V 512M -b 4k \
    -o logbias=throughput \
    -o sync=always \
    -o primarycache=metadata \
    -o com.sun:auto-snapshot=false \
    tank/xen/dns-swap
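
The new volumes, and the device nodes that the other domU's configuration will refer to, can be listed with:

# zfs list -t volume -r tank/xen
# ls -l /dev/zvol/tank/xen/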

On the dom0¶

On the dom0 we need to use the backend parameter to indicate where the disks are located.

Apart from the backend parameter, the rest of the domU configuration file is the same as for a normal domU installation.

dom0 # grep backend /etc/xen/dnshost.cfg
              'backend=zfshost,phy:/dev/zvol/tank/xen/<domU-Service>-disk,xvda1,w',
              'backend=zfshost,phy:/dev/zvol/tank/xen/<domU-Service>-swap,xvda2,w',

Or, with the dnshost as an example:

dom0 # grep backend /etc/xen/dnshost.cfg
              'backend=zfshost,phy:/dev/zvol/tank/xen/dns-disk,xvda1,w',
              'backend=zfshost,phy:/dev/zvol/tank/xen/dns-swap,xvda2,w',

Useful commands¶

Below are some simple but useful commands to check the ZFS status

# zpool status
# zpool history
# zpool events
# zpool list
# zfs list
