Alpine Linux as a Xen Storage Driver Domain¶
Guide to configuring a Xen Storage Driver Domain based on Alpine 3.11. The dom0 is set up as in Alpine Dom0 V3.8 and later upgraded in accordance with Alpine dom0 upgrade.
The aim of this domU is ultimately to serve ZFS, hence its name: zfshost.
To do this I have the following extra hardware:
- SSD Root/SLOG Samsung SM863 120GB 2.5 inch 7mm SSD, 2 pieces so I can mirror them.
- HDD Tank WD Red 6TB NAS HDD 3.5" 6Gb/s Intellipower WD60EFRX, 2 pieces so I can mirror them
I will partition the SSD disks as below:
- 30GB for RAID and LVM. Will contain the root for zfshost (and some unused space in the LVM VG).
- 10GB unmarked. Will contain the SLOG.
- The rest will be left unpartitioned for increased endurance and future use.
The HDD disks will not be partitioned at all. ZFS will handle them straight up.
The end result will have /boot mounted from dom0 (a virtual disk), while the rest comes from the local domU disks.
Mount Point | "From where" | Disk |
---|---|---|
/boot | dom0 USB-Stick | LVM LV zfshost-boot in vg_domU (in dom0) |
/ | domU SSD | LVM LV lv_root in vg_zfshost |
SLOG | domU SSD | Two raw partitions |
zfs pool disks | domU HDD | Whole disks |
Various references¶
Here are some references I have been looking at:
- HP Microserver Gen8
- Alpine dom0
- Alpine domU
- XEN Storage Domain Driver
- ZFS On Linux
- ZFS Administration Intent Log
- PCI Passthrough
- Installing ZFS and setting Pool
- ZFS Compression
- Creating a ZFS File System Hierarchy
Dom0 work¶
In Alpine Dom0 V3.8 we created a very basic Xen dom0 server, which was only prepared for its domU guests. Now we need to add the specific parts related to this ZFS domU, mainly:
- Virtual boot disk
- domU configuration file
- PCI Pass-through so we can access the required domU disks (SSD and HDD)
domU's boot disk¶
We need to create, and prepare the disk for the domU
```
dom0 # lvcreate -n zfshost-boot -L 512M vg_domU
dom0 # apk add e2fsprogs
dom0 # mkfs.ext4 /dev/vg_domU/zfshost-boot
```
dom0 BIOS¶
We will use ZFS and mdadm and let these systems handle the RAID part, hence we need to deactivate hardware-based RAID in the BIOS, as well as enable AHCI mode.
- Disable hardware based RAID in BIOS
- Enable AHCI mode in BIOS
PCI Passthru¶
Now we need to find out which PCI device our SSD disks are connected to. The default lspci application (part of busybox) does not provide enough information, so we need to install a more feature-rich implementation. In my case, I will look for the hotplug SATA devices. If you use a PCI add-in board, check which chipset it has and search for that in the lspci output:
```
dom0 # apk add pciutils
dom0 # lspci -k
...
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
        Subsystem: Hewlett-Packard Company Device 330d
        Kernel driver in use: pciback
...
dom0 #
```
Verify that xen_pciback is in /etc/modules; if not, add it.
dom0 # grep xen_pciback /etc/modules
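If the grep comes back empty, a one-liner along these lines can append the module only when it is missing (a small convenience sketch, not part of the original setup):
```
dom0 # grep -q '^xen_pciback' /etc/modules || echo xen_pciback >> /etc/modules
```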
Load the xen_pciback module (modprobe does nothing if it is already loaded):
dom0 # modprobe xen_pciback
domU configuration file¶
Ok, time to create the domU configuration file.
- Observe that the MAC address has to be unique among the dom0 and all domUs. A tool such as random_mac.py might help you with this.
- Make sure that you specify the correct PCI device to pass through to domU.
- The cdrom points to the installer image which was prepared during the dom0 installation.
If you pick a MAC address manually, please start it with 00:16:3E followed by a combination that is unique on Your network, for instance 00:16:3e:AA:AA:01 or 00:16:3e:BE:EF:01.
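If you do not want to pick the last three octets by hand, a quick sketch using od and awk (both available in busybox; the exact invocation is an assumption, adjust as needed) can generate a random address with the required prefix:
```
dom0 # od -An -N3 -tx1 /dev/urandom | awk '{printf "00:16:3e:%s:%s:%s\n", $1, $2, $3}'
```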
```
dom0 # cat > /etc/xen/zfshost.cfg
#####
##### zfshost domU
#####
vcpus   = '1'
memory  = '8096'
maxmem  = '8096'
kernel  = "/domU_installer/vmlinuz-lts"
ramdisk = "/domU_installer/initramfs-lts"
extra   = "alpine_dev=hdc:iso9660 modules=loop,squashfs,sd-mod,usb-storage console=hvc0"
disk    = [
            'file://domU_installer/alpine-extended-3.11.2-x86_64.iso,hdc:cdrom,r',
            'phy:/dev/vg_domU/zfshost-boot,xvda1,w',
          ]
name    = 'zfshost'
## ENSURE MAC ADDRESS IS UNIQ!!!
vif     = [ 'mac=<Unique MAC Address>,bridge=br0' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
pci = [ '00:1f.2' ]
CTRL-D
dom0 #
```
If You have several PCI devices You want to pass through to the domU, then the line pci = [ '00:1f.2' ] above should be changed to pci = [ '00:1f.2', 'XX:YY.C', 'XX:YY.C', ... ], where each XX:YY.C has to be replaced with the proper PCI bus address.
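If you need to look up the addresses of additional controllers, the pciutils lspci installed earlier can list candidates; the grep pattern below is just an assumption to narrow the output to storage controllers:
```
dom0 # lspci | grep -i -E 'sata|sas|raid'
```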
Handover PCI Device to dom0 xen-pciback module¶
To configure automatic PCI handover on every reboot, we need to modify /etc/conf.d/xen-pci and add your device to the list of devices.
```
dom0 # vi /etc/conf.d/xen-pci
...
DEVICES="00:1f.2"
...
```
If You have several PCI devices, then the DEVICES line should look like below:
```
DEVICES="00:1f.2 XX:YY.C ..."
```
Finally, add it to rc-update so it is executed on reboot.
```
dom0 # rc-update add xen-pci
 * service xen-pci added to runlevel default
dom0 # lbu commit
dom0 #
```
Verify PCI Passthru¶
Reboot dom0 to verify that the PCI Passthru is working.
dom0 # reboot
And when the system is up and running, verify:
```
dom0 # xl pci-assignable-list
0000:00:1f.2
```
Start domU¶
First confirm that the installation directory is mounted.
dom0 # mount /domU_installer
Then it is time to start the installation; to do this we simply start the domU:
dom0 # xl create /etc/xen/zfshost.cfg -c
To get back to the dom0 environment from the domU console, you press CTRL+]
domU work¶
Disks visible?¶
Did the PCI Passthru work? Let's check:
```
# dmesg | grep logical\ blocks
[    5.242266] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    5.718972] sd 1:0:0:0: [sdb] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    6.194013] sd 2:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[    6.667206] sd 3:0:0:0: [sdd] 234441648 512-byte logical blocks: (120 GB/112 GiB)
```
Yes, in my case the PCI Passthru worked just fine.
- sdc: (Optional) This disk will have a RAID and a SLOG partition created at this stage.
- sdd: (Optional) This disk will have a RAID and a SLOG partition created at this stage.
- The sdc & sdd RAID partitions are mirrored, with LVM on top, which will contain the root filesystem (/).
- sda & sdb will not be touched at this stage (data disks for ZFS).
Partition the disks¶
Partitioning of the disks (sdc and sdd) is done using, for instance, fdisk. These disks will contain the root volume under RAID/LVM control in the first partition, while the second partition will contain the SLOG under ZFS control.
```
# fdisk /dev/sdc
# fdisk /dev/sdd
```
The result should be something like this. It is best to align the partitions to 4096-sector boundaries.
```
Device    Boot StartCHS    EndCHS      StartLBA  EndLBA    Sectors   Size  Id Type
/dev/sdc1      0,65,2      1023,254,63 4096      70332415  70328320  33.5G da Unknown
/dev/sdc2      1023,254,63 1023,254,63 70332416  93775871  23443456  11.1G da Unknown
/dev/sdd1      0,65,2      1023,254,63 4096      70332415  70328320  33.5G da Unknown
/dev/sdd2      1023,254,63 1023,254,63 70332416  93775871  23443456  11.1G da Unknown
```
We use partition type da (non-fs data) as this is the type recommended for use with mdadm (see Partition_Types), and it also works very well for the SLOG partition.
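If you prefer a scripted, non-interactive alternative to fdisk, a sketch along the following lines should give the same layout (sfdisk is an assumption here; it is packaged separately on Alpine, and the sector numbers are taken from the table above):
```
# apk add sfdisk
# printf '%s\n' '4096,70328320,da' '70332416,23443456,da' | sfdisk /dev/sdc
# printf '%s\n' '4096,70328320,da' '70332416,23443456,da' | sfdisk /dev/sdd
```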
RAID and LVM configuration¶
We are missing the MD RAID and LVM packages, so let's install them:
# apk add mdadm lvm2
and create the raid device
```
# mdadm --zero-superblock /dev/sdc1
# mdadm --zero-superblock /dev/sdd1
# mdadm --create --bitmap=internal md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
```
If you want to see the result of the mirroring, or follow the synchronization, check /proc/mdstat:
# cat /proc/mdstat
If it takes a long time and you want to watch the progress:
# watch cat /proc/mdstat
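If you would rather wait for the initial resync to finish before continuing (not required, the array is usable while it syncs), mdadm can block until it is done:
```
# mdadm --wait /dev/md/md0
```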
Now, let's create the new LVM devices for this domU. If this is not the first attempt, just answer y to the confirmation question.
```
# pvcreate -ff -y /dev/md/md0
# vgcreate vg_zfshost /dev/md/md0
# lvcreate -y -n lv_root -L 10G vg_zfshost
# apk add e2fsprogs
# mkfs.ext4 /dev/vg_zfshost/lv_root
```
Mountpoints etc¶
Time to configure the mountpoints for root and boot, as well as mount them. We will put them under /mnt for the installation process.
Boot (/mnt/boot) will be mounted from a virtual disk provided by dom0. One advantage of this is that you can look at (read: troubleshoot) the domU's boot disk while you are in dom0.
Root will be mounted from the domU's local SSD virtual disk.
```
# mount /dev/vg_zfshost/lv_root /mnt
# mkdir /mnt/boot
# mount /dev/xvda1 /mnt/boot
```
Running the setup-alpine¶
Finally, time to configure (setup) the actual Alpine part.
Key things to remember:
- Answer none to the last questions (disks, config, and apk repository):
  - Which disk(s) would you like to use? (or '?' for help or 'none') [none]
  - Enter where to store configs ('floppy', 'usb' or 'none') [none]:
  - Enter apk cache directory (or '?' or 'none') [/var/cache/apk]: none
```
# setup-alpine
Available keyboard layouts:
af  be   cn  fi  hu  ir  la    mk  pk  sk  us
al  bg   cz  fo  ie  ke  lv    ng  rs  tm
am  br   de  fr  il  kg  ma    nl  ru  tr
ara brai dk  gb  in  kr  md    no  se  tw
at  by   dz  ge  iq  kz  me    ph  si  ua
az  ca   ee  gh  is  latam ml  pl  sy  uz
ba  ch   epo gr  it  lk  mt    pt  th
bd  cm   es  hr  jp  lt  my    ro  tj
Select keyboard layout [none]: us
Available variants: us-alt-intl us-altgr-intl us-chr us-colemak us-dvorak-alt-intl us-dvorak-classic us-dvorak-intl us-dvorak-l us-dvorak-r us-dvorak us-dvp us-euro us-hbs us-intl us-mac us-olpc2 us-rus us-workman-intl us-workman us
Select variant []: us
 * Caching service dependencies ...                 [ ok ]
 * Setting keymap ...                               [ ok ]
Enter system hostname (short form, e.g. 'foo') [localhost]: zfshost
Available interfaces are: eth0.
Enter '?' for help on bridges, bonding and vlans.
Which one do you want to initialize? (or '?' or 'done') [eth0]
Ip address for eth0? (or 'dhcp', 'none', '?') [dhcp] 192.168.1.19/24
Gateway? (or 'none') [none] 192.168.1.1
Configuration for eth0:
  type=static
  address=192.168.1.19
  netmask=255.255.255.0
  gateway=192.168.1.1
Do you want to do any manual network configuration? [no]
DNS domain name? (e.g 'bar.com') [] example.com
DNS nameserver(s)? [] 8.8.8.8
Changing password for root
New password:
Retype password:
passwd: password for root changed by root
Which timezone are you in? ('?' for list) [UTC] Australia/Melbourne
 * Starting busybox acpid ...                       [ ok ]
 * Starting busybox crond ...                       [ ok ]
HTTP/FTP proxy URL? (e.g. 'http://proxy:8080', or 'none') [none]
Available mirrors:
1) dl-cdn.alpinelinux.org
...
19) http://mirror.aarnet.edu.au
...
36) mirrors.shu.edu.cn

r) Add random from the above list
f) Detect and add fastest mirror from above list
e) Edit /etc/apk/repositories with text editor

Enter mirror number (1-36) or URL to add (or r/f/e/done) [f]: 19
Added mirror mirror.aarnet.edu.au
Updating repository indexes... done.
Which SSH server? ('openssh', 'dropbear' or 'none') [openssh]
 * service sshd added to runlevel default
 * Caching service dependencies ...                 [ ok ]
ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519
 * Starting sshd ...                                [ ok ]
Which NTP client to run? ('busybox', 'openntpd', 'chrony' or 'none') [chrony]
 * service chronyd added to runlevel default
 * Caching service dependencies ...                 [ ok ]
 * Starting chronyd ...                             [ ok ]
Available disks are:
  sda   (6001.2 GB ATA      WDC WD60EFRX-68L)
  sdb   (6001.2 GB ATA      WDC WD60EFRX-68L)
  dm-0  (10.7 GB  )
  dm-1  (2.1 GB  )
Which disk(s) would you like to use? (or '?' for help or 'none') [none]
Enter where to store configs ('floppy', 'usb' or 'none') [none]:
Enter apk cache directory (or '?' or 'none') [/var/cache/apk]: none
zfshost:~#
```
Workaround: If the above fails, or You need to re-run setup-alpine for some reason, then You must tear down and bring up the network manually before re-running the setup-alpine script:
```
# ifdown eth0
# ifup eth0
```
Modules configuration¶
Confirm that the required modules (xen-pcifront, raid1 and LVM (dm-mod & dm-snapshot)) are in /etc/modules; if not, add them.
```
# vi /etc/modules
xen-pcifront
dm-mod
dm-snapshot
raid1
```
MDADM config¶
Save the RAID configuration to enable MDADM to load the proper configuration at boot time.
# mdadm --detail --scan >> /etc/mdadm.conf
Store filesystem¶
Time to install the domU (zfshost) to the filesystem on /mnt (which points to lv_root).
We will use the -m (write system to disk), -r (RAID), and -L (LVM) parameters:
```
# setup-disk -m sys -r -L /mnt
Installing system on /dev/vg_zfshost/lv_root:
/mnt/boot is device /dev/xvda1
100% ############################################==> initramfs: creating /boot/initramfs-lts
/boot is device /dev/xvda1
You might need fix the MBR to be able to boot
```
Update GRUB¶
We need to create a GRUB boot stanza
```
# mkdir /mnt/boot/grub
# cat > /mnt/boot/grub/grub.cfg
set timeout=2
set default=0
menuentry "alpine" {
    linux /boot/vmlinuz-lts modules=ext4 console=hvc0 root=/dev/vg_zfshost/lv_root
    initrd /boot/initramfs-lts
}
CTRL-D
```
Fix initfs¶
We need to make sure that the disks are accessible during early boot, hence the Xen PCI driver must be loaded by the initramfs.
1: Add the features xenpci, lvm and raid to mkinitfs.conf, if they are not there. In the example below I just appended them to the end of the default list.
```
# vi /mnt/etc/mkinitfs/mkinitfs.conf
features="ata base ide scsi usb virtio ext4 xenpci lvm raid"
```
2: Re-generate the initramfs
# mkinitfs -c /mnt/etc/mkinitfs/mkinitfs.conf -b /mnt `uname -r`
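If you want to sanity-check that the new features really ended up in the initramfs, something along these lines may work (this assumes a gzip-compressed initramfs, which is the Alpine default):
```
# gunzip -c /mnt/boot/initramfs-lts | cpio -t 2>/dev/null | grep -E 'xen-pcifront|raid1|dm-mod'
```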
Time to halt¶
Time to halt this newly installed system, and go back to dom0 for some changes.
# halt
Back to dom0¶
Fix dom0's domU config file¶
First we need to add the "kernel" for the domU (grub-x86_64-xen.bin); it is not really a kernel, but a bootloader compiled to be loadable as one.
dom0 # apk add grub-xenhost
then we need to update the domU configuration file to use the newly added kernel, as well as remove the cdrom.
```
dom0 # cat > /etc/xen/zfshost.cfg
####
#### zfshost domU
####
vcpus   = '1'
memory  = '8096'
maxmem  = '8096'
kernel  = "/usr/lib/grub-xen/grub-x86_64-xen.bin"
disk    = [
            'phy:/dev/vg_domU/zfshost-boot,xvda1,w',
          ]
name    = 'zfshost'
## ENSURE MAC ADDRESS IS UNIQ!!!
vif     = [ 'mac=<Unique MAC Address>,bridge=br0' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
pci = [ '00:1f.2' ]
CTRL-D
dom0 #
```
And lastly we need to make these changes reboot-safe:
dom0 # lbu commit
Start domU¶
Finally time to start the newly created domU, and see if it all works.
dom0 # xl create /etc/xen/zfshost.cfg -c
Add udev¶
Add udev, so we get proper disk names (UUIDs):
# apk add eudev zfs-udev # setup-udev
Add normal user¶
As per normal security practice, we should not use the root account for day-to-day operations, so we need to create a normal user:
# adduser <username>
Add sudo¶
For security reasons, and as good practice, let's install sudo and allow the newly created user to use it. Use visudo to remove the comment marker from the line #%wheel ALL=(ALL) ALL.
# apk add sudo # visudo # adduser <username> wheel
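As an alternative to editing the main sudoers file, a drop-in file works too, provided the packaged sudoers includes /etc/sudoers.d (the stock configuration does via an includedir directive); this is just a sketch:
```
# echo '%wheel ALL=(ALL) ALL' > /etc/sudoers.d/wheel
# chmod 0440 /etc/sudoers.d/wheel
# visudo -c -f /etc/sudoers.d/wheel
```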
Set minimum free memory¶
To avoid running out of memory during high-volume data transfers, we specify a minimum amount of free memory that the kernel should keep available.
```
# vi /etc/sysctl.d/local.conf
# Make sure ZFS does not take all memory when stressed
vm.min_free_kbytes = 128000
```
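Files under /etc/sysctl.d are applied at boot; to activate the setting immediately without rebooting, you can also load the file by hand:
```
# sysctl -p /etc/sysctl.d/local.conf
```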
Reboot to confirm udev¶
Time to reboot again, to verify that udev is working:
# reboot
This might take a while (a few minutes), so be patient. You can check the progress from dom0 with the following command:
dom0 # xl list
and log back in again
dom0 # xl console zfshost
or (the command below should attach to the default session)
dom0 # tmux attach-session
Verify that your /dev/disk directory is populated:
```
# ls -l /dev/disk
total 0
drwxr-xr-x    2 root     root           400 Jan  6 22:55 by-id
drwxr-xr-x    2 root     root            80 Jan  7 22:52 by-partuuid
drwxr-xr-x    2 root     root            60 Jan  6 22:55 by-uuid
#
```
Time to set started flag¶
Time now to create a started flag in Xen Store (xenstore), which dom0 can check when deciding whether it is time to start the other domUs (zfshost starts first).
First we need to install Xen
# apk add xen
Then we add the script
```
# cat > /etc/init.d/zfs-ok-informdom0
#!/sbin/openrc-run

description="Add a flag (1) to xenstore which can be read by dom0 to determine if zfshost is running or not"

depend() {
    after syslog xendriverdomain
    before zfs-share
}

start() {
    ebegin "Inform dom0"
    xenstore-write /local/domain/`xenstore-read domid`/data/storage-online 1
    eend $? "Failed to inform dom0"
}
CTRL-D
# chmod a+x /etc/init.d/zfs-ok-informdom0
# rc-update add zfs-ok-informdom0
```
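After the next boot of the domU you can check from dom0 that the flag really shows up in xenstore; the path mirrors what the script writes, and xl domid resolves the domain id:
```
dom0 # xenstore-read /local/domain/$(xl domid zfshost)/data/storage-online
1
```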
Fix autostart of domU¶
Time to arrange for this domU to be started automatically on reboot.
Let's stop the domU!
```
# halt
.
.
.
dom0 # xl list
```
Repeat the xl list command above until the domU has disappeared from the list. It might take a while (a few minutes) before the domU is gone.
Then, on the dom0, we create the autostart link. Remember, do not forget to run the lbu commit command.
```
dom0 # ln -s /etc/xen/zfshost.cfg /etc/xen/auto
dom0 # rc-update add xendomains
dom0 # lbu commit
```
Let's not stop all domains in parallel; it is safer to stop them one by one, so we disable this option.
```
dom0 # vi /etc/conf.d/xendomains
PARALLEL_SHUTDOWN=no
#
```
Now we need to patch how the various domains are started and stopped, since the storage domain has to be started before all other domains, and stopped after all the others have stopped.
First, we need to add (if they are missing) the two patched xendomains files to lbu management. The files to be patched are:
- /etc/init.d/xendomains
- /etc/conf.d/xendomains
If these two files are not visible in the lbu ls output, then please add them with the lbu add command.
```
dom0 # lbu ls | grep xendomains
dom0 # lbu add /etc/init.d/xendomains
dom0 # lbu add /etc/conf.d/xendomains
dom0 # lbu commit
```
Edit the /etc/conf.d/xendomains file¶
Add the following lines to the end of /etc/conf.d/xendomains:
```
dom0 # cat >> /etc/conf.d/xendomains

# If using a storage domain its name should be supplied. The storage
# domain will be started first and no other domains will start before it
# is fully online.
XENDOMAINS_STORAGE_DOM_NAME="zfshost"
CTRL-D
```
Fix the /etc/init.d/xendomains file to handle the storage domain correctly¶
Please download and apply this patch:
```
--- a/init.d/xendomains
+++ b/init.d/xendomains
@@ -120,6 +120,42 @@
 	esac
 }
 
+start_storage() {
+	einfo "Starting Xen storage domain from ${AUTODIR:=/etc/xen/auto}"
+
+	# Create storage domain.
+	want_usleep=
+	for dom in $(ls "${AUTODIR:=/etc/xen/auto}/"${XENDOMAINS_STORAGE_DOM_NAME}.cfg 2>/dev/null | sort); do
+		name=$(get_domname ${dom})
+		if ! is_running ${name} ; then
+			if [ -n "$want_usleep" ]; then
+				usleep ${XENDOMAINS_CREATE_USLEEP:=5000000}
+			else
+				want_usleep=1
+			fi
+			ebegin "  Starting domain ${name}"
+			$startdom "${name}" "${dom}"
+			eend $?
+		else
+			einfo "  Not starting domain ${name} - already running"
+		fi
+	done
+	#
+	# Lets wait until storage domain is fully running
+	# zfshost domain stores a 1 in data/storage-online when it is fully up.
+	#
+	# Sleep 5 to ensure we get a domain id
+	sleep 5
+
+	stor_dom=$(xl domid $XENDOMAINS_STORAGE_DOM_NAME)
+	einfo "Waiting for storage domain to come online (forever)"
+	until $(xenstore-exists /local/domain/${stor_dom}/data/storage-online)
+	do
+		sleep 2
+	done
+	einfo "Done Xen Starting storage domain from ${AUTODIR:=/etc/xen/auto}"
+}
+
 start() {
 	set_dom_cmd
 	checkpath --directory --mode 755 /var/run/xen
@@ -127,6 +163,10 @@
 	einfo "Starting Xen domains from ${AUTODIR:=/etc/xen/auto}"
 	$initconsole
 
+	# If Storage Domain is definied, start this domain first.
+	if [ -n "$XENDOMAINS_STORAGE_DOM_NAME" ]; then
+		start_storage
+	fi
 
 	# Create all domains with config files in AUTODIR.
 	want_usleep=
@@ -157,11 +197,13 @@
 	if yesno "$PARALLEL_SHUTDOWN"; then
 		for dom in $DOMAINS ; do
 			name=$(get_domname ${dom})
-			if is_running ${name} ; then
-				ebegin "  Asking domain ${name} to shutdown in the background..."
-				xl shutdown -w ${name} >/dev/null &
-			else
-				einfo "  Not stopping domain ${name} - not running"
+			if [ "n${name}" != "n${XENDOMAINS_STORAGE_DOM_NAME}" ]; then
+				if is_running ${name} ; then
+					ebegin "  Asking domain ${name} to shutdown in the background..."
+					xl shutdown -w ${name} >/dev/null &
+				else
+					einfo "  Not stopping domain ${name} - not running"
+				fi
 			fi
 		done
 		einfo "  Waiting for shutdown of domains that are still running"
@@ -170,14 +212,27 @@
 	else
 		for dom in $DOMAINS ; do
 			name=$(get_domname ${dom})
-			if is_running ${name} ; then
-				ebegin "  Waiting for domain ${name} to shutdown"
-				xl shutdown -w ${name} >/dev/null
-				eend $?
-			else
-				einfo "  Not stopping domain ${name} - not running"
+			if [ "n${name}" != "n${XENDOMAINS_STORAGE_DOM_NAME}" ]; then
+				if is_running ${name} ; then
+					ebegin "  Waiting for domain ${name} to shutdown"
+					xl shutdown -w ${name} >/dev/null
+					eend $?
+				else
+					einfo "  Not stopping domain ${name} - not running"
+				fi
 			fi
 		done
+	fi
+
+	# If Storage Domain is definied, stop this domain last.
+	if [ -n "$XENDOMAINS_STORAGE_DOM_NAME" ]; then
+		if is_running ${XENDOMAINS_STORAGE_DOM_NAME} ; then
+			ebegin "  Waiting for storage domain ${XENDOMAINS_STORAGE_DOM_NAME} to shutdown"
+			xl shutdown -w ${XENDOMAINS_STORAGE_DOM_NAME} >/dev/null
+			eend $?
+		else
+			einfo "  Not stopping storage domain ${XENDOMAINS_STORAGE_DOM_NAME} - not running"
+		fi
 	fi
 
 	$closeconsole
```
Apply it in the dom0:
```
dom0# cd /etc
dom0# patch -p1 < .../xendomains-storage-domU.patch
```
Store the LBU state and reboot to verify¶
And finally, execute the lbu commit command to make this reboot-safe, and then reboot to verify that everything is working as intended.
```
dom0 # lbu commit
dom0 # reboot
```
Then, after dom0 is up and running again, check that the zfshost domain is also up and running:
```
dom0 # xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     2     r-----      17.3
zfshost                                      1  8096     1     -b----      10.6
```
ZFS¶
Time to change this plain Alpine domU to a proper ZFS file server.
Start the console for this zfshost:
dom0 # xl console zfshost
Or use the tmux attach-session command.
Add packages¶
Add the ZFS packages to the domU domain. Also enable automatic start on boot.
```
# apk add parted
# apk add zfs zfs-$(uname -r | rev | cut -d'-' -f1 | rev)
...
# modprobe zfs
# lsmod | grep zfs
zfs                  3760128  0
zunicode              335872  1 zfs
zlua                  176128  1 zfs
zcommon                90112  1 zfs
znvpair                94208  2 zfs,zcommon
zavl                   16384  1 zfs
icp                   311296  1 zfs
spl                   122880  5 zfs,icp,znvpair,zcommon,zavl
```
NOTE: If the Linux kernel was upgraded by the apk add ... command, then a reboot is necessary before running the modprobe zfs command (because the files in /lib/modules have been changed).
Option A: Create a pool called tank and add a mirrored SLOG¶
NOTE: If You are not using a SLOG, then go to the next section.
Let's add the tank pool (tank is the general pool for the ZFS volumes).
Just a recap: we have the following disks and partitions to play with (extracts from dmesg and fdisk -l):
```
dmesg | grep logical\ blocks
...
[    5.242266] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    5.718972] sd 1:0:0:0: [sdb] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
...

fdisk -l 2>/dev/null | grep "^/"
...
/dev/sdc2 1023,254,63 1023,254,63   58605120  78156224   19551105 9546M da Unknown
/dev/sdd2 1023,254,63 1023,254,63   58605120  78156224   19551105 9546M da Unknown
...
```
OK, now we need to check what these disk devices are called in the /dev/disk/by-id directory.
```
# ls -l /dev/disk/by-id | awk '{print $(NF-2), $(NF-1), $NF}'
total 0 total 0
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214 -> ../../sdd
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part1 -> ../../sdd1
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part2 -> ../../sdd2
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793 -> ../../sdc
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part1 -> ../../sdc1
ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part2 -> ../../sdc2
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT -> ../../sda
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT-part1 -> ../../sda1
ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT-part9 -> ../../sda9
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H -> ../../sdb
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H-part1 -> ../../sdb1
ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H-part9 -> ../../sdb9
wwn-0x50014ee265419796 -> ../../sda
wwn-0x50014ee265419796-part1 -> ../../sda1
wwn-0x50014ee265419796-part9 -> ../../sda9
wwn-0x50014ee2bae31e07 -> ../../sdb
wwn-0x50014ee2bae31e07-part1 -> ../../sdb1
wwn-0x50014ee2bae31e07-part9 -> ../../sdb9
wwn-0x5002538c4045ab3f -> ../../sdd
wwn-0x5002538c4045ab3f-part1 -> ../../sdd1
wwn-0x5002538c4045ab3f-part2 -> ../../sdd2
wwn-0x5002538c4045aecc -> ../../sdc
wwn-0x5002538c4045aecc-part1 -> ../../sdc1
wwn-0x5002538c4045aecc-part2 -> ../../sdc2
```
Using the /dev/disk/by-id names makes life much easier if you need to replace a disk, or move the disks around a bit.
NOTE: The option -o ashift=12 below is for disks with a physical sector size of 4096 bytes (2^12 = 4096), which covers almost all modern disks. If Your disks have a physical sector size of 512 bytes (2^9 = 512, older disks), then the argument to ashift should be 9 instead of 12. One way to check the physical sector size of Your disks is with the following command:
```
# parted --list 2>/dev/null | egrep "^Disk /|^Sector"
...
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Disk /dev/sdd: 3001GB
Sector size (logical/physical): 512B/4096B
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
...
#
```
Here the 4096B (at the end of the lines starting with Sector) shows that these disks have a physical sector size of 4096 bytes. (If any line ends with 512B/512B, then the physical sector size is 512 bytes for those disks.)
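Another way to check, reading sysfs directly for a given disk, could be (4096 here would again indicate a 4096-byte physical sector):
```
# cat /sys/block/sda/queue/physical_block_size
4096
```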
Now, let's create the tank pool; it should be on the sda and sdb disks. (Yes, the whole disks! ZFS is capable of handling this correctly.)
```
# zpool create -o ashift=12 tank mirror \
    ata-WDC_WD60EFRX-68L0BN1_WD-WX11D28H9KVT \
    ata-WDC_WD60EFRX-68L0BN1_WD-WX21D48FDP8H
```
and the SLOG should be on the SLOG partitions sdc2 and sdd2 (which should be fast SSD disks):
```
# zpool add tank log mirror \
    ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00214-part2 \
    ata-SAMSUNG_MZ7KM120HAFD-00005_S2HPNX0HB00793-part2
```
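At this point it is worth a quick look at the pool layout; zpool status should show both the mirrored data vdev and the mirrored log vdev that were just added:
```
# zpool status tank
```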
Option B: Create the tank pool without the SLOG¶
For each wanted ZFS pool, do as in the previous section, but skip the last command (zpool add tank log mirror ...).
ZFS Cron job¶
Add a cron job that automatically checks for, and repairs, bit rot.
```
# cat > /etc/periodic/weekly/zfs_scrub
#!/bin/sh
/usr/sbin/zpool scrub tank
CTRL-D
# chmod a+x /etc/periodic/weekly/zfs_scrub
```
ZFS boot services¶
We need to configure ZFS to automatically mount tank on boot:
```
# rc-update add zfs-import boot
# rc-update add zfs-mount
# rc-update add zfs-zed
# rc-update add zfs-share
```
ZFS Basic Config¶
Set the tank pool to autoexpand=on, in case larger disks are added in the future:
# zpool set autoexpand=on tank
Set default compression on top level (sub levels will inherit)
# zfs set compression=lz4 tank
To see the status do the following
```
# zpool get autoexpand
NAME  PROPERTY    VALUE   SOURCE
tank  autoexpand  on      local
# zfs get compression
NAME  PROPERTY     VALUE     SOURCE
tank  compression  lz4       local
```
ZFS Xen Related¶
Create a top dir for Xen domU storage; as an example we use the name xen here.
# zfs create tank/xen
And we also need to be able to use zfshost disks as disks for other domUs. To enable this we need xendriverdomain.
Start xendriverdomain manually, and configure it to start automatically on boot:
```
# service xendriverdomain start
# rc-update add xendriverdomain
```
Update Your system¶
Get Your system into sync with possible changes in the Alpine repository
```
# apk update
fetch http://ftp.acc.umu.se/mirror/alpinelinux.org/v3.11/main/x86_64/APKINDEX.tar.gz
v3.11.2-51-g7cf8ea7952 [http://ftp.acc.umu.se/mirror/alpinelinux.org/v3.11/main]
OK: 5371 distinct packages available
# apk upgrade
OK: 642 MiB in 204 packages
#
```
Add swap to the zfshost¶
There seem to be quite a few trouble reports about swap on a zvol with ZFS on Linux, so we had better avoid using a zvol for swap for the time being. It seems stable on FreeBSD, but they recommend a much "simpler" setup than most ZFS-on-Linux howtos do for swap on a zvol.
NOTE: For the swap, we need to make the block size match the VM's system page size, which You can find with the command getconf PAGESIZE (to be used with the -b option below). Then we need to disable automatic snapshots for the swap (com.sun:auto-snapshot=false).
The FreeBSD way:
```
zfs create -V 2G -o org.freebsd:swap=on -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none <pool name>/swap
```
For Linux, and our case, that would be:
```
zfs create -V 8G -b 4k -o com.sun:auto-snapshot=false -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none <pool name>/zfshost-swap
```
But using checksum=off completely takes away the only reason to put swap on ZFS in the first place: bit rot protection. So I would suggest not turning off checksums.
We need to put the swap on the tank/xen/zfshost-swap area (so <pool name> above corresponds to tank/xen).
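Putting the pieces together, a possible create command for this zvol could look like the following sketch (it mirrors the Linux example above but leaves checksums at their default, per the note, and uses the 8G size and 4k block size discussed earlier):
```
# zfs create -V 8G -b 4k \
    -o com.sun:auto-snapshot=false \
    -o compression=off -o dedup=off \
    -o sync=disabled -o primarycache=none \
    tank/xen/zfshost-swap
```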
Then to complete the setup
```
# mkswap -f /dev/zvol/tank/xen/zfshost-swap
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=e84d527d-5bbb-4075-a377-6c7258c24633
# echo "/dev/zvol/tank/xen/zfshost-swap    none    swap    sw,discard    0 0" >> /etc/fstab
# swapon -a
# rc-update add swap boot
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/zd0                                partition       8388604 0       -2
# cat /proc/meminfo
MemTotal:        8112416 kB
MemFree:         7962448 kB
...
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Dirty:                 8 kB
...
#
```
NOTE: For maximum stability under memory pressure, it is probably much wiser to put the swap on an LVM LV:
# lvcreate -n zfshost-swap -L 8G vg_domU
and do the rest as above, but with /dev/vg_domU/zfshost-swap instead of /dev/zvol/tank/xen/zfshost-swap.
ZFS volumes for a new domU¶
When we want to create a new domU, we will need to do a bit of work on both zfshost and on the dom0.
On the zfshost¶
On zfshost we will need to create the relevant domU disks. (We use the domU DNS, which will provide the DNS service, as an example of a domU.)
NOTE: For the swap, we need to make the block size match the VM's system page size, which You can find with the command getconf PAGESIZE (to be used with the -b option below). Then we need to disable automatic snapshots for the swap (com.sun:auto-snapshot=false), and set the other attributes according to https://github.com/zfsonlinux/zfs/wiki/FAQ#using-a-zvol-for-a-swap-device.
```
# getconf PAGESIZE
4096
# zfs create -V 2G tank/xen/<domU-Service>-disk
# zfs create -V 512M -b 4k \
    -o logbias=throughput \
    -o sync=always \
    -o primarycache=metadata \
    -o com.sun:auto-snapshot=false \
    tank/xen/<domU-Service>-swap
```
With the DNS server as an example:
```
# getconf PAGESIZE
4096
# zfs create -V 2G tank/xen/dns-disk
# zfs create -V 512M -b 4k \
    -o logbias=throughput \
    -o sync=always \
    -o primarycache=metadata \
    -o com.sun:auto-snapshot=false \
    tank/xen/dns-swap
```
On the dom0¶
On the dom0 we need to use the backend parameter to indicate where the disks are located.
Apart from the backend parameter, the rest of the domU configuration file is the same as for a normal domU installation.
```
dom0 # grep backend /etc/xen/dnshost.cfg
    'backend=zfshost,phy:/dev/zvol/tank/xen/<domU-Service>-disk,xvda1,w',
    'backend=zfshost,phy:/dev/zvol/tank/xen/<domU-Service>-swap,xvda2,w',
```
Or, with dnshost as an example:
```
dom0 # grep backend /etc/xen/dnshost.cfg
    'backend=zfshost,phy:/dev/zvol/tank/xen/dns-disk,xvda1,w',
    'backend=zfshost,phy:/dev/zvol/tank/xen/dns-swap,xvda2,w',
```
Useful commands¶
Below are some simple, useful commands to check the ZFS status:
```
# zpool status
# zpool history
# zpool events
# zpool list
# zfs list
```