
# -*- ii: ii; -*-
#+TITLE: Raspberry Pi k3s cluster guide
#+AUTHOR: James Blair
#+EMAIL: mail@jamesblair.net
#+DATE: 24th December 2019
This file serves as a complete step by step guide for creating a bare metal raspberry pi kubernetes cluster using [[https://k3s.io/][k3s]] from [[https://rancher.com/][Rancher]].
My goal for this build is to replace a server I currently run at home that hosts several workloads via Docker with a scalable k3s cluster.
Additionally, in future I would like the cluster to be portable, operating via a 3G-5G cellular network and an array of batteries.
I chose k3s as it is incredibly lightweight while still being CNCF certified, production grade software optimised for the resource constraints of Raspberry Pis.
* Pre-requisites
** Cluster machines
For this guide I am using four [[https://www.pishop.us/product/raspberry-pi-4-model-b-4gb/][Raspberry Pi 4 4GB]] machines.
The cluster will have two leader nodes and two worker nodes.
For resiliency purposes I will consider updating the cluster in future to run with all nodes as leader nodes.
** Boot media
This guide requires each Raspberry Pi to have a removable SD card or other removable boot media. I am using four 32GB SD cards, though any USB drive or SD card at least 8GB in size should work fine.
*** TODO Migration to network booting
In future it would be preferable for the Raspberry Pis to network boot and set themselves up automatically without an SD card.
This is a nice to have that I will pursue at a later date, once I have a deployed cluster that lets me migrate off my current server setup.
* Step 1 - Prepare boot media for master
** Download the latest release
Our first step is to create the bootable SD card with a minimal install of [[https://www.raspbian.org/][Raspbian]], a free operating system based on [[https://www.debian.org/][Debian]] and optimised for Raspberry Pi hardware.
Rather than installing and configuring an OS image from scratch, I found [[https://github.com/FooDeas/raspberrypi-ua-netinst][this project]] on GitHub which automates the install and configuration process nicely.
#+NAME: Download the latest release zip
#+begin_src tmate
echo Downloading latest release zip from github
curl -s https://api.github.com/repos/foodeas/raspberrypi-ua-netinst/releases/latest \
| grep "browser_download_url.*zip" \
| cut -d : -f 2,3 \
| tr -d \" \
| wget -i -
echo Checking file is now present
ls -l | grep "\.zip"
echo Extracting the zip file
unzip -q -d installer *.zip
ls -l | grep installer
#+end_src
#+RESULTS: Download the latest release zip
#+begin_example
Downloading latest release zip from github
Checking file is now present
-rw-rw-rw- 1 james james 60299545 Aug 12 08:35 raspberrypi-ua-netinst-v2.4.0.zip
Extracting the zip file
drwxrwxrwx 1 james james 4096 Jan 20 11:12 installer
-rwxrwxrwx 1 james james 2863 Jan 10 17:04 installer-config.txt
#+end_example
** Apply custom install configuration
Our next step after downloading the latest release is to apply our own installation configuration using a simple txt file.
There is great documentation online showing what configuration options are available [[https://github.com/malignus/raspberrypi-ua-netinst/blob/master/doc/INSTALL_CUSTOM.md][here]].
For our purposes we just over-write the file downloaded and extracted in the previous step with one we have prepared earlier :)
#+NAME: Overwrite installer configuration file
#+begin_src tmate
echo Display wordcount of original file for comparison
wc installer/raspberrypi-ua-netinst/config/installer-config.txt
echo Overwriting /installer/raspberrypi-ua-netinst/config/installer-config.txt
cp installer-config.txt installer/raspberrypi-ua-netinst/config/
echo Display wordcount of file after copy to validate update
wc installer/raspberrypi-ua-netinst/config/installer-config.txt
#+end_src
#+RESULTS: Overwrite installer configuration file
#+begin_example
Display wordcount of original file for comparison
3 23 157 installer/raspberrypi-ua-netinst/config/installer-config.txt
Overwriting /installer/raspberrypi-ua-netinst/config/installer-config.txt
Display wordcount of file after copy to validate update
67 85 2863 installer/raspberrypi-ua-netinst/config/installer-config.txt
#+end_example
** Apply custom post install script
The final step is to supply a post install script which completes additional security hardening and production readiness automatically.
To supply a script we can provide an additional ~post-install.txt~ file as documented [[https://github.com/FooDeas/raspberrypi-ua-netinst/blob/devel/doc/INSTALL_ADVANCED.md][here]].
I have a hardening script prepared in this repository that we can copy in.
#+NAME: Copy in post-install script
#+begin_src tmate
echo Copying in post-install.txt
cp post-install.txt installer/raspberrypi-ua-netinst/config/
echo Display wordcount of file after copy to validate
wc installer/raspberrypi-ua-netinst/config/post-install.txt
#+end_src
#+RESULTS: Copy in post-install script
#+begin_example
Copying in post-install.txt
Display wordcount of file after copy to validate
98 282 3429 installer/raspberrypi-ua-netinst/config/post-install.txt
#+end_example
* Step 2 - Copy the install media to sd card
Our next step is to copy the contents of the ~installer/~ folder to *FAT32* formatted removable media, i.e. the SD card.
Unfortunately this is currently a Windows step, as my dev environment is a Windows 10 laptop running Debian via Windows Subsystem for Linux, which does not support ~lsblk~ or other disk management commands.
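If your environment is native Linux instead, the equivalent is possible with standard tools. A minimal sketch, assuming the card appears as ~/dev/sdX~ (a placeholder, double check against ~lsblk~ before formatting anything):
#+NAME: Format and copy on native linux
#+begin_src tmate
# Identify the card and DOUBLE CHECK the device name before proceeding
lsblk
# Create a FAT32 filesystem on the card's first partition (/dev/sdX1 is a placeholder)
sudo mkfs.vfat -F 32 -n SD /dev/sdX1
# Mount it, copy the installer across, then unmount
sudo mount /dev/sdX1 /mnt
sudo cp -r installer/* /mnt/
sudo umount /mnt
#+end_src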
** Obtain sd card partition information
Our first step is to insert the SD card and ensure it is formatted correctly as ~FAT32~. To do that we need to know the number of the disk we want to format; we can find that via PowerShell.
#+NAME: Get disks via windows powershell
#+begin_src tmate
echo Retrieving disk list via powershell
powershell.exe -nologo -command "get-disk | select Number, FriendlyName, Size"
#+end_src
#+NAME: Get partitions via windows powershell
#+begin_src tmate
echo Retrieving partition list via powershell
powershell.exe -nologo -command "get-disk | get-partition | select PartitionNumber, DriveLetter, Size, Type"
#+end_src
#+RESULTS: Get disks via windows powershell
#+begin_example
Retrieving disk list via powershell
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Debian\home\james\Documents\raspi-k3s> get-disk | select Number, FriendlyName, Size
Number FriendlyName                       Size
------ ------------                       ----
1      Realtek PCIE Card Reader    31104958464
0      SAMSUNG MZVLB256HAHQ-000H1 256060514304
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Debian\home\james\Documents\raspi-k3s>
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Debian\home\james\Documents\raspi-k3s> echo Retrieving partition list via powershell
Retrieving
partition
list
via
powershell
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Debian\home\james\Documents\raspi-k3s> get-disk | get-partition | select PartitionNumber, DriveLetter, Size, Type
PartitionNumber DriveLetter         Size Type
--------------- ----------- ------------ ----
1               D              134217728 FAT32
2                            30966546432 Unknown
1                              272629760 System
2                               16777216 Reserved
3               C           254735810560 Basic
4                             1027604480 Recovery
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Debian\home\james\Documents\raspi-k3s> exit
#+end_example
** Create and format sd card partition
Once we know the number of the disk we want to format we can proceed. In the example above I have a 32GB SD card, which shows as number ~1~.
Checking the disk we can see some partitions already exist from previous use of the card. To delete these partitions you can use the ~Remove-Partition -DiskNumber X -PartitionNumber Y~ command, where ~X~ and ~Y~ relate to the disk and partition numbers in your output.
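For example, clearing the two stale partitions on disk ~1~ above might look like the following. This is a sketch only; the disk and partition numbers come from my output and will almost certainly differ for you, and PowerShell will prompt for confirmation on each removal.
#+NAME: Remove existing sd card partitions
#+begin_src tmate
# DANGER: destroys data on the target disk, verify the numbers against your own output first
powershell.exe -nologo -command "remove-partition -disknumber 1 -partitionnumber 1; remove-partition -disknumber 1 -partitionnumber 2"
#+end_src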
Due to the risk of data loss this step is deliberately not automated. Once existing partitions have been cleared we can use the following block to:
- Create a new partition using the maximum available space
- Assign a free drive letter in windows
- Mount the disk in WSL so we can copy to it
- Copy the install media over to the partition
#+NAME: Create sd card partition
#+begin_src tmate
echo Use powershell to create new partition and format
powershell.exe -nologo -command "new-partition -disknumber 1 -usemaximumsize -driveletter e; format-volume -driveletter e -filesystem FAT32 -newfilesystemlabel sd"
#+end_src
#+NAME: Mount and copy the new media
#+begin_src tmate
echo Mount the new partition in wsl
sudo mkdir /mnt/e
sudo mount -t drvfs e: /mnt/e/
echo Copy the contents of installer to sd
cp -r installer/* /mnt/e/
# We need to wait before we can eject
sleep 5
sudo umount /mnt/e
sleep 5
echo Eject the sd card ready for use
powershell.exe -nologo -command "(new-object -comobject shell.application).namespace(17).parsename('E:').invokeverb('eject')"
#+end_src
* Step 3 - Boot the pi and remotely connect
Provided the configuration on the SD card is valid and the Pi was able to obtain an IP address via DHCP on boot, then following a 10-20 minute net install process the Pi will be online and accessible via ssh, using the private key corresponding to the public key we supplied in our ~installer-config.txt~ file.
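If you would rather watch for the node coming online than guess at the timing, a simple poll does the trick. A small sketch, the address being illustrative:
#+NAME: Wait for the pi to come online
#+begin_src tmate
# Ping once every ten seconds until the pi responds (192.168.1.124 is illustrative)
until ping -c 1 -W 1 192.168.1.124 > /dev/null 2>&1; do sleep 10; done
echo Pi is online
#+end_src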
** Port knock and enter
Now we can port knock and connect.
Note: There seems to be a short delay required between the port knocks being transmitted and ssh being able to connect, which is why a brief sleep is included in the knock and enter command.
#+NAME: Knock and enter
#+begin_src tmate
# Setup machine variables
export port=2124
export machineip=192.168.1.124
export knocksequence="[SEQUENCE HERE]"
# Gather ssh keys if not already known
ssh-keyscan -p $port $machineip >> ~/.ssh/known_hosts
# Knock and enter
knock $machineip $knocksequence && sleep 2 && ssh -p $port $machineip
#+end_src
* Step 4 - Configure distributed storage
One of the goals for this Raspberry Pi cluster is to run with distributed storage, rather than the traditional single device RAID array the server this cluster is replacing currently runs.
The reason I'm interested in this is primarily to explore options for greater hardware redundancy and reliability in the event that a node within the cluster goes down.
** Format and mount storage volumes
Now that our machines are online and we have connected, we can set up our storage cluster.
For a distributed storage cluster we are using [[https://www.gluster.org/][glusterfs]]. As part of our earlier setup gluster was automatically installed; we just need to configure it.
Our first step is to ensure the storage drives attached to our Raspberry Pis are formatted. In our case the drives all show as ~/dev/sda~ with no existing partitions; make sure you review your own situation with ~lsblk~ first and adjust the commands below as necessary!
#+NAME: Format and mount storage bricks
#+begin_src tmate
# Format the /dev/sda1 partition as xfs
sudo mkfs.xfs -i size=512 /dev/sda1
# Make the mount point directory
sudo mkdir -p /data/brick1
# Update fstab to ensure the mount will resume on boot
echo '/dev/sda1 /data/brick1 xfs defaults 1 2' | sudo tee -a /etc/fstab
# Mount the new filesystem now
sudo mount -a && sudo mount
#+end_src
** Configure firewall rules
The gluster processes on the nodes need to be able to communicate with each other. To simplify this setup, configure the [[https://en.wikipedia.org/wiki/Iptables][iptables]] firewall on each node to accept all traffic from the other node(s).
In our four node cluster this means ensuring we have rules present for all nodes. Adjust as necessary for the requirements of your cluster!
#+NAME: Setup firewall rules for inter cluster communication
#+begin_src tmate
# Add the firewall rules
sudo iptables -I INPUT -p all -s 192.168.1.122 -j ACCEPT
sudo iptables -I INPUT -p all -s 192.168.1.124 -j ACCEPT
sudo iptables -I INPUT -p all -s 192.168.1.126 -j ACCEPT
sudo iptables -I INPUT -p all -s 192.168.1.128 -j ACCEPT
# Ensure these are saved permanently
sudo netfilter-persistent save
#+end_src
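To confirm the rules took effect we can list the ~INPUT~ chain:
#+NAME: List firewall rules
#+begin_src tmate
# Show the INPUT chain with rule numbers for verification
sudo iptables -L INPUT -n --line-numbers
#+end_src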
** Ensure the daemon is running
Next we need to ensure the glusterfs daemon is enabled and started.
#+NAME: Ensure glusterd is enabled and running
#+begin_src tmate
# Ensure the gluster service starts on boot
sudo systemctl enable glusterd
# Start the gluster service now
sudo systemctl start glusterd
# Check the service status to confirm running
sudo systemctl status glusterd
#+end_src
** Test connectivity between peers
Now we're ready to test connectivity between all the gluster peers.
#+NAME: Complete cluster probes
#+begin_src tmate
# Complete the peer probes
sudo gluster peer probe 192.168.1.122
sudo gluster peer probe 192.168.1.124
sudo gluster peer probe 192.168.1.126
sudo gluster peer probe 192.168.1.128
# Validate the peer status
sudo gluster peer status
#+end_src
** Setup gluster volume
Provided connectivity was established successfully you are now ready to setup a gluster volume.
*Note:* The ~gluster volume create~ command only needs to be run on any one node.
#+NAME: Setup gluster volume
#+begin_src shell :wrap example
# Create the gluster volume folder (all nodes)
sudo mkdir -p /data/brick1/jammaraid
# Create the gluster volume itself (one node)
sudo gluster volume create jammaraid 192.168.1.122:/data/brick1/jammaraid 192.168.1.124:/data/brick1/jammaraid 192.168.1.126:/data/brick1/jammaraid 192.168.1.128:/data/brick1/jammaraid force
# Ensure the volume is started
sudo gluster volume start jammaraid
# Confirm the volume has been created
sudo gluster volume info
#+end_src
** Mount and use the new volume
Now that the gluster volume has been created and started we can mount it within each node so it is accessible for use :)
#+NAME: Mount the gluster volume
#+begin_src shell :wrap example
# Create the gluster volume mount point
sudo mkdir -p /media/raid
# Mount the volume
sudo mount -t glusterfs localhost:jammaraid /media/raid
#+end_src
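If you also want this mount to survive reboots, an fstab entry works here too, following the same pattern we used for the bricks. Note the ~_netdev~ option so mounting waits for networking to be up:
#+NAME: Persist the gluster volume mount
#+begin_src shell :wrap example
# Add an fstab entry so the gluster volume mounts on boot
echo 'localhost:jammaraid /media/raid glusterfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab
#+end_src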
* Step 5 - Create kubernetes cluster
Now we can begin installing [[http://k3s.io/][k3s]] on each of the cluster nodes, and then join them into one compute cluster. This will set us up to deploy workloads to that kubernetes cluster.
** Download k3s setup binary
Our first step is to download the latest ~k3s-armhf~ setup binary from GitHub. Repeat the steps below on each potential cluster node.
#+NAME: Knock and enter
#+begin_src tmate
# Setup machine variables
export port=2128
export machineip=192.168.1.128
export knocksequence="[SEQUENCE HERE]"
# Gather ssh keys if not already known
ssh-keyscan -p $port $machineip >> ~/.ssh/known_hosts
# Knock and enter
knock $machineip $knocksequence && sleep 2 && ssh -p $port $machineip
#+end_src
#+NAME: Download latest setup binary
#+begin_src tmate :wrap example
# Download the latest release dynamically
curl -s https://api.github.com/repos/rancher/k3s/releases/latest \
| grep "browser_download_url.*k3s-armhf" \
| cut -d : -f 2,3 \
| tr -d \" \
| wget -i -
# Make it executable
chmod +x k3s-armhf
# Leave the node
exit
#+end_src
** Initialise the cluster
Our next step is run only on the node that will operate as our cluster master. K3s provides an installation script that is a convenient way to install it as a service on systemd or openrc based systems. This script is available at https://get.k3s.io.
After running this installation:
- The ~k3s~ service will be configured to automatically restart after node reboots or if the process crashes or is killed.
- Additional utilities will be installed, including ~kubectl~, ~crictl~, ~ctr~, ~k3s-killall.sh~, and ~k3s-uninstall.sh~.
- A ~kubeconfig~ file will be written to ~/etc/rancher/k3s/k3s.yaml~ and the kubectl installed by K3s will automatically use it.
First, let's log in to our chosen master.
#+NAME: Knock and enter
#+begin_src tmate
# Setup machine variables
export port=2124
export machineip=192.168.1.124
export knocksequence="[SEQUENCE HERE]"
# Gather ssh keys if not already known
ssh-keyscan -p $port $machineip >> ~/.ssh/known_hosts
# Knock and enter
knock $machineip $knocksequence && sleep 2 && ssh -p $port $machineip
#+end_src
Once we have logged in we can run the install script.
#+NAME: Initialise the master node
#+begin_src tmate
curl -sfL https://get.k3s.io | sh -
#+end_src
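Once the script finishes it is worth confirming the new service came up cleanly; on a systemd based install like ours it is registered as ~k3s~:
#+NAME: Verify the k3s service
#+begin_src tmate
# Confirm the k3s service is active
sudo systemctl status k3s --no-pager
#+end_src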
With our master deployed by the installation script we can check ~kubectl~ to ensure it is listed as expected.
#+NAME: Check cluster nodes
#+begin_src tmate
# Check kubectl
sudo kubectl get nodes
# Obtain cluster token
sudo cat /var/lib/rancher/k3s/server/node-token
#+end_src
** Join worker nodes
Once we have established our cluster master we need to join workers into the cluster. To install on worker nodes and add them to the cluster, run the installation script with the ~K3S_URL~ and ~K3S_TOKEN~ environment variables set.
Repeat the steps below for each worker node, ensuring the node ~port~, ~machineip~ and ~knocksequence~ are set correctly.
#+NAME: Knock and enter
#+begin_src tmate
# Setup machine variables
export port=2128
export machineip=192.168.1.128
export knocksequence="[SEQUENCE HERE]"
# Gather ssh keys if not already known
ssh-keyscan -p $port $machineip >> ~/.ssh/known_hosts
# Knock and enter
knock $machineip $knocksequence && sleep 2 && ssh -p $port $machineip
#+end_src
#+NAME: Join worker
#+begin_src tmate
# Set environment variables
export K3S_URL=https://192.168.1.124:6443
export K3S_TOKEN=[TOKEN_HERE]
# Run the installation script
curl -sfL https://get.k3s.io | sh -
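# Optionally confirm the agent came up before leaving
# (on workers the installer registers the service as k3s-agent rather than k3s)
sudo systemctl status k3s-agent --no-pager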
# Leave the worker
exit
#+end_src
** Check the cluster status
Once all workers have been joined, let's hop back onto the master and confirm that all nodes are listed as expected.
#+NAME: Knock and enter
#+begin_src tmate
# Setup machine variables
export port=2124
export machineip=192.168.1.124
export knocksequence="[SEQUENCE HERE]"
# Gather ssh keys if not already known
ssh-keyscan -p $port $machineip >> ~/.ssh/known_hosts
# Knock and enter
knock $machineip $knocksequence && sleep 2 && ssh -p $port $machineip
#+end_src
#+NAME: Check cluster nodes
#+begin_src tmate
# Check kubectl
sudo kubectl get nodes
#+end_src
* Step 6 - Deploy a service
With our cluster now running we can take it for a spin! Let's deploy a simple service: figlet, which will take a body over HTTP on port 8080 and return an ASCII-formatted string.
We'll need to be logged into our cluster master to do this.
#+NAME: Create the service
#+begin_src tmate
cat <<EOF > openfaas-figlet-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: openfaas-figlet
  labels:
    app: openfaas-figlet
spec:
  type: NodePort
  ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      nodePort: 31111
  selector:
    app: openfaas-figlet
EOF
#+end_src
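The ~Service~ above only routes traffic; its selector needs matching pods to send requests to. Below is a minimal ~Deployment~ sketch to pair with it, after which we apply both manifests. The ~functions/figlet~ image and the replica count here are assumptions, adjust to suit.
#+NAME: Create the deployment and apply both manifests
#+begin_src tmate
cat <<EOF > openfaas-figlet-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openfaas-figlet
  labels:
    app: openfaas-figlet
spec:
  replicas: 2
  selector:
    matchLabels:
      app: openfaas-figlet
  template:
    metadata:
      labels:
        app: openfaas-figlet
    spec:
      containers:
        - name: openfaas-figlet
          image: functions/figlet:latest
          ports:
            - containerPort: 8080
EOF
# Apply the service and deployment to the cluster
sudo kubectl apply -f openfaas-figlet-svc.yaml -f openfaas-figlet-dep.yaml
#+end_src
Once the pods report ready you should be able to test from any node with something like ~curl http://localhost:31111 -d 'hello'~.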