I am used to creating computing clusters. A cluster consists of a set of computers that work together to solve one task. A cluster usually has an interface to access it, a network that interconnects the nodes, and a set of tools to manage the cluster. The access interface is usually a node named the front-end, to which the users can SSH. The other nodes are usually called the working nodes (WN). Another common component is a shared filesystem that eases simple communication between the WN.
A very common set-up is to install a NIS server on the front-end so that the users can access the WN (e.g. using SSH) with the same credentials as on the front-end. NIS is still useful because it is very simple and it integrates very well with NFS, which is commonly used to share a filesystem.
Installing all of this used to be easy, but it is also a bit tricky (especially NIS), and so this time I had to re-learn…
How to install a cluster with NIS and NFS in Ubuntu 16.04
We start from 3 nodes that have a fresh installation of Ubuntu 16.04. These nodes are in the network 10.0.0.0/24. Their names are hpcmd00 (10.0.0.35), hpcmd01 (10.0.0.36) and hpcmd02 (10.0.0.37). In this example, hpcmd00 will be the front-end node and the others will act as the working nodes.
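Since NIS and NFS refer to the nodes by name, it is convenient that every node can resolve the names of the others. A minimal sketch of the /etc/hosts entries for this set-up (assuming the names and IPs above, and that you do not already have internal DNS) that can be appended on every node:

```shell
# Convenience only: let every node resolve the names of all cluster nodes.
# Run this on each of the three nodes (skip entries that already exist).
cat >> /etc/hosts << \EOT
10.0.0.35 hpcmd00
10.0.0.36 hpcmd01
10.0.0.37 hpcmd02
EOT
```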
First of all, we update Ubuntu on all the nodes:
root@hpcmd00:~# apt-get update && apt-get -y dist-upgrade
Installing and configuring NIS
Install NIS on the Server
Now that the system is up to date, we install the NIS server on hpcmd00. It is very simple:
root@hpcmd00:~# apt-get install -y rpcbind nis
During the installation, we will be asked for the name of the domain (as in the next picture):
We have selected the name hpcmd.nis for our domain. It will be kept in the file /etc/defaultdomain. Anyway, we can change the name of the domain at any time by executing the next command:
root@hpcmd00:~# dpkg-reconfigure nis
And we will be prompted again for the name of the domain.
Now we need to adjust some parameters of the NIS server, which consists in editing the files /etc/default/nis and /etc/ypserv.securenets. In the first file we have to set the variable NISSERVER to the value “master”. In the second file (ypserv.securenets) we set which IP addresses are allowed to access the NIS service. In our case, we allow all the nodes in the subnet 10.0.0.0/24.
root@hpcmd00:~# sed -i 's/NISSERVER=.*$/NISSERVER=master/' /etc/default/nis
root@hpcmd00:~# sed 's/^\(0.0.0.0[\t ].*\)$/#\1/' -i /etc/ypserv.securenets
root@hpcmd00:~# echo "255.255.255.0 10.0.0.0" >> /etc/ypserv.securenets
Now we include the name of the server in the /etc/hosts file, so that the server is able to resolve its own IP address, and then we initialize the NIS service. As we have only one master server, we just include its name and let the initialization proceed.
root@hpcmd00:~# echo "10.0.0.35 hpcmd00" >> /etc/hosts
root@hpcmd00:~# /usr/lib/yp/ypinit -m

At this point, we have to construct a list of the hosts which will run NIS
servers.  hpcmd00 is in the list of NIS server hosts.  Please continue to add
the names for the other hosts, one per line.  When you are done with the
list, type a <control D>.
        next host to add:  hpcmd00
        next host to add:
The current list of NIS servers looks like this:

hpcmd00

Is this correct?  [y/n: y]  y
We need a few minutes to build the databases...
Building /var/yp/hpcmd.nis/ypservers...
Running /var/yp/Makefile...
make: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
...
Updating shadow.byname...
make: Leaving directory '/var/yp/hpcmd.nis'

hpcmd00 has been set up as a NIS master server.

Now you can run ypinit -s hpcmd00 on all slave server.
Finally we are exporting the users of our system by issuing the next command:
root@hpcmd00:~# make -C /var/yp/
Take into account that every time you create a new user in the front-end, you need to export the users by issuing the make -C /var/yp command. So it is advisable to create a cron task that runs that command, to make sure that the users are exported.
root@hpcmd00:~# cat > /etc/cron.hourly/ypexport <<\EOT
#!/bin/sh
make -C /var/yp
EOT
root@hpcmd00:~# chmod +x /etc/cron.hourly/ypexport
The users in NIS
When issuing the make… command, you are exporting the users that have an identifier of 1000 and above. If you want to change this, you can adjust the parameters in the file /var/yp/Makefile.
In particular, you can change the variables MINUID and MINGID to match your needs.
In the default configuration, the users with id 1000 and above are exported because 1000 is the id of the first user created in the system.
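As an illustration, a minimal sketch that lowers both thresholds to 500 (a hypothetical value, assuming the stock /var/yp/Makefile where MINUID and MINGID default to 1000) and rebuilds the maps:

```shell
# Hypothetical example: export users/groups with ids >= 500 instead of >= 1000.
# Assumes the default Ubuntu /var/yp/Makefile, where these variables are defined
# at the top of the file as MINUID=1000 and MINGID=1000.
sed -i 's/^MINUID=.*/MINUID=500/' /var/yp/Makefile
sed -i 's/^MINGID=.*/MINGID=500/' /var/yp/Makefile

# Rebuild the NIS maps so the new thresholds take effect.
make -C /var/yp
```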
Install the NIS clients
Now that we have installed the NIS server, we can proceed to install the NIS clients. In this example we install hpcmd01, but the procedure is the same for all the nodes.
First install NIS using the next command:
root@hpcmd01:~# apt-get install -y rpcbind nis
As with the server, you will be prompted for the name of the domain. In our case it is hpcmd.nis, because that is the name we set in the server. Then we point the client to our server and include NIS in the name-service lookups:
root@hpcmd01:~# echo "domain hpcmd.nis server hpcmd00" >> /etc/yp.conf
root@hpcmd01:~# sed -i 's/compat$/compat nis/g;s/dns$/dns nis/g' /etc/nsswitch.conf
root@hpcmd01:~# systemctl restart nis
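As a quick sanity check (assuming the server is up and already exports its users), you can verify from the client that it is bound to the right NIS server and that the user maps are visible:

```shell
# ypwhich prints the NIS server this client is bound to (it should print hpcmd00).
ypwhich

# ypcat dumps the passwd map exported through NIS; the front-end users
# (id 1000 and above, by default) should appear here.
ypcat passwd
```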
Fix the rpcbind bug in Ubuntu 16.04
At this point the NIS services (both in the server and the clients) are ready to be used, but… WARNING: the rpcbind package needed by NIS has a bug in Ubuntu, and when you reboot any of your systems, rpcbind is dead and so NIS will not work. You can check it by issuing the next command:
root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
   Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
  Drop-In: /run/systemd/generator/rpcbind.service.d
           └─50-rpcbind-$portmap.conf
   Active: inactive (dead)
Here you can see that it is inactive. But if you start it by hand, it will run properly:
root@hpcmd00:~# systemctl start rpcbind
root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
   Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
  Drop-In: /run/systemd/generator/rpcbind.service.d
           └─50-rpcbind-$portmap.conf
   Active: active (running) since Fri 2017-05-12 12:57:00 CEST; 1s ago
 Main PID: 1212 (rpcbind)
    Tasks: 1
   Memory: 684.0K
      CPU: 8ms
   CGroup: /system.slice/rpcbind.service
           └─1212 /sbin/rpcbind -f -w

May 12 12:57:00 hpcmd00 systemd: Starting RPC bind portmap service...
May 12 12:57:00 hpcmd00 rpcbind: rpcbind: xdr_/run/rpcbind/rpcbind.xdr: failed
May 12 12:57:00 hpcmd00 rpcbind: rpcbind: xdr_/run/rpcbind/portmap.xdr: failed
May 12 12:57:00 hpcmd00 systemd: Started RPC bind portmap service.
There are some patches, and it seems that this will be solved in newer versions. But for now, we include a very simple workaround that consists in adding the next lines to the file /etc/rc.local, just before the “exit 0” line:
systemctl restart rpcbind
systemctl restart nis
Now if you reboot your system, the rpcbind service will be running properly.
WARNING: this needs to be done in all the nodes.
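A sketch of how those lines could be added automatically on every node (assuming the stock /etc/rc.local, which ends with an "exit 0" line, and GNU sed, which accepts \n in the text of a one-line insert):

```shell
# Insert the two restart commands right before the final "exit 0" of /etc/rc.local,
# so that they run on every boot.
sed -i '/^exit 0/i systemctl restart rpcbind\nsystemctl restart nis' /etc/rc.local
```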
Installing and configuring NFS
We configure NFS in a very straightforward way. If you need more security or other features, you should dig into the NFS configuration options to adapt it to your deployment.
In particular, we share the /home folder of hpcmd00 so that it is available to the WN. Then the users will have their files available on each node. I followed the instructions in this blog post.
Sharing /home at front-end
In order to install NFS in the server, you just need to issue the next command:
root@hpcmd00:~# apt-get install -y nfs-kernel-server
And to share the /home folder, you just need to add a line to the /etc/exports file:
root@hpcmd00:~# cat >> /etc/exports << \EOF
/home hpcmd*(rw,sync,no_root_squash,no_subtree_check)
EOF
There are a lot of options when sharing a folder using NFS; we are just using some that are common for a /home folder. Take into account that you can restrict the hosts to which the folder is shared using their names (that is our case: hpcmd*) or using IP addresses. Note that you can use wildcards such as “*”.
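As an illustration (hypothetical entries, not part of this set-up), the same folder could instead be restricted by IP subnet, or exported with different options per host:

```shell
# Hypothetical /etc/exports variants -- do NOT add these for this set-up:

# Restrict the export to the cluster subnet instead of using hostnames:
/home 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)

# Per-host options: read-write for hpcmd01, read-only for hpcmd02:
/home hpcmd01(rw,sync) hpcmd02(ro,sync)
```

After editing /etc/exports, the new entries take effect once you restart the NFS server (or run exportfs -ra).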
Finally, you need to restart the NFS daemon, and then you can verify that the exports are ready.
root@hpcmd00:~# service nfs-kernel-server restart
root@hpcmd00:~# showmount -e localhost
Export list for localhost:
/home hpcmd*
Mount the /home folder in the WN
In order to be able to use NFS endpoints, you just need to run the next command on each node:
root@hpcmd01:~# apt-get install -y nfs-common
Now you will be able to list the folders shared by the server:
root@hpcmd01:~# showmount -e hpcmd00
Export list for hpcmd00:
/home hpcmd*
At this moment it is possible to mount the /home folder just by issuing a command like:
root@hpcmd01:~# mount -t nfs hpcmd00:/home /home
But we’d prefer to add a line to the /etc/fstab file. Using this approach, the mount will be available at boot time. To do so, we add the proper line:
root@hpcmd01:~# cat >> /etc/fstab << \EOT
hpcmd00:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
EOT
Now you can also issue the following command to start using your share without needing to reboot:
root@hpcmd01:~# mount /home/
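To double-check that /home is now served from the front-end rather than from the local disk, something like this can be used on the client:

```shell
# findmnt shows the source and filesystem type of a mount point; for this
# set-up it should report hpcmd00:/home as SOURCE and nfs (or nfs4) as FSTYPE.
findmnt /home
```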
At the hpcmd00 node you can create a user and verify that its home folder has been created:
root@hpcmd00:~# adduser testuser
Adding user `testuser' ...
Adding new group `testuser' (1002) ...
Adding new user `testuser' (1002) with group `testuser' ...
...
Is the information correct? [Y/n] Y
root@hpcmd00:~# ls -l /home/
total 4
drwxr-xr-x 2 testuser testuser 4096 May 15 10:06 testuser
If you try to SSH into the internal nodes, it will fail (the user will not be available), because the user has not been exported yet:
root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password:
Permission denied, please try again.
But the home folder for that user is already available on these nodes (because the folder is shared using NFS).
Once we export the users at hpcmd00, the user will be available in the domain and we will be able to SSH into the WN as that user:
root@hpcmd00:~# make -C /var/yp/
make: Entering directory '/var/yp'
make: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating netid.byname...
make: Leaving directory '/var/yp/hpcmd.nis'
make: Leaving directory '/var/yp'
root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password:
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-77-generic x86_64)
testuser@hpcmd01:~$ pwd
/home/testuser