How to install a cluster with NIS and NFS in Ubuntu 16.04

I often build computing clusters. A cluster consists of a set of computers that work together to solve one task. In a cluster you usually have an interface to access the cluster, a network that interconnects the nodes, and a set of tools to manage the cluster. The access interface is usually a node named the front-end, to which the users can SSH. The other nodes are usually named the working nodes (WN). Another common component is a shared filesystem that eases simple communication between the WN.

A very common set-up is to install a NIS server on the front-end so that the users can access the WN (e.g. using SSH) with the same credentials as on the front-end. NIS is still useful because it is very simple, and it integrates very well with NFS, which is commonly used to share a filesystem.

Installing all of this is easy, but also a bit tricky (especially NIS), and so this time I had to re-learn…

How to install a cluster with NIS and NFS in Ubuntu 16.04

We start from 3 nodes that have a fresh installation of Ubuntu 16.04. These nodes are in the network 10.0.0.0/24. Their names are hpcmd00 (10.0.0.35), hpcmd01 (10.0.0.36) and hpcmd02 (10.0.0.37). In this example, hpcmd00 will be the front-end node and the others will act as the working nodes.

Preparation

First of all, we update Ubuntu on all the nodes:

root@hpcmd00:~# apt-get update && apt-get -y dist-upgrade

Installing and configuring NIS

Install NIS in the Server

Now that the system is up to date, we install the NIS server on hpcmd00. It is very simple:

root@hpcmd00:~# apt-get install -y rpcbind nis

During the installation, we will be asked for the name of the domain (as in the next picture):

We have selected the name hpcmd.nis for our domain. It will be kept in the file /etc/defaultdomain. Anyway, we can change the name of the domain at any time by executing the next command:

root@hpcmd00:~# dpkg-reconfigure nis

And we will be prompted again for the name of the domain.

Now we need to adjust some parameters of the NIS server, which consists of editing the files /etc/default/nis and /etc/ypserv.securenets. In the first file we have to set the variable NISSERVER to the value “master”. In the second file (ypserv.securenets) we set which IP addresses are allowed to access the NIS service. In our case, we allow all the nodes in the subnet 10.0.0.0/24.

root@hpcmd00:~# sed -i 's/NISSERVER=.*$/NISSERVER=master/' /etc/default/nis
root@hpcmd00:~# sed 's/^\(0.0.0.0[\t ].*\)$/#\1/' -i /etc/ypserv.securenets
root@hpcmd00:~# echo "255.255.255.0 10.0.0.0" >> /etc/ypserv.securenets
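The format of each line in /etc/ypserv.securenets is “netmask network”: ypserv applies the netmask to the client's address and checks whether the result equals the network value. The next sketch (a hypothetical helper, not part of NIS) illustrates that check:

```shell
#!/bin/bash
# Illustration of how a "netmask network" line in /etc/ypserv.securenets
# matches a client: mask the client address and compare the result with
# the network value.
in_securenets() {
  IFS=. read -r i1 i2 i3 i4 <<< "$1"   # client IP
  IFS=. read -r m1 m2 m3 m4 <<< "$2"   # netmask
  masked="$(( i1 & m1 )).$(( i2 & m2 )).$(( i3 & m3 )).$(( i4 & m4 ))"
  [ "$masked" = "$3" ]                 # network value
}

in_securenets 10.0.0.36 255.255.255.0 10.0.0.0 && echo "allowed"   # prints "allowed"
in_securenets 192.168.1.5 255.255.255.0 10.0.0.0 || echo "denied"  # prints "denied"
```

With the line we added, any host in 10.0.0.0/24 (such as hpcmd01 at 10.0.0.36) can query the NIS maps.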

Now we include the name of the server in the /etc/hosts file, so that the server is able to resolve its own IP address, and then we initialize the NIS service. As we have only one master server, we just include its name and let the initialization proceed.

root@hpcmd00:~# echo "10.0.0.35 hpcmd00" >> /etc/hosts
root@hpcmd00:~# /usr/lib/yp/ypinit -m
At this point, we have to construct a list of the hosts which will run NIS
servers. hpcmd00 is in the list of NIS server hosts. Please continue to add
the names for the other hosts, one per line. When you are done with the
list, type a <control D>.
 next host to add: hpcmd00
 next host to add: 
The current list of NIS servers looks like this:
hpcmd00
Is this correct? [y/n: y] y
We need a few minutes to build the databases...
Building /var/yp/hpcmd.nis/ypservers...
Running /var/yp/Makefile...
make[1]: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
...
Updating shadow.byname...
make[1]: Leaving directory '/var/yp/hpcmd.nis'

hpcmd00 has been set up as a NIS master server.

Now you can run ypinit -s hpcmd00 on all slave server.

Finally, we export the users of our system by issuing the next command:

root@hpcmd00:~# make -C /var/yp/

Take into account that every time you create a new user in the front-end, you need to export the users by issuing the make -C /var/yp command. So it is advisable to create a cron task that runs that command, to make sure that the users are exported.

root@hpcmd00:~# cat > /etc/cron.hourly/ypexport <<\EOT
#!/bin/sh
make -C /var/yp
EOT
root@hpcmd00:~# chmod +x /etc/cron.hourly/ypexport

The users in NIS

When issuing the make… command, you export the users that have a user ID of 1000 or above. If you want to change that threshold, you can adjust the parameters in the file /var/yp/Makefile.

In particular, you can change the variables MINUID and MINGID to match your needs.

In the default configuration, the users with ID 1000 and above are exported because 1000 is the ID of the first user created in the system.
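The following sketch mimics what the Makefile's MINUID filter does (the sample passwd data is made up for illustration): only entries whose numeric UID (third field) reaches MINUID end up in the NIS maps.

```shell
#!/bin/bash
# Illustrative only: filter passwd-style entries by MINUID, as the
# /var/yp/Makefile does when building the NIS maps.
MINUID=1000
cat > /tmp/sample_passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/bash
EOF
# Only alice and bob (UID >= 1000) would be exported
awk -F: -v min="$MINUID" '$3 >= min { print $1 }' /tmp/sample_passwd
```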

Install the NIS clients

Now that we have installed the NIS server, we can proceed to install the NIS clients. In this example we install hpcmd01, but the procedure is the same for all the nodes.

First install NIS using the next command:

root@hpcmd01:~# apt-get install -y rpcbind nis

As with the server, you will be prompted for the name of the domain. In our case it is hpcmd.nis, because we set that name in the server.

root@hpcmd01:~# echo "domain hpcmd.nis server hpcmd00" >> /etc/yp.conf 
root@hpcmd01:~# sed -i 's/compat$/compat nis/g;s/dns$/dns nis/g' /etc/nsswitch.conf 
root@hpcmd01:~# systemctl restart nis
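After the sed command, the relevant lines of /etc/nsswitch.conf on the client should look similar to this (a sketch; the exact service list may differ in your installation):

```
passwd:         compat nis
group:          compat nis
shadow:         compat nis
hosts:          files mdns4_minimal [NOTFOUND=return] dns nis
```

You can verify that the client can reach the server with ypwhich (it should print hpcmd00) or by listing the exported users with ypcat passwd.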

Fix the rpcbind bug in Ubuntu 16.04

At this point the NIS services (both in the server and the clients) are ready to be used, but… WARNING: the rpcbind package needed by NIS has a bug in Ubuntu, and as soon as you reboot any of your systems, rpcbind will be dead and so NIS will not work. You can check it by issuing the next command:

root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
 Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
 Drop-In: /run/systemd/generator/rpcbind.service.d
 └─50-rpcbind-$portmap.conf
 Active: inactive (dead)

Here you can see that it is inactive. And if you start it by hand, it will be properly running:

root@hpcmd00:~# systemctl start rpcbind
root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
 Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
 Drop-In: /run/systemd/generator/rpcbind.service.d
 └─50-rpcbind-$portmap.conf
 Active: active (running) since Fri 2017-05-12 12:57:00 CEST; 1s ago
 Main PID: 1212 (rpcbind)
 Tasks: 1
 Memory: 684.0K
 CPU: 8ms
 CGroup: /system.slice/rpcbind.service
 └─1212 /sbin/rpcbind -f -w
May 12 12:57:00 hpcmd00 systemd[1]: Starting RPC bind portmap service...
May 12 12:57:00 hpcmd00 rpcbind[1212]: rpcbind: xdr_/run/rpcbind/rpcbind.xdr: failed
May 12 12:57:00 hpcmd00 rpcbind[1212]: rpcbind: xdr_/run/rpcbind/portmap.xdr: failed
May 12 12:57:00 hpcmd00 systemd[1]: Started RPC bind portmap service.

There are some patches, and it seems that it will be solved in newer versions. But for now, we include a very simple workaround that consists in adding the next lines to the file /etc/rc.local, just before the “exit 0” line:

systemctl restart rpcbind
systemctl restart nis

Now if you reboot your system, the rpcbind service will be properly running.

WARNING: this needs to be done in all the nodes.
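The resulting /etc/rc.local should look similar to this (a sketch; your file may contain other commands before the workaround):

```
#!/bin/sh -e
#
# rc.local
#
# Workaround for the rpcbind bug in Ubuntu 16.04: restart both
# services at the end of the boot process.
systemctl restart rpcbind
systemctl restart nis

exit 0
```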

Installing and configuring NFS

We configure NFS in a very straightforward way. If you need more security or other features, you should dig into the NFS configuration options to adapt them to your deployment.

In particular, we share the /home folder in hpcmd00 to make it available to the WN. Then the users will have their files available at each node. I followed the instructions at this blog post.

Sharing /home at front-end

In order to install NFS in the server, you just need to issue the next command

root@hpcmd00:~# apt-get install -y nfs-kernel-server

And to share the /home folder, you just need to add a line to the /etc/exports file

root@hpcmd00:~# cat >> /etc/exports << \EOF
/home hpcmd*(rw,sync,no_root_squash,no_subtree_check)
EOF

There are a lot of options to share a folder using NFS, but we just use some of them that are common for a /home folder. Take into account that you can restrict the hosts to which the folder is shared using their names (as in our case: hpcmd*) or using IP addresses. Note that you can use wildcards such as “*”.
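For example, these are some alternative /etc/exports lines (hypothetical variants, shown only to illustrate the host-matching syntax):

```
# A single host, read-only
/home hpcmd01(ro,sync,no_subtree_check)
# A whole subnet, by CIDR address
/home 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
```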

Finally you need to restart the NFS daemon, and you will be able to verify that the exports are ready.

root@hpcmd00:~# service nfs-kernel-server restart
root@hpcmd00:~# showmount -e localhost
Export list for localhost:
/home hpcmd*

Mount the /home folder in the WN

In order to be able to use NFS endpoints, you just need to run the next command on each node:

root@hpcmd01:~# apt-get install -y nfs-common

Now you will be able to list the folders shared at the server

root@hpcmd01:~# showmount -e hpcmd00
Export list for hpcmd00:
/home hpcmd*

At this moment it is possible to mount the /home folder just issuing a command like

root@hpcmd01:~# mount -t nfs hpcmd00:/home /home

But we’d prefer to add a line to the /etc/fstab file. Using this approach, the mount will be available at boot time. In order to do so, we add the proper line:

root@hpcmd01:~# cat >> /etc/fstab << \EOT
hpcmd00:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
EOT
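As a reference, this is what each part of that fstab line means (the option set is just a common choice for NFS home directories, not the only valid one):

```
# hpcmd00:/home    the remote export (server:path)
# /home            the local mount point
# nfs              the filesystem type
# auto,nofail      mount at boot, but do not block booting if the server is down
# noatime          do not update access times (reduces NFS traffic)
# nolock,intr,tcp  skip NLM locking, allow interrupting stalled operations, use TCP
# actimeo=1800     cache file attributes for 30 minutes
hpcmd00:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
```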

Now you can also issue the following command to start using your share without the need of rebooting:

root@hpcmd01:~# mount /home/

Verification

On the hpcmd00 node you can create a user, and verify that its home folder has been created:

root@hpcmd00:~# adduser testuser
Adding user `testuser' ...
Adding new group `testuser' (1002) ...
Adding new user `testuser' (1002) with group `testuser' ...
...
Is the information correct? [Y/n] Y
root@hpcmd00:~# ls -l /home/
total 4
drwxr-xr-x 2 testuser testuser 4096 May 15 10:06 testuser

If you try to SSH into the internal nodes, it will fail (the user will not be available there), because the user has not been exported yet:

root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password: 
Permission denied, please try again.

But the home folder for that user is already available in these nodes (because the folder is shared using NFS).

Once we export the users at hpcmd00, the user will be available in the domain and we will be able to SSH into the WN using that user:

root@hpcmd00:~# make -C /var/yp/
make: Entering directory '/var/yp'
make[1]: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating netid.byname...
make[1]: Leaving directory '/var/yp/hpcmd.nis'
make: Leaving directory '/var/yp'
root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password: 
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-77-generic x86_64)

testuser@hpcmd01:~$ pwd
/home/testuser