Once in a while I end up facing problems caused by the most disgusting global variable of all: time. Time cannot be mocked easily. You'll have features based on time. You'll have reports based on time. You'll have to make the right decisions at the most appropriate time, and so on. I could - and probably will - start a series of posts about time and its presence in software development. This post is dedicated to distributed build systems. I have a setup of a master Jenkins server, and several virtual and physical boxes around the globe as slaves. Oh. One more thing about time. It's relative.
If you're locked in a windowless room with only one device capable of measuring time, can you be sure that what you see is the time as we know it? Guess what! Our computers have only got a single clock. My conclusion: every box can have its own relative time, which is bad news for cloud-based (=stored across multiple instances) applications. It leads to strange errors, like a Jenkins master declaring a fresh report from a slave outdated.
We've evolved so much towards shiny applications, that we have forgotten about the basic ideas which led to the evolution of the world-wide web. I remember when I first encountered the Internet, I usually pinged one host which I hoped to be on-line 24/7. This was: time.kfki.hu. A time server! NTP is a protocol designed to synchronize the clocks of computers over a network. The first NTP implementation started around 1980 with an accuracy of only several hundred milliseconds!
So wouldn't it be nice if all boxes synced from the same provider? But what happens if your crucial information is corrupted by a mistake of that third-party service? You wouldn't like that, would you? The great part follows: You can have your Jenkins master server act as a time provider for all connected devices. This could mean user PCs, test boxes and even the inflatable pirate's controller which is hooked on the notification system.
The implementation
The Server
I'll show an Ubuntu Natty configuration, but it should be pretty much the same on every distribution. The first thing you'd want to do is to remove the ntpdate application from all instances. It's not harmful, but it can cause minor turbulence.
apt-get remove ntpdate
Then, you need to install the NTP daemon on every instance:
apt-get install ntp
On the time server instance, open up and edit /etc/ntp.conf. The existing contents of the file can be left as they are, but you need to add some lines. Under the list of servers, add a reference to the instance's own local clock. This will serve as a fallback if your network should fail.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Next, you need to allow your clients to use the synchronization service. Suppose you have a network with 192.168.2.xxx IP addresses, then you'll have to add:
restrict 192.168.2.0 mask 255.255.255.0 nomodify notrap
I prefer to insert it before the line: restrict 127.0.0.1. You need to insert such a line for every network you plan to serve.
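If you serve more than a couple of networks, typing those lines by hand gets error-prone. Here is a minimal sketch of a helper that prints one restrict line per network. The function name restrict_lines and the network/netmask argument format are my own invention for illustration; the flags simply mirror the line above.

```shell
# Print one ntp.conf restrict line per "network/netmask" argument.
# The nomodify/notrap flags mirror the line used in this post.
restrict_lines() {
    for net in "$@"; do
        # ${net%/*} is the part before the slash, ${net#*/} the part after it.
        printf 'restrict %s mask %s nomodify notrap\n' "${net%/*}" "${net#*/}"
    done
}
```

Calling it as restrict_lines 192.168.2.0/255.255.255.0 prints the exact line shown above. Paste the output into /etc/ntp.conf yourself; the helper does not touch the file.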
Once you are done, save the file and restart the service:
/etc/init.d/ntp restart
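After the restart it is worth checking that the daemon picked up the local clock fallback. A small sketch, assuming you feed it the peer list printed by ntpq -p (or ntpq -pn); the function name has_local_clock is made up for this post.

```shell
# Succeeds if the peer list read from stdin contains the local clock
# fallback (shown as 127.127.1.0 with -n, or LOCAL(0) without it).
has_local_clock() {
    grep -Eq '127\.127\.1\.0|LOCAL\(0\)'
}
```

On the server, ntpq -pn | has_local_clock && echo "fallback configured" should print the message if the two lines from above made it into the running configuration.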
The Client
So your network now has a time provider, and you need to get your boxes to sync with it. If you carefully followed along, you already have the ntp package installed. Open up /etc/ntp.conf and edit as follows.
Remove every single server line, and add only those which are relevant for your box. If your time server is on 192.168.1.123, then leave a single
server 192.168.1.123
line in the list.
Right before the line: restrict 127.0.0.1, enter two more restrictions. The first will prevent access to the ntp services of this machine, the second will back up the localhost serving of the time.
restrict default notrust nomodify nopeer
restrict <external ip of the current box>
Of course, you have to replace "<external ip of the current box>" with the external ip of the current box.
Once you are done, save the file and restart the service:
/etc/init.d/ntp restart
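To see whether a client actually syncs with your time server, look at the peer list from ntpq -pn: the peer the daemon currently follows is marked with a leading asterisk, and the offset column is in milliseconds. Below is a rough sketch that extracts both; the name selected_peer is hypothetical, and it assumes the usual ten-column layout of the peer list.

```shell
# Read `ntpq -pn` output on stdin and print the peer marked with '*'
# (the one ntpd currently syncs to) plus its offset in milliseconds.
# Exits non-zero when no peer has been selected yet.
selected_peer() {
    awk '/^\*/ { sub(/^\*/, "", $1); print $1, $9; found = 1 }
         END   { exit (found ? 0 : 1) }'
}
```

Run it as ntpq -pn | selected_peer on the client. Expect it to fail right after a restart, since the daemon needs a few polls before it selects a peer.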
Debugging
The ntp software is anything but verbose. If you want to see what's going on, then stop the service with
/etc/init.d/ntp stop
and run
ntpd -d
Thanks for the tip Marton. Looks like we have our time synch problems resolved now.