At the end of June, a rare event will occur. Because of a barely discernible slowing of the Earth's rotation, a leap second will be added to Coordinated Universal Time (UTC), the time standard that the Network Time Protocol (NTP) distributes, to keep clocks synchronized with the slowly lengthening solar day.
That might seem like a simple thing to do, but as John Engates, CTO at Rackspace, said when asked about keeping computer clocks synchronized: "Time gets complicated fast." Not everyone will add the leap second in the same way, or at the same time. Some organizations, including Google, will do it their own way.
In fact, as you stick with us through this article, you may be longing for Morgan Freeman to step in and break it down for you, as he does for the Science Channel series Through the Wormhole. We can't give you Freeman, but we can give you all you need to know about why the leap second really matters to IT.
The original author of NTP, Prof. David Mills at the University of Delaware, devised a direct and simple way to add the second: Count the last second of June 30 twice, marking the repeat with a special notation for the record.
Google has devised what it calls a "clever" way to do it: adding bits of a second throughout the day on June 30, so that there's no jarring last-second adjustment to clocks. It calls its method the "Google Smear."
"We have a clever way of handling leap seconds," wrote Google site reliability engineers Noah Maxwell and Michael Rothwell in a blog post published May 21. "Instead of repeating a second, we 'smear' away the extra second." Over a 20-hour period on June 30, Google adds a couple of milliseconds to each of its NTP servers' updates. By the end of the day, a full second has been added. Though their methods differ, NTP and Google's timekeepers agree on the time as they enter the first second of July.
Traditionalists -- some might call them purists -- like Harlan Stenn, chief maintainer of the Network Time Protocol, don't like the smear. "At noon on June 30, clocks (of smear implementers) will be off by a half second," he said in a February interview with InformationWeek about the intricacies of computer timekeeping. That's a lot in terms of precision time -- it might as well be a decade.
As the day wears on, processes based on precise timing, such as the amount of time a valve opens to add a chemical to a mix, will be off by more than a half-second if they rely on the Google smear. "What if you're getting radiation treatment? Do you want your radiation dose to be off by a half-second or more?" asked Stenn.
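To put rough numbers on the smear, here is a minimal sketch that assumes a simple linear ramp across the 20-hour window Google uses. Spreading one second over 72,000 seconds means the smeared clock drifts about 13.9 microseconds (roughly 14 parts per million) per real second, and it is half a second off at the midpoint of the window. The linear shape and the names `smear_offset` and `WINDOW_SECONDS` are illustrative assumptions, not Google's published code.

```python
# Minimal sketch of a *linear* leap smear (an assumption; the exact shape
# and placement of Google's window may differ). All times are in seconds.

WINDOW_SECONDS = 20 * 60 * 60  # 20-hour smear window
LEAP_SECONDS = 1.0             # one inserted leap second


def smear_offset(elapsed: float, window: float = WINDOW_SECONDS) -> float:
    """Return how far (in seconds) a smeared clock lags an unsmeared one
    `elapsed` seconds after the smear window opens."""
    if elapsed <= 0:
        return 0.0
    if elapsed >= window:
        return LEAP_SECONDS
    return LEAP_SECONDS * elapsed / window


# Lag accumulated per real second: about 13.9 microseconds (~13.9 ppm).
rate = LEAP_SECONDS / WINDOW_SECONDS

if __name__ == "__main__":
    print(f"drift per second: {rate * 1e6:.1f} microseconds")
    for hours in (0, 5, 10, 15, 20):
        print(f"{hours:2d} h into window: {smear_offset(hours * 3600):.2f} s behind")
```

With a ramp like this, no clock ever steps backward or repeats a second; the cost, and the core of Stenn's objection, is that smeared clocks disagree with standard NTP time by up to a full second while the window is open.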
The last time a second needed to be added to the day was on June 30, 2012. For Qantas Airways in Australia, it was a memorable event: its systems, including flight reservations, went down for two hours as internal system clocks fell out of synch with external clocks. Before 2012, a second was added on Dec. 31, 2008, and also in 2005. The process started in 1972, when UTC was set 10 seconds behind International Atomic Time (TAI), and by the end of the day on June 30, 2015, a total of 26 leap seconds will have been added, leaving UTC 36 seconds behind TAI.
Agencies such as the US Naval Observatory in Washington and the Royal Observatory in Greenwich, UK, are known for measuring the solar day, and the International Earth Rotation and Reference Systems Service (IERS) announces when a second needs to be added. But no one supervises how the addition is made to all computer systems.
NTP does it this way because it coordinates millions of computers built on the Posix standard, a late-1980s relic meant to resolve differences among various Unix brands. Linux and Windows have since adopted Posix standards. Posix dictates that there are exactly 86,400 seconds in the day, every day, no more, no less. Simply adding an 86,401st second to June 30, and counting it in NTP, would throw the count permanently out of synch for all Posix-based computers relying on NTP time servers.
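Python's standard `calendar.timegm` uses the same arithmetic Posix prescribes (days times 86,400, plus hours, minutes, and seconds), so it shows the problem neatly: a leap-second time stamp of 23:59:60 on June 30, 2015, lands on exactly the same count as midnight on July 1. This is only an illustration of Posix time arithmetic, not how NTP itself computes time.

```python
import calendar

SECONDS_PER_DAY = 86_400  # Posix: every day has exactly this many seconds

# UTC times as (year, month, day, hour, minute, second) tuples.
start_of_june_30 = (2015, 6, 30, 0, 0, 0)
last_normal = (2015, 6, 30, 23, 59, 59)
leap_second = (2015, 6, 30, 23, 59, 60)  # the inserted second
next_midnight = (2015, 7, 1, 0, 0, 0)

t_start = calendar.timegm(start_of_june_30)
t_last = calendar.timegm(last_normal)
t_leap = calendar.timegm(leap_second)
t_midnight = calendar.timegm(next_midnight)

print(t_last, t_leap, t_midnight)  # 1435708799 1435708800 1435708800
# The leap second has no count of its own: it collides with midnight.
print(t_leap == t_midnight)  # True
# June 30 still spans exactly 86,400 counted seconds.
print(t_midnight - t_start == SECONDS_PER_DAY)  # True
```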
So Mills invented a sleight of hand: count the last second twice, so that the day still shows 86,400 counted seconds even though 86,401 actually elapse. Knowing the leap is coming, NTP is geared to adjust, treating UTC as 36 seconds (instead of 35) behind International Atomic Time (TAI) from then on. Atomic time is the time kept by precise atomic clocks, such as those used in the Global Positioning System (GPS) and other precise-time services. Atomic clocks can't adjust to the solar day, but the leap second keeps UTC, which ticks in atomic seconds, within a second of solar time.
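The sketch below models "count the last second twice" under simplifying assumptions. It walks an ever-increasing atomic seconds counter across the June 30, 2015, leap second and derives the Posix count from it, using the TAI minus UTC offsets of 35 seconds before the leap and 36 after. The atomic counter is simply defined here as the Posix count plus that offset, a convention chosen for this example; `posix_count` and the constant names are hypothetical.

```python
from datetime import datetime, timezone

# TAI minus UTC offsets around the June 30, 2015 leap second.
OFFSET_BEFORE = 35
OFFSET_AFTER = 36
# Posix count of 2015-07-01 00:00:00 UTC, the moment right after the leap.
LEAP_AT_POSIX = 1435708800
# The atomic second during which UTC reads 23:59:60 (sketch convention).
LEAP_START_ATOMIC = LEAP_AT_POSIX + OFFSET_BEFORE


def posix_count(atomic: int) -> int:
    """Map an atomic seconds count to a Posix count, repeating 23:59:59."""
    if atomic < LEAP_START_ATOMIC:
        return atomic - OFFSET_BEFORE
    if atomic == LEAP_START_ATOMIC:
        return LEAP_AT_POSIX - 1  # the last second of June 30, counted again
    return atomic - OFFSET_AFTER


if __name__ == "__main__":
    for atomic in range(LEAP_START_ATOMIC - 2, LEAP_START_ATOMIC + 3):
        p = posix_count(atomic)
        label = datetime.fromtimestamp(p, tz=timezone.utc).strftime("%H:%M:%S")
        print(atomic, p, label)
```

The atomic column never repeats, while the Posix column shows 1435708799 (23:59:59) twice. That repeated value is what Mills's timex notation marks and what Google's smear avoids.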
Stenn has sought opinions from other timekeeping experts on the Google smear and other methods. NTP's method of counting the last second twice got this endorsement from Poul-Henning Kamp, a Danish expert on computer time who is working on improvements to NTP: "I believe this works the way Dave [Mills] originally envisioned it should, and [it] makes a semi-perverse kind of sense to do it that way."
Mills wanted system users to be able to find out when they had been caught in a leap second. So programs that query the clock through the "timex" interface during what would normally be the last second of June 30 will get a time stamp of 23:59:59. Those that query during the leap second itself will get 23:59:60, a time that normally can't occur, before the clock rolls over to the first second of July 1.
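On Linux, the kernel's timex interface (the `adjtimex` and `ntp_adjtime` calls) reports a clock state alongside the time, and the state `TIME_OOP` ("leap second in progress") is what lets a program print 23:59:60 instead of a second 23:59:59. The sketch below only simulates that idea without making the real system call; the constant values mirror Linux's `<sys/timex.h>`, while `label_for` and the list of readings are hypothetical.

```python
from datetime import datetime, timezone

# Clock-state values as defined in Linux <sys/timex.h>.
TIME_OK = 0     # no leap second pending
TIME_INS = 1    # insert a leap second at the end of the day
TIME_OOP = 3    # leap second in progress
TIME_WAIT = 4   # leap second has occurred


def label_for(posix_seconds: int, state: int) -> str:
    """Render a UTC time of day, using :60 while a leap second is in progress."""
    hms = datetime.fromtimestamp(posix_seconds, tz=timezone.utc).strftime("%H:%M:%S")
    if state == TIME_OOP:
        # The Posix count says 23:59:59 again; the state says it is really :60.
        return hms[:-2] + "60"
    return hms


# Hypothetical readings around the 2015 leap second: (Posix count, clock state).
readings = [
    (1435708798, TIME_INS),
    (1435708799, TIME_INS),
    (1435708799, TIME_OOP),   # the repeated count
    (1435708800, TIME_WAIT),
]

if __name__ == "__main__":
    for seconds, state in readings:
        print(seconds, state, label_for(seconds, state))
```

A program that ignores the state simply sees 23:59:59 twice.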
Kamp noted in an email message to Stenn, however: "So far, I have never found one single piece of not-written-by-me software that actually uses the timex API to find out what's going on, so it probably doesn't matter."
At Google, it does matter. When Google added a second to its NTP servers in 2005, it ran into timekeeping problems on some of its widely distributed systems: the Mills sleight of hand confused some of its clusters, which fell out of synch with NTP time.
"Very large-scale distributed systems, like ours, demand that time be well-synchronized and expect that time always moves forwards," wrote Christopher Pascoe, Google site reliability engineer, in a blog post on Sept. 15, 2011, as another leap second adjustment approached. "Computers traditionally accommodate leap seconds by setting their clock backwards by one second at the very end of the day. But this 'repeated' second can be a problem. For example, what happens to write operations that happen during that second? Does email that comes in during that second get stored correctly? What about all the unforeseen problems that may come up with the massive number of systems and servers that we run?"
Google had already tried the smear approach in 2008. According to Pascoe's blog post: "The leap smear is talked about internally in the Site Reliability Engineering group as one of our coolest workarounds, that … ultimately saved us massive amounts of time and energy in inspecting and refactoring code. It meant that we didn't have to sweep our entire (large) codebase …"
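Pascoe's question about write operations can be made concrete with a small hypothetical example. Suppose a data store settles conflicting writes by keeping the one with the latest wall-clock time stamp, a "last write wins" rule assumed here purely for illustration. A write made during the first pass through 23:59:59 and a later write made during the repeated pass can end up in the wrong order.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp: float  # Posix wall-clock seconds when the write happened


def last_write_wins(writes: list[Write]) -> Write:
    """Keep the write with the largest wall-clock timestamp (hypothetical rule)."""
    return max(writes, key=lambda w: w.timestamp)


# Real order of events: "old" first, "new" about 0.5 s later, but "new"
# lands in the *repeated* 23:59:59, after the clock has stepped back.
old = Write("old", 1435708799.8)  # first pass through 23:59:59
new = Write("new", 1435708799.3)  # second pass through 23:59:59

winner = last_write_wins([old, new])
print(winner.value)  # "old": the earlier write wins
print(f"apparent elapsed time: {new.timestamp - old.timestamp:+.1f} s")
```

A smeared clock never steps backward, so under the same rule these two writes would keep their real order, which is the "time always moves forwards" expectation Pascoe describes.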
In an email message to InformationWeek on the subject earlier this month, Stenn conceded: "Operationally, this is a very nice solution." But he said he still can't accept imposing inaccurate clocks on every kind of system that relies on NTP just to serve Google's operational needs.
Stenn can see problems in both approaches. "Choose your poison," he advised at one point. But the real solution, he said, lies in more work by standards bodies and time experts collaborating on the problem. And the barrier to that, he said, is one he knows well: no one is willing to devote money to resolving what can seem like obscure time issues.
The work done to date by two different standards bodies has resulted in two different philosophies: either it's OK to add a leap second at the end of any month, or it can only be added at the end of June or December.
So far, the latter holds as the convention.
The Network Time Foundation, a nonprofit umbrella group that includes the NTP project, has as one of its agenda items to come up with a General Timestamp API, and resolve such issues in the time stamp process it adopts. "Getting that implemented and accepted takes resources we do not yet have," Stenn wrote in his email exchange with InformationWeek.
How would you like to see the leap second handled? Does Google's smear approach make more sense to you, or does Mills's idea of counting the last second twice work better? Do you have a better idea of how to handle this? Tell us all about it in the comments section below.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...