NTP Basics
The NTP reference implementation, ntpd, has been designed to query the time from one or more configured reference time sources, synchronize its own system time to those reference time sources, and at the same time work as an NTP server to make its own synchronized system time available to other NTP clients on the network.
The program was originally written for Unix-like systems, but has also been ported to Windows. Meinberg makes a pre-compiled NTP package for Windows available to simplify installation on Windows.
Unlike some other programs for time synchronization which simply step the system time at periodic intervals, ntpd runs in the background and continuously adjusts its own system time as accurately as possible, in a way that is not even noticeable to applications.
It even measures and compensates for its own system clock drift, so that a significant time offset cannot even arise during continuous operation of the service.
The NTP software package includes a couple of executable programs. ntpd is the NTP daemon (the NTP service on Windows) that runs in the background, and ntpq is the most important command line tool for checking the status of that daemon/service.
Startup Behavior
Right after startup, ntpd sets its internal status to stratum 16 and leap bits 11 to indicate that it is not yet synchronized.
This status information is also put into the packets that are sent out to the network to synchronize clients, so NTP clients know the server is alive but is not yet able to provide an accurate time.
Large initial time offsets are only accepted and quickly corrected immediately after startup. See chapter Handling Large Time Offsets for the reasons.
Polling And Accepting Time Sources
It's a policy of ntpd that any reference time source is only accepted if the time source claims to be synchronized, no matter whether the time source is a so-called hardware refclock, e.g. a GPS clock, long wave receiver, etc., or another NTP server on the network:
- If the time source is a GPS receiver then the receiver needs to be synchronized to the signals from the GPS satellites.
- If the time source is another (so-called “upstream”) NTP server then that server needs to be synchronized to some time source, too, which can in turn be a GPS receiver, or some other upstream NTP server(s).
So a time source is only considered reachable, and may only be accepted by an NTP client, if it provides a time and claims to be synchronized.
As mentioned above, a server instance of ntpd that has just been started will not immediately be accepted by any client.
Only after the server has been able to synchronize to its own configured time source(s) does it start claiming to be synchronized, and thus may be accepted by NTP clients.
Each configured reference time source is polled at regular intervals. Polling means that the time and status are queried from the time source.
Polling Delays, Jitter And Accuracy
Depending on the type of time source, it takes some time until a submitted request arrives at its destination, and similarly it takes some time until a reply is received. Specifically, queries across the network require sending a request packet to a server, and waiting for a reply packet from the server. See this article for details.
The time accuracy of the client is not affected by the absolute magnitude of the polling delays, as long as the delays are exactly the same for requests and replies. However, if the delay for requests is always shorter than for replies, or vice versa, e.g. on an ADSL network connection with different upload and download speeds, the client cannot detect this automatically, and the asymmetry results in a systematic time error at the client side that depends on the ratio of the request and reply delays. This means that the real time offset may be e.g. a few milliseconds, even though the time offset computed by the client is reported as “0”.
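For reference, the standard NTP on-wire calculation derives the offset and round-trip delay from the four time stamps of a polling action, where T1 is the client's transmit time, T2 the server's receive time, T3 the server's transmit time, and T4 the client's receive time:

  offset = ((T2 - T1) + (T3 - T4)) / 2
  delay  = (T4 - T1) - (T3 - T2)

The offset formula is only exact if request and reply take the same time on the wire; any delay asymmetry shows up as an offset error of half the difference between the two path delays.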
In real life, network packet transport generally suffers from a mean network delay caused by routers and switches, and from variations of that delay between subsequent pollings, if individual packets are delayed more or less than others.
If ntpd queries the time from a GPS clock and/or PPS source then the jitter is usually at the microsecond level.
The network jitter on a LAN between different NTP nodes can be some tens of microseconds, and over a WAN connection it may be even milliseconds.
So a small time offset from a GPS receiver can quickly be identified, but if several pollings over a WAN yield different time offsets, it is not clear whether this is really a time offset, or whether it just looks like a time offset because a network packet has been queued in a switch or router.
So jitter reduces the accuracy of the computed time offset, and some filtering is required to reduce the jitter as much as possible.
This is why ntpd always evaluates the results from several polling actions over several polling intervals before it starts adjusting its own system time.
ntpd has very powerful adaptive filters to determine the mean packet delay, the network jitter, its own time offset, and how the time offset evolves, i.e. how much its own system time drifts.
Classification of Time Sources
Each time source that is reachable (i.e., replies to queries, and is synchronized) is basically accepted by ntpd.
However, if several reference time sources have been configured and are reachable, then the time provided by each source, its jitter, etc. are evaluated to find a majority of time sources that provide the same, accurate time, and to select the “best” source of this group as the so-called system peer. See:
- NTP docs: Mitigation Rules and the prefer Keyword
https://www.meinbergglobal.com/download/ntp/docs/html/prefer.html
Please note that the prefer keyword has only limited effect on the selection process. Usually the best approach is to simply let ntpd select the system peer.
The selected system peer is marked with a * (or an o in case of a PPS refclock) in the output of the command ntpq -p.
If there are several good reference time sources available then other so-called survivors of the selection process are also potential system peers, so they are called candidates, which are marked with a + in the output of the command ntpq -p.
If there are other time sources which provide a time that differs from the survivors' time, these time sources are called falsetickers, which are marked with a -.
System Time Correction Value
The correction value for ntpd's own system time is derived from the weighted time offsets of the system peer and the candidates. While this is a good approach for pure network clients, it unfortunately also means that the system time accuracy can be degraded, for example if the main time source is a GPS refclock. The GPS clock provides very accurate and precise time, but if there are additional reference time sources configured which are accessed via the network, those time sources can become candidates and thus may contribute to the weighted clock adjustment. Since network time sources usually provide significantly less accuracy and precision than a GPS refclock, this makes the system time adjustment worse than it could be if the GPS clock alone was used as reference time source.
Multiple Time Sources
The described behavior suggests that it's good practice to configure either only a single time source, so its time is always accepted, or more than 2 time sources, so that the selection algorithm can always determine a majority of good time sources. Due to the way the selection algorithm works, there are certain quantities of time sources to be configured which yield the best results.
Specifically, please note that configuring exactly 2 reference time sources is the worst you can do: if both time sources provide a slightly different time then the client ntpd is unable to determine which one provides the “right” time, and may finally ignore both time sources.
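As an illustration, a client configuration with four independent time sources could look roughly like the sketch below. The host names are just placeholders, not a recommendation of specific servers; the iburst option only speeds up the initial synchronization.

  # /etc/ntp.conf (sketch): 4 time sources, so a majority can always be found
  driftfile /var/lib/ntp/ntp.drift   # path varies by distribution

  server ntp1.example.com iburst
  server ntp2.example.com iburst
  server ntp3.example.com iburst
  server ntp4.example.com iburst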
Polling Intervals
The polling interval is adjusted automatically by ntpd, depending on the stability of the local system clock.
The default range for the polling interval is 2^6 s = 64 s up to 2^10 s = 1024 s.
Normally it's a good idea to let ntpd itself adjust the polling intervals for its time sources.
However, there are a few specific cases where this should be limited by configuration, e.g. for the Windows port of ntpd, where the polling interval should be fixed to 2^6 s = 64 s.
Also for refclocks it may make sense to specify a fixed polling interval, depending on the refclock type.
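Limiting the polling interval is done with the minpoll and maxpoll options of the server directive, which take the poll exponent as a power of 2. A sketch for fixing the interval to 64 s, with a placeholder host name:

  # fix the polling interval to 2^6 s = 64 s for this server
  server ntp1.example.com iburst minpoll 6 maxpoll 6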
Of course, if a time source is polled at a large interval it takes longer until an unexpected time step is detected than with a short polling interval, but it takes a couple of minutes anyway until such a step is eventually accepted.
On the other hand, if polling intervals shorter than 2^6 s = 64 s are used with public NTP servers, the operators of those public servers usually consider this abusive, so it should be avoided.
System Time Adjustment
Once a time source has been selected as system peer, ntpd starts to adjust its own system time and changes its leap bits from 11 to 00.
However, if a leap second is announced for the end of the current UTC day, the leap bits become 01 in case of a positive leap second (insertion), or 10 in case of a negative leap second (deletion).
The latter has never happened so far. All leap bit combinations except 11 indicate that ntpd is synchronized.
Stratum Numbers
A synchronized ntpd also changes its stratum number to the stratum of its system peer, plus 1.
So if the system peer is a hardware refclock, then the reference time source has an internal stratum 0, so this instance of ntpd becomes a stratum 1 NTP server on the network.
A client that selects this server as its system peer becomes itself a stratum 2 server, a client of that stratum 2 server becomes a stratum 3 server, and so on.
Unlike the stratum number used in the context of telecom applications, which defines a specific accuracy class, the stratum number used with NTP just indicates a hierarchy level, but does not guarantee a specific accuracy.
If clients receiving NTP response packets from this ntpd server see that the leap bits from this server are not 11, and the stratum is not 16, they start accepting the time from this server in the same way as described in the chapter Polling And Accepting Time Sources.
A control loop evaluates the determined time offset and clock drift, and applies corrections to ntpd's own system time.
Adjusting Small Time Offsets
As long as the system time offset determined by the filter algorithm is below a certain limit (the so-called step threshold, 128 ms by default), the system time is adjusted slowly and smoothly in a way that both the time offset and the system clock drift become as small as possible, so that a new significant time offset doesn't even accumulate. See also:
- NTP docs: Step and Stepout Thresholds
https://www.meinbergglobal.com/download/ntp/docs/html/clock.html#step
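If an application really requires a different step threshold, it can be changed with the tinker directive in ntp.conf; normally the default should be left alone. A sketch:

  # set the step threshold, in seconds (0.128 s is the default anyway)
  tinker step 0.128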
So during normal operation, the system time is adjusted in a way that applications don't even notice the small corrections. However, how quickly and accurately a time offset is determined and compensated depends on the jitter seen by the filter algorithms from the last polling actions.
The rate for the system time adjustment is limited to 500 ppm, i.e. 500 microseconds per second, or 1.8 seconds per hour, which is usually sufficient for real applications.
Handling Large Time Offsets
If a large time offset is observed which exceeds the step threshold then the system time has to be stepped to correct this.
Normally this should only happen once, immediately after ntpd has started, if the system time is not yet very accurate, so only in this case is the system time stepped quickly to get the time offset below the step threshold limit and continue with the smooth adjustment.
However, if the system time has already been accurately disciplined, but afterwards a system time offset is detected that exceeds the step threshold, ntpd waits for the so-called stepout interval (300 s by default since ntpd 4.2.8, 900 s by default up to ntpd 4.2.6) to see if the large time offset persists, and then checks whether the time offset is above or below the so-called panic threshold (1000 s by default). See:
- NTP docs: Panic Threshold
https://www.meinbergglobal.com/download/ntp/docs/html/clock.html#panic
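The stepout interval and the panic threshold can also be changed with the tinker directive; this is sometimes done on virtual machines where large steps are expected, but it should be used with care. A sketch with the default values:

  # stepout interval and panic threshold, in seconds (defaults shown)
  tinker stepout 300
  tinker panic 1000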
If a large time offset occurs while 'ntpd' is already running, this can be due to one of the following reasons:
- An operator has changed the system time. This requires admin rights, so it's the operator's own problem if he thinks he has to mess up the timekeeping.
- Another time synchronization software is running. It's never a good idea to have more than one program running in parallel to discipline the system time, so all programs but a single one should be disabled.
- The system timekeeping is broken. There have been cases where the time on a Windows server lost more than 30 seconds whenever a huge database application ran some maintenance tasks at night. A program like ntpd is unable to compensate for this, so the bad programs should be fixed instead.
So if the system time offset still exceeds the panic threshold after the stepout interval, ntpd terminates itself with a message saying something like “set clock manually”.
The reason behind this behavior is that ntpd assumes: “The system time has been changed. That must have been done by the administrator, who should know what he's doing. So I can't do anything else and terminate myself.”
So a huge time offset that exceeds the panic threshold is accepted only once, at startup, if ntpd is started with the '-g' option (which is usually the case).
Stepping The System Time
Whenever ntpd steps the system time, all filter values from previous polls are discarded, and the control loop starts over from scratch.
Depending on the logging options configured for ntpd, a “time reset” message may be written to the operating system's logging utility whenever ntpd steps the system time.
Depending on the type and version of the operating system, there may also be a log message from the operating system whenever the system time is stepped by some application.
Don't Change The System Time While 'ntpd' Is Running
As mentioned earlier, ntpd has not been designed to immediately correct large time steps that suddenly occur during operation.
If the system time is changed by some other program, or by the administrator (“to see if NTP really works”), while ntpd is already running, the system time is not corrected as quickly by ntpd as some folks might expect.
The reason is that the control loop used by ntpd to accurately adjust the system time is totally messed up if someone else fiddles with the system time. In any case it takes a few minutes (the stepout interval) until the time step is accepted, the system time is stepped, and ntpd has to start polling/filtering from scratch.
Eventually ntpd even terminates itself, if the offset is too large.
For details see the chapter Handling Large Time Offsets.
The next section shows a better way to monitor the performance of ntpd.
Checking ntpd's Time Adjustment Performance
A good way to check that ntpd is working properly is to run the command ntpq -p periodically.
The output contains a table with one line of status information for each configured time source.
The table has the following columns:
remote | The (possibly truncated) host name or IP address of the time source. |
---|---|
refid | An informational indicator telling where this time source gets its time from. Can be a 4 character string, an IPv4 address, or the hash of an IPv6 address displayed like an IPv4 address. |
st | The stratum of the time source, which can be 16 if the source is not reachable or not synchronized. |
t | The type of the time source, e.g. l for a local hardware refclock, or u for an upstream NTP server accessed via unicast data packets. |
when | The time elapsed since the last poll event. When when reaches the value of poll, the next polling action occurs. |
poll | The current polling interval, in seconds. |
reach | An octal display of the reach status. Whenever a polling event is successful, i.e. the time source is accessible and synchronized, a logic 1 bit is shifted in from the right, else a logic 0 bit. So right after startup the reach value is 0, and after each successful polling it increases: 1, 3, 7, 17, 37, etc., up to 377, which means the last 8 pollings were successful. During continuous operation the reach value stays at 377 for a time source that is continuously reachable. |
delay | The mean packet delay, in milliseconds. This is the mean time required to send a request to the time source and receive the reply from that source. |
offset | The mean time offset, in milliseconds. |
jitter | The time jitter, in milliseconds. This indicates how much packet delays from individual pollings vary from the mean packet delay. |
Please note that the delay, offset, and jitter are all computed from the same four time stamps provided by each polling action, so they are related to each other, and all values settle as the control loop which adjusts the system time settles.
Next are some examples of the output of the ntpq -p command, run on a Linux workstation with a built-in Meinberg GPS PCI card.
The time source labeled SHM(0) with refid .shm0. represents a hardware refclock, where the time from the GPS PCI card is fed into ntpd's shared memory driver.
lt-martin is a GPS-controlled Meinberg LANTIME NTP server on the local network.
The three ptbtime nodes are NTP servers accessed via the internet.
The example below shows the result of an ntpq -p command immediately after ntpd was started:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 SHM(0)          .shm0.           0 l    -    8    0    0.000    0.000   0.000
 lt-martin.py.me .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ptbtime1.ptb.de .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ptbtime2.ptb.de .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ptbtime3.ptb.de .INIT.          16 u    -   64    0    0.000    0.000   0.000
The reach column for all time sources is 0, so none of the time sources has been polled yet.
For the upstream NTP servers this is also indicated by a stratum value of 16, and a refid reading .INIT..
Also, no line has an asterisk * mark at the beginning, so there is no system peer yet, and thus ntpd has a status saying it is not synchronized.
A short time later the output has changed:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*SHM(0)          .shm0.           0 l    1    8   37    0.000   -0.221   0.121
+lt-martin.py.me .MRS.            1 u   21   64    1    0.097   -0.116   0.036
 ptbtime1.ptb.de .INIT.          16 u   32   64    0    0.000    0.000   0.000
+ptbtime2.ptb.de .PTB.            1 u   19   64    1  186.367  -87.007  43.283
+ptbtime3.ptb.de .PTB.            1 u   20   64    1  192.954  -90.638  24.156
Now some of the sources have already been polled, and the GPS PCI card is marked as the system peer with a * at the beginning of the line.
Some other sources are considered candidates for the system peer and are thus marked with a +.
Again some time later the control loop has settled:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*SHM(0)          .shm0.           0 l    7    8  377    0.000    0.002   0.003
+lt-martin.py.me .MRS.            1 u   60   64  377    0.080   -0.004   0.015
+ptbtime1.ptb.de .PTB.            1 u   38   64  377   11.665    0.021  29.236
-ptbtime2.ptb.de .PTB.            1 u   60   64  377   12.184    0.312 103.407
-ptbtime3.ptb.de .PTB.            1 u   56   64  377   12.257    0.342  81.159
The GPS PCI card is still the system peer, and shows only 2 microseconds offset and 3 microseconds jitter, and it stays continuously at this level.
lt-martin on the LAN currently shows 4 microseconds offset, and 15 microseconds jitter.
ptbtime1 on the internet has 21 microseconds offset, which is not much if you take into account that the jitter is 29 milliseconds (!).
Also the other ptbtime servers show only about 300 microseconds time offset, even though their jitter is even higher.
Anyway, they are classified as falsetickers since they are worse than the other time sources.
The GPS PCI card as system peer, as well as the candidates lt-martin and ptbtime1, are used to adjust the system time.
As mentioned above, the jitter from the upstream NTP servers, which is much higher than the jitter from the GPS time source, makes the time adjustment worse than it could be.
A quick test shows that the results are better if we use only the GPS PCI card as a real time source, and append the keyword noselect to the configuration lines for the upstream NTP servers.
The keyword noselect tells ntpd to poll a time source as usual, but not to consider it a valid time source to which it can synchronize.
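With the sketch configuration from above, the server lines might then look like this (again only an illustration, with the LANTIME's full host name omitted):

  # other sources are polled for monitoring only, never selected
  server lt-martin iburst noselect
  server ptbtime1.ptb.de iburst noselect
  server ptbtime2.ptb.de iburst noselect
  server ptbtime3.ptb.de iburst noselect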
So in the next example the GPS PCI card connected via the SHM driver is the only real time source, and the other sources are only monitored:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*SHM(0)          .shm0.           0 l    1    8  377    0.000    0.001   0.000
 lt-martin.py.me .MRS.            1 u   23   64  377    0.091   -0.021   0.016
 ptbtime1.ptb.de .PTB.            1 u   21   64  377   12.200    0.289 135.419
 ptbtime2.ptb.de .PTB.            1 u   10   64  377   12.808    0.083 241.332
 ptbtime3.ptb.de .PTB.            1 u   24   64  377   12.196    0.411 174.997
We can see here that the offset and jitter from the GPS PCI card are smaller than in the original configuration with the additional upstream servers, but the drawback here is that the other sources are really only monitored, so they can't become candidates or even system peer in case the GPS card fails. So there is more accuracy but no redundancy with this configuration.
Please note that the jitter for the NTP servers on the WAN is even higher here than before. This is coincidental, just because the network connection is currently very busy.
Debugging Large Time Offsets
Normally, ntpd steps the system time at most once, shortly after it was started, to compensate for an initial large time offset.
If there are periodic “time reset” events then this may either be due to an excessive clock drift, or because there's another program that also fiddles with the system time, continuously or only periodically.
So in any case it is helpful to know whether the system time offset increases slowly and continuously for some reason, or whether it is continuously low for some time and then suddenly becomes large.
Excessive Clock Drift
In some virtualization environments, or with a bad operating system or drivers where e.g. timer ticks get lost, the undisciplined system time might drift so much that ntpd is unable to compensate for the drift.
Thus the time offset quickly increases and exceeds the step threshold, so that after the stepout interval the system time is set correctly, and the game starts over again.
The only possible fix is to find out what causes the excessive clock drift, and fix this, which can be very hard to do.
Possibly there is another piece of software running that also continuously applies corrections to the system time, and thus works against ntpd.
This can be any other time synchronization software, not only an NTP client.
Sudden Huge Time Steps
If the system time is corrected whenever ntpd is (re-)started, or the time offset is constantly low over a certain interval and then suddenly becomes large, then probably the system time has been set by a user with administrator privileges, or by some other application running with sufficient privileges.
For example, if the system is a virtual machine then the VM may have been configured such that the system time is periodically adjusted by the virtualization system itself.
In VMware there is a “VMware Tools” configuration parameter Time Sync that should be set to Off if ntpd is running in the virtual machine.
If this parameter is set to On then the time in the VM is periodically set to the time of the physical host, causing a time offset that can be small or huge, depending on how well or badly the time in the virtualization system on the physical host is synchronized.
So again, to fix a problem like this you have to find out who or what sets the system time.
Detecting On Windows Who Has Set The System Time
Unless the Windows version is very old, the Windows kernel writes a log entry to the system event log whenever the system time is changed, but of course there are no log entries if the system time is only adjusted smoothly.
The Windows event viewer application can be used to inspect such log entries. If you open the properties of such an event and look at the “details” page then you can find the numeric process ID of the process that has changed the system time.
To find out which process has that specific process ID, you can for example open a PowerShell command line window and type the command Get-Process, which prints a current list of processes with names and IDs.
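For example, to look up a particular process ID found in the event details, something like the following PowerShell commands can be used (the PID 1234 is just a placeholder):

  # list all processes with their IDs
  Get-Process | Sort-Object Id

  # look up one specific process ID
  Get-Process -Id 1234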
So whenever ntpd had to set the system time, you will find an associated system log entry with the process ID of ntpd mentioned in the event details.
Similarly, if another process has set the system time you can identify that process. For example, if the time in a VM is periodically set by the VMware Tools then the process ID may belong to a process named vmwared, so you know you have to change the parameter Time Sync in the virtual machine settings and set it to Off.
Please keep in mind that a new instance of a program is assigned a new process ID, so if a service is restarted, the new instance of the service has a different process ID that may not match the process ID found in older system events.
Also, if a program runs, sets the system time, and then terminates, it will not be shown in the process list anymore after it has terminated.
Redundancy And Safety
The NTP reference implementation (ntpd) uses a different approach to redundancy than is usually known from other server setups.
It is not possible to configure a “master” and a “slave” time source, expecting the client to use only the master and to switch to the slave only when the master becomes unavailable.
However, as explained earlier, you can simply configure several time sources at the client, so the client ntpd itself checks all servers periodically, and selects the ones to use.
If one of the configured reference time sources becomes unreachable, this time source is automatically discarded by the selection algorithm.
Specifically, if the system peer becomes unreachable then simply a new system peer is selected from the remaining candidates, as long as at least one candidate is available.
Since the system clock adjustment has been derived from the previous system peer and the candidates, switching can be done very smoothly.
Also, if the time provided by a specific source starts to drift away from the time provided by other sources, the drifting time source becomes a falseticker and is also discarded. So even if a GPS clock is spoofed by some bad guys, this can be detected and the GPS clock can be discarded and outvoted, as long as there are other time sources available which provide and agree on the right time.
So this provides a high level of built-in redundancy and safety of operation.
Holdover Behavior And Root Dispersion
A special case is when one or more configured time sources have been reachable for some time, and then suddenly all time sources become unreachable. This may mean that e.g. the antenna has been disconnected from a GPS receiver used as the single reference time source, or another NTP server on the network used as the single reference time source has been shut down (powered off), or the network connection to the remote server(s) is broken.
In this case ntpd normally does not change its leap bits back to 11, and does not change its stratum back to 16.
Instead, it keeps the stratum value it had before, and just starts to increase its so-called root dispersion value over time.
This state is called holdover mode.
The root dispersion can be interpreted as a very coarse estimate of how much the local time has drifted away from some reference time.
Normally it increases at a constant rate, but it is reset to a low value whenever the time could be queried successfully from a reference time source.
The value to which the root dispersion is reset depends on the precision of the reference time source.
Anyway, in holdover mode there are no more successful queries to a reference time source, so the root dispersion keeps increasing continuously over time.
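The current root dispersion of an ntpd instance can be checked locally with ntpq's readvar command; the rootdisp field in its output is given in milliseconds:

  ntpq -c rv

In holdover mode this value can be seen growing from one invocation to the next.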
The root dispersion is also put into the NTP packets sent to clients, so a client can see that the root dispersion is increasing and thus that the time of the server has started drifting, and each client can itself decide what to do:
- If the client has another time source configured which is not drifting, it can switch to a better time source and discard the drifting NTP server.
- If the client has no other time source configured then it can keep accepting the drifting server anyway, so all clients of that server will at least keep the same time. If clients immediately discarded this server even though they had no other time source available, this would be even worse, since the times on the different clients would start to drift apart.
LANTIME NTP Server In Holdover With Trust Time
Generally, ntpd disciplines its own system time as long as its time sources are accepted, and starts sending its free-wheeling system time when all configured time sources have become unreachable.
On the other hand, if a refclock (e.g. a GPS receiver) provides a good, stable oscillator which is disciplined during normal operation, this oscillator usually drifts much less than e.g. the cheap crystal on an embedded microprocessor system or on a PC's mainboard.
So in this case it usually makes sense to let ntpd accept the GPS receiver for quite some time even after GPS reception has failed.
The parse refclock driver (driver 8) from the NTP software package, which is used for Meinberg GPS receivers, supports the concept of a trust time. Please note that only the parse refclock driver supports this; other refclock drivers which might be used for different GPS receivers (e.g. NMEA) don't support this.
The trust time interval starts when GPS reception suddenly fails, and only after the trust time has expired does ntpd notice that the GPS receiver has failed and is unsynchronized. So ntpd discards the GPS time source only after the trust time interval.
This feature provides a stable time for a much longer holdover interval than the free-wheeling clock of an embedded microprocessor board, or a standard PC.
The trust time interval needs to be determined according to the quality of the oscillator, and the time offset due to clock drift that is acceptable after reception has failed, which is a requirement of the specific application.
For example, if the acceptable drift is 10 milliseconds the trust time interval can be much longer than if the acceptable drift is only 100 microseconds.
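As a rough, purely illustrative calculation, assume the disciplined oscillator drifts by about 10 nanoseconds per second (1·10^-8) in holdover:

  acceptable offset 10 ms:   0.010 s  / 1e-8 = 1,000,000 s  (about 11.6 days)
  acceptable offset 100 µs:  0.0001 s / 1e-8 =    10,000 s  (about 2.8 hours)

The real drift rate of a specific oscillator has to be taken from its data sheet or measured, so these numbers are only meant to show how the trust time scales with the drift rate and the acceptable offset.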
On Meinberg LANTIME devices the trust time interval as well as the stratum number in holdover mode can be configured via the web interface.
Should An NTP Client Generally Discard A Server In Holdover Mode?
A basic question is why a client should stop accepting that server if there is no alternate time source available.
Usually, the time on a client drifts much more if the client stops synchronizing to a dedicated NTP server, since the server provides a much more stable time even when in holdover.
So in most cases a better approach is to let clients still accept the time from a stable time source, but generate an alert e.g. if the time of the server starts drifting. For example, if you configure 10 days trust time on a Meinberg LANTIME, then the LANTIME can send a notification (e.g. log message, email, SNMP trap, …) when GPS reception fails, but can still provide a pretty accurate time to its clients during the trust time / holdover interval. So there's plenty of time for investigation, and to fix the reception problem.
Compatibility With Dumb NTP Clients
Described above is the default behavior of the NTP reference implementation in its client and server roles. However, other clients, specifically simple SNTP clients, may behave differently.
There are SNTP implementations out there which only look at the stratum value received from the NTP server, and expect the stratum to change back to 16 if the time sources of the server aren't synchronized anymore.
With some specific configuration you can force this behavior for the NTP server, e.g. if you configure orphan mode or the so-called local clock as a fallback time source, with a stratum of 15.
In this case the server ntpd discards its time source when it becomes unreachable, and switches to the configured substitute time source which has stratum 15, and thus becomes stratum 15 plus 1, i.e. stratum 16.
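A fallback configuration along these lines might look like the following sketch; only one of the two variants should be used, and the details depend on the ntpd version:

  # Variant 1: orphan mode, acting as a stratum-15 fallback source
  tos orphan 15

  # Variant 2: the local clock driver as a stratum-15 fallback source
  server 127.127.1.0
  fudge  127.127.1.0 stratum 15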
So in special cases the trust time can be set to a very short interval only, so that the stratum changes quickly to 16, as mentioned above. However, as explained before, the basic question is whether this is the best approach for the application.
LANTIME Clustering Feature
Meinberg LANTIME NTP servers provide a clustering feature which is an extension of the standard NTP functionality.
If there are simple NTP clients which don't provide the powerful functionality of ntpd, but rely on a time source which is always available, then 2 or more LANTIMEs can be configured as a cluster which shares an additional, common cluster IP address.
Only one of the LANTIMEs uses this IP address to provide NTP services. However, the LANTIME devices monitor each other, and if the active LANTIME fails, another one becomes the active device and starts servicing NTP requests via the shared cluster IP address.
So a client which synchronizes to the cluster IP address doesn't even notice if one device fails since the service is taken over by another device.
LANTIME Accuracy After Power Cycle
If a LANTIME is powered off, the time is only kept in a battery-buffered RTC chip, and after power-up the initial time is read from that RTC chip. Unfortunately the high quality oscillator, which often even includes an oven (OCXO), requires much more power than can be provided by a small backup battery, and thus accuracy is lost after power cycling.
This also means that after power cycling the GPS receiver claims to be not synchronized, and ntpd also has its stratum set to 16 and its leap bits to 11, so clients don't accept the ntpd running on the LANTIME as a time source after power cycling until the GPS receiver has synchronized to the satellites again, so that ntpd can accept it as a time source and synchronize to the GPS receiver.
— Martin Burnicki martin.burnicki@meinberg.de, last updated 2022-08-25