Linux Server Time Not in Sync with NTP Server

Time Synchronization

The Network Time Protocol, better known by its abbreviation NTP, helps computer systems synchronize their clocks. In the past it was not a big issue if your system was a few minutes off. This changed with the interconnected world we now live in. A good example is a network relying on the Kerberos authentication protocol: if your system time is not correct, you may not be able to authenticate. Kerberos tickets carry timestamps as protection against replay attacks, so even though you are not an attacker, the system will refuse requests that appear to come from the past or the future.

When your local clock is not correct, serious problems can follow. Timestamps in databases and log files can be wrong, in the worst case resulting in data loss. In forensics, it can become very hard to reconstruct the steps that occurred during a security incident. Keeping your Linux systems synchronized is therefore a must. Let’s have a look at how things work and how to troubleshoot when they don’t.

History of Time

In the past we relied on the system itself to keep time. This was done with a hardware component called the real-time clock (RTC). But no device or component is 100% accurate, so your system time could slowly drift. If the clock ran a little too fast, your computer would be living in the future, while other systems would appear to be living in the past. Nowadays systems are connected to networks, which makes it possible to synchronize them to very precise clocks, known as atomic clocks. Instead of relying on the oscillation of an electronic component such as a quartz crystal, they use the frequency of the radiation emitted or absorbed by atoms (for example cesium). That time can then be distributed with radio waves, so other systems can synchronize to it.

Linux and Time

Most Linux systems use one of the following options to synchronize time:

  • No synchronization
  • NTP daemon
  • NTP client
  • Other clients

No Synchronization

The first option, “none”, is obvious: there is no software installed on the system to maintain the time. While this may sound like a guarantee of getting out of sync, that isn’t always the case. Virtualized systems, for example, may use the host system to get the right time. When such a system starts, it gets the correct time from the host and can maintain it during uptime. There is a risk of “skewing” (getting out of sync) if the guest system cannot count clock cycles correctly, e.g. when the CPU speed is adjusted. Another risk is that the host does not always give each guest the same amount of time per CPU cycle, resulting in small variations in counting.

NTP Daemon

The next option is an NTP daemon. On Linux this is typically a background process (daemon) named ntpd. This process receives time information from several trusted sources. When it knows the time with a certain confidence, it instructs the kernel to use this new time, and usually also synchronizes the hardware clock. This way the hardware clock, the Linux kernel and the NTP daemon share the same understanding of the time. When the NTP daemon detects skew again, it adjusts the time again.

Time adjustments usually happen in small steps, so other software on the system doesn’t suddenly get confused. Consider what happens otherwise: it is now 4:43:52 PM and we log something to a file. Then the NTP daemon decides to set the time 10 minutes back. Three minutes later we log another line, which will suddenly be stamped 4:36:52 PM, earlier than the previous entry. Not only does this make log files confusing, it may also corrupt data in databases and in processes relying on network synchronization.
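
To put numbers on this, the sketch below estimates how long a gradual (slewed) correction takes. It assumes the limit of about 500 ppm (0.5 ms of correction per second) that the ntpd documentation gives for typical Unix kernels, which means every second of offset needs roughly 2000 seconds to be worked off. The function names are made up for illustration.

```shell
#!/bin/sh
# Estimate how long a slewed (gradual) clock correction takes.
# Assumption: the kernel slews at most 500 ppm, i.e. 0.0005 s per second.
SLEW_RATE_PPM=500

# slew_seconds OFFSET -> seconds needed to absorb OFFSET seconds
slew_seconds() {
    awk -v off="$1" -v ppm="$SLEW_RATE_PPM" 'BEGIN {
        if (off < 0) off = -off              # sign does not matter
        printf "%.0f\n", off / (ppm / 1000000)
    }'
}

# slew_days OFFSET -> the same duration in days, rounded to one decimal
slew_days() {
    awk -v s="$(slew_seconds "$1")" 'BEGIN { printf "%.1f\n", s / 86400 }'
}
```

For the 10-minute example above, slew_seconds 600 prints 1200000, which slew_days 600 turns into 13.9 days. That is why large offsets are usually stepped or corrected manually rather than slewed.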

Common daemons

  • ntpd
  • openntpd (OpenBSD project)

NTP Client

A much simpler option is an NTP client. It does something similar to an NTP daemon, except that it does not track the time from many sources. Instead, it requests the time from a trusted source and acts on that information. Tools like ntpdate and rdate are used this way, typically scheduled by a cron job to check and synchronize the time regularly.

Common clients

  • ntpdate
  • rdate

Other Clients

The last category is other clients, which are mostly used on virtualized systems. A toolkit like VMware Tools is installed in the guest and does system housekeeping in the background. It exchanges data with the host system to keep things in sync, including the time.

Time Troubles

As with most software, things can go wrong. Many of us rarely check whether our time sources are properly configured and still working. We just assume the time is correct and the system synchronizes properly, right? Especially with an NTP daemon, things can go wrong: its configuration needs to be set up correctly and checked regularly. Otherwise, sooner or later, the time will skew and end up a few minutes off.

False-tickers

The first category of NTP trouble is a so-called “false-ticker”. Just as our own system can be wrong, a trusted time source can be wrong too, whether on purpose, through misconfiguration, or because of hardware issues. If we rely on such a source, our time will be wrong as well. When using the NTP daemon, ntpq marks these false-tickers with an “x” in front of the entry.

Stratum 16

Another thing to check for is “stratum 16” entries. An atomic clock or other reference clock is referred to as stratum 0. Stratum 1 devices get their time from a stratum 0 device, usually via radio waves (GPS, CDMA, etc.). Our own systems are then usually at stratum 2 or 3. If an entry shows stratum 16, something is wrong: the source is not synchronized, for example because it cannot be reached. The cause can be something as simple as iptables filtering too much traffic.
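
When reviewing many servers, the stratum 16 check can be scripted. The sketch below reads ntpq -pn output on standard input and prints the address of every peer reporting stratum 16. It assumes the column layout shown in the ntpq sample further down (remote, refid, st, t, when, poll, reach, delay, offset, jitter) and that the first character of each peer line is the tally code; the function name is hypothetical.

```shell
#!/bin/sh
# ntpq_stratum16: read `ntpq -pn` output on stdin and print every peer
# whose stratum (3rd column) is 16, i.e. unsynchronized or unreachable.
ntpq_stratum16() {
    awk 'NR > 2 && NF >= 10 && $3 == 16 {
        remote = $1
        # The first character of a peer line is the tally code (space if none)
        if (substr($0, 1, 1) != " ") remote = substr(remote, 2)
        print remote
    }'
}
```

For example: ntpq -pn | ntpq_stratum16. An empty result means no peer is at stratum 16.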

Unreliable Sources

The last category consists of unreliable sources. The NTP daemon receives time information from a configured set of systems and checks them at regular intervals. It compares the data received from the sources, taking factors like distance and network delay into account. When a source provides unexpected results, it is marked as unreliable. You can solve this by using different sources that are closer to you, or even internal ones. If it already is an internal source, something may be wrong with that device, and most likely multiple systems will mark it as unreliable. When using an NTP daemon, ntpq marks these entries with a minus (-).
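
The “x” and “-” markers can also be picked out with a small script. This sketch reads ntpq -pn output on standard input, assumes the tally code is the first character of each peer line (as in the ntpq sample further down), and prints every false-ticker and outlier. The function name is illustrative only.

```shell
#!/bin/sh
# ntpq_flagged: read `ntpq -pn` output on stdin and report peers rejected
# by ntpd's selection: 'x' = false-ticker, '-' = outlier (unreliable).
ntpq_flagged() {
    awk 'NR > 2 {
        tally = substr($0, 1, 1)
        if (tally == "x")      print "falseticker " substr($1, 2)
        else if (tally == "-") print "outlier " substr($1, 2)
    }'
}
```

For example: ntpq -pn | ntpq_flagged. Repeated entries for the same source over time point to a source worth replacing.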

Time out of sync

Good to know is that NTP daemons usually won’t correct the time in big steps, as described previously. If the time is too far off, the daemon may even stop, which is intentional: it is an indirect warning that the time should be corrected manually. The best way to handle this is to first stop all processes relying on time synchronization, then synchronize the time manually with a tool like ntpdate or rdate.
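
The decision between slewing, stepping and giving up can be summarized as a small function. It is a simplified model built on ntpd’s documented defaults: offsets up to the 0.128 s step threshold are slewed, larger ones are stepped, and offsets beyond the 1000 s panic threshold make ntpd exit (unless it was started with -g, which allows one large initial correction). Starting ntpd with -x raises the step threshold to 600 s. The function name and its interface are hypothetical.

```shell
#!/bin/sh
# Simplified model of ntpd's reaction to a measured offset (seconds):
#   |offset| <= step threshold  -> slew (default threshold 0.128 s)
#   |offset| >  step threshold  -> step (-x raises the threshold to 600 s)
#   |offset| >  panic threshold -> panic: ntpd exits (default 1000 s);
#                                  -g permits one such initial correction
# ntpd_reaction OFFSET [STEP_THRESHOLD] [PANIC_THRESHOLD]
ntpd_reaction() {
    awk -v off="$1" -v step="${2:-0.128}" -v panic="${3:-1000}" 'BEGIN {
        if (off < 0) off = -off
        if (off > panic)     print "panic"
        else if (off > step) print "step"
        else                 print "slew"
    }'
}
```

With the defaults, ntpd_reaction 5 prints step, while ntpd_reaction 5 600 (the -x threshold) prints slew.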

Discover Time Issues

So now we know it is important to track the time and keep it properly synchronized. With the ntpq utility we can query the details of our time synchronization, in particular which sources are used and any issues with them.

(Screenshot: ntpq output in which no sources can be reached, showing stratum 16)

The best way to discover time synchronization issues is to monitor the output of ntpq when using an NTP daemon. If you are using an NTP client, it makes sense to compare the local time to a trusted source and check that it does not differ too much (e.g. more than a few seconds). You could add tests to your monitoring tool to validate your time configuration on a regular basis.
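
For an NTP client setup, such a comparison could look like the sketch below. It assumes ntpdate is installed and that the output of ntpdate -q (query only, the clock is not changed) contains the offset as “offset <seconds>”; the last occurrence is used. The 2-second limit and the function names are assumptions chosen for illustration.

```shell
#!/bin/sh
# ntpdate_offset: read `ntpdate -q SERVER` output on stdin and print the
# last value that follows the word "offset" (assumed output format).
ntpdate_offset() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "offset") { v = $(i + 1); sub(/,$/, "", v) }
    } END {
        if (v == "") exit 1      # no offset found
        print v
    }'
}

# offset_ok OFFSET LIMIT: succeed when |OFFSET| <= LIMIT (both in seconds)
offset_ok() {
    awk -v off="$1" -v lim="$2" 'BEGIN {
        if (off < 0) off = -off
        if (off <= lim) exit 0
        exit 1
    }'
}
```

Usage in a monitoring script: off=$(ntpdate -q 0.debian.pool.ntp.org | ntpdate_offset) && offset_ok "$off" 2 || echo "time differs by more than 2 s, or offset unknown"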


Although ntpstat shows the status as synchronised to NTP server at stratum 3, timedatectl still shows NTP synchronized: no

Resolution

Workaround
  • Run the ntpd daemon without the -x option

    1. Edit the ntpd configuration file and remove the -x option

      # vi /etc/sysconfig/ntpd

    2. Restart the ntpd.service

      # systemctl restart ntpd.service
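
If you manage several machines, the edit can be scripted. The sketch below removes -x from the OPTIONS line of the file shown above, keeps a .bak copy, and leaves the restart to you. It assumes the quoted OPTIONS="..." format used on RHEL 7; the function name is hypothetical.

```shell
#!/bin/sh
# drop_ntpd_x [FILE]: remove the -x flag from the OPTIONS= line of an ntpd
# sysconfig file (default /etc/sysconfig/ntpd). A copy is kept as FILE.bak.
drop_ntpd_x() {
    f="${1:-/etc/sysconfig/ntpd}"
    cp -p "$f" "$f.bak" || return 1
    # Handle -x in the middle or at the end, at the start, or as the only
    # option. Writing through the redirection keeps the file's owner/mode.
    sed -e '/^OPTIONS=/ s/ -x\([ "]\)/\1/g' \
        -e '/^OPTIONS=/ s/"-x /"/' \
        -e '/^OPTIONS=/ s/"-x"/""/' \
        "$f.bak" > "$f"
}
```

Usage: drop_ntpd_x && systemctl restart ntpd.service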

Root Cause

The kernel maintains an "unsynchronized" flag for the system clock. The timedatectl program will print "NTP synchronized: yes" only if this flag is cleared (set to zero). It doesn't support the protocol which ntpstat uses to query the state of ntpd.

ntpd can control the system clock using two different system functions:
- ntp_adjtime() enables a phase-locked loop implemented in the kernel (aka kernel discipline), which automatically corrects the frequency offset of the clock (drift) and it needs to be called only when a new measurement is made. It clears the "unsynchronized" flag in the kernel. The main limitation is that it cannot correct offsets larger than 0.5 seconds.
- adjtime() is an older method, which makes a one-time adjustment of the clock (slew). It doesn't correct the frequency offset, so it needs to be called frequently to compensate for it, even when no measurement is made. It cannot clear the "unsynchronized" flag in the kernel, but it can correct any offset.

ntpd can use ntp_adjtime() or adjtime(), but not both at the same time. By default it uses ntp_adjtime(). If the step threshold is set to a larger value than 0.5 seconds (e.g. by enabling the -x option), it has to switch to adjtime(), because ntp_adjtime() does not work with larger offsets.

That means the kernel "unsynchronized" status will not be cleared and timedatectl will report "NTP synchronized: no" when ntpd is started with the -x option.
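
The flag in question is the STA_UNSYNC bit (0x0040) of the kernel clock status word returned by adjtimex(2) and ntp_adjtime(). The sketch below tests that bit. Reading the status from the shell assumes the separate adjtimex utility is installed and that its --print output contains a “status:” line; both helper names are hypothetical.

```shell
#!/bin/sh
# STA_UNSYNC (0x0040) in the kernel clock status word means "unsynchronized".
STA_UNSYNC=64

# kernel_unsync STATUS: succeed when the STA_UNSYNC bit is set in STATUS
kernel_unsync() {
    [ "$(( $1 & $STA_UNSYNC ))" -ne 0 ]
}

# adjtimex_status: read `adjtimex --print` output on stdin, print the status
adjtimex_status() {
    awk '$1 == "status:" { print $2 }'
}
```

For example: kernel_unsync "$(adjtimex --print | adjtimex_status)" && echo "kernel clock flagged as unsynchronized". With ntpd -x the flag is expected to stay set, which matches what timedatectl reports.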

How to verify that timedatectl does not show the correct status:

# timedatectl | grep NTP
NTP enabled: yes
NTP synchronized: no
# ntpstat
synchronised to NTP server (10.0.0.1) at stratum 3
time correct to within 1026 ms
Problem:
With ntpd running in "slew" mode (ntpd -x), timedatectl always shows "NTP synchronized: no", while ntpstat correctly shows "synchronised to NTP server (10.11.160.238) at stratum 2".

Version-Release number of selected component (if applicable):
RHEL7.6 
systemd-219-62.el7.x86_64

How reproducible:
remove chrony package, install ntp package, configure ntpd to run in "slew" mode, wait some time to be synchronized, test timedatectl vs. ntpstat


Steps to Reproduce:
1. yum remove chrony
2. yum install ntp
3. vi /etc/sysconfig/ntpd and add -x at the end of the OPTIONS line -> OPTIONS="-g -x" 
4. systemctl enable ntpd.service
5. systemctl start ntpd.service / or / timedatectl set-ntp 1
6. *wait some time to be synchronized*
7. ntpstat
8. timedatectl | grep NTP

Actual results:
# timedatectl | grep NTP
     NTP enabled: yes
NTP synchronized: no

# ntpstat
synchronised to NTP server (10.11.160.238) at stratum 2 
   time correct to within 10 ms
   polling server every 64 s

# ps -wauxxx | grep ntp
ntp       4343  0.0  0.0  29944  2140 ?        Ss   11:33   0:00 /usr/sbin/ntpd -u ntp:ntp -g -x

Expected results:

# timedatectl | grep NTP
     NTP enabled: yes
NTP synchronized: yes

ntpd running with the -x option doesn't tell the kernel that the clock is synchronized, and timedated only uses the information returned by the kernel. It doesn't talk to ntpd or chronyd.

How to check whether NTP is working
  1. ntpq – standard NTP query program
  2. ntpstat – show network time synchronisation status
  3. timedatectl – show or set info about ntp using systemd

Let us look at each command with examples in detail.

Verify whether NTP is working with the ntpstat command

The ntpstat command will report the synchronisation state of the NTP daemon running on the local machine. If the local system is found to be synchronised to a reference time source, ntpstat will report the approximate time accuracy.

Exit status of the ntpstat command

You can use the exit status (return values) to verify its operations from a shell script or command line itself:

  • Exit status 0 – clock is synchronised.
  • Exit status 1 – clock is not synchronised.
  • Exit status 2 – clock state is indeterminate, for example if ntpd is not contactable.

Type the command as follows:
$ ntpstat
Sample outputs:

synchronised to NTP server (149.20.54.20) at stratum 3 
   time correct to within 42 ms
   polling server every 1024 s

Use the echo command to display the exit status of ntpstat:
$ echo $?
Sample outputs:

0
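
In a script or monitoring check, those exit codes can be translated into readable states. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# ntpstat_state CODE: map an ntpstat exit status to a readable state.
ntpstat_state() {
    case "$1" in
        0) echo "synchronised" ;;
        1) echo "not synchronised" ;;
        2) echo "indeterminate (is ntpd running and contactable?)" ;;
        *) echo "unknown exit status: $1" ;;
    esac
}
```

Usage: ntpstat >/dev/null 2>&1; ntpstat_state $?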

Checking the status of NTP with ntpq command

The ntpq utility program is used to monitor NTP daemon ntpd operations and determine performance. The program can be run either in interactive mode or controlled using command line arguments. Type the following command on your Linux or Unix-based system:
$ ntpq -pn
OR
$ ntpq -p
Sample outputs:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*dione.cbane.org 204.123.2.5      2 u  509 1024  377   51.661   -3.343   0.279
+ns1.your-site.c 132.236.56.252   3 u  899 1024  377   48.395    2.047   1.006
+ntp.yoinks.net  129.7.1.66       2 u  930 1024  377    0.693    1.035   0.241
 LOCAL(0)        .LOCL.          10 l   45   64  377    0.000    0.000   0.001

The * marks the source you are synchronized to (the system peer, “syspeer”). The above is an example of a working NTP client, where:

  1. -p : Print a list of the peers known to the server as well as a summary of their state.
  2. -n : Output all host addresses in dotted-quad numeric format rather than converting to the canonical host names.
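
To use this in a check, you can extract the system peer and its offset (in milliseconds) from ntpq output. The sketch assumes the ten-column layout above, with the tally code as the first character of each line; peers marked “o” (PPS) are not handled. The function name is hypothetical.

```shell
#!/bin/sh
# ntpq_syspeer: read `ntpq -p` or `ntpq -pn` output on stdin and print the
# system peer ('*' tally) with its offset in ms; fail if there is none.
ntpq_syspeer() {
    awk 'NR > 2 && substr($0, 1, 1) == "*" {
        print substr($1, 2), $9     # name without tally, offset column
        found = 1
    } END { if (!found) exit 1 }'
}
```

Fed the sample output above, it prints dione.cbane.org -3.343. In a check: ntpq -pn | ntpq_syspeer || echo "no system peer selected"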

Another reliable check is to run the following command:
$ ntpq -c rv
Look for the leap code in the output. Leap code 0 (leap_none) means a normal synchronized state, and leap code 3 (leap_alarm) means NTP was never synchronized.
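
The check can be scripted by looking for those names in the ntpq -c rv output. The sketch below assumes the status line contains leap_none or leap_alarm as described; the other leap values (announced leap seconds) are reported as “other”. The function name is hypothetical.

```shell
#!/bin/sh
# ntpq_leap: read `ntpq -c rv` output on stdin and report the leap state.
# Returns 0 for leap_none (synchronized), 1 for leap_alarm (never
# synchronized) and 2 for anything else.
ntpq_leap() {
    out=$(cat)
    case "$out" in
        *leap_none*)  echo "synchronized (leap_none)";      return 0 ;;
        *leap_alarm*) echo "not synchronized (leap_alarm)"; return 1 ;;
        *)            echo "other";                          return 2 ;;
    esac
}
```

Usage: ntpq -c rv | ntpq_leap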

A note about the timedatectl command

If you are using a systemd-based system, run the following command to check the time synchronization status:
# timedatectl status

systemd-timesyncd configuration

If NTP enabled is set to no, try configuring systemd-timesyncd by editing the /etc/systemd/timesyncd.conf file as follows:
# vi /etc/systemd/timesyncd.conf
Append or edit the [Time] section: add time servers or change the provided ones by uncommenting the relevant line and listing their host names or IP addresses separated by spaces (these are the defaults from my Debian 8.x server):

[Time]
Servers=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org

Save and close the file. Finally, enable and start it, then check the status:
# timedatectl set-ntp true
# timedatectl status

Sample outputs:

                      Local time: Mon 2019-09-30 18:25:38 IST
                  Universal time: Mon 2019-09-30 12:55:38 UTC
                        RTC time: Mon 2019-09-30 12:55:38
                       Time zone: Asia/Kolkata (IST, +0530)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no

The above is an easy way to verify that NTP is working on Linux.
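
To use the same information in a script, the relevant line can be parsed. The label differs between systemd versions: the RHEL 7 output earlier in this post uses “NTP synchronized:”, while the sample above uses “System clock synchronized:”. The sketch accepts both; the function name is hypothetical. Keep in mind that this only reflects the kernel flag, not what ntpd or chronyd report.

```shell
#!/bin/sh
# timedatectl_synced: read `timedatectl status` output on stdin.
# Exit 0 if the clock is reported as synchronized, 1 if not, 2 if the
# line is missing. Older systemd prints "NTP synchronized:", newer
# versions "System clock synchronized:".
timedatectl_synced() {
    awk -F': *' '/(NTP|System clock) synchronized:/ { v = $2; found = 1 }
        END {
            if (!found) exit 2
            if (v == "yes") exit 0
            exit 1
        }'
}
```

Usage: timedatectl status | timedatectl_synced && echo "synchronized"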

The only time chrony sets the time

When the chrony service starts, a setting in the /etc/chrony/chrony.conf file tells it to actually set (step) the time if specific conditions occur:

# Force system clock correction at boot time.
makestep 1000 10

This means that if chrony detects, during the first 10 clock updates after it starts, that the time is off by more than 1000 seconds, it will step the clock.
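
The rule can be written down as a small function, which is handy for reasoning about what chrony will do after a reboot. It follows the makestep semantics described in the chrony documentation (step when the adjustment exceeds the threshold, but only within the first LIMIT clock updates, with a negative limit meaning no limit); the function name and argument order are hypothetical.

```shell
#!/bin/sh
# chrony_steps OFFSET UPDATE [THRESHOLD] [LIMIT]: model of
# "makestep THRESHOLD LIMIT" (defaults 1000 and 10, as in the example).
# Prints "step" if the clock would be stepped at this clock update
# (numbered from 1 after chronyd starts), otherwise "slew".
chrony_steps() {
    awk -v off="$1" -v n="$2" -v thr="${3:-1000}" -v lim="${4:-10}" 'BEGIN {
        if (off < 0) off = -off
        if (off > thr && (lim < 0 || n <= lim)) print "step"
        else print "slew"
    }'
}
```

chrony_steps 1500 3 prints step, while chrony_steps 1500 11 prints slew because the 10-update window has passed.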

Some useful commands

Below are some useful commands which can be used for the troubleshooting of chrony related issues.

# chronyc tracking  
# chronyc sources
# chronyc sourcestats
# systemctl status chronyd
# chronyc activity
# timedatectl

Check chronyd status

To check the status of the chronyd daemon:

# systemctl status -l chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-08-12 13:22:22 IST; 1s ago
  Process: 33263 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 33259 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 33261 (chronyd)
   CGroup: /system.slice/chronyd.service
           └─33261 /usr/sbin/chronyd

Aug 12 13:22:22 NVMBD1S11BKPMED03 systemd[1]: Starting NTP client/server...
Aug 12 13:22:22 NVMBD1S11BKPMED03 chronyd[33261]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
Aug 12 13:22:22 NVMBD1S11BKPMED03 chronyd[33261]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Aug 12 13:22:22 NVMBD1S11BKPMED03 systemd[1]: Started NTP client/server.

The chronyc sources command

Running chronyc sources -v shows the current state of the NTP servers configured on the system. Here is an example output, in which ntp.example.com shows as a valid server that is online:

# chronyc sources -v
210 Number of sources = 1

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = OK for sync, '?' = unreachable,
| /                'x' = time may be in error, '~' = time is too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||                                                /   xxxx = adjusted offset,
||         Log2(Polling interval) -.             |    yyyy = measured offset,
||                                              |    zzzz = estimated error.
||                                   |           |                         
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^* ntp.example.com          3    6     40    +31us[  -98us] +/-  118ms

Note that a Source state different than ‘*’ usually indicates a problem with the NTP server.

Source state ‘~’ means that the time is too variable
If the Source state is ‘~’, the server is probably accessible but its time is too variable. This can happen if the server responds too slowly, or sometimes slower and sometimes faster. You could check the response times of pings to the server to see whether they are slow or variable. This state has also been seen when the server runs on virtual machines that are too slow, causing timing issues.
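
To list problem sources in a script, you can look at the second character of the first column of the chronyc sources output. The sketch below prints the state and name of every source that is not ‘*’ or ‘+’. It assumes the column layout shown above; newer chrony versions also use ‘-’ (not combined), which this sketch reports too. The function name is hypothetical.

```shell
#!/bin/sh
# chrony_problem_sources: read `chronyc sources` (or `chronyc sources -v`)
# output on stdin and print "<state> <name>" for every source whose state
# is not '*' (synced) or '+' (OK for sync).
chrony_problem_sources() {
    awk '$1 ~ /^[=#^][*+?x~-]$/ {          # mode char + state char
        state = substr($1, 2, 1)
        if (state != "*" && state != "+") print state, $2
    }'
}
```

Usage: chronyc sources | chrony_problem_sources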

Chrony check and restart every hour

Once an hour, a cron job checks the output of the chronyc sources -v command by running the script /usr/sbin/palladion_chrony_healthcheck, which runs /usr/sbin/palladion_check_chrony and checks its output:

  • if /usr/sbin/palladion_check_chrony returns 1 – there was no online source (no source with Source state = ‘*’), so chrony is restarted in an attempt to re-initialize the server status
  • if /usr/sbin/palladion_check_chrony returns 0 – everything is OK; chrony does not need to be restarted because it already has a valid online source

# cat /etc/cron.d/chrony
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
#
# Check chrony every hour and restart if necessary.
#
16 * * * *     root    /usr/sbin/palladion_chrony_healthcheck
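
The palladion scripts themselves are not shown here. As a hypothetical sketch of the same idea, a check could test for a ‘*’ source and restart chronyd otherwise. The function names are made up, and the restart assumes the chronyd systemd unit shown earlier.

```shell
#!/bin/sh
# chrony_has_synced_source: read `chronyc sources` output on stdin and
# succeed only if some source has state '*' (currently synced).
chrony_has_synced_source() {
    awk '$1 ~ /^[=#^]\*$/ { found = 1 } END { if (!found) exit 1 }'
}

# chrony_healthcheck: hypothetical hourly check; restart chronyd when no
# source is synced, logging the action to syslog.
chrony_healthcheck() {
    if chronyc sources | chrony_has_synced_source; then
        return 0
    fi
    logger -t chrony-healthcheck "no synced source, restarting chronyd"
    systemctl restart chronyd
}
```

A cron entry like the one above could call a script that runs chrony_healthcheck.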

Chrony logs

There are several chrony logs that can be used for troubleshooting. Most of them are located in /var/log/chrony/. Note that the most recent file is not always the *.log one; sometimes the *.log.2 or *.log.3 files are the most recent. Here is an example of listing the files sorted by modification time, newest first:

# ls -lisaht  /var/log/chrony/
total 1.5M
3801115 580K -rw-r--r--  1 root root 574K Oct 21 14:56 measurements.log.3
3801131 544K -rw-r--r--  1 root root 540K Oct 21 14:56 statistics.log.3
3801166 356K -rw-r--r--  1 root root 350K Oct 21 14:56 tracking.log.3
3801089 4.0K drwxr-xr-x 16 root root 4.0K Oct 21 00:01 ..
3801114 4.0K drwxr-xr-x  2 root root 4.0K Oct 21 00:01 .
3801128    0 -rw-r--r--  1 root root    0 Oct 21 00:01 tracking.log
3801110    0 -rw-r--r--  1 root root    0 Oct 21 00:01 measurements.log
3801120    0 -rw-r--r--  1 root root    0 Oct 21 00:01 statistics.log
3801167    0 -rw-r--r--  1 root root    0 Oct 20 00:01 tracking.log.1
3801165    0 -rw-r--r--  1 root root    0 Oct 20 00:01 statistics.log.1
3801159    0 -rw-r--r--  1 root root    0 Oct 20 00:01 measurements.log.1
............
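
Because the newest data may sit in a rotated file, a small helper can pick the most recently modified file that is not empty. This sketch assumes file names without spaces, as in the listing above; the function name is hypothetical.

```shell
#!/bin/sh
# latest_chrony_log [DIR]: print the most recently modified non-empty file
# in DIR (default /var/log/chrony). Assumes names without whitespace.
latest_chrony_log() {
    dir="${1:-/var/log/chrony}"
    for f in $(ls -t "$dir"); do          # newest first
        if [ -f "$dir/$f" ] && [ -s "$dir/$f" ]; then
            echo "$dir/$f"
            return 0
        fi
    done
    return 1                              # nothing non-empty found
}
```

For example: tail -n 50 "$(latest_chrony_log)"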

Try setting only one NTP server by entering its IP address

If until now you have been using two or more NTP servers (either because several were configured or because you entered an FQDN that resolves to different IP addresses), try setting a single NTP server by entering only one IP address. This may solve your NTP-related issue.

Tracing the communication with the NTP server

To double-check whether the NTP server is answering, you can trace the traffic between chrony and the NTP server for a period of time while monitoring the server:
1. Start a pcap trace with tcpdump on NTP port 123 and leave it running until the issue appears (run it in ‘screen’ or with ‘nohup’ so that it is not stopped when you disconnect from the shell).
2. As soon as the issue reappears, collect a System Diagnostics report covering the entire period from when you set the server to a DNS name until the gap reoccurred. If this produces a file that is too big, collect the System Diagnostics for current data only, and additionally copy all files from /var/log/chrony/ and all /var/log/syslog* files. Remember to stop the trace you started in step 1.
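
Step 1 could look like the sketch below. It assumes tcpdump is installed and is run as root; the capture file path is only an example, and setting DRY_RUN=1 prints the command instead of starting it. The function name is hypothetical.

```shell
#!/bin/sh
# start_ntp_trace [PCAP]: capture NTP traffic (UDP port 123) in the
# background with nohup so it survives a logout. Needs root and tcpdump.
# With DRY_RUN=1 the command is only printed.
start_ntp_trace() {
    pcap="${1:-/var/tmp/ntp-trace.pcap}"    # example path
    cmd="tcpdump -i any -n -w $pcap udp port 123"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
        return 0
    fi
    # $cmd is intentionally unquoted so it splits into arguments
    nohup $cmd >/dev/null 2>&1 &
    echo "tcpdump started with PID $!; stop it with: kill $!"
}
```

Stop the capture with the printed kill command once the issue has reappeared, and include the pcap file with the diagnostics.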


Commands list:

timedatectl
systemctl start ntpd
systemctl enable ntpd
systemctl disable chronyd
timedatectl set-ntp on
timedatectl set-ntp true
systemctl restart ntpd.service
systemctl enable ntpd.service
chkconfig ntpd on
timedatectl set-ntp 1
vi /etc/sysconfig/ntpd
ntpstat
ntpdate ntp.1.com
ps -aux | grep ntp
ntp       4343  0.0  0.0  29944  2140 ?        Ss   11:33   0:00 /usr/sbin/ntpd -u ntp:ntp -g -x
timedatectl | grep NTP
