Discussion:
[Opendnssec-user] "not serving soa" warning message
Havard Eidnes
2016-10-25 12:16:54 UTC
Permalink
Hi,

since I've had to restart OpenDNSSEC a few times lately due to
the bug in the signer when a zone is removed, I've taken a look
at the logs (I know, bad idea...), and I noticed a warning
message from the axfr component in the signer saying "zone xxx
expired at nnnnnnn, and it is now mmmmmmmm: not serving soa".

I thought "what?!?", and had to dig a bit deeper.

I was curious as to the values which went into computing the
"expire" value, so amended the log message in the obvious manner,
and restarted OpenDNSSEC with this in place, and I get:

Oct 25 13:51:55 hugin ods-signerd: [axfr] zone 3.1.2.1.5.7.7.4.nrenum.net expired at 1475733328, and it is now 1477396315: not serving soa (serial_xfr_acquired=1473918928, expire=1814400)
Oct 25 13:51:55 hugin ods-signerd: [axfr] zone 3.1.2.1.5.7.7.4.nrenum.net expired at 1475733328, and it is now 1477396315: not serving soa (serial_xfr_acquired=1473918928, expire=1814400)
Oct 25 13:53:17 hugin ods-signerd: [axfr] zone 254.39.128.in-addr.arpa expired at 1474532011, and it is now 1477396397: not serving soa (serial_xfr_acquired=1473927211, expire=604800)
Oct 25 13:53:17 hugin ods-signerd: [axfr] zone 254.39.128.in-addr.arpa expired at 1474532011, and it is now 1477396397: not serving soa (serial_xfr_acquired=1473927211, expire=604800)

The value for serial_xfr_acquired for the nrenum.net zone I find
in the xfrd-state file for the zone:

hugin: {11} cat 3.1.2.1.5.7.7.4.nrenum.net.xfrd-state
;OpenDNSSEC-backup-v3
;;Zone: name 3.1.2.1.5.7.7.4.nrenum.net ttl 3600 mname ns.uninett.no. rname hostmaster.uninett.no. serial 2015120315 refresh 28800 retry 3600 expire 1814400 minimum 86400
;;Master: num 0 next -1 round -1 timeout 1477416446
;;Serial: xfr 2015120315 1473918928 notify 0 0 disk 2015120315 1473918928
;OpenDNSSEC-backup-v3
hugin: {12}

That timestamp is a while in the past:

hugin: {12} date -r 1473918928
Thu Sep 15 07:55:28 CEST 2016
hugin: {13}

The hidden master upstream of OpenDNSSEC, however, emits this SOA
record:

3.1.2.1.5.7.7.4.nrenum.net. 3600 IN SOA ns.uninett.no. hostmaster.uninett.no. 2016102503 28800 3600 1814400 86400

which has a *different* SOA version number from what's recorded
in the xfrd-state file.

The .backup2 file for the zone also says something different
about the incoming SOA:

;OpenDNSSEC-backup-v3
;;Time: 1477399495
;;Zone: name 3.1.2.1.5.7.7.4.nrenum.net class 1 inbound 2016102503 internal 2016102509 outbound 2016102509
;;Signconf: lastmod 1476954235 maxzonettl 0 resign PT7200S refresh PT777600S valid PT1814400S denial PT1814400S jitter PT43200S offset PT3600S nsec 50 dnskeyttl PT3600S soattl PT3600S soamin PT900S serial datecounter

The "lastmod" points to "Thu Oct 20 11:03:55 CEST 2016" which is
much closer in time, and is well within the expire window:

: smistad {28} ; dc
1476954235 1814400+p
1478768635
!date -r 1478768635
Thu Nov 10 10:03:55 CET 2016

So... It seems to me that whatever code is supposed to keep the
xfrd-state information up to date isn't doing its job so that the
information is preserved across OpenDNSSEC restarts. Either
that, or the use of the information from the xfrd-state file is
incorrect.

And ... the two zones above are far from the only ones which are
affected by this problem.

Now, OpenDNSSEC itself may eventually come around to querying the
upstream hidden master for the SOA for the zones it serve, and do
a re-transfer if required, and in turn update its internal state,
and start serving the zone, hopefully preventing the downstream
public distribution master from expiring the zones.

However, it seems to me that the current code which is supposed
to "restore my state from where I left off" leaves ... a bit to
be desired -- OpenDNSSEC ought to be able to start serving the
zones it has cached from an earlier instance instead of starting
up by refusing SOA queries for the zones it serves.

Comments?

Regards,

- Håvard
Yuri Schaeffer
2016-10-28 08:15:30 UTC
Permalink
Hi Havard,

Thanks for the extensive report. I created an issue
https://issues.opendnssec.org/browse/OPENDNSSEC-853 which summarizes the
problem like this:

serial_xfr_acquired time in the xfrd state file is not updated properly.
This may cause an issue on restart if serial_xfr_acquired+expire < now.
The zone is then not served (despite having had a recent incoming XFR
and up to date SOA). After next XFR the zone will be served again.

We'll look in to it.

//Yuri
Post by Havard Eidnes
since I've had to restart OpenDNSSEC a few times lately due to
the bug in the signer when a zone is removed, I've taken a look
at the logs (I know, bad idea...), and I noticed a warning
message from the axfr component in the signer saying "zone xxx
expired at nnnnnnn, and it is now mmmmmmmm: not serving soa".
Havard Eidnes
2016-10-28 08:23:36 UTC
Permalink
Post by Yuri Schaeffer
Thanks for the extensive report. I created an issue
https://issues.opendnssec.org/browse/OPENDNSSEC-853 which summarizes the
serial_xfr_acquired time in the xfrd state file is not updated properly.
This may cause an issue on restart if serial_xfr_acquired+expire < now.
The zone is then not served (despite having had a recent incoming XFR
and up to date SOA). After next XFR the zone will be served again.
We'll look in to it.
Thanks!

I suspect that OpenDNSSEC itself is periodically doing a
"refresh" (outgoing SOA query towards the hidden master) for the
zones it serves, and after a while it will have completed this
for all the zones.

I observe that the "not serving soa" messages stopped after a
number of hours (I restarted OpenDNSSEC around 13:00 and the last
"not serving soa" message was logged 23:21 the same day), so it
looks like the downstream distribution master is in reality not
in danger of expiring the zones -- I'm assuming that once
OpenDNSSEC has done its refresh, it has started serving the SOA
records downstream as well.

Regards,

- Håvard

Loading...