[Opendnssec-user] ods2 AXFR request to nameserver fails, reports "bad packet: ... received error code NOTAUTH", but no traffic (tcpdump) seen?

Discussion:

PGNet Dev

2016-12-25 20:11:47 UTC

I'm running ods2, setting up for AXFR zone transfer from a Bind9 instance.

The bind9 server listens at

telnet 127.0.0.1 53
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

From shell on the same box, a cmd-line transfer request

dig -b 127.0.0.1 axfr example.com @127.0.0.1

correctly returns

; <<>> DiG 9.11.0-P1 <<>> -b 127.0.0.1 axfr example.com @127.0.0.1
;; global options: +cmd
example.com. 5 IN SOA dns.example.com. adm.example.com. 1482370103 7200 1800 604800 5
...

and in my bind9 xfer logs, set to debug loglevel

...
category xfer-in { loglevel_debug; };
category xfer-out { loglevel_debug; };
category notify { loglevel_debug; };
category network { loglevel_debug; };
...

i see the ok start/end of the xfer,
...
Dec 25 11:44:11 dns named[28511]: 25-Dec-2016 11:44:11.600 xfer-out: info: client @0x7fb168074aa0 127.0.0.1#56479 (example.com): view internal: transfer of 'example.com/IN': AXFR started (serial 1482370103)
Dec 25 11:44:11 dns named[28511]: 25-Dec-2016 11:44:11.601 xfer-out: info: client @0x7fb168074aa0 127.0.0.1#56479 (example.com): view internal: transfer of 'example.com/IN': AXFR ended
...

and watching

tcpdump -i lo port 53

I see the full transaction traffic.

In opendnssec's addns.xml, I've config'd,

<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
  <DNS>
    <TSIG>
      <Name>ods-key</Name>
      <Algorithm>hmac-sha256</Algorithm>
      <Secret>xxx...xxx</Secret>
    </TSIG>
    <Inbound>
      <RequestTransfer>
        <Remote>
          <Address>127.0.0.1</Address>
          <Port>53</Port>
          <Key>ods-key</Key>
        </Remote>
      </RequestTransfer>
    </Inbound>
  </DNS>
</Adapter>

When I exec

/usr/local/opendnssec/sbin/ods-enforcer zone add \
--zone example.com \
--policy lab \
--in-type DNS \
--input /usr/local/etc/opendnssec/addns.xml

The axfr is attempted, but fails,

...
Dec 25 11:41:10 dns ods-enforcerd: [zone_add_cmd] zone example.com added [policy: lab]
Dec 25 11:41:10 dns ods-enforcerd: INFO: The XML in /var/opendnssec/enforcer/zones.xml is valid
Dec 25 11:41:10 dns ods-enforcerd: INFO: The XML in /var/opendnssec/enforcer/zones.xml.update is valid
Dec 25 11:41:10 dns ods-enforcerd: [zone_add_cmd] internal zonelist updated successfully
Dec 25 11:41:10 dns ods-enforcerd: 1 zone(s) found on policy "lab"
Dec 25 11:41:10 dns ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 1 zones covering 86400 seconds, generating 1 keys for policy lab
Dec 25 11:41:10 dns ods-enforcerd: 1 new KSK(s) (256 bits) need to be created.
Dec 25 11:41:11 dns ods-enforcerd: 1 zone(s) found on policy "lab"
Dec 25 11:41:11 dns ods-enforcerd: [hsm_key_factory_generate] 6 keys needed for 1 zones covering 86400 seconds, generating 6 keys for policy lab
Dec 25 11:41:11 dns ods-enforcerd: 6 new ZSK(s) (256 bits) need to be created.
Dec 25 11:41:13 dns ods-enforcerd: [enforcer] update zone: example.com
Dec 25 11:41:15 dns ods-enforcerd: 1 zone(s) found on policy "lab"
Dec 25 11:41:15 dns ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 1 zones covering 86400 seconds, generating 1 keys for policy lab
Dec 25 11:41:15 dns ods-enforcerd: 1 new KSK(s) (256 bits) need to be created.
Dec 25 11:41:15 dns ods-enforcerd: 1 zone(s) found on policy "lab"
Dec 25 11:41:15 dns ods-enforcerd: [hsm_key_factory_generate] 6 keys needed for 1 zones covering 86400 seconds, generating 1 keys for policy lab
Dec 25 11:41:15 dns ods-enforcerd: 1 new ZSK(s) (256 bits) need to be created.
Dec 25 11:41:16 dns ods-enforcerd: [signconf_cmd] performing signconf for zone example.com
Dec 25 11:41:16 dns ods-enforcerd: [signconf_cmd] signconf done for zone example.com, notifying signer
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com request axfr to 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received error code NOTAUTH from 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com, from 127.0.0.1 has tsig error (Bad Key)
Dec 25 11:41:16 dns ods-signerd: [xfrd] unable to process tsig: xfr zone example.com from 127.0.0.1 has bad tsig signature
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received bad tsig from 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com request axfr to 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received error code NOTAUTH from 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com, from 127.0.0.1 has tsig error (Bad Key)
Dec 25 11:41:16 dns ods-signerd: [xfrd] unable to process tsig: xfr zone example.com from 127.0.0.1 has bad tsig signature
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received bad tsig from 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com request axfr to 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received error code NOTAUTH from 127.0.0.1
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com, from 127.0.0.1 has tsig error (Bad Key)
Dec 25 11:41:16 dns ods-signerd: [xfrd] unable to process tsig: xfr zone example.com from 127.0.0.1 has bad tsig signature
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received bad tsig from 127.0.0.1
Dec 25 11:41:24 dns ods-signerd: [tools] unable to read zone example.com: adapter failed (Incoming zone transfer not ready)
Dec 25 11:41:24 dns ods-signerd: back-off task [read] for zone example.com with 60 seconds

and there's no trace of it in the Bind9 xfer logs ...

nor any output at all at

tcpdump -i lo port 53

i.e., it _appears_ as if no request is actually initiated/sent.

Is there additional config needed? Or is this a known bug? Or something else entirely ... ?

Havard Eidnes

2016-12-25 22:41:30 UTC

Post by PGNet Dev
From shell on the same box, a cmd-line transfer request

This one doesn't use TSIG. If it did, you'd be using the -y option.

Post by PGNet Dev
In opendnssec's addns.xml, I've config'd,
<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
<DNS>
<TSIG>
<Name>ods-key</Name>
<Algorithm>hmac-sha256</Algorithm>
<Secret>xxx...xxx</Secret>
</TSIG>

You've configured OpenDNSSEC to use TSIG. You then need to make
the corresponding configuration on your BIND name server to
recognize that key, in the form

key ods-key {
    algorithm hmac-sha256;
    secret "xxx...xxx";
};

Post by PGNet Dev
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received error code NOTAUTH from 127.0.0.1

This is a hint.

Post by PGNet Dev
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com, from 127.0.0.1 has tsig error (Bad Key)

And this is the smoking gun.

Regards,

- Håvard
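
Editorial note: both sides of a TSIG setup have to agree on key name, algorithm, and secret. The following is a minimal Python sketch, not taken from the thread, that renders the BIND key clause and the addns.xml TSIG element from one generated secret so the two cannot drift apart. The helper names new_secret, bind_key_clause, and ods_tsig_element are made up for illustration.

```python
import base64
import secrets


def new_secret(num_bytes=32):
    """Return a random base64-encoded secret (32 bytes suits hmac-sha256)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def bind_key_clause(name, algorithm, secret):
    """Render a BIND 'key' clause in the form quoted in this thread."""
    return (
        f"key {name} {{\n"
        f"    algorithm {algorithm};\n"
        f'    secret "{secret}";\n'
        f"}};"
    )


def ods_tsig_element(name, algorithm, secret):
    """Render the matching <TSIG> element for OpenDNSSEC's addns.xml."""
    return (
        "<TSIG>\n"
        f"    <Name>{name}</Name>\n"
        f"    <Algorithm>{algorithm}</Algorithm>\n"
        f"    <Secret>{secret}</Secret>\n"
        "</TSIG>"
    )
```

Calling new_secret() once and passing the result, together with "ods-key" and "hmac-sha256", to both render functions yields two snippets that carry the same secret.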

PGNet Dev

2016-12-25 23:51:13 UTC

Post by Havard Eidnes

Post by PGNet Dev
From shell on the same box, a cmd-line transfer request

This one doesn't use TSIG. If it did, you'd be using the -y option.

Actually, that was simply to verify AXFR transferability & connection ...

From cmd line @ shell,

dig -b 127.0.0.1 axfr example.com @127.0.0.1 -y hmac-sha256:ods-key:xxx...xxx

works as well.

AND, I see the traffic in tcpdump, as above.

Post by Havard Eidnes
You've configured OpenDNSSEC to use TSIG. You then need to make
the corresponding configuration on your BIND name server to
recognize that key, in the form
key ods-key {
algorithm hmac-sha256;
secret "xxx...xxx";
};

Yes, and it's included. I transfer to/from other nameservers, using other keys, with no issue.

Post by Havard Eidnes

Post by PGNet Dev
Dec 25 11:41:16 dns ods-signerd: [xfrd] bad packet: zone example.com received error code NOTAUTH from 127.0.0.1

This is a hint.

Post by PGNet Dev
Dec 25 11:41:16 dns ods-signerd: [xfrd] zone example.com, from 127.0.0.1 has tsig error (Bad Key)

And this is the smoking gun.

I'm not convinced that it is.

I'd expect SOME traffic to show up in tcpdump in the ods2 case, EVEN IF it's NOTAUTH'd. Unfortunately, there is none.

Unless there's a reason I've missed/misunderstood why traffic WOULD show up when invoking AXFR from the cmd line, but not when invoked by ODS2 ...

PGNet Dev

2016-12-26 20:59:13 UTC

Post by Havard Eidnes
Then I'm out of good and obvious suggestions, I'm afraid.
Unless, of course, 127.0.0.1 is what's really listed in the
config file, of course :) But I guess that's a silly suggestion,
but sometimes the silly even applies.

More often than not! :-)

In any case, I'm finding all sorts of little gotchas as I exercise
different pieces of this.

ODS clearly works for others -- although I'm not entirely certain that
I've seen a working config with my specific version-stack yet -- so I
suspect that there are some issues in my config.

That said, there are also some challenges getting detailed-enough logging
info out of ODS2 itself to troubleshoot. Docs for ODS2 don't seem to
have caught up to the release yet, either.

Today's latest for me, https://issues.opendnssec.org/browse/SUPPORT-206,
also suggests there may be some code issues.

I'm poking at it now, with tools from the outside, and trial-n-error.

I'll see how far I get reassembling the pieces ;-)

Yuri Schaeffer

2016-12-26 21:47:17 UTC

Post by PGNet Dev
Today's latest for me, https://issues.opendnssec.org/browse/SUPPORT-206,
also suggests there may be some code issues.
I'm poking at it now, with tools from the outside, and trial-n-error.

I'm not in the position to dive in to the code right now. But I might
have a hunch which might help you debug. It sounds like from what I
gather from your reports ODS has trouble selecting the right outgoing
interface (That's why it doesn't show up dumping lo, and that's why
sendto says invalid arguments).

Please take a look at the Signer/listener section in conf.xml and check
which interfaces you have configured. There has been some 'gotchas' in
the past in having multiple interfaces where the OS would select the
wrong outgoing interface if more than 1 had a route to the destination.
Resulting in the wrong source address on the outgoing packet. Maybe one
of our fixes has bitten you?

//Yuri
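
Editorial note: a quick way to see which source address the operating system itself would pick for a given destination is to connect() an unbound UDP socket and read back its local address. The Python sketch below is not from the thread; source_address_for is a hypothetical helper name. It shows only the kernel's default choice and says nothing about what ods-signerd does internally.

```python
import socket


def source_address_for(dest_ip, dest_port=53):
    """Return the local IPv4 address the OS would use to reach dest_ip.

    connect() on a UDP socket sends no packet; it only makes the kernel
    perform the route lookup and fix the socket's local address.
    Raises OSError if there is no usable route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_ip, dest_port))
        return sock.getsockname()[0]
```

Running source_address_for("10.2.2.53") on the signer box and comparing the result with the addresses listed under Signer/Listener in conf.xml shows whether the configured listener address matches the one the routing table would select.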

PGNet Dev

2016-12-26 22:29:32 UTC

Post by Yuri Schaeffer
I'm not in the position to dive in to the code right now. But I might
have a hunch which might help you debug. It sounds like from what I
gather from your reports ODS has trouble selecting the right outgoing
interface (That's why it doesn't show up dumping lo, and that's why
sendto says invalid arguments).
Please take a look at the Signer/listener section in conf.xml and check
which interfaces you have configured. There has been some 'gotchas' in
the past in having multiple interfaces where the OS would select the
wrong outgoing interface if more than 1 had a route to the destination.
Resulting in the wrong source address on the outgoing packet. Maybe one
of our fixes has bitten you?

Perhaps ... I'd been looking at the bound src addresses, or trying to, until I got side tracked by that^ error-logging bug ...

In my latest/current stab at this, I've two physical boxes:

(1) bind9 (hidden primary)
listens on 10.1.1.53:53, 127.0.0.1:53
ods2
currently configured to listen on two interfaces (I've also tried with just one ...), port 15354

cat conf.xml
...
<Signer>
  <Listener>
    <Interface>
      <Address>127.0.0.1</Address>
      <Port>15354</Port>
    </Interface>
    <Interface>
      <Address>10.1.1.53</Address>
      <Port>15354</Port>
    </Interface>
  </Listener>
  <Privileges>
    <User>opendnssec</User>
    <Group>opendnssec</Group>
  </Privileges>
  <WorkingDirectory>/var/opendnssec/signer</WorkingDirectory>
  <WorkerThreads>4</WorkerThreads>
</Signer>
...

(2) nsd4 (secondary)
listens on 10.2.2.53:53

comms 'tween the two are over a VPN link. without ods2, it's worked this way for ages.

bind9 comms via AXFR+NOTIFY to the nsd4 secondary, etc.

firewall/routes are setup so that from the primary-box to the secondary-box,

telnet 10.2.2.53 53
Trying 10.2.2.53...
Connected to 10.2.2.53.
Escape character is '^]'.

and in the other direction, from the secondary to the primary

telnet 10.1.1.53 15354
Trying 10.1.1.53...
Connected to 10.1.1.53.
Escape character is '^]'.

I'm changing stuff all over the place atm, trying to figure out what's happening, or not :-/ So certainly open to any suggestions re: config.

Also, I'm trying to prove to myself (1) that the bug report is real, and (2) whether it only affects LOGGING or is hiding an actual UDP packet-assembly/content problem.

Havard Eidnes

2016-12-26 23:17:53 UTC

Post by PGNet Dev
ods2
currently configured to listen on two interfaces
(I've also tried with just one ...), port 15354

Even though XML invites list operations, not every construct
appears to be backed by code that handles a multi-valued /
list value... But you say you've also tried with one, so maybe
that suggestion doesn't apply. I'd probably go back to trying
just one interface.

Regards,

- Håvard

PGNet Dev

2016-12-26 23:41:44 UTC

Post by Havard Eidnes

Post by PGNet Dev
ods2
currently configured to listen on two interfaces
(I've also tried with just one ...), port 15354

Which I'm doing atm -- still with no luck.

Fwiw, with BOTH listener IPs defined in ods2's conf.xml, netstat tells me both are, in fact, in use:

netstat -npla|grep :15354
tcp 0 0 10.1.1.53:15354 0.0.0.0:* LISTEN 11372/ods-signerd
tcp 0 0 127.0.0.1:15354 0.0.0.0:* LISTEN 11372/ods-signerd
udp 0 0 10.1.1.53:15354 0.0.0.0:* 11372/ods-signerd
udp 0 0 127.0.0.1:15354 0.0.0.0:* 11372/ods-signerd

What's DONE with them is of course a different matter ....

PGNet Dev

2016-12-27 00:52:35 UTC

I'm losing track of all my own attempts :-/ So a quick summary:

I've set up my bind9 server to listen on 10.1.1.53 & 127.0.0.1.

transfers are ONLY allowed with TSIG. testing from shell on the same box,

dig -b 127.0.0.1 axfr example.com @127.0.0.1
; <<>> DiG 9.11.0-P1 <<>> -b 127.0.0.1 axfr example.com @127.0.0.1
;; global options: +cmd
; Transfer failed.

dig -b 127.0.0.1 axfr example.com @127.0.0.1 -k /usr/local/etc/named/keys/ods.key
; <<>> DiG 9.11.0-P1 <<>> -b 127.0.0.1 axfr example.com @127.0.0.1 -k /usr/local/etc/named/keys/ods.key
...
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Dec 26 16:24:32 PST 2016
;; XFR size: 19 records (messages 1, bytes 1902)

since ODS2 is on the same box, it should be communicating for axfr only on localhost. it is, with config

cat conf.xml
...
<Signer>
<Listener>
<Interface>
<Address>127.0.0.1</Address>
<Port>15354</Port>
</Interface>
<Interface>
<Address>10.1.1.53</Address>
<Port>15354</Port>
</Interface>
</Listener>
<Privileges>
...

and

cat addns.xml
<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
  <DNS>
    <TSIG>
      <Name>ods-key</Name>
      <Algorithm>hmac-sha256</Algorithm>
      <Secret>xxx...xxx</Secret>
    </TSIG>
    <Inbound>
      <RequestTransfer>
        <Remote>
          <Address>127.0.0.1</Address>
          <Port>53</Port>
          <Key>ods-key</Key>
        </Remote>
      </RequestTransfer>
      <AllowNotify>
        <Peer>
          <Prefix>127.0.0.1</Prefix>
          <Key>ods-key</Key>
        </Peer>
      </AllowNotify>
    </Inbound>
    ...
  </DNS>
</Adapter>
...

signerd listens as configured

netstat -npla|grep :15354
tcp 0 0 10.1.1.53:15354 0.0.0.0:* LISTEN 14482/ods-signerd
tcp 0 0 127.0.0.1:15354 0.0.0.0:* LISTEN 14482/ods-signerd
udp 0 0 10.1.1.53:15354 0.0.0.0:* 14482/ods-signerd
udp 0 0 127.0.0.1:15354 0.0.0.0:* 14482/ods-signerd

and axfr from bind works as expected

/usr/local/opendnssec/sbin/ods-signer retransfer example.com
Zone example.com being re-transfered.

tail -f opendnssec.log
...
Dec 26 16:32:23 dns ods-signerd: [xfrd] zone example.com request axfr to 127.0.0.1
Dec 26 16:32:23 dns ods-signerd: [xfrd] zone example.com transfer done [notify acquired 0, serial on disk 1482770644, notify serial 0]

at this point, if --out-type == file, the zone's signed to

/var/opendnssec/signed/example.com

notify mail, containing the new key, is sent/received correctly via a "<DelegationSignerSubmitCommand>" script, and we're done.

But, if --out-type == DNS, with add'l config

cat addns.xml
<Adapter>
  <DNS>
    ...
    <Outbound>
      <ProvideTransfer>
        <Peer>
          <Prefix>10.2.2.53</Prefix>
          <Key>ods-key</Key>
        </Peer>
      </ProvideTransfer>
      <Notify>
        <Remote>
          <Address>10.2.2.53</Address>
          <Port>53</Port>
        </Remote>
      </Notify>
    </Outbound>
  </DNS>
</Adapter>

signing fails

tail -f opendnssec.log
...
Dec 26 16:32:26 dns ods-signerd: [notify] unable to send data over udp to 10.2.2.53: sendto() failed (Invalid argument)
Dec 26 16:32:26 dns ods-signerd: [notify] unable to send notify retry 1 for zone example.com to 10.2.2.53: notify_send_udp() failed

which leads to this bug report

"error logging for failed ods-signer remote NOTIFY reports only "sendto() failed (Invalid argument)", no additional detail"
https://issues.opendnssec.org/browse/SUPPORT-206

I think 1st order of biz is to fix the "Invalid argument" as in the bug, and find out what the sendto() error *is* ...

PGNet Dev

2016-12-27 01:21:20 UTC

a bit more for completeness ...

initiating UDP traffic from the ods box's shell, a query to nsd4 listening at 10.2.2.53 -- specified in ods addns.xml as the notify target, (noting that recursion's not allowed -- just watching traffic),

dig google.com @10.2.2.53
; <<>> DiG 9.11.0-P1 <<>> google.com @10.2.2.53
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 399
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A

;; Query time: 43 msec
;; SERVER: 10.2.2.53#53(10.2.2.53)
;; WHEN: Mon Dec 26 17:10:23 PST 2016
;; MSG SIZE rcvd: 39

following with tcpdump

tcpdump -i tun1 udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tun1, link-type RAW (Raw IP), capture size 262144 bytes
17:10:23.485198 IP dns.example.net.57886 > dnsext.example.net.domain: 399+ [1au] A? google.com. (51)
17:10:23.528369 IP dnsext.example.net.domain > dns.example.net.57886: 399 Refused- 0/0/1 (39)
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

on exec of an ODS zone add

/usr/local/opendnssec/sbin/ods-enforcer zone add \
--zone example.com \
--policy lab \
--in-type DNS \
--input /usr/local/etc/opendnssec/addns.xml \
--out-type DNS \
--output /usr/local/etc/opendnssec/addns.xml
input is set to /usr/local/etc/opendnssec/addns.xml.
output is set to /usr/local/etc/opendnssec/addns.xml.
Zone example.com added successfully

tcpdump is silent

tcpdump -i tun1 udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tun1, link-type RAW (Raw IP), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Berry A.W. van Halderen

2016-12-27 09:36:08 UTC

Dear PGDev (???), et al,

Let me respond first on a few items that already were partially
mentioned and then reply to the real issue. Also I do need to
mention I'm not absolutely certain on all items;

Network monitoring on the loopback device doesn't work that well
on Linux. The kernel short-cuts large parts of the network
stack, so you might not see traffic from/to it.

There have been problems on *BSD machines, where it would not
send packets over the right interface. Linux normally selects
the right interface, where on *BSD you would need to bind
to a specific interface.

When using an adapter with a DNS/Inbound/RequestTransfer/Remote
specification, thus indicating you are allowing transfers
incoming from a certain source, you also need to specify
an AllowNotify to indicate that you also allow a DNS NOTIFY
to be accepted. This, unlike BIND, is not automatically
enabled and is more in line with the NSD/Unbound configuration style.

These are all not your real problem. Looking at the way
OpenDNSSEC works, the TSIG can be specified in the configuration
because a Remote section is the same for inbound as well as
outbound transfers, but actually for inbound transfers it is
not used.

So I think that TSIG authorization isn't supported (yet) for
OpenDNSSEC. There is a bit of rationale why for inbound xfers
it is less used. Most of the times OpenDNSSEC is used where
the incoming zones are from a secured path anyway. Securing
by just restricting the address is enough.

Because in the setup you're looking for you are using
127.0.0.1, this might be the case as well, and just removing
the requirement from the bind definition to require TSIGs
from 127.0.0.1 will make this work.

Yes the documentation does not explicitly state this and it is
certainly a feature worth implementing.

\Berry

PGNet Dev

2016-12-27 14:32:40 UTC

Post by Berry A.W. van Halderen
So I think that TSIG authorization isn't supported (yet) for
OpenDNSSEC. There is a bit of rationale why for inbound xfers
it is less used. Most of the times OpenDNSSEC is used where
the incoming zones are from a secured path anyway. Securing
by just restricting the address is enough.
Because in the setup you're looking for you are using
127.0.0.1, this might be the case as well and just removing
the requirement from the bind definition to require TSIGs
from 127.0.0.1 will make this work.
Yes the documentation does not explicitly state this and it is
certainly a feature worth implementing.

IIUC, you're talking about inbound transfer from bind to ods.

As in my latest summary post, I'm not currently having problems with the
inbound transfer; it's working.

It's the OUTBOUND notify, in my case from ods to the secondary nsd4
instance, that's failing.

Berry A.W. van Halderen

2016-12-27 15:04:56 UTC

Post by PGNet Dev

IIUC, you're talking about inbound transfer from bind to ods.
As in my latest summary post, I'm not currently having problems with the
inbound transfer; it's working.
It's the OUTBOUND notify, in my case from ods to the secondary nsd4
instance, that's failing.

Ah, but initially it was the /inbound/ you were trying to get up and running.

Post by PGNet Dev
cat addns.xml

<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
<DNS>
<TSIG>
<Name>ods-key</Name>
<Algorithm>hmac-sha256</Algorithm>
<Secret>xxx...xxx</Secret>
</TSIG>
<Outbound>
<ProvideTransfer>
<Peer>
<Prefix>10.2.2.53</Prefix>
<Key>ods-key</Key>
</Peer>
</ProvideTransfer>
<Notify>
<Remote>
<Address>10.2.2.53</Address>
<Port>53</Port>
</Remote>
</Notify>
</Outbound>
...
</DNS>
</Adapter>

The Remote section here is missing the Key-reference.

\Berry

PGNet Dev

2016-12-27 15:16:29 UTC

Post by Berry A.W. van Halderen

Post by PGNet Dev
cat addns.xml

<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
<DNS>
<TSIG>
<Name>ods-key</Name>
<Algorithm>hmac-sha256</Algorithm>
<Secret>xxx...xxx</Secret>
</TSIG>
<Outbound>
<ProvideTransfer>
<Peer>
<Prefix>10.2.2.53</Prefix>
<Key>ods-key</Key>
</Peer>
</ProvideTransfer>
<Notify>
<Remote>
<Address>10.2.2.53</Address>
<Port>53</Port>
</Remote>
</Notify>
</Outbound>
...
</DNS>
</Adapter>
The Remote section here is missing the Key-reference.

whether it's

<Remote>
<Address>10.2.2.53</Address>
<Port>53</Port>
</Remote>

or

<Remote>
<Address>10.2.2.53</Address>
<Port>53</Port>
<Key>ods-key</Key>
</Remote>

I see the same udp failure/error in the --out-type==DNS case.

That the error logging is not reporting what the problem is (per the bug
report) is certainly complicating the effort.

PGNet Dev

2016-12-27 15:42:33 UTC

For reference,

with TSIG-usage ENabled here for inbound xfer, with a purposefully INcorrect key Secret, the xfer fails

Dec 27 07:36:18 dns sh[27465]: /usr/local/etc/opendnssec/addns.xml:8: element Secret: Relax-NG validity error : Element Secret failed to validate content

whereas using the CORRECT key Secret,

Dec 27 07:40:34 dns ods-signerd: [xfrd] zone example.com transfer done [notify acquired 0, serial on disk 1482770644, notify serial 0]

It certainly appears that TSIG is required & being used for inbound transfer.

Berry A.W. van Halderen

2016-12-28 09:51:01 UTC

Post by PGNet Dev

For reference,
with TSIG-usage ENabled here for inbound xfer, with a purposefully INcorrect key Secret, the xfer fails
Dec 27 07:36:18 dns sh[27465]: /usr/local/etc/opendnssec/addns.xml:8: element Secret: Relax-NG validity error : Element Secret failed to validate content
whereas using the CORRECT key Secret,
Dec 27 07:40:34 dns ods-signerd: [xfrd] zone example.com transfer done [notify acquired 0, serial on disk 1482770644, notify serial 0]
It certainly appears that TSIG is required & being used for inbound transfer.

Actually, the "Relax-NG" error means it could not parse the XML file
because apparently that field is required. Doesn't tell if it is
actually used. This is a historic decision to require all fields,
even if not used. But apparently this isn't your issue.

\Berry
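
Editorial note: to tell a schema failure apart from a real TSIG failure, it can help to check the Secret text locally before restarting the daemons. The Python sketch below rests on an assumption that this thread does not confirm, namely that the addns.xml schema expects the Secret to be base64 text. secret_is_base64 is a hypothetical helper name.

```python
import base64
import binascii


def secret_is_base64(secret):
    """Return True if secret is non-empty, strictly valid base64 text."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return len(base64.b64decode(secret, validate=True)) > 0
    except (binascii.Error, ValueError):
        return False
```

A secret that passes this check but still produces "tsig error (Bad Key)" in the signer log would point at a mismatch with the name server's key rather than at the XML.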

PGNet Dev

2016-12-27 18:02:46 UTC

with

netstat -npla | egrep "ods\-|:15354"
tcp 0 0 10.1.1.53:15354 0.0.0.0:* LISTEN 12618/ods-signerd
tcp 0 0 127.0.0.1:15354 0.0.0.0:* LISTEN 12618/ods-signerd
udp 0 0 10.1.1.53:15354 0.0.0.0:* 12618/ods-signerd
udp 0 0 127.0.0.1:15354 0.0.0.0:* 12618/ods-signerd
unix 2 [ ACC ] STREAM LISTENING 260902 12660/ods-enforcerd /var/run/opendnssec/enforcer.sock
unix 2 [ ACC ] STREAM LISTENING 261964 12618/ods-signerd /var/run/opendnssec/engine.sock
unix 3 [ ] STREAM CONNECTED 262968 12660/ods-enforcerd
unix 2 [ ] DGRAM 260901 12660/ods-enforcerd
unix 3 [ ] DGRAM 261967 12618/ods-signerd
unix 3 [ ] STREAM CONNECTED 262878 12618/ods-signerd
unix 2 [ ] DGRAM 261963 12618/ods-signerd
unix 3 [ ] DGRAM 261966 12618/ods-signerd

and

/usr/local/opendnssec/sbin/ods-enforcer zone add \
--zone example.com \
--policy lab \
--in-type DNS \
--input /usr/local/etc/opendnssec/addns.xml \
--out-type DNS \
--output /usr/local/etc/opendnssec/addns.xml

on exec of

/usr/local/opendnssec/sbin/ods-signer retransfer example.com
Zone example.com being re-transfered.

log reports the same/consistent failure by ods to send the notify to the remote,

tail -f /var/log/opendnssec/opendnssec.log

Dec 27 09:45:03 dns ods-signerd: [xfrd] zone example.com request axfr to 127.0.0.1
Dec 27 09:45:03 dns ods-signerd: [xfrd] zone example.com transfer done [notify acquired 0, serial on disk 1482857148, notify serial 0]
Dec 27 09:45:03 dns ods-signerd: [STATS] example.com 1482860703 RR[count=1 time=0(sec)] NSEC3[count=0 time=0(sec)] RRSIG[new=1 reused=26 time=0(sec) avg=0(sig/sec)] TOTAL[time=0(sec)]
Dec 27 09:45:03 dns ods-signerd: [notify] unable to send data over udp to 10.2.2.53: sendto() failed (Invalid argument)
Dec 27 09:45:03 dns ods-signerd: [notify] unable to send notify retry 1 for zone example.com to 10.2.2.53: notify_send_udp() failed

further, the remote nsd's logs show no activity, and there's no traffic I can manage to see via tcpdump either locally or @ remote

otoh, if I send a 'manual' notify to the remote

./send-dns-notify \
-d -d \
-b 10.1.1.53 \
-s 10.2.2.53 \
-z example.com
zone : example.com
nameserver: 10.2.2.53
src_ipaddr: 10.1.1.53

there's at least an obvious connection

--------------------------------------------------------------------------
send notify for example.com to 10.2.2.53
received answer from 10.2.2.53
;; Answer received from 10.2.2.53 (28 bytes)
;; HEADER SECTION
;; id = 27609
;; qr = 1 aa = 1 tc = 0 rd = 0 opcode = NOTIFY
;; ra = 0 z = 0 ad = 0 cd = 0 rcode = NOERROR
;; qdcount = 1 ancount = 0 nscount = 0 arcount = 0
;; do = 0

;; QUESTION SECTION (1 record)
;; example.com. IN SOA

;; ANSWER SECTION (0 records)

;; AUTHORITY SECTION (0 records)

;; ADDITIONAL SECTION (0 records)

which the remote nsd4 instance sees

[2016-12-27 17:58:53.491] nsd[28836]: info: notify for example.com. from 10.1.1.53
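
Editorial note: for readers who want to see what such a manual notify puts on the wire, here is a minimal Python sketch that builds an unsigned DNS NOTIFY message per RFC 1996 (opcode 4, AA bit set, one SOA question). It is not the send-dns-notify script used above and not OpenDNSSEC code; it builds the packet without sending it, and the names encode_name and build_notify are made up for illustration.

```python
import secrets
import struct

OPCODE_NOTIFY = 4   # RFC 1996
FLAG_AA = 0x0400    # authoritative-answer bit
TYPE_SOA = 6
CLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_notify(zone, msg_id=None):
    """Build an unsigned NOTIFY message for zone with a single SOA question."""
    if msg_id is None:
        msg_id = secrets.randbelow(1 << 16)
    flags = (OPCODE_NOTIFY << 11) | FLAG_AA
    # Header: id, flags, qdcount=1, ancount=0, nscount=0, arcount=0
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    return header + question
```

Sending the result of build_notify("example.com") with sendto() from a UDP socket bound to 10.1.1.53 would correspond to the -b 10.1.1.53 / -s 10.2.2.53 invocation above.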

Berry A.W. van Halderen

2016-12-28 10:27:19 UTC

Post by PGNet Dev
since ODS2 is on the same box, it should be communicating for axfr only on localhost. it is, with config
cat conf.xml
...
<Signer>
<Listener>
<Interface>
<Address>127.0.0.1</Address>
<Port>15354</Port>
</Interface>
<Interface>
<Address>10.1.1.53</Address>
<Port>15354</Port>
</Interface>
</Listener>
<Privileges>
...
and
cat addns.xml
<?xml version="1.0" encoding="UTF-8"?>
<Adapter>
<DNS>
<TSIG>
<Name>ods-key</Name>
<Algorithm>hmac-sha256</Algorithm>
<Secret>xxx...xxx</Secret>
</TSIG>
<Inbound>
<RequestTransfer>
<Remote>
<Address>127.0.0.1</Address>
<Port>53</Port>
<Key>ods-key</Key>
</Remote>
</RequestTransfer>
<AllowNotify>
<Peer>
<Prefix>127.0.0.1</Prefix>
<Key>ods-key</Key>
</Peer>
</AllowNotify>
</Inbound>
...
</DNS>
</Adapter>
...
notify mail, containing the new key, is sent/received correctly via a "<DelegationSignerSubmitCommand>" script, and we're done.
But, if --out-type == DNS, with add'l config
cat addns.xml
<Adapter>
<DNS>
...
<Outbound>
<ProvideTransfer>
<Peer>
<Prefix>10.2.2.53</Prefix>
<Key>ods-key</Key>
</Peer>
</ProvideTransfer>
<Notify>
<Remote>
<Address>10.2.2.53</Address>
<Port>53</Port>
</Remote>
</Notify>
</Outbound>
</DNS>
</Adapter>
signing fails
tail -f opendnssec.log
...
Dec 26 16:32:26 dns ods-signerd: [notify] unable to send data over udp to 10.2.2.53: sendto() failed (Invalid argument)
Dec 26 16:32:26 dns ods-signerd: [notify] unable to send notify retry 1 for zone example.com to 10.2.2.53: notify_send_udp() failed
which leads to this bug report
"error logging for failed ods-signer remote NOTIFY reports only "sendto() failed (Invalid argument)", no additional detail"
https://issues.opendnssec.org/browse/SUPPORT-206
I think 1st order of biz is to fix the "Invalid argument" as in the bug, and find out what the sendto() error *is* ...

Well, the error is in fact an "Invalid argument". That is really all the
information available. The sendto() call failed because one of its
arguments is invalid. The destination address is printed (10.2.2.53),
and earlier it also indicated that it wanted to send 132 bytes. It is not
considered healthy to print the data that it wants to send, and that
wouldn't help anyway. All arguments *look* valid; it is just
that the operating system cannot carry out the call.

It took some digging, but puzzling the pieces from your items together, I
think this is what happens. You want to send a notify to 10.2.2.53,
but that will be sent over the first interface available to it (OpenDNSSEC
can only assume/try something). You specified two interfaces;
unfortunately the first is local-only, while you want to send something
to another box.

<Signer>
  <Listener>
    <Interface>
      <Address>127.0.0.1</Address><Port>15354</Port>
    </Interface>
    <Interface>
      <Address>10.1.1.53</Address><Port>15354</Port>
    </Interface>
...
So the NOTIFY gets 127.0.0.1 as its source address while it is being
sent to 10.2.2.53. That is an "invalid argument" to the operating
system. If you reverse the two interfaces, things will probably start
working.

You might wonder why we bind to an interface at all. Well, there are
bugs where some OSes do not use the right interface. Also, there could
be multiple valid addresses. We would either need to know magically which
one to take or start probing, which isn't very friendly either.
Also, explicit security is often used to require
NOTIFies to be sent from an explicit source address. So it is
better to bind in these cases.

I'm afraid it is just one of those things that can go wrong in an
extended set-up.

\Berry
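
Editorial note: the failure mode described in the post above can be probed outside OpenDNSSEC with a few lines of Python. This is a sketch and not signer code; try_udp_send is a hypothetical helper. It binds a UDP socket to a chosen source address, attempts one sendto(), and reports the errno name if the call fails.

```python
import errno
import socket


def try_udp_send(source_ip, dest_ip, dest_port=53, payload=b"\x00"):
    """Bind a UDP socket to source_ip and send one datagram to dest_ip.

    Returns None if sendto() succeeded, otherwise the symbolic errno name.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((source_ip, 0))  # port 0: let the OS pick a free port
        try:
            sock.sendto(payload, (dest_ip, dest_port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

On a Linux box, try_udp_send("127.0.0.1", "10.2.2.53") would be expected to come back with "EINVAL" when the route to the destination leaves via a non-loopback device, while try_udp_send("10.1.1.53", "10.2.2.53") should return None if routing is in order. The exact error can differ on other operating systems or routing setups.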

PGNet Dev

2016-12-28 13:54:33 UTC

Post by Berry A.W. van Halderen
So the NOTIFY gets 127.0.0.1 as its source address while it is being
sent to 10.2.2.53. That is an "invalid argument" to the operating
system. If you reverse the two interfaces, things will probably start
working.

Unfortunately, though behavior IS apparently sensitive to that order,
they just fail *differently*.

Post by Berry A.W. van Halderen
You might wonder why we bind to an interface at all

No, not at all. I however do wonder why a bind "per target (or action)"
is not implemented, perhaps using multiple-sockets ....

Post by Berry A.W. van Halderen
Also it is often the case that explicit security is used to require
NOTIFies to be sent using an explicit source address. So it is
better to bind in these cases.

If explicit security is in fact a consideration, as I'd hope it would
be, then making any 'guesses' is not a reliable approach.

Postfix, as an example of an app that provides such explicit security, does
an excellent job of allowing a bind address to be specified per action/daemon ...

Post by Berry A.W. van Halderen
I'm afraid it is just one of those things that can go wrong in an
extended set-up.

I wouldn't have considered a commonplace primary + secondary setup to be
an 'extended' setup ...

In any case, is this extended setup something you intended to cleanly
implement/support ?

Simply need to know one way or the other. If so, great. If not, then I
need to use a different approach to DNSSEC automation here.

PGNet Dev

2016-12-28 14:01:20 UTC

Post by PGNet Dev
Postfix, as an example of an app that provides such explicit security, does
an excellent job of allowing a bind address to be specified per action/daemon ...

And, apparently, so does nsd4, although on a per-zone basis, using its

"outgoing-interface:"

param.

Berry A.W. van Halderen

2016-12-28 15:24:40 UTC

Post by PGNet Dev

Unfortunately, though behavior IS apparently sensitive to that order,
they just fail *differently*.

Then how *does* that fail then?

Post by PGNet Dev

Post by PGNet Dev
Postfix, as an example of an app that provides such explicit security, does
an excellent job of allowing a bind address to be specified per action/daemon ...

And, apparently, so does nsd4, although on a per-zone basis, using its
"outgoing-interface:"
param.

Different programs, different requirements.
All in all, the outgoing interface needs to be able to reach the
destination; if not all slave servers are on the same network,
you would need to be able to specify an outgoing-interface on a
per-destination basis. It will get very hairy then.
So far, the assumption of using the primary address has been good enough.
We can always extend functionality.

\Berry

PGNet Dev

2016-12-28 15:36:39 UTC

Post by Berry A.W. van Halderen

Post by PGNet Dev
Unfortunately, though behavior IS apparently sensitive to that order,
they just fail *differently*.

Then how *does* that fail then?

Bottom line, it doesn't work. As to the details, I'll have to
re-diagnose & re-gather details if I stick with it ...

Post by Berry A.W. van Halderen
Different programs,

yes

Post by Berry A.W. van Halderen
different requirements.

Depends what you're talking about.

If the requirement is to be able to "address" & communicate securely
with different endpoints differently, then no -- not so different.

Post by Berry A.W. van Halderen
All in all, the outgoing interface needs to be able to reach the
destination, if not all slave servers are on the same network,

Which is in my own experience a far more frequent situation than having
multiple slaves on the SAME network, where typically a properly sized
single nameserver + network work well enough.

TBH, it's a headscratcher for me that the option for different IPs is
provided in inbound/outbound DNS adapters, but that the argument is that
that's not how it's supposed to work ...

If I can't talk to different servers, and automate it all, what's the
point?

Post by Berry A.W. van Halderen
you would need to be able to specify an outgoing-interface on a
per destination basis.

Sure, that's one approach.

Post by Berry A.W. van Halderen
It will get very hairy then.

Sorry, I don't buy that as a necessary fact. Again, nsd4 manages well
enough ...

Post by Berry A.W. van Halderen
So far, the assumption of using the primary address has been good enough.

Post by Berry A.W. van Halderen
We can always extend functionality.

That's the basis for my previous question -- will, vs can?