Discussion:
[Opendnssec-user] Error allocating ksks / zsks
Havard Eidnes
2016-01-25 14:35:02 UTC
Permalink
Hi,

I had reason to inspect the log from the physical console on our
signer host, and found messages from ods-enforcerd related to two
of our zones:

Jan 24 17:07:01 hugin ods-enforcerd: Error allocating ksks to zone godegrep.no
Jan 24 17:07:16 hugin ods-enforcerd: Error allocating zsks to zone 2.1.2.6.1.9.3.7.7.4.nrenum.net

and that this is a recurring theme.

Looking at the log reveals a bit more:

Jan 25 14:12:48 hugin ods-enforcerd: Zone godegrep.no found.
Jan 25 14:12:48 hugin ods-enforcerd: Policy for godegrep.no set to default.
Jan 25 14:12:48 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/godegrep.no.xml.
Jan 25 14:12:48 hugin ods-enforcerd: Not enough keys to satisfy ksk policy for zone: godegrep.no. keys_to_allocate(1) = keys_needed(1) - (keys_available(1) - keys_pending_retirement(1))
Jan 25 14:12:48 hugin ods-enforcerd: Tried to allocate 1 keys, failed on allocating key number 1
Jan 25 14:12:48 hugin ods-enforcerd: ods-enforcerd will create some more keys on its next run
Jan 25 14:12:48 hugin ods-enforcerd: Error allocating ksks to zone godegrep.no

It seems to me that the calculation above wrt. keys_to_allocate
is correct, but the statement that ods-enforcerd will create more
keys on its next run appears to be a blatant lie.

Listing the keys for these zones reveals that some of the "Date
of next transition" has come and gone without the transition to
the next state having taken place, and one of the key sets has a
key in "generate" state which isn't visible witout the "-all"
switch:

ods @ hugin: {6} ods-ksmutil key list -all --zone godegrep.no
Keys:
Zone: Keytype: State: Date of next transition:
godegrep.no KSK active 2015-12-13 15:12:43
godegrep.no ZSK retire 2015-12-29 09:45:48
godegrep.no ZSK active 2016-01-07 04:30:48
godegrep.no ZSK generate (not scheduled)

ods @ hugin: {7} ods-ksmutil key list --all --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
Keys:
Zone: Keytype: State: Date of next transition:
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-12-09 23:42:31
2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active 2016-01-06 00:25:00

ods @ hugin: {8}

I'm not sure when this started.

So...

1) Any idea how OpenDNSSEC got itself into this state?

2) Are there any manual steps I have to perform to get it out of
this state for these two zones?

3) Rhetorical: why doesn't OpenDNSSEC recover by itself from this?


Best regards,

- Håvard
Yuri Schaeffer
2016-01-28 09:32:03 UTC
Permalink
Hi Håvard,

For now I made an issue in our tracker for it
https://issues.opendnssec.org/browse/OPENDNSSEC-752

Regards,
Yuri
Post by Havard Eidnes
Hi,
I had reason to inspect the log from the physical console on our
signer host, and found messages from ods-enforcerd related to two
Jan 24 17:07:01 hugin ods-enforcerd: Error allocating ksks to zone
godegrep.no Jan 24 17:07:16 hugin ods-enforcerd: Error allocating
zsks to zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
and that this is a recurring theme.
Jan 25 14:12:48 hugin ods-enforcerd: Zone godegrep.no found. Jan 25
14:12:48 hugin ods-enforcerd: Policy for godegrep.no set to
default. Jan 25 14:12:48 hugin ods-enforcerd: Config will be output
to /var/opendnssec/signconf/godegrep.no.xml. Jan 25 14:12:48 hugin
godegrep.no. keys_to_allocate(1) = keys_needed(1) -
(keys_available(1) - keys_pending_retirement(1)) Jan 25 14:12:48
hugin ods-enforcerd: Tried to allocate 1 keys, failed on allocating
key number 1 Jan 25 14:12:48 hugin ods-enforcerd: ods-enforcerd
will create some more keys on its next run Jan 25 14:12:48 hugin
ods-enforcerd: Error allocating ksks to zone godegrep.no
It seems to me that the calculation above wrt. keys_to_allocate is
correct, but the statement that ods-enforcerd will create more keys
on its next run appears to be a blatant lie.
Listing the keys for these zones reveals that some of the "Date of
next transition" has come and gone without the transition to the
next state having taken place, and one of the key sets has a key in
Keys: Zone: Keytype: State: Date
of next transition: godegrep.no KSK
active 2015-12-13 15:12:43 godegrep.no ZSK
retire 2015-12-29 09:45:48 godegrep.no ZSK
active 2016-01-07 04:30:48 godegrep.no ZSK
generate (not scheduled)
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-12-09
23:42:31 2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active
2016-01-06 00:25:00
I'm not sure when this started.
So...
1) Any idea how OpenDNSSEC got itself into this state?
2) Are there any manual steps I have to perform to get it out of
this state for these two zones?
3) Rhetorical: why doesn't OpenDNSSEC recover by itself from this?
Best regards,
- Håvard _______________________________________________
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
Maurice
2016-01-28 09:42:35 UTC
Permalink
Hi Håvard,

I also ran into this a couple of times. I "fixed" this by using the
"ods-ksmutil key generate" command.

Regards,

Maurice Mahieu
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Håvard,
For now I made an issue in our tracker for it
https://issues.opendnssec.org/browse/OPENDNSSEC-752
Regards,
Yuri
Post by Havard Eidnes
Hi,
I had reason to inspect the log from the physical console on our
signer host, and found messages from ods-enforcerd related to two
Jan 24 17:07:01 hugin ods-enforcerd: Error allocating ksks to zone
godegrep.no Jan 24 17:07:16 hugin ods-enforcerd: Error allocating
zsks to zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
and that this is a recurring theme.
Jan 25 14:12:48 hugin ods-enforcerd: Zone godegrep.no found. Jan 25
14:12:48 hugin ods-enforcerd: Policy for godegrep.no set to
default. Jan 25 14:12:48 hugin ods-enforcerd: Config will be output
to /var/opendnssec/signconf/godegrep.no.xml. Jan 25 14:12:48 hugin
godegrep.no. keys_to_allocate(1) = keys_needed(1) -
(keys_available(1) - keys_pending_retirement(1)) Jan 25 14:12:48
hugin ods-enforcerd: Tried to allocate 1 keys, failed on allocating
key number 1 Jan 25 14:12:48 hugin ods-enforcerd: ods-enforcerd
will create some more keys on its next run Jan 25 14:12:48 hugin
ods-enforcerd: Error allocating ksks to zone godegrep.no
It seems to me that the calculation above wrt. keys_to_allocate is
correct, but the statement that ods-enforcerd will create more keys
on its next run appears to be a blatant lie.
Listing the keys for these zones reveals that some of the "Date of
next transition" has come and gone without the transition to the
next state having taken place, and one of the key sets has a key in
Keys: Zone: Keytype: State: Date
of next transition: godegrep.no KSK
active 2015-12-13 15:12:43 godegrep.no ZSK
retire 2015-12-29 09:45:48 godegrep.no ZSK
active 2016-01-07 04:30:48 godegrep.no ZSK
generate (not scheduled)
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-12-09
23:42:31 2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active
2016-01-06 00:25:00
I'm not sure when this started.
So...
1) Any idea how OpenDNSSEC got itself into this state?
2) Are there any manual steps I have to perform to get it out of
this state for these two zones?
3) Rhetorical: why doesn't OpenDNSSEC recover by itself from this?
Best regards,
- Håvard _______________________________________________
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEARECAAYFAlap4BMACgkQI3PTR4mhaviRvgCgoYBMVUFAMkjqDZ/ster8n5G5
MwoAoL8HgOSsNoeCD1Dpg+PGzi+TGizc
=pWOH
-----END PGP SIGNATURE-----
_______________________________________________
Opendnssec-user mailing list
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
--
Maurice Mahieu
System Engineer | ***@info.nl <mailto:***@info.nl> | +31 (0)20
53 09 111 <tel:+31205309111>
info.nl <http://www.info.nl> /making platforms work/
<http://www.info.nl/nl?utm_source=e-mail_sig&utm_medium=e-mail&utm_term=connecting_the_dots&utm_campaign=info_sig>

Sint Antoniesbreestraat 16 | 1011 HB Amsterdam | +31 (0)20 530 91 00
<tel:+31205309100>
Facebook <https://www.facebook.com/infonl> | Twitter
<https://twitter.com/infonl> | LinkedIn
<https://www.linkedin.com/company/info.nl> | Google+
<https://plus.google.com/+infonl/>
Havard Eidnes
2016-02-03 08:26:26 UTC
Permalink
Post by Maurice
I also ran into this a couple of times. I "fixed" this by using the
"ods-ksmutil key generate" command.
Hmm... I'm wary of doing that, since the problem doesn't appear
to be that softhsm can't generate new keys (it does so just fine
for all the other zones we have), so I don't understand what
problem "ods-kemutil key generate" should fix. Rather, the
problem appears to be a bug or at best a lack of robustness to
deal with "strange state" in the enforcer.

Looking a bit at the code, I find this comment at the top of
KsmKeyGetUnallocated:

/*+
* KsmKeyGetUnallocated
*
* Description:
* Given a set of policy values get the next unallocated keypair
* Executes:
* select min(id) from keydata
* where policy_id = policy_id
* and securitymodule_id = sm
* and size = bits
* and algorithm = algorithm
* and state is KSM_STATE_GENERATE
*

so I worried that the key in the "generate" state for the
godegrep.no zone might mess things up (orphaned?), so it is now
history:

ods @ hugin: {12} ods-ksmutil key list --all --zone godegrep.no -v
SQLite database set to: /var/db/opendnssec/kasp.db
Keys:
Zone: Keytype: State: Date of next transition (to): Size: Algorithm: CKA_ID: Repository: Keytag:
godegrep.no KSK active 2015-12-13 15:12:43 (retire) 2048 8 3b9f49b93bece1c2e8f1f94ee8b6b4f7 SoftHSM 51569
godegrep.no ZSK retire 2015-12-29 09:45:48 (dead) 1024 8 13c2dafbfd19f837931efe5367b200e9 SoftHSM 58015
godegrep.no ZSK active 2016-01-07 04:30:48 (retire) 1024 8 76a7b62bcc80e9af6f57ef455bc483bf SoftHSM 31264
godegrep.no ZSK generate (not scheduled) (publish) 1024 8 15e81adbc4a30ced30cf1bab8cb2b212 SoftHSM 44994

ods @ hugin: {13} ods-ksmutil key delete --cka_id 15e81adbc4a30ced30cf1bab8cb2b212
Key delete successful: 15e81adbc4a30ced30cf1bab8cb2b212
ods @ hugin: {14}

and I'll wait and see if this improves matters for this
particular zone.

However, the other zone I have this issue with doesn't have a key
in "generate" state, and also doesn't have two ZSKs, so I'm not
sure what's up with that particular problem.

Regards,

- Håvard
Havard Eidnes
2016-02-03 08:41:16 UTC
Permalink
For the other zone, the 2.1.2.6.1.9.3.7.7.4.nrenum.net zone, I
tried another approach, namely I tried to initiate a manual key
rollover for both the ZSK and the KSK:

ods @ hugin: {22} ods-ksmutil key list --all --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
Keys:
Zone: Keytype: State: Date of next transition:
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-12-09 23:42:31
2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active 2016-01-06 00:25:00

ods @ hugin: {23} ods-ksmutil key rollover --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net -t zsk

Manual key rollover for key type zsk on zone 2.1.2.6.1.9.3.7.7.4.nrenum.net initiated
Notifying enforcer of new database...
ods @ hugin: {24} ods-ksmutil key list --all --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
Keys:
Zone: Keytype: State: Date of next transition:
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-12-09 23:42:31
2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active 2016-02-03 09:33:23

ods @ hugin: {25} ods-ksmutil key rollover --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net -t ksk

Manual key rollover for key type ksk on zone 2.1.2.6.1.9.3.7.7.4.nrenum.net initiated
Notifying enforcer of new database...
ods @ hugin: {26} ods-ksmutil key list --all --zone 2.1.2.6.1.9.3.7.7.4.nrenum.net
Keys:
Zone: Keytype: State: Date of next transition:
2.1.2.6.1.9.3.7.7.4.nrenum.net KSK active 2016-02-03 09:34:38
2.1.2.6.1.9.3.7.7.4.nrenum.net ZSK active 2016-02-03 09:33:23

ods @ hugin: {27}

Looking at the log I still see errors of this type:

Feb 3 09:35:00 hugin ods-enforcerd: Not enough keys to satisfy zsk policy for zone: 2.1.2.6.1.9.3.7.7.4.nrenum.net. keys_to_allocate(1) = keys_needed(1) - (keys_available(1) - keys_pending_retirement(1))
Feb 3 09:35:00 hugin ods-enforcerd: Tried to allocate 1 keys, failed on allocating key number 1
Feb 3 09:35:00 hugin ods-enforcerd: ods-enforcerd will create some more keys on its next run
Feb 3 09:35:00 hugin ods-enforcerd: Error allocating zsks to zone 2.1.2.6.1.9.3.7.7.4.nrenum.net

and similarly for the attempt at rolling the ZSK.

Bah!

Regards,

- Håvard
Havard Eidnes
2016-02-03 08:44:04 UTC
Permalink
Post by Havard Eidnes
so I worried that the key in the "generate" state for the
godegrep.no zone might mess things up (orphaned?), so it is now
This removal of the apparently orphaned key in "generate" state
seems to have unwedged this one. In the log I now found

Feb 3 09:34:40 hugin ods-enforcerd: Zone godegrep.no found.
Feb 3 09:34:40 hugin ods-enforcerd: Policy for godegrep.no set to default.
Feb 3 09:34:40 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/godegrep.no.xml.
Feb 3 09:34:40 hugin ods-enforcerd: KSK key allocation for zone godegrep.no: 1 key(s) allocated
Feb 3 09:34:40 hugin ods-enforcerd: WARNING: KSK rollover for zone 'godegrep.no' not completed as there are no keys in the 'ready' state; ods-enforcerd will try again when it runs next
Feb 3 09:34:40 hugin ods-enforcerd: WARNING: ZSK rollover for zone 'godegrep.no' not completed as there are no keys in the 'ready' state; ods-enforcerd will try again when it runs next

and listing the keys for the zone shows significant change:

ods @ hugin: {29} ods-ksmutil key list --all --zone godegrep.no
Keys:
Zone: Keytype: State: Date of next transition:
godegrep.no KSK active 2015-12-13 15:12:43
godegrep.no ZSK dead to be deleted
godegrep.no ZSK active 2016-01-07 04:30:48
godegrep.no ZSK publish 2016-02-03 15:49:40
godegrep.no KSK publish 2016-02-03 15:49:40

ods @ hugin: {30}

So by the end of today, there should be both a new zsk and ksk in
"ready" state to progress the key rotation.

Regards,

- Håvard
Havard Eidnes
2016-02-08 08:10:50 UTC
Permalink
Post by Havard Eidnes
Post by Havard Eidnes
so I worried that the key in the "generate" state for the
godegrep.no zone might mess things up (orphaned?), so it is now
This removal of the apparently orphaned key in "generate" state
seems to have unwedged this one. [...]
...
Post by Havard Eidnes
godegrep.no KSK active 2015-12-13 15:12:43
godegrep.no ZSK dead to be deleted
godegrep.no ZSK active 2016-01-07 04:30:48
godegrep.no ZSK publish 2016-02-03 15:49:40
godegrep.no KSK publish 2016-02-03 15:49:40
So ... deleting the key stuck in "generate" state completed the
key rotation, but there still seems to be something odd with the
old keys, as the current state is:

ods @ hugin: {34} ods-ksmutil key list --all --zone godegrep.no
Keys:
Zone: Keytype: State: Date of next transition:
godegrep.no KSK dead to be deleted
godegrep.no ZSK dead to be deleted
godegrep.no ZSK retire 2016-02-25 05:55:10
godegrep.no ZSK active 2016-03-05 00:40:10
godegrep.no KSK active 2017-02-03 12:30:11

ods @ hugin: {35}

Why doesn't the enforcer actually delete the now-dead keys?

(I tried finding a way to comment on the OPENDNSSEC-752 issue via
the bug tracker, but didn't find a way to do that despite being
logged in, possibly because it's not "my" ticket.)

Regards,

- Håvard
Havard Eidnes
2016-03-09 20:51:44 UTC
Permalink
Hi,

a new occurrence of this problem has surfaced.

Somehow my SoftHSM database had become owned by root (ouch!). While
that was done, a new zone was added. Of course key allocation failed
since the enforcer could not write to the SoftHSM database.

However, the file ownership problem is now sorted, but OpenDNSSEC
still refuses to allocate any keys:

Mar 9 21:34:09 hugin ods-signerd: [parser] zone mydomainname.no added
Mar 9 21:34:25 hugin ods-enforcerd: Zone mydomainname.no found.
Mar 9 21:34:25 hugin ods-enforcerd: Policy for mydomainname.no set to default.
Mar 9 21:34:25 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/mydomainname.no.xml.
Mar 9 21:34:25 hugin ods-enforcerd: Not enough keys to satisfy zsk policy for zone: mydomainname.no. keys_to_allocate(1) = keys_needed(1) - (keys_available(0) - keys_pending_retirement(0))
Mar 9 21:34:25 hugin ods-enforcerd: Error allocating zsks to zone mydomainname.no

Of course the "Not enough keys" error message is meaningless, and
the equation shown *is* true, so that's obviously *not* the problem.

There's also no way I've found to increase the logging level of
the enforcer so I can easily get a clue as to the actual problem. I
sense some coding coming up to narrow in on the actual problem...

The zone's key list is empty:

ods @ hugin: {14} ods-ksmutil key list --zone mydomainname.no
Keys:
Zone: Keytype: State: Date of next transition:

ods @ hugin: {15}

And later of course the signer logs:

Mar 9 21:39:44 hugin ods-signerd: [worker[1]] configure zone mydomainname.no
Mar 9 21:39:44 hugin ods-signerd: [file] unable to open file /var/opendnssec/signconf/mydomainname.no.xml for reading: No such file or directory
Mar 9 21:39:44 hugin ods-signerd: [file] unable to stat file /var/opendnssec/signconf/mydomainname.no.xml: ods_fopen() failed
Mar 9 21:39:44 hugin ods-signerd: [zone] zone mydomainname.no signconf file /var/opendnssec/signconf/mydomainname.no.xml is unchanged since 2016-03-09 21:39:44
Mar 9 21:39:44 hugin ods-signerd: [worker[1]] no signconf.xml for zone mydomainname.no yet
Mar 9 21:39:44 hugin ods-signerd: [worker[1]] CRITICAL: failed to sign zone mydomainname.no: General error
Mar 9 21:39:44 hugin ods-signerd: [worker[1]] backoff task [configure] for zone mydomainname.no with 60 seconds
Mar 9 21:39:44 hugin ods-signerd: [worker[1]] finished working on zone mydomainname.no
Mar 9 21:39:44 hugin ods-signerd: [scheduler] schedule task [configure] for zone mydomainname.no
Mar 9 21:39:44 hugin ods-signerd: [task] On Wed Mar 9 21:40:44 2016 I will [configure] zone mydomainname.no

Restarting OpenDNSSEC does not fix the problem.

So... I remove the zone and let one of my scripts re-create it.

However, that does *not* fix it:

Mar 9 21:48:21 hugin ods-enforcerd: Zone mydomainname.no found.
Mar 9 21:48:21 hugin ods-enforcerd: Policy for mydomainname.no set to default.
Mar 9 21:48:21 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/mydomainname.no.xml.
Mar 9 21:48:21 hugin ods-enforcerd: Not enough keys to satisfy zsk policy for zone: mydomainname.no. keys_to_allocate(1) = keys_needed(1) - (keys_available(0) - keys_pending_retirement(0))
Mar 9 21:48:21 hugin ods-enforcerd: Error allocating zsks to zone mydomainname.no

So, how do I get out of this state?!? Should not OpenDNSSEC be able
to sort this out itself?

Regards,

- Håvard
Havard Eidnes
2016-03-10 12:50:41 UTC
Permalink
Post by Havard Eidnes
a new occurrence of this problem has surfaced.
Somehow my SoftHSM database had become owned by root (ouch!). While
that was done, a new zone was added. Of course key allocation failed
since the enforcer could not write to the SoftHSM database.
However, the file ownership problem is now sorted, but OpenDNSSEC
Mar 9 21:34:09 hugin ods-signerd: [parser] zone mydomainname.no added
Mar 9 21:34:25 hugin ods-enforcerd: Zone mydomainname.no found.
Mar 9 21:34:25 hugin ods-enforcerd: Policy for mydomainname.no set to default.
Mar 9 21:34:25 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/mydomainname.no.xml.
Mar 9 21:34:25 hugin ods-enforcerd: Not enough keys to satisfy zsk policy for zone: mydomainname.no. keys_to_allocate(1) = keys_needed(1) - (keys_available(0) - keys_pending_retirement(0))
Mar 9 21:34:25 hugin ods-enforcerd: Error allocating zsks to zone mydomainname.no
I've done some more code reading, and on the way found reason to
file the tangentially related

https://issues.opendnssec.org/browse/SUPPORT-189

where any "KSM" log messages are discarded by the daemon.

However, this doesn't appear to be able to shed more light on
what the problem is. I added some more logging in do_keygen() in
the enforcer, and got:

Mar 10 12:08:02 hugin ods-enforcerd: Connecting to Database...
Mar 10 12:08:02 hugin ods-enforcerd: Policy default found.
Mar 10 12:08:02 hugin ods-enforcerd: Key sharing is Off.
Mar 10 12:08:02 hugin ods-enforcerd: 367 zone(s) found on policy "default"
Mar 10 12:08:02 hugin ods-enforcerd: Predict we need 367 KSKs
Mar 10 12:08:02 hugin ods-enforcerd: Have 367 KSK keys in queue
Mar 10 12:08:02 hugin ods-enforcerd: Need 0 new KSK keys
Mar 10 12:08:02 hugin ods-enforcerd: No new KSKs need to be created.
Mar 10 12:08:02 hugin ods-enforcerd: Predict we need 367 new ZSK keys
Mar 10 12:08:02 hugin ods-enforcerd: Have 367 ZSK keys in queue
Mar 10 12:08:02 hugin ods-enforcerd: Need 0 new ZSK keys
Mar 10 12:08:02 hugin ods-enforcerd: No new ZSKs need to be created.
Mar 10 12:08:02 hugin ods-enforcerd: Purging keys...

So this part of OpenDNSSEC seems to think it has all the keys it
needs, and that no new keys need to be generated.

However, the key allocation done by KsmKeyGetUnallocated() returns
with status=0 but keypair_id=0 as well, and allocateKeysToZone() in
the enforcer thinks this is a signal of failure:

status = KsmKeyGetUnallocated(policy->id, policy->zsk->sm, policy->zsk->bits, policy->zsk->algorithm, zone_id, policy->keys->share_keys, &key_pair_id);
if (status == -1 || key_pair_id == 0) {
if (man_key_gen == 0) {
log_msg(NULL, LOG_WARNING, "Not enough keys to satisfy zsk policy for zone: %s. keys_to_allocate(%d) = keys_needed(%d) - (keys_available(%d) - keys_pending_retirement(%d))\n", zone_name, new_keys, keys_needed, keys_in_queue, keys_pending_retirement);
log_msg(NULL, LOG_WARNING, "Tried to allocate %d keys, failed on allocating key number %d", new_keys, i+1);
log_msg(NULL, LOG_WARNING, "Status=%d key_pair_id=%d",
status, key_pair_id);
log_msg(NULL, LOG_WARNING, "ods-enforcerd will create some more keys on its next run");
}

(log of status code and key_pair_id added by me, and indentation
fixed to be consistent.)

So, since I'm not running with manual key generation, I'm wary of
doing "ods-ksmutil key generate", as I fear that will at best be a
temporary fix. Besides, it appears to want a gazillion parameters
which are inconvenient at best to dig out.

If I count all the 'active' ZSKs which "ods-ksmutil key list" lists, I
come to 366, not 367.

If key_pair_id = 0 is indeed invalid, I guess the database has gotten
into a state where OpenDNSSEC refuses to mend it automatically.

Guidance sought.

Regards,

- Håvard
Havard Eidnes
2016-03-10 13:12:00 UTC
Permalink
Post by Havard Eidnes
If key_pair_id = 0 is indeed invalid, I guess the database has gotten
into a state where OpenDNSSEC refuses to mend it automatically.
Guidance sought.
Listing all the keys with "ods-ksmutil key list --verbose --all"
revealed:

NOT ALLOCATED KSK generate (not scheduled) (publish) 2048 8 3b929d0ab308b4e1e8bf81abf1e6dafe SoftHSM
NOT ALLOCATED ZSK generate (not scheduled) (publish) 1024 8 b3c5b3d619c086f41f3f2ed440419f23 SoftHSM

Yes, that's an empty "key tag" field; all the others have a value
there (after the "SoftHSM" tag). I wonder how it managed to get into
that state. Let's try to delete these two and see how it goes...

ods @ hugin: {7} ods-ksmutil key delete --cka_id 3b929d0ab308b4e1e8bf81abf1e6dafe
Key delete successful: 3b929d0ab308b4e1e8bf81abf1e6dafe
ods @ hugin: {8} ods-ksmutil key delete --cka_id b3c5b3d619c086f41f3f2ed440419f23
Key delete successful: b3c5b3d619c086f41f3f2ed440419f23
ods @ hugin: {9}
ods @ hugin: {9} ods-control enforcer stop
Stopping enforcer...
ods @ hugin: {10} ods-control enforcer start
Starting enforcer...
OpenDNSSEC ods-enforcerd started (version 1.4.9), pid 17045
ods @ hugin: {11}

Hmm, definitely better:

Mar 10 14:04:01 hugin ods-enforcerd: 367 zone(s) found on policy "default"
Mar 10 14:04:01 hugin ods-enforcerd: Predict we need 367 KSKs
Mar 10 14:04:01 hugin ods-enforcerd: Have 366 KSK keys in queue
Mar 10 14:04:01 hugin ods-enforcerd: Need 1 new KSK keys
Mar 10 14:04:01 hugin ods-enforcerd: 1 new KSK(s) (2048 bits) need to be created for policy default: keys_to_generate(1) = keys_needed(367) - keys_available(366).
Mar 10 14:04:01 hugin ods-enforcerd: Created key in repository SoftHSM
Mar 10 14:04:01 hugin ods-enforcerd: Created KSK size: 2048, alg: 8 with id: 95ebe2949eeb84fac9eee71573347b96 in repository: SoftHSM and database.
Mar 10 14:04:01 hugin ods-enforcerd: Predict we need 367 new ZSK keys
Mar 10 14:04:01 hugin ods-enforcerd: Have 366 ZSK keys in queue
Mar 10 14:04:01 hugin ods-enforcerd: Need 1 new ZSK keys
Mar 10 14:04:01 hugin ods-enforcerd: 1 new ZSK(s) (1024 bits) need to be created for policy default: keys_to_generate(1) = keys_needed(367) - keys_available(366).
Mar 10 14:04:01 hugin ods-enforcerd: Created key in repository SoftHSM
Mar 10 14:04:01 hugin ods-enforcerd: Created ZSK size: 1024, alg: 8 with id: ce16fcac12944304b81957d99c69a1fd in repository: SoftHSM and database.
Mar 10 14:04:01 hugin ods-enforcerd: NOTE: keys generated in repository SoftHSM will not become active until they have been backed up

and...

Mar 10 14:04:20 hugin ods-enforcerd: Zone mydomainname.no found.
Mar 10 14:04:20 hugin ods-enforcerd: Policy for mydomainname.no set to default.
Mar 10 14:04:20 hugin ods-enforcerd: Config will be output to /var/opendnssec/signconf/mydomainname.no.xml.
Mar 10 14:04:20 hugin ods-enforcerd: New unallocated keypair_id=2373
Mar 10 14:04:20 hugin ods-enforcerd: ZSK key allocation for zone mydomainname.no: 1 key(s) allocated
Mar 10 14:04:20 hugin ods-enforcerd: New unallocated keypair_id=2372
Mar 10 14:04:20 hugin ods-enforcerd: KSK key allocation for zone mydomainname.no: 1 key(s) allocated
Mar 10 14:04:20 hugin ods-enforcerd: INFO: Promoting ZSK from publish to active as this is the first pass for the zone
Mar 10 14:04:20 hugin ods-enforcerd: ERROR: Trying to make non-backed up ZSK active when RequireBackup flag is set
Mar 10 14:04:20 hugin ods-enforcerd: KsmRequestKeys returned: 65562
Mar 10 14:04:20 hugin ods-enforcerd: Signconf not written for mydomainname.no
Mar 10 14:04:20 hugin ods-enforcerd: Disconnecting from Database...

and doing a SoftHSM backup and a "ods-control enforcer notify"
finally caused the domain to be signed.

Regards,

- Håvard
Kevin Thompson
2016-03-10 23:24:42 UTC
Permalink
Post by Havard Eidnes
Post by Havard Eidnes
If key_pair_id = 0 is indeed invalid, I guess the database has gotten
into a state where OpenDNSSEC refuses to mend it automatically.
Guidance sought.
Listing all the keys with "ods-ksmutil key list --verbose --all"
NOT ALLOCATED KSK generate (not
scheduled) (publish) 2048 8
3b929d0ab308b4e1e8bf81abf1e6dafe SoftHSM
NOT ALLOCATED ZSK generate (not
scheduled) (publish) 1024 8
b3c5b3d619c086f41f3f2ed440419f23 SoftHSM
I ran into a similar problem last night. Long ago, I had incorrectly
deleted a zone I was using for testing, resulting in a few keys I could
not delete, but were reported against the 'NOT ALLOCATED'. I found
myself unable to delete the key because it was in a 'publish' state for
a zone that no longer existed, so I left it.

Everything worked fine, until last night when my KSK finally rolled
over. The KSK that was previously active had a lower primary key than
the "NOT ALLOCATED" key, but my new KSK had a larger primary key.

When the rollover happend, signerd crashed. After restarting ODS, I was
unable to get it to publish a DNSKEY record for the new KSK on the
working domain, even though it was publishing the DS record for the key.

I started working on exporting the existing database so I could back it
up, wipe it, and restore it. This is when I noticed that `ods-ksmutil
key export --all` was writing out the first couple keys but when it got
to the "NOT ALLOCATED" key, it stopped; I don't have the exact error but
it was something along the lines of 'Error invalid index -1, stopping'.
I'm guessing that the DNSKEY for my new KSK wasn't being output to the
zone file because of this problem.

To fix it, I shut down ODS, opened the kasp.db file in sqlite and
deleted the busted key from the `keypairs` table. Lo and behold,
'ods-ksmutil key export' worked, and soon enough my zone finally had a
DNSKEY for my new KSK.

--Kevin Thompson

Havard Eidnes
2016-01-28 10:39:29 UTC
Permalink
Post by Yuri Schaeffer
For now I made an issue in our tracker for it
https://issues.opendnssec.org/browse/OPENDNSSEC-752
Thanks again,

- Håvard
Paul Wouters
2016-02-01 16:08:53 UTC
Permalink
Post by Yuri Schaeffer
For now I made an issue in our tracker for it
https://issues.opendnssec.org/browse/OPENDNSSEC-752
This happened to me today, but I traced it back to only having 16MB free
on my disk. But I also saw:

Feb 1 03:07:55 ns0 ods-enforcerd: ERROR: error executing SQL - disk I/O error

ods-enforcerd had died, and after clearing up space and restarting it,
things fixed themselves. So failure behaviour has improved, as I know
this caused me a lot more pain in the past.

Paul
Havard Eidnes
2016-02-02 08:12:32 UTC
Permalink
Post by Paul Wouters
Post by Yuri Schaeffer
For now I made an issue in our tracker for it
https://issues.opendnssec.org/browse/OPENDNSSEC-752
This happened to me today, but I traced it back to only having
16MB free on my disk.
Just for the record, that's not the situation at our end:

/dev/sd0f 308G 73G 219G 24% /var
Post by Paul Wouters
Feb 1 03:07:55 ns0 ods-enforcerd: ERROR: error executing SQL - disk I/O error
...and our OpenDNSSEC hasn't logged this one any time over the
last two months.

Regards,

- Håvard
Loading...