Discussion:
[Opendnssec-user] Missing keys and various other problems on 2.0
Casper Gielen
2018-06-22 11:44:34 UTC
Permalink
Hi,
I recently upgrade from ODS 1.0 to 2.0. For a while everything was fine
but now I'm running into problems. I have a lot of symptoms and issues
that may or may not be related. Sorry for the long mail, but I'm afraid
to leave out important information if I would split it smaller problems.


My main problem is that zones lose DNSKEYs and get stuk with unverifiable signatures.
Here is an example:

http://dnsviz.net/d/wiskundeoptiu.nl/Wytklw/dnssec/

# ods-enforcer key list --zone wiskundeoptiu.nl -v
Keys:
Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag:
wiskundeoptiu.nl KSK retire 2018-06-20 15:14:02 2048 8 489db07082a644fcfa67f077627b7c7c LocalHSM 39466
wiskundeoptiu.nl ZSK retire 2018-06-20 15:14:02 1024 8 2f3c7829c40248b5537b3cd09266678c LocalHSM 50226
wiskundeoptiu.nl KSK active 2018-06-20 15:14:02 2048 8 758cc85fc16528184f32dbfab70663f6 LocalHSM 62161
wiskundeoptiu.nl ZSK active 2018-06-20 15:14:02 1024 8 8472c2bac0dbfc86d3a687644a3ef4f5 LocalHSM 59790
wiskundeoptiu.nl ZSK ready 2018-06-20 15:14:02 1024 8 3e97dcd131d9264cad2fb84676ade00e LocalHSM 28818

# ods-enforcer key list --zone wiskundeoptiu.nl -d
Keys:
Zone: Key role: DS: DNSKEY: RRSIGDNSKEY: RRSIG: Pub: Act: Id:
wiskundeoptiu.nl KSK hidden hidden hidden NA 0 0 489db07082a644fcfa67f077627b7c7c
wiskundeoptiu.nl ZSK NA hidden NA hidden 0 0 2f3c7829c40248b5537b3cd09266678c
wiskundeoptiu.nl KSK omnipresent omnipresent omnipresent NA 1 1 758cc85fc16528184f32dbfab70663f6
wiskundeoptiu.nl ZSK NA omnipresent NA omnipresent 1 1 8472c2bac0dbfc86d3a687644a3ef4f5
wiskundeoptiu.nl ZSK NA omnipresent NA rumoured 1 1 3e97dcd131d9264cad2fb84676ade00e

ZSK 50226 is still being used but it is not published in the zone.
It used to be signed with KSK 39466 which is also missing.

The only reasonable root cause that I can think of is that for a while
the enforcer was running as root instead of the 'opendnssec' user. I've changed
that and made sure that the opendnssec-user is allowed to access the softhms2 files.


On to the other symptoms:
Sometimes my logs contain entries like:
ods-enforcerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002

I think that was related to the wrong permissions. I think it means the key could not be found in the HSM.
Here is a little more context.

Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [scheduler] pop task for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [scheduler] unschedule task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] start working on zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] perform task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] sign zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [namedb] zone wiskundeoptiu.nl update serial: format=datecounter in=2018061502 internal=2018061800 out=0 now=1529297514
Jun 18 06:51:54 ramachandra ods-signerd: [namedb] zone wiskundeoptiu.nl update serial: 2018061502 + 298 = 2018061800
Jun 18 06:51:54 ramachandra ods-signerd: [zone] zone wiskundeoptiu.nl set soa serial to 2018061800
Jun 18 06:51:54 ramachandra ods-signerd: [worker[4]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[4]] nothing to do
Jun 18 06:51:54 ramachandra ods-signerd: [worker[2]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[2]] nothing to do
Jun 18 06:51:54 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:54 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:54 ramachandra ods-signerd: [worker[3]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[3]] nothing to do
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] unable to get key: key 489db07082a644fcfa67f077627b7c7c not found
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 18 06:51:55 ramachandra ods-signerd: [zone] unable to prepare signing keys for zone wiskundeoptiu.nl: error getting dnskey
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] CRITICAL: failed to sign zone wiskundeoptiu.nl: General error
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] backoff task [sign] for zone wiskundeoptiu.nl with 3600 seconds
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] finished working on zone wiskundeoptiu.nl
Jun 18 06:51:55 ramachandra ods-signerd: [scheduler] schedule task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:55 ramachandra ods-signerd: [task] On Mon Jun 18 07:51:55 2018 I will [sign] zone wiskundeoptiu.nl

These logs suggest that key 489db07082a644fcfa67f077627b7c7c (keytag 39466) is not in the HSM.
However, it is:

# ods-hsmutil list LocalHSM | grep 489db07082a644fcfa67f077627b7c7c
LocalHSM 489db07082a644fcfa67f077627b7c7c RSA/2048

And it is in the SQL-database as well:
MariaDB [opendnssec]> SELECT * FROM hsmKey WHERE locator='489db07082a644fcfa67f077627b7c7c';
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+
| id | rev | policyId | locator | state | bits | algorithm | role | inception | isRevoked | keyType | repository | backup |
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+
| 36722 | 3 | 7 | 489db07082a644fcfa67f077627b7c7c | 2 | 2048 | 8 | 1 | 1528956951 | 0 | 1 | LocalHSM | 3 |
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+


Here is a little more from the database related to that key:

MariaDB [opendnssec]> SELECT id,rev,policyId,signconfNeedsWriting,nextChange,ttlEndDs,ttlEndDk,ttlEndRs,nextKskRoll,nextZskRoll,nextCskRoll FROM zone WHERE name='wiskundeoptiu.nl';
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+
| id | rev | policyId | signconfNeedsWriting | nextChange | ttlEndDs | ttlEndDk | ttlEndRs | nextKskRoll | nextZskRoll | nextCskRoll |
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+
| 1068 | 35 | 7 | 0 | 1530221972 | 1529575591 | 1529575591 | 1529583242 | 1560858829 | 1532014562 | 0 |
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+


MariaDB [opendnssec]> SELECT * FROM keyData WHERE hsmKeyId='36722';
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+
| id | rev | zoneId | hsmKeyId | algorithm | inception | role | introducing | shouldRevoke | standby | activeZsk | publish | activeKsk | dsAtParent | keytag | minimize |
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+
| 2773 | 22 | 1068 | 36722 | 8 | 1529044974 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39466 | 4 |
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+

MariaDB [opendnssec]> SELECT * FROM keyState WHERE keyDataId=2773;
+-------+-----+-----------+------+-------+------------+----------+-------+
| id | rev | keyDataId | type | state | lastChange | minimize | ttl |
+-------+-----+-----------+------+-------+------------+----------+-------+
| 11089 | 0 | 2773 | 0 | 0 | 1529044974 | 1 | 3600 |
| 11090 | 3 | 2773 | 2 | 0 | 1529571216 | 0 | 3600 |
| 11091 | 3 | 2773 | 3 | 0 | 1529412859 | 0 | 3600 |
| 11092 | 0 | 2773 | 1 | 4 | 1529044974 | 0 | 86400 |
+-------+-----+-----------+------+-------+------------+----------+-------+



This leads to the next symptom or problem: "DB prepare Err 2006: MySQL server has gone away"
The enforcer seems to lose it's connection to MariaDB:

Jun 21 10:51:42 ramachandra ods-enforcerd: DB prepare SQL SELECT policy.id, policy.rev, policy.name, policy.description, policy.signaturesResign, policy.signaturesRefresh, policy.signaturesJitter, policy.signaturesInceptionOffset, policy.signaturesValidityDefault, policy.signaturesValidityDenial, policy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl, policy.denialType, policy.denialOptout, policy.denialTtl, policy.denialResalt, policy.denialAlgorithm, policy.denialIterations, policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange, policy.keysTtl, policy.keysRetireSafety, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter, policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum, policy.zoneSoaSerial, policy.parentRegistrationDelay, policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl, policy.parentSoaMinimum, policy.passthrough FROM policy WHERE policy.denialType = ?
Jun 21 10:51:42 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL server has gone away
Jun 21 10:51:42 ramachandra ods-enforcerd: [policy_resalt_task] retrying in 60 seconds

The are no errors in the sql-logs and restarting mariadb doesn't change anything.
However after restarting the enforcer the problem goes away for some time but seems to
return after a while. Despite that I've tuned various MySQL-parameters asscociated
with this error-message but with no noticeable change.


It's not that ods-enforcer is entirley stuck, I does respond toe queries like 'key list'.
In fact, my logs show thousands of mysql-statements are logged for each such command.
Listing the keys for 1 zone causes about 3.5k(!) SQL-calls.
Each call is nicely prepared, and then closed immediatly without ever being executed.
I'm not a MySQL/MariaDB-expert so I might be reading the logs wrong, but as far as I
understand SQL each query should only be prepared once and then executed repeatedly.
Immediatly after restaring the enforcer I can see some Executes in the logs but after a little
while they stop appearing while the prepare-statements keep on coming:

# grep Prepare mysql.log|wc -l
1277451

# grep Execute mysql.log|wc -l
9072

<note>
I run 'ods-enforcer key list' from a script every three minutes, that inflates
the total numbers. This is from about 6 hours of logging, which means about
120 calls to 'ods-enforce key list' from just that one script only.
I've proably done as many manual calls while debugging this.
</note>


Good:
3152 Prepare SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.i
nputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = ?
3152 Execute SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.i
nputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = 685
3152 Close stmt

Bad:
3121 Prepare SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.inputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = ?
3121 Close stmt


Even though I call the first one 'good', it's not really ideal because the prepared statement
is closed after use instead of reused.


Finally, my HSM contains about 15k of unused, old keys (for a few hundred zones).
I don't know when, why or how exactly this occured (most of them were migrated
from softhsm1) but I do know my HSM is huge. I don't think it's the cause of my
problems but have included it anyway for the sake of completeness.

# ods-enforcer zone list | wc -l
369
# ods-enforcer key list 2>& /dev/null | wc -l
1177

About 1175 kesy (minus two lines for headers) for 369 zones seems right,
and does agree with the keyData table:

MariaDB [opendnssec]> select count(*) from keyData;
+----------+
| count(*) |
+----------+
| 1175 |
+----------+

However hsmKey has another idea:
MariaDB [opendnssec]> select count(*) from hsmKey;
+----------+
| count(*) |
+----------+
| 2031 |
+----------+


Comparing the two tables reveals 856 keys that seem lost.
MariaDB [opendnssec]> SELECT count(*) FROM hsmKey WHERE id NOT in (SELECT hsmKeyId FROM keyData);
+----------+
| count(*) |
+----------+
| 856 |
+----------+

However ods-hsmutil sees even more keys:

# ods-hsmutil list LocalHSM | wc -l
7283

and the number of files in the HSM is even larger:

# ls /var/lib/softhsm/tokens/1fee18db-b8ab-460a-072c-19d8e0f2cce9 | wc -l
29113

29113 objects for 7283 keys seems to much.


However the HSM contains far more keys and even more objects that seem not listed at all.
At some point I'll need to clean this out but I would like to solve other problems first
before I start throwing out keys.


PS. The information in this mail was collected over several days. I've tried to present
a complete and consistent view but you might notice that timestamps jump around.

PS2. While debugging I was confronted with the lack of documentation for ODS2.
A lot of documentation refers to ODS1 and RFCS and the reader will have to decide what
parts are (still) relevent and which are not. On top of that the RFCs leave a lot of
freedom, for example with various key-replacement schemes, which gives the reader vary
little guidance on how ODS actually works.
Especially the relations between the various state-machines, the various types of data
(KSK, ZSK, DS, RRSIG), the timings and TTLs from the KASP, and ds-{submit,seen,...}
commands seem impossible to grasp without a lot of prior knowledge and experience.
I'm not entirely sure I get it everything myself after working with it for years.
While I understand that ODS can't be a complete introduction to DNSSEC for newbies, I
think the treshold has become unreasonable high.

PS3. At times I've longed for a '--force' options to commands like 'key rollover' or
'--use-key' for 'sign'.
I know ODS tries to protect the user from mistakes, but there have been times where I
know that DNSSEC validation is not working and ODS prevents a quick recovery. The
fastest way to fix such problems is to delete the zone from ODS and recreate it which
I consider a poor solution.

PS4. The zone 'wiskundeoptiu.nl' that is used as an example in this mail is new and
DNSSEC has never worked properly. The same problem occurs with other, existing zones, but
I can't experiment with those as easily.


Sorry for the long mail, and thank you for your time if you've made it until here.
I hope someone has any suggestions on how to improve the situation.





Addendum - KASP policy for wiskundeoptiu.nl
<Policy name="sidn">
<Signatures>
<Resign>PT30M</Resign>
<Refresh>P7D</Refresh>
<Validity>
<Default>P14D</Default>
<Denial>P14D</Denial>
</Validity>
<Jitter>PT12H</Jitter>
<InceptionOffset>PT600S</InceptionOffset>
</Signatures>
<Denial>
<NSEC3>
<Resalt>P100D</Resalt>
<Hash>
<Algorithm>1</Algorithm>
<Iterations>5</Iterations>
<Salt length="8"/>
</Hash>
</NSEC3>
</Denial>
<Keys>
<TTL>PT3600S</TTL>
<RetireSafety>P1D</RetireSafety>
<PublishSafety>PT30M</PublishSafety>
<Purge>P30D</Purge>

<KSK>
<Algorithm length="2048">8</Algorithm>
<Lifetime>P1Y</Lifetime>
<Repository>LocalHSM</Repository>
</KSK>
<ZSK>
<Algorithm length="1024">8</Algorithm>
<Lifetime>P30D</Lifetime>
<Repository>LocalHSM</Repository>
</ZSK>
</Keys>
<Zone>
<PropagationDelay>PT30S</PropagationDelay>
<SOA>
<TTL>PT3600S</TTL>
<Minimum>PT300S</Minimum>
<Serial>datecounter</Serial>
</SOA>
</Zone>
<Parent>
<PropagationDelay>PT7200S</PropagationDelay>
<DS>
<TTL>PT3600S</TTL>
</DS>
<SOA>
<TTL>PT73600S</TTL>
<Minimum>PT600S</Minimum>
</SOA>
</Parent>
</Policy>
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Berry A.W. van Halderen
2018-06-22 13:33:16 UTC
Permalink
Post by Casper Gielen
My main problem is that zones lose DNSKEYs and get stuk with unverifiable signatures.
# ods-enforcer key list --zone wiskundeoptiu.nl -v
wiskundeoptiu.nl KSK retire 2018-06-20 15:14:02 2048 8 489db07082a644fcfa67f077627b7c7c LocalHSM 39466
wiskundeoptiu.nl ZSK retire 2018-06-20 15:14:02 1024 8 2f3c7829c40248b5537b3cd09266678c LocalHSM 50226
wiskundeoptiu.nl KSK active 2018-06-20 15:14:02 2048 8 758cc85fc16528184f32dbfab70663f6 LocalHSM 62161
wiskundeoptiu.nl ZSK active 2018-06-20 15:14:02 1024 8 8472c2bac0dbfc86d3a687644a3ef4f5 LocalHSM 59790
wiskundeoptiu.nl ZSK ready 2018-06-20 15:14:02 1024 8 3e97dcd131d9264cad2fb84676ade00e LocalHSM 28818
Either this is a transcript from two days ago, or indeed something is
stuck which (see later) might indeed be the case.
Post by Casper Gielen
# ods-enforcer key list --zone wiskundeoptiu.nl -d
wiskundeoptiu.nl KSK hidden hidden hidden NA 0 0 489db07082a644fcfa67f077627b7c7c
wiskundeoptiu.nl ZSK NA hidden NA hidden 0 0 2f3c7829c40248b5537b3cd09266678c
wiskundeoptiu.nl KSK omnipresent omnipresent omnipresent NA 1 1 758cc85fc16528184f32dbfab70663f6
wiskundeoptiu.nl ZSK NA omnipresent NA omnipresent 1 1 8472c2bac0dbfc86d3a687644a3ef4f5
wiskundeoptiu.nl ZSK NA omnipresent NA rumoured 1 1 3e97dcd131d9264cad2fb84676ade00e
ZSK 50226 is still being used but it is not published in the zone.
It used to be signed with KSK 39466 which is also missing.
The only reasonable root cause that I can think of is that for a while
the enforcer was running as root instead of the 'opendnssec' user. I've changed
that and made sure that the opendnssec-user is allowed to access the softhms2 files.
I'm giving a quick reply now, even though I've not analyzed your mail
further. But your current remarks make sense and might already give
a resolution to the problem.

If you had previous been running OpenDNSSEC as root, the signconf.xml
file for the zone (normally located somewhere in a signconf directory
(typically /var/opendnssec/signconf/wiskundeoptiu.nl.xml).
Might have been written as the root user, and when later running as
a different user, OpenDNSSEC may then no longer be able to replace
this file. The message for this might probably be overwhelmed by other
messages. It will not re-try by itself, so what you need to do
is remove the files owned by root in the signconf directory and
then force to re-generate them by giving a "ods-enforcer enforce"
command to force inspecting the zones.

There is no feedback-loop from the signer to the enforcer,
which is one of the ideas to be placed in as (optional) feature.
What this means is that the enforcer will step through key roll
procedures regardless of wether the signer has actually picked
up the changes (in the signconf). This will further lead to
problems because this means keys might actually be purged from
the HSM and the signer will then fail further on.

Could you check whether files are still owned by root?

\Berry
Casper Gielen
2018-06-25 09:49:28 UTC
Permalink
Post by Berry A.W. van Halderen
Post by Casper Gielen
My main problem is that zones lose DNSKEYs and get stuk with unverifiable signatures.
# ods-enforcer key list --zone wiskundeoptiu.nl -v
wiskundeoptiu.nl KSK retire 2018-06-20 15:14:02 2048 8 489db07082a644fcfa67f077627b7c7c LocalHSM 39466
wiskundeoptiu.nl ZSK retire 2018-06-20 15:14:02 1024 8 2f3c7829c40248b5537b3cd09266678c LocalHSM 50226
wiskundeoptiu.nl KSK active 2018-06-20 15:14:02 2048 8 758cc85fc16528184f32dbfab70663f6 LocalHSM 62161
wiskundeoptiu.nl ZSK active 2018-06-20 15:14:02 1024 8 8472c2bac0dbfc86d3a687644a3ef4f5 LocalHSM 59790
wiskundeoptiu.nl ZSK ready 2018-06-20 15:14:02 1024 8 3e97dcd131d9264cad2fb84676ade00e LocalHSM 28818
Either this is a transcript from two days ago, or indeed something is
stuck which (see later) might indeed be the case.
It is indeed a old data, although nothing really changed since other
than the date of next transition.
Post by Berry A.W. van Halderen
If you had previous been running OpenDNSSEC as root, the signconf.xml
file for the zone (normally located somewhere in a signconf directory
(typically /var/opendnssec/signconf/wiskundeoptiu.nl.xml).
Might have been written as the root user, and when later running as
a different user, OpenDNSSEC may then no longer be able to replace
this file.
I've verified that everything under /var/lib/opendnssec is readable and
writable by the opendnssec user. The configuration, under
/etc/opendnssec, is readable but not writable.
Post by Berry A.W. van Halderen
There is no feedback-loop from the signer to the enforcer,
which is one of the ideas to be placed in as (optional) feature.
What this means is that the enforcer will step through key roll
procedures regardless of wether the signer has actually picked
up the changes (in the signconf). This will further lead to
problems because this means keys might actually be purged from
the HSM and the signer will then fail further on.
That's good to know. I guess this also means that if ODS server is not
available (eg powered off) for a few days that when it comes back online
it might take a big step forward?

Ik bring it up because when this problem first surfaced (at an
inconvenient time, as usual ;) ) I restored a backup from a few days
before so we could get through the weekend and then shutdown ODS>

In hindsight this might have caused problems when I turned it back on.

Thanks for you advice!
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Casper Gielen
2018-06-25 13:05:33 UTC
Permalink
Post by Casper Gielen
I've verified that everything under /var/lib/opendnssec is readable and
writable by the opendnssec user. The configuration, under
/etc/opendnssec, is readable but not writable.
Minutes after I wrote this a colleague added a new zone (ucgv.nl) that immediately ran into trouble.
Unfortunately I do not have complete logging, this is what I do have:

Jun 25 11:15:49 ramachandra ods-enforcerd: [zonelist_import] zone ucgv.nl created
...
Jun 25 11:16:02 ramachandra ods-enforcerd: [enforcer] update zone: ucgv.nl
Jun 25 11:16:02 ramachandra ods-enforcerd: 199 zone(s) found on policy "sidn"
Jun 25 11:16:02 ramachandra ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 199 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 11:16:02 ramachandra ods-enforcerd: 1 new KSK(s) (2048 bits) need to be created.
Jun 25 11:16:02 ramachandra ods-enforcerd: 199 zone(s) found on policy "sidn"
Jun 25 11:16:02 ramachandra ods-enforcerd: [hsm_key_factory_generate] 13 keys needed for 199 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 11:16:02 ramachandra ods-enforcerd: 1 new ZSK(s) (1024 bits) need to be created.
Jun 25 11:16:02 ramachandra ods-enforcerd: [signconf_cmd] performing signconf for all zones
Jun 25 11:16:02 ramachandra ods-enforcerd: [signconf_cmd] signconf done, notifying signer
...
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] unable to get key: key a1d5274f2e3c73eb73ec99c16e781d0d not found
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 25 11:30:17 ramachandra ods-signerd: [zone] unable to prepare signing keys for zone ucgv.nl: error getting dnskey
Jun 25 11:30:17 ramachandra ods-signerd: [worker[1]] CRITICAL: failed to sign zone ucgv.nl: General error

***@ramachandra:~# ods-enforcer key list --zone ucgv.nl -v
Keys:
Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag:
ucgv.nl ZSK publish 2018-06-26 11:16:27 1024 8 991337c6e4aba10487ef75d8ca990668 LocalHSM 24390

***@ramachandra:~# ods-enforcer key list --zone ucgv.nl -d
Keys:
Zone: Key role: DS: DNSKEY: RRSIGDNSKEY: RRSIG: Pub: Act: Id:
ucgv.nl ZSK NA hidden NA rumoured 0 1 991337c6e4aba10487ef75d8ca990668

The log shows he followed our usual procedure to the tee.
I immediatly repeated his steps, except for a different domain name, but was unable to reproduce te problem.
The difference might be that I worked with a frehsly restarted ods-enforcer, while it had been running for three days straight when my colleague added the zone.



Jun 25 13:40:08 ramachandra ods-enforcerd: [zonelist_import] zone ucgvtest2.nl created
Jun 25 13:40:16 ramachandra ods-enforcerd: [enforcer] update zone: ucgvtest2.nl
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_get_key] key allocated
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] schedule task [hsmkeygen] for policy_key
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] signal now
Jun 25 13:40:16 ramachandra ods-enforcerd: [enforcer] updatePolicy: got new key from HSM
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]]: report for duty
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] SIGALRM set
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]] start working
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]]: perform task [hsmkeygen] for policy_key
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT policy.id, policy.rev, policy.name, policy.description, policy.signaturesResign, policy.signaturesRefresh, policy.signaturesJitter, policy.signaturesInceptionOffset, policy.signaturesValidityDefault, policy.signaturesValidityDenial, policy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl, policy.denialType, policy.denialOptout, policy.denialTtl, policy.denialResalt, policy.denialAlgorithm, policy.denialIterations, policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange, policy.keysTtl, policy.keysRetireSafety, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter, policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum, policy.zoneSoaSerial, policy.parentRegistrationDelay, policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl, policy.parentSoaMinimum, policy.passthrough FROM policy WHERE policy.id = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key [duration: 0]
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] repository LocalHSM role KSK
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM hsmKey WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND hsmKey.bits = ? AND hsmKey.algorithm = ? AND hsmKey.role = ? AND hsmKey.isRevoked = ? AND hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM zone WHERE zone.policyId = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: 200 zone(s) found on policy "sidn"
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 200 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 13:40:16 ramachandra ods-enforcerd: 1 new KSK(s) (2048 bits) need to be created.
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] generated key 5ddcfb83faa841d034ee262586391216 successfully
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key done

Jun 25 13:40:16 ramachandra ods-enforcerd: 200 zone(s) found on policy "sidn"
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] 13 keys needed for 200 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 13:40:16 ramachandra ods-enforcerd: 1 new ZSK(s) (1024 bits) need to be created.
Jun 25 13:40:17 ramachandra ods-enforcerd: [hsm_key_factory_generate] generated key e484bf43583403c1a1658c2c7e30d47b successfully
Jun 25 13:40:17 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key done
Jun 25 13:40:17 ramachandra ods-enforcerd: [worker[2]] finished working
Jun 25 13:40:17 ramachandra ods-enforcerd: [worker[2]]: report for duty


Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyDependency.id, keyDependency.rev, keyDependency.zoneId, keyDependency.fromKeyDataId, keyDependency.toKeyDataId, keyDependency.type FROM keyDependency WHERE keyDependency.zoneId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyState.id, keyState.rev, keyState.keyDataId, keyState.type, keyState.state, keyState.lastChange, keyState.minimize, keyState.ttl FROM keyState WHERE keyState.keyDataId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: processing key 0c9c2506e33fe702c96f989bf78ab66f 4
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f DS in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f DNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f RRSIGDNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: processing key 68e476a72a0ab6e296dc309334276588 1
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May ZSK 68e476a72a0ab6e296dc309334276588 DNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May ZSK 68e476a72a0ab6e296dc309334276588 RRSIG in state rumoured transition to omnipresent?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone DNSSEC says we can (2/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: UPDATE keyData SET zoneId = ?, hsmKeyId = ?, algorithm = ?, inception = ?, role = ?, introducing = ?, shouldRevoke = ?, standby = ?, activeZsk = ?, publish = ?, activeKsk = ?, dsAtParent = ?, keytag = ?, minimize = ?, rev = ? WHERE keyData.id = ? AND keyData.rev = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: UPDATE keyData SET zoneId = ?, hsmKeyId = ?, algorithm = ?, inception = ?, role = ?, introducing = ?, shouldRevoke = ?, standby = ?, activeZsk = ?, publish = ?, activeKsk = ?, dsAtParent = ?, keytag = ?, minimize = ?, rev = ? WHERE keyData.id = ? AND keyData.rev = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyData.id, keyData.rev, keyData.zoneId, keyData.hsmKeyId, keyData.algorithm, keyData.inception, keyData.role, keyData.introducing, keyData.shouldRevoke, keyData.standby, keyData.activeZsk, keyData.publish, keyData.activeKsk, keyData.dsAtParent, keyData.keytag, keyData.minimize FROM keyData WHERE keyData.zoneId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: Next update for zone ucgvtest2.nl scheduled at Tue Jun 26 13:40:40 2018

***@ramachandra:~# ods-enforcer key list --zone ucgvtest2.nl -v
Keys:
Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag:
ucgvtest2.nl ZSK publish 2018-06-26 13:40:40 1024 8 68e476a72a0ab6e296dc309334276588 LocalHSM 14190
key list completed in 0 seconds.
***@ramachandra:~# ods-enforcer key list --zone ucgvtest2.nl -d
Keys:
Zone: Key role: DS: DNSKEY: RRSIGDNSKEY: RRSIG: Pub: Act: Id:
ucgvtest2.nl ZSK NA hidden NA rumoured 0 1 68e476a72a0ab6e296dc309334276588

Now I'm a bit miffed by these zones with a ZSK but without a KSK.

My original problem, the zone wiskundeoptiu.nl, remains, but the situation has been slightly improved
because some of the signatures have been replaced by signatures with the current ZSK.

I'm considering wether or not to edit the database directly to get the zone back in a workable condition by removing the problematic keys.
First I'm going to delete the zone ucgv.nl, which is not yet in use, and redo it in the hope that it magically works out this time.
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Berry A.W. van Halderen
2018-06-25 13:45:01 UTC
Permalink
Post by Casper Gielen
Post by Casper Gielen
I've verified that everything under /var/lib/opendnssec is readable and
writable by the opendnssec user. The configuration, under
/etc/opendnssec, is readable but not writable.
Minutes after I wrote this a colleague added a new zone (ucgv.nl) that immediately ran into trouble.
This could be much unrelated from the earlier issue.

Are you using SoftHSM as HSM? If so, which version?
There is a known, resolved issue with certain versions.

\Berry
Post by Casper Gielen
Jun 25 11:15:49 ramachandra ods-enforcerd: [zonelist_import] zone ucgv.nl created
...
Jun 25 11:16:02 ramachandra ods-enforcerd: [enforcer] update zone: ucgv.nl
Jun 25 11:16:02 ramachandra ods-enforcerd: 199 zone(s) found on policy "sidn"
Jun 25 11:16:02 ramachandra ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 199 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 11:16:02 ramachandra ods-enforcerd: 1 new KSK(s) (2048 bits) need to be created.
Jun 25 11:16:02 ramachandra ods-enforcerd: 199 zone(s) found on policy "sidn"
Jun 25 11:16:02 ramachandra ods-enforcerd: [hsm_key_factory_generate] 13 keys needed for 199 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 11:16:02 ramachandra ods-enforcerd: 1 new ZSK(s) (1024 bits) need to be created.
Jun 25 11:16:02 ramachandra ods-enforcerd: [signconf_cmd] performing signconf for all zones
Jun 25 11:16:02 ramachandra ods-enforcerd: [signconf_cmd] signconf done, notifying signer
...
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] unable to get key: key a1d5274f2e3c73eb73ec99c16e781d0d not found
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jun 25 11:30:17 ramachandra ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 25 11:30:17 ramachandra ods-signerd: [zone] unable to prepare signing keys for zone ucgv.nl: error getting dnskey
Jun 25 11:30:17 ramachandra ods-signerd: [worker[1]] CRITICAL: failed to sign zone ucgv.nl: General error
ucgv.nl ZSK publish 2018-06-26 11:16:27 1024 8 991337c6e4aba10487ef75d8ca990668 LocalHSM 24390
ucgv.nl ZSK NA hidden NA rumoured 0 1 991337c6e4aba10487ef75d8ca990668
The log shows he followed our usual procedure to the tee.
I immediatly repeated his steps, except for a different domain name, but was unable to reproduce te problem.
The difference might be that I worked with a frehsly restarted ods-enforcer, while it had been running for three days straight when my colleague added the zone.
Jun 25 13:40:08 ramachandra ods-enforcerd: [zonelist_import] zone ucgvtest2.nl created
Jun 25 13:40:16 ramachandra ods-enforcerd: [enforcer] update zone: ucgvtest2.nl
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_get_key] key allocated
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] schedule task [hsmkeygen] for policy_key
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] signal now
Jun 25 13:40:16 ramachandra ods-enforcerd: [enforcer] updatePolicy: got new key from HSM
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]]: report for duty
Jun 25 13:40:16 ramachandra ods-enforcerd: [scheduler] SIGALRM set
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]] start working
Jun 25 13:40:16 ramachandra ods-enforcerd: [worker[4]]: perform task [hsmkeygen] for policy_key
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT policy.id, policy.rev, policy.name, policy.description, policy.signaturesResign, policy.signaturesRefresh, policy.signaturesJitter, policy.signaturesInceptionOffset, policy.signaturesValidityDefault, policy.signaturesValidityDenial, policy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl, policy.denialType, policy.denialOptout, policy.denialTtl, policy.denialResalt, policy.denialAlgorithm, policy.denialIterations, policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange, policy.keysTtl, policy.keysRetireSafety, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter, policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum, policy.zoneSoaSerial, policy.parentRegistrationDelay, policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl, policy.parentSoaMinimum, policy.passthrough FROM policy WHERE policy.id = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key [duration: 0]
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] repository LocalHSM role KSK
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM hsmKey WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND hsmKey.bits = ? AND hsmKey.algorithm = ? AND hsmKey.role = ? AND hsmKey.isRevoked = ? AND hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM zone WHERE zone.policyId = ?
Jun 25 13:40:16 ramachandra ods-enforcerd: 200 zone(s) found on policy "sidn"
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] 1 keys needed for 200 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 13:40:16 ramachandra ods-enforcerd: 1 new KSK(s) (2048 bits) need to be created.
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] generated key 5ddcfb83faa841d034ee262586391216 successfully
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key done
Jun 25 13:40:16 ramachandra ods-enforcerd: 200 zone(s) found on policy "sidn"
Jun 25 13:40:16 ramachandra ods-enforcerd: [hsm_key_factory_generate] 13 keys needed for 200 zones covering 31536000 seconds, generating 1 keys for policy sidn
Jun 25 13:40:16 ramachandra ods-enforcerd: 1 new ZSK(s) (1024 bits) need to be created.
Jun 25 13:40:17 ramachandra ods-enforcerd: [hsm_key_factory_generate] generated key e484bf43583403c1a1658c2c7e30d47b successfully
Jun 25 13:40:17 ramachandra ods-enforcerd: [hsm_key_factory_generate_task] generate for policy key done
Jun 25 13:40:17 ramachandra ods-enforcerd: [worker[2]] finished working
Jun 25 13:40:17 ramachandra ods-enforcerd: [worker[2]]: report for duty
Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyDependency.id, keyDependency.rev, keyDependency.zoneId, keyDependency.fromKeyDataId, keyDependency.toKeyDataId, keyDependency.type FROM keyDependency WHERE keyDependency.zoneId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyState.id, keyState.rev, keyState.keyDataId, keyState.type, keyState.state, keyState.lastChange, keyState.minimize, keyState.ttl FROM keyState WHERE keyState.keyDataId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: processing key 0c9c2506e33fe702c96f989bf78ab66f 4
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f DS in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f DNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May KSK 0c9c2506e33fe702c96f989bf78ab66f RRSIGDNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: processing key 68e476a72a0ab6e296dc309334276588 1
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May ZSK 68e476a72a0ab6e296dc309334276588 DNSKEY in state hidden transition to rumoured?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone: May ZSK 68e476a72a0ab6e296dc309334276588 RRSIG in state rumoured transition to omnipresent?
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone Policy says we can (1/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: [enforcer] updateZone DNSSEC says we can (2/3)
Jun 25 13:40:17 ramachandra ods-enforcerd: UPDATE keyData SET zoneId = ?, hsmKeyId = ?, algorithm = ?, inception = ?, role = ?, introducing = ?, shouldRevoke = ?, standby = ?, activeZsk = ?, publish = ?, activeKsk = ?, dsAtParent = ?, keytag = ?, minimize = ?, rev = ? WHERE keyData.id = ? AND keyData.rev = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: UPDATE keyData SET zoneId = ?, hsmKeyId = ?, algorithm = ?, inception = ?, role = ?, introducing = ?, shouldRevoke = ?, standby = ?, activeZsk = ?, publish = ?, activeKsk = ?, dsAtParent = ?, keytag = ?, minimize = ?, rev = ? WHERE keyData.id = ? AND keyData.rev = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: SELECT keyData.id, keyData.rev, keyData.zoneId, keyData.hsmKeyId, keyData.algorithm, keyData.inception, keyData.role, keyData.introducing, keyData.shouldRevoke, keyData.standby, keyData.activeZsk, keyData.publish, keyData.activeKsk, keyData.dsAtParent, keyData.keytag, keyData.minimize FROM keyData WHERE keyData.zoneId = ?
Jun 25 13:40:17 ramachandra ods-enforcerd: Next update for zone ucgvtest2.nl scheduled at Tue Jun 26 13:40:40 2018
ucgvtest2.nl ZSK publish 2018-06-26 13:40:40 1024 8 68e476a72a0ab6e296dc309334276588 LocalHSM 14190
key list completed in 0 seconds.
ucgvtest2.nl ZSK NA hidden NA rumoured 0 1 68e476a72a0ab6e296dc309334276588
Now I'm a bit miffed by these zones with a ZSK but without a KSK.
My original problem, the zone wiskundeoptiu.nl, remains, but the situation has been slightly improved
because some of the signatures have been replaced by signatures with the current ZSK.
I'm considering wether or not to edit the database directly to get the zone back in a workable condition by removing the problematic keys.
First I'm going to delete the zone ucgv.nl, which is not yet in use, and redo it in the hope that it magically works out this time.
Casper Gielen
2018-06-25 15:05:40 UTC
Permalink
Post by Berry A.W. van Halderen
Post by Casper Gielen
Post by Casper Gielen
I've verified that everything under /var/lib/opendnssec is readable and
writable by the opendnssec user. The configuration, under
/etc/opendnssec, is readable but not writable.
Minutes after I wrote this a colleague added a new zone (ucgv.nl) that immediately ran into trouble.
This could be much unrelated from the earlier issue.
Are you using SoftHSM as HSM? If so, which version?
There is a known, resolved issue with certain versions.
We were on SoftHSM 2, version 2.2.0, Debian package v3, as provided by
Debian Stretch. I just switched to SoftHSM 2.4.0, form Debian Unstable.

Onfortunately this did not magically solve my problems, the signer is
still not able to get the key that should be available:

# ods-hsmutil list | grep a1d5274f2e3c73eb73ec99c16e781d0d /tmp/hsmkeys
LocalHSM a1d5274f2e3c73eb73ec99c16e781d0d RSA/2048

I'll run it for a bit and see if anything improves.
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Casper Gielen
2018-06-28 12:06:49 UTC
Permalink
Post by Berry A.W. van Halderen
Are you using SoftHSM as HSM? If so, which version?
There is a known, resolved issue with certain versions.
I just switched to SoftHSM 2.4.0, from Debian Unstable.
I'll run it for a bit and see if anything improves.
After two days nothing has happened. That is, all keys seem to be in
exactly the same state as two days ago.

Calling 'ods-enforcer enforce' manually does trigger something, but the
enforcer is not able to talk to our SQL server.

Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
policy.id, policy.rev, policy.name, policy.description,
policy.signaturesResign, policy.signaturesRefresh,
policy.signaturesJitter, policy.signaturesInceptionOffset,
policy.signaturesValidityDefault, policy.signaturesValidityDenial, po
licy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl,
policy.denialType, policy.denialOptout, policy.denialTtl,
policy.denialResalt, policy.denialAlgorithm, policy.denialIterations,
policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange,
policy.keysTtl, policy.keysRetireSafety
, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter,
policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum,
policy.zoneSoaSerial, policy.parentRegistrationDelay,
policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl,
policy.parentSoaMinimum, policy.p
assthrough FROM policy WHERE policy.id = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away
Jun 28 11:52:16 ramachandra ods-enforcerd:
[hsm_key_factory_generate_task] generate for policy key [duration: 0]
Jun 28 11:52:16 ramachandra ods-enforcerd: [hsm_key_factory_generate]
repository LocalHSM role KSK
Jun 28 11:52:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM hsmKey
WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND hsmKey.bits = ? AND
hsmKey.algorithm = ? AND hsmKey.role = ? AND hsmKey.isRevoked = ? AND
hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
COUNT(*) FROM hsmKey WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND
hsmKey.bits = ? AND hsmKey.algorithm = ? AND hsmKey.role = ? AND
hsmKey.isRevoked = ? AND hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away

After restarting the enforcer it connects correctly to MySQL and the
keys start advancing through the various states.

I've added a cron-job that restarts the enforcer every 6 hours.
That's not ideal but should make clear if the problem is just that the
enforcer gets stuck and thus misses its deadlines, or if the problems go
deeper.
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Berry A.W. van Halderen
2018-07-02 08:33:28 UTC
Permalink
Post by Casper Gielen
Post by Berry A.W. van Halderen
Are you using SoftHSM as HSM? If so, which version?
There is a known, resolved issue with certain versions.
I just switched to SoftHSM 2.4.0, from Debian Unstable.
I'll run it for a bit and see if anything improves.
After two days nothing has happened. That is, all keys seem to be in
exactly the same state as two days ago.
Calling 'ods-enforcer enforce' manually does trigger something, but the
enforcer is not able to talk to our SQL server.
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
policy.id, policy.rev, policy.name, policy.description,
policy.signaturesResign, policy.signaturesRefresh,
policy.signaturesJitter, policy.signaturesInceptionOffset,
policy.signaturesValidityDefault, policy.signaturesValidityDenial, po
licy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl,
policy.denialType, policy.denialOptout, policy.denialTtl,
policy.denialResalt, policy.denialAlgorithm, policy.denialIterations,
policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange,
policy.keysTtl, policy.keysRetireSafety
, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter,
policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum,
policy.zoneSoaSerial, policy.parentRegistrationDelay,
policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl,
policy.parentSoaMinimum, policy.p
assthrough FROM policy WHERE policy.id = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away
[hsm_key_factory_generate_task] generate for policy key [duration: 0]
Jun 28 11:52:16 ramachandra ods-enforcerd: [hsm_key_factory_generate]
repository LocalHSM role KSK
Jun 28 11:52:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM hsmKey
WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND hsmKey.bits = ? AND
hsmKey.algorithm = ? AND hsmKey.role = ? AND hsmKey.isRevoked = ? AND
hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
COUNT(*) FROM hsmKey WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND
hsmKey.bits = ? AND hsmKey.algorithm = ? AND hsmKey.role = ? AND
hsmKey.isRevoked = ? AND hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away
After restarting the enforcer it connects correctly to MySQL and the
keys start advancing through the various states.
That would be known issue:
https://issues.opendnssec.org/browse/OPENDNSSEC-913

There is some code which "keeps" the connection alive, but in case
the connection goes anyway, it won't reconnect. There are two
reasons for a connection to get lost:
- A deliberate restart of the database. Althrough we should address
this, it's not a very frequent case.
- A too short timeout on the mysql/mariadb compared to how often
the enforcer wakes up to check zones. Can also be adressed as above
but often also resolved by larger settings to interactive_timeout
(and possible wait_timeout, though that shouldn't).
Post by Casper Gielen
I've added a cron-job that restarts the enforcer every 6 hours.
That's not ideal but should make clear if the problem is just that the
enforcer gets stuck and thus misses its deadlines, or if the problems go
deeper.
Please let us know how you fare with that.

\Berry
Casper Gielen
2018-07-02 14:57:57 UTC
Permalink
Post by Berry A.W. van Halderen
Post by Casper Gielen
Post by Berry A.W. van Halderen
Are you using SoftHSM as HSM? If so, which version?
There is a known, resolved issue with certain versions.
I just switched to SoftHSM 2.4.0, from Debian Unstable.
I'll run it for a bit and see if anything improves.
After two days nothing has happened. That is, all keys seem to be in
exactly the same state as two days ago.
Calling 'ods-enforcer enforce' manually does trigger something, but the
enforcer is not able to talk to our SQL server.
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
policy.id, policy.rev, policy.name, policy.description,
policy.signaturesResign, policy.signaturesRefresh,
policy.signaturesJitter, policy.signaturesInceptionOffset,
policy.signaturesValidityDefault, policy.signaturesValidityDenial, po
licy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl,
policy.denialType, policy.denialOptout, policy.denialTtl,
policy.denialResalt, policy.denialAlgorithm, policy.denialIterations,
policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange,
policy.keysTtl, policy.keysRetireSafety
, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter,
policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum,
policy.zoneSoaSerial, policy.parentRegistrationDelay,
policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl,
policy.parentSoaMinimum, policy.p
assthrough FROM policy WHERE policy.id = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away
[hsm_key_factory_generate_task] generate for policy key [duration: 0]
Jun 28 11:52:16 ramachandra ods-enforcerd: [hsm_key_factory_generate]
repository LocalHSM role KSK
Jun 28 11:52:16 ramachandra ods-enforcerd: SELECT COUNT(*) FROM hsmKey
WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND hsmKey.bits = ? AND
hsmKey.algorithm = ? AND hsmKey.role = ? AND hsmKey.isRevoked = ? AND
hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare SQL SELECT
COUNT(*) FROM hsmKey WHERE hsmKey.policyId = ? AND hsmKey.state = ? AND
hsmKey.bits = ? AND hsmKey.algorithm = ? AND hsmKey.role = ? AND
hsmKey.isRevoked = ? AND hsmKey.keyType = ? AND hsmKey.repository = ?
Jun 28 11:52:16 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL
server has gone away
After restarting the enforcer it connects correctly to MySQL and the
keys start advancing through the various states.
https://issues.opendnssec.org/browse/OPENDNSSEC-913
There is some code which "keeps" the connection alive, but in case
the connection goes anyway, it won't reconnect. There are two
- A deliberate restart of the database. Althrough we should address
this, it's not a very frequent case.
So far I've not seen a relation between restarts MariaDB and my problem,
even though I've tried to break it by repeatedly restarting MariaDB.
I'll keep it in mind in case there is a pattern I've missed so far.
Post by Berry A.W. van Halderen
- A too short timeout on the mysql/mariadb compared to how often
the enforcer wakes up to check zones. Can also be adressed as above
but often also resolved by larger settings to interactive_timeout
(and possible wait_timeout, though that shouldn't).
Both are at 8 hours, that should be plenty, but just to be sure I've
increased both to 80 hours.
Post by Berry A.W. van Halderen
Post by Casper Gielen
I've added a cron-job that restarts the enforcer every 6 hours.
That's not ideal but should make clear if the problem is just that the
enforcer gets stuck and thus misses its deadlines, or if the problems go
deeper.
Due to a small mistake this cron-job never got installed on the system
and this morning the enforcer was stuck again, so I don't have an new
results.
I've fixed the problem and the enforcer got back to it. I hope to have
more information tomorrow.
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Casper Gielen
2018-07-18 08:57:44 UTC
Permalink
Post by Casper Gielen
Post by Casper Gielen
I've added a cron-job that restarts the enforcer every 6 hours.
That's not ideal but should make clear if the problem is just that the
enforcer gets stuck and thus misses its deadlines, or if the problems go
deeper.
Due to a small mistake this cron-job never got installed on the system
and this morning the enforcer was stuck again, so I don't have an new
results.
I've fixed the problem and the enforcer got back to it. I hope to have
more information tomorrow.
Just a little update.
Restarting the enforcer every 6 hours supresses the symptoms. I still
get occasional errors ("DB prepare Err 2006: MySQL server has gone
away") every few days, but after a restart of the enforcer the process
continues. There are no longer large jumps in state or dissappearing
keys. I do not consider the problem solved, but DNS is usable again.
--
Casper Gielen <***@uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl
Dennis Baaten
2018-07-31 09:23:55 UTC
Permalink
I've been reading up on this thread and I think I'm experiencing similar
problems.
I'm running Debian stable, but I've been using the ODS packages from testing
for a while now.
I'm still trying to figure out what is happening exactly, but in my logs I
find the following:

Jul 30 09:45:16 traxotic ods-enforcerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 30 09:45:17 traxotic ods-enforcerd: [hsm_key_factory_get_key] no keys
available
Jul 30 09:45:17 traxotic ods-enforcerd: [enforcer] updatePolicy: No keys
available in HSM for policy default, retry in 60 seconds
Jul 30 09:45:17 traxotic ods-enforcerd: [hsm_key_factory_get_key] no keys
available
Jul 30 09:45:17 traxotic ods-enforcerd: [enforcer] updatePolicy: No keys
available in HSM for policy com_net_org, retry in 60 seconds
Jul 30 09:46:17 traxotic ods-enforcerd: [hsm_key_factory_get_key] no keys
available
Jul 30 09:46:17 traxotic ods-enforcerd: [enforcer] updatePolicy: No keys
available in HSM for policy com_net_org, retry in 60 seconds
Jul 31 10:08:40 traxotic ods-enforcerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:08:41 traxotic ods-signerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:08:41 traxotic ods-signerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:25:29 traxotic ods-hsmutil: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:37:41 traxotic ods-enforcerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:37:43 traxotic ods-signerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:37:43 traxotic ods-signerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:40:13 traxotic softhsm2-util: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:46:19 traxotic ods-enforcerd: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false
Jul 31 10:48:12 traxotic ods-hsmutil: Configuration.cpp(122): Missing
slots.removable in configuration. Using default value: false


Jul 31 09:50:52 traxotic ods-signerd: ObjectFile.cpp(122): The attribute
does not exist: 0x00000002
Jul 31 09:50:52 traxotic ods-signerd: [hsm] unable to get key: key
56be14a85f43ed317c789841fe664136 not found
Jul 31 09:50:52 traxotic ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jul 31 09:50:52 traxotic ods-signerd: [hsm] unable to get key: hsm failed to
create dnskey
Jul 31 09:50:52 traxotic ods-signerd: [zone] unable to publish dnskeys for
zone traxotic.net: error creating dnskey
Jul 31 09:50:52 traxotic ods-signerd: [tools] unable to read zone
traxotic.net: failed to publish dnskeys (General error)
Jul 31 09:50:52 traxotic ods-signerd: CRITICAL: failed to sign zone
traxotic.net: General error
Jul 31 10:02:55 traxotic ods-signerd: ObjectFile.cpp(122): The attribute
does not exist: 0x00000002
Jul 31 10:02:55 traxotic ods-signerd: [hsm] unable to get key: key
56be14a85f43ed317c789841fe664136 not found
Jul 31 10:02:55 traxotic ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jul 31 10:02:55 traxotic ods-signerd: [hsm] unable to get key: hsm failed to
create dnskey
Jul 31 10:02:55 traxotic ods-signerd: [zone] unable to publish dnskeys for
zone traxotic.net: error creating dnskey
Jul 31 10:02:55 traxotic ods-signerd: [tools] unable to read zone
traxotic.net: failed to publish dnskeys (General error)
Jul 31 10:02:55 traxotic ods-signerd: CRITICAL: failed to sign zone
traxotic.net: General error

After restarting some ODS service, it seems to be working again.

-----Oorspronkelijk bericht-----
Van: Opendnssec-user <opendnssec-user-***@lists.opendnssec.org> Namens
Casper Gielen
Verzonden: woensdag 18 juli 2018 10:58
Aan: opendnssec-***@lists.opendnssec.org
Onderwerp: Re: [Opendnssec-user] Missing keys and various other problems on
2.0
Post by Casper Gielen
Post by Casper Gielen
I've added a cron-job that restarts the enforcer every 6 hours.
That's not ideal but should make clear if the problem is just that
the enforcer gets stuck and thus misses its deadlines, or if the
problems go deeper.
Due to a small mistake this cron-job never got installed on the system
and this morning the enforcer was stuck again, so I don't have an new
results.
I've fixed the problem and the enforcer got back to it. I hope to have
more information tomorrow.
Just a little update.
Restarting the enforcer every 6 hours supresses the symptoms. I still get
occasional errors ("DB prepare Err 2006: MySQL server has gone
away") every few days, but after a restart of the enforcer the process
continues. There are no longer large jumps in state or dissappearing keys. I
do not consider the problem solved, but DNS is usable again.

--
Casper Gielen <***@uvt.nl> | LIS UNIX PGP fingerprint = 16BD 2C9F 8156
C242 F981 63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE Warandelaan 2 | Telefoon
013 466 4100 | G 236 | http://www.uvt.nl

Loading...