CA/Responding To An Incident

7,300 bytes added, 10:38, 18 August 2017
Expand with text from Kathleen and ideas from Ryan
{{draft}}
This page gives guidance to CAs as to how Mozilla expects them to react to reported misissuances, and what the best practices are. For the purposes of this page, a "misissuance" is defined as any certificate issued in contravention of any applicable standard, process or document: it could be RFC non-compliant, BR non-compliant, issued contrary to the CA's CP/CPS, or have some other flaw or problem.

While some forms of misissuance may be seen as less serious than others, opinions vary on which these are. Mozilla sees all misissuances as good opportunities for the CA to test that their incident response processes are working well, and so we expect a similar level of timeliness of response and quality of reporting for all incidents, whatever their adjudged severity.

We do not expect perfection from any CA; it is true that our confidence in a CA is in part affected by the number and severity of incidents, but it is also significantly affected by the speed and quality of incident response.

= Immediate Actions =

In almost all cases, a CA should immediately cease issuance from the affected part of your PKI until you have diagnosed the source of the problem. Once the problem is diagnosed, you can restart issuance even if a full fix is not rolled out, if you are able to put in place temporary or manual procedures to prevent the problem re-occurring. You should not restart issuance until you are confident that the problem will not re-occur.

= Revocation =

It is normal practice for CAs to revoke misissued certificates. But that leaves the question about when this should be done, particularly if it's not possible to contact the customer immediately, or if they are unable to replace their certificate quickly.

Section 4.9.1.1 of the CA/Browser Forum’s Baseline Requirements states:

<blockquote>“The CA SHALL revoke a Certificate within 24 hours if one or more of the following occurs: …<br>9. The CA is made aware that the Certificate was not issued in accordance with these Requirements or the CA’s Certificate Policy or Certification Practice Statement;<br>10. The CA determines that any of the information appearing in the Certificate is inaccurate or misleading; …<br>14. Revocation is required by the CA’s Certificate Policy and/or Certification Practice Statement; or<br>15. The technical content or format of the Certificate presents an unacceptable risk to Application Software Suppliers or Relying Parties (e.g. the CA/Browser Forum might determine that a deprecated cryptographic/signature algorithm or key size presents an unacceptable risk and that such Certificates should be revoked and replaced by CAs within a given period of time).”</blockquote>

This means that, in most cases of misissuance, the CA has an obligation under the BRs to revoke the certificates concerned within 24 hours.

However, it is not our intent to introduce additional problems by forcing the immediate revocation of certificates that are not BR compliant when they do not pose an urgent security concern. Therefore, we request that your CA perform careful analysis of the situation. If there is justification to not revoke the problematic certificates, then your report will need to explain those reasons and provide a timeline for when the bulk of the certificates will expire or be revoked/replaced.

If your CA will not be revoking the certificates within 24 hours in accordance with the BRs, then that will need to be listed as a finding in your CA’s BR audit statement.

We expect that your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable. If your CA will not be revoking the problematic certificates as required by the BRs, then we recommend that you also contact the other root programs that your CA participates in to acknowledge this non-compliance and discuss what expectations their Root Programs have with respect to these certificates.

= Follow-Up Actions =

* Work out how the bug or problem was introduced. For a code bug, were the code review processes sufficient? Does your code have automated tests, and if so, why did they not catch this case?
* Work out why the problem was not detected earlier. Were these certificates missed by your self-audits? Or is the code or process you use for such audits insufficiently rigorous?
* If the problem is lack of compliance to an RFC, Baseline Requirement or Mozilla Policy requirement: were you aware of this requirement? If not, why not? If so, was an attempt made to meet it? If not, why not? If so, why was that attempt flawed? Do any processes need updating for making sure your CA complies with the latest version of the various requirements placed upon it?
* Scan your corpus of certificates to look for others with the same issue. It does not look good for a CA to claim they have revoked all affected certificates and resolved the issue, and then for a researcher to discover another set of certificates with the same or a similar problem.
* Examine whether there are potential related problems which you can also remediate at the same time. For example, if the problem was bad data in a particular field, consider improving the validation of all fields in the certificate prior to issuance. You should be proactively looking for ways to harden your issuance pipeline against further problems.
* If, as happens in a regrettably large number of cases, a problem report was sent to your CA but action was not taken within 24 hours, investigate what happened to that report and whether your report handling processes are adequate.

= Incident Report =

Each incident should result in an incident report, written as soon as the problem is fully diagnosed and (temporary or permanent) measures have been put in place to make sure it will not re-occur. If the permanent fix is going to take significant time to implement, you should not wait until this is done before issuing the report. We expect to see incident reports as soon as possible, and certainly within two weeks of the initial issue report.

The incident report should cover at least the following topics:

# How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, via a discussion in mozilla.dev.security.policy, or via a Bugzilla bug), and the date.
# A timeline of the actions your CA took in response.
# Confirmation that your CA has stopped issuing TLS/SSL certificates with the problem.
# A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
# A complete list of the problematic certificates. The recommended way to handle this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
# Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
# List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

= Keeping Us Informed =

Once the report is posted, you should provide regular updates giving your progress, and confirm when the remediation steps have been completed. Such updates should be posted to the m.d.s.p. thread, if there is one, and the Bugzilla bug, if there is one.
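
As a rough illustration of the certificate summary requested in the incident report (number of certs, and first and last issuance dates, per distinct problem), the following Python sketch groups a list of affected certificates by problem. The record type, field names and problem labels are hypothetical and are not part of any Mozilla tooling; a CA would substitute its own data source.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AffectedCert:
    # Hypothetical record layout; adapt the fields to your own issuance database.
    fingerprint: str  # e.g. a SHA-256 fingerprint or crt.sh ID
    problem: str      # short label identifying the distinct problem
    issued: date      # date the certificate was issued


def summarize(certs):
    """Return, per problem, the count, first and last issuance date, and cert list."""
    by_problem = defaultdict(list)
    for cert in certs:
        by_problem[cert.problem].append(cert)

    summary = {}
    for problem, group in by_problem.items():
        dates = [c.issued for c in group]
        summary[problem] = {
            "count": len(group),
            "first_issued": min(dates),
            "last_issued": max(dates),
            # One sorted list per distinct problem, as the report asks for.
            "fingerprints": sorted(c.fingerprint for c in group),
        }
    return summary
```

Keeping one fingerprint list per problem label mirrors the "one list per distinct problem" recommendation, so the same structure can feed both the summary and the complete list.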
= Examples of Good Practice =
 
Here are some examples of good practice, where a CA did most or all of the things recommended above.
== Let's Encrypt Unicode Normalization Compliance Incident ==
* [https://groups.google.com/d/msg/mozilla.dev.security.policy/nMxaxhYb_iY/AmjCI3_ZBwAJY Final Report from CA], 2017-08-11 03:00 UTC
In this case, the CA managed to diagnose the problem, remediate it, and deploy the fix to production within 24 hours.
== PKIOverheid Short Serial Number Incident ==