Detailed Documentation Checklist
What should your project's documentation include?
All teams must document the behavior of their services when things start going wrong.
At minimum, that should include:
- Failover
  - For each point of failure, describe the process to restore service
- Environments and their usage (qa/staging/prod)
  - Where each one runs: in the cloud, on a 3rd-party service, or on a laptop
- Service dependencies
  - For example: graphql-api, Discord for logins, etc.
- At least basic runbooks for troubleshooting
- How to find out who's on-call
- How to notify on-call / page them
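
The minimum list above lends itself to a mechanical check. The following is a minimal sketch in Python, assuming each service's documentation is a plain-text file whose section titles appear somewhere on their own lines. The titles in `REQUIRED_SECTIONS` and the function name `missing_sections` are hypothetical and should be adapted to the headings your team actually uses.

```python
from typing import List

# Hypothetical section titles derived from the minimum list above;
# adjust them to match your team's own headings.
REQUIRED_SECTIONS = [
    "Failover",
    "Environments",
    "Service dependencies",
    "Runbooks",
    "On-call",
]


def missing_sections(doc_text: str, required: List[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return the required titles that appear on no line of the document."""
    # Compare case-insensitively against every non-empty line.
    lines = [line.strip().lower() for line in doc_text.splitlines() if line.strip()]
    missing = []
    for title in required:
        needle = title.lower()
        if not any(needle in line for line in lines):
            missing.append(title)
    return missing


if __name__ == "__main__":
    sample = (
        "Failover\n"
        "Restore the database from the latest snapshot.\n"
        "Environments\n"
        "qa, staging, prod\n"
    )
    # Lists the sections this sample document still needs.
    print(missing_sections(sample))
```

A check like this only confirms that a section exists, not that its content is useful, so it complements a human review rather than replacing one.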
Beyond that, here are some topics to consider:
- A detailed description of the service. Include enough detail to make the potential points of failure clear.
- Where the code is
- What subparts there are (and where to find them)
- How a deploy works
  - Including the CI/CD pipeline
  - Including any autoscaling behavior
- Where monitoring/metrics are
- Where the logs are and how to search them