2022-09-28
- Backstage issues
- CEIP-2216 Runway tracker updates
2022-09-27
2022-09-26
- Set up tools for `cortex-documentation` per https://github.com/elsevier-centraltechnology/cortex-documentation/blob/main/docs/annex/adding-documentation.md
- Vale: https://vale.sh/docs/vale-cli/installation/
- MarkdownLint VS Code extension and CLI
- dev duty: got PRs down from 17 to 7
- Review JIRAs
- CEIP-1877: Cortex NPS split out as suggested in comments
- CEIP-2271: Kong developer portals to prod complete
- Managed Node Groups for Cortex Alpha and Beta clusters
- Note: if there is no pod disruption budget, will need to be on hand to manually…
2022-09-25
- go to link for incident summary
- reading cortex docs with a view to restructuring / contributing CEIP-2330
- Write blog: Protecting Calico
- chase APIOps pipeline for last couple of workspaces with failures
- deploy, with help from Tim, Kong prod. PatientPass incident results.
2022-09-22
- Kong nonprod live
- Kong cleanup
- consistent use of var.namespace
- consistent use of t4g.micro RDS instance type
- destroy nonprod 0909 environment inc. last undated RDS
- destroy infra kong_rds_20220906
- destroy sandbox kong_rds_20220912
- apply split DNS in sandbox and infra
2022-09-21
- Kong APIOps pipelines ran with minimal issues overnight. Cleared for takeoff!
- hit workspaces created with CamelCase issue.
- encountered further issue with API reporting success deleting workspace sub-entities but not actually doing so (confirmed at db level)
- recreated new nonprod 20220921
2022-09-20
- Reminder of where jcasc jobs are recorded (infra example)
- corp card application
- nonprod Kong developer portal
2022-09-19 - Public holiday for QE2 funeral
2022-09-16
- Complete sandbox 20220915 release
- Initiate nonprod ‘green’
2022-09-15
- Sierra meeting: updated Magento assessment
- Kong mtg
- John: Swagger (OAS) specs sometimes contain security info
- Jack says trying to auto-merge some of this in pipeline. Derived from spec.
- John says should be under control of products
- Raffa says security from Kong to service should be purged. Agreed I think.
- Observability
- need unified solution across BUs and Terry will get Kong in to say what can be done out of the box, what is custom and what has never been done
- API Gateway Observability
- APIM Observability
- Review guild proposal
2022-09-14
- Incident response
- Key artifacts:
- capability blueprint high level, drill down; and
- monitoring blueprint
- capability blueprint has not been validated at all
- monitoring blueprint is still work in progress
- Thomas highlights need for lots of training
- Tim highlights that if system down partner will escalate and OCC unlikely to say no
- Irfan highlights there are other escalation routes too.
- Felipe indicates only escalate from OCC for revenue or reputation impact
- no tools to know whether what partners do is appropriate (number of nodes, suitability of probes)
- data points: few actual escalations in #cortex-help, maybe 1 / month
- Minimal monitoring blueprint (from Irfan)
- per target dashboard - high-level, yes-nos
- connection with OCC
- TechDeck form to report
2022-09-13
- rebuilding Kong sandbox
2022-09-12
- rebuilding Kong infra (again with new db)
2022-09-09
- rebuilding Kong infra
2022-09-08
LESSON:
- Anything automated must be doable without automation
- It is no help to have rafts of Jenkins or Terraform code that script things to happen quickly if they cannot be run when Jenkins is unavailable or Terraform has a broken state.
- Automation should simply compose a clear set of self-contained steps to accelerate the work a DevOps engineer would otherwise understand and perform manually.
- Reliable and repeatable steps are far more useful than automation.
fixing the `kong_admin` login:
- switch from basic-auth to openid-connect
- configure KONG_ADMIN_GUI_SESSION_CONF and KONG_ADMIN_GUI_AUTH_CONF
- https://docs.konghq.com/gateway/latest/configure/auth/kong-manager/basic/
- https://docs.konghq.com/gateway/latest/configure/auth/kong-manager/super-admin/
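A minimal sketch of how the basic-auth to openid-connect switch might look in Terraform. It assumes the control plane is installed from the public Kong Helm chart through a `helm_release` with provider 2.x `set` blocks, and that the chart's `enterprise.rbac.*` values are used to populate KONG_ADMIN_GUI_SESSION_CONF and KONG_ADMIN_GUI_AUTH_CONF from Kubernetes Secrets. The release name and both Secret names are hypothetical, and the real module here may wire this differently.

```hcl
# Sketch only: assumes the public Kong Helm chart and Terraform helm provider 2.x.
resource "helm_release" "kong" {
  name       = "kong" # hypothetical release name
  repository = "https://charts.konghq.com"
  chart      = "kong"

  set {
    name  = "enterprise.enabled"
    value = "true"
  }

  set {
    name  = "enterprise.rbac.enabled"
    value = "true"
  }

  # Replaces the previous basic-auth setting (KONG_ADMIN_GUI_AUTH).
  set {
    name  = "enterprise.rbac.admin_gui_auth"
    value = "openid-connect"
  }

  # Secret expected to hold an admin_gui_session_conf key
  # (feeds KONG_ADMIN_GUI_SESSION_CONF).
  set {
    name  = "enterprise.rbac.session_conf_secret"
    value = "kong-session-config" # hypothetical Secret name
  }

  # Secret expected to hold an admin_gui_auth_conf key
  # (feeds KONG_ADMIN_GUI_AUTH_CONF).
  set {
    name  = "enterprise.rbac.admin_gui_auth_conf_secret"
    value = "kong-admin-gui-auth-conf" # hypothetical Secret name
  }
}
```

Keeping the two JSON documents in Secrets rather than in `set` values avoids putting the OIDC client secret in Terraform plan output.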
random unreliability:
Error: Provider produced inconsistent final plan
When expanding the plan for module.kong_cp_20220908b[0].module.control_plane.helm_release.kong to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/helm" produced an invalid new value for .set: planned set element cty.ObjectVal(map[string]cty.Value{"name":cty.StringVal("waitImage.enabled"), "type":cty.NullVal(cty.String), "value":cty.StringVal("false")}) does not correlate with any element in actual.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
- Unclear whether this means the work is done or not. Reapplying indicates nothing to do.
2022-09-04
2022-09-01
- CEIP-2271 - green RDS & CP
LOATHE: infra: 0 code changes and terraform plan reports 6815 lines, summarised as `20 to add, 9 to change, 3 to destroy`.
- going blue to blue we are nonetheless seeing
│ infra ctrl-infra-green-61-kong-pre-upgrade-migrations-4wpv6 ● 0/1 0 Completed 0 0 n/a
- Should have been on the unmerged branch
LOATHE: it isn’t GitOps if I cannot rely on `main` being the currently deployed configuration
LOATHE: to conditionalise child modules terraform provides `count`, so transforming every subsequent scalar value into a list
blue or green, or blue and/or green:
- no variable to control the blue or the green; `active_services` hardcoded to just one (currently blue) and `environment_services` potentially supports both
- point of control is in the environment (for example: infra) when could have been in pseudo but let’s not worry about that now
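To illustrate the `count` complaint, here is a minimal sketch of a conditional child module. The variable `enable_green`, the module path, and the `admin_url` output are all hypothetical and not taken from the real repository; the point is only that once `count` is set, every reference to the module has to go through an index or a splat.

```hcl
# Hypothetical toggle; the notes above say no such variable exists yet.
variable "enable_green" {
  type    = bool
  default = false
}

module "kong_cp_green" {
  source = "./modules/control_plane" # hypothetical path
  count  = var.enable_green ? 1 : 0
}

# With count set, module.kong_cp_green is a list of zero or one objects,
# so what used to be a scalar output must be indexed or collapsed.
output "green_admin_url" {
  # one() returns the single element, or null when count = 0.
  value = one(module.kong_cp_green[*].admin_url)
}
```

For the "blue and/or green" case, `for_each = toset(["blue", "green"])` on a single module block may read better than two `count` toggles, since instances are then addressed by key (`module.kong_cp["blue"]`) rather than by `[0]`.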
2022-09-01
- 1:1
- get Rob Williamson involved w GTBOA
- early Dec: re:Invent (reward for OKRs)
- follow Thomas’ instructions to build environment
- review FG PR for colors
- set aside writing time
- do Spring Native investigation