2023-06-30

  • dev duty: investigated drifts that turned out to be caused by a broken bootstrap
  • further sbom drift cleanup
  • began investigation of calico beta deployment only to find 3.26.1 chart is missing

2023-06-29

  • delete eu-west-1 again - success, need to confirm CIDR block released (manual)
  • cortex engineering form - delete a cluster
    • (A) need to update procedure
    • platform defs has dependency graph, largely for helm ordering
    • deletion simply reverses this
    • because Terraform outputs pass data to downstream reconcilers, must do a platform update before the delete (that’s why the cluster has to be in a ‘good’ state before starting)
    • same risks of being blocked by PDB eviction as when updating
    • does not actively remove Helm, just trusts that it goes when the nodes go. Skipper cloud foundation / load balancer is the exception to this.
    • ephemeral clusters will get a new CIDR range; cannot reuse the old one because of CIDR allocation but also because Cortex assumes it creates the VPC
    • argo will simply complain unable to connect to cluster (will have to enhance)
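
A minimal sketch of the "deletion simply reverses this" point, assuming the dependency graph can be read as a mapping from each component to the components it depends on. The component names and the `install_order` / `delete_order` helpers are hypothetical and are not taken from the real platform defs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the components in its set.
# Names are illustrative only, not the real platform definitions.
DEPENDS_ON = {
    "vpc": set(),
    "cluster": {"vpc"},
    "calico": {"cluster"},
    "load-balancer": {"cluster", "calico"},
}


def install_order(graph):
    """Return components so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


def delete_order(graph):
    """Deletion reverses the install order, so dependents are removed first."""
    return list(reversed(install_order(graph)))


if __name__ == "__main__":
    print("install:", install_order(DEPENDS_ON))
    print("delete: ", delete_order(DEPENDS_ON))
```

In this sketch the load balancer is deleted first and the VPC last, which mirrors the note that the graph exists largely for ordering and that deletion walks it backwards.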

2023-06-28

  • cluster role & role binding
  • resolve reconciler failure on calico-apiserver in ap-southeast-1-test

2023-06-27

  • send alpha drift
  • cluster role & role binding

2023-06-26

  • delete eu-west-1

  • drift to slack inc. tests

  • NR training, getting ready for AIOps

  • resync alpha platform.yaml to apiserver enabled: true (although all alphas currently have apiserver set to False)

  • attempt beta sync of calico to 3.21.1 and apiserver enabled

  • Q3 roadmap

2023-06-21

  • drift detection fixes (goal of running each morning and writing to slack)
  • write up roadmap item for post-Skipper in light of go-oidc response
  • purl type for helm
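
A rough sketch of the drift detection goal above (compare what is declared with what is deployed, then build a message for Slack). Everything here is an assumption for illustration: the `detect_drift`, `slack_message` and `helm_purl` helpers, the `pkg:helm/...` purl shape, and the sample cluster and component data. It only builds the message text and does not post to Slack.

```python
def helm_purl(name, version):
    """Build a purl-style identifier for a Helm chart.

    The "helm" purl type used here is an assumption made for this sketch.
    """
    return f"pkg:helm/{name}@{version}"


def detect_drift(declared, deployed):
    """Compare declared and deployed chart versions; return readable drift lines."""
    lines = []
    for name in sorted(set(declared) | set(deployed)):
        want = declared.get(name)
        have = deployed.get(name)
        if want == have:
            continue  # in sync, nothing to report
        if have is None:
            lines.append(f"missing: {helm_purl(name, want)}")
        elif want is None:
            lines.append(f"unexpected: {helm_purl(name, have)}")
        else:
            lines.append(f"drift: {helm_purl(name, want)} declared, {have} deployed")
    return lines


def slack_message(cluster, lines):
    """Format the drift lines as a plain-text message for one cluster."""
    if not lines:
        return f"{cluster}: no drift detected"
    header = f"{cluster}: {len(lines)} drifted component(s)"
    return "\n".join([header] + lines)


if __name__ == "__main__":
    # Hypothetical sample data for one cluster.
    declared = {"calico": "3.26.1", "skipper": "1.0.0"}
    deployed = {"calico": "3.21.1"}
    print(slack_message("example-cluster", detect_drift(declared, deployed)))
```

Keeping the comparison separate from the message formatting makes each piece easy to unit test before wiring it to a scheduled morning run.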

2023-06-20

  • calico
    • upgrade to 3.26.1
    • start writing unit tests for crtxctl to catch issues with sbom / drift

2023-06-19

  • calico
    • changed understanding of replace behaviour: only create-delete if --force
    • validate instructions on calico site with minor tweaks

2023-06-14

2023-06-12

  • AWS training survey
    • Consider reading / awareness of AWS well architected programme for dev plan
  • Calico rollout
    • core-engineering test cluster failure
  • Delete cortex-build-team-eu-west-1-alpha
  • issues using: core-kube-cycle-cli
  • planning
    • propose PR to go-oidc
    • numbers and causes for failed releases in nonprod
    • green light on crtxctl
    • integration tests following Thomas pytest

2023-06-05

2023-06-02 - migrate runbooks to ops site

2023-06-01 - office day for release process

  • MC gave a pretty good summary of current concerns and limitations from last call

  • queue: not getting benefits but getting complexity

  • alternatives

    • rework reconciler as GHA
      • remove async (queue) complexity
      • increase visibility with std GHA definition and reporting
  • discussion

    • concern over reliance on GHA: not an issue if use GHA merely as glue between binaries

RPC - Rights and permissions controller

  • Nigel inherited from Phil Hibberd (along with CWS and QAS)
  • old style EC2
  • considerable tech debt
  • kill multiple birds with one stone by merging into CWS platform
  • Dave Cockram: solution architect: interested in moving PPE (and later PPM) to Cortex
  • Abirami Manaharan (Chennai): lead dev on RPC
  • Rob

RPC: Tomcat (Java 8, MVC, no Boot), Mongo -> Postgres, Elastic Search -> Postgres
Small: Single repo for UI and backend
timescales: Q3 & Q4

  • Multi-tenancy

    • already have network isolation by
    • also resource
    • ongoing effort to resolve reconciliation failures
  • ACTIONS

    • TPR1 Nigel
    • runway, slack channel, assessment (Tim)