Left over from 2024

2025-01-31

  • fixing more fluent captest issues
  • release
  • review of Inspector UI API
    • `find . -name "*.py" -exec wc -l {} +` = 10605
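A rough Python equivalent of the line count above, as a sketch only. It assumes the aim matches `wc -l` (count newline characters in every `*.py` file under a directory); the name `count_python_lines` is made up for illustration.

```python
from pathlib import Path


def count_python_lines(root: str) -> int:
    """Sum the newline counts of all *.py files below root, like wc -l."""
    total = 0
    for path in Path(root).rglob("*.py"):
        if path.is_file():
            # wc -l counts newline characters, so count bytes rather than
            # decoding; this also avoids encoding errors on odd files.
            total += path.read_bytes().count(b"\n")
    return total
```

Unlike `find ... -exec wc -l {} +`, which can print several partial totals when the file list is split across invocations, this returns a single number.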

2025-01-30

  • simple terraform implementation of captest release: doh!
  • fix fluent, vpa, kubecost tests
  • reviews, mostly for Khush
  • release, abortive due to token issues on GHA

2025-01-29

  • fix token issues locally
  • working on python regex impl of captest terraform release
    • stuck

2025-01-28: Vacation

2025-01-27

  • retro
    • raise GitHub alert thresholds
    • ask GH about lack of events from automated creation, deprecation of create-release action

2025-01-24

  • inspector release engineering
    • settled for part manual intervention
      1. person runs make a release workflow
      2. GHA bumps the versions, commits and drafts the release
      3. person reviews and publishes the release
      4. GHA builds and deploys all the images, terraform etc
    • fix terraform permissions failure (GHA role cannot read state bucket)
      • cannot use GHA role that can do terraform because it can’t do other stuff
      • tried adding permissions - the usual nightmare
      • :lightbulb: have terraform build code take role as parameter?
        • could later refactor as poetry terraform plugin
      • in the end just call existing workflow and call it good.

2025-01-23

  • CICD incident-update-docker-desktop-for-mac
  • cortex engineering forum
    • tier 1 release process
    • karpenter release process
  • inspector release engineering
    • review some PRs from Khush
    • tested different workflows, different action implementations and different triggers: found no permutation that creates the release such that the release event triggers the deployment workflow. Likewise, automated tags do not trigger tag events. Creating releases and tags manually triggers the downstream workflow just fine.

2025-01-22

  • 121
    • poetry plugins for deployment
      • lambda
      • terraform
    • support sdlc, standardisation
    • env cost alongside finops
      • hits cost and modernisation messages from deAnna
  • Inspector
    • release is publishing dev images from prod and prod images from dev (always lagging a version), because the git ref used is the one from the start of the process, not the commit made within the workflow.

2025-01-21

  • inspector workflow
    • terraform apply on release
  • inspector ui
    • CPU saturated, as shown by NR:
      SELECT max(`k8s.container.cpuLimitCores`), max(`k8s.container.cpuRequestedCores`), max(`k8s.container.cpuUsedCores`) FROM Metric FACET `k8s.namespaceName` WHERE `k8s.namespaceName` = 'inspector-ui' SINCE 30 minutes ago TIMESERIES UNTIL now
      

2025-01-20

  • continue simulating timeout
    • different approach, client side, but it turns out a server change is still needed because otherwise the server responds to incomplete posts
  • reviews for KA PRs
  • planning
  • Inspector release and workflow updates (again)
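Sketch of the client-side timeout simulation mentioned above. Assumption: "incomplete post" means declaring a larger `Content-Length` than the body actually sent, then holding the connection open. The names `build_incomplete_post` and `send_and_wait` are hypothetical, and the notes do not say how the real test was written.

```python
import socket


def build_incomplete_post(host: str, path: str, body: bytes, declared_length: int) -> bytes:
    """Build a raw HTTP/1.1 POST whose Content-Length exceeds the bytes sent."""
    if declared_length <= len(body):
        raise ValueError("declared_length must exceed len(body)")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {declared_length}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def send_and_wait(host: str, port: int, request: bytes, wait_seconds: float) -> bytes:
    """Send the partial request and return whatever comes back.

    Returns b"" if the server stayed silent for wait_seconds or closed
    the connection without replying.
    """
    with socket.create_connection((host, port), timeout=wait_seconds) as sock:
        sock.sendall(request)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return b""
```

A non-empty reply from `send_and_wait` would match the behaviour noted above: the server answers the incomplete post instead of waiting, so the timeout still has to be forced on the server side.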

2025-01-17

2025-01-16

  • prep and discuss CPS platform request
    • left it with Irfan / James to decide thru the TPR process

2025-01-15

  • inspector
    • reviewed docs (again)
  • captest
    • skipper failure

2025-01-14

  • inspector
    • check out captest results
      • nonprod
        • passing: 1490
        • failing: 300
        • skip on fail: 76
    • release
    • review w Khush then Claire, Irfan
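Quick arithmetic on the nonprod captest counts above, as a sketch. It assumes the three buckets are disjoint and together make up the whole run, which the notes do not state; `summarise` is an illustrative name.

```python
def summarise(results: dict) -> tuple:
    """Return (total, {bucket: percentage share rounded to 2 places})."""
    total = sum(results.values())
    shares = {name: round(100 * count / total, 2) for name, count in results.items()}
    return total, shares


# Counts copied from the nonprod results above.
nonprod = {"passing": 1490, "failing": 300, "skip on fail": 76}
total, shares = summarise(nonprod)
```

Under that assumption the run has 1866 tests, of which about 79.85% pass.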

2025-01-13

2025-01-10

  • 121
    • objectives - team level
      • EKR updates and other BAU
      • look at deAnna’s deck for more strategic stuff
        • eg cortex lite (heroku like experience)
          • repo + manifests = working app
            • migration rather than revolution compared to cortex
            • struggling to get chris onboard
        • serverless
          • propose to constrain to lambda because widely adopted
          • acct factory most complex today (uses step functions)
          • what can we offer to permit devs to focus on product code
          • propose to reach out to partner teams (but not likely to be before H2)
        • position for ‘cortex dev xp’ not serverless ‘cos org not ready
          • dev xp is on roadmap
      • want to pivot KPIs away from number of clusters
        • doesn’t do justice to fact that may be 80% of dbs workload in 4 clusters
        • helios
          • tagging to move away from lots of accounts
          • expect to lead to revamped tagging standard
        • would permit reporting on namespaces or helios product / subproduct
      • Schuler
        • done some kt
        • getting ready to engage teams on enablement to build pipeline
        • will work with BU leads to create enablement teams
    • personal goals
      • selfish / interests
      • uptake of dev10, at least 80%
      • RE cares
      • security journey (should be ready by end of Jan)
      • respond to OfficeVibe (80%)
    • ops goals
      • SDLC maturity
      • project ’nesta’ (?): gen ai chatbot
      • software standards
        • templates
        • common GH workflows
      • convergence
        • cap test: happy as evidenced by CSI test creation
        • inspector cli with platform manager

2025-01-09

  • skipper ingress class
    • complete PR for new tests
    • schedule skipper 6 rollout
  • review poetry 2.0 PR

2025-01-08

  • skipper ingress class
    • refactor tests to eliminate duplication between external-dns and skipper
    • new test to see if can create two default ingressclass - yes we can!
    • Conclusion:
      1. we need the Skipper PR to allow creation of ingress in the isdp / sciencedirect scenario
      2. k8s does not enforce the expected behaviour that two default ingressclasses are prohibited, or that a classless ingress is prohibited. Interestingly, the aws-load-balancer-controller uses a webhook to enforce this, which is how we discovered the problem
      3. there is no aws ‘bug’ to report, it’s more that expected behaviour is not enforced. The line after the Caution says: “There are some ingress controllers, that work without the definition of a default IngressClass. For example, the Ingress-NGINX controller can be configured with a flag --watch-ingress-without-class. It is recommended though, to specify the default IngressClass”
      4. Therefore: we will create a kyverno policy to prohibit creation of ingressclass with default = true by partners
    • TIL: remove finalisers on ingress (when class = ‘alb’)
      kubectl patch ingress classless-ingress-test-cortex -n kube-system --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
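Sketch of the condition the planned policy would need to match, written in Python rather than as the Kyverno policy itself. It relies on the standard `ingressclass.kubernetes.io/is-default-class` annotation; the function names and the idea of scanning a list of manifest dicts are assumptions for illustration.

```python
DEFAULT_ANNOTATION = "ingressclass.kubernetes.io/is-default-class"


def is_default_ingress_class(manifest: dict) -> bool:
    """True if the manifest is an IngressClass marked as the cluster default."""
    if manifest.get("kind") != "IngressClass":
        return False
    metadata = manifest.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    # Annotation values are strings in Kubernetes; compare case-insensitively.
    return str(annotations.get(DEFAULT_ANNOTATION, "")).lower() == "true"


def default_ingress_classes(manifests: list) -> list:
    """Names of every IngressClass in manifests that claims to be the default."""
    return [
        (m.get("metadata") or {}).get("name", "<unnamed>")
        for m in manifests
        if is_default_ingress_class(m)
    ]
```

More than one name coming back from `default_ingress_classes` is the situation the new test showed k8s will accept.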
      

2025-01-07

2025-01-06

  • dataplatform rbac issue
    • discussion with Ashish
    • usefulness of cedar as general low resource policy engine
  • how to deploy skipper ingress class? (presumably system components)

2025-01-03

  • catch up on slack and email

  • implement external dns parent check discussed with Felipe

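    def find_my_parent(record, zones):
        """Return the zones matching the closest parent domain of record."""
        labels = record.split('.')
        # Drop leading labels one at a time; the final label alone is never tried.
        for inc in range(1, record.count('.')):
            candidate = '.'.join(labels[inc:])
            found = [z for z in zones if z == candidate]
            if found:
                return found
        raise IndexError("no parent")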
    
  Ref conversation with Mark: https://global-elsevier.slack.com/archives/C046V4MTB16/p1733234805643939
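A variant sketch of the parent check, not the version discussed with Felipe. It assumes zone names may carry a trailing dot or mixed case (as Route 53 style names often do) and that a single closest zone is wanted; `closest_parent_zone` is an illustrative name.

```python
from typing import Iterable, Optional


def closest_parent_zone(record: str, zones: Iterable[str]) -> Optional[str]:
    """Return the most specific zone that is a parent of record, or None."""

    def norm(name: str) -> str:
        # Ignore a trailing dot and case so "Example.com." equals "example.com".
        return name.rstrip(".").lower()

    by_name = {norm(zone): zone for zone in zones}
    labels = norm(record).split(".")
    # Start at 1 so the record is not its own parent; stop before the
    # final label so a bare TLD is never considered.
    for start in range(1, len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in by_name:
            return by_name[candidate]
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing parent is an error.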

  • `external-dns` capability tests still failing
    • narrowed it down to `isdp` and `sciencedirect`
    • turns out they define an ingressclass, which apparently breaks the [missing] skipper one referenced by test

2025-01-02: Vacation