Next:

alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"

  • report on calico global network policy use
  • report on priority class usage
  • report on hpa use
  • report expiry of artifactory token, renew and add to AWS secretsmanager
  • -CEIP-5593 Docs: Review cortex-documentation (backstage)

2024-04-29

  • planning
    • painful conversation about the partners with missing resource specs and others with bad ones.
      • could automate report for the missing ones
      • what about judging ‘good’ limits
      • felipe suggests total requests and compare to actual
    • ksi removal in ce prod in next 2 weeks so need to get scheduled
  • external-dns
  • ceip-5577
    • crtxctl as lambda
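
Rough sketch of the missing-specs report and Felipe's "total requests vs actual" idea. It assumes the input is the `items` list from `kubectl get pods -A -o json`; the function names are made up for illustration, and CPU parsing only handles the plain and `m` forms. Actual usage would have to come from metrics, which is not covered here.

```python
from typing import Iterator

REQUIRED_KEYS = ("requests", "limits")


def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "0.5" to millicores.

    Assumption: only the plain and "m" suffixed forms appear.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def missing_resource_specs(pods: list) -> Iterator[tuple]:
    """Yield (namespace, pod, container, missing_keys) for each container
    that has no requests and/or no limits."""
    for pod in pods:
        meta = pod.get("metadata", {})
        for container in pod.get("spec", {}).get("containers", []):
            resources = container.get("resources") or {}
            missing = [k for k in REQUIRED_KEYS if not resources.get(k)]
            if missing:
                yield (
                    meta.get("namespace", "default"),
                    meta.get("name", "?"),
                    container.get("name", "?"),
                    missing,
                )


def total_cpu_requests(pods: list) -> int:
    """Sum CPU requests (millicores) over all containers that declare one."""
    total = 0
    for pod in pods:
        for container in pod.get("spec", {}).get("containers", []):
            requests = (container.get("resources") or {}).get("requests") or {}
            cpu = requests.get("cpu")
            if cpu:
                total += parse_cpu_millicores(str(cpu))
    return total
```

Comparing `total_cpu_requests` per namespace against observed usage would give a first cut at judging 'good' limits, though the threshold is still a judgment call.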

2024-04-26

  • ceip-5577
    • crtxctl as lambda

2024-04-25

  • ceip-5577
    • crtxctl as lambda
      • fix tests
      • more lambda testing
      • TIL: cloudwatch log entries change periodically (15mins), on redeploy and on crash.
  • CCMS:
    • Felipe, Claire, Tim Sm, Tim St
    • James Keena (TIO),
    • Kent Haynes (software eng mgr, no EKS xp, EC2 and traditional),
    • Connor Skio
    • Jeffrey Aoyagi
    • quarter by quarter planning process

2024-04-24: AWS summit, London

2024-04-22

  • ceip-5577
    • crtxctl as lambda
    • needing to rework the kubeconfig differently for lambda and cli
  • checked out state of cluster_diff with / for AN
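
Sketch of one way to split kubeconfig handling between lambda and cli (not what crtxctl does today). Lambda sets `AWS_LAMBDA_FUNCTION_NAME` and only `/tmp` is writable there; the function name, the `/tmp/kubeconfig` layout and the single-path `KUBECONFIG` are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kubeconfig_path(cluster: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick where the kubeconfig for `cluster` should live.

    - In Lambda: under /tmp, the only writable location.
    - On the CLI: honour KUBECONFIG if set (assumed to be a single path,
      not a colon-separated list), else ~/.kube/clusters/<cluster>.
    """
    env = os.environ if env is None else env
    if "AWS_LAMBDA_FUNCTION_NAME" in env:
        return Path("/tmp") / "kubeconfig" / cluster
    if env.get("KUBECONFIG"):
        return Path(env["KUBECONFIG"])
    return Path.home() / ".kube" / "clusters" / cluster
```

Passing `env` explicitly keeps the function testable without touching the real environment.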

2024-04-22

  • ceip-5577
    • crtxctl as lambda

      • reworking auth to (hopefully) remove need of cortex-prod-admrole as well as solving the lambda issue
    • class diagram

    • swagger api?

2024-04-19

  • ceip-5577
    • crtxctl as lambda
      • initially not running as Inspector but could not assume.
      • got stuck on the IAM permissions cos lambda already has Inspector and trying to assume the same.
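
Sketch of a guard for the "already running as the role" case: compare the caller ARN (as returned in the `Arn` field of boto3's `sts.get_caller_identity()`) with the target role ARN and skip the assume when they match. The helper name is hypothetical and this is pure string handling, no AWS calls.

```python
def already_running_as(caller_arn: str, target_role_arn: str) -> bool:
    """Return True if caller_arn is an assumed-role session of target_role_arn.

    Caller format:  arn:aws:sts::<account>:assumed-role/<role>/<session>
    Target format:  arn:aws:iam::<account>:role/[<path>/]<role>
    """
    caller = caller_arn.split(":", 5)
    target = target_role_arn.split(":", 5)
    if len(caller) != 6 or len(target) != 6:
        raise ValueError("not an ARN")

    caller_kind, _, caller_rest = caller[5].partition("/")
    target_kind, _, target_rest = target[5].partition("/")
    if caller_kind != "assumed-role" or target_kind != "role":
        return False

    caller_role = caller_rest.split("/")[0]   # role name precedes the session name
    target_role = target_rest.split("/")[-1]  # role name follows any IAM path
    same_account = caller[4] == target[4]
    return same_account and caller_role == target_role
```

When this returns True the code could use its ambient credentials directly instead of calling AssumeRole.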

2024-04-18

  • ceip-5577
    • crtxctl as lambda
      • dumping the two-stage Dockerfile finally gets something running
      • no logging under runtime interface emulator, all doc examples seem to be 'live'

2024-04-17

  • ceip-5577
    • crtxctl as lambda
      • python lambdas require layers to be deployed as separate artifact - eff that
      • looked at kubeless, knative, Apache OpenWhisk; all seem to be a PITA
      • returned to lambda done as container
        • much time wasted on aws docs suggesting two-stage Dockerfile
        • concluded fast api image on EKS may be the best bet after all
  • CEIP-4648 - cap test iam docs
  • CEIP-4827 mark created resources with owner reference

2024-04-16

2024-04-15

  • review Alerts prezo with AN
  • ceip-5577 (7h)
    • crtxctl bundle creation
    • publisher interface and console + s3 implementations
  • c3 (1h)
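
Sketch of the publisher interface shape with console and S3 implementations. Class and method names are illustrative, not the actual crtxctl code; the S3 client is injected and assumed to expose boto3's `put_object(Bucket=..., Key=..., Body=...)`.

```python
import sys
from typing import Any, Optional, Protocol, TextIO


class Publisher(Protocol):
    """Anything that can publish a named bundle and return where it went."""

    def publish(self, name: str, payload: bytes) -> str: ...


class ConsolePublisher:
    """Writes a one-line summary of the bundle to a text stream."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        self.stream = stream if stream is not None else sys.stdout

    def publish(self, name: str, payload: bytes) -> str:
        self.stream.write(f"{name}: {len(payload)} bytes\n")
        return f"console://{name}"


class S3Publisher:
    """Uploads the bundle via an injected S3 client (e.g. boto3.client("s3"))."""

    def __init__(self, client: Any, bucket: str, prefix: str = "") -> None:
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def publish(self, name: str, payload: bytes) -> str:
        key = f"{self.prefix}{name}"
        self.client.put_object(Bucket=self.bucket, Key=key, Body=payload)
        return f"s3://{self.bucket}/{key}"
```

Injecting the client means the S3 path can be exercised in tests with a fake, with no AWS credentials needed.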

2024-04-12

  • Osmosis incident
  • debug inspector
  • ceip-5573: inspector api test
  • CEIP-5577: poc crtxctl bundle creation (3h)

2024-04-11

  • debug Inspector

    • cannot build image when on Zscaler because of the Go parts; cannot build off Zscaler because of the crtxctl parts
    • invoke
      curl -H "Content-Type: application/json" -X POST https://inspector.cortex-non-prod.elsevier.systems/inspect -d '
      {
        "targetCluster": {
          "platformClass": "alpha",
          "product": "core-engineering",
          "clusterName": "test-cluster"
        }
      }'
      {"eventId":"1b699f95-3a08-464c-ab3e-889acda3defb"}
      
    • look for in S3
    • This is the error:
      {
          "level": "error",
          "message": "failed to set target: operation error EKS: DescribeCluster, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: ee28cc4b-685f-4390-b6cd-a000b0d29e7c, api error AccessDenied: User: arn:aws:sts::781632261136:assumed-role/Cortex-Inspector/cortex-inspector-nonprod is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent",
          "timestamp": "2024-04-11T13:43:00Z",
          "clusterId": {
              "platformClass": "alpha",
              "product": "core-engineering",
              "clusterName": "test-cluster"
          }
      }
      
    • conclusion:
      • proceed with principle of prod platform managing non prod partner clusters
      • deploy prod inspector
  • long and tortuous conversation with Juan Angel about csi secrets
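
Sketch for pulling the principal, action and resource out of an AccessDenied message like the Inspector error above, so the failing triple is easy to see when scanning the S3 results. The helper name is made up and the regex assumes the "User: ... is not authorized to perform: ... on resource: ..." wording seen in that log.

```python
import re
from typing import Optional

ACCESS_DENIED_RE = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(message: str) -> Optional[dict]:
    """Return {"principal", "action", "resource"} from an AccessDenied
    message, or None if the message does not match the expected wording."""
    match = ACCESS_DENIED_RE.search(message)
    return match.groupdict() if match else None
```

For the error above this yields the Cortex-Inspector session as principal, `sts:AssumeRole` as action and the test-cluster agent role as resource.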

2024-04-09

  • CEIP-5527: metrics: complete and merge
  • CEIP-5504: Add ArgoCD post-sync hook for external-dns
    • revisit the bootstrap, no relevant drifts but fixed a bunch anyway
    • revisit debugging make run for capability tests
      • realise Makefile syntax is working against us
      • consider bash or python
        • after diversion on python repl not convinced that helps
        • consider more tomorrow.

2024-04-08

  • CEIP-5527: metrics
    • manually performed
    • automation investigation
  • CEIP-4469
    • figure out why init_db cannot connect to new kong_labs db. Wrong password?
    • need to create brand new color?
  • 1-2-1 w IP: lots of notes made as he talked solidly for an hour!

2024-04-05

  • CEIP-4469:
    • ctd scripting build, hitting Invalid RBAC
    • debug database:
      kubectl run -n labs -it --rm pgclient --image=jbergknoff/postgresql-client --restart=Never postgresql://kong_labs@kong-labs-20240313-1238.cclgtepwgx5u.eu-west-1.rds.amazonaws.com:5432/kong_labs
      

2024-04-04

  • CEIP-5536: docs write up
  • CEIP-4469: return to this after FG did some route group work to replace static listener nginx hack
    • paired and solved credentials issue
    • document steps, now reading from secretmanager
    • begin moving to robot for easier repeatability as steps have grown to > 30 lines

2024-04-03

  • CEIP-5408: Revised (moved to diff page) and completed
  • CEIP-5536: Create, debug with team and ultimately solve CORS issue
    • TIL: debugging Skipper
      • routes:
        • wget -O - http://localhost:9911/routes
        • eskip
        • Ref

2024-04-02: Half day vacation

2024-04-01: Easter Monday