Next:

alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"

  • report on calico global network policy use

  • report on priority class usage

  • report on hpa use

  • report expiry of artifactory token, renew and add to AWS secretsmanager

  • CEIP-5593 Docs: Review cortex-documentation (backstage)

  • review README for captests and update Luis

  • pre-delete checks: security group, lb, eni

  • kyverno

    • give me all kyverno issues, then exclude the known, report new ones
    • cheaper? check if volume of errors is growing week over week
  • cronjob

  • alt-ui on backstage

  • sonarlint?

  • release scripting?

  • alt-ui?
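
A minimal sketch for the kyverno item above (list all issues, exclude the known ones, report the new ones, and the cheaper week-over-week check). It assumes issues have already been exported as one identifier per line; the function names `new_issues` and `is_growing` and the file layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the kyverno report idea; names and file format are assumptions.

# Print every line of the current issue list that is not in the known list.
# Both files are assumed to hold one issue identifier per line.
new_issues() {
    known_file=$1
    current_file=$2
    # -v: invert match, -x: whole-line match, -F: fixed strings, -f: patterns from file.
    # grep exits 1 when nothing is selected, which is not an error here.
    grep -vxFf "$known_file" "$current_file" || true
}

# Cheaper check: succeed only if this week's issue count exceeds last week's.
is_growing() {
    last_week_count=$1
    this_week_count=$2
    [ "$this_week_count" -gt "$last_week_count" ]
}
```

The current list could probably be built from Kyverno policy reports (for example via `kubectl get policyreport -A`), but where the issues actually come from is an assumption, not something these notes settle.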

2024-07-31

  • respond to slack about newrelic logging
  • review Khush’s latest PR
  • discuss ops tool issue with Thomas
  • look at formatters and linters to standardise code conventions and avoid whitespace changes in PRs
    • 20:39

2024-07-30

  • sorting house issues.

2024-07-29

  • morning on house
  • retro
  • chat with Khush about priority
  • Kong KSI migration
    • all dataplanes need 3x mtls certs
    • admin pwd can be k8s secret
    • newrelic (our key for other clusters)
      • needed by all

2024-07-26

  • Karpenter experience
    • Partners concerned about node instability: only stable when inactive. Real question should be response time (which will be fine with correct PDBs)
    • Tension between ops wanting highly utilised and partner wanting high response time
    • Can release Karpenter at any time, only when scale down MNGs does any impact manifest
    • Knowledge running solar without topology spread and anti-affinity, such that 3 replicas are on one node. Implication: if that node went down, manual intervention would be needed to recover
  • ‘Fast lane’ release
    • hotfix breaks out of wherever the fire is
    • feature flags must be implemented for delayed releases
    • the intermediate releases must go all the way and are uninterruptible
  • TODO: commit hook for tflint, terraform fmt, newline at end of file
  • TIL: blocking ticket not enough visibility: yank due date
  • TODO: can this be included in opsgenie-slack?
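
A rough sketch of the commit hook TODO above, assuming it lives at `.git/hooks/pre-commit` and that `terraform` and `tflint` are on the PATH. The function names `ends_with_newline` and `run_hook` are hypothetical, and running `tflint` once from the repository root is an assumption about the repo layout.

```shell
#!/bin/sh
# Hypothetical pre-commit hook sketch: terraform fmt, tflint, end-of-file newline.

# Succeed if the file's last byte is a newline (an empty file also passes).
# Command substitution strips a trailing newline, so an empty result means
# the final byte was a newline.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

run_hook() {
    status=0
    # Staged files that were added, copied or modified.
    files=$(git diff --cached --name-only --diff-filter=ACM)
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        # Binary files would need excluding here; not handled in this sketch.
        if ! ends_with_newline "$f"; then
            echo "missing newline at end of file: $f" >&2
            status=1
        fi
        case "$f" in
            *.tf|*.tfvars)
                # -check reports unformatted files without rewriting them.
                terraform fmt -check "$f" >/dev/null || {
                    echo "needs terraform fmt: $f" >&2
                    status=1
                }
                ;;
        esac
    done <<EOF
$files
EOF
    # Assumption: linting the current directory is enough for this repo.
    tflint || status=1
    return "$status"
}

# Only run the checks when this file is invoked as the hook itself.
if [ "${0##*/}" = "pre-commit" ]; then
    run_hook
fi
```

The guard at the end means the functions can be sourced elsewhere without triggering the checks.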

2024-07-25

  • (https://elsevier.atlassian.net/wiki/spaces/TIOCE/pages/119601452282728/Core+Engineering+Repositories+Review)
  • Electronic Warehouse
    • Vijay Ayyanar
      • download, scan and tag content from S3
      • in K8s cluster
      • in new AWS account
      • regular monthly terraform module updates
      • IP address exhaustion a frequent issue
      • have pod autoscaler
      • add new VPC to existing?
      • perf improvement thru Cortex? Karpenter and monitoring
      • pushing logs to both Elastic and OpenSearch
      • EKS clusters: msms-non-prod-eks-cluster & msms-prod-eks-cluster (1.28)
    • Claire
      • Have you discussed joining existing cluster
    • Tim P
      • ‘BigOps’ TIO is central to ELS
      • Cost monitoring is important
        • Can do cost monitoring at namespace level but Adam (FinOps) has reverted to account level

2024-07-24

  • More KSI migration
  • Start on request spec additions

2024-07-23

  • KSI migration

2024-07-22

  • KSI migration
  • planning

2024-07-18

  • recover Dev10 (not actually done when recorded previously)
    • (MIWG)

2024-07-17

  • CEIP-5785: Inspector Q2
    • documentation update
    • logging review
    • performance and tuning (remove unnecessary role assumes)
    • debugging release scripts

2024-07-16

2024-07-15

2024-07-12

2024-07-11

  • review inspector_ui with Khush
    • add count per tier and total count to /info endpoint
    • ensure all config injectable as env vars
  • complete simplification of check_request_specs
  • bug fix in /info serialisation
  • liaise with Juan on IPI resource requests

2024-07-10

2024-07-09

  • further discussion with Ashish on publishers and PlatformCommand
  • resolve some confused thinking around error handling on PR111
  • Argo app-of-apps demo
  • Inspector Q2 review

2024-07-08

  • co-editing session with Ashish on reading api resource groups and versions
  • planning
  • debugging with Thomas too.

2024-07-04

  • couple of inspector Jiras tidied up
  • inspector_ui reviews
  • discussion about inspector api / queue or not

2024-07-03

Rights and Permissions Collector (RPC)

  • database of images we don’t have permission to use.

  • workflow (termed project) starts when an image without permission to use is identified

  • internal users plus suppliers outside ELS

  • no workflow (Camunda) - though seems like ideal candidate

  • Java 17, [one of?] the base images has a vulnerability that needs to be patched.

  • OPWWAT file [vulnerability] testing outstanding

  • react migration is phased, pending while Cortex work happens

  • deployment workflow

  • inspector /platforms endpoint; abandon the all clusters endpoint

  • review all inspector tickets

2024-07-02

2024-07-01