2025-03-31

  • tier1 rollout
  • investigations job

2025-03-28: vacation

2025-03-27

  • prod release
  • 1-2-1
  • discuss db approach
  • discuss captest PR 389 with Ashish (esp. kyverno)
  • pydantic validators
  • ELSAPI
    • Genti Topija; Michelle Barboza; Saravanan Selvam; Chris Langridge; Christina Costello-Starland; Thomas Viaud
    • 5 apps, being containerised
    • selected north-south first
      • WIP getting onto EKS, containerisation
        • puppet
      • centos 7 exception process
        • second exception runs only until June 30
      • Avik, Terry are architects
    • alpha = tio dev, beta = developers, prod = prod (same as APIM)
    • Left a BAD TASTE: basically told I should not have got involved.
  • OKRs ops team brainstorming

2025-03-26

  • releases
    • beta calico test et al.
      • went fine
    • prod
      • smoke test rollout

        Failed

        • dataplatform rdp-tooling
        • core-engineering prod
        • emealasp prod
        • hm-core-platform cluster-prod
        • identity id-prod-use

        Not run

        • identity,id-prod-euw
        • hm-core-platform,cluster-prod-ap
        • hm-core-platform,cluster-staging-ap

        Rerun

        • hm-core-platform,cluster-prod: success
        • identity,id-prod-euw: success
        • core-engineering prod: success
        • emealasp prod: success
        • hm-core-platform,cluster-prod-ap: success
        • dataplatform rdp-tooling
  • fix(ceip-7601, captest): ensure images are all pulled through artifactory
  • Document captest scope including outstanding work (marked TODO)
  • https://github.com/elsevier-centraltechnology/cortex-inspector/pull/389
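The "all images pulled through artifactory" rule from ceip-7601 could be checked along these lines. This is only a sketch, not the code in PR 389: the registry hostname `artifactory.example.com` and both function names are hypothetical, and the host detection follows the usual Docker convention that the first path component is a registry only if it contains a dot or colon, or is `localhost`.

```python
# Hypothetical registry host; the real Artifactory hostname is not given in these notes.
ALLOWED_REGISTRY = "artifactory.example.com"


def registry_of(image: str) -> str:
    """Return the registry host of an image reference, defaulting to docker.io."""
    first, sep, _rest = image.partition("/")
    # The first component is a registry host only if it looks like a hostname.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def images_bypassing_registry(images, allowed=ALLOWED_REGISTRY):
    """List the image references that are not pulled through the allowed registry."""
    return [image for image in images if registry_of(image) != allowed]


if __name__ == "__main__":
    sample = [
        "artifactory.example.com/docker/nginx:1.27",
        "nginx:1.27",
        "quay.io/calico/node:v3.29",
    ]
    for image in images_bypassing_registry(sample):
        print(f"not via artifactory: {image}")
```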

2025-03-25

  • random query from Pablo about terraform ownership
  • calico
    • manual report for Claire
    • perm issue locally running poetry run inspector -v DEBUG -p file -g */reaxys/ns check_calico_global_network_policies:

      kubernetes.client.exceptions.ApiException: (403)
      Reason: Forbidden
      HTTP response headers: HTTPHeaderDict({'Audit-Id': '9595a1e8-f0c3-4607-ab1b-f8c809ce5b48, 9595a1e8-f0c3-4607-ab1b-f8c809ce5b48', 'Cache-Control': 'no-cache, private, no-cache, private', 'Content-Length': '285', 'Content-Type': 'application/json', 'Date': 'Tue, 25 Mar 2025 11:29:59 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': 'caac1cb9-3b7e-4741-973c-ecadc0fcc4ea', 'X-Kubernetes-Pf-Prioritylevel-Uid': '38efd130-b46d-4a93-888b-9e719320ea93'})
      HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"globalnetworkpolicies.projectcalico.org is forbidden: Operation on Calico tiered policy is forbidden","reason":"Forbidden","details":{"group":"projectcalico.org","kind":"globalnetworkpolicies"},"code":403}
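To tell this kind of 403 apart from an ordinary RBAC denial, the Status body can be inspected directly. A minimal sketch using only the standard library: the function name `is_calico_tier_denial` is hypothetical, and the body string is the one from the error above. In the kubernetes Python client the same JSON should be available as the `body` attribute of the caught `ApiException`.

```python
import json


def is_calico_tier_denial(body: str) -> bool:
    """True if a Kubernetes Status body is a 403 on a Calico tiered policy."""
    status = json.loads(body)
    details = status.get("details") or {}
    return (
        status.get("code") == 403
        and status.get("reason") == "Forbidden"
        and details.get("group") == "projectcalico.org"
        and "tiered policy" in status.get("message", "")
    )


# Response body copied from the 403 in the log entry above.
BODY = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"globalnetworkpolicies.projectcalico.org is forbidden: '
    'Operation on Calico tiered policy is forbidden","reason":"Forbidden",'
    '"details":{"group":"projectcalico.org","kind":"globalnetworkpolicies"},'
    '"code":403}'
)

if __name__ == "__main__":
    if is_calico_tier_denial(BODY):
        print("403 on a Calico tiered policy: check access to the tier as well as the policy kind")
```

The hint about tier access in the printed message is an assumption based on the wording of the error, not something confirmed in these notes.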

2025-03-24

2025-03-21: Vacation

2025-03-20

  • testing and patching inspector 0.13.0
    • review tickets with Khush
    • couple of bug fixes
    • catch up capability tests
  • 1-2-1
    • more partner facing
    • OKRs do not contain much beyond keeping the lights on
  • engineering forum
    • tier 1 ordering rollback
  • 0.13.1 release

2025-03-19

  • inspector test and fix
    • long day fixing UI bugs and Grafana too

2025-03-18

  • Bring Tier 1 confusion into the open
  • Pick up smoke test rollout

2025-03-14

  • complete the generic check
    • test with new bundles on Monday
    • work with Ashish on RBAC check

2025-03-13

  • mostly working on beta, prod and tier1 releases
  • small amount of work on generic check

2025-03-12

  • walkthru Inspector for Paul
  • spec out ticket for check_registries in Inspector dashboard
  • impl /checks apropos of ceip-6414-ui-generic-check
  • fire fighting
    • ipi-admin-prod incident due to calico OOM
      • postmortem
    • ipi-dev-beta no longer reporting calico issues
      • bundle not created correctly, yet works locally
    • smoke test failures
      • deploying s3 patch
      • investigating resource contention as cause of kyverno and secrets-store failures

2025-03-11

  • investigate some hugo updates and discuss with team
  • new cluster for CSC

2025-03-10

  • Bug fix of S3 cap test that should only run on sandbox cluster
  • README inserts for feature list of captest and inspector based on inspector ui. All now idempotent
  • retro

2025-03-07

  • Calico new kind (Tiers) investigation for business services
  • Standup with Ashish and Thomas
    • fix AN’s GHA commit issue
    • run over the VPA cap test at length
  • Inspector UI API
    • Expose cronjobs through Swagger and do away with the custom UI
    • also DRY up a bunch of env vars
    • find . -name "*.py" -exec wc -l {} + = 7253 down from 10605 (32% reduction)
    • find inspector_ui -name "*.py" -exec wc -l {} + = 3956 down from 5157 (23% reduction excluding tests)

2025-03-06

  • Inspector UI API
    • cronjobs not running
      • proved to be a difference between the bundle API on the UI and on the lambda, so stripped the indirection through the UI and removed a bunch more redundant code.

2025-03-05

  • Inspector UI API change complete, inc. After this change we have now reduced the code base as follows:

    • find . -name "*.py" -exec wc -l {} + = 7316 down from 10605 (31% reduction)
    • find inspector_ui -name "*.py" -exec wc -l {} + = 3993 down from 5157 (23% reduction excluding tests)
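The percentages above can be reproduced with a few lines of Python. This is a sketch: `count_py_lines` is a hypothetical stand-in for the `find ... -exec wc -l {} +` totals (it counts newline bytes, as `wc -l` does), and the before/after figures are the ones quoted in this entry.

```python
from pathlib import Path


def count_py_lines(root: str) -> int:
    """Total newline count across all *.py files under root, like wc -l."""
    return sum(path.read_bytes().count(b"\n") for path in Path(root).rglob("*.py"))


def percent_reduction(before: int, after: int) -> float:
    """Size of the drop from before to after, as a percentage of before."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Figures quoted in the log entry above.
    print(f"whole repo:   {percent_reduction(10605, 7316):.0f}% reduction")
    print(f"inspector_ui: {percent_reduction(5157, 3993):.0f}% reduction")
```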

    At the same time performance improved so much I can hardly believe it: about 11x faster on the timings below (3.714s before, 0.334s after). The following fetches a report for a single alpha cluster. I ran each a couple of times to warm things up; both get faster, but the ratio barely changes.

    AFTER: real 0m0.334s

    time curl -X 'GET' \
      'https://inspector-ui.cortex-non-prod.elsevier.systems/reports?platform-class=alpha&product=webpresence&cluster=dev' \
      -H 'accept: application/json'
    

    BEFORE: real 0m3.714s

    time curl -X 'GET' \
      'https://inspector-ui.cortex.elsevier.systems/report/alpha/webpresence/dev' \
      -H 'accept: application/json'
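For the record, the ratio between the two `real` timings works out as follows. A small sketch: `parse_real` is a hypothetical helper that only handles the `XmY.YYYs` form printed by bash's `time`, and the two values are the ones quoted in this entry.

```python
import re


def parse_real(value: str) -> float:
    """Convert bash `time` output such as '0m3.714s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = parse_real("0m3.714s")  # old /report/alpha/webpresence/dev endpoint
after = parse_real("0m0.334s")   # new /reports?... endpoint

speedup = before / after                        # how many times faster
latency_cut = (before - after) / before * 100   # percent less wall-clock time

if __name__ == "__main__":
    print(f"{speedup:.1f}x faster, {latency_cut:.0f}% less wall-clock time")
```

Both figures describe the same single pair of measurements, so they are indicative rather than a benchmark.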