2025-03-31
- tier1 rollout
- investigations job
2025-03-28: vacation
2025-03-27
- prod release
- 1-2-1
- ** Software Engineering Lead (level 13): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189882.htmld
- TraceIQ: claims support fairer society. Who are customers?
- PHP stack, API first
- cross-functional agile team. Sys Arch and SRE external
- DBS Tech Optimisation (level 11): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189352.htmld
- If the above is not ‘DBS facilitation’ where is it?
- need OKRs written
- discuss db approach
- discuss captest PR 389 with Ashish (esp. kyverno)
- pydantic validators
- ELSAPI
- Genti Topija; Michelle Barboza; Saravanan Selvam; Chris Langridge; Christina Costello-Starland, Thomas Viaud
- 5 apps, being containerised
- selected north-south first
- WIP getting onto EKS, containerisation
- puppet
- centos 7 exception process
- second exception only until June 30
- Avik, Terry are architects
- alpha = tio dev, beta = developers, prod = prod (same as APIM)
- BAD TASTE: basically told I should not have got involved.
- OKRs ops team brainstorming
2025-03-26
- releases
- beta calico test et al.
- went fine
- prod
smoke test rollout Failed
- dataplatform rdp-tooling
- core-engineering prod
- emealasp prod
- hm-core-platform cluster-prod
- identity id-prod-use
Not run
- identity,id-prod-euw
- hm-core-platform,cluster-prod-ap
- hm-core-platform,cluster-staging-ap
Rerun
- hm-core-platform,cluster-prod: success
- identity,id-prod-euw: success
- core-engineering prod: success
- emealasp prod: success
- hm-core-platform,cluster-prod-ap: success
- dataplatform rdp-tooling
- fix(ceip-7601, captest): ensure images are all pulled through artifactory
- Document captest scope including outstanding work (marked TODO)
- https://github.com/elsevier-centraltechnology/cortex-inspector/pull/389
2025-03-25
- random query from Pablo about terraform ownership
- calico
- manual report for Claire
- perm issue locally running
poetry run inspector -v DEBUG -p file -g */reaxys/ns check_calico_global_network_policies:
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '9595a1e8-f0c3-4607-ab1b-f8c809ce5b48, 9595a1e8-f0c3-4607-ab1b-f8c809ce5b48', 'Cache-Control': 'no-cache, private, no-cache, private', 'Content-Length': '285', 'Content-Type': 'application/json', 'Date': 'Tue, 25 Mar 2025 11:29:59 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': 'caac1cb9-3b7e-4741-973c-ecadc0fcc4ea', 'X-Kubernetes-Pf-Prioritylevel-Uid': '38efd130-b46d-4a93-888b-9e719320ea93'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"globalnetworkpolicies.projectcalico.org is forbidden: Operation on Calico tiered policy is forbidden","reason":"Forbidden","details":{"group":"projectcalico.org","kind":"globalnetworkpolicies"},"code":403}
- inspector
- CEIP-7580: tier1
- CEIP-7602: ipi-admin-prod
- CEIP-7601: platform namespaces
- https://elsevier.atlassian.net/browse/CEIP-7542: smoke tests in beta
- check alpha is clean
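Re the 403 from check_calico_global_network_policies in this entry: a minimal sketch of how the check could recognise this specific denial and report a permissions problem instead of a stack trace. `is_calico_tier_denial` is a hypothetical helper, not something that exists in inspector; it only assumes the status code and JSON body shape logged above.

```python
import json


def is_calico_tier_denial(status: int, body: str) -> bool:
    """True when a 403 body says a Calico tiered-policy operation was refused."""
    if status != 403:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    if not isinstance(payload, dict):
        return False
    details = payload.get("details") or {}
    return (
        details.get("group") == "projectcalico.org"
        and "tiered policy" in payload.get("message", "")
    )


# Body as logged on 2025-03-25 (quotes normalised to plain ASCII).
body = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"globalnetworkpolicies.projectcalico.org is forbidden: '
    'Operation on Calico tiered policy is forbidden","reason":"Forbidden",'
    '"details":{"group":"projectcalico.org","kind":"globalnetworkpolicies"},'
    '"code":403}'
)

print(is_calico_tier_denial(403, body))
```

In the check itself this would presumably sit in an `except ApiException as exc:` branch and be fed `exc.status` and `exc.body`, both of which the kubernetes client exception carries.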
2025-03-24
- responded on metrics
- new job:
- Pure impl mgr (IC level 9): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$190073.htmld
- Risk Solutions consultant (level 11): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189973.htmld
- Risk Prod Mgr (level 12): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189923.htmld
- ** Software Engineering Lead (level 13): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189882.htmld
- Environmental and well being
- DBS Tech Optimisation (level 11): https://wd3.myworkday.com/relx/d/inst/1$9925/9925$189352.htmld
- Inspector
- review Khush feedback
- careful review+patch+test+release
2025-03-21: Vacation
2025-03-20
- testing and patching inspector 0.13.0
- review tickets with Khush
- couple of bug fixes
- catch up capability tests
- 1-2-1
- more partner facing
- OKRs do not contain much beyond keeping the lights on
- engineering forum
- tier 1 ordering rollback
- 0.13.1 release
2025-03-19
- inspector test and fix
- long day fixing UI bugs and Grafana too
2025-03-18
- Tier 1 confusion into the open
- Pick up smoke test rollout
2025-03-14
- complete the generic check
- test with new bundles on Monday
- work with Ashish on RBAC check
2025-03-13
- mostly working on beta, prod and tier1 releases
- small amount of work on generic check
2025-03-12
- walkthru Inspector for Paul
- spec out ticket for check_registries in Inspector dashboard
- impl /checks apropos of ceip-6414-ui-generic-check
- fire fighting
- ipi-admin-prod incident due to calico OOM
- postmortem
- ipi-dev-beta no longer reporting calico issues
- bundle not created correctly, yet works locally
- smoke test failures
- deploying s3 patch
- investigating resource contention as cause of kyverno and secrets-store failures
2025-03-11
- investigate some hugo updates and discuss with team
- new cluster for CSC
2025-03-10
- Bug fix of S3 cap test that should only run on sandbox cluster
- README inserts for feature list of captest and inspector based on inspector ui. All now idempotent
- retro
2025-03-07
- Calico new kind (Tiers) investigation for business services
- Standup with Ashish and Thomas
- fix AN’s GHA commit issue
- run over the VPA cap test at length
- Inspector UI API
- Exposed cronjobs thru swagger and did away with the custom UI
- also DRY up a bunch of env vars
find . -name "*.py" -exec wc -l {} + = 7253 down from 10605 (32% reduction)
find inspector_ui -name "*.py" -exec wc -l {} + = 3956 down from 5157 (23% reduction excluding tests)
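For cross-checking those percentages, a minimal Python sketch of the same count-and-compare. `count_py_lines` and `reduction_pct` are hypothetical helper names, not part of inspector; the line counts are the ones recorded in this entry.

```python
from pathlib import Path


def count_py_lines(root: str) -> int:
    """Total newline count across *.py files under root (what wc -l sums)."""
    return sum(
        path.read_bytes().count(b"\n")
        for path in Path(root).rglob("*.py")
        if path.is_file()
    )


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction going from before to after."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Figures from the 2025-03-07 entry.
    print(f"whole repo:   {reduction_pct(10605, 7253):.0f}% reduction")
    print(f"inspector_ui: {reduction_pct(5157, 3956):.0f}% reduction")
```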
2025-03-06
- Inspector UI API
- cronjobs not running
- proved to be a difference between the bundle API on the UI and the lambda, so stripped the indirection thru the UI and saved a bunch more redundant code.
2025-03-05
Inspector UI API change complete, inc. After this change we have now reduced the code base as follows:
find . -name "*.py" -exec wc -l {} + = 7316 down from 10605 (31% reduction)
find inspector_ui -name "*.py" -exec wc -l {} + = 3993 down from 5157 (23% reduction excluding tests)
At the same time performance improved so much I cannot quite believe it: very nearly 900%! (The timings recorded below work out to roughly 11x: 3.714s down to 0.334s.) The following fetches a report for a single alpha cluster. I ran each a couple of times to warm things up; both get faster, but it doesn't make a lot of difference to the ratio.
AFTER: real 0m0.334s
time curl -X 'GET' 'https://inspector-ui.cortex-non-prod.elsevier.systems/reports?platform-class=alpha&product=webpresence&cluster=dev' -H 'accept: application/json'
BEFORE: real 0m3.714s
time curl -X 'GET' 'https://inspector-ui.cortex.elsevier.systems/report/alpha/webpresence/dev' -H 'accept: application/json'
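To put a number on the ratio from those two timings, a small sketch that parses the `real` lines and computes the speed-up. `parse_real` is a hypothetical helper; the two inputs are the values logged above. Worth remembering that the two requests hit different hosts (non-prod vs prod), so this is not a strictly like-for-like comparison.

```python
import re


def parse_real(line: str) -> float:
    """Convert a bash `time` line such as 'real 0m3.714s' to seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", line)
    if match is None:
        raise ValueError(f"not a `time` real line: {line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = parse_real("BEFORE: real 0m3.714s")
after = parse_real("AFTER: real 0m0.334s")

speedup = before / after                        # how many times faster
latency_cut = (before - after) / before * 100   # percent of wall time removed

print(f"{speedup:.1f}x faster, {latency_cut:.0f}% less wall-clock time")
# prints: 11.1x faster, 91% less wall-clock time
```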