Left over from 2024
- inspector UI
- https://github.com/elsevier-centraltechnology/cortex-inspector/pull/296
- scan vs query in analysis and report manager
- pagination
- https://github.com/elsevier-centraltechnology/cortex-inspector/pull/296
2025-01-31
- fixing more fluent captest issues
- release
- review of Inspector UI API
- `find . -name "*.py" -exec wc -l {} +` = 10605
2025-01-30
- simple terraform implementation of captest release: doh!
- fix fluent, vpa, kubecost tests
- reviews, mostly for Khush
- release, abortive due to token issues on GHA
2025-01-29
- fix token issues locally
- working on python regex impl of captest terraform release
- stuck
2025-01-28: Vacation
2025-01-27
- retro
- raise GitHub alert thresholds
- ask GH about lack of events from automated creation, deprecation of create-release action
2025-01-24
- inspector release engineering
- settled for part manual intervention
- person runs the "make a release" workflow
- GHA bumps the versions, commits and drafts the release
- person reviews and publishes the release
- GHA builds and deploys all the images, terraform etc
- fix terraform permissions failure (GHA role cannot read state bucket)
- cannot use GHA role that can do terraform because it can’t do other stuff
- tried adding permissions - the usual nightmare
- :lightbulb: have terraform build code take role as parameter?
- could later refactor as poetry terraform plugin
- in the end just call existing workflow and call it good.
2025-01-23
- CICD incident-update-docker-desktop-for-mac
- cortex engineering forum
- tier 1 release process
- karpenter release process
- inspector release engineering
- review some PRs from Khush
- testing with different workflows, action implementations and triggers, I found no permutation that creates the release such that the release event triggers the deployment workflow. Likewise, automated tags do not trigger tag events. Creating releases and tags manually triggers the downstream workflow just fine.
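One possible explanation, assuming the automation authenticated with the default `GITHUB_TOKEN`: GitHub does not start new workflow runs for events created with that token, except `workflow_dispatch` and `repository_dispatch`. A minimal Python sketch of triggering the deployment workflow explicitly through the REST API's workflow dispatch endpoint follows. The owner, repo, workflow file name and token are placeholders, not values from these notes, and the target workflow would need an `on: workflow_dispatch` trigger.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, token, inputs=None):
    """Build (but do not send) a workflow_dispatch request.

    All arguments are placeholders supplied by the caller; `ref` is the
    branch or tag whose copy of the workflow file should run.
    """
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    payload = {"ref": ref}
    if inputs:
        payload["inputs"] = inputs
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dispatch(request):
    """Send the request; GitHub answers 204 No Content on success."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the API.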
2025-01-22
- 121
- poetry plugins for deployment
- lambda
- terraform
- support sdlc, standardisation
- env cost alongside finops
- hits cost and modernisation msgs from deAnna
- Inspector
- release is publishing dev images from prod and prod from dev (always lagging a version) because the git ref used is the one at the start of the process, not the commit made within the workflow.
2025-01-21
- inspector workflow
- terraform apply on release
- inspector ui
- CPU saturated as shown by NR;
SELECT max(`k8s.container.cpuLimitCores`), max(`k8s.container.cpuRequestedCores`), max(`k8s.container.cpuUsedCores`) FROM Metric FACET `k8s.namespaceName` WHERE `k8s.namespaceName` = 'inspector-ui' SINCE 30 minutes ago TIMESERIES UNTIL now
2025-01-20
- continue simulating timeout
- different approach, client side, but it turns out a server change is still needed because otherwise the server responds to incomplete posts
- reviews for KA PRs
- planning
- Inspector release and workflow updates (again)
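The client-side approach to simulating the timeout (start a POST, then stall before the final byte) could be sketched roughly as below. This is an illustration under assumptions, not the test that was actually written: the function names, the request headers and the body size are all invented here, and what the proxy sends back while the client stalls is left for the caller to inspect.

```python
import socket


def build_post(host, path, body):
    """Build a minimal HTTP/1.1 POST with an explicit Content-Length."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def stall_and_listen(sock, request, hold_seconds):
    """Send all but the last byte, then wait up to hold_seconds for a reply.

    Returns the bytes received while stalled (b"" means the peer closed the
    connection), or None if nothing arrived before hold_seconds elapsed.
    """
    sock.sendall(request[:-1])  # the request is now one byte short
    sock.settimeout(hold_seconds)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


# Hypothetical usage against a real endpoint (host, port and path are placeholders):
#   with socket.create_connection(("ingress.example.test", 80)) as sock:
#       reply = stall_and_listen(sock, build_post("ingress.example.test", "/upload", b"x" * 1024), 120)
```

A reply that arrives while the request is still incomplete points at either a proxy-side timeout or a backend answering early, which is the case the server change is meant to rule out.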
2025-01-17
- reviews
- tier1 release
- captest for s3 from GP
- investigate how to test Skipper timeout
- mod statuscode to support Keep Alive?
- start posting but hold back last byte?
- long convo on fluent bit with AN and SG
- difficulty of creating the right regex to capture all multi-line scenarios
- how about format logs differently https://runebook.dev/en/articles/spring_boot/application-properties/application-properties.core.logging.exception-conversion-word
- standup = 1h
- captest fix rollout
- beta ELB perms: https://elsevier.atlassian.net/browse/CEIP-6955
- investigate PPE external-dns issues: https://global-elsevier.slack.com/archives/C05T41TA278/p1733228086053019
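On the fluent bit multi-line discussion above: one common way to avoid a regex that must match every continuation shape is to match only the first line of a record and treat everything else as continuation. The Python sketch below shows that idea in isolation. It is not Fluent Bit configuration, and the timestamp format is an assumption about the logs, not something taken from these notes.

```python
import re

# Assumption: every new log record starts with an ISO-like timestamp.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def group_records(lines):
    """Fold continuation lines (e.g. stack trace frames) into the preceding record."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            # A timestamped line (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is appended to the record currently being built.
            records[-1] += "\n" + line
    return records
```

The weakness is the same one raised in the conversation: any continuation line that happens to start like a timestamp gets split off, which is why changing the log format at the source is attractive.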
2025-01-16
- prep and discuss CPS platform request
- left it with Irfan / James to decide thru the TPR process
2025-01-15
- inspector
- reviewed docs (again)
- captest
- skipper failure
2025-01-14
- inspector
- checkout captest results
- nonprod
- passing: 1490
- failing: 300
- skip on fail: 76
- release
- clean up Fix workflows at v0.10.0
- also image publication did not happen (did it manually)
- review w Khush then Claire, Irfan
2025-01-13
- lot of time spent on Docker being marked as malware by macOS:
- https://www.docker.com/blog/incident-update-docker-desktop-for-mac/
- phishing training
- inspector migration to poetry 2.0
- 30 min expt on rust from python: https://pythonislove.com/supercharge-your-python-apps-with-rust
2025-01-10
- 121
- objectives - team level
- EKR updates and other BAU
- look at deAnna’s deck for more strategic stuff
- eg cortex lite (heroku like experience)
- repo + manifests = working app
- migration rather than revolution compared to cortex
- struggling to get chris onboard
- serverless
- propose to constrain to lambda because widely adopted
- acct factory most complex today (uses step functions)
- what can we offer to permit devs to focus on product code
- propose to reach out to partner teams (but not likely to be before H2)
- position for ‘cortex dev xp’ not serverless ‘cos org not ready
- dev xp is on roadmap
- want to pivot KPIs away from number of clusters
- doesn’t do justice to fact that may be 80% of dbs workload in 4 clusters
- helios
- tagging to move away from lots of accounts
- expect to lead to revamped tagging standard
- would permit reporting on namespaces or helios product / subproduct
- Schuler
- done some kt
- getting ready to engage teams on enablement to build pipeline
- will work with BU leads to create enablement teams
- personal goals
- selfish / interests
- uptake of dev10, at least 80%
- RE cares
- security journey (should be ready by end of Jan)
- respond to OfficeVibe (80%)
- ops goals
- SDLC maturity
- project ‘nesta’ (?): gen ai chatbot
- software standards
- templates
- common GH workflows
- convergence
- cap test: happy as evidenced by CSI test creation
- inspector cli with platform manager
2025-01-09
- skipper ingress class
- complete PR for new tests
- schedule skipper 6 rollout
- review poetry 2.0 PR
2025-01-08
- skipper ingress class
- refactor tests to eliminate duplication between external-dns and skipper
- new test to see if can create two default ingressclass - yes we can!
- Conclusion:
- we need the Skipper PR to allow creation of ingress in the isdp / sciencedirect scenario
- k8s is not enforcing the expected behaviour that (1) two default ingressclasses are prohibited or (2) a classless ingress is prohibited. Interestingly, the aws-load-balancer-controller uses a webhook to enforce (2), which is how we discovered this problem
- there is no aws ‘bug’ to report, it’s more that expected behaviour is not enforced. The line after the Caution says: “There are some ingress controllers, that work without the definition of a default IngressClass. For example, the Ingress-NGINX controller can be configured with a flag `--watch-ingress-without-class`. It is recommended though, to specify the default IngressClass”
- Therefore: we will create a kyverno policy to prohibit creation of ingressclass with default = true by partners
- TIL: remove finalisers on ingress (when class = ‘alb’)
`kubectl patch ingress classless-ingress-test-cortex -n kube-system --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]'`
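Since k8s itself accepts more than one default ingressclass, a quick audit for clusters that already have several could look like the sketch below. It assumes the input is the parsed `items` list from `kubectl get ingressclass -o json` and relies on the standard `ingressclass.kubernetes.io/is-default-class` annotation. The function name is made up for illustration, and this is a check to run by hand, not the kyverno policy itself.

```python
import json

DEFAULT_ANNOTATION = "ingressclass.kubernetes.io/is-default-class"


def default_ingress_classes(ingress_classes):
    """Return the names of IngressClass objects annotated as the default."""
    names = []
    for item in ingress_classes:
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(metadata.get("name"))
    return names


def load_items(kubectl_json_text):
    """Parse the text output of `kubectl get ingressclass -o json`."""
    return json.loads(kubectl_json_text).get("items", [])
```

More than one name in the result is the situation the isdp / sciencedirect clusters ran into.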
2025-01-07
- chat with Khush about PRs and resolve his python environment
- skipper ingress class
- modify the helm chart, deploy to single cluster on branch
- https://github.com/elsevier-centraltechnology/tio-helmcharts-core-platform/pull/330
2025-01-06
- dataplatform rbac issue
- discussion with Ashish
- usefulness of cedar as general low resource policy engine
- how to deploy skipper ingress class? (presumably system components)
2025-01-03
catch up on slack and email
implement external dns parent check discussed with Felipe
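    def find_my_parent(record, zones):
        # Strip leading labels one at a time and look for a matching zone;
        # the most specific (longest) parent is tried first.
        for inc in range(1, record.count('.')):
            candidate = '.'.join(record.split('.')[inc:])
            found = [z for z in zones if z == candidate]
            if found:
                return found
        raise IndexError("no parent")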
Ref conversation with Mark: https://global-elsevier.slack.com/archives/C046V4MTB16/p1733234805643939
- `external-dns` capability tests still failing
- narrowed it down to `isdp` and `sciencedirect`
- turns out they define an ingressclass, which apparently breaks the [missing] skipper one referenced by test
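For reference, a self-contained variant of the external-dns parent check from this entry, with a worked example. It differs from the original in a few labelled ways: it returns a single zone name instead of a list, uses a set lookup, tolerates a trailing dot, and raises `LookupError`. The function name and the zone names are invented for illustration.

```python
def find_parent_zone(record, zones):
    """Return the most specific zone in `zones` that is a parent of `record`.

    Like the original check, the bare top-level domain is never tried and a
    record that is itself a zone apex has no parent.
    """
    labels = record.rstrip(".").split(".")
    for start in range(1, len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in zones:
            return candidate
    raise LookupError(f"no parent zone for {record!r}")


zones = {"example.com", "dev.example.com"}  # hypothetical hosted zones
parent = find_parent_zone("api.dev.example.com", zones)
```

Here `parent` is `dev.example.com`, because longer suffixes are tried before shorter ones.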
## 2025-01-02: vacation