2024-08-29
- review test API with Matteo
- annotation approach to warn if changing generated code possible?
- test to enforce code matches openapi.yaml
- 202 from POST
- POST return location header
- can we give indication of when to retry ?
- may need to have inspector know more about tests
- infer source account from endpoint called
- change /test/<smoke | etc>
- target suite version as own endpoint
- POST /capability-test/smoke/
- POST /capability-test/integration/
- target account, cluster, region, partner, suite
- POST /capability-test/acceptance/
- target account, cluster, region, partner, suite
- naming: suite -> component
- include v1 in endpoint
- api changes
- return 202 from long-running POST
- return Location header (allows dumber client, automatically connecting POST and GET endpoints)
- dedicated endpoints for smoke and acceptance tests
- does ‘capability-test’ make sense as the umbrella term?
- I personally think it is closer to ‘acceptance’, and ‘test’ should be the umbrella term, but I am leaving it as is because ‘capability test’ is already recognised
- ‘target-suite’ renamed ‘component’
- The rationale is to avoid test internals bleeding through
- but ‘component’ is a vague and over-loaded term
¯\_ (ツ)_/¯
- include v1 in endpoint path to permit future change
- I have not done this for the existing endpoints, they can easily add v2 if and when needed
- impl changes
- strfdate start and end times in NewRelic listener
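A minimal sketch of the 202 + Location flow from the api changes above, not the inspector's actual code. The `/v1/capability-test/<kind>/<run id>` path shape, `Retry-After` as the retry hint, the in-memory `RUNS` store and both function names are assumptions for illustration only.

```python
import uuid

# Test kinds come from the notes; the path layout is an assumption.
KINDS = {"smoke", "integration", "acceptance"}
RUNS = {}  # run id -> state; in-memory stand-in for real job tracking


def start_test(kind, target):
    """Handle POST /v1/capability-test/<kind>/: accept the run, do not wait for it."""
    if kind not in KINDS:
        return 404, {}, {"error": f"unknown test kind: {kind}"}
    run_id = uuid.uuid4().hex
    RUNS[run_id] = {"kind": kind, "target": target, "status": "running"}
    headers = {
        # Location lets a client go from POST to GET without knowing URL rules.
        "Location": f"/v1/capability-test/{kind}/{run_id}",
        # Retry-After (seconds) is one common way to hint when to poll again.
        "Retry-After": "30",
    }
    return 202, headers, {"status": "running"}


def get_test(kind, run_id):
    """Handle GET on the Location returned by start_test."""
    run = RUNS.get(run_id)
    if run is None or run["kind"] != kind:
        return 404, {}, {"error": "no such run"}
    # Only suggest a retry while the run is still in progress.
    headers = {"Retry-After": "30"} if run["status"] == "running" else {}
    return 200, headers, {"status": run["status"]}
```

With this shape the client only needs to POST, read `Location`, and GET that URL until no `Retry-After` comes back; it never builds a status URL itself.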
2024-08-28
- capability test as job
- beta reconcile
- daily run testing (as opposed to reconciliation)
2024-08-27
- dev duty (from yesterday!)
- hm-core-platform KSI/CSI migration troubleshooting
- LOATHED: trying to get simple permission change rolled out:
- ask Thomas (admittedly a bit lazy): https://global-elsevier.slack.com/archives/C030F90FM7U/p1724749968973529
- create system components PR
- create alpha JIRA
- create alpha PR
- repeat for beta and prod
- wait for Build
- alpha completed
- webpresence KSI migration troubleshooting
2024-08-26: Public holiday
2024-08-24
- capability tests
- small tweaks to api like error handling
- tested locally
- created cronjob to invoke daily
- hit permission problem: unable to create jobs as Agent: https://github.com/elsevier-centraltechnology/cortex-platform-system-components/pull/39
2024-08-23
- aws ingress controller ops calls (pretty well all day)
- bash repo: https://github.com/els-grazziotinf/ceip-6213
- fix ingress class name check
- chat with scott about external-dns
- update cap test endpoints as suggested by Matteo
2024-08-21
- aws ingress controller ops calls (4h)
- implementation of cap test endpoints
2024-08-20
- fighting with NR api (not metric but event)
- got Khush’s inspector PRs updated and merged, inc. 0.6.4 release
2024-08-19
- release inspector 0.6.4 for Khush
- further clarify checks of ingresses with Felipe
- and cloud foundation stack with Ashish
- Planning mtg
- follow up assessments
2024-08-16
- DRAFT: System component impact
- 1-2-1
- pain of shared responsibility and Karpenter
- cloud foundry
- app container layer on top of cortex
- inspector cap test endpoints
- mocked
- feature flag enabled tests
- sciencedirect
- gocd from thoughtworks
- already bought into developers develop and hand over
- cortex apps
- proto plan
- MVP:
- register name, cost code and ???
- assume docker image in public repo that runs http service
- delivers GHA: aka Initializr
- hello world running
- log to new relic and expose over Grafana
- metrics to new relic and expose over Grafana
- no secrets
- Testing / Acceptance / Production readiness
- app tests (partner provided, cortex executed)
- Later
- host domain
- ‘app groups’ that app can talk to
- kube watcher to report OOM etc -> refer to specialist
- cortex vault
- How to cross-charge?
- Gigabyte hour + standing charge to cover empty cluster
- scale down out of hours
- deployment (effectively entire app) for each PR
2024-08-15: Vacation
2024-08-14
- digging self out of hole over stupid comment on slack
- unit testing aws route53
- Knowledge Karpenter call
2024-08-13
- dropped cap test “Do you want me to rush the rush job I am rushing now, or rush the rush job you wanted me to rush, before I rush the rush job I’m rushing now, or rush the rush job I was rushing before.”
- wasted time on understanding aws_route53.py: two lists, one derived from the other; the base is not cached, the derived is
- 2h mtg on hotfix or rollback aws ingress controller change
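A sketch of the caching pattern described in the aws_route53.py note, with hypothetical names throughout (`_backend`, `fetch_records`, `derived_names`); it is not the real module. It assumes the base list is fetched fresh on every call while the list derived from it is cached, which lets the two disagree.

```python
from functools import lru_cache

_backend = ["a.example.com", "b.example.com"]  # stand-in for Route 53


def fetch_records():
    """Base list: not cached, so every call sees the current backend state."""
    return list(_backend)


@lru_cache(maxsize=None)
def derived_names():
    """Derived list: cached, so it is computed from the base only once."""
    return tuple(name.split(".")[0] for name in fetch_records())


first = derived_names()
_backend.append("c.example.com")
# The base reflects the change; the cached derived list does not.
stale = len(fetch_records()) != len(derived_names())
```

Calling `derived_names.cache_clear()` brings the derived list back in line, which is the step that is easy to miss when only one of the two lists is cached.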
2024-08-12
- fighting building new image for capability tests
- large, slow builds and unstable zscaler
- changes associated with updating deps
- nothing on the actual problem!
- retro: 1h
2024-08-09
- discussion about testing
- general agreement, refinement
- new inspector endpoint for testing inc. reply
- general agreement, refinement
- capability testing
- figured out how to do Kustomize with Matteo & Luis
- deployed master build as 0.3.0 image to artifactory
- can use overlay for TARGET_SUITE=calico
- next test for TARGET_SUITE=calico smoke
- stand up
- Karpenter
- Could roll out Karpenter controller, then add NodePools, then set MNG to not schedule, then drain MNG
- Potentially could push MNG decommissioning closer to partner control
- Use ‘when empty’ set as flag to decommission node pool not ‘when ???’ as currently
- may slow down urgency and eliminate ‘churn’
2024-08-08
- engineering forum
- Notifier
- karpenter
- memory request and limit should be the same
- size for any spikes; slack is within the resource spec, not spare capacity at cluster level: https://aws.github.io/aws-eks-best-practices/karpenter/#configure-requestslimits-for-all-non-cpu-resources-when-using-consolidation https://www.loft.sh/blog/how-to-set-up-kubernetes-requests-and-limits#:~:text=By%20setting%20equal%20memory%20requests,pod%20is%20exceeding%20its%20request.
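A rough sketch of how the "memory request and limit should be the same" rule could be checked against a pod spec that has already been loaded into a dict. The function name and the sample spec are made up for illustration; the field layout follows the standard Kubernetes `containers[].resources.requests/limits` shape.

```python
def memory_mismatches(pod_spec):
    """Return names of containers whose memory request and limit differ or are missing."""
    bad = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        request = resources.get("requests", {}).get("memory")
        limit = resources.get("limits", {}).get("memory")
        # Plain string comparison: "1Gi" vs "1024Mi" would be flagged even
        # though they are the same quantity. Good enough for a first pass.
        if request is None or request != limit:
            bad.append(container["name"])
    return bad


pod_spec = {
    "containers": [
        {"name": "app", "resources": {"requests": {"memory": "512Mi"},
                                      "limits": {"memory": "512Mi"}}},
        {"name": "sidecar", "resources": {"requests": {"memory": "128Mi"},
                                          "limits": {"memory": "256Mi"}}},
        {"name": "no-resources", "resources": {}},
    ]
}
flagged = memory_mismatches(pod_spec)  # ["sidecar", "no-resources"]
```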
2024-08-07
- call w Matteo and Liam for testing integrated with reconciliation
- unit tests: simple, e.g. helm dry run (outside EKS)
- integration and acceptance tests similar, but tolerate more expense (time) for acceptance
- also smoke
- need API
- how to get output, custom New Relic event?
- todo
- agree terminology
- agree which cluster / tier can exercise which tests
- smoke integrate version
2024-08-05
- adopt precommit
- fix sonar issues
- cluster diffs
- planning
- ORR, complete page pass to Neil
- Inspector feedback: approach Fabricio, John Miller, Mark Ferrari or Rob and then Juan Ipi.
- capability test scheduling
- security patch script to pick up from Daniel’s documentation
- thinking about ops tool
- publish Thomas’ tool to Slack
- cronjob like cap test?
2024-08-02
- time: 09:00-11:15, 12:30-
- poetry pre-commit
- cannot commit git hook, developer has to enable somehow, integrate into build script?
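One possible answer to the "integrate into build script?" question, as a sketch only: have the build script install the hook when it is missing. `pre-commit install` is the real pre-commit command; the function names, the injectable `run` parameter, and the assumption that `.git` is a plain directory (not a worktree or submodule pointer file) are illustrative.

```python
import subprocess
from pathlib import Path


def hook_installed(repo_root):
    """True if a pre-commit hook file exists under the repo's .git/hooks."""
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


def ensure_hook(repo_root, run=subprocess.run):
    """Install the hook if missing; return True when an install was attempted."""
    if hook_installed(repo_root):
        return False
    # `pre-commit install` writes .git/hooks/pre-commit for this clone only.
    run(["pre-commit", "install"], cwd=repo_root, check=True)
    return True
```

Calling `ensure_hook(".")` at the start of the build script would mean a developer picks up the hook on their first local build rather than having to remember a manual step.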
2024-08-01
- kong article
- fire fighting over CSI secrets
- pre-commit work
- Karpenter chat, inc. recording tickets and reviewing enablement channels
- Catch up with Felipe
- exposure on CE clusters (Damir convo)
- whitespace changes in Khush’s latest PR: Align Inspector priorities to Vulnerability Management policy