2024-09-30
inspector Q3 release
- review all outstanding
- replace pod with priorityclass cmds
capability tests not run?
🦺 Capability test results from 2024-09-30 00:00 to 2024-09-30 14:06
suites: 1 (⬆ 0) tests: 1 (⬆ 0) failures: 0 (⬆ 0)
Found 1 capability_test_results
Success: 1
report written to console
planning
- ksi
- start talking to partners to jolly them along
- will you be done by end Oct?
- IP says github runners will be helm fix so could fix partners for them in fixing the ce one
2024-09-26
- inspector.check decorator
- single initialisation of commands
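A minimal sketch of how a check decorator with single initialisation might look. Everything here is hypothetical (the `check` function, `_REGISTRY`, `run_all` and the example check are illustrative names); the real Inspector code is not shown in these notes.

```python
"""Hypothetical sketch: register checks via a decorator, initialise once."""
from typing import Callable, Dict

# Module-level registry, populated once at import time (assumed design).
_REGISTRY: Dict[str, Callable[[], bool]] = {}


def check(name: str) -> Callable[[Callable[[], bool]], Callable[[], bool]]:
    """Register the decorated function as a named check."""
    def decorator(func: Callable[[], bool]) -> Callable[[], bool]:
        if name in _REGISTRY:
            raise ValueError(f"duplicate check: {name}")
        _REGISTRY[name] = func
        return func
    return decorator


@check("default-storage-class")
def default_storage_class_check() -> bool:
    # Placeholder body; a real check would query the cluster.
    return True


def run_all() -> Dict[str, bool]:
    """Run every registered check and collect pass/fail by name."""
    return {name: func() for name, func in _REGISTRY.items()}
```

Registering at import time means each command is set up exactly once, and a duplicate name fails loudly instead of silently replacing an earlier check.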
2024-09-25
- dry up inspector commands
find . -type f -name '*.py' -exec wc -l {} +
- before (inc. tests): 13252
- before: 9480
- after: 8932
- slow day due to awful cough
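For reference, the before/after figures work out to 548 lines removed, about 5.8%. A small sketch that repeats the `find ... wc -l` count in Python and computes the reduction; the function names are illustrative only.

```python
from pathlib import Path
from typing import Tuple


def count_py_lines(root: str) -> int:
    """Total newline count across *.py files under root, as `wc -l` counts."""
    total = 0
    for path in Path(root).rglob("*.py"):
        if path.is_file():
            total += path.read_bytes().count(b"\n")
    return total


def reduction(before: int, after: int) -> Tuple[int, float]:
    """Return (lines removed, percentage removed rounded to 1 decimal place)."""
    removed = before - after
    return removed, round(100 * removed / before, 1)


if __name__ == "__main__":
    print(reduction(9480, 8932))
```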
2024-09-24
- tidy up cap tests
2024-09-23
- CEIP-6362 BETA: rollout of platform infrastructure 11.4.3
- permission boundary stuff approved late on Friday, retry cluster creation
- reconcile reports success
- informed partner
- inspector smoke endpoint:
- fluentbit smoke is failing - replaced by operator?
2024-09-20
- permission boundary changes in bootstrap
- Inspector release process
- proposals from KA
- like that it’s triggered on tagging
- looks like a good step forward; have not tested; need a picture of how the diff triggers launch diff workflows
- ctd integrating smoke API from train time yesterday
2024-09-19
- office day, meandering chat about current pain points
- some thoughts on possible actions: https://elsevier.atlassian.net/wiki/spaces/TIOCORTEX/pages/119601630781070/Cortex+Ops+2024+September+19th+Meeting
2024-09-16
- resolve permissions for NR user key
- release module 0.7.0 but not the images
- command endpoint for inspector (to schedule KSI for Claire)
- new permission problem (on lambda only)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::781632261136:assumed-role/Cortex-Inspector/inspector-nonprod is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::058639152458:role/cws-nonprod-Cortex-Inspector-Agent
[DEBUG] 2024-09-16T20:16:59.182Z 539e7352-62de-4799-a207-adeb5db8bb3f send({'type': 'http.response.start', 'status': 500, 'headers': [(b'content-length', b'21'), (b'content-type', b'text/plain; charset=utf-8')]})
- new command to check NR and report cap tests to Slack
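The AccessDenied above is a cross-account `sts:AssumeRole` call, which needs two grants: an identity policy on the calling role that allows the action on the target, and a trust policy on the target role that names the caller. A sketch of both documents follows. The caller role ARN is inferred from the assumed-role session ARN in the error and assumes the role has no path; the helper names are hypothetical, and whether either policy is actually missing here is not confirmed.

```python
import json
from typing import Any, Dict

# Taken from the error message.
TARGET_ROLE_ARN = "arn:aws:iam::058639152458:role/cws-nonprod-Cortex-Inspector-Agent"
# Assumption: derived from the session ARN, role path unknown.
CALLER_ROLE_ARN = "arn:aws:iam::781632261136:role/Cortex-Inspector"


def caller_identity_policy(target_role_arn: str) -> Dict[str, Any]:
    """Identity policy to attach to the calling role (caller's account)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": target_role_arn}
        ],
    }


def target_trust_policy(caller_role_arn: str) -> Dict[str, Any]:
    """Trust policy to set on the target role (target's account)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": caller_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(caller_identity_policy(TARGET_ROLE_ARN), indent=2))
    print(json.dumps(target_trust_policy(CALLER_ROLE_ARN), indent=2))
```

If a permissions boundary is attached to the calling role, it also has to allow `sts:AssumeRole`, since a boundary caps what the identity policy can grant.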
2024-09-13
- mostly buried in permissions, some of it working with Ashish on external-dns
- solicit partner feedback on inspector
2024-09-12
- conclude existing cap tests for daily run
- add additional smoke tests
- Claire requires KSI update for Monday
- support 0.6.5 Inspector release
- wrote training review for Felipe
- chat with Ashish about cap test endpoint
2024-09-11
- moved to scheduling the existing cap tests for daily run
- agreed w Matteo to impl smoke tests rather than focus on API integration
- Ashish identified permission boundary as potential cause of NR publication issue
2024-09-10
- dev duty
- fighting with IAM to get newrelic working
- Ashish highlighted permissions have to be opened at both ends
- still nada
2024-09-09
- EMCloud TPR1
- Bill Reuschlein scoping TPR1 at empty cluster
- Terry picked up on this
- Retro
- Irfan - start looking at 2025
- looking at 202 clusters (~60ish already on cortex)
- priority 1,2,3
- priority 1: eol already or extended support
- approached BU directors about 500% uplift in costs!
- at the same time cannot do all the work, need experts BU side
- goal of ‘abstracting away cognitive load’
- cherry pick some people as ‘interface’ (train the trainer)
- priority 2: eks but less than n-1 (1.28 falls into this at Nov)
- priority 3: consider when replatforming, upgrading etc
- generally accepted, ops mgrs planning, bu directors considering priorities
- Q1 / H1 tackle priority 1s
- Cortex XXX
- logging being elaborated now
- Inspector as currently envisaged done end ‘24
- new req’ts in Q1 25?
- Cortex Apps
- cf SDLC and tagging projects
- intends to get in front of DiAnna soon
- develop GenAi bot
- land grab
- trademark and get thru TPR1
- crossplane
- unsure if terraform is big enough problem to use
- arch wants CE to be capable of evaluating teams' desired use
- IP prefers to offer ce composites in same way as terraform modules
- could use it cortex side to drop platman
- apps obstacles
- accounts is easy solution for cost codes but coming to end of savings possible
- consolidation becoming more pressing
- q: how access S3 in existing acct and vpc
- opportunity: get on the ELS fabric to benefit from the goodies
- sdlc should deliver template apps
- identify focus groups to set templates for (boot, fastapi etc)
- ppt first doesn’t work historically
- expand paved road conversation then drop in poc to demo
2024-09-06
- smoke test
- fixed new relic test locally
- prepped cronjob
- found
arn:aws:secretsmanager:eu-west-1:183742092277:secret:service-account/new-relic-api-key-ANpqnl contains what I want - appears to be controlled from platform-infrastructure/prenode/iam.tf but not working
2024-09-05
- daily cap test
- test triggering cronjob
- all appear to be running
- catch up with Khush
- seq diags
2024-09-04
- Osmosis scheduling issue
- retro observation? I spent a lot of time jumping to Slack notifications even when not actively contributing to them esp. last couple of days
- cap test API
- revert to body vars instead of having a split
- daily cap test
- add job name suffix
- impl the feature flag test and read value from NR if enabled.
- tested as job invoked from local machine
- harassment training
- seriously, pick your own harassment!
2024-09-03
- 1-2h wasted on ZScaler / NewRelic events API
- couple of hours back and forth with Juan about LimitRanges
- updates and discussion on cap test endpoint with Matteo - positive
- daily cap test
- minimal time - merged in revised API and about to start debugging why not behaving as expected
2024-09-02
- planning
- inspector check: anyone using default storage class (changing from gp2 to gp3)
- felipe to lead ensuring 1-2-1 with 7 partners
- will get karpenter 0.x into prod then move to 1.x (there is impact)
- system component versioning
- ashish mentioned felipe has asked for hard feature flag
- left with felipe to report back on convo with Tim
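A rough sketch of the default storage class check from the planning notes. It works on plain dicts shaped like Kubernetes API objects (for example from `kubectl get storageclass -o json` and `kubectl get pvc -A -o json`) and flags PVCs that use the class annotated `storageclass.kubernetes.io/is-default-class: "true"` or that name no class at all. The function names and sample data are hypothetical, and the check cannot tell an explicit `gp2` from one filled in by defaulting.

```python
from typing import Any, Dict, List

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_names(storage_classes: List[Dict[str, Any]]) -> List[str]:
    """Names of StorageClasses annotated as the cluster default."""
    return [
        sc["metadata"]["name"]
        for sc in storage_classes
        if (sc["metadata"].get("annotations") or {}).get(DEFAULT_ANNOTATION) == "true"
    ]


def pvcs_on_default(
    pvcs: List[Dict[str, Any]], storage_classes: List[Dict[str, Any]]
) -> List[str]:
    """Return namespace/name for PVCs using the default class or none at all."""
    defaults = set(default_class_names(storage_classes))
    hits = []
    for pvc in pvcs:
        sc_name = pvc.get("spec", {}).get("storageClassName")
        if sc_name is None or sc_name in defaults:
            meta = pvc["metadata"]
            hits.append(f'{meta["namespace"]}/{meta["name"]}')
    return hits


if __name__ == "__main__":
    classes = [
        {"metadata": {"name": "gp2", "annotations": {DEFAULT_ANNOTATION: "true"}}},
        {"metadata": {"name": "gp3"}},
    ]
    claims = [
        {"metadata": {"namespace": "a", "name": "data"}, "spec": {"storageClassName": "gp2"}},
        {"metadata": {"namespace": "b", "name": "cache"}, "spec": {"storageClassName": "gp3"}},
    ]
    print(pvcs_on_default(claims, classes))
```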