Next:
alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"
- report on calico global network policy use
- report on priority class usage
- report on hpa use
- report expiry of artifactory token, renew and add to AWS secretsmanager
- CEIP-5593 Docs: Review cortex-documentation (backstage)
2024-04-29
- planning
- painful conversation about the partners with missing resource specs and others with bad ones.
- could automate report for the missing ones
- what about judging ‘good’ limits
- felipe suggests total requests and compare to actual
- ksi removal in ce prod in next 2 weeks so need to get scheduled
- external-dns
- ceip-5577
- crtxctl as lambda
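Rough sketch of the "automate report for the missing ones" idea from planning, plus Felipe's suggestion of comparing total requests to actual usage. Assumptions: input is the `items` list from `kubectl get pods -A -o json`, only CPU is totalled, CPU quantities are plain cores or the `m` suffix, and the "actual" figure comes from whatever metrics source is to hand. All function names are made up for illustration.

```python
RESOURCE_KINDS = ("requests", "limits")
RESOURCE_NAMES = ("cpu", "memory")


def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as "500m" or "1.5" to millicores.

    Assumption: only plain cores and the "m" suffix are handled.
    """
    quantity = str(quantity)
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def iter_containers(pods):
    """Yield (namespace, pod name, container dict) for every container."""
    for pod in pods:
        meta = pod.get("metadata", {})
        for container in pod.get("spec", {}).get("containers", []):
            yield meta.get("namespace", "default"), meta.get("name", "?"), container


def missing_resource_specs(pods):
    """Return (namespace, pod, container, missing fields) for incomplete specs."""
    findings = []
    for namespace, pod_name, container in iter_containers(pods):
        resources = container.get("resources") or {}
        missing = [
            f"{kind}.{name}"
            for kind in RESOURCE_KINDS
            for name in RESOURCE_NAMES
            if name not in (resources.get(kind) or {})
        ]
        if missing:
            findings.append((namespace, pod_name, container.get("name", "?"), missing))
    return findings


def total_cpu_request_millicores(pods):
    """Sum the CPU requests of every container that declares one."""
    total = 0
    for _, _, container in iter_containers(pods):
        requests = (container.get("resources") or {}).get("requests") or {}
        if "cpu" in requests:
            total += parse_cpu_millicores(requests["cpu"])
    return total


def cpu_request_utilisation(pods, actual_millicores):
    """Ratio of actual CPU use to total requested CPU, or None if nothing is requested."""
    total = total_cpu_request_millicores(pods)
    return None if total == 0 else actual_millicores / total
```

A ratio well below 1 would hint at over-requesting and a ratio above 1 at under-requesting; what counts as 'good' is still the open question from the meeting.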
2024-04-26
- ceip-5577
- crtxctl as lambda
2024-04-25
- ceip-5577
- crtxctl as lambda
- fix tests
- more lambda testing
- TIL: cloudwatch log entries change periodically (15mins), on redeploy and on crash.
- CCMS:
- Felipe, Claire, Tim Sm, Tim St
- James Keena (TIO)
- Kent Haynes (software eng mgr, no EKS xp, EC2 and traditional)
- Connor Skio
- Jeffrey Aoyagi
- quarter by quarter planning process
2024-04-24: AWS summit, London
- serverless security workshop
- lambda deep dive
- Serverlessland.com
- Scope down lambda IAM
- karpenter and spot
- Need to monitor spot termination and spot allocation score
- 20% more interruptions from Karpenter than spot
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/get_spot_placement_scores.html
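Possible starting point for monitoring the spot allocation (placement) score via the boto3 call linked above. Assumptions: the caller supplies an EC2 client (e.g. `boto3.client("ec2")`), pagination via `NextToken` is ignored, and the threshold of 5 is arbitrary. The function names are hypothetical; only `get_spot_placement_scores` and its parameters come from boto3.

```python
def fetch_spot_placement_scores(ec2_client, instance_types, target_capacity, regions):
    """Ask EC2 for per-AZ spot placement scores (1 = unlikely, 10 = very likely)."""
    response = ec2_client.get_spot_placement_scores(
        InstanceTypes=list(instance_types),
        TargetCapacity=target_capacity,
        RegionNames=list(regions),
        SingleAvailabilityZone=True,  # score each AZ rather than whole regions
    )
    # Only the first page is read here; NextToken is not followed.
    return response.get("SpotPlacementScores", [])


def low_scores(scores, threshold=5):
    """Return the entries scoring below the threshold, worst first."""
    flagged = [entry for entry in scores if entry["Score"] < threshold]
    return sorted(flagged, key=lambda entry: entry["Score"])


# Example wiring (not executed here, needs AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   scores = fetch_spot_placement_scores(ec2, ["m5.large"], 10, ["eu-west-1"])
#   for entry in low_scores(scores):
#       print(entry["AvailabilityZoneId"], entry["Score"])
```

Spot termination itself would need a separate signal (interruption events), which this does not cover.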
2024-04-22
- ceip-5577
- crtxctl as lambda
- needing to rework the kubeconfig differently for lambda and cli
- checked out state of cluster_diff with / for AN
2024-04-22
- ceip-5577
- crtxctl as lambda
- reworking auth to (hopefully) remove need of cortex-prod-admrole as well as solving the lambda issue
- class diagram
- swagger api?
2024-04-19
- ceip-5577
- crtxctl as lambda
- initially not running as Inspector but could not assume the role.
- got stuck on the IAM permissions because the lambda already runs as Inspector and was trying to assume the same role.
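One way around the "already has Inspector and trying to assume the same" problem might be to check the caller identity first and skip the assume step when already running as the target role. A sketch only, not what crtxctl does: the function names are hypothetical, the account ID is a placeholder, and the STS client is passed in (e.g. `boto3.client("sts")`).

```python
def role_name_from_arn(arn):
    """Extract the role name from an IAM role ARN or an STS assumed-role ARN."""
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] == "assumed-role":
        # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
        return parts[1]
    if parts[0] == "role":
        # arn:aws:iam::<account>:role/<optional/path/><role-name>
        return parts[-1]
    raise ValueError(f"not a role ARN: {arn}")


def already_running_as(caller_arn, target_role_arn):
    """True when the caller's session is already the target role in the same account."""
    same_account = caller_arn.split(":")[4] == target_role_arn.split(":")[4]
    same_role = role_name_from_arn(caller_arn) == role_name_from_arn(target_role_arn)
    return same_account and same_role


def credentials_for(sts_client, target_role_arn, session_name="crtxctl"):
    """Return temporary credentials, or None when the ambient ones already fit."""
    caller_arn = sts_client.get_caller_identity()["Arn"]
    if already_running_as(caller_arn, target_role_arn):
        return None  # use the ambient credentials, no self-assume
    response = sts_client.assume_role(
        RoleArn=target_role_arn,
        RoleSessionName=session_name,
    )
    return response["Credentials"]
```

The same code path would then serve both the CLI (which has to assume) and the lambda (which is already there).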
2024-04-18
- ceip-5577
- crtxctl as lambda
- dumping the two-stage Dockerfile finally gets something running
- no logging under runtime interface emulator, all doc examples seem to be ’live’
2024-04-17
- ceip-5577
- crtxctl as lambda
- python lambdas require layers to be deployed as separate artifact - eff that
- looked at kubeless, knative, apache openwhisk; all seem to be a PITA
- returned to lambda done as container
- much time wasted on aws docs suggesting two stage docker file
- concluded FastAPI image on EKS may be the best bet after all
- CEIP-4648 - cap test iam docs
- CEIP-4827 mark created resources with owner reference
2024-04-16
- ceip-5577
- validate advisor requirements present in bundle
- show and tell
- notes: https://elsevier.atlassian.net/wiki/spaces/~grazziotinf/pages/119601475099772/Inspector+v2
- outcome: continue
- planning mtg
2024-04-15
- review Alerts prezo with AN
- ceip-5577 (7h)
- crtxctl bundle creation
- publisher interface and console + s3 implementations
- c3 (1h)
2024-04-12
- Osmosis incident
- debug inspector
- ceip-5573: inspector api test
- CEIP-5577: poc crtxctl bundle creation (3h)
2024-04-11
- debug Inspector
- cannot build image when on Zscaler because of Go parts, cannot build when off Zscaler because of crtxctl parts
- invoke
curl -H "Content-Type: application/json" -X POST https://inspector.cortex-non-prod.elsevier.systems/inspect -d ' { "targetCluster": { "platformClass": "alpha", "product": "core-engineering", "clusterName": "test-cluster" } }'
{"eventId":"1b699f95-3a08-464c-ab3e-889acda3defb"}
- look for in S3
- This is the error:
{ "level": "error", "message": "failed to set target: operation error EKS: DescribeCluster, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: ee28cc4b-685f-4390-b6cd-a000b0d29e7c, api error AccessDenied: User: arn:aws:sts::781632261136:assumed-role/Cortex-Inspector/cortex-inspector-nonprod is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent", "timestamp": "2024-04-11T13:43:00Z", "clusterId": { "platformClass": "alpha", "product": "core-engineering", "clusterName": "test-cluster" } }
- conclusion:
- proceed with principle of prod platform managing non prod partner clusters
- deploy prod inspector
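For repeat debugging, the curl invocation above could be wrapped in a small script. A sketch using only the standard library; the URL and payload shape are copied from the curl call, the response is assumed to be JSON with an `eventId` field as seen above, and the function names are made up.

```python
import json
import urllib.request

INSPECT_URL = "https://inspector.cortex-non-prod.elsevier.systems/inspect"


def build_inspect_request(platform_class, product, cluster_name, url=INSPECT_URL):
    """Build the POST request for an inspection of one target cluster."""
    payload = {
        "targetCluster": {
            "platformClass": platform_class,
            "product": product,
            "clusterName": cluster_name,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_inspection(request, timeout=30):
    """Send the request and return the eventId to look for afterwards."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["eventId"]


# Example (not executed here, needs network access to the endpoint):
#   request = build_inspect_request("alpha", "core-engineering", "test-cluster")
#   print(submit_inspection(request))
```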
- long and tortuous conversation with Juan Angel about csi secrets
- TIL: secrets will not be updated if mounted by any pod: https://github.com/aws/secrets-store-csi-driver-provider-aws/blob/main/provider/secrets_manager_provider.go#L113-L131
- means that we need to redeploy the secrets provider with a unique name (or remove the secret and pod to force a refresh)
2024-04-09
- CEIP-5527: metrics: complete and merge
- CEIP-5504: Add ArgoCD post-sync hook for external-dns
- revisit the bootstrap, no relevant drifts but fixed a bunch anyway
- revisit debugging
- make runfor capability tests: realise Makefile syntax is working against us
- consider bash or python
- after diversion on python repl not convinced that helps
- consider more tomorrow.
2024-04-08
- CEIP-5527: metrics
- manually performed
- automation investigation
- CEIP-4469
- figure out why init_db cannot connect to new kong_labs db. Wrong password?
- need to create brand new color?
- 1-2-1 w IP: lots of notes made as he talked solidly for an hour!
2024-04-05
- CEIP-4469:
- ctd scripting build, hitting Invalid RBAC
- debug database:
kubectl run -n labs -it --rm pgclient --image=jbergknoff/postgresql-client --restart=Never postgresql://kong_labs@kong-labs-20240313-1238.cclgtepwgx5u.eu-west-1.rds.amazonaws.com:5432/kong_labs
2024-04-04
- CEIP-5536: docs write up
- CEIP-4469: return to this after FG did some route group work to replace static listener nginx hack
- paired and solved credentials issue
- document steps, now reading from secretmanager
- begin moving to robot for easier repeatability as steps have grown to > 30 lines
2024-04-03
- CEIP-5408: Revised (moved to diff page) and completed
- CEIP-5536: Create, debug with team and ultimately solve CORS issue
2024-04-02: Half day vacation
- CEIP-5408: Started
- dev duty
- TIL: CPU savings from Karpenter in Alpha