Next:
alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"
report on calico global network policy use
report on priority class usage
report on hpa use
report expiry of artifactory token, renew and add to AWS secretsmanager
CEIP-5593 Docs: Review cortex-documentation (backstage)
review README for captests and update Luis
pre-delete checks: security group, lb, eni
kyverno
- give me all kyverno issues, then exclude the known, report new ones
- cheaper? check if volume of errors is growing week over week
cronjob
alt-ui on backstage
sonarlint?
release scripting?
alt-ui?
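A minimal sketch of the kyverno idea above (report only issues not already known, and check whether error volume grows week over week). It assumes issues have already been reduced to plain strings and that weekly error counts are available as a list; the function names and the identifier format are hypothetical, not part of Kyverno.

```python
def new_issues(all_issues, known_issues):
    """Return issues that are not in the known list, de-duplicated, in first-seen order."""
    known = set(known_issues)
    seen = set()
    result = []
    for issue in all_issues:
        if issue not in known and issue not in seen:
            seen.add(issue)
            result.append(issue)
    return result


def week_over_week_growth(weekly_counts):
    """Return the change in error count between each pair of consecutive weeks."""
    return [later - earlier for earlier, later in zip(weekly_counts, weekly_counts[1:])]


def is_growing(weekly_counts):
    """True if the most recent week has more errors than the week before it."""
    deltas = week_over_week_growth(weekly_counts)
    return bool(deltas) and deltas[-1] > 0
```

The second function is the "cheaper" option: it only needs one number per week rather than the full issue list.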
2024-07-31
- respond to slack about newrelic logging
- review Khush’s latest PR
- discuss ops tool issue with Thomas
- look at formatters and linters to standardise code conventions and avoid whitespace changes in PRs
- 20:39
2024-07-30
- sorting house issues.
2024-07-29
- morning on house
- retro
- chat with Khush about priority
- Kong KSI migration
- all dataplanes need 3x mtls certs
- admin pwd can be k8s secret
- newrelic (our key for other clusters)
- needed by all
2024-07-26
- Karpenter experience
- Partners concerned about node instability (nodes only stable when inactive). Real question should be response time (which will be fine with correct PDBs)
- Tension between ops wanting high utilisation and partners wanting fast response times
- Can release Karpenter at any time; impact only manifests when MNGs are scaled down
- Knowledge running solar without topology spread and anti-affinity, such that 3 replicas are on one node. Implication: if that node went down, manual intervention would be needed to recover
- ‘Fast lane’ release
- hotfix breaks out of wherever the fire is
- feature flags must be implemented for delayed releases
- the intermediate releases must go all the way and are uninterruptible
- TODO: commit hook for tflint, terraform fmt, trailing newline at end of file
- TIL: blocking ticket does not give enough visibility: yank due date
- TODO: can this be included in opsgenie-slack?
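A rough sketch of how the replica placement concern noted under Karpenter experience could be checked: flag any workload whose replicas all sit on a single node. It assumes a pod-to-node mapping has already been collected (for example from a pod listing); the function names and input shape are hypothetical.

```python
from collections import Counter


def replicas_per_node(pod_to_node):
    """Count how many replicas of one workload are scheduled on each node."""
    return Counter(pod_to_node.values())


def single_node_risk(pod_to_node):
    """Return the nodes that host every replica of a multi-replica workload.

    A non-empty result means losing that one node takes out the whole workload,
    which is the situation topology spread or anti-affinity is meant to prevent.
    """
    total = len(pod_to_node)
    if total < 2:
        return []
    counts = replicas_per_node(pod_to_node)
    return [node for node, n in counts.items() if n == total]
```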
2024-07-25
- Core Engineering Repositories Review: https://elsevier.atlassian.net/wiki/spaces/TIOCE/pages/119601452282728/Core+Engineering+Repositories+Review
- Electronic Warehouse
- Vijay Ayyanar
- download, scan and tag content from S3
- in K8s cluster
- in new AWS account
- regular monthly terraform module updates
- IP address exhaustion a frequent issue
- have pod autoscaler
- add new VPC to existing?
- perf improvement thru Cortex? Karpenter and monitoring
- pushing logs to both Elastic and OpenSearch
- EKS clusters: msms-non-prod-eks-cluster & msms-prod-eks-cluster (1.28)
- Claire
- Have you discussed joining existing cluster?
- Tim P
- ‘BigOps’ TIO is central to ELS
- Cost monitoring is important
- Can do cost monitoring at namespace level but Adam (FinOps) has reverted to account level
- Vijay Ayyanar
2024-07-24
- More KSI migration
- Start on request spec additions
2024-07-23
- KSI migration
2024-07-22
- KSI migration
- planning
2024-07-18
- recover Dev10: not actually done when recorded previously
- (MIWG)
2024-07-17
- CEIP-5785: Inspector Q2
- documentation update
- logging review
- performance and tuning (remove unnecessary role assumes)
- debugging release scripts
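One possible way to cut down on unnecessary role assumes is to cache credentials per role until they expire. The sketch below is an assumption about the approach, not what was actually done: `RoleSessionCache` and the `assume` callable are hypothetical, and the callable is expected to return a `(credentials, expires_at)` pair with `expires_at` in epoch seconds.

```python
import time


class RoleSessionCache:
    """Cache assumed-role credentials so each role is assumed only when needed."""

    def __init__(self, assume, clock=time.time):
        # assume: hypothetical callable, role_arn -> (credentials, expires_at_epoch_seconds)
        self._assume = assume
        self._clock = clock
        self._cache = {}

    def get(self, role_arn):
        """Return cached credentials if still valid, otherwise assume the role again."""
        entry = self._cache.get(role_arn)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        credentials, expires_at = self._assume(role_arn)
        self._cache[role_arn] = (credentials, expires_at)
        return credentials
```

Injecting the clock keeps the expiry logic testable without waiting for real credentials to time out.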
2024-07-16
2024-07-15
2024-07-12
- reviews for AN and TV
- SSDR not being created, why?
- liaise with Juan on IPI resource requests
- Replaced mangum with much pain
2024-07-11
- review inspector_ui with Khush
- add count per tier and total count to /info endpoint
- ensure all config injectable as env vars
- complete simplification of check_request_specs
- bug fix in /info serialisation
- liaise with Juan on IPI resource requests
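A small sketch of the per-tier and total counts mentioned for the /info endpoint. It assumes each bundle is a dict with a `tier` key; that shape and the response field names `count_per_tier` and `total_count` are hypothetical, not taken from the inspector code.

```python
from collections import Counter


def info_counts(bundles):
    """Summarise bundles as a count per tier plus an overall total."""
    per_tier = Counter(bundle["tier"] for bundle in bundles)
    return {
        # Sorted so the serialised output is stable between calls.
        "count_per_tier": dict(sorted(per_tier.items())),
        "total_count": sum(per_tier.values()),
    }
```

Returning plain dicts and ints keeps the result directly JSON-serialisable.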
2024-07-10
- fix serialisation bug in get bundle list
- CEIP-6020 Juan querying accuracy of resource report
- simplify complex code in check_request_specs
2024-07-09
- further discussion with Ashish on publishers and PlatformCommand
- resolve some confused thinking around error handling on PR111
- Argo app-of-apps demo
- Inspector Q2 review
2024-07-08
- co-editing session with Ashish on reading api resource groups and versions
- planning
- debugging with Thomas too.
2024-07-04
- couple of inspector Jiras tidied up
- inspector_ui reviews
- discussion about inspector api / queue or not
2024-07-03
Rights and Permissions Collector (RPC)
database of images we don’t have permission to use.
workflow (termed project) starts when identify image without permission to use
internal users plus suppliers outside ELS
no workflow (Camunda) - though seems like ideal candidate
Java 17, [one of?] the base images has a vulnerability that needs to be patched.
OPWWAT file [vulnerability] testing outstanding
react migration is phased, pending while Cortex work happens
inspector /platforms endpoint; abandon the all clusters endpoint
review all inspector tickets
2024-07-02
- problem (unspecified) with version deployed in prod
- turned out to be double-wrapping of bundle as BundleMetaData
- fix:
- /info in grafana: https://grafana-np.elsevier.net/d/inspector-cli/inspector-bundles?orgId=1
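For illustration only, a sketch of how double-wrapping like the one described above can be guarded against by making the wrap step idempotent. This is not the fix that was applied: the `BundleMetaData` class here is a hypothetical stand-in with a single `bundle` field, and `wrap_once` is an invented helper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BundleMetaData:
    """Hypothetical stand-in for the real BundleMetaData type."""
    bundle: Any


def wrap_once(bundle):
    """Wrap a bundle in BundleMetaData unless it is already wrapped."""
    if isinstance(bundle, BundleMetaData):
        return bundle
    return BundleMetaData(bundle=bundle)
```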
2024-07-01
- publish latest check_request_spec
- CEIP-5938: Update Inspector role access to Agent role
- CEIP-5965 Cortex Metrics for 2024-07
- add new metric
- and prep report
- deploy both nonprod and prod with same version.