October logbook

proposal for cap test to merge into inspector?
- opportunity to revise arch. diag.
- inspector -> cap test -> feature flag -> cap test vsn
get checkmarx working locally
fix SQ not reporting correctly in GHA

2024-10-31

more work on merging cap tests into inspector
engineering forum (karpenter release)
demo ibuild to ops team
sort out inspector_ui Dockerfile (copy not install python code)

2024-10-30

investigate alleged issues with check_resource_specs
extensible poetry build lifecycle (ibuild)
wash up Kong incident with Christian

2024-10-29

Learn: SDLC activity definitions

2024-10-28

KSI replacement for GHA
- approved, applied and merged PR from Friday
cap test in inspector

2024-10-25

KSI replacement for GHA
- syntax error on committed code
- eventually got to the bottom of that with FG help
some random Kong requests that soaked up time to return to Raffa
merged cap test into inspector repository
consideration of how to reuse lifecycle scripts across components
- poetry multiproject plugin does ast rewriting, which seems extreme
- try to solve with common ’library’ project

2024-10-24

nightmare GHA PR from Khush
- large, copilot described and implemented
- discussed refactor and need for separate workflows
  - need a picture
docs PR from > IP
CVE in inspector base image
- nothing to be done?

Scan improvements

Should fail if jf is not installed

JFrog CLI version: /bin/sh: jf: command not found
Scan completed successfully for: inspector:0.8.4-dev
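A minimal sketch of the fail-fast check. It assumes the scan script is Python and only needs `jf` on the `PATH`. `require_jf` is a hypothetical helper name. With this check, a missing CLI stops the run before anything can print "Scan completed successfully".

```python
import shutil
import sys


def require_jf() -> str:
    """Return the path to the JFrog CLI, exiting non-zero if it is not installed."""
    path = shutil.which("jf")
    if path is None:
        # Stop before scanning, so a missing CLI can never look like a passed scan
        sys.exit("JFrog CLI 'jf' not found on PATH; aborting scan")
    return path
```

call it first thing in the scan entry point. `sys.exit` with a string prints the message to stderr and exits with status 1, so the GHA step goes red.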

Put sys.exit calls inside method for better encapsulation

Use method composition for easy to read and test code

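import sys

def scan():
    # each step is a small function that can be tested on its own
    vsn = get_version()
    package()
    _verify_image()
    results = _scan()
    if is_critical(classify(results)):
        # exit non-zero so the CI job fails; a bare sys.exit() exits with 0
        sys.exit(1)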
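One way to make the composed scan easy to test is to pass the steps in, so tests can swap in fakes while the single `sys.exit` stays inside `scan`. This is a sketch under assumptions. The step signatures are hypothetical (`package` takes the version and returns an image ref), and so is the finding shape: a list of dicts with a `severity` key.

```python
import sys
from typing import Callable, Iterable, Mapping


def classify(results: Iterable[Mapping[str, str]]) -> dict:
    """Count findings by severity (assumes each finding has a 'severity' key)."""
    counts: dict = {}
    for finding in results:
        sev = finding.get("severity", "unknown").lower()
        counts[sev] = counts.get(sev, 0) + 1
    return counts


def is_critical(counts: Mapping[str, int]) -> bool:
    return counts.get("critical", 0) > 0


def scan(
    get_version: Callable[[], str],
    package: Callable[[str], str],
    verify_image: Callable[[str], None],
    scan_image: Callable[[str], list],
) -> None:
    # Each step is injected, so tests can replace any of them with a fake
    vsn = get_version()
    image = package(vsn)
    verify_image(image)
    counts = classify(scan_image(image))
    if is_critical(counts):
        # The only exit point; a non-zero status fails the CI job
        sys.exit(f"critical vulnerabilities in {image}: {counts}")
    print(f"Scan completed successfully for: {image}")
```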

how to reuse script in both components?

2024-10-22

clean up secrets-store test
- share single sa?
  - done and parameterised
  - also share secret provider class
- run reconciler
- test new cap test image in prod inspector against ce-nonprod-beta

2024-10-21

jira review
inspector test
- fixing missing permissions: services, then statefulsets
- also permission boundary to DescribeClusters that Khush is working on
BLOG, LOATHE: fine-grained access controls create a disproportionate amount of work and, worse, probably a false sense of security, since no one understands or audits them

2024-10-18

secrets-store cap test failure
- tmp fix to run only on the cluster it was designed for
- longer term, read New Relic from prod platform account

vpa cap test failure

    message: Failed to delete all resource types, 1 remaining: admission webhook "validate.kyverno.svc-fail" denied the request:
    resource Parser/capability-testing/capability-testing-fluent-parser-multi-line-single-line was blocked due to the following policies
    validate-fluent-operator-parser-allowed:
      validate-fluent-operator-parser-allowed-requires-known-spec: '{"regex":{"regex":"^(?<TIME>\d+-\d+-\d+ \d+:\d+:\d+\.\d+)\s+(?<LEVEL>\S+) \d+ --- \[\s*(?<THREAD>[^\]]+main)\] (?<CONTEXT>\S+)\s+: (?<message>.*)$","timeFormat":"%Y-%m-%d %H:%M:%S.%L","timeKey":"TIME"}}' is not among allowed list, allowed are: 'json'.

training
- sdlc101
- information security

2024-10-17

fix: lazily initialise NR so its env vars aren't required by unrelated commands
rollout more inspector permissions: https://github.com/elsevier-centraltechnology/cortex-platform-definitions/pull/5129
broken cap tests in beta
- Keyword ‘pod “test-secrets-store-volume” status in namespace “capability-testing” is READY’ failed after retrying for 2 minutes. The last error was: ‘‘False’==‘True’’ should be true.
- Keyword ‘pod “test-secrets-store-environment” status in namespace “capability-testing” is READY’ failed after retrying for 2 minutes. The last error was: ‘‘False’==‘True’’ should be true.
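A minimal sketch of the lazy-init pattern behind the NR fix on this day. The env var names `NR_ACCOUNT_ID` / `NR_API_KEY` and the helper names are hypothetical. Settings are read on the first call only, so importing the module or running an unrelated command never touches them.

```python
import functools
import os


def _require(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} must be set to report to New Relic")
    return value


@functools.lru_cache(maxsize=None)
def nr_settings() -> tuple:
    """Read New Relic settings on first use, then reuse the cached result."""
    # Hypothetical variable names; only commands that report to NR reach this
    return (_require("NR_ACCOUNT_ID"), _require("NR_API_KEY"))


def build_nr_event(event: dict) -> dict:
    account_id, _api_key = nr_settings()  # env vars are required only from here on
    return {"accountId": account_id, **event}
```

`lru_cache` does not cache a raised exception, so a failed first call is retried once the variables are set.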

2024-10-16: Dev10

GHA still failed (no separate commit for release, lack of clarity on requirement)
Luis test successful, revert the failure flag
FG resignation convo :-(

2024-10-15 (short day)

modify cap test and inspector lambda to support testing a failure thru flags
expedite rollout and testing with Luis
1h on SDLC maturity: eventually Irfan focused convo on traditional software (inspector et al.)
spent a couple of hours with Khush on GHA

2024-10-14

alpha rollout for calico perm change => bundle creation now fine
significant cap test failures
0.8.2 rollout failure
- tagged in github, but with wrong contents
- not deployed to prod lambda
  - 0.8.1 displayed in swagger, 0.8.2-dev in openapi.yaml
  - 0.8.1 from /info too, 0.8.2-dev in pyproject.toml
cronjob for confluence items
- can I run inspector cli from k8s cron or need endpoint?
- pain: build separate image, set lots of env, need to do aws auth => endpoint will be easier
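For the 0.8.2 rollout failure in this entry (tag, swagger, /info, openapi.yaml and pyproject.toml all disagreeing), a pre-release check could compare every place the version shows up against the tag. This is a minimal sketch. `version_mismatches` is a hypothetical helper, and collecting the values from pyproject.toml, openapi.yaml and the deployed /info endpoint is left out.

```python
from typing import Dict, List


def version_mismatches(tag: str, sources: Dict[str, str]) -> List[str]:
    """Describe every source whose reported version differs from the release tag."""
    problems = []
    for source, version in sorted(sources.items()):
        if version != tag:
            problems.append(f"{source}: expected {tag}, found {version}")
    return problems
```

An empty list means it is safe to tag. Anything else should fail the release job before the tag is pushed.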

secrets-store

     Warning  FailedMount  47s  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod capability-testing/test-secrets-store-environment, err: rpc error: code = Unknown desc = eu-west-1: Failed fetching secret core-elsevier-platform-test/test-cluster/test-secret: WebIdentityErr: failed to retrieve credentials
     caused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/6EF491F1E59868EA532811F93EBFB5AA
       status code: 400, request id: 258e9341-58fb-47d5-8c8c-6c19d80501d4
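The error above means the IAM account being assumed into has no OIDC provider for that cluster's issuer. Here is a sketch of a quick check. It assumes the issuer URL has already been taken from the cluster (`cluster.identity.oidc.issuer` in `aws eks describe-cluster`), and that `account_trusts_issuer` runs with credentials for the account that owns the role. Both function names are hypothetical. The boto3 call is imported lazily so the pure matcher has no dependency.

```python
from typing import Iterable


def oidc_provider_registered(issuer_url: str, provider_arns: Iterable[str]) -> bool:
    """True if an IAM OIDC provider ARN matches the cluster's issuer URL.

    IAM provider ARNs end in 'oidc-provider/<issuer without https://>'.
    """
    prefix = "https://"
    host_path = issuer_url[len(prefix):] if issuer_url.startswith(prefix) else issuer_url
    suffix = "oidc-provider/" + host_path.rstrip("/")
    return any(arn.endswith(suffix) for arn in provider_arns)


def account_trusts_issuer(issuer_url: str) -> bool:
    # Needs boto3 and credentials for the account that owns the IAM role
    import boto3

    providers = boto3.client("iam").list_open_id_connect_providers()
    arns = (p["Arn"] for p in providers["OpenIDConnectProviderList"])
    return oidc_provider_registered(issuer_url, arns)
```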

2024-10-11

driving through the permission change affecting bundle creation
discovered Calico permission change was bigger than just the one (no surprise)
generate cap test cronjobs for all alpha clusters
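One way the per-cluster cronjobs could be generated is from a list of cluster names, producing `batch/v1` CronJob manifests. This is a sketch, not the actual generator. The schedule, the `capability-testing` namespace for the CronJob, and the `TARGET_CLUSTER` env var the test image reads are all hypothetical.

```python
from typing import Dict, List


def cap_test_cronjob(cluster: str, image: str, schedule: str = "0 6 * * *") -> Dict:
    """Build a batch/v1 CronJob manifest that runs the cap test for one cluster."""
    container = {
        "name": "cap-test",
        "image": image,
        # Hypothetical: assumes the cap test image reads its target from this env var
        "env": [{"name": "TARGET_CLUSTER", "value": cluster}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"cap-test-{cluster}", "namespace": "capability-testing"},
        "spec": {
            "schedule": schedule,  # daily by default
            "concurrencyPolicy": "Forbid",  # don't overlap runs against one cluster
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {"restartPolicy": "Never", "containers": [container]}
                    }
                }
            },
        },
    }


def cronjobs_for(clusters: List[str], image: str) -> List[Dict]:
    # Sorted so regenerated output diffs cleanly
    return [cap_test_cronjob(c, image) for c in sorted(clusters)]
```

The dicts can be written out with `json.dumps`, since kubectl accepts JSON manifests. CronJob names must stay within 52 characters, so long cluster names may need shortening.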

2024-10-10

cleanup botched release
tried, without success, to pull Khush back onto GHA as automation of the manual process
OKR review
- solo (20 mins)
- with Felipe

2024-10-09

testing updated fluent test with Ashish
fixing tests
accidental release, turned out to be due to the workflow trigger
```
on:
  push:
    tags: ["[0-9]+.[0-9]+.[0-9]+"]
```
though unclear what pushed the tag
attempt to perform second release failed, possibly due to previous but also `poetry
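To see which tags fire that release workflow, the filter can be approximated as a regex. In GHA filter patterns, `+` means one or more of the preceding item, `.` is literal, and the whole tag name has to match. This is a sketch for local checking only. `triggers_release` is a hypothetical helper, not something GitHub provides.

```python
import re

# Approximates the GHA tag filter "[0-9]+.[0-9]+.[0-9]+" as a full-match regex
RELEASE_TAG = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def triggers_release(tag: str) -> bool:
    return RELEASE_TAG.fullmatch(tag) is not None
```

So any bare `x.y.z` tag pushed by anyone or anything starts a release, while `0.8.2-dev` or `v0.8.2` do not.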

2024-10-08

fixing tests
testing updated fluent test with Ashish
retro
run KSI report for Claire
- fix: [get_namespaced_resource](https://github.com/elsevier-centraltechnology/cortex-inspector/pull/218)

2024-10-07

SDLC training
- SDLC 201
  - train, assess, improve
  - maturity assessment: x2 p.a. team exercise for 1h
  - videos 1,2 of 4 (+2 optional)
investigate lack of daily cap test results
- missing env vars
CEIP-6463: update Inspector terraform for recently added env vars
- sorted that eventually but discovered
- strayed into diff between mocked and actual cap test responses in inspector
release new capability test image (0.4.0)
- manually tagged release (TODO)

2024-10-01 - 04

predominantly Inspector
- discuss and impl API change with Luis and Garrett
- integrate Khush’s PRs with inspector decorator and base class changes
- some tweaks to Cap Test to report as desired
- prep for release of both