2023-09-29

  • Ensure NPS ready for Monday
  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • now need to get the sync hook working
  • Build ppe alpha cluster
    • failed - of course, reports validation error on platform.yaml that appears to be spurious
      • stripped leading empty line and leading zero on cost code.
      • raised: CEIP-4542, CEIP-4543, CEIP-4544, CEIP-4545
      • and then the cluster creation failed within the reconciler!

2023-09-28

  • CEIP-4388: Package and evaluate Robot within an Argo hook
  • dev10 1/2 day for camunda con

2023-09-27

  • CEIP-4388: Package and evaluate Robot within an Argo hook

    • Tests running and passing in Argo
    • next steps: Argo sync hook
  • dev10 1/2 day for camunda con

    • public and private Connector marketplace
    • maturity model
    • orchestration

2023-09-26

  • dev duty
  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • COB: job running in Argo cluster but failing on roles

2023-09-25

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • COB: sorted push to artifactory but not pull

2023-09-24

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • create hello world job
    • figure out how to connect to Argo dev cluster (cortex-platform-manager-non-prod?)
    • deploy it.

2023-09-21

  • CEIP-4388: Package and evaluate Robot within an Argo hook
  • CEIP-4446: build cluster for SSDR (and deal with first failure)
    • internal network issue resolved by James Halstead
  • review AN’s robot PR

2023-09-21

2023-09-20

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • complete docker packaging
  • test coverage of ClusterOperator

2023-09-19

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • fix tests for running in Cortex
  • training video w Claire
  • CEIP-4446: build cluster for SSDR (and deal with first failure)

2023-09-18

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • building docker image, resolving zscaler issues for apk
    • complete aws auth inside docker

2023-09-15

  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • complete cortex tests to set KUBECONFIG inside tests
    • building docker image, resolving zscaler issues for pip
    • start trying to do aws auth inside docker

2023-09-14

  • Maxim ‘Max’ Khan
    • cross-func teams
    • solve the tradeoff
    • ingredients for success
      • people: hungry, humble, smart
      • clarity over certainty
      • keep the enterprise hat on (strategic alignment of the trenches)

2023-09-13

2023-09-12

  • CEIP-4390 Investigate metric-server availability
    • Run thru tests again
    • commit PR
  • CEIP-4388: Package and evaluate Robot within an Argo hook
    • create Docker image: stuck on authenticating to aws and kubeconfig -> experimental python code

2023-09-11

  • CEIP-4390 Investigate metric-server availability
    • Chat with Joe: should be able to override CGNP
    • Added more K8s tests (+ve and -ve) to match the docs which JD and FG signed off

2023-09-08

  • CEIP-4390 Investigate metric-server availability
    • morning investigating (w FG and AN)
    • afternoon rewriting docs

2023-09-07

  • 1-2-1
  • engineering mtg:
    • fluentbit operator
    • kube library
  • adviser with Khush

2023-09-06

  • question: who is using each feature of platform?

    • could do in crtxctl
    • could do in kube-library (only one cluster at a time)
    • could do in troubleshoot?
    • specific questions:
      • who is using K8s network policies
  • CEIP-4390 Investigate metric-server availability

    • now seems to be netpol
      • export NAMESPACE=capability-testing; kubectl get ns ${NAMESPACE} -o json | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/${NAMESPACE}/finalize" -f -
    • long call with GP, FG, TV. Not able to identify the issue, but in desperation tried the example in the docs
      • COB Weds:
        • global policy in docs is sufficient to deny access from test namespace whilst allowing metrics server to continue working
        • therefore dropped previous global policy for now (though the ability for naive users to break the whole cluster with tools we give them seems like a bug)
        • attempted to deploy other 3 (k8s) policies listed in docs and example 3 & 4 won’t even validate!
        • example 3: error: error validating "testcases/calico/resources/example3.yaml": error validating data: ValidationError(NetworkPolicy.spec.podSelector): unknown field "matchlabels" in io.k8s.apimachinery.pkg.apis.meta.v1.LabelSelector; if you choose to ignore these errors, turn validation off with --validate=false
        • example 4: error: error validating "testcases/calico/resources/example4.yaml": error validating data: [ValidationError(NetworkPolicy.spec.ingress[0]): unknown field "to" in io.k8s.api.networking.v1.NetworkPolicyIngressRule, ValidationError(NetworkPolicy.spec.podSelector): unknown field "matchlabels" in io.k8s.apimachinery.pkg.apis.meta.v1.LabelSelector]; if you choose to ignore these errors, turn validation off with --validate=false
    • retro: I would not buy this product, might put up with it if it was free
  • Respond to Office Vibe (not sent)
    • feedback: Radical simplification of technologies. Strong engagement with external trends, especially open ones, rather than trying to plough own furrow. This may cost productivity short term but prevents cul-de-sacs and maximises the good or successful choices.
    • Felipe Grazziotin - Sep 5, 2023: Thank you for the frank feedback. Can you ping me so we can document examples where we are being more insular than needed, so we can bring this back to the Senior Management Team in order to make real change.
    • my reply:

    Today’s example: Calico deployed to offer network policy enforcement on what we’ve admitted was pretty well a brochureware justification.

    I don’t believe we have engaged with the product enough to understand it let alone diagnose it when it goes wrong. And we have placed this in Partner hands.

    If we had someone who was at least superficially engaged in the Calico community we would have relationships to reach out to for help and in time even be able to provide that help ourselves.

    We’ve seen similar things with Skipper and I suspect it is probably true of every component.

    Not doing this is basically just a hostage to fortune. We’ll get away with it most of the time I expect and if that is the business decision it may be a cost effective one. However in that case it would be better to scrap the pretense of platform engineering and allow each team to do the simplest thing that works.

    The third option is to say: 80% of Partners don’t use 80% of the platform, so let’s simplify by dropping it.
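
Note on the example 3 & 4 validation errors under CEIP-4390 above: both look like plain field-name mistakes in the docs rather than anything Calico-specific. The Kubernetes LabelSelector field is `matchLabels` (camelCase), and an ingress rule takes `from` (only egress rules take `to`). A minimal Python sketch of a pre-check for these two mistakes follows. The policy contents, label values and the `unknown_fields` helper are made up for illustration; it only checks the fields named in the errors, not the full schema.

```python
# Allowed field names, per the networking.k8s.io/v1 NetworkPolicy schema.
LABEL_SELECTOR_FIELDS = {"matchLabels", "matchExpressions"}
INGRESS_RULE_FIELDS = {"from", "ports"}
EGRESS_RULE_FIELDS = {"to", "ports"}


def unknown_fields(policy: dict) -> list[str]:
    """Return dotted paths of the fields kubectl validation would reject."""
    problems = []
    spec = policy.get("spec", {})
    # podSelector is a LabelSelector: field names are case-sensitive.
    for key in spec.get("podSelector", {}):
        if key not in LABEL_SELECTOR_FIELDS:
            problems.append(f"spec.podSelector.{key}")
    # Ingress rules use "from", egress rules use "to".
    for section, allowed in (
        ("ingress", INGRESS_RULE_FIELDS),
        ("egress", EGRESS_RULE_FIELDS),
    ):
        for i, rule in enumerate(spec.get(section, [])):
            for key in rule:
                if key not in allowed:
                    problems.append(f"spec.{section}[{i}].{key}")
    return problems


# Hypothetical policy reproducing the example 4 mistakes (labels are invented).
broken = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "spec": {
        "podSelector": {"matchlabels": {"app": "demo"}},
        "ingress": [{"to": [{"podSelector": {}}]}],
    },
}

# Same policy with the field names corrected.
fixed = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "spec": {
        "podSelector": {"matchLabels": {"app": "demo"}},
        "ingress": [{"from": [{"podSelector": {}}]}],
    },
}

print(unknown_fields(broken))
print(unknown_fields(fixed))
```

If this holds up, the fix is to the docs examples themselves; kubectl validation is doing its job here.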

2023-09-05

  • CEIP-4139: capability testing
    • get Ashish up and running with KubeLibrary
    • start looking at docker image
  • CEIP-4390 Investigate metric-server availability
    • significant investigation with Ashish and Thomas and Joe
  • standup and planning
    • suggestion about Jira based planning
  • slack exchange about Go/Python/Robot with FG

2023-09-04

  • ops duty
    • recurrent tasks: metrics
  • CEIP-4139: capability testing
    • respond to Garrett’s comments

    • conversation with Garrett and Matteo fell into two parts: go and robot. This is the go part:

      • maintenance pain: go has a promise of backward compatibility

        • Matteo: expect that writing own tool chain in go would produce lower long term maintenance
      • desire for live documentation: support exists for cucumber, what about robot?

        • for example statically generated at test time
      • Kubernetes’ own end-to-end tests have something.

      • benefit of integrating capability tests closely with PlatMan / Build responsibilities

        • converge Build and Ops teams
        • end to end tests extracted to own ‘QA’ codebase, inc. dashboard etc.
      • go is engineering language of choice here (Garrett)

        • 99% (ish) SRE / K8s tools are written in go
          • this permits us to understand things that are not perfectly documented (which nothing ever is).
        • not a language that is easily abused (fewer ‘foot-guns’ as Felipe would say)
          • doesn’t have the multiplicity of options that Java / Python has
        • Go has better (single, integrated) build chain
      • benefit of single language choice across whole ‘team’ (ce? Elsevier?)

        • difficulty of being two teams
    • the robot part was added to the RFP

2023-09-01

  • dev duty: fluent bit operator???

  • CEIP-4139: capability testing

    • add a ‘wait until’ on namespace delete
  • crtxctl

    • crtxctl.get_cluster_operator
    • why GH_TOKEN mandatory?
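
For the namespace-delete wait noted under CEIP-4139 above, a rough Python sketch of the polling idea. It assumes `kubectl` is on the PATH and already pointed at the right cluster; `namespace_exists` and `wait_until_deleted` are hypothetical helper names, not anything from crtxctl or KubeLibrary.

```python
import subprocess
import time
from typing import Callable


def namespace_exists(name: str) -> bool:
    """True if `kubectl get namespace NAME` succeeds.

    Caveat: any non-zero exit (e.g. an auth failure) is treated as
    "does not exist", so this is only a sketch.
    """
    result = subprocess.run(
        ["kubectl", "get", "namespace", name],
        capture_output=True,
    )
    return result.returncode == 0


def wait_until_deleted(
    name: str,
    exists: Callable[[str], bool] = namespace_exists,
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Poll until the namespace is gone; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if not exists(name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would be something like `wait_until_deleted("capability-testing")` straight after issuing the delete. `kubectl wait --for=delete namespace/NAME --timeout=...` may do the same job without any code, so this is only worth it if the wait has to live inside the test library.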