Next:

alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"

  • report on calico global network policy use
  • report on priority class usage
  • report on hpa use

2024-03-29: Good Friday

2024-03-28

  • CEIP-5504:
    • change permission boundary in bootstraps to permit capability tests to run under Inspector Agent role
    • engineering forum about tier 1
    • dev duty (minimal)

2024-03-27

2024-03-26

2024-03-25

  • CEIP-5504: external-dns post-sync hook still failing, investigate
    unset AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN NAMESPACE KUBECONFIG RPROMPT
    cortex-prod-admrole 
    assume_role_helper arn:aws:iam::183742092277:role/Core-Elsevier-Platform-Service-Role
    assume_role_helper arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    
    produces
    2024-03-25 13:00:49 INFO Successful retrieved credentials for account: 183742092277
    2024-03-25 13:00:49 INFO Assumed role: EnterpriseAdmin
    2024-03-25 13:00:49 INFO Credentials expire at: 2024-03-25 17:00:48 +0000 GMT
    {
        "UserId": "AROASVR7B2P2526NVLCXP:assumed-role",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/Core-Elsevier-Platform-Service-Role/assumed-role"
    }
    An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::183742092277:assumed-role/Core-Elsevier-Platform-Service-Role/assumed-role is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    {
        "UserId": "AROASVR7B2P2ZFQPG4MMC:stephensont@science.regn.net",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/AWSReservedSSO_EnterpriseAdmin_52f405afc5c213cb/stephensont@science.regn.net"
    }
    
    then:
    unset AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN NAMESPACE KUBECONFIG RPROMPT
    cortex-prod-admrole 
    assume_role_helper arn:aws:iam::183742092277:role/Cortex-Inspector                   
    assume_role_helper arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    
    produces
    2024-03-25 13:03:17 INFO Successful retrieved credentials for account: 183742092277
    2024-03-25 13:03:17 INFO Assumed role: EnterpriseAdmin
    2024-03-25 13:03:17 INFO Credentials expire at: 2024-03-25 17:03:16 +0000 GMT
    
    {
        "UserId": "AROASVR7B2P2VPPTWVJQQ:assumed-role",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/Cortex-Inspector/assumed-role"
    }
    
    An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::183742092277:assumed-role/Cortex-Inspector/assumed-role is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    {
        "UserId": "AROASVR7B2P2ZFQPG4MMC:stephensont@science.regn.net",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/AWSReservedSSO_EnterpriseAdmin_52f405afc5c213cb/stephensont@science.regn.net"
    }
    
  • CEIP-5505: Trusted Registries Check for 2024-03
    • passes?
      • need to explain quay.io reported when artifactory is supposed to be mirroring
  • Retro
    • requests-limits:
      • document best practice
      • kyverno policy for request and limit < 2x request
      • unless label explicitly takes over responsibility
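
For the CEIP-5504 AccessDenied errors above: a cross-account sts:AssumeRole call needs an identity-side allow on the caller (inside any permission boundary) and a trust policy on the target role that names the caller. A minimal sketch for inspecting the trust side, assuming credentials that are allowed to call iam:GetRole in the target account (781632261136). The helper names arn_account, arn_role_name and show_trust are hypothetical, not part of assume_role_helper:

```shell
# Extract fields from an IAM role ARN of the form arn:aws:iam::<account>:role/<name>.
arn_account() { printf '%s\n' "$1" | cut -d: -f5; }
arn_role_name() { printf '%s\n' "${1##*/}"; }

# Print the current caller, then the principals the target role trusts.
# Assumption: run with credentials valid in the account that owns the role.
show_trust() {
  aws sts get-caller-identity --query Arn --output text
  aws iam get-role --role-name "$(arn_role_name "$1")" \
    --query 'Role.AssumeRolePolicyDocument.Statement[].Principal' --output json
}
```

Usage would be along the lines of: show_trust arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent. If the caller's role ARN is not among the listed principals, the trust policy is the likely culprit; otherwise look at the caller's own policies and permission boundary.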

2024-03-22

  • CEIP-5504: fix capability test image after move to poetry
    • tripped up over Karpenter-induced instability of argo dev
  • CEIP-5443: discuss w Khush and show him a way based on GitHub API and token

2024-03-21

  • external dns release
  • external dns test to apply to all clusters

2024-03-20

2024-03-19

  • dev duty
  • vulnerability patching
  • long chat with TV about csi secrets capability test

2024-03-18

  • core-kong-operations
    • ended on discussion with FG about moving from ALB and nginx listener conf to more conventional Skipper based
    • handed to him

2024-03-15: vacation

2024-03-14

  • ceip-4469 ksi migration kong
    • debugging the control plane portion of kong post refactoring
  • advisor
    • talked with KA
  • town hall
    • still over head count by 3%
    • advance science, benefit society
      • measured in customer spend!
    • vulnerability scanning continuously not at year end this year
    • adding sdlc
    • prioritise personal growth: growing @ tech (on nonsolus)
      • learning @ tech
      • communities @
      • wisdom vault @
    • sdlc
      • 30% in company less than 2 years
      • supporting people to get the job done

2024-03-13

2024-03-12

  • resolve the crtxctl cannot be released issue.
    • ended up simply applying the existing code in tio-terraformcontrol-ce/702267635140/oidc-github-actions but went via a big diversion on why there was a massive drift in tio-terraformcontrol-ce/702267635140/github-actions
  • worked with Khush on getting crtxctl into inspector
    • ultimately agreeing with Ashish that ’normal’ use of crtxctl should be via Inspector role not EnterpriseAdmin:
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
      $(aws sts assume-role \
      --role-arn arn:aws:iam::781632261136:role/Cortex-Inspector \
      --role-session-name test-as-inspector \
      --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
      --output text))
      
    • daniel flagged that hardcoding any role in crtxctl may be problematic in a TPR process.
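
The assume-role one-liner in the crtxctl item above exports empty values if the aws call fails. A sketch of the same idea as a function that stops on failure, taking the role ARN as an argument rather than hardcoding it; assume_role_env and export_creds are hypothetical names, and the default session name is just the one used above:

```shell
# Split the whitespace-separated "key secret token" line and export the variables.
export_creds() {
  local key secret token
  read -r key secret token <<EOF
$1
EOF
  # Refuse to export anything if fewer than three fields came back.
  [ -n "$token" ] || { echo "incomplete credentials" >&2; return 1; }
  export AWS_ACCESS_KEY_ID="$key" AWS_SECRET_ACCESS_KEY="$secret" AWS_SESSION_TOKEN="$token"
}

# Usage: assume_role_env <role-arn> [session-name]
assume_role_env() {
  local role_arn="$1" session_name="${2:-test-as-inspector}" creds
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text) || return 1
  export_creds "$creds"
}
```

For example: assume_role_env arn:aws:iam::781632261136:role/Cortex-Inspector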

2024-03-11

  • retro
    • TIL: FG advocate specific role per workflow so can easily see role has minimum expected permissions
  • ceip-4469 ksi migration kong
    • need to get a way to confirm control plane on infra then replicate on labs

Ref step 6, validate curl -i -X GET --url http://localhost:8001/services

curl -v -i -X GET --url http://internal-a47cfe2ec65b949d68a57c91bdb66f55-330229102.eu-west-1.elb.amazonaws.com:8001/services
...
* Empty reply from server

then

curl -v -i -X GET --url http://internal-a47cfe2ec65b949d68a57c91bdb66f55-330229102.eu-west-1.elb.amazonaws.com:8005
...
HTTP/1.1 400 Bad Request
...
The plain HTTP request was sent to HTTPS port

use TrustStore

curl --cacert mtls-ca-labs-TrustStore -v -i -X GET --url https://internal-aacaa81bace024a268c01e2e757f205e-171008392.eu-west-1.elb.amazonaws.com:8005

*   Trying 100.64.1.46:8005...
* Connected to internal-aacaa81bace024a268c01e2e757f205e-171008392.eu-west-1.elb.amazonaws.com (100.64.1.46) port 8005 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: mtls-ca-labs-TrustStore
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Request CERT (13):
* (304) (IN), TLS handshake, Certificate (11):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

trying to get a positive test from the infra cluster that is working but no dice

 curl --cacert mtls-ca-infra-TrustStore -v -i -X GET --url https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 100.64.1.52:8005...
* Connected to cluster.infra.kong-nonprod.cortex.elsevier.systems (100.64.1.52) port 8005 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: mtls-ca-infra-TrustStore
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Request CERT (13):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Certificate (11):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: C=GB; ST=Greater London; L=London; O=Elsevier Core Engineering; CN=cluster.infra.kong-nonprod.cortex.elsevier.systems
*  start date: Feb  8 00:00:33 2024 GMT
*  expire date: Apr  8 00:00:33 2024 GMT
*  subjectAltName: host "cluster.infra.kong-nonprod.cortex.elsevier.systems" matched cert's "cluster.infra.kong-nonprod.cortex.elsevier.systems"
*  issuer: C=GB; ST=Greater London; L=London; O=Elsevier Core Engineering; CN=Elsevier Kong mTLS CA Intermediary I28
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /services HTTP/1.1
> Host: cluster.infra.kong-nonprod.cortex.elsevier.systems:8005
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
HTTP/1.1 400 Bad Request
< Date: Fri, 08 Mar 2024 17:54:08 GMT
Date: Fri, 08 Mar 2024 17:54:08 GMT
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
< Content-Length: 202
Content-Length: 202
< Connection: close
Connection: close

< 
<html>
<head><title>400 No required SSL certificate was sent</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<center>No required SSL certificate was sent</center>
</body>
</html>
* Closing connection 0
curl --cert infra-cert --key infra-key --cacert mtls-ca-infra-TrustStore https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
</body>
</html>
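
The curl attempts above step through plain HTTP, CA only, then CA plus client cert. A rough sketch that turns the curl exit code and HTTP status into the diagnosis each step pointed to; diagnose and probe are hypothetical helpers, and the wording of the messages is my own reading of the output above:

```shell
# diagnose <curl-exit-code> <http-status>
diagnose() {
  case "$1:$2" in
    52:*)  echo "empty reply from server" ;;
    60:*)  echo "server certificate not trusted by the supplied CA bundle" ;;
    0:400) echo "400: plain HTTP on a TLS port, or no client certificate sent" ;;
    0:404) echo "TLS and client certificate accepted; path not found" ;;
    0:2*)  echo "ok" ;;
    *)     echo "unclassified: curl exit $1, HTTP $2" ;;
  esac
}

# probe <url> [extra curl args...]  e.g. --cacert, --cert, --key
probe() {
  local url="$1" status rc
  shift
  # -w prints only the status code; the response body is discarded.
  status=$(curl -s -o /dev/null -w '%{http_code}' "$@" "$url") && rc=0 || rc=$?
  diagnose "$rc" "$status"
}
```

For example: probe https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services --cacert mtls-ca-infra-TrustStore --cert infra-cert --key infra-key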

2024-03-08

  • discuss robot KubeLibrary incluster config with Ashish
  • answer Shane on dataplatform about pulling images from Artifactory
  • 1-2-1:
    • mostly chat about Kong
      • shared concern about ending with diff spaghetti
    • potential OKR around serverless?
      • concern that not really visible
  • potential OKR around SBOM / support matrix: https://global-elsevier.slack.com/archives/C030F90FM7U/p1709892022071709
  • ceip-4469 ksi migration kong: checked all values against infra, corrected a couple of cert-related ones; still unable to verify control plane

2024-03-07

2024-03-06

  • ceip-4469 ksi migration kong

    • continue on debugging connection of kong to db
      • does user exist? (previously done by k8s job)
  • potential blog / shower thought: one of the things that has been on my mind recently is the ‘slow-moving’ projects you refer to. A corollary of being slow-moving is that when change does come, it is inevitable that the world has moved on.

2024-03-05

  • ceip-4469 ksi migration kong
    • need permissions on kong/labs/lic
      {
        "Version" : "2012-10-17",
        "Statement" : [ {
          "Sid" : "AllowUseOfKey",
          "Effect" : "Allow",
          "Principal" : {
            "AWS" : "arn:aws:iam::595468393306:root"
          },
          "Action" : "secretsmanager:GetSecretValue",
          "Resource" : "*"
        }, {
           "Sid" : "AllowUseOfKey2",
          "Effect" : "Allow",
          "Principal" : {
            "AWS" : "arn:aws:iam::595468393306:role/nonprod-ctrl-labs-20240228-1648"
          },
          "Action" : "secretsmanager:GetSecretValue",
          "Resource" : "*"
        }, {
          "Sid" : "AllowRotatorLambdaToUpdate",
          "Effect" : "Allow",
          "Principal" : {
            "AWS" : "arn:aws:iam::595468393306:role/kong-mtls-ca-nonprod-role"
          },
          "Action" : [ "secretsmanager:PutSecretValue", "secretsmanager:GetSecretValue" ],
          "Resource" : "*"
        } ]
      }
      
    • many secrets empty, had to copy values manually from infra to labs, inc:
      • kong/labs/pg_ca_authority
      • kong/labs/newrelic-nri
      • gui_auth_conf
      • kong_pg_password
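
The manual copy of secret values from infra to labs could be scripted roughly as below. This is only a sketch: it assumes the infra secrets are named like the labs ones with /infra/ in place of /labs/ (not confirmed in these notes), that both are readable and writable with the current credentials, that the labs secret already exists, and that the values are strings rather than binary. labs_name and copy_secret are hypothetical names:

```shell
# Map an infra secret name to its labs counterpart (naming scheme is an assumption).
labs_name() { printf '%s\n' "$1" | sed 's#/infra/#/labs/#'; }

# Copy the current string value of one secret from infra to labs.
# Note: the value is passed on the command line, so it is visible to local process listings.
copy_secret() {
  local src="$1" dst value
  dst=$(labs_name "$src")
  value=$(aws secretsmanager get-secret-value --secret-id "$src" \
    --query SecretString --output text) || return 1
  aws secretsmanager put-secret-value --secret-id "$dst" --secret-string "$value"
}
```

For example: copy_secret kong/infra/pg_ca_authority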

2024-03-04

  • ceip-4469 ksi migration kong
    • OK following the breadcrumbs:
      1. helm chart creates service account labs/labs-20240228-1648-kong

      2. service account has annotation eks.amazonaws.com/role-arn: arn:aws:iam::595468393306:role/nonprod-ctrl-labs-20240228-1648

      3. ^ role has policy nonprod-ctrl-labs-20240228-1648-0 including: "Action": [ "secretsmanager:List*", "secretsmanager:Get*", "secretsmanager:Describe*" ], "Effect": "Allow", "Resource": [ … "arn:aws:secretsmanager:eu-west-1:595468393306:secret:kong/mtls-ca/labs/Root/cert-*", … ]

      4. pod labs/labs-20240228-1648-kong-init-migrations-zjb27 is attempting to start using service account labs-20240228-1648-kong in namespace labs (as expected) a) controlled by Job/labs-20240228-1648-kong-init-migrations … yet the pod fails to start with:

         MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod labs/labs-20240228-1648-kong-init-migrations-zjb27, err: rpc error: code = Unknown desc = eu-west-1: Failed fetching secret kong/mtls-ca/labs/Root/cert: WebIdentityErr: failed to retrieve credentials

         which seems to indicate the pod is not executing under the expected role.

      5. turned out to be the trust relationship, which used an ARN ending, not the service account name

         "ForAnyValue:StringLike": {
           "oidc.eks.eu-west-1.amazonaws.com/id/ECFA91D307CF68599D4A8B78A4C4B6F4:sub": "system:serviceaccount:labs:labs-20240228-1648-kong"
         }
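
The breadcrumb trail above boils down to two lookups: the role ARN annotated on the service account, and whether that role's trust policy mentions the service account's sub claim. A small sketch of that check; expected_sub, sa_role_arn and check_trust are hypothetical helper names, and the literal grep will miss trust policies that match the sub with a wildcard:

```shell
# Build the OIDC "sub" value for a service account.
expected_sub() { printf 'system:serviceaccount:%s:%s\n' "$1" "$2"; }

# Read the IRSA role annotation from the service account.
# Usage: sa_role_arn <namespace> <service-account>
sa_role_arn() {
  kubectl get serviceaccount -n "$1" "$2" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Usage: check_trust <role-name> <namespace> <service-account>
check_trust() {
  local sub
  sub=$(expected_sub "$2" "$3")
  if aws iam get-role --role-name "$1" \
       --query 'Role.AssumeRolePolicyDocument' --output json | grep -qF -- "$sub"; then
    echo "trust policy mentions $sub"
  else
    echo "trust policy does not mention $sub"
  fi
}
```

For example: check_trust nonprod-ctrl-labs-20240228-1648 labs labs-20240228-1648-kong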
      

2024-03-01

  • ceip-5329: remove deprecated template plugin from kong-mtls-rotation
    • TIL: terraform state can include providers that are no longer required by the IaC, so even after an upgrade you still need the providers you upgraded from!
  • ceip-4469 ksi migration kong
    • back to the point that helm install does not have permissions to mount the secrets
  • play around with robot / selenium