Next:
alias cortex-prod-admrole="aws sso login --profile cortex-prod-admrole && export RPROMPT=cortex-prod-admrole && export KUBECONFIG=~/.kube/clusters/cortex-prod-cluster"
- report on calico global network policy use
- report on priority class usage
- report on hpa use
2024-03-29: Good Friday
2024-03-28
- CEIP-5504:
- change permission boundary in bootstraps to permit capability tests to run under Inspector Agent role
- engineering forum about tier 1
- dev duty (minimal)
2024-03-27
- OKRs into Workday
- CEIP-5504:
- add permissions needed by robot tests to: https://github.com/elsevier-centraltechnology/cortex-inspector/blob/main/modules/cortex-inspector-agent/iam.tf
- Fix crtxctl to assume Inspector then Agent rather than Agent directly
2024-03-26
- CEIP-5504:
- Allow platform prod Inspector to assume Inspector agent and prevent Service Role: https://github.com/elsevier-centraltechnology/cortex-platform-infrastructure/pull/147
- Enable inspector to assume inspector agent role: https://github.com/elsevier-centraltechnology/tio-terraformcontrol-ce/pull/1282
- Adopt Inspector-Inspector-Agent within locally running containerised capability tests: https://github.com/elsevier-centraltechnology/cortex-operations/pull/395
- Still to come:
- Expand permissions so Agent can run the current tests
- Fix crtxctl to assume Inspector then Agent rather than Agent directly
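A minimal sketch of the "Inspector then Agent" chain, not how crtxctl actually does it. The function names `assume_role` and `assume_inspector_then_agent` and the session names are hypothetical; the `aws sts assume-role` flags are the same ones used elsewhere in these notes.

```shell
# Hypothetical helper: assume a role with whatever credentials are current
# and export the resulting temporary credentials into the environment.
assume_role() {
  role_arn=$1
  session_name=$2
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  # --output text yields three whitespace-separated fields.
  set -- $creds
  [ "$#" -eq 3 ] || return 1
  export AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" AWS_SESSION_TOKEN="$3"
}

# Chain: current identity -> Inspector -> Inspector Agent.
# The second hop only runs if the first one succeeded.
assume_inspector_then_agent() {
  inspector_arn=$1
  agent_arn=$2
  assume_role "$inspector_arn" inspector &&
    assume_role "$agent_arn" inspector-agent
}
```

Usage would look like `assume_inspector_then_agent arn:aws:iam::183742092277:role/Cortex-Inspector arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent` followed by `aws sts get-caller-identity` to confirm the final identity. The second hop only works once the Agent role's trust policy allows the Inspector role to assume it.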
2024-03-25
- CEIP-5504: external-dns post-sync hook still failing, investigate
    unset AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN NAMESPACE KUBECONFIG RPROMPT
    cortex-prod-admrole
    assume_role_helper arn:aws:iam::183742092277:role/Core-Elsevier-Platform-Service-Role
    assume_role_helper arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
  produces:
    2024-03-25 13:00:49 INFO Successful retrieved credentials for account: 183742092277
    2024-03-25 13:00:49 INFO Assumed role: EnterpriseAdmin
    2024-03-25 13:00:49 INFO Credentials expire at: 2024-03-25 17:00:48 +0000 GMT
    {
        "UserId": "AROASVR7B2P2526NVLCXP:assumed-role",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/Core-Elsevier-Platform-Service-Role/assumed-role"
    }
    An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::183742092277:assumed-role/Core-Elsevier-Platform-Service-Role/assumed-role is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    {
        "UserId": "AROASVR7B2P2ZFQPG4MMC:stephensont@science.regn.net",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/AWSReservedSSO_EnterpriseAdmin_52f405afc5c213cb/stephensont@science.regn.net"
    }
  then:
    unset AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN NAMESPACE KUBECONFIG RPROMPT
    cortex-prod-admrole
    assume_role_helper arn:aws:iam::183742092277:role/Cortex-Inspector
    assume_role_helper arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
  produces:
    2024-03-25 13:03:17 INFO Successful retrieved credentials for account: 183742092277
    2024-03-25 13:03:17 INFO Assumed role: EnterpriseAdmin
    2024-03-25 13:03:17 INFO Credentials expire at: 2024-03-25 17:03:16 +0000 GMT
    {
        "UserId": "AROASVR7B2P2VPPTWVJQQ:assumed-role",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/Cortex-Inspector/assumed-role"
    }
    An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::183742092277:assumed-role/Cortex-Inspector/assumed-role is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::781632261136:role/test-cluster-Cortex-Inspector-Agent
    {
        "UserId": "AROASVR7B2P2ZFQPG4MMC:stephensont@science.regn.net",
        "Account": "183742092277",
        "Arn": "arn:aws:sts::183742092277:assumed-role/AWSReservedSSO_EnterpriseAdmin_52f405afc5c213cb/stephensont@science.regn.net"
    }
- CEIP-5505: Trusted Registries Check for 2024-03
- passes?
- need to explain quay.io reported when artifactory is supposed to be mirroring
- Retro
- requests-limits:
- document best practice
- kyverno policy for request and limit < 2x request
- unless label explicitly takes over responsibility
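A rough sketch of the rule the policy would enforce, written as a plain shell check rather than as Kyverno policy. It assumes CPU quantities only (`500m`, `1`, `1.5`). The function names and the third "exempt" argument, which stands in for the opt-out label, are hypothetical.

```shell
# Convert a Kubernetes CPU quantity ("500m", "2", "1.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Succeeds when the limit is strictly below twice the request, or when the
# workload has explicitly opted out (third argument "true", standing in for
# the label that takes over responsibility).
limit_within_2x() {
  request=$(cpu_to_millicores "$1")
  limit=$(cpu_to_millicores "$2")
  exempt=${3:-false}
  [ "$exempt" = "true" ] && return 0
  [ "$limit" -lt $((request * 2)) ]
}
```

For example, `limit_within_2x 500m 900m` succeeds, while `limit_within_2x 500m 1` fails because the limit equals exactly twice the request.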
2024-03-22
- CEIP-5504: fix capability test image after move to poetry
- tripped up over Karpenter-induced instability of argo dev
- CEIP-5443: discuss w Khush and show him a way based on GitHub API and token
2024-03-21
- external dns release
- external dns test to apply to all clusters
2024-03-20
- core-kong-operations
- cleanup of operations:
2024-03-19
- dev duty
- vulnerability patching
- long chat with TV about csi secrets capability test
2024-03-18
- core-kong-operations
- ended on discussion with FG about moving from ALB and nginx listener conf to a more conventional Skipper-based setup
- handed to him
2024-03-15: vacation
2024-03-14
- ceip-4469 ksi migration kong
- debugging the control plane portion of kong post refactoring
- advisor
- talked with KA
- town hall
- still over head count by 3%
- advance science, benefit society
- measured in customer spend!
- vulnerability scanning continuously not at year end this year
- adding sdlc
- prioritise personal growth: growing @ tech (on nonsolus)
- learning @ tech
- communities @
- wisdom vault @
- sdlc
- 30% in company less than 2 years
- supporting people to get the job done
2024-03-13
2024-03-12
- resolve the crtxctl cannot be released issue.
- ended up simply applying the existing code in tio-terraformcontrol-ce/702267635140/oidc-github-actions but went via a big diversion on why there was a massive drift in tio-terraformcontrol-ce/702267635140/github-actions
- worked with Khush on getting crtxctl into inspector
- ultimately agreeing with Ashish that 'normal' use of crtxctl should be via Inspector role not EnterpriseAdmin:
    export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
      $(aws sts assume-role \
        --role-arn arn:aws:iam::781632261136:role/Cortex-Inspector \
        --role-session-name test-as-inspector \
        --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
        --output text))
- Daniel flagged that hardcoding any role in crtxctl may be problematic in a TPR process.
2024-03-11
- retro
- TIL: FG advocates a specific role per workflow, so it is easy to see that the role has the minimum expected permissions
- ceip-4469 ksi migration kong
- need to get a way to confirm control plane on infra then replicate on labs
Ref step 6, validate
curl -i -X GET --url http://localhost:8001/services
curl -v -i -X GET --url http://internal-a47cfe2ec65b949d68a57c91bdb66f55-330229102.eu-west-1.elb.amazonaws.com:8001/services
...
* Empty reply from server
then
curl -v -i -X GET --url http://internal-a47cfe2ec65b949d68a57c91bdb66f55-330229102.eu-west-1.elb.amazonaws.com:8005
...
HTTP/1.1 400 Bad Request
...
The plain HTTP request was sent to HTTPS port
use TrustStore
curl --cacert mtls-ca-labs-TrustStore -v -i -X GET --url https://internal-aacaa81bace024a268c01e2e757f205e-171008392.eu-west-1.elb.amazonaws.com:8005
* Trying 100.64.1.46:8005...
* Connected to internal-aacaa81bace024a268c01e2e757f205e-171008392.eu-west-1.elb.amazonaws.com (100.64.1.46) port 8005 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: mtls-ca-labs-TrustStore
* CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Request CERT (13):
* (304) (IN), TLS handshake, Certificate (11):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
trying to get a positive test from the infra cluster that is working but no dice
curl --cacert mtls-ca-infra-TrustStore -v -i -X GET --url https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 100.64.1.52:8005...
* Connected to cluster.infra.kong-nonprod.cortex.elsevier.systems (100.64.1.52) port 8005 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: mtls-ca-infra-TrustStore
* CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Request CERT (13):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Certificate (11):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted http/1.1
* Server certificate:
* subject: C=GB; ST=Greater London; L=London; O=Elsevier Core Engineering; CN=cluster.infra.kong-nonprod.cortex.elsevier.systems
* start date: Feb 8 00:00:33 2024 GMT
* expire date: Apr 8 00:00:33 2024 GMT
* subjectAltName: host "cluster.infra.kong-nonprod.cortex.elsevier.systems" matched cert's "cluster.infra.kong-nonprod.cortex.elsevier.systems"
* issuer: C=GB; ST=Greater London; L=London; O=Elsevier Core Engineering; CN=Elsevier Kong mTLS CA Intermediary I28
* SSL certificate verify ok.
* using HTTP/1.1
> GET /services HTTP/1.1
> Host: cluster.infra.kong-nonprod.cortex.elsevier.systems:8005
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 400 Bad Request
HTTP/1.1 400 Bad Request
< Date: Fri, 08 Mar 2024 17:54:08 GMT
Date: Fri, 08 Mar 2024 17:54:08 GMT
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
< Content-Length: 202
Content-Length: 202
< Connection: close
Connection: close
<
<html>
<head><title>400 No required SSL certificate was sent</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<center>No required SSL certificate was sent</center>
</body>
</html>
* Closing connection 0
curl --cert infra-cert --key infra-key --cacert mtls-ca-infra-TrustStore https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
</body>
</html>
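The outcomes seen in the probes above can be summarised as a small lookup. This is a sketch only: `diagnose_probe` is a hypothetical helper, and reading the final 404 as "handshake accepted, path not served on this listener" is an assumption, not something confirmed against Kong. The curl exit codes are 52 for an empty reply and 60 for a certificate verification failure.

```shell
# Map a curl exit code plus response body to the likely next step.
# $1: curl exit code, $2: response body (may be empty).
diagnose_probe() {
  exit_code=$1
  body=$2
  case "$exit_code" in
    52) echo "empty reply: listener closed the connection without an HTTP response"; return ;;
    60) echo "server certificate not trusted: check --cacert"; return ;;
  esac
  case "$body" in
    *"plain HTTP request was sent to HTTPS port"*)
      echo "TLS listener: use https://" ;;
    *"No required SSL certificate was sent"*)
      echo "client certificate required: pass --cert and --key" ;;
    *"404 Not Found"*)
      echo "handshake accepted: path not served on this listener" ;;
    *)
      echo "unrecognised response" ;;
  esac
}
```

A possible use, with the same files as in the notes: `body=$(curl -s --cacert mtls-ca-infra-TrustStore https://cluster.infra.kong-nonprod.cortex.elsevier.systems:8005/services); diagnose_probe "$?" "$body"`.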
2024-03-08
- discuss robot KubeLibrary incluster config with Ashish
- answer Shane on dataplatform about pulling images from Artifactory
- CEIP-4508: doc ticket added
- 1-2-1:
- mostly chat about Kong
- shared concern about ending with diff spaghetti
- potential OKR around serverless?
- concern that not really visible
- potential OKR around SBOM / support matrix: https://global-elsevier.slack.com/archives/C030F90FM7U/p1709892022071709
- ceip-4469 ksi migration kong: checked all values against infra, corrected a couple of cert-related ones; still unable to verify control plane
2024-03-07
- dev duty: basically nothing I could do with these
- AMI in hand with FG
- front end PR from Jonathan for Luke
- ceip-4469 ksi migration kong
- fixed permissions on init-db job
- actual error I am getting is To start a new installation from scratch, run ‘kong migrations bootstrap’.
- documentation seems to suggest it is a script to be run: https://docs.konghq.com/gateway/latest/install/docker/#prepare-the-database
- helm chart configured with migration sidecars on as per: https://github.com/Kong/charts/tree/main/charts/kong#migration-sidecar-containers
- helm uninstall required to make it run the kong migrations bootstrap script
2024-03-06
- ceip-4469 ksi migration kong
- continue on debugging connection of kong to db
- does user exist? (previously done by k8s job)
- potential blog / shower thought: One of the things that has been in my mind recently is the ‘slow-moving’ projects you refer to. A corollary of being slow-moving is that when change does come it is inevitable that the world has moved on.
2024-03-05
- ceip-4469 ksi migration kong
- need permissions on kong/labs/lic
    {
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Sid" : "AllowUseOfKey",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "arn:aws:iam::595468393306:root" },
          "Action" : "secretsmanager:GetSecretValue",
          "Resource" : "*"
        },
        {
          "Sid" : "AllowUseOfKey2",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "arn:aws:iam::595468393306:role/nonprod-ctrl-labs-20240228-1648" },
          "Action" : "secretsmanager:GetSecretValue",
          "Resource" : "*"
        },
        {
          "Sid" : "AllowRotatorLambdaToUpdate",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "arn:aws:iam::595468393306:role/kong-mtls-ca-nonprod-role" },
          "Action" : [ "secretsmanager:PutSecretValue", "secretsmanager:GetSecretValue" ],
          "Resource" : "*"
        }
      ]
    }
- many secrets empty, had to copy values manually from infra to labs, inc:
- kong/labs/pg_ca_authority
- kong/labs/newrelic-nri
- gui_auth_conf
- kong_pg_password
2024-03-04
- ceip-4469 ksi migration kong
- OK following the breadcrumbs:
  helm chart creates service account labs/labs-20240228-1648-kong
  service account has annotation eks.amazonaws.com/role-arn: arn:aws:iam::595468393306:role/nonprod-ctrl-labs-20240228-1648
  ^ role has policy nonprod-ctrl-labs-20240228-1648-0 including:
    "Action": [ "secretsmanager:List*", "secretsmanager:Get*", "secretsmanager:Describe*" ],
    "Effect": "Allow",
    "Resource": [
      …
      "arn:aws:secretsmanager:eu-west-1:595468393306:secret:kong/mtls-ca/labs/Root/cert-*",
      …
    ]
  pod labs/labs-20240228-1648-kong-init-migrations-zjb27 is attempting to start using service account labs-20240228-1648-kong in namespace labs (as expected)
    a) controlled by Job/labs-20240228-1648-kong-init-migrations
    …
  yet the pod fails to start with:
    MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod labs/labs-20240228-1648-kong-init-migrations-zjb27, err: rpc error: code = Unknown desc = eu-west-1: Failed fetching secret kong/mtls-ca/labs/Root/cert: WebIdentityErr: failed to retrieve credentials
  which seems to indicate pod is not executing under the expected role.
  turned out to be the trust relationship, which used the ARN ending, not the service account name:
    "ForAnyValue:StringLike": { "oidc.eks.eu-west-1.amazonaws.com/id/ECFA91D307CF68599D4A8B78A4C4B6F4:sub": "system:serviceaccount:labs:labs-20240228-1648-kong" }
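A small sketch for checking the trust relationship before installing: build the `sub` value the pod's token will carry and test it against the pattern in the role's trust policy. `irsa_subject` and `sub_matches` are hypothetical helper names, and shell glob matching only approximates IAM `StringLike` (both treat `*` and `?` as wildcards, but the shell also interprets `[...]`).

```shell
# Subject claim for a Kubernetes service account in an EKS OIDC token.
# $1: namespace, $2: service account name.
irsa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Succeeds when the subject matches the trust-policy pattern.
# $1: pattern from the trust policy condition, $2: subject to test.
sub_matches() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}
```

With the names from the notes above, `sub_matches "system:serviceaccount:labs:labs-20240228-1648-kong" "$(irsa_subject labs labs-20240228-1648-kong)"` succeeds, while a pattern ending in the role-style suffix `labs-20240228-1648` does not match.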
2024-03-01
- ceip-5329: remove deprecated template plugin from kong-mtls-rotation
- TIL: terraform state can include providers that are not required by the IaC, so even after an upgrade you still have to have the things you upgraded from!
- ceip-4469 ksi migration kong
- back to the point that helm install does not have permissions to mount the secrets
- play around with robot / selenium