October logbook

TODO look at regression tests
PfP: Found postman/README.md - TODO try it out
TODO: Splunk training: https://hscic365.sharepoint.com/sites/NMS/SitePages/Splunk-Training.aspx
TODO

2025-10-31

rubber duck
- devcontainer PRs separated out because of getent, won’t work on windows
- connor is convinced it’s permissions
- devcontainer cli can build everything in image ahead of opening vscode
tango standup
- 4 weeks for proxy access
- need to get the spike done
- 1 day to finish 5803
- ask on dev channel

2025-10-30

ITOC catch up
- follow up with Tom O’Donoghue tom.odonoghue1@nhs.net
5803:
- Jim has hardcoded the prod app kid as a secret
  - probably not secret
- decide with Connor to parameterise properly

docker container building

RUN x && y better than RUN x; y because the latter continues on failure
if [ x = y ] is correct POSIX and bash; bash's [ also accepts == as an extension, but dash (/bin/sh on Debian/Ubuntu, which RUN uses by default) rejects it 🤷
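
The same behaviour can be checked outside a Dockerfile. This is only a sketch: sh -c stands in for what RUN does, and false stands in for a failing build step.

```shell
#!/bin/sh
# 'x; y': the failure of x is ignored, y still runs and its exit status wins.
semi_status=0
sh -c 'false; echo "ran anyway"' || semi_status=$?

# 'x && y': y is skipped and the failing status is returned, so a RUN step would fail.
and_status=0
sh -c 'false && echo "never printed"' || and_status=$?

echo "semicolon exit status: $semi_status"
echo "and-chain exit status: $and_status"

# POSIX string comparison uses a single '='.
if [ "abc" = "abc" ]; then
  posix_eq=yes
else
  posix_eq=no
fi
echo "posix equality matched: $posix_eq"
```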

some containers () need tmp dir fix:

# Anticipate and resolve potential permission issues with apt
RUN mkdir -p /tmp && chmod 1777 /tmp

can avoid intermittent failures on poor network with

# Improve apt reliability: increase retries and disable pipelining
RUN echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries \
  && echo 'Acquire::http::Pipeline-Depth "0";' > /etc/apt/apt.conf.d/99nopipeline

2025-10-29

5913: review of Connor’s PR led me down a rabbit hole to understand tsconfig.json
- https://www.totaltypescript.com/tsconfig-cheat-sheet proved useful
- some explanations from AI:
  - TypeScript needs either a paths mapping or a project reference in tsconfig.json to resolve @ts1/l1 to the local source during development.
  - paths: tells TypeScript how to resolve module imports during development (e.g., @ts1/l1 → ../common/l1/src). Useful for local development and editor IntelliSense, but does not enforce build order or incremental builds.
  - references: tells TypeScript that your project depends on another local project, enabling incremental builds, enforcing build order, and allowing tsc --build to work across packages.
  - best practice in a monorepo is to use both: references for correct build order and incremental builds, paths for local module resolution and editor support.
  - with only paths, TypeScript resolves the import, but you lose incremental build and project dependency features.
  - with only references, you may get build errors or lack editor IntelliSense unless the referenced package is built.
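
To make "use both" concrete, here is a sketch that writes a consuming package's tsconfig.json from the shell. The app directory and the temp dir are hypothetical; @ts1/l1 and ../common/l1 come from the note above. It is not taken from Connor's PR.

```shell
#!/bin/sh
# Hypothetical layout: a consuming package "app" sitting next to ../common/l1.
workdir=$(mktemp -d)
mkdir -p "$workdir/app"

# paths gives the editor/compiler a source-level mapping;
# references declares the build dependency used by tsc --build.
cat > "$workdir/app/tsconfig.json" <<'EOF'
{
  "compilerOptions": {
    "paths": {
      "@ts1/l1": ["../common/l1/src"]
    }
  },
  "references": [
    { "path": "../common/l1" }
  ]
}
EOF

echo "wrote $workdir/app/tsconfig.json"
```

Note that the referenced project (../common/l1 here) needs "composite": true in its own tsconfig.json for references to be accepted.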
5934: ready to collect and notifications requested

2025-10-28

5803: ready for next stage (switch to new secret)
logger
- adding new module to monorepo (now psu-utilities instead of logger)
- https://github.com/govuk-one-login/onboarding-self-service-experience
splunk reporting: AEA-5801, AEA-4876
eps devcontainer stopped working so after much reading…
- you’re right the docker-outside-docker is supposed to fix this - it doesn’t
- --group-add=nnn works but would require us to all have the same docker group id (987 for me, 999 for you)
- I have systemd-journal on 999, which feels deliberate - are you on some pre-systemd vsn of ubuntu? (or maybe you upgraded from one and docker already had 999)
- --group-add=docker works if the Dockerfile and host system have the same group id
- adding the group with a fixed id matching the host in the Dockerfile is apparently not enough to allow --group-add=docker to work (which is surprising to me)
- tried a postCreateCommand (getent group docker | cut -d: -f3) … no permission to groupmod
- added vscode to sudoers and got a whole host of package not found errors
- I have no idea why this suddenly stopped working. I moved docker images to a separate disk but it's not obvious why that would have caused it.
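
One way round the "same group id for everyone" problem might be to resolve the gid on each host and pass it numerically. The sketch below only shows the parsing step; the sample entry is made up (987 is my gid from above) and stands in for real `getent group docker` output. How the value reaches the devcontainer config is left open.

```shell
#!/bin/sh
# Extract the numeric gid from a getent-style group entry: name:passwd:gid:members
gid_from_group_entry() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical sample entry; on a real host use "$(getent group docker)" instead.
sample_entry='docker:x:987:someuser'
host_gid=$(gid_from_group_entry "$sample_entry")

# Print (not run) the flag that would be handed to docker run.
echo "--group-add=$host_gid"
```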

2025-10-27

remove deliveryStatus now report has moved over
5934:
- rollout group by psuRequestId
- agree with Pete to review over time
- however, ready to collect all null, so need to look again
- confirmed ready to collect in int after Kayal provided data
- discuss w Pete, agree to monitor => moved to in review
5936: rolled out
planning for move to proxigen

2025-10-24

get rollouts done!
finish 5853
5934: back to the drawing board in some ways
- promising in {{int}} need to try in prod
- no ready to collect data in int so assume good for now
- new approach to notifications attempted correct but will never show on the right day
5936: simple date formatting

2025-10-23

5924: new distribution report
5905: rework for lower memory
5914: done but blocked by 5905
5828: rolled out
5853: probably half the day trying to fix commit signing

2025-10-22

planning
retro
5828:
- test data from Kayal
- figured out the report
  - coalesce to get ODS code
  - exclude last 24h
  - potential problem with rejected
- pass to Ant for deployment
5914:
- discuss with Pete
- fix query
- check to Ant
5906:
- close as dupe
pair with connor on the config change

2025-10-22

review mtg
- problem statement
  - for maximum flexibility we apply 3 levels of notification enablement
    - individual pharmacy sites,
    - pharmacy systems as a whole
    - block individual sites, overriding system setting
  - the pilot enables 104 individual sites
  - post pilot we expect to enable 4 systems initially, but there may be occasional reasons to en/disable individual sites
  - pre-pilot we only had one test ODS code so it was not immediately apparent that the report on all prescription notifications was not broken down by site.
  - we need breakdown by site to identify any sites that are using the triggering notifications differently to the expected pattern.
- diagnosis
  - show initial splunk report
  - highlight that:
    - the report uses multiple sources, transformations and aggregations
    - hence we need to breakdown into parts to diagnose issues.
    - this identified two places that we were not logging ODS code
- solution
  - two trivial PRs each simply adding ODS code to logging
  - fixed report
5828:
- need to ask Kayal to create new prescription updates
5914:
- investigate PfP state machine, sequence etc

2025-10-21

5828:
- merged PR
- now need to push to int and return to running queries
log best practice
- can I use middy to inject to each module?
- what is the TypeScript way to use factory
- what is pino’s answer?
- searched github and found: https://github.com/govuk-one-login/onboarding-self-service-experience
5087:
- stumbled on potential solution in diff repo that spine server should be set to sandbox to disable
  - unfortunately, not the solution in https://github.com/NHSDigital/electronic-prescription-service-api

2025-10-20

5905, 6 & 7:
- gonna have to deploy something and see if I can generate logs
5828:
- had to fix logging after backing out the full change
- get connor to approve

2025-10-17

5828:
- worked thru the individual lifecycle of a message
- got most of the columns but identified the nhsNotify does not log ODS code
- chat with Pete and Kayal suggested excluding the last 24h
- also that client not registered is being lost (should be rejected)
- Kayal explained nhsNumber for each test case:
  - …126 - notifications on - 2 scripts, 1 read, 1 not
  - …526 - notifications off
  - …061 - not registered
5905, 6 & 7:
- started investigating and narrowed it down to not getting the changed msg Jim introduced within 5389
```
  ( source="aws:loggroup:/aws/lambda/pfp-GetMyPrescriptions"
AND message.message = "Processing PfP get prescriptions request for patient. They have these relevant ODS codes, and the PfP request was made via this apigee application." )
```
  - suspicious of initialising separate logger in responses.ts https://github.com/NHSDigital/prescriptionsforpatients/pull/2078/files

2025-10-16

5901: spoke to Connor about the fact that something is needed
- start with supplier id and if more is needed jump to CDK
5853:
- tidied up and handed to Connor ‘cos Pete chasing (while on call with Supplier / client!)
5828: (already mid-afternoon before I could start)
- need to work on splunk query to understand what is happening in prescription updater and then in notifyProcessor

2025-10-15: DigiMeds day in Leeds

2025-10-14

5828
- report meeting w Ant:
  - a typo on a status returned from Notify (need to check notification_attempted, not with a space between). This will explain the zero you’re seeing in the attempted_notifications_disabled_count, Pete
  - a question arose about whether we care about the number of notifications not sent to Notify i.e. those that don’t match the specified ODS Codes. I have assumed ’no’ but would be good to confirm @Pete Hurst as I’m not seeing a comment either way within 5828
  - we are still seeing numbers that we cannot explain based on our understanding of Notify return status but need to trace in a nonprod environment
- Kayal working on test data
- Push Connor for +1 again tomorrow (rebutted his concern)
5853:
- last thing was abstracting the config inc. test.
- tests need to be updated
- embarked on a copilot refactoring to create common fixtures
  - some success but had to flip back to 5828
5828: initial investigation with Kayal: https://nhsdigital-platforms.slack.com/archives/D09F1T9FQJY/p1760452606466939 and following

2025-10-13

5508: PSU deploy_api.sh hitting timeout, which is on our lambda
- raise --cli-read-timeout. At the least this should get more information from our lambda (PTL in account resources). That lambda calls the APIM, calls Apigee, weight
- resolution was issue at Apigee end: https://nhsdigital-platforms.slack.com/archives/C04173JT5NV/p1760349319720889
- our proxygen lambda opportunities for simplification
  - is logging full swagger spec, should reduce
  - missing logging of headers
- TODO: Observed that deploy_api.sh is lightly modified copy and paste for each repo.
  - “As part of moving PfP to proxygen could centralise” –Ant
Delegated patient access mtg: https://nhsdigital-platforms.slack.com/archives/C09LNHYNWBA/p1760349602894419 And following
- Mtg with James Taylor
  - depends on APIM work, we should be able to start 1 Nov
  - need to add actor to spine interaction too (for SAR logging).
  - need to confirm APIM (Rishi) can do claim to header translation or we cannot move to proxygen
5828: reports have issues.
- Kayal can provide data if we match ODS codes in int to prod
  - created: https://github.com/NHSDigital/eps-prescription-status-update-api/pull/2313
- psu-PrescriptionStatusUpdates table has pharmacyODSCode but psu-PrescriptionNotivicationStatesv1 has ODSCode
- Pete says no longer want item 10. notification_in_progress_count - count of notifications where the data has been initialised and Notify is processing
  - lines of code in cur dir
```
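# Group the -name tests so -type f applies to all of them; -print0/-0 copes with spaces in paths
find . -type f \( -name '*.py' -o -name '*.js' -o -name '*.ts' -o -name '*.rb' \) -print0 | xargs -0 wc -l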
```
    psu nhsNotifyUpdateCallback: 2958 -> 2858
  - questions
    - is splunk querying case sensitive? Yes
    - do we have a preferred case for splunk attributes? Yes but not fussy enough to change existing

2025-10-10

5828:
- maybe look at raw data in int
- check report early and take to rubber duck: https://nhsdigital-platforms.slack.com/archives/C08MBDLCU0P/p1760008192658919
3730: need to ask if any technical merit in using Service API as none address wise.
- cancelled
5508: timeout reoccurred: https://github.com/NHSDigital/eps-prescription-status-update-api/actions/runs/18216632173/job/52359078386?pr=2267
- take to rubber duck: keep trying!
- also outdated and also SQ complaining about duplicated code in testHandler.test.ts
ITOC-12791: alert on variance between messages to Notify and response from them. needs to be rethought
ITOC-12790: variance between created and ready to collect (i think?) Ant explained as normal while subset of sites onboarded
5853: is it just about specifying the NHS Numbers?
- yes, basically, there are reasons why diff suppliers have diff numbers
  - though also the ability to turn on and off as supplier starts and ends testing

2025-10-09

5508: timeout failure on proxygen lambda: https://github.com/NHSDigital/eps-prescription-status-update-api/actions/runs/18216632173?pr=2267
5828:
- maybe look at raw data in int
- check report early and take to rubber duck: https://nhsdigital-platforms.slack.com/archives/C08MBDLCU0P/p1760008192658919
cdk or sam: read up on stack refactoring: https://nhsdigital-platforms.slack.com/archives/C09KR1GSGDS/p1760014059693209
5803: Merged step 1 of 3.
5853: created SAM templates inc. the NHS numbers for each test
- rolled back prob. due to error referencing: https://github.com/NHSDigital/prescriptionsforpatients/actions/runs/18379218115/job/52361835468?pr=2097
- looked at powertools for reading ssm
- need to have a stack name for get or multiple
- maybe why Jim used getParametersByName?
- but then, how diff values by env? (merge env into name?)

2025-10-08

5828: analyse, clarify with Pete and impl. No test data in int so pushed to Ant.

2025-10-07

5803: requested new secret merge: https://github.com/NHSDigital/electronic-prescription-service-account-resources/pull/1558
5852: splunk query change proposed and in review

2025-10-06

5851: rubber duck steered away from an ad-hoc, full CI approach to ‘just’ cloud formation
- did that, then reworked to store the codes in a file.
- stand-up
  - once done, need to rework the KOP Ant prepared.
- prep. full pilot ods code file
5087:
Proxygen
- 6 months+ since APIM team said there was no way to migrate from non-proxygen to proxygen
- avoided this with FHIR API because used diff endpoint
- could do this here? potentially cos only one client to migrate (app)
  - engage with app team
  - url could be api.nhs.com/prescriptionsforpatientsv2/…
- have to engage with APIM for onboarding anyway.
  - Jonathon Eagle
  - Pete to set up mtg
- onboarding discussion
  - create public - private key pair
- also need diff auth mechanism
  - https://nhsd-confluence.digital.nhs.uk/spaces/APM/pages/778782975/Configuring+Target+Identity+Headers
- also need to understand how patient proxy will be handled here (separate and secondary issue)
- once have keys basically deploy OAS thru Proxygen
  - maybe some tweaks says Matt
- could start by enhancing regression tests

2025-10-03

walkthru of concurrent regression test by Connor
tango standup: Paul wants it to be distinct from whole-team standup, which is largely for Jen.
5851: draft automation spike ticket for en/disable sites and suppliers.
timesheet:
- airecentre
- Jira:
5087:
- remember this is for sandbox but not a distinct codebase as for psu
- SANDBOX_MODE_ENABLED env var in the GHA yamls
- written as “0” in cdk.json
  - should it be written like MTLS is by fix_cdk_json.sh?
```
fix_string_key sandboxModeEnabled "${SANDBOX_MODE_ENABLED}"
fix_boolean_number_key enableMutualTls "${ENABLE_MUTUAL_TLS}"
```

2025-10-02

3730:

John put in service now request for Apigee access

service search API: https://digital.nhs.uk/developer/api-catalogue/directory-of-healthcare-services

PfP Int lambda has access to live API, use its env to find secret mgr ARN of subscription key

export SVC_KEY=foo
export ODSCODE=XX123  # note can be comma separated list
curl -H "subscription-key: $SVC_KEY" \
  "https://api.nhs.uk/service-search?api-version=2&searchFields=ODSCode&search=$ODSCODE" \
  | jq -r '.value[] | [.Address1, .Address2, .Address3, .City, .County, .Postcode] | join(",")'

spine equiv.

API
No auth needed

export ODSCODE=XX123
curl -X GET "https://uat.directory.spineservices.nhs.uk/STU3/Organization/$ODSCODE" \
 -H "accept: application/fhir+json"

2025-10-01

3730: return to it finally
- GOAL: call Service Search API for addresses
- GOAL: Splunk query for random ODS codes (then ask someone to run in prod)
- GOAL: call Spine for addresses
- make sam-run-local => parked
  - make sam-run-local results in No current session found, using default AWS::AccountId
  - make sso-login did not resolve
  - however aws sts get-caller-identity reports Error when retrieving token from sso: Token has expired and refresh failed
  - that clued me into the non-std session I created.
  - reran make aws-configure being sure to specify sso-session and then get-caller-identity returns a session
  - reran sam-run-local and still getting Error: [Errno 24] inotify instance limit reached
  - let's park the run local
- Postman => parked
  - TODO Need Apigee non-prod login
- AWS Toolkit
  - found pfp-GetMyPrescriptions (no PR deployed yet)
  - open remote invocation, I think we will have to keep test data files in this repo, maybe ask what others think
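
For when the parked sam-run-local gets picked up again: the [Errno 24] inotify instance limit error points at the kernel's per-user inotify limits. A sketch for inspecting them follows; read_limit is a helper made up for this note, and 512 in the comment is an arbitrary example value, not a recommendation.

```shell
#!/bin/sh
# Read an inotify limit from its sysctl file; print "unknown" if the file is absent (e.g. non-Linux).
read_limit() {
  if [ -r "$1" ]; then
    cat "$1"
  else
    echo "unknown"
  fi
}

instances=$(read_limit /proc/sys/fs/inotify/max_user_instances)
watches=$(read_limit /proc/sys/fs/inotify/max_user_watches)
echo "max_user_instances=$instances max_user_watches=$watches"

# To raise the instance limit until next reboot (needs root):
#   sudo sysctl fs.inotify.max_user_instances=512
```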
