Led Site Reliability Engineering capabilities across multiple teams, managed workload, and provided guidance and mentorship to junior team members
Collaborated with cross-functional teams including software development, IT operations, and security teams to improve reliability, scalability, and security of infrastructure and the surrounding processes
Developed and implemented proactive monitoring and alerting systems to catch issues before they became critical
Worked with development teams to ensure that software releases were properly tested, validated and deployed in production environments
Developed and maintained documentation for infrastructure and processes related to Site Reliability Engineering
Participated in incident response and post-mortem analysis to identify the root cause of problems and implement preventative measures to avoid similar incidents in the future
Ensured compliance with industry standards and best practices related to Site Reliability Engineering and serverless systems
Evaluated and implemented new technologies and tools to improve system performance and reliability
Worked and communicated with senior management and stakeholders on the strategy, progress and status of Site Reliability Engineering initiatives, providing guidance at various levels
Contributed to building the support model for the programme, setting out the strategy for running critical services sustainably and reliably
: Senior SRE
As Tech Lead of the GOV.UK PaaS team, provided technical leadership of a large-scale, government-wide cloud hosting platform
Planned, prioritised and built various components for tenants of GOV.UK PaaS (home to 200+ organisations, 2.5k applications and 2k backing services), including an admin portal, billing statements and calculator, IPsec encryption for traffic between cells and routers, and performance dashboards and alerts
Architected and led the design and development of various components for the Kubernetes platform, including a service operator, signed Docker images, and deployment and smoke testing pipelines
Worked with Senior Management, Product Managers, User Researchers, UI and Content Designers to understand user needs and design, plan and prioritise multiple streams of work to improve the platform
Provided expert advice to PaaS tenants across government and the wider public sector to help them get the most out of the platform and to establish best practice patterns and approaches. Fed learnings from this support back into the team to help identify unmet needs
Developed GOV.UK PaaS and Build & Run processes with the use of Terraform, AWS, Concourse, Kubernetes, Cloud Foundry, BOSH, YAML, Go, Shell Scripts, Postgres
Developed web applications for GOV.UK PaaS tenants with the use of Go, Node.js, TypeScript, Sinatra, JS, Webpack
Worked with a secure continuous delivery system using Git and GPG signing to verify the integrity of developer commits prior to deployment
Line managed and mentored colleagues through career progression, more demanding tasks and new responsibilities
Troubleshot complex network and systems issues while on support or as an incident lead
Actively patched various CVEs, performed security audits, and triaged risks and pen-test findings
Pair programmed in depth to build shared knowledge, onboard colleagues to complex systems and share the burden of more demanding tasks
Presented and demonstrated at show and tells, knowledge shares, meet-ups, conferences and online panels and podcasts
Drove progression, wellbeing and best working practices for the team
Worked in an Agile (Kanban) environment
Used and managed Pivotal Tracker for project management, creating epics and stories