Me, Myself and I: A Technical Portfolio & History
Think of this as a more extended version of my CV. I like to keep it short and sweet over there, and this is where I can ramble about why I’m where I am, and where I want to be.
Career History
Jaguar Land Rover - Jan 2023 - Present - Senior Site Reliability Engineer
A leading car manufacturer that makes modern luxury and customer care its focus
Roles & Responsibilities
- Providing Developers an ‘Embedded’ SRE
- Ensuring Developers Release Safe & Tested Code
- Keeping a Multi-Tenanted Kubernetes Environment Performant, Secure & Monitored
- Creating Self-Serve Terraform Modules
- Finding New Ways to Observe our Kubernetes Environments
- Creating & Presenting PoC’s for New Projects
Tech Stacks & Skillsets Learned
- GCP Architecture - Intermediate
- Terraform Module Creation & Management - Intermediate
- GitLab Administration - Junior
- GCP Alerts & Monitoring - Intermediate
- Prometheus Monitoring - Intermediate
- Google Container Registry - Intermediate
- Kubernetes Storage - Intermediate
- Multi-Cluster Kubernetes - Intermediate
Notable Projects
My first ‘big break’ at JLR surprisingly wasn’t a technical one. I felt really comfortable starting out, and had already met some wonderful people quite early on. I got inspired by one of my colleagues and decided to make a presentation on my experiences with Neurodiversity in the workplace. I approached it by telling my story, both good and bad, and how you can help someone who might not even know they need it. It’s a world with a lot of gray areas, but after I presented the talk, a few colleagues reached out to me to say how they enjoyed how black and white I made it. My goal was to send a clear message, and thankfully, it seemed to be received well. A few months later in July, I got a shoutout from one of the people who attended that presentation, who said it inspired them to make their own talk on how certain activities improved their mental health.
On the technical side of things, I knew that things would be a big challenge with the huge scale that a company like JLR operates at. What I was also aware of, though, is that oftentimes all that is needed is a pair of eyes and a willingness to learn. One of the first things I tackled was a bit of a strange one, and something you’d think would be easier to handle with GitLab & Terraform: deploying only what you have changed in Terraform.
Without going into too much detail, applications often have their own Terraform that deploys the infrastructure needed to make them work in our Kubernetes clusters. The problem with how things were working was that, no matter what, every pipeline step would run. Initially, it sounds like a simple fix, but then you have to change a lot of stuff depending on the environment that the app is being deployed to. You then have to implement a solution that has zero impact on developers. No changes to how they work, and definitely no pipeline downtime. This made things tricky, but a really good challenge. In short, it needed a lot of pipeline refactoring, and I ended up finding some very strange limitations in GitLab relating to how rules are interpreted (or… not) when you use anchors rather than extends.
This project also ended up creating a wider discussion on how our pipelines work for things outside Terraform. As a team, we decided that we just need to approach GitLab pipelines differently. More modular. That in particular has been a great learning point, and has really helped shape my thinking on how important versatility is in these kinds of processes.
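To illustrate the "deploy only what you have changed" idea, here is a minimal Python sketch. It is not the pipeline code itself (that lived in GitLab CI rules); the directory layout and the `changed_roots` helper are hypothetical, and the list of changed files is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each Terraform root lives in its own directory.
TERRAFORM_ROOTS = ["terraform/network", "terraform/database", "terraform/app"]


def changed_roots(changed_files, roots=TERRAFORM_ROOTS):
    """Return the Terraform roots that contain at least one changed file."""
    affected = set()
    for name in changed_files:
        path = PurePosixPath(name)
        for root in roots:
            # A file belongs to a root if the root is one of its parent dirs.
            if PurePosixPath(root) in path.parents:
                affected.add(root)
    return sorted(affected)


# Only the "app" root changed here, so only that root needs a plan/apply.
example_diff = ["terraform/app/main.tf", "README.md", "terraform/app/variables.tf"]
print(changed_roots(example_diff))
```

Comparing path components rather than string prefixes avoids a directory such as `terraform/application` being mistaken for `terraform/app`.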
ANS Group - Sep 2022 - Dec 2022 - DevOps Engineer - COE
An ambitious, customer-obsessed data-transformation provider
Roles & Responsibilities
- Working within Agile Sprints to Deliver Customer Migrations, New Builds & Console-IaC Conversion Project
- Working with Solution Architects in HLD and LLD in Discovery Workshops
- Providing an Escalation Point for Managed Services
- Assisting Developers with Internal Load Testing & Performance Optimization using Azure Application Insights
Tech Stacks & Skillsets Learned
- CloudFormation - Intermediate
- Pipeline Creation - Intermediate
- CI/CD - Intermediate
- Kubernetes - Intermediate
- Agile Framework - Junior
Notable Projects
While this was a short stint, it really provided a big springboard into learning what goes on behind the ‘DevOps mentality’. When I started, I felt fresh in the team and raring to go, despite having worked with the team for most of my previous stint as a Cloud Engineer at ANS. My most notable project, in short, was looking into why one of our applications seemed to perform at a certain (very poor) speed no matter the virtual hardware we threw at it.
We found an important bottleneck: Dynamics rate limiting! It seemed that at roughly 150 concurrent users, any further requests would hit a rate limit that we couldn’t find documented within Dynamics 365. Thankfully, we’ve got a good partner status with Azure, and some great people both in the Dynamics team at ANS, and support over at Microsoft. This ended up being used as an example that Azure support used to look further into Dynamics 365 rate limits as a whole, and the mitigations around it. While we couldn’t get a higher rate limit and had to send requests asynchronously instead, and present a soft error within the application more often than we’d want, we still got some great feedback from Azure support, and they committed to providing more documentation on the matter for the wider customer
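The general shape of that mitigation (cap the number of in-flight requests, and degrade to a soft error rather than crash) can be sketched with `asyncio`. This is a simulation under stated assumptions: `FakeBackend` is a hypothetical stand-in for the rate-limited API, and the limit of 150 is simply the rough figure mentioned above, not a documented Dynamics 365 value.

```python
import asyncio

MAX_IN_FLIGHT = 150  # roughly where the undocumented limit appeared


class FakeBackend:
    """Stand-in for the rate-limited API; rejects calls above the limit."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self.peak = 0

    async def call(self, payload):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            if self.in_flight > self.limit:
                raise RuntimeError("rate limited")
            await asyncio.sleep(0.001)  # simulate network latency
            return f"ok:{payload}"
        finally:
            self.in_flight -= 1


async def send(backend, gate, payload):
    # The semaphore caps how many requests are in flight at once.
    async with gate:
        try:
            return await backend.call(payload)
        except RuntimeError:
            # Soft error: report the failure instead of crashing the caller.
            return f"soft-error:{payload}"


async def run(total_requests):
    backend = FakeBackend(MAX_IN_FLIGHT)
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)
    results = await asyncio.gather(
        *(send(backend, gate, i) for i in range(total_requests))
    )
    return results, backend.peak


results, peak = asyncio.run(run(500))
print(peak, sum(r.startswith("ok") for r in results))
```

Requests beyond the cap wait their turn instead of being rejected, which trades a little latency for far fewer errors shown to the user.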
One other large piece of work I’ve been dealing with is helping to provide a PoC for one of our (potentially) largest customers on how they can move an application that performs a lot of HTTP->TCP proxying to AWS while both keeping speed high and costs low. We ended up settling on using NGINX TCP streams forwarding to different instance groups defined as backends in the NGINX config. It’s somewhat messy, but the customer was certain that Kubernetes wasn’t for them, and this was the next best thing for dynamic load balancing with extensive configuration. It also helped that I already have TCP streams set up in my lab!
ANS Group - Oct 2021 - Sep 2022 - Cloud Engineer
An ambitious, customer-obsessed data-transformation provider
Roles & Responsibilities
- Providing Cloud & DevOps Consultancy
- Performed On-Prem to AWS & Azure Migrations
- Providing Orchestration Consultancy with Azure Kubernetes Services
- Infrastructure as Code (IaC) Migrations
- Completing Quarterly Customer Environment Reviews
- Deployment & Management of Infrastructure as Code via CodeCommit, Azure DevOps & GitHub
- Use of Ansible, Terraform & CloudFormation to Provision & Configure Infrastructure
- Assistance & Escalation for Technical Service Management
- CI/CD Pipeline Creation & Maintenance
Tech Stacks & Skillsets Learned
- Cloud Troubleshooting & Support - Expert
- Cloud Spend Optimization - Intermediate
- AWS - Intermediate/Expert
- Azure - Intermediate
- Terraform - Intermediate/Expert
- CloudFormation - Junior/Intermediate
- Cloud Networking - Intermediate
- Pipeline Creation - Junior/Intermediate
- CI/CD - Junior
- Kubernetes - Junior/Intermediate
- MongoDB - Junior
Notable Projects
There are almost too many to mention here! First is the story of how I became a comparative SME for our MS team. One customer had a particular set of issues with their web app running on an AKS cluster we managed. They saw search query times go from around 0.5-1.5 seconds all the way up to 30 seconds! This wasn’t rare either; it had been happening quite consistently for a good while.
I came in and identified some core issues:
- There wasn’t enough logging data to make data-driven decisions around what should be improved. Container Insights was needed.
- Cluster monitoring was flaring up way too often, leaving our teams fatigued and missing the important stuff. This needed rectifying.
- The cluster itself needed to be reviewed from an architectural point of view. Why wasn’t this autoscaling?!
- Their DB hadn’t been checked in a long time.
Covering the first point was easy: enabling Container Insights takes very little effort.
To cover the second point, we use a bespoke monitoring system that I can’t give too many details about, but after using the data from Container Insights, I was able to make better decisions on what values to set for our alerts.
To check over the cluster itself, I combined the use of Container Insights along with kubectl logs and app testing with the customer and I determined a couple of things to implement / review.
Vertical Pod Autoscaling isn’t needed, but horizontal node autoscaling is. I implemented this with the use of Virtual Machine Scale Sets (VMSS) and now I’ve got the cluster scaling from 8-13 nodes automatically.
While autoscaling is great, it’s pointless without deployment resource requests. This is something I also implemented, since it hadn’t been done before I reviewed the customer. This again is where Container Insights really did come in handy.
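Why requests matter: node autoscaling reacts to pods that cannot be scheduled, and scheduling decisions are based on what pods request rather than what they happen to use. The back-of-envelope sketch below shows the effect. It only considers CPU and ignores per-pod bin-packing, and the pod sizes and node capacity are hypothetical figures, not the customer's.

```python
import math

MIN_NODES, MAX_NODES = 8, 13  # the scaling range mentioned above


def nodes_needed(pod_cpu_requests_m, node_allocatable_m):
    """Estimate node count from summed CPU requests (in millicores)."""
    total = sum(pod_cpu_requests_m)
    wanted = math.ceil(total / node_allocatable_m)
    # Clamp to the autoscaler's configured bounds.
    return max(MIN_NODES, min(MAX_NODES, wanted))


# 40 pods requesting 500m each on nodes with 1900m allocatable CPU.
print(nodes_needed([500] * 40, 1900))

# The same pods with no requests set never push the estimate above the
# minimum, however busy they really are.
print(nodes_needed([0] * 40, 1900))
```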
Checking over the DB was definitely the more complex part, and this is where I understood that this wasn’t just an infrastructure issue. The application itself was performing a full table search without any filters, and instead filtering on the application side. Along with that, the DB wasn’t sharded or in its own cluster. This was one of the projects I kicked off and was able to successfully coordinate and get done for the customer.
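The difference between filtering on the application side and filtering in the database can be shown with a small, self-contained sketch. It uses Python's built-in `sqlite3` purely as a stand-in (it is not the customer's database), and the `products` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products (category, name) VALUES (?, ?)",
    [("tools" if i % 100 == 0 else "other", f"item-{i}") for i in range(10_000)],
)
conn.execute("CREATE INDEX idx_products_category ON products (category)")


def search_in_app(category):
    # Anti-pattern: pull every row, then filter in application code.
    rows = conn.execute("SELECT id, category, name FROM products").fetchall()
    return [r for r in rows if r[1] == category]


def search_in_db(category):
    # Let the database filter (and use the index) before returning rows.
    return conn.execute(
        "SELECT id, category, name FROM products WHERE category = ? ORDER BY id",
        (category,),
    ).fetchall()


# Same answer either way; the first version just moves 10,000 rows to get it.
print(len(search_in_app("tools")), len(search_in_db("tools")))
```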
The next project starts with the customer’s heart sinking. They’ve received a vulnerability report from the National Cyber Security Centre. Thankfully, it’s not as bad as it seems, but it’s still something that really needs sorting. After they sent it over to us, I saw that their environment had direct IP accessibility over HTTPS rather than forcing the user to go through their Load Balancer’s IP address. This means that in theory, a black hat could target each of their web servers one-by-one much more easily.
This wouldn’t be so much of a task, if it wasn’t 45 separate environments that were individually configured by the customer. I was brought in to provide clarity in the situation and become a quasi-project manager for the customer. Their Service Manager and I teamed up to organize the customer, set up a task board and get the show running. I did the following:
- Sort the list of machines affected by type (on-prem, AWS etc.), OS and prod/test/dev
- Create an action plan for each, risks associated and any blockers
- ETA for implementation and testing
The good part here is that I’d just experimented with ALB’s in my Terraform lab (As you can see here!) so I’d already got used to ALB listeners and associated rules. Replicating this 45 times only added tedium rather than difficulty, so it ended up being as simple as arranging maintenance windows with the customer. In reality, this was a very complicated task, but my job was making it seem simple, which hopefully I’ve done here too!
Cloud spending is becoming more and more a leading topic of conversation with customers I have. One of the mini-projects I was responsible for was sorting out one particular customer that hadn’t taken any action around their spend for a long time. Here’s a short version of what I found:
- Very long backup retention, but also yearly requests to delete certain backups. Est. $3k a month savings.
- They were spending ~$8k a month on EC2 instances but with no savings plans or reserved instances.
- They had multiple volumes with a combined est. size of 60TB that weren’t attached to anything and were being backed up.
- Multiple instances could be power managed and/or rightsized to make sure they weren’t overspending on individual instances.
All in all, it would be a decent amount of work, but they’d save nearly $10k a month with what I suggested! Combine that with the rightsizing and power management, and they ended up saving over half on their total bill!
Never.No (Now Dizplai) - Jan 2021 - Oct 2021 - Linux & AWS Operations Engineer
SaaS to combine traditional media and social media
Roles & Responsibilities
- 3rd Line Linux & AWS Support
- Creation & Deployment of Ansible Scripts for Instance Fixes & Config Changes
- Providing Network & Security Support & Consultancy for Companies for Private SaaS Implementation
- Debugging Old Ansible & Python Scripts
- Performing Application QA Testing
Tech Stacks & Skillsets Learned
- AWS - Junior
- Ansible - Intermediate
- Debian / Ubuntu - Intermediate
- CentOS & RHEL - Intermediate
- DevOps Concepts - Junior
Notable Projects
I came in to Never.No with a few goals in mind. Firstly, I had to modernize not just their infrastructure, but their infrastructure as code. They were bringing up AWS instances with an AWS CLI script that worked well for the time, but had a very outdated Ansible script that had various shims to other scripts. This was hard to read to say the least. 1500 lines of Python, Bash and Ansible to spin up an AWS instance and configure 2 applications? This wasn’t on. A lot of my first few weeks was spent on refactoring this.
I reduced this down to a simple Bash starter script to launch the instance in the CLI and configure application dependencies using a standard user_data script. The specific instance configuration was done using Ansible, taking the customer name and certain other parameters to complete the application config and make deployment as painless as possible.
Additionally, I was looking into their cloud spend and saw that while we were doing a lot right, we could do more. Reserved instances were needed, and their Windows instances could be rightsized, saving approximately 25% of their total spend.
UKFast (Now ANS Group) - Oct 2019 - Dec 2020 - Junior Windows, then Enterprise Network Engineer
The UK’s best support experience-oriented hosting provider
Roles & Responsibilities
- 1st & 2nd Line Windows & Linux Support
- 2nd & 3rd Line Networking Support
- Managing 100’s of Separate Customers’ Environments
- Technical Escalation for Cloud Two Team & Account / Service Management
Tech Stacks & Skillsets Learned
- Debian / Ubuntu - Intermediate
- CentOS / RHEL - Junior/Intermediate
- Cisco Networking - Intermediate
- Windows Server 2008-2019 - Intermediate
- Ansible - Junior
- VMWare - Junior/Intermediate
- Docker - Junior
- IIS - Junior
- SQL - Junior
Notable Project
Before moving to Networks, I had one project to test my skills. A couple of our DC’s had some strange issues on our backup network. Every now and then, but without any rhyme or reason, backups were failing all with the same random error message. Switching backup servers didn’t fix the issue, and neither did replacing the switches. Multiple backup and network engineers took a look and scratched their heads. I was asked as another fresh pair of eyes to review things. My first question: “Have we replaced the SFP transceivers?”. The answer: “Yes we did”. I wasn’t happy with this however. It was the only variable that stayed the same this whole time and hadn’t been confirmed working. The rest of the network was fine, and we’d verified the core in that network was working as it should.
On a working section of our network, I checked the transceiver ID’s and batch ID’s and compared with the ones in a known bad section. In the working section, the ID’s and batch ID’s were mostly different from each other. The bad section however showed all the same few batch ID’s. Eureka! I spoke to the vendor support team and confirmed that this batch was part of a recall that we weren’t informed about since the 3rd party we purchased from didn’t pass the memo along.
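The comparison itself is simple enough to sketch in a few lines of Python. The inventory below is entirely made up for illustration (the IDs, batch numbers and the `suspect_batches` helper are all hypothetical); the point is just the technique of counting batch IDs in the failing section and discarding any that also appear in a healthy one.

```python
from collections import Counter

# Hypothetical inventory: (transceiver_id, batch_id, section_is_failing)
inventory = [
    ("sfp-001", "B-1041", False),
    ("sfp-002", "B-2210", False),
    ("sfp-003", "B-3307", False),
    ("sfp-101", "B-7782", True),
    ("sfp-102", "B-7782", True),
    ("sfp-103", "B-7783", True),
    ("sfp-104", "B-7782", True),
]


def suspect_batches(items):
    """Batches seen only in failing sections, most common first."""
    bad = Counter(batch for _, batch, failing in items if failing)
    good = {batch for _, batch, failing in items if not failing}
    return [(b, n) for b, n in bad.most_common() if b not in good]


print(suspect_batches(inventory))
```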
Coordinating this change was going to be a tough one. With hundreds of customers partially affected, we had to organize several windows of replacement, and coordinate with our nights team to ensure that the backups took place successfully after a change. To make things easier, we were able to move backups outside of certain hours, so we could use part of the day to replace, and another part to verify the change succeeded.
All in all, over 50 transceivers were swapped out, and to this day they’re still working great.
Trinity C of E High School - Jul 2018 - Oct 2019 - Level 4 Network Engineer
An outstanding Hulme secondary-school and sixth form
Roles & Responsibilities
- 1st Line Windows & Linux Support
- 2nd Line Hardware, Printer & Audio/Video Equipment Support
- 3rd Line Network Support
Tech Stacks & Skillsets Learned
- Debian / Ubuntu - Junior
- Windows Server 2008-2016 - Junior
- Network Troubleshooting - Intermediate
- Cisco Networking - Junior/Intermediate
- Virtualization - Junior/Intermediate
- VMWare - Junior
- Hyper-V - Junior/Intermediate
- HP Switching - Intermediate
Notable Projects
In my first week, I was tasked with migrating our domain controllers from Windows Server 2008 to 2016. A complicated task for someone fresh out of college, but I rose to the task and set a list of deliverables:
- As much of a gradual switch as possible.
- Switching from a single DC to dual DC is a must.
- A test virtual environment setup to ensure GPO’s still function.
During this project, I learnt a lot about FSMO roles, and just how complex DC migrations can be, especially with legacy software being in parts of the environment. I decided to use a multi-master model with two roles being on one master, and the other 3 on another. We also had secondary DC’s implemented to ensure no disruption to learning.
With the migration itself, I performed A and B deployments, with certain unused IT rooms becoming quasi-test environments. I then prepared scripts to move the roles over when the migration took place. By performing it in this way, we measured the total downtime at 19 seconds. The maximum target downtime was 1 hour. From then on, I became the ‘AD guy’.
The month after, I was given another tough migration: Upgrading the student WiFi from a max of 254 users to ~1400 while also experiencing a downgrade from 1Gbps to 200Mbps WAN speed. To make the situation even harder, the expected download/upload usage for a student each day went from a few web pages and documents to multiple app downloads, and many more documents being viewed and uploaded.
I again generated a list of deliverables:
- QoS implementation & refining
- Address space from Class C for each building to Class B for each room
- Full site survey to be completed to assess WAP load
- Download & Upload Request Limiting & Staggering of App Onboards
The last point, I didn’t want to implement. Realistically though, 1600 users on 200Mbps does not compute no matter which way you put it. Again, this was a tough one. Performing a site survey was one of the easy bits. Implementing my planned address scheme also wasn’t the toughest. The school-wide Wi-Fi had a /21 subnet assigned to it. A max of 21 subnets should be fine, but I decided to use /23’s for each room since we were cutting it relatively close, with 50 or so rooms needing this connectivity. This did need a VLAN overhaul, but I had already created a theoretical VLAN restructuring for a portfolio project for my apprenticeship, so I could just use this and tweak it slightly.
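Sizing an address plan like this is easy to sanity-check with Python's `ipaddress` module. The sketch below assumes a hypothetical Class B sized block (`10.20.0.0/16`), not the school's real range, and the `parent_prefix_for` helper is just for illustration.

```python
import ipaddress
import math

# Hypothetical Class B sized block; not the school's real range.
campus = ipaddress.ip_network("10.20.0.0/16")

# Carve the block into one /23 per room.
room_subnets = list(campus.subnets(new_prefix=23))
per_room_hosts = room_subnets[0].num_addresses - 2  # minus network/broadcast


def parent_prefix_for(count, prefix):
    """Prefix length of the smallest block that holds `count` subnets of /prefix."""
    return prefix - math.ceil(math.log2(count))


# Number of /23s available, usable hosts in each, and the parent prefix
# needed to fit 50 rooms at one /23 apiece.
print(len(room_subnets), per_room_hosts, parent_prefix_for(50, 23))
```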
A week or so later, and all was done! I’m massively simplifying here, but I did also review the QoS we had implemented, or lack of it, and made significant changes. I couldn’t magic up more bandwidth, but I could implement a limit that meant that students could start a download at the start of the lesson, and it would be ready by the time they needed to get started using the app for the first time.
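The arithmetic behind that trade-off is worth spelling out. In the sketch below, the 200Mbps link and 1600 users come from the figures above, while the 100 MB app size and 2Mbps per-user cap are hypothetical numbers chosen only to show the calculation.

```python
def fair_share_mbps(wan_mbps, users):
    """Bandwidth each user gets if everyone downloads at once."""
    return wan_mbps / users


def download_minutes(size_mb, rate_mbps):
    """Minutes to fetch size_mb megabytes at rate_mbps megabits per second."""
    return (size_mb * 8) / rate_mbps / 60


def max_concurrent(wan_mbps, cap_mbps):
    """How many capped downloads the link can carry at the same time."""
    return int(wan_mbps // cap_mbps)


# Everyone at once: a tiny share each.
print(fair_share_mbps(200, 1600))

# With a per-user cap, a staggered group finishes in a predictable time.
print(round(download_minutes(100, 2), 1), max_concurrent(200, 2))
```

This is why staggering onboarding matters: a cap keeps each download predictable, but only a limited number of students can be downloading at that rate at once.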
Avensure Limited - Feb-Jul 2018 - IT & Legal Administrator
A leading Manchester-based Employment Law firm.
Roles & Responsibilities
- 1st Line Windows, Printer & Network Support
- System & Employee On/Offboarding
Certificates & Qualifications
- GCP Cloud Engineer - Associate - 08/2023
- AWS Solutions Architect - Associate - 05/2022 - 05/2025
- HashiCorp Certified Terraform Associate - 05/2022
- CompTIA Network+ - 08/2018 - 05/2025
- Level 4 Apprenticeship in Network Engineering - 02/2020
- Level 3 Extended Diploma in Hardware & Networking - 05/2018
- Cisco CCNA Routing & Switching (NetAcad) - 04/2018
Origin Story, Fun Facts & Motivations
Formerly based in Tameside, I was a countryside kid. Until around 2011, I’d seen more types of Animals than Keyboards. Aside from having an old Windows XP computer I played games on, I was not a Technologist to say the least. That all changed in 2014, when the first PC I ever got for Christmas broke on me. The issue: a burnt out SATA cable. This got me from one rabbit-hole to another, researching PC parts and how it all ticked together. I’d done this here and there for a couple of years, and decided that this would be the best thing to get into for my career too, and went to Oldham College doing IT Hardware & Networking.
Moving IT from a passion to a career was something that worried me at the time. What if I stop liking it? What if I stray too far away from my roots and change too much as a person? Thankfully, that didn’t happen. I’m still as inquisitive, if not more, as when I was a kid, and I love doing what I do even more. As sad as it may sound to some, when I clock out of work, I log out of a customer cluster and right into one of my own.
By the time I finished college, I learned a lot more both in and out of tech. I came out of my shell a lot more, and was able to network much better. Networks became my bread and butter at the time, and it’s still a skill I’m proud of today. Not only that, I gained a new passion for not just messing with machines, but messing with servers and making many more Personal Projects involving servers, clusters, vNets, whatever I could get my physical or virtual hands on.
From 2018 to the middle of 2021, I solidified my skill set and made it my muscle memory. Then I got exposed to AWS. It took me a while to get hooked since first getting exposed, but I took my first cloud-exclusive job in Oct 2021 with ANS. After solidifying my skills and not adding many strings to my bow for a while, I wanted to get back to winning ways and sink my teeth into as much as possible. I’ve definitely succeeded at that, and it’s catapulted me into a dream DevOps career.
My motivation for sticking in IT is a simple thing: This isn’t a job to me. Some might find it boring, likely in the same way I find being in a nightclub til 3am boring. VSCode is much safer anyway. I’m definitely very lucky to be passionate in this kind of industry. It’s a pre-requisite in my opinion, and it’s one that can set you up for a good life if you play your cards right.
Additionally, it’s been great to connect with like-minded people in an industry where the physical makeup of people is so varied. Nobody cares about what you look like, how big or small you are or where you or your parents are from. All that matters is if you’re any good at what you do. Sure there’s politics, but so much less than anywhere else.
Now for some fun facts!
I played in the Under 11’s and Under 13’s for Man City
- I then got injured quite a lot, breaking both wrists and an ankle. I called it a day pretty quickly.
I used to go shooting in an indoor range as a kid, firing anything from tiny 22LR rifles to slug-firing 12-gauge shotguns.
- A slug from one of said shotguns ricocheted off the back and side of the range and hit me in the face, still being the hardest slap I’ve ever received.
I cameo’d a few times for the UK national team in Counter-Strike, back when international competitions were a thing.
Personal Projects
My Home Lab
- 10GbE Storage Network - TrueNAS custom-built server with 8TB Flash Storage, and 12TB HDD Cold Backup Storage
- Using ZFS Optimizations & Caching to achieve 40Gbps total read throughput
- Grafana Monitoring for AWS Environment
- Docker server with ephemeral storage & Ansible configuration for re-usable & replicable containers.
- Hyper-V
- Uptime Kuma for Status Checking & Alerting for publicly-accessible resources
- Flame used for a dashboard to my hosted resources
- Privatebin for encrypted pastebin sending
- Syncthing used to back up & sync resources around the world including to a secondary region.
- Tailscale VPN setup for a software defined global mesh network with infinite scalability.
- pfSense Firewall with IP Reputation & GeoIP blocking for network security.
liamhardman.cloud - Infrastructure Blog as Code - Apr 2021
- Self-Hosted GitLab used to host application code and run CI/CD pipelines to deploy to Kubernetes
- Rancher management plane used to manage RKE2 cluster
- 2 Master & 3 Worker nodes across 2 physical servers for High Availability.
- Prometheus, AlertManager & Grafana used for Observability & Alerting