So... How Do You Pay Off Technical Debt?

We’ve all heard the phrase “Technical Debt”. It describes the problems you’ll face if you don’t stay up to date with your tech. Some decisions lead to more technical debt than others, such as using a less modern application framework, or deciding against a standardised patch management tool such as AWS Systems Manager Patch Manager. As much as most of us engineers like to think of these decisions as simply adding to our company’s technical debt (and they likely do), that view is a blissfully ignorant one. For most organisations, it’s a balancing act that needs more hands than they likely have.

The question is less “How do you pay off technical debt?” and more “How do you avoid acquiring this debt in the first place while keeping costs reasonable in the short term?” The answer to the former is simple: update your frameworks and follow the AWS or Azure Well-Architected Frameworks, assuming you’re using those providers.

The answer to the latter, however, is one that any new IT Manager will struggle with. I’m not even a manager yet, but I’m throwing my hat into the ring on this one.

The 5-Minute, 5-Hour, 5-Year Rule

I’ve been working in the industry full-time for over 4 years as of April 2022. More than anything else, I’ve arrived at 3 key statements that I live by, especially from a Support perspective:

“Most things are a 5-minute fix.”
However, it likely took 5 hours to learn (or at least to learn the underlying parts).
Expect any fix to stay in place for 5 years, and have the documentation to back up anything you did up to 5 years ago.

To elaborate: in my 4 years of doing IT Support, I can honestly say that most things have taken 5 minutes to fix. They may have taken significantly longer to research, hence the 5-hour rule, but the resolution itself is often a short one. That second rule is also why labour costs are so high these days: you’re paying for the hours of learning behind the minutes of fixing. The standard of support we get in life as a whole is the highest it has been in human history, so no wonder a mechanic costs twice what they used to. The third statement is the thesis of technical debt, which is why I’d put forward another statement from an IT Management perspective:

“Simplifying a 5-minute job can often take 5 hours, but it should be done if it’s a net positive in 5 years’ time.”
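To put rough numbers on the 5-year test, here’s the break-even arithmetic under two assumptions that aren’t part of the rule itself: the 5 hours of simplification is a one-off cost, and the simplified job then takes next to no time to run. If the job comes up n times a year:

```latex
\underbrace{5\ \text{years} \times n \times 5\ \text{min}}_{\text{time saved}}
  > \underbrace{5\ \text{h} = 300\ \text{min}}_{\text{time invested}}
  \;\Longrightarrow\; 25n > 300
  \;\Longrightarrow\; n > 12
```

In other words, under those assumptions, spending 5 hours simplifying a 5-minute job pays off within 5 years if the job crops up more than once a month.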

All too often, I’ve been in the break room with my work friends talking about support tickets when one of us has said, “This server’s died again; I know X looked at it before.”

Now, simplifying a task in this context can mean one of a few things, which leads nicely on to my next couple of points for eliminating technical debt.

Knowledge-Sharing & Documentation

Knowledge-sharing is something I’ve done a decent amount of in the past, but I definitely want to do more. One of my ambitions is to become a tech or team lead, so I’ll need to get used to it! I digress. Knowledge-sharing is what makes a good engineer great, and it really helps to bring a team up. For the person doing the sharing, it means less time spent on the job itself, but the time they do spend goes further.

Everyone has experienced the pain of gazing upon some config or code, questioning every line and muttering a negative side-comment, before realising it was their own. The next logical step would be to check their documentation for why they did what they did, but so often it doesn’t exist.

This is what distinguishes good managers from bad ones. In every place I’ve worked, finding and writing documentation has been easier in some and harder in others, and the pattern holds everywhere: the easier it is to write documentation, the more of it there will be.

I can’t stress this enough: if you own an IT company or are high up enough to call the shots, make documenting so easy that it would be harder not to. One company I saw had a system that made you note why you had accessed a particular server, and staff were trained to record on every support ticket exactly what they had done and seen: any console output, any error message. This impressed me significantly, and it still influences how I document to this day. I first saw that when I was 17!

Back to knowledge-sharing as a concept. People often mistakenly assume that knowledge-sharing means documentation and nothing else. Funnily enough, documentation is the smallest part of sharing knowledge; there’s only so much you can learn and share by writing down what you’ve done. Engineers are infamously unenthusiastic about reading and prefer to just do the job, so here’s what I propose:

Shadowing & Reverse-Shadowing

Shadowing is the most common method of knowledge-sharing aside from documentation, so it needs little explanation. Reverse-shadowing, where the junior engineer does the work while a more experienced engineer watches and supports, deserves more exposure.

I was lucky enough to get a lot of this when I started out: I tackled the tougher issues as a junior while having the support system I needed to really thrive. It can be hard to make time for, since you essentially need double the engineer time, but it’s a great way to get a new hire up to speed or to accelerate the growth of a junior engineer who just needs a bit more confidence.

Guided Escalations

“Guided escalations” is a term I’ve coined (I think), so I’ll explain this one. My colleagues in Triage often escalate issues to me that are quite similar to each other, and I informally walk them through these in a more guided format. Much like shadowing, I show them how I work through the issue itself, along with the best way to nip it in the bud before it becomes a problem.

I also show them the alternatives. For example, if a customer’s infrastructure needs an upgrade, I don’t just give them what I’d recommend based on the customer’s needs, but also what I’d do if budget weren’t a concern, or if the budget constraints were tighter.

Once this is done, I write a few more notes than I normally would and format them so they’re presentable. I’ll either email these to the team to save them time if the issue crops up again or, if it’s quite a big issue, turn them into a very simple presentation. Guided escalations like this really help to condense the countless Google searches into an applicable format.

Certification Study Groups

You can often feel slightly left out when you’re studying for a cert and none of your colleagues are. I was in a peculiar situation myself at my current place of work: while I had a good amount of infrastructure and AWS knowledge, I wasn’t actually AWS-certified until the beginning of April 2022.

In a smaller organisation filled with lots of heterogeneous components, setting up a study group like this would be quite difficult, but in a larger enterprise it’s absolute gold dust. Imagine an environment where your apprentices, your junior engineers and the best you’ve got are all aiming to achieve the same goal.

An environment where anyone can hop in to ask the questions they need, go over a practice question that’s bugging them, or just chat with someone who’s doing the same thing is a dream come true for a lot of techies.

While the above may seem like conjecture, putting knowledge-sharing at the forefront of your tech philosophy can really help to proactively remove the threat of an incoming tech-debt collection.

Automation

When I wrote “Simplifying a 5-minute job can often take 5 hours”, this is exactly what I was thinking of. As of writing, I’m about 20 hours into the project of building the infrastructure for this site with Terraform. I’d never used Terraform before, and cluster best practice in AWS was still relatively new to me. Looking back, it likely would have taken half the time, if that, to just create everything manually. However, I can reuse this code to deploy the same infrastructure as many times as I like, so in theory it could serve many customers, whereas the manual method means one hand-built configuration per customer.
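To illustrate the reuse argument, here’s a minimal sketch of how one configuration could be stamped out once per customer. It assumes the site’s resources have been moved into a local module at ./modules/site that accepts customer_name and domain_name inputs; the module path, variable names and example domains are all hypothetical, and module-level for_each needs Terraform 0.13 or later.

```hcl
# Hypothetical customer list: key = customer name, value = their domain.
variable "customers" {
  description = "Customer name => domain the site is served on (illustrative values)"
  type        = map(string)
  default = {
    "customer-a" = "customer-a.example.com"
    "customer-b" = "customer-b.example.com"
  }
}

# "./modules/site" is a hypothetical module wrapping the certificate, DNS and
# CloudFront resources; for_each creates one independent copy per map entry.
module "site" {
  source   = "./modules/site"
  for_each = var.customers

  customer_name = each.key
  domain_name   = each.value
}
```

Adding a customer then becomes a one-line change to the map rather than another round of clicking through the console.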

Take this code, for example:

provider "aws" {
  # Region and any other provider settings are not shown in this excerpt.
  default_tags {
    tags = {
      CreatedBy   = "Terraform"
      Environment = "Test"
      Owner       = "Liam Hardman"
      Application = "dev.liamhardman.cloud"
      Criticality = "Tier 5"
    }
  }
}

Setting tags once in my automation makes tagging not only quicker but also more standardised: every taggable resource this provider creates gets the same set of tags. Nobody wants to go and tag a bunch of network interfaces by hand, but this does it for you!
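One caveat if you use provider aliases, as the certificate example further on does with aws.virginia: default_tags belong to a single provider configuration, so resources created through an alias don’t inherit the tags set on the default provider. A minimal sketch of one way to keep them in step is to define the tags once in a local value and reference it from every provider block. The local name common_tags and the eu-west-2 region are assumptions for illustration.

```hcl
# Tags defined once and shared by every provider configuration.
locals {
  common_tags = {
    CreatedBy   = "Terraform"
    Environment = "Test"
    Owner       = "Liam Hardman"
    Application = "dev.liamhardman.cloud"
    Criticality = "Tier 5"
  }
}

# Default provider; the region here is an assumption for this sketch.
provider "aws" {
  region = "eu-west-2"

  default_tags {
    tags = local.common_tags
  }
}

# Aliased provider for resources that must live in us-east-1. Without its
# own default_tags block, nothing created through this alias gets the tags.
provider "aws" {
  alias  = "virginia"
  region = "us-east-1"

  default_tags {
    tags = local.common_tags
  }
}
```

Resource-level tags still win over default_tags for the same key, so a resource that sets its own Environment tag keeps that value instead of the default.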

Here’s another example:

# Public certificate for the site. CloudFront only accepts ACM certificates
# issued in us-east-1 (N. Virginia), which is what the aws.virginia alias is for.
resource "aws_acm_certificate" "cert" {
  domain_name       = "liamhardman.cloud"
  validation_method = "DNS"
  provider          = aws.virginia

  tags = {
    Environment = "test"
  }

  # Create the replacement certificate before destroying the old one.
  lifecycle {
    create_before_destroy = true
  }
}

# One DNS validation record per domain listed on the certificate.
resource "aws_route53_record" "example" {
  for_each = {
    for dvo in aws_acm_certificate.cert.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }
  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = "Z0748452APJIMZEC3Y9G"
}

# Alias (A) record at the zone apex pointing at the CloudFront distribution.
resource "aws_route53_record" "cloudfrontcname" {
  zone_id = "Z0748452APJIMZEC3Y9G"
  name    = ""
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.alb_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.alb_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

This is where things get a bit more lengthy, and this code is still quite old (it still uses a hard-coded zone ID, for example). Whenever my template is applied, it requests the certificate, creates the DNS records ACM needs to validate it, and points the domain at my CloudFront distribution, which keeps the site SSL-secured without me having to go through AWS’s many submenus.
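Two follow-ups could tidy this up, sketched below on the assumption that they sit in the same configuration as the code above. First, the hosted zone can be looked up by name with the aws_route53_zone data source instead of hard-coding its ID. Second, aws_acm_certificate on its own only requests the certificate; an aws_acm_certificate_validation resource makes Terraform wait until ACM has actually issued it using the validation records. The names primary and cert are my own choices for this sketch.

```hcl
# Look the public hosted zone up by name rather than hard-coding its ID.
data "aws_route53_zone" "primary" {
  name         = "liamhardman.cloud"
  private_zone = false
}

# Wait until ACM has issued the certificate, using the validation records
# created by aws_route53_record.example. It uses the same provider alias
# (and therefore region) as the certificate itself.
resource "aws_acm_certificate_validation" "cert" {
  provider                = aws.virginia
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [for record in aws_route53_record.example : record.fqdn]
}
```

With that in place, the hard-coded zone_id values could become data.aws_route53_zone.primary.zone_id, and the CloudFront distribution’s viewer_certificate block could reference aws_acm_certificate_validation.cert.certificate_arn, so the distribution only picks up the certificate once it has been issued.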

Automation will be your worst enemy when you start, but your best friend when you finish.

That’s a motto I go by quite a lot as a non-coder. I’m slowly getting better with programming concepts, and infrastructure as code (IaC) is a huge motivator for me.

Summary

That turned out longer than expected, so here are a few points to take away:

Eliminating technical debt must be done proactively
It must be done at both the management level and the technical level
Eliminating technical debt is achieved in the same way you avoid stagnation as a techie
Automation is a good thing, even if it’s pretty scary to get into
Document, Document, Document
