Column

Confessions of a Copy Paste Developer

Illustrated by Julia Hanke

Good fortune? Yeah, that’s not something that randomly lands in my lap.

Random incidents of misfortune? Oh yeah, I’m the guy who gets bird bombed while walking across the street on a beautiful day. The guy who gets the door closed in his face because “the guy before me was the last one.” Takeout places always forget some part of my order.

Bad stuff just seems to follow me around sometimes.

However...earned fortune? That’s another story. I mean, the fact that I'm currently a development team lead and have overall responsibility for our company’s Perl-based ETL (Extract, Transform, and Load) processing is something that I’m very proud of...and was something that I had to earn the hard way. The way that I managed to arrive at this position so quickly is a bit of a surprise to my peers and even to me...especially since, over the course of a single night, I was responsible for nearly destroying our business.

Just My Luck

You see, several years back, when I started out at Loadnetics Corp., the path laid out for me was that I would be a QA / business analyst and live entirely within that silo. My job was to take the requirements from the business, plug values into our project plan templates, and pass them on to our development team in Kerbleckistan. (A beautiful country that is world-renowned for its tech-savvy, fluent English-speaking population, with a cost of living less than half that of the US, thus passing the savings on to us.)

However, as fate would have it, after nearly a decade of outsourcing, the Kerbleckistanian folks had suddenly gotten so good that they were no longer as comparatively cheap as they once were. The bean-counters-that-be decided, about a year after I had started, that all the development work was coming back our way. After all, software development can’t be that hard, right?

For me, the departure was truly bittersweet. Over the past year, I had built a close relationship with Kevin, my counterpart in Kerbleckistan, who had been with the team assigned to our company’s account since the beginning, and who was also responsible for most of the main coding of the ETL processes.

He’d be fine—we were only one of the clients he was assigned to help and he’d just get swapped into another team. Meanwhile, I was petrified. I was about to inherit full responsibility for years of developer-level technical “stuff” that Kevin had been taking care of and that I only really knew at a conceptual level. As in, I was about to wholly own the entire codebase—dev/test/prod environments, troubleshooting, QA, requirements gathering—the works. Great.

“But don’t worry, mate," Kevin said. "Between us, there’s a secret in all the modules I’ve written.” Kevin paused and whispered, “They’re all basically the same thing.”

"Copypasta’s" Secret Sauce

Kevin explained that one huge advantage to working for a large software factory with tons of clients all over the world was that most of the problems that are brought our way have been solved already.

“It’s like this, mate. You and I have been working on the same processing bits and bobs for the past year, yeah? Remember that time we worked for a month straight on adding the rules for that US-78466 Regulation? Sure we had to work it out, and it was a tough one, but when I checked it into the source control, any junior dev could just nick my algorithm and look like a rock star.”

He continued. “Later, as they grow up to be a senior dev like me, they’ve been digging into the code for years and, since their contributions are based on what has trickled down, it all has the same...flavor...and, to compound things further, the contributions that they make get put back into the ecosystem and get passed on to the next generation of junior devs.”

I had to admit, even as naive as I was back then, I was suspicious of this practice. But it made sense—writing a new implementation for EVERY customer in the world was probably a waste of time. Almost immediately after this big reveal, I felt less anxious. “So, because I know how it’s supposed to work, basically anything I don’t know is right there in the code. Yeah, I think I can handle this!”

“Yeah, mate! That’s the spirit!”

Unraveling the Tagliolini

After a few more knowledge transfer sessions, the time came to part ways. I was sure that Kevin wouldn’t miss working with me, but I was petrified. He was my safety net. I felt kind of like how my son must have felt on the first day of kindergarten. I was away from the protection and buffer that being "not-the-developer" gave me. Right into the lion’s den.

First day in as a combined app and dev owner, the hits came in rapid fire, as if someone upstream knew to hold back until that day and then let ’er rip!

My first change was simple...rather, the wording itself seemed simple. One of our upstream data providers delivered a sudden requirement that they were going to be sending a new field as a Base64-encoded string that expanded out to a JSON object. All I cared about was fetching the “TransactionID” GUID out of that JSON, but I had to play by the data provider’s rules.

Thankfully, I quickly recalled that this exact requirement came up a while back and had been solved before for a different processing stream. As a result, I was able to quickly whip up this example and test it against a sample record we received and was relieved at the end result.

use Data::Dumper qw(Dumper);

my $FROM_JSON = qr{

(?&VALUE) (?{ $_ = $^R->[1] })
(?(DEFINE)(?<OBJECT>(?{ [$^R, {}] })\{(?: (?&KV)(?{[$^R->[0][0], {$^R->[1] => $^R->[2]}] })(?: , (?&KV)(?{[$^R->[0][0], {%{$^R->[0][1]}, $^R->[1] => $^R->[2]}] }))*)?\})
(?<KV>(?&STRING): (?&VALUE)(?{[$^R->[0][0], $^R->[0][1], $^R->[1]] }))(?<ARRAY>(?{ [$^R, []] })\[(?: (?&VALUE) (?{ [$^R->[0][0], [$^R->[1]]] })(?: , (?&VALUE) (?{ [$^R->[0][0], [@{$^R->[0][1]}, $^R->[1]]] }))*)?\])
(?<VALUE>\s*((?&STRING)|(?&NUMBER)|(?&OBJECT)|(?&ARRAY)|true (?{ [$^R, 1] })|false (?{ [$^R, 0] })|null (?{ [$^R, undef] }))\s*)(?<STRING>("(?:[^\\"]+|\\ ["\\/bfnrt])*")(?{ [$^R, eval $^N] }))
(?<NUMBER>(-?(?: 0 | [1-9]\d* )(?: \. \d+ )?(?: [eE] [-+]? \d+ )?)(?{ [$^R, eval $^N] }))

) }xms;


sub from_json {
  local $_ = shift;
  local $^R;
  eval { m{\A$FROM_JSON\z}; } and return $_;
  die $@ if $@;
  return 'no match';
}

sub DecodeBase64 
{ 
    my $d = shift; 
    $d =~ tr!A-Za-z0-9+/!!cd; 
    $d =~ s/=+$//; 
    $d =~ tr!A-Za-z0-9+/! -_!; 
    my $r = ''; 
    while( $d =~ /(.{1,60})/gs ){ 
        my $len = chr(32 + length($1)*3/4); 
        $r .= unpack("u", $len . $1 ); 
    } 
    $r; 
} 

$EncodedCustomerTransactionJSON = "eyJUcmFuc2FjdGlvbklEIjoiMzUyNTY2NTUzIiwiTWFnYXppbmUiOiJIdW1hbiBSZWFkYWJsZSIsIkZvdW5kZXIiOiJQZWsifQ==";

my $TransactionID = from_json(DecodeBase64($EncodedCustomerTransactionJSON))->{TransactionID};

print "Here it is: " . $TransactionID;

1 2
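For comparison, here is a minimal sketch of the same extraction using two modules that ship with Perl itself: MIME::Base64 and JSON::PP (the latter has been in core since Perl 5.14). The helper name transaction_id_from_field is made up for illustration, and the sketch assumes the field always decodes to a JSON object with a TransactionID key, as the sample record does.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);   # core module
use JSON::PP qw(decode_json);         # core module since Perl 5.14

# Hypothetical helper name; not part of the listing above.
sub transaction_id_from_field {
    my ($encoded) = @_;
    my $json   = decode_base64($encoded);   # Base64 text -> JSON text
    my $record = decode_json($json);        # JSON text -> hash reference; dies on malformed JSON
    return $record->{TransactionID};
}

# Same sample record as in the listing above.
my $encoded_sample = "eyJUcmFuc2FjdGlvbklEIjoiMzUyNTY2NTUzIiwiTWFnYXppbmUiOiJIdW1hbiBSZWFkYWJsZSIsIkZvdW5kZXIiOiJQZWsifQ==";

print "Here it is: ", transaction_id_from_field($encoded_sample), "\n";
```

Unlike the regex version, decode_json dies with a parse error on bad input instead of returning the string 'no match', so a caller would probably want to wrap it in eval for records that might be malformed.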

Being that I was still working through my “Introduction to Perl” online training, I was super nervous about integrating my changes with the mainline code. So, I was extra careful in my planning and authoring to make it seem like the code was always there. I felt such a sense of pride and accomplishment from such a small effort task.

If it weren’t for the step-by-step instructions that Kevin had sent me a few days before, I would have been struggling hard. At the end of the day, I was so grateful for the documentation and the ability to ask Kevin whatever questions I wanted leading up to this.

Unfortunately for me, now, I was completely on my own.

Deep, Dark Magic Rears Its Ugly Head

Skip ahead a few months, and I was actually doing pretty well for myself.

Still thinking like a business analyst, I solved problems the way you’d expect a business analyst with just enough development knowledge to be dangerous to solve them. Even so, when I realized I was truly standing on the shoulders of giants, I remembered Ian Malcolm’s warning in Jurassic Park: to paraphrase, I had gotten to where I was without needing any discipline, and I hadn’t earned any of the knowledge for myself. But this was software development, not genetic engineering. A totally different set of circumstances.

Besides, everything was going swimmingly. In nearly 90 percent of the cases where a new request for an add-on or new feature rolled in, I was able to "borrow" code from across the system and remix it to do what I needed. Need to transpose two characters at positions 78 and 79? No problem! I’m doing that for a few other apps already.
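A swap like that one is only a few lines of Perl. The sketch below is an illustration, not the code from our system: the helper name transpose_chars is invented, and it assumes the positions are counted from 1, the way a record layout document usually counts them (substr itself counts from 0).

```perl
use strict;
use warnings;

# Hypothetical helper: swap two single characters in a fixed-width record.
# Positions are 1-based here (an assumption); substr offsets are 0-based.
sub transpose_chars {
    my ($record, $pos_a, $pos_b) = @_;
    my $last = $pos_a > $pos_b ? $pos_a : $pos_b;
    die "record is shorter than position $last\n" if length($record) < $last;

    my $char_a = substr($record, $pos_a - 1, 1);
    my $char_b = substr($record, $pos_b - 1, 1);
    substr($record, $pos_a - 1, 1, $char_b);   # 4-argument substr replaces in place
    substr($record, $pos_b - 1, 1, $char_a);
    return $record;                            # the caller's original string is untouched
}

# Example: a made-up 82-character record with "AB" at positions 78 and 79.
my $sample_record = ("." x 77) . "AB" . "...";
my $swapped       = transpose_chars($sample_record, 78, 79);
print substr($swapped, 77, 2), "\n";
```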

You’d think I’d have turned into this functional programming wizard—everything had a function and nothing was floating around as a monolithic hunk of code. I wasn’t wise enough yet to use shared libraries, but I was still learning, remember? Save that kind of refactoring for when things die down. If they die down.

Riding high, I was feeling pretty good—and then disaster struck. I was asked to make a change in response to a brand-new requirement straight from management. Basically, every time a file was processed, we now had to save off the processing time along with other metadata such as file size, origin, destination data warehouse, and so on.

The idea was that, based on these results, we could more quickly isolate root causes of processing issues and maybe add throttle controls to keep processing going at a steady pace, even if the sky was falling...in a file transfer kind of way, that is. First things first, though: pretty dashboards, which of course required some math functions in Perl. OK, time to research. A quick perusal of Stack Overflow had exactly what I was looking for...well, I thought it did anyway.

Starting with the code that I had found, and a few more bits that I needed, I got to work in my laptop development environment and was quickly making real progress. A few days later, I had a working prototype, and after further testing and tweaking...voila! Spikes appeared on my Grafana dashboard.

Confident that things were running smoothly, I pushed the code to the prod servers and called it done...but there was one little wrinkle. The awesome graphs that I expected to see in production didn’t have any spikes. Or dots of any kind, for that matter.

A quick inspection of the logs for my ETL analytics script showed the same error over and over:

Can't locate Scalar/Util.pm in @INC 

At the time, I had no idea what this module did, but I ultimately traced it to a function named weaken in some of the code that I had found. Thinking I had simply copied a little too much, I commented out the weaken line, tested, and everything passed with flying colors. A redeploy of the quick change and I was back in business. The production dashboards lit up with processing times and other performance metrics, exactly as expected.

Once I sent the email notifying everyone of the dashboard URL, I packed up and went home for the evening. My whole bus ride home, I kept rereading my email. It felt so good. I slayed the beast. My dashboards were back in business.

Of course, that night came the urgent 3 a.m. call.

Welcome to the Cargo Cult

“Yeah, this is Brian from the desk. Sorry to bug you so late, or early I guess, but we’re getting alerts saying that SCORPETL01 dot DMZ through SCORPETL18 dot DMZ are all down. How do you want us to proceed? Send the ticket to your queue or...”

In my midsleep haze, I almost didn’t catch what was happening because of how the desk guy was trying to pronounce the server names as actual words, which he actually did an admirable job of, but after collecting myself, I managed to ask, “So...wha-what do the ETL dashboards all show?”

“Well, the line was getting lower and lower but then flatlined at zero. Do you need me to reach out to the server team?”

“Hell yeah, I do!” I nearly shouted as I rolled out of bed to get on my laptop.

I was pissed. The call didn’t come in as a result of the new and shiny dashboards. No, the help desk got an alert saying that resources on all the servers that handled ETL processing had been completely maxed out. But it wasn’t a fast ruination. No...it was a very slow bleed. Historical memory graphs showed utilization increasing slowly until each server became unresponsive, prompting a reboot...which mysteriously fixed the issue.

Thankfully, I had realized that the metrics script had SOMETHING to do with the outage (after all, it was the last thing to change). While I was able to keep calm in emails, I was in full-on panic mode. Not only had the higher-ups noticed that critical production processes had been ruined overnight, but there was also evidence of the ruin...right there in line chart form. But yeah, thank goodness for monitoring...just not my attempt at it.

Ultimately, as I researched the code and looked up memory leaks in Perl, the weaken function came up repeatedly. What had happened was that somewhere in the script, one of my code’s objects contained a circular reference. Perl frees memory by counting references, and the objects in a cycle keep each other’s counts above zero, so without that weaken call, Perl never noticed when that structure was no longer being used.
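The effect is easy to reproduce in miniature. The following is a sketch under stated assumptions, not the code from my metrics script: the Node class, the build_pair helper, and the destroy counter are all invented for the demonstration. Only weaken, from Scalar::Util, is the real thing.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;   # how many objects Perl has actually freed

# Hypothetical class, just enough to count destructor calls.
package Node {
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent and child that point at each other, then let them go out of scope.
sub build_pair {
    my ($use_weaken) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;                 # strong reference, parent to child
    $child->{parent} = $parent;                # strong reference back: a cycle
    weaken($child->{parent}) if $use_weaken;   # back-reference stops counting
    return;
}

build_pair(0);
my $freed_without_weaken = $destroyed;
print "without weaken: $freed_without_weaken of 2 objects freed\n";   # 0 of 2

$destroyed = 0;
build_pair(1);
my $freed_with_weaken = $destroyed;
print "with weaken: $freed_with_weaken of 2 objects freed\n";         # 2 of 2
```

In a short-lived script, a pair that is never freed hardly matters, because everything is reclaimed when the process exits. In a long-running process that builds such a pair for every file it handles, the memory would keep climbing, which matches the slow bleed on the memory graphs.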

Well, now I know that, but it took me a few days of deep researching and mostly trial-and-error-anything-that-might-work-oh-the-humanity in between status updates and attempts at assuring the masses that things would get better.

Which they ultimately did. In time. And more training and learning...a lot more.

Plot Twist?

Unrelated to this incident, Stack Overflow's family of sites was blocked by our upgraded internet firewall, making code thievery a little more difficult than it was when I first joined the team.

Most of us senior guys and gals are pretty much used to looking up sources of information elsewhere as everybody and their brother is writing about coding best practices nowadays. The junior devs coming on board are starting to give up their Stack Overflow habits as they gain deeper insight into how everything works and are ready to start relying on their skills first.

I mean, it’s a work in progress. If you absolutely need access, it’s still pretty easy if you are lucky enough to own a phone with internet access.

1: JSON Parser Source: https://www.perlmonks.org/?node_id=995856

2: Base64 Decode Source: https://www.perlmonks.org/bare/?node_id=524261

Mark Bowytz

author

Mark Bowytz is a connoisseur of IT failure and has been writing about this topic for more than ten years. His cautionary tales of development disasters are always adjusted to protect the identities of both the guilty and innocent alike. He also believes that failure is the best teacher as it shows that no person, process, or technology is ever truly perfect.

Julia Hanke

illustrator

Julia Hanke is an illustrator living in Warsaw, Poland. She has worked in creative agencies and currently works as a full-time freelance illustrator, mainly making illustrations for animations and web design. She is now shifting her focus to editorial and children's book illustration. You can follow her on Instagram @julia_hanke.