Grab any file from a past commit in Git

Have you ever been working on a project and realized that at some point you altered a file for the worse? The code was fine before, but you went in and changed it, thinking it was an improvement, and now you want it back.

If you’ve been tracking it with Git, as you really should with any project, it’s easy to hop in your time machine and retrieve the file. Simply enter the following command:

git log --follow -- public/functions/main.php

This will show you a history of commits where that file was changed, along with the author of the commit, the date of the commit, and the commit message. You’ll see something like this:

Let’s say that, looking back at the history of this file, you decide that “update all functions in main.php” is the last commit before you made the bad changes to main.php. Simply run the following command:

git checkout b43123 public/functions/main.php

This does exactly what it looks like: it checks out the file “main.php” from the commit whose hash starts with b43123. That version of main.php is now in your working directory (and staged, ready to commit). Review and test it, and if you’re happy with it, commit it and push it to your branch. Voila!

Readable Code is Better Than Concise Code

When coding in a competitive environment, programmers often try to outgun each other by writing code that’s as compact as possible. But conciseness is not always a good thing in programming. In fact, readable code is almost always better than concise code.


Tell me if you’ve been in this situation before: you write a piece of code to add to your company’s application. You push it up to the shared repository, satisfied that you’ve done a good job. But before the head coder adds it to the application, he or she “revises” it. They reduce the variable names to single letters like “a” and “b,” and replace lines of code with helper functions. When they’re done doing that, they combine lines, and add in hacky tricks to make it all work. All this in the name of saving space and making things concise. What’s left looks technically impressive, but it’s no longer readable code.


This can be discouraging. The revised code is unrecognizable, as if your contribution has been written right out of it. The changes feel like a critique of your skills. Oftentimes, from a philosophical standpoint at least, your code may have been better to begin with. In fact, it probably was. You wrote a block of readable code, and after the edits, well, it’s hard to see what it does.

Sometimes More is More

Compact code isn’t necessarily a bad thing, but readable code is better. If your code is too compact, other programmers will have trouble understanding it. Hell, YOU will have trouble understanding it. Say you come back to it after a few months to add a feature or refactor. It may take you a minute just to reacquaint yourself and remember what you did.

Take a look at the following code:

This dictionary in Python contains a first name and a last name that we want to print out in a message. But we don’t know that the “first name” and “last name” keys exist (imagine that we don’t have direct access to the dictionary, or that it’s dynamically created). So we’ll have to check for them before we write the message.

Code That is Too Long

Something like this will work:
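The original snippet isn’t reproduced in this copy, so here is a minimal sketch of what the long version could look like. The dictionary, its keys, and its values are assumptions made up for illustration, and the line count may not match the original exactly.

```python
# Hypothetical dictionary; the keys and values are assumptions for illustration.
person = {"first name": "Ada", "last name": "Lovelace"}

first_name = ""
last_name = ""

# Only read a key if it is actually there.
if "first name" in person:
    first_name = person["first name"]
if "last name" in person:
    last_name = person["last name"]

# Build the message piece by piece, then print it.
message = ""
message += first_name
message += " "
message += last_name
print(message)
```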

Pretty straightforward, right? We check that the keys exist in the dictionary, assigning them to variables if they do. Then we append them to a string, and print the string. This is certainly a block of readable code. Of course, it’s also bait for an edit-happy colleague, because it’s not exactly concise.

Code That is Too Concise

You can make it a lot more concise, like this:
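Again, the original one-liner is missing here, so this is only a sketch of the idea under the same assumptions (a made-up dictionary with “first name” and “last name” keys):

```python
# Hypothetical dictionary; the keys and values are assumptions for illustration.
person = {"first name": "Ada", "last name": "Lovelace"}

# Everything on one line: two conditional expressions inside a list, joined by a space.
print(' '.join([person['first name'] if 'first name' in person else '', person['last name'] if 'last name' in person else '']))
```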

This code does in one line what the previous snippet did in thirteen. Instead of storing anything in variables, we’re printing the message straight out, using ternary (or conditional) operators. Since gluing two conditional expressions together with + runs into operator precedence trouble in Python, we’ve passed them to the join() method, which uses a single space as a separator (the ‘ ‘ before .join).

But hang on…this is no longer readable code. If you didn’t know what this code did beforehand, it would have taken some effort to parse. It’s not that it’s too hard to deconstruct, given a little time, but consider a hundred lines like this. Things would get tiring quickly.

The Solution: Readable Code

If a line of code appears too compact on first read-through, consider splitting it up a bit. Let’s try rewriting the code one more time, this time making it concise, while still readable:
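A sketch of what that middle ground might look like, using the same made-up dictionary as an assumption (the original snippet is not reproduced here):

```python
# Hypothetical dictionary; the keys and values are assumptions for illustration.
person = {"first name": "Ada", "last name": "Lovelace"}

# One conditional expression per line, each stored in a clearly named variable.
first_name = person["first name"] if "first name" in person else ""
last_name = person["last name"] if "last name" in person else ""
print(first_name + " " + last_name)
```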

There we go. We’ve reduced the snippet down to three lines, which is good. But more importantly, it’s still readable code. In some ways, actually, it’s easier to read than the first example. Our two ternary operators are on their own lines, giving them some breathing room. This means that we don’t have to strain to see where they start and stop. Finally, the results are plugged into a print statement, and we can easily see where the variables came from.


As I’ve hopefully demonstrated here, readable code is almost always preferable to compact code. You may save a little memory or time by cutting down on lines, but unless you’re programming for embedded systems, or you’ve scaled to global proportions, it won’t matter much. It’s a lot more important that your code is understandable. So don’t stress yourself out making your code as terse as possible. Focus on readable code instead!

Coding Confidence – aka Ignoring the Hype

The following is a work of fiction, but it affects nearly all of us at some point. Hopefully, if you relate to it, you can take the message to heart and increase your coding confidence.

Anxiety Mounts

You’re sitting at your desk, and your team lead comes into the room and makes an announcement:

“Listen up everyone, I just had a team meeting with devops, and we’re switching to Laravel for the next project.”

You hold back a groan. Not only have you never worked with Laravel before, you’re not even that familiar with PHP. As feelings of anxiety start swirling in your chest, another developer chimes in:

“I take it we’re using 5.4.22 then?”

The team lead’s face lights up a little. “Yeah, unless they release a newer version between now and then.”

“Laravel’s a good one,” the developer says, not missing a beat. He picks a foam dart off the table and pitches it through a basketball hoop against the wall. “I’ve been keeping an eye on Laravel since they added model binding in 5.4.”

“Oh yeah, 5.4 was a huge jump forward,” the team lead agrees.

“Blade templating’s a nice touch, too,” the developer says.

“And route prefixing,” the team lead adds.


Coding Confidence Tanks

The team lead’s eyes brush over you and the other developers, and you look away, hoping he doesn’t realize you have no idea what they are talking about. Model binding? Blade templating? Route prefixing? As your anxiety builds, your coding confidence plummets. You start to imagine that you have never learned anything useful about programming in your life, let alone anything you could have an intelligent conversation about. Were you supposed to be keeping abreast of MVC release logs? Hell, maybe you really aren’t very good at what you do. You can’t help but entertain the notion, as you watch your two teammates return to their seats, that maybe you just aren’t cut out for programming.

Diagnosis & Solution

This is called impostor syndrome, and a lot of programmers suffer from it. Maybe you don’t have a formal degree in computer science, and feel like you just haven’t been trained to think about things from the right perspective. Or maybe you do have a degree, but you have coworkers who skipped college and started programming for large companies right away. Since programming is one of those professions that a lot of hiring managers don’t know how to test for, it would be easy for an impostor to slip through the cracks, right? This is exactly how good programmers lose coding confidence.


Okay, so take a deep breath and let’s try to see the big picture. Part of what is fueling your insecurities can actually bring you some relief, and here’s why: jargon in the programming world is completely out of control. Marketing teams hype up simple concepts in an effort to catch people’s attention. The terms your coworkers used (model binding, blade templating, and route prefixing) are all pretty simple concepts. Let’s take a look at them, one at a time.

Breaking it Down

Model binding is probably the most complex of the bunch, and all it means is that, when a user enters a URL into your web app, like “profile/user/22”, Laravel automatically looks up the record in the User table with an ID of 22 and gives you, the developer, access to the record. So if you wanted to show the user their email address and phone number when they entered “profile/user/22”, you wouldn’t have to write the code to query the User table for that record. That’s all model binding is.
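To make the idea concrete, here is a toy sketch in Python. This is not Laravel’s API and not how Laravel implements it; the USERS table, resolve(), and show_profile() are all hypothetical names invented for illustration. The only point is that the handler gets handed a record instead of a raw ID.

```python
# Hypothetical in-memory stand-in for a User table.
USERS = {22: {"email": "user22@example.com", "phone": "555-0122"}}

def resolve(url):
    """Turn 'profile/user/<id>' into the matching user record, or None."""
    prefix, _, user_id = url.rpartition("/")
    if prefix != "profile/user" or not user_id.isdigit():
        return None
    return USERS.get(int(user_id))

def show_profile(user):
    # The handler receives the record itself, so it never writes the lookup.
    return user["email"] + " / " + user["phone"]

user = resolve("profile/user/22")
print(show_profile(user))
```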

Blade templating is just a kind of templating system. Blade is one of the most intuitive, straightforward templating systems I have ever worked with. It’s a snap.

Route prefixing is a Laravel feature that lets you forgo the beginning of a URL when writing routes. Say you need three routes, “admin/profile/{id}”, “admin/dashboard/main”, and “admin/users/all”. First tell Laravel that the next group of routes will start with the prefix “admin”. Then write them as: “profile/{id}”, “dashboard/main”, and “users/all.” Laravel puts “admin” on the front of each one for you. It’s really simple, isn’t it? Not intimidating at all.
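Stripped of the jargon, the idea fits in a few lines. The sketch below is plain Python, not Laravel’s API; with_prefix() is a hypothetical helper that just shows what “putting admin on the front” amounts to.

```python
def with_prefix(prefix, routes):
    """Return the routes with the shared prefix added to each one."""
    return [prefix + "/" + route for route in routes]

# Write the short forms once; the prefix is added for you.
admin_routes = with_prefix("admin", ["profile/{id}", "dashboard/main", "users/all"])
print(admin_routes)
```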


Coding Confidence

Feel better? It’s really easy to get intimidated by jargon, but in hindsight, the conversation your coworkers had was pretty basic. They might as well have been talking about baseball:

“I’m a Yankees fan.”

“Yeah, but what about the Red Sox?”

“Don’t forget the Blue Jays.”

99% of the time, tech talk in offices is much simpler (and more boring) than it sounds. Maybe your coworker was just trying to impress the team lead with a barrage of intelligent sounding jargon. Maybe model binding and blade templating really do excite him. Either way, bring the focus back to yourself and don’t worry about him. You’ve got this, and you can build your coding confidence. You know much more than you think you do.

Streamlining Code (Without Fucking Things Up)

When people talk about streamlining code, usually they are talking about writing code that runs as fast as possible under heavy loads. Google’s search functionality, for example, handles millions of requests daily, so it’s to Google’s advantage to write the leanest, fastest code possible. Even a small bit of code, if written badly enough, could measurably slow down a production that large.

Of course, most of the code we write doesn’t handle data at the volume Google does. But even if you’re just looping through a large file, it is useful to streamline your code. Streamlining also tends to follow best practices, and makes your code easier to read, so doing so builds your skills as a programmer.

Take a simple task like opening a file and reading through its lines, one by one. Let’s say we want to streamline this. We might first look at something like this:
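The original snippet isn’t reproduced in this copy, so here is a sketch of what a five-line starting point could look like. The first few lines only create a throwaway sample file so the example runs on its own; the file name and contents are assumptions.

```python
import os
import tempfile

# Setup only (not part of the snippet): create a small sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

# The five-line starting point (hypothetical reconstruction).
file = path
f = open(file)
with f:
    for line in f:
        print(line, end="")
```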

Seems pretty succinct. But without much effort, we can reduce this five line snippet to four, like so:
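A sketch of the four-line version, under the same assumptions (a made-up sample file created only so the example can run by itself):

```python
import os
import tempfile

# Setup only (not part of the snippet): create a small sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

# Four lines: the file name goes straight into open().
f = open(path)
with f:
    for line in f:
        print(line, end="")
```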

All we did here was collapse lines 1 and 2, moving the string stored in the file variable into the open() function. It’s arguably a little easier to read, and now the machine running it won’t have to store an extra variable in memory, only to retrieve it one line later. We could take things a step further, like this:
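A sketch of the three-line version, again with a hypothetical sample file standing in for whatever file you are actually reading:

```python
import os
import tempfile

# Setup only (not part of the snippet): create a small sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

# Three lines: open() is called inside the with statement itself.
with open(path) as f:
    for line in f:
        print(line, end="")
```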

Now we’ve collapsed lines 1 and 2 again. That open() function returns a file object, which now, instead of being stored in a variable, is fed directly into the with statement. Okay, so our code is still legible, and we’ve eliminated another unnecessary variable. Can we take it one step further? Well, yeah. This is what we could do:
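A sketch of that two-line version (the sample file is an assumption, created only so this runs on its own). Notice that nothing here ever closes the file explicitly.

```python
import os
import tempfile

# Setup only (not part of the snippet): create a small sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

# Two lines: no with statement, so closing the file is left to chance.
for line in open(path):
    print(line, end="")
```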

This works, and produces the same output as the previous snippets. But is it better? No. We’ve lost something here. The with statement has been taken out, and we’re jumping straight to the for loop. The with statement makes sure that the file object created by the open() function gets handled correctly if something goes wrong. It’s essentially like putting the code in a try/finally statement, where finally calls close() on the file object, no matter what happens. It’s the same as writing this:
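Roughly speaking, the with version behaves like the try/finally sketch below. The sample file is again an assumption made so the example is runnable.

```python
import os
import tempfile

# Setup only (not part of the snippet): create a small sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

# Spelled-out equivalent of the with statement.
f = open(path)
try:
    for line in f:
        print(line, end="")
finally:
    # Runs whether or not the loop raised an exception.
    f.close()
```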

To put it another way, we reached peak efficiency with our third iteration, the one that calls open() inside the with statement. It’s at this point that we’d made the code as small and resource light as possible, without losing any of its original functionality.

Now, this is not to say that this snippet is necessarily the ideal for any situation. Maybe you want to leave in some of those variables for a reason. Say you’re processing several files with the same snippet, and want to pass in the file names to the same file variable each time. Or maybe you just personally find the use of some extra variables makes your code more legible. That’s fine, and I wouldn’t argue with you at all. The point is to decide what works best for you, while taking into account the load the machine running the code has to deal with, and as a result, how quickly it can complete its job. If you are passing it a file that is only a thousand lines long, maybe reducing the code by a line or two won’t matter. If you’re passing in a thousand files that are a thousand lines long, however, it might be worth your while!

Taking Risks in Code to Build Your Confidence

If you’re a programmer, you probably fall into the bad habit, at least sometimes, of comparing yourself unfavorably to other programmers. That guy or gal who sits across the office from you and seems to be able to tackle large and complex problems with ease. “Will I ever be as smart or as skilled as them?” you wonder. “What separates them from me?” While experience and talent no doubt play a role in programming, one thing that programmers often overlook is the power of confidence gained from risk taking. Since computers can only do exactly what we tell them to, and it’s easy to make small mistakes with large consequences, changing even a small part of your code can often be an anxiety inducing experience. While the best code is loosely coupled and appropriately abstract, all code is susceptible to being broken by change. So we’re conditioned to make as small a change as possible, sometimes manually altering the input we feed into a program instead of altering existing code, in order to mitigate the risk. The result is that we miss out on the opportunity to build confidence as programmers.

Take the following example: Your boss gives you a spreadsheet with 1,100 rows of data, each one representing a car that the company has recently purchased. Your boss wants you to enter each car into the database. Obviously, this is a job for a small script. No problem. You look at the first five rows and see this (we’re keeping things simple for the sake of example):

Row  Make        Model     Year
1    Toyota      Carolla   2001
2    Mitsubishi  Lancer    2010
3    Lexus       ES 300    2014
4    Volvo       XC90      2018
5    Kia         Optima    2016

Seems easy. The script needs to import the file, and then read it line by line. For each line it needs to create a record in the database, with the make, model, and year of the car. So this is your solution:
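The script in the original post is PHP and isn’t reproduced in this copy. To keep the examples here in one language, this is a rough Python sketch of the same idea. The in-memory CSV text stands in for cars.csv, and insert_into_db() is a hypothetical stub that just collects rows instead of talking to a real database.

```python
import csv
import io

# Stand-in for cars.csv (assumed layout: header row, then make, model, year).
CARS_CSV = """Make,Model,Year
Toyota,Carolla,2001
Mitsubishi,Lancer,2010
Lexus,ES 300,2014
Volvo,XC90,2018
Kia,Optima,2016
"""

inserted = []

def insert_into_db(row):
    """Hypothetical stub: a real script would write to the database here."""
    inserted.append(row)

reader = csv.reader(io.StringIO(CARS_CSV))
next(reader)  # skip the header row
for row in reader:
    insert_into_db(row)

print(len(inserted), "rows inserted")
```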

This script reads the file, cars.csv, into a while loop, and line by line, it calls the function “insertIntoDB()” and passes in an array representing the row it is currently looking at. You run the script on the test database and leave to grab a cup of coffee, confident that you have tackled the job quickly and painlessly. When you return to the office, however, your boss comes to you and says, “Actually, we need to make sure that only cars made on or after the year 2005 get entered into the database.” Okay, this is a little more complex, but an if statement should do the trick. Just wrap it around the call to insertIntoDB, so only those cars made on or after 2005 get through. Something like this:
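In the same hedged Python sketch (sample data in place of cars.csv, a hypothetical insert_into_db() stub in place of the real PHP function), the extra if statement could look like this:

```python
import csv
import io

# Stand-in for cars.csv (assumed layout: header row, then make, model, year).
CARS_CSV = """Make,Model,Year
Toyota,Carolla,2001
Mitsubishi,Lancer,2010
Lexus,ES 300,2014
Volvo,XC90,2018
Kia,Optima,2016
"""

inserted = []

def insert_into_db(row):
    """Hypothetical stub: a real script would write to the database here."""
    inserted.append(row)

reader = csv.reader(io.StringIO(CARS_CSV))
next(reader)  # skip the header row
for row in reader:
    # Naive check: assumes every year is a clean four-digit number.
    if int(row[2]) >= 2005:
        insert_into_db(row)

print(len(inserted), "rows inserted")
```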

Is it really going to be this easy? Your programming instincts are telling you no. You take a closer look at the spreadsheet, and just as you suspected, there are some complications further down the line:

Row  Make        Model     Year
6    Honda       Accord    15
7    Volkswagen  Tiguan    08
8    Tesla       Model S   2018
9    Ford        Focus     2000
10   Nissan      Frontier  "2014"

Lines one through five were fine, but lines six and seven have a year format of XX, and line ten has a year with quotes around it, like this: “XXXX”. Your new code will not work in any of these instances. You scan through the next hundred records and see that every fifth record or so uses the XX format for the year, and every tenth record uses quotation marks.

Well, now we have a decision to make. We can solve the problem in a non-programmatic, less risky, and less confidence building way, or we can do it the programmatic, riskier, more confidence building way. Let’s take a look at both.

The Less Risky, Initially Easier Way

Here are the first ten lines of the spreadsheet, all together this time:

Row  Make        Model     Year
1    Toyota      Carolla   2001
2    Mitsubishi  Lancer    2010
3    Lexus       ES 300    2014
4    Volvo       XC90      2018
5    Kia         Optima    2016
6    Honda       Accord    15
7    Volkswagen  Tiguan    08
8    Tesla       Model S   2018
9    Ford        Focus     2000
10   Nissan      Frontier  "2014"

We’ve already identified the problem, we have differently formatted years. 1,100 rows isn’t nothing, but we could go through the spreadsheet ourselves (as in, opening up a spreadsheet app and tabbing through each cell one by one) and manually change the oddly formatted years to be four digit numbers with no quotes around them. Sure, it would take a little bit of time, but you wanted to take a break from thinking too hard today, anyways. Just put your headphones on and mindlessly plug through it. It will be like taking a mini break from your job, while still appearing productive, you tell yourself. Sounds tempting, and as an added bonus, you will be able to see to it yourself that all of the years are formatted properly, reducing the risk that the output in the database will not be as expected. But let me argue here that it is the very thing that makes this appealing, the idea of kicking back and wasting an hour doing something repetitive but easy, that makes it a bad idea.

Ask yourself, “What do computers do well that humans don’t?” Not a lot. In addition to being able to make calculations and logic based comparisons (just another kind of calculation), we humans can write symphonies, books on philosophy, and yes, even tech blogs (cue applause). But there are some things we’re not as good at. While a math whiz can bang out a hundred calculations in a minute, that’s nowhere near as fast as even an older model computer can. Plus a computer can do these calculations for weeks on end without rest, while humans get tired after the first couple of hours. Think of everyday things computers do for us, like compressing video, sending it halfway around the world, and decompressing it on our friend’s computer screen so fast that it appears we are speaking to them in real time. At its heart this too is just a computer doing a large amount of simple calculations very, very fast.

I know this might be pretty obvious to a lot of people reading this, but the point I want to drive home is that to be a good programmer, you have to always keep this in mind, and let computers do this one thing they know how to do very well. In this particular instance, that means asking the computer to loop through a list of 1,100 records and format the years itself. You could do the formatting yourself, and there are obvious advantages to it, but you are missing the opportunity to build skills that will let you tackle the next data set that is, say, 10,000 records long, much too big for you to complete efficiently. And more importantly, the risk you take by doing things the hard way will build your confidence as a programmer, which is a much greater gain than saving yourself a little time and stress in the short run.

The Riskier, Initially Harder Way

So we’ve decided to do things the slightly riskier way, and we’re back to our original problem: we’ve got three different year formats and we need to compare each one to an integer, 2005, and filter out the ones that are lower in number (or earlier in year, whatever you prefer). Here’s a snippet of our data set again:

Row  Make        Model     Year
1    Toyota      Carolla   2001
2    Mitsubishi  Lancer    2010
3    Lexus       ES 300    2014
4    Volvo       XC90      2018
5    Kia         Optima    2016
6    Honda       Accord    15
7    Volkswagen  Tiguan    08
8    Tesla       Model S   2018
9    Ford        Focus     2000
10   Nissan      Frontier  "2014"

The question before us is, what are the (potentially code breaking) features we need to add to our script? Well, we need a function that does essentially what we were thinking of doing by hand a minute ago. It needs to trim the potential quotation marks off the year, and, since some year values will be two digits, it needs to grab only the last two digits of the year for the comparison. A few lines of code should get us set up:
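The original function is PHP and is missing from this copy, so here is a rough Python sketch of the same steps. The function name is hypothetical. Python’s strip('"') plays the role of trimming the quotation marks, and a slice grabs the last two characters.

```python
def made_2005_or_later(year):
    """Sketch: True if the year looks like 2005 through 2018."""
    year = year.strip('"')   # drop quotation marks from either end
    last_two = year[-2:]     # keep only the last two digits
    # Assumes last_two is numeric; anything else raises ValueError here.
    return True if 5 <= int(last_two) <= 18 else False

for sample in ["2001", "2010", "15", "08", '"2014"', "2000"]:
    print(sample, made_2005_or_later(sample))
```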

rtrim() and ltrim() remove the quotation marks. Then substr() gives us just the last two digits of the year. Finally, a ternary operator returns true if the last two digits are at least 05, for 2005, and no higher than 18, for 2018 (it’s unlikely that any cars on the spreadsheet were made between 1905 and 1918, so this should be fine).

Note that while in this example, adding an extra function is not likely to break your code in any way that is difficult to repair, when you have an existing script that is say, 5,000 lines long, adding a new function and calling it from within the code could mess things up in unexpected ways. But it’s still worth doing!

So what did we do here that a computer couldn’t do just as easily? We asked ourselves a philosophical question. “What is a year, in this context?” We decided, “A year is at minimum a two digit number, with no quotation marks.” But what if there are some values in the spreadsheet like this:

$05

Or this:

(2015)

After a little thought, we decide that we want to strip off all non-numerical characters from our string, not just quotation marks. Here is a function that will do that:
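The PHP function itself isn’t reproduced in this copy. As a hedged stand-in, this Python sketch strips every non-digit character and then matches the last two digits with the same pattern, \d{2}$. The function name is an assumption, and re.sub() and re.search() are Python’s tools for the job, not the ones the original used.

```python
import re

def last_two_digits(year):
    """Sketch: strip non-digits, then return the final two digits or ''."""
    digits = re.sub(r"\D", "", year)       # "$05" -> "05", "(2015)" -> "2015"
    match = re.search(r"\d{2}$", digits)   # two digits at the end of the string
    return match.group(0) if match else ""

for sample in ["$05", "(2015)", '"2014"', "2022956", "unicorn"]:
    print(repr(sample), "->", repr(last_two_digits(sample)))
```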

Now we’ve accounted for not just the unusual formats we’ve come across already, but also a large set of potential formats that we’ve anticipated might exist, based on what we’ve seen so far. This sort of intuitive problem solving is what we are still much better at doing than computers, so it’s beneficial that we’ve taken on this task ourselves, and asked the computer to do the repetitive stuff. Humans for the win! The preg_match() function just compares a string, in this case $year, to a regular expression, “/\d{2}$/”, and puts the results into an array called $matches. For more on preg_match(), see the PHP manual, which also covers the regular expression syntax.

Now, there could be some nonsense year values in the spreadsheet that our function hasn’t accounted for: “2022956,” or “unicorn,” for example. But who knows what these are supposed to mean in the first place? The above function will take the number “2022956” and return “56,” which is as good a guess as any, and it will return a blank string for “unicorn,” which is also as good a guess as any, because there’s no such thing as “unicorn” year (at least as far as I know). What we have written will probably work for 95% of our records, and possibly for 100% of them. So that’s a pretty good spot to be in. Here is our final code:
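The final PHP script is missing from this copy, so what follows is a Python sketch of how the pieces might fit together, not the original code. The sample CSV text stands in for cars.csv, insert_into_db() is a hypothetical stub, and the 05 to 18 range check is carried over from the earlier discussion.

```python
import csv
import io
import re

# Stand-in for cars.csv, including the awkward year formats.
CARS_CSV = '''Make,Model,Year
Toyota,Carolla,2001
Mitsubishi,Lancer,2010
Lexus,ES 300,2014
Volvo,XC90,2018
Kia,Optima,2016
Honda,Accord,15
Volkswagen,Tiguan,08
Tesla,Model S,2018
Ford,Focus,2000
Nissan,Frontier,"2014"
'''

inserted = []

def insert_into_db(row):
    """Hypothetical stub: a real script would write to the database here."""
    inserted.append(row)

def last_two_digits(year):
    """Strip non-digits, then return the final two digits or ''."""
    digits = re.sub(r"\D", "", year)
    match = re.search(r"\d{2}$", digits)
    return match.group(0) if match else ""

def made_2005_or_later(year):
    """True if the last two digits fall between 05 and 18 inclusive."""
    two = last_two_digits(year)
    return two != "" and 5 <= int(two) <= 18

# QUOTE_NONE keeps the literal quotation marks so the cleanup code sees them.
reader = csv.reader(io.StringIO(CARS_CSV), quoting=csv.QUOTE_NONE)
next(reader)  # skip the header row
for row in reader:
    if made_2005_or_later(row[2]):
        insert_into_db(row)

print(len(inserted), "rows inserted")
```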

What’s The Point, Even?

So you might reasonably ask, “What was the point, really, of doing things the harder way? 1,100 records isn’t that many. In the time it took to conceptualize what a year is, and write a function that covered as many potential formats for it as possible, we could have just changed the values in the spreadsheet ourselves, and with less expended brain power.”

The answer is, while I understand the urge to do it this way, that won’t ever build your confidence as a programmer. Anyone can go through a spreadsheet and manually change numbers. By doing things programmatically, and enhancing your code as you go, instead of mitigating risk by avoiding making changes to existing code, you are taking on some manageable risk and building your confidence as a programmer, and confidence is a valuable thing to have. Today you’ve tweaked some code that affects 1,100 records. Tomorrow you will be comfortable adjusting code that accepts 2,000 records, and after that 5,000 and 10,000. You will be better at reading large blocks of source code, and seeing what the effects are of changing a specific part of the code to make it more robust. And if you mess up, you will be better in the future at finding where the problem occurred in the code, and how to fix it.

The more you are willing to risk making mistakes, the better you’ll become at programming, and the stronger and more error free your code will be. That knowledge and confidence pays off in the end, because you’re really not as different from that person across the office from you as you might think.

Why use Lambdas in Python?

There is no doubt that lambdas in Python are fun to write. They’re like regular expressions lite: they give us a certain endorphin rush of accomplishment when we complete one. If you need a refresher, here is a simple example of a function being rewritten into a lambda:
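The original snippet is missing from this copy. A minimal sketch, assuming add2 simply adds two to its argument (the body is a guess based on the name):

```python
# A regular function declaration.
def add2(n):
    return n + 2

# The same behavior as a lambda, assigned to the variable c.
c = lambda n: n + 2

print(add2(3), c(3))
```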

 

Doesn’t look that special, right? I mean, all we’re doing is replacing a function declaration called add2 with a function literal assigned to the variable c. We haven’t made the code lighter by any significant degree, nor have we made it more readable. So the question is, even though they’re cool, why use lambdas in the first place?

The answer comes when we want to call a function only once, especially inside another function. Take a function that takes another function as a parameter, the map function:
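A sketch of the named-function version, with a made-up list of numbers and the same assumed add2 body as before:

```python
def add2(n):
    return n + 2

numbers = [1, 2, 3]

# map() calls add2 on every item; list() collects the results.
result = list(map(add2, numbers))
print(result)
```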

 

This can be written much more easily with a lambda:
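A sketch of the lambda version, using the same made-up numbers (the n + 2 body is still an assumption):

```python
numbers = [1, 2, 3]

# The function is defined right where it is used, so there is no separate name to track down.
result = list(map(lambda n: n + 2, numbers))
print(result)
```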

 

I mean, that’s simpler, right? We only need to use the function once, so why not pass a lambda to the map function, as opposed to writing it on one line and referencing it in another? By writing a one line lambda function, we are:

  • making our code leaner
  • avoiding clogging up the namespace with functions we only use once
  • making our code more readable

The importance of this last point can’t be overstated, even for small scripts. Imagine we didn’t use a lambda for this map function, that we declared a function instead and passed it into map:

Then we came back later, and not thinking about it (an easy thing to do when you’re returning to code you previously wrote) added some lines of code in-between the function declaration and the map call:

Suddenly we’re left with a function call to add2 on line 302, and we have no idea what it does. We have to scan all the way up to line 2 to find out.

So the short answer to the question, “Why even use lambdas?” is “it’s just fun.” The longer answer is “it cuts down on line real estate, keeps one-off functions out of the namespace, and makes our code more readable and less prone to bugs in the future.”
