What is a meaningful variable name?
Quick, what does the following code do?
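As a hypothetical stand-in for the kind of code in question, the sketch below uses terse names and unexplained constants. The shapes, the input values and the numbers 12.5 and 150 are arbitrary assumptions chosen only for illustration.

```python
# Hypothetical example: short names and magic numbers (all values arbitrary).
n, m, l = 2, 2, 3
X = [[[1.0] * l for _ in range(m)] for _ in range(n)]
new_array = [[[0.0] * l for _ in range(m)] for _ in range(n)]

for i in range(n):
    for j in range(m):
        for k in range(l):
            temp_value = X[i][j][k] * 12.5
            new_array[i][j][k] = temp_value + 150
```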
It’s impossible to tell, right? If you were trying to modify or debug this code, you’d be at a loss unless you could read the author’s mind. Even if you were the author, a few days after writing this code you might not remember what it does because of the unhelpful variable names and use of magic numbers. Working with data science code, I often see examples like the one above (or worse): code with cryptic variable names and unexplained constants.
As I’ve grown from writing research-oriented data science code for one-off analyses to production-level code (at Cortex Building Intelligence), I’ve had to improve my programming by unlearning practices from data science books, courses and the lab. There are significant differences between deployable machine learning code and how data scientists learn to program, but we’ll start here by focusing on two common and easily fixable problems: unhelpful variable names and unnamed magic numbers.
Both these problems contribute to the disconnect between data science research (or Kaggle projects) and production machine learning systems. Yes, you can get away with them in a Jupyter Notebook that runs once, but when you have mission-critical machine learning pipelines running hundreds of times per day with no errors, you have to write readable and understandable code. Fortunately, there are best practices from software engineering we data scientists can adopt, including the ones we’ll cover in this article.

Note: I’m focusing on Python since it’s by far the most widely used language in industry data science. Python also has some naming rules of its own.
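As a rough illustration of Python naming conventions, the sketch below follows PEP 8, Python's style guide: lowercase with underscores for variables and functions, all capitals for named constants, and CamelCase for classes. The specific names are made up for the example.

```python
# Named constant: ALL_CAPS_WITH_UNDERSCORES
SECONDS_PER_MINUTE = 60


# Class: CamelCase
class TemperatureSensor:
    def __init__(self, reading_celsius):
        # Variables and attributes: lowercase_with_underscores
        self.reading_celsius = reading_celsius


# Function: lowercase_with_underscores
def minutes_to_seconds(duration_minutes):
    return duration_minutes * SECONDS_PER_MINUTE
```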
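Naming Variables

There are three basic ideas to keep in mind when naming variables: the name should describe the information the variable represents; code is read far more often than it is written, so prioritize ease of reading over speed of writing; and standard naming conventions let you make one global decision instead of many local ones.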
What does this look like in practice? Let’s go through some improvements to variable names.

X and Y

If you’ve seen these several hundred times, you know they commonly refer to features and targets in a data science context, but that may not be obvious to other developers reading your code. Instead, use names that describe what these variables represent.

Value

What does the value represent? Name the variable after the quantity it actually holds.

Temp

Even if you are only using a variable as a temporary value store, still give it a meaningful name. Perhaps it is a value where you need to convert the units, so in that case, make the units explicit in the name.
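A minimal sketch of these renamings, assuming a house-price dataset and a distance conversion. The data, the names and the conversion helper are all hypothetical.

```python
# Hypothetical data; names and values are illustrative only.
house_features = [[3, 1], [4, 2]]    # instead of X: rooms and floors per house
house_prices = [250_000, 410_000]    # instead of y: sale price per house

MILES_PER_KILOMETER = 0.621371  # approximate conversion factor


def kilometers_to_miles(distance_kilometers):
    # Instead of "temp": the name states the quantity and its units.
    distance_miles = distance_kilometers * MILES_PER_KILOMETER
    return distance_miles
```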
usd, aud, mph, kwh, sqft

If you’re using abbreviations like these, make sure you establish them ahead of time. Agree with the rest of your team on common abbreviations and write them down. Then, in code review, make sure to enforce these written standards.

tp, tn, fp, fn

Avoid machine learning-specific abbreviations. These values represent true positives, true negatives, false positives and false negatives, so spell them out.

xs and ys

These are often used for plotting, in which case the values represent x- and y-coordinates, so name them accordingly.

The above are examples of prioritizing ease of reading code over how quickly you can write it. Reading, understanding, testing, modifying and debugging poorly written code takes far longer than doing the same with well-written code. Overall, trying to write code faster by using shorter variable names will actually increase your program’s development and debugging time! If you don’t believe me, go back to some code you wrote six months ago and try to modify it. If you find yourself having to decipher your own past code, that’s an indication you should be concentrating on better naming conventions.

What Makes a Bad Variable Name?

Most problems with naming variables stem from two things: a desire to keep variable names short, and a direct translation of formulas into code.
On the first point, while languages like Fortran did limit the length of variable names (to six characters), modern programming languages have no such restriction, so don’t feel forced to use contrived abbreviations. Don’t use overly long variable names either, but if you have to favor one side, aim for readability.

With regard to the second point, when you write an equation or use a model (and this is a point schools forget to emphasize), remember that the letters or inputs represent real-world values!
Let’s see an example that makes both mistakes. Say we have a polynomial equation for finding the price of a house from a model. You may be tempted to write the mathematical formula directly in code:
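A direct translation of such a formula might look like the sketch below. The coefficients and inputs are hypothetical, and nothing in the code says what any of them mean.

```python
# Hypothetical coefficients and inputs (illustrative values only).
m1, m2, b = 10_000, 5_000, 50_000
x1, x2 = 3, 2

temp = m1 * x1 + m2 * (x2 ** 2)
final = temp + b
```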
This is code that looks like it was written by a machine for a machine. While a computer will ultimately run your code, it’ll be read by humans, so write code intended for humans! To do this, we need to think not about the formula itself (the how) but about the real-world objects being modeled (the what). Let’s write out the complete equation. (This is a good test to see if you understand the model.)
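One way the equation could read with descriptive names is sketched here, assuming the model's inputs are the number of rooms and the number of floors. Those inputs, the coefficient names and their values are assumptions made for illustration.

```python
# Hypothetical model coefficients (illustrative values only).
PRICE_PER_ROOM = 10_000
PRICE_PER_FLOOR_SQUARED = 5_000
EXPECTED_MEAN_HOUSE_PRICE = 50_000


def predict_house_price(room_count, floor_count):
    # Each term now names the real-world quantity it contributes.
    house_price = (
        PRICE_PER_ROOM * room_count
        + PRICE_PER_FLOOR_SQUARED * floor_count ** 2
    )
    return house_price + EXPECTED_MEAN_HOUSE_PRICE
```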
If you are having trouble naming your variables, it means you don’t know the model or your code well enough. We write code to solve real-world problems, and we need to understand the problem our model represents.
Descriptive variable names let you work at a higher level of abstraction than a formula, helping you focus on the problem domain.

Other Variable Naming Considerations

One of the important points to remember when naming variables is: consistency counts. Staying consistent with variable names means you spend less time worrying about naming and more time solving the problem. This point is relevant when you add aggregations to variable names.
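Aggregations in Variable Names

So you’ve got the basic idea of using descriptive names. The next question is what to do when a variable holds an aggregation such as an average, minimum or maximum. Settle on rules for how aggregations are abbreviated and where they go in the name, and follow them every time so that your aggregated variables all share one pattern.

A tricky point comes up when you have a variable representing the number of an item. You might be tempted to use a generic "number" name, but that could mean either the total count of items or the index of one particular item. To avoid ambiguity, use a name that says which of the two you mean.

Loop Indexes

For some unfortunate reason, typical loop variables have become i, j and k. Use names that describe what is being iterated over instead. This is especially useful when you have nested loops so you don’t have to remember which letter stands for which dimension. (In Python, if you aren’t using a loop variable, then use _ as a placeholder.)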
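A small sketch of these naming points, assuming a team convention of putting the aggregation abbreviation at the end of the name. All names and data are hypothetical.

```python
from statistics import mean

building_heights_meters = [12.0, 30.5, 18.25]  # hypothetical data

# Aggregation at the end of the name, one agreed abbreviation each.
height_avg = mean(building_heights_meters)
height_min = min(building_heights_meters)
height_max = max(building_heights_meters)

# "count" is a total; "index" points at one specific item.
building_count = len(building_heights_meters)

# Descriptive loop indexes make nested loops easier to follow.
grid = [[1, 2, 3], [4, 5, 6]]
grid_total = 0
for row_index in range(len(grid)):
    for column_index in range(len(grid[row_index])):
        grid_total += grid[row_index][column_index]

# Underscore for a loop variable that is never used.
placeholder_labels = ["unknown" for _ in range(building_count)]
```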
All of these rules stick to the principle of prioritizing read-time understandability over write-time convenience. Coding is primarily a method for communicating with other programmers, so give your team members some help in making sense of your computer programs.

Never Use Magic Numbers

A magic number is a constant value without a variable name. I see these used for tasks like converting units, changing time intervals or adding an offset.
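For instance, lines like the following bury their constants in the arithmetic. The inputs and the numbers 1.61, 60 and 150 are hypothetical, chosen to mirror a unit conversion, a time-interval change and an offset.

```python
# Hypothetical inputs (illustrative only).
unconverted_value = 10.0
quantity = 120.0
value = 5.0

# What do 1.61, 60 and 150 mean? The code does not say.
final_value = unconverted_value * 1.61
final_quantity = quantity / 60
value_with_offset = value + 150
```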
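(These variable names are all bad, by the way!) Magic numbers are a large source of errors and confusion because only the author knows what they represent, and changing the value means finding and editing every place it is used.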
Instead of using magic numbers in this situation, we can define a function for conversions that accepts the unconverted value and the conversion rate as parameters:
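A minimal sketch of such a function, assuming the conversion is between US and Australian dollars. The function and parameter names are hypothetical.

```python
def convert_usd_to_aud(price_usd, aud_per_usd):
    """Convert a price in US dollars to Australian dollars."""
    price_aud = price_usd * aud_per_usd
    return price_aud
```

The rate is passed in by the caller, so nothing about its value is hidden inside the function.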
If we use the conversion rate throughout a program in many functions, we could define a named constant in a single location:
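One possible shape for this is sketched below. The constant's name and value are assumptions; a real rate would come from a trusted source and change over time.

```python
# Hypothetical rate, defined once (illustrative value only).
USD_TO_AUD_CONVERSION_RATE = 1.5


def convert_usd_to_aud(price_usd):
    return price_usd * USD_TO_AUD_CONVERSION_RATE


def total_order_aud(item_prices_usd):
    # Every caller shares the same constant, so updating the rate is a one-line change.
    return sum(convert_usd_to_aud(price_usd) for price_usd in item_prices_usd)
```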
(Remember, before we start the project, we should establish with our team that usd stands for US dollars and aud for Australian dollars.)

Here’s another example:
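As one hypothetical illustration, a sampling interval can be passed in as a parameter so that neither the interval nor the number of daily observations is hard-coded. The function and constant names are assumptions.

```python
MINUTES_PER_DAY = 24 * 60


def observations_per_day(interval_minutes):
    """Number of evenly spaced observations in one day for a given interval."""
    if MINUTES_PER_DAY % interval_minutes != 0:
        raise ValueError("interval must divide a day evenly")
    return MINUTES_PER_DAY // interval_minutes
```

With this in place, switching from 15-minute to 5-minute data changes an argument, not the body of every function.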
Using a named constant defined in a single place makes changing the value easier and more consistent.

As a real-world example of the perils of magic numbers, in college, I worked on a research project with building energy data that initially came in 15-minute intervals. No one gave much thought to the possibility this could change, and we wrote hundreds of functions with the magic number 15 (or 96 for the number of daily observations). This worked fine until we started getting data in five and one-minute intervals. We spent weeks changing all our functions to accept a parameter for the interval, but even so, we were still fighting errors caused by the use of magic numbers for months.

Real-world data has a habit of changing on you. Conversion rates between currencies fluctuate every minute, and hard-coding specific values means you’ll have to spend significant time rewriting your code and fixing errors. There is no place for magic in programming, even in data science.

The Importance of Standards and Conventions

The benefit of adopting standards is that they let you make a single global decision instead of many local ones. Instead of choosing where to put the aggregation every time you name a variable, make one decision at the start of the project and apply it consistently throughout. The objective is to spend less time on concerns only peripherally related to data science (naming, formatting, style) and more time solving important problems (like using machine learning to address climate change).

If you are used to working by yourself, it might be hard to see the benefits of adopting standards. However, even when working alone, you can practice defining your own conventions and using them consistently. You’ll still get the benefits of fewer small decisions, and it’s good practice for when you inevitably have to develop on a team. Anytime you have more than one programmer on a project, standards become a must!
You might disagree with some of the choices I’ve made in this article, and that’s fine! It’s more important to adopt a consistent set of standards than to settle the exact choice of how many spaces to use or the maximum length of a variable name. The key point is to stop spending so much time on accidental difficulties and instead concentrate on the essential difficulties. (Fred Brooks, author of the software engineering classic The Mythical Man-Month, has an excellent essay on how we’ve gone from addressing accidental problems in software engineering to concentrating on essential problems.)

Now let’s go back to the initial code we started with and fix it up.
We’ll use descriptive variable names and named constants.
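A sketch of what such a cleaned-up version could look like, assuming the data is a three-dimensional pixel array indexed by row, column and color channel. The array sizes, the constant names and their values are hypothetical.

```python
# Hypothetical constants (arbitrary illustrative values).
PIXEL_NORMALIZATION_FACTOR = 12.5
PIXEL_OFFSET = 150

row_count, column_count, color_channel_count = 2, 2, 3
original_pixel_array = [
    [[1.0] * color_channel_count for _ in range(column_count)]
    for _ in range(row_count)
]
transformed_pixel_array = [
    [[0.0] * color_channel_count for _ in range(column_count)]
    for _ in range(row_count)
]

for row_index in range(row_count):
    for column_index in range(column_count):
        for color_channel_index in range(color_channel_count):
            original_pixel_value = original_pixel_array[row_index][column_index][color_channel_index]
            normalized_pixel_value = original_pixel_value * PIXEL_NORMALIZATION_FACTOR
            transformed_pixel_array[row_index][column_index][color_channel_index] = (
                normalized_pixel_value + PIXEL_OFFSET
            )
```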
Now we can see that this code is normalizing the pixel values in an array and adding a constant offset to create a new array (ignore the inefficiency of the implementation!). When we give this code to our colleagues, they will be able to understand and modify it. Moreover, when we come back to the code to test it and fix our errors, we’ll know precisely what we were doing.

Clarifying your variable names may seem like a dry activity, but if you spend time reading about software engineering, you realize that what differentiates the best programmers is the repeated practice of mundane techniques such as using good variable names, keeping routines short, testing every line of code and refactoring. These are the techniques you need to take your code from research or exploration to production-ready and, once there, you’ll see how exciting it is for your data science models to influence real-life decisions.

How do you write a meaningful variable name?

Use intention-revealing names: the name of a variable, function or class should be enough to convey its purpose. Name functions as verbs. Name classes as nouns. Use meaningful distinctions. Use pronounceable names. Use searchable names. Avoid encodings.

What is a good name for a variable?

A good variable name should be clear and concise, and be written in English. A general coding practice is to write code with variable names in English, as that is the most likely common language between programmers.
Why is it important that variable names are meaningful?

Each variable is named so it is clear which variable is being used at any time. It is important to use meaningful names for variables. For example, pocketMoney = 20 means that the variable 'pocketMoney' is being used to store how much pocket money you have.