A (brief) crash course in Python

Published in

Analytics Vidhya

7 min read · May 12, 2021

We needed to check our bundle files to make sure every label had been added to all related files. Since this script (or whatever we ended up using) was going to run from our Jenkins pipeline, my first reaction was to go for bash. But why not make it interesting and do it with Python? Having zero experience with it whatsoever, let’s dive right in.

First of all, let’s install Python itself, and then PyCharm Community, since it’s free and looks promising. It has the same interface as IntelliJ IDEA, so it’s not too jarring a change.

Now let’s open PyCharm, choose New Project, and create one with the configs below.

Create a main.py just to see the structure of the main script at this point.

So now, we should have the below bits and not much else.

configuration to run/debug

When you run it via the green run button above, you should see the following

Our mission was to check some files in a directory, but that directory changes with every build, so we need to receive the directory as a parameter. Let’s edit our run configuration:

add whatever directory you want to check for bundles

Now, finally, the code. Open up main.py and paste in the code below, and then let’s talk about it a bit.

import sys


def print_hi(name):
    print(f'Hi, {name}')


if __name__ == '__main__':
    print_hi('PyCharm')
    try:
        directory_name = sys.argv[1]
        print(directory_name)
    except IndexError:
        print('Please pass directory_name')

Here we import the sys module (you can see the details with Ctrl+left-click on sys in the import statement). In short, it gives us access to the command-line arguments. Its docstring reads:

"""
This module provides access to some objects used or maintained by the interpreter and to functions that interact strongly with the interpreter.
Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
...

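The if bit is the script’s entry point: the code under it runs only when the file is executed directly, not when it is imported (it is a plain block of code, not a class). The print_hi('PyCharm') call goes to the print_hi function a little above it and prints the passed parameter. Also notice the two blank lines between the import, the function definition and the main block, and the empty line at the end of the file. These are PEP 8 style conventions that PyCharm flags rather than rules the interpreter enforces, but Python people care about these things :D. Indentation, on the other hand, is part of the syntax, since there are no {} to mark where a block starts and ends like in some other high-level languages.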

def print_hi(name):
    print(f'Hi, {name}')
if __name__ == '__main__':
    print_hi('PyCharm')
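
To see why the guard matters, here is a minimal sketch (the describe helper is made up for illustration). A module’s __name__ is '__main__' only when the file is run directly; when the file is imported, it is the module’s own name.

```python
import json  # any importable module will do for the comparison


def describe(module_name):
    """Hypothetical helper: say how a module with this __name__ was loaded."""
    if module_name == '__main__':
        return 'run directly'
    return f'imported as {module_name}'


if __name__ == '__main__':
    # Only reached when this file is executed, not when it is imported.
    print(describe(__name__))
    print(describe(json.__name__))
```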

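Also, notice this bit. sys.argv[1] is the first argument after the script name; if none was passed, the lookup raises an IndexError and we print a hint instead.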

    try:
        directory_name = sys.argv[1]
        print(directory_name)
    except IndexError:
        print('Please pass directory_name')
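
The same idea can be wrapped in a small function, which makes it easy to try without touching the run configuration. This is only a sketch: first_argument is a name I made up, and it catches IndexError specifically so that unrelated errors are not swallowed.

```python
import sys


def first_argument(argv):
    """Return the first CLI argument, or None when none was passed.

    argv[0] is the script path, so the first real argument is argv[1].
    """
    try:
        return argv[1]
    except IndexError:  # only the "argument missing" case, nothing else
        return None


if __name__ == '__main__':
    directory_name = first_argument(sys.argv)
    print(directory_name or 'Please pass directory_name')
```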

When you run main.py, you should get:

Let’s remove the print_hi function and its call from main, and then create a new file in a subdirectory so we don’t write all our code in main.py like savages. I assume this logic holds in Python as well.

I’ve created a LoopHelper like this. Notice the __init__.py file, which marks the directory as a package and lets us import our new file from main (and from other files if need be).
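
If you want to poke at the package mechanics without touching the project, the sketch below builds a throwaway package in a temp directory and imports a module from it. The names demo_pkg, helpers and greeter are invented for the demo. Since Python 3.3, a directory without __init__.py can also be imported as a namespace package, so treat the file as the conventional way to mark a regular package rather than a hard requirement.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # demo_pkg/helpers/greeter.py, with an __init__.py at each package level
    helpers = Path(tmp) / 'demo_pkg' / 'helpers'
    helpers.mkdir(parents=True)
    (Path(tmp) / 'demo_pkg' / '__init__.py').touch()
    (helpers / '__init__.py').touch()
    (helpers / 'greeter.py').write_text("def greet():\n    return 'hello'\n")

    sys.path.insert(0, tmp)  # make the temp dir importable
    try:
        greeter = importlib.import_module('demo_pkg.helpers.greeter')
        result = greeter.greet()
    finally:
        sys.path.remove(tmp)

print(result)
```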

The contents of LoopHelper.py look like this:

import os


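def loop_through(dir_name):
    directory = os.fsencode(dir_name)  # bytes version of the path
    for file in os.listdir(directory):  # entries come back as bytes too
        filename = os.fsdecode(file)  # back to str
        if filename.endswith(".properties"):
            # join with the original str path; mixing bytes and str raises TypeError
            print(os.path.join(dir_name, filename))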

So basically, we’ve created a function loop_through with a parameter dir_name. We use os.fsencode to encode dir_name to bytes, list and loop through the entries in that directory, decode each entry back to a str filename, and then do nothing beyond printing the path of each .properties file (yet).

os.fsencode() encodes the given filename to bytes using the filesystem encoding, with the ‘surrogateescape’ error handler (older Python versions used ‘strict’ on Windows); bytes input is returned unchanged.
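
Here is a small self-contained sketch of the same listing logic that can be run against a temp directory. The function name list_properties and the sample filenames are made up for the demo; it returns the paths instead of printing them so the result is easy to inspect.

```python
import os
import tempfile


def list_properties(dir_name):
    """Return the paths of the .properties files directly inside dir_name."""
    found = []
    for entry in os.listdir(os.fsencode(dir_name)):  # bytes in, bytes out
        filename = os.fsdecode(entry)  # decode each entry back to str
        if filename.endswith('.properties'):
            found.append(os.path.join(dir_name, filename))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for name in ('labels_tr.properties', 'labels_en.properties', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()  # create empty files
    names = [os.path.basename(path) for path in list_properties(tmp)]

print(names)
```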

Let’s use our new helper from main by adding this import to the very top. The import path is basically the folder structure followed by the file name.

from classes.helpers import LoopHelper
...
"""
usage is like;
FileName.ClassName.whateverMethod
"""
LoopHelper.LoopHelper().loop_through(sys.argv[1])

So with this info, let’s modify our main.py as below, and while we are at it, let’s also rename it to something like bundle_check.py.

import argparse

from classes.helpers import LoopHelper

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='''Usage python PATHTO/bundle_check.py ROOTDIR ''')
    # single required positional argument; argparse reports an error if it is missing
    parser.add_argument('path', help='root dir to check for bundles')
    args = parser.parse_args()

    print("Bundle check in dir: " + args.path)
    LoopHelper.LoopHelper().loop_through(args.path)

A few extra things here. The last two lines are the ones we already know: they report which directory we got as an argument and pass it to the loop_through method.

The lines above them set up the help text. If someone runs our script like python bundle_check.py -h, this pops up:
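
The parser can also be exercised without a real command line, which is handy while experimenting. A minimal sketch, assuming a single required positional argument named path; build_parser is a made-up helper, and parse_args is given an explicit list in place of the real arguments.

```python
import argparse


def build_parser():
    """Hypothetical helper that builds the bundle_check argument parser."""
    parser = argparse.ArgumentParser(
        description='Usage python PATHTO/bundle_check.py ROOTDIR')
    parser.add_argument('path', help='root dir to check for bundles')
    return parser


parser = build_parser()
args = parser.parse_args(['some/dir'])  # stand-in for the real command line
print(args.path)
print(parser.format_help())  # the same text that -h prints
```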

Now, the main workhorse is the LoopHelper class. I’ve broken it down into several parts.

import os
import mmap
import sys


class LoopHelper:
    """
    Main Helper class containing the logic for bundle checking.

    Depending on the failed value;
    return sys.exit 1 if failed is true,
    return sys.exit 0 if failed is false
    """

    TR_POSTFIX = '_tr.'
    EN_POSTFIX = '_en.'
    BUNDLE_EXT = 'properties'
    TR_BUNDLE_EXT = TR_POSTFIX + BUNDLE_EXT
    EN_BUNDLE_EXT = EN_POSTFIX + BUNDLE_EXT
    failed = 0
...

The LoopHelper class has several constants that we use for file matching. failed is the one class field we will be playing around with. If we get a bundle mismatch, it is set to 1 so the app exits with code 1; otherwise the app exits with code 0.

Now let’s look at the methods:
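
One thing worth knowing about a field declared like failed: it is a class attribute, and assigning to it through self creates an instance attribute that shadows it for that object only. The sketch below uses a made-up Checker class, not the real LoopHelper, to show the effect.

```python
class Checker:
    """Hypothetical stand-in for LoopHelper with the same kind of flag."""

    failed = 0  # class attribute: the default seen by every instance

    def mark_failed(self):
        # Assigning through self creates an instance attribute that
        # shadows the class attribute for this object only.
        self.failed = 1


first = Checker()
second = Checker()
first.mark_failed()
print(first.failed, second.failed, Checker.failed)
```

So the flag has to be read from the same instance that ran the check.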

def analyse_files(self, root_dir):
def nitpick(self, source_file, target_file, target_postfix):

analyse_files takes the directory to work on and uses os.walk to recursively walk through all its subdirectories. For each file, it joins the directory path with the filename to get a complete path.

It then checks if the filename matches the bundle extension of _tr.properties and generates the expected _en equivalent. Both these file paths are sent to the nitpick method to get checked. Same applies for the opposite situation.

If something goes wrong it sets the failed flag and lists the problematic files.

def analyse_files(self, root_dir):
    """
    Prepares for analyses of files in work dir
    Checks for existence of TR_BUNDLE_EXT file, tries to find related EN_BUNDLE_EXT file and sends
    it to nitpick method and vice versa.

    Sets failed to 1 if any label or file is missing.

    :param root_dir: the root dir to check for bundles
    """
    for dir_path, dir_name, files in os.walk(root_dir):
        for file in files:
            complete_path = os.path.join(dir_path, file)
            try:
                if file.endswith(self.TR_BUNDLE_EXT):
                    tr_file_path = ""
                    en_file_path = complete_path.replace(self.TR_POSTFIX, self.EN_POSTFIX)
                    self.nitpick(complete_path, en_file_path, self.EN_POSTFIX)  # check for missing tr items in en

                if file.endswith(self.EN_BUNDLE_EXT):
                    en_file_path = ""
                    tr_file_path = complete_path.replace(self.EN_POSTFIX, self.TR_POSTFIX)
                    self.nitpick(complete_path, tr_file_path, self.TR_POSTFIX)  # check for missing en items in tr
            except Exception as generic:
                self.failed = 1
                print(generic)
                print('Related files:\n' + (en_file_path or complete_path)
                      + '\n' + (tr_file_path or complete_path) + '\n')
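
The walk-and-pair step can be tried in isolation with the sketch below. find_pairs is a made-up name, and it differs slightly from the method above: it swaps the postfix at the end of the filename only, whereas str.replace on the complete path would also touch a directory name that happens to contain '_tr.'.

```python
import os
import tempfile

TR_POSTFIX = '_tr.'
EN_POSTFIX = '_en.'
BUNDLE_EXT = 'properties'


def find_pairs(root_dir):
    """Yield (tr_path, expected_en_path) for every _tr.properties file."""
    tr_suffix = TR_POSTFIX + BUNDLE_EXT
    for dir_path, _dir_names, files in os.walk(root_dir):
        for file in files:
            if file.endswith(tr_suffix):
                # Cut off '_tr.properties' and append '_en.properties'.
                en_name = file[:-len(tr_suffix)] + EN_POSTFIX + BUNDLE_EXT
                yield (os.path.join(dir_path, file),
                       os.path.join(dir_path, en_name))


with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, 'module_a')
    os.makedirs(sub)
    for name in ('labels_tr.properties', 'labels_en.properties'):
        open(os.path.join(sub, name), 'w').close()
    pairs = list(find_pairs(tmp))
    en_exists = os.path.exists(pairs[0][1])  # checked before cleanup

print(os.path.basename(pairs[0][1]), en_exists)
```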

Now, the nitpick method is something else entirely. It takes a source file and a target file, plus the target postfix (because I can’t be bothered to work it out again). It opens the source to loop through its lines, and the target as a read-only memory map in which to search for each label. If it can’t find a label in the target, it sets failed but keeps going. This shows devs all of the missing labels at once and spares our pipeline from constant rebuilds.

def nitpick(self, source_file, target_file, target_postfix):
    """
    Tries to find labels in source file in target file
    :param source_file:
    :param target_file:
    :param target_postfix: The target postfix to easily find missing label harboring files
    """
    with open(source_file) as source, open(target_file) as target:
        # read-only memory map of the target file (en or tr, depending on direction)
        target_map = mmap.mmap(target.fileno(), 0, access=mmap.ACCESS_READ)
        for bundle_line in source:  # bundle lines in source file
            if bundle_line.find('=') >= 0:
                bundle_key = bundle_line[0:bundle_line.find('=')].strip()  # get bundle key

                if target_map.find(bundle_key.encode()) == -1:
                    # key not found anywhere in target file, set as failed
                    print('Key-val pair -> ' + os.path.basename(
                        source_file) + ' : ' + bundle_key + ' not found in ' + target_postfix)
                    self.failed = 1
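
One of the parts that could be improved: the search above looks for the key anywhere in the target file, so a key like save counts as found if the target only contains save.button. A possible alternative, sketched here and not what the script does, is to collect the keys of both files into sets and compare them. The function names and sample bundles are made up, and only the key=value form is handled.

```python
def bundle_keys(text):
    """Collect the keys of all 'key=value' lines, skipping comments."""
    keys = set()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(('#', '!')):
            continue  # blank line or .properties comment
        if '=' in stripped:
            keys.add(stripped.split('=', 1)[0].strip())
    return keys


def missing_keys(source_text, target_text):
    """Keys defined in the source bundle but absent from the target."""
    return sorted(bundle_keys(source_text) - bundle_keys(target_text))


tr_bundle = 'save=Kaydet\nsave.button=Kaydet\ncancel=Iptal\n'
en_bundle = '# english labels\nsave.button=Save\n'

print(missing_keys(tr_bundle, en_bundle))
# A plain substring search would not notice that 'save' is missing:
print(en_bundle.encode().find(b'save') != -1)
```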

That’s about it, actually. The script runs with the command below:

python bundle_check.py DIR_TO_CHECK

It can be integrated into a pipeline as below, since it exits with code 1 on failure.

python bundle_check.py DIR_TO_CHECK && echo 'OK' || echo 'Not OK'
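
For the && / || chain to work, the script has to turn the failed flag into its exit status at the very end. The snippets above do not show that wiring, so the sketch below is an assumption about how it could look. HelperStub, finish and exit_code_of are invented names, and SystemExit is caught only so the exit code can be inspected inside a single process.

```python
import sys


class HelperStub:
    """Hypothetical stand-in for LoopHelper; only the failed flag matters."""

    def __init__(self, failed):
        self.failed = failed


def finish(helper):
    # Exit status 0 means success to the shell, anything else is a failure.
    sys.exit(1 if helper.failed else 0)


def exit_code_of(helper):
    """Run finish() and report the exit code it asked for."""
    try:
        finish(helper)
    except SystemExit as exc:
        return exc.code


print(exit_code_of(HelperStub(failed=1)))
print(exit_code_of(HelperStub(failed=0)))
```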

There are parts that can be improved, and I would love to hear your thoughts about them. Thanks for reading.


Written by Yiğit İrez