bobby_dreamer

Python profiling

Tags: python, cloud-shell, profiler, GCS
2 min read

When your programs grow or you start processing more data than usual, you may notice things slowing down. The cause could simply be memory pressure on your laptop at that moment, so first retry after closing the other applications you have running in parallel. If it's still slow, the cause is most likely your code. The easiest way to spot a slow section of code is to set up timers around it.
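As a minimal sketch of such timers, the snippet below wraps two sections of code in a small context manager built on `time.perf_counter`. The `timer` helper and the two workloads are hypothetical, chosen only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Print how long the enclosed block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f} s")


# Hypothetical sections of a program, each timed separately
with timer("build list"):
    squares = [i * i for i in range(200_000)]

with timer("sum list"):
    total = sum(squares)
```

Comparing the printed durations shows which section deserves attention first.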

If you want to know how much time the entire program takes, record the time at the start and at the end of the script and print the difference.
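For the whole program, one possible approach is to take a timestamp before and after the main entry point. Here `main` is a placeholder workload, not code from the original program.

```python
import time


def main():
    # Placeholder workload standing in for the real program
    return sum(i * i for i in range(100_000))


start_time = time.perf_counter()
result = main()
elapsed_seconds = time.perf_counter() - start_time

print(f"Program finished in {elapsed_seconds:.4f} seconds")
```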

It's not always a good strategy to guess or follow a gut feeling when identifying performance problems; the results can surprise you.

If you need more information than these timers provide, such as the number of calls and the execution time per function, you should look into Python profilers.

What is Python profiling? As per the docs:

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

Here we are going to look at two profilers:

  1. cProfile
  2. line_profiler

# cProfile

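This profiler comes with the standard Python installation. The easiest way to profile a program is to run it through the cProfile module from the command line (python -m cProfile followed by the script name).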
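The same profiler can also be driven from inside a script. The sketch below profiles a hypothetical `work` function with `cProfile.Profile` and prints the ten most expensive entries sorted by cumulative time; the function names are made up for the example.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Deliberately naive loop so it shows up in the profile
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    return [slow_square_sum(20_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
results = work()
profiler.disable()

# Collect the report in memory, sorted by cumulative time
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```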

This can produce a huge amount of output if your program is big.

| Field | Description |
| --- | --- |
| ncalls | How many times the code was called |
| tottime | Total time it took (excluding the time of other functions) |
| percall | How long each call took |
| cumtime | Total time it took (including the time of other functions) |

You can redirect the output to a text file, or dump the stats to a file by using the -o option followed by an output filename in the command.

By using the output file option, you get additional information such as:

  • Callers - Which functions called it
  • Callees - Which functions it called

The catch is that this output file is binary, and to read it you need to use pstats. All of this sounds pretty tedious. Luckily, I found a package called cprofilev which reads this output file and starts a server on localhost, where you can see all the profiling information.
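For reference, reading the binary file with pstats directly looks roughly like the sketch below. It dumps a profile of two hypothetical functions (`driver` and `helper`) to a temporary file named `output.prof`, loads it back, and prints the stats along with the caller and callee tables.

```python
import cProfile
import os
import pstats
import tempfile


def helper(n):
    return sum(range(n))


def driver():
    return [helper(10_000) for _ in range(3)]


# Hypothetical output location for the binary stats file
stats_path = os.path.join(tempfile.mkdtemp(), "output.prof")

profiler = cProfile.Profile()
totals = profiler.runcall(driver)   # runs driver() under the profiler
profiler.dump_stats(stats_path)     # same binary format that -o writes

stats = pstats.Stats(stats_path)
stats.strip_dirs().sort_stats("tottime")
stats.print_stats(5)                # top 5 entries by own time
stats.print_callers("helper")       # which functions called helper
stats.print_callees("driver")       # which functions driver called
```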

Author of the tool: ymichael

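The steps are: install the package with pip install cprofilev, generate the stats file with cProfile's -o option, and then start the viewer against that file (cprofilev -f followed by the file path).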

Now go and check http://localhost:4000/. You will see the stats; you can sort on the fields and click on the function name links.

(Screenshot: cProfilev - Main screen)
(Screenshot: cProfilev - Function)

# line_profiler

The problem here was that I couldn't install line-profiler on Windows 10; the installation failed with an error. The first thing that came to my mind was "I should give up on line profiling since it is not getting installed", and my next thought was "Will this work in Cloud Shell?".

I use a Windows 10 machine, and to get around the above issue I took the steps below.

  1. Logged in to my Google Cloud console

  2. Went to Cloud Storage and created a new bucket

  3. Dragged and dropped the files into the bucket (data files, Python file)

  4. Opened Cloud Shell and ran the commands to set up the environment.

    You can get the project-id by clicking the drop-down list at the top; a small window opens for selecting the project, and it shows the project-id. (Screenshot: GCP - Project ID)

  5. Installed all the required libraries, since there are only a few

  6. Tested with a small program (test-lp.py) to see if line_profiler works

  7. Made changes to the source code for line_profiler. First, I added the import for line_profiler.

    In my original code, I call the function directly.

    To profile it, I modified the code so that the call goes through the profiler.

  8. This generates a lot of output, so it's better to save it to a file

  9. When everything was done here, I copied the outputs and files back to Cloud Storage.

  10. Cloud Shell also has a quota, so watch out for it. (Screenshot: Cloud Shell - Quota)
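Steps 7 and 8 above can be sketched roughly as follows. This is not the original source: `compute_trend` is a hypothetical stand-in for the profiled function, the report filename is made up, and the snippet assumes the third-party line_profiler package, falling back to a plain call when it is not installed.

```python
import os
import tempfile

try:
    from line_profiler import LineProfiler  # third-party: pip install line_profiler
except ImportError:
    LineProfiler = None  # fall back to a plain call if it is not installed


def compute_trend(values):
    """Hypothetical stand-in for the function being profiled."""
    running_total = 0
    averages = []
    for count, value in enumerate(values, start=1):
        running_total += value
        averages.append(running_total / count)
    return averages


data = list(range(1, 1001))

if LineProfiler is None:
    # Original style of call, without profiling
    result = compute_trend(data)
else:
    profiler = LineProfiler()
    profiled_compute_trend = profiler(compute_trend)  # wrap the function
    result = profiled_compute_trend(data)             # call it as before
    # The report can be long, so write it to a file instead of the console
    report_path = os.path.join(tempfile.gettempdir(), "line_profile_output.txt")
    with open(report_path, "w") as report:
        profiler.print_stats(stream=report)
```

Wrapping the function this way leaves the rest of the program untouched, so the profiling lines are easy to remove afterwards.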

And you are done.

I find Cloud Shell handy for doing things like this.

This is what the line_profiler output looks like; I started working on the lines of code with large Time values.
Details are here.

# Related articles

  1. BSE Weekly trend analysis using Pandas & Numpy
  2. Github : BSE Trend analysis using Pandas(Notebook)
  3. Github : BSE Trend analysis using Numpy(Notebook)