Python Advanced Features

This post discusses some Python features that are, at least in my experience, less frequently used or discussed but still critical. The focus is on Python as a programming language rather than on data science tools.

Pass by Object Reference

Python is neither pass by value nor pass by reference; instead, arguments are passed by object reference. Everything in Python is an object, and variables are just names (object references) bound to those objects. Objects can be classified as immutable or mutable.

  1. Immutable: objects whose value cannot change

    1. Tuples
    2. Booleans
    3. Numbers
    4. Strings
  2. Mutable: objects whose value can change in place

    1. Dictionaries
    2. Lists
    3. User-defined objects

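Every argument is passed the same way: the function receives a reference to the caller's object. If the object is mutable, in-place changes made inside the function are visible to the caller, which looks like pass by reference. If the object is immutable, it cannot be changed in place; any "modification" creates a new object and rebinds only the local name, which looks like pass by value.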
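The difference between rebinding a name and mutating an object can be seen with built-in types alone. The following is a minimal sketch; the function names rebind, mutate, and increment are hypothetical and exist only for illustration.

```python
def rebind(items):
    # items + [4] builds a NEW list; the assignment rebinds only the local name.
    items = items + [4]
    return items


def mutate(items):
    # append changes the list object that the caller also refers to.
    items.append(4)
    return items


def increment(n):
    # ints are immutable: n += 1 creates a new int and rebinds the local name.
    n += 1
    return n


original = [1, 2, 3]

rebound = rebind(original)
print(original, rebound is original)   # the caller's list is unchanged

mutated = mutate(original)
print(original, mutated is original)   # the caller's list has been modified

count = 10
increment(count)
print(count)                           # the caller's int is unchanged
```

In both list cases the function receives the same kind of reference; what differs is whether the function body changes the object itself or only points its local name somewhere else.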

An example with pd.DataFrame is below. In dropCol1, the name df inside the function initially refers to the original DataFrame. df.drop(columns='A') returns a new DataFrame without column A, and the assignment rebinds the local df to that new object at a different memory location. The original data is not affected.

In the second function, dropCol2, the local df also refers to the original DataFrame, so df inside the function and df outside it refer to the same object. Because drop is called with inplace=True, the change is made to that object itself, so the original data is modified as well.

import pandas as pd

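# dropCol1 rebinds the local name df to a new DataFrame; the caller's object is untouched
def dropCol1(df):
    df = df.drop(columns='A')
    return df

# dropCol2 modifies the caller's DataFrame in place
def dropCol2(df):
    df.drop(columns='A', inplace=True)
    return df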

# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Testing dropCol1
print("Original DataFrame for dropCol1:")
print(df)
dropCol1(df)
print("DataFrame after applying dropCol1:")
print(df)

# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Testing dropCol2
print("\nOriginal DataFrame for dropCol2:")
print(df)
dropCol2(df)
print("DataFrame after applying dropCol2:")
print(df)

Variable Number of Positional and Keyword Arguments

def crazyprinter(*args, **kwargs):
    # args is a tuple of the positional arguments
    for arg in args:
        print(arg)
    # kwargs is a dict of the keyword arguments
    for k, v in kwargs.items():
        print("{}={}".format(k, v))

crazyprinter("hello", "cheese", bar="foo")
# hello
# cheese
# bar=foo

Decorators

Decorators are really just a pretty way to wrap functions using functions that return functions.

from functools import wraps
def logging(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(result)
        return result
    return wrapper

@logging
def foo(bar, baz):
    return bar + baz - 42

# equivalent to...
def foo(bar, baz):
    return bar + baz - 42

foo = logging(foo)
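To see what functools.wraps contributes, it helps to compare a decorator that uses it with one that does not. This is a small sketch; the names log_result, passthrough, add, and sub are hypothetical and chosen only for this illustration.

```python
from functools import wraps


def log_result(func):
    @wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result
    return wrapper


def passthrough(func):
    # Same wrapping pattern, but without @wraps.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@log_result
def add(a, b):
    """Return a + b."""
    return a + b


@passthrough
def sub(a, b):
    """Return a - b."""
    return a - b


total = add(2, 3)
print(add.__name__, add.__doc__)   # metadata of the original function
print(sub.__name__, sub.__doc__)   # metadata of the inner wrapper
```

Without @wraps, the decorated function reports the inner wrapper's name and loses its docstring, which makes debugging and generated documentation harder to read.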

lru_cache

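lru_cache is a decorator in functools. It can be very effective with recursion because the wrapper handles memoization itself: it keeps a dictionary of results keyed by the call arguments and returns the stored value when the same arguments are seen again. With maxsize=None the cache grows without bound; with a finite maxsize the least recently used entries are discarded.

The Fibonacci example below shows this well. A simple recursion without memoization recomputes the same values many times. Here, cache_info() shows that the body of the decorated function ran only 16 times (misses=16), once per distinct argument; the other 28 calls were answered from the cache (hits=28).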

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

>>> [fib(n) for n in range(16)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

>>> fib.cache_info()
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)
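To make the saving concrete, one way is to count how often the function body actually runs with and without the cache. The sketch below is illustrative only: the names fib_naive, fib_cached, and call_counts are hypothetical, and the counter is added purely for measurement.

```python
from functools import lru_cache

# Counts how many times each function body is executed.
call_counts = {"naive": 0, "cached": 0}


def fib_naive(n):
    call_counts["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    # This body only runs on a cache miss.
    call_counts["cached"] += 1
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


naive_result = fib_naive(15)
cached_result = fib_cached(15)

print(naive_result, call_counts["naive"])
print(cached_result, call_counts["cached"])
print(fib_cached.cache_info())
```

Both versions return 610 for n = 15, but the naive recursion executes its body 1973 times while the cached version executes it 16 times, once for each distinct argument from 0 to 15.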

heapq

Defaultdict

Yiming Zhang
Quantitative Researcher Associate, JP Morgan