Python Advanced Features
This post discusses some Python features that are less frequently used or discussed (at least in my experience) but pretty critical. The focus is on Python as a programming language rather than on data science tools.
Pass by Object Reference
Python is neither pass by value nor pass by reference; instead, it is pass by object reference. Everything in Python is an object, and variables are just names (object references) for them. These objects can be classified as immutable or mutable.
Immutable: objects whose value cannot change
- Tuples
- Booleans
- Numbers
- Strings
Mutable: objects whose value can change in place
- Dictionaries
- Lists
- User-defined objects
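If the underlying object is mutable, in-place changes made inside a function are visible to the caller, so the argument behaves as if it were passed by reference. If the object is immutable, it cannot be changed in place; the function can only rebind its local name to a new object, so the argument behaves as if it were passed by value.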
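To make the distinction concrete, here is a minimal sketch using only built-in types. The function names (append_item, rebind_list, increment) are made up for illustration.

```python
def append_item(items):
    # Mutates the list object that the caller's name also refers to.
    items.append(4)


def rebind_list(items):
    # Builds a new list and rebinds the local name only.
    items = items + [4]
    return items


def increment(n):
    # ints are immutable, so += creates a new int and rebinds the local name.
    n += 1
    return n


nums = [1, 2, 3]
append_item(nums)
print(nums)   # [1, 2, 3, 4]  (changed in place)

rebind_list(nums)
print(nums)   # [1, 2, 3, 4]  (unchanged by the rebinding)

count = 10
increment(count)
print(count)  # 10  (the caller's int is untouched)
```

In all three calls the function receives a reference to the same object as the caller. What differs is whether the function mutates that object or merely points its local name somewhere else.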
An example with pd.DataFrame is below. Note that in dropCol1, df within the function block is actually another pointer to the original data. Inside the function, df is reassigned to a new data frame with the column dropped, so it now points to a new memory location. The original data is not affected.
In the second function, dropCol2, df is again a new pointer to the original data (so df in the function and df outside of the function both point to the same memory). It changes the data in place, so the original data is modified as well.
import pandas as pd
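# Two ways to drop column 'A'
def dropCol1(df):
    # Rebinds the local name to a new DataFrame; the caller's df is untouched
    df = df.drop(columns='A')
    return df

def dropCol2(df):
    # Modifies the DataFrame in place; the caller's df loses column 'A'
    df.drop(columns='A', inplace=True)
    return df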
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Testing dropCol1
print("Original DataFrame for dropCol1:")
print(df)
dropCol1(df)
print("DataFrame after applying dropCol1:")
print(df)
# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Testing dropCol2
print("\nOriginal DataFrame for dropCol2:")
print(df)
dropCol2(df)
print("DataFrame after applying dropCol2:")
print(df)
Variable Number of Positional and Keyword Arguments
def crazyprinter(*args, **kwargs):
    # args is a tuple of the positional arguments
    for arg in args:
        print(arg)
    # kwargs is a dict of the keyword arguments
    for k, v in kwargs.items():
        print("{}={}".format(k, v))

crazyprinter("hello", "cheese", bar="foo")
# hello
# cheese
# bar=foo
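As a small follow-up sketch, the same * and ** syntax also works in the other direction, unpacking a sequence or a dict at the call site. The describe function here is a made-up helper that just reports what it received.

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict.
    return type(args).__name__, type(kwargs).__name__, args, kwargs


print(describe(1, 2, x=3))
# ('tuple', 'dict', (1, 2), {'x': 3})

# At the call site, * unpacks a sequence and ** unpacks a dict.
positional = ["hello", "cheese"]
named = {"bar": "foo"}
print(describe(*positional, **named))
# ('tuple', 'dict', ('hello', 'cheese'), {'bar': 'foo'})
```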
Decorators
Decorators are really just a pretty way to wrap functions using functions that return functions.
from functools import wraps

def logging(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(result)  # log the return value
        return result
    return wrapper

@logging
def foo(bar, baz):
    return bar + baz - 42

# equivalent to...
def foo(bar, baz):
    return bar + baz - 42
foo = logging(foo)
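The @wraps(func) line in the decorator above is easy to overlook. Here is a rough sketch of what it buys you, comparing two made-up decorators (plain and preserving) that differ only in whether they use functools.wraps.

```python
from functools import wraps


def plain(func):
    # No @wraps: the returned function keeps the wrapper's own metadata.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    # @wraps copies __name__, __doc__, etc. from func onto wrapper.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain
def add(a, b):
    """Add two numbers."""
    return a + b


@preserving
def sub(a, b):
    """Subtract b from a."""
    return a - b


print(add.__name__, add.__doc__)  # wrapper None
print(sub.__name__, sub.__doc__)  # sub Subtract b from a.
```

Both decorated functions still compute the right result. Only the metadata differs, which matters for debugging, help(), and tools that inspect function names.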
lru_cache
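This is a decorator in functools. It can be very efficient when we use recursion, because the wrapper implements memoization itself: it keeps a dictionary and stores the result of each function call, keyed by the arguments. With maxsize=None the cache grows without limit; with a number, the least recently used entries are evicted once the cache is full.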
The Fibonacci example below shows this well. A simple recursion without memoization recomputes the same values many times. Here, cache_info() shows that the body of the decorated function ran only 16 times (misses=16), while the other 28 calls were answered from the cache (hits=28).
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

>>> [fib(n) for n in range(16)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
>>> fib.cache_info()
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)
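To get a feel for how much work the cache saves, here is a rough sketch that counts how often the function body runs with and without memoization. The memoize decorator is a hand-rolled stand-in written for this illustration, not how lru_cache is actually implemented, and the calls counter is likewise made up.

```python
from functools import wraps


def memoize(func):
    # Hypothetical minimal cache: a dict keyed by the positional arguments.
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    calls["naive"] += 1  # count every execution of the body
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@memoize
def fib_memo(n):
    calls["memo"] += 1  # only runs on a cache miss
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


print(fib_naive(15), calls["naive"])  # 610 1973
print(fib_memo(15), calls["memo"])    # 610 16
```

A single call to fib_naive(15) runs the body 1973 times, while the memoized version runs it once per distinct argument from 0 to 15.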