Losing your Loops¶

Fast Numerical Computing with NumPy¶

Python is Fast¶

For writing, testing, and developing code

# Hello World in Python
print("hello world")

/* Hello World in Java */
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}

In [ ]:

%matplotlib inline

In [ ]:

import seaborn as sns
data = sns.load_dataset("iris")
sns.pairplot(data, hue="species");

Python is Slow¶

Compared to compiled languages

In [ ]:

# A silly function implemented in Python

def func_python(N):
    d = 0.0
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

In [ ]:

# Use IPython timeit magic to time the execution
%timeit func_python(10000)
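
The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch of the same measurement in a plain Python script, the standard-library `timeit` module can be used; the repeat counts below are arbitrary choices, not values from the original notebook.

```python
import timeit

def func_python(N):
    d = 0.0
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

# Each measurement runs the call `number` times; we take the best of
# `repeat` measurements to reduce noise from other processes.
number = 10
times = timeit.repeat(lambda: func_python(10000), number=number, repeat=3)
best_per_call = min(times) / number
print(f"best of 3: {best_per_call * 1e6:.1f} microseconds per call")
```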

Fortran version:¶

In [ ]:

%load_ext fortranmagic

In [ ]:

%%fortran
subroutine func_fort(n, d)
    integer, intent(in) :: n
    double precision, intent(out) :: d
    integer :: i
    d = 0
    do i = 0, n - 1
        d = d + (mod(i, 3) - 1) * i
    end do
end subroutine func_fort

In [ ]:

%%file func_fortran.f

      subroutine func_fort(n, d)
           integer, intent(in) :: n
           double precision, intent(out) :: d
           integer :: i
           d = 0
           do i = 0, n - 1
                d = d + (mod(i, 3) - 1) * i
           end do
      end subroutine func_fort

In [ ]:

# use f2py rather than f2py3 for Python 2
!f2py3 -c func_fortran.f -m func_fortran > /dev/null

In [ ]:

from func_fortran import func_fort

In [ ]:

%timeit func_fort(10000)

Outline¶

Use NumPy ufuncs to your advantage
Use NumPy aggregates to your advantage
Use NumPy broadcasting to your advantage
Use NumPy slicing and masking to your advantage
Use a tool like SWIG, Weave, Cython, f2py, Numba, etc. to compile Python or to interface to compiled code.

For this task, the Fortran version runs about 100 times faster than the Python loop!

Why is Python so slow?¶

We alluded to this yesterday: languages tend to trade off convenience against performance.

C, Fortran, etc.: static typing and compiled code lead to fast execution
- But: lots of development overhead in declaring variables, no interactive prompt, etc.
Python, R, Matlab, IDL, etc.: dynamic typing and interpreted execution lead to fast development
- But: lots of execution overhead in dynamic type-checking, etc.

We like Python because our development time is generally more valuable than execution time. But sometimes speed can be an issue.
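
One way to picture the type-checking overhead is to compare how a Python list and a NumPy array store their values. The sketch below is only an illustration; the variable names are made up for this example.

```python
import numpy as np

# A Python list can hold objects of any type, so the interpreter has to
# look up each element's type before it can apply an operation to it.
mixed = [1, 2.5, "three"]
element_types = [type(x).__name__ for x in mixed]
print(element_types)

# A NumPy array stores raw values of one fixed type, so the type is known
# once for the whole array instead of being checked element by element.
arr = np.array([1.0, 2.5, 3.0])
print(arr.dtype, arr.itemsize)
```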

Strategies for making Python fast¶

Use NumPy ufuncs to your advantage
Use NumPy aggregates to your advantage
Use NumPy broadcasting to your advantage
Use NumPy slicing and masking to your advantage
Use a tool like SWIG, Cython or f2py to interface to compiled code.

Here we'll cover the first four, and leave the fifth strategy for a later session.

Strategy 1: Use ufuncs to your advantage¶

A ufunc in NumPy is a universal function: a function that operates element-wise on an array. We've already seen examples of these in the various arithmetic operations:

In [ ]:

a = [1, 3, 2, 4, 3, 1, 4, 2]
b = [val + 5 for val in a]
print(b)

In [ ]:

import numpy as np
a = np.array(a)

In [ ]:

b = a + 5  # element-wise
print(b)

The speed of ufuncs¶

In [ ]:

a = list(range(100000))
%timeit [val + 5 for val in a]

In [ ]:

a = np.array(a)
%timeit a + 5

Other ufuncs¶

There are many, many ufuncs available:

Arithmetic: + - * / // % **
Bitwise Operations: & | ~ ^ >> <<
Comparisons: < > <= >= == !=
Trig Functions: np.sin np.cos np.tan ...etc.
Exponential Family: np.exp np.log np.log10 ...etc.
Special Functions: scipy.special.*

and many, many more.
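
A short sketch of a few of these ufuncs in use. The input array `x` is an arbitrary choice for illustration.

```python
import numpy as np

x = np.linspace(0, np.pi, 5)

# Operators on arrays dispatch to ufuncs: x + 5 is the same as np.add(x, 5).
same = np.array_equal(x + 5, np.add(x, 5))

# ufuncs compose element-wise without an explicit loop.
identity = np.sin(x) ** 2 + np.cos(x) ** 2

# Comparison ufuncs return boolean arrays.
positive = np.exp(x) > 1

print(same)
print(np.allclose(identity, 1.0))
print(positive)
```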

Strategy 2: Use aggregations to your advantage¶

Aggregations are functions that reduce an array to a smaller array or to a single value.

Suppose you want to compute the minimum of an array:

In [ ]:

from random import random
c = [random() for i in range(100000)]

In [ ]:

%timeit min(c)

In [ ]:

c = np.array(c)

In [ ]:

%timeit c.min()

Aggregates along axes¶

In [ ]:

M = np.random.randint(0, 10, (3, 5))
M

In [ ]:

M.sum()

In [ ]:

M.sum(axis=0)

In [ ]:

M.sum(axis=1)
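
Because `M` above is random, the effect of `axis` may be easier to see on a fixed array. This sketch uses a small `np.arange` array chosen for illustration: `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
# M is [[0, 1, 2],
#       [3, 4, 5]]

total = M.sum()            # all elements: 15
col_sums = M.sum(axis=0)   # collapse the rows: [3, 5, 7]
row_sums = M.sum(axis=1)   # collapse the columns: [3, 12]
print(total, col_sums, row_sums)
```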

Other Aggregation Functions¶

NumPy has many useful aggregation functions:

np.min np.max np.sum np.prod np.mean np.std np.var np.any np.all np.median np.percentile np.argmin np.argmax

Most also have a NaN-aware equivalent:

np.nanmin np.nanmax np.nansum ...etc.
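
A minimal sketch of the difference, using a made-up three-element array: the plain aggregate propagates NaN, while the NaN-aware versions skip it.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

plain_min = np.min(values)     # NaN propagates through the plain aggregate
nan_min = np.nanmin(values)    # NaN entries are ignored: 1.0
nan_sum = np.nansum(values)    # NaN entries are treated as zero: 4.0
print(plain_min, nan_min, nan_sum)
```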

Strategy 3: Use broadcasting to your advantage¶

Broadcasting in NumPy is the set of rules for applying ufuncs on arrays of different sizes and/or dimensions.

In [ ]:

np.arange(3) + 5

In [ ]:

np.ones((3, 3)) + np.arange(3)

In [ ]:

np.arange(3).reshape((3, 1)) + np.arange(3)

Visualizing Broadcasting¶

(Figure: visualization of broadcasting; image not reproduced here.)
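
The stretching can also be inspected in code. As a sketch, `np.broadcast_to` shows what each operand of the last example looks like once it has been expanded to the common shape.

```python
import numpy as np

a = np.arange(3).reshape((3, 1))   # shape (3, 1)
b = np.arange(3)                   # shape (3,)

# np.broadcast_to returns a read-only view with the stretched shape;
# no data is actually copied.
a_stretched = np.broadcast_to(a, (3, 3))
b_stretched = np.broadcast_to(b, (3, 3))
print(a_stretched)
print(b_stretched)
print(a_stretched + b_stretched)
```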

Rules of Broadcasting¶

If the arrays differ in their number of dimensions, left-pad the smaller shape with 1s
If the shapes do not match in some dimension, stretch the array whose size is 1 in that dimension to match the other
If the sizes in some dimension differ and neither is 1, raise an error.
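
The three rules can be sketched as a small function on shape tuples. `broadcast_shape` is a hypothetical helper written for this illustration, not part of NumPy; NumPy applies the same logic internally whenever a ufunc combines two arrays.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three rules to two shape tuples."""
    # Rule 1: left-pad the shorter shape with 1s.
    ndim = max(len(shape_a), len(shape_b))
    shape_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    shape_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for n_a, n_b in zip(shape_a, shape_b):
        if n_a == n_b:
            result.append(n_a)
        elif n_a == 1:
            # Rule 2: stretch the size-1 dimension to match the other.
            result.append(n_b)
        elif n_b == 1:
            result.append(n_a)
        else:
            # Rule 3: sizes differ and neither is 1.
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))
print(broadcast_shape((3, 1), (3,)))

# Cross-check against the shape NumPy itself computes.
print(np.broadcast(np.empty((3, 1)), np.empty(3)).shape)
```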

Examples¶

Example 1¶

In [ ]:

M = np.ones((2, 3))
M

In [ ]:

a = np.arange(3)
a

In [ ]:

M + a

Example 2¶

In [ ]:

a = np.arange(3).reshape((3, 1))
a

In [ ]:

b = np.arange(3)
b

In [ ]:

a + b

Example 3¶

In [ ]:

M = np.ones((3, 2))
M

In [ ]:

a = np.arange(3)
a

In [ ]:

M + a
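
Example 3 raises a `ValueError`: after left-padding, the shapes are (3, 2) and (1, 3), and the last dimension has sizes 2 and 3, neither of which is 1. If the intent is to add `a` down the rows, one possible fix is to give `a` a trailing axis, as in this sketch.

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

try:
    M + a   # shapes (3, 2) and (3,) -> (1, 3): last dimension mismatches
except ValueError as err:
    print("broadcast failed:", err)

# A trailing axis turns a into shape (3, 1), which stretches to (3, 2).
fixed = M + a[:, np.newaxis]
print(fixed)
```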

Strategy 4: Use slicing and masking to your advantage¶

The last strategy we will cover is slicing and masking.

Python lists can be indexed with integers or slices:

In [ ]:

L = [2, 3, 5, 7, 11]

In [ ]:

L[0]  # integer index

In [ ]:

L[1:3]  # slice for multiple elements

NumPy arrays are like lists¶

In [ ]:

L = np.array(L)
L

In [ ]:

L[0]

In [ ]:

L[1:3]

Masking¶

In [ ]:

mask = np.array([False, True, True,
                 False, True])
L[mask]

In [ ]:

mask = (L < 4) | (L > 8) # "|" = "bitwise OR"
L[mask]
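
A mask is an ordinary boolean array, so it can be inspected, counted, and used on the left-hand side of an assignment. The sketch below repeats the mask from the cell above; the `clipped` array is an extra illustration.

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

mask = (L < 4) | (L > 8)
print(mask)                  # one boolean per element of L
n_selected = mask.sum()      # True counts as 1, so this counts the matches
selected = L[mask]

# Assigning through a mask changes only the selected elements.
clipped = L.copy()
clipped[clipped > 8] = 8
print(n_selected, selected, clipped)
```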

Fancy Indexing¶

In [ ]:

ind = [0, 4, 2]
L[ind]
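
One practical difference worth keeping in mind: a slice is a view onto the original data, whereas fancy indexing returns a new array. A small sketch:

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

sliced = L[1:3]        # a view onto L's data
fancy = L[[0, 4, 2]]   # a new array holding copies of the selected values

sliced[0] = 100        # also changes L[1]
fancy[0] = -1          # leaves L untouched
print(L)
print(np.shares_memory(L, sliced), np.shares_memory(L, fancy))
```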

Multiple Dimensions¶

In [ ]:

M = np.arange(6).reshape(2, 3)
M

In [ ]:

# multiple indices separated by comma
M[0, 1]

In [ ]:

# mixing slices and indices
M[:, 1]

In [ ]:

# masking the full array
M[abs(M - 3) < 2]

In [ ]:

# mixing fancy indexing and slicing
M[[1, 0], :2]

In [ ]:

# mixing masking and slicing 
M[M.sum(axis=1) > 4, 1:]

Putting it All Together¶

Nearest Neighbors of some data

In [ ]:

# 1000 points in 3 dimensions
X = np.random.random((1000, 3))
X.shape

In [ ]:

# Broadcasting to find pairwise differences
diff = X.reshape(1000, 1, 3) - X
diff.shape

In [ ]:

# Aggregate to find pairwise squared distances
D = (diff ** 2).sum(2)
D.shape

In [ ]:

# set diagonal to infinity to skip self-neighbors
i = np.arange(1000)
D[i, i] = np.inf

In [ ]:

# print the indices of the nearest neighbor
i = np.argmin(D, 1)
print(i[:10])

In [ ]:

# double-check with scikit-learn
from sklearn.neighbors import NearestNeighbors
d, i = NearestNeighbors().fit(X).kneighbors(X, 2)
print(i[:10, 1])
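
If scikit-learn is not available, the result can also be checked against a plain double loop. In the sketch below, `nearest_neighbor_indices` and `nearest_neighbor_loops` are hypothetical helper names; the first wraps the broadcasting steps above, and the second is a slow reference version intended only for small inputs.

```python
import numpy as np

def nearest_neighbor_indices(X):
    """Hypothetical helper wrapping the broadcasting steps above."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]   # shape (n, n, d)
    D = (diff ** 2).sum(axis=2)                        # squared distances
    np.fill_diagonal(D, np.inf)                        # skip self-matches
    return D.argmin(axis=1)

def nearest_neighbor_loops(X):
    """Reference version with explicit Python loops, for checking only."""
    n = len(X)
    result = np.empty(n, dtype=int)
    for i in range(n):
        best, best_dist = -1, np.inf
        for j in range(n):
            if i == j:
                continue
            dist = sum((X[i, k] - X[j, k]) ** 2 for k in range(X.shape[1]))
            if dist < best_dist:
                best, best_dist = j, dist
        result[i] = best
    return result

# A small, seeded data set keeps the loop version fast and repeatable.
rng = np.random.RandomState(0)
X_small = rng.random_sample((50, 3))
print(np.array_equal(nearest_neighbor_indices(X_small),
                     nearest_neighbor_loops(X_small)))
```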

Summary: Speeding up NumPy¶

It's all about moving loops into compiled code:

Use NumPy ufuncs to your advantage (eliminate loops!)
Use NumPy aggregates to your advantage (eliminate loops!)
Use NumPy broadcasting to your advantage (eliminate loops!)
Use NumPy slicing and masking to your advantage (eliminate loops!)
Use a tool like SWIG, Cython or f2py to interface to compiled code.
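
As a closing sketch, here is one way the silly function from the start of the session could be rewritten without a Python loop. `func_numpy` is a name introduced for this example; how much faster it runs will depend on the machine and on `N`, so it is worth timing it yourself.

```python
import numpy as np

def func_python(N):
    d = 0.0
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

def func_numpy(N):
    # np.arange replaces the loop counter; %, -, and * are ufuncs;
    # .sum() is the aggregate that replaces the running total.
    i = np.arange(N, dtype=np.int64)
    return float(((i % 3 - 1) * i).sum())

print(func_numpy(10000) == func_python(10000))
```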
