#!/usr/bin/env python
# coding: utf-8
#
#
#
# *This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*
#
# *The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*
#
# < [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) | [Contents](Index.ipynb) | [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) >
#
#
#
# # The Basics of NumPy Arrays
# Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas ([Chapter 3](03.00-Introduction-to-Pandas.ipynb)) are built around the NumPy array.
# This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.
# While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book.
# Get to know them well!
#
# We'll cover a few categories of basic array manipulations here:
#
# - *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
# - *Indexing of arrays*: Getting and setting the value of individual array elements
# - *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
# - *Reshaping of arrays*: Changing the shape of a given array
# - *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many
# ## NumPy Array Attributes
# First let's discuss some useful array attributes.
# We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array.
# We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:
# In[1]:
import numpy as np
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
# Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):
# In[2]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
# Another useful attribute is the ``dtype``, the data type of the array (which we discussed previously in [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb)):
# In[3]:
print("dtype:", x3.dtype)
# Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:
# In[4]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
# In general, we expect that ``nbytes`` is equal to ``itemsize`` times ``size``.
# ## Array Indexing: Accessing Single Elements
# If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
# In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:
# In[5]:
x1
# In[6]:
x1[0]
# In[7]:
x1[4]
# To index from the end of the array, you can use negative indices:
# In[8]:
x1[-1]
# In[9]:
x1[-2]
# In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:
# In[10]:
x2
# In[11]:
x2[0, 0]
# In[12]:
x2[2, 0]
# In[13]:
x2[2, -1]
# Values can also be modified using any of the above index notation:
# In[14]:
x2[0, 0] = 12
x2
# Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
# This means, for example, that if you attempt to insert a floating-point value to an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!
# In[15]:
x1[0] = 3.14159 # this will be truncated!
x1
# ## Array Slicing: Accessing Subarrays
# Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
# The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
# ``` python
# x[start:stop:step]
# ```
# If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
# We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
# ### One-dimensional subarrays
# In[16]:
x = np.arange(10)
x
# In[17]:
x[:5] # first five elements
# In[18]:
x[5:] # elements after index 5
# In[19]:
x[4:7] # middle sub-array
# In[20]:
x[::2] # every other element
# In[21]:
x[1::2] # every other element, starting at index 1
# A potentially confusing case is when the ``step`` value is negative.
# In this case, the defaults for ``start`` and ``stop`` are swapped.
# This becomes a convenient way to reverse an array:
# In[22]:
x[::-1] # all elements, reversed
# In[23]:
x[5::-2] # reversed every other from index 5
# ### Multi-dimensional subarrays
#
# Multi-dimensional slices work in the same way, with multiple slices separated by commas.
# For example:
# In[24]:
x2
# In[25]:
x2[:2, :3] # two rows, three columns
# In[26]:
x2[:3, ::2] # all rows, every other column
# Finally, subarray dimensions can even be reversed together:
# In[27]:
x2[::-1, ::-1]
# #### Accessing array rows and columns
#
# One commonly needed routine is accessing of single rows or columns of an array.
# This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):
# In[28]:
print(x2[:, 0]) # first column of x2
# In[29]:
print(x2[0, :]) # first row of x2
# In the case of row access, the empty slice can be omitted for a more compact syntax:
# In[30]:
print(x2[0]) # equivalent to x2[0, :]
# ### Subarrays as no-copy views
#
# One important–and extremely useful–thing to know about array slices is that they return *views* rather than *copies* of the array data.
# This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
# Consider our two-dimensional array from before:
# In[31]:
print(x2)
# Let's extract a $2 \times 2$ subarray from this:
# In[32]:
x2_sub = x2[:2, :2]
print(x2_sub)
# Now if we modify this subarray, we'll see that the original array is changed! Observe:
# In[33]:
x2_sub[0, 0] = 99
print(x2_sub)
# In[34]:
print(x2)
# This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
# ### Creating copies of arrays
#
# Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:
# In[35]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
# If we now modify this subarray, the original array is not touched:
# In[36]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
# In[37]:
print(x2)
# ## Reshaping of Arrays
#
# Another useful type of operation is reshaping of arrays.
# The most flexible way of doing this is with the ``reshape`` method.
# For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:
# In[38]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
# Note that for this to work, the size of the initial array must match the size of the reshaped array.
# Where possible, the ``reshape`` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
#
# Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
# This can be done with the ``reshape`` method, or more easily done by making use of the ``newaxis`` keyword within a slice operation:
# In[39]:
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))
# In[40]:
# row vector via newaxis
x[np.newaxis, :]
# In[41]:
# column vector via reshape
x.reshape((3, 1))
# In[42]:
# column vector via newaxis
x[:, np.newaxis]
# We will see this type of transformation often throughout the remainder of the book.
# ## Array Concatenation and Splitting
#
# All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.
# ### Concatenation of arrays
#
# Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
# ``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:
# In[43]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
# You can also concatenate more than two arrays at once:
# In[44]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
# It can also be used for two-dimensional arrays:
# In[45]:
grid = np.array([[1, 2, 3],
[4, 5, 6]])
# In[46]:
# concatenate along the first axis
np.concatenate([grid, grid])
# In[47]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
# For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:
# In[48]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
[6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])
# In[49]:
# horizontally stack the arrays
y = np.array([[99],
[99]])
np.hstack([grid, y])
# Similary, ``np.dstack`` will stack arrays along the third axis.
# ### Splitting of arrays
#
# The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``. For each of these, we can pass a list of indices giving the split points:
# In[50]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
# Notice that *N* split-points, leads to *N + 1* subarrays.
# The related functions ``np.hsplit`` and ``np.vsplit`` are similar:
# In[51]:
grid = np.arange(16).reshape((4, 4))
grid
# In[52]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
# In[53]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)
# Similarly, ``np.dsplit`` will split arrays along the third axis.
#
# < [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) | [Contents](Index.ipynb) | [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) >
#
#
#