Python : Select subset using sub column numbers or names

This post shows how to select a subset of columns from a NumPy array or a pandas DataFrame using column numbers or names. For example, this is useful when we want to select yields at a few relevant maturities (1, 3, 5, 7, 10, 15, 20, 30 years) from the full spectrum of maturities (1, 2, 3, 4, ..., 28, 29, 30 years).



Select subset using sub column numbers or names



Given a matrix A with 6 columns, we can select the 2nd, 4th, and 6th columns from an array or DataFrame by using the np.where() function to find their positions, as follows.

# show the output of every expression in a Jupyter notebook cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
 
import pandas as pd
import numpy as np
 
#===================================================
# full column numbers and partial ones
#===================================================
cn_full = np.array([1,2,3,4,5,6])
cn_part = np.array([2,4,6])
 
A = np.array([
    [0.073, 0.074, 0.075, 0.094, 0.095, 0.095],
    [0.072, 0.073, 0.074, 0.083, 0.083, 0.084],
    [0.066, 0.067, 0.068, 0.075, 0.076, 0.076],
    [0.044, 0.045, 0.046, 0.042, 0.042, 0.042],
    [0.050, 0.048, 0.048, 0.037, 0.037, 0.038],
    [0.050, 0.049, 0.048, 0.039, 0.039, 0.039]])
 
Adf = pd.DataFrame(A); Adf
 
#===================================================
# Indices of cn_full which belong to cn_part
#===================================================
indices = np.where(cn_part[:,None] == cn_full[None,:])[1]
indices
 
#===============================================
# compare if two arrays are the same
#===============================================
np.array_equal(cn_part, cn_full[indices])
 
#===============================================
print('select sub columns from A or Adf')
#===============================================
A[:,indices]       # in the case of np array
Adf.iloc[:,indices] # in the case of pd dataframe
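
To see why the np.where() line works, it can help to look at the intermediate boolean matrix. The sketch below rebuilds it for the same cn_full and cn_part. The names mask, rows, and cols are introduced here only for illustration.

```python
import numpy as np

cn_full = np.array([1, 2, 3, 4, 5, 6])
cn_part = np.array([2, 4, 6])

# cn_part[:, None] has shape (3, 1) and cn_full[None, :] has shape (1, 6),
# so broadcasting compares every pair and gives a (3, 6) boolean matrix
mask = cn_part[:, None] == cn_full[None, :]

# np.where returns (row indices, column indices) of the True entries,
# scanned row by row; the column indices are positions in cn_full
rows, cols = np.where(mask)

print(mask.astype(int))
print(rows)  # positions in cn_part
print(cols)  # positions in cn_full, the same values as `indices` above
```

Because the matches are scanned row by row, the result follows the order of cn_part. A value of cn_part that does not occur in cn_full is silently skipped, which is the situation the np.array_equal() check is there to catch.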
 
 

The following code uses the above method with column names instead of numbers: it selects the 6m, 1y, and 20y yields from an assumed full set of maturities of 3m, 6m, 1y, 3y, 10y, and 20y.

df_yield = pd.DataFrame(A, columns = ['3m','6m','1y','3y','10y','20y'])
df_yield
 
cn_full = df_yield.columns.to_numpy()
cn_part = np.array(['6m','1y','20y'])
 
#===================================================
# Indices of cn_full which belong to cn_part
#===================================================
indices = np.where(cn_part[:, None] == cn_full[None, :])[1]
indices
 
#===============================================
# compare if two arrays are the same
#===============================================
np.array_equal(cn_part, cn_full[indices])
 
#===============================================
print('select sub maturities from df_yield')
#===============================================
df_yield.iloc[:,indices] # in the case of pd dataframe
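
As an alternative sketch (not the method used above), the pandas method Index.get_indexer() returns the integer position of each requested label and marks an absent label with -1, so a missing maturity can be detected explicitly. The data below are placeholder values rather than real yields, df_demo and wanted are names made up for this example, and '2y' is a deliberately missing label.

```python
import numpy as np
import pandas as pd

# placeholder data: 2 rows x 6 maturities, values are not real yields
cols = ['3m', '6m', '1y', '3y', '10y', '20y']
df_demo = pd.DataFrame(np.arange(12).reshape(2, 6), columns=cols)

wanted = ['6m', '1y', '20y', '2y']   # '2y' does not exist in df_demo

# integer position of each label, -1 where the label is not found
pos = df_demo.columns.get_indexer(wanted)
print(pos)

# keep only the labels that were found before calling iloc
found = pos[pos >= 0]
sub = df_demo.iloc[:, found]
print(sub.columns.tolist())
```

Filtering on pos >= 0 matters here: passing -1 to iloc would not raise an error but would quietly select the last column.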
 

Of course, with a DataFrame it is easier to select the columns by name directly, as in the following commands. In particular, .loc states the row and column selections explicitly, for example : for all rows.

df_yield[['6m','1y','20y']]
# or 
df_yield.loc[:,['6m','1y','20y']]
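
As a quick check, the sketch below applies both label-based forms to the maturity example from the introduction, picking 1, 3, 5, 7, 10, 15, 20, and 30 years out of 1 to 30 years, and compares them with the position-based route. The yield values are random placeholders, and df_curve, keep, and idx are names made up for this example.

```python
import numpy as np
import pandas as pd

# placeholder curve data: 4 dates x 30 yearly maturities (random values)
rng = np.random.default_rng(0)
maturities = np.arange(1, 31)
df_curve = pd.DataFrame(rng.random((4, 30)), columns=maturities)

keep = [1, 3, 5, 7, 10, 15, 20, 30]   # maturities (years) to keep

by_bracket = df_curve[keep]           # column labels only
by_loc = df_curve.loc[:, keep]        # all rows, listed columns

# position-based route from earlier in the post, for comparison
idx = np.where(np.array(keep)[:, None] == maturities[None, :])[1]
by_iloc = df_curve.iloc[:, idx]

print(by_bracket.equals(by_loc), by_loc.equals(by_iloc))
```

Note that the column labels here are integers, so df_curve[keep] and .loc select by label (the maturity in years), while .iloc selects by zero-based position.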
 
 