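Selecting a subset of columns by column numbers or names
Given a matrix A with 6 columns, suppose we want to select its 2nd, 4th, and 6th columns from a NumPy array or a pandas DataFrame. We can use the np.where() function to find the (0-based) positions of these column numbers and then index with them, as follows.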
```python
# show the output of every expression in a Jupyter cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import pandas as pd
import numpy as np

#===================================================
# full column numbers and partial ones
#===================================================
cn_full = np.array([1,2,3,4,5,6])
cn_part = np.array([2,4,6])

A = np.array([
    [0.073, 0.074, 0.075, 0.094, 0.095, 0.095],
    [0.072, 0.073, 0.074, 0.083, 0.083, 0.084],
    [0.066, 0.067, 0.068, 0.075, 0.076, 0.076],
    [0.044, 0.045, 0.046, 0.042, 0.042, 0.042],
    [0.05 , 0.048, 0.048, 0.037, 0.037, 0.038],
    [0.05 , 0.049, 0.048, 0.039, 0.039, 0.039]])

Adf = pd.DataFrame(A)
Adf

#===================================================
# 0-based positions in cn_full of the numbers in cn_part
#===================================================
indices = np.where(cn_part[:,None] == cn_full[None,:])[1]
indices   # array([1, 3, 5])

#===============================================
# check that the selected numbers match cn_part
#===============================================
np.array_equal(cn_part, cn_full[indices])   # True

#===============================================
print('select sub columns from A or Adf')
#===============================================
A[:,indices]          # in the case of np array
Adf.iloc[:,indices]   # in the case of pd dataframe
```
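The key step is the broadcast comparison `cn_part[:,None] == cn_full[None,:]`, which builds a boolean matrix with one row per requested number and one column per available number; np.where then returns the (row, column) coordinates of the True entries, and `[1]` keeps the column coordinates. The sketch below wraps this into a small helper. The name `column_positions` and the check that each requested item matches exactly once are assumptions added for illustration, not part of the original code.

```python
import numpy as np

def column_positions(full, part):
    """Hypothetical helper: 0-based positions in `full` of each item in `part`.

    Broadcasting compares every requested item (rows) with every available
    item (columns), giving a len(part) x len(full) boolean matrix.
    """
    full = np.asarray(full)
    part = np.asarray(part)
    match = part[:, None] == full[None, :]
    # Each requested item should match exactly one available item;
    # otherwise the returned positions would be misaligned with `part`.
    if not np.all(match.sum(axis=1) == 1):
        raise ValueError("every item in `part` must appear exactly once in `full`")
    rows, cols = np.where(match)
    # np.where scans the matrix row by row, so `cols` follows the order of `part`
    return cols

cn_full = np.array([1, 2, 3, 4, 5, 6])
print(column_positions(cn_full, [2, 4, 6]))   # [1 3 5]
print(column_positions(cn_full, [6, 2]))      # [5 1]
```

Because np.where walks the matrix in row order, the positions come back in the order of the requested items, so a request like [6, 2] also reorders the selected columns.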
The following code applies the same method to select the 6m, 1y, and 20y yields from an assumed full set of maturities: 3m, 6m, 1y, 3y, 10y, and 20y.
```python
df_yield = pd.DataFrame(A, columns = ['3m','6m','1y','3y','10y','20y'])
df_yield

cn_full = df_yield.columns.to_numpy()
cn_part = np.array(['6m','1y','20y'])

#===================================================
# 0-based positions in cn_full of the names in cn_part
#===================================================
indices = np.where(cn_part[:, None] == cn_full[None, :])[1]
indices   # array([1, 2, 5])

#===============================================
# check that the selected names match cn_part
#===============================================
np.array_equal(cn_part, cn_full[indices])   # True

#===============================================
print('select sub maturities from df_yield')
#===============================================
df_yield.iloc[:,indices]   # in the case of pd dataframe
```
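When the columns already carry names, pandas can compute the same positions without the broadcast comparison: `Index.get_indexer` returns the 0-based position of each requested label, or -1 for a label that is not present. This is a swapped-in alternative, not the method used above; the shortened two-row yield table and the KeyError check are assumptions for illustration.

```python
import pandas as pd

# Two rows of the yield matrix above, enough to illustrate the selection
df_yield = pd.DataFrame(
    [[0.073, 0.074, 0.075, 0.094, 0.095, 0.095],
     [0.072, 0.073, 0.074, 0.083, 0.083, 0.084]],
    columns=['3m', '6m', '1y', '3y', '10y', '20y'])

cn_part = ['6m', '1y', '20y']
# get_indexer gives the 0-based position of each label, or -1 if it is missing
indices = df_yield.columns.get_indexer(cn_part)
print(indices)   # [1 2 5]

# Guard against typos such as '5y', which would otherwise select the last column via -1
if (indices == -1).any():
    raise KeyError("some requested maturities are not columns of df_yield")

print(df_yield.iloc[:, indices])
```

The explicit -1 check matters because `iloc` treats -1 as "last column" and would silently return the wrong maturity.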
Of course, with a DataFrame it is simpler to select the columns by name directly, as in the commands below. With .loc, the row selection and the column selection are both stated explicitly; for example, : selects all rows.
```python
df_yield[['6m','1y','20y']]
# or
df_yield.loc[:,['6m','1y','20y']]
```
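As a quick check, the sketch below (using the first three rows of the yield data as an assumed example) confirms that bracket selection, .loc by name, and .iloc by position return the same DataFrame. It also shows that a .loc row slice is label-based and includes its end label, unlike an .iloc slice.

```python
import pandas as pd

df_yield = pd.DataFrame(
    [[0.073, 0.074, 0.075, 0.094, 0.095, 0.095],
     [0.072, 0.073, 0.074, 0.083, 0.083, 0.084],
     [0.066, 0.067, 0.068, 0.075, 0.076, 0.076]],
    columns=['3m', '6m', '1y', '3y', '10y', '20y'])
cols = ['6m', '1y', '20y']

by_brackets = df_yield[cols]            # column names only
by_loc = df_yield.loc[:, cols]          # ':' = all rows, then column names
by_iloc = df_yield.iloc[:, [1, 2, 5]]   # 0-based column positions

print(by_brackets.equals(by_loc), by_loc.equals(by_iloc))   # True True

# .loc slices by row label and includes the end label: rows 0 and 1
print(df_yield.loc[0:1, cols])
# .iloc slices by position and excludes the end: row 0 only
print(df_yield.iloc[0:1, [1, 2, 5]])
```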