Index (pandas)

pandas

  • Choose the columns as the index of dataframe (set index)
  • Initialization of index
  • Rearrange index
  • Sorting

Set Index

  • Specific columns can be set as the index of dataframe
  • Single index
    • df.set_index(“column2”) (column2 gonna be the index of this dataframe)
  • Multiple index
    • df.set_index([“column1”, “column3”]) (Both column1 and column3 gonna be the index of this dataframe)
  • If set_index is used, original index gonna be deleted from its dataframe

Initialization of Index

  • Use “reset_index”
    • origin source changes (X) → just return
    • If we wanna change the origin dataframe, we insert “inplace=True” as a parameter
  • df.reset_index()
    • Integer position index gonna be the index of dataframe
    • Original index gonna be the first column (The column’s name is “index”)

Rearrange Index

  • If we wanna change the order of index, use “reindex”
    • origin source changes (X) → just return
    • “inplace = True” cannat be used
import pandas as pd

df = pd.DataFrame({'math' : [100, 20, 98], 'english' : [15, 75, 68], 'biology' : [98, 100, 95]}, index = ['Samoh', 'Rivera', 'Cho'])
df = df.reindex(['Cho', 'Samoh', 'Rivera'])

print(df)


  • The order of index gonna be changed to ‘Cho’, ‘Samoh’, ‘Rivera’


import pandas as pd

df = pd.DataFrame({'math' : [100, 20, 98], 'english' : [15, 75, 68], 'biology' : [98, 100, 95]}, index = ['Samoh', 'Rivera', 'Cho'])
df = df.reindex(['Cho', 'Samoh', 'Rivera', 'Sanford', 'Westberry'])

print(df)


  • ‘Sanford’ and ‘Westberry’ are not involved in the original dataframe : They gonna be added as new index (all of elements are NaN)
  • NaN can be replaced if we insert fill_value parameter to reindex method
    • df.reindex(index_list, fill_value = 3) (If some elements in index_list are not involved in the original dataframe, they gonna be added as new index and their elements’ value is 3)

Sorting Based on Index

  • Based on index, rows of dataframe can be sorted
  • Use “sort_index”
    • origin source changes (X) → just return
    • If we wanna change the origin dataframe, we insert “inplace=True” as a parameter
  • df.sort_index(ascending=True)
  • df.sort_index(ascending=False)
import pandas as pd

df = pd.DataFrame({'math' : [100, 20, 98], 'english' : [15, 75, 68], 'biology' : [98, 100, 95]}, index = ['Samoh', 'Rivera', 'Cho'])

df = df.sort_index(ascending=True)

print(df)


  • The order of index : ‘Cho’, ‘Rivera’, ‘Samoh’

Sorting Based on Value

  • We can sort the dataframe by choosing a single column → It is sorted using the value of elements in this column
  • Use “sort_values”
    • origin source changes (X) → just return
    • If we wanna change the origin dataframe, we insert “inplace=True” as a parameter


import pandas as pd

df = pd.DataFrame({'math' : [100, 20, 98], 'english' : [15, 75, 68], 'biology' : [98, 100, 95]}, index = ['Samoh', 'Rivera', 'Cho'])

df = df.sort_values(by='math', ascending=False)

print(df)


  • By the value of elements in math, the order of index is ‘Samoh’, ‘Cho’, ‘Rivera’

© 2017. All rights reserved.

Powered by Hydejack v조현진