Index (pandas)

09 Dec 2023 in Study / Computer science on Pandas

pandas

Set columns as the index of a DataFrame (set index)
Resetting the index
Rearranging the index
Sorting

Set Index

Specific columns can be set as the index of a DataFrame
Single index
- df.set_index("column2") (column2 becomes the index of the DataFrame)
Multiple index
- df.set_index(["column1", "column3"]) (column1 and column3 together become a multi-level index of the DataFrame)
When set_index is used, the original index is discarded from the returned DataFrame
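The two calls above can be tried on a small DataFrame. This is a minimal sketch: the column names follow the bullets above, and the values are made up for illustration only.

```python
import pandas as pd

# Hypothetical data; only the column names come from the text above.
df = pd.DataFrame({
    "column1": ["a", "b", "c"],
    "column2": [10, 20, 30],
    "column3": ["x", "y", "z"],
})

# Single index: column2 moves out of the columns and becomes the index.
single = df.set_index("column2")

# Multiple index: column1 and column3 together form a MultiIndex.
multi = df.set_index(["column1", "column3"])

print(single)
print(multi)

# set_index returns a new DataFrame, so df itself still has all three columns.
print(list(df.columns))
```

Note that the original 0, 1, 2 index no longer appears in `single` or `multi`.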

Initialization of Index

Use “reset_index”
- The original DataFrame is not modified; a new DataFrame is returned
- To modify the original DataFrame, pass “inplace=True” as a parameter
df.reset_index()
- A default integer index (0, 1, 2, ...) becomes the index of the DataFrame
- The original index becomes the first column (the column is named “index” when the original index has no name)
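As a small sketch of the behaviour described above, the example below resets a name-based index. The sample scores are illustrative, and `drop=True` is an extra option not covered in the notes above; it discards the old index instead of keeping it as a column.

```python
import pandas as pd

# Illustrative sample data, indexed by student name.
df = pd.DataFrame(
    {"math": [100, 20, 98], "english": [15, 75, 68]},
    index=["Samoh", "Rivera", "Cho"],
)

# The old index becomes a column named "index"; the new index is 0, 1, 2.
reset = df.reset_index()
print(reset)

# With drop=True the old index is thrown away rather than kept as a column.
dropped = df.reset_index(drop=True)
print(dropped)

# df is unchanged because inplace=True was not passed.
print(df.index.tolist())
```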

Rearrange Index

To change the order of the index, use “reindex”
- The original DataFrame is not modified; a new DataFrame is returned
- “inplace=True” cannot be used (reindex has no inplace parameter), so assign the result back

import pandas as pd

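# Sample scores, indexed by student name
df = pd.DataFrame({'math': [100, 20, 98], 'english': [15, 75, 68], 'biology': [98, 100, 95]}, index=['Samoh', 'Rivera', 'Cho'])
# reindex returns a new DataFrame, so the result is assigned back to df
df = df.reindex(['Cho', 'Samoh', 'Rivera'])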

print(df)

The index order changes to ‘Cho’, ‘Samoh’, ‘Rivera’

import pandas as pd

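# Sample scores, indexed by student name
df = pd.DataFrame({'math': [100, 20, 98], 'english': [15, 75, 68], 'biology': [98, 100, 95]}, index=['Samoh', 'Rivera', 'Cho'])
# 'Sanford' and 'Westberry' are not existing labels, so their rows are filled with NaN
df = df.reindex(['Cho', 'Samoh', 'Rivera', 'Sanford', 'Westberry'])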

print(df)

‘Sanford’ and ‘Westberry’ are not in the original DataFrame: they are added as new index labels, and every element in their rows is NaN
NaN can be replaced by passing the fill_value parameter to the reindex method
- df.reindex(index_list, fill_value=3) (labels in index_list that are not in the original DataFrame are added as new rows, and every element in those rows is 3)
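To see fill_value in action, here is a short sketch that reuses the sample scores from the examples above. The fill value 3 is simply the one used in the bullet; any value could be passed.

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [100, 20, 98], "english": [15, 75, 68], "biology": [98, 100, 95]},
    index=["Samoh", "Rivera", "Cho"],
)

# Labels missing from df get new rows filled with 3 instead of NaN.
filled = df.reindex(
    ["Cho", "Samoh", "Rivera", "Sanford", "Westberry"],
    fill_value=3,
)

print(filled)
```

Existing rows keep their original values; only the rows for ‘Sanford’ and ‘Westberry’ are filled.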

Sorting Based on Index

Rows of a DataFrame can be sorted by their index
Use “sort_index”
- The original DataFrame is not modified; a new DataFrame is returned
- To modify the original DataFrame, pass “inplace=True” as a parameter
df.sort_index(ascending=True) (ascending order, which is the default)
df.sort_index(ascending=False) (descending order)

import pandas as pd

# Sample scores, indexed by student name
df = pd.DataFrame({'math': [100, 20, 98], 'english': [15, 75, 68], 'biology': [98, 100, 95]}, index=['Samoh', 'Rivera', 'Cho'])

# Sort rows by index label in ascending (alphabetical) order
df = df.sort_index(ascending=True)

print(df)

The resulting index order: ‘Cho’, ‘Rivera’, ‘Samoh’
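The “inplace=True” option mentioned above behaves a little differently from the assignment style. A minimal sketch with the same sample scores:

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [100, 20, 98], "english": [15, 75, 68], "biology": [98, 100, 95]},
    index=["Samoh", "Rivera", "Cho"],
)

# With inplace=True, df itself is sorted and the call returns None,
# so the result should NOT be assigned back to df.
result = df.sort_index(ascending=True, inplace=True)

print(result)
print(df.index.tolist())
```

Writing `df = df.sort_index(inplace=True)` would replace df with None, which is a common mistake with this option.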

Sorting Based on Value

We can sort the DataFrame by a chosen column → rows are ordered by the values in that column (by also accepts a list of columns)
Use “sort_values”
- The original DataFrame is not modified; a new DataFrame is returned
- To modify the original DataFrame, pass “inplace=True” as a parameter

import pandas as pd

# Sample scores, indexed by student name
df = pd.DataFrame({'math': [100, 20, 98], 'english': [15, 75, 68], 'biology': [98, 100, 95]}, index=['Samoh', 'Rivera', 'Cho'])

# Sort rows by the math score, highest first
df = df.sort_values(by='math', ascending=False)

print(df)

Sorted by the values in math in descending order, the index order is ‘Samoh’, ‘Cho’, ‘Rivera’
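sort_values can also take a list of columns, which helps when the first column has ties. The sketch below uses made-up scores (two students share the same math score) that are not part of the examples above.

```python
import pandas as pd

# Hypothetical scores with a tie in math, for illustration only.
df = pd.DataFrame(
    {"math": [100, 20, 100], "english": [68, 75, 15]},
    index=["Samoh", "Rivera", "Cho"],
)

# Sort by math (highest first); ties are broken by english (lowest first).
ranked = df.sort_values(by=["math", "english"], ascending=[False, True])

print(ranked)
```

Here ‘Cho’ and ‘Samoh’ both have 100 in math, so the english column decides their order.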
