Dataframe (pandas)
in Study / Computer science on Pandas
- One data structure of pandas
- Like matrix, there are rows (each row (= a record) name : index) and columns
- Creation
import pandas as pd
df1 = pd.DataFrame({"math" : [90, 74, 65], "biology" : [100, 60, 97], "chemistry" : [85, 100, 78]}, index = ["Samoh", "Rivera", "Cho"]) # if you don't insert index content, the default index is integer index
df2 = pd.DataFrame([[90, 100, 85], [74, 60, 100], [65, 97, 78]], index = ["Samoh", "Rivera", "Cho"], columns = ["math", "biology", "chemistry"])
- In df2, [90, 100, 85] is a record
- A person name : a single index
- A subject name : a single columns
- We can make dataframe like this way
- df = pd.DataFrame([Series1, Series2, …], index = [“index0”, “index1”, …])
Change Index & Columns
- print(df.index) : check the index of df
- print(df.columns) : check the columns’ names
- change the names of all of things in index & columns
- df.index = [i0, i1, i2, ….]
- df.columns = [c0, c1, c2, ….]
- only change the parts that we wanna change
- Use “rename”
- origin source changes (X) → just return
- If we wanna change the origin dataframe, we insert “inplace=True” as a parameter
- df.rename(index={originI0 : newI0, originI1 : newI1, originI2 : newI2, …})
- df.rename(columns={originC0 : newC0, originC1 : newC1, …})
- Use “rename”
Deletion of Rows & Columns
- Use “drop”
- origin source changes (X) → just return
- If we wanna change the origin dataframe, we insert “inplace=True” as a parameter
- df.drop(‘index0’, axis=0) (remove a row whose name is index0, default : axis=0)
- df.drop([‘index0’, ‘index3’], axis=0) (remove 2 rows)
- df.drop(‘column0’, axis=1) (remove a column whose name is column0)
- df.drop([‘column0’, ‘column3’], axis=1) (remove 2 columns)
Choose Rows
- Use “loc” (index label) & “iloc” (integer index)
- Choose a single row
- df.loc[‘index0’] (return Series corresponding to index0)
- df.loc[0] (return Series corresponding to first index)
- Choose multiple rows
- Double times [] & Slicing returns dataframe
- df.loc[‘index0’] != df.loc[[‘index0’]]
- df.loc[ [‘index0’, ‘index3’] ] (return dataframe corresponding to index0 & index3)
- df.loc[‘index0’ : ‘index3’] (return dataframe corresponding to index0 & index1 & index2 & index3)
- df.iloc[[0, 3]] (return dataframe corresponding to first and fourth index)
- df.iloc[0 : 4] (return dataframe corresponding to first, second, third, and fourth index)
- df.iloc[0 : 4 : 2] (return dataframe corresponding to first, third index)
- df.iloc[ : : -1] (return dataframe corresponding to last, last - 1, last - 2, …, first index)
- Double times [] & Slicing returns dataframe
Choose Columns
- Choose a single column
- df[‘column1’] (return Series corresponding to column1)
- df.column2 (return Series corresponding to column2)
- Choose multiple columns
- df[ [‘column1’, ‘column2’, ‘column5’] ] (return dataframe corresponding to column1, column2, and column5)
Choose Element
- df.(i)loc[row information, column information]
- Example
- df.iloc[1, 2] (return a single element corresponding to second index and third column)
- df.loc[‘index2’, ‘column2’] (return a single element corresponding to index2 and column2)
- df.iloc[1, 2 : 5] (return Series corresponding to second index and third, fourth, fifth column)
- df.loc[‘index2’, [‘column1’, ‘column3’]] (return Series corresponding to index2 and column1, column3)
- df.iloc[1 : 3, 2 : 5] (return dataframe corresponding to second, third index and third, fourth, fifth column)
- df.loc[‘index2’ : ‘index4’, [‘column1’, ‘column3’]] (return dataframe corresponding to index2, index3, index4 and column1, column3)
- df.iloc[:, 2 : 5] (return dataframe corresponding to second index and third, fourth, fifth column)
- df.loc[:, [‘column1’, ‘column3’]] (return dataframe corresponding to column1, column3)
- Example
- df.(i)loc[row information][column information]
Add Row
- New row has only one value as element
- df.loc[“index5”] = 0 (add a row named as index5 and this row has only 0 as element)
- New row has various values as element
- df.loc[“index5”] = [2, 4, 5, 7] (Series is also possible) (add a row named as index5 and [2, 4, 5, 7] is a record)
Add Column
- New column has only one value as element
- df[“column5”] = 0 (add a column named as column5 and this column has only 0 as element)
- New row has various values as element
- df[“column5”] = [2, 4, 5, 7] (Series is also possible) (add a column named as column5)
Change Element
- (Choose parts that we wanna change) = newValue
- Example
- df.iloc[1, 2] = 50 (Change the original value located on second index, thrid column to 50)
- df.loc[‘index1’, [‘column2’, ‘column4’]] = [40, 60] (Change the original values located on (second index, column2), (second index, column4) to 40 and 60)
- Example
Transpose
- row → column, column → row
- df.transpose()
- df.T
Head & Tail
- df.head(number) (return dataframe with n rows starting from the first row, default : number == 5)
- df.tail(number) (return dataframe with n rows starting from the last row, default : number == 5)