Frequently used Pandas methods/functions
I have no argument for importing it any other way than the usual
import pandas as pd
pd.read_csv(csvfile) to read a CSV file. Above that line I usually set
csvfile=r'c:\datasets\csvfile.csv' because I keep my ipynb notebooks in Git.
pd.read_excel(excelfile) to work with Excel files.
pd.read_sql(query, connection_object) is not something I use often these days, since Power BI or Tableau is faster for that, but it is handy when the query is part of a repeated process.
pd.read_json(jsonfile) to work with JSON. This one is fast and very useful when you want to manipulate JSON files again and again.
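Here is a minimal sketch of the readers above. To keep it runnable without any files on disk, I assume some made-up sample data held in memory and an in-memory SQLite database standing in for connection_object; the table and column names are hypothetical. read_excel is left out only because it needs an extra Excel engine installed.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, held in memory so no files are needed.
csv_text = "city,population\nHanoi,8\nBangkok,10\n"
json_text = '[{"city": "Hanoi", "population": 8}, {"city": "Bangkok", "population": 10}]'

# read_csv and read_json accept a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

# read_sql takes a query plus a connection.
# An in-memory SQLite database stands in for a real one here.
connection_object = sqlite3.connect(":memory:")
connection_object.execute("CREATE TABLE cities (city TEXT, population INTEGER)")
connection_object.executemany(
    "INSERT INTO cities VALUES (?, ?)", [("Hanoi", 8), ("Bangkok", 10)]
)
query = "SELECT city, population FROM cities"
df_sql = pd.read_sql(query, connection_object)
connection_object.close()

print(df_csv.shape, df_json.shape, df_sql.shape)
```

With real files you would pass the path (like csvfile above) instead of the StringIO object.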
Normally, after we import the data, we put it into a DataFrame called df:
df = pd.read_csv('csv.csv') so, when we export, we begin with df as well:
df.to_csv(csvfile) exports to CSV so we can continue working in another program.
df.to_excel(excelfile) is mostly used when we do a quick ETL.
df.to_json(jsonfile) writes the JSON back out after editing it.
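A small round-trip sketch of the export methods, assuming a made-up DataFrame and a temporary folder (both are my own placeholders, not real data). I skip to_excel here because it needs an Excel engine such as openpyxl installed, but the call has the same shape.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample DataFrame.
df = pd.DataFrame({"city": ["Hanoi", "Bangkok"], "population": [8, 10]})

with tempfile.TemporaryDirectory() as tmp:
    csvfile = os.path.join(tmp, "out.csv")
    jsonfile = os.path.join(tmp, "out.json")

    # index=False keeps the row index out of the CSV file.
    df.to_csv(csvfile, index=False)
    # orient="records" writes a list with one object per row.
    df.to_json(jsonfile, orient="records")

    # Read both files back to confirm the round trip.
    csv_back = pd.read_csv(csvfile)
    json_back = pd.read_json(jsonfile)

print(csv_back)
print(json_back)
```

Without index=False, to_csv also writes the index as an extra first column, which then shows up as an unnamed column when you read the file back.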
df.head() looks at the first 5 rows.
df.tail() looks at the last 5 rows.
df.shape checks the number of rows and columns. It returns something like (20640, 10), meaning this DataFrame has 20640 rows and 10 columns.
df.info() is what we mostly use rather than .shape, because it shows the column names, data types, and non-null counts along with the number of rows, which is what we need to continue working.
df.dropna() is a quick and easy way to get rid of rows that have a null value.
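To see the inspection methods and dropna() together, here is a short sketch on a made-up DataFrame with one missing value (the data is hypothetical, just enough rows to make head() and tail() differ).

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the numeric column.
df = pd.DataFrame({
    "city": ["Hanoi", "Bangkok", "Jakarta", "Manila", "Yangon", "Dhaka"],
    "population": [8.0, 10.0, None, 14.0, 5.0, 22.0],
})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail()    # last 5 rows by default
n_rows, n_cols = df.shape

# info() prints column names, non-null counts, and dtypes.
df.info()

# dropna() returns a new DataFrame; df itself is unchanged.
cleaned = df.dropna()
print(df.shape, cleaned.shape)  # (6, 2) (5, 2)
```

Note that dropna() does not modify df in place, so assign the result if you want to keep it.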
This is just Pandas 101. I’ll write about the steps of cleaning data with pandas another time in the near future.