Install and import Pandas
pip install pandas
import numpy as np
import pandas as pd
Pandas Data Structures
The core value of Pandas comes from the data structures it provides, primarily:
- Series (labeled, homogeneously-typed, one-dimensional arrays)
- DataFrames (labeled, potentially heterogeneously-typed, two-dimensional arrays)
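As a minimal sketch of the difference between the two structures, the snippet below builds one of each. The variable names and sample values (`prices`, `items`, the product names) are made up for illustration.

```python
import pandas as pd

# A Series is one-dimensional and holds a single dtype for all values.
prices = pd.Series([1.5, 2.0, 3.25], index=["a", "b", "c"])

# A DataFrame is two-dimensional and may hold a different dtype per column.
items = pd.DataFrame(
    {"name": ["pen", "ink", "pad"], "price": [1.5, 2.0, 3.25]},
    index=["a", "b", "c"],
)

print(prices.ndim, prices.dtype)  # 1 float64
print(items.ndim)                 # 2
print(items.dtypes)               # one dtype per column
```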
Pandas Series
Create Series
Create empty Series
s = pd.Series(dtype='float64')
Create Series from dictionary
d = {'a': 1, 'b': 2, 'c': 3}
s = pd.Series(d)
Create Series from NumPy array
a = np.array([1,2,3,4])
s = pd.Series(a, copy=False, dtype=float)
Create Series from NumPy array with a defined index
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[10,11,12,13])
Create a Series from Scalar
s = pd.Series(5, index=[0, 1, 2, 3])
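One detail the constructors above do not show is what happens when a dictionary and an explicit index are passed together. The sketch below, using made-up labels, suggests that the index selects and orders the dictionary keys, and that labels missing from the dictionary become NaN.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# The index picks keys out of the dictionary. "d" is not a key,
# so its value is NaN and the Series is upcast to float64.
s = pd.Series(d, index=["b", "c", "d"])
print(s)

# A scalar is repeated once for every index label.
filled = pd.Series(5, index=["x", "y", "z"])
print(filled.tolist())  # [5, 5, 5]
```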
Select Series data
Retrieve first element
s.iloc[0]
Retrieve first 3 elements
s[:3]
Retrieve last 3 elements
s[-3:]
Retrieve data via index
s['a']
Retrieve multiple elements via index
s[['a', 'c', 'd']]
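The selections above mix positional and label-based access. A hedged sketch of the same lookups written with the explicit `.iloc` (position) and `.loc` (label) accessors follows; the sample Series is an assumption chosen so that every lookup succeeds.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4})

# Position-based selection
first = s.iloc[0]          # first element
first_three = s.iloc[:3]   # first 3 elements
last_three = s.iloc[-3:]   # last 3 elements

# Label-based selection
by_label = s.loc["a"]
several = s.loc[["a", "c", "d"]]
label_slice = s.loc["a":"c"]  # label slices include both endpoints

print(last_three.tolist())   # [2, 3, 4]
print(label_slice.tolist())  # [1, 2, 3]
```

Using the accessors makes the intent unambiguous when the index itself contains integers.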
Series Functions
Return Series as an array
s.values
Return the shape and size of the Series
s.shape
s.size
Cast Series as another data type
s.astype('int32')
Count non-null values in Series
s.count()
Cumulative Sum
s.cumsum()
Drop missing values
s.dropna()
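To see how these functions treat missing data, the following sketch runs them on a small Series that contains one NaN. The sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0])

print(s.shape)    # (4,)
print(s.size)     # 4, NaN is included in the size
print(s.count())  # 3, NaN is excluded from the count

# cumsum skips NaN by default: the NaN position stays NaN
# and the running total continues after it.
running = s.cumsum()
print(running.tolist())

# NaN cannot be represented in an integer dtype, so missing
# values are dropped here before casting.
cleaned = s.dropna()
ints = cleaned.astype("int32")
print(ints.tolist())  # [1, 3, 4]
```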
Pandas DataFrame
A DataFrame is a two-dimensional structure in which data is aligned in a tabular fashion in rows and columns.
The columns of a DataFrame are potentially heterogeneously typed, and its size is mutable. Both axes are labeled, which allows arithmetic operations to be performed on rows and columns.
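The properties described above can be sketched with a small example. The column names, row labels, and the `bonus` Series are hypothetical and exist only to illustrate mixed column types, mutable size, and label-aligned arithmetic.

```python
import pandas as pd

# Columns may hold different types: text, integers, floats.
df = pd.DataFrame(
    {"name": ["pen", "ink"], "qty": [3, 5], "price": [1.5, 2.0]},
    index=["x", "y"],
)

# Size is mutable: add a column computed from two existing columns.
df["total"] = df["qty"] * df["price"]

# Arithmetic aligns on labels, not on position, so the order of
# the labels in `bonus` does not matter.
bonus = pd.Series({"y": 1, "x": 10})
df["qty_plus"] = df["qty"] + bonus

print(df)
```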
Create DataFrame
Create empty dataframe
df = pd.DataFrame()