Pandas is a popular open-source Python library for data analysis and manipulation.

Install and import Pandas

# Run in a shell (not in Python):
pip install pandas

# Then, in Python:
import numpy as np
import pandas as pd

Pandas Data Structures

The core value of Pandas lies in the data structures it provides, primarily:

  1. Series (labeled, homogeneously typed, one-dimensional arrays)
  2. DataFrames (labeled, potentially heterogeneously typed, two-dimensional arrays)
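To make the distinction concrete, here is a small sketch with made-up values: the Series stores all of its values with a single dtype, while each DataFrame column keeps its own dtype.

```python
import pandas as pd

# Illustrative data (assumed). A Series stores every value with one dtype.
s = pd.Series([1.5, 2.0, 3.25], index=['x', 'y', 'z'])
print(s.dtype)  # float64
print(s.ndim)   # 1

# A DataFrame can use a different dtype for each column.
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [31, 42]})
print(df.dtypes)  # one dtype per column
print(df.ndim)    # 2
```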

Pandas Series

Create Series

Create empty Series

s = pd.Series(dtype='float64')

Create Series from dictionary

d = {'a': 1, 'b': 2, 'c': 3}
s = pd.Series(d)

Create Series from NumPy array

a = np.array([1, 2, 3, 4])
s = pd.Series(a, copy=False, dtype=float)

Create Series from NumPy array with a defined index

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[10, 11, 12, 13])

Create Series from a scalar

s = pd.Series(5, index=[0, 1, 2, 3])

Select Series data

Retrieve first element
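A minimal sketch, assuming an illustrative Series `s` labeled `'a'` through `'e'` (the labels the multi-element example in this section selects from). `iloc` indexes purely by position, so `s.iloc[0]` returns the first element whatever the labels are. Recent pandas releases deprecate the positional fallback of plain `s[0]` on a non-integer index, which is why `iloc` is used here:

```python
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# iloc is positional: position 0 is the first element
first = s.iloc[0]
print(first)  # 1
```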


Retrieve first 3 elements
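One way to do this, again assuming the illustrative Series below, is a positional slice with `iloc`; as with Python lists, the stop position is excluded. `head(3)` is an equivalent shortcut:

```python
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Positions 0, 1 and 2; the labels travel with the values
first_three = s.iloc[:3]
same_three = s.head(3)
print(first_three)
```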


Retrieve last 3 elements
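A sketch under the same assumed sample data: a negative start position counts from the end, and `tail(3)` gives the same result:

```python
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Start three positions from the end and run to the last element
last_three = s.iloc[-3:]
tail_three = s.tail(3)
print(last_three)
```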


Retrieve data by index label
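Assuming the same illustrative Series, a single label can be passed to `[]` or to the explicit `.loc` accessor. Note that, unlike positional slices, label slices with `.loc` include the end label:

```python
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

value = s['a']          # label lookup with []
same_value = s.loc['a'] # explicit label-based accessor
print(value)  # 1

# Label slices are inclusive: 'a', 'b' and 'c' are all returned
up_to_c = s.loc['a':'c']
print(up_to_c.tolist())  # [1, 2, 3]
```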


Retrieve multiple elements by index label

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
s[['a', 'c', 'd']]

Series Functions

Return Series as an array
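A minimal sketch with assumed sample data: `to_numpy()` returns a plain NumPy `ndarray` and drops the index labels. The older `.values` attribute also works, but the pandas documentation recommends `to_numpy()` (or `.array`) instead:

```python
import numpy as np
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

arr = s.to_numpy()                   # values only, no index
print(isinstance(arr, np.ndarray))   # True
print(arr.tolist())                  # [1, 2, 3, 4, 5]
```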


Return shape and size of the Series
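For a one-dimensional Series, `shape` is a one-element tuple and `size` is the total number of entries (missing values included), which matches `len(s)`. A sketch with assumed sample data:

```python
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.shape)  # (5,)
print(s.size)   # 5
print(len(s))   # 5
```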


Cast Series to another data type
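`astype` is the usual tool here. It returns a new Series and leaves the original unchanged. A sketch with assumed sample data:

```python
import pandas as pd

# Illustrative sample data (assumed): an integer Series
s = pd.Series([1, 2, 3])

as_float = s.astype('float64')  # new Series with float values
as_str = s.astype(str)          # new Series with string values
print(as_float.dtype)           # float64
print(as_str.tolist())          # ['1', '2', '3']
```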


Count non-null values in Series
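`count()` returns the number of non-missing values, whereas `size` counts every entry. A sketch with assumed sample data containing two missing values (both `np.nan` and `None` are treated as missing in a float Series):

```python
import numpy as np
import pandas as pd

# Illustrative sample data (assumed) with two missing entries
s = pd.Series([1.0, np.nan, 3.0, None, 5.0])

print(s.count())  # 3, missing values are skipped
print(s.size)     # 5, every entry is counted
```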


Cumulative Sum
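`cumsum()` returns a running total with the same index as the input. By default (`skipna=True`) a missing value stays missing in its own position, and the running total continues past it. A sketch with assumed sample data:

```python
import numpy as np
import pandas as pd

# Illustrative sample data (assumed)
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
running = s.cumsum()
print(running.tolist())  # [1, 3, 6, 10]

# NaN stays in place; later totals skip over it
with_gap = pd.Series([1.0, np.nan, 2.0]).cumsum()
print(with_gap.tolist())
```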


Drop missing values
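`dropna()` returns a new Series without the missing entries, keeping the labels of the values that remain. A sketch with assumed sample data:

```python
import numpy as np
import pandas as pd

# Illustrative sample data (assumed): np.nan and None both become NaN here
s = pd.Series([1.0, np.nan, 3.0, None], index=['a', 'b', 'c', 'd'])

cleaned = s.dropna()            # new Series; s itself is unchanged
print(cleaned.index.tolist())   # ['a', 'c']
print(s.size)                   # 4
```

Passing `inplace=True` modifies `s` directly instead of returning a new Series.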


Pandas DataFrame

A DataFrame is a two-dimensional structure, where data is aligned in a tabular fashion in rows and columns.

The columns of a DataFrame can hold different data types, and its size is mutable. Both axes are labeled, so arithmetic operations between DataFrames align on row and column labels rather than on position.
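A minimal sketch of label alignment, with made-up values: the result covers the union of both operands' row and column labels, and any cell whose label pair is missing from either operand becomes NaN.

```python
import pandas as pd

# Illustrative data (assumed): overlapping but different labels
df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'x': [10, 20], 'z': [5, 6]}, index=['r2', 'r3'])

# Only ('r2', 'x') exists in both, so it is the only non-NaN cell
total = df1 + df2
print(total)
print(total.loc['r2', 'x'])  # 12.0
```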

Create DataFrame

Create empty DataFrame

df = pd.DataFrame()