Pandas is a popular data analysis tool.

Install and import Pandas

pip install pandas
import numpy as np
import pandas as pd

Pandas Data Structures

The core value of Pandas comes from the data structures it provides, primarily:

  1. Series (labeled, homogeneously-typed, one-dimensional arrays)
  2. DataFrames (labeled, potentially heterogeneously-typed, two-dimensional arrays)
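
As a minimal sketch of the difference between the two structures (the variable names and sample values here are illustrative assumptions, not part of the original notes):

```python
import pandas as pd

# A Series holds one labeled column of values that share a single dtype.
prices = pd.Series([1.5, 2.0, 3.25], index=['a', 'b', 'c'])

# A DataFrame holds several labeled columns, each with its own dtype.
items = pd.DataFrame(
    {'name': ['pen', 'ink', 'pad'], 'price': [1.5, 2.0, 3.25]},
    index=['a', 'b', 'c'],
)

print(prices.dtype)    # one dtype for the whole Series
print(items.dtypes)    # one dtype per column
print(items['price'])  # selecting a single column yields a Series
```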

Pandas Series

Create Series

Create empty Series

s = pd.Series(dtype='float64')

Create Series from dictionary

d = {'a': 1, 'b': 2, 'c': 3}
s = pd.Series(d)
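
When an index is passed together with a dictionary, the index decides which keys are used and in what order. A short sketch, where the label 'z' is a made-up key chosen only to show what happens when a label is missing from the dictionary:

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# The index selects and orders the dictionary keys.
# 'z' is not a key in d, so its value becomes NaN,
# which also makes the Series floating point.
s = pd.Series(d, index=['c', 'a', 'z'])
print(s)
```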

Create Series from NumPy array

a = np.array([1,2,3,4])
s = pd.Series(a, copy=False, dtype=float)

Create Series from NumPy array with a defined index

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[10,11,12,13])
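
With an integer index like this one, a label and a position are easy to confuse. The sketch below uses the explicit `.loc` (label) and `.iloc` (position) accessors to keep the two apart:

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[10, 11, 12, 13])

print(s.loc[10])    # label-based: the element labeled 10
print(s.iloc[0])    # position-based: the first element
print(s.iloc[-1])   # position-based: the last element
```

Here 0 is not a label in the index, so only the positional accessor can use it.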

Create Series from scalar

s = pd.Series(5, index=[0, 1, 2, 3])

Select Series data

Retrieve first element

s.iloc[0]

Retrieve first 3 elements

s[:3]

Retrieve last 3 elements

s[-3:]

Retrieve data via index

s['a']

Retrieve multiple elements via index

s[['a', 'c', 'd']]
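
The selections above assume a Series with string labels. A self-contained sketch that puts them together on one such Series (the sample values and variable names are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

first = s.iloc[0]               # first element, by position
head = s.iloc[:3]               # first three elements
tail = s.iloc[-3:]              # last three elements
one = s.loc['a']                # single element, by label
some = s.loc[['a', 'c', 'd']]   # several elements, by label

print(head)
print(tail)
print(some)
```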

Series Functions

Return Series as an array

s.values

Return the shape and size of the Series

s.shape
s.size

Cast Series as another data type

s.astype('int32')

Count non-null values in Series

s.count()

Cumulative Sum

s.cumsum()

Drop missing values

s.dropna()
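
A short sketch of how these functions behave on a Series that contains a missing value (the sample data is made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0])

print(s.shape)      # a one-element tuple holding the length
print(s.size)       # total number of elements, NaN included
print(s.count())    # number of non-null elements only

running = s.cumsum()   # NaN is skipped in the total but stays NaN in place
clean = s.dropna()     # new Series without the NaN entry

# Casting to an integer type fails while NaN is present,
# so drop the missing values first.
as_int = clean.astype('int32')
print(running)
print(as_int)
```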

Pandas DataFrame

A DataFrame is a two-dimensional structure, where data is aligned in a tabular fashion in rows and columns.

The columns of a DataFrame are potentially heterogeneously typed, and its size is mutable. Both axes are labeled, which allows arithmetic operations on the rows and columns.
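
A minimal sketch of these properties, using made-up column names and values: columns of different types, a column added after creation, and arithmetic that lines up on the row labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'qty': [2, 5, 1], 'price': [1.5, 2.0, 3.25]},
    index=['pen', 'ink', 'pad'],
)
print(df.dtypes)   # integer and float columns side by side

# Adding a column changes the size of the DataFrame in place.
df['total'] = df['qty'] * df['price']

# Operands are aligned on their labels before the arithmetic runs;
# a label present on only one side ('ink' here) yields NaN.
discount = pd.Series({'pen': 0.5, 'pad': 1.0})
net = df['total'] - discount
print(net)
```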

Create DataFrame

Create empty dataframe

df = pd.DataFrame()