ETL Pipeline

The following steps make up a segment of an ETL pipeline:

Import Necessary Libraries

Declare and Assign Connection Variables for MySQL, MongoDB, & the Databases: change these to your own credentials
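A sketch of this cell with hypothetical credentials and database names (`root`, `quotes`, `quotes_dw`, and the hosts are placeholders, not values from the project):

```python
# Hypothetical credentials and database names -- substitute your own.
mysql_uid = "root"
mysql_pwd = "password"
mysql_host = "localhost"
mysql_port = 3306

mongo_host = "localhost"
mongo_port = 27017

src_dbname = "quotes"      # source MongoDB database
dst_dbname = "quotes_dw"   # destination MySQL data warehouse

# These URIs are later handed to pymongo.MongoClient() and sqlalchemy.create_engine().
mongo_uri = f"mongodb://{mongo_host}:{mongo_port}/"
mysql_uri = f"mysql+pymysql://{mysql_uid}:{mysql_pwd}@{mysql_host}:{mysql_port}/{dst_dbname}"
```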

Connect to MongoDB

Define Functions for Getting Data From and Setting Data Into Databases
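A minimal sketch of what these helpers could look like; the function names and signatures are assumptions, not the notebook's actual definitions:

```python
import pandas as pd


def get_sql_dataframe(engine, query):
    """Run a SQL query and return the result set as a DataFrame."""
    return pd.read_sql(query, engine)


def get_mongo_dataframe(client, db_name, collection, query):
    """Fetch matching documents from a MongoDB collection as a DataFrame."""
    df = pd.DataFrame(list(client[db_name][collection].find(query)))
    return df.drop(columns=["_id"])  # discard Mongo's internal ObjectId


def set_dataframe(engine, df, table_name, db_operation="insert"):
    """Write a DataFrame into a SQL table, replacing or appending."""
    mode = "replace" if db_operation == "insert" else "append"
    df.to_sql(table_name, con=engine, index=False, if_exists=mode)
```

The same `set_dataframe`/`get_sql_dataframe` pair works against any SQLAlchemy engine, which makes the load step easy to smoke-test against SQLite before pointing it at MySQL.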

Dealing with CSV or JSON files

The following sections deal with CSV or JSON files. The option to convert to or from either exists, but for the sake of this project, the file "HistoricalQuotes.csv" is stored in the GitHub repo alongside this page; it will be converted to JSON and used from there.

Load CSV File
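Loading the file is one `read_csv` call. The sketch below substitutes an in-memory stand-in so it runs anywhere; the column names are illustrative, not the real schema of HistoricalQuotes.csv:

```python
import io

import pandas as pd

# Stand-in for HistoricalQuotes.csv; in the notebook this would be
# pd.read_csv("HistoricalQuotes.csv") with the repo-local path.
sample_csv = io.StringIO(
    "date,close,volume\n"
    "2020-12-31,132.69,99116600\n"
    "2020-12-30,133.72,96452100\n"
)
df = pd.read_csv(sample_csv)
```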

Convert CSV into JSON (change file paths)
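One way to write this conversion with only the standard library; the helper name and path arguments are hypothetical, and the paths are the parts you would change:

```python
import csv
import json


def csv_to_json(csv_path, json_path):
    """Rewrite a CSV file as a JSON array of row objects."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))  # one dict per CSV row
    with open(json_path, "w") as f:
        json.dump(rows, f, indent=4)
```

Note that `csv.DictReader` yields every field as a string; any numeric typing is deferred to the DataFrame transformations later in the pipeline.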

Extra: Convert JSON to CSV (change file paths)
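The reverse direction, again as a hypothetical stdlib-only helper assuming the JSON is a flat array of objects:

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Rewrite a JSON array of flat objects back into a CSV file."""
    with open(json_path) as f:
        rows = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```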

Create the new data warehouse
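A sketch of this step, assuming the warehouse lives in MySQL. Splitting out the DDL strings keeps the statement-building testable without a live server; both function names are made up for illustration:

```python
import sqlalchemy


def warehouse_ddl(db_name):
    """MySQL statements that drop and recreate the warehouse database."""
    return [f"DROP DATABASE IF EXISTS `{db_name}`;",
            f"CREATE DATABASE `{db_name}`;"]


def create_warehouse(engine, db_name):
    """Execute the DDL against a server-level MySQL engine (no database in its URI)."""
    with engine.connect() as conn:
        for stmt in warehouse_ddl(db_name):
            conn.execute(sqlalchemy.text(stmt))
```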

Populate MongoDB with source data. DO ONLY ONCE!
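A possible shape for this load, with an assumed function name. The "do only once" warning exists because a bare `insert_many` duplicates documents on a second run; dropping each collection first is one way to make reruns safe:

```python
def populate_mongodb(client, db_name, collections):
    """Load source documents into MongoDB collections.

    collections maps collection name -> list of documents. Dropping each
    collection first keeps the load idempotent; a plain insert_many would
    duplicate every document on a second run.
    """
    db = client[db_name]
    for name, docs in collections.items():
        db[name].drop()
        db[name].insert_many(docs)
```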

Create and Populate Tables

Extract Data from the Source MongoDB Collections Into DataFrames
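The core of the extract step is wrapping `find()` results in a DataFrame. Stand-in documents replace the live cursor below so the sketch runs anywhere; field names are illustrative:

```python
import pandas as pd

# Stand-in documents; in the notebook these come from
# client[src_dbname][collection].find({}).
docs = [
    {"_id": "...", "date": "2020-12-31", "close": 132.69},
    {"_id": "...", "date": "2020-12-30", "close": 133.72},
]
df = pd.DataFrame(docs).drop(columns=["_id"])  # ObjectId has no place in the warehouse
```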

Remove/Modify Columns
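A small example of the kind of cleanup this step performs; the starting column names and the choice of which column to drop are illustrative, not the project's real schema:

```python
import pandas as pd

# Illustrative frame; the real column names come from HistoricalQuotes.csv.
df = pd.DataFrame({" Close/Last": ["$132.69"], "Volume": [99116600], "Date": ["12/31/2020"]})

df = df.rename(columns=lambda c: c.strip().lower().replace("/", "_"))  # tidy names
df["close_last"] = df["close_last"].str.lstrip("$").astype(float)      # "$132.69" -> 132.69
df = df.drop(columns=["volume"])                                       # remove an unneeded column
```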

Add Primary Key Column
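One common way to do this is a 1-based surrogate key inserted as the first column; the key name `quote_id` is an assumption:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2020-12-31", "2020-12-30"]})

# Surrogate key: a 1-based integer column placed first, which the warehouse
# side can then declare as the table's PRIMARY KEY.
df.insert(0, "quote_id", range(1, len(df) + 1))
```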

Load the Transformed DataFrames into the New Data Warehouse by Creating New Tables
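The load step reduces to `DataFrame.to_sql`. SQLite stands in for the MySQL warehouse engine below so the sketch is runnable; the table name `fact_quotes` is hypothetical:

```python
import pandas as pd
import sqlalchemy

# SQLite stand-in; in the notebook the engine would point at the MySQL warehouse.
engine = sqlalchemy.create_engine("sqlite://")
df = pd.DataFrame({"quote_id": [1, 2], "close": [132.69, 133.72]})

# if_exists="replace" drops and recreates the table, so reruns stay clean.
df.to_sql("fact_quotes", con=engine, index=False, if_exists="replace")
```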

Validate the Tables

Summary Statistics (Row and column count)
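On the DataFrame side these counts come straight from `.shape`; the sample frame is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"quote_id": [1, 2, 3], "close": [132.69, 133.72, 131.97]})
rows, cols = df.shape  # SELECT COUNT(*) gives the same row count on the SQL side
```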

Extra Queries
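Extra queries can run through `pandas.read_sql` against the warehouse engine. An aggregate example, again with SQLite standing in and a hypothetical `fact_quotes` table:

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")  # stand-in for the warehouse engine
pd.DataFrame({"close": [132.69, 133.72]}).to_sql("fact_quotes", con=engine, index=False)

avg = pd.read_sql("SELECT AVG(close) AS avg_close FROM fact_quotes", engine)
```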