Dark Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Harsh-Belekar/Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

6 Commits

Repository files navigation

Advance SQL Project : Data Warehousing + EDA + Advanced Data Analytics

A complete, end-to-end SQL-driven Data Warehouse & Analytics Project, covering:

  • Data Modeling & ETL Pipeline (Bronze - Silver - Gold)
  • Data Cleaning, Standardization & Transformation
  • EDA + Advanced SQL Analytics
  • Customer & Product Reporting
  • Business Insights & KPI Generation

This project simulates a real industry-grade data engineering + business analytics workflow, starting from raw ERP & CRM datasets and delivering production-ready insights using SQL.


Summary

This project builds a complete SQL Data Warehouse (Bronze - Silver - Gold) and performs advanced EDA + analytics to generate customer and product insights for a retail business.


Table of Contents

  • Business Problem
  • Project Overview
  • Project Architecture & Diagrams
  • Tech Stack
  • Key Skills Demonstrated
  • Project Folder Structure
  • Key Features
  • Reports Generated
  • Data Architecture Flow
  • Important Files
  • Key Outcomes
  • What I Will Learn
  • Clone This Repository
  • How to Run This Project
  • Project Highlights (for Resume / Portfolio)
  • License
  • Dataset Disclaimer
  • Author

Business Problem

Retail Company Issue:
Data scattered across ERP & CRM - no unified reporting - poor insights - inconsistent decisions.

You built a scalable Data Warehouse + Analytics System to solve:

  • Scattered data
  • Inconsistent formats
  • No single source of truth
  • No customer or product performance tracking
  • No advanced reporting

The result is a clean, scalable, analytics-ready warehouse.


Project Overview

This project demonstrates how to build and analyze a data warehouse environment using SQL.
It includes:

1. Data Warehouse Development

  • Data ingestion (Bronze layer)
  • Data cleaning & harmonization (Silver layer)
  • Business modeling & fact/dimension tables (Gold layer)
  • Stored procedures for ETL
  • Data quality testing
  • Documentation (data model, flow diagrams, architecture)

2. EDA + Advanced SQL Data Analysis

Using advanced SQL analytics techniques:

  • Ranking
  • Segmentation
  • Cumulative metrics
  • Change-over-time analysis
  • Performance metrics
  • Customer & Product reports
  • KPI calculations
  • Exploratory Data Analysis insights

3. Advanced Reporting

SQL-based dashboards & reports:

  • Customer Analytics Report
  • Product Performance Report

Project Architecture & Diagrams

Overall Architecture

Mesh Architecture Layers

Data Integration Workflow

Data Flow Diagram

Star Schema Data Model


Tech Stack

Layer Tools Used
Data Warehouse PostgreSQL / SQL
Data Modeling Star Schema, Dimensional Modeling
ETL Pipeline SQL Stored Procedures
EDA & Analytics SQL (Window functions, Aggregations, CTEs)
Documentation Markdown, PNG diagrams

Key Skills Demonstrated

  • Advanced SQL (Window Functions, CTEs, Ranking Functions)
  • Data Warehousing (Bronze-Silver-Gold architecture)
  • ETL Pipeline Development
  • Fact & Dimension Modeling
  • Data Cleaning & Standardization
  • Analytical Reporting & KPI Design
  • Data Architecture Documentation

Project Folder Structure

Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics/
|
|-- README.md - Main Project Documentation
|-- LICENSE - License for Project
|
+-- Data Warehouse/
| |
| +-- scripts/ - ETL Scripts for Bronze - Silver - Gold
| | +-- bronze/
| | | +-- ddl_bronze.sql - Create Bronze Layer Tables
| | | +-- proc_load_bronze.sql - Load Raw ERP + CRM Data into Bronze
| | |
| | +-- silver/
| | | +-- ddl_silver.sql - Create Cleaned Silver Layer Tables
| | | +-- proc_load_silver.sql - Transform Bronze - Silver
| | |
| | +-- gold/
| | +-- ddl_gold.sql - Create Final Fact & Dimensions (DW)
| |
| +-- tests/ - Data Quality & Validation Scripts
| | +-- quality_checks_gold.sql - Gold Layer Validation Tests
| | +-- quality_checks_silver.sql - Silver Layer Validation Tests
| |
| +-- docs/ - Architecture, Models & Pipeline Diagrams
| | +-- Analysing Source System.png - Source System Exploration
| | +-- data_architecture.png - Full Data Architecture Overview
| | +-- data_catalog.md - Documentation for All Tables & Columns
| | +-- data_flow.png - End-to-End Data Flow Diagram
| | +-- data_integration.png - ERP + CRM Integration Overview
| | +-- data_layers.pdf - Bronze, Silver, Gold Explanation
| | +-- data_model.png - Data Warehouse Star Schema
| | +-- ETL.png - ETL Pipeline Overview
| | +-- Mesh_Architecture_Layers.png - Data Mesh Architecture Layers
| | +-- naming_conventions.md - Standards for Naming Tables & Columns
| |
| +-- row_dataset/ - Raw ERP & CRM Source System Data
| +-- source_erp/
| | +-- CUST_AZ12.csv - ERP Customer Data
| | +-- LOC_A101.csv - ERP Location Data
| | +-- PX_CAT_G1V2.csv - ERP Product/Category Data
| |
| +-- source_crm/
| +-- cust_info.csv - CRM Customer Info
| +-- prd_info.csv - CRM Product Info
| +-- sales_details.csv - CRM Sales Transactions
|
+-- EDA + Advanced Data Analysis/
|
+-- Data Analysis .png - EDA Output Summary Diagram
|
+-- scripts/ - All SQL Scripts for Analysis
| +-- 00_init_database.sql - Initialize Analysis Schema
| +-- 01_database_exploration.sql - Explore Tables & Metadata
| +-- 02_dimensions_exploration.sql - Explore Dimension Tables
| +-- 03_date_range_exploration.sql - Explore Date Ranges
| +-- 04_measures_exploration.sql - Explore Key Business Metrics
| +-- 05_magnitude_analysis.sql - Magnitude-Level Analysis
| +-- 06_ranking_analysis.sql - Ranking & Ordering Analysis
| +-- 07_change_over_time_analysis.sql - Trend + Time-Based Analysis
| +-- 08_cumulative_analysis.sql - Running Totals & Rolling Sums
| +-- 09_performance_analysis.sql - Performance & KPI Insights
| +-- 10_part_to_whole_analysis.sql - Proportional Contribution Analysis
| +-- 11_data_segmentation.sql - Customer & Product Segmentation
| +-- 12_report_customers.sql - Generate Customer Report (Gold Layer)
| +-- 12_report_products.sql - Generate Product Report (Gold Layer)
|
+-- dataset/ - Output Reports from Gold Layer
+-- gold.dim_customers.csv - Cleaned Customer Dimension
+-- gold.dim_products.csv - Cleaned Product Dimension
+-- gold.fact_sales.csv - Cleaned Fact Sales Table
+-- gold.report_customers.csv - Final Customer Analytics Report
+-- gold.report_products.csv - Final Product Analytics Report

Key Features

Data Warehouse (Bronze - Silver - Gold)

  • Raw data ingestion
  • Data profiling
  • Standardization & validation
  • Star schema modeling
  • Automated ETL procedures
  • Data quality tests
  • Complete documentation

EDA + Advanced SQL Analytics

  • Dimension exploration
  • Measures analysis
  • Ranking, segmentation
  • Time-series & cumulative trends
  • KPI calculations (Recency, AOV, Monthly Spend, etc.)
  • Customer & Product performance reports

Reports Generated

Customer Report

Includes:

  • Customer segments (VIP, Regular, New)
  • Recency
  • Lifespan
  • Total orders, products, quantity, sales
  • Avg order value
  • Avg monthly spend

Product Report

Includes:

  • Product segments (High Performer / Mid Range / Low Performer)
  • Recency
  • Lifespan
  • Unique customers
  • Avg selling price
  • Monthly revenue

Data Architecture Flow

RAW (ERP + CRM)
|
BRONZE - Clean storage
|
SILVER - Harmonized & Enriched Data
|
GOLD - Final Fact + Dimension Tables
|
ANALYTICS - Reports, KPIs, Dashboards

Important Files

Data Warehouse Scripts

bronze/
ddl_bronze.sql
proc_load_bronze.sql
silver/
ddl_silver.sql
proc_load_silver.sql
gold/
ddl_gold.sql

EDA SQL Scripts

00_init_database.sql
01_database_exploration.sql
02_dimensions_exploration.sql
...
12_report_customers.sql
12_report_products.sql

Key Outcomes

  • Built a fully functional SQL Data Warehouse
  • Designed & implemented ETL pipelines
  • Performed advanced SQL analytics
  • Designed star schema (Fact + Dimensions)
  • Developed customer & product analytical reports
  • Demonstrated real-world Data Engineer + Analyst workflow

What I Will Learn

  • How to design a Data Warehouse from scratch
  • How to build ETL pipelines (Bronze - Silver - Gold)
  • How to clean & transform raw data
  • How to write advanced SQL analytical scripts
  • How to generate customer & product insights using SQL
  • How to document a real-world data engineering project

Clone This Repository

git clone https://github.com/Harsh-Belekar/Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics.git
cd Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics

How to Run This Project

1. Initialize the Database

00_init_database.sql

2. Load Bronze Layer

proc_load_bronze.sql

3. Load Silver Layer

proc_load_silver.sql

4. Create Gold Layer

ddl_gold.sql

5. Run Analysis Scripts (00 - 12)

EDA + Advanced Data Analysis/scripts/

Project Highlights (for Resume / Portfolio)

  • Real-world Data Engineering + Analytics workflow
  • End-to-end SQL project (Data Warehousing + EDA + Advanced Data Analysis)
  • Realistic ETL + Data Modeling experience
  • Clean architecture & documentation
  • Retail analytics insights
  • Strong Analytics + Business Insights generation
  • Showcases SQL expertise at scale

License

MIT License -- see LICENSE file.


Dataset Disclaimer

All datasets used are dummy, synthetic, or public, intended only for learning and portfolio demonstration.
No real customer or company data is used.


Author

Harsh Belekar
Data Analyst | Python Developer | SQL | Power BI | Excel | Data Visualization
LinkedIn | GitHub

harshbelekar74@gmail.com


If you found this project helpful, feel free to star the repo and connect with me for collaboration!

About

An End-to-End Advanced SQL Project covering Data Warehousing, ETL Pipeline (Bronze - Silver - Gold), Star Schema Modeling, EDA, and Advanced SQL Analytics. Built using PostgreSQL, this project simulates a real-world Data Engineering + Data Analytics workflow using raw ERP & CRM data to generate production-ready customer and product insights.

Topics

Resources

Readme

License

MIT license

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors