SaurabhSSB/statistics_workout


statistics_workout

This repository contains a collection of Python scripts that explore fundamental statistical concepts using real-world datasets. The exercises cover percentile-based filtering, Z-score calculations, modified Z-scores, and cosine similarity, with visualizations built using Seaborn and Matplotlib.
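Percentile-based filtering can be sketched in plain Python as below. This is an illustrative sketch, not the code in 1_percentile.py: it assumes outliers are values above the 90th percentile, it uses the same linear interpolation that numpy.percentile and pandas.Series.quantile apply by default, and the household sizes are made-up sample values.

```python
def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation
    between the two nearest ranks, the default used by numpy.percentile."""
    if not values:
        raise ValueError("percentile of empty data")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100   # fractional rank in the sorted data
    lo = int(pos)                  # lower neighboring rank
    hi = min(lo + 1, len(s) - 1)   # upper neighboring rank
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_by_percentile(values, q=90):
    """Keep only values at or below the q-th percentile (assumed cut-off rule)."""
    cutoff = percentile(values, q)
    return [v for v in values if v <= cutoff], cutoff


# Hypothetical household sizes; 12 is the suspicious value.
household_sizes = [3, 1, 4, 2, 12, 3, 5, 2, 4, 3]
kept, cutoff = filter_by_percentile(household_sizes)
print(round(cutoff, 2), kept)  # 5.7 [3, 1, 4, 2, 3, 5, 2, 4, 3]
```

With pandas, the same cut-off would be `series.quantile(0.9)`, where `series` stands for a hypothetical household-size column.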

Contents

  • 1_percentile.py: Calculates percentiles and removes outliers based on the 90th percentile of household size.
  • 2_mean_absolute_deviation_standard_deviation_z_value.py: Detects outliers in BMI data using the standard deviation and Z-scores.
  • 3_log.py: Visualizes highway population data and introduces logarithmic plotting.
  • 4_Normal.py: Plots income against credit limit on log-scaled axes using Seaborn.
  • 5_cosine.py: Demonstrates cosine similarity and cosine distance on simple document vectors (a basic NLP-style example).
  • 6_modified_z_test.py: Implements both the standard Z-score and the modified Z-score for income-based outlier detection.
  • modified_z_score.xlsx: Example Excel sheet supporting the modified Z-score implementation.
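The outlier-detection ideas behind 2_mean_absolute_deviation_standard_deviation_z_value.py and 6_modified_z_test.py can be sketched with the standard library. This is not the repository's code: the thresholds (|z| > 3 for the standard Z-score and |M| > 3.5 for the modified Z-score) are commonly used conventions assumed here, the population standard deviation is used, and the income values are made up.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = statistics.fmean(values)
    return statistics.fmean(abs(x - mean) for x in values)


def z_scores(values):
    """Standard Z-scores, (x - mean) / std, using the population std (ddof=0).
    Note: pandas' Series.std() defaults to the sample std (ddof=1)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    return [(x - mean) / std for x in values]


def modified_z_scores(values):
    """Modified Z-scores, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median (robust to extreme values)."""
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (x - median) / mad for x in values]


# Hypothetical incomes (in thousands) with one extreme value.
incomes = [30, 31, 32, 33, 34, 35, 36, 300]
z_outliers = [x for x, z in zip(incomes, z_scores(incomes)) if abs(z) > 3]
mz_outliers = [x for x, m in zip(incomes, modified_z_scores(incomes)) if abs(m) > 3.5]
print(mean_absolute_deviation(incomes))  # 58.40625
print(z_outliers, mz_outliers)           # [] [300]
```

With only eight values, the extreme income inflates both the mean and the standard deviation, so its own Z-score stays below 3 (with the population standard deviation, no point in a sample of n values can have a Z-score above sqrt(n - 1), about 2.65 here). The median-based modified Z-score is not pulled along by the outlier and still flags it.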

Technologies Used

  • Python 3.x
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Key Concepts

  • Descriptive Statistics
  • Percentile Analysis
  • Outlier Detection (Z-score, Modified Z-score)
  • Data Cleaning & Preprocessing
  • Cosine Similarity & Distance
  • Data Visualization
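Cosine similarity compares the direction of two vectors, cos(theta) = (a . b) / (|a| |b|), and cosine distance is 1 minus that value. The sketch below computes both directly on made-up word-count vectors; it is not the code in 5_cosine.py, which may instead rely on scikit-learn's sklearn.metrics.pairwise.cosine_similarity.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1 - cosine_similarity(a, b)


# Hypothetical word counts over the vocabulary ["data", "statistics", "python"].
doc1 = [2, 1, 0]
doc2 = [1, 1, 0]
doc3 = [0, 0, 3]
print(round(cosine_similarity(doc1, doc2), 4))  # 0.9487
print(cosine_distance(doc1, doc3))              # 1.0
```

Because only the angle matters, doc1 and a scaled copy such as [4, 2, 0] are maximally similar regardless of document length, while documents with no words in common (doc1 and doc3) are at distance 1.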

How to Run

  1. Clone the repository:
    git clone https://github.com/SaurabhSSB/statistics_workout.git
    cd statistics_workout
  2. Install required libraries:
    pip install pandas numpy matplotlib seaborn scikit-learn
  3. Run a script by name, for example:
    python 1_percentile.py

Ensure that the necessary CSV files are placed in the correct paths or update the paths in the scripts accordingly.

Download

The repository can also be downloaded as a ZIP file from its GitHub page.


Contact

If you have questions or suggestions, feel free to reach out via GitHub Issues.

About

A collection of Python scripts demonstrating core statistical concepts such as percentile analysis, Z-scores, modified Z-scores, and cosine similarity, using real datasets and visualizations.

License

MIT license
