Sumit Gundawar

Data Scientist

Transforming data into actionable insights, one line of code at a time.

@SumitGundawar

Chicken Disease Classifier

This project builds a CNN-based classification model that identifies diseases in chickens. The repository documents a complete workflow: updating configuration files, secrets, parameters, entities, components, and pipelines. DVC provides version control and reproducibility. Instructions cover local setup and execution with conda and Docker. CI/CD deployment runs through GitHub Actions, with options to deploy on AWS using EC2 and ECR or on Azure using Azure Container Registry and Web App Server.

Skills: Python, CNN, DVC, Docker, CI/CD, GitHub Actions, AWS (EC2, ECR), Azure (Container Registry, Web App Server)

Demand Planning Disaggregation

Developed a SAS-based forecasting model that significantly improved demand prediction accuracy. Established an ETL pipeline using PySpark and DAX to enhance metrics analysis, and implemented a CI/CD pipeline for continuous feature development and issue resolution. Also designed a Power BI dashboard to support executive decision-making and optimize supply chain operations.

Skills: SAS, PySpark, DAX, CI/CD, Power BI, Machine Learning, Data Analysis, Data Visualization, Supply Chain Management

Picker Accuracy

Developed a Python-based rating system for warehouse operatives, built on an analysis of multiple performance metrics. Built an ETL pipeline and dashboard with Python and DAX to display operative rankings across North American warehouses. This enabled management to identify top performers, address underperformance, and allocate incentives effectively, ultimately improving warehouse operations and employee satisfaction.

Skills: Python, DAX, ETL, Data Analysis, Data Visualization, Performance Metrics

Weather Prediction during WWII

This project predicts maximum temperature from minimum temperature using five regression models: Linear Regression, Multiple Linear Regression, Polynomial Regression, Decision Tree Regression, and Random Forest Regression. After the dataset is preprocessed, each model is evaluated with the R-squared metric, and visualizations illustrate the relationship between the temperatures and the predictions.
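
To illustrate the evaluation step, here is a minimal, dependency-free sketch of the simplest case: a one-variable least-squares fit scored with R-squared. It is not the project's code, which uses Scikit-learn. The function names `fit_simple_linear` and `r_squared` and the temperature readings are hypothetical and made up for illustration.

```python
from __future__ import annotations

from typing import Sequence


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up readings, for illustration only (not taken from the project's dataset).
min_temps = [10.0, 12.0, 15.0, 18.0, 20.0]
max_temps = [19.0, 22.0, 24.0, 29.0, 30.0]

slope, intercept = fit_simple_linear(min_temps, max_temps)
predicted = [slope * t + intercept for t in min_temps]
score = r_squared(max_temps, predicted)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={score:.3f}")
```

An R-squared close to 1 means the fitted line explains most of the variance in maximum temperature. The same score can be computed for each of the other models to compare them on equal footing.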

Skills: Python, Pandas, NumPy, Matplotlib, Scikit-learn

Stock Price Tracker using YahooFinance

This web application allows users to track real-time prices and percentage changes of stocks, index funds, ETFs, and cryptocurrencies. It fetches data from the Yahoo Finance API and updates the prices every 10 seconds. The application features a search box with a recommendation system and color-coded price display for easy interpretation of asset performance.
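
The percentage change and color coding described above could be computed along these lines. This is a sketch under stated assumptions, not the application's actual code: the function names, the use of the previous close as the baseline, and the specific colors are all hypothetical. Only the 10-second refresh interval comes from the description.

```python
from __future__ import annotations

# Refresh interval stated in the description; a polling loop would wait this long between fetches.
POLL_INTERVAL_SECONDS = 10


def percent_change(current: float, previous_close: float) -> float:
    """Percentage change of the current price relative to a baseline price.

    Assumption: the baseline is the previous close; the app may use another reference.
    """
    if previous_close == 0:
        raise ValueError("previous_close must be non-zero")
    return (current - previous_close) / previous_close * 100.0


def price_color(change_pct: float) -> str:
    """Map a percentage change to a display color (hypothetical color scheme)."""
    if change_pct > 0:
        return "green"
    if change_pct < 0:
        return "red"
    return "gray"


def summarize(symbol: str, current: float, previous_close: float) -> dict:
    """Build the payload a front end could render for one asset."""
    change = percent_change(current, previous_close)
    return {
        "symbol": symbol,
        "price": current,
        "change_pct": round(change, 2),
        "color": price_color(change),
    }


# Illustrative prices only; no data is fetched here.
row = summarize("EXAMPLE", 110.0, 100.0)
print(row)
```

Keeping the calculation separate from the data fetch lets the same function serve stocks, index funds, ETFs, and cryptocurrencies alike.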

Skills: Flask, Python, JavaScript, HTML/CSS, Yahoo Finance API