
Projects

End-to-end data engineering solutions, as well as academic software engineering projects.

End-to-End Data Pipeline with CI/CD and Interactive Dashboard
2026

Complete DataOps workflow covering data generation, processing, automated testing, and CI/CD, plus an interactive Streamlit dashboard deployed to Streamlit Cloud.

Python, DataOps, Plotly, Streamlit, Pytest, GitHub Actions
Snowflake ELT Pipeline with dbt and Airflow
2026

End-to-end ELT pipeline that extracts TPC-H orders data into Snowflake and transforms it with dbt, orchestrated by Apache Airflow.

Snowflake, dbt, Apache Airflow, Python, SQL, ELT Pipelines
GCP Uber Data Analytics
2025

Built an end-to-end GCP pipeline with Mage, BigQuery, and Looker Studio for automated data ingestion and real-time analytics.

GCP, Mage.ai, BigQuery, Looker Studio, ETL Pipelines, Star Schema
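The star-schema modeling step in a pipeline like this can be illustrated with a minimal Python sketch. It assumes hypothetical trip fields (`payment_type`, `fare_amount`, `trip_distance`) rather than the project's actual columns, and shows a single dimension: each distinct payment type receives a surrogate key in the dimension table, and the fact table stores that key next to the numeric measures.

```python
from typing import Any


def build_star_schema(
    trips: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split flat trip records into a payment-type dimension and a trip fact table."""
    dim_payment_type: list[dict[str, Any]] = []
    surrogate_keys: dict[str, int] = {}  # natural value -> surrogate key
    fact_trips: list[dict[str, Any]] = []

    for trip_id, trip in enumerate(trips):
        payment_type = trip["payment_type"]  # hypothetical column name
        if payment_type not in surrogate_keys:
            # First time this value appears: assign the next surrogate key.
            surrogate_keys[payment_type] = len(dim_payment_type)
            dim_payment_type.append(
                {"payment_type_id": surrogate_keys[payment_type], "payment_type": payment_type}
            )
        # The fact row keeps the measures and references the dimension by key.
        fact_trips.append(
            {
                "trip_id": trip_id,
                "payment_type_id": surrogate_keys[payment_type],
                "fare_amount": trip["fare_amount"],
                "trip_distance": trip["trip_distance"],
            }
        )
    return dim_payment_type, fact_trips
```

Other dimensions (pickup location, datetime, rate code) would follow the same pattern, each contributing one foreign key to the fact row.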
ERP & CRM Data Warehouse
2025

End-to-end data warehousing and analytics project built with SQL Server using the Medallion Architecture. It covers data ingestion, transformation, modeling, and reporting, showcasing best practices in data engineering and analytics.

SQL Server, T-SQL, Power BI, Medallion Architecture, Data Modeling, Data Warehousing
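The Medallion layering can be sketched in Python using the standard-library `sqlite3` module as a stand-in for SQL Server, so the example runs without a database server. The table names and cleaning rules are hypothetical: bronze holds raw rows exactly as landed, silver trims, standardizes, and deduplicates them, and gold aggregates them for reporting.

```python
import sqlite3


def build_medallion_layers(raw_rows: list[tuple[int, str]]) -> list[tuple[str, int]]:
    """Push raw (customer_id, country) rows through bronze, silver and gold layers."""
    conn = sqlite3.connect(":memory:")
    # Bronze: land the source data as-is, including duplicates and messy text.
    conn.execute("CREATE TABLE bronze_customers (customer_id INTEGER, country TEXT)")
    conn.executemany("INSERT INTO bronze_customers VALUES (?, ?)", raw_rows)
    # Silver: standardize the text (trim, upper-case) and drop duplicate rows.
    conn.execute(
        """
        CREATE TABLE silver_customers AS
        SELECT DISTINCT customer_id, UPPER(TRIM(country)) AS country
        FROM bronze_customers
        """
    )
    # Gold: business-level aggregate that a reporting tool could read directly.
    conn.execute(
        """
        CREATE TABLE gold_customers_by_country AS
        SELECT country, COUNT(*) AS customer_count
        FROM silver_customers
        GROUP BY country
        """
    )
    rows = conn.execute(
        "SELECT country, customer_count FROM gold_customers_by_country ORDER BY country"
    ).fetchall()
    conn.close()
    return rows
```

In a T-SQL warehouse, each layer would more typically live in its own schema and be refreshed by stored procedures; the in-memory version only illustrates the flow of data between layers.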
AWS YouTube Analytics
2025

An AWS-based ETL pipeline that automates the processing and analysis of YouTube trending-video data, converting raw files into optimized, analytics-ready datasets.

AWS Glue, AWS Lambda, AWS S3, AWS Glue Data Catalog, Apache Parquet, Python, PySpark, AWS Data Wrangler
Azure Tokyo Olympics Data Analytics
2025

End-to-end Azure ETL and analytics pipeline processing Tokyo 2020 Olympics data using Data Factory, Databricks, and Synapse.

Azure, Synapse, Databricks, Data Factory, Data Lake Gen2, PySpark, SQL
COVID-19 Exploratory Data Analysis
2024

Comprehensive exploration of global COVID-19 data (2020–2021) with SQL analytics and Tableau dashboards highlighting key pandemic patterns.

MS SQL Server, T-SQL, Tableau, Data Cleaning, Data Visualization, EDA
Revealing Key Insights from GitHub Topics with Web Scraping
2025

A web scraping project that collects GitHub Topics and the top repositories for each topic, producing structured CSV files for analysis. The notebook demonstrates scraping the topics and exporting the results to per-topic CSV files.

Python, BeautifulSoup, pandas, Jupyter Notebook, Web Scraping
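The per-topic CSV export step might look like this minimal sketch. It assumes scraping has already produced a dictionary mapping each topic slug to a list of repository records; the column names (`username`, `repo_name`, `stars`, `repo_url`) are hypothetical, and the BeautifulSoup parsing itself is left out because it depends on GitHub's current page markup.

```python
import csv
from pathlib import Path


def export_per_topic_csv(
    repos_by_topic: dict[str, list[dict[str, object]]], out_dir: Path
) -> list[Path]:
    """Write one CSV file per topic and return the paths that were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical column names for each scraped repository record.
    fieldnames = ["username", "repo_name", "stars", "repo_url"]
    written: list[Path] = []
    for topic, repos in repos_by_topic.items():
        path = out_dir / f"{topic}.csv"  # assumes topic slugs are safe file names
        # The csv module expects files opened with newline="".
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(repos)
        written.append(path)
    return written
```

Reading a file back with `csv.DictReader` returns every value as a string, so numeric columns such as `stars` need converting before analysis.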
CSV 2 SQL Pipeline
2025

This project demonstrates a simple ETL pipeline that reads CSV files, cleans and transforms data using Python and pandas, and loads it into Microsoft SQL Server using pyodbc.

Python, SQL Server, pandas, pyodbc
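The extract, clean, and load steps can be sketched with the standard library alone. This version swaps pandas for the `csv` module and pyodbc/SQL Server for `sqlite3` so it runs without a database server; the `name`/`amount` columns and the `sales` table are hypothetical. pyodbc also uses `?` parameter placeholders, so the parameterized `executemany` insert looks much the same with a pyodbc cursor.

```python
import csv
import io
import sqlite3


def load_csv_to_sql(csv_text: str, conn: sqlite3.Connection) -> int:
    """Extract rows from CSV text, clean them, load them into a table, and return the row count."""
    # Extract: parse the CSV into dictionaries keyed by the header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned: list[tuple[str, float]] = []
    for row in reader:
        name = (row.get("name") or "").strip()
        amount = (row.get("amount") or "").strip()
        # Transform: skip rows with missing values, normalize names, cast amounts.
        if not name or not amount:
            continue
        cleaned.append((name.title(), float(amount)))
    # Load: create the target table if needed and bulk insert with placeholders.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (name, amount) VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)
```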
Crypto Market Insights with Binance EDA
2025

Exploratory data analysis of cryptocurrency markets using the Binance API. It fetches tickers, order book depth, and historical OHLCV data, preprocesses the data, and visualizes candlestick charts with overlays and indicators.

python-binance, pandas, NumPy, Jupyter Notebook
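As one example of an indicator overlay, a simple moving average (SMA) of closing prices can be computed over a sliding window. The choice of SMA is an assumption, since the description does not name the indicators used; the closes would come from the historical OHLCV data.

```python
from typing import Optional


def simple_moving_average(closes: list[float], window: int) -> list[Optional[float]]:
    """Return the SMA at each candle, or None until a full window of closes exists."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages: list[Optional[float]] = []
    running_sum = 0.0
    for i, close in enumerate(closes):
        running_sum += close
        if i >= window:
            # Remove the close that just slid out of the window.
            running_sum -= closes[i - window]
        # Only emit a value once `window` closes have been accumulated.
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages
```

For example, `simple_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3)` returns `[None, None, 2.0, 3.0, 4.0]`; the leading `None` values line up with candles that have no full window yet, so the overlay can be plotted on the same x-axis as the candlesticks.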
Movie Data Analysis: Correlation with Gross Revenue
2024

Exploratory data analysis of a movie dataset (from Kaggle) to identify which features correlate most with gross revenue. Includes cleaning, feature selection, correlation matrices, and visualizations using Seaborn and Matplotlib.

Python, pandas, NumPy, Seaborn, Matplotlib, Jupyter Notebook
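The correlation step ranks features by how strongly they track gross revenue. pandas' `DataFrame.corr()` uses the Pearson coefficient by default; this minimal sketch spells that coefficient out in plain Python so the calculation is visible. Feature names such as `budget` or `votes` are only illustrative.

```python
import math


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features_by_correlation(
    features: dict[str, list[float]], target: list[float]
) -> list[tuple[str, float]]:
    """Sort features by absolute correlation with the target, strongest first."""
    scores = [(name, pearson_correlation(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

A constant column has zero variance, which makes the denominator zero and raises `ZeroDivisionError`, so such columns should be dropped before ranking.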
Gender Pay, Job Satisfaction & Language Usage Analysis in Data-Related Jobs
2025

Interactive Power BI dashboard analyzing a real-world survey of data professionals, covering gender pay gap, languages used, geographical distribution, average pay by role, job satisfaction, and work-life balance.

Power BI, Data Visualization, DAX, Data Analysis, Data Cleaning
22 Years of Vehicle Fuel Efficiency and Emissions (2000–2022): An Exploratory Data Analysis
2025

This project explores fuel consumption and emissions data across various vehicle makes and models. By analyzing attributes like engine size, transmission type, fuel type, and vehicle class...

Python, pandas, NumPy, Matplotlib, Seaborn, Jupyter Notebook
Salary Prediction
2026

A small salary prediction project that includes an exploratory notebook, a trained scikit-learn model (salary_model.pkl), and a Streamlit app (app.py) that estimates salary from Years of Experience and Job Rate.

Python, pandas, scikit-learn, joblib, Streamlit, Jupyter Notebook
FAQ Generator
2026
FSTS

A local tool that scrapes website text, calls an LLM (e.g., Google Gemini) to produce exactly five FAQs in strict JSON, and serves a small web UI for interaction.

Python, Flask, BeautifulSoup, requests, JavaScript, HTML, CSS
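Enforcing the "exactly five FAQs in strict JSON" contract could be done with a validator along these lines. The expected shape, a JSON array of objects with `question` and `answer` strings, is an assumption about the tool's schema, and the LLM call itself is omitted.

```python
import json
from typing import Any


def parse_faqs(raw_reply: str, expected_count: int = 5) -> list[dict[str, str]]:
    """Parse an LLM reply and enforce exactly `expected_count` FAQs in strict JSON."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    data: Any = json.loads(raw_reply)
    if not isinstance(data, list) or len(data) != expected_count:
        raise ValueError(f"expected a JSON array of exactly {expected_count} FAQs")

    faqs: list[dict[str, str]] = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each FAQ must be a JSON object")
        question = item.get("question")  # hypothetical key name
        answer = item.get("answer")  # hypothetical key name
        if not isinstance(question, str) or not question.strip():
            raise ValueError("each FAQ needs a non-empty 'question' string")
        if not isinstance(answer, str) or not answer.strip():
            raise ValueError("each FAQ needs a non-empty 'answer' string")
        faqs.append({"question": question.strip(), "answer": answer.strip()})
    return faqs
```

Because every failure surfaces as a `ValueError`, the web layer could catch that single exception type and retry the LLM call or show an error instead of rendering a malformed list.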
Invoice Parsing (PDF2json)
2026
FSTS

Extracts key fields from PDF invoices (invoice_number, bill_to, total_cost, item_description) and returns a structured JSON object. Includes a Jupyter notebook, Flask API, static frontend, and optional SQLite persistence.

Python, pdfplumber, Flask, SQLite, HTML, CSS, JavaScript
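Once the PDF text is available (pdfplumber's `page.extract_text()` returns the text of a page), the four fields can be pulled out with regular expressions. This is a minimal sketch: the label patterns are hypothetical and would need adjusting per invoice layout, values are kept as strings, missing fields become `null` in the JSON, and a multi-line Bill To block would need more than a single-line pattern.

```python
import json
import re
from typing import Optional

# Hypothetical label patterns; real invoices differ by vendor and layout.
# \b keeps "Total" from matching inside words such as "Subtotal".
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "bill_to": re.compile(r"\bBill\s*To\s*:\s*(.+)", re.IGNORECASE),
    "total_cost": re.compile(
        r"\bTotal\s*(?:Cost|Due)?\s*:\s*\$?\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE
    ),
    "item_description": re.compile(r"\bDescription\s*:\s*(.+)", re.IGNORECASE),
}


def parse_invoice_text(text: str) -> dict[str, Optional[str]]:
    """Extract the target fields from invoice text; fields that are not found become None."""
    fields: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # "." does not cross newlines, so captured values stop at the end of their line.
        fields[name] = match.group(1).strip() if match else None
    return fields


def invoice_to_json(text: str) -> str:
    """Serialize the extracted fields as a JSON object (None becomes null)."""
    return json.dumps(parse_invoice_text(text))
```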
Meeting Management System
2025
FSTS

A SQL Server database and accompanying tools for scheduling meetings and managing employees, rooms, invitations, documents, minutes, and notifications within an organization.

SQL Server, T-SQL, Stored Procedures, User-Defined Functions, C (CLI)
Internship Management System (StagEase)
2025
FSTS

A JavaFX application for managing internships: offers, candidates, internship assignments, evaluations, and administrative workflows. Built with Java, JavaFX, Maven, and MySQL.

Java, JavaFX, Maven, MySQL, JDBC
MNIST Handwritten Digits (CNN)
2026
FSTS

Jupyter notebook training a simple CNN on the MNIST handwritten digits dataset, including training, evaluation, and error analysis.

TensorFlow, Keras, NumPy, Matplotlib, Seaborn, Jupyter Notebook, CNN