Projects
End-to-end data engineering solutions and implementations, as well as academic software engineering projects.

Complete DataOps workflow: data generation, processing, automated testing, CI/CD, and a Streamlit interactive dashboard deployed to Streamlit Cloud.

End-to-end ELT pipeline extracting TPCH orders into Snowflake and transforming with dbt, orchestrated by Apache Airflow.

End-to-end GCP pipeline built with Mage, BigQuery, and Looker Studio for automated data ingestion and real-time analytics.

End-to-end data warehousing and analytics project built with SQL Server using the Medallion Architecture. It covers data ingestion, transformation, modeling, and reporting, showcasing best practices in data engineering and analytics.

An AWS-based ETL pipeline that automates the processing and analysis of YouTube trending video data, converting raw files into optimized, analytics-ready datasets.

End-to-end Azure ETL and analytics pipeline processing Tokyo 2020 Olympics data using Data Factory, Databricks, and Synapse.

Comprehensive exploration of global COVID-19 data (2020–2021) with SQL analytics and Tableau dashboards highlighting key pandemic patterns.

Web scraping project that collects GitHub Topics and the top repositories for each topic, producing structured CSV files for analysis. The notebook demonstrates scraping the topics and exporting the results to per-topic CSV files.

This project demonstrates a simple ETL pipeline that reads CSV files, cleans and transforms data using Python and pandas, and loads it into Microsoft SQL Server using pyodbc.

Exploratory data analysis of cryptocurrency markets using the Binance API. Fetches tickers, order book depth, and historical OHLCV data, preprocesses it, and visualizes candlestick charts with overlays and indicators.

Exploratory data analysis of a movie dataset (from Kaggle) to identify which features correlate most with gross revenue. Includes cleaning, feature selection, correlation matrices, and visualizations using Seaborn and Matplotlib.

Interactive Power BI dashboard analyzing a real-world survey of data professionals, covering gender pay gap, languages used, geographical distribution, average pay by role, job satisfaction, and work-life balance.

This project explores fuel consumption and emissions data across various vehicle makes and models. By analyzing attributes like engine size, transmission type, fuel type, and vehicle class...

Small salary prediction project including an exploratory notebook, a trained scikit-learn model (salary_model.pkl), and a Streamlit app (app.py) to estimate salary from Years of Experience and Job Rate.

Comprehensive local tool that scrapes website text, calls an LLM (e.g., Google Gemini) to produce exactly five FAQs in strict JSON, and serves a small web UI for interaction.

Extracts key fields from PDF invoices (invoice_number, bill_to, total_cost, item_description) and returns a structured JSON object. Includes a Jupyter notebook, Flask API, static frontend, and optional SQLite persistence.

SQL Server database and accompanying tools for scheduling meetings, managing employees, rooms, invitations, documents, minutes and notifications within an organization.

JavaFX application for managing internships: offers, candidates, internship assignments, evaluations and administrative workflows. Built with Java, JavaFX, Maven and MySQL.

Jupyter notebook training a simple CNN on the MNIST handwritten digits dataset, including training, evaluation, and error analysis.