Analytics Capstone Projects

During my Master's in Analytics at Purdue University's Krannert School of Management, I completed several applied analytics projects that tackled real-world business problems with data science methods. The projects span sports analytics, social media virality prediction, financial forecasting, and personalized recommendations, and draw on optimization modeling, natural language processing, deep learning, and recommendation systems.

Each project was developed collaboratively with talented teammates and presented practical, data-driven solutions to complex business challenges.

---

NBA Roster Optimization

GitHub: samirhusain26/nba-roster-optimization

Problem Statement

The Indiana Pacers, a franchise based in Indianapolis that was founded in 1967 and has played in the NBA since 1976, faced a strategic decision: dissolve the current roster and rebuild the team while retaining only the top 5 performing players. Given a fixed annual salary budget, the challenge was to select the remaining 9 players from the available talent pool so as to maximize team performance.

Approach

This project applied operations research and optimization techniques to the problem of professional sports roster construction. The NBA consists of 30 teams with an average of 14 players each, playing across 5 positions. The optimization model needed to:

Evaluate player performance metrics across multiple dimensions
Account for positional requirements and roster balance
Respect salary cap constraints
Maximize overall team competitiveness

Solution

Developed an analytics-based decision support system that formulates roster selection as a constrained optimization problem, balancing player performance statistics, salary requirements, and positional needs to recommend an optimal 9-player draft list within budget constraints.
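To make the formulation concrete, here is a minimal sketch of roster selection as a constrained optimization problem. It is not the project code: the candidate pool, salaries, scores, and the `best_roster` helper are all hypothetical, and it uses exhaustive search over a tiny pool where a real model would more likely hand the same objective and constraints to an integer programming solver.

```python
from itertools import combinations

# Hypothetical candidate pool: (name, position, salary in $M, performance score).
# None of these values come from the project; they are placeholders.
CANDIDATES = [
    ("A", "PG", 12.0, 18.5),
    ("B", "PG", 6.0, 11.0),
    ("C", "SG", 9.0, 15.0),
    ("D", "SF", 14.0, 20.0),
    ("E", "SF", 5.0, 9.5),
    ("F", "PF", 8.0, 13.0),
    ("G", "C", 11.0, 16.0),
    ("H", "C", 4.0, 8.0),
]


def best_roster(candidates, slots, budget, min_per_position):
    """Exhaustively search for the highest-scoring roster that is feasible."""
    best, best_score = None, float("-inf")
    for combo in combinations(candidates, slots):
        # Budget constraint: total salary must not exceed the budget.
        if sum(p[2] for p in combo) > budget:
            continue
        # Positional constraint: each listed position needs a minimum head count.
        counts = {}
        for p in combo:
            counts[p[1]] = counts.get(p[1], 0) + 1
        if any(counts.get(pos, 0) < n for pos, n in min_per_position.items()):
            continue
        # Objective: maximize the summed performance score.
        score = sum(p[3] for p in combo)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Pick 3 players for at most $30M, with at least one center.
roster, score = best_roster(CANDIDATES, slots=3, budget=30.0,
                            min_per_position={"C": 1})
print([p[0] for p in roster], score)
```

Exhaustive search only works for toy pools like this one; choosing 9 players from a league-wide pool is where a proper solver becomes necessary.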

---

Craigslist Virality Prediction & Content Moderation

GitHub: samirhusain26/craigslistNLP

Course: MGMT 590 - Analyzing Unstructured Data

Team: Arun Ramakrishnan, Pranay Khandelwal, Roli Gupta, Shreeansh Priyadarshi, Samir Husain

Business Context

Craigslist operates on a minimal revenue model where most postings are free, with charges only for job and apartment listings in select major cities. The platform's core mission is providing simple, functional service to society. However, opportunities exist to increase user engagement and site traffic.

Problem Statement

The existing "Featured Listings" section surfaces content based purely on user likes. We identified an opportunity to proactively predict which posts would resonate with users and potentially go viral, while simultaneously filtering inappropriate content that could harm user experience.

Solution

Built a machine learning classification system with two objectives:

1. Virality Prediction — Using the "Best of Craigslist" posts as a proxy for viral content, trained models to predict whether new posts would achieve similar engagement

2. NSFW Content Detection — Developed a separate classifier to identify and filter obscene content
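As an illustration of the classification setup, the sketch below trains a tiny multinomial Naive Bayes text classifier with Laplace smoothing. The choice of Naive Bayes, the `NaiveBayes` class, the labels, and the four training posts are assumptions made for this example; the models and data we actually used are in the repository.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per class
        self.word_counts = {}               # per-class word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training posts; "viral" stands in for "Best of Craigslist".
TEXTS = [
    "hilarious rant about my neighbor and his lawn gnome army",
    "epic tale of a haunted couch free to a brave home",
    "used couch for sale good condition",
    "two bedroom apartment for rent near campus",
]
LABELS = ["viral", "viral", "ordinary", "ordinary"]

model = NaiveBayes().fit(TEXTS, LABELS)
print(model.predict("free haunted lawn gnome"))
print(model.predict("apartment for rent"))
```

The same structure would serve the NSFW detector by swapping in a different labeled corpus, which is why the two objectives can be handled by separate classifiers of the same kind.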

Proposed Implementation

A "This is Cool" button that directs users to a curated page of posts predicted to match characteristics of historically viral content. This feature would:

Increase time on site by surfacing engaging content
Drive organic growth through social sharing
Improve user satisfaction by filtering inappropriate material

Impact

The functionality makes it easier for users to discover interesting and amusing posts, potentially expanding Craigslist's reach to new users through word-of-mouth sharing.

---

Stock Market Prediction with LSTM

GitHub: samirhusain26/stockmarketLSTM

Course: MGMT 590 - Big Data Technologies

Team: Roli Gupta, Zaid Ahmed, Saumya Bharati, Gajender Saharan, Samir Husain

Project Video: YouTube Presentation

Project Overview

Developed an end-to-end data pipeline and deep learning model to predict stock prices for Walt Disney Co. (DIS), demonstrating proficiency in big data technologies and neural network architectures.

Technical Architecture

#### Data Pipeline

1. Data Ingestion — Downloaded NYSE historical data from Kaggle

2. Data Processing — Loaded the data into Hive and wrote scripts to extract Disney-specific stock prices

3. Visualization — Connected Hive to Tableau for exploratory analysis

4. Data Augmentation — Appended recent stock data using Yahoo Finance API

5. Model Training — LSTM neural network for time series prediction

6. Output — Predictions sent to Google Sheets feeding into Tableau dashboard

#### Infrastructure

Hosted on a GCP VM instance running Debian GNU/Linux 9
Automated daily execution via crontab at 1:00 AM
Training data spans from January 1, 1962 to present day
Model predicts opening and closing prices for the next 7 days
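A sequence model such as an LSTM is typically trained on fixed-length windows of past prices paired with the values that follow. The sketch below shows that windowing step in plain Python. The `make_windows` helper and the 10-day lookback are assumptions for illustration; only the 7-day forecast horizon comes from the description above.

```python
def make_windows(prices, lookback, horizon):
    """Split a price series into (input window, target window) pairs.

    Each input holds `lookback` consecutive prices and each target holds
    the `horizon` prices that immediately follow it.
    """
    samples = []
    last_start = len(prices) - lookback - horizon
    for start in range(last_start + 1):
        x = prices[start:start + lookback]
        y = prices[start + lookback:start + lookback + horizon]
        samples.append((x, y))
    return samples


# Placeholder series standing in for daily closing prices.
series = list(range(1, 21))
windows = make_windows(series, lookback=10, horizon=7)

print(len(windows))   # number of training samples
print(windows[0])     # first (input, target) pair
```

A series shorter than `lookback + horizon` yields no samples, so the daily job would need at least that much history before training.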

Technology Stack

Apache Hive for data warehousing
Python with TensorFlow/Keras for LSTM modeling
Yahoo Finance API for recent daily price data
Google Cloud Platform for hosting
Tableau for visualization and dashboarding

---

Word-of-Mouth Restaurant Recommendation System

GitHub: samirhusain26/WoM-restaurant-recommendation

Course: MGMT 590 - Using R for Analytics

Team Revengers: Arun Ramakrishnan, Juily Vasandani, Maharshi Dutta, Samir Husain, Yizhu Liao

Live Demo: Shiny App

Presentation: YouTube Video

Problem Statement

Traditional restaurant recommendation systems rely heavily on star ratings and categorical filters. However, the nuanced opinions expressed in text reviews often capture aspects of dining experiences that numerical ratings miss.

Solution

Built a personalized restaurant recommendation system that analyzes Yelp reviews to find restaurants matching user preferences based on semantic similarity rather than just ratings.

Methodology

Text Processing — Cleaned and preprocessed Yelp review corpus
Feature Extraction — Applied NLP techniques to extract meaningful features from review text
Similarity Matching — Developed algorithms to match user-provided preferences with restaurant review characteristics
Recommendation Engine — Ranked restaurants by relevance to user input
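The matching and ranking steps can be sketched with TF-IDF vectors and cosine similarity. This is a stand-in written in Python for illustration, not the project's R implementation: the restaurant names, review text, and the `recommend` helper are hypothetical, and TF-IDF is one common choice rather than a statement of the features we extracted.

```python
import math
from collections import Counter

PUNCT = ".,!?"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    vectors = [{w: tf * idf[w] for w, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, reviews):
    """Rank restaurants by similarity between the query and their reviews."""
    names = list(reviews)
    vectors, idf = tfidf_vectors([reviews[name] for name in names])
    # Words the review corpus has never seen carry no information; drop them.
    q = {w: tf * idf[w] for w, tf in Counter(tokenize(query)).items() if w in idf}
    scored = [(name, cosine(q, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical restaurants and review snippets.
REVIEWS = {
    "Luigi's": "cozy italian spot with fresh pasta and a quiet patio",
    "Taco Shack": "loud and cheap tacos, great late night spicy salsa",
    "Green Bowl": "fresh vegan bowls, quick service, bright and quiet",
}

ranked = recommend("quiet place for fresh pasta", REVIEWS)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Ranking on review text lets a free-form request like "quiet place for fresh pasta" favor a restaurant whose reviews mention those qualities, regardless of its star rating or category.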

Deployment

Created an interactive R Shiny application allowing users to:

Input their dining preferences in natural language
Receive personalized restaurant recommendations
Explore recommended restaurants with supporting review excerpts

Business Impact

The system aims to increase customer satisfaction by surfacing relevant restaurants that might be overlooked by traditional filtering methods, while helping restaurants expand their customer base through more accurate matching.

---

Skills Demonstrated

Across these projects, I developed and applied skills in:

Optimization & Operations Research — Linear programming, constrained optimization
Natural Language Processing — Text classification, sentiment analysis, feature extraction
Deep Learning — LSTM networks, time series forecasting
Big Data Technologies — Hive, GCP, distributed computing
Data Visualization — Tableau dashboards, interactive reporting
Full-Stack Analytics — End-to-end pipeline development from data ingestion to production deployment
Collaborative Development — Cross-functional teamwork on complex analytical challenges
