During my Master's in Analytics at Purdue University's Krannert School of Management, I completed several applied analytics projects that tackled real-world business problems using data science methodologies. These projects span diverse domains—from sports analytics and social media virality prediction to financial forecasting and personalized recommendations—demonstrating proficiency in optimization modeling, natural language processing, deep learning, and recommendation systems.
Each project was developed collaboratively with talented teammates and presented practical, data-driven solutions to complex business challenges.
---
NBA Roster Optimization
GitHub: samirhusain26/nba-roster-optimization
Problem Statement
The Indiana Pacers, an Indianapolis-based franchise founded in 1967 that joined the NBA in 1976, faced a strategic decision: rebuild the roster by retaining only the top 5 performing players and drafting replacements for everyone else. Given a fixed annual salary budget, the challenge was to select the remaining 9 players from the available talent pool so as to maximize team performance.
Approach
This project applied operations research and optimization techniques to the problem of professional sports roster construction. The NBA consists of 30 teams with an average of 14 players each, playing across 5 positions. The optimization model needed to:
- Evaluate player performance metrics across multiple dimensions
- Account for positional requirements and roster balance
- Respect salary cap constraints
- Maximize overall team competitiveness
Solution
Developed an analytics-based decision support system that formulates roster selection as a constrained optimization problem, balancing player performance statistics, salary requirements, and positional needs to recommend an optimal 9-player draft list within budget constraints.
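As a rough illustration of the formulation, the sketch below picks a fixed number of players that maximizes total performance subject to a salary budget and a minimum count per position. The player pool, salaries, scores, and the `best_roster` helper are all hypothetical, and an exhaustive search stands in for the linear or integer programming solver that a model of realistic size would need.

```python
from itertools import combinations

# Hypothetical toy pool: (name, position, salary in $M, performance score).
POOL = [
    ("A", "PG", 12.0, 18.5),
    ("B", "SG", 8.0, 14.0),
    ("C", "SF", 15.0, 21.0),
    ("D", "PF", 6.0, 11.5),
    ("E", "C", 10.0, 16.0),
    ("F", "PG", 4.0, 9.0),
    ("G", "C", 7.0, 12.5),
]


def best_roster(pool, slots, budget, min_per_position):
    """Exhaustively search for the highest-scoring feasible roster."""
    best, best_score = None, float("-inf")
    for combo in combinations(pool, slots):
        # Budget constraint: total salary must not exceed the budget.
        if sum(p[2] for p in combo) > budget:
            continue
        # Positional constraint: at least n players at each listed position.
        counts = {}
        for p in combo:
            counts[p[1]] = counts.get(p[1], 0) + 1
        if any(counts.get(pos, 0) < n for pos, n in min_per_position.items()):
            continue
        # Objective: maximize the summed performance score.
        score = sum(p[3] for p in combo)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


roster, score = best_roster(POOL, slots=3, budget=30.0, min_per_position={"C": 1})
names = [p[0] for p in roster]
print(names, score)
```

Exhaustive search is only workable for a toy pool like this one. Choosing 9 players from a league-wide pool produces far too many combinations to enumerate, which is why the same objective and constraints are normally handed to an optimization solver.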
---
Craigslist Virality Prediction & Content Moderation
GitHub: samirhusain26/craigslistNLP
Course: MGMT 590 - Analyzing Unstructured Data
Team: Arun Ramakrishnan, Pranay Khandelwal, Roli Gupta, Shreeansh Priyadarshi, Samir Husain
Business Context
Craigslist operates on a minimal revenue model where most postings are free, with charges only for job and apartment listings in select major cities. The platform's core mission is providing simple, functional service to society. However, opportunities exist to increase user engagement and site traffic.
Problem Statement
The existing "Featured Listings" section surfaces content based purely on user likes. We identified an opportunity to proactively predict which posts would resonate with users and potentially go viral, while simultaneously filtering inappropriate content that could harm user experience.
Solution
Built a machine learning classification system with two objectives:
1. Virality Prediction — Using the "Best of Craigslist" posts as a proxy for viral content, trained models to predict whether new posts would achieve similar engagement
2. NSFW Content Detection — Developed a separate classifier to identify and filter obscene content
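Both objectives are binary text classification tasks, so the same kind of model can be trained twice with different labels. The sketch below is a minimal multinomial Naive Bayes classifier written from scratch. The choice of Naive Bayes, the whitespace tokenizer, and the four training posts are assumptions made for illustration, since the models and features actually used in the project are not described here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens are enough for a toy example.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training posts; "best_of" plays the role of the viral proxy label.
texts = [
    "free couch pickup today",
    "hilarious rant about my roommate's cat",
    "selling used bike cheap",
    "epic missed connection at the taco truck",
]
labels = ["ordinary", "best_of", "ordinary", "best_of"]

model = NaiveBayes().fit(texts, labels)
viral_guess = model.predict("hilarious taco rant")
plain_guess = model.predict("used couch cheap")
print(viral_guess, plain_guess)
```

Swapping the labels for something like `"nsfw"` and `"safe"` would give the second classifier without changing the code.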
Proposed Implementation
A "This is Cool" button that directs users to a curated page of posts predicted to match characteristics of historically viral content. This feature would:
- Increase time on site by surfacing engaging content
- Drive organic growth through social sharing
- Improve user satisfaction by filtering inappropriate material
Impact
The functionality makes it easier for users to discover interesting and amusing posts, potentially expanding Craigslist's reach to new users through word-of-mouth sharing.
---
Stock Market Prediction with LSTM
GitHub: samirhusain26/stockmarketLSTM
Course: MGMT 590 - Big Data Technologies
Team: Roli Gupta, Zaid Ahmed, Saumya Bharati, Gajender Saharan, Samir Husain
Project Video: YouTube Presentation
Project Overview
Developed an end-to-end data pipeline and deep learning model to predict stock prices for Walt Disney Co. (DIS), demonstrating proficiency in big data technologies and neural network architectures.
Technical Architecture
#### Data Pipeline
1. Data Ingestion — Downloaded NYSE historical data from Kaggle
2. Data Processing — Loaded the data into Hive and wrote scripts to extract Disney-specific stock prices
3. Visualization — Connected Hive to Tableau for exploratory analysis
4. Data Augmentation — Appended recent stock data using Yahoo Finance API
5. Model Training — LSTM neural network for time series prediction
6. Output — Predictions sent to Google Sheets feeding into Tableau dashboard
#### Infrastructure
- Hosted on a GCP VM instance running Debian GNU/Linux 9
- Automated daily execution via crontab at 1:00 AM
- Training data spans from January 1, 1962 to the present day
- Model predicts opening and closing prices for the next 7 days
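Before an LSTM can be trained on a price history, the series has to be cut into supervised examples: a window of past prices as input and the following days as targets. The sketch below shows one way to do that for a 7-day horizon. The `make_windows` helper, the 10-day lookback, and the synthetic prices are assumptions for illustration, not the project's actual preprocessing.

```python
def make_windows(series, lookback, horizon):
    """Turn a 1-D price series into (input window, future targets) pairs."""
    inputs, targets = [], []
    # Last index at which a full input window plus a full target window fits.
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return inputs, targets


# 20 synthetic closing prices (100.0, 101.0, ..., 119.0), not real DIS data.
closes = [float(p) for p in range(100, 120)]

X, y = make_windows(closes, lookback=10, horizon=7)
print(len(X), X[0], y[0])
```

For a Keras LSTM layer, the inputs would then be reshaped to the three-dimensional form `(samples, timesteps, features)`, with one feature per price column if opening and closing prices are modeled together.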
Technology Stack
- Apache Hive for data warehousing
- Python with TensorFlow/Keras for LSTM modeling
- Yahoo Finance API for recent price data
- Google Cloud Platform for hosting
- Tableau for visualization and dashboarding
---
Word-of-Mouth Restaurant Recommendation System
GitHub: samirhusain26/WoM-restaurant-recommendation
Course: MGMT 590 - Using R for Analytics
Team Revengers: Arun Ramakrishnan, Juily Vasandani, Maharshi Dutta, Samir Husain, Yizhu Liao
Live Demo: Shiny App
Presentation: YouTube Video
Problem Statement
Traditional restaurant recommendation systems rely heavily on star ratings and categorical filters. However, the nuanced opinions expressed in text reviews often capture aspects of dining experiences that numerical ratings miss.
Solution
Built a personalized restaurant recommendation system that analyzes Yelp reviews to find restaurants matching user preferences based on semantic similarity rather than just ratings.
Methodology
- Text Processing — Cleaned and preprocessed Yelp review corpus
- Feature Extraction — Applied NLP techniques to extract meaningful features from review text
- Similarity Matching — Developed algorithms to match user-provided preferences with restaurant review characteristics
- Recommendation Engine — Ranked restaurants by relevance to user input
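The matching and ranking steps can be sketched with TF-IDF vectors and cosine similarity. This is a simple lexical stand-in for the similarity measure, not necessarily the technique the project used, and it is written in Python for illustration even though the project itself was built in R. The restaurant names, review snippets, and helper functions are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and strip basic punctuation from each whitespace token.
    return [w.strip(".,!?").lower() for w in text.split()]


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, reviews_by_restaurant):
    """Rank restaurants by similarity between the query and their review text."""
    names = list(reviews_by_restaurant)
    vectors, idf = tfidf_vectors([reviews_by_restaurant[n] for n in names])
    # Query terms that never appear in any review are dropped.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(zip(names, vectors), key=lambda nv: cosine(q, nv[1]), reverse=True)
    return [name for name, _ in ranked]


# Hypothetical review snippets, one combined string per restaurant.
REVIEWS = {
    "Noodle House": "Cozy spot with spicy ramen and quick service.",
    "Grill Yard": "Loud patio, great burgers, slow service.",
    "Green Bowl": "Quiet cafe with fresh vegan salads.",
}

ranking = recommend("quiet place for vegan food", REVIEWS)
print(ranking)
```

Because TF-IDF only matches shared words, a query such as "plant-based" would not find the "vegan" review. Closing that gap is where embedding-based or other semantic representations would come in.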
Deployment
Created an interactive R Shiny application allowing users to:
- Input their dining preferences in natural language
- Receive personalized restaurant recommendations
- Explore recommended restaurants with supporting review excerpts
Business Impact
The system aims to increase customer satisfaction by surfacing relevant restaurants that might be overlooked by traditional filtering methods, while helping restaurants expand their customer base through more accurate matching.
---
Skills Demonstrated
Across these projects, I developed and applied skills in:
- Optimization & Operations Research — Linear programming, constrained optimization
- Natural Language Processing — Text classification, sentiment analysis, feature extraction
- Deep Learning — LSTM networks, time series forecasting
- Big Data Technologies — Hive, GCP, distributed computing
- Data Visualization — Tableau dashboards, interactive reporting
- Full-Stack Analytics — End-to-end pipeline development from data ingestion to production deployment
- Collaborative Development — Cross-functional teamwork on complex analytical challenges