This repository predicts movie and beer ratings by fitting lines (slopes and intercepts) with the closed-form linear-algebra inverse formulas, then verifying theta, the mean squared error, and the absolute error.

chinmaynadgir/Movie-Beer-Rating-prediction

Movie-Beer-Rating-prediction

This repository contains two assignments focused on feature engineering, linear and logistic models, and a simple item–item collaborative filtering approach. The code is organized to run with the provided runner notebooks and uses clean, modular functions for reproducibility.

Methods

Length-based regression: Predict ratings using scaled review length.
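A minimal sketch of the length-based regression, assuming the feature vector is an offset plus review length divided by the training-set maximum. The function name `fit_length_regression` and the toy reviews are hypothetical and are not taken from homework1.py. Theta comes from the normal equations, here solved with `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np

def fit_length_regression(reviews, ratings):
    """Fit rating ~ theta0 + theta1 * (length / max_length) by least squares."""
    lengths = np.array([len(r) for r in reviews], dtype=float)
    max_len = lengths.max()  # scale by the training-set maximum
    X = np.column_stack([np.ones_like(lengths), lengths / max_len])
    y = np.asarray(ratings, dtype=float)
    # Normal equations: (X^T X) theta = X^T y
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = X @ theta - y
    mse = float(np.mean(residuals ** 2))       # mean squared error
    mae = float(np.mean(np.abs(residuals)))    # mean absolute error
    return theta, mse, mae, max_len

# Toy data for illustration only (not from the assignment datasets)
reviews = ["ok", "good book", "a long and detailed review"]
ratings = [2.0, 3.0, 5.0]
theta, mse, mae, max_len = fit_length_regression(reviews, ratings)
print("theta:", theta, "MSE:", mse, "MAE:", mae)
```

When scoring held-out reviews, the same `max_len` from training would be reused so that train and test features share one scale.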

Time features:

Reduced one-hot encoding for weekday and month, constrained to fit dimension limits.

Numeric encoding variant with explicit offset term.

Train/test split evaluation to compare encodings.
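The two time encodings can be sketched as follows. This assumes "reduced" means dropping one reference level per category (Monday and January here), which gives 1 + 6 + 11 = 18 values; the actual dimension limit and the function names are assumptions, not details from the assignment.

```python
from datetime import datetime

def time_features_onehot(dt):
    """Offset + 6 weekday indicators + 11 month indicators (18 values)."""
    weekday = [0] * 6   # Monday (weekday() == 0) is the dropped reference level
    month = [0] * 11    # January is the dropped reference level
    if dt.weekday() > 0:
        weekday[dt.weekday() - 1] = 1
    if dt.month > 1:
        month[dt.month - 2] = 1
    return [1] + weekday + month

def time_features_numeric(dt):
    """Numeric variant: explicit offset term, then weekday and month as numbers."""
    return [1, dt.weekday(), dt.month]

d = datetime(2024, 3, 6)
print(time_features_onehot(d))
print(time_features_numeric(d))
```

Dropping a reference level keeps the design matrix full rank when an offset column is present, which matters if theta is computed from the normal equations.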

Beer sentiment classification:

Baseline feature: review text length.

Improved features: length plus available subratings.

Logistic regression with balanced class weighting.

Precision@K evaluation using predicted probabilities.
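A rough sketch of the baseline classifier and the precision@K metric, using scikit-learn's `LogisticRegression` with `class_weight="balanced"`. The data below is synthetic and the helper name `precision_at_k` is hypothetical; the real features would come from the beer review fields.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scoring examples."""
    order = np.argsort(-np.asarray(scores))[:k]
    return float(np.mean(np.asarray(labels)[order]))

# Synthetic stand-in for a scaled review length and a binary sentiment label
rng = np.random.default_rng(0)
n = 200
length = rng.uniform(0, 1, n)
y = (length + rng.normal(0, 0.2, n) > 0.7).astype(int)
X = length.reshape(-1, 1)

model = LogisticRegression(class_weight="balanced")
model.fit(X, y)
probs = model.predict_proba(X)[:, 1]  # probability of the positive class
p_at_10 = precision_at_k(probs, y, 10)
print("precision@10:", p_at_10)
```

Ranking by predicted probability rather than by the hard 0/1 prediction is what makes precision@K meaningful, since many examples share the same predicted label.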

ABV classification (beer):

Modular feature builder combining style one-hot, subratings, and scaled length.

Logistic regression with validation-based selection of regularization strength.

Ablation to quantify the contribution of each feature group.
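Validation-based selection of the regularization strength might look like the sketch below. The candidate grid, the use of BER as the selection metric, and the names `balanced_error_rate` and `select_C` are assumptions for illustration; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def balanced_error_rate(y_true, y_pred):
    """BER = (false positive rate + false negative rate) / 2."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    fpr = np.sum(y_pred & ~y_true) / max(np.sum(~y_true), 1)
    fnr = np.sum(~y_pred & y_true) / max(np.sum(y_true), 1)
    return 0.5 * (fpr + fnr)

def select_C(X_train, y_train, X_valid, y_valid,
             candidates=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Return the C with the lowest validation BER (first one wins ties)."""
    best_C, best_ber = None, float("inf")
    for C in candidates:
        model = LogisticRegression(C=C, class_weight="balanced", max_iter=1000)
        model.fit(X_train, y_train)
        ber = balanced_error_rate(y_valid, model.predict(X_valid))
        if ber < best_ber:
            best_C, best_ber = C, ber
    return best_C, best_ber

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
best_C, best_ber = select_C(X[:200], y[:200], X[200:], y[200:])
print("best C:", best_C, "validation BER:", best_ber)
```

An ablation can reuse `select_C` by rebuilding `X` with one feature group left out and comparing the resulting validation BER against the full feature set.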

Rating prediction (books):

Item–item collaborative filtering using Jaccard similarity over user sets.

Fallbacks to item and global averages for sparse cases.

Blended predictor that combines neighbor estimate with user/item baselines.
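The neighbor estimate and its fallbacks can be sketched in plain Python. This is one common formulation (item average plus a Jaccard-weighted mean of the user's deviations on other items), not necessarily the exact predictor in homework2.py; the function names and the toy ratings are hypothetical, and the blending step is left out.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def build_index(ratings):
    """ratings: list of (user, item, rating) tuples."""
    users_per_item = defaultdict(set)
    ratings_per_user = defaultdict(dict)
    per_item = defaultdict(list)
    for user, item, r in ratings:
        users_per_item[item].add(user)
        ratings_per_user[user][item] = r
        per_item[item].append(r)
    item_avg = {i: sum(rs) / len(rs) for i, rs in per_item.items()}
    global_avg = sum(r for _, _, r in ratings) / len(ratings)
    return dict(users_per_item), dict(ratings_per_user), item_avg, global_avg

def predict(user, item, index):
    users_per_item, ratings_per_user, item_avg, global_avg = index
    if item not in item_avg:
        return global_avg              # unseen item: fall back to global average
    num = den = 0.0
    for other, r in ratings_per_user.get(user, {}).items():
        if other == item:
            continue
        sim = jaccard(users_per_item[item], users_per_item[other])
        num += sim * (r - item_avg[other])
        den += sim
    if den == 0:
        return item_avg[item]          # no usable neighbors: item average
    return item_avg[item] + num / den

# Toy ratings for illustration only
ratings = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4),
           ("u2", "b", 2), ("u3", "a", 4), ("u3", "c", 1)]
index = build_index(ratings)
print(predict("u3", "b", index))
```

A blended predictor would then combine this output with user and item baselines, for example through a weighted average whose weight is tuned on validation data.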

How to Run

Place homework1.py and homework2.py in the same directory as the runner notebooks.

Open each runner (Jupyter Notebook or JupyterLab).

Restart the kernel to pick up changes.

Run all cells. Outputs include metrics (MSE, BER, precision@K) and sample results.

Data Field Conventions

Books:

Text: review_text

Rating: rating (fallbacks: star_rating, overall)

Time: parsed_date (provided by runner)

Beer:

Text: review/text

Labels: review/overall (classification), beer/ABV (diagnostics)

Category: beer/style

Subratings: review/aroma, review/appearance, review/palate, review/taste, review/overall
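The field conventions above could be wrapped in small accessors like these. Both helpers are hypothetical; in particular, returning `None` when no rating field is present and skipping missing subratings are assumptions, not documented behavior of the homework modules.

```python
BOOK_RATING_KEYS = ("rating", "star_rating", "overall")
BEER_SUBRATING_KEYS = ("review/aroma", "review/appearance", "review/palate",
                       "review/taste", "review/overall")

def get_book_rating(record):
    """Return the first available rating field as a float, or None."""
    for key in BOOK_RATING_KEYS:
        if record.get(key) is not None:
            return float(record[key])
    return None

def get_beer_subratings(record):
    """Return the subratings that are present, keyed by field name."""
    return {k: float(record[k]) for k in BEER_SUBRATING_KEYS
            if record.get(k) is not None}

print(get_book_rating({"star_rating": "4"}))
print(get_beer_subratings({"review/aroma": 3.5, "review/taste": 4.0}))
```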

Design Choices

Explicit feature scaling for lengths using training-set maximum.

Reduced one-hot encodings to respect dimensionality constraints.

Bias handling aligned with assignment expectations per question.

Class weighting for imbalanced classification tasks.

Deterministic, readable functions to simplify grading and reuse.

Methods Used:

1. Feature engineering under dimensionality constraints.

2. Linear and logistic modeling with proper evaluation (MSE, BER).

3. Regularization selection via validation.

4. Recommender system fundamentals with robust fallbacks.

5. Clean module organization and reproducible experiments.

Requirements

Python 3.x and its standard library

numpy, scikit-learn

Jupyter Notebook or JupyterLab
