This repository predicts book and beer ratings by estimating line slopes and intercepts with the closed-form linear-algebraic (matrix-inverse) least-squares solution, then verifying the fitted theta along with Mean Squared Error and mean absolute error values.
This repository contains two assignments focused on feature engineering, linear and logistic models, and a simple item–item collaborative filtering approach. The code is organized to run with the provided runner notebooks and uses clean, modular functions for reproducibility.
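As a concrete illustration of the closed-form fit described above, the minimal sketch below estimates an offset and slope for ratings against scaled review length using the normal equation theta = (X^T X)^{-1} X^T y. The toy data and variable names are illustrative, not taken from the assignment code.

```python
import numpy as np

# Toy (review length, rating) pairs; illustrative only.
lengths = np.array([120.0, 45.0, 300.0, 210.0, 80.0])
ratings = np.array([4.0, 3.0, 5.0, 4.5, 3.5])

# Scale lengths by the maximum so the feature lies in [0, 1].
scaled = lengths / lengths.max()

# Design matrix with an explicit offset (bias) column.
X = np.column_stack([np.ones_like(scaled), scaled])

# Closed-form least squares via the matrix inverse: theta = (X^T X)^{-1} X^T y.
theta = np.linalg.inv(X.T @ X) @ X.T @ ratings

predictions = X @ theta
mse = np.mean((ratings - predictions) ** 2)
mae = np.mean(np.abs(ratings - predictions))
```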
Length-based regression: Predict ratings using scaled review length.
Time features:
Reduced one-hot encoding for weekday and month, constrained to fit dimension limits (see the sketch after this list).
Numeric encoding variant with explicit offset term.
Train/test split evaluation to compare encodings.
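A minimal sketch of the reduced one-hot encoding mentioned above, assuming zero-indexed weekday and month values; the function names are hypothetical. Dropping the first category keeps the dimension down and avoids redundancy with the explicit offset term.

```python
def reduced_one_hot(value, num_values):
    # Encode value in {0, ..., num_values - 1} as num_values - 1 indicators;
    # the first category is represented by the all-zeros vector.
    vec = [0.0] * (num_values - 1)
    if value > 0:
        vec[value - 1] = 1.0
    return vec

def time_features(weekday, month):
    # Explicit offset term, then weekday (7 -> 6 dims) and month (12 -> 11 dims).
    return [1.0] + reduced_one_hot(weekday, 7) + reduced_one_hot(month, 12)
```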
Beer sentiment classification:
Baseline feature: review text length.
Improved features: length plus available subratings.
Logistic regression with balanced class weighting.
Precision@K evaluation using predicted probabilities.
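Roughly, the classification pipeline above could look like the following sketch. The tiny arrays are placeholders; only LogisticRegression with class_weight="balanced" and predict_proba reflect the stated approach, and precision_at_k is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def precision_at_k(y_true, scores, k):
    # Fraction of true positives among the k highest-scoring examples.
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(y_true)[top_k]))

# Placeholder features (review length, one subrating) and binary labels.
X_train = np.array([[120.0, 4.0], [45.0, 2.5], [300.0, 4.5], [80.0, 3.0]])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

scores = model.predict_proba(X_train)[:, 1]  # positive-class probabilities
p_at_2 = precision_at_k(y_train, scores, k=2)
```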
ABV classification (beer):
Modular feature builder combining style one-hot, subratings, and scaled length.
Logistic regression with validation-based selection of regularization strength.
Ablation to quantify the contribution of each feature group.
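The validation-based selection named above might be sketched as follows, with BER computed as one minus balanced accuracy; the candidate grid and split variables are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def select_c(X_train, y_train, X_valid, y_valid,
             candidates=(0.01, 0.1, 1.0, 10.0, 100.0)):
    # Pick the inverse regularization strength C with the lowest validation BER.
    best_c, best_ber = None, float("inf")
    for c in candidates:
        model = LogisticRegression(C=c, class_weight="balanced", max_iter=1000)
        model.fit(X_train, y_train)
        ber = 1.0 - balanced_accuracy_score(y_valid, model.predict(X_valid))
        if ber < best_ber:
            best_c, best_ber = c, ber
    return best_c, best_ber
```

The same loop supports the ablation: rebuild the feature matrix with one feature group removed and compare the resulting BERs.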
Rating prediction (books):
Item–item collaborative filtering using Jaccard similarity over user sets.
Fallbacks to item and global averages for sparse cases.
Blended predictor that combines neighbor estimate with user/item baselines.
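A compact sketch of the item–item predictor with its fallbacks, assuming dictionaries that map items to their user sets and users to their (item, rating) history; the names are illustrative, and the blending step (mixing this estimate with user/item baselines) is omitted for brevity.

```python
def jaccard(s1, s2):
    # Jaccard similarity between two sets of users.
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0

def predict_rating(user, item, users_per_item, ratings_by_user,
                   item_means, global_mean):
    # Similarity-weighted average of the user's other ratings; fall back to
    # the item mean, then the global mean, when no neighbors exist.
    target_users = users_per_item.get(item, set())
    num = den = 0.0
    for other_item, rating in ratings_by_user.get(user, []):
        if other_item == item:
            continue
        sim = jaccard(target_users, users_per_item.get(other_item, set()))
        num += sim * rating
        den += sim
    if den > 0:
        return num / den
    return item_means.get(item, global_mean)
```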
Place homework1.py and homework2.py in the same directory as the runner notebooks.
Open each runner (Jupyter Notebook or JupyterLab).
Restart the kernel to pick up changes.
Run all cells. Outputs include metrics (MSE, BER, precision@K) and sample results.
Books:
Text: review_text
Rating: rating (fallbacks: star_rating, overall)
Time: parsed_date (provided by runner)
Beer:
Text: review/text
Labels: review/overall (classification), beer/ABV (diagnostics)
Category: beer/style
Subratings: review/aroma, review/appearance, review/palate, review/taste, review/overall
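Given the field names above, reading a book rating with its fallbacks might look like the minimal helper below; the dictionary layout of a review record is an assumption.

```python
def get_book_rating(review):
    # Try the primary key first, then the documented fallbacks.
    for key in ("rating", "star_rating", "overall"):
        if review.get(key) is not None:
            return float(review[key])
    return None
```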
Explicit feature scaling for lengths using the training-set maximum (see the sketch after this list).
Reduced one-hot encodings to respect dimensionality constraints.
Bias (offset) term handled according to each question's expectations.
Class weighting for imbalanced classification tasks.
Deterministic, readable functions to simplify grading and reuse.
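For the training-set-maximum scaling, a minimal sketch (function and variable names hypothetical); fixing the divisor at training time keeps the test transformation consistent with training:

```python
def fit_length_scaler(train_lengths):
    # Record the training-set maximum so test data is scaled identically.
    max_len = max(train_lengths) or 1  # guard against an all-zero feature
    return lambda length: length / max_len

scale = fit_length_scaler([120, 45, 300, 210, 80])
train_feature = scale(300)  # 1.0
test_feature = scale(450)   # may exceed 1.0; still divided by the training max
```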
1. Feature engineering under dimensionality constraints.
2. Linear and logistic modeling with proper evaluation (MSE, BER).
3. Regularization selection via validation.
4. Recommender system fundamentals with robust fallbacks.
5. Clean module organization and reproducible experiments.
Python 3.x (standard library)
numpy, scikit-learn
Jupyter Notebook or JupyterLab