Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
YelpMeKnow.ipynb	YelpMeKnow.ipynb

Name

Last commit message

Last commit date

YelpMeKnow

YelpMeKnow is a text classifier model, which leverages the power of Google's BERT pretrained models through the Hugging Face Pytorch implementation. Specifically the BERT-Base-Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters, is used.

The model performs a Sequence Classification analysis of the customer satisfaction, (positive vs. negative reviews), contained in Yelp Review Polarity Dataset. The dataset containes 560,000 training samples and 38,000 testing samples, but due to limited resources and time I'm training/validating and testing on a really small subset of it.

Project data

Train data size: 20000 ~ 3.6% of training samples

Test data size: 2000 ~ 5.3% of testing samples

Epochs: 1

Matthew's correlation coefficient: ~ 0.86

The accuracy of predictions is evaluated using Matthew’s correlation coefficient, which is in essence a correlation coefficient value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction.

N.B. [WORK IN PROGRESS] Data preparation and model require improvements and further training.

This project is part of SPAIC Project Showcase Challenge

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

YelpMeKnow

Project data

FilesExpand file tree

Samuela_Anastasi

Directory actions

More options

Directory actions

More options

Latest commit

History

Samuela_Anastasi

Folders and files

parent directory

README.md

YelpMeKnow

Project data