
YelpMeKnow

YelpMeKnow is a text classifier that builds on Google's pretrained BERT models through the Hugging Face PyTorch implementation. Specifically, it uses BERT-Base-Uncased (12 layers, 768 hidden units, 12 attention heads, 110M parameters).

The model performs sequence classification of customer satisfaction (positive vs. negative reviews) on the Yelp Review Polarity Dataset. The dataset contains 560,000 training samples and 38,000 testing samples, but due to limited resources and time I'm training, validating, and testing on a small subset of it.
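As a rough illustration of working with a small subset, the sketch below draws a reproducible random sample from a list of labelled reviews. The function name `sample_subset`, the `(label, text)` row layout, and the seed are assumptions made for this example, not the project's actual data preparation code.

```python
import random


def sample_subset(rows, n, seed=42):
    """Return n rows drawn without replacement, reproducibly for a given seed.

    `rows` is assumed to be a list of (label, text) tuples; this layout is
    hypothetical and only serves the example.
    """
    if n > len(rows):
        raise ValueError("subset size exceeds the number of available rows")
    rng = random.Random(seed)  # local generator, leaves global random state alone
    return rng.sample(rows, n)


# Toy stand-in for the full training set (not real Yelp data).
toy_rows = [(i % 2, f"review {i}") for i in range(1000)]
subset = sample_subset(toy_rows, 36)
fraction = len(subset) / len(toy_rows)  # share of the data that was kept
```

A plain random sample does not guarantee that both classes stay balanced; sampling each class separately would be one way to enforce that.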

Project data

Train data size: 20000 ~ 3.6% of training samples

Test data size: 2000 ~ 5.3% of testing samples

Epochs: 1

Matthews correlation coefficient: ~ 0.86

Prediction quality is evaluated using the Matthews correlation coefficient (MCC), which is in essence a correlation coefficient between the true and predicted labels, with values between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random guessing, and -1 a completely inverted prediction.
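For binary labels, the coefficient can be computed directly from the confusion matrix counts. The following is a minimal sketch, assuming labels are encoded as 0 (negative) and 1 (positive); the helper name `mcc_binary` is hypothetical and this is not necessarily how the project computes its score. scikit-learn offers an equivalent in `sklearn.metrics.matthews_corrcoef`.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # true positive
        elif truth == 0 and pred == 0:
            tn += 1  # true negative
        elif truth == 0 and pred == 1:
            fp += 1  # false positive
        else:
            fn += 1  # false negative
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # A row or column of the confusion matrix is empty; by convention
        # the coefficient is reported as 0 in that case.
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = mcc_binary([1, 1, 0, 0], [1, 1, 0, 0])
inverted = mcc_binary([1, 1, 0, 0], [0, 0, 1, 1])
constant = mcc_binary([1, 1, 0, 0], [1, 1, 1, 1])
```

Here `perfect` evaluates to 1.0, `inverted` to -1.0, and `constant` (always predicting the positive class) to 0.0, matching the three reference points described above.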

N.B. [WORK IN PROGRESS] Data preparation and model require improvements and further training.

This project is part of the SPAIC Project Showcase Challenge.