Code for learning user representations as described in the paper "Modelling Context with User Embeddings for Sarcasm Detection in Social Media".
requirements:
- python >= 2.7
- sma_toolkit
- numpy
- gensim
- joblib
- theano

instructions:
- pretrain word embeddings using [gensim](https://radimrehurek.com/gensim/models/word2vec.html) with the hierarchical softmax option (see the [documentation](https://radimrehurek.com/gensim/models/word2vec.html) on how to do this; tl;dr: set the flag `hs=1`). Save the embeddings in binary format.
- clone or download the [my_utils](https://github.com/samiroid/utils) module
- edit the file `setup.sh` to change the paths to `my_utils` and the word embeddings; run `setup.sh`
- edit the file `build_data.sh` to change the paths to the word embeddings and the file containing the user's tweets; run `build_data.sh`
- edit the file `run.sh` to change the paths to the word embeddings (binary format) and the output user embeddings; run `run.sh`
- kick back and relax :)
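
For the word-embedding pretraining step above, here is a minimal sketch of what setting `hs=1` and saving in binary format might look like. The function names `train_hs_embeddings` and `read_word2vec_header` and the `min_count` argument are illustrative assumptions, not part of this repository. The sketch also assumes a gensim release where the vectors are exposed on `model.wv`; older releases may differ.

```python
def train_hs_embeddings(sentences, out_path, min_count=5):
    """Train word2vec with hierarchical softmax and save in binary format.

    `sentences` is a list of token lists, e.g. [["so", "funny"], ...].
    `min_count` (assumed here, gensim's default is 5) drops rare words.
    """
    # Imported lazily so the header check below works without gensim.
    from gensim.models import Word2Vec

    # hs=1 turns on hierarchical softmax, as the instructions require.
    model = Word2Vec(sentences=sentences, hs=1, min_count=min_count)
    # binary=True writes the word2vec binary format instead of plain text.
    model.wv.save_word2vec_format(out_path, binary=True)


def read_word2vec_header(path):
    """Return (vocab_size, dimensions) from a word2vec-format file.

    The format starts with one text line "<vocab_size> <dimensions>",
    so this is a cheap sanity check before pointing the scripts at a file.
    """
    with open(path, "rb") as f:
        header = f.readline().decode("utf-8")
    vocab_size, dims = header.split()
    return int(vocab_size), int(dims)


# Example with hypothetical inputs:
# train_hs_embeddings(tokenized_tweets, "embeddings.bin")
# print(read_word2vec_header("embeddings.bin"))
```

The header check only confirms the vocabulary size and dimensionality; it does not tell text and binary files apart, since both share the same first line.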