
Dataset Description

Dataset 1:

Kaggle Link

Thanks to Kaggle for making this data openly available.

Context:

The Myers-Briggs Type Indicator (or MBTI for short) is a personality type system that divides everyone into 16 distinct personality types across 4 axes:

Introversion (I) – Extroversion (E)
Intuition (N) – Sensing (S)
Thinking (T) – Feeling (F)
Judging (J) – Perceiving (P)

More about what these mean can be found here: https://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/home.htm

So, for example, someone who prefers introversion, intuition, thinking and perceiving would be labelled an INTP in the MBTI system, and that label is associated with a number of personality-based components that model or describe this person's preferences or behaviour.
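To make the labelling in the INTP example concrete, here is a minimal Python sketch that expands a four-letter type into its four preferences. The helper names (`decode_type`, `to_binary_labels`) are hypothetical, and splitting a type into four per-axis 0/1 labels is only one common modelling choice, not something this dataset prescribes.

```python
from typing import List

# The four axes, in the order their letters appear in a type code such as "INTP".
# The first letter of each pair is treated as the "1" pole by to_binary_labels.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]

NAMES = {
    "I": "Introversion", "E": "Extroversion",
    "N": "Intuition", "S": "Sensing",
    "T": "Thinking", "F": "Feeling",
    "J": "Judging", "P": "Perceiving",
}


def decode_type(code: str) -> List[str]:
    """Expand a four-letter MBTI code into the names of its four preferences."""
    code = code.strip().upper()
    if len(code) != len(AXES):
        raise ValueError(f"expected 4 letters, got {code!r}")
    for letter, axis in zip(code, AXES):
        if letter not in axis:
            raise ValueError(f"{letter!r} is not valid for the {axis[0]}/{axis[1]} axis")
    return [NAMES[letter] for letter in code]


def to_binary_labels(code: str) -> List[int]:
    """One 0/1 label per axis: 1 for I, N, T, J and 0 for E, S, F, P (hypothetical encoding)."""
    decode_type(code)  # reuse the validation above
    code = code.strip().upper()
    return [1 if letter == axis[0] else 0 for letter, axis in zip(code, AXES)]


if __name__ == "__main__":
    print(decode_type("INTP"))       # the four preference names for INTP
    print(to_binary_labels("INTP"))  # one label per axis
```

Treating each axis separately turns one 16-class problem into four binary ones, which can be easier to model when some types have few examples.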

It is one of the most popular personality tests in the world, if not the most popular. It is used in businesses, online, for fun, for research and much more. A simple Google search reveals the many different ways the test has been used over time. It is safe to say that the test remains widely used today.

From a scientific or psychological perspective, it is based on Carl Jung's work on cognitive functions, i.e. Jungian typology. This was a model of 8 distinct functions (thought processes or ways of thinking) that were suggested to be present in the mind. This work was later adapted into several different personality systems to make it more accessible, the most popular of which is the MBTI.

Recently, its use and validity have been questioned because of unreliable experimental results, among other reasons. However, it is still regarded as a very useful tool in many areas, and the purpose of this dataset is to help see whether any patterns can be detected between specific types and their style of writing, which in turn explores the validity of the test in analysing, predicting or categorising behaviour.
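As a rough illustration of looking for writing-style patterns per type, the sketch below computes the average number of words per post for each type. It assumes the Kaggle CSV has a `type` column and a `posts` column in which a user's posts are joined by `|||`; check these against the actual file before relying on it. Words per post is just one simple stylistic feature chosen for illustration, and `mean_words_per_post` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: individual posts are joined with this separator inside one cell.
POST_SEPARATOR = "|||"


def mean_words_per_post(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Average number of whitespace-separated words per post, grouped by MBTI type."""
    word_counts: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        # Skip empty fragments, e.g. from a trailing separator.
        posts = [p for p in row["posts"].split(POST_SEPARATOR) if p.strip()]
        for post in posts:
            word_counts[row["type"]].append(len(post.split()))
    return {t: sum(c) / len(c) for t, c in word_counts.items()}


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout; not real data from the dataset.
    sample = 'type,posts\nINTP,"one two three|||four five"\nESFJ,"hello there friend|||hi"\n'
    print(mean_words_per_post(csv.DictReader(io.StringIO(sample))))
```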

Dataset 2:

This data comes from the following GitHub link

The code in that repository implements the model discussed in "Deep Learning-Based Document Modeling for Personality Detection from Text" for detecting the Big Five personality traits, namely:

Extroversion
Neuroticism
Agreeableness
Conscientiousness
Openness

essays.csv: contains the data that our code will work with.

Fields of the dataset:

  1. Author ID - Numeric

  2. Text - Paragraph

  3. Extroversion - Yes/No (y/n)

  4. Neuroticism - Yes/No (y/n)

  5. Agreeableness - Yes/No (y/n)

  6. Conscientiousness - Yes/No (y/n)

  7. Openness - Yes/No (y/n)

Our analysis is based on these fields.
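The fields above can be read into Python records with the y/n flags turned into booleans, as in this minimal sketch. The header names (`#AUTHID`, `TEXT`, `cEXT`, `cNEU`, `cAGR`, `cCON`, `cOPN`) are an assumption about the file and should be adjusted to match the real header; `load_essays` and `parse_flag` are hypothetical helpers, not part of the original code.

```python
import csv
import io
from typing import Dict, List, TextIO

# Assumed header names; adjust these to match the actual essays.csv header.
ID_COLUMN = "#AUTHID"
TEXT_COLUMN = "TEXT"
TRAIT_COLUMNS = {
    "Extroversion": "cEXT",
    "Neuroticism": "cNEU",
    "Agreeableness": "cAGR",
    "Conscientiousness": "cCON",
    "Openness": "cOPN",
}


def parse_flag(value: str) -> bool:
    """Convert a y/n cell into a boolean, rejecting any other value."""
    flag = value.strip().lower()
    if flag not in ("y", "n"):
        raise ValueError(f"expected 'y' or 'n', got {value!r}")
    return flag == "y"


def load_essays(handle: TextIO) -> List[Dict[str, object]]:
    """Read essays.csv-style rows into dicts: author id, text, one bool per trait."""
    records = []
    for row in csv.DictReader(handle):
        # The author id is kept as a string so no formatting is lost.
        record: Dict[str, object] = {"author_id": row[ID_COLUMN], "text": row[TEXT_COLUMN]}
        for trait, column in TRAIT_COLUMNS.items():
            record[trait] = parse_flag(row[column])
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up one-row sample in the assumed layout; not real data.
    sample = '#AUTHID,TEXT,cEXT,cNEU,cAGR,cCON,cOPN\n1,"Some essay text.",y,n,y,n,y\n'
    print(load_essays(io.StringIO(sample)))
```

When reading the real file, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) instead of the `io.StringIO` sample.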

mairesse.csv: contains Mairesse-based features, which we will not be using for now.

For additional reading on what Mairesse features are, see this link: mairesse

Contributors

Name | Slack
Shashank Jain | @Shashank Jain
Shreya Jaiswal | @Shreya jaiswal
Susy Jam | @susyjam