aadriien/pronoun-proofer

✍️📓 Pronoun Proofer

Description

Meet the Pronoun Proofer, a Zulip bot that validates the usage of pronouns in a given text to ensure they match the preferences of the people being referenced. This bot now leverages NLP for clever parsing!

This constructive tool is designed to improve the community experience at the Recurse Center (RC) by helping people identify and fix any mistaken pronouns. Pronoun Proofer will NOT punish, shame, or embarrass people when they slip up.

Instead, it will reach out to them privately so that they are aware of the potential mismatch, encouraging them to review and edit their message to reflect the correct pronouns. With the help of this bot, folks in the community can connect with one another on a deeper and more respectful level!

How It Works

Pronoun Proofer runs 24/7 on RC's Heap Community Cluster.

Listening

  1. Zulip bot is subscribed to all public streams
  2. New message event in a stream triggers validation pipeline
  3. Alternatively, pipeline triggered by update message (edit) event
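As a rough illustration of the listening step, here is a minimal sketch of how incoming events might be filtered before validation. The function name `message_to_validate` is hypothetical and not taken from this repo; the event shapes follow Zulip's documented `message` and `update_message` events as I understand them.

```python
from typing import Optional, Tuple


def message_to_validate(event: dict) -> Optional[Tuple[int, str]]:
    """Return (message_id, content) if the event should trigger validation.

    Hypothetical helper: the real bot's dispatch logic may differ.
    """
    event_type = event.get("type")

    if event_type == "message":
        message = event["message"]
        # Only public stream messages are of interest, not DMs.
        if message.get("type") != "stream":
            return None
        return message["id"], message["content"]

    # Edits arrive as update_message events; "content" is only present
    # when the message body (not just the topic) was changed.
    if event_type == "update_message" and "content" in event:
        return event["message_id"], event["content"]

    return None


# With the zulip package installed, a handler built on this could be
# registered roughly like so (assumption, not the repo's actual code):
#   client = zulip.Client(config_file="zuliprc")
#   client.call_on_each_event(handler, event_types=["message", "update_message"])
```

Treating new messages and edits through one entry point means an edited message is simply re-validated like a fresh one.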

Scanning

  1. Bot scans for any mentions (@) in message content
  2. Name + pronouns are extracted from Zulip name tag markdown
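The scanning step could look something like the sketch below. It assumes mentions appear in raw message content as `@**Name (pronouns) (batch)**`, optionally silent (`@_**...**`) or with a `|user_id` suffix; the exact name tag layout at RC is an assumption, and `extract_mentions` is a hypothetical name.

```python
import re

# Matches @**Full Name ...** and silent @_**Full Name ...** mentions,
# with an optional |user_id suffix used to disambiguate duplicate names.
MENTION_RE = re.compile(r"@_?\*\*([^*|]+)(?:\|\d+)?\*\*")

# Matches a parenthesized, slash-separated pronoun group such as (she/her).
PRONOUN_RE = re.compile(r"\(([A-Za-z]+(?:/[A-Za-z]+)+)\)")


def extract_mentions(content: str) -> list:
    """Return a list of {"name": ..., "pronouns": [...]} for each mention."""
    mentions = []
    for match in MENTION_RE.finditer(content):
        tag = match.group(1)
        pronoun_match = PRONOUN_RE.search(tag)
        pronouns = pronoun_match.group(1).lower().split("/") if pronoun_match else []
        # Assumption: the display name is everything before the first parenthesis.
        name = tag.split("(", 1)[0].strip()
        mentions.append({"name": name, "pronouns": pronouns})
    return mentions


print(extract_mentions("Thanks @**Jane Doe (she/her) (S1'24)** for pairing!"))
```

A mention with no pronouns in the name tag yields an empty list, which later steps can treat as "nothing to check".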

Parsing

  1. NLP is applied to full text content, generating clusters for entities
  2. All mentioned names are linked to their cluster pronouns
  3. Mappings are reviewed to check for any discrepancies
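To make the discrepancy check concrete, here is a simplified sketch. It does not run spaCy; it assumes the coreference step has already produced clusters, represented here as plain lists of mention strings, which is a simplification of whatever the real pipeline emits. The names `PRONOUN_FAMILIES`, `accepted_forms`, and `find_mismatches` are hypothetical.

```python
# Assumption: a small fixed table of pronoun families. Neopronouns and
# other forms are not covered by this sketch.
PRONOUN_FAMILIES = {
    "she": {"she", "her", "hers", "herself"},
    "he": {"he", "him", "his", "himself"},
    "they": {"they", "them", "their", "theirs", "themself", "themselves"},
}
ALL_PRONOUNS = set().union(*PRONOUN_FAMILIES.values())


def accepted_forms(stated: list) -> set:
    """Expand stated pronouns (e.g. ["she", "they"]) to every accepted form."""
    stated_set = {p.lower() for p in stated}
    accepted = set()
    for family in PRONOUN_FAMILIES.values():
        if family & stated_set:
            accepted |= family
    return accepted


def find_mismatches(clusters: list, preferences: dict) -> dict:
    """Map each mentioned name to pronouns in its cluster that are not accepted."""
    mismatches = {}
    for cluster in clusters:
        mentions = [m.lower() for m in cluster]
        used = [m for m in mentions if m in ALL_PRONOUNS]
        for name, stated in preferences.items():
            # Link the name to this cluster if any mention contains it.
            if not any(name.lower() in m for m in mentions):
                continue
            accepted = accepted_forms(stated)
            if not accepted:
                continue  # no recognizable pronouns stated, nothing to check
            wrong = [p for p in used if p not in accepted]
            if wrong:
                mismatches.setdefault(name, []).extend(wrong)
    return mismatches
```

Someone listing multiple sets, such as she/they, accepts forms from both families, so only pronouns outside every stated family are flagged.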

Resolving

  1. Any detected mismatches are flagged for a secondary context check
  2. Context window is expanded by retrieving previous 5 messages in thread
  3. NLP is again applied, this time to the larger stream history
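One way the context window might be assembled is sketched below. It only builds the request and joins the results; the request fields follow Zulip's `get_messages` parameters as I understand them, and treating "thread" as a stream plus topic is an assumption. Both function names are hypothetical.

```python
def build_context_request(message_id: int, stream: str, topic: str,
                          num_before: int = 5) -> dict:
    """Build a get_messages request for the flagged message plus prior context."""
    return {
        "anchor": message_id,      # the flagged message itself
        "num_before": num_before,  # previous messages to pull in
        "num_after": 0,
        "narrow": [
            {"operator": "stream", "operand": stream},
            {"operator": "topic", "operand": topic},
        ],
        "apply_markdown": False,   # raw markdown keeps @**Name** mentions intact
    }


def combine_context(messages: list) -> str:
    """Join message contents in chronological order for a second NLP pass."""
    ordered = sorted(messages, key=lambda m: m["id"])
    return "\n".join(m["content"] for m in ordered)


# With the zulip package installed, usage might look like (assumption):
#   result = client.get_messages(build_context_request(42, "general", "pairing"))
#   text = combine_context(result["messages"])
```

Feeding the combined text back through coreference gives the model earlier mentions to anchor on, which is how a pronoun that actually refers to someone from a previous message can be cleared as a false positive.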

Responding

  1. If no mismatches, or if false positives clarified by context window, no action
  2. If wrong pronouns, bot privately DMs writer of message, with link to revisit + edit
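The response step might be sketched as follows. The wording of the reminder, the function name `build_reminder`, and the shape of `mismatches` (a dict mapping each name to the pronoun forms that looked wrong) are all assumptions for illustration; the payload fields follow Zulip's `send_message` request format for direct messages.

```python
from typing import Optional


def build_reminder(sender_id: int, message_link: str,
                   mismatches: dict) -> Optional[dict]:
    """Build a private DM payload, or return None when there is nothing to flag."""
    if not mismatches:
        return None  # no mismatches (or all cleared by context): no action

    lines = ["Hi! I noticed a possible pronoun mismatch in a recent message:"]
    for name, wrong in sorted(mismatches.items()):
        lines.append(f"- {name}: found {', '.join(wrong)}")
    lines.append(f"If that was a slip, you can review and edit it here: {message_link}")

    return {
        "type": "private",   # DM, so only the writer sees it
        "to": [sender_id],
        "content": "\n".join(lines),
    }


# With the zulip package installed, sending might look like (assumption):
#   payload = build_reminder(sender_id, link, mismatches)
#   if payload is not None:
#       client.send_message(payload)
```

Returning `None` for the no-mismatch case keeps the "no action" branch explicit, so the caller never sends an empty or unnecessary message.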

Tools / Tech

  • Python: logic with Zulip client
    • CLI arguments processed via Python's Click package
      • easily run as client or service
  • spaCy (NLP): experimental coreference pipeline
    • cluster component
    • span resolver component
  • Linux: Bash scripts for RC's heap cluster
    • user instance of systemd
    • .service files run with enable-linger
    • .timer file to act as cron job for log extraction

Getting Started

Python dependencies are managed by Poetry.

To install dependencies:

make setup

For a fast run:

make all

For the Zulip bot:

To run bot in production (listen for and respond to messages 24/7):

make run-prod

To run bot in development (one-off real-world testing instance):

make run-dev

To run a series of unit tests for the bot:

make tests

For the coreference model (NLP):

To iteratively fine-tune model:

make fine_tune_model

Acknowledgements

People

A massive thank you to the wonderful community of builders, creators, and programmers at the Recurse Center!

And speaking of people at RC... I'd especially like to thank Florian Ragwitz, who paired with me on this project! Florian's Linux expertise helped get Pronoun Proofer onto the heap cluster, and the two of us also collaborated on property-based testing.

The feedback and edge cases provided by folks at RC have really helped this bot grow and evolve with time. Stay tuned as I continue to iterate on training / fine-tuning for improved NLP!
