Meet the Pronoun Proofer, a Zulip bot that validates the usage of pronouns in a given text to ensure they match the preferences of the people being referenced. This bot now leverages NLP for clever parsing!
This constructive tool is designed to improve the community experience at RC by helping people identify and fix any mistaken pronouns. Pronoun Proofer will NOT punish, shame, or embarrass people when they slip up.
Instead, it will reach out to them privately so that they are aware of the potential mismatch, encouraging them to review and edit their message to reflect the correct pronouns. With the help of this bot, folks in the community can connect with one another on a deeper and more respectful level!
Pronoun Proofer runs 24/7 on RC's Heap Community Cluster.
- Zulip bot is subscribed to all public streams
- New message event in a stream triggers validation pipeline
- Alternatively, pipeline triggered by update message (edit) event
- Bot scans for any mentions (@) in message content
- Name + pronouns are extracted from Zulip name tag markdown
- NLP is applied to full text content, generating clusters for entities
- All mentioned names are linked to their cluster pronouns
- Mappings are reviewed to check for any discrepancies
- Any detected mismatches are flagged for a secondary context check
- Context window is expanded by retrieving previous 5 messages in thread
- NLP is again applied, this time to the larger stream history
- If no mismatches, or if false positives clarified by context window, no action
- If wrong pronouns, bot privately DMs writer of message, with link to revisit + edit
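The mention-extraction and discrepancy-check steps above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: it assumes display names carry pronouns in parentheses (e.g. `Name (she/her) (S1'24)`), and the names `extract_mentions`, `find_mismatches`, and `PRONOUN_FAMILIES` are hypothetical. The `clusters` argument is a hand-written stand-in for whatever the coreference step produces.

```python
import re

# Zulip mention syntax: @**Display Name** or @**Display Name|user_id**.
MENTION_RE = re.compile(r"@\*\*([^*|]+?)(?:\|\d+)?\*\*")
# Assumption: pronouns appear in the display name as "(she/her)", "(they/them)", etc.
PRONOUNS_RE = re.compile(r"\(([a-z]+(?:/[a-z]+)+)\)", re.IGNORECASE)

# Hypothetical lookup from a pronoun form to its family (deliberately incomplete).
PRONOUN_FAMILIES = {
    "she": "she", "her": "she", "hers": "she", "herself": "she",
    "he": "he", "him": "he", "his": "he", "himself": "he",
    "they": "they", "them": "they", "their": "they",
    "theirs": "they", "themself": "they", "themselves": "they",
}


def extract_mentions(content: str) -> dict[str, set[str]]:
    """Map each mentioned name to the pronoun families stated in its name tag."""
    mentions: dict[str, set[str]] = {}
    for match in MENTION_RE.finditer(content):
        display_name = match.group(1)
        tag = PRONOUNS_RE.search(display_name)
        if tag is None:
            continue  # no stated pronouns, so nothing to validate
        name = display_name[: tag.start()].strip()
        forms = tag.group(1).lower().split("/")
        mentions[name] = {PRONOUN_FAMILIES[f] for f in forms if f in PRONOUN_FAMILIES}
    return mentions


def find_mismatches(
    mentions: dict[str, set[str]], clusters: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return, per name, the cluster pronouns that fall outside the stated families."""
    mismatches: dict[str, list[str]] = {}
    for name, used in clusters.items():
        allowed = mentions.get(name)
        if not allowed:
            continue  # unknown name or unrecognized pronouns: do not guess
        wrong = []
        for pronoun in used:
            family = PRONOUN_FAMILIES.get(pronoun.lower())
            # Only flag pronouns we recognize; unknown forms are left alone.
            if family is not None and family not in allowed:
                wrong.append(pronoun)
        if wrong:
            mismatches[name] = wrong
    return mismatches
```

Anything this sketch flags would still go through the secondary context check before a DM is sent, and unrecognized pronoun forms are skipped rather than flagged, which keeps the check on the conservative side.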
- Python: logic with Zulip client
- CLI arguments processed via Python's Click package: easily run as client or service
- spaCy (NLP): experimental coreference pipeline
- cluster component
- span resolver component
- Linux: Bash scripts for RC's heap cluster
- user instance of systemd
- .service files run with enable-linger
- .timer file to act as cron job for log extraction
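For readers unfamiliar with Click, a client-or-service entry point might look something like the sketch below. The option names `--mode` and `--dry-run` are illustrative assumptions, not the bot's real flags, and the command only echoes its settings instead of starting anything.

```python
import click


@click.command()
@click.option(
    "--mode",
    type=click.Choice(["client", "service"]),
    default="client",
    show_default=True,
    help="Run a one-off client or a long-running service.",
)
@click.option("--dry-run", is_flag=True, help="Log mismatches without sending DMs.")
def main(mode: str, dry_run: bool) -> None:
    """Hypothetical entry point; a real script would start the bot here."""
    click.echo(f"mode={mode} dry_run={dry_run}")
```

A script would end by calling `main()`, and Click's `CliRunner` (from `click.testing`) can invoke the command in tests without spawning a subprocess.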
Python dependencies are managed by Poetry.
To install dependencies:
make setup
For a fast run:
make all
To run bot in production (listen for and respond to messages 24/7):
make run-prod
To run bot in development (one-off real-world testing instance):
make run-dev
To run a series of unit tests for the bot:
make tests
To iteratively fine-tune model:
make fine_tune_model
A massive thank you to the wonderful community of builders, creators, and programmers at the Recurse Center!
And speaking of people at RC... I'd especially like to thank Florian Ragwitz, who paired with me on this project! Florian's Linux expertise is what helped get Pronoun Proofer onto the heap cluster, and the two of us also collaborated on property-based testing.
The feedback and edge cases provided by folks at RC have really helped this bot grow and evolve with time. Stay tuned as I continue to iterate on training / fine-tuning for improved NLP!