BabbleFishv2

An agentic translation system and an attempted implementation of "(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts" (https://arxiv.org/html/2405.11804v1). It is aided by Graphiti-inspired GraphRAG using temporal, chapter-based memory. Work in progress.

Features

Knowledge Graphs: Uses Neo4j to store extracted entities and triplets in a graph database
Feedback Loops: LLM-based feedback loops for reviewing translations
Workflow Visualization: Generates Mermaid diagrams of the workflow
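
As a rough illustration of the workflow visualization feature, the sketch below renders a list of directed edges as Mermaid flowchart text. The function name `to_mermaid` and the node names are hypothetical and chosen only for this example; the project's actual diagram generator may work quite differently.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render directed edges as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical node names, loosely mirroring the pipeline phases in this README.
PHASES = [
    ("setup", "ingestion"),
    ("ingestion", "annotation"),
    ("annotation", "translation"),
]

print(to_mermaid(PHASES))
```

The resulting text can be pasted into any Mermaid renderer, or into a Markdown file inside a `mermaid` code fence.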

Current pipeline

Setup Phase:

Language Detection: Uses Lingua to detect the language of the text
Style Guide Creation: Creates a style guide for later translation
Genre Tagging: Tags the text with choices from a fixed genre enum
TODO Topic Tags: Use some topic modelling approach
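
Genre tagging against a fixed enum could look roughly like the sketch below, which filters a comma-separated model response down to known labels. The `Genre` members and the `parse_genres` helper are assumptions made for illustration; the project's real enum and parsing logic are not shown in this README.

```python
from enum import Enum


class Genre(Enum):
    # Hypothetical members; the project's actual genre enum may differ.
    FANTASY = "fantasy"
    ROMANCE = "romance"
    MYSTERY = "mystery"


def parse_genres(raw: str) -> list[Genre]:
    """Keep only the comma-separated labels that match a Genre value."""
    valid = {genre.value: genre for genre in Genre}
    tags: list[Genre] = []
    for label in raw.split(","):
        key = label.strip().lower()
        # Drop unknown labels and duplicates instead of raising.
        if key in valid and valid[key] not in tags:
            tags.append(valid[key])
    return tags


print(parse_genres("Fantasy, horror, romance"))
```

Dropping labels outside the enum, rather than failing, keeps a single stray model output from aborting the setup phase; whether that is the right trade-off depends on how strict the tagging needs to be.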

Ingestion Phase:

Entity Extraction: LLM-based categorised NER
Triplet Extraction: Extracts triplets tagged with temporal information and metadata, using fixed predicate enums
TODO Tuple Extraction: Extract tuples for relations an entity has with itself, e.g. traits
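
A minimal sketch of a temporally tagged triplet with a fixed predicate enum is shown below. The predicate names, the `Triplet` fields, and the `build_triplet` helper are all hypothetical; the idea is only that a predicate outside the enum is rejected before anything reaches the graph database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Predicate(Enum):
    # Hypothetical predicates; the project's actual enum may differ.
    KNOWS = "knows"
    LIVES_IN = "lives_in"
    MEMBER_OF = "member_of"


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: Predicate
    object: str
    chapter: int  # temporal tag: the chapter in which the relation was observed


def build_triplet(subject: str, predicate: str, obj: str, chapter: int) -> Optional[Triplet]:
    """Return a Triplet, or None when the predicate is not in the fixed enum."""
    try:
        parsed = Predicate(predicate)
    except ValueError:
        return None
    return Triplet(subject, parsed, obj, chapter)


print(build_triplet("Joshua", "knows", "Mary", chapter=3))
print(build_triplet("Joshua", "dislikes", "Mary", chapter=3))  # not in the enum, so None
```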

Annotation Phase:

Entity Replacer: Tags each recognised entity in the text with its match in translation memory
WIP: Add an agent with some tool use for the database
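
The entity replacer can be sketched as a regex substitution over a translation-memory dictionary, producing annotations in the `Joshua [Translation Memory 约书亚]` form. The function name `annotate_entities` and the plain `dict` memory are assumptions for this example; the real implementation presumably reads its matches from the graph database.

```python
import re


def annotate_entities(text: str, memory: dict[str, str]) -> str:
    """Append the translation-memory match after each recognised entity."""
    if not memory:
        return text
    # Try longer names first so a full name wins over a shorter name it contains.
    names = sorted(memory, key=len, reverse=True)
    # \b word boundaries assume the base text uses a space-delimited script.
    pattern = re.compile(r"\b(" + "|".join(re.escape(name) for name in names) + r")\b")

    def tag(match: re.Match) -> str:
        name = match.group(1)
        return f"{name} [Translation Memory {memory[name]}]"

    return pattern.sub(tag, text)


print(annotate_entities("Joshua met Mary.", {"Joshua": "约书亚"}))
```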

Translation Phase:

Translation: Translation with Gemini
Junior Editor: Gives feedback on the translation and can reject it up to 3 times
Fluency Editor: Index-based editing for fluency, done blind to the base text
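
The junior editor loop can be sketched as below, under the assumption that a rejection returns feedback text and an acceptance returns `None`. The `translate` and `review` callables are stand-ins for the LLM calls, and returning the latest draft once the rejection budget is spent is a guess at the intended behaviour, not a description of the actual code.

```python
from typing import Callable, Optional

MAX_REJECTIONS = 3  # the junior editor can reject up to 3 times


def translate_with_review(
    source: str,
    translate: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Optional[str]],
) -> str:
    """Translate, then retranslate with feedback until accepted or out of rejections."""
    draft = translate(source, None)
    for _ in range(MAX_REJECTIONS):
        feedback = review(source, draft)
        if feedback is None:  # editor accepted the draft
            return draft
        draft = translate(source, feedback)
    return draft  # rejection budget spent: keep the latest draft


# Stub callables standing in for the model calls.
def fake_translate(source: str, feedback: Optional[str]) -> str:
    return source.upper() if feedback else source


def fake_review(source: str, draft: str) -> Optional[str]:
    return None if draft.isupper() else "please use upper case"


print(translate_with_review("hello", fake_translate, fake_review))
```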

TODO

Novel factory: takes in text dicts to produce novels, probably abstracting loading from epub, txt, etc.
Create an NLP provider that does language-configurable POS tagging, lemmatisation, etc. for preprocessing
Tagging using corextopic for topic modelling, potentially seeding it and then using an LLM to classify topics
Try other approaches with keyword extraction after preprocessing
Maybe change the nodes to all be an implementation of an abstract class for more consistency
Embeddings with entity descriptions
More database queries: BM25, community clustering, etc.
Setup phase creates domain specific edge types
DB query agent for informing translations
Agent profiles
LangSmith or similar for some evaluations on unit tests
Include batch processing capabilities
Add metrics and monitoring
Implement different editorial personas, probably need to abstract nodes using a registry for this
Fix GitHub workflow
Two-fold triplet extraction, also covering attribute-based triplets (or maybe tuples is more accurate?)
Funny entity resolution bug: A changed his name to B, but B is coreference-resolved to A, so the triplet reads as A changed to A
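
The entity resolution bug in the last item can be reproduced in a few lines. The sketch below is only an assumption about how it arises: if both ends of a triplet are mapped through an alias table, a rename collapses onto itself. The `renamed_to` predicate, the alias table, and the guard that leaves the object unresolved are all hypothetical, and the guard is one possible fix rather than the project's chosen one.

```python
def resolve(name: str, aliases: dict[str, str]) -> str:
    """Map a surface name to its canonical entity, if one is known."""
    return aliases.get(name, name)


def canonicalise_naive(triplet: tuple[str, str, str], aliases: dict[str, str]) -> tuple[str, str, str]:
    """Resolve both ends of the triplet; this reproduces the bug."""
    subject, predicate, obj = triplet
    return (resolve(subject, aliases), predicate, resolve(obj, aliases))


def canonicalise_guarded(
    triplet: tuple[str, str, str],
    aliases: dict[str, str],
    keep_surface: frozenset = frozenset({"renamed_to"}),
) -> tuple[str, str, str]:
    """Leave the object unresolved when the predicate is about the name itself."""
    subject, predicate, obj = triplet
    if predicate in keep_surface:
        return (resolve(subject, aliases), predicate, obj)
    return (resolve(subject, aliases), predicate, resolve(obj, aliases))


aliases = {"B": "A"}  # coreference resolution decided that B refers to A
rename = ("A", "renamed_to", "B")
print(canonicalise_naive(rename, aliases))
print(canonicalise_guarded(rename, aliases))
```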

Ticked Off

Logging
Abstract class workflow factories inherit from
Entity replacer to substitute in Translation Memory, e.g. Joshua -> Joshua [Translation Memory 约书亚]
Architecturally, the novel processor feels like a mess; remake it
Translation Orchestrator
Get a better prompt so it stops screwing up predicates
Entity unification
Informative relation based triplet extraction
Change architecture to 4 workflows:
- setup (get language, style guide, etc.)
- ingestion (get triplets into the graph database with localisations), status: partially done
- annotation (annotate the base text with references like translations, etc.)
- translation (generic translation workflow with feedback loops, etc.), status: partially done
Registry for the translation orchestrator

