Implement alternative to MarkdownHeaderTextSplitter #13

@daavoo

Description

We currently use LangChain's MarkdownHeaderTextSplitter (https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/markdown_header_metadata/), but it is very limited.
A set of regex patterns could be used instead, and extended based on exploration of different PDFs converted to markdown.
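
To make the proposal concrete, here is a minimal sketch of what a regex-driven splitter might look like. Everything in it is an assumption for illustration: the names `Section`, `PATTERNS`, `match_header` and `split_markdown` are hypothetical, the two starter patterns (ATX headers and bold-only lines) are guesses at what converted PDFs tend to contain, and treating a bold-only line as a level 2 header is an arbitrary choice.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """A chunk of text plus the header titles that enclose it."""
    headers: dict = field(default_factory=dict)  # header level -> title
    text: str = ""


# Hypothetical starter patterns: (compiled regex, function mapping a match to a level).
# The list is meant to grow as more converted PDFs are explored.
PATTERNS = [
    # ATX headers: "# Title", "## Title", ... up to six "#".
    (re.compile(r"^(?P<marks>#{1,6})\s+(?P<title>.+?)\s*$"),
     lambda m: len(m.group("marks"))),
    # A line made only of bold text; assumed here to act as a level 2 header.
    (re.compile(r"^\*\*(?P<title>[^*]+)\*\*\s*$"),
     lambda m: 2),
]


def match_header(line):
    """Return (level, title) if the line matches any pattern, else None."""
    stripped = line.strip()
    for pattern, level_of in PATTERNS:
        m = pattern.match(stripped)
        if m:
            return level_of(m), m.group("title").strip()
    return None


def split_markdown(text):
    """Split text at header lines, attaching the active headers to each chunk."""
    sections = []
    current_headers = {}
    buffer = []

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            sections.append(Section(headers=dict(current_headers), text=body))
        buffer.clear()

    for line in text.splitlines():
        header = match_header(line)
        if header is None:
            buffer.append(line)
            continue
        flush()
        level, title = header
        # A new header closes every header at the same or a deeper level.
        for stale in [k for k in current_headers if k >= level]:
            del current_headers[stale]
        current_headers[level] = title
    flush()
    return sections
```

Calling `split_markdown(markdown_text)` returns a list of `Section` objects, each carrying the headers that were active when its text was collected. The sketch does not skip fenced code blocks, so a `# comment` line inside a fence would be treated as a header; handling that case would need an extra check.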
