We currently use LangChain's MarkdownHeaderTextSplitter (https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/markdown_header_metadata/), but it is very limited.
A set of regex patterns could be used instead, and extended as we explore how different PDFs come out when converted to Markdown.
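
As a rough illustration of the regex idea, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `HEADER_PATTERNS` registry, the `match_header` and `split_by_headers` helpers, and the two starter patterns (ATX headings and fully bold lines) are hypothetical and not taken from any existing code. The real patterns would have to come from looking at actual converted PDFs.

```python
import re

# Hypothetical pattern registry: each entry is (label, compiled regex).
# Every regex must capture the heading text in a group named "title".
HEADER_PATTERNS = [
    # Standard ATX headings: "# Title", "## Title", ...
    ("atx", re.compile(r"^#{1,6}\s+(?P<title>.+?)\s*$")),
    # Lines that are entirely bold; assumed here to be a heading style
    # that some PDF-to-Markdown converters produce.
    ("bold", re.compile(r"^\*\*(?P<title>[^*]+)\*\*\s*$")),
]


def match_header(line):
    """Return (label, title) for the first matching pattern, else None."""
    for label, pattern in HEADER_PATTERNS:
        m = pattern.match(line)
        if m:
            return label, m.group("title")
    return None


def split_by_headers(text):
    """Split text into sections, each tagged with the heading that opened it."""
    sections = []
    header, kind, body = "", "", []

    def flush():
        # Emit the section collected so far, skipping completely empty ones.
        content = "\n".join(body).strip()
        if header or content:
            sections.append({"header": header, "kind": kind, "content": content})

    for line in text.splitlines():
        hit = match_header(line.strip())
        if hit:
            flush()
            kind, header = hit
            body = []
        else:
            body.append(line)
    flush()
    return sections
```

New heading styles found during exploration could then be supported by appending another `(label, regex)` pair to `HEADER_PATTERNS`, without touching the splitting logic. Text that appears before the first heading is kept as a section with an empty `header`.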