Proposal: Pandoc HWPX Writer — Support for Korea's Public Document Standard #11569

msjang · 2026-04-08T08:26:57Z

About Me and My Experience with Pandoc

Hi, I'm a developer working at a government-funded research institute in South Korea.

I have been using Pandoc for the past 10 years — primarily to convert Markdown and LaTeX into Word documents — and it has been an indispensable part of my daily workflow. I have deep respect for the project and the community that has built and maintained it. I would be honored to contribute back by proposing and implementing an HWPX writer.

Background: South Korea's HWP Lock-in

South Korea has a word processor called Hangul Word Processor (HWP), first released in 1989. Optimized for Korean script processing, it rapidly gained dominance in the early Korean PC market. In the 1990s, the Korean government adopted it as the standard document authoring tool for public institutions, making it the de facto standard for all government documents.

As a result, HWP remains mandatory across virtually all public sectors in Korea today — government agencies, public institutions, educational bodies, and courts. Over 30 years of accumulated documents in this format have created a deep vendor lock-in at the national level.

What Is HWPX?

HWPX is the modern, open successor to the legacy binary HWP format. Structurally similar to OOXML (.docx) and ODF (.odt), it is a ZIP-based package containing XML files. Unlike the proprietary binary .hwp format, .hwpx uses a published XML schema, making it far more amenable to programmatic manipulation. The Korean government designated HWPX as a national standard (KS X 6101) in 2014 and has been actively encouraging its adoption across public agencies.

Why This Matters

Millions of public sector workers in Korea create HWP/HWPX documents daily, yet this format is absent from Pandoc's rich conversion ecosystem. Adding HWPX support to Pandoc would:

Enable conversion from Markdown, LaTeX, HTML, DOCX, and other formats into HWPX
Be especially valuable in the LLM era — AI-generated Markdown/HTML content could be directly converted into HWPX for use in Korean public institutions
Connect Korea's document ecosystem to the international open-source toolchain
Significantly expand Pandoc's user base in a country of approximately 50 million people

Work Done So Far

I have been approaching this problem from two directions.

1. Python Prototype: pypandoc-hwpx

As a proof of concept, I developed pypandoc-hwpx, a Python-based conversion tool that takes Pandoc's JSON AST as input and produces HWPX output.

Currently supported features:

Headings, paragraphs, and code blocks
Bullet lists and ordered lists
Complex tables with cell merging (rowspan/colspan)
Image embedding and footnotes
Style and page layout inheritance from a reference HWPX template
Inline formatting: bold, italic, strikethrough, superscript/subscript

The project is published on PyPI (pip install pypandoc-hwpx) and has received positive reception on GeekNews (a Korean equivalent of Hacker News).
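
For readers unfamiliar with Pandoc's JSON AST, here is a minimal sketch of the kind of traversal a JSON-based converter has to do. It is not taken from pypandoc-hwpx. The function names (`flatten_inlines`, `merge_runs`) and the format labels are hypothetical, and only a handful of inline types are handled. The idea is to flatten nested inline formatting into flat runs of text, each with a set of active formats, which is the shape run-based formats such as DOCX (and, presumably, HWPX) expect.

```python
import json

# Inline constructors whose content is just a list of child inlines.
# The labels on the right are hypothetical names chosen for this sketch.
WRAPPERS = {"Strong": "bold", "Emph": "italic", "Strikeout": "strike",
            "Superscript": "sup", "Subscript": "sub"}

def flatten_inlines(inlines, active=frozenset()):
    """Flatten nested Pandoc inlines into (text, formats) runs."""
    runs = []
    for node in inlines:
        tag, content = node["t"], node.get("c")
        if tag == "Str":
            runs.append((content, active))
        elif tag in ("Space", "SoftBreak"):
            runs.append((" ", active))
        elif tag in WRAPPERS:
            runs.extend(flatten_inlines(content, active | {WRAPPERS[tag]}))
        # Other inline types (Code, Link, Image, Note, ...) are skipped here.
    return runs

def merge_runs(runs):
    """Join adjacent runs that carry the same set of formats."""
    merged = []
    for text, fmt in runs:
        if merged and merged[-1][1] == fmt:
            merged[-1] = (merged[-1][0] + text, fmt)
        else:
            merged.append((text, fmt))
    return merged

# A tiny document in the shape produced by `pandoc -t json`.
doc = json.loads("""
{"pandoc-api-version": [1, 23, 1], "meta": {}, "blocks": [
  {"t": "Para", "c": [
    {"t": "Str", "c": "Plain"}, {"t": "Space"},
    {"t": "Strong", "c": [{"t": "Str", "c": "bold"}, {"t": "Space"},
                          {"t": "Emph", "c": [{"t": "Str", "c": "both"}]}]}
  ]}
]}
""")

para = doc["blocks"][0]
runs = merge_runs(flatten_inlines(para["c"]))
for text, fmt in runs:
    print(repr(text), sorted(fmt))
```

Nested formatting (italic inside bold) ends up as a single run carrying both labels, so a later step can map each distinct set of formats to one character style.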

2. Haskell HWPX Writer (In Progress)

My ultimate goal is to contribute a native HWPX writer to Pandoc. I have been learning Haskell specifically for this purpose and currently have a working prototype. I am porting the conversion logic validated in the Python prototype into Haskell.

Technical Approach

The HWPX writer architecture follows patterns similar to the existing DOCX writer (Text.Pandoc.Writers.Docx):

AST traversal: Recursively walk Block and Inline elements
HWPX XML generation: Produce XML elements conforming to the HWP Markup Language namespaces (hp:, hh:, hc:, hs:)
ZIP packaging: Bundle the generated XML files and media resources into the HWPX format (ZIP archive)
Reference document support: Inherit styles and layout settings from an existing HWPX file via --reference-doc
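
The XML generation step above can be sketched roughly as follows, in Python for brevity rather than Haskell. This is an illustration under assumptions, not the writer's actual code. The namespace URI, the element names (`hp:p`, `hp:run`, `hp:t`) and the `charPrIDRef` attribute are my reading of typical HWPX section files and should be checked against KS X 6101. The `paragraph` helper and the `CHAR_IDS` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: this namespace URI and the element/attribute names used below
# are illustrative and must be verified against the KS X 6101 schema.
HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"
ET.register_namespace("hp", HP)

def paragraph(runs, char_ids):
    """Build one paragraph element from (text, style_key) runs."""
    p = ET.Element(f"{{{HP}}}p")
    for text, style in runs:
        # Each run points at a character style by ID instead of inlining it.
        run = ET.SubElement(p, f"{{{HP}}}run", {"charPrIDRef": char_ids[style]})
        t = ET.SubElement(run, f"{{{HP}}}t")
        t.text = text  # ElementTree escapes &, < and > on serialization
    return p

# Hypothetical mapping from a formatting key to a character-style ID that
# would be defined in header.xml (for example, taken from a reference document).
CHAR_IDS = {"plain": "0", "bold": "1"}

elem = paragraph([("Plain ", "plain"), ("bold & <escaped>", "bold")], CHAR_IDS)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Referring to styles by ID is what makes the reference-document approach workable: the body XML stays small, and the look of the output is decided by whichever header definitions the IDs resolve to.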

HWPX Document Structure

document.hwpx (ZIP)
├── META-INF/
│   ├── container.xml        # Root file path mapping
│   └── manifest.xml         # File manifest with MIME types
├── Contents/
│   ├── header.xml           # Document settings (styles, fonts, paper size, etc.)
│   ├── section0.xml         # Body content (paragraphs, tables, images, etc.)
│   └── content.hpf          # OPF-style content listing
├── BinData/                  # Binary resources (images, etc.)
├── Preview/
│   └── PrvText.txt          # Preview text
└── mimetype                  # application/hwp+zip
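
As a rough sketch of the packaging step, the layout above can be assembled with Python's standard `zipfile` module. The file bodies here are placeholders, so the result is not a valid HWPX document. Writing `mimetype` first and uncompressed is an assumption carried over from EPUB and ODF conventions and should be confirmed against the HWPX specification. `build_package` is a hypothetical helper name.

```python
import io
import zipfile

def build_package(parts, mimetype="application/hwp+zip"):
    """Return the bytes of a ZIP archive holding `parts` (path -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ASSUMPTION: mimetype goes first and is stored uncompressed,
        # as EPUB and ODF require; verify this for HWPX.
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
        for path, data in parts.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

# Placeholder contents only; real files must follow the HWPX schemas.
parts = {
    "META-INF/container.xml": b"<placeholder/>",
    "META-INF/manifest.xml": b"<placeholder/>",
    "Contents/header.xml": b"<placeholder/>",
    "Contents/section0.xml": b"<placeholder/>",
    "Contents/content.hpf": b"<placeholder/>",
    "Preview/PrvText.txt": "Preview text".encode("utf-8"),
}

package = build_package(parts)

# Read the archive back to confirm the entry order and the mimetype body.
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = zf.namelist()
    stored_mimetype = zf.read("mimetype").decode("ascii")
    print(names[0], stored_mimetype)
```

Building the archive in memory keeps the function easy to test, and the caller decides where the bytes are written.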

Questions for Discussion

I would like to hear the community's thoughts on including an HWPX writer in Pandoc:

Inclusion path: Is adding a new native writer a viable path, or would a standalone Lua writer or external filter be more appropriate?
Maintenance burden: What are your thoughts on the ongoing maintenance cost of supporting a new output format?
Technical feedback: Any advice on the approach, or patterns from existing writer implementations I should follow more closely?

Pandoc already handles several XML/ZIP-based formats beautifully — DOCX, ODT, EPUB — so I believe HWPX can follow the same established patterns. That said, I fully respect the community's judgment and would be glad to contribute in whatever form is considered most appropriate.

References

pypandoc-hwpx (Python prototype): https://github.com/msjang/pypandoc-hwpx
HWPX national standard (KS X 6101): Korean Industrial Standard — Open Word Processor Markup Language
PyPI: https://pypi.org/project/pypandoc-hwpx/

Thank you for your time and consideration.

jgm (Maintainer) · 2026-04-08T10:58:07Z

I guess the thing I'm most worried about is maintenance burden. This is a format I don't know at all; it may be that I wouldn't even be able to read the documentation. In the end, I'm often the one who ends up having to respond to bug reports, even if someone else contributed the writer. And a writer/reader for a format that includes embedded resources in a zip-like structure is going to be more complicated to support. In addition, it would be a fairly heavy patch, when you include the source files, tests, reference doc, etc.

On the other hand, if this is the main document format for a large country with 30 years of legacy documents, I can see that it would be very useful to have pandoc support.

I'd say: go ahead with what you're doing. You will end up with something useful in any case. Either it can be included in pandoc, or if we decide the maintenance burden is too heavy, you can create a Haskell executable program that uses the pandoc library and your writer code to expose a standalone anything -> hwpx converter. I wouldn't want to make a final decision on this before seeing the code, but in any case you wouldn't waste your time.

OctopusET · 2026-04-14T11:14:25Z

I'm creating HWP 5.0 and HWPX specifications in machine- and human-readable formats.

I'm doing this because there are issues with the current open spec by the government and Hancom, and the Hancom spec may have licensing issues.

With these specifications, each project won't have to repeat the work of understanding the spec.

The specifications include file <-> spec validation, and other useful libraries.

I have validated my specification research against around 140,000 HWP and HWPX files.

I will publish them soon, probably in a couple of weeks. I need to refine some things and reduce the legal risks first. If you are interested, I can provide you with some samples.

I hope this can be helpful for projects like this.

1 reply

OctopusET Apr 15, 2026

It will be provided in both Korean and English.
