I guess the thing I'm most worried about is maintenance burden. This is a format I don't know at all; it may be that I wouldn't even be able to read the documentation. In the end, I'm often the one who ends up having to respond to bug reports, even if someone else contributed the writer. And a writer/reader for a format that includes embedded resources in a zip-like structure is going to be more complicated to support. In addition, it would be a fairly heavy patch, when you include the source files, tests, reference doc, etc. On the other hand, if this is the main document format for a large country with 30 years of legacy documents, I can see that it would be very useful to have pandoc support. I'd say: go ahead with what you're doing. You will end up with something useful in any case. Either it can be included in pandoc, or if we decide the maintenance burden is too heavy, you can create a Haskell executable program that uses the pandoc library and your writer code to expose a standalone anything -> hwpx converter. I wouldn't want to make a final decision on this before seeing the code, but in any case you wouldn't waste your time.
I'm creating HWP 5.0 and HWPX specifications in machine- and human-readable formats. I'm doing this because there are issues with the current open spec from the government and Hancom, and the Hancom spec may have licensing issues. With these specifications, people won't have to duplicate the work of understanding the spec for each project. The specifications include file <-> spec validation and other useful libraries. I have validated my own specification research against around 140,000 HWP and HWPX files. I will publish them soon, probably in a couple of weeks. I need to refine some things and reduce legal issues. If you are interested, I can provide you with some samples. I hope this can be helpful for projects like this.
About Me and My Experience with Pandoc
Hi, I'm a developer working at a government-funded research institute in South Korea.
I have been using Pandoc for the past 10 years — primarily to convert Markdown and LaTeX into Word documents — and it has been an indispensable part of my daily workflow. I have deep respect for the project and the community that has built and maintained it. I would be honored to contribute back by proposing and implementing an HWPX writer.
Background: South Korea's HWP Lock-in
South Korea has a word processor called Hangul Word Processor (HWP), first released in 1989. Optimized for Korean script processing, it rapidly gained dominance in the early Korean PC market. In the 1990s, the Korean government adopted it as the standard document authoring tool for public institutions, making it the de facto standard for all government documents.
As a result, HWP remains mandatory across virtually all public sectors in Korea today — government agencies, public institutions, educational bodies, and courts. Over 30 years of accumulated documents in this format have created a deep vendor lock-in at the national level.
What Is HWPX?
HWPX is the modern, open successor to the legacy binary HWP format. Structurally similar to OOXML (.docx) and ODF (.odt), it is a ZIP-based package containing XML files. Unlike the proprietary binary .hwp format, .hwpx uses a published XML schema, making it far more amenable to programmatic manipulation. The Korean government designated HWPX as a national standard (KS X 6101) in 2014 and has been actively encouraging its adoption across public agencies.
Why This Matters
Millions of public sector workers in Korea create HWP/HWPX documents daily, yet this format is absent from Pandoc's rich conversion ecosystem. Adding HWPX support to Pandoc would:
Work Done So Far
I have been approaching this problem from two directions.
1. Python Prototype: pypandoc-hwpx
As a proof of concept, I developed pypandoc-hwpx, a Python-based conversion tool that takes Pandoc's JSON AST as input and produces HWPX output.
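To make the "Pandoc JSON AST in, ZIP/XML package out" idea concrete, here is a minimal sketch using only the Python standard library. It is not the code of pypandoc-hwpx: the function names (`inline_text`, `block_texts`, `build_package`), the `<section>` and `<p>` element names, and the member path `Contents/section0.xml` are illustrative assumptions, and the XML it writes does not follow the real HWPX schema.

```python
import io
import json
import zipfile
from xml.sax.saxutils import escape

# A tiny hand-written Pandoc JSON AST, in the shape `pandoc -t json` emits.
SAMPLE = json.loads("""
{"pandoc-api-version": [1, 23], "meta": {},
 "blocks": [
  {"t": "Header", "c": [1, ["title", [], []], [{"t": "Str", "c": "Title"}]]},
  {"t": "Para", "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
                      {"t": "Strong", "c": [{"t": "Str", "c": "world"}]}]}
 ]}
""")


def inline_text(inlines):
    """Flatten a list of Pandoc inline nodes to plain text."""
    parts = []
    for node in inlines:
        kind, content = node["t"], node.get("c")
        if kind == "Str":
            parts.append(content)
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        elif kind in ("Emph", "Strong"):
            parts.append(inline_text(content))
        # Every other inline type is ignored in this sketch.
    return "".join(parts)


def block_texts(doc):
    """Yield the plain text of each paragraph-like top-level block."""
    for block in doc["blocks"]:
        if block["t"] in ("Para", "Plain"):
            yield inline_text(block["c"])
        elif block["t"] == "Header":
            _level, _attr, inlines = block["c"]
            yield inline_text(inlines)


def build_package(doc):
    """Return the bytes of a ZIP archive holding one placeholder XML part."""
    paras = "".join(f"<p>{escape(text)}</p>" for text in block_texts(doc))
    xml = f'<?xml version="1.0" encoding="UTF-8"?><section>{paras}</section>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical member name, not taken from the HWPX specification.
        zf.writestr("Contents/section0.xml", xml)
    return buf.getvalue()
```

In real use the AST would come from `pandoc -t json input.md` rather than from `SAMPLE`, and the bytes returned by `build_package` would be written to a file. A real writer would also have to emit the schema-conformant parts and styles that this sketch leaves out.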
Currently supported features:
The project is published on PyPI (pip install pypandoc-hwpx) and has received positive reception on GeekNews (a Korean equivalent of Hacker News).
2. Haskell HWPX Writer (In Progress)
My ultimate goal is to contribute a native HWPX writer to Pandoc. I have been learning Haskell specifically for this purpose and currently have a working prototype. I am porting the conversion logic validated in the Python prototype into Haskell.
Technical Approach
The HWPX writer architecture follows patterns similar to the existing DOCX writer (T.P.W.Docx):
Block and Inline elements
(hp:, hh:, hc:, hs:)
--reference-doc
HWPX Document Structure
Questions for Discussion
I would like to hear the community's thoughts on including an HWPX writer in Pandoc:
Pandoc already handles several XML/ZIP-based formats beautifully — DOCX, ODT, EPUB — so I believe HWPX can follow the same established patterns. That said, I fully respect the community's judgment and would be glad to contribute in whatever form is considered most appropriate.
References
Thank you for your time and consideration.