DOCX Fidelity Engine

Paste Word. Keep Structure. Ship Faster.

docsjs is a render-first Word pipeline for frontend teams. It imports clipboard content or .docx files into clean HTML snapshots that preserve advanced semantics: lists, merged tables, anchored media, notes, revisions, formulas, charts, and SmartArt fallback.

  • 50: automated tests passing
  • v0.1.5: current release line
  • 6: deep-fidelity tracks delivered
  • CI + Bench: quality gates and benchmark trend

npm i @coding01/docsjs

import { WordFidelityEditorReact } from "@coding01/docsjs/react";

<WordFidelityEditorReact
  lang="en"
  onChange={(payload) => console.log(payload.htmlSnapshot)}
  onError={(payload) => console.error(payload.message)}
/>

Deep-Fidelity Tracks

Anchor Layout v1

`wp:anchor` with relative references, overlap policy, z-layer, and wrap distance markers.

✅ done

Table Fidelity v1

`tblGrid/tcW`, merge spans, `tblBorders/tcBorders`, `tblCellSpacing`, and layout mode mapping.

✅ done
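
The merge-span part of this track can be illustrated with a small sketch. In WordprocessingML, `w:gridSpan` gives a cell's horizontal span, and `w:vMerge` marks vertical merges: `restart` opens a merge, and a continuation cell carries `continue` or no value. The `DocxCell` and `HtmlCell` shapes and the `resolveMergeSpans` function are hypothetical names for illustration, not docsjs internals. The sketch assumes the cells have already been read from the XML.

```typescript
// Hypothetical input shape: one entry per <w:tc>, already read from the XML.
interface DocxCell {
  text: string;
  gridSpan?: number;               // from <w:gridSpan w:val="n"/>
  vMerge?: "restart" | "continue"; // from <w:vMerge/>; a missing w:val means "continue"
}

// Hypothetical output shape: what an HTML <td> needs.
interface HtmlCell {
  text: string;
  colspan: number;
  rowspan: number;
}

// Convert DOCX rows to HTML rows. "continue" cells are dropped, and the
// rowspan of the "restart" cell above them in the same grid column grows.
function resolveMergeSpans(rows: DocxCell[][]): HtmlCell[][] {
  const out: HtmlCell[][] = [];
  // Per grid column: the HTML cell that currently owns a vertical merge.
  const open = new Map<number, HtmlCell>();

  for (const row of rows) {
    const htmlRow: HtmlCell[] = [];
    let col = 0; // grid column index of the current cell
    for (const cell of row) {
      const span = cell.gridSpan ?? 1;
      if (cell.vMerge === "continue" && open.has(col)) {
        open.get(col)!.rowspan += 1; // extend the cell that started the merge
      } else {
        const html: HtmlCell = { text: cell.text, colspan: span, rowspan: 1 };
        htmlRow.push(html);
        if (cell.vMerge === "restart") {
          open.set(col, html);
        } else {
          open.delete(col); // an unmerged cell ends any merge in this column
        }
      }
      col += span;
    }
    out.push(htmlRow);
  }
  return out;
}
```

Tracking merges by grid column rather than by cell index keeps the lookup correct when an earlier cell in the row spans several columns.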

OMML / Chart / SmartArt

Semantic fallback output for formulas, charts, and diagram nodes to keep downstream processing stable.

✅ done

Golden Benchmark

Corpus-driven fidelity score and trend artifacts in CI for release confidence.

✅ done

Anchor Collision Parity

Pixel-level text wrapping collision parity with desktop Word.

⏳ next

Math High Fidelity

OMML to MathML/KaTeX pipeline for production rendering quality.

⏳ next

Feature Checklist

Delivered capabilities and currently planned gaps.

Core

  • ✅ Web Component core (`docs-word-editor`)
  • ✅ React adapter + Vue adapter
  • ✅ Events and imperative public API
  • ✅ Strict-only parser strategy

Import Pipeline

  • ✅ Clipboard import (`text/html`, `text/plain`)
  • ✅ `.docx` upload + relationship media mapping
  • ✅ Clipboard image hydration (`file:/blob:/cid:`)
  • ✅ Output as stable HTML snapshot

Layout Fidelity

  • ✅ List reconstruction (`numId`, `ilvl`, `lvlText`)
  • ✅ Table v1 (`tblGrid/tcW`, merge, border, spacing)
  • ✅ Floating anchors v1 (`wp:anchor` metadata)
  • ⏳ Anchor collision parity (pixel-level wrap)
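
As a rough illustration of list reconstruction, the sketch below nests flat paragraphs by `numId` (list instance) and `ilvl` (zero-based level). It is a simplified sketch, not the docsjs implementation: `ListPara`, `escapeHtml`, and `buildNestedList` are hypothetical names, every list is emitted as `<ul>`, and `lvlText` formatting is ignored.

```typescript
// Hypothetical input shape: list paragraphs with their numbering properties.
interface ListPara {
  numId: number; // list instance, from <w:numId w:val="..."/>
  ilvl: number;  // zero-based nesting level, from <w:ilvl w:val="..."/>
  text: string;
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build nested <ul>/<li> markup from a flat paragraph sequence.
function buildNestedList(paras: ListPara[]): string {
  let html = "";
  let depth = -1; // -1 means no list is open
  let currentNumId: number | null = null;

  // Close open items and lists until only `target` levels remain open.
  const closeTo = (target: number): void => {
    while (depth > target) {
      html += "</li></ul>";
      depth--;
    }
  };

  for (const p of paras) {
    if (p.numId !== currentNumId) {
      closeTo(-1); // a new numId starts a new top-level list
      currentNumId = p.numId;
    }
    // Clamp skipped levels so a jump from level 0 to 2 nests only once.
    const level = Math.min(p.ilvl, depth + 1);
    if (level > depth) {
      html += "<ul>";
      depth = level;
    } else {
      closeTo(level);
      html += "</li>"; // close the previous sibling at this level
    }
    html += `<li>${escapeHtml(p.text)}`;
  }
  closeTo(-1);
  return html;
}
```

Choosing between `<ol>` and `<ul>`, or rendering custom markers, would additionally need the level definitions from the numbering part, which this sketch leaves out.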

Advanced Semantics

  • ✅ Footnotes / endnotes / comments
  • ✅ Revision markers (`ins` / `del`) + metadata
  • ✅ Page break semantic markers
  • ✅ DOCX hyperlink relationship + anchor mapping

Semantic Fallback

  • ✅ OMML fallback output
  • ✅ Chart semantic extraction fallback
  • ✅ SmartArt node fallback extraction
  • ⏳ OMML high-fidelity render pipeline (MathML/KaTeX)

Engineering Quality

  • ✅ 50 automated tests (regression + boundary)
  • ✅ Baseline snapshot regression framework
  • ✅ `verify` quality gate (lint/typecheck/test/build/size)
  • ✅ Parse report API for performance tuning

Architecture

Directory-style view from import to rendered snapshot.

1) Input Layer

  • Clipboard reader + upload reader
  • Source normalization and MIME routing
  • Legacy image URI hydration
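
MIME routing for clipboard input might look like the sketch below: prefer `text/html`, then fall back to `text/plain`. The `ClipboardSource` type and `routeClipboard` function are hypothetical; the `read` callback mirrors `DataTransfer.getData`, which returns an empty string when a type is absent.

```typescript
// Hypothetical routing result for the input layer.
type ClipboardSource =
  | { kind: "html"; content: string }
  | { kind: "text"; content: string }
  | { kind: "empty" };

// `read` stands in for DataTransfer.getData so the routing logic stays
// testable without a browser.
function routeClipboard(read: (mime: string) => string): ClipboardSource {
  const html = read("text/html");
  if (html.trim() !== "") {
    return { kind: "html", content: html };
  }
  const text = read("text/plain");
  if (text.trim() !== "") {
    return { kind: "text", content: text };
  }
  return { kind: "empty" };
}
```

In a paste handler this could be called as `routeClipboard((mime) => event.clipboardData?.getData(mime) ?? "")`.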

2) DOCX Parser Layer

  • ZIP relationship resolution
  • Style/list/table/anchor semantic extraction
  • Strict mode parse report counters

3) HTML Snapshot Layer

  • Stable semantic attributes for downstream systems
  • Runtime render compatibility normalization
  • Host-friendly output for storage/render

4) UI Adapter Layer

  • Web Component core (`docs-word-editor`)
  • React/Vue wrappers with unified props/events
  • Demo apps with bilingual controls

Quality Gates

All changes must pass lint, typecheck, tests, build, and size budget.

CI Pipeline

  • npm run lint
  • npm run typecheck
  • npm run test
  • npm run build + npm run sizecheck

Fidelity Tooling

  • Golden corpus benchmark trend artifacts in CI
  • Baseline snapshot regression for semantic stability
  • Parse report counters for optimization loops

Roadmap (Deep Fidelity)

Next targets aligned with desktop Word parity goals.

Layout Parity

  • ⏳ `wp:anchor` full floating layout restoration
  • ⏳ Anchor collision/wrap parity with desktop Word
  • ⏳ Word table auto-layout algorithm parity

Semantic Rendering

  • ⏳ OMML parser and MathML/KaTeX render pipeline
  • ⏳ Chart semantic extraction + render strategy
  • ⏳ SmartArt degradation and readability strategy

Quick Start


Events

  • docsjs-change: { htmlSnapshot, source, fileName? }
  • docsjs-error: { message }
  • docsjs-ready: { version }

Methods

  • loadHtml(rawHtml: string): void
  • loadDocx(file: File): Promise<void>
  • loadClipboard(): Promise<void>
  • getSnapshot(): string
  • clear(): void
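
A host application may want to validate event payloads before storing them. The sketch below types the `docsjs-change` payload as listed above and adds a runtime guard. It assumes the payload arrives on the event's `detail` property, as is usual for a `CustomEvent`; that is an assumption, and `DocsjsChangeDetail`, `isChangeDetail`, and `snapshotFromEvent` are hypothetical names.

```typescript
// Payload shape of docsjs-change as listed above.
interface DocsjsChangeDetail {
  htmlSnapshot: string;
  source: string; // the set of possible values is not specified here
  fileName?: string;
}

// Runtime guard, so untyped event data is checked before use.
function isChangeDetail(value: unknown): value is DocsjsChangeDetail {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["htmlSnapshot"] === "string" &&
    typeof v["source"] === "string" &&
    (v["fileName"] === undefined || typeof v["fileName"] === "string")
  );
}

// Accepts any event-like object; returns the snapshot or null if the
// payload does not match the expected shape.
function snapshotFromEvent(event: { detail?: unknown }): string | null {
  return isChangeDetail(event.detail) ? event.detail.htmlSnapshot : null;
}
```

Under the same assumption, a listener on the `docs-word-editor` element for `docsjs-change` could pass its event to `snapshotFromEvent` and persist the result only when it is not `null`.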

Docs Cases Showcase

Case A: Notes + Revisions + Page Break

Semantic markers for downstream review systems.

<sup data-word-comment-ref="5">[c5]</sup>
<ins data-word-revision="ins" data-word-revision-author="Alice">...</ins>
<del data-word-revision="del" data-word-revision-author="Bob">...</del>
<span data-word-page-break="1"></span>
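
A downstream review system could read these markers back out of a stored snapshot. The sketch below collects revision markers with a regular expression, relying only on the attribute names shown above. `RevisionMarker` and `extractRevisions` are hypothetical names, and the regex approach assumes revision elements are not nested; in a browser, a DOM parser would be the more robust choice.

```typescript
interface RevisionMarker {
  kind: "ins" | "del";
  author: string | null;
  text: string;
}

// Scan an HTML snapshot for <ins>/<del> elements carrying data-word-revision.
function extractRevisions(html: string): RevisionMarker[] {
  const pattern = /<(ins|del)\b([^>]*)>([\s\S]*?)<\/\1>/g;
  const markers: RevisionMarker[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    const attrs = m[2] ?? "";
    // Skip plain <ins>/<del> elements that are not Word revision markers.
    if (!/\bdata-word-revision="(?:ins|del)"/.test(attrs)) continue;
    const author = /\bdata-word-revision-author="([^"]*)"/.exec(attrs);
    markers.push({
      kind: m[1] as "ins" | "del",
      author: author ? (author[1] ?? null) : null,
      text: m[3] ?? "",
    });
  }
  return markers;
}
```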

Case B: Tables + Anchors + Advanced Semantics

Layout features with fallback compatibility.

<table style="table-layout:auto;border-collapse:separate;...">...</table>
<img data-word-anchor="1" data-word-wrap="square" data-word-anchor-relh="page" />
<span data-word-omml="1">x^(2)</span>
<figure data-word-chart="1" data-word-chart-type="bar">...</figure>
<figure data-word-smartart="1">...</figure>
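
To decide whether a snapshot needs extra rendering work, a host could count the fallback markers it contains. The sketch below is a rough string scan over the attribute names shown above; `FALLBACK_ATTRIBUTES` and `countFallbackMarkers` are hypothetical names, and the scan assumes attributes are written with double quotes as in the examples.

```typescript
const FALLBACK_ATTRIBUTES = [
  "data-word-omml",
  "data-word-chart",
  "data-word-smartart",
] as const;

type FallbackAttribute = (typeof FALLBACK_ATTRIBUTES)[number];

// Count how many elements in a snapshot carry each fallback marker.
function countFallbackMarkers(html: string): Record<FallbackAttribute, number> {
  const counts: Record<FallbackAttribute, number> = {
    "data-word-omml": 0,
    "data-word-chart": 0,
    "data-word-smartart": 0,
  };
  for (const attr of FALLBACK_ATTRIBUTES) {
    // Requiring `="` right after the name keeps data-word-chart from
    // also matching data-word-chart-type.
    const pattern = new RegExp(`\\s${attr}="`, "g");
    counts[attr] = (html.match(pattern) ?? []).length;
  }
  return counts;
}
```

A non-zero count for `data-word-omml`, for example, could signal that a formula renderer should be loaded for that document.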