Convert DOCX documents

Manfredi Marceca edited this page Mar 24, 2025 · 4 revisions

DOCX to RTF

  1. Install the DocSharp.Docx package from NuGet

  2. Use the following code:

var converter = new DocxToRtfConverter();
converter.Convert(inputFile, outputFile); // file paths or streams; inputFile may also be a WordprocessingDocument object

To customize the font and paragraph formatting that are applied when the document does not specify them, use the DefaultSettings property:

converter.DefaultSettings.FontName = "Calibri"; 
converter.DefaultSettings.FontSize = 11; // In points (default is 12)
converter.DefaultSettings.SpaceAfterParagraph = 0; // In points (default is 8)
converter.DefaultSettings.LineSpacing = 1; // In lines (default is 1.15)
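
The units above differ from the ones RTF itself uses, which matters when inspecting the generated output. The following is a minimal sketch of the standard RTF unit conversions, not DocSharp's actual implementation: the helper name ToRtfControlWords is hypothetical, and the converter may emit these values differently (for example in a style table).

```csharp
using System;

// Hypothetical helper (not part of DocSharp), shown only to illustrate RTF units:
//   \fsN  font size in half-points
//   \saN  space after paragraph in twips (1 point = 20 twips)
//   \slN  line spacing; with \slmult1, 240 corresponds to a single line
static string ToRtfControlWords(double fontSizePt, double spaceAfterPt, double lineSpacing)
{
    int fs = (int)Math.Round(fontSizePt * 2);
    int sa = (int)Math.Round(spaceAfterPt * 20);
    int sl = (int)Math.Round(lineSpacing * 240);
    return $@"\fs{fs}\sa{sa}\sl{sl}\slmult1";
}

// Defaults listed above: 12 pt font, 8 pt after paragraph, 1.15 lines.
Console.WriteLine(ToRtfControlWords(12, 8, 1.15)); // \fs24\sa160\sl276\slmult1
// Customized values from the example: 11 pt font, 0 pt, single line.
Console.WriteLine(ToRtfControlWords(11, 0, 1));    // \fs22\sa0\sl240\slmult1
```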

DOCX to RTF string

To produce an RTF string rather than directly saving to a file path or stream:

var converter = new DocxToRtfConverter();
string rtf = converter.ConvertToString(inputFile);

DOCX to Markdown

  1. Install the DocSharp.Docx package from NuGet

  2. Use the following code:

var converter = new DocxToMarkdownConverter();
converter.Convert(inputFile, outputFile); // file paths or streams; inputFile may also be a WordprocessingDocument object

Since many Markdown processors (e.g. GitHub) don't support base64 images, to enable image conversion you need to set the ImagesOutputFolder and ImagesBaseUriOverride properties. ImagesOutputFolder specifies where images are actually saved and should be an absolute directory path. ImagesBaseUriOverride is the first part of a local or online URI, which is combined with the image file name and written to the Markdown file.
For example, to save images in the same folder as the Markdown document:

// Assumes the Markdown file is written next to the input DOCX file.
converter.ImagesOutputFolder = Path.GetDirectoryName(inputFilePath);
converter.ImagesBaseUriOverride = ""; // will produce just the image file name, same effect as "./"
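
To make the relationship between the base URI and the image file name more concrete, here is a minimal sketch of how the two could be combined into a Markdown image reference. BuildImageReference is a hypothetical helper written for illustration only; it is not part of DocSharp, and the converter's actual combining logic may differ.

```csharp
using System;

// Hypothetical helper (not part of DocSharp): joins a base URI and an image
// file name into a Markdown image reference.
static string BuildImageReference(string baseUri, string fileName, string altText = "")
{
    // An empty base yields just the file name, i.e. a path relative to the Markdown file.
    string target = string.IsNullOrEmpty(baseUri)
        ? fileName
        : baseUri.TrimEnd('/') + "/" + fileName;
    return $"![{altText}]({target})";
}

Console.WriteLine(BuildImageReference("", "image1.png"));
// ![](image1.png)
Console.WriteLine(BuildImageReference("https://example.com/images/", "image1.png"));
// ![](https://example.com/images/image1.png)
```

With an empty base URI the reference only resolves if the images sit next to the Markdown file, which is why the output folder and the base URI need to be chosen together.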

DOCX to Markdown string

To produce a Markdown string rather than directly saving to a file path or stream:

var converter = new DocxToMarkdownConverter();
string markdown = converter.ConvertToString(inputFile);

Math blocks

Mathematical formulas in the DOCX document will be converted to LaTeX syntax and embedded in a block like the following:

$x=\dfrac{-b\pm \sqrt{b^{2}-4ac}}{2a}$

Please note that not all Markdown processors support math blocks, and that formatting and non-mathematical content are not currently supported when producing the LaTeX syntax.

DOCX to plain text

To extract plain, unformatted text from DOCX documents, use the following code:

var converter = new DocxToTxtConverter();
converter.Convert(inputFile, outputFile); // file paths or streams; inputFile may also be a WordprocessingDocument object

Text will be extracted from most elements, including paragraphs, hyperlinks, text boxes and tables.

Tables

Table layout is maintained when converting to plain text. For example, if the table has 2 rows and 3 columns the following output will be produced:

+---+---+---+  
| 1 | 2 | 3 |  
+---+---+---+  
| 4 | 5 | 6 |  
+---+---+---+

Multi-line paragraphs, lists and merged cells are supported, but nested tables are ignored.
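
The grid layout shown above can be sketched in a few lines of code. This is an illustrative sketch only, not DocSharp's implementation: RenderGrid is a hypothetical helper that handles plain single-line cells and does not cover multi-line paragraphs, lists or merged cells.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of DocSharp): renders single-line cells as a text grid.
static string RenderGrid(string[][] rows)
{
    int columns = rows.Max(r => r.Length);

    // Each column is as wide as its longest cell.
    int[] widths = Enumerable.Range(0, columns)
        .Select(c => rows.Max(r => c < r.Length ? r[c].Length : 0))
        .ToArray();

    // Separator line, e.g. +---+---+---+ (cell width plus one space of padding per side).
    string separator = "+" + string.Join("+", widths.Select(w => new string('-', w + 2))) + "+";

    var sb = new StringBuilder();
    sb.AppendLine(separator);
    foreach (string[] row in rows)
    {
        // Missing cells in short rows are rendered as empty.
        var cells = Enumerable.Range(0, columns)
            .Select(c => " " + (c < row.Length ? row[c] : "").PadRight(widths[c]) + " ");
        sb.AppendLine("|" + string.Join("|", cells) + "|");
        sb.AppendLine(separator);
    }
    return sb.ToString();
}

// Prints the same 2 x 3 grid as the example above.
Console.Write(RenderGrid(new[]
{
    new[] { "1", "2", "3" },
    new[] { "4", "5", "6" },
}));
```

Because every column is padded to a fixed number of characters, the borders only line up when each character has the same width.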

It is recommended to use a monospaced font (such as Cascadia Code, Consolas or Courier) in the text editor used to view the result (e.g. Notepad or VS Code), so that the characters are aligned correctly.

Open XML SDK extension methods

The SaveTo extension method can be used to save a WordprocessingDocument object to a separate DOCX, RTF or Markdown document:

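using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument document = WordprocessingDocument.Create("document.docx", WordprocessingDocumentType.Document))
{
     // Build a minimal document: body > paragraph > run > text.
     MainDocumentPart mainPart = document.AddMainDocumentPart();
     mainPart.Document = new Document();
     Body body = mainPart.Document.AppendChild(new Body());
     Paragraph paragraph = body.AppendChild(new Paragraph());
     Run run = paragraph.AppendChild(new Run());
     run.AppendChild(new Text("Add some text here."));
     // SaveTo is the DocSharp extension method; import its namespace as well.
     document.SaveTo("document.rtf", SaveFormat.Rtf);
}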