shfshanyue/markdown-read

Markdown Read

Convert any URL to Markdown.

Try it online: HTML To Markdown

Tech Stack

  • @mozilla/readability for extracting the readable content from HTML
  • turndown for converting HTML to Markdown
  • jsdom for parsing HTML

Usage

You need Node.js installed on your system. Then install markdown-read globally:

$ npm i -g markdown-read

# Convert a web page to Markdown
$ markdown https://example.com
## Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

[More information...](https://www.iana.org/domains/example)

Options

  • --header: Add custom headers to the request. This can be useful for setting user-agent strings or other HTTP headers required by the target website.

Example:

$ markdown https://httpbin.org/get --header 'User-Agent: Markdown Reader'
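
The `--header` value above follows the usual `Name: value` form. As a rough illustration of how such strings could map onto a headers object for the `headers` option, here is a minimal sketch. The helpers `parseHeader` and `parseHeaders` are hypothetical and not part of markdown-read; the CLI's actual parsing may differ.

```typescript
// Hypothetical helper (not part of markdown-read): split one "Name: value"
// string into a [name, value] pair, splitting on the first colon only so
// that values such as URLs keep their own colons.
function parseHeader(raw: string): [string, string] {
  const idx = raw.indexOf(':');
  if (idx <= 0) {
    throw new Error(`Invalid header (expected "Name: value"): ${raw}`);
  }
  return [raw.slice(0, idx).trim(), raw.slice(idx + 1).trim()];
}

// Hypothetical helper: merge several header strings into one plain object.
// Later entries overwrite earlier ones with the same name.
function parseHeaders(raws: string[]): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const raw of raws) {
    const [name, value] = parseHeader(raw);
    headers[name] = value;
  }
  return headers;
}
```

Splitting on the first colon only is a deliberate choice here: a value like `Referer: https://example.com` would otherwise be cut off at the scheme.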

Supported Platforms

markdown-read includes special handling for various platforms, including:

  1. Juejin (掘金)
  2. Zhihu (知乎)
  3. Cnblogs (博客园)
  4. WeChat Official Accounts Platform (微信公众号平台)
  5. SegmentFault
  6. GitHub
  7. dev.to
  8. CSDN
  9. MDN

API Reference

markdown(url: string, options?: ReadOptions): Promise<MarkdownContent | null>

Converts a web page to Markdown format.

  • url: The URL of the web page to convert
  • options: Optional settings for document retrieval
    • headers: Additional headers to include in the request
    • fetcher: Custom function to fetch the HTML content

Returns a Promise that resolves to a MarkdownContent object or null if conversion fails.
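
Because the function can resolve to `null`, callers need to handle the failure case explicitly. The sketch below shows one way to do that. The option field types and the `convertOrThrow` wrapper are assumptions made for illustration; only the overall shape `(url, options?) => Promise<... | null>` comes from the signature above. The converter is passed in as a parameter so the snippet stays self-contained.

```typescript
// Option shape inferred from the list above. The exact field types are
// assumptions, not taken from the package's type definitions.
interface ReadOptions {
  headers?: Record<string, string>;
  fetcher?: (url: string) => Promise<string>; // assumed signature
}

// Any function matching the documented shape of `markdown`.
type MarkdownFn<T> = (url: string, options?: ReadOptions) => Promise<T | null>;

// Hypothetical wrapper: turn the `null` failure case into a thrown error
// so callers can use ordinary try/catch or promise rejection handling.
async function convertOrThrow<T>(
  markdown: MarkdownFn<T>,
  url: string,
  options?: ReadOptions,
): Promise<T> {
  const result = await markdown(url, options);
  if (result === null) {
    throw new Error(`Could not convert ${url} to Markdown`);
  }
  return result;
}
```

With the package installed, the first argument would presumably be the `markdown` function described above, for example `convertOrThrow(markdown, 'https://example.com', { headers: { 'User-Agent': 'Markdown Reader' } })`.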

turndown(html: string): string

Converts HTML content to Markdown.

  • html: The HTML string to convert

Returns the Markdown representation of the input HTML.

Advanced Features

  • Handles lazy-loaded images by setting their src attribute.
  • Extracts byline information from meta tags.
  • Supports platform-specific processing for various websites.
  • Uses Mozilla's Readability for content extraction.
  • Allows custom fetching logic through the fetcher option.
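
To make the lazy-loaded image handling more concrete, here is a minimal sketch of the general technique: copy the real image URL from a placeholder attribute into `src`. The attribute names in `LAZY_ATTRS` and the `resolveLazyImage` helper are assumptions for illustration; which attributes markdown-read actually checks is not stated here.

```typescript
// Minimal element shape so the sketch does not depend on a DOM library.
// jsdom elements (and browser elements) satisfy this interface.
interface ImgLike {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
}

// Assumed attribute names commonly used by lazy-loading scripts.
const LAZY_ATTRS = ['data-src', 'data-original'];

// Hypothetical helper: if the image carries its real URL in a lazy-loading
// attribute, copy the first non-empty one into `src`. Images without such
// an attribute are left untouched.
function resolveLazyImage(img: ImgLike): void {
  for (const attr of LAZY_ATTRS) {
    const real = img.getAttribute(attr);
    if (real) {
      img.setAttribute('src', real);
      return;
    }
  }
}
```

Running this over every `img` element before the HTML-to-Markdown step means the generated Markdown links to the real image rather than to a placeholder.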