This module downloads all web pages listed in a sitemap.xml file and compiles them into a single document.
Designed for AI Embedding Generation
Terminal
npm init -y && npm i sitemap2doc
Node index.mjs
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
await s2d.getDocument( {
'projectName': 'test',
'sitemapUrl': 'https://...'
} )
Terminal
node index.mjs
Key | Type | Description | Required | Default |
---|---|---|---|---|
projectName | String | Set project name | true | |
sitemapUrl | String | Set sitemap source | true | |
silent | Boolean | Control terminal output | false | false |
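To make the table concrete, here is a minimal sketch of how the three options could be checked and defaulted before use. `normalizeOptions` is a hypothetical helper written for this example, not part of sitemap2doc, and the module may validate its input differently. The sitemap URL is a placeholder.

```javascript
// Hypothetical validator mirroring the options table above.
// sitemap2doc itself may handle invalid input in another way.
function normalizeOptions( { projectName, sitemapUrl, silent = false } = {} ) {
    const errors = []
    if( typeof projectName !== 'string' || projectName === '' ) {
        errors.push( 'projectName must be a non-empty string' )
    }
    if( typeof sitemapUrl !== 'string' || sitemapUrl === '' ) {
        errors.push( 'sitemapUrl must be a non-empty string' )
    }
    if( typeof silent !== 'boolean' ) {
        errors.push( 'silent must be a boolean' )
    }
    if( errors.length > 0 ) {
        throw new Error( errors.join( '; ' ) )
    }
    // silent falls back to false when omitted, as in the Default column.
    return { projectName, sitemapUrl, silent }
}

const options = normalizeOptions( {
    'projectName': 'test',
    'sitemapUrl': 'https://example.com/sitemap.xml' // placeholder URL
} )
console.log( options )
```

Collecting all problems before throwing reports every missing key at once, which is convenient when both required keys are absent.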
Example
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
await s2d.getDocument( {
'projectName': 'test',
'sitemapUrl': 'https://...'
} )
Get Sitemap https://...
Get Pages 0 1 2 3 4 5 6 7 8 9
Merge 0
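The log above suggests three stages: fetch the sitemap, fetch each page, then merge. As a rough illustration of the first stage only, the sketch below pulls page URLs out of a sitemap XML string. `extractSitemapUrls` is a hypothetical helper written for this example, not part of sitemap2doc. The regex approach is an assumption that is good enough for simple sitemaps, and the sample XML is made up.

```javascript
// Hypothetical helper, not part of sitemap2doc: collect the <loc> entries
// from a sitemap XML string. A real XML parser would be more robust.
function extractSitemapUrls( xml ) {
    const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g
    return [ ...xml.matchAll( pattern ) ]
        .map( ( match ) => match[ 1 ] )
        // Sitemaps escape "&" as "&amp;" inside URLs.
        .map( ( url ) => url.replaceAll( '&amp;', '&' ) )
}

// Made-up sample input following the sitemaps.org urlset layout.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs?a=1&amp;b=2</loc></url>
</urlset>`

const urls = extractSitemapUrls( sampleXml )
console.log( urls )
```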
Get the current config. The default config can be found in ./src/data/config.mjs
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
let config = s2d.getConfig()
config['download']['chunkSize'] = 4
await s2d
.setConfig( { config } )
.getDocument( { ... } )
All module settings are stored in a config file (see ./src/data/config.mjs). This file can be completely overridden by passing an object during initialization.
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
let config = s2d.getConfig()
config['download']['chunkSize'] = 4
await s2d
.setConfig( { config } )
.getDocument( { ... } )
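The name `chunkSize` suggests that pages are downloaded in batches, although the text does not spell this out. Under that assumption, a batch split might look like the sketch below. `splitIntoChunks` is a hypothetical helper, not part of sitemap2doc, and the real download logic may differ.

```javascript
// Assumption: chunkSize caps how many pages are requested per batch.
// splitIntoChunks is a hypothetical helper, not part of sitemap2doc.
function splitIntoChunks( items, chunkSize ) {
    if( !Number.isInteger( chunkSize ) || chunkSize < 1 ) {
        throw new RangeError( 'chunkSize must be a positive integer' )
    }
    const chunks = []
    for( let i = 0; i < items.length; i += chunkSize ) {
        chunks.push( items.slice( i, i + chunkSize ) )
    }
    return chunks
}

// Ten page indices, matching the "Get Pages 0 1 2 ... 9" log line.
const pages = Array.from( { length: 10 }, ( _, index ) => index )
const batches = splitIntoChunks( pages, 4 )
console.log( batches )
```

With ten pages and a chunk size of 4, the last batch is shorter than the others, so any code consuming the batches should not assume a fixed length.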
The module is available as open source under the terms of the Apache 2.0 License.