Skip to content

Proof of concept of Solr Ref Guide converted to asciidoc format & using Asciidoctor for publishing

Notifications You must be signed in to change notification settings

ctargett/refguide-asciidoc-poc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Solr Ref Guide in Asciidoc

This project is a proof of concept for reimplementing the Apache Solr Reference Guide as an Asciidoc-driven HTML site.

** AS OF 10 MAY 2017 THIS PROJECT HAS BEEN MERGED WITH SOLR

OLD README CONTENT FOLLOWS

Instead of the current approach that uses Confluence, HTML pages and PDFs could be generated using a publishing toolchain that is integrated with existing Lucene/Solr build processes.

Asciidoc has been chosen as the format for Ref Guide format. This is a plain-text light-weight markup language, similar to Markdown, but with more options for structured articles, manuals, books, etc.

See the wiki page in this repo Why Asciidoc for discussion of reasons to change from Confluence and why Asciidoc is a good choice.

What’s in this Repo?

This repo is a proof-of-concept for converting the Ref Guide to Asciidoc format. It includes:

  • solr-ref-guide directory

    • This showcases what the "final" ref-guide files would look like in our Git repo for maintaining & building the Ref Guide.

    • Only this directory, and the files in it, would be copied/migrated into the official lucene-solr Git repo.

  • confluence-export directory

    • This contains tools for converting the raw confluence XML+HTML exports into .adoc files (with images) which will be written into solr-ref-guide/src.

    • This directory is also where the raw exports should be placed by folks interested in updating the converted docs:

      • confluence-export/raw-export - put an unzipped confluence HTML export here

      • confluence-export/raw-xml-export - put an unzipped confluence XML export here

  • build.xml

    • this top level build.xml has a single convert-raw-confluence-exports which handles converting the raw Confluence exports into confluence-export/cleaned-export and then doing the final conversion to solr-ref-guide/src/*.adoc files

Site Generation Process

Asciidoctor is designed as a toolchain for publishing content. Confluence provides a similar toolchain, so when thinking about replacing it, we must think about each aspect of the process and how it can be replaced.

Confluence provides a full end-to-end solution, from editing content to publishing a PDF. It includes features such as page comments and search, which are vital to community involvement and successful navigation of a topic as broad as Solr.

This project proposes the following:

  • Raw content files will be stored in Asciidoc (.adoc) format.

    • These files will contain metadata attributes indicating the "hierarchical" structure of the documents.

  • Custom Java tooling will be used to process the .adoc file metadata to build up navigation data files

  • Jekyll will be used for creation of a full HTML site.

    • Sidebar-based navigation between pages is built into the Jekyll templates used for each page.

    • The sidebar will be generated from a sidebar.json file produced by our custom Java tooling during site build.

  • Asciidoctor-pdf will be used for conversion to PDF.

    • The PDF will be generated from a programatically generated .adoc file built by our custom java tooling which includes all "real" .adoc files in the correct order.

More on these options below - for specifics, see solr-ref-guide/README.adoc.

Asciidoctor

Asciidoctor is a toolchain written in Ruby which facilitates converting text files to other formats for publishing.

Asciidoctor is both the markup language AND the conversion tool. The Asciidoctor GitHub repositories include several projects for build integration, content preview, content styling, and other tools required for a robust publishing solution.

HTML

It’s possible to use Asciidoctor to generate HTML pages. However, the HTML output is limited to what is defined in the source content itself. There is no ability to add site navigation or add customized JavaScript. These more robust features can be added with a static site generator like Jekyll which is what we will be using.

Jekyll

Jekyll is a static site generator, which uses templates to create complete websites from raw content files in Asciidoc or Markdown (and other simple markup formats).

Jekyll is the most well-known of currently available static site generators, and has the most robust development and user community, and is the most mature.

Jekyll is written in Ruby and depends on native libraries, so it won’t be viable to use JRuby to invoke jekyll via java, so building the HTML ref guide site will require that Jekyll be installed locally beforehand.

We’re also using the jekyll-asciidoc plugin, which would also need to be installed locally as a dependency. The build.xml file still needs to be updated to try and detect if these gems are installed as part of the build process, and give a clean error w/install instructiones if they are not.

The current POC uses a Jekyll Documentation Theme developed by Tom Johnson, so the POC has a UX that is very different from out-of-the-box Jekyll.

I have additionally modified Tom’s theme quite a bit:

  • The original theme implemented several features that are available from Asciidoctor and the jekyll-asciidoc plugin (which his theme does not use). I’ve removed most of these features from the theme and rely instead on the Asciidoctor functionality.

  • Modified the styling (fonts, colors, etc.) to conform to the Solr Style Guide (more to do here).

  • Started a long process to unify the CSS. Tom’s theme has about 5 stylesheets, and I had 2 more, so some culling was necessary. There might be some copyright issues there that need to be fixed before incorporating into a live site.

The borrowed theme implements "search", but it is not a full-text search engine. It’s a keyword index of titles and page description metadata.

TO DO

  • Fix Jekyll targets in build.xml to download the Ruby dependencies (Ruby, Jekyll, and jekyll-asciidoc).

  • Fix broken tag pages.

  • Further CSS consolidation.

PDF

We’ll use the asciidoctor-ant plugin which includes asciidoctorj which itself includes asciidoctorj-pdf, an implementation of asciidoctor-pdf an Asciidoctor project to generate PDFs.

This means that even without Jekyll (or Ruby) installed, anyone will be able to build the Ref Guide PDF using jars that can be obtained from maven central via ivy.

There are some interesting issues that still need to be overcome:

  • Additionally, there is an open issue in asciidoctor-pdf to make page breaks in sections configurable. Without this, page breaks are happening after every h2 level, which is used extensively in the source files. Thus, there are a lot more page breaks than there should be.

  • The files will pick up any custom Asciidoctor rules added to an individual file - such as to add a table of contents section. This probably can be overridden.

As for styling, a YAML-based theme file is required to define colors, fonts, sizes, etc. This implements many of the features of CSS.

Overall, this plugin is the easiest to use of the various options (another option requires converting the content to DocBook format first, then to PDF), but is still in an alpha stage (as of 10 Nov 2016), so many features are still pending for future releases.

TO DO

  • Check compression requirements.

Comments

Comments are one of the two main reasons why a static site generator is required to have a full-featured replacement for Confluence.

This POC uses the Apache Comment System. See this repo’s wiki page Comments for more details on this system.

Because static site generators are template driven, it’s simple to add JavaScript snippets to the template for each page. Variables allow filling in a page ID (this POC uses a page shortname) as each page is generated, which pull in the comments from the comment system.

Migration of existing comments from Confluence was briefly considered and rejected as too complicated. Comments are only available from the XML export from Confluence, while we needed the HTML export for effective content conversion. Then there is a question of if they are worth migrating - my own view is that they are not.

For more on how these decisions were made, see this repo’s wiki page, Comments.

TO DO

  • Style the comments sections (custom style in progress in comments.css).

Open Questions

Location of Source

Should the content source live in a separate tree?

Should the content source live in a new sub-directory of the Solr Git repo?

Organization of Files

How should we organize the Ref Guide pages in the directory tree?

  • As chapters, with a folder for each main subject heading.

  • As one big directory of files.

Some examples of how others have done it are available in this repo’s wiki page File Organization.

Hosting Options

Without Confluence, we will need to determine how and where to host the rendered pages. Some initial ideas:

  1. Host in ASF CMS with website.

  2. Host however the javadocs are hosted.

How will we provide search?

Recommend probably indexing generated HTML pages. Could use bin/post from Solr to recurse over the HTML files and index them. In this case, we will need to figure out where to host Solr.

About

Proof of concept of Solr Ref Guide converted to asciidoc format & using Asciidoctor for publishing

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published