HTQL is a SQL-like query language designed for extracting data from HTML structures. Its syntax is simple and intuitive, making it easy to use within other SQL adapters or applications.
With HTQL, you can select elements by type, filter them by attribute, and pull data from remote URLs, all using familiar SQL syntax.
Use the SELECT statement to query HTML structures. Examples:
SELECT * FROM ./test.html -- Select all elements from a local file
SELECT p, div, h2 FROM ./test.html -- Select specific elements (p, div, h2)
SELECT * FROM ./test.html WHERE attributes.class = 'title' -- Filter by the class attribute
SELECT * FROM ./test.html WHERE attributes IS NOT NULL -- Keep only elements whose attributes are not null
SELECT * FROM ./test.html WHERE attributes.class = 'title' OR attributes.id = 'content' -- Match either condition
SELECT span FROM ./test.html WHERE attributes.class = 'title' AND attributes.id = 'content' -- Match both conditions
SELECT span FROM ./test.html WHERE attributes.class = 'title' AND NOT attributes.id = 'content' -- Match the first condition, exclude the second
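To make the meaning of the WHERE examples concrete, here is a minimal Python sketch of what the last query above conceptually does: collect elements, keep the requested tag, then filter on attributes. This is an illustration only, not HTQL's implementation. The names ElementCollector, select, and SAMPLE are hypothetical, and the sketch assumes that attributes.class compares the whole attribute value as a string.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; store it as a dict.
        self.elements.append({"tag": tag, "attributes": dict(attrs)})


def select(html, tags=None, where=None):
    """Rough analogue of SELECT <tags> FROM <html> WHERE <predicate>."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    rows = collector.elements
    if tags is not None:
        # Analogue of naming element types after SELECT.
        rows = [row for row in rows if row["tag"] in tags]
    if where is not None:
        # Analogue of the WHERE clause, applied to each element's attributes.
        rows = [row for row in rows if where(row["attributes"])]
    return rows


# Made-up sample document for this sketch.
SAMPLE = """
<div id="content">
  <span class="title" id="content">A</span>
  <span class="title" id="intro">B</span>
  <p class="title">C</p>
</div>
"""

# SELECT span ... WHERE attributes.class = 'title' AND NOT attributes.id = 'content'
rows = select(
    SAMPLE,
    tags={"span"},
    where=lambda a: a.get("class") == "title" and not a.get("id") == "content",
)
print(rows)
```

Only the second span satisfies both parts of the condition, so it is the only row kept. Whether HTQL treats a multi-valued class attribute the same way is not something this sketch can confirm.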
You can also query data directly from remote HTML documents by specifying the URL:
SELECT p, div, h2 FROM https://example.com
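As a rough illustration of how a FROM source could be resolved to HTML text, the Python sketch below treats anything with an http or https scheme as remote and everything else as a local path. This is an assumption made for the example, not a description of how HTQL itself resolves sources. The function names is_remote and read_source are hypothetical.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source):
    """Assumption: a source is remote when it has an http(s) scheme."""
    return urlparse(source).scheme in ("http", "https")


def read_source(source):
    """Return the HTML text behind a FROM source (URL or local file path)."""
    if is_remote(source):
        with urlopen(source, timeout=10) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    with open(source, encoding="utf-8") as handle:
        return handle.read()


print(is_remote("https://example.com"))
print(is_remote("./test.html"))
```

Calling read_source("https://example.com") performs a network request, so it is left out of the example run above.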