Releases: zoomio/tagify
Release v0.62.0
- bumped Go to 1.22;
- bumped `github.com/zoomio/inout` to v0.14.0;
- introduced `UserAgent` (`-ua` in CLI mode) to allow passing a custom user agent for headless HTTP calls.
Release v0.61.0
- bumped Go to 1.20;
- bumped `github.com/zoomio/inout` to v0.13.0.
Release v0.60.2
- fixed dictionary loader for segmenter for Chinese & Japanese languages.
Release v0.60.1
- BREAKING: from now on the `ContentOnly` option is set to `true` by default;
- optimization: moved the segmenter inside the config with lazy initialization, so it is now initialized only once;
- fix: when language detection is reliable, it now uses the correct value;
- fix: use the same segmenter logic in the plain text processor.
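The lazy initialization mentioned in the optimization note can be illustrated with a minimal sketch built on `sync.Once` from the Go standard library. This is not the actual tagify code: the `config` and `segmenter` types, the `loads` counter and the `Segmenter` method are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// segmenter is a hypothetical stand-in for an expensive-to-build word segmenter.
type segmenter struct {
	dict map[string]bool
}

// config is a hypothetical configuration object that owns the segmenter.
type config struct {
	once  sync.Once
	seg   *segmenter
	loads int // counts how many times the dictionary was built (demo only)
}

// Segmenter builds the segmenter on first use and returns the cached
// instance on every later call.
func (c *config) Segmenter() *segmenter {
	c.once.Do(func() {
		c.loads++
		c.seg = &segmenter{dict: map[string]bool{"tokyo": true}}
	})
	return c.seg
}

func main() {
	c := &config{}
	a := c.Segmenter()
	b := c.Segmenter()
	fmt.Println(a == b, c.loads) // prints "true 1"
}
```

Keeping the `sync.Once` inside the config means every processor sharing that config reuses one segmenter, and the dictionary is not loaded at all if no segmentation is requested.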
Release v0.60.0
- graduated the `ContentOnly` option (`-content` option in the CLI mode);
- BREAKING: from now on the `-content` option in the CLI mode is set to `true` by default.
Release v0.59.0
- use different segmentation logic based on the `github.com/go-ego/gse` segmenter for Chinese & Japanese languages;
- improved HTML parser logic: optimised the way it collects the contents of a document and improved the logic for splitting into sentences;
- fall back to the English language for the stop words when language detection is not reliable;
- added `lang` option to the CLI to be able to provide the language of the document;
- bumped `github.com/zoomio/stopwords` to `0.11.0`.
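The stop-words fallback and the `lang` option described above could be combined roughly as in the sketch below. It is an assumption-laden illustration, not the library's implementation: the `detection` type and the `stopWordsLang` function are hypothetical, and the precedence (explicit language first, then a reliable detection, then English) is assumed rather than taken from the release notes.

```go
package main

import "fmt"

// detection is a hypothetical result of an automatic language detector.
type detection struct {
	Lang     string // detected language code, e.g. "de"
	Reliable bool   // whether the detector trusts its own result
}

// stopWordsLang picks the language used to select stop words.
// Assumed precedence: explicit language, reliable detection, English.
func stopWordsLang(explicit string, d detection) string {
	if explicit != "" {
		return explicit
	}
	if d.Reliable && d.Lang != "" {
		return d.Lang
	}
	return "en"
}

func main() {
	fmt.Println(stopWordsLang("", detection{Lang: "de", Reliable: true}))   // prints "de"
	fmt.Println(stopWordsLang("", detection{Lang: "de", Reliable: false}))  // prints "en"
	fmt.Println(stopWordsLang("ja", detection{Lang: "zh", Reliable: true})) // prints "ja"
}
```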
Release v0.58.0
- stopped ignoring `<h1>` elements when they are equal to the `<title>`, so they are now included.
Release v0.57.0
- Bumped `github.com/zoomio/inout` to `0.12.0`;
- Fixed `-q` option or `Query` in the code (HTTP/HTML mode only), so now it actually works and retrieves the contents of the DOM element for the query;
- Introduced `-r` option or `WaitFor` (HTTP/HTML mode only) to allow waiting for a certain DOM element to be ready before getting HTML;
- Introduced `-u` option or `WaitUntil` (HTTP/HTML mode only) to allow waiting for a certain delay before getting HTML;
- Introduced `-i` option or `Screenshot` (HTTP/HTML mode only) to capture a full screenshot of HTML at the given path.
Release v0.56.1
- Added macOS (darwin) ARM64 release.
Release v0.56.0
- Bumped Go to 1.18;
- BREAKING: renamed `ParseHTML`, `ParseMD` & `ParseText` to `ProcessHTML`, `ProcessMD` & `ProcessText` respectively;
- BREAKING: renamed `extension.Result` to `extension.ExtResult`;
- New option `AllTagWeights` for enabling parsing through everything;
- New option `ExcludeTagsString` for prohibiting some of the tags;
- `ParseHTML` & `ParseMD` are made public to open up parsing capabilities.