Releases: zoomio/tagify

Release v0.62.0

30 Mar 04:01
8a9b629
  • bumped Go to 1.22;
  • bumped github.com/zoomio/inout to v0.14.0;
  • introduced the UserAgent option (-ua in CLI mode) for passing a custom user agent to headless HTTP calls.

Release v0.61.0

08 Jul 04:04
09deee9
  • bumped Go to 1.20;
  • bumped github.com/zoomio/inout to v0.13.0.

Release v0.60.2

04 Sep 23:19
  • fixed the segmenter's dictionary loader for Chinese & Japanese languages.

Release v0.60.1

25 Aug 22:59
612d60d
  • BREAKING: from now on ContentOnly option is set to true by default;
  • optimization: moved the segmenter into the config with lazy initialization, so it is now initialized only once;
  • fix: when language detection is reliable, the correct detected value is now used;
  • fix: use the same segmenter logic in the plain text processor.
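
The lazy, once-only initialization mentioned above can be sketched with Go's sync.Once. This is a minimal illustration under stated assumptions, not tagify's actual code: the config and segmenter types, the Segmenter method and the loadCount field are all hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// segmenter is a hypothetical stand-in for an expensive-to-build text segmenter.
type segmenter struct {
	dict map[string]struct{}
}

// config is a hypothetical configuration type that owns the segmenter.
type config struct {
	once      sync.Once
	seg       *segmenter
	loadCount int // how many times the dictionary was loaded
}

// Segmenter returns the shared segmenter, building it on first use only.
func (c *config) Segmenter() *segmenter {
	c.once.Do(func() {
		// Assumption: loading the dictionary is the costly step worth deferring.
		c.loadCount++
		c.seg = &segmenter{dict: map[string]struct{}{"你好": {}, "世界": {}}}
	})
	return c.seg
}

func main() {
	cfg := &config{}
	a := cfg.Segmenter()
	b := cfg.Segmenter()
	// Both calls return the same instance and the loader ran once.
	fmt.Println(a == b, cfg.loadCount) // prints: true 1
}
```

Keeping the sync.Once next to the value it guards means every processor that shares the config also shares the single initialized segmenter.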

Release v0.60.0

22 Aug 06:56
5eec79e
  • graduated ContentOnly option (-content option in the CLI mode);
  • BREAKING: from now on -content option in the CLI mode is set to true by default.

Release v0.59.0

21 Aug 22:17
  • use different segmentation logic based on the github.com/go-ego/gse segmenter for Chinese & Japanese languages;
  • improved HTML parser logic: optimised the way it collects contents of a document and improved logic for splitting into sentences;
  • fall back to English stop words when language detection is not reliable;
  • added the lang option to the CLI for specifying the language of the document;
  • bumped github.com/zoomio/stopwords to 0.11.0.
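
The stop-word language selection described in this release could look roughly like the sketch below. It is an assumption-laden illustration, not the library's implementation: the detection type and stopWordsLang function are hypothetical, and the rule that an explicitly provided language takes precedence over detection is assumed rather than stated in the notes.

```go
package main

import "fmt"

// detection is a hypothetical result of automatic language detection.
type detection struct {
	Lang     string // detected language code, e.g. "ja"
	Reliable bool   // whether the detector trusts its own guess
}

// stopWordsLang picks the language whose stop words should be used.
// Assumption: an explicit language (e.g. from a CLI option) wins over detection.
func stopWordsLang(d detection, explicit string) string {
	if explicit != "" {
		return explicit
	}
	if d.Reliable && d.Lang != "" {
		return d.Lang
	}
	// Detection is unreliable: fall back to English.
	return "en"
}

func main() {
	fmt.Println(stopWordsLang(detection{Lang: "ja", Reliable: true}, ""))    // prints: ja
	fmt.Println(stopWordsLang(detection{Lang: "ja", Reliable: false}, ""))   // prints: en
	fmt.Println(stopWordsLang(detection{Lang: "ja", Reliable: false}, "zh")) // prints: zh
}
```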

Release v0.58.0

02 Aug 02:44
  • stopped ignoring <h1> elements whose text equals the <title>; they are now included.

Release v0.57.0

02 Aug 02:11
62248c4
  • Bumped github.com/zoomio/inout to 0.12.0;
  • Fixed the -q option, or Query in code (HTTP/HTML mode only), so it now works and retrieves the contents of the DOM element matching the query;
  • Introduced the -r option, or WaitFor in code (HTTP/HTML mode only), to wait for a certain DOM element to be ready before getting the HTML;
  • Introduced the -u option, or WaitUntil in code (HTTP/HTML mode only), to wait for a certain delay before getting the HTML;
  • Introduced the -i option, or Screenshot in code (HTTP/HTML mode only), to capture a full screenshot of the HTML at the given path.

Release v0.56.1

30 Jul 11:07
  • Added macOS (darwin) ARM64 release.

Release v0.56.0

30 Jul 10:31
20c2165
  • Bumped Go to 1.18;
  • BREAKING: renamed ParseHTML, ParseMD & ParseText to ProcessHTML, ProcessMD & ProcessText respectively;
  • BREAKING: renamed extension.Result to extension.ExtResult;
  • New option AllTagWeights for enabling parsing through everything;
  • New option ExcludeTagsString for prohibiting some of the tags;
  • ParseHTML & ParseMD are made public to open up parsing capabilities.