Indexing / Multilingual support / Add per field Analyzer #5911
* web/src/main/webResources/WEB-INF/data/config/index/records.json: Add fields langfre and anyfre with french analyzer.
records.json: Add properties lang... for different languages. copy_to to these properties.
* Remove schema for French only.
* Add asciifolding in French for better support of accents.
* Update doc.
* Dispatch keywords in proper full text lang field.
* If records are in one language, the record language is used by default (and no options are proposed to users).
* If records are in various languages, the following options are available:

Languages are ordered by frequency in records. The languages proposed do not take templates into account.
Language detection is made on the list of record's languages. The more languages you have, the lower the language detection precision.

Catalogue administrator can define the language strategy to use by default in UI configuration:
* queryBase: to adjust query
* searchOptions.language: to display language options
* languageStrategy

Language strategies:
* searchInUILanguage: search in UI languages, e.g. full text field is any.langfre if French
* searchInAllLanguages: search using any.* fields (no analysis is done, more records are returned)
* searchInDetectedLanguage: restrict the search to the language detected based on user search. If language detection fails, search in all languages.
* forceALanguage: depending on user selection, force a language.
…n in batch editing.
…stead of based on catalogue content).
…re of language config).
aba4b19 to 17649e4
Minor comments.
Thanks for this contribution!
web-ui/src/main/resources/catalog/components/elasticsearch/EsService.js
<div class="col-md-2">
<span data-translate="">searchWithLang</span>
</div>
<div class="col-md-10">
In case of large strings, maybe add a margin, or move
<div class="col-md-2">
<span data-translate="">searchWithLang</span>
</div>
=>
<div class="col-md-2">
<span data-translate="">searchWithLang</span>
</div>
<div title="{{'searchAllLanguages-help' | translate}}"
data-ng-show="optionsConfig.language"
class="row">
<div class="col-md-10">...</div>
</div>
@fxprunayre @josegar74 hi, any idea why this PR is still open? Could we merge this into main and propagate this improvement to customer projects? Thanks!
Probably because you told us that "no experimentation at all" is acceptable, and this is clearly an experimentation. Also maybe because we heard some comments when working on this implementation, and some of us are probably not in line with the approach of this work? And unfortunately, like many other PRs in GeoNetwork, a PR gets some care only:
Closing experimentation.
This commit is a squashed version of the #5911 PR
Handpicked some changes from geonetwork/core-geonetwork#5911 to use analyzers in main search. This requires regenerating the UI config.
The main goals are:
Search & indexing benefits
Index changes
The full text field `any` is now an object:

* Stop words: e.g. in French, `des` is a stopword and currently returns a lot of results if the analyzer does not take care of this (`des` does not match anymore; using `des` in search does not affect results).
* Ignore plurals
* Ignore accents and elision
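As a rough illustration of the index changes above, here is a minimal sketch of Elasticsearch analysis settings and of an `any` object with one sub-field per language. The names `french_folded` and `buildAnyField` are hypothetical and not taken from records.json. The filter chain follows the documented way of rebuilding the built-in French analyzer, with `asciifolding` added; its position at the end of the chain is an assumption.

```javascript
// Sketch only: analyzer and helper names are illustrative assumptions,
// not the actual content of records.json.
const frenchAnalysis = {
  filter: {
    // Strips elided articles, e.g. "l'eau" -> "eau".
    french_elision: {
      type: "elision",
      articles_case: true,
      articles: ["l", "m", "t", "qu", "n", "s", "j", "d", "c",
                 "jusqu", "quoiqu", "lorsqu", "puisqu"]
    },
    // Drops French stop words such as "des".
    french_stop: { type: "stop", stopwords: "_french_" },
    // Light stemming so that plurals match singular forms.
    french_stemmer: { type: "stemmer", language: "light_french" }
  },
  analyzer: {
    french_folded: {
      tokenizer: "standard",
      filter: ["french_elision", "lowercase", "french_stop",
               "french_stemmer", "asciifolding"]
    }
  }
};

// Builds the mapping of the `any` object: one text sub-field per language,
// each with its own analyzer (e.g. any.langfre, any.langeng).
function buildAnyField(analyzerByLanguage) {
  const properties = {};
  for (const [code, analyzer] of Object.entries(analyzerByLanguage)) {
    properties[`lang${code}`] = { type: "text", analyzer };
  }
  return { type: "object", properties };
}

const anyField = buildAnyField({ fre: "french_folded", eng: "english" });
```

Other text fields could then feed these sub-fields with `copy_to`, as the commit message mentions; the exact targets depend on the real mapping.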
Indexing / Languages configured
Analyzers added for:
Add synonyms
eg. bathy = bathymétrie, sig = ids, ...
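The synonym examples above could be expressed with an Elasticsearch `synonym` token filter. This is only a sketch: the filter name `catalogue_synonyms` and the helper `withSynonyms` are hypothetical, and appending the filter at the end of the chain is an assumption rather than the PR's actual configuration.

```javascript
// Sketch only: names are illustrative assumptions.
// Each rule lists equivalent terms separated by commas.
const synonymFilter = {
  type: "synonym",
  synonyms: ["bathy, bathymétrie", "sig, ids"]
};

// Returns a copy of an analyzer definition with the named filter appended.
// The input analyzer is left untouched.
function withSynonyms(analyzer, filterName) {
  return { ...analyzer, filter: [...analyzer.filter, filterName] };
}

const baseAnalyzer = { tokenizer: "standard", filter: ["lowercase"] };
const analyzerWithSynonyms = withSynonyms(baseAnalyzer, "catalogue_synonyms");
```

In the index settings, `synonymFilter` would be registered under `analysis.filter.catalogue_synonyms` so that the analyzer can reference it by name.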
User interface
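Languages are ordered by frequency in records. The languages proposed do not take templates into account.
Language detection is made on the list of record's languages. The more languages you have, the lower the language detection precision.
Catalogue administrator can define the language strategy to use by default in UI configuration:
* queryBase: to adjust query
* searchOptions.language: to display language options
* languageStrategy
Language strategies:
* searchInUILanguage: search in UI languages, e.g. full text field is any.langfre if French
* searchInAllLanguages: search using any.* fields (no analysis is done, more records are returned)
* searchInDetectedLanguage: restrict the search to the language detected based on user search. If language detection fails, search in all languages.
* forceALanguage: depending on user selection, force a language.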
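To make the four strategies named in this PR (searchInUILanguage, searchInAllLanguages, searchInDetectedLanguage, forceALanguage) more concrete, here is a minimal sketch of how a strategy could be turned into the list of full text fields to query. The function `fullTextFields` and the shape of its `context` argument are hypothetical and not the code in EsService.js. Falling back to `any.*` for an unknown strategy or a missing language is also an assumption, modelled on the documented fallback when detection fails.

```javascript
// Sketch only: function name and context shape are illustrative assumptions.
function fullTextFields(strategy, context = {}) {
  const allLanguages = ["any.*"];
  // Use the per-language field when a language code is known, else all fields.
  const forLanguage = (code) => (code ? [`any.lang${code}`] : allLanguages);

  switch (strategy) {
    case "searchInUILanguage":
      return forLanguage(context.uiLanguage);
    case "searchInDetectedLanguage":
      // When detection fails (no code), search in all languages.
      return forLanguage(context.detectedLanguage);
    case "forceALanguage":
      return forLanguage(context.selectedLanguage);
    case "searchInAllLanguages":
    default:
      return allLanguages;
  }
}

// Example: plug the fields into an Elasticsearch query_string query.
const exampleQuery = {
  query_string: {
    query: "bathymétrie",
    fields: fullTextFields("searchInUILanguage", { uiLanguage: "fre" })
  }
};
```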
To improve language detection, a whitelist of languages is computed based on records, but it can also be defined explicitly to focus on some languages only:
This is useful when you harvest records in various languages that you don't want to display, or when a single record contains a language rarely used in the other records.
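The whitelist logic could look roughly like the sketch below. It assumes the record languages come from an Elasticsearch terms aggregation (buckets with `key` and `doc_count`). The function `languageOptions`, its return shape, and the choice to filter record languages by the configured list (rather than replace them) are hypothetical.

```javascript
// Sketch only: function name, return shape and filtering rule are assumptions.
function languageOptions(buckets, configuredWhitelist = []) {
  // Order languages by frequency in records, most frequent first.
  const fromRecords = [...buckets]
    .sort((a, b) => b.doc_count - a.doc_count)
    .map((bucket) => bucket.key);

  // If a whitelist is configured, keep only the languages it names.
  const whitelist = configuredWhitelist.length > 0
    ? fromRecords.filter((code) => configuredWhitelist.includes(code))
    : fromRecords;

  return {
    whitelist,
    // With a single language there is nothing to choose from.
    showOptions: whitelist.length > 1,
    defaultLanguage: whitelist.length === 1 ? whitelist[0] : null
  };
}

const options = languageOptions(
  [
    { key: "eng", doc_count: 5 },
    { key: "fre", doc_count: 20 },
    { key: "ita", doc_count: 1 }
  ],
  ["fre", "eng"]
);
```

Here the rarely used language is dropped by the configured whitelist, and the remaining languages keep their frequency order.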
Admin / Search option configuration
Other fixes
Annex
Version 3 language configurations (for reference):