Commit

Improve the documentation of slope and pivot
Witiko committed May 18, 2019
1 parent a2f4c7e commit 71f55e3
Showing 1 changed file with 16 additions and 14 deletions.
30 changes: 16 additions & 14 deletions gensim/models/tfidfmodel.py
@@ -267,7 +267,7 @@ class TfidfModel(interfaces.TransformationABC):
 """
 def __init__(self, corpus=None, id2word=None, dictionary=None, wlocal=utils.identity,
-wglobal=df2idf, normalize=True, smartirs=None, pivot=None, slope=0.65):
+wglobal=df2idf, normalize=True, smartirs=None, pivot=None, slope=0.25):
 r"""Compute TF-IDF by multiplying a local component (term frequency) with a global component
 (inverse document frequency), and normalizing the resulting documents to unit length.
 Formula for non-normalized weight of term :math:`i` in document :math:`j` in a corpus of :math:`D` documents
@@ -326,23 +326,25 @@ def __init__(self, corpus=None, id2word=None, dictionary=None, wlocal=utils.iden
 For more information visit `SMART Information Retrieval System
 <https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System>`_.
 pivot : float, optional
-See the blog post at https://rare-technologies.com/pivoted-document-length-normalisation/.
+In pivoted document length normalization, the effective norm of a document is the weighted average of the
+old norm and a pivot: :math:`\text{slope}\times\text{old norm} + (1.0 - \text{slope})\times\text{pivot}`.
-Pivot is the point around which the regular normalization curve is `tilted` to get the new pivoted
-normalization curve. In the paper `Amit Singhal, Chris Buckley, Mandar Mitra:
-"Pivoted Document Length Normalization" <http://singhal.info/pivoted-dln.pdf>`_ it is the point where the
-retrieval and relevance curves intersect.
+When `pivot` is None, `smartirs` specifies either the unique (`u`) or the character-length (`b`) pivoted
+document length normalization scheme, and either `corpus` or `dictionary` are specified, then the pivot will
+be determined automatically. Otherwise when `pivot` is None, pivoted document length normalization will be
+disabled.
+Default is None.
-This parameter along with `slope` is used for pivoted document length normalization.
+See also the blog post at https://rare-technologies.com/pivoted-document-length-normalisation/.
+slope : float, optional
+In pivoted document length normalization, the effective norm of a document is the weighted average of the
+old norm and a pivot: :math:`\text{slope}\times\text{old norm} + (1.0 - \text{slope})\times\text{pivot}`.
-When `pivot` is None, and `smartirs` specifies the pivoted unique document normalization scheme (u), and
-either `corpus` or `dictionary` are specified, then the pivot will be determined automatically.
+Setting `slope` to 0.0 uses only the pivot as the norm, and setting `slope` to 1.0 disables pivoted document
+length normalization. Singhal suggests setting `slope` between 0.2 to 0.3 for best results.
+Default is 0.25.
-When `pivot` is None, and `smartirs` specifies the character length unique document normalization
-scheme (b), and `dictionary` is specified, then the pivot will be determined automatically.
-slope : float, optional
-Parameter required by pivoted document length normalization which determines the slope to which
-the `old normalization` can be tilted. This parameter only works when pivot is defined.
 See also the blog post at https://rare-technologies.com/pivoted-document-length-normalisation/.
 See Also
 --------
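The weighted-average formula that the new `pivot` and `slope` docstrings describe can be sketched in plain Python. This is an illustration only, not Gensim's implementation: the helper name `pivoted_norm` and the sample values are hypothetical, and the "old norm" is treated as an already computed number.

```python
def pivoted_norm(old_norm, pivot, slope=0.25):
    """Return slope * old_norm + (1.0 - slope) * pivot.

    Hypothetical helper for illustration; not part of Gensim's API.
    """
    return slope * old_norm + (1.0 - slope) * pivot


# Sample values chosen for illustration only.
pivot = 4.0
long_doc_norm = 10.0   # old norm above the pivot
short_doc_norm = 2.0   # old norm below the pivot

# With the new default slope of 0.25, norms are pulled toward the pivot.
long_effective = pivoted_norm(long_doc_norm, pivot)    # 0.25 * 10.0 + 0.75 * 4.0
short_effective = pivoted_norm(short_doc_norm, pivot)  # 0.25 * 2.0 + 0.75 * 4.0

# slope=0.0 uses only the pivot; slope=1.0 returns the old norm unchanged.
pivot_only = pivoted_norm(long_doc_norm, pivot, slope=0.0)
unchanged = pivoted_norm(long_doc_norm, pivot, slope=1.0)

print(long_effective, short_effective, pivot_only, unchanged)
```

With a slope below 1.0, a document whose old norm exceeds the pivot ends up with a smaller effective norm, and a document whose old norm is below the pivot ends up with a larger one.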
