Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom labels to get topic tree #2125

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

clfkenny
Copy link

What does this PR do?

get_topic_tree function currently does not accept custom_labels as a parameter whereas the other visualization methods do. Therefore this change aims to fix that issue so custom labels can be accepted and make the visualization more intuitive.

I really enjoyed looking through the codebase here. My first PR here so still learning the project structure, would appreciate any feedback. Thanks!

Fixes # #2123

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

Copy link
Owner

@MaartenGr MaartenGr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The code looks solid and I left minor comments.

One other thing, would you mind expanding the tests at tests/test_variations/test_hierarchy (test_tree) a bit to include the custom_labels=True?

Comment on lines +1916 to +1919
if isinstance(custom_labels, str):
for topic, kws_info in self.topic_aspects_[custom_labels].items():
label = "_".join([kw[0] for kw in kws_info[:5]]) # displaying top 5 kws
left_labels[topic] = label
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a check for when a user attempts to use a specific aspect but it is not found in self.topic_aspects_? I can imagine that a user (myself included) would mistype the name and the resulting error might not be entirely clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, I can add a check here. One thing I noticed when looking through how custom labels were handled in the plotting functions is that they all follow a pretty similar pattern:

if isinstance(custom_labels, str):

if isinstance(custom_labels, str):

if isinstance(custom_labels, str):

if isinstance(custom_labels, str):

if isinstance(custom_labels, str):

Maybe a more generic and reusable function can be created for this, which would check if the user specified aspect exists in self.topic_aspects_

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it makes no sense to do that check here but not at all of those other instances. For now, I'm alright either way. I can imagine wanting to stick with what I already use everywhere else for consistency as this might be a bit out of the scope of this particular PR. So it's up to you.

label = "_".join([kw[0] for kw in kws_info[:5]]) # displaying top 5 kws
left_labels[topic] = label
elif self.custom_labels_ is not None and custom_labels:
left_labels = {topic_id: label for topic_id, label in enumerate(self.custom_labels_, -self._outliers)}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to mention that I am going to steal that enumerate trick. That is so much nicer coding than how I have approached it thus far.

left_labels = {}
if isinstance(custom_labels, str):
for topic, kws_info in self.topic_aspects_[custom_labels].items():
label = "_".join([kw[0] for kw in kws_info[:5]]) # displaying top 5 kws
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that some topic aspects might not have 5 keywords but actually a single label. In those cases, the kws_info[0] is likely to be filled with a label (or even a summary) and each instance in kws_info[1:] will be an empty string. As a result, you might get the following label: "artificial intelligence____".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants