Search nearly always returns too many meaningless results #4883

Closed
awbacker opened this issue Nov 23, 2017 · 14 comments
Labels
auto-locked Outdated issues that have been locked by automation C: search 'pip search' state: awaiting PR Feature discussed, PR is needed type: enhancement Improvements to functionality

Comments

@awbacker

  • Pip version: pip 9.0.1
  • Python version: Python 3.6.3
  • Operating system: OSX

Description:

With pip search I almost never see what I would consider to be logical results. Today was just the last straw after so many years. Pip is, in other respects, wonderful and almost 100% trouble free. It keeps getting better (yay 9.x!), but somehow search feels like the child that was left behind. It is nearly as bad as apt search :D

pip search django-filter | wc -l   : **6000+** results!
pip search django+filter | wc -l   : **6000+** results
pip search filter django | wc -l   : **5900+** results
# note: some packages print full doc files, skewing the results by +150 lines or so

The only logical way to use it is to do:

pip search django-filter | grep -i django-filter

I have the exact package name and it can't find it without trickery

To make things worse:

  • django-filter isn't even returned in the search results, anywhere
  • pip install django-filter works perfectly fine
  • note: django-filters with an s is not the same package

At least I can't find it with any search term I can devise. pip help search returns no useful directions, or hints of any fancy query language.

What I logically expect when using a search

  • if I can install it by a name, search should be able to find it by the same name
  • fuzzy search (-=>_, etc), case insensitive
  • exact matches of term entered (first)
  • partial matches, using starts-with
  • maybe return term anywhere in name, e.g. new-django-filter
  • a flag to search descriptions (since 99% of the time these results are bad)
  • a flag to enable regex-ish searching, e.g. *filters (that means filters is at the end)
  • paging of results (don't print out 6000! maybe show the total and a prompt for the next page)
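
A minimal sketch of the ordering described in the list above, assuming the search has some list of candidate project names to rank. This is not how pip search works; the function names (`normalize`, `rank`, `search_names`) and the page size are hypothetical. The name normalization follows the rule from PEP 503 (lowercase, runs of `-`, `_` and `.` collapsed to `-`), which is why `Django_Filter` counts as an exact match for `django-filter`.

```python
import re
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """PEP 503 style normalization: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def rank(query: str, name: str) -> Optional[int]:
    """Return 0 for an exact match, 1 for starts-with, 2 for a match
    anywhere in the name, and None when the name does not match."""
    q, n = normalize(query), normalize(name)
    if n == q:
        return 0
    if n.startswith(q):
        return 1
    if q in n:
        return 2
    return None


def search_names(
    query: str, names: List[str], page: int = 1, per_page: int = 20
) -> Tuple[int, List[str]]:
    """Return (total number of hits, the requested page of hits).

    Hits are ordered exact first, then starts-with, then contains,
    with ties broken alphabetically by normalized name.
    """
    hits = []
    for name in names:
        r = rank(query, name)
        if r is not None:
            hits.append((r, normalize(name), name))
    hits.sort()
    start = (page - 1) * per_page
    return len(hits), [name for _, _, name in hits[start:start + per_page]]


if __name__ == "__main__":
    # Hypothetical candidate list, using names mentioned in this issue.
    candidates = [
        "new-django-filter",
        "Django_Filter",
        "django-filters",
        "igitt-django",
        "odoo8-addon-account-payment-mode-term",
    ]
    total, first_page = search_names("django-filter", candidates)
    print(f"{total} results")
    for name in first_page:
        print(name)
```

Searching descriptions and regex-style patterns are left out on purpose; in this sketch they would be separate opt-in flags layered on top of the name ranking.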

I'm trying not to rip on pip in general, I do love it, but the search needs some TLC.

Funny results:

In the results for django-filter I have gems such as:

igitt-django (0.1.0.dev20170731112845)                     - A django app for storing IGitt objects.
pytest-tipsi-django (0.3.0)                                -
django-remote-finder (0.3)                                 - UNKNOWN
odoo8-addon-account-payment-mode-term (8.0.0.1.2.99.dev6)  - Account Banking - Payments Term Filter

Bad, but unrelated:

Search for django-limits (0.0.4). You'll see the package and a full documentation file printed out under it. This is distressingly common. search should trim the output to a line, since obviously there are lots of malformed package descriptions.
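
A rough sketch of the trimming suggested above, assuming the summary arrives as an arbitrary string that may contain a whole documentation file. The helper name `one_line_summary` and the 60-character width are hypothetical, and treating `UNKNOWN` as an empty summary is an assumption based on the sample output earlier in this issue.

```python
from typing import Optional


def one_line_summary(summary: Optional[str], width: int = 60) -> str:
    """Reduce a package summary to a single line of at most `width` characters."""
    text = (summary or "").strip()
    # Assumption: "UNKNOWN" is a placeholder for a missing summary.
    if not text or text.upper() == "UNKNOWN":
        return ""
    # Keep only the first line of a multi-line description.
    first = text.splitlines()[0].strip()
    if len(first) > width:
        # Leave room for the trailing "..." so the result fits in `width`.
        first = first[: width - 3].rstrip() + "..."
    return first


if __name__ == "__main__":
    print(one_line_summary("A django app for storing IGitt objects.\n\nUsage\n=====\n..."))
    print(repr(one_line_summary("UNKNOWN")))
```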

@pradyunsg pradyunsg added type: enhancement Improvements to functionality C: search 'pip search' state: awaiting PR Feature discussed, PR is needed labels Nov 23, 2017
@pradyunsg
Member

I agree that pip search needs some love.

There indeed are a lot of UX issues with it and it's really a matter of someone finding the time to come around and figure out how to solve them.

@awbacker if you want, you can take a shot at it. :)

@pfmoore
Member

pfmoore commented Nov 23, 2017

The big issue is that pip search simply calls the PyPI search, which is not very good. I believe Warehouse search is expected to be better. But we're mostly limited by the quality of the data the backend returns to us.

@awbacker
Author

Oh, I didn't realize that pypi was the source, I assumed that it went off a local cache of all package names/descriptions. In that case there isn't much to be done in pip itself (regarding results, at least), without implementing that caching ourselves.

@pradyunsg I'd actually be curious to take a stab at it, honestly. Seems suitably simple sounding but hard in reality. However, what would you recommend re: pypi searching? It doesn't seem reasonable to keep a local cache; either it would need to be updated explicitly or pulled from pypi too often. That is, if pypi even allows pulling. I've heard of warehouse, but that was a while ago. I'll go take a quick look again and see where it's at.

@pradyunsg
Member

> The big issue is that pip search simply calls the PyPI search, which is not very good.

I was only thinking from the UX point of view -- so this didn't click. pip doesn't actually store all that information locally. pip just hits an endpoint on PyPI and shows whatever it got which means improving pip search requires improving PyPI search -- which is a good thing?

> @pradyunsg I'd actually be curious to take a stab at it, honestly.

I guess a good place to start would be the warehouse docs -- https://warehouse.readthedocs.io. It's hosted at pypi.org currently on a low traffic capacity. :)

@pfmoore
Member

pfmoore commented Nov 24, 2017

> improving pip search requires improving PyPI search -- which is a good thing?

Agreed. I was just pointing out it's not a pip issue as such (and it may in fact already have been fixed in Warehouse - I don't follow Warehouse development)

@awbacker
Author

Search in warehouse works fine, actually. At least it's much better, and seems easier to improve.

The project seems to still be moving along, but I have no idea how far out it is from being "finished". Or from getting to the mythical deployed status, and becoming the default.

@xavfernandez
Member

xavfernandez commented Nov 24, 2017

Note that we could either use and improve warehouse search or we could also go with #395

@pradyunsg
Member

FWIW, Warehouse search improvements would probably just trickle down to everyone once the redirect from pypi.python.org to pypi.org is set up.

I thought #395 was the way to go earlier but now I'm not sure I understand enough about this entire pip search situation.

@awbacker
Author

https://pyfound.blogspot.com/2017/11/the-psf-awarded-moss-grant-pypi.html

It seems too good to be true, but if it really means warehouse will be deployed then I suggest closing this. The search there works fine, so that would solve everything in one go.

The issues mentioned in #395 are still relevant though, so if this is kept open then I may do some more research and see if it's something I can personally do or if it's beyond my ability.

@brainwane
Contributor

Hi! I'm the Warehouse project manager -- and helped write the blog post that just went up today, asking for package managers to test pypi.org. Here's the Warehouse roadmap and here's an overview of the remaining issues to resolve before we have Warehouse replacing legacy PyPI.

We do have several open search-related issues and would welcome help from anyone who knows, or feels like learning, Elasticsearch -- and here's our package querying API documentation. (For directions for getting set up, see our Getting Started Guide. If you are working on Warehouse issues and have questions, please feel free to ask them in the issue, in #pypa-dev on Freenode, or on the pypa-dev mailing list.)

Thanks for the pointer to #395.

@awbacker
Author

awbacker commented Feb 27, 2018

@brainwane At this stage is warehouse suitable for local testing as a private repo?

I don't actually have a pypi account; we have a private pypicloud instance that stores our (relatively limited) internal packages though. It would be nice to scrap that.

@di
Sponsor Member

di commented Feb 27, 2018

@awbacker I responded to you on IRC but I'll CC it here for posterity:

I wouldn’t recommend it. Warehouse is really only designed to be https://pypi.org, you’ll be much better off with projects designed to be private indices like pypicloud or devpi.

@di
Sponsor Member

di commented Jul 19, 2019

I think this issue could probably be closed in favor of either #395 or #5216.

@pradyunsg
Member

Yep. Thanks @di!

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Aug 19, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 19, 2019

6 participants