Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance problems with new groups search/filter and large databases #2852

Closed
AEgit opened this issue May 19, 2017 · 23 comments
Closed

Performance problems with new groups search/filter and large databases #2852

AEgit opened this issue May 19, 2017 · 23 comments
Assignees
Labels
bug Confirmed bugs or reports that are very likely to be bugs groups type: performance
Milestone

Comments

@AEgit
Copy link

AEgit commented May 19, 2017

JabRef 4.0.0-dev--snapshot--2017-05-18--master--018173ebd
Windows 10 10.0 amd64
Java 1.8.0_131

The newly implemented groups filter/search (#2588) exhibits massive performance problems when used on a large database (>10,000 entries, ~1,000 static groups).

  1. Open large database
  2. Start to enter any term you want to search for in the "Filter Groups" part of the groups panel.
  3. You will immediately see a massive performance decrease. It takes minutes to just complete entering the word you want to search for (because the filtering starts immediately). The groups filtering itself also takes minutes to complete. During that time JabRef behaves as if it had crashed.

This issue has been reported before (#2588 (comment), #2588 (comment)) but I have been advised to open a new ticket so that the issue is not forgotten.

@AEgit AEgit changed the title Performance problems with new groups search and large databases Performance problems with new groups search/filter and large databases May 19, 2017
@tobiasdiez tobiasdiez added bug Confirmed bugs or reports that are very likely to be bugs groups labels May 19, 2017
@tobiasdiez tobiasdiez added this to the v4.0 milestone May 19, 2017
@tobiasdiez
Copy link
Member

tobiasdiez commented May 20, 2017

This should be fixed in the latest development version. For a very large groups tree, the filtering process still takes a moment and is not as fluid as I would like but at least it works now. Could you please check the build from http://builds.jabref.org/master/. Thanks!

@AEgit
Copy link
Author

AEgit commented May 20, 2017

JabRef 4.0.0-dev--snapshot--2017-05-20--master--01e854829
Windows 10 10.0 amd64
Java 1.8.0_131

I can confirm, that the performance has drastically improved. A group search takes no longer a couple of minutes. However, as you say, the performance is still not very high for large databases. In my case it takes 14 sec after entering a search/filter term for the group to be displayed. If you are switching often from groups to groups, I think this is a bit too much.
As a workaround: Would it be possible to not start immediately filtering/searching for groups while entering the search term? I. e., could you make the groups search/filter so that you have to press "Enter" to actually start the filtering/searching. This might already improve the subjective performance (?).
I would suggest to keep this issue open.

Thank you so much for your work!

@tobiasdiez tobiasdiez reopened this May 20, 2017
@tobiasdiez tobiasdiez removed this from the v4.0 milestone May 20, 2017
Siedlerchr added a commit that referenced this issue May 27, 2017
* upstream/master: (23 commits)
  Implement #2785: resort groups using drag & drop (#2864)
  Add Library of Congress as ID-fetcher (#2865)
  Fix export and import of MS office day/year/month acessed fields (#2862)
  Adsurl to url (#2861)
  Update LICENSE.md
  Update
  Update LICENSE.md
  Update license file so that github recognize it properly
  Improve Issue Template Using a Collapsible Log Area
  Fix #2852: Improve performance of group filtering.
  Rename GroupSelector to GroupSidePane
  Fix #2843: Show information correctly in entry editor
  Remove old entry editor code related to focus selection
  Implement feedback
  Menu Greek Translation (#2836)
  Relaxed the regex to also match negative timezone formats when parsing pdf annotation dates (#2841)
  Update localization
  Remove unnecessary group code (and move remaining settings to preferences)
  Add Local Maven repo as first lookup resource, to avoid having duplicate libs in gradle and maven
  Implement #2786: Allow selection of multiple groups
  ...
Siedlerchr added a commit that referenced this issue Jun 3, 2017
* upstream/master: (38 commits)
  Add link to "feature branch workflow"
  Support Annotations Created by Foxit (#2878)
  Fixes jacoco by excluding the fetcher tests from analysis (#2877)
  Fix entry editor (#2875)
  update bcprov-jdk15on from 1.56 -> 1.57
  update assertj-swing-junit from 3.5.0 -> 3.6.0
  update mockito-core from 2.7.22 -> 2.8.9
  update jfx from 0.11.0 -> 0.11.1
  update google guava from 21.0 -> 22.0
  Fix Divide by zero exception if rows is zero in Entry Editor Tab (#2873)
  Implement #2785: resort groups using drag & drop (#2864)
  Add Library of Congress as ID-fetcher (#2865)
  Fix export and import of MS office day/year/month acessed fields (#2862)
  Adsurl to url (#2861)
  Update LICENSE.md
  Update
  Update LICENSE.md
  Update license file so that github recognize it properly
  Improve Issue Template Using a Collapsible Log Area
  Fix #2852: Improve performance of group filtering.
  ...
@lenhard
Copy link
Member

lenhard commented Oct 23, 2017

When we are done with optimizing the entry editor, the time has come to address the performance issues in the groups panel.

I'll be so enthusiastic and add this to the 4.1 milestone. We'll see how it works out.

@lenhard lenhard added this to the v4.1 milestone Oct 23, 2017
@AEgit
Copy link
Author

AEgit commented Oct 23, 2017

That sounds awesome: Thank you very much!

@halirutan
Copy link
Collaborator

@AEgit To profile this, I need an example database of the size you mentioned. Can you point me to one?

@AEgit
Copy link
Author

AEgit commented Oct 23, 2017

@halirutan I think @lenhard should still have my example database with 6,500 entries. My current one consists of over 12,000 entries, but I think the old example should be fine. Please let me know, if you don't have access to that example database - if that is not the case, I'll send you a new example database via email (for various reasons I would prefer not to make the database available to the public, so I cannot upload it here).

@halirutan
Copy link
Collaborator

@AEgit Send me yours to my mail on GitHub. If I have time, I look into it and see if I can point out some hot spots we should look in to.

@AEgit
Copy link
Author

AEgit commented Oct 23, 2017

@halirutan Database has been sent. Thank you for your help!

@lenhard
Copy link
Member

lenhard commented Oct 23, 2017

@AEgit @halirutan Sorry for being late. We have the prior file as well, but if you've already exchanged the new one, there's no reason to send it again, I guess.

@halirutan
Copy link
Collaborator

I guess I found the problem and @lenhard or @tobiasdiez should be able to provide a fix easily. When I see this right, then the flaw lies here:

    private boolean showNode(T t) {
        if (filter.get() == null) {
            return true;
        }

        if (filter.get().test(t)) {
            // Node is directly matched -> so show it
            return true;
        }

        // Are there children (or children of children...) that are matched? If yes we also need to show this node
        return childrenFactory.call(t).stream().anyMatch(this::showNode);
    }

What happens is that every node in the groups view decides for itself if it is visible or not. To do this, intermediate nodes need to check if any of their children are visible because then, they need to be visible as well; no matter if they themselves match.

With a nested group structure of say depth 3, this means that nodes in level 1 check themselves and all their children. Despite the fact, that we now already calculated the visibility information of all children, each node at level 2 will do the exact same work again.

To give an example: Consider 1 root node with 3 children and each of the children again 3 children. This gives 13 nodes we need to check. What we instead do is

  1. Check for the root node: tests all 13 nodes
  2. Check for each of the 3 first level children: 3x4 nodes
  3. Check each level 2 child: 9 nodes

Makes 34 node tests altogether. We need to remember the visibility of a node once we have calculated it for a particular search term.

@halirutan
Copy link
Collaborator

I believe one crucial step, no matter if we can increase the performance of the group-search itself, is to unbind the search field from an instant search action. I have tried to do this, by replacing the direct bind with

 final Timer searchTask = FxTimer.create(Duration.ofMillis(400), () -> {                          
    LOGGER.info("Run Search " + searchField.getText());                                          
    viewModel.filterTextProperty().setValue(searchField.textProperty().getValue());              
});                                                                                              
searchField.textProperty().addListener((observable, oldValue, newValue) -> searchTask.restart());

This works pretty well. So we only start the search 400ms after the user typed. Everytime the user hits a key in the search-field, the timer is restarted. With this, editing the search-field doesn't lag even with large databases.

@AEgit I have pushed a new branch fix-GroupSearchPerformance that includes this. Would you mind to try this version on your machine? I have tested it with your database, but my pc might be somewhat more powerful.

@AEgit
Copy link
Author

AEgit commented Nov 16, 2017

JabRef 4.1-dev--snapshot--2017-11-16--fix-GroupSearchPerformance--08d8a2ca0
Windows 10 10.0 amd64
Java 1.8.0_151

@halirutan Thank you very much! This is exactly why I wanted to express in my convoluted comment: #2852 (comment)

This makes the group search/filter much faster in terms of user experience. So, this is definitely an improvement!
Something I noticed, however, is that after a couple of group searches/filter actions, the filtering becomes again slower. Similarly, when removing the groups filter, it takes a couple of seconds every time to display the complete group structure.
I'm just wondering, why the groups filter search requires much more resources than the normal search. If I use "groups = mykeyword", the articles belonging to the respective group will be shown immediately (what's missing is just showing the respective group in the groups panel). Is it the filtering/searching itself that takes so much time in the groups filter process, or is it the displaying part, that requires so much CPU power?

Note also, that I'm running JabRef on a quite powerful machine (Core i7 4+4HT cores with 2.2 GHz, 16 GB RAM, SSD), so on other machines this might take a bit longer.

Thank you for your help!

EDIT: I also found another bug with the group filter, which is reported here:
#3436

@halirutan
Copy link
Collaborator

Something I noticed, however, is that after a couple of group searches/filter actions, the filtering becomes again slower.

@AEgit I did not fix the overal complexity issue I pointed out in #2852 (comment). The recursive groups tree is implemented using a very general recursive structure that is used in other places too. Currently, I don't see how I can fix this on my own and be sure I don't break anything else. Sorry.

@AEgit
Copy link
Author

AEgit commented Nov 16, 2017

@halirutan Yes, thank you very much! I just mentioned those issues, to make sure that other people following this thread would notice that - while much has been done to improve the performance - there still persist some issues (caused by the structural problem you mentioned).

@lenhard
Copy link
Member

lenhard commented Nov 17, 2017

@halirutan Thanks for looking into this and implementing an improvement. I think you could go for a PR with your delayed filtering.

Regarding the overall complexity problem, I had a look at this some time ago, but did not arrive at a proper solution. The underlying problem is that every node tries to decide its own visibility locally. But ultimately it is not a decision that can be made based on local knowledge, because it depends on its children. This means that the current solution has to be rewritten and the filtering needs to be triggered once and only once from the root of the group tree. This requires some major changes to the node structure and I am also unsure how to fix this without breaking something else.

@halirutan
Copy link
Collaborator

halirutan commented Nov 17, 2017

But ultimately it is not a decision that can be made based on local knowledge, because it depends on its children. This means that the current solution has to be rewritten and the filtering needs to be triggered once and only once from the root of the group tree

@lenhard Exactly. There is another solution which, unfortunately, is highly impractical as well: The node stores the result of the last run and knows the last search term. This, however, means that you need to cache the Predicate in the current templated implementation which comes down to holding a reference to the complete NodeViewModel when I saw this right. The node could then decide if the predicate is the same and use the result of the last search and doesn't need to dive into children again.

I haven't profiled it carefully, but my guess is that as it stands the implementation has memory problems when searching repeatedly in large groups. At least this is suggested by what @AEgit described with "after a couple of group searches/filter actions, the filtering becomes again slower".

@tobiasdiez tobiasdiez self-assigned this Dec 18, 2017
@halirutan
Copy link
Collaborator

halirutan commented Dec 20, 2017

@AEgit Would you test this PR http://builds.jabref.org/fix-GroupSearchPerformance/

It should be considerably faster and it contains (a) a delay in the search field so that it waits until you finished typing and (b) a cache that prevents the described repeated calculation of visibility.

@AEgit
Copy link
Author

AEgit commented Dec 21, 2017

@halirutan Thank you very much! The behaviour in:

JabRef 4.1-dev--snapshot--2017-12-20--fix-GroupSearchPerformance--12aab7e26
Windows 10 10.0 amd64
Java 1.8.0_151

is much much better! A nice Christmas present ;)

@koppor koppor modified the milestones: v4.1, v4.2 Dec 22, 2017
@tobiasdiez
Copy link
Member

tobiasdiez commented Dec 27, 2017

@AEgit I changed a few things to the code. Can you please have a look, if it is still working for you. Thanks. The link is the same http://builds.jabref.org/fix-GroupSearchPerformance/ but it's a newer version (in a few minutes).

@AEgit
Copy link
Author

AEgit commented Dec 27, 2017

JabRef 4.1-dev--snapshot--2017-12-27--fix-GroupSearchPerformance--4246be769
Windows 10 10.0 amd64
Java 1.8.0_151

@tobiasdiez Unfortunately the new version is not working as expected. While it still filters the groups, when a filtered group is selected, the associated entries are not updated accordingly in the main table view. This was not the case with the previous version:

JabRef 4.1-dev--snapshot--2017-12-20--fix-GroupSearchPerformance--12aab7e26
Windows 10 10.0 amd64
Java 1.8.0_151

@AEgit AEgit closed this as completed Dec 27, 2017
@AEgit AEgit reopened this Dec 27, 2017
@tobiasdiez
Copy link
Member

Thanks for the feedback @AEgit. This bug should be fixed now as well.

@halirutan
Copy link
Collaborator

@AEgit Tobi fixed the bug and the branch is now merged. I haven't had the time to profile through the current state, but when Tobi fixed the repeated creation of tree elements it should be acceptable now. Anyway, great work of testing JabRef with incredibly large databases so that we get a feeling for the bottlenecks.

@AEgit
Copy link
Author

AEgit commented Jan 4, 2018

@tobiasdiez and @halirutan Thank you very much! I can confirm, that this bug is fixed in:

JabRef 4.2-dev--snapshot--2018-01-03--master--193bbbca6
Windows 10 10.0 amd64
Java 1.8.0_151

The only remaining issue with the groups filtering is the one reported here:
#3436

Thanks a lot for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bugs or reports that are very likely to be bugs groups type: performance
Projects
None yet
Development

No branches or pull requests

5 participants