Stale search results due to SOLR latency & reindex failures #628
Comments
Ok, looking at one of these, OL3392693A, and inserting a publisher Science Research Associates with: The second response: Four more responses have no works or editions (three having been deleted but not deindexed, and one simply being blank of works): |
A few minutes later that's changed from: |
@hornc I see that on 18 Jan 2016 you reverted spam at https://openlibrary.org/works/OL102793W/The_Jesuits_in_North_America_in_the_seventeenth_century?m=history |
From looking at the production updater logs, there are numerous timeouts when POSTing updates to http://ol-solr2:8983/solr/update. This causes the updater script to die and then re-spawn. The batch of changes for which the POST failed (the last 60 seconds' worth of changed items, or more than 100 items) is not updated in SOLR. I'm theorising that some item updates always cause the POST to time out, since they never get indexed when added to the /admin/solr list.
Trying to determine how often this happens: roughly 100 times per day. |
So roughly every 900 seconds it loses 60 seconds of transactions? Ugh! Is there a regular time interval, or is that just a mean figure? |
That does sound bad doesn't it :( It is just a mean figure. From what I have seen in the logs, sometimes it is only one record in the 60 second batch, but other times there are other, presumably ok updates that are not being committed too. I'm still digging into the causes, but have eliminated a range of potential issues in the OL code and updater script (and adding tests and docstrings to the OL code!), so hopefully I am close to finding the root. |
Would reducing the batch size be a potential workaround for this issue? |
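As a rough illustration of the batch-size idea, here is a minimal sketch (not the actual Open Library updater code) that halves a batch whenever the POST fails, so that one problematic item cannot take the whole batch down with it. The function name `post_with_bisect` and the injected `post` callable are hypothetical; a real `post` would wrap the HTTP request to the Solr update endpoint and raise on timeout.

```python
from typing import Callable, List, Sequence


def post_with_bisect(
    batch: Sequence[str],
    post: Callable[[Sequence[str]], None],
) -> List[str]:
    """Post a batch of keys; on failure, split it in half and retry each half.

    `post` is a hypothetical callable that sends the keys to the index and
    raises OSError (TimeoutError is a subclass) when the request fails.
    Returns the keys that still fail when posted on their own.
    """
    if not batch:
        return []
    try:
        post(batch)
        return []
    except OSError:
        if len(batch) == 1:
            # A single item that fails by itself is the likely culprit.
            return list(batch)
        mid = len(batch) // 2
        # Retry each half separately so the healthy items still get indexed.
        return post_with_bisect(batch[:mid], post) + post_with_bisect(batch[mid:], post)
```

The returned list could be written to a log so that items which always time out are recorded rather than silently dropped. The trade-off is extra requests whenever a batch fails.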
So, what's implicit in closing this is that the Feb. 10 hypothesis about this being caused by Solr update timeouts was incorrect. I believe that that is, in the main, true, but I wonder if we've got a (smaller) hole where timeouts or other errors in the Solr updater cause us to lose updates. Do we have a way of detecting and/or recovering from that scenario? Should we be doing periodic full reindexing sweeps? |
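A periodic sweep of the kind asked about here could, in principle, compare what the database holds with what the index holds. The following is only a sketch under stated assumptions: both sides can be dumped as a mapping of record key to last-modified timestamp, the timestamps are ISO 8601 strings in one consistent format (so plain string comparison orders them correctly), and the name `find_stale` and the keys in the usage example are invented for illustration.

```python
from typing import Dict, List, Tuple


def find_stale(
    db_modified: Dict[str, str],
    index_modified: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Compare database and index snapshots.

    Both arguments map a record key to its last-modified timestamp
    (assumed: ISO 8601 strings in the same format on both sides).
    Returns (to_delete, to_reindex):
      to_delete  - keys in the index with no database record
                   (deleted but never deindexed)
      to_reindex - keys missing from the index, or indexed from an
                   older revision than the database now holds
    """
    to_delete = sorted(k for k in index_modified if k not in db_modified)
    to_reindex = sorted(
        k
        for k, modified in db_modified.items()
        if k not in index_modified or index_modified[k] < modified
    )
    return to_delete, to_reindex


# Usage with made-up keys and timestamps:
db = {"/works/X1W": "2016-02-10T00:00:00", "/works/X2W": "2016-02-11T00:00:00"}
index = {
    "/works/X1W": "2016-02-10T00:00:00",
    "/works/X2W": "2016-01-01T00:00:00",
    "/authors/X9A": "2015-06-01T00:00:00",
}
to_delete, to_reindex = find_stale(db, index)
```

This only detects drift; feeding `to_reindex` back through the normal updater would still fail for any record that cannot be indexed, which is why the failures need to be logged as well.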
@tfmorris I think you are correct. I was wrong that it was the cause of this issue, but the logs appeared to show it happening, so it must be having some negative effect. This issue turned out to leave no error traces in the logs. I'm trying to go over all the things I've learned and raise new issues for what is outstanding. I think there are likely multiple outstanding issues. I want to add issues separating and characterising them, and get an idea of their impact. As an example, I have discovered that there is no automated way for orphaned editions to be removed from Solr when they are fixed by associating them with a work, so the orphan will appear as effectively a duplicate in search results. There may already be an open issue for this one. I'll need to look for it. |
#891 fixed a real bug, but these records are still not being removed because there are still Edition records that reference these Authors. This is an additional data problem with the records on top of the code bug. Once the data is tidied and the stale reference removed, these will be reindexable. |
@hornc Do you have an example of an edition record which references a to-be-deleted author? Edition pages only show the work author, as far as I can tell, even though I can see the reference in the JSON. |
See |
A year on from my above, https://openlibrary.org/books/OL9761061M.json still shows the old OL1422909A even though the also-linked work OL5805701W shows OL5207025A. |
The existing cases should be fixed by the solr reindex. I created a new issue to track the reindexing progress: #2222. Note, though, that the root cause as to why these aren't getting updated during the normal flow will not be fixed by that. |
That seems to contradict his July 2019 #628 (comment) "the root cause as to why these aren't getting updated during the normal flow will not be fixed by that," which I find more believable. I don't think the root cause for the solr updater errors was ever found, and we know the logging needs to be improved to even start investigating them. This has absolutely nothing to do with the version of Solr. |
Well, it’s now TWO years on, and the edition still contains the stale link to the wrong author record. Surely it is clear that an edition which links to a work should not also contain a link to an author except for secondary attributions. |
Three years on, and the edition record still shows the (wrong) author, even after the reindexing. Again, edition records should not have explicit, uncorrectable links to authors. Author links belong on the work records where they can be corrected. |
@LeadSongDog I believe the issue you're describing is documented here: #2625. It's not related to solr indexing. |
Calling this one closed (after all this time!); we've been on solr8 for ~2 weeks now, and (1) the caching issue was fixed in a PR, and (2) solr latency has gone from ~15min -> ~1min! If anyone notices other issues of this nature, where their edits are not reflected in search, please do create a new issue for it! |
I have fixed the majority of author records from issue #482
but the search results still list them all:
https://openlibrary.org/search/authors?q=please+see
I have tried manually adding the records to the admin/solr list, but I believe something about these records causes the reindex to (silently?) fail, so they are skipped.
There may be a general problem that reindexes fail and are not suitably logged for us to investigate. It was difficult to get decent logs to gather more info on this issue.
@dvanduzer I'd love help with this one if you are looking for something to do with SOLR ;)