playlist entry hang on 1/7/2020 on production at 2:12 pm #123
Comments
No changes were made at that time to prod or stage, so I suspect a glitch on the control A machine or the network, as she was able to access zk from another machine later. |
did you check the server log for any errors around this time? |
There is nothing unusual in the error log, and requests continued to come in from 2:12 until 2:30 with no gaps. According to the access log, a total of 2,483 requests were processed during that period, so zk was up and responding. Given she could access it from another machine later, this suggests a transient error, either on the system in control A or the network. |
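The "no gaps" check can be done mechanically rather than by eye. The following is a minimal sketch, assuming each access-log entry has already been reduced to a timestamp in epoch seconds. The helper name, the parsing step, and the sample values are hypothetical and not taken from the zk logs.

```javascript
// Hypothetical helper: summarize request activity inside a time window.
// `timestamps` are request times in epoch seconds (assumed already parsed
// from the access log); `start` and `end` bound the window, inclusive.
function summarizeWindow(timestamps, start, end) {
  const inWindow = timestamps
    .filter((t) => t >= start && t <= end)
    .sort((a, b) => a - b);

  // Largest silence between consecutive requests, including the edges of
  // the window, so an outage at the start or end is not missed.
  let maxGap = 0;
  let prev = start;
  for (const t of inWindow) {
    maxGap = Math.max(maxGap, t - prev);
    prev = t;
  }
  maxGap = Math.max(maxGap, end - prev);

  return { count: inWindow.length, maxGapSeconds: maxGap };
}

// Made-up sample data: four requests, one of them outside the window.
const sample = [100, 130, 400, 90];
const summary = summarizeWindow(sample, 100, 460);
console.log(summary); // { count: 3, maxGapSeconds: 270 }
```

A large `maxGapSeconds` relative to normal traffic would point at the server; a small one, as reported here, points back at the client or the network.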
Reopening the issue because it happened to me at approximately 14:30 on 01/18/2020. Problem appears to be that the add click invokes a 'move down' operation rather than add which in turn resulted in an error response from the server and refreshing the page did not fix the issue. rough steps (on firefox) were:
Note it was difficult to debug further due to the javascript minification. suggest disabling it since the JS payload is small. |
Oooh. I wonder if the 'seq' you saw during the insert is related to the new code I added to order spins correctly in a live playlist that has 'future' entries? There is a variable seq involved in that. Your change to prevent setting future times in a live list obviates this case, so we could just turn it off, but it should work correctly nonetheless, so I want to understand what is going on and deal with root cause. Can you recall the exact js error message? That would be extremely helpful. Also, add track does not use the legacy track up/down for its sequencing, so clicking add should not in any way involve a 'move down'. That's why the exact error message is crucial.
There are only a dozen or so lines of code in this particular case, so it should be easy enough just to figure it out from the message alone. For some of the more complex inline code, maybe we should look at refactoring it to external js files. I have a plan to add automatic map file generation for our external js, so it can be debugged with ease.
This is very interesting. What happens when you reload? I've never encountered a situation where javascript could so badly mess up the browser that a reload would not clear it. Have you ever encountered something like that, where the offending code is not run upon page reload, yet reload does not clear the problem? Seems like carefully crafted javascript that can elicit this behaviour would be a nice exploit. [Edit: added text, please re-read] |
Can you please confirm the time? I was going to review the log, but I see your playlist ends at 1530 and the next playlist commences at 1800. |
time was incorrect, the correct time was approximately 14:30. can you obtain the creation time of the second (empty) playlist? it was most likely created as part of my effort to correct the issue. fwiw, i've tried introducing json error returns to the two REST requests involved in the bug flow and i could not get the page to stall. |
I'm not seeing anything interesting in the logs. There is an unrelated warning that pollutes the error log that ideally we will clear; I'll open a separate issue to track that. We cannot tell when an insert occurred, but I thought I would be clever and review the access log and try to figure it out that way. Unfortunately, new playlist is a form post, so it is difficult to pick it out definitively in the access log. In the access log, I did find you had three sessions:
*Interesting were two delete tracks, at 1429 and at 1433, both of exactly the same track id (that is to say, multiple attempts to delete the same track). That particular track was successfully deleted, so we cannot say what it was, but it originally sat between Dylan/Jokerman and Rucker/Straight to Hell. Not sure if delete track is implicated in any way, but it does occur around 1430.
**There were two distinct seq=downTrack events at 1454... all of the subsequent addTrack and getTracks ajax calls after that until 1537 include this URL in their Referer, as it was the last page load. A little out of range of 1430, but such a Referer could well appear as part of an error message, though not directly related to it.
I've been attempting to reproduce both in my dev env and in prod, using tabs and space, as well as various combinations of adds, time edits, deletes, etc., but as yet I am unable to reproduce. |
another point to keep in mind is that logging out/in cleared the issue so it would appear that it is somehow related to the user's session state. |
I noticed something very interesting in the log. The following extract consists of log entries from the control A mac during your show from 1410-1415:
Lines 1-4 are ajax It turns out you have the playlist editor open in two browsers concurrently!! Referring back to the forensics in the previous comment, I can confirm your login at 1205 is from Firefox while the next one at 1336 is from Safari. You had two concurrent, active sessions open in two different browsers, and you then attempt to edit the same live playlist via both sessions. It seems things went bad immediately after the Safari Being logged in from two browsers is fine, but working on the same playlist at the same time from two browsers is undefined. As you know, the playlist editor does not sync the playlist on each |
Lois's failure mode is different to yours. She was using Chrome There are two problems here, 1. addTrack failure on the service, and 2. the js code should just report errors and fail, not get stuck in a loop like this. I am continuing to investigate. Any insights will be appreciated. [Edit: Lois's UA string is for Chrome, not Safari.] |
The log for Lois looks like this:
...and it continues in this way a couple dozen more times with rapid-fire Unfortunately, we don't know what was in the request. Apparently the server did not like what it saw (as no track was added) and the browser did not like what it received (as it kept on trying). No errors logged on the server. |
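The second problem named above (the client should report the error and stop, not keep re-sending) could be handled with a small failure counter. This is only a sketch under assumptions: `createRetryGuard` and `handleAddResponse` are hypothetical names, the `{ ok, message }` response shape is invented for illustration, and none of this is the actual playlist editor code.

```javascript
// Hypothetical guard that allows a limited number of consecutive failures.
function createRetryGuard(maxFailures) {
  let failures = 0;
  return {
    // Record one failure; returns true while another attempt is allowed.
    recordFailure() {
      failures += 1;
      return failures < maxFailures;
    },
    // Clear the counter after a successful request.
    reset() {
      failures = 0;
    },
  };
}

// Decide what to do with a response. `response` is assumed to look like
// { ok: boolean, message?: string }; `showError` displays text to the user.
function handleAddResponse(response, guard, showError) {
  if (response.ok) {
    guard.reset();
    return "done";
  }
  if (guard.recordFailure()) {
    return "retry";
  }
  // Out of attempts: tell the user and stop instead of looping.
  showError("Add track failed: " + (response.message || "unknown error"));
  return "failed";
}

const guard = createRetryGuard(3);
const shown = [];
const bad = { ok: false, message: "not authenticated" };
console.log(handleAddResponse(bad, guard, (m) => shown.push(m))); // retry
console.log(handleAddResponse(bad, guard, (m) => shown.push(m))); // retry
console.log(handleAddResponse(bad, guard, (m) => shown.push(m))); // failed
```

With a cap like this, the rapid-fire sequence in the log would end after a few requests and leave a visible error message behind.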
just to clarify, i did not invoke safari until after firefox stalled. i will continue to look out for the error. again, i'd suggest disabling javascript minification so that the issue can be debugged should it happen again. |
In that case, maybe your Firefox stalled much earlier than 1430? The logs show that you logged in from Safari at 1336, opened Edit Playlist, and did a couple add tracks (1336, 1339). Then you go back to Firefox addTrack at 1406, Safari addTrack at 1407, Firefox addTrack at 1410, Safari addTrack at 1411. So there is definitely concurrent access to the playlist over a number of minutes from both browsers. We cannot discount that the error you saw is related to this. FWIW, there was no activity from either browser between 1257 and 1336. The last addTrack from Firefox before that was at 1219. In any event, please do report the error if you see it again.
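The interleaving described above can be spotted automatically once the addTrack entries are reduced to a time and a browser name. A minimal sketch: the four entries are the times quoted in this comment, while the function name and the record shape are hypothetical.

```javascript
// Count how often consecutive requests come from a different browser.
// `entries` must be in time order; each is { time: string, browser: string }.
function countBrowserSwitches(entries) {
  let switches = 0;
  for (let i = 1; i < entries.length; i++) {
    if (entries[i].browser !== entries[i - 1].browser) {
      switches += 1;
    }
  }
  return switches;
}

// addTrack requests quoted in the comment above.
const addTracks = [
  { time: "1406", browser: "Firefox" },
  { time: "1407", browser: "Safari" },
  { time: "1410", browser: "Firefox" },
  { time: "1411", browser: "Safari" },
];
console.log(countBrowserSwitches(addTracks)); // 3
```

Any count above one for a single playlist within a few minutes suggests two editors were active on it at the same time.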
OK, it is now disabled in Playlists.php in prod. jQuery will still be minified with no source map; you will have to deal with that. |
Server-side error logging has been deployed to production to capture errors on track insert and update. |
Access logs from 2020-01-07 and 2020-01-18 |
Per Eric: ZK hang at studio approximately 1:52 pm on Sat 25 Jan
Access logs from 2020-01-25 |
Eric, it seems you are able to reproduce this almost every show... can you reproduce it on a machine that is not the Control A mac? In the most recent logs, you did a delete shortly before the time in question... there were some deletes just before the time of the issue on the 18th as well... coincidence? I have been trying, but still unable to reproduce. |
jim, this has occurred to me only once on 01/18/2020. i, (as opposed to mike) had no problem with my show on 01/25/2020. i will continue to keep watch for it though. just a hunch, but i vaguely recall quickly tabbing from the tag ID -> track select -> Add fields when my one exception occurred. |
I believe I have found it. Offending code was the hidden track-session input:
index 3411f5c..9a03127 100644
--- a/ui/Playlists.php
+++ b/ui/Playlists.php
@@ -828,7 +828,7 @@ class Playlists extends MenuItem {
?>
<div class='pl-form-entry'>
- <input id='track-session' type='hidden' value='session VALUE="<?php echo $this->session->getSessionID(); ?>'>
+ <input id='track-session' type='hidden' value='<?php echo $this->session->getSessionID(); ?>'>
<input id='track-playlist' type='hidden' value='<?php echo $playlistId; ?>'>
<label></label><span id='error-msg' class='error'></span>
<div>
Problem manifests when there is no session cookie. In this case, zk looks to the form value, which in this case is not the session ID as expected, so the request fails to authenticate and is not processed. Not sure how I lost my session cookie, but this is how I discovered it. Failure mode then matched exactly that described in this slip. Fix is now in prod. Let's see how this goes... [Edit: In addition, I have just opened #129 to revisit session cookie scope, to deal with the disappearing cookie aspect of this issue. Even without that, however, the issue should be resolved, as addTrack is now providing the expected session ID.] |
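To illustrate why the bug only showed up without a cookie, here is a sketch of the fallback described above. It is not the zk server code (which is PHP): the function name, the sample ID, and the assumption that a session ID is a plain run of letters and digits are all hypothetical.

```javascript
// Assumed shape of a session ID for this sketch only: a non-empty run of
// letters and digits. The real zk format is not stated in the thread.
const SESSION_ID_PATTERN = /^[A-Za-z0-9]+$/;

// Prefer the cookie; fall back to the form field when no cookie is present.
// Returns the session ID, or null if the candidate is not a valid ID.
function resolveSessionId(cookieValue, formValue) {
  const candidate = cookieValue || formValue || "";
  return SESSION_ID_PATTERN.test(candidate) ? candidate : null;
}

const id = "abc123"; // made-up session ID
// Before the fix: the hidden field carried extra text around the ID.
const brokenField = `session VALUE="${id}`;

console.log(resolveSessionId(id, brokenField));   // abc123 (cookie present)
console.log(resolveSessionId(null, brokenField)); // null (no cookie, bad field)
console.log(resolveSessionId(null, id));          // abc123 (after the fix)
```

As long as the cookie exists, the malformed field is never consulted, which would explain why the problem was intermittent and hard to reproduce.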
This should already be resolved, but am keeping open for now and moving to the latest hotfix milestone. |
Keeping open on the latest hotfix milestone |
The problem is that stage and prod were pointed to two different databases, while cookies are shared between all instances on the same server. When a browser window was opened to stage or prod without the session ID (such as could happen simply because of a new window), the cookie would be used. If the cookie was created in the other instance (stage or prod), the session ID was invalid and the cookie was deleted. Any other window needing the cookie (for example, because the session ID was not supplied in the request, as with the broken track add above) would then fail. So the immediate cause -- the broken session ID in track add -- has been fixed. The root cause is that the session cookie breaks when stage and prod point to different databases. We won't do that again until we can scope session cookies to each unique instance. |
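One possible way to scope the session cookie per instance is to give each instance its own cookie name and URL path. This is a sketch only: the thread does not say how stage and prod are distinguished, so the instance IDs, the paths, and both function names below are hypothetical.

```javascript
// Hypothetical: derive a per-instance cookie name so stage and prod on the
// same host no longer read (and invalidate) each other's session cookie.
function sessionCookieName(instanceId) {
  // Keep only characters that are safe in a cookie name.
  const safe = instanceId.replace(/[^A-Za-z0-9_-]/g, "_");
  return `session_${safe}`;
}

// Build a Set-Cookie header value limited to the instance's URL path.
function buildSessionCookie(instanceId, sessionId, path) {
  return `${sessionCookieName(instanceId)}=${sessionId}; Path=${path}; HttpOnly`;
}

console.log(buildSessionCookie("stage", "abc123", "/stage/"));
// session_stage=abc123; Path=/stage/; HttpOnly
console.log(buildSessionCookie("prod", "def456", "/"));
// session_prod=def456; Path=/; HttpOnly
```

Distinct names matter even when `Path` is set, because a cookie scoped to `/` is still sent with requests under `/stage/`.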
lois reports that playlist entry stopped working during her show. it appeared around 2:12 when she tried to perform a manual music entry. the subsequent entries were made from another PC after she completed her show.
https://zookeeper.stanford.edu/?action=viewDate&seq=selList&playlist=40105&session=