spotDL Downloads Predominantly Long Tracks Over 1 Hour #2021

Closed
simpsss opened this issue Feb 4, 2024 · 13 comments · Fixed by #2028 or #2031
Labels
Bug Unexpected problem or unintended behavior that needs to be fixed

Comments

@simpsss

simpsss commented Feb 4, 2024

System OS

Docker

Python Version

3.7 (CPython)

Install Source

GitHub

Install version / commit hash

4.2.4

Expected Behavior vs Actual Behavior

The expected behavior is for spotDL to download the regular album or single versions of tracks, typically lasting between 3 and 5 minutes each.

Steps to reproduce - Ensure to include actual links!

When using spotDL to download tracks from a Spotify playlist, I've encountered an issue where the majority of downloaded tracks are excessively long, typically over 1 hour in duration. This behavior deviates from the expected outcome of downloading the standard versions of songs that usually range from 3 to 5 minutes.

  1. Execute the command: `docker compose run --rm spotdl download [playlist link] --save-file "playlist.sync.spotdl"`.
  2. Observe the resulting downloads, noting that most are unusually long tracks, often exceeding 1 hour, which are not the intended versions of the songs.

Traceback

Downloaded "Alphaville - Big in Japan - 2019 Remaster": https://music.youtube.com/watch?v=WlKhxulA4O8
Downloaded "Modern Talking - Cheri Cheri Lady": https://music.youtube.com/watch?v=c1ZCYY-4lAM
Downloaded "Modern Talking - Brother Louie": https://music.youtube.com/watch?v=b5EZp76Oxhw
Downloaded "Modern Talking - You're My Heart, You're My Soul": https://music.youtube.com/watch?v=VR0_hgiDCk0
Downloaded "Desireless - Voyage voyage": https://www.youtube.com/watch?v=x5sIvkS_kps
Downloaded "Max Romeo - Chase The Devil": https://music.youtube.com/watch?v=etXp-2op3aU

Other details

No response

simpsss added the Bug label Feb 4, 2024

@bharat-nair
Contributor

There is logic in place to find the best result based on views. Perhaps we can also use the duration of the song when scoring the results? For example, if the result's duration is within 10% of the original song's duration, we let it pass as a result.
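
A minimal sketch of the 10% window suggested above. The `Candidate` class, the `pick_best` helper, and the example URLs are hypothetical and are not spotDL's actual classes or API; durations are assumed to be in seconds on both sides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search result; not spotDL's actual Result class."""
    url: str
    duration: float  # seconds
    views: int


def within_tolerance(candidate_s: float, target_s: float, tolerance: float = 0.10) -> bool:
    """True when the candidate's length is within `tolerance` of the target length."""
    return abs(candidate_s - target_s) <= tolerance * target_s


def pick_best(candidates, target_s):
    """Drop candidates outside the duration window, then rank the rest by views."""
    eligible = [c for c in candidates if within_tolerance(c.duration, target_s)]
    return max(eligible, key=lambda c: c.views, default=None)


candidates = [
    Candidate("https://example.com/one-hour-loop", 3600.0, 5_000_000),
    Candidate("https://example.com/album-version", 225.0, 1_200_000),
    Candidate("https://example.com/live-version", 410.0, 900_000),
]

# The one-hour loop has the most views but falls outside the window.
best = pick_best(candidates, target_s=222.4)
print(best.url)
```

With a hard window like this, a result with far more views can no longer win purely on popularity if its length is implausible.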

@clorirdrix

I have the same problem. For example, with this playlist: https://open.spotify.com/playlist/37i9dQZF1DWTSKFpOdYF1r (The Smiths - Bigmouth Strikes Again - 2011 Remaster, The Clash - Rock the Casbah - Remastered, Bronski Beat - Smalltown Boy), I get 1-hour repeats of the song...

@egndz
Contributor

egndz commented Feb 13, 2024

All of the songs given are pretty famous ones. If no one else is willing to debug this, I can do it to see whether something is odd in get_best_result(). If there is, I can add the length check to the scoring.

@bharat-nair
Contributor

@egndz sure, go right ahead

@mathsoft-dev

> There is logic in place to find the best result based on views. Perhaps we can also use the duration of the song when scoring the results? For example, if the result's duration is within 10% of the original song's duration, we let it pass as a result.

Great idea. This would fix all the issues I encountered: the downloads were much, much longer than the original songs.

@egndz
Contributor

egndz commented Feb 18, 2024

Some of the results get a negative score because of an error in calc_time_match, where the logic is `return 100 - (result.duration - song.duration)`:

```
Song(name='Rock the Casbah - Remastered', .... duration=222426, .... list_length=1)

Result(source='YouTubeMusic', url='https://music.youtube.com/watch?v=6Wt3khq8NpM', .... duration=187.0, ....)
```

You can see the mismatch in the durations here: one is in milliseconds, the other in seconds. Should I fix the issue in this PR as well?

@bharat-nair
Contributor

Thanks for sharing that function name, @egndz. I didn't know there was already a check on the duration of the results.
I believe what you pointed out might be the actual underlying cause of longer songs getting downloaded. The code, for reference, is:
`100 - (result.duration - song.duration)`

If we assume song.duration is in ms, then logically the match that is hours long becomes the best match, because the shorter results have a much larger difference and therefore a lower score. I think the best approach is to fix this s/ms discrepancy first and see whether that resolves the long-duration issue.

Also as a side note, if you use the song url instead of a playlist in the spotdl command, the result seems to have the duration correctly in seconds.
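
To make the reasoning above concrete, here is a small sketch using the two durations quoted earlier in the thread. It assumes the difference is taken as an absolute value, which the mention of negative scores suggests but the thread does not show; `time_score` and the one-hour candidate are hypothetical and are not spotDL's code.

```python
def time_score(result_duration: float, song_duration: float) -> float:
    """Score from the thread, assuming the difference is an absolute value."""
    return 100 - abs(result_duration - song_duration)


song_ms = 222426         # Spotify duration quoted in the thread, in milliseconds
song_s = song_ms / 1000  # the same duration converted to seconds

short_result_s = 187.0   # YouTube Music result quoted in the thread, in seconds
hour_long_s = 3600.0     # hypothetical one-hour upload, in seconds

# Mixed units: every score is hugely negative, and the longer the result,
# the smaller the gap to the millisecond value, so the hour-long track wins.
mixed_short = time_score(short_result_s, song_ms)
mixed_hour = time_score(hour_long_s, song_ms)
print(mixed_short, mixed_hour, mixed_hour > mixed_short)

# Matching units: the result of ordinary length wins, as intended.
fixed_short = time_score(short_result_s, song_s)
fixed_hour = time_score(hour_long_s, song_s)
print(fixed_short > fixed_hour)
```

Under these assumptions the ranking flips as soon as both durations use the same unit, which is consistent with the playlist-only behaviour described here.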

@xnetcat
Member

xnetcat commented Feb 19, 2024

> Thanks for sharing that function name, @egndz. I didn't know there was already a check on the duration of the results.
> I believe what you pointed out might be the actual underlying cause of longer songs getting downloaded. The code, for reference, is:
> `100 - (result.duration - song.duration)`
>
> If we assume song.duration is in ms, then logically the match that is hours long becomes the best match, because the shorter results have a much larger difference and therefore a lower score. I think the best approach is to fix this s/ms discrepancy first and see whether that resolves the long-duration issue.
>
> Also as a side note, if you use the song url instead of a playlist in the spotdl command, the result seems to have the duration correctly in seconds.

I've fixed the incorrect calculations for playlists on the dev branch. I wanted to work on a better way of calculating a time match, but I don't remember if I got around to it. I might have a fix on my PC but have forgotten to push the commit. I will update you today/tomorrow.

@xnetcat
Member

xnetcat commented Feb 19, 2024

If anyone has a ready PR I am more than happy to merge it 😁

@egndz
Contributor

egndz commented Feb 19, 2024

I will review the rest of the code and find the best place to fix the issue. Instead of fixing the duration in the time scoring, the fix should be made during class init, where we set the duration. Can you please assign the issue to me, @xnetcat?
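
One way to normalize at construction time, as a rough sketch only. The `Track` class and its `from_spotify_ms` constructor are hypothetical names for illustration; they are not spotDL's Song class or the change that was eventually merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for a song record; duration is always in seconds."""
    name: str
    duration: float  # seconds

    @classmethod
    def from_spotify_ms(cls, name: str, duration_ms: int) -> "Track":
        """Convert a millisecond duration once, where the object is created."""
        return cls(name=name, duration=duration_ms / 1000)


track = Track.from_spotify_ms("Rock the Casbah - Remastered", 222426)
print(track.duration)  # 222.426
```

Converting once at the boundary means any later scoring code can assume seconds without checking where the song came from.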

Regarding score calculation: if the song has a duration of 200 and two results have durations of 180 and 220, the error is 10% for both, so we can give each a score of 90. This method will ensure that we filter out nonsense durations 😄 We can also add an --ignore-time check later. Open to discussion, though.
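
The relative-error scoring described above could look roughly like this. `relative_time_score` is a hypothetical name, both durations are assumed to be in seconds, and clamping the score at 0 is an added assumption rather than something stated in the thread.

```python
def relative_time_score(result_s: float, song_s: float) -> float:
    """Score out of 100 based on the percentage difference in duration."""
    if song_s <= 0:
        raise ValueError("song duration must be positive")
    error_pct = 100 * abs(result_s - song_s) / song_s
    # Assumption: clamp at 0 so wildly long results cannot go negative.
    return max(0.0, 100 - error_pct)


# The example from the comment: 180 and 220 are both 10% off a 200-second song.
scores = {d: relative_time_score(d, 200) for d in (180, 220, 3600)}
print(scores)
```

Because the error is symmetric around the song's duration, results that are too short and too long by the same amount are penalized equally.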

I like the project a lot, as I use it day to day for DJing, and I would like to contribute more as a data scientist.

thank you to all contributors :)

@xnetcat xnetcat linked a pull request Feb 23, 2024 that will close this issue
@Prankish8407

Just wanted to let you know this issue still persists.
It's not a lot, but out of the 100000 songs I had at least 50 that were 1h tracks, some even up to 12h.

@egndz
Contributor

egndz commented Mar 2, 2024

@Prankish8407 have you tried with the dev branch? There is still PR #2031 open for that

@Prankish8407

I'm using the Docker container. Is the tag development? As in:

spotdl/spotify-downloader:latest --> spotdl/spotify-downloader:development

7 participants