Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#33 Playlist Support #44

Merged
merged 6 commits into from
Sep 3, 2020
Merged

#33 Playlist Support #44

merged 6 commits into from
Sep 3, 2020

Conversation

Grodou
Copy link
Contributor

@Grodou Grodou commented Aug 22, 2020

Get playlist details and all videos.
No download for now.

Get playlist infos and all videos.
No download for now.
@sealedtx
Copy link
Owner

@Grodou Hi, great work! Thank you for contributing. I have few suggestions and questions:

  • Naming of YoutubeVideo+VideoDetails and YoutubePlaylist+PlaylistDetails is fine, but PlaylistVideo (which extends AbstractVideoDetails) is counterintuitive as for me. I don't know better naming for now, need to think, mb PlaylistItemor PlaylistVideoDetails
  • Why do you follow C89 variable declaration style? It's 2020 now, you don't need to declare variables at the beggining of methods :)
  • Please explain what is "continuation" of playlist, does it mean that playlist is too big to display in a single webpage and we need to load more?
  • What are those hardcoded values X-YouTube-Client-Version", "2.20200720.00.02" and do we need to update them from time to time, if yes where I should look for them?
  • What does "unstable" comment mean in your getVeryLongPlaylist?

@Grodou
Copy link
Contributor Author

Grodou commented Aug 24, 2020

@sealedtx Hi, thank you for the review. Here come the answers:

PlaylistVideo (which extends AbstractVideoDetails) is counterintuitive as for me. I don't know better naming for now, need to think, mb PlaylistItem or PlaylistVideoDetails

Completely agree. I think PlaylistVideoDetails is better.

Why do you follow C89 variable declaration style? It's 2020 now, you don't need to declare variables at the beggining of methods :)

Old school style I guess :)

Please explain what is "continuation" of playlist, does it mean that playlist is too big to display in a single webpage and we need to load more?

Exactly, 100 videos max. Scrolling to the bottom loads 100 more, and so on.
All videos are directly loaded for now, a paging system could be handy.

What are those hardcoded values X-YouTube-Client-Version", "2.20200720.00.02" and do we need to update them from time to time, if yes where I should look for them?

Just a part of playlist.py. The actual client version we should use is available in the ytInitialData.responseContext part of the response.
(for the record, it's also useful for faster downloads, maybe another PR)
For the "X-YouTube-Client-Name", I don't know, but both are mandatory.

What does "unstable" comment mean in your getVeryLongPlaylist?

That playlist is often updated, we can't do reliable tests on its length or content.

Still in progress, next step is to make a PlaylistVideo as easy to use as a YoutubeVideo (format selection, downloads, ...).

@sealedtx
Copy link
Owner

@Grodou Thank you for the answers.

Old school style I guess :)

Please, make sure you follow current code base style (even if it is not perfect), for easier maintenance in future.

The actual client version we should use is available in the ytInitialData.responseContext part of the response.

So mb we should implement getOrDefault logic for these values (search them in response, if not found - use default hardcoded)?

next step is to make a PlaylistVideo as easy to use as a YoutubeVideo (format selection, downloads, ...).

Not sure if you should focus on this. I believe that there is no simple way but just YoutubeDownloader.getVideo(id) for all of items in playlist which will be an overhead if user want only metadata about playlist items. I think current api might be sufficient to get playlist info (vidoe elements with videoId/title/thumbnail) and later it is possible to get YoutubeVideo if you need to download only some of videos. Correct me if I am wrong.

@Grodou
Copy link
Contributor Author

Grodou commented Aug 26, 2020

@sealedtx Hi, here are some answers and suggestions.

Please, make sure you follow current code base style (even if it is not perfect), for easier maintenance in future.

It wasn't clear, but of course I will.

So mb we should implement getOrDefault logic for these values (search them in response, if not found - use default hardcoded)?

Looks like the client version is always in the response, no need for hardcoded value.

I think current api might be sufficient to get playlist info (vidoe elements with videoId/title/thumbnail) and later it is possible to get YoutubeVideo if you need to download only some of videos. Correct me if I am wrong.

I agree, it is sufficient. However, functionalities like these could be handy:

  • lazy loading on formats() and subtitles() methods, no overhead if you're only interested in metadatas
  • specific download directory, video numbering (playlist_name/001-video1.mp4)
  • download all formats matching a filter (playlist.downloadByItag(File outDir, int itag), downloadVideosWithQuality(File outDir, VideoQuality videoQuality))

I made a version of it, and since code explains it better than I could, I will soon commit so you can tell if it's interesting or not.
And if not, I will of course take care of all the cleanings.

Grodou added 2 commits August 26, 2020 19:32
Download playlist videos as regular videos,  and full playlists using filters.
Code style fixes
@Grodou
Copy link
Contributor Author

Grodou commented Aug 26, 2020

@sealedtx Here is the version with download available, main changes are listed below.


Video hierarchy
AbstractVideo<T extends AbstractDetails>

  • YoutubeVideo<VideoDetails>
  • PlaylistVideo<PlaylistVideoDetails>

PlaylistVideo lazy loading: formats() and subtitles() trigger a load action.
We can force a single video to do so with PlaylistVideo.fetch().
We can force all videos to do so with Playlist.fecthVideos().
If an error occurs, formats and subtitles are empty, the initial exception is accessible via getFetchError()
Other ways to do:

  1. throw a runtime exception, YoutubeVideo method signatures remain unchanged
  2. throw a YoutubeException, and change YoutubeVideo methods

Exceptions
YoutubeException.DownloadUnavailableException superclass

  • LiveVideoException for live videos
  • RestrictedVideoException for playlist private and deleted videos

Tests

  • hierarchy, reuse
  • output directory is deleted after each test
  • video and playlist tests have their own directory

I noticed a strange behaviour: a video can have a specific available format at a time, and a few seconds later, it's gone. Unit tests take care of it and should success any time.
I didn't find any stable playlist with deleted and/or private videos to test.

Let me know what you think.

@sealedtx
Copy link
Owner

sealedtx commented Aug 28, 2020

@Grodou Hi, sorry for a long response, I feel busy last days. First of all thank you for a great work! And again suggestions:


YoutubePlaylist:
I believe that it should contain only these methods:

  • details()
  • videos()
  • findVideoById()
  • findVideoByIndex() (BTW: is it possible that video index is not same as list index? Or may be we can sort them and remove this method?)
  • findVideos() (I like your Predicate<> filter and findFormat() looks cool too, but may be it should be findFormats() and return list of formats, same as findVideos())
  • fetchVideos() (Not sure about this, because if one fetch fails it will skip subsequent ones)

I hardly imagine that some one is looking for a videos in playlist only with specific itag or quality. Even so it could be achieved with findVideos().
I think that we should remove download methods from YoutubePlaylist, because it creates additional hierarchy: playlist->videos->formats.
I see user's steps as:

  • parse playlist
  • look for a wanted videos
  • fetch formats
  • choose acceptable formats
  • download

Moving last four items into a single one makes code difficult to read and user won't understand what is under the hood (additional http requets), IMHO.


PlaylistVideo:
You should not keep reference to Parser inside it. Better to pass it inside fetch(parser) only once it is needed.
Storing fetchError also looks not pretty. I think subtitles() and formats() should return empty list or throw runtime exception if not fetched yet, choose the one you like more. BTW: you may use Collections.emptyList() instead of Collections.EMPTY_LIST and it won't require @SuppressWarnings("unchecked").


PlaylistVideoDetails:
Could you please explain what is isPlayable and when it can be false?


Packages:
Need to think about package refactoring. If we have playlist package mb we should add video package and move there YoutubeVideo and VideoDetails.


Waiting for your comments and thoughts. Thank you!

@Grodou
Copy link
Contributor Author

Grodou commented Aug 29, 2020

@sealedtx Hi, thank you for the kind comments, and sorry it took me some time to answer.


Lazy loading

Since it's a central part of this version, I think we should agree about it first.

You should not keep reference to Parser inside it. Better to pass it inside fetch(parser) only once it is needed.

If that so, lazy loading is broken.
And then:

  • the only way to get the parser back is the YoutubeDownloader
  • playlistVideo.fetch(downloader.getParser()) becomes useless since we could just use downloader.getVideo()
  • the whole PlaylistVideo class can be removed, format selection and download features as well

In my opinion, this widely used design pattern is completely adapted to this case: it loads data only when it's needed (if ever), and hides any underlying process.

Please feel free to tell if you don't want it, I'll keep the version in a branch anyway, just in case.
Hope this helps to make a final choice about it, so we can get closer to a final version!


2 questions I can answer for now:

is it possible that video index is not same as list index? Or may be we can sort them and remove this method?

Video index starts at 1, not 0. Videos seem to be always sorted.
findVideoByIndex(index) could simply return videos().get(index - 1) (after a basic boundary check)

Could you please explain what is isPlayable and when it can be false?

A false value means the video is either private or deleted

@sealedtx
Copy link
Owner

@Grodou Thank you for the answers.

Lazy loading

I understand what is lazy loading pattern and why it might be applied here. But I I doubt about hiding from user additional http request, because I know that this library is used by android devs. Due to lazy loading it is not clear that formats() or subtitles() requires http request, as I know network request may freeze UI Thread on android or even throw an Exception (because it is not allowed to work with network on main thread). I need to think about this, give me few days, please.

YoutubePlaylist

findVideoByIndex(index) could simply return videos().get(index - 1) (after a basic boundary check)

I think in such case this method can be omitted.


Still waiting for your thoughts about shrinking YoutubePlaylist api. Thank you.

@Grodou
Copy link
Contributor Author

Grodou commented Aug 30, 2020

@sealedtx Hi, thank you for the helpful informations.


Lazy loading

I was wondering why you seemed both interested and reluctant, and just wanted to move on.
Sorry about that, please take your time.


YoutubePlaylist

I think in such case this method can be omitted (findVideoByIndex(index))

Totally agree.

Still waiting for your thoughts about shrinking YoutubePlaylist api

I wanted to know about lazy loading first, because there's a lot to say, as you can see.
So let's assume playlist videos still have formats.

fetchVideos() (Not sure about this, because if one fetch fails it will skip subsequent ones)

Just a poor implementation of mine, it should better catch exceptions and return a list of successfully fetched videos or at least the count of it.

may be it should be findFormats() and return list of formats, same as findVideos()

  • video.findFormat() is only there for playlist.download(outDir, filter)
  • video.findFormats() could be useful indeed, but would not replace findFormat()
  • playlist.findFormats() returning a set of formats available for all videos would be great! (see next answer)

Download

I hardly imagine that some one is looking for a videos in playlist only with specific itag or quality.

If you're interested by a whole music playlist so you want all the audio tracks, we could then assume that:

  • you want the same quality for each
  • all videos have the same set of formats

So you could either:

  • pick a format from the 1st video
  • pick a format from the brand new playlist.findFormats() method
  • directly use an itag that is almost always there, like i140

and download them all!

Even so it could be achieved with findVideos().

Absolutely, but then, you still have to loop over videos, select the same format each time, and download.
The download feature is an utility, not an obligation. Any user will always be free to follow the 5 steps process you suggest.

2 more ideas for downloads.

We could watch progress (for an async task) with a listener like:

  • onDownloading(video, progress)
  • onFinished(video, file)
  • boolean onError(video, throwable): return true only if you want to keep on downloading

We could move the whole download logic to the Format class since:

  • you download a format, not a video
  • format is responsible for the file extension
  • methods with fewer arguments are more pretty :)
  • it would be more consistent with the subtitles package
    A reference to the video details (or at least the title) would be mandatory for the file name.

Packages

Need to think about package refactoring. If we have playlist package mb we should add video package and move there YoutubeVideo and VideoDetails.

Indeed, a dedicated package for "regular" video would make things clearer.
By the way, I think Itag and Extension should better belong to the format package.


That was a lot of topics, sorry for the longest comment ever :)
Hope it will be helpful.

@sealedtx
Copy link
Owner

@Grodou Hi! Thank you for your response.

Lazy loading

I really appreciate your work done, but unfortunately, I decided not to implement (or at least postpone it) lazy loading for playlist video formats, due to reasons mentioned above.


YoutubePlaylist

Just a poor implementation of mine

I prefer to keep simple things done well at first, and leave opportiunity to extend functionally later. If we don't know how it should be implemented to work well - remove it for now, and discuss/add in the further updates.


Download

all videos have the same set of formats

It might be wrong. It is common, that different videos may have different set of formats (it may depend on the period when they were uploaded, quality of the source video, etc). And also youtube make changes from time to time.

The download feature is an utility, not an obligation. Any user will always be free to follow the 5 steps process you suggest.

User must be aware about what is happening using this library. As I already mentioned, merging these steps into one can lead to unclear situations, errors may occur during fetching meta, or accessing file system, or connection, or downloading. It would be difficult to build such api that properly handles all these possible errors in a single method. I believe user of this library must be responsible for it.

We could move the whole download logic to the Format class since:

Agree. But I don't want to do changes which break backwards compatibility. There is a lot of work to be done about refactoing, including classes naming, packages structure, shrinking useless methods, extending api, fixing api and code style. But I want it to be released in the 3.x.x version. I will create "refactoring" branch later. I like most of your ideas and you are welcomed to contribute!

By the way, I think Itag and Extension should better belong to the format package.

Itag - yes, but Extension contains also subtitles formats, not only video. As I said, not in this PR, sorry for starting discussion about refactoring here.


So... in conclusion... sorry for being not very clear: In this PR I want to see the most simple functionality for playlist meta extraction, but enough to be useful. It should cover the most common use cases, and it should be clear for user where and which errors may occur. We may discuss further improvements (but no breaking changes in version 2.x.x) after the feature will have already been released and useful for people.

sorry for the longest comment ever :)

Don't worry, my comment is also not very compact :)

@Grodou
Copy link
Contributor Author

Grodou commented Aug 31, 2020

@sealedtx Hi, that's good news! I think it's a very good decision.

So, in order to keep the API as clear and simple as possible, I think:

  • YoutubePlaylist should only contain the details, and a list of PlaylistItem
  • PlaylistVideoDetails should change to PlaylistItem, keeping the hierarchy of AbstractDetails (as you suggested in your first comment)
  • No more PlaylistVideo class

Like this, things are clear, an item is not a video, and access is more simple: downloader.getVideo(item.videoId()).
Tell me if that's OK for you.

@sealedtx
Copy link
Owner

@Grodou Sounds good. But let it be PlaylistVideoDetails, to understand that this class contains fields of AbstractVideoDetails

@Grodou
Copy link
Contributor Author

Grodou commented Aug 31, 2020

@sealedtx OK, should the YoutubePlaylist field be items anyway ? or videos ?

@sealedtx
Copy link
Owner

@Grodou let it be videos ;)

* Lazy loading and download removed
* README.md update
* Code style fixes
@Grodou
Copy link
Contributor Author

Grodou commented Sep 1, 2020

@sealedtx Hi, it's done.

Just a few comments:

  • YoutubePlaylist.findVideos(filter) still there, YoutubeVideo.findFormats(filter) added
  • AbstractVideoDetails.checkDownload() remains too
  • the "very long playlist" test takes about 20 seconds to execute, feel free to remove it

Hope that's fine.

@sealedtx
Copy link
Owner

sealedtx commented Sep 2, 2020

@Grodou Perfect! Is everything ready to merge?

@Grodou
Copy link
Contributor Author

Grodou commented Sep 2, 2020

@sealedtx Hi, I forgot one little line... Ready to merge now!

Probably not the right place to ask, but I'd like to make a PR for faster downloads.
Is it relevant? Should I create an issue first?

@sealedtx
Copy link
Owner

sealedtx commented Sep 3, 2020

@Grodou Thank you.

Is it relevant? Should I create an issue first?

For sure, and yes, it would be better to explain your thoughts in the issue.

@sealedtx sealedtx merged commit 666f0dc into sealedtx:master Sep 3, 2020
@Grodou Grodou deleted the #33-playlist branch September 3, 2020 12:45
kangsLee pushed a commit to kangsLee/java-youtube-downloader that referenced this pull request Nov 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants