Compare commits

...

595 commits

Author SHA1 Message Date
Sergey M․ 416da574ec
[ytsearch] Fix extraction (closes #26920) 2020-10-23 21:31:37 +07:00
Toan Nguyen 48c5663c5f
[afreecatv] Fix typo (#26970) 2020-10-22 19:15:05 +07:00
Hannu Hartikainen 7d740e7dc7
[23video] Relax _VALID_URL (#26870) 2020-10-20 00:56:23 +07:00
Kevin O'Connor 4eda10499e
[utils] Don't attempt to coerce JS strings to numbers in js_to_json (#26851)
The current logic in `js_to_json` tries to rewrite octal/hex numbers to
decimal. However, when the logic actually happens the `"` or `'` have
already been trimmed off. This causes what were originally strings, that
happen to look like octal/hex numbers, to get rewritten to decimal and
returned as a number rather than a string.

In practive something like:

```js
{
  "0x40": "foo",
  "040": "bar",
}
```

would get rewritten as:

```json
{
  64: "foo",
  32: "bar
}
```

This is problematic since this isn't valid JSON as you cannot have
non-string keys.
2020-10-18 00:10:41 +07:00
Sergio Livi 605535776a
[ustream] Add support for video.ibm.com (#26894) 2020-10-17 23:14:46 +07:00
Felix Yan 1050e0d09f
[iqiyi] Fix typo (#26884) 2020-10-17 23:02:17 +07:00
Sergey M․ d65d89183f
[expressen] Add support for di.se (closes #26670) 2020-09-24 07:37:10 +07:00
Surkal 0c92f1e96b
[iprima] Improve video id extraction (#26507) (closes #26494) 2020-09-24 06:46:58 +07:00
Sergey M․ adae9e844b
[README.md] Fix autonumber sequence description (refs #26686) 2020-09-24 06:36:07 +07:00
Sergey M․ c5764b3f89
[downloader/http] Properly handle missing message in SSLError (closes #26646) 2020-09-22 07:01:59 +07:00
Sergey M․ 0837992a22
[downloader/http] Fix access to not yet opened stream in retry 2020-09-22 06:44:14 +07:00
Sergey M․ b55715934b
release 2020.09.20 2020-09-20 12:30:45 +07:00
Sergey M․ bbc3b5b4bb
[ChangeLog] Actualize
[ci skip]
2020-09-20 12:24:32 +07:00
nixxo 1ca5f821c8
[redtube] Extend _VALID_URL (#26506) 2020-09-20 11:39:42 +07:00
Sergey M․ defc820b70
[twitch] Switch streams to GraphQL and refactor (closes #26535) 2020-09-20 10:05:00 +07:00
Sergey M․ 82ef02e936
[telequebec] Fix issues (closes #26368) 2020-09-19 07:56:00 +07:00
Patrick Dessalle b856b3997c
[telequebec] Add support for brightcove videos (closes #25833) 2020-09-19 07:52:57 +07:00
Sergey M․ cd85a1bb8b
[pornhub] Extract metadata from JSON-LD (closes #26614) 2020-09-19 06:34:34 +07:00
Sergey M․ ce5b904050
[extractor/common] Relax interaction count extraction in _json_ld 2020-09-19 06:33:17 +07:00
Sergey M․ ad06b99dd4
[extractor/common] Extract author as uploader for VideoObject in _json_ld 2020-09-19 06:13:42 +07:00
JChris246 540b9f5164
[pornhub] Fix view count extraction (#26621) (refs #26614) 2020-09-19 05:59:19 +07:00
Stefan Pöschel 6e65a2a67e
[downloader/hls] Fix incorrect end byte in Range HTTP header for media segments with EXT-X-BYTERANGE (#24512) (closes #14748)
The end of the byte range is the first byte that is NOT part of the to
be downloaded range. So don't include it into the requested HTTP
download range, as this additional byte leads to a broken TS packet and
subsequently to e.g. visible video corruption.

Fixes #14748.
2020-09-18 05:26:56 +07:00
Sergey M․ f8c7bed133
[extractor/common] Handle ssl.CertificateError in _request_webpage (closes #26601)
ssl.CertificateError is raised on some python versions <= 3.7.x
2020-09-18 03:41:16 +07:00
Sergey M․ cdc55e666f
[downloader/http] Improve timeout detection when reading block of data (refs #10935) 2020-09-18 03:32:54 +07:00
Ori Avtalion 86b7c00adc
[downloader/http] Retry download when urlopen times out (#26603) (refs #10935) 2020-09-18 03:15:44 +07:00
Sergey M․ e8c5d40bc8
release 2020.09.14 2020-09-14 03:37:36 +07:00
Sergey M․ ca7ebc4e5e
[ChangeLog] Actualize
[ci skip]
2020-09-14 03:35:18 +07:00
Sergey M․ bff857a8af
[postprocessor/embedthumbnail] Fix issues (closes #25717)
* Fix WebP with wrong extension processing
* Fix embedding of thumbnails with % character in path
2020-09-14 03:28:31 +07:00
Alex Merkel a31a022efd
[postprocessor/embedthumbnail] Add support for non jpeg/png thumbnails (closes #25687) 2020-09-14 03:10:01 +07:00
Sergey M․ 45f6362464
[rtlnl] Extend _VALID_URL for new embed URL schema 2020-09-13 21:42:06 +07:00
Derek Land 97f34a48d7
[rtlnl] Extend _VALID_URL (#26549) (closes #25821) 2020-09-13 21:38:16 +07:00
Daniel Peukert ea74e00b3a
[youtube] Fix empty description extraction (#26575) (closes #26006) 2020-09-13 21:23:21 +07:00
Sergey M․ 06cd4cdb25
[srgssr] Extend _VALID_URL (closes #26555, closes #26556, closes #26578) 2020-09-13 21:07:25 +07:00
Sergey M․ da2069fb22
[googledrive] Use redirect URLs for source format (closes #18877, closes #23919, closes #24689, closes #26565) 2020-09-13 20:49:32 +07:00
Sergey M․ 95c9810015
[svtplay] Fix id extraction (closes #26576) 2020-09-13 18:59:37 +07:00
Remita Amine b03eebdb6a [redbulltv] improve support for rebull.com TV localized URLS(#22063) 2020-09-13 11:26:11 +01:00
Remita Amine 1f7675451c [redbulltv] Add support for new redbull.com TV URLs(closes #22037)(closes #22063) 2020-09-12 19:27:58 +01:00
tfvlrue aa27253556
[soundcloud] Reduce pagination limit to fix 502 Bad Gateway errors when listing a user's tracks. (#26557)
Per the documentation here https://developers.soundcloud.com/blog/offset-pagination-deprecated the maximum limit is 200, so let's respect that (even if a higher value sometimes works).

Co-authored-by: tfvlrue <tfvlrue>
2020-09-12 09:35:11 +00:00
Sergey M․ d51e23d9fc
release 2020.09.06 2020-09-06 13:00:41 +07:00
Sergey M․ 6cd452acff
[ChangeLog] Actualize
[ci skip]
2020-09-06 12:57:56 +07:00
Sergey M․ 50e9fcc1fd
[nrktv:episode] Improve video id extraction (closes #25594, closes #26369, closes #26409) 2020-09-06 12:43:50 +07:00
random-nick 16ee69c1b7
[youtube] Fix age gate content detection (#26100) (closes #26152, closes #26311, closes #26384) 2020-09-06 11:44:53 +07:00
Sergey M․ 67171ed7e9
[youtube:user] Extend _VALID_URL (closes #26443) 2020-09-06 11:31:28 +07:00
Sergey M․ 1d9bf655e6
[utils] Recognize wav mimetype (closes #26463) 2020-09-06 11:19:53 +07:00
TheRealDude2 62ae19ff76
[xhamster] Improve initials regex (#26526) (closes #26353) 2020-09-06 11:10:27 +07:00
Sergey M․ 5ed05f26ad
[svtplay] Fix svt id extraction (closes #26425, closes #26428, closes #26438) 2020-09-06 10:45:57 +07:00
Sergey M․ 841b683804
[twitch] Rework extractors (closes #12297, closes #20414, closes #20604, closes #21811, closes #21812, closes #22979, closes #24263, closes #25010, closes #25553, closes #25606)
* Switch to GraphQL.
+ Add support for collections.
+ Add support for clips and collections playlists.
2020-09-06 10:45:34 +07:00
Remita Amine f5863a3ea0 [biqle] improve video_ext extraction 2020-08-27 19:20:41 +01:00
Sergey M․ 10709fc7c6
[xhamster] Extend _VALID_URL (closes #25927) 2020-08-12 21:51:50 +07:00
TheRealDude2 a7e348556a
[xhamster] Fix extraction (closes #26157) (#26254) 2020-08-12 21:42:17 +07:00
JChris246 6cb30ea5ed
[xhamster] Extend _VALID_URL (closes #25789) (#25804) 2020-08-12 21:37:22 +07:00
Sergey M․ a4ed50bb84
release 2020.07.28 2020-07-28 05:13:03 +07:00
Sergey M․ 570611955f
[ChangeLog] Actualize
[ci skip]
2020-07-28 05:07:54 +07:00
Sergey M․ e450f6cb63
[youtube] Fix sigfunc name extraction (closes #26134, closes #26135, closes #26136, closes #26137) 2020-07-28 05:05:38 +07:00
MRWITEK a115e07594
[youtube] Improve description extraction (closes #25937) (#25980) 2020-07-14 12:01:15 +01:00
Sergey M․ 718393c632
[wistia] Restrict embed regex (closes #25969) 2020-07-11 18:27:19 +07:00
Glenn Slayden 07af16b92e
[youtube] Prevent excess HTTP 301 (#25786) 2020-07-01 02:56:16 +07:00
Sergey M․ e942cfd1a7
[youtube:playlists] Extend _VALID_URL (closes #25810) 2020-06-28 10:30:03 +07:00
Remita Amine 9a7e5cb88a [bellmedia] add support for cp24.com clip URLs(closes #25764) 2020-06-23 15:09:13 +01:00
Sergey M․ 2391941f28
[brightcove] Improve embed detection (closes #25674) 2020-06-16 17:38:25 +07:00
Sergey M․ 9ff6165a81
release 2020.06.16.1 2020-06-16 06:22:01 +07:00
Sergey M․ 1c748722f9
[ChangeLog] Actualize
[ci skip]
2020-06-16 06:19:23 +07:00
Sergey M․ ee0b726cd7
[youtube] Force old layout (closes #25682, closes #25683, closes #25680, closes #25686) 2020-06-16 06:17:53 +07:00
Sergey M․ dbeafce5d5
[youtube] Fix categories and improve tags extraction 2020-06-16 03:13:39 +07:00
Sergey M․ ed604ce7bc
release 2020.06.16 2020-06-16 02:53:33 +07:00
Sergey M․ 7adc7ca547
[ChangeLog] Actualize
[ci skip]
2020-06-16 02:52:09 +07:00
Sergey M․ a6211d237b
[youtube] Fix uploader id and uploader URL extraction 2020-06-16 02:43:09 +07:00
Sergey M․ 7b16239a49
[youtube] Improve view count extraction 2020-06-16 02:38:45 +07:00
Sergey M․ 37357d21a9
[youtube] Fix upload date extraction 2020-06-16 02:37:19 +07:00
Sergey M․ b477fc1314
[youtube] Fix thumbnails extraction and remove uploader id extraction warning (closes #25676) 2020-06-16 02:29:04 +07:00
Sergey M․ d84b21b427
[youtube] Fix playlist and feed extraction (closes #25675) 2020-06-16 02:01:12 +07:00
Philipp Hagemeister 48bd042ce7 [facebook] Support single-video ID links
I stumbled upon this at https://www.facebook.com/bwfbadminton/posts/10157127020046316 . No idea how prevalent it is yet.
2020-06-14 13:17:51 +02:00
Sergey M․ 84213ea8d4
[youtube] Extract chapters from JSON (closes #24819) 2020-06-06 04:22:10 +07:00
Sergey M․ 562de77f41
[kaltura] Add support for multiple embeds on a webpage (closes #25523) 2020-06-06 02:14:35 +07:00
Sergey M․ e1723c4bac
release 2020.06.06 2020-06-06 01:51:39 +07:00
Sergey M․ 607d204551
[ChangeLog] Actualize
[ci skip]
2020-06-06 01:49:27 +07:00
Sergey M․ a5b6102ea8
[tele5] Bypass geo restriction 2020-06-06 01:45:05 +07:00
Sergey M․ b77888228d
[jwplatform] Add support for bypass geo restriction 2020-06-06 01:44:36 +07:00
Sergey M․ 0b1eaec3bc
[tele5] Prefer jwplatform over nexx (closes #25533) 2020-06-06 01:35:09 +07:00
Sergey M․ b37e47a3f9
[twitch:stream] Expect 400 and 410 HTTP errors from API 2020-06-06 00:57:40 +07:00
Sergey M․ ce3735df02
[twitch:stream] Fix extraction (closes #25528) 2020-06-06 00:55:29 +07:00
Sergey M․ a0455d0ffd
[twitch] Pass v5 accept header and fix thumbnails extraction (closes #25531) 2020-06-06 00:12:47 +07:00
Sergey M․ c8b232cc48
[brightcove] Sort imports 2020-06-05 23:35:57 +07:00
Sergey M․ b4eb0bc7bd
[brightcove] Fix subtitles extraction (closes #25540) 2020-06-05 23:33:14 +07:00
Matej Dujava d5147b65ac
[malltv] Add support for sk.mall.tv (#25445) 2020-06-01 21:11:31 +07:00
Sergey M․ 7b0b53ea69
[twitter:broadcast] Add untitled periscope broadcast test 2020-06-01 20:32:57 +07:00
Sergey M․ 7016e24ebe
[periscope] Fix untitled broadcasts (#25482) 2020-06-01 20:31:51 +07:00
Sergey M․ bef4688c72
[jwplatform] Improve embeds extraction (closes #25467) 2020-05-31 11:10:31 +07:00
Sergey M․ 228c1d685b
release 2020.05.29 2020-05-29 03:33:13 +07:00
Sergey M․ efd72b05d2
[ChangeLog] Actualize
[ci skip]
2020-05-29 03:28:44 +07:00
Sergey M․ fe515e5c75
[ard:beta] Extend _VALID_URL (closes #25405) 2020-05-29 02:01:51 +07:00
striker.sh 1db5ab6b34
[youtube] Add support for more invidious instances (#25417) 2020-05-27 01:26:45 +07:00
Sergey M․ 2791e80b60
[postprocessor/ffmpeg] Embed series metadata with --add-metadata 2020-05-23 12:28:15 +07:00
JordanWeatherby 8f841fafcd
[giantbomb] Extend _VALID_URL (#25222) 2020-05-21 04:30:50 +07:00
Michael Klein a54c5f83c0
[ard] Improve _VALID_URL (closes #25134) (#25198) 2020-05-20 04:08:08 +07:00
Sergey M․ cd13343ad8
[redtube] Improve formats extraction and extract m3u8 formats (closes #25311, closes #25321) 2020-05-20 03:39:41 +07:00
Rob 9cd5f54e31
[utils] Fix file permissions in write_json_file (closes #12471) (#25122) 2020-05-20 03:21:52 +07:00
tlsssl 9a269547f2
[indavideo] Switch to HTTPS for API request (#25191) 2020-05-20 02:13:06 +07:00
Dave Loyall bf097a5077
[redtube] Improve title extraction (#25208) 2020-05-20 02:11:05 +07:00
Remita Amine 52c50a10af [vimeo] improve format extraction and sorting(closes #25285) 2020-05-15 15:57:06 +01:00
Remita Amine b334732709 [soundcloud] reduce API playlist page limit(closes #25274) 2020-05-15 14:13:02 +01:00
Juan Francisco Cantero Hurtado 384bf91f88
[youtube] Add support for yewtu.be (#25226) 2020-05-14 05:54:42 +07:00
TotalCaesar659 fae11394f0
[README.md] flake8 HTTPS URL (#25230) 2020-05-14 05:53:17 +07:00
comsomisha adc13b0748
[mailru] Fix extraction (closes #24530) (#25239) 2020-05-14 05:51:40 +07:00
Sergey M․ 327593257c
[bbccouk] PEP8 2020-05-14 05:11:42 +07:00
Remita Amine 9d8f3a12a6 [spike] fix Bellator mgid extraction(closes #25195) 2020-05-12 20:49:08 +01:00
Sergey M․ b002bc433a
release 2020.05.08 2020-05-08 18:10:37 +07:00
Sergey M․ b74896dad1
[ChangeLog] Actualize
[ci skip]
2020-05-08 18:07:05 +07:00
Sergey M․ fa3db38333
[youtube] Improve signature cipher extraction (closes #25188) 2020-05-08 17:42:30 +07:00
Sergey M․ 30fa5c6087
[iprima] Improve extraction (closes #25138) 2020-05-06 23:20:14 +07:00
Sergey M․ 6c907eb33f
[downloader/http] Request last data block of exact remaining size
Always request last data block of exact size remaining to download if possible not the current block size.
2020-05-05 21:43:39 +07:00
Sergey M․ f7b42518dc
[downloader/http] Finish downloading once received data length matches expected
Always do this if possible, i.e. if Content-Length or expected length is known, not only in test.
This will save unnecessary last extra loop trying to read 0 bytes.
2020-05-05 21:43:39 +07:00
Remita Amine ce7db64bf1 [uol] fix extraction(closes #22007) 2020-05-05 11:19:40 +01:00
hh0rva1h 1328305851
[orf] Add support for more radio stations (closes #24938) (#24968) 2020-05-05 06:22:50 +07:00
Sergey M․ 6c22cee673
[extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776)
To always ensure cookie name and value are bytestrings on python 2.
2020-05-05 06:00:37 +07:00
Sergey M․ 6d874fee2a
[compat] Introduce compat_cookiejar_Cookie 2020-05-05 05:54:10 +07:00
Sergey M․ 676723e0da
[dailymotion] Fix typo 2020-05-05 05:09:07 +07:00
Sergey M․ c380cc28c4
[utils] Improve cookie files support
+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry len, invalid expires at)
2020-05-05 04:21:25 +07:00
Sergey M․ f7f304910d
[puhutv] Remove no longer available HTTP formats (closes #25124) 2020-05-04 21:15:19 +07:00
Sergey M․ 00a41ca4c3
release 2020.05.03 2020-05-03 00:05:05 +07:00
Sergey M․ 66f32ca0e1
[ChangeLog] Actualize
[ci skip]
2020-05-02 23:59:25 +07:00
Sergey M․ 6ffc3cf74a
[crunchyroll] Fix and improve extraction (closes #25096, closes #25060) 2020-05-02 23:42:51 +07:00
Sergey M․ 4433bb0245
[extractor/common] Extract multiple JSON-LD entries 2020-05-02 23:40:30 +07:00
Sergey M․ e40c758c2a
[youtube] Improve player id extraction and add tests 2020-05-02 07:18:08 +07:00
Sergey M․ 011e75e641
[youtube] Use redirected video id if any (closes #25063) 2020-05-01 00:40:38 +07:00
Remita Amine 2468a6fa64 [yahoo] fix GYAO Player extraction and relax title URL regex(closes #24178)(closes #24778) 2020-04-29 14:56:32 +01:00
Remita Amine 700265bfcf [tvplay] fix Viafree extraction(closes #15189)(closes #24473)(closes #24789) 2020-04-29 13:38:58 +01:00
Sergey M․ c97f5e934f
[tenplay] Relax _VALID_URL (closes #25001) 2020-04-26 12:41:33 +07:00
Sergey M․ 38db9a405a
[prosiebensat1] Extract series metadata 2020-04-24 02:56:10 +07:00
Philipp Stehle 2cdfe977d7
[prosiebensat1] Improve extraction and remove 7tv.de support (#24948) 2020-04-24 02:44:13 +07:00
willbeaufoy 46d0baf941
[options] Clarify doc on --exec command (closes #19087) (#24883) 2020-04-24 02:31:38 +07:00
Sergey M․ 00eb865b3c
[youtube] Fix DRM videos detection (refs #24736) 2020-04-11 23:05:08 +07:00
Sergey M․ 2f19835726
[thisoldhouse] Improve video id extraction (closes #24549) 2020-04-11 20:07:37 +07:00
AndrewMBL 533f3e3557
[thisoldhouse] Fix video id extraction (closes #24548)
Added support for:
with of without "www."
and either  ".chorus.build" or ".com"

It now validated correctly on older URL's
```
<iframe src="https://thisoldhouse.chorus.build/videos/zype/5e33baec27d2e50001d5f52f
```
and newer ones
```
<iframe src="https://www.thisoldhouse.com/videos/zype/5e2b70e95216cc0001615120
```
2020-04-11 20:07:32 +07:00
Sergey M․ 75294a5ed0
[soundcloud] Improve AAC format extraction (closes #19173, closes #24708) 2020-04-10 17:26:03 +07:00
tom b9e5f87291
[soundcloud] Extract AAC format 2020-04-10 17:25:04 +07:00
Sergey M․ 6b09401b0b
[youtube] Skip broken multifeed videos (closes #24711) 2020-04-09 22:42:43 +07:00
Sergey M․ 5caf88ccb4
[nova:embed] Fix extraction (closes #24700) 2020-04-09 03:52:29 +07:00
Sergey M․ dcc8522fdb
[motherless] Fix extraction (closes #24699) 2020-04-09 02:14:49 +07:00
Felix Stupp c9595ee780
[twitch:clips] Extend _VALID_URL (closes #24290) (#24642) 2020-04-07 23:21:25 +07:00
Sergey M․ 91bd3bd019
[tv4] Fix ISM formats extraction (closes #24667) 2020-04-07 22:56:06 +07:00
Sergey M․ 13b08034b5
[extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667) 2020-04-07 22:55:59 +07:00
Sergey M․ 6a6e1a0cd8
[tele5] Fix extraction (closes #24553) 2020-04-06 02:05:06 +07:00
Sergey M․ 4e7b5bba5f
[mofosex] Add support for generic embeds (closes #24633) 2020-04-06 01:29:58 +07:00
Sergey M․ 52c4c51556
[youporn] Add support form generic embeds 2020-04-05 20:56:14 +07:00
Sergey M․ 8fae1a04eb
[spankwire] Add support for generic embeds (refs #24633) 2020-04-05 20:42:56 +07:00
Sergey M․ d44a707fdd
[spankwire] Fix extraction (closes #18924, closes #20648) 2020-04-05 20:42:56 +07:00
Sergey M․ 049c0486bb
release 2020.03.24 2020-03-24 03:14:30 +07:00
Sergey M․ 30b5121a1c
[ChangeLog] Actualize
[ci skip]
2020-03-24 03:12:15 +07:00
Sergey M․ b439634f0e
[ChangeLog] Actualize
[ci skip]
2020-03-24 03:07:34 +07:00
Sergey M․ 6e47200b6e
[teachable] Update test 2020-03-24 02:57:53 +07:00
Sergey M․ 38fa761a45
[teachable] Update gns3 domain 2020-03-24 02:57:48 +07:00
Sergey M․ 08a27407c4
[teachable] Update upskillcourses domain
New version does not use teachable platform any longer
2020-03-24 02:57:44 +07:00
Sergey M․ be7dacf9cf
[generic] Look for teachable embeds before wistia 2020-03-24 02:57:38 +07:00
Sergey M․ 4560adc820
[teachable] Extract chapter metadata (closes #24421) 2020-03-24 02:57:32 +07:00
Sergey M․ 63dce3094b
[bilibili] Add support for player.bilibili.com (closes #24402) 2020-03-24 00:24:39 +07:00
Sergey M․ b4eb08bb03
[bilibili] Add support for new URL schema with BV ids (closes #24439, closes #24442) 2020-03-24 00:11:39 +07:00
Remita Amine 2e20cb3636 [limelight] remove disabled API requests(closes #24255) 2020-03-23 12:57:10 +01:00
Remita Amine a6c5859d6b [soundcloud] fix download url extraction(closes #24394) 2020-03-22 09:24:26 +01:00
Sergey M․ c76cdf2382
[cbc:watch] Fix authenticated device token caching (closes #19160) 2020-03-21 01:43:13 +07:00
Devon Meunier 787c360467
[cbc:watch] Add support for authentication 2020-03-21 01:43:08 +07:00
Sergey M․ 73453430c1
[hellporno] Fix extraction (closes #24399) 2020-03-21 00:59:48 +07:00
Sergey M․ 158bc5ac03
[xtube] Fix typo 2020-03-14 22:58:10 +07:00
Sergey M․ 4568a11802
[xtube] Fix formats extraction (closes #24348) 2020-03-14 22:57:10 +07:00
Sergey M․ 4cbce88f8b
[ndr] Fix extraction (closes #24326) 2020-03-14 04:58:24 +07:00
Sergey M․ 541fe3eaff
[nhk] Update m3u8 URL and use native hls (#24329) 2020-03-14 04:42:40 +07:00
Sergey M․ 9bfe088594
[nhk] Remove obsolete rtmp formats (closes #24329) 2020-03-14 04:40:11 +07:00
Sergey M․ fcaf4d7a06
[nhk] Relax _VALID_URL (#24329) 2020-03-14 04:39:21 +07:00
Remita Amine 40b6495d40 Revert "[vimeo] fix showcase password protected video extraction(closes #24224)"
This reverts commit 12ee431676.
2020-03-13 08:59:10 +01:00
Sergey M․ f1a8511f7b
[utils] Add reference to cookie file format 2020-03-10 04:59:02 +07:00
Sergey M․ 042b664933
Revert "[utils] Add support for cookies with spaces used instead of tabs"
According to [1] TABs must be used as separators between fields.
Files produces by some tools with spaces as separators are considered
malformed.

1. https://curl.haxx.se/docs/http-cookies.html

This reverts commit cff99c91d1.
2020-03-10 04:53:51 +07:00
Sergey M․ 68fa15155f
release 2020.03.08 2020-03-08 18:27:20 +07:00
Sergey M․ 434f573046
[ChangeLog] Actualize
[ci skip]
2020-03-08 18:16:17 +07:00
Sergey M․ cff99c91d1
[utils] Add support for cookies with spaces used instead of tabs 2020-03-08 18:01:32 +07:00
Tristan Waddington fa9b8c6628
[pornhub] Add support for pornhubpremium.com (#24288) 2020-03-08 18:00:25 +07:00
Sergey M․ ea782aca52
[README.md] Clarify 429 error 2020-03-08 09:17:17 +07:00
Sergey M․ 43ebf77df3
[youtube] Remove outdated code
Additional get_video_info requests don't seem to provide any extra itags any longer
2020-03-08 08:59:58 +07:00
Sergey M․ d332ec725d
[youtube] Improve age-gated videos extraction in 429 error conditions (refs #24283) 2020-03-08 05:41:04 +07:00
Sergey M․ f93abcf1da
[youtube] Improve extraction in 429 error conditions (closes #24283) 2020-03-08 05:09:02 +07:00
Remita Amine 0ec9d4e565 [nhk] update API version(closes #24270) 2020-03-06 20:13:28 +01:00
Sergey M․ 34525a3885
release 2020.03.06 2020-03-06 00:25:43 +07:00
Sergey M․ 2db9ac228d
[ChangeLog] Actualize
[ci skip]
2020-03-06 00:23:14 +07:00
Sergey M․ 5429d6a9cb
[youtube] Fix tests 2020-03-06 00:05:50 +07:00
Sergey M․ dc879c5a37
[youtube] Fix age-gated videos support without login (closes #24248) 2020-03-05 23:48:25 +07:00
Remita Amine 12ee431676 [vimeo] fix showcase password protected video extraction(closes #24224) 2020-03-03 12:33:57 +01:00
Sergey M․ 46cc54ca8f
[pornhub] Improve title extraction (closes #24184) 2020-03-03 06:23:39 +07:00
Sergey M․ 1e1c1960aa
[peertube] Fix issues and improve extraction (closes #23657) 2020-03-03 03:01:47 +07:00
3risian ac379fa236
[peertube] Improve extraction 2020-03-03 03:01:42 +07:00
jxu 0e30a7b973
[youtube:playlist] Fix tests (closes #23872) (#23885) 2020-03-03 01:46:00 +07:00
Sergey M․ 3b5399ce0f
[servus] Add support for new URL schema (closes #23475, closes #23583, closes #24142) 2020-03-03 01:41:53 +07:00
tsia 1c45ff5572
[vimeo] Fix subtitles URLs (#24209) 2020-03-03 01:27:40 +07:00
Sergey M․ 669625a32c
release 2020.03.01 2020-03-01 20:11:32 +07:00
Sergey M․ 170f5b7c27
[ChangeLog] Actualize
[ci skip]
2020-03-01 20:09:05 +07:00
Sergey M․ b274e48d56
[xhamster] Fix extraction (closes #24205) 2020-03-01 20:04:48 +07:00
Sergey M․ 50d19895a1
[franceculture] Fix extraction (closes #24204) 2020-03-01 19:22:09 +07:00
Sergey M․ 6d475d01d8
[telecinco] Add support for article opening videos 2020-03-01 03:09:19 +07:00
Sergey M․ f8cbd8c963
[telecinco] Fix extraction (refs #24195) 2020-03-01 01:04:51 +07:00
Sergey M․ 838f051c4b
[xtube:user] Fix test 2020-02-29 23:51:56 +07:00
Sergey M․ e88b450771
[xtube] Fix metadata extraction (closes #21073, closes #22455) 2020-02-29 23:51:34 +07:00
Sergey M․ 278355bae4
[zapiks] Fix test 2020-02-29 23:09:13 +07:00
Sergey M․ b4cbdbd4b3
[zdf:channel] Fix tests 2020-02-29 23:06:36 +07:00
Sergey M․ ea17979d83
[test_subtitles] Remove obsolete test 2020-02-29 22:08:43 +07:00
Sergey M․ 886d985959
[youjizz] Fix extraction (closes #24181) 2020-02-29 21:58:22 +07:00
Sergey M․ 7947a1f7db
Remove no longer needed compat_str around geturl 2020-02-29 19:19:24 +07:00
Sergey M․ fca6dba8b8
[YoutubeDL] Force redirect URL to unicode on python 2 2020-02-29 19:08:44 +07:00
Sergey M․ e2f8bf5888
[extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152) 2020-02-29 17:29:30 +07:00
The Hatsune Daishi b76f0e58f7
[options] Remove duplicate short option -v for --version (#24162) 2020-02-29 16:33:09 +07:00
Sergey M․ bee6451fe8
[pornhd] Fix extraction (closes #24128) 2020-02-24 04:47:56 +07:00
Sergey M․ 00d798b7c2
[teachable] Add support for multiple videos per lecture (closes #24101) 2020-02-23 06:49:45 +07:00
Sergey M․ fda6d237a5
[wistia] Add support for multiple generic embeds (closes #8347, closes #11385) 2020-02-23 06:47:11 +07:00
Sergey M․ 5d9f6cbc5a
[imdb] Fix extraction (closes #23443) 2020-02-23 04:33:29 +07:00
Martin Ström 97c822b3d5
[tv2dk:bornholm:play] Fix extraction (#24076) 2020-02-19 01:02:05 +07:00
Sergey M․ 117ba9e9df
release 2020.02.16 2020-02-16 22:43:42 +07:00
Sergey M․ 0d718db623
[ChangeLog] Actualize
[ci skip]
2020-02-16 22:40:44 +07:00
Sergey M․ 7bf27721d6
[npr] Add support for streams (closes #24042) 2020-02-15 05:35:55 +07:00
Sergey M․ f6052ec923
[24video] Add support for porn.24video.net (closes #23779, closes #23784) 2020-02-15 03:49:29 +07:00
Sergey M․ 4e9e1e240d
[test_YoutubeDL] Add tests for #10591 (closes #23873) 2020-02-15 03:37:31 +07:00
Sergey M․ e0abaab293
[test_YoutubeDL] Fix get_ids 2020-02-15 03:37:25 +07:00
jxu de1121d749
[YoutubeDL] Fix playlist entry indexing with --playlist-items (closes #10591, closes #10622) 2020-02-15 03:36:53 +07:00
Sergey M․ 293c9f0186
[jpopsuki] Remove extractor (closes #23858) 2020-02-15 02:23:29 +07:00
Sergey M․ 06f1de2daf
[nova] Improve extraction (refs #23690) 2020-02-15 02:16:26 +07:00
Sergey M․ b68a6e32fb
[nova:embed] Improve (closes #23690) 2020-02-15 02:00:58 +07:00
Jan 'Yenda' Trmal 8cd809fb3d
[nova:embed] Fix extraction (closes #23672) 2020-02-15 02:00:52 +07:00
d2au d6aa1db7ed
[abc:iview] Support 720p (#22907) (#22921) 2020-02-13 14:52:00 +01:00
Remita Amine f377edec06 [nytimes] improve format sorting(closes #24010) 2020-02-10 09:43:20 +01:00
Sergey M․ bfe2b8cf2a
[update] Fix updating via symlinks (closes #23991) 2020-02-08 19:46:58 +07:00
Sergey M․ 82fea5b42e
[compat] Introduce compat_realpath (refs #23991) 2020-02-08 19:36:55 +07:00
Xaver Hellauer fffc618c51
[toggle] Add support for mewatch.sg (closes #23895) (#23930) 2020-02-05 22:41:56 +07:00
Remita Amine 705b1cda99 [thisoldhouse] fix extraction(closes #23951) 2020-02-03 13:20:36 +01:00
Sergey M․ 7d55b62ff2
[popcorntimes] Add extractor (closes #23949) 2020-02-03 06:05:56 +07:00
Philipp Hagemeister 0d006fac5c [sportdeutschland] Update to new sportdeutschland API
They switched to SSL, but under a different host AND path...
Remove the old test cases because these videos have become unavailable.
2020-02-01 23:35:55 +01:00
Sergey M․ 00de61a98f
[twitch:stream] Lowercase channel id for stream request (closes #23917) 2020-02-01 00:32:25 +07:00
Sergey M․ d95a1cc98e
[tv5mondeplus] Fix extraction (closes #23907, closes #23911) 2020-01-31 04:58:36 +07:00
Sergey M․ 4935749730
[tva] Relax _VALID_URL (closes #23903) 2020-01-31 03:49:16 +07:00
Remita Amine 51c7f40c83 [vimeo] fix album extraction(closes #23864) 2020-01-27 23:37:29 +01:00
Remita Amine 4877ffc0e9 [viewlift] improve extraction
- fix extraction(closes #23851)
- add add support for authentication
- add support for more domains
2020-01-27 15:41:21 +01:00
Remita Amine 8e4d3f83ce [svt] fix series extraction(closes #22297) 2020-01-26 16:17:51 +01:00
Remita Amine 43e7994749 [svt] fix article extraction(closes #22897)(closes #22919) 2020-01-26 14:16:59 +01:00
Remita Amine 2a5c26c980 [soundcloud] imporve private playlist/set tracks extraction
https://github.com/ytdl-org/youtube-dl/issues/3707#issuecomment-577873539
2020-01-23 23:24:37 +01:00
Sergey M․ 76dbe4df5f
release 2020.01.24 2020-01-24 04:16:05 +07:00
Sergey M․ bffdedfabd
[ChangeLog] Actualize
[ci skip]
2020-01-24 04:14:08 +07:00
Sergey M․ c3cfea9068
[youtube] Fix sigfunc name extraction (closes #23819) 2020-01-24 04:09:10 +07:00
Remita Amine 22cb94902f [stretchinternet] fix extraction(closes #4319) 2020-01-19 21:20:56 +01:00
Remita Amine be96f9924f [voicerepublic] fix extraction 2020-01-19 20:15:02 +01:00
Remita Amine 9cf30dc017 [azmedien] fix extraction(closes #23783) 2020-01-19 19:30:48 +01:00
Remita Amine f4a18db748 [ard] add a missing condition 2020-01-19 18:28:24 +01:00
PB fd032450f0 [businessinsider] Fix jwplatform id extraction (closes #22929) (#22954) 2020-01-18 22:47:50 +07:00
Sergey M․ a4b2769451
[24video] Add support for 24video.vip (closes #23753) 2020-01-18 15:05:45 +07:00
Sergey M․ d9a2f86791
[ivi:compilation] Fix entries extraction (closes #23770) 2020-01-18 14:46:38 +07:00
Remita Amine c968f738df [ard] improve extraction(closes #23761)
- simplify extraction
- extract age limit and series
- bypass geo-restriction
2020-01-17 14:23:24 +01:00
Remita Amine 48ff5590c1 [nbc] add support for nbc multi network URLs(closes #23049) 2020-01-16 15:37:16 +01:00
Remita Amine 2c482bff7c [americastestkitchen] fix extraction 2020-01-15 14:18:04 +01:00
Remita Amine a9866c0366 [zype] improve extraction
- extract subtitles(closes #21258)
- support URLs with alternative keys/tokens(#21258)
- extract more metadata
2020-01-15 14:18:04 +01:00
Sergey M․ 90ea83c64d
[orf:tvthek] Improve geo restricted videos detection (closes #23741) 2020-01-15 04:32:05 +07:00
Sergey M․ e4e5fa6e3c
[soundcloud] Restore previews extraction (closes #23739) 2020-01-15 04:13:10 +07:00
Sergey M․ e8cf0dbdd8
release 2020.01.15 2020-01-15 01:37:29 +07:00
Sergey M․ d7c55f226d
[ChangeLog] Actualize
[ci skip]
2020-01-15 01:34:01 +07:00
Moritz Patelscheck bfdc8340c9
[yourporn] Fix extraction (closes #21645, closes #22255, closes #23459) 2020-01-15 01:28:17 +07:00
jnozsc 14bb191634 [travis] Add flake8 job (#23720) 2020-01-15 01:09:08 +07:00
Sergey M․ 628e5bc0b7
[canvas] Add support for new API endpoint and update tests (closes #17680, closes #18629) 2020-01-14 23:53:59 +07:00
Sergey M․ 3fc56635b7
[ndr:base:embed] Improve thumbnails extraction (closes #23731) 2020-01-14 21:46:56 +07:00
Remita Amine bd2c211fcc [vodplatform] add support for embed.kwikmotion.com domain 2020-01-12 17:34:57 +01:00
Remita Amine 10a5091e58 [twitter] add support for promo_video_website cards(closes #23711) 2020-01-12 12:01:59 +01:00
Sergey M․ aca2fd222f
[orf:radio] Clean description and improve extraction 2020-01-11 02:18:36 +07:00
Johannes N 9ba179c1fa [orf:fm4] Fix extraction (#23599) 2020-01-11 01:51:15 +07:00
cdarlint 3fdf573148 [safari] Fix kaltura session extraction (closes #23679) (#23670) 2020-01-11 01:34:26 +07:00
Remita Amine d4e0cd69ef [lego] fix extraction and extract subtitle(closes #23687) 2020-01-10 05:06:45 +01:00
Remita Amine 483b858d49 [cloudflarestream] import embed URL extraction 2020-01-08 23:07:41 +01:00
Remita Amine a71c1d1a5a [cloudflarestream] improve extraction
- add support for bytehighway.net domain
- add support for signed URLs
- extract thumbnail
2020-01-08 22:42:53 +01:00
Remita Amine 838171630d [naver] improve metadata extraction 2020-01-08 12:55:33 +01:00
Remita Amine c88debff5d [naver] improve extraction
- improve geo-restriction handling
- extract automatic captions
- extract uploader metadata
- extract VLive HLS formats
2020-01-08 10:59:56 +01:00
Singwai Chan 3cb05b86de [pandatv] Remove extractor (#23630) 2020-01-07 21:11:03 +07:00
Remita Amine b2771a2853 [dctp] fix format extraction(closes #23656) 2020-01-07 13:03:32 +01:00
Remita Amine 7bac77413d [scrippsnetworks] correct test case URL 2020-01-06 14:30:02 +01:00
Remita Amine 0264903574 [scrippsnetworks] add support for www.discovery.com videos 2020-01-06 14:25:54 +01:00
Remita Amine 2f7aa680b7 [discovery] fix anonymous token extraction(closes #23650) 2020-01-06 14:25:54 +01:00
Roxedus 0d2306d02b [nrktv:seriebase] Fix extraction (closes #23625) (#23537) 2020-01-06 06:34:36 +07:00
Remita Amine 233826f68f [wistia] improve format extraction and extract subtitles(closes #22590) 2020-01-05 21:09:37 +01:00
nmeum 259ad38173 [devscripts/create-github-release] Remove unused import 2020-01-06 01:26:22 +07:00
Remita Amine 44b434e4e3 [vice] improve extraction(closes #23631) 2020-01-05 16:33:21 +01:00
Sergey M․ 484637a9cc
[redtube] Detect private videos (#23518) 2020-01-02 22:45:42 +07:00
Sergey M․ ca069f6881
release 2020.01.01 2020-01-01 05:24:58 +07:00
Sergey M․ 0d5c415e1f
[devscripts/create-github-release] Switch to using PAT for authentication
Basic authentication will be deprecated soon
2020-01-01 05:20:48 +07:00
Sergey M․ d6bf9cbd46
[ChangeLog] Actualize
[ci skip]
2020-01-01 04:13:32 +07:00
Remita Amine de7aade2f8 [soundcloud] fix client id extraction for non fatal requests 2019-12-31 21:31:22 +01:00
Remita Amine 2d30b92e11 [brightcove] invalidate policy key cache on failing requests 2019-12-31 19:49:01 +01:00
Sergey M․ 0164cd5dac
[pornhub] Improve locked videos detection (closes #22449, closes #22780) 2019-12-31 23:43:43 +07:00
Sergey M․ f41347260c
[pornhub] Fix extraction and add support for m3u8 formats (closes #22749, closes #23082) 2019-12-31 23:29:06 +07:00
Remita Amine 0606808746 [brightcove] update policy key on failing requests 2019-12-31 16:44:30 +01:00
Sergey M․ 0a02732b56
[spankbang] Improve removed video detection (#23423) 2019-12-31 22:18:01 +07:00
Sergey M․ 2b845c4086
[spankbang] Fix extraction (closes #23307, closes #23423, closes #23444) 2019-12-31 22:16:39 +07:00
Remita Amine 3bed621750 [soundcloud] automatically update client id on failing requests 2019-12-31 09:49:29 +01:00
Remita Amine 0c15a56f1c [prosiebensat1] improve geo restriction handling(closes #23571) 2019-12-30 22:31:11 +01:00
Remita Amine 75ef77c1b1 [brightcove] cache brightcove player policy keys 2019-12-29 19:31:17 +01:00
Remita Amine cb7e053e0a [extractors] add missing import for ScrippsNetworksIE 2019-12-29 19:31:17 +01:00
Sergey M․ 941e359e95
[teachable] Fail with error message if no video URL found 2019-12-27 00:26:12 +07:00
Sergey M․ f8a12427a9
[teachable] Improve locked lessons detection (#23528) 2019-12-27 00:18:37 +07:00
Remita Amine 7ea55819ac [scrippsnetworks] Add new extractor(closes #19857)(closes #22981) 2019-12-26 15:25:04 +01:00
Remita Amine 18ff573e50 [mitele] fix extraction(closes #21354)(closes #23456) 2019-12-25 20:02:31 +01:00
Sergey M․ d1b2722095
[soundcloud] Update client id (closes #23516) 2019-12-25 22:39:50 +07:00
Sergey M․ 278be57be2
[mailru] Relax _VALID_URLs (#23509) 2019-12-25 04:28:34 +07:00
Sergey M․ 80e43af5bf
release 2019.12.25 2019-12-25 01:16:49 +07:00
Sergey M․ b1a92520a3
[ChangeLog] Actualize
[ci skip]
2019-12-25 00:52:11 +07:00
Sergey M․ 9b6e72fd06
[mediaset] Fix parse formats (closes #23508) 2019-12-24 23:51:08 +07:00
Sergey M․ 2dbc0967f2
[ChangeLog] Actualize
[ci skip]
2019-12-16 00:40:34 +07:00
Sergey M․ fab01080f4
[tv2dk:bornholm:play] Add extractor (closes #23291) 2019-12-16 00:08:18 +07:00
Sergey M․ 42db58ec73
[utils] Improve str_to_int 2019-12-15 23:15:24 +07:00
Remita Amine 73d8f3a634 [slideslive] add support for url and vimeo service names(closes #23414) 2019-12-14 21:35:31 +01:00
Remita Amine b33a05d221 [slideslive] fix extraction(closes #23413) 2019-12-14 19:29:04 +01:00
Remita Amine 232ed8e6e0 [twitch] fix clip extraction(closes #23375) 2019-12-13 11:00:31 +01:00
Remita Amine cf80ff186e [soundcloud] add support for token protected embeds(#18954) 2019-12-09 14:38:12 +01:00
Remita Amine 0e6ec3caf6 [vk] improve extraction
- fix User Videos extraction(closes #23356)
- extract all videos for lists with more than 1000 videos(#23356)
- add support for video albums(closes #14327)(closes #14492)
2019-12-09 09:13:02 +01:00
Remita Amine d686cab084 [kontrtube] remove extractor 2019-12-08 12:38:21 +01:00
Remita Amine 9d4424afaa [videopremium] remove extractor 2019-12-08 11:54:16 +01:00
Remita Amine ce709fcb00 [musicplayon] remove extractor(closes #9225) 2019-12-07 20:17:30 +01:00
Remita Amine 6633103f8e [ufctv] add support for ufcfightpass.imgdge.com and ufcfightpass.imggaming.com domains(closes #23343) 2019-12-07 19:23:19 +01:00
Remita Amine 1d31b7ca04 [twitch] extract m3u8 formats frame rate(closes #23333) 2019-12-06 15:34:35 +01:00
Remita Amine 4067a23270 [ufctv] add support for more domains and remove compatibility code(closes #23332) 2019-12-06 11:04:12 +01:00
Remita Amine 7d53fa475a [imggaming] add support for playlists and extract subtitles 2019-12-04 20:56:23 +01:00
Remita Amine 3ae878605d [ufctv] fix extraction and add support for UFC Arabia(closes #23312) 2019-12-04 17:20:53 +01:00
Remita Amine 22974a3782 [yahoo] correct gyao brightcove player id(closes #23303) 2019-12-03 21:13:44 +01:00
Remita Amine 63fe44eb4d [vzaar] update test 2019-12-03 12:31:16 +01:00
Remita Amine c712b16dc4 [vzaar] override AES decryption key URL(closes #17521) 2019-12-03 12:23:08 +01:00
Remita Amine 6797de75e0 [vzaar] add support for AES HLS manifests(closes #17521)(closes #23299) 2019-12-03 11:37:30 +01:00
Remita Amine 12cc89122d [nrl] fix extraction 2019-11-30 23:50:28 +01:00
Remita Amine 3765284476 [teachingchannel] fix extraction 2019-11-30 23:49:45 +01:00
Remita Amine ddfe50195b [nintendo] fix extraction and partially add support for Nintendo Direct videos(#4592) 2019-11-30 23:48:26 +01:00
Remita Amine 1ed2c4b378 [ooyala] add better fallback values for domain and streams variables 2019-11-30 23:21:13 +01:00
Remita Amine 66b4872747 [youtube] add support youtubekids.com(closes #23272) 2019-11-30 17:51:34 +01:00
Remita Amine 0b25af9bf5 [tv2] detect DRM protection 2019-11-30 15:50:17 +01:00
Remita Amine 8d3a3a9901 [tv2] add support for mtv.fi and fix tv2.no article extraction(closes #10543) 2019-11-30 15:26:12 +01:00
Remita Amine c0b1e01330 [msn] improve extraction
- add support for YouTube and NBCSports embeds
- add support for aricles with multiple videos
- improve AOL embed support
- improve format extraction
2019-11-29 17:39:18 +01:00
Remita Amine 88a7a9089a [abcotvs] relax _VALID_URL regex and improve metadata extraction(closes #18014) 2019-11-29 17:39:18 +01:00
Remita Amine a15adbe461 [channel9] reduce response size and update tests 2019-11-29 17:39:18 +01:00
Remita Amine 7f641d2c7a [adobetv] improve extaction
- use OnDemandPagedList for list extractors
- reduce show extraction requests
- extract original video format and subtitles
- add support for adobe tv embeds
2019-11-29 17:39:18 +01:00
Remita Amine 348c6bf1c1 [utils] handle int values passed to str_to_int 2019-11-29 17:39:18 +01:00
Sergey M․ b568561eba
release 2019.11.28 2019-11-28 23:25:25 +07:00
Sergey M․ e3f00f139f
[ChangeLog] Actualize
[ci skip]
2019-11-28 23:09:48 +07:00
Remita Amine 681ac7c92a [vimeo] improve extraction
- fix review extraction
- fix ondemand extraction
- make password protected player case as an expected error(closes #22896)
- simplify channel based extractors code
2019-11-27 13:57:30 +01:00
Remita Amine 6471d0d3b8 [openload] remove OpenLoad related extractors(closes #11999)(closes #15406) 2019-11-26 23:57:37 +01:00
Remita Amine 5ef62fc4ce [dailymotion] improve extraction
- extract http formats included in m3u8 manifest
- fix user extraction(closes #3553)(closes #21415)
- add suport for User Authentication(closes #11491)
- fix password protected videos extraction(closes #23176)
- respect age limit option and family filter cookie value(closes #18437)
- handle video url playlist query param
- report alowed countries for geo-restricted videos
2019-11-26 22:18:21 +01:00
Remita Amine df65a4a1ed [corus] improve extraction
- add support for Series Plus, W Network, YTV, ABC Spark, disneychannel.com
  and disneylachaine.ca(closes #20861)
- add support for self hosted videos(closes #22075)
- detect DRM protection(closes #14910)(closes #9164)
2019-11-26 22:18:21 +01:00
Sergey M․ edc2a1f68b
[vivo] Fix extraction (closes #22328, closes #22279) 2019-11-27 02:28:06 +07:00
Sergey M․ 1ced222120
[utils] Add generic caesar cipher and rot47 2019-11-27 02:26:42 +07:00
InfernalUnderling 6ddd4bf6ac [bitchute] Extract upload date (closes #22990) (#23193) 2019-11-27 00:20:39 +07:00
InfernalUnderling 9d30c2132a [utils] Handle rd-suffixed day parts in unified_strdate (#23199) 2019-11-27 00:08:37 +07:00
Sergey M․ cf3c9eafad
[soundcloud] Update client id (closes #23214) 2019-11-27 00:03:51 +07:00
Sergey M․ 0de9fd24dc
release 2019.11.22 2019-11-22 01:24:27 +07:00
Sergey M․ fb8dfc5a27
[ChangeLog] Actualize
[ci skip]
2019-11-22 01:21:00 +07:00
Sergey M․ 80a51fc2ef
[ivi] Skip s353 for bundled exe
See https://github.com/Legrandin/pycryptodome/issues/228
2019-11-22 01:10:24 +07:00
Sergey M․ f8015c1574
[ivi] Fix python 3.4 support 2019-11-21 23:38:39 +07:00
Sergey M․ 25d3f770e6
[ivi] Ask for pycryptodomex instead of pycryptodome
See discussion at 1bba88efc7 (r35982110)
2019-11-21 23:22:59 +07:00
Sergey M․ f0f6a7e73f
[chaturbate] Fix extraction (closes #23010, closes #23012) 2019-11-21 23:21:03 +07:00
Remita Amine 76d9eca43d [ivi] fallback to old extraction method for unknown error codes 2019-11-19 20:16:31 +01:00
Remita Amine f9c4a45210 [ntvru] add support for non relative file URLs(closes #23140) 2019-11-18 21:40:53 +01:00
Remita Amine 7e70620a34 [vk] fix wall audio thumbnails extraction(closes #23135) 2019-11-18 12:51:25 +01:00
Remita Amine 9e4e864639 [ivi] improve error detection 2019-11-16 01:51:48 +01:00
Sergey M․ 6c79785bb0
[travis] Add python 3.8 build 2019-11-16 07:47:23 +07:00
Sergey M․ 7360c06fac
[extractor/common] Add data, headers and query to all major extract methods preserving standard order for potential future use 2019-11-16 05:55:54 +07:00
Remita Amine 1bba88efc7 [ivi] sign content request only when pycryptodome is available 2019-11-15 23:46:31 +01:00
Remita Amine 656c20010f [ivi] fix format extraction(closes #21991) 2019-11-15 21:17:47 +01:00
Remita Amine 8b1a30c993 [comcarcoff] remove extractor 2019-11-14 06:39:21 +01:00
Sergey M․ 5709d661a2
[drtv] Add support for new URL schema (closes #23059) 2019-11-14 01:45:04 +07:00
Remita Amine eb22d1b557 [nexx] Add support for Multi Player JS Setup(closes #23052) 2019-11-13 19:09:32 +01:00
Remita Amine 48970d5cc8 [teamcoco] add support for new videos(closes #23054) 2019-11-12 10:51:54 +01:00
Remita Amine 2e9ad59a4d [soundcloud] check if the soundtrack has downloads left(closes #23045) 2019-11-11 09:53:04 +01:00
Remita Amine 433e071058 [facebook] fix posts video data extraction(closes #22473) 2019-11-10 17:02:47 +01:00
Remita Amine 9e46d1f8aa [addanime] remove extractor 2019-11-09 17:15:15 +01:00
Remita Amine 88b87b08b1 [minhateca] remove extractor 2019-11-09 17:01:21 +01:00
Remita Amine 20baa17c01 [daisuki] remove extractor 2019-11-09 16:00:12 +01:00
Remita Amine 8fbf5d2f87 [seeker] remove Revision3 extractors and fix extraction 2019-11-09 13:14:23 +01:00
Remita Amine f81dd65ba2 [extractor/common] clean jwplayer description HTML tags 2019-11-09 13:11:59 +01:00
Remita Amine ce112a8c19 [twitch] fix video comments URL(#18593)(closes #15828) 2019-11-09 11:01:07 +01:00
Remita Amine 18ca61c5e1 [twitter] improve extraction
- add support for generic embeds(closes #22168)
- always extract http formats for native videos(closes #14934)
- add support for Twitter Broadcasts(closes #21369)
- extract more metadata
- improve VMap format extraction
- unify extraction code for both twitter statuses and cards
2019-11-09 09:23:20 +01:00
Remita Amine 0b16b3c2d3 [twitch] add support for Clip embed URLs 2019-11-09 09:22:24 +01:00
Remita Amine d4f53af482 [lnkgo] fix extraction(closes #16834) 2019-11-06 23:14:26 +01:00
Remita Amine 5d92b407e0 [mixcloud] improve extraction
- improve metadata extraction(closes #11721)
- fix playlist extraction(closes #22378)
- fix user mixes extraction(closes #15197)(closes #17865)
2019-11-06 20:41:49 +01:00
Remita Amine 55adb63e54 [kinja] add support for Kinja embeds
closes #5756
closes #11282
closes #22237
closes #22384
2019-11-06 19:56:10 +01:00
Remita Amine d64ec1242e [onionstudios] fix extraction 2019-11-06 10:44:19 +01:00
Remita Amine 3ec86619e3 [common] initialize headers param with empty dict 2019-11-06 07:18:29 +01:00
Remita Amine 57033e35e5 [common] fix typo 2019-11-05 23:41:57 +01:00
Remita Amine d7def23d05 [hotstar] pass Referer header to format requests(closes #22836) 2019-11-05 23:08:42 +01:00
Remita Amine b6139cb0c3 [common] pass headers to _extract_(m3u8|mpd)_formats methods 2019-11-05 22:56:25 +01:00
Remita Amine 2318629b2b [dplay] minimize response size 2019-11-05 14:04:50 +01:00
Remita Amine b77c3949e8 [patreon] minimize reponse size and extract uploader_id and filesize 2019-11-05 14:04:17 +01:00
Remita Amine e9b95167af [roosterteeth] fix login request(closes #16094)(closes #22689) 2019-11-05 10:06:02 +01:00
Sergey M․ ea07412ebf
release 2019.11.05 2019-11-05 05:32:56 +07:00
Sergey M․ 1a4e4b0bfe
[ChangeLog] Actualize
[ci skip]
2019-11-05 05:31:40 +07:00
Sergey M․ 20218040db
[scte] Add extractor (closes #22975) 2019-11-05 05:21:16 +07:00
Remita Amine c69e71733d [msn] add support for Vidible and AOL embeds(closes #22195)(closes #22227) 2019-11-04 22:21:00 +01:00
Remita Amine 3e49083604 [myspass] fix video URL extraction and improve metadata extraction(closes #22448) 2019-11-04 20:05:27 +01:00
Remita Amine 2349255abd [jamendo] restore track url modification 2019-11-04 15:51:44 +01:00
Remita Amine e452345fc5 [jamendo] improve extraction
- fix album extraction(closes #18564)
- improve metadata extraction(closes #18565)(closes #21379)
2019-11-04 15:43:52 +01:00
Remita Amine bf45295c53 [mediaset] relax URL guid matching(closes #18352) 2019-11-04 11:13:14 +01:00
Remita Amine ef382405c6 [mediaset] extract unprotected M3U and MPD manifests(closes #17204) 2019-11-04 02:02:29 +01:00
Manu Cornet a6e6673e82 [README.md] Also read permission to the binary in how to update section (#22903) 2019-11-04 04:23:27 +07:00
Remita Amine 564275e26f [telegraaf] fix extraction 2019-11-03 22:04:03 +01:00
Remita Amine 726e8eef59 [bellmedia] add support for marilyn.ca videos(#22193) 2019-11-02 22:33:51 +01:00
Remita Amine e54924c46f [stv] fix extraction(closes #22928) 2019-11-02 18:13:31 +01:00
Remita Amine 5e36b63486 [iconosquare] remove extractor 2019-11-02 13:25:39 +01:00
Remita Amine 9249c50c18 [keek] remove extractor 2019-11-02 13:09:44 +01:00
geditorit 79b35e7c15 [gameone] Remove extractor (#21778) 2019-11-02 11:32:49 +00:00
Remita Amine 836bfcb54e [flipagram] remove extractor 2019-11-02 11:08:51 +01:00
Remita Amine 4c95fcf9e8 [bambuser] remove extractor
https://web.archive.org/web/20190808014227/https://go.bambuser.com/shutdown-announcement
2019-11-01 21:16:47 +01:00
Remita Amine 152f22920d [wistia] reduce embed extraction false positives and support inline embeds(closes #22931) 2019-11-01 17:44:34 +01:00
Remita Amine 20cc7c082b [go90] remove extractor 2019-11-01 16:36:35 +01:00
Remita Amine e987ce4bda [kakao] remove raw request and extract format total bitrate 2019-11-01 12:40:41 +01:00
Remita Amine d439989215 [daum] fix VOD and Clip extracton(closes #15015) 2019-11-01 11:43:18 +01:00
Remita Amine 274bf5e4c5 [kakao] improve extraction
- support embed URLs
- support Kakao Legacy vid based embed URLs
- only extract fields used for extraction
- strip description and extract tags
2019-11-01 11:37:41 +01:00
Remita Amine e993f1a095 [mixcloud] fix cloudcast data extraction(closes #22821) 2019-10-31 08:13:10 +01:00
Remita Amine 3cf70bf159 [yahoo] make cbs URL suffix part of the media alias 2019-10-31 07:44:21 +01:00
Remita Amine 237513e801 [yahoo] restore support for cbs suffixed URLs 2019-10-31 07:38:53 +01:00
Remita Amine 8040a0d35e [yahoo] fix typo 2019-10-30 23:52:09 +01:00
Remita Amine 45f4a43389 [yahoo] improve extraction
- add support for live streams(closes #3597)(closes #3779)(closes #22178)
- bypass cookie consent page for european domains(closes #16948)(closes #22576)
- add generic support for embeds(closes #20332)
2019-10-30 23:24:49 +01:00
Sergey M․ 9a621ddc3a
[tv2] Fix and improve extraction (closes #22787) 2019-10-30 02:21:52 +07:00
Sergey M․ c56b2ac43c
[tv2dk] Add extractor 2019-10-30 02:21:03 +07:00
Remita Amine 8989349e6d [onet] improve extraction
- add support for onet100.vod.pl domain
- extract m3u8 formats
- correct audio only format info
2019-10-29 09:50:01 +01:00
Remita Amine 7455832f31 [fox9] fix extraction 2019-10-29 09:50:00 +01:00
Sergey M․ c4bd9cb7bb
release 2019.10.29 2019-10-29 06:12:33 +07:00
Sergey M․ cae0bbc538
[ChangeLog] Actualize
[ci skip]
2019-10-29 06:11:09 +07:00
Sergey M․ 53896ca5be
[utils] Actualize major IPv4 address blocks per country 2019-10-29 06:10:20 +07:00
Sergey M․ 0d7392e68b
[ChangeLog] Actualize
[ci skip]
2019-10-29 05:54:32 +07:00
Sergey M․ aef9f87ea4
[go] Improve and beautify _VALID_URL 2019-10-29 05:52:15 +07:00
Sergey M․ dd90a21c28
[go] Add support for abc.com and freeform.com (closes #22823, closes #22864) 2019-10-29 05:49:36 +07:00
Remita Amine 01358b9fc1 [extractors] add import for MTVJapanIE 2019-10-28 23:34:31 +01:00
Remita Amine 3cdcebf547 [mtv] add support for mtvjapan.com 2019-10-28 23:31:14 +01:00
Remita Amine cfabc50598 [mtv] fix extraction for mtv.de (closes #22113) 2019-10-28 22:55:01 +01:00
Remita Amine 0086726e86 [videodetective] fix extraction 2019-10-28 19:48:34 +01:00
Remita Amine 83e49259bf [internetvideoarchive] fix extraction 2019-10-28 19:47:27 +01:00
Remita Amine 895e5c03db [nbcnews] fix extraction
closes #12569
closes #12576
closes #21703
closes #21923
2019-10-28 19:31:20 +01:00
Remita Amine 702984eca9 [hark] remove extractor 2019-10-28 17:49:05 +01:00
Remita Amine b3c2fa6dad [tutv] remove extractor 2019-10-28 17:42:33 +01:00
Remita Amine 831b732da1 [learnr] remove extractor 2019-10-28 17:41:17 +01:00
Remita Amine 3e252cca0e [macgamestore] remove extractor
Covered by generic extractor
2019-10-28 17:39:01 +01:00
Remita Amine 0f9d53566a [la7] update Kaltura service URL(closes #22358) 2019-10-28 15:17:06 +01:00
Remita Amine 80c2126e80 [thesun] fix extraction(closes #16966) 2019-10-28 13:32:35 +01:00
Remita Amine 71fa0b04f9 [makertv] remove extractor 2019-10-28 13:30:30 +01:00
Remita Amine dd90451f0f [tenplay] Add new extractor(closes #21446) 2019-10-27 22:02:46 +01:00
Remita Amine 548c395716 [soundcloud] improve extraction
- improve format extraction(closes #22123)
- extract uploader_id and uploader_url(closes #21916)
- extract all known thumbnails(closes #19071)(closes #20659)
- fix extration for private playlists(closes #20976)
- add support for playlist embeds(#20976)
- skip preview formats(closes #22806)
2019-10-27 17:52:46 +01:00
Remita Amine 0b98f3a751 [dplay] improve extraction
- add support for dplay.fi, dplay.jp and es.dplay.com(closes #16969)
- fix it.dplay.com extraction(closes #22826)
- update tests
- extract creator, tags and thumbnails
- handle playback API call errors
2019-10-26 14:58:29 +01:00
Remita Amine 235dbb434b [discoverynetworks] add support for dplay.co.uk 2019-10-26 14:57:42 +01:00
Remita Amine 42cd0824b3 [vk] remove assert statement 2019-10-26 00:06:05 +01:00
Remita Amine 3c989818e7 [vk] improve extraction
- add support for Odnoklassniki embeds
- update tests
- extract more video from user lists(closes #4470)
- fix wall post audio extraction(closes #18332)
- improve error detection(closes #22568)
2019-10-25 19:35:07 +01:00
Remita Amine 416c3ca7f5 [odnoklassniki] add support for Schemeless embed extraction 2019-10-25 19:27:28 +01:00
Remita Amine 162bcc68dc [puhutv] improve extraction
- fix subtitles extraction
- transform HLS URLs to http URLs
- improve metadata extraction
2019-10-24 12:53:33 +01:00
Remita Amine 07154c7930 [facebook] extract subtitles(closes #22777) 2019-10-22 17:59:14 +01:00
Remita Amine 0c2d10d225 [globo] handle alternative hash signing method 2019-10-22 17:59:14 +01:00
Sergey M․ 820215f0e3
release 2019.10.22 2019-10-22 00:09:02 +07:00
Sergey M․ b4818e3c7a
[ChangeLog] Actualize
[ci skip]
2019-10-22 00:06:48 +07:00
Sergey M․ 2297c0d7d9
[facebook] Bypass download rate limits (closes #21018) 2019-10-19 23:56:36 +07:00
Sergey M․ 824fa51165
[utils] Improve subtitles_filename (closes #22753) 2019-10-18 04:03:53 +07:00
Remita Amine 34e3885bc9 [viewster->contv] remove viewster extractor and add support for contv.com 2019-10-17 15:55:44 +01:00
Remita Amine 59296bae7e [xfileshare] clean extractor
- update the list of domains
- add support for aa-encoded video data
- improve jwplayer format extraction
- add support for Clappr sources

closes #17032
closes #17906
closes #18237
closes #18239
2019-10-17 13:26:45 +01:00
Remita Amine 755541a4c8 [mangomolo] fix video format extraction and add support for player URLs 2019-10-17 13:21:44 +01:00
Remita Amine 86f63633c8 [audioboom] improve metadata extraction 2019-10-17 13:20:16 +01:00
Remita Amine 0001157594 [atresplayer] Add coding cookie 2019-10-16 23:57:40 +01:00
MobiDotS bc48773ed4 [twitch] update VOD URL matching (closes #22395) (#22727) 2019-10-16 15:13:35 +00:00
Remita Amine d07866f13e [mit] Remove support for video.mit.edu(closes #22403) 2019-10-16 15:45:45 +01:00
Remita Amine 2b115b9460 [servingsys] Remove extractor(closes #22639) 2019-10-16 15:41:58 +01:00
Remita Amine e29e96a9f5 [dumpert] fix extraction(closes #22428)(closes #22564) 2019-10-16 15:06:48 +01:00
Remita Amine 6d394a66f5 [atresplayer] fix extraction(closes #16277)(closes #16716) 2019-10-16 12:04:52 +01:00
Sergey M․ 7815d6b743
release 2019.10.16 2019-10-16 03:26:47 +07:00
Sergey M․ 173190f5e3
[ChangeLog] Actualize
[ci skip]
2019-10-16 03:25:13 +07:00
Remita Amine 974311b5aa [vimeo] improve album videos id extraction(closes #22599) 2019-10-15 21:01:59 +01:00
Remita Amine 30eb05cb41 [globo] extract subtitles(closes #22713) 2019-10-15 19:54:53 +01:00
Remita Amine 2af01c0293 [bokecc] improve player params extraction(closes #22638) 2019-10-15 15:18:51 +01:00
Remita Amine 7e05df71b7 [nexx] handle result list(closes #22666) 2019-10-15 00:10:22 +01:00
Remita Amine a1ee23e98f [vimeo] fix VHX embed extraction 2019-10-14 18:37:35 +01:00
Remita Amine 311ee45731 [nbc] switch to graphql api(closes #18581)(closes #22693)(closes #22701) 2019-10-14 18:36:25 +01:00
Remita Amine c317b6163b [vessel] remove extractor 2019-10-10 00:01:51 +01:00
Sergey M․ 2765c47a8c
[promptfile] Remove extractor (closes #6239) 2019-10-10 03:40:01 +07:00
Sergey M․ 07b50f616e
[kaltura] Fix service URL extraction (closes #22658) 2019-10-10 00:24:03 +07:00
Sergey M․ 1907f06e7b
[kaltura] Fix embed info strip (refs #22658) 2019-10-10 00:11:41 +07:00
Remita Amine d4bb825b83 [globo] fix format extraction(closes #20319) 2019-10-09 11:08:28 +01:00
Sergey M․ 560d3b7d7c
[redtube] Improve metadata extraction (closes #22492, closes #22615) 2019-10-05 22:04:49 +07:00
Sergey M․ 4bf568d36c
[pornhub:uservideos:upload] Fix extraction (closes #22619) 2019-10-05 21:43:31 +07:00
Sergey M․ 05446d483d
[telequebec:squat] Add support for squat.telequebec.tv (closes #18503) 2019-10-04 20:17:18 +07:00
bitraid 3a37f2c3be [wimp] Remove extractor (closes #22088) (#22091) 2019-10-04 19:48:20 +07:00
Anh Nhan Nguyen 0b87beefe6 [gfycat] Extend _VALID_URL (#22225) 2019-10-04 19:27:58 +07:00
axelerometer fd4db1ebc2 [chaturbate] Extend _VALID_URL (#22309) 2019-10-04 19:22:01 +07:00
Andrew Morgan b64045cd2a [peertube] Update instances (#22414) 2019-10-04 19:17:16 +07:00
Patrice Levesque c2915de82e [telequebec] Add support for coucou.telequebec.tv (#22482) 2019-10-04 19:14:31 +07:00
Stephan 4e72d02f39 [xvideos] Extend _VALID_URL (#22471) 2019-10-04 19:05:35 +07:00
sofutru 76e510b92c [youtube] Remove support for invidious.enkirton.net (#22543) 2019-10-04 19:01:03 +07:00
kr4ssi 9679a62a28 [openload] Add support for oload.monster (#22592) 2019-10-04 18:57:51 +07:00
Martin Polden ca20b13048 [nrktv:seriebase] Fix extraction (#22596) 2019-10-04 18:57:18 +07:00
sofutru 894b3826f5 [youtube] Add support for yt.lelux.fi (#22597) 2019-10-04 18:52:15 +07:00
Sergey M․ aaf9d904aa
[orf:tvthek] Make manifest requests non fatal (refs #22578) 2019-10-03 00:55:46 +07:00
Sergey M․ 25e911a968
[extractor/common] Make _is_valid_url more relaxed 2019-10-03 00:53:07 +07:00
Sergey M․ 74bc299453
[teachable] Skip login when already logged in (closes #22572) 2019-10-02 02:03:22 +07:00
Sergey M․ 2906631e12
[viewlift] Fix URL matching 2019-10-01 23:18:11 +07:00
Sergey M․ 326ae4ff96
[viewlift] Improve extraction (closes #22545) 2019-09-29 23:03:39 +07:00
Sergey M․ 72fd4d0c6a
[nonktube] Fix extraction (closes #22544) 2019-09-29 21:57:08 +07:00
Sergey M․ f4b865c613
release 2019.09.28 2019-09-28 00:30:30 +07:00
Sergey M․ 412f44f4b3
[ChangeLog] Actualize
[ci skip]
2019-09-28 00:23:25 +07:00
Sergey M․ 6483fbd336
[vk] Fix extraction (closes #22522) 2019-09-28 00:04:52 +07:00
Sergey M․ 8130ac42e5
[openload] PEP 8 2019-09-26 23:15:06 +07:00
Sergey M․ cb3e4a2947
[heise] Fix kaltura embeds extraction (closes #22514) 2019-09-26 23:11:02 +07:00
Remita Amine 2a88a0c44d [ted] check for resources validity and extract subtitled downloads(closes #22513) 2019-09-26 11:44:57 +01:00
sofutru 33c1c7d80f [youtube] Add support for owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya.b32.i2p (#22292) 2019-09-25 02:43:34 +07:00
Sergey M․ 21d3c21e62
[nhk] Add support for clips 2019-09-25 02:39:25 +07:00
Remita Amine a373befa25 [nhk] fix video extraction(closes #22249)(closes #22353) 2019-09-24 20:24:17 +01:00
Sergey M․ df63cafe49
[byutv] Fix extraction (refs #22070)
Downloading of new videos does not work due to DRM
2019-09-25 02:16:25 +07:00
Sergey M․ d06daf23da
[YoutubeDL] Honour all --get-* options with --flat-playlist (closes #22493) 2019-09-25 02:10:37 +07:00
smed79 8e9fdcbe27 [openload] Add support for oload.online (#22304) 2019-09-24 23:56:12 +07:00
sofutru 666d808e70 [youtube] Add support for invidious.drycat.fr (#22451) 2019-09-24 23:16:46 +07:00
ipaha 7d327fea5b [jwplatfom] do not match video URLs(#20596) (#22148) 2019-09-23 19:44:00 +00:00
Sergey M․ 4e3f1f0469
[youtube:playlist] Unescape playlist uploader (closes #22483) 2019-09-23 00:20:52 +07:00
Remita Amine 4bc15a68d1 [bilibili] add support audio albums and songs(closes #21094) 2019-09-22 17:14:18 +01:00
Remita Amine edb2820ca5 [instagram] add support for tv URLs 2019-09-21 21:57:45 +01:00
Remita Amine 6cf6b357f5 [mixcloud] allow uppercase letters in format urls(closes #19280) 2019-09-20 11:14:24 +01:00
Remita Amine f455a934e9 [brightcove] delegate all supported BrightcoveLegacyIE URLs to BrightcoveNewIE
closes #11523
closes #12842
closes #13912
closes #15669
closes #16303
2019-09-19 18:02:26 +01:00
Sergey M․ d9d3098675
[hotstar] Use native HLS downloader by default 2019-09-19 03:03:07 +07:00
Sergey M․ 1cb812d3c2
[hotstar] Extract more formats (closes #22323) 2019-09-19 03:00:19 +07:00
Sergey M․ 6fd26a7d4a
[9now] Fix extraction (closes #22361) 2019-09-19 02:31:39 +07:00
Sergey M․ 9cf26b6e1d
[zdf] Bypass geo restriction 2019-09-19 01:11:52 +07:00
Sergey M․ 20e11b70ac
[tv4] Fix extraction and extract series metadata (closes #22443) 2019-09-18 23:45:26 +07:00
Sergey M․ e1f692f0b3
release 2019.09.12.1 2019-09-12 02:53:52 +07:00
Sergey M․ 2f851a7d7d
[ChangeLog] Actualize
[ci skip]
2019-09-12 02:48:07 +07:00
Sergey M․ 4878759f3b
[youtube] Remove quality and tbr for itag 43 (closes #22372) 2019-09-12 02:46:12 +07:00
Sergey M․ 303d3e142c
[ChangeLog] Actualize
[ci skip]
2019-09-12 02:05:54 +07:00
Sergey M․ bd10b229c0
release 2019.09.12 2019-09-12 01:21:21 +07:00
Sergey M․ 035c7a59e8
[ChangeLog] Actualize
[ci skip]
2019-09-12 01:18:25 +07:00
Sergey M․ bf1317d257
[youtube] Quick extraction tempfix (closes #22367, closes #22163) 2019-09-11 22:44:47 +07:00
sofutru bff90fc518 [youtube] Add support for invidious tor instances (#22268) 2019-09-03 01:35:32 +07:00
Sergey M․ 31dbd054c8
[platzi] Improve client data extraction (closes #22290) 2019-09-03 01:24:20 +07:00
Sergey M․ 66d04c74e0
[platzi:course] Add support for authentication 2019-09-03 01:23:22 +07:00
Patrick Dessalle d7da1e37c7 [nickjr] Add support for nickelodeonjunior.fr (#22246) 2019-09-02 00:59:57 +07:00
Sergey M․ f620d0d860
release 2019.09.01 2019-09-01 03:33:02 +07:00
Sergey M․ 79dd8884bb
[ChangeLog] Actualize
[ci skip]
2019-09-01 03:18:35 +07:00
Sergey M․ df228355fd
[xhamster:user] Add extractor (closes #16330, closes #18454) 2019-09-01 03:12:56 +07:00
Sergey M․ 8945b10f6e
[xhamster] Add support for more domains 2019-09-01 03:09:04 +07:00
Sergey M․ 7cb51b5daf
[extractor/generic] Improve squarespace detection and fix test (closes #21859, refs #21294, refs #21802) 2019-09-01 01:25:48 +07:00
Barbara Miller d78657fd18
[extractor/generic] Add support for squarespace embeds (closes #21294) 2019-09-01 01:25:48 +07:00
Sergey M․ cc73d5ad15
[openload] Fix domains regex 2019-09-01 01:25:48 +07:00
telephono 71f47617c8 [downloader/external] Respect mtime option for aria2c (#22242) 2019-09-01 00:24:43 +07:00
Remita Amine 3f46a25a97 [verystream] add support for woof.tube (closes #22217) 2019-08-31 10:02:09 +01:00
Sergey M․ 9d058b3206
[dailymotion] Add support for lequipe.fr (closes #21328, closes #22152) 2019-08-29 23:08:19 +07:00
Sergey M․ b500955a58
[openload] Add support for oload.vip (closes #22205) 2019-08-28 01:58:07 +07:00
Jay acc86c9a97
[bbc] Fix some tests 2019-08-28 01:53:40 +07:00
Jay b72305f078
[bbccouk] Extend _VALID_URL (closes #19200) 2019-08-28 01:53:40 +07:00
sofutru 494d664e67 [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223) 2019-08-28 01:39:59 +07:00
phan-ctrl d1fcf255c5 [safari] Fix authentication (closes #22161) (#22184) 2019-08-27 10:16:04 +07:00
Sergey M․ 183a18c4e7
[usanetwork] Fix extraction (closes #22105) 2019-08-26 03:38:54 +07:00
supritkumar 393cc31d5e [einthusan] Add support for einthusan.ca (#22171) 2019-08-21 09:52:59 +07:00
Sergey M․ 0add33abcb
[youtube] Improve unavailable message extraction (refs #22117) 2019-08-16 23:44:11 +07:00
Chuck Cho 0326bcb6c1 [piksel] add subtitle capability (#20506) 2019-08-15 22:14:47 +00:00
Sergey M․ def849e0e6
release 2019.08.13 2019-08-13 23:18:38 +07:00
Sergey M․ 69611a1616
[ChangeLog] Actualize
[ci skip]
2019-08-13 23:10:05 +07:00
Sergey M․ 351f37c022
[youtube:playlist] Improve flat extraction (closes #21927) 2019-08-13 05:02:52 +07:00
lightmare 3bce4ff7d9 [downloader/fragment] Fix ETA calculation of resumed download (#21992) 2019-08-11 06:57:43 +07:00
Remita Amine ffddb11264 [YoutubeDL] check annotations availabilty(closes #18582) 2019-08-09 08:19:41 +01:00
Remita Amine 64b6a4e91e [youtube] fix annotations extraction(closes #22045) 2019-08-09 08:16:53 +01:00
Remita Amine b3d39be239 [discovery] extract series meta field(#21808) 2019-08-08 23:23:58 +01:00
Sergey M․ 1357734978
[youtube] Improve error detection (#16445) 2019-08-06 02:32:44 +07:00
Remita Amine eb9c9c74a6 [vimeo] fix album extraction
closes #1933
closes #15704
closes #15855
closes #18967
closes #21986
2019-08-03 10:29:20 +01:00
Remita Amine 5efbc1366f [roosterteeth] add support for watch URLs 2019-08-02 19:38:35 +01:00
Remita Amine 995f319b06 [discovery] limit video data by show slug(closes #21980) 2019-08-02 18:08:26 +01:00
Sergey M d9d3a5a816
[README.md] Move code from #21939 to the right place 2019-08-02 05:54:56 +07:00
Sergey M․ 4f2d735803
release 2019.08.02 2019-08-02 05:37:54 +07:00
Sergey M․ 2e9522b061
[ChangeLog] Actualize
[ci skip]
2019-08-02 05:36:32 +07:00
Sergey M․ be306d6a31
[tvigle] Fix extraction and add support for HLS and DASH formats (closes #21967) 2019-08-02 05:25:01 +07:00
Sergey M․ 33b529fabd
[yandexvideo] Add support for DASH formats (#21971) 2019-08-02 05:03:25 +07:00
Kyle 07f3a05c87 [CONTRIBUTING.md] Add some more coding conventions (#21939) 2019-08-02 04:49:01 +07:00
Remita Amine 535111657b [discovery] use API call for video data extraction(#21808) 2019-08-01 22:45:10 +01:00
cantandwont 826dcff99c Output batch filename when it could not be read (#21915) 2019-08-01 03:54:39 +07:00
Sen Jiang 9a37ff82f1 [mgtv] Extract format_note (#21881)
format_note should now show 标清, 高清, 超清, 蓝光, etc.
2019-08-01 03:45:02 +07:00
Sergey M․ 766c4f6090
[tvn24] Fix test 2019-07-31 02:32:45 +07:00
Sergey M․ 7279163412
[tvn24] Fix metadata extraction (closes #21833, closes #21834) 2019-07-31 02:32:45 +07:00
CeruleanSky 07ab44c420 [dlive] Relax _VALID_URL (#21909) 2019-07-31 01:43:49 +07:00
smed79 2c8b1a21e8 [openload] Add support for oload.best (#21913) 2019-07-31 01:40:50 +07:00
Sergey M․ c2d125d99f
[youtube] Improve metadata extraction for age gate content (closes #21943) 2019-07-31 00:14:33 +07:00
Sergey M․ 85c2c4b4ab
release 2019.07.30 2019-07-30 09:43:47 +07:00
Sergey M․ 8614a03f9c
[ChangeLog] Actualize
[ci skip]
2019-07-30 09:41:23 +07:00
Remita Amine 8dbf751aa2 [youtube] improve title and description extraction(closes #21934) 2019-07-30 00:13:33 +01:00
Sergey M․ 90634acfcf
release 2019.07.27 2019-07-27 03:44:55 +07:00
Sergey M․ eaba9dd6c2
[ChangeLog] Actualize
[ci skip]
2019-07-27 03:43:33 +07:00
Kitten King 843ad1796b Fix typos (#21901) 2019-07-26 22:30:18 +07:00
Kyle 608b8a4300 [yahoo:japannews] Add extractor (closes #21698) (#21265) 2019-07-22 00:59:36 +07:00
Sergey M․ ab794a553c
[ctsnews] PEP 8 2019-07-21 14:59:53 +07:00
Remita Amine 3b446ab351 [discovery] add support go.discovery.com URLs 2019-07-20 20:20:53 +01:00
Sergey M․ 13a75688a5
[youtube] Fix some tests 2019-07-21 00:01:46 +07:00
Sergey M․ 2e18adec98
[youtube:playlist] Relax _VIDEO_RE (closes #21844) 2019-07-20 23:46:34 +07:00
Sergey M․ 9c1da4a9f9
[extractor/generic] Restrict --default-search schemeless URLs detection pattern (closes #21842) 2019-07-20 23:08:26 +07:00
Petr Vaněk 5e1c39ac85 [extractor/common] Fix typo in thumbnails resolution description (#21817) 2019-07-17 22:47:53 +07:00
Remita Amine 1824bfdcdf [vrv] fix CMS signing query extraction(closes #21809) 2019-07-16 22:51:10 +01:00
Sergey M․ 2f1991ff14
release 2019.07.16 2019-07-16 00:01:46 +07:00
Sergey M․ 8b4a0ebf10
[ChangeLog] Actualize
[ci skip]
2019-07-15 23:59:23 +07:00
Sergey M․ f61496863d
[asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv (closes #21281, closes #21290) 2019-07-15 23:58:08 +07:00
Sergey M․ 799756a3b3
[kaltura] Check source format URL (#21290) 2019-07-15 23:58:08 +07:00
chien-yu 7d4dd3e5b4 [ctsnews] Fix YouTube embeds extraction (#21678) 2019-07-15 23:03:03 +07:00
tlonic f2a213d025 [einthusan] Add support for einthusan.com (closes #21748) (#21775) 2019-07-15 22:58:55 +07:00
geditorit 791d2e8117 [youtube] Add support for invidious.mastodon.host (#21777) 2019-07-15 22:54:22 +07:00
Gary 2adedc477e [gfycat] Extend _VALID_URL (closes #21779) (#21780) 2019-07-15 22:53:20 +07:00
Sergey M․ 898238e9f8
[youtube] Restrict is_live extraction (closes #21782) 2019-07-14 20:30:05 +07:00
258 changed files with 12140 additions and 10187 deletions

View file

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.14. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2019.07.14**
- [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.07.14
[debug] youtube-dl version 2020.09.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View file

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.14. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2019.07.14**
- [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View file

@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.14. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2019.07.14**
- [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View file

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.14. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2019.07.14**
- [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.07.14
[debug] youtube-dl version 2020.09.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View file

@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.14. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.20. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2019.07.14**
- [ ] I've verified that I'm running youtube-dl version **2020.09.20**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View file

@ -13,7 +13,7 @@ dist: trusty
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
matrix:
jobs:
include:
- python: 3.7
dist: xenial
@ -21,6 +21,12 @@ matrix:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
@ -29,6 +35,11 @@ matrix:
env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
- name: flake8
python: 3.8
dist: xenial
install: pip install flake8
script: flake8 .
fast_finish: true
allow_failures:
- env: YTDL_TEST_SET=download

View file

@ -153,7 +153,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
@ -339,6 +339,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

777
ChangeLog
View file

@ -1,3 +1,776 @@
version 2020.09.20
Core
* [extractor/common] Relax interaction count extraction in _json_ld
+ [extractor/common] Extract author as uploader for VideoObject in _json_ld
* [downloader/hls] Fix incorrect end byte in Range HTTP header for
media segments with EXT-X-BYTERANGE (#14748, #24512)
* [extractor/common] Handle ssl.CertificateError in _request_webpage (#26601)
* [downloader/http] Improve timeout detection when reading block of data
(#10935)
* [downloader/http] Retry download when urlopen times out (#10935, #26603)
Extractors
* [redtube] Extend URL regular expression (#26506)
* [twitch] Refactor
* [twitch:stream] Switch to GraphQL and fix reruns (#26535)
+ [telequebec] Add support for brightcove videos (#25833)
* [pornhub] Extract metadata from JSON-LD (#26614)
* [pornhub] Fix view count extraction (#26621, #26614)
version 2020.09.14
Core
+ [postprocessor/embedthumbnail] Add support for non jpg/png thumbnails
(#25687, #25717)
Extractors
* [rtlnl] Extend URL regular expression (#26549, #25821)
* [youtube] Fix empty description extraction (#26575, #26006)
* [srgssr] Extend URL regular expression (#26555, #26556, #26578)
* [googledrive] Use redirect URLs for source format (#18877, #23919, #24689,
#26565)
* [svtplay] Fix id extraction (#26576)
* [redbulltv] Improve support for rebull.com TV localized URLs (#22063)
+ [redbulltv] Add support for new redbull.com TV URLs (#22037, #22063)
* [soundcloud:pagedplaylist] Reduce pagination limit (#26557)
version 2020.09.06
Core
+ [utils] Recognize wav mimetype (#26463)
Extractors
* [nrktv:episode] Improve video id extraction (#25594, #26369, #26409)
* [youtube] Fix age gate content detection (#26100, #26152, #26311, #26384)
* [youtube:user] Extend URL regular expression (#26443)
* [xhamster] Improve initials regular expression (#26526, #26353)
* [svtplay] Fix video id extraction (#26425, #26428, #26438)
* [twitch] Rework extractors (#12297, #20414, #20604, #21811, #21812, #22979,
#24263, #25010, #25553, #25606)
* Switch to GraphQL
+ Add support for collections
+ Add support for clips and collections playlists
* [biqle] Improve video ext extraction
* [xhamster] Fix extraction (#26157, #26254)
* [xhamster] Extend URL regular expression (#25789, #25804, #25927))
version 2020.07.28
Extractors
* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
* [youtube] Improve description extraction (#25937, #25980)
* [wistia] Restrict embed regular expression (#25969)
* [youtube] Prevent excess HTTP 301 (#25786)
+ [youtube:playlists] Extend URL regular expression (#25810)
+ [bellmedia] Add support for cp24.com clip URLs (#25764)
* [brightcove] Improve embed detection (#25674)
version 2020.06.16.1
Extractors
* [youtube] Force old layout (#25682, #25683, #25680, #25686)
* [youtube] Fix categories and improve tags extraction
version 2020.06.16
Extractors
* [youtube] Fix uploader id and uploader URL extraction
* [youtube] Improve view count extraction
* [youtube] Fix upload date extraction (#25677)
* [youtube] Fix thumbnails extraction (#25676)
* [youtube] Fix playlist and feed extraction (#25675)
+ [facebook] Add support for single-video ID links
+ [youtube] Extract chapters from JSON (#24819)
+ [kaltura] Add support for multiple embeds on a webpage (#25523)
version 2020.06.06
Extractors
* [tele5] Bypass geo restriction
+ [jwplatform] Add support for bypass geo restriction
* [tele5] Prefer jwplatform over nexx (#25533)
* [twitch:stream] Expect 400 and 410 HTTP errors from API
* [twitch:stream] Fix extraction (#25528)
* [twitch] Fix thumbnails extraction (#25531)
+ [twitch] Pass v5 Accept HTTP header (#25531)
* [brightcove] Fix subtitles extraction (#25540)
+ [malltv] Add support for sk.mall.tv (#25445)
* [periscope] Fix untitled broadcasts (#25482)
* [jwplatform] Improve embeds extraction (#25467)
version 2020.05.29
Core
* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
* [utils] Fix file permissions in write_json_file (#12471, #25122)
Extractors
* [ard:beta] Extend URL regular expression (#25405)
+ [youtube] Add support for more invidious instances (#25417)
* [giantbomb] Extend URL regular expression (#25222)
* [ard] Improve URL regular expression (#25134, #25198)
* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
#25321)
* [indavideo] Switch to HTTPS for API request (#25191)
* [redtube] Improve title extraction (#25208)
* [vimeo] Improve format extraction and sorting (#25285)
* [soundcloud] Reduce API playlist page limit (#25274)
+ [youtube] Add support for yewtu.be (#25226)
* [mailru] Fix extraction (#24530, #25239)
* [bellator] Fix mgid extraction (#25195)
version 2020.05.08
Core
* [downloader/http] Request last data block of exact remaining size
* [downloader/http] Finish downloading once received data length matches
expected
* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
+ [compat] Introduce compat_cookiejar_Cookie
* [utils] Improve cookie files support
+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry
length, invalid expires at)
Extractors
* [youtube] Improve signature cipher extraction (#25187, #25188)
* [iprima] Improve extraction (#25138)
* [uol] Fix extraction (#22007)
+ [orf] Add support for more radio stations (#24938, #24968)
* [dailymotion] Fix typo
- [puhutv] Remove no longer available HTTP formats (#25124)
version 2020.05.03
Core
+ [extractor/common] Extract multiple JSON-LD entries
* [options] Clarify doc on --exec command (#19087, #24883)
* [extractor/common] Skip malformed ISM manifest XMLs while extracting
ISM formats (#24667)
Extractors
* [crunchyroll] Fix and improve extraction (#25096, #25060)
* [youtube] Improve player id extraction
* [youtube] Use redirected video id if any (#25063)
* [yahoo] Fix GYAO Player extraction and relax URL regular expression
(#24178, #24778)
* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
* [tenplay] Relax URL regular expression (#25001)
+ [prosiebensat1] Extract series metadata
* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
- [prosiebensat1] Remove 7tv.de support (#24948)
* [youtube] Fix DRM videos detection (#24736)
* [thisoldhouse] Fix video id extraction (#24548, #24549)
+ [soundcloud] Extract AAC format (#19173, #24708)
* [youtube] Skip broken multifeed videos (#24711)
* [nova:embed] Fix extraction (#24700)
* [motherless] Fix extraction (#24699)
* [twitch:clips] Extend URL regular expression (#24290, #24642)
* [tv4] Fix ISM formats extraction (#24667)
* [tele5] Fix extraction (#24553)
+ [mofosex] Add support for generic embeds (#24633)
+ [youporn] Add support for generic embeds
+ [spankwire] Add support for generic embeds (#24633)
* [spankwire] Fix extraction (#18924, #20648)
version 2020.03.24
Core
- [utils] Revert support for cookie files with spaces used instead of tabs
Extractors
* [teachable] Update upskillcourses and gns3 domains
* [generic] Look for teachable embeds before wistia
+ [teachable] Extract chapter metadata (#24421)
+ [bilibili] Add support for player.bilibili.com (#24402)
+ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
* [limelight] Remove disabled API requests (#24255)
* [soundcloud] Fix download URL extraction (#24394)
+ [cbc:watch] Add support for authentication (#19160)
* [hellporno] Fix extraction (#24399)
* [xtube] Fix formats extraction (#24348)
* [ndr] Fix extraction (#24326)
* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
- [nhk] Remove obsolete rtmp formats (#24329)
* [nhk] Relax URL regular expression (#24329)
- [vimeo] Revert fix showcase password protected video extraction (#24224)
version 2020.03.08
Core
+ [utils] Add support for cookie files with spaces used instead of tabs
Extractors
+ [pornhub] Add support for pornhubpremium.com (#24288)
- [youtube] Remove outdated code and unnecessary requests
* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
* [nhk] Update API version (#24270)
version 2020.03.06
Extractors
* [youtube] Fix age-gated videos support without login (#24248)
* [vimeo] Fix showcase password protected video extraction (#24224)
* [pornhub] Improve title extraction (#24184)
* [peertube] Improve extraction (#23657)
+ [servus] Add support for new URL schema (#23475, #23583, #24142)
* [vimeo] Fix subtitles URLs (#24209)
version 2020.03.01
Core
* [YoutubeDL] Force redirect URL to unicode on python 2
- [options] Remove duplicate short option -v for --version (#24162)
Extractors
* [xhamster] Fix extraction (#24205)
* [franceculture] Fix extraction (#24204)
+ [telecinco] Add support for article opening videos
* [telecinco] Fix extraction (#24195)
* [xtube] Fix metadata extraction (#21073, #22455)
* [youjizz] Fix extraction (#24181)
- Remove no longer needed compat_str around geturl
* [pornhd] Fix extraction (#24128)
+ [teachable] Add support for multiple videos per lecture (#24101)
+ [wistia] Add support for multiple generic embeds (#8347, 11385)
* [imdb] Fix extraction (#23443)
* [tv2dk:bornholm:play] Fix extraction (#24076)
version 2020.02.16
Core
* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
#10622)
* [update] Fix updating via symlinks (#23991)
+ [compat] Introduce compat_realpath (#23991)
Extractors
+ [npr] Add support for streams (#24042)
+ [24video] Add support for porn.24video.net (#23779, #23784)
- [jpopsuki] Remove extractor (#23858)
* [nova] Improve extraction (#23690)
* [nova:embed] Improve (#23690)
* [nova:embed] Fix extraction (#23672)
+ [abc:iview] Add support for 720p (#22907, #22921)
* [nytimes] Improve format sorting (#24010)
+ [toggle] Add support for mewatch.sg (#23895, #23930)
* [thisoldhouse] Fix extraction (#23951)
+ [popcorntimes] Add support for popcorntimes.tv (#23949)
* [sportdeutschland] Update to new API
* [twitch:stream] Lowercase channel id for stream request (#23917)
* [tv5mondeplus] Fix extraction (#23907, #23911)
* [tva] Relax URL regular expression (#23903)
* [vimeo] Fix album extraction (#23864)
* [viewlift] Improve extraction
* Fix extraction (#23851)
+ Add support for authentication
+ Add support for more domains
* [svt] Fix series extraction (#22297)
* [svt] Fix article extraction (#22897, #22919)
* [soundcloud] Imporve private playlist/set tracks extraction (#3707)
version 2020.01.24
Extractors
* [youtube] Fix sigfunc name extraction (#23819)
* [stretchinternet] Fix extraction (#4319)
* [voicerepublic] Fix extraction
* [azmedien] Fix extraction (#23783)
* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
+ [24video] Add support for 24video.vip (#23753)
* [ivi:compilation] Fix entries extraction (#23770)
* [ard] Improve extraction (#23761)
* Simplify extraction
+ Extract age limit and series
* Bypass geo-restriction
+ [nbc] Add support for nbc multi network URLs (#23049)
* [americastestkitchen] Fix extraction
* [zype] Improve extraction
+ Extract subtitles (#21258)
+ Support URLs with alternative keys/tokens (#21258)
+ Extract more metadata
* [orf:tvthek] Improve geo restricted videos detection (#23741)
* [soundcloud] Restore previews extraction (#23739)
version 2020.01.15
Extractors
* [yourporn] Fix extraction (#21645, #22255, #23459)
+ [canvas] Add support for new API endpoint (#17680, #18629)
* [ndr:base:embed] Improve thumbnails extraction (#23731)
+ [vodplatform] Add support for embed.kwikmotion.com domain
+ [twitter] Add support for promo_video_website cards (#23711)
* [orf:radio] Clean description and improve extraction
* [orf:fm4] Fix extraction (#23599)
* [safari] Fix kaltura session extraction (#23679, #23670)
* [lego] Fix extraction and extract subtitle (#23687)
* [cloudflarestream] Improve extraction
+ Add support for bytehighway.net domain
+ Add support for signed URLs
+ Extract thumbnail
* [naver] Improve extraction
* Improve geo-restriction handling
+ Extract automatic captions
+ Extract uploader metadata
+ Extract VLive HLS formats
* Improve metadata extraction
- [pandatv] Remove extractor (#23630)
* [dctp] Fix format extraction (#23656)
+ [scrippsnetworks] Add support for www.discovery.com videos
* [discovery] Fix anonymous token extraction (#23650)
* [nrktv:seriebase] Fix extraction (#23625, #23537)
* [wistia] Improve format extraction and extract subtitles (#22590)
* [vice] Improve extraction (#23631)
* [redtube] Detect private videos (#23518)
version 2020.01.01
Extractors
* [brightcove] Invalidate policy key cache on failing requests
* [pornhub] Improve locked videos detection (#22449, #22780)
+ [pornhub] Add support for m3u8 formats
* [pornhub] Fix extraction (#22749, #23082)
* [brightcove] Update policy key on failing requests
* [spankbang] Improve removed video detection (#23423)
* [spankbang] Fix extraction (#23307, #23423, #23444)
* [soundcloud] Automatically update client id on failing requests
* [prosiebensat1] Improve geo restriction handling (#23571)
* [brightcove] Cache brightcove player policy keys
* [teachable] Fail with error message if no video URL found
* [teachable] Improve locked lessons detection (#23528)
+ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
* [mitele] Fix extraction (#21354, #23456)
* [soundcloud] Update client id (#23516)
* [mailru] Relax URL regular expressions (#23509)
version 2019.12.25
Core
* [utils] Improve str_to_int
+ [downloader/hls] Add ability to override AES decryption key URL (#17521)
Extractors
* [mediaset] Fix parse formats (#23508)
+ [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
+ [slideslive] Add support for url and vimeo service names (#23414)
* [slideslive] Fix extraction (#23413)
* [twitch:clips] Fix extraction (#23375)
+ [soundcloud] Add support for token protected embeds (#18954)
* [vk] Improve extraction
* Fix User Videos extraction (#23356)
* Extract all videos for lists with more than 1000 videos (#23356)
+ Add support for video albums (#14327, #14492)
- [kontrtube] Remove extractor
- [videopremium] Remove extractor
- [musicplayon] Remove extractor (#9225)
+ [ufctv] Add support for ufcfightpass.imgdge.com and
ufcfightpass.imggaming.com (#23343)
+ [twitch] Extract m3u8 formats frame rate (#23333)
+ [imggaming] Add support for playlists and extract subtitles
+ [ufcarabia] Add support for UFC Arabia (#23312)
* [ufctv] Fix extraction
* [yahoo] Fix gyao brightcove player id (#23303)
* [vzaar] Override AES decryption key URL (#17521)
+ [vzaar] Add support for AES HLS manifests (#17521, #23299)
* [nrl] Fix extraction
* [teachingchannel] Fix extraction
* [nintendo] Fix extraction and partially add support for Nintendo Direct
videos (#4592)
+ [ooyala] Add better fallback values for domain and streams variables
+ [youtube] Add support youtubekids.com (#23272)
* [tv2] Detect DRM protection
+ [tv2] Add support for katsomo.fi and mtv.fi (#10543)
* [tv2] Fix tv2.no article extraction
* [msn] Improve extraction
+ Add support for YouTube and NBCSports embeds
+ Add support for articles with multiple videos
* Improve AOL embed support
* Improve format extraction
* [abcotvs] Relax URL regular expression and improve metadata extraction
(#18014)
* [channel9] Reduce response size
* [adobetv] Improve extaction
* Use OnDemandPagedList for list extractors
* Reduce show extraction requests
* Extract original video format and subtitles
+ Add support for adobe tv embeds
version 2019.11.28
Core
+ [utils] Add generic caesar cipher and rot47
* [utils] Handle rd-suffixed day parts in unified_strdate (#23199)
Extractors
* [vimeo] Improve extraction
* Fix review extraction
* Fix ondemand extraction
* Make password protected player case as an expected error (#22896)
* Simplify channel based extractors code
- [openload] Remove extractor (#11999)
- [verystream] Remove extractor
- [streamango] Remove extractor (#15406)
* [dailymotion] Improve extraction
* Extract http formats included in m3u8 manifest
* Fix user extraction (#3553, #21415)
+ Add suport for User Authentication (#11491)
* Fix password protected videos extraction (#23176)
* Respect age limit option and family filter cookie value (#18437)
* Handle video url playlist query param
* Report allowed countries for geo-restricted videos
* [corus] Improve extraction
+ Add support for Series Plus, W Network, YTV, ABC Spark, disneychannel.com
and disneylachaine.ca (#20861)
+ Add support for self hosted videos (#22075)
* Detect DRM protection (#14910, #9164)
* [vivo] Fix extraction (#22328, #22279)
+ [bitchute] Extract upload date (#22990, #23193)
* [soundcloud] Update client id (#23214)
version 2019.11.22
Core
+ [extractor/common] Clean jwplayer description HTML tags
+ [extractor/common] Add data, headers and query to all major extract formats
methods
Extractors
* [chaturbate] Fix extraction (#23010, #23012)
+ [ntvru] Add support for non relative file URLs (#23140)
* [vk] Fix wall audio thumbnails extraction (#23135)
* [ivi] Fix format extraction (#21991)
- [comcarcoff] Remove extractor
+ [drtv] Add support for new URL schema (#23059)
+ [nexx] Add support for Multi Player JS Setup (#23052)
+ [teamcoco] Add support for new videos (#23054)
* [soundcloud] Check if the soundtrack has downloads left (#23045)
* [facebook] Fix posts video data extraction (#22473)
- [addanime] Remove extractor
- [minhateca] Remove extractor
- [daisuki] Remove extractor
* [seeker] Fix extraction
- [revision3] Remove extractors
* [twitch] Fix video comments URL (#18593, #15828)
* [twitter] Improve extraction
+ Add support for generic embeds (#22168)
* Always extract http formats for native videos (#14934)
+ Add support for Twitter Broadcasts (#21369)
+ Extract more metadata
* Improve VMap format extraction
* Unify extraction code for both twitter statuses and cards
+ [twitch] Add support for Clip embed URLs
* [lnkgo] Fix extraction (#16834)
* [mixcloud] Improve extraction
* Improve metadata extraction (#11721)
* Fix playlist extraction (#22378)
* Fix user mixes extraction (#15197, #17865)
+ [kinja] Add support for Kinja embeds (#5756, #11282, #22237, #22384)
* [onionstudios] Fix extraction
+ [hotstar] Pass Referer header to format requests (#22836)
* [dplay] Minimize response size
+ [patreon] Extract uploader_id and filesize
* [patreon] Minimize response size
* [roosterteeth] Fix login request (#16094, #22689)
version 2019.11.05
Extractors
+ [scte] Add support for learning.scte.org (#22975)
+ [msn] Add support for Vidible and AOL embeds (#22195, #22227)
* [myspass] Fix video URL extraction and improve metadata extraction (#22448)
* [jamendo] Improve extraction
* Fix album extraction (#18564)
* Improve metadata extraction (#18565, #21379)
* [mediaset] Relax URL guid matching (#18352)
+ [mediaset] Extract unprotected M3U and MPD manifests (#17204)
* [telegraaf] Fix extraction
+ [bellmedia] Add support for marilyn.ca videos (#22193)
* [stv] Fix extraction (#22928)
- [iconosquare] Remove extractor
- [keek] Remove extractor
- [gameone] Remove extractor (#21778)
- [flipagram] Remove extractor
- [bambuser] Remove extractor
* [wistia] Reduce embed extraction false positives
+ [wistia] Add support for inline embeds (#22931)
- [go90] Remove extractor
* [kakao] Remove raw request
+ [kakao] Extract format total bitrate
* [daum] Fix VOD and Clip extracton (#15015)
* [kakao] Improve extraction
+ Add support for embed URLs
+ Add support for Kakao Legacy vid based embed URLs
* Only extract fields used for extraction
* Strip description and extract tags
* [mixcloud] Fix cloudcast data extraction (#22821)
* [yahoo] Improve extraction
+ Add support for live streams (#3597, #3779, #22178)
* Bypass cookie consent page for european domains (#16948, #22576)
+ Add generic support for embeds (#20332)
* [tv2] Fix and improve extraction (#22787)
+ [tv2dk] Add support for TV2 DK sites
* [onet] Improve extraction …
+ Add support for onet100.vod.pl
+ Extract m3u8 formats
* Correct audio only format info
* [fox9] Fix extraction
version 2019.10.29
Core
* [utils] Actualize major IPv4 address blocks per country
Extractors
+ [go] Add support for abc.com and freeform.com (#22823, #22864)
+ [mtv] Add support for mtvjapan.com
* [mtv] Fix extraction for mtv.de (#22113)
* [videodetective] Fix extraction
* [internetvideoarchive] Fix extraction
* [nbcnews] Fix extraction (#12569, #12576, #21703, #21923)
- [hark] Remove extractor
- [tutv] Remove extractor
- [learnr] Remove extractor
- [macgamestore] Remove extractor
* [la7] Update Kaltura service URL (#22358)
* [thesun] Fix extraction (#16966)
- [makertv] Remove extractor
+ [tenplay] Add support for 10play.com.au (#21446)
* [soundcloud] Improve extraction
* Improve format extraction (#22123)
+ Extract uploader_id and uploader_url (#21916)
+ Extract all known thumbnails (#19071, #20659)
* Fix extration for private playlists (#20976)
+ Add support for playlist embeds (#20976)
* Skip preview formats (#22806)
* [dplay] Improve extraction
+ Add support for dplay.fi, dplay.jp and es.dplay.com (#16969)
* Fix it.dplay.com extraction (#22826)
+ Extract creator, tags and thumbnails
* Handle playback API call errors
+ [discoverynetworks] Add support for dplay.co.uk
* [vk] Improve extraction
+ Add support for Odnoklassniki embeds
+ Extract more videos from user lists (#4470)
+ Fix wall post audio extraction (#18332)
* Improve error detection (#22568)
+ [odnoklassniki] Add support for embeds
* [puhutv] Improve extraction
* Fix subtitles extraction
* Transform HLS URLs to HTTP URLs
* Improve metadata extraction
* [ceskatelevize] Skip DRM media
+ [facebook] Extract subtitles (#22777)
* [globo] Handle alternative hash signing method
version 2019.10.22
Core
* [utils] Improve subtitles_filename (#22753)
Extractors
* [facebook] Bypass download rate limits (#21018)
+ [contv] Add support for contv.com
- [viewster] Remove extractor
* [xfileshare] Improve extractor (#17032, #17906, #18237, #18239)
* Update the list of domains
+ Add support for aa-encoded video data
* Improve jwplayer format extraction
+ Add support for Clappr sources
* [mangomolo] Fix video format extraction and add support for player URLs
* [audioboom] Improve metadata extraction
* [twitch] Update VOD URL matching (#22395, #22727)
- [mit] Remove support for video.mit.edu (#22403)
- [servingsys] Remove extractor (#22639)
* [dumpert] Fix extraction (#22428, #22564)
* [atresplayer] Fix extraction (#16277, #16716)
version 2019.10.16
Core
* [extractor/common] Make _is_valid_url more relaxed
Extractors
* [vimeo] Improve album videos id extraction (#22599)
+ [globo] Extract subtitles (#22713)
* [bokecc] Improve player params extraction (#22638)
* [nexx] Handle result list (#22666)
* [vimeo] Fix VHX embed extraction
* [nbc] Switch to graphql API (#18581, #22693, #22701)
- [vessel] Remove extractor
- [promptfile] Remove extractor (#6239)
* [kaltura] Fix service URL extraction (#22658)
* [kaltura] Fix embed info strip (#22658)
* [globo] Fix format extraction (#20319)
* [redtube] Improve metadata extraction (#22492, #22615)
* [pornhub:uservideos:upload] Fix extraction (#22619)
+ [telequebec:squat] Add support for squat.telequebec.tv (#18503)
- [wimp] Remove extractor (#22088, #22091)
+ [gfycat] Extend URL regular expression (#22225)
+ [chaturbate] Extend URL regular expression (#22309)
* [peertube] Update instances (#22414)
+ [telequebec] Add support for coucou.telequebec.tv (#22482)
+ [xvideos] Extend URL regular expression (#22471)
- [youtube] Remove support for invidious.enkirton.net (#22543)
+ [openload] Add support for oload.monster (#22592)
* [nrktv:seriebase] Fix extraction (#22596)
+ [youtube] Add support for yt.lelux.fi (#22597)
* [orf:tvthek] Make manifest requests non fatal (#22578)
* [teachable] Skip login when already logged in (#22572)
* [viewlift] Improve extraction (#22545)
* [nonktube] Fix extraction (#22544)
version 2019.09.28
Core
* [YoutubeDL] Honour all --get-* options with --flat-playlist (#22493)
Extractors
* [vk] Fix extraction (#22522)
* [heise] Fix kaltura embeds extraction (#22514)
* [ted] Check for resources validity and extract subtitled downloads (#22513)
+ [youtube] Add support for
owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya.b32.i2p (#22292)
+ [nhk] Add support for clips
* [nhk] Fix video extraction (#22249, #22353)
* [byutv] Fix extraction (#22070)
+ [openload] Add support for oload.online (#22304)
+ [youtube] Add support for invidious.drycat.fr (#22451)
* [jwplatfom] Do not match video URLs (#20596, #22148)
* [youtube:playlist] Unescape playlist uploader (#22483)
+ [bilibili] Add support audio albums and songs (#21094)
+ [instagram] Add support for tv URLs
+ [mixcloud] Allow uppercase letters in format URLs (#19280)
* [brightcove] Delegate all supported legacy URLs to new extractor (#11523,
#12842, #13912, #15669, #16303)
* [hotstar] Use native HLS downloader by default
+ [hotstar] Extract more formats (#22323)
* [9now] Fix extraction (#22361)
* [zdf] Bypass geo restriction
+ [tv4] Extract series metadata
* [tv4] Fix extraction (#22443)
version 2019.09.12.1
Extractors
* [youtube] Remove quality and tbr for itag 43 (#22372)
version 2019.09.12
Extractors
* [youtube] Quick extraction tempfix (#22367, #22163)
version 2019.09.01
Core
+ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
#21859)
+ [downloader/external] Respect mtime option for aria2c (#22242)
Extractors
+ [xhamster:user] Add support for user pages (#16330, #18454)
+ [xhamster] Add support for more domains
+ [verystream] Add support for woof.tube (#22217)
+ [dailymotion] Add support for lequipe.fr (#21328, #22152)
+ [openload] Add support for oload.vip (#22205)
+ [bbccouk] Extend URL regular expression (#19200)
+ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
* [safari] Fix authentication (#22161, #22184)
* [usanetwork] Fix extraction (#22105)
+ [einthusan] Add support for einthusan.ca (#22171)
* [youtube] Improve unavailable message extraction (#22117)
+ [piksel] Extract subtitles (#20506)
version 2019.08.13
Core
* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
* [YoutubeDL] Check annotations availability (#18582)
Extractors
* [youtube:playlist] Improve flat extraction (#21927)
* [youtube] Fix annotations extraction (#22045)
+ [discovery] Extract series meta field (#21808)
* [youtube] Improve error detection (#16445)
* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
+ [roosterteeth] Add support for watch URLs
* [discovery] Limit video data by show slug (#21980)
version 2019.08.02
Extractors
+ [tvigle] Add support for HLS and DASH formats (#21967)
* [tvigle] Fix extraction (#21967)
+ [yandexvideo] Add support for DASH formats (#21971)
* [discovery] Use API call for video data extraction (#21808)
+ [mgtv] Extract format_note (#21881)
* [tvn24] Fix metadata extraction (#21833, #21834)
* [dlive] Relax URL regular expression (#21909)
+ [openload] Add support for oload.best (#21913)
* [youtube] Improve metadata extraction for age gate content (#21943)
version 2019.07.30
Extractors
* [youtube] Fix and improve title and description extraction (#21934)
version 2019.07.27
Extractors
+ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
+ [discovery] Add support go.discovery.com URLs
* [youtube:playlist] Relax video regular expression (#21844)
* [generic] Restrict --default-search schemeless URLs detection pattern
(#21842)
* [vrv] Fix CMS signing query extraction (#21809)
version 2019.07.16
Extractors
+ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
(#21281, #21290)
* [kaltura] Check source format URL (#21290)
* [ctsnews] Fix YouTube embeds extraction (#21678)
+ [einthusan] Add support for einthusan.com (#21748, #21775)
+ [youtube] Add support for invidious.mastodon.host (#21777)
+ [gfycat] Extend URL regular expression (#21779, #21780)
* [youtube] Restrict is_live extraction (#21782)
version 2019.07.14
Extractors
@ -228,7 +1001,7 @@ Extractors
version 2019.04.17
Extractors
* [openload] Randomize User-Agent (closes #20688)
* [openload] Randomize User-Agent (#20688)
+ [openload] Add support for oladblock domains (#20471)
* [adn] Fix subtitle extraction (#12724)
+ [aol] Add support for localized websites
@ -793,7 +1566,7 @@ Extractors
+ [youtube] Extract channel meta fields (#9676, #12939)
* [porntube] Fix extraction (#17541)
* [asiancrush] Fix extraction (#15630)
+ [twitch:clips] Extend URL regular expression (closes #17559)
+ [twitch:clips] Extend URL regular expression (#17559)
+ [vzaar] Add support for HLS
* [tube8] Fix metadata extraction (#17520)
* [eporner] Extract JSON-LD (#17519)

View file

@ -434,9 +434,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
either the path to the binary or its
containing directory.
--exec CMD Execute a command on the file after
downloading, similar to find's -exec
syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
downloading and post-processing, similar to
find's -exec syntax. Example: --exec 'adb
push {} /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc)
@ -545,7 +545,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch when creating the file
- `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero
- `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`
- `playlist` (string): Name or id of the playlist that contains the video
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier
@ -752,8 +752,8 @@ As a last resort, you can also uninstall the version installed by your package m
Afterwards, simply follow [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html):
```
sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
hash -r
```
@ -835,7 +835,9 @@ In February 2015, the new YouTube player contained a character sequence in a str
### HTTP Error 429: Too Many Requests or 402: Payment Required
These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
These two error codes indicate that the service is blocking your IP address because of overuse. Usually this is a soft block meaning that you can gain access again after solving CAPTCHA. Just open a browser and solve a CAPTCHA the service suggests you and after that [pass cookies](#how-do-i-pass-cookies-to-youtube-dl) to youtube-dl. Note that if your machine has multiple external IPs then you should also pass exactly the same IP you've used for solving CAPTCHA with [`--source-address`](#network-options). Also you may need to pass a `User-Agent` HTTP header of your browser with [`--user-agent`](#workarounds).
If this is not the case (no CAPTCHA suggested to solve by the service) then you can contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
### SyntaxError: Non-ASCII character
@ -1030,7 +1032,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
@ -1216,6 +1218,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

View file

@ -1,7 +1,6 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import base64
import io
import json
import mimetypes
@ -15,7 +14,6 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_basestring,
compat_input,
compat_getpass,
compat_print,
compat_urllib_request,
@ -40,28 +38,20 @@ class GitHubReleaser(object):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
self._username = info[0]
self._password = info[2]
self._token = info[2]
compat_print('Using GitHub credentials found in .netrc...')
return
else:
compat_print('No GitHub credentials found in .netrc')
except (IOError, netrc.NetrcParseError):
compat_print('Unable to parse .netrc')
self._username = compat_input(
'Type your GitHub username or email address and press [Return]: ')
self._password = compat_getpass(
'Type your GitHub password and press [Return]: ')
self._token = compat_getpass(
'Type your GitHub PAT (personal access token) and press [Return]: ')
def _call(self, req):
if isinstance(req, compat_basestring):
req = sanitized_Request(req)
# Authorizing manually since GitHub does not response with 401 with
# WWW-Authenticate header set (see
# https://developer.github.com/v3/#basic-authentication)
b64 = base64.b64encode(
('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
req.add_header('Authorization', 'Basic %s' % b64)
req.add_header('Authorization', 'token %s' % self._token)
response = self._opener.open(req).read().decode('utf-8')
return json.loads(response)

View file

@ -26,13 +26,13 @@
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
- **AddAnime**
- **ADN**: Anime Digital Network
- **AdobeConnect**
- **AdobeTV**
- **AdobeTVChannel**
- **AdobeTVShow**
- **AdobeTVVideo**
- **adobetv**
- **adobetv:channel**
- **adobetv:embed**
- **adobetv:show**
- **adobetv:video**
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **afreecatv**: afreecatv.com
@ -76,8 +76,6 @@
- **awaan:video**
- **AZMedien**: AZ Medien videos
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
@ -98,6 +96,9 @@
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BilibiliAudio**
- **BilibiliAudioAlbum**
- **BiliBiliPlayer**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
@ -175,12 +176,12 @@
- **CNN**
- **CNNArticle**
- **CNNBlogs**
- **ComCarCoff**
- **ComedyCentral**
- **ComedyCentralFullEpisodes**
- **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **CONtv**
- **Corus**
- **Coub**
- **Cracked**
@ -202,8 +203,6 @@
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
- **DaisukiMotto**
- **DaisukiMottoPlaylist**
- **daum.net**
- **daum.net:clip**
- **daum.net:playlist**
@ -229,7 +228,6 @@
- **DouyuShow**
- **DouyuTV**: 斗鱼
- **DPlay**
- **DPlayIt**
- **DRBonanza**
- **Dropbox**
- **DrTuber**
@ -282,12 +280,12 @@
- **FiveThirtyEight**
- **FiveTV**
- **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
- **FOX**
- **FOX9**
- **FOX9News**
- **Foxgay**
- **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
@ -313,8 +311,6 @@
- **FXNetworks**
- **Gaia**
- **GameInformer**
- **GameOne**
- **gameone:playlist**
- **GameSpot**
- **GameStar**
- **Gaskrank**
@ -329,14 +325,12 @@
- **Globo**
- **GloboArticle**
- **Go**
- **Go90**
- **GodTube**
- **Golem**
- **GoogleDrive**
- **Goshgay**
- **GPUTechConf**
- **Groupon**
- **Hark**
- **hbo**
- **HearThisAt**
- **Heise**
@ -365,7 +359,6 @@
- **Hungama**
- **HungamaSong**
- **Hypem**
- **Iconosquare**
- **ign.com**
- **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists
@ -397,7 +390,6 @@
- **JeuxVideo**
- **Joj**
- **Jove**
- **jpopsuki.tv**
- **JWPlatform**
- **Kakao**
- **Kaltura**
@ -405,14 +397,14 @@
- **Kankan**
- **Karaoketv**
- **KarriereVideos**
- **keek**
- **Katsomo**
- **KeezMovies**
- **Ketnet**
- **KhanAcademy**
- **KickStarter**
- **KinjaEmbed**
- **KinoPoisk**
- **KonserthusetPlay**
- **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
- **Ku6**
- **KUSI**
@ -429,7 +421,6 @@
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **Lecturio**
- **LecturioCourse**
@ -463,11 +454,9 @@
- **lynda**: lynda.com videos
- **lynda:course**: lynda.com online courses
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakerTV**
- **MallTV**
- **mangomolo:live**
- **mangomolo:video**
@ -494,14 +483,12 @@
- **Mgoon**
- **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **Mixer:live**
- **Mixer:vod**
@ -510,6 +497,7 @@
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **MofosexEmbed**
- **Mojvideo**
- **Morningstar**: morningstar.com
- **Motherless**
@ -523,11 +511,10 @@
- **mtg**: MTG services
- **mtv**
- **mtv.de**
- **mtv81**
- **mtv:video**
- **mtvjapan**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
@ -632,18 +619,26 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **Openload**
- **OraTV**
- **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
- **orf:kaernten**: Radio Kärnten
- **orf:noe**: Radio Niederösterreich
- **orf:oberoesterreich**: Radio Oberösterreich
- **orf:oe1**: Radio Österreich 1
- **orf:oe3**: Radio Österreich 3
- **orf:salzburg**: Radio Salzburg
- **orf:steiermark**: Radio Steiermark
- **orf:tirol**: Radio Tirol
- **orf:tvthek**: ORF TVthek
- **orf:vorarlberg**: Radio Vorarlberg
- **orf:wien**: Radio Wien
- **OsnatelTV**
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
@ -679,6 +674,7 @@
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **Popcorntimes**
- **PopcornTV**
- **PornCom**
- **PornerBros**
@ -692,7 +688,6 @@
- **PornoXO**
- **PornTube**
- **PressTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv**
- **puhutv:serie**
@ -722,6 +717,8 @@
- **RayWenderlichCourse**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**
- **RedBullTV**
- **RedBullTVRrnContent**
- **Reddit**
@ -733,8 +730,6 @@
- **Restudy**
- **Reuters**
- **ReverbNation**
- **revision**
- **revision3:embed**
- **RICE**
- **RMCDecouverte**
- **RockstarGames**
@ -779,11 +774,13 @@
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
- **ScrippsNetworks**
- **scrippsnetworks:watch**
- **SCTE**
- **SCTECourse**
- **Seeker**
- **SenateISVP**
- **SendtoNews**
- **ServingSys**
- **Servus**
- **Sexu**
- **SeznamZpravy**
@ -814,6 +811,7 @@
- **soundcloud:set**
- **soundcloud:trackstation**
- **soundcloud:user**
- **SoundcloudEmbed**
- **soundgasm**
- **soundgasm:profile**
- **southpark.cc.com**
@ -840,7 +838,6 @@
- **Steam**
- **Stitcher**
- **Streamable**
- **Streamango**
- **streamcloud.eu**
- **StreamCZ**
- **StreetVoice**
@ -882,9 +879,11 @@
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleQuebecSquat**
- **TeleTask**
- **Telewebion**
- **TennisTV**
- **TenPlay**
- **TF1**
- **TFO**
- **TheIntercept**
@ -923,11 +922,12 @@
- **tunein:topic**
- **TunePk**
- **Turbo**
- **Tutv**
- **tv.dfb.de**
- **TV2**
- **tv2.hu**
- **TV2Article**
- **TV2DK**
- **TV2DKBornholmPlay**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
@ -952,22 +952,21 @@
- **TVPlayHome**
- **Tweakers**
- **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
- **twitch:stream**
- **twitch:video**
- **twitch:videos:all**
- **twitch:videos:highlights**
- **twitch:videos:past-broadcasts**
- **twitch:videos:uploads**
- **twitch:vod**
- **TwitchCollection**
- **TwitchVideos**
- **TwitchVideosClips**
- **TwitchVideosCollections**
- **twitter**
- **twitter:amplify**
- **twitter:broadcast**
- **twitter:card**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **UFCArabia**
- **UFCTV**
- **UKTVPlay**
- **umg:de**: Universal Music Deutschland
@ -988,8 +987,6 @@
- **Vbox7**
- **VeeHD**
- **Veoh**
- **verystream**
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VevoPlaylist**
@ -1004,13 +1001,11 @@
- **Viddler**
- **Videa**
- **video.google:search**: Google Video search
- **video.mit.edu**
- **VideoDetective**
- **videofy.me**
- **videomore**
- **videomore:season**
- **videomore:video**
- **VideoPremium**
- **VideoPress**
- **Vidio**
- **VidLii**
@ -1020,9 +1015,8 @@
- **Vidzi**
- **vier**: vier.be and vijf.be
- **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **Viewster**
- **viewlift**
- **viewlift:embed**
- **Viidea**
- **viki**
- **viki:channel**
@ -1088,7 +1082,6 @@
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **Wimp**
- **Wistia**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop**
@ -1097,9 +1090,10 @@
- **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XFileShare**: XFileShare based sites: ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
- **XHamster**
- **XHamsterEmbed**
- **XHamsterUser**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
@ -1117,6 +1111,7 @@
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

View file

@ -816,11 +816,15 @@ class TestYoutubeDL(unittest.TestCase):
'webpage_url': 'http://example.com',
}
def get_ids(params):
def get_downloaded_info_dicts(params):
ydl = YDL(params)
# make a copy because the dictionary can be modified
ydl.process_ie_result(playlist.copy())
return [int(v['id']) for v in ydl.downloaded_info_dicts]
# make a deep copy because the dictionary and nested entries
# can be modified
ydl.process_ie_result(copy.deepcopy(playlist))
return ydl.downloaded_info_dicts
def get_ids(params):
return [int(v['id']) for v in get_downloaded_info_dicts(params)]
result = get_ids({})
self.assertEqual(result, [1, 2, 3, 4])
@ -852,6 +856,22 @@ class TestYoutubeDL(unittest.TestCase):
result = get_ids({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result, [2, 3, 4])
# Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
# @{
result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result[0]['playlist_index'], 2)
self.assertEqual(result[1]['playlist_index'], 3)
result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result[0]['playlist_index'], 2)
self.assertEqual(result[1]['playlist_index'], 3)
self.assertEqual(result[2]['playlist_index'], 4)
result = get_downloaded_info_dicts({'playlist_items': '4,2'})
self.assertEqual(result[0]['playlist_index'], 4)
self.assertEqual(result[1]['playlist_index'], 2)
# @}
def test_urlopen_no_file_protocol(self):
# see https://github.com/ytdl-org/youtube-dl/issues/8227
ydl = YDL()

View file

@ -39,6 +39,13 @@ class TestYoutubeDLCookieJar(unittest.TestCase):
assert_cookie_has_value('HTTPONLY_COOKIE')
assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
def test_malformed_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
# Cookies should be empty since all malformed cookie file entries
# will be ignored
self.assertFalse(cookiejar._cookies)
if __name__ == '__main__':
unittest.main()

View file

@ -123,12 +123,6 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['pbs'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['pbs'])
def test_yahoo_https(self):
# https://github.com/ytdl-org/youtube-dl/issues/2701
self.assertMatch(
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo'])
def test_no_duplicated_ie_names(self):
name_accu = collections.defaultdict(list)
for ie in self.ies:

View file

@ -26,7 +26,6 @@ from youtube_dl.extractor import (
ThePlatformIE,
ThePlatformFeedIE,
RTVEALaCartaIE,
FunnyOrDieIE,
DemocracynowIE,
)
@ -322,18 +321,6 @@ class TestRtveSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
class TestFunnyOrDieSubtitles(BaseTestSubtitles):
url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
IE = FunnyOrDieIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
class TestDemocracynowSubtitles(BaseTestSubtitles):
url = 'http://www.democracynow.org/shows/2015/7/3'
IE = DemocracynowIE

View file

@ -19,6 +19,7 @@ from youtube_dl.utils import (
age_restricted,
args_to_str,
encode_base_n,
caesar,
clean_html,
date_from_str,
DateRange,
@ -69,11 +70,13 @@ from youtube_dl.utils import (
remove_start,
remove_end,
remove_quotes,
rot47,
shell_quote,
smuggle_url,
str_to_int,
strip_jsonp,
strip_or_none,
subtitles_filename,
timeconvert,
unescapeHTML,
unified_strdate,
@ -261,6 +264,11 @@ class TestUtil(unittest.TestCase):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_subtitles_filename(self):
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt'), 'abc.en.vtt')
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt', 'ext'), 'abc.en.vtt')
self.assertEqual(subtitles_filename('abc.unexpected_ext', 'en', 'vtt', 'ext'), 'abc.unexpected_ext.en.vtt')
def test_remove_start(self):
self.assertEqual(remove_start(None, 'A - '), None)
self.assertEqual(remove_start('A - B', 'A - '), 'B')
@ -334,6 +342,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
self.assertEqual(unified_strdate('November 3rd, 2019'), '20191103')
self.assertEqual(unified_strdate('October 23rd, 2005'), '20051023')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@ -489,6 +499,12 @@ class TestUtil(unittest.TestCase):
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
self.assertEqual(str_to_int('123.456'), 123456)
self.assertEqual(str_to_int(523), 523)
# Python 3 has no long
if sys.version_info < (3, 0):
eval('self.assertEqual(str_to_int(123456L), 123456)')
self.assertEqual(str_to_int('noninteger'), None)
self.assertEqual(str_to_int([]), None)
def test_url_basename(self):
self.assertEqual(url_basename('http://foo.de/'), '')
@ -787,6 +803,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
self.assertEqual(mimetype2ext('audio/x-wav'), 'wav')
self.assertEqual(mimetype2ext('audio/x-wav;codec=pcm'), 'wav')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
@ -976,6 +994,12 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{42:4.2e1}')
self.assertEqual(json.loads(on), {'42': 42.0})
on = js_to_json('{ "0x40": "0x40" }')
self.assertEqual(json.loads(on), {'0x40': '0x40'})
on = js_to_json('{ "040": "040" }')
self.assertEqual(json.loads(on), {'040': '040'})
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')
@ -1361,6 +1385,20 @@ Line 1
self.assertRaises(ValueError, encode_base_n, 0, 70)
self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
def test_caesar(self):
self.assertEqual(caesar('ace', 'abcdef', 2), 'cea')
self.assertEqual(caesar('cea', 'abcdef', -2), 'ace')
self.assertEqual(caesar('ace', 'abcdef', -2), 'eac')
self.assertEqual(caesar('eac', 'abcdef', 2), 'ace')
self.assertEqual(caesar('ace', 'abcdef', 0), 'ace')
self.assertEqual(caesar('xyz', 'abcdef', 2), 'xyz')
self.assertEqual(caesar('abc', 'acegik', 2), 'ebg')
self.assertEqual(caesar('ebg', 'acegik', -2), 'abc')
def test_rot47(self):
self.assertEqual(rot47('youtube-dl'), r'J@FEF36\5=')
self.assertEqual(rot47('YOUTUBE-DL'), r'*~&%&qt\s{')
def test_urshift(self):
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)

View file

@ -267,7 +267,7 @@ class TestYoutubeChapters(unittest.TestCase):
for description, duration, expected_chapters in self._TEST_CASES:
ie = YoutubeIE()
expect_value(
self, ie._extract_chapters(description, duration),
self, ie._extract_chapters_from_description(description, duration),
expected_chapters, None)

View file

@ -74,6 +74,28 @@ _TESTS = [
]
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
# obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
)
for player_url, expected_player_id in PLAYER_URLS:
expected_player_type = player_url.split('.')[-1]
player_type, player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_type, expected_player_type)
self.assertEqual(player_id, expected_player_id)
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))

View file

@ -0,0 +1,9 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
# Cookie file entry with invalid number of fields - 6 instead of 7
www.foobar.foobar FALSE / FALSE 0 COOKIE
# Cookie file entry with invalid expires at
www.foobar.foobar FALSE / FALSE 1.7976931348623157e+308 COOKIE VALUE

View file

@ -92,6 +92,7 @@ from .utils import (
YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
YoutubeDLRedirectHandler,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@ -852,8 +853,9 @@ class YoutubeDL(object):
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info)
or extract_flat is True):
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(ie_result))
self.__forced_printings(
ie_result, self.prepare_filename(ie_result),
incomplete=True)
return ie_result
if result_type == 'video':
@ -989,7 +991,7 @@ class YoutubeDL(object):
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': i + playliststart,
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
@ -1693,6 +1695,36 @@ class YoutubeDL(object):
subs[lang] = f
return subs
def __forced_printings(self, info_dict, filename, incomplete):
def print_mandatory(field):
if (self.params.get('force%s' % field, False)
and (not incomplete or info_dict.get(field) is not None)):
self.to_stdout(info_dict[field])
def print_optional(field):
if (self.params.get('force%s' % field, False)
and info_dict.get(field) is not None):
self.to_stdout(info_dict[field])
print_mandatory('title')
print_mandatory('id')
if self.params.get('forceurl', False) and not incomplete:
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
print_optional('thumbnail')
print_optional('description')
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
print_mandatory('format')
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@ -1703,9 +1735,8 @@ class YoutubeDL(object):
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
# TODO: backward compatibility, to be removed
info_dict['fulltitle'] = info_dict['title']
if len(info_dict['title']) > 200:
info_dict['title'] = info_dict['title'][:197] + '...'
if 'format' not in info_dict:
info_dict['format'] = info_dict['ext']
@ -1720,29 +1751,7 @@ class YoutubeDL(object):
info_dict['_filename'] = filename = self.prepare_filename(info_dict)
# Forced printings
if self.params.get('forcetitle', False):
self.to_stdout(info_dict['fulltitle'])
if self.params.get('forceid', False):
self.to_stdout(info_dict['id'])
if self.params.get('forceurl', False):
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
self.to_stdout(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
self.to_stdout(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
if self.params.get('forceformat', False):
self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
self.__forced_printings(info_dict, filename, incomplete=False)
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
@ -1783,6 +1792,8 @@ class YoutubeDL(object):
annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
elif not info_dict.get('annotations'):
self.report_warning('There are no annotations to write.')
else:
try:
self.to_screen('[info] Writing video annotations to: ' + annofn)
@ -1804,7 +1815,7 @@ class YoutubeDL(object):
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
else:
@ -2333,6 +2344,7 @@ class YoutubeDL(object):
debuglevel = 1 if self.params.get('debug_printtraffic') else 0
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
redirect_handler = YoutubeDLRedirectHandler()
data_handler = compat_urllib_request_DataHandler()
# When passing our own FileHandler instance, build_opener won't add the
@ -2346,7 +2358,7 @@ class YoutubeDL(object):
file_handler.file_open = file_open
opener = compat_urllib_request.build_opener(
proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play

View file

@ -94,7 +94,7 @@ def _real_main(argv=None):
if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
except IOError:
sys.exit('ERROR: batch file could not be read')
sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
all_urls = batch_urls + [url.strip() for url in args] # batch_urls are already striped in read_batch_urls
_enc = preferredencoding()
all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]

View file

@ -57,6 +57,17 @@ try:
except ImportError: # Python 2
import cookielib as compat_cookiejar
if sys.version_info[0] == 2:
class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
def __init__(self, version, name, value, *args, **kwargs):
if isinstance(name, compat_str):
name = name.encode()
if isinstance(value, compat_str):
value = value.encode()
compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
else:
compat_cookiejar_Cookie = compat_cookiejar.Cookie
try:
import http.cookies as compat_cookies
except ImportError: # Python 2
@ -2754,6 +2765,17 @@ else:
compat_expanduser = os.path.expanduser
if compat_os_name == 'nt' and sys.version_info < (3, 8):
# os.path.realpath on Windows does not follow symbolic links
# prior to Python 3.8 (see https://bugs.python.org/issue9949)
def compat_realpath(path):
while os.path.islink(path):
path = os.path.abspath(os.readlink(path))
return path
else:
compat_realpath = os.path.realpath
if sys.version_info < (3, 0):
def compat_print(s):
from .utils import preferredencoding
@ -2976,6 +2998,7 @@ __all__ = [
'compat_basestring',
'compat_chr',
'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element',
@ -2998,6 +3021,7 @@ __all__ = [
'compat_os_name',
'compat_parse_qs',
'compat_print',
'compat_realpath',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split',

View file

@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.

View file

@ -194,6 +194,7 @@ class Aria2cFD(ExternalFD):
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--', info_dict['url']]
return cmd

View file

@ -190,12 +190,13 @@ class FragmentFD(FileDownloader):
})
def _start_frag_download(self, ctx):
resume_len = ctx['complete_frags_downloaded_bytes']
total_frags = ctx['total_frags']
# This dict stores the download progress, it's updated by the progress
# hook
state = {
'status': 'downloading',
'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
'downloaded_bytes': resume_len,
'fragment_index': ctx['fragment_index'],
'fragment_count': total_frags,
'filename': ctx['filename'],
@ -234,8 +235,8 @@ class FragmentFD(FileDownloader):
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
start, time_now, estimated_size - resume_len,
state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes

View file

@ -64,7 +64,7 @@ class HlsFD(FragmentFD):
s = urlh.read().decode('utf-8', 'ignore')
if not self.can_download(s, info_dict):
if info_dict.get('extra_param_to_segment_url'):
if info_dict.get('extra_param_to_segment_url') or info_dict.get('_decryption_key_url'):
self.report_error('pycrypto not found. Please install it.')
return False
self.report_warning(
@ -141,7 +141,7 @@ class HlsFD(FragmentFD):
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'])
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
@ -169,7 +169,7 @@ class HlsFD(FragmentFD):
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, decrypt_info['URI'])).read()
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)

View file

@ -106,7 +106,12 @@ class HttpFD(FileDownloader):
set_range(request, range_start, range_end)
# Establish connection
try:
ctx.data = self.ydl.urlopen(request)
try:
ctx.data = self.ydl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
if isinstance(err.reason, socket.timeout):
raise RetryDownload(err)
raise err
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
@ -218,24 +223,27 @@ class HttpFD(FileDownloader):
def retry(e):
to_stdout = ctx.tmpfilename == '-'
if not to_stdout:
ctx.stream.close()
ctx.stream = None
if ctx.stream is not None:
if not to_stdout:
ctx.stream.close()
ctx.stream = None
ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e)
while True:
try:
# Download and write
data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have
# errno set
except socket.timeout as e:
retry(e)
except socket.error as e:
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT):
raise
retry(e)
# SSLError on python 2 (inherits socket.error) may have
# no errno set but this error message
if e.errno in (errno.ECONNRESET, errno.ETIMEDOUT) or getattr(e, 'message', None) == 'The read operation timed out':
retry(e)
raise
byte_counter += len(data_block)
@ -299,7 +307,7 @@ class HttpFD(FileDownloader):
'elapsed': now - ctx.start_time,
})
if is_test and byte_counter == data_len:
if data_len is not None and byte_counter == data_len:
break
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:

View file

@ -146,7 +146,7 @@ def write_piff_header(stream, params):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps

View file

@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
'md5': '67715ce3c78426b11ba167d875ac6abf',
'info_dict': {
'id': 'ZX9371A050S00',
'id': 'LE1927H001S00',
'ext': 'mp4',
'title': "Gaston's Birthday",
'series': "Ben And Holly's Little Kingdom",
'description': 'md5:f9de914d02f226968f598ac76f105bcf',
'upload_date': '20180604',
'uploader_id': 'abc4kids',
'timestamp': 1528140219,
'title': "Series 11 Ep 1",
'series': "Gruen",
'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
'upload_date': '20190925',
'uploader_id': 'abc1',
'timestamp': 1569445289,
},
'params': {
'skip_download': True,
@ -148,7 +148,7 @@ class ABCIViewIE(InfoExtractor):
'hdnea': token,
})
for sd in ('sd', 'sd-low'):
for sd in ('720', 'sd', 'sd-low'):
sd_url = try_get(
stream, lambda x: x['streams']['hls'][sd], compat_str)
if not sd_url:

View file

@ -4,29 +4,30 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
dict_get,
int_or_none,
parse_iso8601,
try_get,
)
class ABCOTVSIE(InfoExtractor):
IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_VALID_URL = r'https?://(?P<site>abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:(?:/[^/]+)*/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
'info_dict': {
'id': '472581',
'id': '472548',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers',
'title': 'East Bay museum celebrates synthesized music',
'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421123075,
'timestamp': 1421118520,
'upload_date': '20150113',
'uploader': 'Jonathan Bloom',
},
'params': {
# m3u8 download
@ -37,39 +38,63 @@ class ABCOTVSIE(InfoExtractor):
'url': 'http://abc7news.com/472581',
'only_matching': True,
},
{
'url': 'https://6abc.com/man-75-killed-after-being-struck-by-vehicle-in-chester/5725182/',
'only_matching': True,
},
]
_SITE_MAP = {
'6abc': 'wpvi',
'abc11': 'wtvd',
'abc13': 'ktrk',
'abc30': 'kfsn',
'abc7': 'kabc',
'abc7chicago': 'wls',
'abc7news': 'kgo',
'abc7ny': 'wabc',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
site, display_id, video_id = re.match(self._VALID_URL, url).groups()
display_id = display_id or video_id
station = self._SITE_MAP[site]
webpage = self._download_webpage(url, display_id)
data = self._download_json(
'https://api.abcotvs.com/v2/content', display_id, query={
'id': video_id,
'key': 'otv.web.%s.story' % station,
'station': station,
})['data']
video = try_get(data, lambda x: x['featuredMedia']['video'], dict) or data
video_id = compat_str(dict_get(video, ('id', 'publishedKey'), video_id))
title = video.get('title') or video['linkText']
m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
formats = []
m3u8_url = video.get('m3u8')
if m3u8_url:
formats = self._extract_m3u8_formats(
video['m3u8'].split('?')[0], display_id, 'mp4', m3u8_id='hls', fatal=False)
mp4_url = video.get('mp4')
if mp4_url:
formats.append({
'abr': 128,
'format_id': 'https',
'height': 360,
'url': mp4_url,
'width': 640,
})
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()
thumbnail = self._og_search_thumbnail(webpage)
timestamp = parse_iso8601(self._search_regex(
r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
webpage, 'upload date', fatal=False))
uploader = self._search_regex(
r'rel="author">([^<]+)</a>',
webpage, 'uploader', default=None)
image = video.get('image') or {}
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader': uploader,
'description': dict_get(video, ('description', 'caption'), try_get(video, lambda x: x['meta']['description'])),
'thumbnail': dict_get(image, ('source', 'dynamicSource')),
'timestamp': int_or_none(video.get('date')),
'duration': int_or_none(video.get('length')),
'formats': formats,
}

View file

@ -1,95 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
qualities,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
'md5': '72954ea10bc979ab5e2eb288b21425a0',
'info_dict': {
'id': '24MR3YO5SAS9',
'ext': 'mp4',
'description': 'One Piece 606',
'title': 'One Piece 606',
},
'skip': 'Video is gone',
}, {
'url': 'http://add-anime.net/video/MDUGWYKNGBD8/One-Piece-687',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
try:
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, 'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, 'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError('Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + '://' + parsed_url.netloc
+ action + '?'
+ compat_urllib_parse_urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note='Confirming after redirect')
webpage = self._download_webpage(url, video_id)
FORMATS = ('normal', 'hq')
quality = qualities(FORMATS)
formats = []
for format_id in FORMATS:
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, 'video file URLx',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
'quality': quality(format_id),
})
self._sort_formats(formats)
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'formats': formats,
'title': video_title,
'description': video_description
}

View file

@ -1,25 +1,119 @@
from __future__ import unicode_literals
import functools
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
parse_duration,
unified_strdate,
str_to_int,
int_or_none,
float_or_none,
int_or_none,
ISO639Utils,
determine_ext,
OnDemandPagedList,
parse_duration,
str_or_none,
str_to_int,
unified_strdate,
)
class AdobeTVBaseIE(InfoExtractor):
_API_BASE_URL = 'http://tv.adobe.com/api/v4/'
def _call_api(self, path, video_id, query, note=None):
return self._download_json(
'http://tv.adobe.com/api/v4/' + path,
video_id, note, query=query)['data']
def _parse_subtitles(self, video_data, url_key):
subtitles = {}
for translation in video_data.get('translations', []):
vtt_path = translation.get(url_key)
if not vtt_path:
continue
lang = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
subtitles.setdefault(lang, []).append({
'ext': 'vtt',
'url': vtt_path,
})
return subtitles
def _parse_video_data(self, video_data):
video_id = compat_str(video_data['id'])
title = video_data['title']
s3_extracted = False
formats = []
for source in video_data.get('videos', []):
source_url = source.get('url')
if not source_url:
continue
f = {
'format_id': source.get('quality_level'),
'fps': int_or_none(source.get('frame_rate')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
'width': int_or_none(source.get('width')),
'url': source_url,
}
original_filename = source.get('original_filename')
if original_filename:
if not (f.get('height') and f.get('width')):
mobj = re.search(r'_(\d+)x(\d+)', original_filename)
if mobj:
f.update({
'height': int(mobj.group(2)),
'width': int(mobj.group(1)),
})
if original_filename.startswith('s3://') and not s3_extracted:
formats.append({
'format_id': 'original',
'preference': 1,
'url': original_filename.replace('s3://', 'https://s3.amazonaws.com/'),
})
s3_extracted = True
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
'subtitles': self._parse_subtitles(video_data, 'vtt'),
}
class AdobeTVEmbedIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:embed'
_VALID_URL = r'https?://tv\.adobe\.com/embed/\d+/(?P<id>\d+)'
_TEST = {
'url': 'https://tv.adobe.com/embed/22/4153',
'md5': 'c8c0461bf04d54574fc2b4d07ac6783a',
'info_dict': {
'id': '4153',
'ext': 'flv',
'title': 'Creating Graphics Optimized for BlackBerry',
'description': 'md5:eac6e8dced38bdaae51cd94447927459',
'thumbnail': r're:https?://.*\.jpg$',
'upload_date': '20091109',
'duration': 377,
'view_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._call_api(
'episode/' + video_id, video_id, {'disclosure': 'standard'})[0]
return self._parse_video_data(video_data)
class AdobeTVIE(AdobeTVBaseIE):
IE_NAME = 'adobetv'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
_TEST = {
@ -42,45 +136,33 @@ class AdobeTVIE(AdobeTVBaseIE):
if not language:
language = 'en'
video_data = self._download_json(
self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname),
urlname)['data'][0]
formats = [{
'url': source['url'],
'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
} for source in video_data['videos']]
self._sort_formats(formats)
return {
'id': compat_str(video_data['id']),
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
}
video_data = self._call_api(
'episode/get', urlname, {
'disclosure': 'standard',
'language': language,
'show_urlname': show_urlname,
'urlname': urlname,
})[0]
return self._parse_video_data(video_data)
class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
def _parse_page_data(self, page_data):
return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
_PAGE_SIZE = 25
def _extract_playlist_entries(self, url, display_id):
page = self._download_json(url, display_id)
entries = self._parse_page_data(page['data'])
for page_num in range(2, page['paging']['pages'] + 1):
entries.extend(self._parse_page_data(
self._download_json(url + '&page=%d' % page_num, display_id)['data']))
return entries
def _fetch_page(self, display_id, query, page):
page += 1
query['page'] = page
for element_data in self._call_api(
self._RESOURCE, display_id, query, 'Download Page %d' % page):
yield self._process_data(element_data)
def _extract_playlist_entries(self, display_id, query):
return OnDemandPagedList(functools.partial(
self._fetch_page, display_id, query), self._PAGE_SIZE)
class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:show'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
_TEST = {
@ -92,26 +174,31 @@ class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
},
'playlist_mincount': 136,
}
def _get_element_url(self, element_data):
return element_data['urls'][0]
_RESOURCE = 'episode'
_process_data = AdobeTVBaseIE._parse_video_data
def _real_extract(self, url):
language, show_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&show_urlname=%s' % (language, show_urlname)
query = {
'disclosure': 'standard',
'language': language,
'show_urlname': show_urlname,
}
show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0]
show_data = self._call_api(
'show/get', show_urlname, query)[0]
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname),
compat_str(show_data['id']),
show_data['show_name'],
show_data['show_description'])
self._extract_playlist_entries(show_urlname, query),
str_or_none(show_data.get('id')),
show_data.get('show_name'),
show_data.get('show_description'))
class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:channel'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
_TEST = {
@ -121,24 +208,30 @@ class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
},
'playlist_mincount': 96,
}
_RESOURCE = 'show'
def _get_element_url(self, element_data):
return element_data['url']
def _process_data(self, show_data):
return self.url_result(
show_data['url'], 'AdobeTVShow', str_or_none(show_data.get('id')))
def _real_extract(self, url):
language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&channel_urlname=%s' % (language, channel_urlname)
query = {
'channel_urlname': channel_urlname,
'language': language,
}
if category_urlname:
query += '&category_urlname=%s' % category_urlname
query['category_urlname'] = category_urlname
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname),
self._extract_playlist_entries(channel_urlname, query),
channel_urlname)
class AdobeTVVideoIE(InfoExtractor):
class AdobeTVVideoIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:video'
_VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
_TEST = {
@ -160,38 +253,36 @@ class AdobeTVVideoIE(InfoExtractor):
video_data = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
title = video_data['title']
formats = [{
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
'url': source['src'],
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('bitrate')),
} for source in video_data['sources']]
formats = []
sources = video_data.get('sources') or []
for source in sources:
source_src = source.get('src')
if not source_src:
continue
formats.append({
'filesize': int_or_none(source.get('kilobytes') or None, invscale=1000),
'format_id': '-'.join(filter(None, [source.get('format'), source.get('label')])),
'height': int_or_none(source.get('height') or None),
'tbr': int_or_none(source.get('bitrate') or None),
'width': int_or_none(source.get('width') or None),
'url': source_src,
})
self._sort_formats(formats)
# For both metadata and downloaded files the duration varies among
# formats. I just pick the max one
duration = max(filter(None, [
float_or_none(source.get('duration'), scale=1000)
for source in video_data['sources']]))
subtitles = {}
for translation in video_data.get('translations', []):
lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
if lang_id not in subtitles:
subtitles[lang_id] = []
subtitles[lang_id].append({
'url': translation['vttPath'],
'ext': 'vtt',
})
for source in sources]))
return {
'id': video_id,
'formats': formats,
'title': video_data['title'],
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data['video'].get('poster'),
'thumbnail': video_data.get('video', {}).get('poster'),
'duration': duration,
'subtitles': subtitles,
'subtitles': self._parse_subtitles(video_data, 'vttPath'),
}

View file

@ -275,7 +275,7 @@ class AfreecaTVIE(InfoExtractor):
video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
if video_element is None or video_element.text is None:
raise ExtractorError(
'Video %s video does not exist' % video_id, expected=True)
'Video %s does not exist' % video_id, expected=True)
video_url = video_element.text.strip()

View file

@ -5,6 +5,7 @@ from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
try_get,
unified_strdate,
)
@ -13,22 +14,21 @@ from ..utils import (
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '1_5g5zua6e',
'title': 'Summer Dinner Party',
'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers',
'ext': 'mp4',
'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1497285541,
'upload_date': '20170612',
'uploader_id': 'roger.metcalf@americastestkitchen.com',
'release_date': '20170617',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
'thumbnail': r're:^https?://',
'timestamp': 1523664000,
'upload_date': '20180414',
'release_date': '20180414',
'series': "America's Test Kitchen",
'season_number': 17,
'episode': 'Summer Dinner Party',
'episode_number': 24,
'season_number': 18,
'episode': 'Weeknight Japanese Suppers',
'episode_number': 15,
},
'params': {
'skip_download': True,
@ -47,7 +47,7 @@ class AmericasTestKitchenIE(InfoExtractor):
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id)
video_id, js_to_json)
ep_data = try_get(
video_data,
@ -55,17 +55,7 @@ class AmericasTestKitchenIE(InfoExtractor):
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_meta.get('zype_id')
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@ -79,8 +69,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': embed_url,
'ie_key': ie_key,
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
'ie_key': 'Zype',
'title': title,
'description': description,
'thumbnail': thumbnail,

View file

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@ -22,7 +23,101 @@ from ..utils import (
from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
class ARDMediathekBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
def _parse_media_info(self, media_info, video_id, fsk):
formats = self._extract_formats(media_info, video_id)
if not formats:
if fsk:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
self.raise_geo_restricted(
'This video is not available due to geoblocking',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': int_or_none(media_info.get('_duration')),
'thumbnail': media_info.get('_previewImage'),
'is_live': media_info.get('_isLive') is True,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}), video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(
r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
class ARDMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
@ -63,94 +158,6 @@ class ARDMediathekIE(InfoExtractor):
def suitable(cls, url):
return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
formats = self._extract_formats(media_info, video_id)
if not formats:
if '"fsk"' in webpage:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
raise ExtractorError('This video is not available due to geo restriction', expected=True)
self._sort_formats(formats)
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
is_live = media_info.get('_isLive') is True
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'ttml',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
def _real_extract(self, url):
# determine video id from url
m = re.match(self._VALID_URL, url)
@ -242,7 +249,7 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@ -256,6 +263,9 @@ class ARDIE(InfoExtractor):
'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
@ -302,21 +312,31 @@ class ARDIE(InfoExtractor):
}
class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'title': 'Tatort: Die robuste Roswita',
'id': '70153354',
'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
'upload_date': '20180826',
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
'timestamp': 1577047500,
'upload_date': '20191222',
'ext': 'mp4',
},
}, {
'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'only_matching': True,
}, {
'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
'only_matching': True,
@ -328,73 +348,75 @@ class ARDBetaMediathekIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id') or video_id
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
data = self._parse_json(data_json, display_id)
res = {
'id': video_id,
'display_id': display_id,
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
mediaCollection {
_duration
_geoblocked
_isLive
_mediaArray {
_mediaStreamArray {
_quality
_server
_stream
}
formats = []
subtitles = {}
geoblocked = False
for widget in data.values():
if widget.get('_geoblocked') is True:
geoblocked = True
if '_duration' in widget:
res['duration'] = int_or_none(widget['_duration'])
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
subtitle_url = url_or_none(widget.get('_subtitleUrl'))
if subtitle_url:
subtitles.setdefault('de', []).append({
'ext': 'ttml',
'url': subtitle_url,
})
if '_quality' in widget:
format_url = url_or_none(try_get(
widget, lambda x: x['_stream']['json'][0]))
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls',
fatal=False))
else:
# HTTP formats are not available when geoblocked is True,
# other formats are fine though
if geoblocked:
continue
quality = str_or_none(widget.get('_quality'))
formats.append({
'format_id': ('http-' + quality) if quality else 'http',
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
if not formats and geoblocked:
self.raise_geo_restricted(
msg='This video is not available due to geoblocking',
countries=['DE'])
self._sort_formats(formats)
res.update({
'subtitles': subtitles,
'formats': formats,
}
_previewImage
_subtitleUrl
_type
}
show {
title
}
synopsis
title
tracking {
atiCustomVars {
contentId
}
}
}
}''' % (mobj.group('client'), video_id),
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
title = player_page['title']
content_id = str_or_none(try_get(
player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
media_collection = player_page.get('mediaCollection') or {}
if not media_collection and content_id:
media_collection = self._download_json(
'https://www.ardmediathek.de/play/media/' + content_id,
content_id, fatal=False) or {}
info = self._parse_media_info(
media_collection, content_id or video_id,
player_page.get('blockedByFsk'))
age_limit = None
description = player_page.get('synopsis')
maturity_content_rating = player_page.get('maturityContentRating')
if maturity_content_rating:
age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
if not age_limit and description:
age_limit = int_or_none(self._search_regex(
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
'series': try_get(player_page, lambda x: x['show']['title']),
})
return res
return info

View file

@ -5,14 +5,12 @@ import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
)
from ..utils import extract_attributes
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
@ -20,7 +18,7 @@ class AsianCrushIE(InfoExtractor):
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'description': 'md5:7e986615808bcfb11756eb503a751487',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
@ -28,10 +26,27 @@ class AsianCrushIE(InfoExtractor):
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@ -51,7 +66,7 @@ class AsianCrushIE(InfoExtractor):
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
@ -63,15 +78,23 @@ class AsianCrushIE(InfoExtractor):
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result(
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
@ -79,7 +102,16 @@ class AsianCrushPlaylistIE(InfoExtractor):
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
}, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@ -96,15 +128,15 @@ class AsianCrushPlaylistIE(InfoExtractor):
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(

View file

@ -1,202 +1,118 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import hmac
import hashlib
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
xpath_text,
)
class AtresPlayerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?atresplayer\.com/television/[^/]+/[^/]+/[^/]+/(?P<id>.+?)_\d+\.html'
_VALID_URL = r'https?://(?:www\.)?atresplayer\.com/[^/]+/[^/]+/[^/]+/[^/]+/(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})'
_NETRC_MACHINE = 'atresplayer'
_TESTS = [
{
'url': 'http://www.atresplayer.com/television/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_2014122100174.html',
'md5': 'efd56753cda1bb64df52a3074f62e38a',
'url': 'https://www.atresplayer.com/antena3/series/pequenas-coincidencias/temporada-1/capitulo-7-asuntos-pendientes_5d4aa2c57ed1a88fc715a615/',
'info_dict': {
'id': 'capitulo-10-especial-solidario-nochebuena',
'id': '5d4aa2c57ed1a88fc715a615',
'ext': 'mp4',
'title': 'Especial Solidario de Nochebuena',
'description': 'md5:e2d52ff12214fa937107d21064075bf1',
'duration': 5527.6,
'thumbnail': r're:^https?://.*\.jpg$',
'title': 'Capítulo 7: Asuntos pendientes',
'description': 'md5:7634cdcb4d50d5381bedf93efb537fbc',
'duration': 3413,
},
'params': {
'format': 'bestvideo',
},
'skip': 'This video is only available for registered users'
},
{
'url': 'http://www.atresplayer.com/television/especial/videoencuentros/temporada-1/capitulo-112-david-bustamante_2014121600375.html',
'md5': '6e52cbb513c405e403dbacb7aacf8747',
'info_dict': {
'id': 'capitulo-112-david-bustamante',
'ext': 'flv',
'title': 'David Bustamante',
'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
'duration': 1439.0,
'thumbnail': r're:^https?://.*\.jpg$',
},
'url': 'https://www.atresplayer.com/lasexta/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_5ad08edf986b2855ed47adc4/',
'only_matching': True,
},
{
'url': 'http://www.atresplayer.com/television/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_2014122400174.html',
'url': 'https://www.atresplayer.com/antena3/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_5ad51046986b2886722ccdea/',
'only_matching': True,
},
]
_USER_AGENT = 'Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9300 Build/JSS15J'
_MAGIC = 'QWtMLXs414Yo+c#_+Q#K@NN)'
_TIMESTAMP_SHIFT = 30000
_TIME_API_URL = 'http://servicios.atresplayer.com/api/admin/time.json'
_URL_VIDEO_TEMPLATE = 'https://servicios.atresplayer.com/api/urlVideo/{1}/{0}/{1}|{2}|{3}.json'
_PLAYER_URL_TEMPLATE = 'https://servicios.atresplayer.com/episode/getplayer.json?episodePk=%s'
_EPISODE_URL_TEMPLATE = 'http://www.atresplayer.com/episodexml/%s'
_LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
_ERRORS = {
'UNPUBLISHED': 'We\'re sorry, but this video is not yet available.',
'DELETED': 'This video has expired and is no longer available for online streaming.',
'GEOUNPUBLISHED': 'We\'re sorry, but this video is not available in your region due to right restrictions.',
# 'PREMIUM': 'PREMIUM',
}
_API_BASE = 'https://api.atresplayer.com/'
def _real_initialize(self):
self._login()
def _handle_error(self, e, code):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == code:
error = self._parse_json(e.cause.read(), None)
if error.get('error') == 'required_registered':
self.raise_login_required()
raise ExtractorError(error['error_description'], expected=True)
raise
def _login(self):
username, password = self._get_login_info()
if username is None:
return
login_form = {
'j_username': username,
'j_password': password,
}
self._request_webpage(
self._API_BASE + 'login', None, 'Downloading login page')
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in')
try:
target_url = self._download_json(
'https://account.atresmedia.com/api/login', None,
'Logging in', headers={
'Content-Type': 'application/x-www-form-urlencoded'
}, data=urlencode_postdata({
'username': username,
'password': password,
}))['targetUrl']
except ExtractorError as e:
self._handle_error(e, 400)
error = self._html_search_regex(
r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>',
response, 'error', default=None)
if error:
raise ExtractorError(
'Unable to login: %s' % error, expected=True)
self._request_webpage(target_url, None, 'Following Target URL')
def _real_extract(self, url):
video_id = self._match_id(url)
display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
try:
episode = self._download_json(
self._API_BASE + 'client/v1/player/episode/' + video_id, video_id)
except ExtractorError as e:
self._handle_error(e, 403)
episode_id = self._search_regex(
r'episode="([^"]+)"', webpage, 'episode id')
request = sanitized_Request(
self._PLAYER_URL_TEMPLATE % episode_id,
headers={'User-Agent': self._USER_AGENT})
player = self._download_json(request, episode_id, 'Downloading player JSON')
episode_type = player.get('typeOfEpisode')
error_message = self._ERRORS.get(episode_type)
if error_message:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
title = episode['titulo']
formats = []
video_url = player.get('urlVideo')
if video_url:
format_info = {
'url': video_url,
'format_id': 'http',
}
mobj = re.search(r'(?P<bitrate>\d+)K_(?P<width>\d+)x(?P<height>\d+)', video_url)
if mobj:
format_info.update({
'width': int_or_none(mobj.group('width')),
'height': int_or_none(mobj.group('height')),
'tbr': int_or_none(mobj.group('bitrate')),
})
formats.append(format_info)
timestamp = int_or_none(self._download_webpage(
self._TIME_API_URL,
video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
timestamp_shifted = compat_str(timestamp + self._TIMESTAMP_SHIFT)
token = hmac.new(
self._MAGIC.encode('ascii'),
(episode_id + timestamp_shifted).encode('utf-8'), hashlib.md5
).hexdigest()
request = sanitized_Request(
self._URL_VIDEO_TEMPLATE.format('windows', episode_id, timestamp_shifted, token),
headers={'User-Agent': self._USER_AGENT})
fmt_json = self._download_json(
request, video_id, 'Downloading windows video JSON')
result = fmt_json.get('resultDes')
if result.lower() != 'ok':
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, result), expected=True)
for format_id, video_url in fmt_json['resultObject'].items():
if format_id == 'token' or not video_url.startswith('http'):
for source in episode.get('sources', []):
src = source.get('src')
if not src:
continue
if 'geodeswowsmpra3player' in video_url:
# f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
# f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
# this videos are protected by DRM, the f4m downloader doesn't support them
continue
video_url_hd = video_url.replace('free_es', 'es')
formats.extend(self._extract_f4m_formats(
video_url_hd[:-9] + '/manifest.f4m', video_id, f4m_id='hds',
fatal=False))
formats.extend(self._extract_mpd_formats(
video_url_hd[:-9] + '/manifest.mpd', video_id, mpd_id='dash',
fatal=False))
src_type = source.get('type')
if src_type == 'application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif src_type == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
path_data = player.get('pathData')
episode = self._download_xml(
self._EPISODE_URL_TEMPLATE % path_data, video_id,
'Downloading episode XML')
duration = float_or_none(xpath_text(
episode, './media/asset/info/technical/contentDuration', 'duration'))
art = episode.find('./media/asset/info/art')
title = xpath_text(art, './name', 'title')
description = xpath_text(art, './description', 'description')
thumbnail = xpath_text(episode, './media/asset/files/background', 'thumbnail')
subtitles = {}
subtitle_url = xpath_text(episode, './media/asset/files/subtitle', 'subtitle')
if subtitle_url:
subtitles['es'] = [{
'ext': 'srt',
'url': subtitle_url,
}]
heartbeat = episode.get('heartbeat') or {}
omniture = episode.get('omniture') or {}
get_meta = lambda x: heartbeat.get(x) or omniture.get(x)
return {
'display_id': display_id,
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'description': episode.get('descripcion'),
'thumbnail': episode.get('imgPoster'),
'duration': int_or_none(episode.get('duration')),
'formats': formats,
'subtitles': subtitles,
'channel': get_meta('channel'),
'season': get_meta('season'),
'episode_number': int_or_none(get_meta('episodeNumber')),
}

View file

@ -2,22 +2,25 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import float_or_none
from ..utils import (
clean_html,
float_or_none,
)
class AudioBoomIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audioboom\.com/(?:boos|posts)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
'url': 'https://audioboom.com/posts/7398103-asim-chaudhry',
'md5': '7b00192e593ff227e6a315486979a42d',
'info_dict': {
'id': '4279833',
'id': '7398103',
'ext': 'mp3',
'title': '3/09/2016 Czaban Hour 3',
'description': 'Guest: Nate Davis - NFL free agency, Guest: Stan Gans',
'duration': 2245.72,
'uploader': 'SB Nation A.M.',
'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
'title': 'Asim Chaudhry',
'description': 'md5:2f3fef17dacc2595b5362e1d7d3602fc',
'duration': 4000.99,
'uploader': 'Sue Perkins: An hour or so with...',
'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/perkins',
}
}, {
'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
@ -32,8 +35,8 @@ class AudioBoomIE(InfoExtractor):
clip = None
clip_store = self._parse_json(
self._search_regex(
r'data-new-clip-store=(["\'])(?P<json>{.*?"clipId"\s*:\s*%s.*?})\1' % video_id,
self._html_search_regex(
r'data-new-clip-store=(["\'])(?P<json>{.+?})\1',
webpage, 'clip store', default='{}', group='json'),
video_id, fatal=False)
if clip_store:
@ -47,14 +50,15 @@ class AudioBoomIE(InfoExtractor):
audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
'audio', webpage, 'audio url')
title = from_clip('title') or self._og_search_title(webpage)
description = from_clip('description') or self._og_search_description(webpage)
title = from_clip('title') or self._html_search_meta(
['og:title', 'og:audio:title', 'audio_title'], webpage)
description = from_clip('description') or clean_html(from_clip('formattedDescription')) or self._og_search_description(webpage)
duration = float_or_none(from_clip('duration') or self._html_search_meta(
'weibo:audio:duration', webpage))
uploader = from_clip('author') or self._og_search_property(
'audio:artist', webpage, 'uploader', fatal=False)
uploader = from_clip('author') or self._html_search_meta(
['og:audio:artist', 'twitter:audio:artist_name', 'audio_artist'], webpage, 'uploader')
uploader_url = from_clip('author_url') or self._html_search_meta(
'audioboo:channel', webpage, 'uploader url')

View file

@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_PARTNER_ID = '1719221'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
if not entry_id:
api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
payload = {
'query': '''query VideoContext($articleId: ID!) {
article: node(id: $articleId) {
... on Article {
mainAssetRelation {
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
entry_id = self._download_json(
self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
'variables': json.dumps({
'contextId': 'NewsArticle:' + article_id,
}),
})['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),

View file

@ -1,142 +0,0 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
)
class BambuserIE(InfoExtractor):
IE_NAME = 'bambuser'
_VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
_API_KEY = '005f64509e19a868399060af746a00aa'
_LOGIN_URL = 'https://bambuser.com/user'
_NETRC_MACHINE = 'bambuser'
_TEST = {
'url': 'http://bambuser.com/v/4050584',
# MD5 seems to be flaky, see https://travis-ci.org/ytdl-org/youtube-dl/jobs/14051016#L388
# 'md5': 'fba8f7693e48fd4e8641b3fd5539a641',
'info_dict': {
'id': '4050584',
'ext': 'flv',
'title': 'Education engineering days - lightning talks',
'duration': 3741,
'uploader': 'pixelversity',
'uploader_id': '344706',
'timestamp': 1382976692,
'upload_date': '20131028',
'view_count': int,
},
'params': {
# It doesn't respect the 'Range' header, it would download the whole video
# caused the travis builds to fail: https://travis-ci.org/ytdl-org/youtube-dl/jobs/14493845#L59
'skip_download': True,
},
}
def _login(self):
username, password = self._get_login_info()
if username is None:
return
login_form = {
'form_id': 'user_login',
'op': 'Log in',
'name': username,
'pass': password,
}
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in')
login_error = self._html_search_regex(
r'(?s)<div class="messages error">(.+?)</div>',
response, 'login error', default=None)
if login_error:
raise ExtractorError(
'Unable to login: %s' % login_error, expected=True)
def _real_initialize(self):
self._login()
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(
'http://player-c.api.bambuser.com/getVideo.json?api_key=%s&vid=%s'
% (self._API_KEY, video_id), video_id)
error = info.get('error')
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error), expected=True)
result = info['result']
return {
'id': video_id,
'title': result['title'],
'url': result['url'],
'thumbnail': result.get('preview'),
'duration': int_or_none(result.get('length')),
'uploader': result.get('username'),
'uploader_id': compat_str(result.get('owner', {}).get('uid')),
'timestamp': int_or_none(result.get('created')),
'fps': float_or_none(result.get('framerate')),
'view_count': int_or_none(result.get('views_total')),
'comment_count': int_or_none(result.get('comment_count')),
}
class BambuserChannelIE(InfoExtractor):
IE_NAME = 'bambuser:channel'
_VALID_URL = r'https?://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
_TEST = {
'url': 'http://bambuser.com/channel/pixelversity',
'info_dict': {
'title': 'pixelversity',
},
'playlist_mincount': 60,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
urls = []
last_id = ''
for i in itertools.count(1):
req_url = (
'http://bambuser.com/xhr-api/index.php?username={user}'
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = sanitized_Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
data = self._download_json(
req, user, 'Downloading page %d' % i)
results = data['result']
if not results:
break
last_id = results[-1]['vid']
urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
return {
'_type': 'playlist',
'title': user,
'entries': urls,
}

View file

@ -40,6 +40,7 @@ class BBCCoUkIE(InfoExtractor):
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/|
sounds/play/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@ -70,7 +71,7 @@ class BBCCoUkIE(InfoExtractor):
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
},
'params': {
@ -220,6 +221,20 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
'note': 'Audio',
'info_dict': {
'id': 'm0007jz9',
'ext': 'mp4',
'title': 'BBC Proms, 2019, Prom 34: WestEastern Divan Orchestra',
'description': "Live BBC Proms. WestEastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
'duration': 9840,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
@ -513,7 +528,7 @@ class BBCCoUkIE(InfoExtractor):
def get_programme_id(item):
def get_from_attributes(item):
for p in('identifier', 'group'):
for p in ('identifier', 'group'):
value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value):
return value
@ -609,7 +624,7 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott',
'title': 'Russia stages massive WW2 parade',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,

View file

@ -22,10 +22,11 @@ class BellMediaIE(InfoExtractor):
bravo|
mtv|
space|
etalk
etalk|
marilyn
)\.ca|
much\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
(?:much|cp24)\.com
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@ -61,6 +62,9 @@ class BellMediaIE(InfoExtractor):
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
@ -70,6 +74,7 @@ class BellMediaIE(InfoExtractor):
'animalplanet': 'aniplan',
'etalk': 'ctv',
'bnnbloomberg': 'bnn',
'marilyn': 'ctv_marilyn',
}
def _real_extract(self, url):

View file

@ -15,6 +15,7 @@ from ..utils import (
float_or_none,
parse_iso8601,
smuggle_url,
str_or_none,
strip_jsonp,
unified_timestamp,
unsmuggle_url,
@ -23,7 +24,18 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:(?:www|bangumi)\.)?
bilibili\.(?:tv|com)/
(?:
(?:
video/[aA][vV]|
anime/(?P<anime_id>\d+)/play\#
)(?P<id_bv>\d+)|
video/[bB][vV](?P<id>[^/?#&]+)
)
'''
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
@ -91,6 +103,10 @@ class BiliBiliIE(InfoExtractor):
'skip_download': True, # Test metadata only
},
}]
}, {
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
'only_matching': True,
}]
_APP_KEY = 'iVGUTjsxvpLeuDCf'
@ -108,7 +124,7 @@ class BiliBiliIE(InfoExtractor):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('id') or mobj.group('id_bv')
anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id)
@ -306,3 +322,129 @@ class BiliBiliBangumiIE(InfoExtractor):
return self.playlist_result(
entries, bangumi_id,
season_info.get('bangumi_title'), season_info.get('evaluate'))
class BilibiliAudioBaseIE(InfoExtractor):
def _call_api(self, path, sid, query=None):
if not query:
query = {'sid': sid}
return self._download_json(
'https://www.bilibili.com/audio/music-service-c/web/' + path,
sid, query=query)['data']
class BilibiliAudioIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/au(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/au1003142',
'md5': 'fec4987014ec94ef9e666d4d158ad03b',
'info_dict': {
'id': '1003142',
'ext': 'm4a',
'title': '【tsukimi】YELLOW / 神山羊',
'artist': 'tsukimi',
'comment_count': int,
'description': 'YELLOW的mp3版',
'duration': 183,
'subtitles': {
'origin': [{
'ext': 'lrc',
}],
},
'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1564836614,
'upload_date': '20190803',
'uploader': 'tsukimi-つきみぐー',
'view_count': int,
},
}
def _real_extract(self, url):
au_id = self._match_id(url)
play_data = self._call_api('url', au_id)
formats = [{
'url': play_data['cdns'][0],
'filesize': int_or_none(play_data.get('size')),
}]
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}
subtitles = None
lyric = song.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}]
}
return {
'id': au_id,
'title': title,
'formats': formats,
'artist': song.get('author'),
'comment_count': int_or_none(statistic.get('comment')),
'description': song.get('intro'),
'duration': int_or_none(song.get('duration')),
'subtitles': subtitles,
'thumbnail': song.get('cover'),
'timestamp': int_or_none(song.get('passtime')),
'uploader': song.get('uname'),
'view_count': int_or_none(statistic.get('play')),
}
class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/am(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/am10624',
'info_dict': {
'id': '10624',
'title': '每日新曲推荐每日11:00更新',
'description': '每天11:00更新为你推送最新音乐',
},
'playlist_count': 19,
}
def _real_extract(self, url):
am_id = self._match_id(url)
songs = self._call_api(
'song/of-menu', am_id, {'sid': am_id, 'pn': 1, 'ps': 100})['data']
entries = []
for song in songs:
sid = str_or_none(song.get('id'))
if not sid:
continue
entries.append(self.url_result(
'https://www.bilibili.com/audio/au' + sid,
BilibiliAudioIE.ie_key(), sid))
if entries:
album_data = self._call_api('menu/info', am_id) or {}
album_title = album_data.get('title')
if album_title:
for entry in entries:
entry['album'] = album_title
return self.playlist_result(
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)
class BiliBiliPlayerIE(InfoExtractor):
_VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
_TEST = {
'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.bilibili.tv/video/av%s/' % video_id,
ie=BiliBiliIE.ie_key(), video_id=video_id)

View file

@ -3,10 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from .vk import VKIE
from ..utils import (
HEADRequest,
int_or_none,
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
)
from ..utils import int_or_none
class BIQLEIE(InfoExtractor):
@ -47,9 +48,16 @@ class BIQLEIE(InfoExtractor):
if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id)
self._request_webpage(
HEADRequest(embed_url), video_id, headers={'Referer': url})
video_id, sig, _, access_token = self._get_cookies(embed_url)['video_ext'].value.split('%3A')
embed_page = self._download_webpage(
embed_url, video_id, headers={'Referer': url})
video_ext = self._get_cookies(embed_url).get('video_ext')
if video_ext:
video_ext = compat_urllib_parse_unquote(video_ext.value)
if not video_ext:
video_ext = compat_b64decode(self._search_regex(
r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)',
embed_page, 'video_ext')).decode()
video_id, sig, _, access_token = video_ext.split(':')
item = self._download_json(
'https://api.vk.com/method/video.get', video_id,
headers={'User-Agent': 'okhttp/3.4.1'}, query={

View file

@ -7,6 +7,7 @@ import re
from .common import InfoExtractor
from ..utils import (
orderedSet,
unified_strdate,
urlencode_postdata,
)
@ -23,6 +24,7 @@ class BitChuteIE(InfoExtractor):
'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Victoria X Rave',
'upload_date': '20170813',
},
}, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
@ -74,12 +76,17 @@ class BitChuteIE(InfoExtractor):
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'formats': formats,
}

View file

@ -11,8 +11,8 @@ from ..utils import ExtractorError
class BokeCCBaseIE(InfoExtractor):
def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
player_params_str = self._html_search_regex(
r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
webpage, 'player params')
r'<(?:script|embed)[^>]+src=(?P<q>["\'])(?:https?:)?//p\.bokecc\.com/(?:player|flash/player\.swf)\?(?P<query>.+?)(?P=q)',
webpage, 'player params', group='query')
player_params = compat_parse_qs(player_params_str)
@ -36,9 +36,9 @@ class BokeCCIE(BokeCCBaseIE):
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
'url': 'http://union.bokecc.com/playvideo.bo?vid=E0ABAE9D4F509B189C33DC5901307461&uid=FE644790DE9D154A',
'info_dict': {
'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
'id': 'FE644790DE9D154A_E0ABAE9D4F509B189C33DC5901307461',
'ext': 'flv',
'title': 'BokeCC Video',
},

View file

@ -2,43 +2,43 @@
from __future__ import unicode_literals
import base64
import json
import re
import struct
from .common import InfoExtractor
from .adobepass import AdobePassIE
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
compat_xml_parse_error,
compat_HTTPError,
)
from ..utils import (
determine_ext,
ExtractorError,
clean_html,
extract_attributes,
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
float_or_none,
js_to_json,
int_or_none,
js_to_json,
mimetype2ext,
parse_iso8601,
smuggle_url,
str_or_none,
unescapeHTML,
unsmuggle_url,
UnsupportedError,
update_url_query,
clean_html,
mimetype2ext,
url_or_none,
)
class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [
{
@ -55,7 +55,8 @@ class BrightcoveLegacyIE(InfoExtractor):
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
'skip': 'The player has been deactivated by the content owner',
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
@ -70,6 +71,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'upload_date': '20120814',
'uploader_id': '1460825906',
},
'skip': 'video not playable',
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
@ -79,7 +81,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'ext': 'mp4',
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
# 'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
@ -124,6 +126,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'id': '3550319591001',
},
'playlist_mincount': 7,
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
@ -133,6 +136,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Lesson 08',
},
'playlist_mincount': 10,
'skip': 'Unsupported URL',
},
{
# playerID inferred from bcpid
@ -141,12 +145,6 @@ class BrightcoveLegacyIE(InfoExtractor):
'only_matching': True, # Tested in GenericIE
}
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@ -238,7 +236,8 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod
def _make_brightcove_url(cls, params):
return update_url_query(cls._FEDERATED_URL, params)
return update_url_query(
'http://c.brightcove.com/services/viewer/htmlFederated', params)
@classmethod
def _extract_brightcove_url(cls, webpage):
@ -297,38 +296,12 @@ class BrightcoveLegacyIE(InfoExtractor):
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
# We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url)
referer = query.get('linkBaseURL', [None])[0] or smuggled_data.get('Referer', url)
video_id = videoPlayer[0]
if 'playerID' not in query:
mobj = re.search(r'/bcpid(\d+)', url)
if mobj is not None:
query['playerID'] = [mobj.group(1)]
return self._get_video_info(
videoPlayer[0], query, referer=referer)
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
else:
raise ExtractorError(
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if referer is not None:
headers['Referer'] = referer
webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0]
@ -339,6 +312,9 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
player_id = query.get('playerID')
if player_id and player_id[0].isdigit():
headers = {}
if referer:
headers['Referer'] = referer
player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False)
@ -349,136 +325,16 @@ class BrightcoveLegacyIE(InfoExtractor):
if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
video_info['_youtubedl_adServerURL'] = info.get('adServerURL')
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist')
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
url = rend['defaultURL']
if not url:
continue
ext = None
if rend['remote']:
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through
# akamaihd.net, but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8_native',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
adServerURL = video_info.get('_youtubedl_adServerURL')
if adServerURL:
ad_info = {
'_type': 'url',
'url': adServerURL,
}
if 'url' in info:
return {
'_type': 'playlist',
'title': info['title'],
'entries': [ad_info, info],
}
else:
return ad_info
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
if publisher_id:
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
if referer:
brightcove_new_url = smuggle_url(brightcove_new_url, {'referrer': referer})
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
# TODO: figure out if it's possible to extract playlistId from playerKey
# elif 'playerKey' in query:
# player_key = query['playerKey']
# return self._get_playlist_info(player_key[0])
raise UnsupportedError(url)
class BrightcoveNewIE(AdobePassIE):
@ -570,7 +426,7 @@ class BrightcoveNewIE(AdobePassIE):
# [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx)
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*?
(<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
@ -699,10 +555,16 @@ class BrightcoveNewIE(AdobePassIE):
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
subtitles.setdefault(text_track.get('srclang'), []).append({
'url': text_track['src'],
})
if text_track.get('kind') != 'captions':
continue
text_track_url = url_or_none(text_track.get('src'))
if not text_track_url:
continue
lang = (str_or_none(text_track.get('srclang'))
or str_or_none(text_track.get('label')) or 'en').lower()
subtitles.setdefault(lang, []).append({
'url': text_track_url,
})
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
@ -732,45 +594,63 @@ class BrightcoveNewIE(AdobePassIE):
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
policy_key_id = '%s_%s' % (account_id, player_id)
policy_key = self._downloader.cache.load('brightcove', policy_key_id)
policy_key_extracted = False
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
policy_key = None
def extract_policy_key():
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
policy_key = None
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
policy_key = catalog.get('policyKey')
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
store_pk(policy_key)
return policy_key
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
headers = {
'Accept': 'application/json;pk=%s' % policy_key,
}
headers = {}
referrer = smuggled_data.get('referrer')
if referrer:
headers.update({
'Referer': referrer,
'Origin': re.search(r'https?://[^/]+', referrer).group(0),
})
try:
json_data = self._download_json(api_url, video_id, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message, expected=True)
raise
for _ in range(2):
if not policy_key:
policy_key = extract_policy_key()
policy_key_extracted = True
headers['Accept'] = 'application/json;pk=%s' % policy_key
try:
json_data = self._download_json(api_url, video_id, headers=headers)
break
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message)
elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
policy_key = None
store_pk(None)
continue
raise ExtractorError(message, expected=True)
raise
errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH':

View file

@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
'info_dict': {
'id': 'hZRllCfw',
'id': 'cjGDb0X9',
'ext': 'mp4',
'title': "Here's how much radiation you're exposed to in everyday life",
'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
'upload_date': '20170709',
'timestamp': 1499606400,
},
'params': {
'skip_download': True,
'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
'upload_date': '20160611',
'timestamp': 1465675620,
},
}, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'only_matching': True,
'md5': '43f438dbc6da0b89f5ac42f68529d84a',
'info_dict': {
'id': '5zJwd4FK',
'ext': 'mp4',
'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
'description': 'md5:2af8975825d38a4fed24717bbe51db49',
'upload_date': '20170705',
'timestamp': 1499270528,
},
}, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True,
@ -35,7 +40,8 @@ class BusinessInsiderIE(InfoExtractor):
jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id')
return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),

View file

@ -3,7 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_duration
from ..utils import (
determine_ext,
merge_dicts,
parse_duration,
url_or_none,
)
class BYUtvIE(InfoExtractor):
@ -51,7 +56,7 @@ class BYUtvIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
info = self._download_json(
video = self._download_json(
'https://api.byutv.org/api3/catalog/getvideosforcontent',
display_id, query={
'contentid': video_id,
@ -62,7 +67,7 @@ class BYUtvIE(InfoExtractor):
'x-byutv-platformkey': 'xsaaw9c7y5',
})
ep = info.get('ooyalaVOD')
ep = video.get('ooyalaVOD')
if ep:
return {
'_type': 'url_transparent',
@ -75,18 +80,38 @@ class BYUtvIE(InfoExtractor):
'thumbnail': ep.get('imageThumbnail'),
}
ep = info['dvr']
title = ep['title']
formats = self._extract_m3u8_formats(
ep['videoUrl'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
info = {}
formats = []
for format_id, ep in video.items():
if not isinstance(ep, dict):
continue
video_url = url_or_none(ep.get('videoUrl'))
if not video_url:
continue
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': video_url,
'format_id': format_id,
})
merge_dicts(info, {
'title': ep.get('title'),
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
})
self._sort_formats(formats)
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
'title': display_id,
'formats': formats,
}
})

View file

@ -13,6 +13,8 @@ from ..utils import (
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
url_or_none,
)
@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'md5': '68993eda72ef62386a15ea2cf3c93107',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv',
'ext': 'mp4',
'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'description': 'Nachtwacht: De Greystook',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03,
'duration': 1468.04,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, {
@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
}
_REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id)
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
title = data['title']
description = data.get('description')
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
if not format_url or not format_type:
continue
format_type = format_type.upper()
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
},
'skip': 'Pagina niet gevonden',
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
'display_id': 'emma-pakt-thilly-aan',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'title': 'Emma pakt Thilly aan',
'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3788.06,
'duration': 118.24,
},
'params': {
'skip_download': True,
},
'skip': 'Episode no longer available',
'expected_warnings': ['is not a supported codec'],
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'flv',
'ext': 'mp4',
'title': 'De zwarte weduwe',
'description': 'md5:d90c21dced7db869a85db89a623998d4',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
'season': '1',
'season': 'Season 1',
'season_number': 1,
'episode_number': 1,
},
'skip': 'This video is only available for registered users'
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['is not a supported codec'],
}, {
# Only available via new API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
'info_dict': {
'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
'ext': 'mp4',
'title': 'Aflevering 5',
'description': 'Wie valt door de mand tijdens een missie?',
'duration': 2967.06,
'season': 'Season 1',
'season_number': 1,
'episode_number': 5,
},
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}]
_NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'

View file

@ -1,8 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import re
from xml.sax.saxutils import escape
from .common import InfoExtractor
from ..compat import (
@ -216,6 +218,29 @@ class CBCWatchBaseIE(InfoExtractor):
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
_GEO_COUNTRIES = ['CA']
_LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
_TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcwatch'
def _signature(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._API_KEY}
resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
access_token = resp['access_token']
# token
query = {
'access_token': access_token,
'apikey': self._API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
return resp['signature']
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
@ -239,7 +264,8 @@ class CBCWatchBaseIE(InfoExtractor):
def _real_initialize(self):
if self._valid_device_token():
return
device = self._downloader.cache.load('cbcwatch', 'device') or {}
device = self._downloader.cache.load(
'cbcwatch', self._cache_device_key()) or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token():
return
@ -248,16 +274,30 @@ class CBCWatchBaseIE(InfoExtractor):
def _valid_device_token(self):
return self._device_id and self._device_token
def _cache_device_key(self):
email, _ = self._get_login_info()
return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
def _register_device(self):
self._device_id = self._device_token = None
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, 'Acquiring device token',
data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
email, password = self._get_login_info()
if email and password:
signature = self._signature(email, password)
data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
escape(signature), escape(self._device_id)).encode()
url = self._API_BASE_URL + 'device/login'
result = self._download_xml(
url, None, data=data,
headers={'content-type': 'application/xml'})
self._device_token = xpath_text(result, 'token', fatal=True)
else:
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'cbcwatch', self._cache_device_key(), {
'id': self._device_id,
'token': self._device_token,
})

View file

@ -147,6 +147,8 @@ class CeskaTelevizeIE(InfoExtractor):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'drmOnly=true' in stream_url:
continue
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', 'm3u8_native',

View file

@ -32,7 +32,7 @@ class Channel9IE(InfoExtractor):
'upload_date': '20130828',
'session_code': 'KOS002',
'session_room': 'Arena 1A',
'session_speakers': ['Andrew Coates', 'Brady Gaster', 'Mads Kristensen', 'Ed Blankenship', 'Patrick Klug'],
'session_speakers': 'count:5',
},
}, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
@ -64,15 +64,15 @@ class Channel9IE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_mincount': 100,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'info_dict': {
'id': 'Events/DEVintersection/DEVintersection-2016',
'title': 'DEVintersection 2016 Orlando Sessions',
},
'playlist_mincount': 14,
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'only_matching': True,
}, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
@ -112,11 +112,11 @@ class Channel9IE(InfoExtractor):
episode_data), content_path)
content_id = episode_data['contentId']
is_session = '/Sessions(' in episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api'] + '?$select=Captions,CommentCount,MediaLengthInSeconds,PublishedDate,Rating,RatingCount,Title,VideoMP4High,VideoMP4Low,VideoMP4Medium,VideoPlayerPreviewImage,VideoWMV,VideoWMVHQ,Views,'
if is_session:
content_url += '?$expand=Speakers'
content_url += 'Code,Description,Room,Slides,Speakers,ZipFile&$expand=Speakers'
else:
content_url += '?$expand=Authors'
content_url += 'Authors,Body&$expand=Authors'
content_data = self._download_json(content_url, content_id)
title = content_data['Title']
@ -210,7 +210,7 @@ class Channel9IE(InfoExtractor):
'id': content_id,
'title': title,
'description': clean_html(content_data.get('Description') or content_data.get('Body')),
'thumbnail': content_data.get('Thumbnail') or content_data.get('VideoPlayerPreviewImage'),
'thumbnail': content_data.get('VideoPlayerPreviewImage'),
'duration': int_or_none(content_data.get('MediaLengthInSeconds')),
'timestamp': parse_iso8601(content_data.get('PublishedDate')),
'avg_rating': int_or_none(content_data.get('Rating')),

View file

@ -3,11 +3,15 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
lowercase_escape,
url_or_none,
)
class ChaturbateIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?:fullvideo/?\?.*?\bb=)?(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.chaturbate.com/siswet19/',
'info_dict': {
@ -21,6 +25,9 @@ class ChaturbateIE(InfoExtractor):
'skip_download': True,
},
'skip': 'Room is offline',
}, {
'url': 'https://chaturbate.com/fullvideo/?b=caylin',
'only_matching': True,
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,
@ -32,14 +39,34 @@ class ChaturbateIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers=self.geo_verification_headers())
'https://chaturbate.com/%s/' % video_id, video_id,
headers=self.geo_verification_headers())
found_m3u8_urls = []
data = self._parse_json(
self._search_regex(
r'initialRoomDossier\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'data', default='{}', group='value'),
video_id, transform_source=lowercase_escape, fatal=False)
if data:
m3u8_url = url_or_none(data.get('hls_source'))
if m3u8_url:
found_m3u8_urls.append(m3u8_url)
if not found_m3u8_urls:
for m in re.finditer(
r'(\\u002[27])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(lowercase_escape(m.group('url')))
if not found_m3u8_urls:
for m in re.finditer(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(m.group('url'))
m3u8_urls = []
for m in re.finditer(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
m3u8_fast_url, m3u8_no_fast_url = m.group('url'), m.group(
'url').replace('_fast', '')
for found_m3u8_url in found_m3u8_urls:
m3u8_fast_url, m3u8_no_fast_url = found_m3u8_url, found_m3u8_url.replace('_fast', '')
for m3u8_url in (m3u8_fast_url, m3u8_no_fast_url):
if m3u8_url not in m3u8_urls:
m3u8_urls.append(m3u8_url)
@ -59,7 +86,12 @@ class ChaturbateIE(InfoExtractor):
formats = []
for m3u8_url in m3u8_urls:
m3u8_id = 'fast' if '_fast' in m3u8_url else 'slow'
for known_id in ('fast', 'slow'):
if '_%s' % known_id in m3u8_url:
m3u8_id = known_id
break
else:
m3u8_id = None
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4',
# ffmpeg skips segments for fast m3u8

View file

@ -1,20 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor):
_DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
_EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
_ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
_VALID_URL = r'''(?x)
https?://
(?:
(?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
(?:watch\.)?%s/|
%s
)
(?P<id>[\da-f]+)
'''
(?P<id>%s)
''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
_TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': {
@ -41,23 +45,28 @@ class CloudflareStreamIE(InfoExtractor):
return [
mobj.group('url')
for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
base_url = 'https://%s/%s/' % (domain, video_id)
if '.' in video_id:
video_id = self._parse_json(base64.urlsafe_b64decode(
video_id.split('.')[1]), video_id)['sub']
manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats(
'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False)
manifest_base_url + 'm3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
video_id, mpd_id='dash', fatal=False))
manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats,
}

View file

@ -1,74 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
)
class ComCarCoffIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': '2494164',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
},
'params': {
'skip_download': 'requires ffmpeg',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
if not display_id:
display_id = 'comediansincarsgettingcoffee.com'
webpage = self._download_webpage(url, display_id)
full_data = self._parse_json(
self._search_regex(
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
duration = int_or_none(video_data.get('durationSeconds')) or parse_duration(
video_data.get('duration'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': video_data.get('description'),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View file

@ -10,12 +10,13 @@ import os
import random
import re
import socket
import ssl
import sys
import time
import math
from ..compat import (
compat_cookiejar,
compat_cookiejar_Cookie,
compat_cookies,
compat_etree_Element,
compat_etree_fromstring,
@ -67,6 +68,7 @@ from ..utils import (
sanitized_Request,
sanitize_filename,
str_or_none,
str_to_int,
strip_or_none,
unescapeHTML,
unified_strdate,
@ -220,7 +222,7 @@ class InfoExtractor(object):
* "preference" (optional, int) - quality of the image
* "width" (optional, int)
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
* "resolution" (optional, string "{width}x{height}",
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
@ -623,9 +625,12 @@ class InfoExtractor(object):
url_or_request = update_url_query(url_or_request, query)
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
exceptions = [compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error]
if hasattr(ssl, 'CertificateError'):
exceptions.append(ssl.CertificateError)
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
except tuple(exceptions) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from
@ -1182,16 +1187,33 @@ class InfoExtractor(object):
'twitter card player')
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex(
JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
json_ld_list = list(re.finditer(JSON_LD_RE, html))
default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
json_ld = []
for mobj in json_ld_list:
json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal)
if not json_ld_item:
continue
if isinstance(json_ld_item, dict):
json_ld.append(json_ld_item)
elif isinstance(json_ld_item, (list, tuple)):
json_ld.extend(json_ld_item)
if json_ld:
json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
if json_ld:
return json_ld
if default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract JSON-LD')
else:
self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
return {}
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
@ -1227,7 +1249,10 @@ class InfoExtractor(object):
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
continue
interaction_count = int_or_none(is_e.get('userInteractionCount'))
# For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
# so extracting count with more relaxed str_to_int
interaction_count = str_to_int(is_e.get('userInteractionCount'))
if interaction_count is None:
continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
@ -1247,6 +1272,7 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
@ -1256,10 +1282,10 @@ class InfoExtractor(object):
extract_interaction_statistic(e)
for e in json_ld:
if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
if '@context' in e:
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
continue
if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({
@ -1293,11 +1319,17 @@ class InfoExtractor(object):
})
elif item_type == 'VideoObject':
extract_video_object(e)
continue
if expected_type is None:
continue
else:
break
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
break
if expected_type is None:
continue
else:
break
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
@ -1424,12 +1456,10 @@ class InfoExtractor(object):
try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True
except ExtractorError as e:
if isinstance(e.cause, compat_urllib_error.URLError):
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
raise
except ExtractorError:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
def http_scheme(self):
""" Either "http:" or "https:", depending on the user's preferences """
@ -1457,14 +1487,14 @@ class InfoExtractor(object):
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True, m3u8_id=None):
fatal=True, m3u8_id=None, data=None, headers={}, query={}):
manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source,
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if manifest is False:
return []
@ -1588,12 +1618,13 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
fatal=True, live=False, data=None, headers={},
query={}):
res = self._download_webpage_handle(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
@ -1767,6 +1798,19 @@ class InfoExtractor(object):
# the same GROUP-ID
f['acodec'] = 'none'
formats.append(f)
# for DailyMotion
progressive_uri = last_stream_inf.get('PROGRESSIVE-URI')
if progressive_uri:
http_f = f.copy()
del http_f['manifest_url']
http_f.update({
'format_id': f['format_id'].replace('hls-', 'http-'),
'protocol': 'http',
'url': progressive_uri,
})
formats.append(http_f)
last_stream_inf = {}
return formats
@ -2011,12 +2055,12 @@ class InfoExtractor(object):
})
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}, data=None, headers={}, query={}):
res = self._download_xml_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
mpd_doc, urlh = res
@ -2319,15 +2363,17 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
res = self._download_xml_handle(
ism_url, video_id,
note=note or 'Downloading ISM manifest',
errnote=errnote or 'Failed to download ISM manifest',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
ism_doc, urlh = res
if ism_doc is None:
return []
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
@ -2691,7 +2737,7 @@ class InfoExtractor(object):
entry = {
'id': this_video_id,
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'),
'description': clean_html(video_data.get('description')),
'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
@ -2806,7 +2852,7 @@ class InfoExtractor(object):
def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar.Cookie(
cookie = compat_cookiejar_Cookie(
0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest)

View file

@ -0,0 +1,118 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
)
class CONtvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?contv\.com/details-movie/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.contv.com/details-movie/CEG10022949/days-of-thrills-&-laughter',
'info_dict': {
'id': 'CEG10022949',
'ext': 'mp4',
'title': 'Days Of Thrills & Laughter',
'description': 'md5:5d6b3d0b1829bb93eb72898c734802eb',
'upload_date': '20180703',
'timestamp': 1530634789.61,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.contv.com/details-movie/CLIP-show_fotld_bts/fight-of-the-living-dead:-behind-the-scenes-bites',
'info_dict': {
'id': 'CLIP-show_fotld_bts',
'title': 'Fight of the Living Dead: Behind the Scenes Bites',
},
'playlist_mincount': 7,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
details = self._download_json(
'http://metax.contv.live.junctiontv.net/metax/2.5/details/' + video_id,
video_id, query={'device': 'web'})
if details.get('type') == 'episodic':
seasons = self._download_json(
'http://metax.contv.live.junctiontv.net/metax/2.5/seriesfeed/json/' + video_id,
video_id)
entries = []
for season in seasons:
for episode in season.get('episodes', []):
episode_id = episode.get('id')
if not episode_id:
continue
entries.append(self.url_result(
'https://www.contv.com/details-movie/' + episode_id,
CONtvIE.ie_key(), episode_id))
return self.playlist_result(entries, video_id, details.get('title'))
m_details = details['details']
title = details['title']
formats = []
media_hls_url = m_details.get('media_hls_url')
if media_hls_url:
formats.extend(self._extract_m3u8_formats(
media_hls_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
media_mp4_url = m_details.get('media_mp4_url')
if media_mp4_url:
formats.append({
'format_id': 'http',
'url': media_mp4_url,
})
self._sort_formats(formats)
subtitles = {}
captions = m_details.get('captions') or {}
for caption_url in captions.values():
subtitles.setdefault('en', []).append({
'url': caption_url
})
thumbnails = []
for image in m_details.get('images', []):
image_url = image.get('url')
if not image_url:
continue
thumbnails.append({
'url': image_url,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
})
description = None
for p in ('large_', 'medium_', 'small_', ''):
d = m_details.get(p + 'description')
if d:
description = d
break
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': description,
'timestamp': float_or_none(details.get('metax_added_on'), 1000),
'subtitles': subtitles,
'duration': float_or_none(m_details.get('duration'), 1000),
'view_count': int_or_none(details.get('num_watched')),
'like_count': int_or_none(details.get('num_fav')),
'categories': details.get('category'),
'tags': details.get('tags'),
'season_number': int_or_none(details.get('season')),
'episode_number': int_or_none(details.get('episode')),
'release_year': int_or_none(details.get('pub_year')),
}

View file

@ -4,7 +4,12 @@ from __future__ import unicode_literals
import re
from .theplatform import ThePlatformFeedIE
from ..utils import int_or_none
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
)
class CorusIE(ThePlatformFeedIE):
@ -12,24 +17,49 @@ class CorusIE(ThePlatformFeedIE):
https?://
(?:www\.)?
(?P<domain>
(?:globaltv|etcanada)\.com|
(?:hgtv|foodnetwork|slice|history|showcase|bigbrothercanada)\.ca
(?:
globaltv|
etcanada|
seriesplus|
wnetwork|
ytv
)\.com|
(?:
hgtv|
foodnetwork|
slice|
history|
showcase|
bigbrothercanada|
abcspark|
disney(?:channel|lachaine)
)\.ca
)
/(?:[^/]+/)*
(?:
video\.html\?.*?\bv=|
videos?/(?:[^/]+/)*(?:[a-z0-9-]+-)?
)
(?P<id>
[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}|
(?:[A-Z]{4})?\d{12,20}
)
/(?:video/(?:[^/]+/)?|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
'info_dict': {
'id': '870923331648',
'ext': 'mp4',
'title': 'Movie Night Popcorn with Bryan',
'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
'uploader': 'SHWM-NEW',
'upload_date': '20170206',
'timestamp': 1486392197,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'expected_warnings': ['Failed to parse JSON'],
}, {
'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
'only_matching': True,
@ -48,58 +78,83 @@ class CorusIE(ThePlatformFeedIE):
}, {
'url': 'https://www.bigbrothercanada.ca/video/big-brother-canada-704/1457812035894/',
'only_matching': True
}, {
'url': 'https://www.seriesplus.com/emissions/dre-mary-mort-sur-ordonnance/videos/deux-coeurs-battant/SERP0055626330000200/',
'only_matching': True
}, {
'url': 'https://www.disneychannel.ca/shows/gabby-duran-the-unsittables/video/crybaby-duran-clip/2f557eec-0588-11ea-ae2b-e2c6776b770e/',
'only_matching': True
}]
_TP_FEEDS = {
'globaltv': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'etcanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'hgtv': {
'feed_id': 'L0BMHXi2no43',
'account_id': 2414428465,
},
'foodnetwork': {
'feed_id': 'ukK8o58zbRmJ',
'account_id': 2414429569,
},
'slice': {
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
'bigbrothercanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
_GEO_BYPASS = False
_SITE_MAP = {
'globaltv': 'series',
'etcanada': 'series',
'foodnetwork': 'food',
'bigbrothercanada': 'series',
'disneychannel': 'disneyen',
'disneylachaine': 'disneyfr',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
feed_info = self._TP_FEEDS[domain.split('.')[0]]
return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: {
'episode_number': int_or_none(e.get('pl1$episode')),
'season_number': int_or_none(e.get('pl1$season')),
'series': e.get('pl1$show'),
}, {
'HLS': {
'manifest': 'm3u',
},
'DesktopHLS Default': {
'manifest': 'm3u',
},
'MP4 MBR': {
'manifest': 'm3u',
},
}, feed_info['account_id'])
site = domain.split('.')[0]
path = self._SITE_MAP.get(site, site)
if path != 'series':
path = 'migration/' + path
video = self._download_json(
'https://globalcontent.corusappservices.com/templates/%s/playlist/' % path,
video_id, query={'byId': video_id},
headers={'Accept': 'application/json'})[0]
title = video['title']
formats = []
for source in video.get('sources', []):
smil_url = source.get('file')
if not smil_url:
continue
source_type = source.get('type')
note = 'Downloading%s smil file' % (' ' + source_type if source_type else '')
resp = self._download_webpage(
smil_url, video_id, note, fatal=False,
headers=self.geo_verification_headers())
if not resp:
continue
error = self._parse_json(resp, video_id, fatal=False)
if error:
if error.get('exception') == 'GeoLocationBlocked':
self.raise_geo_restricted(countries=['CA'])
raise ExtractorError(error['description'])
smil = self._parse_xml(resp, video_id, fatal=False)
if smil is None:
continue
namespace = self._parse_smil_namespace(smil)
formats.extend(self._parse_smil_formats(
smil, smil_url, video_id, namespace))
if not formats and video.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
subtitles = {}
for track in video.get('tracks', []):
track_url = track.get('file')
if not track_url:
continue
lang = 'fr' if site in ('disneylachaine', 'seriesplus') else 'en'
subtitles.setdefault(lang, []).append({'url': track_url})
metadata = video.get('metadata') or {}
get_number = lambda x: int_or_none(video.get('pl1$' + x) or metadata.get(x + 'Number'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': dict_get(video, ('defaultThumbnailUrl', 'thumbnail', 'image')),
'description': video.get('description'),
'timestamp': int_or_none(video.get('availableDate'), 1000),
'subtitles': subtitles,
'duration': float_or_none(metadata.get('duration')),
'series': dict_get(video, ('show', 'pl1$show')),
'season_number': get_number('season'),
'episode_number': get_number('episode'),
}

View file

@ -13,6 +13,7 @@ from ..compat import (
compat_b64decode,
compat_etree_Element,
compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
@ -25,9 +26,9 @@ from ..utils import (
intlist_to_bytes,
int_or_none,
lowercase_escape,
merge_dicts,
remove_end,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
)
@ -136,6 +137,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# rtmp
'skip_download': True,
},
'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
@ -157,11 +159,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '702409',
'ext': 'mp4',
'title': 'Re:ZERO -Starting Life in Another World- Episode 5 The Morning of Our Promise Is Still Distant',
'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'TV TOKYO',
'upload_date': '20160508',
'uploader': 'Re:Zero Partners',
'timestamp': 1462098900,
'upload_date': '20160501',
},
'params': {
# m3u8 download
@ -172,12 +175,13 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '727589',
'ext': 'mp4',
'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 Give Me Deliverance From This Judicial Injustice!",
'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.',
'upload_date': '20170118',
'series': "KONOSUBA -God's blessing on this wonderful world!",
'timestamp': 1484130900,
'upload_date': '20170111',
'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!',
@ -200,10 +204,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': '11eyes Episode 1 Red Night ~ Piros éjszaka',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'title': compat_str,
'description': compat_str,
'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021',
'timestamp': 1255512600,
'upload_date': '20091014',
},
'params': {
# Just test metadata extraction
@ -224,15 +229,17 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# just test metadata extraction
'skip_download': True,
},
'skip': 'Video gone',
}, {
# A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': {
'id': '590532',
'ext': 'mp4',
'title': 'Haiyoru! Nyaruani (ONA) Episode 1 Test',
'description': 'Mahiro and Nyaruko talk about official certification.',
'title': compat_str,
'description': compat_str,
'uploader': 'TV TOKYO',
'timestamp': 1330956000,
'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)',
@ -442,23 +449,21 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
(r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
webpage, 'video_title', default=None)
if not video_title:
video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
[r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False)
webpage, 'video_uploader', default=False)
formats = []
for stream in media.get('streams', []):
@ -611,14 +616,15 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
return {
info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts({
'id': video_id,
'title': video_title,
'description': video_description,
'duration': duration,
'thumbnail': thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
'season': season,
'season_number': season_number,
@ -626,7 +632,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'episode_number': episode_number,
'subtitles': subtitles,
'formats': formats,
}
}, info)
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):

View file

@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_timestamp
from .youtube import YoutubeIE
class CtsNewsIE(InfoExtractor):
@ -14,8 +15,8 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201501291578109',
'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪,嚴重影響陸空交通,不過九華山卻出現...',
'timestamp': 1422528540,
'upload_date': '20150129',
}
@ -26,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201309031304098',
'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩',
'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1378205880,
@ -62,8 +63,7 @@ class CtsNewsIE(InfoExtractor):
video_url = mp4_feed['source_url']
else:
self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex(
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
youtube_url = YoutubeIE._extract_url(page)
return self.url_result(youtube_url, ie='Youtube')

View file

@ -1,64 +1,105 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import functools
import hashlib
import itertools
import json
import random
import re
import string
from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
error_to_compat_str,
age_restricted,
clean_html,
ExtractorError,
int_or_none,
mimetype2ext,
OnDemandPagedList,
parse_iso8601,
sanitized_Request,
str_to_int,
try_get,
unescapeHTML,
update_url_query,
url_or_none,
urlencode_postdata,
)
class DailymotionBaseInfoExtractor(InfoExtractor):
_FAMILY_FILTER = None
_HEADERS = {
'Content-Type': 'application/json',
'Origin': 'https://www.dailymotion.com',
}
_NETRC_MACHINE = 'dailymotion'
def _get_dailymotion_cookies(self):
return self._get_cookies('https://www.dailymotion.com/')
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = sanitized_Request(url)
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
def _get_cookie_value(cookies, name):
cookie = cookies.get(name)
if cookie:
return cookie.value
def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage_handle(request, *args, **kwargs)
def _set_dailymotion_cookie(self, name, value):
self._set_cookie('www.dailymotion.com', name, value)
def _download_webpage_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage(request, *args, **kwargs)
def _real_initialize(self):
cookies = self._get_dailymotion_cookies()
ff = self._get_cookie_value(cookies, 'ff')
self._FAMILY_FILTER = ff == 'on' if ff else age_restricted(18, self._downloader.params.get('age_limit'))
self._set_dailymotion_cookie('ff', 'on' if self._FAMILY_FILTER else 'off')
def _call_api(self, object_type, xid, object_fields, note, filter_extra=None):
if not self._HEADERS.get('Authorization'):
cookies = self._get_dailymotion_cookies()
token = self._get_cookie_value(cookies, 'access_token') or self._get_cookie_value(cookies, 'client_token')
if not token:
data = {
'client_id': 'f1a362d288c1b98099c7',
'client_secret': 'eea605b96e01c796ff369935357eca920c5da4c5',
}
username, password = self._get_login_info()
if username:
data.update({
'grant_type': 'password',
'password': password,
'username': username,
})
else:
data['grant_type'] = 'client_credentials'
try:
token = self._download_json(
'https://graphql.api.dailymotion.com/oauth/token',
None, 'Downloading Access Token',
data=urlencode_postdata(data))['access_token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError(self._parse_json(
e.cause.read().decode(), xid)['error_description'], expected=True)
raise
self._set_dailymotion_cookie('access_token' if username else 'client_token', token)
self._HEADERS['Authorization'] = 'Bearer ' + token
resp = self._download_json(
'https://graphql.api.dailymotion.com/', xid, note, data=json.dumps({
'query': '''{
%s(xid: "%s"%s) {
%s
}
}''' % (object_type, xid, ', ' + filter_extra if filter_extra else '', object_fields),
}).encode(), headers=self._HEADERS)
obj = resp['data'][object_type]
if not obj:
raise ExtractorError(resp['errors'][0]['message'], expected=True)
return obj
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
_VALID_URL = r'''(?ix)
https?://
(?:
(?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
(?:www\.)?lequipe\.fr/video
)
/(?P<id>[^/?_]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
'''
IE_NAME = 'dailymotion'
_FORMATS = [
('stream_h264_ld_url', 'ld'),
('stream_h264_url', 'standard'),
('stream_h264_hq_url', 'hq'),
('stream_h264_hd_url', 'hd'),
('stream_h264_hd1080_url', 'hd180'),
]
_TESTS = [{
'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
'md5': '074b95bdee76b9e3654137aee9c79dfe',
@ -67,7 +108,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'ext': 'mp4',
'title': 'Office Christmas Party Review Jason Bateman, Olivia Munn, T.J. Miller',
'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller',
'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
'duration': 187,
'timestamp': 1493651285,
'upload_date': '20170501',
@ -133,7 +173,22 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}, {
'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/x791mem',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
'only_matching': True,
}, {
'url': 'https://www.dailymotion.com/video/x3z49k?playlist=xv4bw',
'only_matching': True,
}]
_GEO_BYPASS = False
_COMMON_MEDIA_FIELDS = '''description
geoblockedCountries {
allowed
}
xid'''
@staticmethod
def _extract_urls(webpage):
@ -149,264 +204,140 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
return urls
def _real_extract(self, url):
video_id = self._match_id(url)
video_id, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage_no_ff(
'https://www.dailymotion.com/video/%s' % video_id, video_id)
if playlist_id:
if not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
return self.url_result(
'http://www.dailymotion.com/playlist/' + playlist_id,
'DailymotionPlaylist', playlist_id)
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
age_limit = self._rta_search(webpage)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage, 'description')
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', default=None)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', default=None))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/ytdl-org/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});',
# New layout regex (see https://github.com/ytdl-org/youtube-dl/issues/13580)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id, fatal=False) or {}
metadata = try_get(player, lambda x: x['metadata'], dict)
if not metadata:
metadata_url = url_or_none(try_get(
player, lambda x: x['context']['metadata_template_url1']))
if metadata_url:
metadata_url = metadata_url.replace(':videoId', video_id)
else:
metadata_url = update_url_query(
'https://www.dailymotion.com/player/metadata/video/%s'
% video_id, {
'embedder': url,
'integration': 'inline',
'GK_PV5_NEON': '1',
})
metadata = self._download_json(
metadata_url, video_id, 'Downloading metadata JSON')
if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
t = ''.join(random.choice(string.ascii_letters) for i in range(10))
n = us64e(compat_struct_pack('I', r))
i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
metadata = self._download_json(
'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
self._check_error(metadata)
formats = []
for quality, media_list in metadata['qualities'].items():
for media in media_list:
media_url = media.get('url')
if not media_url:
continue
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False)
for f in m3u8_formats:
f['url'] = f['url'].split('#')[0]
formats.append(f)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-%s' % quality,
'ext': ext,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
title = metadata['title']
duration = int_or_none(metadata.get('duration'))
timestamp = int_or_none(metadata.get('created_time'))
thumbnail = metadata.get('poster_url')
uploader = metadata.get('owner', {}).get('screenname')
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
subtitles_data = metadata.get('subtitles', {}).get('data', {})
if subtitles_data and isinstance(subtitles_data, dict):
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'age_limit': age_limit,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'subtitles': subtitles,
}
# vevo embed
vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None)
if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo')
# fallback old player
embed_page = self._download_webpage_no_ff(
'https://www.dailymotion.com/embed/video/%s' % video_id,
video_id, 'Downloading embed page')
timestamp = parse_iso8601(self._html_search_meta(
'video:release_date', webpage, 'upload date'))
info = self._parse_json(
self._search_regex(
r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE),
video_id)
self._check_error(info)
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
self._sort_formats(formats)
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
'title')
return {
'id': video_id,
'formats': formats,
'uploader': info['owner.screenname'],
'timestamp': timestamp,
'title': title,
'description': description,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
'view_count': view_count,
'duration': info['duration']
password = self._downloader.params.get('videopassword')
media = self._call_api(
'media', video_id, '''... on Video {
%s
stats {
likes {
total
}
views {
total
}
}
}
... on Live {
%s
audienceCount
isOnAir
}''' % (self._COMMON_MEDIA_FIELDS, self._COMMON_MEDIA_FIELDS), 'Downloading media JSON metadata',
'password: "%s"' % self._downloader.params.get('videopassword') if password else None)
xid = media['xid']
def _check_error(self, info):
error = info.get('error')
metadata = self._download_json(
'https://www.dailymotion.com/player/metadata/video/' + xid,
xid, 'Downloading metadata JSON',
query={'app': 'com.dailymotion.neon'})
error = metadata.get('error')
if error:
title = error.get('title') or error['message']
title = error.get('title') or error['raw_message']
# See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title)
allowed_countries = try_get(media, lambda x: x['geoblockedCountries']['allowed'], list)
self.raise_geo_restricted(msg=title, countries=allowed_countries)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, title), expected=True)
def _get_subtitles(self, video_id, webpage):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list'])
return sub_lang_list
self._downloader.report_warning('video doesn\'t have subtitles')
return {}
title = metadata['title']
is_live = media.get('isOnAir')
formats = []
for quality, media_list in metadata['qualities'].items():
for m in media_list:
media_url = m.get('url')
media_type = m.get('type')
if not media_url or media_type == 'application/vnd.lumberjack.manifest':
continue
if media_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-' + quality,
}
m = re.search(r'/H264-(\d+)x(\d+)(?:-(60)/)?', media_url)
if m:
width, height, fps = map(int_or_none, m.groups())
f.update({
'fps': fps,
'height': height,
'width': width,
})
formats.append(f)
for f in formats:
f['url'] = f['url'].split('#')[0]
if not f.get('fps') and f['format_id'].endswith('@60'):
f['fps'] = 60
self._sort_formats(formats)
subtitles = {}
subtitles_data = try_get(metadata, lambda x: x['subtitles']['data'], dict) or {}
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
thumbnails = []
for height, poster_url in metadata.get('posters', {}).items():
thumbnails.append({
'height': int_or_none(height),
'id': height,
'url': poster_url,
})
owner = metadata.get('owner') or {}
stats = media.get('stats') or {}
get_count = lambda x: int_or_none(try_get(stats, lambda y: y[x + 's']['total']))
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': clean_html(media.get('description')),
'thumbnails': thumbnails,
'duration': int_or_none(metadata.get('duration')) or None,
'timestamp': int_or_none(metadata.get('created_time')),
'uploader': owner.get('screenname'),
'uploader_id': owner.get('id') or metadata.get('screenname'),
'age_limit': 18 if metadata.get('explicit') else 0,
'tags': metadata.get('tags'),
'view_count': get_count('view') or int_or_none(media.get('audienceCount')),
'like_count': get_count('like'),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'title': 'SPORT',
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
class DailymotionPlaylistBaseIE(DailymotionBaseInfoExtractor):
_PAGE_SIZE = 100
def _fetch_page(self, playlist_id, authorizaion, page):
def _fetch_page(self, playlist_id, page):
page += 1
videos = self._download_json(
'https://graphql.api.dailymotion.com',
playlist_id, 'Downloading page %d' % page,
data=json.dumps({
'query': '''{
collection(xid: "%s") {
videos(first: %d, page: %d) {
pageInfo {
hasNextPage
nextPage
}
videos = self._call_api(
self._OBJECT_TYPE, playlist_id,
'''videos(allowExplicit: %s, first: %d, page: %d) {
edges {
node {
xid
url
}
}
}
}
}''' % (playlist_id, self._PAGE_SIZE, page)
}).encode(), headers={
'Authorization': authorizaion,
'Origin': 'https://www.dailymotion.com',
})['data']['collection']['videos']
}''' % ('false' if self._FAMILY_FILTER else 'true', self._PAGE_SIZE, page),
'Downloading page %d' % page)['videos']
for edge in videos['edges']:
node = edge['node']
yield self.url_result(
@ -414,86 +345,49 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
api = self._parse_json(self._search_regex(
r'__PLAYER_CONFIG__\s*=\s*({.+?});',
webpage, 'player config'), playlist_id)['context']['api']
auth = self._download_json(
api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
playlist_id, data=urlencode_postdata({
'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
'grant_type': 'client_credentials',
}))
authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
entries = OnDemandPagedList(functools.partial(
self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE)
self._fetch_page, playlist_id), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage))
entries, playlist_id)
class DailymotionUserIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
_OBJECT_TYPE = 'collection'
class DailymotionUserIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
'playlist_mincount': 152,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'playlist_mincount': 1000,
'skip': 'Takes too long time',
}, {
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
},
'playlist_mincount': 148,
'params': {
'age_limit': 0,
},
}]
def _extract_entries(self, id):
video_ids = set()
processed_urls = set()
for pagenum in itertools.count(1):
page_url = self._PAGE_TEMPLATE % (id, pagenum)
webpage, urlh = self._download_webpage_handle_no_ff(
page_url, id, 'Downloading page %s' % pagenum)
if urlh.geturl() in processed_urls:
self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
page_url, urlh.geturl()), id)
break
processed_urls.add(urlh.geturl())
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(
'https://www.dailymotion.com/user/%s' % user, user)
full_user = unescapeHTML(self._html_search_regex(
r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
webpage, 'user'))
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}
_OBJECT_TYPE = 'channel'

View file

@ -1,154 +0,0 @@
from __future__ import unicode_literals
import base64
import json
import random
import re
from .common import InfoExtractor
from ..aes import (
aes_cbc_decrypt,
aes_cbc_encrypt,
)
from ..compat import compat_b64decode
from ..utils import (
bytes_to_intlist,
bytes_to_long,
extract_attributes,
ExtractorError,
intlist_to_bytes,
js_to_json,
int_or_none,
long_to_bytes,
pkcs1pad,
)
class DaisukiMottoIE(InfoExtractor):
_VALID_URL = r'https?://motto\.daisuki\.net/framewatch/embed/[^/]+/(?P<id>[0-9a-zA-Z]{3})'
_TEST = {
'url': 'http://motto.daisuki.net/framewatch/embed/embedDRAGONBALLSUPERUniverseSurvivalsaga/V2e/760/428',
'info_dict': {
'id': 'V2e',
'ext': 'mp4',
'title': '#117 SHOWDOWN OF LOVE! ANDROIDS VS UNIVERSE 2!!',
'subtitles': {
'mul': [{
'ext': 'ttml',
}],
},
},
'params': {
'skip_download': True, # AES-encrypted HLS stream
},
}
# The public key in PEM format can be found in clientlibs_anime_watch.min.js
_RSA_KEY = (0xc5524c25e8e14b366b3754940beeb6f96cb7e2feef0b932c7659a0c5c3bf173d602464c2df73d693b513ae06ff1be8f367529ab30bf969c5640522181f2a0c51ea546ae120d3d8d908595e4eff765b389cde080a1ef7f1bbfb07411cc568db73b7f521cedf270cbfbe0ddbc29b1ac9d0f2d8f4359098caffee6d07915020077d, 65537)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flashvars = self._parse_json(self._search_regex(
r'(?s)var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
video_id, transform_source=js_to_json)
iv = [0] * 16
data = {}
for key in ('device_cd', 'mv_id', 'ss1_prm', 'ss2_prm', 'ss3_prm', 'ss_id'):
data[key] = flashvars.get(key, '')
encrypted_rtn = None
# Some AES keys are rejected. Try it with different AES keys
for idx in range(5):
aes_key = [random.randint(0, 254) for _ in range(32)]
padded_aeskey = intlist_to_bytes(pkcs1pad(aes_key, 128))
n, e = self._RSA_KEY
encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n))
init_data = self._download_json(
'http://motto.daisuki.net/fastAPI/bgn/init/',
video_id, query={
's': flashvars.get('s', ''),
'c': flashvars.get('ss3_prm', ''),
'e': url,
'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt(
bytes_to_intlist(json.dumps(data)),
aes_key, iv))).decode('ascii'),
'a': base64.b64encode(encrypted_aeskey).decode('ascii'),
}, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else ''))
if 'rtn' in init_data:
encrypted_rtn = init_data['rtn']
break
self._sleep(5, video_id)
if encrypted_rtn is None:
raise ExtractorError('Failed to fetch init data')
rtn = self._parse_json(
intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
compat_b64decode(encrypted_rtn)),
aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id)
title = rtn['title_str']
formats = self._extract_m3u8_formats(
rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native')
subtitles = {}
caption_url = rtn.get('caption_url')
if caption_url:
# mul: multiple languages
subtitles['mul'] = [{
'url': caption_url,
'ext': 'ttml',
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
}
class DaisukiMottoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://motto\.daisuki\.net/(?P<id>information)/'
_TEST = {
'url': 'http://motto.daisuki.net/information/',
'info_dict': {
'title': 'DRAGON BALL SUPER',
},
'playlist_mincount': 117,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = []
for li in re.findall(r'(<li[^>]+?data-product_id="[a-zA-Z0-9]{3}"[^>]+>)', webpage):
attr = extract_attributes(li)
ad_id = attr.get('data-ad_id')
product_id = attr.get('data-product_id')
if ad_id and product_id:
episode_id = attr.get('data-chapter')
entries.append({
'_type': 'url_transparent',
'url': 'http://motto.daisuki.net/framewatch/embed/%s/%s/760/428' % (ad_id, product_id),
'episode_id': episode_id,
'episode_number': int_or_none(episode_id),
'ie_key': 'DaisukiMotto',
})
return self.playlist_result(entries, playlist_title='DRAGON BALL SUPER')

View file

@ -2,25 +2,21 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
int_or_none,
str_to_int,
xpath_text,
unescapeHTML,
)
class DaumIE(InfoExtractor):
class DaumBaseIE(InfoExtractor):
_KAKAO_EMBED_BASE = 'http://tv.kakao.com/embed/player/cliplink/'
class DaumIE(DaumBaseIE):
_VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
IE_NAME = 'daum.net'
@ -36,6 +32,9 @@ class DaumIE(InfoExtractor):
'duration': 2117,
'view_count': int,
'comment_count': int,
'uploader_id': 186139,
'uploader': '콘간지',
'timestamp': 1387310323,
},
}, {
'url': 'http://m.tvpot.daum.net/v/65139429',
@ -44,11 +43,14 @@ class DaumIE(InfoExtractor):
'ext': 'mp4',
'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
'description': 'md5:79794514261164ff27e36a21ad229fc5',
'upload_date': '20150604',
'upload_date': '20150118',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 154,
'view_count': int,
'comment_count': int,
'uploader': 'MBC 예능',
'uploader_id': 132251,
'timestamp': 1421604228,
},
}, {
'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
@ -59,12 +61,15 @@ class DaumIE(InfoExtractor):
'id': 'vwIpVpCQsT8$',
'ext': 'flv',
'title': '01-Korean War ( Trouble on the horizon )',
'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
'description': 'Korean War 01\r\nTrouble on the horizon\r\n전쟁의 먹구름',
'upload_date': '20080223',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 249,
'view_count': int,
'comment_count': int,
'uploader': '까칠한 墮落始祖 황비홍님의',
'uploader_id': 560824,
'timestamp': 1203770745,
},
}, {
# Requires dte_type=WEB (#9972)
@ -73,60 +78,24 @@ class DaumIE(InfoExtractor):
'info_dict': {
'id': 's3794Uf1NZeZ1qMpGpeqeRU',
'ext': 'mp4',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20160611',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20170129',
'uploader': '쇼! 음악중심',
'uploader_id': 2653210,
'timestamp': 1485684628,
},
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
'Downloading video info', query={'vid': video_id})
formats = []
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
format_query = compat_urllib_parse_urlencode({
'vid': video_id,
'profile': profile,
})
url_doc = self._download_xml(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note='Downloading video data for %s format' % profile)
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'format_id': profile,
'width': int_or_none(format_el.get('width')),
'height': int_or_none(format_el.get('height')),
'filesize': int_or_none(format_el.get('filesize')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': xpath_text(info, 'THUMB_URL'),
'description': xpath_text(info, 'CONTENTS'),
'duration': int_or_none(xpath_text(info, 'DURATION')),
'upload_date': info.find('REGDTTM').text[:8],
'view_count': str_to_int(xpath_text(info, 'PLAY_CNT')),
'comment_count': str_to_int(xpath_text(info, 'COMMENT_CNT')),
}
if not video_id.isdigit():
video_id += '@my'
return self.url_result(
self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
class DaumClipIE(InfoExtractor):
class DaumClipIE(DaumBaseIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.(?:do|tv)|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
IE_NAME = 'daum.net:clip'
_URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
@ -142,6 +111,9 @@ class DaumClipIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 3868,
'view_count': int,
'uploader': 'GOMeXP',
'uploader_id': 6667,
'timestamp': 1377911092,
},
}, {
'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
@ -154,22 +126,8 @@ class DaumClipIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
clip_info = self._download_json(
'http://tvpot.daum.net/mypot/json/GetClipInfo.do?clipid=%s' % video_id,
video_id, 'Downloading clip info')['clip_bean']
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
'title': unescapeHTML(clip_info['title']),
'thumbnail': clip_info.get('thumb_url'),
'description': clip_info.get('contents'),
'duration': int_or_none(clip_info.get('duration')),
'upload_date': clip_info.get('up_date')[:8],
'view_count': int_or_none(clip_info.get('play_count')),
'ie_key': 'Daum',
}
return self.url_result(
self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
class DaumListIE(InfoExtractor):

View file

@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
_TESTS = [{
# 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '3ffbd1556c3fe210724d7088fad723e3',
'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv',
'ext': 'm4v',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'thumbnail': r're:^https?://.*\.jpg$',
@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
'timestamp': 1302172322,
'upload_date': '20110407',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@ -59,33 +56,26 @@ class DctpTvIE(InfoExtractor):
uuid = media['uuid']
title = media['title']
ratio = '16x9' if media.get('is_wide') else '4x3'
play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
is_wide = media.get('is_wide')
formats = []
servers = self._download_json(
'http://www.dctp.tv/streaming_servers/', display_id,
note='Downloading server list JSON', fatal=False)
def add_formats(suffix):
templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
formats.extend([{
'format_id': 'hls-' + suffix,
'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
'protocol': 'm3u8_native',
}, {
'format_id': 's3-' + suffix,
'url': templ % 'completed-media.s3.amazonaws.com',
}, {
'format_id': 'http-' + suffix,
'url': templ % 'cdn-media.dctp.tv',
}])
if servers:
endpoint = next(
server['endpoint']
for server in servers
if url_or_none(server.get('endpoint'))
and 'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
app = self._search_regex(
r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
formats = [{
'url': endpoint,
'app': app,
'play_path': play_path,
'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
'ext': 'flv',
}]
add_formats('0500_' + ('16x9' if is_wide else '4x3'))
if is_wide:
add_formats('720p')
thumbnails = []
images = media.get('images')

View file

@ -5,31 +5,24 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import ExtractorError
from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
(?:www\.)?
go\.discovery|
www\.
(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocity
tlc
)|
watch\.
(?:
@ -40,15 +33,15 @@ class DiscoveryIE(DiscoveryGoBaseIE):
cookingchanneltv|
motortrend
)
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
)\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': {
'id': '5a2d9b4d6b66d17a5026e1fd',
'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4',
'title': 'Dave Foley',
'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
'duration': 608,
'title': 'Riding with Matthew Perry',
'description': 'md5:a34333153e79bc4526019a5129e7f878',
'duration': 84,
},
'params': {
'skip_download': True, # requires ffmpeg
@ -56,20 +49,20 @@ class DiscoveryIE(DiscoveryGoBaseIE):
}, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True,
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}, {
# using `show_slug` is important to get the correct video data
'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url):
site, path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
react_data = self._parse_json(self._search_regex(
r'window\.__reactTransmitPacket\s*=\s*({.+?});',
webpage, 'react data'), display_id)
content_blocks = react_data['layout'][path]['contentBlocks']
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
access_token = None
cookies = self._get_cookies(url)
@ -79,27 +72,36 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {}
display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
'https://%s.com/anonymous' % site, display_id, query={
'https://%s.com/anonymous' % site, display_id,
'Downloading token JSON metadata', query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
'redirectUri': 'https://www.discovery.com/',
})['access_token']
try:
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
try:
video = self._download_json(
self._API_BASE_URL + 'content/videos',
display_id, 'Downloading content JSON metadata',
headers=headers, query={
'embed': 'show.name',
'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
'slug': display_id,
'show_slug': show_slug,
})[0]
video_id = video['id']
stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id,
display_id, headers=headers)
self._API_BASE_URL + 'streaming/video/' + video_id,
display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(

View file

@ -3,63 +3,38 @@ from __future__ import unicode_literals
import re
from .brightcove import BrightcoveLegacyIE
from .dplay import DPlayIE
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import smuggle_url
class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>discovery|tlc|animalplanet|dmax)\.de/
(?:
.*\#(?P<id>\d+)|
(?:[^/]+/)*videos/(?P<display_id>[^/?#]+)|
programme/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)
)'''
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show)/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)'
_TESTS = [{
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100',
'info_dict': {
'id': '3235167922001',
'id': '78867',
'ext': 'mp4',
'title': 'Breaking Amish: Die Welt da draußen',
'description': (
'Vier Amische und eine Mennonitin wagen in New York'
' den Sprung in ein komplett anderes Leben. Begleitet sie auf'
' ihrem spannenden Weg.'),
'timestamp': 1396598084,
'upload_date': '20140404',
'uploader_id': '1659832546',
'title': 'Die Welt da draußen',
'description': 'md5:61033c12b73286e409d99a41742ef608',
'timestamp': 1554069600,
'upload_date': '20190331',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'http://www.dmax.de/programme/storage-hunters-uk/videos/storage-hunters-uk-episode-6/',
'url': 'https://www.dmax.de/programme/dmax-highlights/video/tuning-star-sidney-hoffmann-exklusiv-bei-dmax/191023082312316',
'only_matching': True,
}, {
'url': 'http://www.discovery.de/#5332316765001',
'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
alternate_id = mobj.group('alternate_id')
if alternate_id:
self._initialize_geo_bypass({
'countries': ['DE'],
})
return self._get_disco_api_info(
url, '%s/%s' % (mobj.group('programme'), alternate_id),
'sonic-eu1-prod.disco-api.com', mobj.group('site') + 'de')
brightcove_id = mobj.group('id')
if not brightcove_id:
title = mobj.group('title')
webpage = self._download_webpage(url, title)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['DE']}),
'BrightcoveNew', brightcove_id)
domain, programme, alternate_id = re.match(self._VALID_URL, url).groups()
country = 'GB' if domain == 'dplay.co.uk' else 'DE'
realm = 'questuk' if country == 'GB' else domain.replace('.', '')
return self._get_disco_api_info(
url, '%s/%s' % (programme, alternate_id),
'sonic-eu1-prod.disco-api.com', realm, country)

View file

@ -9,8 +9,8 @@ from ..utils import int_or_none
class DLiveVODIE(InfoExtractor):
IE_NAME = 'dlive:vod'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[a-zA-Z0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
'info_dict': {
'id': '3mTzOl4WR',
@ -20,7 +20,10 @@ class DLiveVODIE(InfoExtractor):
'timestamp': 1562011015,
'uploader_id': 'pdp',
}
}
}, {
'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
'only_matching': True,
}]
def _real_extract(self, url):
uploader_id, vod_id = re.match(self._VALID_URL, url).groups()

View file

@ -1,74 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
import time
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
remove_end,
try_get,
unified_strdate,
unified_timestamp,
update_url_query,
urljoin,
USER_AGENTS,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:video(?:er|s)/)?(?P<id>[^/]+/[^/?#]+)'
_VALID_URL = r'''(?x)https?://
(?P<domain>
(?:www\.)?(?P<host>dplay\.(?P<country>dk|fi|jp|se|no))|
(?P<subdomain_country>es|it)\.dplay\.com
)/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'url': 'https://www.dplay.se/videos/nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'info_dict': {
'id': '3172',
'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
'id': '13628',
'display_id': 'nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650,
'timestamp': 1365454320,
'duration': 2649.856,
'timestamp': 1365453720,
'upload_date': '20130408',
'creator': 'Kanal 5 (Home)',
'creator': 'Kanal 5',
'series': 'Nugammalt - 77 händelser som format Sverige',
'season_number': 1,
'episode_number': 1,
'age_limit': 0,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'url': 'http://www.dplay.dk/videoer/ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'info_dict': {
'id': '70816',
'display_id': 'mig-og-min-mor/season-6-episode-12',
'id': '104465',
'display_id': 'ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
'title': 'Ted Bundy: Mind Of A Monster',
'description': 'md5:8b780f6f18de4dae631668b8a9637995',
'duration': 5290.027,
'timestamp': 1570694400,
'upload_date': '20191010',
'creator': 'ID - Investigation Discovery',
'series': 'Ted Bundy: Mind Of A Monster',
'season_number': 1,
'episode_number': 1,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}, {
# disco-api
'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
@ -89,19 +83,59 @@ class DPlayIE(InfoExtractor):
'format': 'bestvideo',
'skip_download': True,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
'md5': '2b808ffb00fc47b884a172ca5d13053c',
'info_dict': {
'id': '6918',
'display_id': 'biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij',
'ext': 'mp4',
'title': 'Luigi Di Maio: la psicosi di Stanislawskij',
'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
'thumbnail': r're:^https?://.*\.jpe?g',
'upload_date': '20160524',
'timestamp': 1464076800,
'series': 'Biografie imbarazzanti',
'season_number': 1,
'episode': 'Episode 1',
'episode_number': 1,
},
}, {
'url': 'https://es.dplay.com/dmax/la-fiebre-del-oro/temporada-8-episodio-1/',
'info_dict': {
'id': '21652',
'display_id': 'la-fiebre-del-oro/temporada-8-episodio-1',
'ext': 'mp4',
'title': 'Episodio 1',
'description': 'md5:b9dcff2071086e003737485210675f69',
'thumbnail': r're:^https?://.*\.png',
'upload_date': '20180709',
'timestamp': 1531173540,
'series': 'La fiebre del oro',
'season_number': 8,
'episode': 'Episode 1',
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dplay.fi/videot/shifting-gears-with-aaron-kaufman/episode-16',
'only_matching': True,
}, {
'url': 'https://www.dplay.se/videos/sofias-anglar/sofias-anglar-1001',
'url': 'https://www.dplay.jp/video/gold-rush/24086',
'only_matching': True,
}]
def _get_disco_api_info(self, url, display_id, disco_host, realm):
disco_base = 'https://' + disco_host
def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
geo_countries = [country.upper()]
self._initialize_geo_bypass({
'countries': geo_countries,
})
disco_base = 'https://%s/' % disco_host
token = self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
@ -110,17 +144,35 @@ class DPlayIE(InfoExtractor):
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'include': 'show'
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
title = info['name'].strip()
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id, headers=headers)['data']['attributes']['streaming'].items():
try:
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
display_id, headers=headers)['data']['attributes']['streaming']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code == 'access.denied.missingpackage':
self.raise_login_required()
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
for format_id, format_dict in streaming.items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
@ -142,235 +194,54 @@ class DPlayIE(InfoExtractor):
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
creator = series = None
tags = []
thumbnails = []
included = video.get('included') or []
if isinstance(included, list):
for e in included:
attributes = e.get('attributes')
if not attributes:
continue
e_type = e.get('type')
if e_type == 'channel':
creator = attributes.get('name')
elif e_type == 'image':
src = attributes.get('src')
if src:
thumbnails.append({
'url': src,
'width': int_or_none(attributes.get('width')),
'height': int_or_none(attributes.get('height')),
})
if e_type == 'show':
series = attributes.get('name')
elif e_type == 'tag':
name = attributes.get('name')
if name:
tags.append(name)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'duration': float_or_none(info.get('videoDuration'), 1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'creator': creator,
'tags': tags,
'thumbnails': thumbnails,
'formats': formats,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain')
self._initialize_geo_bypass({
'countries': [mobj.group('country').upper()],
})
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
if not video_id:
host = mobj.group('host')
return self._get_disco_api_info(
url, display_id, 'disco-api.' + host, host.replace('.', ''))
info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
video_id)['data'][0]
title = info['title']
PROTOCOLS = ('hls', 'hds')
formats = []
def extract_formats(protocol, manifest_url):
if protocol == 'hls':
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves. Also fragments' URLs are only served signed for
# Safari user agent.
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format.update({
'url': update_url_query(m3u8_format['url'], query),
'http_headers': {
'User-Agent': USER_AGENTS['Safari'],
},
})
formats.extend(m3u8_formats)
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
'countryCode': domain_tld.upper(),
'expiry': (time.time() + 20 * 60) * 1000,
}))
stream = self._download_json(
'https://secure.dplay.%s/secure/api/v2/user/authorization/stream/%s?stream_type=%s'
% (domain_tld, video_id, protocol), video_id,
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('video_metadata_longDescription'),
'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
'timestamp': int_or_none(info.get('video_publish_date')),
'creator': info.get('video_metadata_homeChannel'),
'series': info.get('video_metadata_show'),
'season_number': int_or_none(info.get('season')),
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
'subtitles': subtitles,
}
class DPlayItIE(InfoExtractor):
_VALID_URL = r'https?://it\.dplay\.com/[^/]+/[^/]+/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['IT']
_TEST = {
'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
'md5': '2b808ffb00fc47b884a172ca5d13053c',
'info_dict': {
'id': '6918',
'display_id': 'luigi-di-maio-la-psicosi-di-stanislawskij',
'ext': 'mp4',
'title': 'Biografie imbarazzanti: Luigi Di Maio: la psicosi di Stanislawskij',
'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
'thumbnail': r're:^https?://.*\.jpe?g',
'upload_date': '20160524',
'series': 'Biografie imbarazzanti',
'season_number': 1,
'episode': 'Luigi Di Maio: la psicosi di Stanislawskij',
'episode_number': 1,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = remove_end(self._og_search_title(webpage), ' | Dplay')
video_id = None
info = self._search_regex(
r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
webpage, 'playback JSON', default=None)
if info:
for _ in range(2):
info = self._parse_json(info, display_id, fatal=False)
if not info:
break
else:
video_id = try_get(info, lambda x: x['data']['id'])
if not info:
info_url = self._search_regex(
(r'playback_json_url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'url\s*[:=]\s*["\'](?P<url>(?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)'),
webpage, 'info url', group='url')
info_url = urljoin(url, info_url)
video_id = info_url.rpartition('/')[-1]
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
if isinstance(info, compat_str):
info = self._parse_json(info, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
hls_url = info['data']['attributes']['streaming']['hls']['url']
formats = self._extract_m3u8_formats(
hls_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
series = self._html_search_regex(
r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>',
webpage, 'series', fatal=False)
episode = self._search_regex(
r'<p[^>]+class=["\'].*?\bdesc_ep\b.*?["\'][^>]*>\s*<br/>\s*<b>([^<]+)',
webpage, 'episode', fatal=False)
mobj = re.search(
r'(?s)<span[^>]+class=["\']dates["\'][^>]*>.+?\bS\.(?P<season_number>\d+)\s+E\.(?P<episode_number>\d+)\s*-\s*(?P<upload_date>\d{2}/\d{2}/\d{4})',
webpage)
if mobj:
season_number = int(mobj.group('season_number'))
episode_number = int(mobj.group('episode_number'))
upload_date = unified_strdate(mobj.group('upload_date'))
else:
season_number = episode_number = upload_date = None
return {
'id': compat_str(video_id or display_id),
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'upload_date': upload_date,
'formats': formats,
}
domain = mobj.group('domain').lstrip('www.')
country = mobj.group('country') or mobj.group('subdomain_country')
host = 'disco-api.' + domain if domain.startswith('dplay.') else 'eu2-prod.disco-api.com'
return self._get_disco_api_info(
url, display_id, host, 'dplay' + country, country)

View file

@ -17,6 +17,7 @@ from ..utils import (
float_or_none,
mimetype2ext,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
url_or_none,
@ -24,7 +25,14 @@ from ..utils import (
class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/
)
(?P<id>[\da-z_-]+)
'''
_GEO_BYPASS = False
_GEO_COUNTRIES = ['DK']
IE_NAME = 'drtv'
@ -83,6 +91,26 @@ class DRTVIE(InfoExtractor):
}, {
'url': 'https://www.dr.dk/radio/p4kbh/regionale-nyheder-kh4/p4-nyheder-2019-06-26-17-30-9',
'only_matching': True,
}, {
'url': 'https://www.dr.dk/drtv/se/bonderoeven_71769',
'info_dict': {
'id': '00951930010',
'ext': 'mp4',
'title': 'Bonderøven (1:8)',
'description': 'md5:3cf18fc0d3b205745d4505f896af8121',
'timestamp': 1546542000,
'upload_date': '20190103',
'duration': 2576.6,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/drtv/episode/bonderoeven_71769',
'only_matching': True,
}, {
'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
'only_matching': True,
}]
def _real_extract(self, url):
@ -100,13 +128,32 @@ class DRTVIE(InfoExtractor):
webpage, 'video id', default=None)
if not video_id:
video_id = compat_urllib_parse_unquote(self._search_regex(
video_id = self._search_regex(
r'(urn(?:%3A|:)dr(?:%3A|:)mu(?:%3A|:)programcard(?:%3A|:)[\da-f]+)',
webpage, 'urn'))
webpage, 'urn', default=None)
if video_id:
video_id = compat_urllib_parse_unquote(video_id)
_PROGRAMCARD_BASE = 'https://www.dr.dk/mu-online/api/1.4/programcard'
query = {'expanded': 'true'}
if video_id:
programcard_url = '%s/%s' % (_PROGRAMCARD_BASE, video_id)
else:
programcard_url = _PROGRAMCARD_BASE
page = self._parse_json(
self._search_regex(
r'data\s*=\s*({.+?})\s*(?:;|</script)', webpage,
'data'), '1')['cache']['page']
page = page[list(page.keys())[0]]
item = try_get(
page, (lambda x: x['item'], lambda x: x['entries'][0]['item']),
dict)
video_id = item['customId'].split(':')[-1]
query['productionnumber'] = video_id
data = self._download_json(
'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
video_id, 'Downloading video JSON', query={'expanded': 'true'})
programcard_url, video_id, 'Downloading video JSON', query=query)
title = str_or_none(data.get('Title')) or re.sub(
r'\s*\|\s*(?:TV\s*\|\s*DR|DRTV)$', '',

View file

@ -1,20 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
int_or_none,
qualities,
sanitized_Request,
)
class DumpertIE(InfoExtractor):
_VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
_VALID_URL = r'(?P<protocol>https?)://(?:(?:www|legacy)\.)?dumpert\.nl/(?:mediabase|embed|item)/(?P<id>[0-9]+[/_][0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
'url': 'https://www.dumpert.nl/item/6646981_951bc60f',
'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
'info_dict': {
'id': '6646981/951bc60f',
@ -24,46 +21,60 @@ class DumpertIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
'url': 'https://www.dumpert.nl/embed/6675421_dc440fe7',
'only_matching': True,
}, {
'url': 'http://legacy.dumpert.nl/mediabase/6646981/951bc60f',
'only_matching': True,
}, {
'url': 'http://legacy.dumpert.nl/embed/6675421/dc440fe7',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
protocol = mobj.group('protocol')
url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
req = sanitized_Request(url)
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)
files_base64 = self._search_regex(
r'data-files="([^"]+)"', webpage, 'data files')
files = self._parse_json(
compat_b64decode(files_base64).decode('utf-8'),
video_id)
video_id = self._match_id(url).replace('_', '/')
item = self._download_json(
'http://api-live.dumpert.nl/mobile_api/json/info/' + video_id.replace('/', '_'),
video_id)['items'][0]
title = item['title']
media = next(m for m in item['media'] if m.get('mediatype') == 'VIDEO')
quality = qualities(['flv', 'mobile', 'tablet', '720p'])
formats = [{
'url': video_url,
'format_id': format_id,
'quality': quality(format_id),
} for format_id, video_url in files.items() if format_id != 'still']
formats = []
for variant in media.get('variants', []):
uri = variant.get('uri')
if not uri:
continue
version = variant.get('version')
formats.append({
'url': uri,
'format_id': version,
'quality': quality(version),
})
self._sort_formats(formats)
title = self._html_search_meta(
'title', webpage) or self._og_search_title(webpage)
description = self._html_search_meta(
'description', webpage) or self._og_search_description(webpage)
thumbnail = files.get('still') or self._og_search_thumbnail(webpage)
thumbnails = []
stills = item.get('stills') or {}
for t in ('thumb', 'still'):
for s in ('', '-medium', '-large'):
still_id = t + s
still_url = stills.get(still_id)
if not still_url:
continue
thumbnails.append({
'id': still_id,
'url': still_url,
})
stats = item.get('stats') or {}
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats
'description': item.get('description'),
'thumbnails': thumbnails,
'formats': formats,
'duration': int_or_none(media.get('duration')),
'like_count': int_or_none(stats.get('kudos_total')),
'view_count': int_or_none(stats.get('views_total')),
}

View file

@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
@ -18,7 +19,7 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@ -32,6 +33,12 @@ class EinthusanIE(InfoExtractor):
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}, {
'url': 'https://einthusan.com/movie/watch/9097/',
'only_matching': True,
}, {
'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
@ -41,7 +48,9 @@ class EinthusanIE(InfoExtractor):
)).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@ -53,7 +62,7 @@ class EinthusanIE(InfoExtractor):
page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json(
'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({

View file

@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
encode_base_n,
ExtractorError,
@ -55,7 +54,7 @@ class EpornerIE(InfoExtractor):
webpage, urlh = self._download_webpage_handle(url, display_id)
video_id = self._match_id(compat_str(urlh.geturl()))
video_id = self._match_id(urlh.geturl())
hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')

View file

@ -15,7 +15,7 @@ from ..utils import (
class ExpressenIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?expressen\.se/
(?:www\.)?(?:expressen|di)\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)*
(?P<id>[^/?#&]+)
@ -42,13 +42,16 @@ class ExpressenIE(InfoExtractor):
}, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}, {
'url': 'https://www.di.se/videoplayer/embed/tv/ditv/borsmorgon/implantica-rusar-70--under-borspremiaren-hor-styrelsemedlemmen/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)]
def _real_extract(self, url):

View file

@ -18,10 +18,10 @@ from .acast import (
ACastIE,
ACastChannelIE,
)
from .addanime import AddAnimeIE
from .adn import ADNIE
from .adobeconnect import AdobeConnectIE
from .adobetv import (
AdobeTVEmbedIE,
AdobeTVIE,
AdobeTVShowIE,
AdobeTVChannelIE,
@ -80,7 +80,6 @@ from .awaan import (
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
@ -104,6 +103,9 @@ from .bild import BildIE
from .bilibili import (
BiliBiliIE,
BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
BiliBiliPlayerIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
@ -222,13 +224,13 @@ from .comedycentral import (
ComedyCentralTVIE,
ToshIE,
)
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import (
MmsIE,
RtmpIE,
)
from .condenast import CondeNastIE
from .contv import CONtvIE
from .corus import CorusIE
from .cracked import CrackedIE
from .crackle import CrackleIE
@ -252,10 +254,6 @@ from .dailymotion import (
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daisuki import (
DaisukiMottoIE,
DaisukiMottoPlaylistIE,
)
from .daum import (
DaumIE,
DaumClipIE,
@ -274,10 +272,7 @@ from .douyutv import (
DouyuShowIE,
DouyuTVIE,
)
from .dplay import (
DPlayIE,
DPlayItIE,
)
from .dplay import DPlayIE
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
@ -356,7 +351,6 @@ from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .flickr import FlickrIE
from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
@ -367,7 +361,10 @@ from .fourtube import (
FuxIE,
)
from .fox import FOXIE
from .fox9 import FOX9IE
from .fox9 import (
FOX9IE,
FOX9NewsIE,
)
from .foxgay import FoxgayIE
from .foxnews import (
FoxNewsIE,
@ -400,10 +397,6 @@ from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
from .gameinformer import GameInformerIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
)
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
@ -419,7 +412,6 @@ from .globo import (
GloboArticleIE,
)
from .go import GoIE
from .go90 import Go90IE
from .godtube import GodTubeIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
@ -428,7 +420,6 @@ from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
@ -460,7 +451,6 @@ from .hungama import (
HungamaSongIE,
)
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import (
IGNIE,
OneUPIE,
@ -508,7 +498,6 @@ from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .joj import JojIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
@ -519,10 +508,9 @@ from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .kusi import KUSIIE
@ -546,7 +534,6 @@ from .lcp import (
LcpPlayIE,
LcpIE,
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioIE,
@ -598,13 +585,11 @@ from .lynda import (
LyndaCourseIE
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .makertv import MakerTVIE
from .malltv import MallTVIE
from .mangomolo import (
MangomoloVideoIE,
@ -638,22 +623,23 @@ from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
)
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mit import TechTVMITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mofosex import (
MofosexIE,
MofosexEmbedIE,
)
from .mojvideo import MojvideoIE
from .morningstar import MorningstarIE
from .motherless import (
@ -670,10 +656,9 @@ from .mtv import (
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
MTV81IE,
MTVJapanIE,
)
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE
from .mychannels import MyChannelsIE
from .myspace import MySpaceIE, MySpaceAlbumIE
@ -813,16 +798,22 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
ORFFM4IE,
ORFFM4StoryIE,
ORFOE1IE,
ORFOE3IE,
ORFNOEIE,
ORFWIEIE,
ORFBGLIE,
ORFOOEIE,
ORFSTMIE,
ORFKTNIE,
ORFSBGIE,
ORFTIRIE,
ORFVBGIE,
ORFIPTVIE,
)
from .outsidetv import OutsideTVIE
@ -830,7 +821,6 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@ -873,6 +863,7 @@ from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
@ -891,7 +882,6 @@ from .puhutv import (
PuhuTVSerieIE,
)
from .presstv import PressTVIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
@ -928,7 +918,9 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import (
RedBullTVIE,
RedBullEmbedIE,
RedBullTVRrnContentIE,
RedBullIE,
)
from .reddit import (
RedditIE,
@ -943,10 +935,6 @@ from .rentv import (
from .restudy import RestudyIE
from .reuters import ReutersIE
from .reverbnation import ReverbNationIE
from .revision3 import (
Revision3EmbedIE,
Revision3IE,
)
from .rice import RICEIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
@ -990,11 +978,17 @@ from .savefrom import SaveFromIE
from .sbs import SBSIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE
from .scrippsnetworks import (
ScrippsNetworksWatchIE,
ScrippsNetworksIE,
)
from .scte import (
SCTEIE,
SCTECourseIE,
)
from .seeker import SeekerIE
from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
from .servus import ServusIE
from .sevenplus import SevenPlusIE
from .sexu import SexuIE
@ -1035,6 +1029,7 @@ from .snotr import SnotrIE
from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import (
SoundcloudEmbedIE,
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
@ -1078,7 +1073,6 @@ from .srmediathek import SRMediathekIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamable import StreamableIE
from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
@ -1127,12 +1121,14 @@ from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecSquatIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .tennistv import TennisTVIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .tfo import TFOIE
@ -1185,10 +1181,14 @@ from .tunein import (
)
from .tunepk import TunePkIE
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
KatsomoIE,
)
from .tv2dk import (
TV2DKIE,
TV2DKBornholmPlayIE,
)
from .tv2hu import TV2HuIE
from .tv4 import TV4IE
@ -1231,14 +1231,11 @@ from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE,
TwitchCollectionIE,
TwitchVideosIE,
TwitchVideosClipsIE,
TwitchVideosCollectionsIE,
TwitchStreamIE,
TwitchClipsIE,
)
@ -1246,13 +1243,17 @@ from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
TwitterBroadcastIE,
)
from .udemy import (
UdemyIE,
UdemyCourseIE
)
from .udn import UDNEmbedIE
from .ufctv import UFCTVIE
from .ufctv import (
UFCTVIE,
UFCArabiaIE,
)
from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE
from .dlive import (
@ -1280,7 +1281,6 @@ from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import (
VevoIE,
@ -1307,7 +1307,6 @@ from .videomore import (
VideomoreVideoIE,
VideomoreSeasonIE,
)
from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE
from .vidio import VidioIE
from .vidlii import VidLiiIE
@ -1322,7 +1321,6 @@ from .viewlift import (
ViewLiftIE,
ViewLiftEmbedIE,
)
from .viewster import ViewsterIE
from .viidea import ViideaIE
from .vimeo import (
VimeoIE,
@ -1411,7 +1409,6 @@ from .weibo import (
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wsj import (
@ -1425,6 +1422,7 @@ from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
XHamsterUserIE,
)
from .xiami import (
XiamiSongIE,
@ -1448,6 +1446,7 @@ from .yahoo import (
YahooSearchIE,
YahooGyaOPlayerIE,
YahooGyaOIE,
YahooJapanNewsIE,
)
from .yandexdisk import YandexDiskIE
from .yandexmusic import (

View file

@ -334,7 +334,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
server_js_data = self._parse_json(
self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)',
webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
video_data = extract_from_jsmods_instances(server_js_data)
@ -379,6 +379,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
raise ExtractorError('Cannot parse data')
subtitles = {}
formats = []
for f in video_data:
format_id = f['stream_type']
@ -402,9 +403,17 @@ class FacebookIE(InfoExtractor):
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
subtitles_src = f[0].get('subtitles_src')
if subtitles_src:
subtitles.setdefault('en', []).append({'url': subtitles_src})
if not formats:
raise ExtractorError('Cannot find video formats')
# Downloads with browser's User-Agent are rate limited. Working around
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
video_title = self._html_search_regex(
@ -442,6 +451,7 @@ class FacebookIE(InfoExtractor):
'timestamp': timestamp,
'thumbnail': thumbnail,
'view_count': view_count,
'subtitles': subtitles,
}
return webpage, info_dict
@ -456,15 +466,18 @@ class FacebookIE(InfoExtractor):
return info_dict
if '/posts/' in url:
entries = [
self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
for vid in self._parse_json(
self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
webpage, 'video ids', group='ids'),
video_id)]
video_id_json = self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
default='')
if video_id_json:
entries = [
self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
for vid in self._parse_json(video_id_json, video_id)]
return self.playlist_result(entries, video_id)
return self.playlist_result(entries, video_id)
# Single Video?
video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
else:
_, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id,

View file

@ -1,115 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initially loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}

View file

@ -1,13 +1,23 @@
# coding: utf-8
from __future__ import unicode_literals
from .anvato import AnvatoIE
from .common import InfoExtractor
class FOX9IE(AnvatoIE):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
_TESTS = [{
'url': 'http://www.fox9.com/news/215123287-story',
class FOX9IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/video/(?P<id>\d+)'
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'anvato:anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b:' + video_id,
'Anvato', video_id)
class FOX9NewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/news/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.fox9.com/news/black-bear-in-tree-draws-crowd-in-downtown-duluth-minnesota',
'md5': 'd6e1b2572c3bab8a849c9103615dd243',
'info_dict': {
'id': '314473',
@ -21,22 +31,11 @@ class FOX9IE(AnvatoIE):
'categories': ['News', 'Sports'],
'tags': ['news', 'video'],
},
}, {
'url': 'http://www.fox9.com/news/investigators/214070684-story',
'only_matching': True,
}]
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._parse_json(
self._search_regex(
r"this\.videosJson\s*=\s*'(\[.+?\])';",
webpage, 'anvato playlist'),
video_id)[0]['video']
return self._get_anvato_videos(
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
video_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
anvato_id = self._search_regex(
r'anvatoId\s*:\s*[\'"](\d+)', webpage, 'anvato id')
return self.url_result('https://www.fox9.com/video/' + anvato_id, 'FOX9')

View file

@ -31,7 +31,13 @@ class FranceCultureIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
video_data = extract_attributes(self._search_regex(
r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
r'''(?sx)
(?:
</h1>|
<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
).*?
(<button[^>]+data-asset-source="[^"]+"[^>]+>)
''',
webpage, 'video data'))
video_url = video_data['data-asset-source']

View file

@ -1,134 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
xpath_with_ns,
parse_iso8601,
float_or_none,
int_or_none,
)
NAMESPACE_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
# URL prefix to download the mp4 files directly instead of streaming via rtmp
# Credits go to XBox-Maniac
# http://board.jdownloader.org/showpost.php?p=185835&postcount=31
RAW_MP4_URL = 'http://cdn.riptide-mtvn.com/'
class GameOneIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de/tv/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://www.gameone.de/tv/288',
'md5': '136656b7fb4c9cb4a8e2d500651c499b',
'info_dict': {
'id': '288',
'ext': 'mp4',
'title': 'Game One - Folge 288',
'duration': 1238,
'thumbnail': 'http://s3.gameone.de/gameone/assets/video_metas/teaser_images/000/643/636/big/640x360.jpg',
'description': 'FIFA-Pressepokal 2014, Star Citizen, Kingdom Come: Deliverance, Project Cars, Schöner Trants Nerdquiz Folge 2 Runde 1',
'age_limit': 16,
'upload_date': '20140513',
'timestamp': 1399980122,
}
},
{
'url': 'http://gameone.de/tv/220',
'md5': '5227ca74c4ae6b5f74c0510a7c48839e',
'info_dict': {
'id': '220',
'ext': 'mp4',
'upload_date': '20120918',
'description': 'Jet Set Radio HD, Tekken Tag Tournament 2, Source Filmmaker',
'timestamp': 1347971451,
'title': 'Game One - Folge 220',
'duration': 896.62,
'age_limit': 16,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage, secure=False)
description = self._html_search_meta('description', webpage)
age_limit = int(
self._search_regex(
r'age=(\d+)',
self._html_search_meta(
'age-de-meta-label',
webpage),
'age_limit',
'0'))
mrss_url = self._search_regex(r'mrss=([^&]+)', og_video, 'mrss')
mrss = self._download_xml(mrss_url, video_id, 'Downloading mrss')
title = mrss.find('.//item/title').text
thumbnail = mrss.find('.//item/image').get('url')
timestamp = parse_iso8601(mrss.find('.//pubDate').text, delimiter=' ')
content = mrss.find(xpath_with_ns('.//media:content', NAMESPACE_MAP))
content_url = content.get('url')
content = self._download_xml(
content_url,
video_id,
'Downloading media:content')
rendition_items = content.findall('.//rendition')
duration = float_or_none(rendition_items[0].get('duration'))
formats = [
{
'url': re.sub(r'.*/(r2)', RAW_MP4_URL + r'\1', r.find('./src').text),
'width': int_or_none(r.get('width')),
'height': int_or_none(r.get('height')),
'tbr': int_or_none(r.get('bitrate')),
}
for r in rendition_items
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'description': description,
'age_limit': age_limit,
'timestamp': timestamp,
}
class GameOnePlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de(?:/tv)?/?$'
IE_NAME = 'gameone:playlist'
_TEST = {
'url': 'http://www.gameone.de/tv',
'info_dict': {
'title': 'GameOne',
},
'playlist_mincount': 294,
}
def _real_extract(self, url):
webpage = self._download_webpage('http://www.gameone.de/tv', 'TV')
max_id = max(map(int, re.findall(r'<a href="/tv/(\d+)"', webpage)))
entries = [
self.url_result('http://www.gameone.de/tv/%d' %
video_id, 'GameOne')
for video_id in range(max_id, 0, -1)]
return {
'_type': 'playlist',
'title': 'GameOne',
'entries': entries,
}

View file

@ -60,6 +60,9 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE
from .redtube import RedTubeIE
from .tube8 import Tube8IE
from .mofosex import MofosexEmbedIE
from .spankwire import SpankwireIE
from .youporn import YouPornIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE
@ -77,11 +80,10 @@ from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .vessel import VesselIE
from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
from .soundcloud import SoundcloudEmbedIE
from .tunein import TuneInBaseIE
from .vbox7 import Vbox7IE
from .dbtv import DBTVIE
@ -89,10 +91,6 @@ from .piksel import PikselIE
from .videa import VideaIE
from .twentymin import TwentyMinutenIE
from .ustream import UstreamIE
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .videopress import VideoPressIE
from .rutube import RutubeIE
from .limelight import LimelightBaseIE
@ -119,6 +117,8 @@ from .foxnews import FoxNewsIE
from .viqeo import ViqeoIE
from .expressen import ExpressenIE
from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE
from .kinja import KinjaEmbedIE
class GenericIE(InfoExtractor):
@ -1487,16 +1487,18 @@ class GenericIE(InfoExtractor):
'timestamp': 1432570283,
},
},
# OnionStudios embed
# Kinja embed
{
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
'info_dict': {
'id': '2855',
'id': '106351',
'ext': 'mp4',
'title': 'Dont Understand Bitcoin? This Man Will Mumble An Explanation At You',
'description': 'Migrated from OnionStudios',
'thumbnail': r're:^https?://.*\.jpe?g$',
'uploader': 'ClickHole',
'uploader_id': 'clickhole',
'uploader': 'clickhole',
'upload_date': '20150527',
'timestamp': 1432744860,
}
},
# SnagFilms embed
@ -1706,6 +1708,15 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
{
# multiple kaltura embeds, nsfw
'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
'info_dict': {
'id': 'kamila-avec-video-jaime-sadomie',
'title': "Kamila avec vídeo “J'aime sadomie”",
},
'playlist_count': 8,
},
{
# Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web',
@ -2075,6 +2086,22 @@ class GenericIE(InfoExtractor):
},
'playlist_count': 6,
},
{
# Squarespace video embed, 2019-08-28
'url': 'http://ootboxford.com',
'info_dict': {
'id': 'Tc7b_JGdZfw',
'title': 'Out of the Blue, at Childish Things 10',
'ext': 'mp4',
'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
'uploader_id': 'helendouglashouse',
'uploader': 'Helen & Douglas House',
'upload_date': '20140328',
},
'params': {
'skip_download': True,
},
},
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
@ -2083,6 +2110,9 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'upload_date': '20170909',
'timestamp': 1504915200,
},
'add_ie': [ZypeIE.ie_key()],
'params': {
@ -2226,7 +2256,7 @@ class GenericIE(InfoExtractor):
default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning', 'fixup_error'):
if '/' in url:
if re.match(r'^[^\s/]+\.[^\s/]+/', url):
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
elif default_search != 'fixup_error':
@ -2269,7 +2299,7 @@ class GenericIE(InfoExtractor):
if head_response is not False:
# Check for redirect
new_url = compat_str(head_response.geturl())
new_url = head_response.geturl()
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
@ -2369,12 +2399,12 @@ class GenericIE(InfoExtractor):
return self.playlist_result(
self._parse_xspf(
doc, video_id, xspf_url=url,
xspf_base_url=compat_str(full_response.geturl())),
xspf_base_url=full_response.geturl()),
video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
@ -2395,6 +2425,12 @@ class GenericIE(InfoExtractor):
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294
webpage = re.sub(
r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
lambda x: unescapeHTML(x.group(0)), webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
@ -2469,11 +2505,6 @@ class GenericIE(InfoExtractor):
if tp_urls:
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
# Look for Vessel embeds
vessel_urls = VesselIE._extract_urls(webpage)
if vessel_urls:
return self.playlist_from_matches(vessel_urls, video_id, video_title, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@ -2517,15 +2548,21 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for Teachable embeds, must be before Wistia
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
# Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage)
if wistia_url:
return {
'_type': 'url_transparent',
'url': self._proto_relative_url(wistia_url),
'ie_key': WistiaIE.ie_key(),
'uploader': video_uploader,
}
wistia_urls = WistiaIE._extract_urls(webpage)
if wistia_urls:
playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
for entry in playlist['entries']:
entry.update({
'_type': 'url_transparent',
'uploader': video_uploader,
})
return playlist
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
@ -2611,9 +2648,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded Odnoklassniki player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Odnoklassniki')
odnoklassniki_url = OdnoklassnikiIE._extract_url(webpage)
if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
@ -2690,6 +2727,21 @@ class GenericIE(InfoExtractor):
if tube8_urls:
return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
# Look for embedded Mofosex player
mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
if mofosex_urls:
return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
# Look for embedded Spankwire player
spankwire_urls = SpankwireIE._extract_urls(webpage)
if spankwire_urls:
return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
# Look for embedded YouPorn player
youporn_urls = YouPornIE._extract_urls(webpage)
if youporn_urls:
return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@ -2732,9 +2784,9 @@ class GenericIE(InfoExtractor):
return self.url_result(myvi_url)
# Look for embedded soundcloud player
soundcloud_urls = SoundcloudIE._extract_urls(webpage)
soundcloud_urls = SoundcloudEmbedIE._extract_urls(webpage)
if soundcloud_urls:
return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML)
# Look for tunein player
tunein_urls = TuneInBaseIE._extract_urls(webpage)
@ -2801,9 +2853,12 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
kaltura_urls = KalturaIE._extract_urls(webpage)
if kaltura_urls:
return self.playlist_from_matches(
kaltura_urls, video_id, video_title,
getter=lambda x: smuggle_url(x, {'source_url': url}),
ie=KalturaIE.ie_key())
# Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage)
@ -2877,6 +2932,12 @@ class GenericIE(InfoExtractor):
if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Kinja embeds
kinja_embed_urls = KinjaEmbedIE._extract_urls(webpage, url)
if kinja_embed_urls:
return self.playlist_from_matches(
kinja_embed_urls, video_id, video_title)
# Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url:
@ -2938,7 +2999,7 @@ class GenericIE(InfoExtractor):
# Look for VODPlatform embeds
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
webpage)
if mobj is not None:
return self.url_result(
@ -2946,10 +3007,14 @@ class GenericIE(InfoExtractor):
# Look for Mangomolo embeds
mobj = re.search(
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//
(?:
admin\.mangomolo\.com/analytics/index\.php/customers/embed|
player\.mangomolo\.com/v1
)/
(?:
video\?.*?\bid=(?P<video_id>\d+)|
index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
(?:index|live)\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
).+?)\1''', webpage)
if mobj is not None:
info = {
@ -3018,18 +3083,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key())
# Look for Openload embeds
openload_urls = OpenloadIE._extract_urls(webpage)
if openload_urls:
return self.playlist_from_matches(
openload_urls, video_id, video_title, ie=OpenloadIE.ie_key())
# Look for Verystream embeds
verystream_urls = VerystreamIE._extract_urls(webpage)
if verystream_urls:
return self.playlist_from_matches(
verystream_urls, video_id, video_title, ie=VerystreamIE.ie_key())
# Look for VideoPress embeds
videopress_urls = VideoPressIE._extract_urls(webpage)
if videopress_urls:
@ -3123,10 +3176,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(

View file

@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_VALID_URL = r'https?://(?:(?:www|giant|thumbs)\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
@ -44,12 +44,21 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}, {
'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
'only_matching': True
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True
}, {
'url': 'https://gfycat.com/acceptablehappygoluckyharborporpoise-baseball',
'only_matching': True
}, {
'url': 'https://thumbs.gfycat.com/acceptablehappygoluckyharborporpoise-size_restricted.gif',
'only_matching': True
}, {
'url': 'https://giant.gfycat.com/acceptablehappygoluckyharborporpoise.mp4',
'only_matching': True
}]
def _real_extract(self, url):

View file

@ -13,10 +13,10 @@ from ..utils import (
class GiantBombIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TESTS = [{
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
'md5': 'c8ea694254a59246a42831155dec57ac',
'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
'info_dict': {
'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below',
@ -26,7 +26,10 @@ class GiantBombIE(InfoExtractor):
'duration': 2399,
'thumbnail': r're:^https?://.*\.jpg$',
}
}
}, {
'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View file

@ -96,21 +96,31 @@ class GloboIE(InfoExtractor):
video = self._download_json(
'http://api.globovideos.com/videos/%s/playlist' % video_id,
video_id)['videos'][0]
if video.get('encrypted') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
title = video['title']
formats = []
subtitles = {}
for resource in video['resources']:
resource_id = resource.get('_id')
resource_url = resource.get('url')
if not resource_id or not resource_url:
resource_type = resource.get('type')
if not resource_url or (resource_type == 'media' and not resource_id) or resource_type not in ('subtitle', 'media'):
continue
if resource_type == 'subtitle':
subtitles.setdefault(resource.get('language') or 'por', []).append({
'url': resource_url,
})
continue
security = self._download_json(
'http://security.video.globo.com/videos/%s/hash' % video_id,
video_id, 'Downloading security hash for %s' % resource_id, query={
'player': 'flash',
'version': '17.0.0.132',
'player': 'desktop',
'version': '5.19.1',
'resource_id': resource_id,
})
@ -123,18 +133,23 @@ class GloboIE(InfoExtractor):
continue
hash_code = security_hash[:2]
received_time = security_hash[2:12]
received_random = security_hash[12:22]
received_md5 = security_hash[22:]
sign_time = compat_str(int(received_time) + 86400)
padding = '%010d' % random.randint(1, 10000000000)
if hash_code in ('04', '14'):
received_time = security_hash[3:13]
received_md5 = security_hash[24:]
hash_prefix = security_hash[:23]
elif hash_code in ('02', '12', '03', '13'):
received_time = security_hash[2:12]
received_md5 = security_hash[22:]
padding += '1'
hash_prefix = '05' + security_hash[:22]
md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
padded_sign_time = compat_str(int(received_time) + 86400) + padding
md5_data = (received_md5 + padded_sign_time + '0xAC10FD').encode()
signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
signed_hash = hash_prefix + padded_sign_time + signed_md5
signed_url = '%s?h=%s&k=html5&a=%s&u=%s' % (resource_url, signed_hash, 'F' if video.get('subscriber_only') else 'A', security.get('user') or '')
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(
signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
@ -164,7 +179,8 @@ class GloboIE(InfoExtractor):
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats
'formats': formats,
'subtitles': subtitles,
}

View file

@ -40,8 +40,17 @@ class GoIE(AdobePassIE):
'resource_id': 'Disney',
}
}
_VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_VALID_URL = r'''(?x)
https?://
(?:
(?:(?P<sub_domain>%s)\.)?go|
(?P<sub_domain_2>abc|freeform|disneynow)
)\.com/
(?:
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
(?:[^/]+/)*(?P<display_id>[^/?\#]+)
)
''' % '|'.join(list(_SITE_INFO.keys()))
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
'info_dict': {
@ -54,6 +63,7 @@ class GoIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
'skip': 'This content is no longer available.',
}, {
'url': 'http://watchdisneyxd.go.com/doraemon',
'info_dict': {
@ -61,6 +71,34 @@ class GoIE(AdobePassIE):
'id': 'SH55574025',
},
'playlist_mincount': 51,
}, {
'url': 'http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood',
'info_dict': {
'id': 'VDKA3609139',
'ext': 'mp4',
'title': 'This Guilty Blood',
'description': 'md5:f18e79ad1c613798d95fdabfe96cd292',
'age_limit': 14,
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet',
'info_dict': {
'id': 'VDKA13435179',
'ext': 'mp4',
'title': 'The Bet',
'description': 'md5:c66de8ba2e92c6c5c113c3ade84ab404',
'age_limit': 14,
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True,
@ -95,10 +133,13 @@ class GoIE(AdobePassIE):
if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
default=video_id)
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',

View file

@ -1,149 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
)
class Go90IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?go90\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.go90.com/videos/84BUqjLpf9D',
'md5': 'efa7670dbbbf21a7b07b360652b24a32',
'info_dict': {
'id': '84BUqjLpf9D',
'ext': 'mp4',
'title': 'Daily VICE - Inside The Utah Coalition Against Pornography Convention',
'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
'timestamp': 1491868800,
'upload_date': '20170411',
'age_limit': 14,
}
}, {
'url': 'https://www.go90.com/embed/261MflWkD3N',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url)
try:
headers = self.geo_verification_headers()
headers.update({
'Content-Type': 'application/json; charset=utf-8',
})
video_data = self._download_json(
'https://www.go90.com/api/view/items/' + video_id, video_id,
headers=headers, data=b'{"client":"web","device_type":"pc"}')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
message = self._parse_json(e.cause.read().decode(), None)['error']['message']
if 'region unavailable' in message:
self.raise_geo_restricted(countries=['US'])
raise ExtractorError(message, expected=True)
raise
if video_data.get('requires_drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
main_video_asset = video_data['main_video_asset']
episode_number = int_or_none(video_data.get('episode_number'))
series = None
season = None
season_id = None
season_number = None
for metadata in video_data.get('__children', {}).get('Item', {}).values():
if metadata.get('type') == 'show':
series = metadata.get('title')
elif metadata.get('type') == 'season':
season = metadata.get('title')
season_id = metadata.get('id')
season_number = int_or_none(metadata.get('season_number'))
title = episode = video_data.get('title') or series
if series and series != title:
title = '%s - %s' % (series, title)
thumbnails = []
formats = []
subtitles = {}
for asset in video_data.get('assets'):
if asset.get('id') == main_video_asset:
for source in asset.get('sources', []):
source_location = source.get('location')
if not source_location:
continue
source_type = source.get('type')
if source_type == 'hls':
m3u8_formats = self._extract_m3u8_formats(
source_location, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
for f in m3u8_formats:
mobj = re.search(r'/hls-(\d+)-(\d+)K', f['url'])
if mobj:
height, tbr = mobj.groups()
height = int_or_none(height)
f.update({
'height': f.get('height') or height,
'width': f.get('width') or int_or_none(height / 9.0 * 16.0 if height else None),
'tbr': f.get('tbr') or int_or_none(tbr),
})
formats.extend(m3u8_formats)
elif source_type == 'dash':
formats.extend(self._extract_mpd_formats(
source_location, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'format_id': source.get('name'),
'url': source_location,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('bitrate')),
})
for caption in asset.get('caption_metadata', []):
caption_url = caption.get('source_url')
if not caption_url:
continue
subtitles.setdefault(caption.get('language', 'en'), []).append({
'url': caption_url,
'ext': determine_ext(caption_url, 'vtt'),
})
elif asset.get('type') == 'image':
asset_location = asset.get('location')
if not asset_location:
continue
thumbnails.append({
'url': asset_location,
'width': int_or_none(asset.get('width')),
'height': int_or_none(asset.get('height')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video_data.get('short_description'),
'like_count': int_or_none(video_data.get('like_count')),
'timestamp': parse_iso8601(video_data.get('released_at')),
'series': series,
'episode': episode,
'season': season,
'season_id': season_id,
'season_number': season_number,
'episode_number': episode_number,
'subtitles': subtitles,
'age_limit': parse_age_limit(video_data.get('rating')),
}

View file

@ -220,19 +220,27 @@ class GoogleDriveIE(InfoExtractor):
'id': video_id,
'export': 'download',
})
urlh = self._request_webpage(
source_url, video_id, note='Requesting source file',
errnote='Unable to request source file', fatal=False)
def request_source_file(source_url, kind):
return self._request_webpage(
source_url, video_id, note='Requesting %s file' % kind,
errnote='Unable to request %s file' % kind, fatal=False)
urlh = request_source_file(source_url, 'source')
if urlh:
def add_source_format(src_url):
def add_source_format(urlh):
formats.append({
'url': src_url,
# Use redirect URLs as download URLs in order to calculate
# correct cookies in _calc_cookies.
# Using original URLs may result in redirect loop due to
# google.com's cookies mistakenly used for googleusercontent.com
# redirect URLs (see #23919).
'url': urlh.geturl(),
'ext': determine_ext(title, 'mp4').lower(),
'format_id': 'source',
'quality': 1,
})
if urlh.headers.get('Content-Disposition'):
add_source_format(source_url)
add_source_format(urlh)
else:
confirmation_webpage = self._webpage_read_content(
urlh, url, video_id, note='Downloading confirmation page',
@ -242,9 +250,12 @@ class GoogleDriveIE(InfoExtractor):
r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', fatal=False)
if confirm:
add_source_format(update_url_query(source_url, {
confirmed_source_url = update_url_query(source_url, {
'confirm': confirm,
}))
})
urlh = request_source_file(confirmed_source_url, 'confirmed source')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
if not formats:
reason = self._search_regex(

View file

@ -1,33 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
_TEST = {
'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
'md5': '6783a58491b47b92c7c1af5a77d4cbee',
'info_dict': {
'id': 'mmbzyhkgny',
'ext': 'mp3',
'title': 'Obama: \'Beyond The Afghan Theater, We Only Target Al Qaeda\' on May 23, 2013',
'description': 'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
'duration': 11,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://www.hark.com/clips/%s.json' % video_id, video_id)
return {
'id': video_id,
'url': data['url'],
'title': data['name'],
'description': data.get('description'),
'thumbnail': data.get('image_original'),
'duration': data.get('duration'),
}

View file

@ -105,8 +105,7 @@ class HeiseIE(InfoExtractor):
webpage, default=None) or self._html_search_meta(
'description', webpage)
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
def _make_kaltura_result(kaltura_url):
return {
'_type': 'url_transparent',
'url': smuggle_url(kaltura_url, {'source_url': url}),
@ -115,6 +114,16 @@ class HeiseIE(InfoExtractor):
'description': description,
}
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return _make_kaltura_result(kaltura_url)
kaltura_id = self._search_regex(
r'entry-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'kaltura id',
default=None, group='id')
if kaltura_id:
return _make_kaltura_result('kaltura:2238431:%s' % kaltura_id)
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(

View file

@ -1,12 +1,11 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
int_or_none,
merge_dicts,
remove_end,
determine_ext,
unified_timestamp,
)
@ -14,15 +13,21 @@ class HellPornoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
'md5': '1fee339c610d2049699ef2aa699439f1',
'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
'info_dict': {
'id': '149116',
'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
'ext': 'mp4',
'title': 'Dixie is posing with naked ass very erotic',
'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
'categories': list,
'thumbnail': r're:https?://.*\.jpg$',
'duration': 240,
'timestamp': 1398762720,
'upload_date': '20140429',
'view_count': int,
'age_limit': 18,
}
},
}, {
'url': 'http://hellporno.net/v/186271/',
'only_matching': True,
@ -36,40 +41,36 @@ class HellPornoIE(InfoExtractor):
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
flashvars = self._parse_json(self._search_regex(
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
display_id, transform_source=js_to_json)
info = self._parse_html5_media_entries(url, webpage, display_id)[0]
self._sort_formats(info['formats'])
video_id = flashvars.get('video_id')
thumbnail = flashvars.get('preview_url')
ext = determine_ext(flashvars.get('postfix'), 'mp4')
video_id = self._search_regex(
(r'chs_object\s*=\s*["\'](\d+)',
r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
default=display_id)
description = self._search_regex(
r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
'description', fatal=False)
categories = [
c.strip()
for c in self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
if c.strip()]
duration = int_or_none(self._og_search_property(
'video:duration', webpage, fatal=False))
timestamp = unified_timestamp(self._og_search_property(
'video:release_date', webpage, fatal=False))
view_count = int_or_none(self._search_regex(
r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
formats = []
for video_url_key in ['video_url', 'video_alt_url']:
video_url = flashvars.get(video_url_key)
if not video_url:
continue
video_text = flashvars.get('%s_text' % video_url_key)
fmt = {
'url': video_url,
'ext': ext,
'format_id': video_text,
}
m = re.search(r'^(?P<height>\d+)[pP]', video_text)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
categories = self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'description': description,
'categories': categories,
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'age_limit': 18,
'formats': formats,
}
})

View file

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import hashlib
import hmac
import re
import time
import uuid
@ -117,6 +118,7 @@ class HotStarIE(HotStarBaseIE):
if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
headers = {'Referer': url}
formats = []
geo_restricted = False
playback_sets = self._call_api_v2('h/v2/play', video_id)['playBackSets']
@ -126,6 +128,8 @@ class HotStarIE(HotStarBaseIE):
format_url = url_or_none(playback_set.get('playbackUrl'))
if not format_url:
continue
format_url = re.sub(
r'(?<=//staragvod)(\d)', r'web\1', format_url)
tags = str_or_none(playback_set.get('tagsCombination')) or ''
if tags and 'encryption:plain' not in tags:
continue
@ -133,10 +137,12 @@ class HotStarIE(HotStarBaseIE):
try:
if 'package:hls' in tags or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls'))
format_url, video_id, 'mp4',
entry_protocol='m3u8_native',
m3u8_id='hls', headers=headers))
elif 'package:dash' in tags or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash'))
format_url, video_id, mpd_id='dash', headers=headers))
elif ext == 'f4m':
# produce broken files
pass
@ -154,6 +160,9 @@ class HotStarIE(HotStarBaseIE):
self.raise_geo_restricted(countries=['IN'])
self._sort_formats(formats)
for f in formats:
f.setdefault('http_headers', {}).update(headers)
return {
'id': video_id,
'title': title,

View file

@ -1,85 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
get_element_by_id,
remove_end,
)
class IconosquareIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_TEST = {
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
'info_dict': {
'id': '522207370455279102_24101272',
'ext': 'mp4',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
'timestamp': 1376471991,
'upload_date': '20130814',
'uploader': 'aguynamedpatrick',
'uploader_id': '24101272',
'comment_count': int,
'like_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media = self._parse_json(
get_element_by_id('mediaJson', webpage),
video_id)
formats = [{
'url': f['url'],
'format_id': format_id,
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height'))
} for format_id, f in media['videos'].items()]
self._sort_formats(formats)
title = remove_end(self._og_search_title(webpage), ' - via Iconosquare')
timestamp = int_or_none(media.get('created_time') or media.get('caption', {}).get('created_time'))
description = media.get('caption', {}).get('text')
uploader = media.get('user', {}).get('username')
uploader_id = media.get('user', {}).get('id')
comment_count = int_or_none(media.get('comments', {}).get('count'))
like_count = int_or_none(media.get('likes', {}).get('count'))
thumbnails = [{
'url': t['url'],
'id': thumbnail_id,
'width': int_or_none(t.get('width')),
'height': int_or_none(t.get('height'))
} for thumbnail_id, t in media.get('images', {}).items()]
comments = [{
'id': comment.get('id'),
'text': comment['text'],
'timestamp': int_or_none(comment.get('created_time')),
'author': comment.get('from', {}).get('full_name'),
'author_id': comment.get('from', {}).get('username'),
} for comment in media.get('comments', {}).get('data', []) if 'text' in comment]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnails': thumbnails,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'comment_count': comment_count,
'like_count': like_count,
'formats': formats,
'comments': comments,
}

View file

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import base64
import json
import re
from .common import InfoExtractor
@ -8,6 +10,7 @@ from ..utils import (
mimetype2ext,
parse_duration,
qualities,
try_get,
url_or_none,
)
@ -15,15 +18,16 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
'ext': 'mp4',
'title': 'No. 2 from Ice Age: Continental Drift (2012)',
'title': 'No. 2',
'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
'duration': 152,
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
@ -47,21 +51,23 @@ class ImdbIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
data = self._download_json(
'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
query={
'key': base64.b64encode(json.dumps({
'type': 'VIDEO_PLAYER',
'subType': 'FORCE_LEGACY',
'id': 'vi%s' % video_id,
}).encode()).decode(),
})[0]
quality = qualities(('SD', '480p', '720p', '1080p'))
formats = []
for encoding in video_metadata.get('encodings', []):
for encoding in data['videoLegacyEncodings']:
if not encoding or not isinstance(encoding, dict):
continue
video_url = url_or_none(encoding.get('videoUrl'))
video_url = url_or_none(encoding.get('url'))
if not video_url:
continue
ext = mimetype2ext(encoding.get(
@ -69,7 +75,7 @@ class ImdbIE(InfoExtractor):
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
preference=1, m3u8_id='hls', fatal=False))
continue
format_id = encoding.get('definition')
formats.append({
@ -80,13 +86,33 @@ class ImdbIE(InfoExtractor):
})
self._sort_formats(formats)
webpage = self._download_webpage(
'https://www.imdb.com/video/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
'video metadata'), video_id)
video_info = video_metadata.get('VIDEO_INFO')
if video_info and isinstance(video_info, dict):
info = try_get(
video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
else:
info = {}
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title',
default=None) or info['videoTitle']
return {
'id': video_id,
'title': title,
'alt_title': info.get('videoSubTitle'),
'formats': formats,
'description': video_metadata.get('description'),
'thumbnail': video_metadata.get('slate', {}).get('url'),
'duration': parse_duration(video_metadata.get('duration')),
'description': info.get('videoDescription'),
'thumbnail': url_or_none(try_get(
video_metadata, lambda x: x['videoSlate']['source'])),
'duration': parse_duration(info.get('videoRuntime')),
}

View file

@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
try_get,
)
class ImgGamingBaseIE(InfoExtractor):
_API_BASE = 'https://dce-frontoffice.imggaming.com/api/v2/'
_API_KEY = '857a1e5d-e35e-4fdf-805b-a87b6f8364bf'
_HEADERS = None
_MANIFEST_HEADERS = {'Accept-Encoding': 'identity'}
_REALM = None
_VALID_URL_TEMPL = r'https?://(?P<domain>%s)/(?P<type>live|playlist|video)/(?P<id>\d+)(?:\?.*?\bplaylistId=(?P<playlist_id>\d+))?'
def _real_initialize(self):
self._HEADERS = {
'Realm': 'dce.' + self._REALM,
'x-api-key': self._API_KEY,
}
email, password = self._get_login_info()
if email is None:
self.raise_login_required()
p_headers = self._HEADERS.copy()
p_headers['Content-Type'] = 'application/json'
self._HEADERS['Authorization'] = 'Bearer ' + self._download_json(
self._API_BASE + 'login',
None, 'Logging in', data=json.dumps({
'id': email,
'secret': password,
}).encode(), headers=p_headers)['authorisationToken']
def _call_api(self, path, media_id):
return self._download_json(
self._API_BASE + path + media_id, media_id, headers=self._HEADERS)
def _extract_dve_api_url(self, media_id, media_type):
stream_path = 'stream'
if media_type == 'video':
stream_path += '/vod/'
else:
stream_path += '?eventId='
try:
return self._call_api(
stream_path, media_id)['playerUrlCallback']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
raise ExtractorError(
self._parse_json(e.cause.read().decode(), media_id)['messages'][0],
expected=True)
raise
def _real_extract(self, url):
domain, media_type, media_id, playlist_id = re.match(self._VALID_URL, url).groups()
if playlist_id:
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % media_id)
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
media_type, media_id = 'playlist', playlist_id
if media_type == 'playlist':
playlist = self._call_api('vod/playlist/', media_id)
entries = []
for video in try_get(playlist, lambda x: x['videos']['vods']) or []:
video_id = str_or_none(video.get('id'))
if not video_id:
continue
entries.append(self.url_result(
'https://%s/video/%s' % (domain, video_id),
self.ie_key(), video_id))
return self.playlist_result(
entries, media_id, playlist.get('title'),
playlist.get('description'))
dve_api_url = self._extract_dve_api_url(media_id, media_type)
video_data = self._download_json(dve_api_url, media_id)
is_live = media_type == 'live'
if is_live:
title = self._live_title(self._call_api('event/', media_id)['title'])
else:
title = video_data['name']
formats = []
for proto in ('hls', 'dash'):
media_url = video_data.get(proto + 'Url') or try_get(video_data, lambda x: x[proto]['url'])
if not media_url:
continue
if proto == 'hls':
m3u8_formats = self._extract_m3u8_formats(
media_url, media_id, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False, headers=self._MANIFEST_HEADERS)
for f in m3u8_formats:
f.setdefault('http_headers', {}).update(self._MANIFEST_HEADERS)
formats.append(f)
else:
formats.extend(self._extract_mpd_formats(
media_url, media_id, mpd_id='dash', fatal=False,
headers=self._MANIFEST_HEADERS))
self._sort_formats(formats)
subtitles = {}
for subtitle in video_data.get('subtitles', []):
subtitle_url = subtitle.get('url')
if not subtitle_url:
continue
subtitles.setdefault(subtitle.get('lang', 'en_US'), []).append({
'url': subtitle_url,
})
return {
'id': media_id,
'title': title,
'formats': formats,
'thumbnail': video_data.get('thumbnailUrl'),
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'tags': video_data.get('tags'),
'is_live': is_live,
'subtitles': subtitles,
}

View file

@ -58,7 +58,7 @@ class IndavideoEmbedIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
video_id)['data']
title = video['title']

View file

@ -22,7 +22,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@ -92,6 +92,9 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}, {
'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True,
}]
@staticmethod

View file

@ -1,15 +1,13 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
xpath_text,
)
class InternetVideoArchiveIE(InfoExtractor):
@ -20,7 +18,7 @@ class InternetVideoArchiveIE(InfoExtractor):
'info_dict': {
'id': '194487',
'ext': 'mp4',
'title': 'KICK-ASS 2',
'title': 'Kick-Ass 2',
'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
},
'params': {
@ -33,68 +31,34 @@ class InternetVideoArchiveIE(InfoExtractor):
def _build_json_url(query):
return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod
def _build_xml_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0]
if '/player/' in url:
configuration = self._download_json(url, video_id)
# There are multiple videos in the playlist whlie only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
title = video_info['title']
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
file_url = m3u8_formats[0]['url']
formats.extend(self._extract_f4m_formats(
file_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
file_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
else:
a_format = {
'url': file_url,
}
if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
})
formats.append(a_format)
self._sort_formats(formats)
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
query = compat_parse_qs(compat_urlparse.urlparse(url).query)
video_id = query['publishedid'][0]
data = self._download_json(
'https://video.internetvideoarchive.net/videojs7/videojs7.ivasettings.ashx',
video_id, data=json.dumps({
'customerid': query['customerid'][0],
'publishedid': video_id,
}).encode())
title = data['Title']
formats = self._extract_m3u8_formats(
data['VideoUrl'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
file_url = formats[0]['url']
if '.ism/' in file_url:
replace_url = lambda x: re.sub(r'\.ism/[^?]+', '.ism/' + x, file_url)
formats.extend(self._extract_f4m_formats(
replace_url('.f4m'), video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
replace_url('.mpd'), video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_ism_formats(
replace_url('Manifest'), video_id, ism_id='mss', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
'thumbnail': data.get('PosterUrl'),
'description': data.get('Description'),
}

View file

@ -16,12 +16,22 @@ class IPrimaIE(InfoExtractor):
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
'url': 'https://prima.iprima.cz/particka/92-epizoda',
'info_dict': {
'id': 'p136534',
'id': 'p51388',
'ext': 'mp4',
'title': 'Gondíci s. r. o. (34)',
'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
'title': 'Partička (92)',
'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
},
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'https://cnn.iprima.cz/videa/70-epizoda',
'info_dict': {
'id': 'p681554',
'ext': 'mp4',
'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
},
'params': {
'skip_download': True, # m3u8 download
@ -68,9 +78,16 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(
webpage, default=None) or self._search_regex(
r'<h1>([^<]+)', webpage, 'title')
video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">'),
r'data-product="([^"]+)">',
r'id=["\']player-(p\d+)"',
r'playerId\s*:\s*["\']player-(p\d+)',
r'\bvideos\s*=\s*["\'](p\d+)'),
webpage, 'real id')
playerpage = self._download_webpage(
@ -125,8 +142,8 @@ class IPrimaIE(InfoExtractor):
return {
'id': video_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': formats,
'description': self._og_search_description(webpage),
'description': self._og_search_description(webpage, default=None),
}

View file

@ -150,7 +150,7 @@ class IqiyiSDKInterpreter(object):
elif function in other_functions:
other_functions[function]()
else:
raise ExtractorError('Unknown funcion %s' % function)
raise ExtractorError('Unknown function %s' % function)
return sdk.target

View file

@ -1,8 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
import re
import sys
from .common import InfoExtractor
from ..utils import (
@ -18,6 +19,8 @@ class IviIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
_LIGHT_KEY = b'\xf1\x02\x32\xb7\xbc\x5c\x7a\xe8\xf7\x96\xc1\x33\x2b\x27\xa1\x8c'
_LIGHT_URL = 'https://api.ivi.ru/light/'
_TESTS = [
# Single movie
@ -80,48 +83,96 @@ class IviIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
data = {
data = json.dumps({
'method': 'da.content.get',
'params': [
video_id, {
'site': 's183',
'site': 's%d',
'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
'contentid': video_id
}
]
}
})
video_json = self._download_json(
'http://api.digitalaccess.ru/api/json/', video_id,
'Downloading video JSON', data=json.dumps(data))
bundled = hasattr(sys, 'frozen')
if 'error' in video_json:
error = video_json['error']
origin = error['origin']
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(
msg=error['message'], countries=self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError(
'Unable to download video %s: %s' % (video_id, error['message']),
expected=True)
for site in (353, 183):
content_data = (data % site).encode()
if site == 353:
if bundled:
continue
try:
from Cryptodome.Cipher import Blowfish
from Cryptodome.Hash import CMAC
pycryptodomex_found = True
except ImportError:
pycryptodomex_found = False
continue
timestamp = (self._download_json(
self._LIGHT_URL, video_id,
'Downloading timestamp JSON', data=json.dumps({
'method': 'da.timestamp.get',
'params': []
}).encode(), fatal=False) or {}).get('result')
if not timestamp:
continue
query = {
'ts': timestamp,
'sign': CMAC.new(self._LIGHT_KEY, timestamp.encode() + content_data, Blowfish).hexdigest(),
}
else:
query = {}
video_json = self._download_json(
self._LIGHT_URL, video_id,
'Downloading video JSON', data=content_data, query=query)
error = video_json.get('error')
if error:
origin = error.get('origin')
message = error.get('message') or error.get('user_message')
extractor_msg = 'Unable to download video %s'
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(message, self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
extractor_msg = 'Video %s does not exist'
elif site == 353:
continue
elif bundled:
raise ExtractorError(
'This feature does not work from bundled exe. Run youtube-dl from sources.',
expected=True)
elif not pycryptodomex_found:
raise ExtractorError(
'pycryptodomex not found. Please install it.',
expected=True)
elif message:
extractor_msg += ': ' + message
raise ExtractorError(extractor_msg % video_id, expected=True)
else:
break
result = video_json['result']
title = result['title']
quality = qualities(self._KNOWN_FORMATS)
formats = [{
'url': x['url'],
'format_id': x.get('content_format'),
'quality': quality(x.get('content_format')),
} for x in result['files'] if x.get('url')]
formats = []
for f in result.get('files', []):
f_url = f.get('url')
content_format = f.get('content_format')
if not f_url or '-MDRM-' in content_format or '-FPS-' in content_format:
continue
formats.append({
'url': f_url,
'format_id': content_format,
'quality': quality(content_format),
'filesize': int_or_none(f.get('size_in_bytes')),
})
self._sort_formats(formats)
title = result['title']
duration = int_or_none(result.get('duration'))
compilation = result.get('compilation')
episode = title if compilation else None
@ -158,7 +209,7 @@ class IviIE(InfoExtractor):
'episode_number': episode_number,
'thumbnails': thumbnails,
'description': description,
'duration': duration,
'duration': int_or_none(result.get('duration')),
'formats': formats,
}
@ -188,7 +239,7 @@ class IviCompilationIE(InfoExtractor):
self.url_result(
'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
for serie in re.findall(
r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

Some files were not shown because too many files have changed in this diff Show more