180 commits

Author SHA1 Message Date
Sergey M․ 4195096ea8
[utils] Improve comments processing in js_to_json (closes #11947) 2017-02-03 03:04:33 +07:00
Michal Čihař b3ee552e4b
[utils] Handle single-line comments in js_to_json 2017-02-03 03:04:33 +07:00
Sergey M․ 15846398ca
[utils] Improve parse_duration 2017-01-26 23:23:08 +07:00
Sergey M․ cb655f34fb
[utils] Add more date formats 2017-01-12 22:39:45 +07:00
Remita Amine 7fe1592073
[common] fix dash codec information for mixed videos and fragment url construction(#11490) 2016-12-20 12:35:03 +01:00
Sergey M․ b0c65c677f
[utils] Improve urljoin 2016-12-17 18:49:55 +07:00
Sergey M․ e34c33614d
[utils] Add convenience urljoin 2016-12-13 02:23:49 +07:00
Yen Chi Hsuan 582be35847
Update coding style after pycodestyle 2.1.0
In pycodestyle 2.1.0, E305 was introduced, which requires two blank
lines after top level declarations, too.

See https://github.com/PyCQA/pycodestyle/issues/400

See also #10689; thanks @stepshal for first mentioning this issue and
initial patches
2016-11-17 19:45:42 +08:00
Sergey M․ 02dc0a36b7
[utils] Introduce base_url 2016-11-02 02:30:18 +07:00
Sergey M․ c6eed6b8c0
[utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
Sergey M․ 3e4185c396
[utils] Use native french month names 2016-09-14 23:59:38 +07:00
Sergey M․ f6717dec8a
[utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
Sergey M․ 6562d34a8c
[utils] Improve mimetype2ext 2016-09-02 22:57:48 +07:00
Yen Chi Hsuan 70852b47ca
[utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
2016-08-20 00:17:26 +08:00
Yen Chi Hsuan e4659b4547
[utils] Correct octal/hexadecimal number detection in js_to_json 2016-08-19 20:37:17 +08:00
Sergey M․ 13585d7682
[utils] Recognize lowercase units in parse_filesize 2016-08-18 23:32:00 +07:00
Remita Amine 5f2c2b7936
[test_utils] add test for option with not str value 2016-08-13 09:54:12 +01:00
Sergey M․ a8795327ca
[utils] Add support TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
Yen Chi Hsuan 7dc2a74e0a
[utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
Yen Chi Hsuan 0b68de3cc1
Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Yen Chi Hsuan 84c237fb8a
[utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
Remita Amine dfaa86b75e
[test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
remitamine 4f3c5e0627
[utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Yen Chi Hsuan 1143535d76
[utils] Add urshift()
Used in IqiyiIE and LeIE
2016-06-26 15:16:49 +08:00
Sergey M․ 46f59e89ea
[utils] Add unified_timestamp 2016-06-25 23:19:18 +07:00
Yen Chi Hsuan 47212f7bcb
[utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
2016-06-16 11:00:54 +08:00
Yen Chi Hsuan 55b2f099c0
[utils] Decode HTML5 entities
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:11:55 +08:00
bzc6p b96f007eeb
Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:39:32 +02:00
Sergey M․ 46bc9b7d7c
[utils] Allow None in remove_{start,end} 2016-05-19 04:31:30 +06:00
Sergey M․ 364cf465dd
[test_utils] PEP 8 2016-05-14 20:46:33 +06:00
Sergey M․ 89ac4a19e6
[utils] Process non-base 10 integers in js_to_json 2016-05-14 20:39:58 +06:00
felix bd1e484448
[utils] js_to_json: various improvements
now JS object literals like { /* " */ 0: ",]\xaa<\/p>", } will be correctly converted to JSON.
2016-05-14 20:12:39 +06:00
Yen Chi Hsuan 778a1ccca7
[utils] Add Œ and œ found in French to ACCENT_CHARS
Fixes #9463
2016-05-12 19:48:48 +08:00
Yen Chi Hsuan dab0daeeb0
[utils,compat] Move struct_pack and struct_unpack to compat.py 2016-05-10 14:51:38 +08:00
Adam Thalhammer 31c4448f6e Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:25:12 +10:00
Adam Thalhammer 79a2e94e79 Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:21:39 +10:00
Sergey M b6c0d4f431 Merge pull request #9110 from remitamine/parse_duration
[utils] imporove parse_duration to handle more formats
2016-04-21 22:53:16 +07:00
remitamine acaff49575 [utils] imporove parse_duration to handle more formats 2016-04-21 16:34:54 +01:00
Jaime Marquínez Ferrándiz eb9c3edd5e [test/utils] Add test for date_from_str 2016-04-09 22:40:05 +02:00
Yen Chi Hsuan 81f36eba88 [test/test_utils] Update for escape_url change (again) 2016-03-23 23:23:26 +08:00
Yen Chi Hsuan 2d60465e44 [test/test_utils] Update for escape_url change 2016-03-23 23:20:28 +08:00
Jaime Marquínez Ferrándiz 782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 2016-03-19 11:44:49 +01:00
Sergey M․ c5229f3926 [utils] PEP 8 2016-03-16 21:50:04 +06:00
remitamine 83548824c2 Merge pull request #8092 from bpfoley/twitter-thumbnail
[utils] Add extract_attributes for extracting html tag attributes
2016-03-16 13:16:27 +01:00
Sergey M․ fb47597b09 [bbc] Generalize unit table lookup and add parse_count 2016-03-13 16:27:20 +06:00
remitamine 3201a67f61 [test/test_utils] add more tests for update_url_query 2016-03-03 19:18:57 +01:00
remitamine fb640d0a3d [test/test_utils] add tests for update_url_query 2016-03-03 18:40:05 +01:00
Brian Foley 8bb56eeeea [utils] Add extract_attributes for extracting html tag attributes
This is much more robust than just using regexps, and handles all
the common scenarios, such as empty/no values, repeated attributes,
entity decoding, mixed case names, and the different possible value
quoting schemes.
2016-03-03 10:11:37 +00:00
Yen Chi Hsuan 5eb6bdced4 [utils] Multiple changes to base_n()
1. Renamed to encode_base_n()
2. Allow tables longer than 62 characters
3. Raise ValueError instead of AssertionError for invalid input data
4. Return the first character in the table instead of '0' for number 0
5. Add tests
2016-02-27 03:22:52 +08:00
Sergey M․ f160785c5c [utils] Remove AM/PM from unified_strdate patterns 2016-02-25 00:52:49 +06:00