huggingface/datasets: 1.7.0
Abstract
Dataset Changes
- New: NLU evaluation data #2238 (@dkajtoch)
- New: Add SLR32, SLR52, SLR53 to OpenSLR #2241, #2311 (@cahya-wirawan)
- New: Bbaw egyptian #2290 (@phiwi)
- New: GooAQ #2260 (@bhavitvyamalik)
- New: SubjQA #2302 (@lewtun)
- New: Ascent KB #2341, #2349 (@phongnt570)
- New: HLGD #2325 (@tingofurro)
- New: Qasper #2346 (@cceyda)
- New: ConvQuestions benchmark #2372 (@PhilippChr)
- Update: Wikihow - Clarify how to load wikihow #2240 (@albertvillanova)
- Update: multi_woz_v22 - update checksum #2281 (@lhoestq)
- Update: OSCAR - Set encoding in OSCAR dataset #2321 (@albertvillanova)
- Update: XTREME - Enable auto-download for PAN-X / Wikiann domain in XTREME #2326 (@lewtun)
- Update: GEM - the DART file checksums in GEM #2334 (@yjernite)
- Update: web_science - fixed download link #2338 (@bhavitvyamalik)
- Update: SNLI, MNLI - README updated for SNLI, MNLI #2364 (@bhavitvyamalik)
- Update: conll2003 - correct labels #2369 (@philschmid)
- Update: offenseval_dravidian - update citations #2385 (@adeepH)
- Update: ai2_arc - Add dataset tags #2405 (@OyvindTafjord)
- Fix: newsph_nli - test data added, dataset_infos updated #2263 (@bhavitvyamalik)
- Fix: hyperpartisan news detection - Remove getchildren #2367 (@ghomasHudson)
- Fix: indic_glue - Fix number of classes in indic_glue sna.bn dataset #2397 (@albertvillanova)
- Fix: head_qa - Fix keys #2408 (@lhoestq)

Dataset Features
- Implement Dataset add_item #1870 (@albertvillanova)
- Implement Dataset add_column #2145 (@albertvillanova)
- Implement Dataset to JSON #2248, #2352 (@albertvillanova)
- Add rename_columnS method #2312 (@SBrandeis)
- Add desc to tqdm in Dataset.map() #2374 (@bhavitvyamalik)
- Add env variable HF_MAX_IN_MEMORY_DATASET_SIZE_IN_BYTES #2399, #2409 (@albertvillanova)

Metric Changes
- New: CUAD metrics #2273 (@bhavitvyamalik)
- New: Matthews/Pearson/Spearman correlation metrics #2328 (@lhoestq)
- Update: CER - Docs, CER above 1 #2342 (@borisdayma)

General improvements and bug fixes
- Update black #2265 (@lhoestq)
- Fix incorrect update_metadata_with_features calls in ArrowDataset #2258 (@mariosasko)
- Faster map w/ input_columns & faster slicing w/ Iterable keys #2246 (@norabelrose)
- Don't use pyarrow 4.0.0 since it segfaults when casting a sliced ListArray of integers #2268 (@lhoestq)
- Fix query table with iterable #2269 (@lhoestq)
- Perform minor refactoring: use config #2253 (@albertvillanova)
- Update format, fingerprint and indices after add_item #2254 (@lhoestq)
- Always update metadata in arrow schema #2274 (@lhoestq)
- Make tests run faster #2266 (@lhoestq)
- Fix metadata validation with config names #2286 (@lhoestq)
- Fixed typo seperate->separate #2292 (@laksh9950)
- Allow collaborators to self-assign issues #2289 (@albertvillanova)
- Mapping in the distributed setting #2298 (@TevenLeScao)
- Fix conda release #2309 (@lhoestq)
- Fix incorrect version specification for the pyarrow package #2317 (@cemilcengiz)
- Set default name in init_dynamic_modules #2320 (@albertvillanova)
- Fix duplicate keys #2333 (@lhoestq)
- Add note about indices mapping in save_to_disk docstring #2332 (@lhoestq)
- Metadata validation #2107 (@theo-m)
- Add Validation For README #2121 (@gchhablani)
- Fix overflow issue in interpolation search #2336 (@mariosasko)
- Datasets cli improvements #2315 (@mariosasko)
- Add key type and duplicates verification with hashing #2245 (@NikhilBartwal)
- More consistent copy logic #2340 (@mariosasko)
- Update README validation rules #2353 (@gchhablani)
- Normalized TOCs and titles in data cards #2355 (@yjernite)
- Simplify faiss index save #2351 (@Guitaricet)
- Allow "other-X" in licenses #2368 (@gchhablani)
- Improve ReadInstruction logic and update docs #2261 (@mariosasko)
- Disallow duplicate keys in yaml tags #2379 (@lhoestq)
- Maintain YAML structure reading from README #2380 (@bhavitvyamalik)
- Add dataset card title #2381 (@bhavitvyamalik)
- Add tests for dataset cards #2348 (@gchhablani)
- Improve example in rounding docs #2383 (@mariosasko)
- Paperswithcode dataset mapping #2404 (@julien-c)
- Free datasets with cache file in temp dir on exit #2403 (@mariosasko)

Experimental and work in progress: Cast a dataset for specific tasks
- Task casting for text classification & question answering #2255 (@SBrandeis)
- Add check for task templates on dataset load #2390 (@lewtun)
- Add args description to DatasetInfo #2384 (@lewtun)
- Improve task api code quality #2376 (@mariosasko)
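
Note on "CER above 1" (#2342): a character error rate greater than 1 can look like a bug, but it follows from the usual definition (character-level edit distance divided by the reference length) whenever the prediction contains many inserted characters. The sketch below is a plain-Python illustration of that definition for a single reference/prediction pair. It is not the library's implementation, and the function names `edit_distance` and `cer` are hypothetical names chosen for this example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,                       # delete a reference character
                    curr[j - 1] + 1,                   # insert a hypothesis character
                    prev[j - 1] + substitution_cost,   # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one pair: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
    print(cer("ab", "abcdef"))       # 4 insertions over 2 reference characters
```

In the second call the prediction needs four insertions against a two-character reference, so the ratio is 2.0, which is the kind of value the updated documentation refers to.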
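
Note on the Matthews correlation metric (#2328): for readers unfamiliar with it, the Matthews correlation coefficient summarizes a binary confusion matrix as a single value between -1 and 1. The sketch below works through the standard binary formula in plain Python. It is an illustration only, not the code added in the release, and `binary_mcc` is a hypothetical name introduced for this example. It assumes labels are encoded as 0 and 1.

```python
import math
from typing import Sequence


def binary_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention used in this sketch: report 0 when a row or column
        # of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(binary_mcc(labels, [1, 1, 0, 0]))  # perfect agreement
    print(binary_mcc(labels, [0, 0, 1, 1]))  # perfect disagreement
    print(binary_mcc(labels, [1, 0, 1, 0]))  # no correlation
```

The three calls cover the two extremes and the uncorrelated case, giving 1.0, -1.0 and 0.0 respectively.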