data-prep-toolkit-transforms-lang1

Wheel Details

Project: data-prep-toolkit-transforms-lang1
Version: 0.2.2
Filename: data_prep_toolkit_transforms_lang1-0.2.2-py3-none-any.whl
Size: 72037
MD5: 2b547e18dc3941b4fc5949eafa84faab
SHA256: 580d3b6442931ab5f089dab98fac807258a3a66365566146c8d90e8aef0a8fbb
Uploaded: 2024-09-29 12:14:47 +0000
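
The size and SHA256 values listed above can be used to check a downloaded copy of the wheel before installing it. The following is a minimal sketch, not part of the package; the function name `verify_file` and the local path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path

# Values copied from the wheel details listed above.
EXPECTED_SHA256 = "580d3b6442931ab5f089dab98fac807258a3a66365566146c8d90e8aef0a8fbb"
EXPECTED_SIZE = 72037


def verify_file(path, expected_sha256, expected_size):
    """Return True if the file at `path` has the expected size and SHA256 hex digest."""
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if path.stat().st_size != expected_size:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_sha256.lower()
```

Assuming the wheel was saved to the current directory, a call would look like `verify_file("data_prep_toolkit_transforms_lang1-0.2.2-py3-none-any.whl", EXPECTED_SHA256, EXPECTED_SIZE)`.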

dist-info

METADATA

Metadata-Version: 2.1
Name: data_prep_toolkit_transforms_lang1
Version: 0.2.2
Summary: Data Preparation Toolkit Transforms
Author-Email: Maroun Touma <touma[at]us.ibm.com>
License: Apache-2.0
Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.12,>=3.10
Requires-Dist: data-prep-toolkit (>=0.2.1)
Requires-Dist: duckdb (>=0.10.1)
Requires-Dist: fasttext (==0.9.2)
Requires-Dist: langcodes (==3.3.0)
Requires-Dist: huggingface-hub (<1.0.0,>=0.21.4)
Requires-Dist: numpy (==1.26.4)
Requires-Dist: mmh3 (>=4.1.0)
Requires-Dist: xxhash (==3.4.1)
Requires-Dist: tqdm (==4.66.3)
Requires-Dist: scipy (==1.12.0)
Requires-Dist: sentence-transformers (>=3.0.1)
Requires-Dist: trafilatura (==1.12.0)
Requires-Dist: transformers (==4.38.2)
Description-Content-Type: text/markdown
[Description omitted; length: 2439 characters]
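
METADATA uses an email-header-style layout, so the standard library's `email` parser can read it. The sketch below separates exactly pinned requirements (`==`) from ranged ones, which helps when judging how tightly this wheel constrains an environment. The `SAMPLE` text is an excerpt of the fields shown above, in the form this page displays them; the helper name `split_requirements` is hypothetical.

```python
from email import message_from_string

# Excerpt of the METADATA fields listed above (not the full file).
SAMPLE = """\
Metadata-Version: 2.1
Name: data_prep_toolkit_transforms_lang1
Version: 0.2.2
Requires-Python: <3.12,>=3.10
Requires-Dist: data-prep-toolkit (>=0.2.1)
Requires-Dist: fasttext (==0.9.2)
Requires-Dist: numpy (==1.26.4)
"""


def split_requirements(metadata_text):
    """Return (name, pinned, ranged) parsed from METADATA-style text."""
    message = message_from_string(metadata_text)
    pinned, ranged = [], []
    # Requires-Dist may repeat, so get_all() is needed rather than message[...].
    for requirement in message.get_all("Requires-Dist", []):
        (pinned if "==" in requirement else ranged).append(requirement)
    return message["Name"], pinned, ranged


name, pinned, ranged = split_requirements(SAMPLE)
print(name)
print("pinned:", pinned)
print("ranged:", ranged)
```

In the full dependency list above, most entries are exact pins (for example `numpy`, `scipy`, and `transformers`), so installing this wheel next to packages that need other versions of those libraries may cause resolver conflicts.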

WHEEL

Wheel-Version: 1.0
Generator: setuptools (75.1.0)
Root-Is-Purelib: true
Tag: py3-none-any
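
The `py3-none-any` tag also appears in the wheel filename, which follows the pattern distribution-version(-build)-python-abi-platform. As a rough illustration, the sketch below splits a filename into those parts; `WheelName` and `parse_wheel_filename` are hypothetical names, and the code assumes a well-formed filename.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


print(parse_wheel_filename("data_prep_toolkit_transforms_lang1-0.2.2-py3-none-any.whl"))
```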

RECORD

Path Digest Size
cc_net_prepro.py sha256=B9lt_2ZFVsORqXHdepNQ3B-wd3CT_9wNrC41Qcedtg0 4929
doc_Gopher_statistics.py sha256=KxZpWjXKX5l9FkQnyLzwDb3fmdG-8awoGbKWnleFtPI 6530
doc_c4_statistics.py sha256=gZBossXPPswmJ1sujXMNcV7giup8VKZy9OYbnIMLLUM 6048
doc_id_local.py sha256=FdKnmtPVeCnaooZkxOzpi7R6b8lHKtD16AE3U--7wQU 2389
doc_id_local_python.py sha256=5clxS6OVatTMk2IjAAskmHirjqjRQNVeo8hud_Q5rcQ 2328
doc_id_transform_base.py sha256=qKzBLkatF79MoVan9RXEhviJyXRgn0FhaPOuQe4rNbs 6697
doc_id_transform_python.py sha256=iT6Ub4MKGE62IC-FwYnqKNQcPXKH3wuMKDkgw0nONDY 4801
doc_quality_local.py sha256=NgzYtfZuixRb674upFaqG2av2NDCup0ZXuhmx11M22E 1729
doc_quality_local_python.py sha256=G0kCMYy7Xv3RpI1SIOlEECK2WvooH6pCW9TNPgmTk5s 2398
doc_quality_transform.py sha256=g3JAHZMXkx77I3IT6sgUFLwUnHjTlj-dl_QVyyP7O0g 10872
doc_quality_transform_python.py sha256=5zBVvG_xHIXa-iuu6kgJ7kAJNM7eJPMC4Beo5oI-hrM 1720
doc_quality_utils.py sha256=n8vHmiLLAtO45_7S8lnseH4sUmyPrhBNIPuUEOhCqfI 2264
ededup_local.py sha256=Z8TjQ9c_ONspihEd9_Xv3TgQlHsRyOAZ0gfNV96D6wU 2027
ededup_local_python.py sha256=QgdYIsL05bEh_eHrv-8mCv4x-DR8NvrdOdCmTMbfRcY 1915
ededup_local_python_incremental.py sha256=ujumN2zto-JLtfWaealiH5ILMqJgKCYAYWLFXxEso44 2085
ededup_transform_base.py sha256=STehVD7DL1b3vmx-IQY3tEGtN1LYTcII2VPVhvNNl5w 9147
ededup_transform_python.py sha256=08KIvJK3psfQrTD3IlAGxvRIGaiAIFNF3vbGD63sOZs 6128
filter_local.py sha256=p6yuEnaxUgceU8IzExb34sQNXMv56o3VSYAfBMmrigI 2189
filter_local_python.py sha256=b7UwEG_AvPMkH6bppqG4hZToM3giFq2p08i4zqR8XpI 2388
filter_test_support.py sha256=zV1ZzP0fIAVXgV-bKfGBCTO3nLs45U8NOS-7e0hWGdY 5684
filter_transform.py sha256=TFzfNnWceUhxsRp89oQHH-qKz97wh_Y1xUAks8xZEUk 8260
filter_transform_python.py sha256=7xeCdI5gdri0JKT_jxg1BPBAnSRqMt3AoZ0fmuHZ8WY 1326
html2parquet_local.py sha256=XLCqDATiN6ZaeX-lCKyJMzo0YOdcW-EGjhOFhCaK9nE 1588
html2parquet_local_python.py sha256=e20v22EDtq9Z7hhxXg76cc6U0Pim435rvwf1q9S5id8 1946
html2parquet_transform.py sha256=49nbOC_ORA0R4lHFV3jX1SBBC35mKxe_Eiqams0oWZw 6132
html2parquet_transform_python.py sha256=AL6r-Gj9amRsVIqoyB6UsZXbCHGZPyv5NDE5RFGOMUQ 955
lang_id_local.py sha256=wX69bfpLo28VLLgBzXJWQMP9dkAyWZCPW766iu5qphE 1929
lang_id_local_python.py sha256=PhDhwXbuKNDaIvvBS-3G-oTh-SGc0W0O_Z-Q8E5P7BU 2314
lang_id_transform.py sha256=Y9XBp3RQJrH1lcsZIuRa7fGn6Jlz_TwM18XRw6gUvrE 6679
lang_id_transform_python.py sha256=i9VnbkbOLrc5BjNUji_7hM0v2bJ5-XGflMvBm0QnM2w 1758
lang_models.py sha256=ZniqCjUwVKiL5DSY--Z7NfFcMEf0cWHUGyR745i9Hl8 1891
nlp.py sha256=yzujQyKYG8PUT3cTmwhoJAgioGL4sLZsr2dn2uqkWkA 1875
resize_local.py sha256=VgwxoStQu_CuYdAkh0UuuwkMEFsztO_cmyaM85EHppE 1688
resize_local_python.py sha256=_MdQljbaUrWc_eeFWWkvG_le6SX58bLGYF4ZZIN-rVI 1869
resize_transform.py sha256=QUxt7DOfL-v4Lm_Qie05Ml_0TUcU8dgotIhmqTf6A2w 8946
resize_transform_python.py sha256=uAZtKGsLRWZoyJ-QLTyFCEE8-IlRr3-cGQH1zghT_f0 1496
text_encoder_local.py sha256=qs73_LTeBLtZXzB1aqv-xAlr0ysxtzmAguaTpGvRK4Q 1907
text_encoder_local_python.py sha256=fUXC2OfMFv5WLhMKw8PIDyiCxhGqCPXHSK0vei2SPyQ 1895
text_encoder_transform.py sha256=pWekXVfQ7Spq3hWiP27hJGnOMaHD2HDwb33kVNKxOFc 5146
text_encoder_transform_python.py sha256=mTNWldGvTwcGbydhNhcFWca5w8FHB0Lx7vgZQbqbSG4 1729
tokenization_local_long_doc_python.py sha256=kt6F1wRvWkp449npFpLhb3FJNZBjNU6qNAfeQROsv-I 2148
tokenization_local_python.py sha256=XQT69Odcw7fKhnU8C9fSl1Cqu0KQe9DOo-zMAfKiCqA 1681
tokenization_s3_long_doc_python.py sha256=n7iaMVZ6CjOvVXlLuGOvpvXiRcDKg8xbzfRbK0cMMAU 2149
tokenization_transform.py sha256=GkjfgxwEO_poPa1vsVJ2ghGNjTki3Jz155Pfq_8Z0iQ 10702
tokenization_transform_python.py sha256=J7E79ZAHIBwuLt65WHuVgQdIug8DGchX0fBQwwrKVrI 1158
tokenization_utils.py sha256=uJuozmA3hIGQHIFol3w_hT6Dw5J3dMu71or9bK0Pk6c 5475
data_prep_toolkit_transforms_lang1-0.2.2.dist-info/METADATA sha256=OHhZDidxsT5r8yeiyjThCCMfUbfNhKuAUzTET92QXzs 3230
data_prep_toolkit_transforms_lang1-0.2.2.dist-info/WHEEL sha256=GV9aMThwP_4oNCtvEC2ec3qUYutgWeAzklro_0m4WJQ 91
data_prep_toolkit_transforms_lang1-0.2.2.dist-info/top_level.txt sha256=uKWJ6GD7YES2GJljP7dHaT4OoIxp0IT2Hzb0uT5o31g 979
data_prep_toolkit_transforms_lang1-0.2.2.dist-info/RECORD
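
Each digest in the listing above is a SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed, and the RECORD entry for RECORD itself carries no digest. Inside the wheel, RECORD is stored as CSV (path, digest, size) rather than the space-separated form shown here. A minimal sketch of re-checking a wheel against its own RECORD follows; the function names are hypothetical.

```python
import base64
import csv
import hashlib
import zipfile


def record_digest(data):
    """Return the RECORD-style digest string for a bytes payload."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def check_wheel_record(wheel):
    """Return the paths whose digest or size disagrees with RECORD.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    mismatches = []
    with zipfile.ZipFile(wheel) as archive:
        record_name = next(
            name for name in archive.namelist() if name.endswith(".dist-info/RECORD")
        )
        record_text = archive.read(record_name).decode("utf-8")
        for row in csv.reader(record_text.splitlines()):
            if not row:
                continue
            path, digest, size = row
            if not digest:
                # Entries without a digest (RECORD itself) cannot be checked.
                continue
            data = archive.read(path)
            if record_digest(data) != digest or len(data) != int(size):
                mismatches.append(path)
    return mismatches
```

An empty list from `check_wheel_record(...)` means every hashed entry matched. This only shows that the archive is internally consistent; it does not replace comparing the wheel's own SHA256 against a trusted source.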

top_level.txt

cc_net_prepro
doc_Gopher_statistics
doc_c4_statistics
doc_id_local
doc_id_local_python
doc_id_transform_base
doc_id_transform_python
doc_quality_local
doc_quality_local_python
doc_quality_transform
doc_quality_transform_python
doc_quality_utils
ededup_local
ededup_local_python
ededup_local_python_incremental
ededup_transform_base
ededup_transform_python
filter_local
filter_local_python
filter_test_support
filter_transform
filter_transform_python
html2parquet_local
html2parquet_local_python
html2parquet_transform
html2parquet_transform_python
lang_id_local
lang_id_local_python
lang_id_transform
lang_id_transform_python
lang_models
nlp
resize_local
resize_local_python
resize_transform
resize_transform_python
text_encoder_local
text_encoder_local_python
text_encoder_transform
text_encoder_transform_python
tokenization_local_long_doc_python
tokenization_local_python
tokenization_s3_long_doc_python
tokenization_transform
tokenization_transform_python
tokenization_utils
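
Every name in top_level.txt is installed as a flat top-level module rather than under a shared package, so short names such as `nlp` or `filter_transform` could shadow, or be shadowed by, other modules on the import path. The sketch below is one way to see which of these names resolve in the current environment without importing them; the helper name `unresolved_modules` is hypothetical, and the short list is a sample of the names above.

```python
from importlib.util import find_spec


def unresolved_modules(names):
    """Return the module names that the import system cannot locate."""
    # For top-level names, find_spec() locates the module without executing it.
    return [name for name in names if find_spec(name) is None]


# A few of the names listed in top_level.txt above.
SAMPLE_NAMES = ["lang_id_transform", "nlp", "tokenization_utils"]
print(unresolved_modules(SAMPLE_NAMES))
```

To see whether a resolved name actually comes from this wheel, inspecting `find_spec(name).origin` shows the file that would be imported.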