data-prep-toolkit-transforms

0.2.1.dev3 data_prep_toolkit_transforms-0.2.1.dev3-py3-none-any.whl

Wheel Details

Project: data-prep-toolkit-transforms
Version: 0.2.1.dev3
Filename: data_prep_toolkit_transforms-0.2.1.dev3-py3-none-any.whl
Size: 115220
MD5: 691fb641b9ffce40ec6f80497c2da2b3
SHA256: 877437589aa28645f3a042b71599d076f36cf736b571ba0e682798be2dbd3bbb
Uploaded: 2024-09-11 19:17:07 +0000
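
The SHA256 value above can be used to check a downloaded copy of the wheel. The following is a minimal sketch, assuming the file has already been saved locally; the helper names `sha256_of_file` and `matches_listing` are illustrative and not part of the package.

```python
import hashlib

# Hex digest copied from the "SHA256" field of the listing above.
EXPECTED_SHA256 = "877437589aa28645f3a042b71599d076f36cf736b571ba0e682798be2dbd3bbb"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_listing("data_prep_toolkit_transforms-0.2.1.dev3-py3-none-any.whl")` should return `True` only for a byte-identical copy of the listed file.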

dist-info

METADATA

Metadata-Version: 2.1
Name: data_prep_toolkit_transforms
Version: 0.2.1.dev3
Summary: Data Preparation Toolkit Transforms
Author-Email: Maroun Touma <touma[at]us.ibm.com>
License: Apache-2.0
Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.12,>=3.10
Requires-Dist: data-prep-toolkit (>=0.2.1.dev3)
Requires-Dist: bs4 (==0.0.2)
Requires-Dist: docling-core (==1.2.0)
Requires-Dist: docling (==1.11.0)
Requires-Dist: filetype (<2.0.0,>=1.2.0)
Requires-Dist: quackling (==0.4.0)
Requires-Dist: duckdb (==0.10.1)
Requires-Dist: fasttext (==0.9.2)
Requires-Dist: huggingface-hub (<1.0.0,>=0.21.4)
Requires-Dist: langcodes (==3.3.0)
Requires-Dist: mmh3 (==4.1.0)
Requires-Dist: numpy (==1.26.4)
Requires-Dist: pandas
Requires-Dist: parameterized
Requires-Dist: sentence-transformers (==3.0.1)
Requires-Dist: transformers (==4.38.2)
Requires-Dist: xxhash (==3.4.1)
Requires-Dist: presidio-analyzer (>=2.2.355)
Requires-Dist: presidio-anonymizer (>=2.2.355)
Requires-Dist: flair (>=0.14.0)
Requires-Dist: pandas (>=2.2.2)
Requires-Dist: scancode-toolkit (==32.1.0); platform_system != "Darwin"
Description-Content-Type: text/markdown
[Description omitted; length: 2093 characters]
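
As a rough illustration of two constraints in the metadata above, the sketch below checks `Requires-Python: <3.12,>=3.10` and the `platform_system != "Darwin"` marker on the scancode-toolkit requirement using only the standard library. The function names are hypothetical, and the version check compares only major and minor numbers, so it ignores pre-release handling that a full specifier parser would apply.

```python
import platform
import sys


def python_supported(version_info=sys.version_info):
    """Approximate check for Requires-Python: <3.12,>=3.10."""
    # Compare only (major, minor); assumption: pre-releases are not treated specially.
    return (3, 10) <= tuple(version_info[:2]) < (3, 12)


def scancode_required(system=None):
    """Evaluate the marker platform_system != "Darwin" for a given system name."""
    # platform.system() returns "Darwin" on macOS, "Linux" on Linux, "Windows" on Windows.
    system = platform.system() if system is None else system
    return system != "Darwin"
```

Under these assumptions, `python_supported((3, 11, 4))` is true and `scancode_required("Darwin")` is false, which matches the listing: scancode-toolkit is not requested on macOS.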

WHEEL

Wheel-Version: 1.0
Generator: setuptools (74.1.2)
Root-Is-Purelib: true
Tag: py3-none-any

RECORD

Path Digest Size
cc_net_prepro.py sha256=B9lt_2ZFVsORqXHdepNQ3B-wd3CT_9wNrC41Qcedtg0 4929
code2parquet_local.py sha256=lpc4oesF2F-6cq3kOlUoteNNytRLhI3fUcVlKXCFP9M 2140
code2parquet_local_python.py sha256=X0IyCfonb4G1X2rjuhT0bpdYRz1exORg_UVRG9NyI9s 2326
code2parquet_s3_python.py sha256=3lV3rWxY_iCceCuKUoguztDsNp--nOU2E21Jad3dje0 2330
code2parquet_transform.py sha256=Rk1wKMMDvmE7ORqg59AVFCniDlkvNKGiWt5TBJaaZNQ 10512
code2parquet_transform_python.py sha256=dnaECaTrQtqsJjDHfXTD4AjTTzEFL0rSyPD_kqCpAQA 1539
code_quality_local.py sha256=cVjqwZwDDLs6Dee72XNbBRxExWvskwGSv14RPUR9WS8 1495
code_quality_local_python.py sha256=jBKu4a0cv0LwsVvq74eJ67Zvc8cZdajptlGHcSfgsB4 1933
code_quality_transform.py sha256=tBGjqBnym9cqbx16Tu5e0nivQOmAct2heSZe-Om70HQ 11283
code_quality_transform_python.py sha256=lcGBN2Oky9XuoY4kC5Huof5zHg09jmHCLE7zp_XpP3c 1140
doc_Gopher_statistics.py sha256=KxZpWjXKX5l9FkQnyLzwDb3fmdG-8awoGbKWnleFtPI 6530
doc_c4_statistics.py sha256=gZBossXPPswmJ1sujXMNcV7giup8VKZy9OYbnIMLLUM 6048
doc_chunk_chunkers.py sha256=9VsEXCuk5Nvs1QLpSfdCnt0XLS21Cru6q36G7eWhQJo 2695
doc_chunk_local.py sha256=AtuaTktcJ4D2qca_0NCkRCHzBVYcQ0cVI00K03YU7U8 1471
doc_chunk_local_python.py sha256=HTIpnhqql61imSrNFgjzNKf56hoGEaVFHS_ZyZ0GXy8 2148
doc_chunk_transform.py sha256=7qE14srWEHVf164lAxp6cY95GUmIdAsM4Qda_n-nT1s 8234
doc_chunk_transform_python.py sha256=LPSxi9yTeSR04AnSgPjwsHfuPBqLGmHLS5HuEFaNHes 1730
doc_id_local.py sha256=FdKnmtPVeCnaooZkxOzpi7R6b8lHKtD16AE3U--7wQU 2389
doc_id_local_python.py sha256=5clxS6OVatTMk2IjAAskmHirjqjRQNVeo8hud_Q5rcQ 2328
doc_id_transform_base.py sha256=qKzBLkatF79MoVan9RXEhviJyXRgn0FhaPOuQe4rNbs 6697
doc_id_transform_python.py sha256=iT6Ub4MKGE62IC-FwYnqKNQcPXKH3wuMKDkgw0nONDY 4801
doc_quality_local.py sha256=NgzYtfZuixRb674upFaqG2av2NDCup0ZXuhmx11M22E 1729
doc_quality_local_python.py sha256=G0kCMYy7Xv3RpI1SIOlEECK2WvooH6pCW9TNPgmTk5s 2398
doc_quality_transform.py sha256=g3JAHZMXkx77I3IT6sgUFLwUnHjTlj-dl_QVyyP7O0g 10872
doc_quality_transform_python.py sha256=5zBVvG_xHIXa-iuu6kgJ7kAJNM7eJPMC4Beo5oI-hrM 1720
doc_quality_utils.py sha256=n8vHmiLLAtO45_7S8lnseH4sUmyPrhBNIPuUEOhCqfI 2264
ededup_local.py sha256=Z8TjQ9c_ONspihEd9_Xv3TgQlHsRyOAZ0gfNV96D6wU 2027
ededup_local_python.py sha256=QgdYIsL05bEh_eHrv-8mCv4x-DR8NvrdOdCmTMbfRcY 1915
ededup_local_python_incremental.py sha256=ujumN2zto-JLtfWaealiH5ILMqJgKCYAYWLFXxEso44 2085
ededup_transform_base.py sha256=STehVD7DL1b3vmx-IQY3tEGtN1LYTcII2VPVhvNNl5w 9147
ededup_transform_python.py sha256=08KIvJK3psfQrTD3IlAGxvRIGaiAIFNF3vbGD63sOZs 6128
filter_local.py sha256=p6yuEnaxUgceU8IzExb34sQNXMv56o3VSYAfBMmrigI 2189
filter_local_python.py sha256=b7UwEG_AvPMkH6bppqG4hZToM3giFq2p08i4zqR8XpI 2388
filter_test_support.py sha256=zV1ZzP0fIAVXgV-bKfGBCTO3nLs45U8NOS-7e0hWGdY 5684
filter_transform.py sha256=TFzfNnWceUhxsRp89oQHH-qKz97wh_Y1xUAks8xZEUk 8260
filter_transform_python.py sha256=7xeCdI5gdri0JKT_jxg1BPBAnSRqMt3AoZ0fmuHZ8WY 1326
flair_recognizer.py sha256=r5NeTRFyalrUkkhUzzRmMjKu-hJgGDTh5Fc2Yj1kUDs 5218
header_cleanser_local.py sha256=8DwEfmjKXLHoUC2B18S6leZwcNhOkheFwyweR1H0IL4 1944
header_cleanser_local_python.py sha256=lmn57Cws6A2bKhgM9ucatJ1X-7frDGvNhww6Ol_uCxs 2077
header_cleanser_test_support.py sha256=EO8mk8rSzPNYVjkW7WqUaCZCO_CJqEQ_gidZYDURLh4 3460
header_cleanser_transform.py sha256=e8aLlSY9nPdv2unpXdZQOFhmvUPowDtGRODJF5O_tFI 7967
header_cleanser_transform_python.py sha256=N08Goq8rsot2lsFNtK7XjdERbSEYUmeL4KzTqKAJkM8 1390
lang_id_local.py sha256=wX69bfpLo28VLLgBzXJWQMP9dkAyWZCPW766iu5qphE 1929
lang_id_local_python.py sha256=PhDhwXbuKNDaIvvBS-3G-oTh-SGc0W0O_Z-Q8E5P7BU 2314
lang_id_transform.py sha256=Y9XBp3RQJrH1lcsZIuRa7fGn6Jlz_TwM18XRw6gUvrE 6679
lang_id_transform_python.py sha256=i9VnbkbOLrc5BjNUji_7hM0v2bJ5-XGflMvBm0QnM2w 1758
lang_models.py sha256=ZniqCjUwVKiL5DSY--Z7NfFcMEf0cWHUGyR745i9Hl8 1891
nlp.py sha256=yzujQyKYG8PUT3cTmwhoJAgioGL4sLZsr2dn2uqkWkA 1875
pdf2parquet_local.py sha256=9kDOJFhVbA4pvvGXY2zXRdz-GtCOQh_ztO7gwXw79-w 1609
pdf2parquet_local_python.py sha256=uioZvdLp6U2TQ--qpGX-FeDDdTLUyd8mrcf8IjB4bgg 2214
pdf2parquet_transform.py sha256=zlZVVPkSRXtbagd6oFIR0-X2TK2Y08tb2rFBhNjPlBU 13566
pdf2parquet_transform_python.py sha256=xSp6c4QwUQFg9kHNtICIbqxn-C8lYykogB1Xn9MTnj8 1610
pii_analyzer.py sha256=eOSaES5hJj2mIWuzhZZxzcMpO8VEHLv_2-DG__odQxo 2714
pii_anonymizer.py sha256=AlrXuyWg_RWzERPAMTsL3N8F_8EWFPmhv7KaPUs7GAU 929
pii_redactor_local.py sha256=FFaCOcGO9O56zCC0kOQso8Bw8t6te4BMbNPeaY6TTMA 1508
pii_redactor_local_python.py sha256=Pzar75Fo1RBFlC3F-ZQAoyPXbeW-KfGohXPVjjo4SB0 1694
pii_redactor_transform.py sha256=Q02Nk2PZOhi1_QPGyipGSrsjVph4MQbxA0sAYkmE4O4 6141
pii_redactor_transform_python.py sha256=Yo-VwNPozcv304537YYOpSLKCUyBxODTHsVtCMK1WMs 1467
proglang_select_local.py sha256=nwuE7htJpbkzLNcQpdH6Z3XElNh6Lsv5K1fWFZ9IHnY 2056
proglang_select_local_python.py sha256=BB33ejzoZeNxa02djjVHj1LhU3V8zv1sax1obfjm2kU 2351
proglang_select_transform.py sha256=MzoZqIAEvnHL7fN1TFRMJAgG3aBUcM-UTc90rDpwzB4 7675
proglang_select_transform_python.py sha256=wv1wAEcZf02b_frbaAJ0XrJC_V_sRBISkn2M7ej_-IQ 1247
resize_local.py sha256=VgwxoStQu_CuYdAkh0UuuwkMEFsztO_cmyaM85EHppE 1688
resize_local_python.py sha256=_MdQljbaUrWc_eeFWWkvG_le6SX58bLGYF4ZZIN-rVI 1869
resize_transform.py sha256=QUxt7DOfL-v4Lm_Qie05Ml_0TUcU8dgotIhmqTf6A2w 8946
resize_transform_python.py sha256=uAZtKGsLRWZoyJ-QLTyFCEE8-IlRr3-cGQH1zghT_f0 1496
text_encoder_local.py sha256=qs73_LTeBLtZXzB1aqv-xAlr0ysxtzmAguaTpGvRK4Q 1907
text_encoder_local_python.py sha256=fUXC2OfMFv5WLhMKw8PIDyiCxhGqCPXHSK0vei2SPyQ 1895
text_encoder_transform.py sha256=pWekXVfQ7Spq3hWiP27hJGnOMaHD2HDwb33kVNKxOFc 5146
text_encoder_transform_python.py sha256=mTNWldGvTwcGbydhNhcFWca5w8FHB0Lx7vgZQbqbSG4 1729
tokenization_local_long_doc_python.py sha256=kt6F1wRvWkp449npFpLhb3FJNZBjNU6qNAfeQROsv-I 2148
tokenization_local_python.py sha256=XQT69Odcw7fKhnU8C9fSl1Cqu0KQe9DOo-zMAfKiCqA 1681
tokenization_s3_long_doc_python.py sha256=n7iaMVZ6CjOvVXlLuGOvpvXiRcDKg8xbzfRbK0cMMAU 2149
tokenization_transform.py sha256=GkjfgxwEO_poPa1vsVJ2ghGNjTki3Jz155Pfq_8Z0iQ 10702
tokenization_transform_python.py sha256=J7E79ZAHIBwuLt65WHuVgQdIug8DGchX0fBQwwrKVrI 1158
tokenization_utils.py sha256=uJuozmA3hIGQHIFol3w_hT6Dw5J3dMu71or9bK0Pk6c 5475
data_prep_toolkit_transforms-0.2.1.dev3.dist-info/METADATA sha256=oVz7kFFzTllG_aLWscmT2Q9_DwsI-bWccvoJzzV36L0 3222
data_prep_toolkit_transforms-0.2.1.dev3.dist-info/WHEEL sha256=cVxcB9AmuTcXqmwrtPhNK88dr7IR_b6qagTj0UvIEbY 91
data_prep_toolkit_transforms-0.2.1.dev3.dist-info/top_level.txt sha256=O5EV8IKN2kg7d0OMLaErKlXd6odRg42enENi0plXxrM 1691
data_prep_toolkit_transforms-0.2.1.dev3.dist-info/RECORD
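
The digests in the listing above follow the wheel RECORD convention: the SHA-256 hash of the file, encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below recomputes such a digest and compares it with a row as displayed here (space-separated path, digest, size). The actual RECORD file inside a wheel is comma-separated, and the RECORD entry itself carries no digest or size, so `parse_listing_row` is a hypothetical helper for this display format only.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return a RECORD-style digest string for the given file contents."""
    raw = hashlib.sha256(data).digest()
    # URL-safe base64 with the "=" padding stripped, as used in RECORD.
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def parse_listing_row(row: str):
    """Split a 'path digest size' row from the listing into its three fields."""
    # rsplit keeps the path intact even if it were to contain spaces.
    path, digest, size = row.rsplit(" ", 2)
    return path, digest, int(size)


def row_matches(row: str, data: bytes) -> bool:
    """True when both the size and the digest in the row match the contents."""
    _, digest, size = parse_listing_row(row)
    return size == len(data) and digest == record_digest(data)
```

A typical use would be to read a module from the unpacked wheel in binary mode and pass its bytes to `row_matches` together with the corresponding row.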

top_level.txt

cc_net_prepro
code2parquet_local
code2parquet_local_python
code2parquet_s3_python
code2parquet_transform
code2parquet_transform_python
code_quality_local
code_quality_local_python
code_quality_transform
code_quality_transform_python
doc_Gopher_statistics
doc_c4_statistics
doc_chunk_chunkers
doc_chunk_local
doc_chunk_local_python
doc_chunk_transform
doc_chunk_transform_python
doc_id_local
doc_id_local_python
doc_id_transform_base
doc_id_transform_python
doc_quality_local
doc_quality_local_python
doc_quality_transform
doc_quality_transform_python
doc_quality_utils
ededup_local
ededup_local_python
ededup_local_python_incremental
ededup_transform_base
ededup_transform_python
filter_local
filter_local_python
filter_test_support
filter_transform
filter_transform_python
flair_recognizer
header_cleanser_local
header_cleanser_local_python
header_cleanser_test_support
header_cleanser_transform
header_cleanser_transform_python
lang_id_local
lang_id_local_python
lang_id_transform
lang_id_transform_python
lang_models
nlp
pdf2parquet_local
pdf2parquet_local_python
pdf2parquet_transform
pdf2parquet_transform_python
pii_analyzer
pii_anonymizer
pii_redactor_local
pii_redactor_local_python
pii_redactor_transform
pii_redactor_transform_python
proglang_select_local
proglang_select_local_python
proglang_select_transform
proglang_select_transform_python
resize_local
resize_local_python
resize_transform
resize_transform_python
text_encoder_local
text_encoder_local_python
text_encoder_transform
text_encoder_transform_python
tokenization_local_long_doc_python
tokenization_local_python
tokenization_s3_long_doc_python
tokenization_transform
tokenization_transform_python
tokenization_utils
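
Each name in top_level.txt corresponds to a top-level module that the wheel installs. A minimal sketch for checking which of them can be located in the current environment, without importing them, might look like this; `missing_modules` is an illustrative helper, not something the package provides.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path."""
    # find_spec returns None for a top-level name that is not installed.
    return [name for name in names if find_spec(name) is None]
```

For instance, `missing_modules(["pii_analyzer", "lang_models"])` would return an empty list in an environment where the wheel is installed.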