dom-tokenizers



Wheel Details

Project: dom-tokenizers
Version: 0.0.17
Filename: dom_tokenizers-0.0.17-py3-none-any.whl
Download: [link]
Size: 28596 bytes
MD5: 030b48548448afd2fbb4df273c441465
SHA256: 1e14b260d25b0823db01aaf66ee08f836d7140a9aba9779ce899079ae6cfde6d
Uploaded: 2024-06-19 20:00:48 +0000
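The digests above make it easy to check a downloaded copy of the wheel before installing it. A minimal verification sketch in Python, assuming the wheel has been saved to the current directory under its original filename:

    import hashlib

    EXPECTED_SHA256 = "1e14b260d25b0823db01aaf66ee08f836d7140a9aba9779ce899079ae6cfde6d"

    # The wheel is small (28596 bytes), so hashing it in one read is fine.
    with open("dom_tokenizers-0.0.17-py3-none-any.whl", "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    assert digest == EXPECTED_SHA256, f"checksum mismatch: {digest}"
    print("sha256 OK")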

dist-info

METADATA

Metadata-Version: 2.1
Name: dom-tokenizers
Version: 0.0.17
Summary: DOM-aware tokenization for 🤗 Hugging Face language models
Author-Email: Gary Benson <gary@gbenson.net>
Project-Url: Homepage, https://github.com/gbenson/dom-tokenizers
Project-Url: Source, https://github.com/gbenson/dom-tokenizers
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.10
Requires-Dist: python-magic
Requires-Dist: tokenizers
Requires-Dist: unidecode
Requires-Dist: build; extra == "dev"
Requires-Dist: datasets; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: flake8-custom-import-rules; extra == "dev"
Requires-Dist: flake8-quotes; extra == "dev"
Requires-Dist: pillow; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: transformers; extra == "dev"
Requires-Dist: datasets; extra == "train"
Requires-Dist: pillow; extra == "train"
Requires-Dist: transformers; extra == "train"
Provides-Extra: dev
Provides-Extra: train
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 4427 characters]
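The two extras group the optional dependencies: "dev" pulls in the build, lint, and test tooling, while "train" pulls in the datasets/pillow/transformers stack needed to train a tokenizer. They map onto pip's bracket syntax, e.g. pip install "dom-tokenizers[train]". Once the package is installed, the same metadata can be read back with the standard library; a small sketch, assuming dom-tokenizers is present in the current environment:

    from importlib.metadata import metadata, requires

    meta = metadata("dom-tokenizers")
    print(meta["Requires-Python"])  # >=3.10

    # Each entry mirrors one Requires-Dist line above,
    # including the 'extra == "..."' markers.
    for req in requires("dom-tokenizers") or []:
        print(req)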

WHEEL

Wheel-Version: 1.0
Generator: setuptools (70.1.0)
Root-Is-Purelib: true
Tag: py3-none-any
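Root-Is-Purelib: true and the py3-none-any tag mark this as a pure-Python wheel: any Python 3 interpreter, any ABI, any platform. The tag can also be decoded programmatically; a sketch assuming the third-party packaging library is available:

    from packaging.utils import parse_wheel_filename

    # Splits a wheel filename into name, version, build tag, and tags.
    name, version, build, tags = parse_wheel_filename(
        "dom_tokenizers-0.0.17-py3-none-any.whl"
    )
    print(name, version)           # dom-tokenizers 0.0.17
    print([str(t) for t in tags])  # ['py3-none-any']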

RECORD

Path Digest Size (bytes)
dom_tokenizers/__init__.py sha256=5hpYkYozXjJH6aCLLCvogRskaUt4KnM4Z8WMnsFR5nk 52
dom_tokenizers/train.py sha256=k4bHTSxS3ILLzSc6ldXdUaGtNUSltxobge2IsBrKCUQ 6108
dom_tokenizers/internal/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
dom_tokenizers/internal/json.py sha256=WbsasHTkl7KE_qBgbFRXNaaSiEt9W8ZZwE_0J1ANpvA 429
dom_tokenizers/internal/jsonl.py sha256=CyBUsBmXAVmfPcsvj1CnodSxfo5m1BmHSYHDsXABjck 1026
dom_tokenizers/internal/transformers.py sha256=Z7lkwJV55Afjyz8mVNdP-AzXKU2H1kSg3hGrLhYl8hM 589
dom_tokenizers/pre_tokenizers/__init__.py sha256=o2HCII8oWjL0s0C81abj8wG61nrCwgOe8hKnYSr1swM 50
dom_tokenizers/pre_tokenizers/base64.py sha256=vDQlgeMBXJ2wFKFbdsfh21E_z0W5RIA0rvlWu5EqwHc 1778
dom_tokenizers/pre_tokenizers/compat_itertools.py sha256=-V0hUQARHfdW0R0FvOrk_-BYDviuE_TZyvKad0Ec-eQ 296
dom_tokenizers/pre_tokenizers/dom_snapshot.py sha256=Nt8MoSmUWBuQ0n9v2EGKzl9jH3PaRbSOwJrk4ui1iDI 4858
dom_tokenizers/pre_tokenizers/html.py sha256=3xxZIfAjd6Ue4jCTbeVOQ6AE4bGFPKuTwTx4Wd1Kp94 322
dom_tokenizers/pre_tokenizers/pre_tokenizer.py sha256=6mUQ504FgACM1I6Vy0yAuLww__KJheDENph4vdM0EAQ 3011
dom_tokenizers/pre_tokenizers/splitter.py sha256=Pl9ZSDO8AvMaJLjFr4DaAmIqe_rdcQKF1avqFqTKA0A 22500
dom_tokenizers/pre_tokenizers/token_buffer.py sha256=4JM8_bPhgGJ1Vy5KAq-8T7AB9E1FADuknlgsl75CsMM 541
dom_tokenizers/scripts/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
dom_tokenizers/scripts/defaults.py sha256=kVqW6wp16CC2fZtIQNpCBB-dZAkNI_r3FtwKvz85rns 215
dom_tokenizers/scripts/diff.py sha256=NR6bdQpb0rxNnKvI4btgIuw7_-EHerCobphJg3Nh5Mo 3585
dom_tokenizers/scripts/dump.py sha256=T3124d8pmJ4ZHBdM-TE7IYrnjmgr0B35yYDO2XDfoRs 1609
dom_tokenizers/scripts/dump_breaking_inputs.py sha256=HM8k1w1CwIfZUCTYx7bAEZ9hFp2pomXvVdCZxOXuh6g 1709
dom_tokenizers/scripts/profile.py sha256=XfKUiJsolU7CTJlIOLUK2s9unhNV2Mrsv6BaOTZDEsc 3613
dom_tokenizers-0.0.17.dist-info/LICENSE sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ 11357
dom_tokenizers-0.0.17.dist-info/METADATA sha256=BMtLNEnU4J3py0OOKFwOPEda-D0mpkLEaHvtzN3YMDA 6190
dom_tokenizers-0.0.17.dist-info/WHEEL sha256=cpQTJ5IWu9CdaPViMhC9YzF8gZuS5-vlfoFihTBC86A 91
dom_tokenizers-0.0.17.dist-info/entry_points.txt sha256=xI70nTxYao_rXLbPuttooQKnH39PMr8vMHnwxIjD0q4 344
dom_tokenizers-0.0.17.dist-info/top_level.txt sha256=X-UAu4PqdfJbKTG9mxU_38NYLpVn4BTBmEVpDsKvI64 15
dom_tokenizers-0.0.17.dist-info/RECORD
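Note that the digests in RECORD are not hex: per the wheel spec (PEP 427, building on PEP 376), each is the urlsafe-base64 encoding of the raw sha256, with trailing '=' padding stripped. A sketch for re-checking a single entry against an unpacked copy of the wheel:

    import base64
    import hashlib

    def record_digest(path):
        # RECORD-style digest: urlsafe base64 of the raw sha256,
        # trailing '=' padding removed.
        with open(path, "rb") as f:
            raw = hashlib.sha256(f.read()).digest()
        return "sha256=" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    # Run from the root of the unpacked wheel:
    print(record_digest("dom_tokenizers/__init__.py"))
    # expected: sha256=5hpYkYozXjJH6aCLLCvogRskaUt4KnM4Z8WMnsFR5nk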

top_level.txt

dom_tokenizers

entry_points.txt

[console_scripts]
diff-tokenizer = dom_tokenizers.scripts.diff:main
dump-breaking-inputs = dom_tokenizers.scripts.dump_breaking_inputs:main
dump-tokenizations = dom_tokenizers.scripts.dump:main
profile-tokenizer = dom_tokenizers.scripts.profile:main
tokenizer-diff = dom_tokenizers.scripts.diff:main
train-tokenizer = dom_tokenizers.train:main
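All six entry points are console scripts; diff-tokenizer and tokenizer-diff are aliases for the same dom_tokenizers.scripts.diff:main. After installation they can be enumerated from Python as well; a sketch assuming Python 3.10+ (matching Requires-Python) and an environment where the package is installed:

    from importlib.metadata import entry_points

    # The group= filter requires Python 3.10+.
    for ep in entry_points(group="console_scripts"):
        if ep.module.split(".")[0] == "dom_tokenizers":
            print(f"{ep.name} -> {ep.value}")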