datatrove

0.3.0 datatrove-0.3.0-py3-none-any.whl
0.0.1.dev0 datatrove-0.0.1.dev0-py3-none-any.whl

Wheel Details

Project: datatrove
Version: 0.0.1.dev0
Filename: datatrove-0.0.1.dev0-py3-none-any.whl
Size: 2305 bytes
MD5: cd4c7e52de3e2b8159c09ffedd84e499
SHA256: 6ee4652f936197d3ad3e6a831827dfb788fd3b1d2cad31434caec39743f5cf63
Uploaded: 2023-12-06 12:11:12 +0000
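
The SHA256 value listed above can be used to check a downloaded copy of the wheel. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `verify_wheel` are illustrative and not part of datatrove or any packaging tool, and the local file path is whatever location the wheel was saved to.

```python
import hashlib

# Digest copied from the "SHA256" field in the wheel details above.
EXPECTED_SHA256 = "6ee4652f936197d3ad3e6a831827dfb788fd3b1d2cad31434caec39743f5cf63"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` hashes to the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_wheel("datatrove-0.0.1.dev0-py3-none-any.whl")` on a local download should return `True` only if the file is byte-for-byte identical to the one described here. pip performs a comparable check when a requirement is pinned with `--hash`.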

dist-info

METADATA

Metadata-Version: 2.1
Name: datatrove
Version: 0.0.1.dev0
Summary: HuggingFace library to process and filter large amounts of webdata
Author: HuggingFace Inc.
Author-Email: guilherme[at]huggingface.co
Home-Page: https://github.com/huggingface/datatrove
License: Apache 2.0
Keywords: data machine learning processing
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7.0
Requires-Dist: boto3 (==1.28.78)
Requires-Dist: cchardet (==2.1.7)
Requires-Dist: inscriptis (==2.3.2)
Requires-Dist: loguru (==0.7.0)
Requires-Dist: multiprocess (==0.70.14)
Requires-Dist: nltk (==3.8.1)
Requires-Dist: numpy (==1.25.0)
Requires-Dist: python-magic (==0.4.27)
Requires-Dist: trafilatura (==1.6.1)
Requires-Dist: warcio (==1.7.4)
Requires-Dist: zstandard (==0.21.0)
Requires-Dist: pyarrow (==12.0.1)
Requires-Dist: tokenizers (==0.13.3)
Requires-Dist: tldextract (==3.4.4)
Requires-Dist: pandas (==2.0.3)
Requires-Dist: backoff (==2.2.1)
Requires-Dist: fsspec (==2023.9.2)
Requires-Dist: humanize (==4.8.0)
Requires-Dist: rich (==13.7.0)
Requires-Dist: black (~=23.1); extra == "dev"
Requires-Dist: pre-commit (>=3.3.3); extra == "dev"
Requires-Dist: pytest (>=7.2.0); extra == "dev"
Requires-Dist: pytest-timeout; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: ruff (<=0.0.259,>=0.0.241); extra == "dev"
Provides-Extra: dev
Description-Content-Type: text/markdown
[Description omitted; length: 221 characters]
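
METADATA uses an email-style "Key: value" header layout, so the standard library's email parser can read it. The sketch below is one possible way to separate the pinned runtime requirements from those gated behind the `dev` extra; the sample text is a shortened copy of the listing above, and `split_requirements` is a hypothetical helper. The marker check is deliberately crude, and a real tool would more likely use a dedicated requirements parser.

```python
from email.parser import Parser

# A few lines copied from the METADATA listing above.
METADATA_SAMPLE = """\
Metadata-Version: 2.1
Name: datatrove
Version: 0.0.1.dev0
Requires-Python: >=3.7.0
Requires-Dist: boto3 (==1.28.78)
Requires-Dist: loguru (==0.7.0)
Requires-Dist: black (~=23.1); extra == "dev"
Requires-Dist: pytest-timeout; extra == "dev"
Provides-Extra: dev
"""


def split_requirements(metadata_text):
    """Split Requires-Dist entries into (core, extras_only) lists.

    Assumption: any entry whose environment marker mentions "extra"
    is treated as optional. This is a simplification.
    """
    msg = Parser().parsestr(metadata_text)
    core, extras_only = [], []
    for requirement in msg.get_all("Requires-Dist", []):
        spec, _, marker = requirement.partition(";")
        if "extra" in marker:
            extras_only.append(spec.strip())
        else:
            core.append(spec.strip())
    return core, extras_only


core, extras_only = split_requirements(METADATA_SAMPLE)
```

Here `core` holds the entries installed by a plain `pip install datatrove`, while `extras_only` holds those pulled in only by `pip install "datatrove[dev]"`.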

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.42.0)
Root-Is-Purelib: true
Tag: py3-none-any
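
The `Tag` value mirrors the last three components of the wheel filename, which follows the pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough illustration, the hypothetical helper below splits a filename into those parts; it assumes the name is already well-formed and does no further validation.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("datatrove-0.0.1.dev0-py3-none-any.whl")
```

For this wheel the tag reads as: any Python 3 interpreter (`py3`), no ABI requirement (`none`), any platform (`any`), which is consistent with `Root-Is-Purelib: true`.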

RECORD

Path Digest Size
datatrove/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
datatrove-0.0.1.dev0.dist-info/METADATA sha256=hFC2b6_igbxHHM8OgDBpOAvyfxHZOnEXGwhTIl9CI04 2094
datatrove-0.0.1.dev0.dist-info/WHEEL sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM 92
datatrove-0.0.1.dev0.dist-info/entry_points.txt sha256=ZkGdrhsBGEDsb3xugIYN5pZEmO1WcgsF41w0cDmWECE 283
datatrove-0.0.1.dev0.dist-info/top_level.txt sha256=EA6CAg36D1YzT-oXcFMx0ImOvfCRDHZasZHwxcjFXSQ 10
datatrove-0.0.1.dev0.dist-info/RECORD
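
Each digest in RECORD is the file's SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed. The short sketch below (the function name `record_digest` is illustrative) reproduces the entry for `datatrove/__init__.py`, which has size 0 and therefore carries the digest of empty input.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return a RECORD-style digest: 'sha256=' + unpadded URL-safe base64."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


# datatrove/__init__.py is listed with size 0, i.e. empty content.
empty_file_digest = record_digest(b"")
```

The RECORD file itself is listed without a digest or size because it cannot contain its own hash.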

top_level.txt

datatrove

entry_points.txt

[console_scripts]
check_dataset = datatrove.tools.check_dataset:main
failed_logs = datatrove.tools.failed_logs:main
inspect_data = datatrove.tools.inspect_data:main
launch_pickled_pipeline = datatrove.tools.launch_pickled_pipeline:main
merge_stats = datatrove.tools.merge_stats:main
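
Each entry maps a command name to a `module:attribute` target, and installers generate a script that imports the module and calls the attribute. The sketch below shows one way to read such a file with the standard library's `configparser`, assuming the entries sit under a `[console_scripts]` section as is usual for command-line entry points. `parse_entry_points` is a hypothetical helper, and the sample text is a shortened copy of the listing above.

```python
from configparser import ConfigParser

ENTRY_POINTS_SAMPLE = """\
[console_scripts]
check_dataset = datatrove.tools.check_dataset:main
merge_stats = datatrove.tools.merge_stats:main
"""


def parse_entry_points(text):
    """Map each entry-point name to a (module, attribute) pair.

    Entries from all sections are merged into one dict for brevity.
    """
    parser = ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep entry-point names case-sensitive
    parser.read_string(text)
    targets = {}
    for section in parser.sections():
        for name, target in parser.items(section):
            module, _, attribute = target.partition(":")
            targets[name] = (module.strip(), attribute.strip())
    return targets


scripts = parse_entry_points(ENTRY_POINTS_SAMPLE)
```

Note that RECORD lists only an empty `datatrove/__init__.py`, so this particular wheel does not appear to ship the `datatrove.tools` modules these entries point to; the generated commands would presumably fail at import time.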