dolma


1.0.14.post1
dolma-1.0.14.post1-cp39-none-win_amd64.whl
dolma-1.0.14.post1-cp39-none-win32.whl
dolma-1.0.14.post1-cp39-cp39-macosx_10_12_x86_64.whl
dolma-1.0.14.post1-cp39-cp39-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-cp39-cp39-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-cp39-cp39-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-cp39-cp39-macosx_11_0_arm64.whl
dolma-1.0.14.post1-cp38-none-win_amd64.whl
dolma-1.0.14.post1-cp38-none-win32.whl
dolma-1.0.14.post1-cp38-cp38-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-cp38-cp38-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-cp38-cp38-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-cp312-none-win_amd64.whl
dolma-1.0.14.post1-cp312-none-win32.whl
dolma-1.0.14.post1-cp312-cp312-macosx_10_12_x86_64.whl
dolma-1.0.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-cp312-cp312-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-cp312-cp312-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-cp312-cp312-macosx_11_0_arm64.whl
dolma-1.0.14.post1-cp311-none-win_amd64.whl
dolma-1.0.14.post1-cp311-none-win32.whl
dolma-1.0.14.post1-cp311-cp311-macosx_10_12_x86_64.whl
dolma-1.0.14.post1-cp311-cp311-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-cp311-cp311-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-cp311-cp311-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-cp311-cp311-macosx_11_0_arm64.whl
dolma-1.0.14.post1-cp310-none-win_amd64.whl
dolma-1.0.14.post1-cp310-none-win32.whl
dolma-1.0.14.post1-cp310-cp310-macosx_10_12_x86_64.whl
dolma-1.0.14.post1-cp310-cp310-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-cp310-cp310-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-cp310-cp310-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-cp310-cp310-macosx_11_0_arm64.whl
dolma-1.0.14.post1-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-pp39-pypy39_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-pp38-pypy38_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-pp38-pypy38_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-pp38-pypy38_pp73-manylinux_2_28_aarch64.whl
dolma-1.0.14.post1-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.14.post1-pp310-pypy310_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.14.post1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
1.0.1
dolma-1.0.1-cp39-none-win_amd64.whl
dolma-1.0.1-cp39-none-win32.whl
dolma-1.0.1-cp39-cp39-manylinux_2_28_x86_64.whl
dolma-1.0.1-cp39-cp39-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp39-cp39-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp38-none-win_amd64.whl
dolma-1.0.1-cp38-none-win32.whl
dolma-1.0.1-cp38-cp38-manylinux_2_28_x86_64.whl
dolma-1.0.1-cp38-cp38-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp38-cp38-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp313-cp313-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp313-cp313-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp312-none-win_amd64.whl
dolma-1.0.1-cp312-none-win32.whl
dolma-1.0.1-cp312-cp312-macosx_10_12_x86_64.whl
dolma-1.0.1-cp312-cp312-manylinux_2_28_x86_64.whl
dolma-1.0.1-cp312-cp312-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp312-cp312-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp312-cp312-macosx_11_0_arm64.whl
dolma-1.0.1-cp311-none-win_amd64.whl
dolma-1.0.1-cp311-none-win32.whl
dolma-1.0.1-cp311-cp311-macosx_10_12_x86_64.whl
dolma-1.0.1-cp311-cp311-manylinux_2_28_x86_64.whl
dolma-1.0.1-cp311-cp311-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp311-cp311-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp311-cp311-macosx_11_0_arm64.whl
dolma-1.0.1-cp310-none-win_amd64.whl
dolma-1.0.1-cp310-none-win32.whl
dolma-1.0.1-cp310-cp310-macosx_10_12_x86_64.whl
dolma-1.0.1-cp310-cp310-manylinux_2_28_x86_64.whl
dolma-1.0.1-cp310-cp310-manylinux_2_28_armv7l.whl
dolma-1.0.1-cp310-cp310-manylinux_2_28_aarch64.whl
dolma-1.0.1-cp310-cp310-macosx_11_0_arm64.whl
dolma-1.0.1-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.1-pp39-pypy39_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.1-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl
dolma-1.0.1-pp38-pypy38_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.1-pp38-pypy38_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.1-pp38-pypy38_pp73-manylinux_2_28_aarch64.whl
dolma-1.0.1-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
dolma-1.0.1-pp310-pypy310_pp73-manylinux_2_28_armv7l.whl
dolma-1.0.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
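Each filename above follows the wheel naming convention from PEP 427: `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. Some examples:

- `cp310` means CPython 3.10.
- `pp39` with `pypy39_pp73` means PyPy implementing Python 3.9 with the PyPy 7.3 ABI.
- `manylinux_2_28` means Linux with glibc 2.28 or newer (PEP 600).

The following is a minimal stdlib sketch that splits such a name into its fields. `parse_wheel_filename` and `WheelName` are hypothetical helpers written for illustration. For real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` also normalizes names and validates versions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class WheelName:
    """Fields of a wheel filename (hypothetical helper type)."""

    name: str
    version: str
    build: Optional[str]
    python_tags: FrozenSet[str]
    abi_tags: FrozenSet[str]
    platform_tags: FrozenSet[str]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split '{name}-{version}[-{build}]-{py}-{abi}-{plat}.whl' into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The spec escapes '-' in names to '_', so '-' only separates fields.
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, version, py, abi, plat = parts
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # Each tag field may be a '.'-separated compressed set, e.g. 'py2.py3'.
    return WheelName(
        name=name,
        version=version,
        build=build,
        python_tags=frozenset(py.split(".")),
        abi_tags=frozenset(abi.split(".")),
        platform_tags=frozenset(plat.split(".")),
    )


if __name__ == "__main__":
    for fn in (
        "dolma-1.0.1-cp39-none-win_amd64.whl",
        "dolma-1.0.14.post1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl",
    ):
        print(parse_wheel_filename(fn))
```

You can filter the list by `python_tags` this way. For example, doing so shows that 1.0.1 ships `cp313` wheels only for the two ARM Linux platforms (`armv7l` and `aarch64`).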

Wheel Details

Project: dolma
Version: 1.0.1
Filename: dolma-1.0.1-cp39-none-win_amd64.whl
Size: 4807948 bytes (about 4.8 MB)
MD5: 5bb96dc8b3a27f1ab820f412f2d83f22
SHA256: c35d0e286658e708fe4df77298392b9413c61112f7915e44e3894960da6d8c41
Uploaded: 2024-02-07 18:50:39 +0000
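The published SHA256 can be used to check a downloaded copy of `dolma-1.0.1-cp39-none-win_amd64.whl` before installing it. Below is a minimal sketch using only `hashlib`. `sha256_of_file` and `verify_download` are hypothetical helpers, and the expected digest and size are copied from the details above. The file is read in chunks, so a large wheel never needs to fit in memory. MD5 is also listed, but SHA256 is the digest worth relying on.

```python
import hashlib
import os
from typing import Optional

# Values copied from the Wheel Details above.
EXPECTED_SHA256 = "c35d0e286658e708fe4df77298392b9413c61112f7915e44e3894960da6d8c41"
EXPECTED_SIZE = 4807948  # bytes


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str, expected_size: Optional[int] = None) -> bool:
    """Return True if the file matches the published size (if given) and SHA256."""
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False  # cheap size check first; skips hashing an obviously wrong file
    return sha256_of_file(path) == expected_sha256.lower()


if __name__ == "__main__":
    wheel = "dolma-1.0.1-cp39-none-win_amd64.whl"  # assumes it was downloaded here
    if os.path.exists(wheel):
        print("OK" if verify_download(wheel, EXPECTED_SHA256, EXPECTED_SIZE) else "MISMATCH")
```

pip can enforce the same check at install time through `--hash=sha256:...` entries in a requirements file, combined with `--require-hashes`.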

dist-info

METADATA

Metadata-Version: 2.1
Name: dolma
Version: 1.0.1
Summary: Data filters
Author-Email: Allen Institute for Artificial Intelligence <contact[at]allenai.org>, Luca Soldaini <luca[at]soldaini.net>, Kyle Lo <kylel[at]allenai.org>, Rodney Kinney <rodneyk[at]allenai.org>, Aakanksha Naik <aakankshan[at]allenai.org>, Abhilasha Ravichander <abhilashar[at]allenai.org>, Akshita Bhagia <akshitab[at]allenai.org>, Dirk Groeneveld <dirkg[at]allenai.org>, Dustin Schwenk <dustins[at]allenai.org>, Ian Magnusson <ianm[at]allenai.org>, Khyathi Chandu <khyathic[at]allenai.org>
Maintainer-Email: Allen Institute for Artificial Intelligence <contact[at]allenai.org>
Project-Url: Homepage, https://github.com/allenai/dolma
License: Apache-2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Classifier: Typing :: Typed
Requires-Python: >=3.8
Requires-Dist: anyascii (>=0.3.2)
Requires-Dist: blingfire (==0.1.8)
Requires-Dist: boto3 (>=1.28)
Requires-Dist: cached-path (>=1.5.1)
Requires-Dist: fasttext-wheel (==0.9.2)
Requires-Dist: fsspec (>=2023.6.0)
Requires-Dist: msgspec (>=0.14.2)
Requires-Dist: nltk (==3.8.1)
Requires-Dist: omegaconf (>=2.3.0)
Requires-Dist: LTpycld2 (==0.42)
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: s3fs (>=2023.6.0)
Requires-Dist: smart-open
Requires-Dist: tokenizers (<1.0.0,>=0.15.0)
Requires-Dist: tqdm
Requires-Dist: uniseg
Requires-Dist: numpy
Requires-Dist: necessary (>=0.4.3)
Requires-Dist: langdetect (>=1.0.9)
Requires-Dist: charset-normalizer (>=3.2.0)
Requires-Dist: black (>=22.6.0); extra == "dev"
Requires-Dist: flake8 (>=5.0); extra == "dev"
Requires-Dist: flake8-pyi (>=22.8.1); extra == "dev"
Requires-Dist: Flake8-pyproject (>=1.1.0); extra == "dev"
Requires-Dist: ipdb (>=0.13.0); extra == "dev"
Requires-Dist: ipython (>=8.4.0); extra == "dev"
Requires-Dist: isort (>=5.10.1); extra == "dev"
Requires-Dist: mypy (>=0.971); extra == "dev"
Requires-Dist: pytest (>=5.2); extra == "dev"
Requires-Dist: detect-secrets (==1.4.0); extra == "code"
Requires-Dist: beautifulsoup4 (>=4); extra == "code"
Requires-Dist: pygments; extra == "code"
Requires-Dist: regex; extra == "code"
Requires-Dist: presidio_analyzer (==2.2.32); extra == "pii"
Requires-Dist: regex; extra == "pii"
Requires-Dist: dolma[dev]; extra == "all"
Requires-Dist: dolma[code]; extra == "all"
Requires-Dist: dolma[pii]; extra == "all"
Provides-Extra: dev
Provides-Extra: code
Provides-Extra: pii
Provides-Extra: all
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
License-File: LICENSE
[Description omitted; length: 3495 characters]
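METADATA uses an RFC 822-style header format, so the standard library's `email` parser can read it. The sketch below groups the `Requires-Dist` entries by the extra that enables them. This makes it easy to see, for example, that the `pii` extra adds `presidio_analyzer` and `regex`. `requirements_by_extra` is a hypothetical helper, and its regular expressions are a simplification: they only recognize `extra == "..."` markers and plain project names. `packaging.requirements.Requirement` is the robust way to parse these PEP 508 strings.

```python
from __future__ import annotations

import email
import re
from collections import defaultdict

# Simplified patterns; real Requires-Dist values are full PEP 508 strings.
_EXTRA_RE = re.compile(r"""\bextra\s*==\s*["']([^"']+)["']""")
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirements_by_extra(metadata_text: str) -> dict[str | None, list[str]]:
    """Group Requires-Dist project names by extra (None = always installed)."""
    msg = email.message_from_string(metadata_text)  # METADATA is RFC 822-style
    groups: dict[str | None, list[str]] = defaultdict(list)
    for requirement in msg.get_all("Requires-Dist") or []:
        # Environment markers, including 'extra == ...', follow the first ';'.
        _, _, marker = requirement.partition(";")
        extra_match = _EXTRA_RE.search(marker)
        extra = extra_match.group(1) if extra_match else None
        name_match = _NAME_RE.match(requirement)
        groups[extra].append(name_match.group(1) if name_match else requirement.strip())
    return dict(groups)
```

As the metadata states, `pip install "dolma[pii]"` installs the core requirements plus the `pii` group. The `all` extra simply refers back to `dolma[dev]`, `dolma[code]`, and `dolma[pii]`.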

WHEEL

Wheel-Version: 1.0
Generator: maturin (1.4.0)
Root-Is-Purelib: false
Tag: cp39-none-win_amd64
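The WHEEL file records how the archive should be installed. Per PEP 427, `Root-Is-Purelib: false` means the archive root goes into the platform-specific `platlib` directory rather than `purelib`. That fits a wheel that ships a compiled extension such as `dolma.cp39-win_amd64.pyd`. A `Tag` value may also be a compressed tag set such as `py2.py3-none-any`, which expands to one tag per combination. The sketch below reads these two fields with the standard library. `expand_tag`, `wheel_info`, and `install_dir` are hypothetical helpers.

```python
import email
import itertools
import sysconfig
from typing import List, Tuple

# The WHEEL file shown above, verbatim.
WHEEL_TEXT = """\
Wheel-Version: 1.0
Generator: maturin (1.4.0)
Root-Is-Purelib: false
Tag: cp39-none-win_amd64
"""


def expand_tag(tag: str) -> List[str]:
    """Expand a compressed tag set like 'py2.py3-none-any' into single tags."""
    py, abi, plat = tag.split("-")
    combos = itertools.product(py.split("."), abi.split("."), plat.split("."))
    return ["-".join(combo) for combo in combos]


def wheel_info(wheel_text: str) -> Tuple[bool, List[str]]:
    """Return (root_is_purelib, expanded tags) from a WHEEL file's contents."""
    msg = email.message_from_string(wheel_text)
    purelib = (msg.get("Root-Is-Purelib") or "").strip().lower() == "true"
    tags = [t for raw in (msg.get_all("Tag") or []) for t in expand_tag(raw.strip())]
    return purelib, tags


def install_dir(wheel_text: str) -> str:
    """Directory the archive root would land in for the running interpreter."""
    purelib, _ = wheel_info(wheel_text)
    return sysconfig.get_paths()["purelib" if purelib else "platlib"]
```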

RECORD

Path Digest Size (bytes)
dolma-1.0.1.dist-info/METADATA sha256=Fa7OiQH5vPhssfjFcZGziLqLsEn8I85KPkqBhO_rDk4 6373
dolma-1.0.1.dist-info/WHEEL sha256=aUq0YpgMqG5VWoJyiSBR74qbEJt2FULFr1Rdxybsy1A 94
dolma-1.0.1.dist-info/entry_points.txt sha256=ujDqnT7r2W8ndsvrIIkFajfDuLyv3VmZwwBQ4YjE3fM 48
dolma-1.0.1.dist-info/license_files/LICENSE sha256=HrhfyXIkWY2tGFK11kg7vPCqhgh5DcxleloqdhrpyMY 11558
dolma/cli/analyzer.py sha256=KMoHqtg3aG2alym-UZOJ6vhmO49e_p-_PaoW9Id_TtY 3115
dolma/cli/deduper.py sha256=DHzBHMRRqGfiWfryfvuf0u3OuYFDsbHSUMeBpfXFFhk 9597
dolma/cli/mixer.py sha256=frmzWj8teVzVjF0xbCiR_Adcu0Sqyys3Xw670i0uF0Q 6972
dolma/cli/resolvers.py sha256=vsDlIoMdchh1UGWDWo4SGfSX4usLV4-nUCPYoTazJOs 840
dolma/cli/shared.py sha256=vPgIJ6Ikd-6x5ST7dxnWiDbXvT3KK-4GvGhqS8RkhW8 1321
dolma/cli/tagger.py sha256=0E5EraN2y8qDzzm5hgTg8dA1SN8JstiQF9K1vse-Its 6368
dolma/cli/tokenizer.py sha256=YZyudXzfs0-zmLYhgL9W1iyemRY2pRBgQSyxC3XwPKY 8249
dolma/cli/__init__.py sha256=qEK0aAS35Q5jga9sNGXYidW47H5bOIMAj9_dTz9Z-a8 6863
dolma/cli/__main__.py sha256=LpMujeI2JkU7TPHnyycI11XDxsFnjZNCW9591ZmSW4Q 2992
dolma/core/analyzer.py sha256=bEBQnUI-Wtflvc5izGws99ysl0JcC-jNXbSjotKj2qo 11256
dolma/core/binning.py sha256=lmsiCEs7fRgw8j0MjMeOUdJGFuUMzP7PQFdSnwgGHqw 11447
dolma/core/data_types.py sha256=H0gAHaeq6bVBuhO2nY0ZmthduSgIr2RDkwgEIZ8BqXo 8470
dolma/core/errors.py sha256=uHFdOIEVqka7SBUqpPzpg7Q4HYPeuJIDLvF0IiJA0VM 505
dolma/core/ft_dataset.py sha256=MwRmAzXnxmgeKqg27AUZT3i7Osve9dlcMNlTxQBKk10 6605
dolma/core/ft_tagger.py sha256=rx06zjLS_d4zCfvJ5E1ShO1g01t6ElTuvy4McgvliWE 5714
dolma/core/loggers.py sha256=5HtbDmNuQ3LXu2iyEEG4jaIvv7xe2OywKpU1Oi_ZMho 588
dolma/core/parallel.py sha256=dHLwb5xUUYkEEAmJ0VnJBEZ2DytfTWGtlje2aqbGdD4 17469
dolma/core/paths.py sha256=TrTpGq1A0bpAKixT2egNs02XRgh7go7DKVABa63ZlKo 9374
dolma/core/registry.py sha256=wCmkPW1Q12kujUfoU_QJG5jyd_OOP9L23uyYXIF3BWU 1958
dolma/core/runtime.py sha256=Qm14JTUs2APTGoZmt6rqh-y0alQ1nAE8YoAqWM9xthY 19675
dolma/core/taggers.py sha256=hQ0EDIasL_DJgPYCclWXX9GEQoQ4MqncR8NK9ZSnruI 2072
dolma/core/trainer.py sha256=Dq68dy_WBUQQS6vSKS0XZ87CBO3GGqLqUsoaV21UHSQ 49
dolma/core/utils.py sha256=OumOy-kFY8G48OtTcO5nLDFMWnRtgpAAQn_z4nRITfs 5251
dolma/core/vizualizer.py sha256=hrNqgO3-aq1g0GBPC2GHsWPPFg9yDVczcWb2Zt4ww-A 9764
dolma/core/__init__.py sha256=PxQ2G89gXVbL72BbneTGblCi-QAoB64TtRt41NnmI2E 230
dolma/data/ext_to_lang_mapping.json sha256=GDfONldIZpRmUwzyQ6raYbB8405TF9mRmJlnD31fTEE 19477
dolma/data/naughty_words_en.txt sha256=sFPpdy2XnolVqyZ_jvVyfDdfpId0cXqFKUmTbnIG2aU 4180
dolma/py.typed sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
dolma/taggers/c4.py sha256=DJF1N20JPvsAcWf7EAm4EPhDI8gYFTL_kYexWc3SNRU 4857
dolma/taggers/code/code_taggers.py sha256=o7Fvdb3APPnNSZuHxKl-itbl8Fy3oQWRz6b_55hI6K8 9904
dolma/taggers/code/starcoder.py sha256=otOe4RkB6rbjq7Ii0sTQSYKZVVijZodFJUs-F7bVaB0 4313
dolma/taggers/code/utils.py sha256=LI7RJVQP18fVM5OEtLYR7PTYhvAznIWAWsdRBEDby1Q 2212
dolma/taggers/code/__init__.py sha256=3lP_OOvIU5iMjHwjjMOy4OgXemp4pCx_HVSmtGiol0E 324
dolma/taggers/gopher.py sha256=nITf_A9OKaMAmCrkmiUyWTnWfQz75aFw5KEUS-pNmEQ 7873
dolma/taggers/jigsaw.py sha256=_FhLnJL-JlQhtKZeVENemW-F0HCu02kVJK6yWkSfv0M 1943
dolma/taggers/language.py sha256=3J1Xc2ADvdbQUn-LyAe-F42TI8ZAk_9rNdCZ5FJktD8 9058
dolma/taggers/length.py sha256=wA9xszFZOCCaaClIy0EonSE6zE4oQKhn0gsZ4RqRjPA 5814
dolma/taggers/pii.py sha256=SIQyulmpVEWXZnbqzrl10asZaTo6mdugDoBgrNh0sug 11002
dolma/taggers/punctuation.py sha256=Z4L6hWCWxsoCyJI8g-fGeXG1CNPBhOoPSAiqeLI-Kt4 1267
dolma/taggers/repetitions/repetitions_taggers.py sha256=XlZsLuFtqcBoQUGtpTRKjoQ0i7PmaqSRpHz63hQZo5M 6674
dolma/taggers/repetitions/utils.py sha256=hr0tDuGo32ZnTK_Myb1qy1ra_jYOVRGpyGbWwbHbvLI 5225
dolma/taggers/repetitions/__init__.py sha256=iE-Q__6IPoOJSRDEdwFc5mYAW8ztvafHbWPbTOMbM6Y 329
dolma/taggers/sampling.py sha256=nWY112ZvpogukfQw0rKGcLf38p2gpQOAsl1PRcFmo2Q 764
dolma/taggers/tokenizers.py sha256=B7xpBZYv94WFMkbshLqfM8xJW9EBwJZtzH2pmmWBoSc 1078
dolma/taggers/__init__.py sha256=VEtP6S88YPv1h0VW0UuG0zzo2iu3-D_Mqr85YI2H2uY 172
dolma/tokenizer/data_types.py sha256=NGNAFDJT12E8f9Z3OVzDfNiN6IZ5_SG8nfRDrioCnak 1264
dolma/tokenizer/executor.py sha256=yN2wKK9v9HtqX8Z4AKjcEEGdAPGumOnhoGTSD0XR1x8 11705
dolma/tokenizer/memmap_writer.py sha256=2NvL4WKXDbKveHdaxlOrQwdSlz7w6M4JSH0WwqJvHuc 8796
dolma/tokenizer/tokenizer.py sha256=OYtrIo0ecLlQrtmCZEgGCwzDsd6dg1IL51EK5kqcdhM 13074
dolma/tokenizer/__init__.py sha256=-gA-Nhsg_tif5fCVcGyow4LIo3CQZ3uN36bXTfIuPyc 245
dolma/tokenizer/__main__.py sha256=fgal1qkOIyy15cbSpREunPjjsJJ2koTBsNiq12gFoy4 1046
dolma/__init__.py sha256=5AuQ8kyX6hpqFwWklBIBHtymstdjMuG2RLvHK9p7WXY 1299
dolma/dolma.cp39-win_amd64.pyd sha256=9NDlXaNlU5N1OIPNCT_WwmOIHM-4yVpjzvUYoLAjpm8 12542976
dolma-1.0.1.dist-info/RECORD
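In the actual RECORD file, each row is a CSV line of the form `path,hash,size`. Hashes are written as `sha256=` followed by the URL-safe base64 digest with `=` padding removed. That is why the empty file `dolma/py.typed` shows `47DEQpj8...`, the well-known SHA256 of zero bytes. RECORD lists itself with empty hash and size fields. Below is a minimal sketch for checking a wheel against its RECORD with `zipfile` and `csv`. It assumes every entry uses sha256, as in this listing. `record_digest` and `verify_record` are hypothetical helpers.

```python
import base64
import csv
import hashlib
import zipfile
from typing import List


def record_digest(data: bytes) -> str:
    """RECORD-style hash: 'sha256=' + URL-safe base64 with '=' padding stripped."""
    raw = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def verify_record(wheel_path: str) -> List[str]:
    """Return the paths whose content does not match the wheel's RECORD."""
    problems: List[str] = []
    with zipfile.ZipFile(wheel_path) as zf:
        names = set(zf.namelist())
        record_name = next((n for n in names if n.endswith(".dist-info/RECORD")), None)
        if record_name is None:
            raise ValueError(f"no RECORD found in {wheel_path!r}")
        rows = csv.reader(zf.read(record_name).decode("utf-8").splitlines())
        for row in rows:
            if not row:
                continue
            path, digest, size = row
            if not digest:
                continue  # RECORD lists itself with empty hash and size
            if path not in names:
                problems.append(path)
                continue
            data = zf.read(path)
            # Assumes sha256, as every entry in the listing above uses.
            if digest != record_digest(data) or (size and int(size) != len(data)):
                problems.append(path)
    return problems
```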

entry_points.txt

dolma = dolma.cli.__main__:main
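`entry_points.txt` declares a `dolma` console script that calls `main` in `dolma.cli.__main__`. Installers turn this into a small `dolma` launcher that imports that function and exits with its return value. The sketch below shows the `name = module:attr` resolution step with `importlib`. `parse_entry_point` and `load_entry_point` are hypothetical helpers, and they ignore the optional `[extras]` suffix the format allows. For installed distributions, `importlib.metadata.entry_points(group="console_scripts")` (Python 3.10+) returns the same declarations.

```python
import importlib
from typing import Any, Tuple


def parse_entry_point(line: str) -> Tuple[str, str, str]:
    """Split 'name = module:attr' into (name, module, attr)."""
    name, sep, target = line.partition("=")
    module, colon, attr = target.strip().partition(":")
    if not sep or not colon or not name.strip() or not module or not attr.strip():
        raise ValueError(f"malformed entry point: {line!r}")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(line: str) -> Any:
    """Import the module, then walk the (possibly dotted) attribute path."""
    _, module_name, attr = parse_entry_point(line)
    obj: Any = importlib.import_module(module_name)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


# Roughly what an installed `dolma` launcher amounts to:
#     import sys
#     sys.exit(load_entry_point("dolma = dolma.cli.__main__:main")())
```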