trl-fpo



Wheel Details

Project: trl-fpo
Version: 0.0.15
Filename: trl_fpo-0.0.15-py3-none-any.whl
Size: 296675 bytes
MD5: 03146be5481886f167ff584a867d95da
SHA256: b806938b8de6535fef90ab7c2ee03a4faeaf996a76d1f0076009140515e09dfe
Uploaded: 2025-01-18 07:47:07 +0000
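
A downloaded copy of the wheel can be checked against the size and SHA256 listed above. The following is a minimal sketch using only the standard library; the function names `file_sha256` and `matches_listing` are illustrative and not part of any tool mentioned on this page.

```python
import hashlib
from pathlib import Path

# Values copied from the wheel details listed above.
EXPECTED_SHA256 = "b806938b8de6535fef90ab7c2ee03a4faeaf996a76d1f0076009140515e09dfe"
EXPECTED_SIZE = 296675  # bytes


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """True only if both the file size and the SHA256 match the listed values."""
    p = Path(path)
    return p.stat().st_size == expected_size and file_sha256(p) == expected_sha256
```

Calling `matches_listing("trl_fpo-0.0.15-py3-none-any.whl")` on a local download (the path is an assumption about where the file was saved) should return `True` only for a byte-identical copy. The MD5 value is listed as well, but SHA256 is the stronger check.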

dist-info

METADATA

Metadata-Version: 2.1
Name: trl-fpo
Version: 0.0.15
Summary: Train transformer language models with reinforcement learning.
Author: Rajarshi, Gurpreet, Danush
Author-Email: royrajarshi0123[at]gmail.com
Home-Page: https://github.com/huggingface/trl
License: Apache 2.0
Keywords: ppo,transformers,huggingface,gpt2,language modeling,rlhf
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.7
Requires-Dist: torch (>=1.4.0)
Requires-Dist: transformers (>=4.31.0)
Requires-Dist: numpy (<2.0.0,>=1.18.2)
Requires-Dist: accelerate
Requires-Dist: datasets
Requires-Dist: tyro (>=0.5.11)
Requires-Dist: peft
Requires-Dist: wandb
Requires-Dist: deepspeed
Requires-Dist: stanza
Requires-Dist: nltk
Requires-Dist: scipy
Requires-Dist: wandb; extra == "benchmark"
Requires-Dist: ghapi; extra == "benchmark"
Requires-Dist: openrlbenchmark (==0.2.1a5); extra == "benchmark"
Requires-Dist: requests; extra == "benchmark"
Requires-Dist: deepspeed; extra == "benchmark"
Requires-Dist: deepspeed (>=0.9.5); extra == "deepspeed"
Requires-Dist: parameterized; extra == "dev"
Requires-Dist: peft (>=0.8.0); extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: scikit-learn; extra == "dev"
Requires-Dist: Pillow; extra == "dev"
Requires-Dist: diffusers (>=0.18.0); extra == "dev"
Requires-Dist: deepspeed (>=0.9.5); extra == "dev"
Requires-Dist: wandb; extra == "dev"
Requires-Dist: ghapi; extra == "dev"
Requires-Dist: openrlbenchmark (==0.2.1a5); extra == "dev"
Requires-Dist: requests; extra == "dev"
Requires-Dist: deepspeed; extra == "dev"
Requires-Dist: bitsandbytes (<=0.41.1); extra == "dev"
Requires-Dist: openai (>=1.23.2); extra == "dev"
Requires-Dist: huggingface-hub (>=0.22.2); extra == "dev"
Requires-Dist: llm-blender (>=0.0.2); extra == "dev"
Requires-Dist: diffusers (>=0.18.0); extra == "diffusers"
Requires-Dist: openai (>=1.23.2); extra == "llm-judge"
Requires-Dist: huggingface-hub (>=0.22.2); extra == "llm-judge"
Requires-Dist: llm-blender (>=0.0.2); extra == "llm-judge"
Requires-Dist: peft (>=0.8.0); extra == "peft"
Requires-Dist: bitsandbytes (<=0.41.1); extra == "quantization"
Requires-Dist: parameterized; extra == "test"
Requires-Dist: peft (>=0.8.0); extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-xdist; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: scikit-learn; extra == "test"
Requires-Dist: Pillow; extra == "test"
Provides-Extra: benchmark
Provides-Extra: deepspeed
Provides-Extra: dev
Provides-Extra: diffusers
Provides-Extra: llm_judge
Provides-Extra: peft
Provides-Extra: quantization
Provides-Extra: test
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 9302 characters]
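
The `Requires-Dist` lines above mix unconditional dependencies with ones that apply only when an extra is requested. As a rough illustration, the sketch below groups them by extra using the standard library's header parser. It assumes only the simple `extra == "name"` marker form seen in this listing; the helper name `requirements_by_extra` is hypothetical, and a full marker parser (such as the one in the third-party `packaging` project) would be needed for anything more general.

```python
from collections import defaultdict
from email.parser import Parser

# A few header lines copied from the METADATA listing above.
METADATA_SAMPLE = """\
Metadata-Version: 2.1
Name: trl-fpo
Version: 0.0.15
Requires-Dist: torch (>=1.4.0)
Requires-Dist: accelerate
Requires-Dist: deepspeed (>=0.9.5); extra == "deepspeed"
Requires-Dist: peft (>=0.8.0); extra == "peft"
Requires-Dist: bitsandbytes (<=0.41.1); extra == "quantization"
"""


def requirements_by_extra(metadata_text):
    """Map each extra name (None for unconditional) to its requirement strings."""
    headers = Parser().parsestr(metadata_text, headersonly=True)
    grouped = defaultdict(list)
    for line in headers.get_all("Requires-Dist", []):
        requirement, _, marker = line.partition(";")
        marker = marker.strip()
        extra = None
        # Only the plain `extra == "name"` form is recognised here.
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
        grouped[extra].append(requirement.strip())
    return dict(grouped)


if __name__ == "__main__":
    for extra, reqs in requirements_by_extra(METADATA_SAMPLE).items():
        print(extra or "(always)", reqs)
```

Applied to the full METADATA, the `None` key would hold the twelve unconditional requirements (torch through scipy), and each key such as `"dev"` or `"test"` would hold the requirements gated on that extra.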

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.37.1)
Root-Is-Purelib: true
Tag: py3-none-any
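
The tag `py3-none-any` is the last three dash-separated fields of the wheel filename: Python tag, ABI tag, and platform tag. A small sketch of splitting the filename follows; `WheelTag` and `parse_wheel_filename` are illustrative names, and the parser does not expand compressed tag sets such as `py2.py3`.

```python
from typing import NamedTuple


class WheelTag(NamedTuple):
    python: str    # e.g. "py3": any Python 3 interpreter
    abi: str       # e.g. "none": no ABI dependency
    platform: str  # e.g. "any": no platform dependency


def parse_wheel_filename(filename):
    """Split a wheel filename into (name, version, WheelTag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    name, version = parts[0], parts[1]
    return name, version, WheelTag(*parts[-3:])


if __name__ == "__main__":
    print(parse_wheel_filename("trl_fpo-0.0.15-py3-none-any.whl"))
```

Together with `Root-Is-Purelib: true`, this tag indicates a pure-Python wheel that is not tied to a specific interpreter ABI or operating system.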

RECORD

Path Digest Size
tests/slow/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
tests/slow/test_dpo_slow.py sha256=glhJ8jow5TancfcxbplUqmdr_5D3UwrNm5t_yEjjFL8 7689
tests/slow/test_sft_slow.py sha256=HFWXeo_vefuxY9MUTZeYQISgtPwjBo0hPirsoyogzsE 14645
tests/slow/testing_constants.py sha256=9EEKDsVxlW2WgvK4-kpcdfHci0WA8WmsBDruL4zH4fY 1077
trl/__init__.py sha256=QF-0lI0LAVV3-6NVyh0cHyCviLB_5ujGddUAHfD9R9I 5079
trl/core.py sha256=YUv7ERnrDjm8ZDmFamvqcJ-IUeAho7aScEwW-IwoW4Q 12157
trl/env_utils.py sha256=56zu6hKknW6p7sQoNbEu0Xcb8qgJJN2GSu8gOiBsxb4 1394
trl/import_utils.py sha256=5sz98Z-XMeD-1jgQBALqiVvWyvPrqBRuyyjnhBIOJAA 6330
trl/commands/__init__.py sha256=CL5PGOOXVm1mbUSlNoEpQ2oRKfNIE5KwC-yzToHDehU 1155
trl/commands/cli.py sha256=PRedBq0LGTwK9EpBl2Givu426DdbBSaOmIUDT7n5lXI 2453
trl/commands/cli_utils.py sha256=7One8MonzJeX5QjgrFpPii-bSrMH7Hn1ydUSI0M9jvQ 11451
trl/commands/scripts/alignprop.py sha256=38bfg-sKkUuttuj-Nvj8DUUaagsaETopHVUiQ-w2fOg 4064
trl/commands/scripts/bco.py sha256=lDuQ5CBQWZ-WiNkG_YFK16bmx_AdLjeUJ71-XZ9_Ol4 7898
trl/commands/scripts/chat.py sha256=kj7eWCu2t_CSG89o0d5CrUFv_0pab28_Qw3vNKuSoPw 13229
trl/commands/scripts/cpo.py sha256=2Xmiy5AJQk7wosRwXh73O5501z_TQa9u6NZ8LjiXXAM 4217
trl/commands/scripts/ddpo.py sha256=39rI0qzrP3gtQPAnWitdCTH8pfH4V2e4lS6YkuM3gg8 6500
trl/commands/scripts/dpo.py sha256=VNtdLsVIObTX4hWreQE0nrRJG91f00_8satyA9rxEwc 6297
trl/commands/scripts/dpo_visual.py sha256=tbKo2NYVRneET49Fie3hWSLlxbIhYpg0sCjtAGF3dHM 6757
trl/commands/scripts/kto.py sha256=ecRBSNtrIduqRM18s7XTf-CXEDdwb3IoXnvzfU1GQM8 4171
trl/commands/scripts/online_dpo.py sha256=0iInrfuw3mKjgUDvrPeT-5FS6xmtbXTCo6expJhOgYI 4586
trl/commands/scripts/orpo.py sha256=-RlKmErt1vjFtjivcqKeh8taK-YlqHwSp2qQVsj1SuQ 4372
trl/commands/scripts/ppo.py sha256=6K0JnkeQ4iXo16gYD7Q5AERDwRqiU49ZQBMgmexS1LE 7855
trl/commands/scripts/ppo_multi_adapter.py sha256=YAZ3n3j_1nYu4-DftA56NGtIC1sIOQ245kpmPQ8TUR4 5672
trl/commands/scripts/reward_modeling.py sha256=zQRCmBsfpx0eIj9t0pNmPiq0NC5ss2fsklbXm483Hs8 5067
trl/commands/scripts/sft.py sha256=4-wHRa6M-p6SNjGhSaJ7ZS6cdSnMmWLQb5gt_1E8mCY 4651
trl/commands/scripts/vsft_llava.py sha256=MBbqBPbb1ayeg7fXt-8taB5mWJ65p9DeNaifaQ8w9Kg 5547
trl/commands/scripts/config/default_chat_config.yaml sha256=EUJ3XAK7_YbN3EzlTqBQRov71u0BoQEmSs6-qW39kCM 487
trl/environment/__init__.py sha256=uBxjkEmKJItWVB2I8nv1FmMU12PNVVamuN6sfGmXU6c 390
trl/environment/base_environment.py sha256=H4O2xHvG7wVogdqNan5KpzRGf8O25hbQYSO8xIxd9cY 17569
trl/extras/__init__.py sha256=ajCQ4__JlCJaTy-EHMowYAnRnFPuCrHaxlekGjSsLRU 971
trl/extras/best_of_n_sampler.py sha256=Wq6ylNxpB-OYp3s761cfZWAe3IPpuArt9QzKMsPTleI 5182
trl/extras/dataset_formatting.py sha256=FGJTuEDc8K4y2aEKXPR1rvyv0WdpDCIaL4rGOb8a22M 3646
trl/models/__init__.py sha256=HOolk_HhpjxEWoUwVGzk4hvaHAVVifLFvECKBMy-FXw 2208
trl/models/auxiliary_modules.py sha256=wKrXQL6z7MRroAZsOH5KhvN4o34CoNyejUeLbph1PJQ 3323
trl/models/modeling_base.py sha256=7IQgBx1a957nC0RNXSlzNh33b2dnKDdEhfOBUcAe8SI 28960
trl/models/modeling_sd_base.py sha256=gi4sYz3xRGsrppaHlQnHAWuHuWCl1Aj2ceDUfSKqRaE 42247
trl/models/modeling_value_head.py sha256=qbn9JG6rxBa7SamyGE_wZfiALW4Omedne0SD_pnbnHI 19384
trl/models/sd_utils.py sha256=EhTaUSJNwUcopoK6tcg46VJ027FwdetgYyswiJ0NG_g 5874
trl/models/utils.py sha256=Q2Ck0mw1A3_Xd9a6BGymYyltNPIy__Xmp5N8yZkFdO0 6459
trl/trainer/__init__.py sha256=_mvSFDf3FYzbxDP6EYmVjqHn4K9S6SP3s7FUe6MxtZk 4820
trl/trainer/alignprop_config.py sha256=iSs28udDb1As4HHJCKeC2iKE6lS558Wrk902FAV5pyw 4243
trl/trainer/alignprop_trainer.py sha256=1-KaPoGlYmLtBU1OjxXsfZ9c4Ykh6rgdOFX5MUknxHA 17286
trl/trainer/base.py sha256=51YAcpsjJ8ghVbfhBXcP7BT0_yXnmhj0G_zfZhl3guE 1772
trl/trainer/bco_config.py sha256=s0PzCTLTexi5YJHQ_M6Zk0eBClCrMK4KFSJVrjLV7ec 5294
trl/trainer/bco_trainer.py sha256=PBi-8pvr-jmILeibS8gTvl7Wb9kpb8nOIvqq2ToP3Qg 68861
trl/trainer/callbacks.py sha256=8Ltt9cnUQDDRGdHW13ct9qiMmlf2azkVVCMsrJINYog 9622
trl/trainer/cpo_config.py sha256=Z_afm3lUbKSRbQTcNYG5uzhtauFac5TuTpRr4HoksOU 4440
trl/trainer/cpo_trainer.py sha256=2Ta18xaCPeU-vZ1Bjwxy2Qa5FTSxfwRvsmXyF_01jTs 46227
trl/trainer/ddpo_config.py sha256=AXaRI_mGgNewsLggQm9QyINFauNruNHqpEKqeOw-HdI 4891
trl/trainer/ddpo_trainer.py sha256=1r0OoNBbFzIuSlgV57gR00DSkz8MZHFmCpO86OuLeoc 26823
trl/trainer/dpo_config.py sha256=Wytvb7vPPmn1gfZCcUmKhOIJTYEY9YkafWo-T4HUYLw 9163
trl/trainer/dpo_trainer.py sha256=ytz_lswKkEI7kzK52mx87mxKMwCqnSt5B8EnwrmWXTY 80499
trl/trainer/fpo_dpo_trainer.py sha256=rbiFNLwJKd_3bVAiVeCWPwx4Cu-bFbmVr3X_wrKmW_k 88761
trl/trainer/iterative_sft_trainer.py sha256=gA4F1D3LaVJOdKHgpW5RAzNYDEgPy0YHVdNCvaAE8uI 17534
trl/trainer/judges.py sha256=B56I8mdLiZ9GgDeLxjPWzvJYIqrYbE3fNHRtP2tDoNg 12312
trl/trainer/kto_config.py sha256=kZMsqu8xynEf8ZkH2D6YTVKlGMUcZmUtfXVypaQCPa0 4970
trl/trainer/kto_trainer.py sha256=3Bme32dWmC4DVIENVl9bobVdXBd5W49oGN6dOplUBt0 68179
trl/trainer/model_config.py sha256=ux-liNHurgpt--2DSnuKEkgURmXXCufGirCGsrzC4xY 3697
trl/trainer/online_dpo_config.py sha256=x0DN2V_qoKnLIVbcjwWf4Xb74EcYCgVyZ9LdWJ-bUvE 708
trl/trainer/online_dpo_trainer.py sha256=_6LCMJkpk9-H77oakB8xpq_ynVZInUabCkcFUV9l5bM 30030
trl/trainer/orpo_config.py sha256=Jd0P6XiNjIVkNMXM3fUM0_xdccyi67bUplHO4USY2Yk 3500
trl/trainer/orpo_trainer.py sha256=zLzC7CCgl9IFceLdWop3YXwf5aXcAbCIrXcOgaMeIsA 46909
trl/trainer/ppo_config.py sha256=vVZDYV9Zw_2Gg0kqYQiU2p7ycaX_cruRXcUKapMRa5Q 8289
trl/trainer/ppo_trainer.py sha256=3suunbBNWDjXSdGzoGLhXZjSUd0-5E6p6A7v0S7t800 63900
trl/trainer/ppov2_config.py sha256=_SHS_xULPE8fC1bI7i0Zr2Ktj9gZ58pZheWRm7-r2J0 850
trl/trainer/ppov2_trainer.py sha256=BJb3q3nC43pfS52ecUAPX01vYQLigbI9PIbxfdkvj0U 32085
trl/trainer/reward_config.py sha256=q-mjilg_ScuHnSbMkJf-K92nUO9dqa5q1oxD-6KgKqU 1893
trl/trainer/reward_trainer.py sha256=ipdM4lEGKPqRzaEskhlgjFbFiMomm-kuuc-5iC9bXJY 16847
trl/trainer/rloo_config.py sha256=P_x_Tyywi237nRafCBWEdUkfuCLWTOlkUtd82s2oIcQ 710
trl/trainer/rloo_trainer.py sha256=Bm8mymlJEvLb-SFNrWcfLhJurNTujcb6BUXvAM-RsYM 26794
trl/trainer/sft_config.py sha256=XC8n5_Hsa9PKXc0c5wHqXMILWyBET6twfr7FwOM0WYY 3735
trl/trainer/sft_trainer.py sha256=yiGmLjarR3GPItojBDzQYv7ob3yuifqQHflCvYdPBH0 31359
trl/trainer/utils.py sha256=kl58kHcPipiKVeZwpInqKfs7BrOa96d2PDr-2p8fpMo 50932
trl_fpo-0.0.15.dist-info/LICENSE sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ 11357
trl_fpo-0.0.15.dist-info/METADATA sha256=955bZn6PpgbN0LpostEDlhZG8SzIWI0LQ4SbObmt8TU 12593
trl_fpo-0.0.15.dist-info/WHEEL sha256=G16H4A3IeoQmnOrYV4ueZGKSjhipXx8zc8nu9FGlvMA 92
trl_fpo-0.0.15.dist-info/entry_points.txt sha256=hebJnzwv4Upjxsw81Jn3vG6ll8wS0NTz-hQI_ynajeA 50
trl_fpo-0.0.15.dist-info/top_level.txt sha256=TUKDSfcN6PgBU9dnJ6_YJ7qhDzZ0CKzHvHehR6GKpWg 10
trl_fpo-0.0.15.dist-info/RECORD
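
Each digest in the listing above is the SHA256 of the file's bytes, encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below recomputes a digest in that form and parses one line of the space-separated listing shown here (the RECORD file inside the wheel itself is comma-separated). The names `record_digest` and `parse_record_line` are illustrative.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return the RECORD-style digest string for the given file contents."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line):
    """Split a 'path digest size' line; the RECORD entry itself has no digest or size."""
    parts = line.split()
    if len(parts) == 1:
        return parts[0], None, None
    path, digest, size = parts
    return path, digest, int(size)


if __name__ == "__main__":
    # tests/slow/__init__.py is listed with size 0, so its digest is that of empty input.
    line = "tests/slow/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0"
    path, digest, size = parse_record_line(line)
    print(path, digest == record_digest(b""), size)
```

The empty-file entry makes a convenient sanity check: hashing zero bytes reproduces the digest listed for `tests/slow/__init__.py`.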

top_level.txt

tests
trl

entry_points.txt

trl_fpo = trl.commands.cli:main
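
The entry point line has the form `name = module:attribute`. The page does not show which section of `entry_points.txt` it sits in; if it is under `[console_scripts]`, installers would generate a `trl_fpo` command that calls `main` in `trl.commands.cli`. As a hedged sketch of how such a line is interpreted, with `parse_entry_point` and `load_target` as hypothetical helper names:

```python
import importlib
from functools import reduce


def parse_entry_point(line):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = line.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_target(module, attr):
    """Import the module and walk the (possibly dotted) attribute path."""
    obj = importlib.import_module(module)
    return reduce(getattr, attr.split("."), obj) if attr else obj


if __name__ == "__main__":
    print(parse_entry_point("trl_fpo = trl.commands.cli:main"))
```

Note that `load_target("trl.commands.cli", "main")` only works in an environment where this wheel, or another distribution providing a top-level `trl` package, is installed.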