Reverse Dependencies of datasketch
The following projects have a declared dependency on datasketch:
- annime — ANN-interface library
- cocoon-data — Cocoon is an open-source project that aims to free analysts from tedious data transformations with LLM.
- datamart-isi — USC ISI implementation of D3M Datamart API
- distilabel — Distilabel is an AI Feedback (AIF) framework for building datasets with and for LLMs.
- findopendata — A search engine for Open Data.
- formless — Handwritten + image OCR.
- guanciale — Grab information needed by Carbonara
- hf-clean-benchmarks — This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
- HogProf — Phylogenetic Profiling with OMA and minhashing
- icas — Tool for labeling images
- impruver — Transformer based LLM trainer
- lbster — Language models for Biological Sequence Transformation and Evolutionary Representation.
- Lotte — Lotte is a tool for quotation detection in texts and can deal with common properties of quotations, for example, ellipses or inaccurate quotations.
- nlp-dedup — Remove duplicates and near-duplicates from text corpora, no matter the scale.
- nonebot-plugin-vividusfakeai — Imitates the chat of group members!
- PolyDeDupe — no summary
- pyoma — library to interact and build OMA hdf5 files
- qbindiff — QBindiff binary diffing tool based on a Network Alignment problem
- Quid — Quid is a tool for quotation detection in texts and can deal with common properties of quotations, for example, ellipses or inaccurate quotations.
- random-renormalization-group — The package for random renormalization group
- scandi-reddit — Construction of a Scandinavian Reddit dataset.
- scar-tool — SCAR: An AI-powered tool for ranking and filtering instruction-answer pairs based on writing quality and style consistency
- Sketch — Compute, store and operate on data sketches
- squeakily — A library for squeakily cleaning and filtering language datasets.
- textanalyzer — TextAnalyzer is a Python package that allows you to quickly and easily understand complex text data.
- tiny-elephant — In memory based collaborative filtering