Reverse Dependencies of duckdb
The following projects have a declared dependency on duckdb:
- a-data-processing — A library that prepares raw documents for downstream ML tasks.
- aau_gomapedge_etl — This is the GoMap Edge ETL workflow package
- ab-data-processing — Data processing through MinIO, databases, Web APIs, etc.
- acryl-datahub — A CLI to work with DataHub metadata
- acryl-datahub-cloud — no summary
- acryl-sqlglot — An easily customizable SQL parser and transpiler
- adbc-driver-manager — A generic entrypoint for ADBC drivers.
- affinity — Module for creating well-documented datasets, with types and annotations.
- agno — Agno: a lightweight framework for building multi-modal Agents
- aioduckdb — asyncio bridge to DuckDB
- aiosql — Simple SQL in Python
- airbyte — PyAirbyte
- airflow-provider-duckdb — DuckDB (duckdb.org) provider for Apache Airflow
- airfold-cli — Airfold CLI
- alectiolite — Integrate customer side ML application with the Alectio Platform
- altair — Vega-Altair: A declarative statistical visualization library for Python.
- altimate-dataminion — Internal package. Use this at your own risk, support not guaranteed
- AnnSQL — A Python SQL tool for converting Anndata objects to a relational DuckDB database. Methods are included for querying and basic single-cell preprocessing (experimental).
- apb-duckdb-utils — DuckDB utils
- arac — Data processing through MinIO, databases, Web APIs, etc.
- arrow-odbc — Read the data of an ODBC data source as sequence of Apache Arrow record batches.
- asyncdb — Library for Asynchronous data source connections Collection of asyncio drivers.
- athena-mvsh — Athena-mvsh is a Python library that interacts with the Amazon Athena service
- atlassianhw — Atlassian application homework
- atspm — Aggregates hi-res data from ATC traffic signal controllers into 15-minute binned ATSPM/performance measures.
- awk-plus-plus — Declarative Data Orchestration
- awsdp — no summary
- bamboo-duck — no summary
- bardi — A flexible machine learning data pre-processing pipeline framework.
- bearsql — Bearsql adds SQL syntax on pandas DataFrames. It uses DuckDB to speed up pandas processing and as the SQL engine
- bflower-base — A Python package with a built-in web application
- bframelib — An open source billing framework to generate, view and diff invoices locally.
- bida — bida: a simple, easy-to-use, stable, and efficient engineering framework for developing with large language models, designed for easy extension and integration
- bigquery-duckdb-toolkit — Toolkit for querying and manipulating data in BigQuery, with DuckDB integration.
- bl-vanna — Generate SQL queries from natural language
- bladesight — Bladesight provides comprehensive tools for introductory Blade Tip Timing analysis.
- blendsql — Query language for blending SQL logic and LLM reasoning across multi-modal data. [Findings of ACL 2024]
- bmsdna-lakeapi — no summary
- bmsdna.sql_utils — no summary
- bookworm_genai — Bookworm - A LLM-powered bookmark search engine
- buenavista — Programmable Presto and Postgres Proxies
- buildflow — BuildFlow is an open-source framework for building large-scale systems using Python. All you need to do is describe where your input comes from and where your output should be written, and BuildFlow handles the rest.
- bytewax-duckdb — Bytewax custom sink for DuckDB and MotherDuck
- canswim — Developer toolkit for CANSLIM investment style practitioners
- cbiohub — Convenience functions for accessing cBioPortal data files
- cdmb — Common Data Model Builder library
- cdpdev-datahub — A CLI to work with DataHub metadata
- chaarset-normalizeir2 — chaarset-normalizeir2
- chalkpy — Python SDK for Chalk
- chroma-migrate — A tool for migrating to chroma versions >= 0.4.0
- chronify — Time series store and mapping library
- clidb — CLI based SQL client for local data
- closurizer — Add closure expansion fields to kgx files following the Golr pattern
- coastpy — Python tools for cloud-native coastal analytics.
- cocoon-data — Cocoon is an open-source project that aims to free analysts from tedious data transformations with LLMs.
- coingecko-exporter — A package to export bulk data from the CoinGecko API.
- collate-data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- colvert — colvert is a frontend for DuckDB, a fast and lightweight in-memory database designed for analytical queries. It is designed to be a simple and easy-to-use tool for data analysis and visualization.
- connection-helper — A collection of helpers for SQL connections
- consensus — no summary
- core-eda — A name matching package
- core-pro — A utility package for data science
- corvic-engine — Seamless embedding generation and retrieval.
- cratedb-toolkit — CrateDB Toolkit
- crypto-data-ingestion — This library facilitates the ingestion of data from the CoinGecko API and stores it in cloud or self-hosted object storage.
- csql — Simple library for writing composable SQL queries
- csvw-duckdb — Convert CSVW metadata to DuckDB SQL syntax.
- cuallee — Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark and Pandas/DataFrame.
- cubyc — The repository for all your experiments
- cumulus-library — Clinical study SQL generation for data derived from bulk FHIR
- cumulus-library-covid — SQL generation for cumulus covid symptom analysis
- curategpt — CurateGPT
- cytotable — Transform CellProfiler and DeepProfiler data for processing image-based profiling readouts with Pycytominer and other Cytomining tools.
- cz-data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- dabbler — A Qt GUI tool to view database tables/queries, integrated within a Jupyter/IPython session
- dagster-dlt — Package for performing ETL/ELT tasks with dlt in Dagster.
- dagster-duckdb — Package for DuckDB-specific Dagster framework op and resource components.
- dagster-embedded-elt — Package for performing ETL/ELT tasks with Dagster.
- dagster-odp — A configuration-driven framework for building Dagster pipelines
- dagster-sling — Package for performing ETL/ELT tasks with Sling in Dagster.
- data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- data-diff-customize — Command-line tool and Python library to efficiently diff rows across two different databases.
- data-diff-viewer — Generate HTML reports to visualize the results of data-diff with bigquery-frame and spark-frame.
- data-prep-toolkit-transforms — Data Preparation Toolkit Transforms using Ray
- data-prep-toolkit-transforms-lang1 — Data Preparation Toolkit Transforms
- data-toolset — no summary
- data-watch-sdk — Data Watch Python SDK
- database-testing-tools — A package to test our databases
- datacontract-cli — The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
- datadashr — Engage with your data (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.) using Ollama, an open-source tool that operates locally. Datadashr transforms data analysis into a conversational experience powered by Ollama LLMs and RAG.
- datagrunt — Read CSV files and convert to other file formats easily
- datajunction-query — OSS Implementation of a DataJunction Query Service
- dataneuron — An AI Data framework to create AI Data Analyst
- datapane-components — Reusable Datapane components and sample reports and apps
- datasets-sql — datasets_sql is an extension of the 🤗 Datasets package that provides support for executing arbitrary SQL queries on datasets.
- datasette-parquet — Read Parquet files in Datasette
- datasus-db — Download and import DATASUS's public data to a DuckDB database
- dbcreator — Package to create a database out of files
- dbflows — Database incremental exports, transfers, imports, ETL, creation / management
- dbgpt — DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.