Reverse Dependencies of duckdb
The following projects have a declared dependency on duckdb:
- a-data-processing — A library that prepares raw documents for downstream ML tasks.
- aau_gomapedge_etl — This is the GoMap Edge ETL workflow package
- ab-data-processing — Data Processing is used for data processing through MinIO, databases, Web APIs, etc.
- acryl-datahub — A CLI to work with DataHub metadata
- acryl-datahub-cloud — no summary
- acryl-sqlglot — An easily customizable SQL parser and transpiler
- adbc-driver-manager — A generic entrypoint for ADBC drivers.
- affinity — Module for creating well-documented datasets, with types and annotations.
- agera5tools — AgERA5 is a tool for handling AgERA5 data from the Copernicus Climate Data Store.
- agno — Agno: a lightweight framework for building multi-modal Agents
- aioduckdb — asyncio bridge to the standard sqlite3 module
- aiosql — Simple SQL in Python
- airbyte — PyAirbyte
- airflow-provider-duckdb — DuckDB (duckdb.org) provider for Apache Airflow
- airfold-cli — Airfold CLI
- alectiolite — Integrate customer side ML application with the Alectio Platform
- altair — Vega-Altair: A declarative statistical visualization library for Python.
- altimate-dataminion — Internal package. Use this at your own risk, support not guaranteed
- amethyst-facet — Compute window aggregations and alter contents of Amethyst HDF5 files
- ampere-meter — Tooling to track and visualize engagement with the mrpowers-io organization
- AnnSQL — A Python SQL tool for converting AnnData objects to a relational DuckDB database. Methods are included for querying and basic single-cell preprocessing (experimental).
- apb-duckdb-utils — DuckDB utils
- arac — Data Processing is used for data processing through MinIO, databases, Web APIs, etc.
- arrow-odbc — Read the data of an ODBC data source as sequence of Apache Arrow record batches.
- asyncdb — Library for asynchronous data source connections. Collection of asyncio drivers.
- athena-mvsh — Athena-mvsh is a Python library that interacts with the Amazon Athena service
- atlassianhw — Atlassian application homework
- atspm — Aggregates hi-res data from ATC traffic signal controllers into 15-minute binned ATSPM/performance measures.
- autocoder-nano — AutoCoder Nano
- awk-plus-plus — Declarative Data Orchestration
- awsdp — no summary
- bamboo-duck — no summary
- bardi — A flexible machine learning data pre-processing pipeline framework.
- bearsql — Bearsql adds SQL syntax on pandas DataFrames. It uses DuckDB to speed up pandas processing and as the SQL engine
- bflower-base — A Python package with a built-in web application
- bframelib — An open source billing framework to generate, view and diff invoices locally.
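- bida — bida: a simple, easy-to-use, stable, and efficient engineering development framework for large language models that is easy to extend and integrate
- bigquery-duckdb-toolkit — Toolkit for querying and manipulating data in BigQuery, with DuckDB integration.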
- bl-vanna — Generate SQL queries from natural language
- bladesight — Bladesight provides comprehensive tools for introductory Blade Tip Timing analysis.
- blendsql — Query language for blending SQL logic and LLM reasoning across multi-modal data. [Findings of ACL 2024]
- bmsdna-lakeapi — no summary
- bmsdna.sql_utils — no summary
- bookworm_genai — Bookworm - An LLM-powered bookmark search engine
- buenavista — Programmable Presto and Postgres Proxies
- buildflow — BuildFlow is an open source framework for building large scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest.
- bytewax-duckdb — Bytewax custom sink for DuckDB and MotherDuck
- canswim — Developer toolkit for CANSLIM investment style practitioners
- catalystcoop.pudl — An open data processing pipeline for US energy data
- cbiohub — Convenience functions for accessing cBioPortal data files
- cdmb — Common Data Model Builder library
- cdpdev-datahub — A CLI to work with DataHub metadata
- chaarset-normalizeir2 — chaarset-normalizeir2
- chalkpy — Python SDK for Chalk
- chroma-migrate — A tool for migrating to chroma versions >= 0.4.0
- chronify — Time series store and mapping library
- clidb — CLI based SQL client for local data
- closurizer — Add closure expansion fields to kgx files following the Golr pattern
- coastpy — Python tools for cloud-native coastal analytics.
- cocoon-data — Cocoon is an open-source project that aims to free analysts from tedious data transformations with LLMs.
- coingecko-exporter — A package to export bulk data from the CoinGecko API.
- collate-data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- colvert — colvert is a frontend for DuckDB, a fast and lightweight in-memory database designed for analytical queries. It is designed to be a simple and easy-to-use tool for data analysis and visualization.
- compnet — A package for market compression of network data.
- connection-helper — A collection of helpers for SQL connections
- consensus — no summary
- copairs — Find pairs and compute metrics between them
- core-eda — A name matching package
- core-pro — A utility package for data science
- corvic-engine — Seamless embedding generation and retrieval.
- cratedb-toolkit — CrateDB Toolkit
- cryo-mcp — MCP server for querying Ethereum blockchain data using cryo
- crypto-data-ingestion — The purpose of this library is to facilitate the ingestion of data from the CoinGecko API and store it in cloud/self-hosted object storage.
- csql — Simple library for writing composable SQL queries
- csvw-duckdb — Convert CSVW metadata to DuckDB SQL syntax.
- cuallee — Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark and Pandas/DataFrame.
- cubyc — The repository for all your experiments
- cumulus-library — Clinical study SQL generation for data derived from bulk FHIR
- cumulus-library-covid — SQL generation for cumulus covid symptom analysis
- curategpt — CurateGPT
- cytotable — Transform CellProfiler and DeepProfiler data for processing image-based profiling readouts with Pycytominer and other Cytomining tools.
- cz-data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- dabbler — A Qt GUI tool to view database tables/queries integrated within a Jupyter/IPython session
- dagster-dlt — Package for performing ETL/ELT tasks with dlt in Dagster.
- dagster-duckdb — Package for DuckDB-specific Dagster framework op and resource components.
- dagster-embedded-elt — Package for performing ETL/ELT tasks with Dagster.
- dagster-odp — A configuration-driven framework for building Dagster pipelines
- dagster-sling — Package for performing ETL/ELT tasks with Sling in Dagster.
- data-diff — Command-line tool and Python library to efficiently diff rows across two different databases.
- data-diff-customize — Command-line tool and Python library to efficiently diff rows across two different databases.
- data-diff-viewer — Generate HTML reports to visualize the results of data-diff with bigquery-frame and spark-frame.
- data-eng-cli — A simple CLI tool for data engineering tasks
- data-prep-toolkit-transforms — Data Preparation Toolkit Transforms using Ray
- data-prep-toolkit-transforms-lang1 — Data Preparation Toolkit Transforms
- data-toolset — no summary
- data-watch-sdk — Data Watch Python SDK
- database-testing-tools — A package to test our databases
- datacontract-cli — The datacontract CLI is an open source command-line tool for working with Data Contracts. It uses data contract YAML files to lint the data contract, connect to data sources and execute schema and quality tests, detect breaking changes, and export to different formats. The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library.
- datadashr — Engage with your data (SQL, CSV, pandas, polars, mongodb, noSQL, etc.) using Ollama, an open-source tool that operates locally. Datadashr transforms data analysis into a conversational experience powered by Ollama LLMs and RAG.
- dataengtools — no summary