Reverse Dependencies of pyspark
The following projects have a declared dependency on pyspark:
- spark-connect-proxy — A reverse proxy server which allows secure connectivity to a Spark Connect server
- spark-dataframe-tools — spark_dataframe_tools
- spark-expectations — Runs data quality rules in flight while the Spark job is running
- spark-generated-rules-tools — spark_generated_rules_tools
- spark-hdfs-tools — spark_hdfs_tools
- spark-hydro — Advanced Delta Lake-related tooling based on Apache Spark
- spark-insights — A package to generate HTML reports for Spark DataFrames with detailed data health checks.
- spark-llm — LLM assistant for the development of Spark applications
- spark-loader — loads spark
- spark-map — PySpark implementation of the `map()` function for Spark DataFrames
- Spark-Matcher — Record matching and entity resolution at scale in Spark
- spark-pipeline — Data Science oriented tools, mostly for Apache Spark
- spark-pit — PIT join library for PySpark
- spark-plotting-tools — spark_plotting_tools
- spark-privacy-preserver — Anonymizing Library for Apache Spark
- spark-quality-rules-tools — spark_quality_rules_tools
- spark-scaffolder-transforms-tools — spark_scaffolder_transforms_tools
- spark-silex — Silex adds more sparks to your project!
- spark-sql-to-sqlite — no summary
- sparkaid — Utils for working with Spark
- sparkautomapper — AutoMapper for Spark
- sparkautomapper.fhir — FHIR extensions for SparkAutoMapper
- SparkAutoML — For easy use of Spark's machine learning library
- SparkBoot — SparkBoot: an easy, YAML-based way to run PySpark
- sparkcraft — SparkCraft
- sparkdantic — A Pydantic -> Spark schema library (see the usage sketch after this list)
- sparkdataframecomparer — Deep comparer for Spark DataFrames
- sparkdh — no summary
- sparkfhirschemas — AutoMapper for Spark
- sparkkgml — From Knowledge Graphs to Machine Learning!
- sparklanes — A lightweight framework to build and execute data processing pipelines in PySpark (Apache Spark's Python API)
- sparklightautoml-dev — Spark-based distributed version of a fast and customizable framework for automatic ML model creation (AutoML)
- sparkly-em — Sparkly is a TF/IDF top-k blocking system for entity matching, built on top of Apache Spark and PyLucene.
- sparkmanager — A PySpark management framework
- SparkMinIOHandle — Spark MinIO Handler Package
- SparkMLTransforms — Transformations in Spark for ML Features
- sparkmon — sparkmon
- Sparkora — Exploratory data analysis toolkit for PySpark
- sparkouille — Ways to productionize machine learning predictions
- sparkpipelineframework — Framework for simpler Spark Pipelines
- sparkpipelineframework.testing — Testing Framework for SparkPipelineFramework
- sparkpl — A utility package for converting between PySpark and Polars DataFrames
- sparkql — sparkql: Apache Spark SQL DataFrame schema management for sensible humans
- sparksampling — pyspark-sampling
- SparkSchemafy — Formats Spark schema output into a schema definition
- sparksnake — Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
- sparksql-helper — SparkSQL Helper
- sparksql-jupyter — Spark SQL magic command for Jupyter notebooks
- sparksql-magic — Spark SQL magic command for Jupyter notebooks (see the usage sketch after this list)
- SparkStream — A simple Spark streaming handler.
- sparkypandy — It's not spark, it's not pandas, it's just awkward...
- SPARQL2Spark — SPARQL Result to Spark
- spinecore — The core library of the spine library
- spirograph — A tool that makes building ML pipelines easier for non-technical users.
- spl-transpiler — Convert Splunk SPL queries into PySpark code
- splink — Fast probabilistic data linkage at scale (see the usage sketch after this list)
- sqlframe — Turning PySpark Into a Universal DataFrame API (see the usage sketch after this list)
- sqlmesh — no summary
- sqlmesh-cube — SQLMesh extension for generating Cube semantic layer configurations
- squirrel-datasets-core — Squirrel public datasets collection
- ssb-ipython-kernels — Jupyter kernels for working with Dapla services
- ssb-spark-tools — A collection of data processing Spark functions for use in Statistics Norway.
- stacks-data — A suite of utilities to support data engineering workloads within an Ensono Stacks data platform.
- statscanpy — Basic package for querying & downloading StatsCan data by table name.
- stoys — Stoys: Spark Tools @ stoys.io
- superannotate-databricks-connector — Custom functions to work with SuperAnnotate in Databricks
- sws-spark-dissemination-helper — A Python helper package providing streamlined Spark functions for efficient data dissemination processes
- synaptiq-datawarehouse — Add your description here
- synthesized-datasets — Publicly available datasets for benchmarking and evaluation.
- synthesized3 — Synthesized SDK.
- Synthius — A toolkit for generating and evaluating synthetic data in terms of utility, privacy, and similarity
- tabpipe — A toolkit for tabular data ML preprocessing pipelines.
- td-pyspark — Treasure Data extension for pyspark
- tecton — Tecton Python SDK
- tecton-parallel-retrieval — [private preview] Parallel feature retrieval for Tecton
- tecton-utils — [private preview] Utils for Tecton
- teehr — Tools for Exploratory Evaluation in Hydrologic Research
- test-amundsen-databuilder — Amundsen Data builder
- test-data-modori — LMOps Tool for Korean
- testabc — no summary
- testfate — no summary
- testKuldeep — Testing Databricks
- testlib123 — Library created to map two Datasets
- testuL — Library created to map two Datasets
- text-dedup — no summary
- tgedr-nihao — studies with financial data sources
- tgedr-pycode — Handy Python code
- tidal-algorithmic-mixes — Common transformers used by the TIDAL personalization team.
- tidal-per-transformers — Common transformers used by the TIDAL personalization team.
- tidy-tools — Declarative programming for PySpark workflows.
- tidypyspark — dplyr for pyspark
- timestep — Timestep AI CLI - free, local-first, open-source AI
- tinderbox — Shareable PySpark transformation sequence
- tinsel — PySpark schema generator
- tinytimmy — A simple, easy-to-use Data Quality (DQ) tool built with Python.
- tmlt.analytics — Tumult's differential privacy analytics API
- tmlt.core — Tumult's differential privacy primitives
- toolbox-pyspark — Helper files/functions/classes for generic PySpark processes
- trac-runtime — TRAC Model Runtime for Python
- tracdap-runtime — Runtime package for building models on the TRAC Data & Analytics Platform
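
Usage sketches for a few of the better-known packages above follow. Each is a minimal, unverified sketch: the names are taken from the projects' public READMEs and docs and should be treated as assumptions, not guaranteed APIs.

sparkdantic derives a Spark schema from a Pydantic model. A minimal sketch, assuming the `SparkModel` base class and its `model_spark_schema()` method from the project's README:

```python
# Minimal sketch: Pydantic model -> Spark schema via sparkdantic.
# `SparkModel` and `model_spark_schema()` follow the project's README
# and are assumptions here.
from typing import Optional

from sparkdantic import SparkModel


class User(SparkModel):
    id: int
    name: str
    email: Optional[str] = None


# Expected to return a pyspark.sql.types.StructType describing the model.
schema = User.model_spark_schema()
print(schema)
```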
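sparksql-magic exposes Spark SQL as a Jupyter cell magic. A notebook sketch, assuming the `sparksql_magic` extension name and `%%sparksql` magic from the project's README, plus an already-active SparkSession:

```python
# Cell 1: load the extension (assumes an active SparkSession in the notebook).
%load_ext sparksql_magic
```

```python
# Cell 2: run Spark SQL directly; the result is rendered as a table.
%%sparksql
SELECT 1 AS id, 'a' AS letter
```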
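splink scores candidate record pairs with a Fellegi-Sunter model. A minimal dedupe sketch in the style of the Splink 4 API (names are assumptions taken from the public docs); it uses the bundled DuckDB backend so the example is self-contained, though splink also ships a Spark backend:

```python
# Minimal dedupe sketch with splink (Splink 4-style API; names follow the
# public docs and are assumptions here). The DuckDB backend keeps the
# example self-contained; splink also runs on Spark.
import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets

df = splink_datasets.fake_1000  # small bundled demo dataset

settings = SettingsCreator(
    link_type="dedupe_only",
    comparisons=[
        cl.ExactMatch("first_name"),
        cl.ExactMatch("surname"),
    ],
    blocking_rules_to_generate_predictions=[block_on("first_name")],
)

linker = Linker(df, settings, db_api=DuckDBAPI())
pairwise = linker.inference.predict()  # scored candidate pairs
print(pairwise.as_pandas_dataframe().head())
```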
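sqlframe re-implements the PySpark DataFrame API over plain SQL engines, so PySpark-style code can run without a cluster. A minimal sketch, assuming `DuckDBSession` and its `functions` module mirror their `pyspark.sql` counterparts as the project's README describes:

```python
# Minimal sketch: PySpark-style code on DuckDB via sqlframe (no Spark
# cluster needed). DuckDBSession and sqlframe.duckdb.functions mirror
# their pyspark.sql counterparts per the README; treat names as assumptions.
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

session = DuckDBSession.builder.getOrCreate()
df = session.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "letter"])
df.groupBy("letter").agg(F.count("*").alias("n")).show()
```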