Reverse Dependencies of pyspark
The following projects have a declared dependency on pyspark:
- eaiesb — CSV Ingest-Transform-Persist
- easy-sql-easy-sql — A library developed to ease the data ETL development process.
- easy-sql-easy-sql-j — A library developed to ease the data ETL development process.
- ebes — EBES: Easy Benchmarking for Event Sequences.
- edawesome — Quick, easy and customizable data analysis with pandas and seaborn
- edp-amundsen-databuilder — EDP Amundsen Data builder
- edr-accessor — A pandas DataFrame accessor for accessing Enterprise Data Repository (EDR) tables with Spark.
- EDS-Scikit — eds-scikit is a Python library providing tools to
- edsnlp — Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.
- edsteva — EDS-TeVa provides a set of tools for modeling the adoption of Electronic Health Records over time and across space.
- Effulge — A small package used to find data variances
- elasticpedia — Indexing DBpedia with Elasticsearch.
- elephas — Distributed deep learning on Spark with Keras
- ellipsisPySpark — Package to load Ellipsis raster and vector layers as pySpark DataFrames.
- emm — Entity Matching Model package
- epsel — Ensure PySpark Executor Logging
- Eskapade-Spark — Spark for Eskapade
- etiq-spark — This is an optional, extension to the etiq library to provide spark datasets
- etiya-ie — Contains functions written during an internship at Etiya Bilgi Teknolojisi ve Hizmetleri.
- etl-jobs — no summary
- etlprocess — no summary
- evatestdb — EVA is a visual data management system (think MySQL for videos).
- event2vec — event2vec
- everhome — everhome.cloud api package.
- evidently — Open-source tools to analyze, monitor, and debug machine learning models in production.
- example-pkg-krish9d — A small example package
- exelog — Enabling meticulous logging for Spark Applications
- explodify — Package Description: Explodify
- fabric-data-guard — A library for data quality checks in Microsoft Fabric using Great Expectations
- fabric-fast-start — Fabric Fast Start is a set of tools to help you get started with Fabric.
- fakers — A package that provides ready-made fake-data objects.
- farsante — Fake DataFrame generators for Pandas and PySpark
- fasttreeshap — A fast implementation of TreeSHAP algorithm.
- fate-flow — no summary
- fcl-algorithms — A suite of functions to analyse Bitcoin transaction networks and provide the back-end for findcryptolinks.com
- Feast — Python SDK for Feast
- feast-doris — Python SDK for Feast
- feast-spark — Spark extensions for Feast
- feast-spark-offline-store — Spark support for Feast offline store
- feastmo — Python SDK for Feast
- feathr — An Enterprise-Grade, High Performance Feature Store
- feature-engineering-package — Zepto MLOps Services
- featuretoolsOnSpark — A simplified version of featuretools for Spark
- featuretoolsOnSparkX — A bugfix version of the original repo
- fedml-databricks — A python library for building machine learning models on Databricks using a federated data source
- fedml-dsp — A python library for building machine learning models on GPU AI platforms using a federated data source
- FGL — A fast graph library.
- FidoSniff — Fido Sniff library for data quality
- fifeforspark — Finite-Interval Forecasting Engine for Spark: Machine learning models for discrete-time survival analysis and multivariate time series forecasting for Apache Spark
- fink-anomaly-detection-model — Fink SNAD Anomaly Detection Model
- FireSpark — FireSpark data processing utility library
- fiware-pyspark-connector — Connects FIWARE Context Brokers with PySpark via fiware_pyspark_connector
- flaightkit — Flyte SDK for Python (Latch fork)
- FLAML — A fast library for automated machine learning and tuning
- flan — Create (very good) fake NCSA Combined Log Format access.log files for testing log-consuming systems like Splunk, ActiveMQ, Amazon MQ, RabbitMQ, Kafka, FluentD, Flume, Pulsar, Nifi...
- flatten-spark-dataframe — Databricks PySpark module to flatten nested Spark DataFrames, unwrapping struct and array-of-struct columns down to a specified level
- flicker — Provides FlickerDataFrame, a wrapper over the PySpark DataFrame offering a pandas-like API
- flowrunner — Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows
- flytekitplugins-great-expectations — Great Expectations Plugin for Flytekit
- flytekitplugins-spark — Spark 3 plugin for flytekit
- fnal-column-analysis-tools — Tools for doing Collider HEP style analysis with columnar operations at Fermilab
- foo-bar-eggs-baruch — Sample Python Project for creating a new Python Module
- forbids — forBIDS - BIDS protocol compliance validation
- forecastflowml — Scalable machine learning forecasting framework with Pyspark
- forml — A development framework and MLOps platform for the lifecycle management of data science projects.
- foundry-dev-tools-transforms — Seamlessly run your Palantir Foundry Repository transforms code on your local machine.
- framework3 — A flexible framework for machine learning pipelines
- fugue — An abstraction layer for distributed computation
- functionalizer — A PySpark implementation of the Blue Brain Project Functionalizer
- ga4 — no summary
- gators — Model building and model scoring library
- geniusrise — An LLM framework
- gentropy — Open Targets python framework for post-GWAS analysis
- genutility — A collection of various Python utilities
- geotorchai — GeoTorchAI, formerly GeoTorch, a spatiotemporal deep learning framework
- getml — Python API for getML
- globus-etl-utils — ETL tools for Globus Staffing
- gnuper — Open Source Package for Mobile Phone Metadata Preprocessing
- google-dataproc-templates — Google Dataproc templates written in Python
- gptdb — GPT-DB is an experimental open-source project that uses localized GPT large models to interact with your data and environment. Because the models run locally, there is no risk of data leakage and your data stays 100% private and secure.
- gptfunctionutil — A simple package for the purpose of providing a set of utilities that make it easier to invoke python functions and coroutines using OpenAI's GPT models.
- grad-info-opt — Implementation of Gradient Information Optimization for efficient and scalable training data selection
- gradiently — no summary
- graphlet — Graphlet AI Knowledge Graph Factory
- graphreduce — Leveraging graph data structures for complex feature engineering pipelines.
- great-expectations — Always know what to expect from your data.
- great-expectations-cta — Always know what to expect from your data.
- gspreadplusplus — Enhanced Google Sheets operations with advanced data type handling
- h1st-contrib — Human-First AI (H1st)
- h2o-mlops-scoring-client — A Python client library to simplify robust mini-batch scoring against an H2O MLOps scoring endpoint.
- h3spark — Lightweight pyspark wrapper for h3-py
- hadoop-fs-wrapper — Python Wrapper for Hadoop Java API
- handyspark — HandySpark - bringing pandas-like capabilities to Spark dataframes
- happy-pandas — no summary
- haychecker — a small library to check for data quality, either with spark or pandas
- hd2api.py — An API wrapper for the Helldivers 2 Community and official APIs
- hela — Your data catalog as code and one schema to rule them all.
- highcharts-core — High-end Data Visualization for the Python Ecosystem. Official wrapper for Highcharts Core (JS).
- highcharts-gantt — High-end Gantt Chart Visualization for the Python Ecosystem
- highcharts-maps — High-end Map Data Visualization for the Python Ecosystem