Reverse Dependencies of pyspark
The following projects have a declared dependency on pyspark:
- mkpipe — Core ETL pipeline framework for mkpipe.
- ml-verbs — Generic interfaces for machine learning
- ml2rt — Machine learning utilities for model conversion, serialization, loading, etc.
- ml4ir — Machine Learning libraries for Information Retrieval
- mlapp — IBM Services Framework for ML Applications: a Python 3 framework for building robust, production-ready machine learning applications. Official ML accelerator within the larger RAD-ML methodology.
- mlforecast — Scalable machine learning based time series forecasting
- mlops-cloud — no summary
- mlops-core — no summary
- mlpiper — An engine for running component based ML pipelines
- mlpr — A library for machine learning pipelines and report creation.
- mlpype-spark — no summary
- mlserver-mllib — Spark MLlib runtime for MLServer
- mltronsAutoDataPrep — First Automated Data Preparation library powered by Deep Learning to automatically clean and prepare TBs of data on clusters at scale.
- mockalot — Mockup data generator library.
- mockingbird — Generate mock documents in various formats (CSV, DOCX, PDF, TXT, and more) that embed seed data and can be used to test data classification software.
- model-fkeywords — A Natural Language Processing Library
- modern_robitcs_smc — Modern Robotics: Mechanics, Planning, and Control
- modern_robotics_smc — Modern Robotics: Mechanics, Planning, and Control
- moonspark — Logging helpers for PySpark
- more-pyspark — no summary
- mosaic-common-utils — Utils library for Mosaic
- mosaic-utils — Utils library for Mosaic
- mosaicml-streaming — Streaming lets users create PyTorch compatible datasets that can be streamed from cloud-based object stores
- mr_urdf_loader — Modern Robotics URDF Load Module
- mse — Make Structs Easy (MSE)
- mx-stream-core — Stream core package for mindx
- my-pyspark-package — A package to count nulls and -1s in PySpark DataFrames.
- myetljob-run — My first ETL library
- mymaplib-123 — Library created to map two Datasets
- myray — my ray desc
- namedframes — Named Data Frames
- narwhals — Extremely lightweight compatibility layer between dataframe libraries
- nbodyx — A JAX simulator for N body problems.
- nebius-connect — Nebius AI connector for Apache Spark™
- neuralforecast — Time series forecasting suite using deep learning models
- ng-ai — NebulaGraph AI Suite
- ng-data-pipelines-sdk — A library for interacting with data from Amazon S3 through PySpark. Read, write and transform data using a powerful and intuitive API with strong consistency and type checking, thanks to Pydantic. Compatible with Amazon MWAA running Airflow 2.7.2 and above.
- ngdi — NebulaGraph Data Intelligence Suite
- nh-prototype — no summary
- nichirin — TODO
- NikeCA — Standardize and Automate processes
- nixtlats — Python SDK for Nixtla API (TimeGPT)
- nlpbook — Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication
- no-spark-in-my-home — Yet another Python package for data generation
- noaa-object-data-delivery-pipeline — Pipeline for ingesting a sample dataset
- nolanm-portfolio-package — no summary
- NolanMQuantTradingEnvSetUp — no summary
- nops-metadata — Metadata producer tooling used in nOps.io
- numderivax — Numerical differentiation in JAX.
- nuna-sql-tools — Nuna Sql Tools contains utilities to create and manipulate schemas and sql statements.
- oarphpy — A collection of Python utils with an emphasis on Data Science
- obsrv — no summary
- ocean-spark-airflow-provider — Apache Airflow connector for Ocean for Apache Spark
- ocean-spark-connect — Spark Connect adapter for Ocean Spark
- ocean-sparkconnect — Spark Connect adapter for Ocean Spark
- ODP-DQ — DQ Solution to answer all DQ needs.
- olapy — OlaPy, an experimental OLAP engine based on Pandas
- omigo-ext — Extensions for omigo_core package
- omniduct — A toolkit providing a uniform interface for connecting to and extracting data from a wide variety of (potentially remote) data stores (including HDFS, Hive, Presto, MySQL, etc).
- onetl — One ETL tool to rule them all
- ons-metadata-validation — automated metadata validation for ONS metadata templates
- ons-utils — A suite of pyspark, pandas and general pipeline utils for ONS projects.
- openhunt — A Python library to expedite the analysis of data during hunting engagements
- openImageDatasetSDK — Python SDK for the Open Image Dataset.
- openImageDatasetSDKTest — Python SDK for the Open Image Dataset.
- openmetadata-data-profiler — Data Profiler Library for OpenMetadata
- openmeteo-requests — Open-Meteo Python Library
- OpenOA — A package for collecting and assigning wind turbine metrics
- openpredict — A package to help serve predictions of biomedical concept associations as a Translator Reasoner API.
- openwpm-utils — Tools for parsing crawl data generated by OpenWPM
- ophelia-spark — Ophelia is a spark miner AI engine that builds data mining & ml pipelines with PySpark.
- ophelian — Ophelian is a go-to framework for seamlessly putting ML & AI prototypes into production.
- oplangchain — langchain for OpenPlugin
- oracle-ads — Oracle Accelerated Data Science SDK
- outset — add zoom indicators, insets, and magnified panels to matplotlib/seaborn visualizations with ease!
- ovobdkit — Big Data Development Kit for OVO Big Data
- ovotestkit — Testing Kit for OVO Big Data
- owl-sanitizer-data-quality — Data Quality framework for Pyspark jobs
- packyak — Infrastructure for AI applications and machine learning pipelines
- pami — This software is being developed at the University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
- pandera — A light-weight and flexible data validation and testing tool for statistical data objects.
- pano-airflow — Programmatically author, schedule and monitor data pipelines
- patek — A collection of utilities and tools for accelerating pyspark development and productivity.
- pathling — Python API for Pathling
- pb2df — A Python module for converting proto3 types/objects to Spark DataFrame objects.
- pbspark — Convert between protobuf messages and pyspark dataframes
- petastorm — Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Python-based ML training frameworks.
- pfeed — Data pipeline for algo-trading, getting and storing both real-time and historical data made easy.
- pii-anonymizer — Data Protection Framework is a python library/command line application for identification, anonymization and de-anonymization of Personally Identifiable Information data.
- pillar1 — Official package for Pillar1 company
- pineapple-spark — Pineapple is an extension of Apache Sedona for processing large-scale complex spatial queries
- pingpong-datahub — A CLI to work with DataHub metadata
- pipeasy-spark — an easy way to define preprocessing data pipelines for pyspark
- places-intel — A library for fetching and processing place data with polygons using Outscraper and Overpass APIs.
- ploosh — A framework to automate your tests for data projects
- ploosh-core — A framework to automate your tests for data projects
- poetry-demo5678 — no summary
- PoliPrompt — PoliPrompt performs data analysis with state-of-the-art foundation models
- polyexpose — polyexpose
- pou-shap — A unified approach to explain the output of any machine learning model.
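
For reference, a listing like the one above can be approximated locally by scanning the metadata of installed distributions. Below is a minimal sketch in Python; it is an illustration only, covers just the packages installed in the current environment rather than the full index, and uses a simple name match on the requirement string.

```python
# Minimal sketch: find installed distributions whose metadata declares a
# dependency on pyspark. Only the local environment is inspected.
import re
from importlib.metadata import distributions

def installed_pyspark_dependents():
    """Return names of installed packages that list pyspark in Requires-Dist."""
    dependents = set()
    for dist in distributions():
        for req in dist.requires or []:
            # A requirement string looks like "pyspark>=3.0; extra == 'ml'";
            # grab just the leading project name.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match and match.group(0).lower() == "pyspark":
                dependents.add(dist.metadata["Name"])
                break
    return sorted(dependents, key=str.lower)

if __name__ == "__main__":
    for name in installed_pyspark_dependents():
        print(name)
```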