Reverse Dependencies of pyspark
The following projects have a declared dependency on pyspark:
- trawler-on-lake — no summary
- treemodel2sql — Transforms tree models to SQL
- tuberia — Tuberia... when data engineering meets software engineering
- turntable-spoonbill — Productivity-centric Python Big Data Framework
- TweetsAnalysis — Real-time Tweets analysis
- typedspark — Column-wise type annotations for pyspark DataFrames
- typefu — A type casting guru from your data sources and data formats
- unionstation — Big ML/DL dataset processing station
- userstory8020 — test assignment
- valdata — A Python package for validating data consistency
- valido — PySpark DataFrame-based workflow validator
- vangap-meliora — Sample Python Project for creating a new Python Module
- vangap-meliora1 — Credit risk development and validation tools
- vecspark — A PySpark-based library for vector similarity and distance computations.
- vectice — Vectice Python library
- vectorcraft — A custom library extending LangChain functionality.
- ventiotools — Tools for usage in the Vent.io ecosystem
- vera-testframework — Test framework for VERA reference data
- VerifyData — VerifyData is an intuitive library focused on data quality assessment, offering automated checks for completeness, consistency, and accuracy to guarantee high standards across your data flows.
- vevestaX — Stupidly simple library to track machine learning experiments as well as features
- vineyard-pyspark — Vineyard integration with PySpark
- visions — Visions
- vivqu — Easier data analysis and visualization based on pydeequ
- vtb-biname — A binary organization/individual name classification library for PySpark
- waterfall-logging — Waterfall statistic logging for data quality or filtering steps.
- whylogs — Profile and monitor your ML data pipeline end-to-end
- wicker — An open source framework for Machine Learning dataset storage and serving
- XalExtractor — Services for extractors
- xdbutils — Utilities for Databricks
- xedro — Kedro helps you build production-ready data and analytics pipelines
- xgboost — XGBoost Python Package
- xgboost-cpu — XGBoost Python Package
- xgboost2sql — Convert a trained XGBoost model to SQL
- xurpas-data-quality — XAIL Data quality
- xursparks — Encapsulating Apache Spark for Easy Usage
- yetl-framework — yet (another spark) etl framework
- yvestest — An open-source library that simplifies ETL workflows in Python, based on Spark
- zat — Zeek Analysis Tools
- zdatasets — Dataset SDK for consistent read/write [batch, online, streaming] data.
- zgl-streaming — Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores
- zlai — An LLM Agent Python package.
- zstreams — Zeek Analysis Tools
- zzingestions — ZZIngestions is a powerful Python package that uses PySpark to transform data ingestion into a fluid and efficient process. It allows performing ETL operations in a scalable and intuitive way, improving the robustness and agility of data flows.
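
For context, a project appears in this listing when its packaging metadata declares pyspark as an install requirement. Below is a minimal sketch of such a declaration; the project name and version pin are hypothetical and not taken from any package above.

```python
# Minimal setup.py sketch for a hypothetical project that would show up in
# this reverse-dependency listing: it declares pyspark in install_requires.
from setuptools import setup, find_packages

setup(
    name="example-spark-tool",   # hypothetical project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "pyspark>=3.0",          # the declared dependency on pyspark
    ],
)
```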