Reverse Dependencies of FastWARC
The following projects have a declared dependency on FastWARC:
- cc2dataset — Easily convert Common Crawl to an image caption set using pyspark
- cc2imgcap — Easily convert Common Crawl to an image caption set using pyspark
- fundus — A very simple news crawler
- ir-datasets-clueweb22 — Extension for accessing the ClueWeb22 via ir_datasets.
- Resiliparse — A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.
- warc2summary — warc2summary