Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch
Petastorm is a popular open-source library from Uber that enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. We are excited to announce that Petastorm 0.9.0 supports easy conversion of data from an Apache Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader. The new Spark Dataset...
Introducing Built-in Image Data Source in Apache Spark 2.4
With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark has never been greater. Image handling and preprocessing have their own specific challenges: for example, images come in different formats (e.g., JPEG and PNG), sizes, and color schemes, and there is no...