Kidong Lee
Jun 30, 2019

You may be able to load the data with distcp, although I have not tried that in production.

But with Spark, you have more options and more control when loading S3 data into HDFS. For instance, you can transform a data model on S3 into a different one on HDFS, or easily load Parquet files on S3 into HDFS as ORC files.
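
A minimal PySpark sketch of the Parquet-to-ORC case, assuming Spark can reach S3 through the hadoop-aws (s3a) connector with credentials already configured. The helper name `parquet_to_orc`, the bucket and HDFS paths, and the `ts` to `event_time` rename (standing in for a data model change) are all hypothetical placeholders:

```python
from typing import Callable, Optional


def parquet_to_orc(spark, src: str, dst: str,
                   transform: Optional[Callable] = None) -> None:
    """Read Parquet files from src and write them to dst as ORC files."""
    # Read the Parquet files (e.g. on S3) into a DataFrame.
    df = spark.read.parquet(src)
    # Optionally reshape the data model before writing (hypothetical hook).
    if transform is not None:
        df = transform(df)
    # Write to the target (e.g. HDFS) in ORC format, replacing existing output.
    df.write.mode("overwrite").orc(dst)


def main() -> None:
    # Imported here so the helper above can be reused without loading pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-parquet-to-hdfs-orc").getOrCreate()
    # Hypothetical paths; s3a:// assumes hadoop-aws and S3 credentials are set up.
    parquet_to_orc(
        spark,
        "s3a://my-bucket/events/",
        "hdfs:///data/events_orc/",
        transform=lambda df: df.withColumnRenamed("ts", "event_time"),
    )
    spark.stop()


if __name__ == "__main__":
    main()
```

With `mode("overwrite")`, rerunning the job replaces the target directory; drop it if the job should instead fail when output already exists, which is Spark's default behavior.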

Cheers,

Kidong.

Written by Kidong Lee

Founder of Cloud Chef Labs | Chango | Unified Data Lakehouse Platform | Iceberg centric Data Lakehouses https://www.cloudchef-labs.com/