Jun 30, 2019
You could load the data with distcp, although I have not tried that in production.
But with Spark, you have more options and more control when loading S3 data into HDFS. For instance, you can transform a data model on S3 into a different one on HDFS, or load Parquet files from S3 and write them to HDFS as ORC files with ease.
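
To make the Parquet-to-ORC case concrete, here is a minimal PySpark sketch. It is only an illustration: the bucket, the paths, the column names, and the `copy_parquet_to_orc` and `run_job` helpers are all hypothetical, and it assumes the cluster already has the `hadoop-aws` (s3a) connector and S3 credentials configured. The overwrite mode is also just my choice for the example.

```python
def copy_parquet_to_orc(spark, src, dst, transform=None):
    """Read Parquet from src, optionally reshape it, and write ORC to dst."""
    df = spark.read.parquet(src)
    if transform is not None:
        # This hook is where the data model can be changed on the way in.
        df = transform(df)
    # Assumption: replacing any existing output at dst is acceptable.
    df.write.mode("overwrite").orc(dst)


def run_job():
    # Imported here so the helper above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-parquet-to-hdfs-orc").getOrCreate()
    copy_parquet_to_orc(
        spark,
        "s3a://example-bucket/events/",    # hypothetical source path
        "hdfs:///warehouse/events_orc/",   # hypothetical target path
        transform=lambda df: df.select("id", "ts"),  # hypothetical columns
    )
    spark.stop()
```

To try it, you would call `run_job()` at the end of a script and submit that script with `spark-submit`. Passing a different `transform` function is how the data model could be reshaped between S3 and HDFS.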
Cheers,
Kidong.