Kidong Lee
2 min read · Dec 30, 2021


Hello Kim Ted,

I tried to answer your questions.

1. Was wondering if the deployment could be just made with a yaml.

...

Regarding https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1116, which you referenced:

I am not sure, but I think `spark-submit` is already installed in the image `gcr.io/spark-operator/spark:v3.0.0`, so the Spark Thrift Server can be run with a Deployment manifest.

You have tested the Spark Thrift Server with deploy mode `cluster` using this Deployment manifest, which surprises me because, as far as I know, the Spark Thrift Server in version 3.x cannot run in `cluster` mode. That is why I wrote a wrapper class for it.
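For reference, here is a minimal sketch of what such a Deployment manifest could look like, assuming the `gcr.io/spark-operator/spark:v3.0.0` image, `client` deploy mode, and a `spark` service account with the RBAC to create executor pods. The jar path and class arguments are illustrative, not a tested configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-thrift-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-thrift-server
  template:
    metadata:
      labels:
        app: spark-thrift-server
    spec:
      serviceAccountName: spark           # assumed: has RBAC to create executor pods
      containers:
      - name: thrift-server
        image: gcr.io/spark-operator/spark:v3.0.0
        command: ["/opt/spark/bin/spark-submit"]
        args:
        - --master
        - k8s://https://kubernetes.default.svc
        - --deploy-mode
        - client                           # thrift server 3.x: client mode only, as noted above
        - --class
        - org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
        - --conf
        - spark.kubernetes.container.image=gcr.io/spark-operator/spark:v3.0.0
        - local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar  # placeholder jar
        ports:
        - containerPort: 10000             # HiveServer2 thrift port
```

A headless Service in front of the pod would still be needed so executors can reach the driver in client mode.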

2. Are there any reasons u made it with a spark-operator?

...

I also benchmarked spark-on-k8s-operator before, but I think it is not a good fit for my open source data platform, DataRoaster, which needs more control over the Spark applications it runs. That is why I wrote the DataRoaster spark operator.

With the Deployment manifest approach you used, you can certainly run Spark applications on Kubernetes. But I want DataRoaster to have more control over Spark applications like the Spark Thrift Server through custom resources.
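To illustrate the custom-resource idea, here is a hypothetical resource shape. The API group, kind, and field names below are purely illustrative and are not the actual DataRoaster CRD schema:

```yaml
# Hypothetical custom resource: field names are illustrative,
# not the real DataRoaster CRD schema.
apiVersion: spark-operator.example.io/v1
kind: SparkThriftServer
metadata:
  name: thrift-server
spec:
  sparkVersion: "3.0.3"
  image: cloudcheflabs/spark:v3.0.3
  executor:
    instances: 2
    memory: 2g
```

The point of a custom resource is that the operator's controller watches it and reconciles the running pods against the declared spec, so updating the resource (for example, changing executor instances) updates the deployment, which a plain Deployment manifest wrapping `spark-submit` cannot do on its own.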

3. Parameter change would be a burden since it has to go through mvn and docker build.

...

I think, generally, there are two steps to run Spark applications.

First, you should build a Spark container image yourself, or you can use a prebuilt Spark image like `cloudcheflabs/spark:v3.0.3`.

Second, you have to build your Spark application, for instance with Maven to package an uber jar of the application in the case of Java or Scala.
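A rough command-line sketch of these two steps, assuming a Spark distribution checkout; the repository name, tags, jar path, and main class are placeholders:

```shell
# Step 1: build your own Spark container image with Spark's bundled tool
# (or skip this and use a prebuilt image such as cloudcheflabs/spark:v3.0.3)
./bin/docker-image-tool.sh -r myrepo -t v3.0.3 build
docker push myrepo/spark:v3.0.3

# Step 2: package your Spark application as an uber jar (Java/Scala)
mvn clean package -DskipTests

# Then submit, pointing at the image and the packaged jar
/opt/spark/bin/spark-submit \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --conf spark.kubernetes.container.image=myrepo/spark:v3.0.3 \
  local:///opt/spark/jars/my-app-uber.jar
```

So a parameter change only forces a rebuild when it is baked into the jar or the image; anything passed via `--conf` or application arguments can change at submit time.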

Cheers,

- Kidong

Written by Kidong Lee

Founder of Cloud Chef Labs | Chango | Unified Data Lakehouse Platform | Iceberg centric Data Lakehouses https://www.cloudchef-labs.com/
