Kidong Lee

218 Followers


Published in ITNEXT

·5 hours ago

Trino Gateway reloaded in Chango Cloud in Preview

Trino Gateway is a proxy that dynamically routes Trino queries to upstream backend Trino clusters. As I mentioned in the previous posts, "Route trino queries dynamically using Trino Gateway" and "Trino Gateway", if one of the backend Trino clusters is exhausted, Trino Gateway will route the queries to…

Trino

4 min read


Published in ITNEXT

·Dec 10, 2022

Simple Streaming Application to ingest Json to Iceberg Table in S3

If you want to build streaming applications that ingest events into storage like S3, you may consider an event streaming platform like Kafka, together with streaming applications such as Spark Streaming jobs that consume events from Kafka topics, transform them, and save them to S3. But for some reasons, such as management issues, if…

Iceberg

8 min read


Published in ITNEXT

·Oct 14, 2022

The way to integrate Trino ETL Jobs using dbt-trino with Airflow on Kubernetes

Trino is one of the most popular query engines for data lakehouses. Recently, Trino can also run long-running ETL jobs with the fault-tolerant execution configuration, in addition to interactive queries, which means, I think, that you can replace Hive with Trino in most cases. Such Trino ETL jobs…

Dbt

6 min read


Published in ITNEXT

·Oct 9, 2022

Things to consider to submit Spark Jobs on Kubernetes in cluster mode

Submitting Spark jobs to Kubernetes is hard. As mentioned in the previous post, Hive on Spark in Kubernetes, which showed Spark Thrift Server being submitted to Kubernetes as an ordinary Spark job, there are many things to consider when submitting Spark jobs to Kubernetes. …

Spark

9 min read


Published in ITNEXT

·Jul 14, 2022

Trino Gateway

Previously, I described how to build a Trino gateway that routes queries dynamically to downstream Trino clusters in this article. Here, I am going to show an improved Trino gateway architecture that uses DataRoaster Trino Controller to control all the Trino ecosystem components, which makes it easy to create a Trino gateway. Trino Gateway Architecture

Presto

5 min read


Published in ITNEXT

·Jun 15, 2022

Route trino queries dynamically using Trino Gateway

Trino is a popular engine for querying data in data lakehouses. Suppose there is just one big Trino cluster consisting of many nodes; such a big cluster can be problematic for some organizations. Trino can be used for both ETL workloads and interactive queries. …

Kubernetes

8 min read


Published in ITNEXT

·Dec 12, 2021

Hive on Spark with Spark Operator

Spark Thrift Server is used as a Hive Server whose execution engine is Spark. As mentioned in Hive on Spark in Kubernetes, Spark Thrift Server can be deployed onto Kubernetes. In that case, spark-submit installed on a local machine was used to submit Spark Thrift Server to Kubernetes. There is another way…

Kubernetes

9 min read


Published in ITNEXT

·Sep 9, 2021

DataRoaster is now open-sourced, why I created it

DataRoaster is a tool that provides data platforms running on Kubernetes. I have recently open-sourced it. Before I developed DataRoaster, I used free data platforms like HDP (Hortonworks Data Platform) to build data lakes. After Hortonworks was acquired by Cloudera, HDP was no longer free. To build a data lake…

Hive

2 min read


Published in ITNEXT

·May 21, 2021

Trino on Nomad

Trino (formerly PrestoSQL) is a popular distributed interactive query engine for data lakes. Trino can be used not only as a query engine but also as a data preparation engine in a data lake. As a data platform component, Trino is one of my favorites to use in a data lake. …

Presto

9 min read


Published in ITNEXT

·May 20, 2021

Elasticsearch on Nomad

As I mentioned in the previous post, I have been looking for an alternative to Kubernetes to deploy stateful applications on container orchestrators. Generally, stateful applications need volumes to persist data. The volumes should be provisioned dynamically when the stateful application job is submitted. But currently, Nomad does not support…

Nomad

12 min read

Founder of Cloud Chef Labs Inc. (http://www.cloudchef-labs.com) | Creator of DataRoaster (https://bit.ly/3BM0ccA)
