Route trino queries dynamically using Trino Gateway
Published in ITNEXT · Jun 15 · Kubernetes · 10 min read
Trino is a popular query engine for querying data in data lakehouses. Suppose there is just one big Trino cluster consisting of many nodes; such a large cluster can be problematic for some organizations. Trino can be used for both ETL workloads and interactive queries. …

Hive on Spark with Spark Operator
Published in ITNEXT · Dec 12, 2021 · Kubernetes · 9 min read
Spark Thrift Server is used as a Hive Server whose execution engine is Spark. As mentioned in Hive on Spark in Kubernetes, Spark Thrift Server can be deployed onto Kubernetes. In that case, spark-submit installed on a local machine was used to submit Spark Thrift Server to Kubernetes. There is another way…

DataRoaster is now open-sourced, why I created it
Published in ITNEXT · Sep 9, 2021 · Hive · 2 min read
DataRoaster is a tool that provides data platforms running on Kubernetes. I recently open-sourced it. Before developing DataRoaster, I used free data platforms like HDP (Hortonworks Data Platform) to build data lakes. After Hortonworks was acquired by Cloudera, HDP was no longer free. …

Trino on Nomad
Published in ITNEXT · May 21, 2021 · Presto · 9 min read
Trino (formerly PrestoSQL) is a popular distributed interactive query engine for data lakes. Trino can be used not only as a query engine but also as a data preparation engine in a data lake. As a data platform component, Trino is one of my favorites to use in a data lake. …

Elasticsearch on Nomad
Published in ITNEXT · May 20, 2021 · Nomad · 12 min read
As I mentioned in the previous post, I have been looking for an alternative to Kubernetes to deploy stateful applications on container orchestrators. Generally, stateful applications need volumes to persist data. The volumes should be provisioned dynamically when the stateful application job is submitted. But currently, Nomad does not support…

Hive Metastore on Nomad
Published in ITNEXT · May 13, 2021 · Nomad · 7 min read
Hive Metastore is one of the most important components in a data lake, where it serves as the data catalog. Many execution engines use Hive Metastore, for instance Spark, Presto, Hive on Tez, Hive on Spark, etc. …

Provision Volumes on Kubernetes and Nomad using Ceph CSI
Published in ITNEXT · May 7, 2021 · Kubernetes · 10 min read
If you want to run stateful applications on Kubernetes, CSI (Container Storage Interface) plays an important role in provisioning volumes dynamically. Generally speaking, CSI is used to provision volumes from storage systems not only for Kubernetes but also for other container orchestrators such as Mesos and Nomad. …

Install Nomad Cluster
Apr 8, 2021 · Nomad · 9 min read
Kubernetes is a popular container orchestrator nowadays. I have also proposed the concept of a private cloud platform based on Kubernetes, and I have implemented a multi-tenant data platform for this concept. From my experience building my platform, it is great to run stateless applications on Kubernetes, but I…

Secure MinIO At-Rest with Vault
Published in ITNEXT · Mar 16, 2021 · Minio · 13 min read
If you want to expose an S3 service using MinIO, you should consider securing MinIO data both In-Transit and At-Rest. In the previous post, I talked about securing MinIO In-Transit. In this post, I am going to show how to make MinIO secure At-Rest with Vault. There are so…

Secure MinIO Not on Kubernetes
Feb 21, 2021 · Minio · 6 min read
MinIO is a popular S3-compatible object storage that can run on Kubernetes. MinIO running on Kubernetes can be managed easily in the standard Kubernetes way, but you also have to worry that MinIO StatefulSets could be restarted for unexpected reasons (for instance, StatefulSet Pods can be…