Column Stores, Column-Oriented, noSQL Data Structures, etc

Published by Jared Kunz on January 7, 2021

In 2020, during several job interviews with various “FAANG” companies, I found that my knowledge of column store databases was lacking. So I wrote this draft blog post with a bunch of links, planning to polish it later. I don’t recall what the original vision for this post was. Instead, I will call it “a list of good links and some definitions pulled from various websites for learning about column stores and related database technologies”:

https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes

https://www.geeksforgeeks.org/aggregate-functions-in-sql/

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-ver15

Parquet

“Parquet isn’t a database. Instead, it’s a file format that can be used to store database tables on distributed file systems like HDFS, Ceph or AWS S3. Parquet stores data in chunks that contain blocks of column data. This layout makes it possible to break up a Parquet file, store the pieces on many distributed hosts, and process them in parallel. You can access Parquet files using Apache Spark, Hive, Pig, Apache Drill and Cloudera’s Impala.”

For more information on Parquet: https://parquet.apache.org/documentation/latest/
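The quote describes “chunks that contain blocks of column data.” Here is a toy sketch of that idea in plain Python. It is not the Parquet format: there is no encoding, compression, schema or file metadata. The function names `write_row_groups` and `read_column` are hypothetical. The sketch splits rows into row groups and, inside each group, keeps each column’s values together. This loosely mirrors Parquet’s row groups and column chunks:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]  # column name -> that column's values within this group


def write_row_groups(rows: List[Row], columns: List[str], group_size: int) -> List[RowGroup]:
    """Split rows into row groups; inside each group, store one list per column."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose this slice of rows into per-column chunks.
        groups.append({col: [row[col] for row in chunk] for col in columns})
    return groups


def read_column(groups: List[RowGroup], column: str) -> List[Any]:
    """Rebuild one column by reading only that column's chunk from each group."""
    values: List[Any] = []
    for group in groups:
        values.extend(group[column])
    return values


rows = [{"id": i, "region": "north" if i % 2 else "south", "amount": i * 10} for i in range(5)]
groups = write_row_groups(rows, ["id", "region", "amount"], group_size=2)
print(len(groups))                     # 3
print(read_column(groups, "amount"))   # [0, 10, 20, 30, 40]
```

Each row group is self-contained. A real system could therefore place different groups on different hosts and scan them in parallel. A reader that only needs `amount` never has to look at `id` or `region`.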

“Column store databases store data in columns instead of rows. This makes it possible to compute statistics on those columns one to two or more orders of magnitude faster than on traditional row-oriented databases.

A column-oriented table is very good for analytics but usually terrible for traditional transactional workloads.

Most, although not all, column stores are designed to operate on a distributed cluster of servers.”
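The trade-off in the quote above can be shown with a minimal in-memory sketch in plain Python. It does not model any particular database. The same small table is held twice: once as a list of row tuples and once as one list per column. The "values read" counter is a crude stand-in for disk reads, assumed only for illustration. Averaging one column walks a single list in the columnar layout, but visits every field of every row in the row layout. Inserting one record into the columnar layout, by contrast, has to touch every column list:

```python
from statistics import mean
from typing import Dict, List, Tuple

COLUMNS = ("id", "region", "price")

# Row-oriented: each record's fields are stored together.
row_store: List[Tuple[int, str, float]] = [
    (1, "east", 10.0),
    (2, "west", 20.0),
    (3, "east", 30.0),
]

# Column-oriented: each column's values are stored together.
column_store: Dict[str, list] = {
    name: [row[i] for row in row_store] for i, name in enumerate(COLUMNS)
}


def avg_price_rows(rows) -> Tuple[float, int]:
    """Average 'price' from the row layout, counting every field visited."""
    values_read = 0
    prices = []
    for row in rows:
        values_read += len(row)  # stand-in for I/O: the whole row is read for one field
        prices.append(row[2])
    return mean(prices), values_read


def avg_price_columns(cols) -> Tuple[float, int]:
    """Average 'price' from the column layout; only that one list is read."""
    prices = cols["price"]
    return mean(prices), len(prices)


def insert_columns(cols, record) -> int:
    """Append one record to the column layout; returns how many lists were touched."""
    for name, value in zip(COLUMNS, record):
        cols[name].append(value)
    return len(COLUMNS)


print(avg_price_rows(row_store))        # (20.0, 9)
print(avg_price_columns(column_store))  # (20.0, 3)
```

In the row layout, the same insert would be a single `row_store.append(record)`. This is one intuition for why column stores suit analytics and row stores suit transactional workloads.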

https://www.quora.com/Which-NoSQL-database-is-most-suitable-for-GROUP-BY-Aggregation-queries-on-large-dataset

Which NoSQL database is most suitable for GROUP BY/aggregation queries on a large dataset? (Answer by Andrew Patterson, Software Engineer, April 25, 2018)

“Column-oriented databases are your best bet because the values of each field (column) are stored together on disk, thereby minimising seeks and hopefully making queries faster.”
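Consider a query such as `SELECT region, SUM(price) FROM sales GROUP BY region`, where the `sales` table and its columns are hypothetical. It only needs two columns, so a columnar engine can stream those two arrays side by side and skip the rest of the table. The sketch below shows this as a simple hash aggregation in plain Python. The function name `group_by_sum` is an assumption, and the sketch does not describe how any specific engine executes the query:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_sum(keys: List[str], values: List[float]) -> Dict[str, float]:
    """Hash aggregation: SUM(values) grouped by keys, reading two columns in lockstep."""
    if len(keys) != len(values):
        raise ValueError("columns must have the same length")
    totals: Dict[str, float] = defaultdict(float)
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


# Two columns of the hypothetical 'sales' table; no other column is read.
region = ["east", "west", "east", "north", "west"]
price = [10.0, 20.0, 30.0, 5.0, 2.5]

print(group_by_sum(region, price))  # {'east': 40.0, 'west': 22.5, 'north': 5.0}
```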

Free and open-source software (FOSS)

Database Name	Implementation Language	Notes
Apache Druid	Java	started in 2011 for low-latency massive ingestion and queries
Apache Kudu	C++	released in 2016 to complete the Apache Hadoop ecosystem
Calpont InfiniDB	C++
ClickHouse	C++	released in 2016 to analyze data that is updated in real time
CrateDB	Java
C-Store
Greenplum Database	C
PostgreSQL cstore_fdw[1], vops[2]	C	cstore_fdw uses ORC format
MariaDB ColumnStore	C & C++	formerly Calpont InfiniDB
MapD	C++
Metakit	C++
MonetDB	C
Scylla (database) Open Source	C++

Platform as a Service (PaaS)

Amazon Redshift
Microsoft Azure SQL Data Warehouse
Google BigQuery
Oracle Autonomous Datawarehouse Cloud Service
Scylla (database) Cloud
Snowflake Computing
MariaDB SkySQL

Proprietary

Actuate Corporation BIRT Analytics ColumnarDB
Dimensional Insight
Endeca
EXASOL
eXtremeDB
IBM Db2
Infobright
KDB
kdb+
memSQL
Microsoft SQL Server 2012
Oracle Database (in-memory option)[3]
Oracle Exadata
ParAccel
SAND CDBMS
SAP HANA[4]
SAP IQ
Scylla (database) Enterprise
SenSage
SQream
Teradata
Vector, formerly Vectorwise
Vertica (developed from open-source C-Store)
