Crate: Your Elastic Data Store

Scalable. Simple. Fast.

Crate is a distributed system that runs on a single machine or on a cluster of machines, and it ships as one complete install package. It builds on established open source components (Presto, Elasticsearch, Lucene, Netty) and extends them with core functionality such as read/write support, an SQL language layer, a dashboard and a query console.

All nodes in a Crate cluster are equal, which keeps configuration simple. Crate can also use all cluster resources for any function when needed.

On each node, the Handler Side component receives and processes client requests. After parsing, SQL queries are analyzed and executed in a distributed manner. To calculate the final result, remote nodes collect intermediate results and merge them in a subsequent phase. To make BLOB delivery even faster, nodes can also stream results directly.
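
The collect-then-merge flow can be illustrated with a small sketch. This is not Crate's actual execution engine: the function names `collect_on_node` and `merge_on_handler`, the `country` column and the three simulated nodes are all assumptions made for illustration. Each node aggregates only its own rows, and a later phase merges the partial results.

```python
from collections import Counter


def collect_on_node(rows):
    """Collect phase (hypothetical): aggregate only the rows held locally."""
    partial = Counter()
    for row in rows:
        partial[row["country"]] += 1
    return partial


def merge_on_handler(partials):
    """Merge phase (hypothetical): combine the per-node partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Three simulated nodes, each holding a slice of the same table.
node_rows = [
    [{"country": "AT"}, {"country": "DE"}],
    [{"country": "AT"}, {"country": "US"}],
    [{"country": "AT"}, {"country": "DE"}],
]

partials = [collect_on_node(rows) for rows in node_rows]
result = merge_on_handler(partials)
print(result)
```

Because each node ships only a small partial aggregate rather than its raw rows, the merge step stays cheap even when the underlying table is large.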


Crate.IO has built a new breed of database to serve today’s mammoth data needs. Based on familiar SQL syntax, Crate combines high availability, resiliency and scalability in a distributed design that lets you query mountains of data in real time, not in batches. It is also 100% open source. We built it so that developers with a data-intensive back end won’t need to “glue” several technologies together to store documents and blobs and to support real-time search. We also wanted to help developers avoid the manual work associated with tuning, sharding, replication and the other operations required to keep a large denormalized data store in good shape. We wanted a simple, failure-tolerant and massively scalable data store that anyone can use, on a single machine, on many machines or in the cloud.

A data store for data-intensive apps would need to be persistent, horizontally scalable and support these five requirements:

(1) store documents, (2) store blobs, (3) find them again, (4) run queries (analytics) and (5) allow straightforward changes to the data structure when needed.

This is where the medley of technologies comes in. Couldn’t developers just glue together several technologies and get the same results? Here are some examples:

  • MongoDB + Elasticsearch + GridFS. GridFS would provide blob storage, Elasticsearch the search and MongoDB the rest. Changing the data structure may be difficult, though, and scaling and multi-node environments may be hard to manage.
  • Riak + Solr + Rados. Rados for blob storage, Solr for searches and Riak for the rest. This is very similar to the first option and likewise shares its issues.
  • CouchDB (document store and analytics) + Elasticsearch (search) + HDFS (blob storage, archiving) + Hadoop (analytics and changes). This solution would exhibit many of the issues associated with Hadoop, including questions about its suitability for heavy read and write workloads.

Let’s see how Crate deals with these five requirements while also providing a self-healing and self-configuring data store.

  1. Storing Documents
    Crate.IO stores documents through the built-in functionality of Lucene and Elasticsearch. Crate adds dump/restore functionality for major version upgrades, which brings point-in-time recovery to the next level.
  2. Storing Blobs
    Blob storage is handled transparently by the Crate blob implementation, which is built on the Elasticsearch cluster semantics for distribution, replication and allocation. It simply stores blobs directly on the user-space filesystem.
  3. Finding Documents
    Crate uses Elasticsearch for search – that’s pretty much the best there is for speed.
  4. Real-time data analytics using SQL
    Crate.IO enables real-time data analytics using SQL through its distributed query analyzer, planner and execution engine. This adds advanced grouping and sorting to provide SQL-like “group by … having …” and “order by” functionality. Think of Map/Reduce, but in real time.
  5. Making changes without re-doing everything
    Remember the good old days when restructuring an entire relational database was possible with a single SQL script? Crate implements that functionality by borrowing the ability to put the result of a query directly into one or more indexes. This requirement becomes even more pressing in document-based databases, where data is often stored in a highly denormalized format.
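
Requirements 4 and 5 can be sketched in a few lines. The example below does not talk to Crate: it uses Python's built-in `sqlite3` module purely as a stand-in, so that the statements run anywhere, and Crate's own SQL dialect may differ in detail. The `events` and `user_totals` tables and their columns are hypothetical. It first runs a “group by … having … order by” query, then restructures the data by writing the result of a query directly into a new table.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, page TEXT, duration_ms INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "/home", 120),
        ("u1", "/pricing", 300),
        ("u2", "/home", 80),
        ("u3", "/home", 200),
        ("u3", "/docs", 50),
        ("u3", "/pricing", 400),
    ],
)

# Requirement 4: analytics with GROUP BY ... HAVING ... ORDER BY.
top_pages = conn.execute(
    """
    SELECT page, COUNT(*) AS hits
    FROM events
    GROUP BY page
    HAVING COUNT(*) >= 2
    ORDER BY hits DESC, page
    """
).fetchall()

# Requirement 5: restructure by storing a query result in a new table.
conn.execute("CREATE TABLE user_totals (user_id TEXT, total_ms INTEGER)")
conn.execute(
    """
    INSERT INTO user_totals (user_id, total_ms)
    SELECT user_id, SUM(duration_ms) FROM events GROUP BY user_id
    """
)
user_totals = conn.execute(
    "SELECT user_id, total_ms FROM user_totals ORDER BY user_id"
).fetchall()

print(top_pages)
print(user_totals)
```

The second statement is the interesting one for denormalized data: the new structure is derived entirely from a query, so no export, transform and re-import cycle is needed.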

Use Case Examples

Get real-time answers from your huge datasets. Collect large amounts of data fast. Mix all kinds of data. Reduce expensive computing times. Quickly launch as many nodes as you want. Use it as open source or buy support and SLAs. Crate’s speed opens up new ways to think and new features to offer. Here are some samples:

  • Analyze billions of events that get logged on websites. Save computing time on the cloud (responses in seconds, not hours) and implement new features for customers or dynamic decisions on content rendering.
  • Mobile app developers – build a backend that scales with the success of your downloads using straightforward SQL commands, and stop worrying about scalability problems. Use existing templates for iOS and Xcode.
  • Internet of things – provide a backend that constantly records sensor events from hundreds of thousands of devices. Deliver real-time responses when users access their smart home app on the web or on mobile.
  • Track the massively parallel activity of millions of users for a paywall or ad system. Make real-time decisions, but also allow queries for long-term analysis.


Linux, Mac OS X or Windows
Java 7
(for the command line client)

Clients and Tools

Crate ships with a built-in Administration Interface. The command line interface (Crate Shell – CraSh) allows interactive queries. Crate’s Python client is the most advanced and features SQLAlchemy integration.
