Thursday, April 24, 2014

In-memory data and compute on top of Hadoop

Hadoop gives us dramatic volume scalability at low cost. But core Hadoop is designed for sequential access (write once, read many times), which makes it a poor fit for real-time or online applications. Put a distributed in-memory tier in front of it and you get the best of both worlds: very high speed and concurrency together with the ability to scale to very large volumes. We present a seamless integration of in-memory data grids with Hadoop that enables several interesting design patterns:

- ingesting raw or processed data into Hadoop;
- random reads and writes on operational data in memory, alongside massive historical data in Hadoop, with O(1) lookup times;
- zero-ETL MapReduce processing;
- deep-scale SQL processing on data stored in Hadoop;
- easily pushing analytic models built in Hadoop back out into memory.

We introduce the ideas and walk through code samples using Pivotal's in-memory real-time platform and the Hadoop platform, as sketched below.
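To make the memory-plus-HDFS idea concrete, here is a minimal JDBC sketch of the operational-data and SQL-on-Hadoop patterns: rows are written to an in-memory SQL table whose colder data overflows to an HDFS store, while primary-key reads are served from memory. The JDBC URL and port, the table and store names, and the HDFSSTORE DDL keywords are assumptions modeled on GemFire XD's SQL interface; treat them as illustrative and verify them against the version you run.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class InMemoryHadoopSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL for an in-memory SQL data grid fronting Hadoop
        // (e.g. a GemFire XD locator); adjust host, port and driver jar for your install.
        String url = "jdbc:gemfirexd://localhost:1527/";

        try (Connection conn = DriverManager.getConnection(url)) {
            try (Statement ddl = conn.createStatement()) {
                // An HDFS-backed store plus an in-memory table that persists to it.
                // The HDFSSTORE / NAMENODE / HOMEDIR keywords follow the documented
                // GemFire XD DDL as best I recall; check them against your release.
                ddl.execute("CREATE HDFSSTORE tradesStore "
                        + "NAMENODE 'hdfs://namenode:8020' HOMEDIR '/data/trades'");
                ddl.execute("CREATE TABLE trades ("
                        + "  id BIGINT PRIMARY KEY, symbol VARCHAR(10), qty INT, ts TIMESTAMP) "
                        + "PARTITION BY PRIMARY KEY PERSISTENT HDFSSTORE (tradesStore)");
            }

            // Random writes land in the in-memory tier and stream to HDFS asynchronously.
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO trades VALUES (?, ?, ?, CURRENT_TIMESTAMP)")) {
                insert.setLong(1, 42L);
                insert.setString(2, "PVTL");
                insert.setInt(3, 100);
                insert.executeUpdate();
            }

            // Primary-key reads are served from memory with O(1) lookup; misses can
            // fall through to the historical data held in the HDFS store.
            try (PreparedStatement query = conn.prepareStatement(
                    "SELECT symbol, qty FROM trades WHERE id = ?")) {
                query.setLong(1, 42L);
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " x " + rs.getInt(2));
                    }
                }
            }
        }
    }
}

The point of the pattern is that the application only ever speaks SQL over JDBC; the grid decides which rows stay in memory and which are evicted or persisted to Hadoop, so there is no separate ETL step between the online and analytical copies of the data.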
