Thursday, April 24, 2014

Your Data, Your Search, Elasticsearch

Finding relevant information fast has always been a challenge, even more so in today's growing "oceans" of data. This talk explores the area of real-time full text search, using Elasticsearch, an open-source, distributed search engine built on top of Apache Lucene. The session will showcase how to perform real-time searches on structured and non-structured data alike, how to cope with types and suggestions, do social graph filters and aggregations for efficient analytics. All from a Spring perspective Last but not least, the presentation focuses on the Hadoop platform and how Map/Reduce, Hive, Pig or Cascading jobs can leverage a search engine to significantly speed up execution and enhance their capabilities.

The presentation covers architectural topics such as index scalability, data locality and partitioning, using off and on-premise storages (HDFS, S3, local file-systems) and multi-tenancy.

No comments: