DataCloud Blog

Cloud, IoT, Big data tools, Data analytics techniques and tutorials

Author: Peter Kortvelyesi

Analysis tools for big data

Common and popular open source analysis tools for big data

Configure Apache Kylin with ODBC to work with MS PowerBI

PowerBI and Kylin – reporting from Hadoop via ODBC. This article discusses how to set up…

Evaluation of Apache Kylin 1.5.4.1 with HDP 2.5, performance comparison with Hive

Apache Kylin is a data cube solution on top of Hadoop providing an ODBC interface…

Elasticsearch – Search and analyze in real-time

Elasticsearch is a real-time search and analytics engine. It is scalable, distributed and reliable. Entities…

Storm – Real-time data processing

Storm real-time data processor: while Hadoop is mainly used for batch processing of data, Storm…

Mahout – Data Mining, Machine Learning

Mahout is a machine learning library that can be used on top of Hadoop HDFS. It…

Weka – Data Mining, Machine Learning

Weka contains a set of machine learning algorithms for data mining tasks – available to…

R – Statistical Computing and Graphics Language

R is a multi-platform open-source language for statistical computing and visualization. It provides a wide…

Hadoop big data framework – Hadoop virtual machines

Hadoop is an open-source framework for processing large amounts of data across clusters of computers…