Journal of Advances in Technology and Engineering Research
Journal ISSN: 2414-4592
Article DOI:
Received: 4 May 2017
Accepted: 6 May 2017
Published: 21 August 2017
Analysis of real-time data with spark streaming

Nikitha Johnsirani Venkatesan, ChoonSung Nam, Earl Kim, Dong Ryeol Shin

Published online: 2017


Data analysis in real-world application domains is challenging. For example, thousands of gigabytes of multimedia data pour into social media every minute. Because social media platforms and most organizations deal with Big Data, tools such as Hadoop and Spark are well suited to handling it. However, Hadoop MapReduce analyzes data only in batch mode, which increases latency and makes real-time analysis difficult. To solve this problem, we used Spark Streaming for real-time data analysis. Spark Streaming iterates through the data much faster because it processes data in memory. This paper presents an online machine learning system for real-time data. Using Spark Streaming, data from an online messaging system are streamed into the local system. A streaming K-means algorithm is applied to cluster the languages used by people from various countries. Results show that predictions on the incoming data are accurate and fast when Apache Spark is used. Our results and methods are compared with those of other articles that have used Spark Streaming for real-time data processing. Queries such as total word count and segregation based on keywords are run, and the results are presented. The data are then stored on the local disk for future querying.
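
To illustrate the streaming K-means step mentioned in the abstract, the following is a minimal pure-Python sketch of a mini-batch center update with a decay factor, in the spirit of the update rule documented for streaming k-means in Spark MLlib. It is not the authors' implementation. The class name `StreamingKMeansSketch`, the `decay` parameter, and the toy 2-D points are assumptions made for illustration, and clustering messages by language would first require converting each message into a numeric feature vector.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(centers: List[List[float]], p: Point) -> int:
    """Return the index of the center closest to p (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], p)),
    )


class StreamingKMeansSketch:
    """Hypothetical helper: keeps k centers and refines them one batch at a time."""

    def __init__(self, initial_centers: List[Point], decay: float = 1.0):
        self.centers = [list(map(float, c)) for c in initial_centers]
        # Effective number of points each center has absorbed so far.
        self.weights = [0.0] * len(self.centers)
        # decay = 1.0 weighs all history equally; smaller values forget old data faster.
        self.decay = decay

    def update(self, batch: List[Point]) -> None:
        k, dim = len(self.centers), len(self.centers[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assign every point in the batch to its nearest current center.
        for p in batch:
            i = nearest(self.centers, p)
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += p[d]
        # Blend the old center (discounted by decay) with the new batch points.
        for i in range(k):
            old = self.weights[i] * self.decay
            total = old + counts[i]
            if total > 0:
                self.centers[i] = [
                    (self.centers[i][d] * old + sums[i][d]) / total
                    for d in range(dim)
                ]
            self.weights[i] = total

    def predict(self, p: Point) -> int:
        """Return the cluster index for a new point without changing the model."""
        return nearest(self.centers, p)


# Toy usage: two batches arriving one after another.
model = StreamingKMeansSketch([[0.0, 0.0], [10.0, 10.0]])
model.update([[1.0, 1.0], [3.0, 3.0], [9.0, 9.0], [11.0, 11.0]])
model.update([[4.0, 4.0]])
label = model.predict([9.0, 8.0])
```

Because each batch only adjusts the running centers, the model can assign a cluster to incoming records immediately, which is the property that makes this style of algorithm a natural fit for streamed data.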