Big Data Analytics Using Hadoop and Spark: Applications, Challenges, and Future Direction

Surya Veera Brahmaji Rao Sunnam

doi:10.32996/jcsts.2021.3.1.7

Research Article

Authors

Surya Veera Brahmaji Rao Sunnam, Lead Data Engineer, Bank of America, USA

Abstract

Big data describes datasets whose diversity, size, and complexity make them difficult to store, analyse, and visualise for later use. Artificial intelligence (AI)-based big data analytics has revolutionised the processing of massive datasets by leveraging distributed computing platforms such as Apache Hadoop and Apache Spark. Hadoop's HDFS and MapReduce address the Vs of big data, namely Volume, Velocity, Variety, and Veracity, by storing files and running batch analysis on massive, heterogeneous datasets. Spark extends these capabilities with in-memory computing, execution plans based on directed acyclic graphs (DAGs), and libraries for graph analytics, streaming, and machine learning. This paper outlines the Hadoop and Spark ecosystems, their architectures, and their use in distributed machine learning, real-time analytics, natural language processing (NLP), and anomaly detection. It also addresses critical issues such as scalability limitations, data protection, resource management, and infrastructure complexity. Finally, it discusses future directions, focusing on cloud-native designs, edge computing, hardware acceleration, and intelligent resource optimisation to improve the performance, efficiency, and flexibility of next-generation big data systems.

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

3 (1)

DOI

https://doi.org/10.32996/jcsts.2021.3.1.7

Pages

50-62

Published

2021-02-28

Copyright

Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
