3 reasons why microservices should be used for big data

Isabella Ferreira
Feb 7, 2023

When the COVID-19 pandemic started in 2020, businesses of all sizes, locations, and industries had to shift to remote work. This set off a continuous digital transformation, and the amount of data being produced is growing every day, now more than ever. In fact, according to IDC, the amount of data created globally will reach 181 zettabytes by 2025, and the big data market is expected to grow by over $100 billion by 2027. Given this volume of data, it is important that your company knows how to leverage big data effectively, and that is what this article is about: we present 3 reasons why microservices should be used for big data.

What is big data and what are its challenges?

Big data is not defined by the amount of data, since there is no exact rule about how large a database must be to be considered “big”. Instead, what defines big data is the need for new techniques and tools to process it [2]. Typically, you need programs running across multiple physical or virtual machines to process the data in a reasonable amount of time, and this poses several challenges.

First, storing huge amounts of data properly is challenging because most of it is unstructured, coming from documents, videos, audio, text files, and many other sources. Data from one or more sources must therefore be transformed before it can be processed and analyzed (a process called data ingestion), and this step can degrade the quality of the data by adding noise to the analyses. Second, teams may lack a proper understanding of big data. Furthermore, professionals know different technology stacks, and coordinating teams that work with different technologies is difficult. Finally, storing and accessing big data is costly because of its sheer volume.
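The ingestion step described above, mapping data from heterogeneous sources into one format before analysis, can be sketched as follows. This is a minimal illustration, not a prescribed design: the two source formats, their field names, the common schema, and the choice to set bad records aside (rather than repair them) are all hypothetical. The point is that an explicit, checked transformation keeps malformed input from silently becoming noise downstream.

```python
import json
from datetime import datetime, timezone

# Hypothetical common schema that every source is mapped into.
REQUIRED_FIELDS = ("source", "user_id", "timestamp", "text")


def from_csv_row(row):
    """Map a CSV-like row 'user_id,epoch_seconds,text' into the common schema."""
    user_id, epoch, text = row.split(",", 2)
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return {"source": "csv", "user_id": user_id.strip(),
            "timestamp": ts.isoformat(), "text": text.strip()}


def from_json_event(raw):
    """Map a JSON event (fields uid/ts/body; ts assumed ISO 8601) into the common schema."""
    event = json.loads(raw)
    return {"source": "json", "user_id": str(event["uid"]),
            "timestamp": event["ts"], "text": event.get("body", "").strip()}


def ingest(items):
    """Normalize (parser, payload) pairs. Records that fail to parse or lack
    a required field are set aside instead of flowing on as noise."""
    accepted, rejected = [], []
    for parser, payload in items:
        try:
            record = parser(payload)
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            rejected.append((payload, repr(exc)))
            continue
        if all(record.get(field) for field in REQUIRED_FIELDS):
            accepted.append(record)
        else:
            rejected.append((payload, "missing required field"))
    return accepted, rejected


if __name__ == "__main__":
    batch = [
        (from_csv_row, "42,1675728000,Order shipped"),
        (from_json_event, '{"uid": 7, "ts": "2023-02-07T09:30:00+00:00", "body": "Refund requested"}'),
        (from_csv_row, "not-a-valid-row"),
        (from_json_event, '{"uid": 8, "body": "no timestamp"}'),
    ]
    ok, bad = ingest(batch)
    print(len(ok), "accepted,", len(bad), "rejected")
```

Keeping the rejected records (with the reason) makes quality problems measurable instead of hidden in the analysis results.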

Why use microservices to handle big data?

Microservices have been around for a while. They are an architectural pattern in which an application is built from many small, interconnected services [1]. This architecture is well known for improving team productivity, providing better fault isolation, increasing scalability, and optimizing business functionality [3].

Microservices can help address some of the big data challenges described above. Below are three reasons why microservices should be used to handle big data.

  • Data quality: during the ingestion process, failures can occur at many points and damage the quality of the data. Since each microservice has a single, specific task, it is easier to create, maintain, and test each step that the data flows through during ingestion and transformation.
  • Scalability: scalability is a major benefit of the microservices architecture, and it can definitely help big data applications. Each service runs independently, and the servers hosting each microservice (usually in the cloud) can scale their resources up and down as needed.
  • Talent allocation: microservices can be developed in a variety of technology stacks, which makes it easier to hire talented people who can process the data with microservices written in the programming language they are most proficient in.
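As a minimal sketch of the scalability point, the function below applies the ratio-based rule that the Kubernetes Horizontal Pod Autoscaler documents for computing a replica count, evaluated separately for each service. Kubernetes itself, the CPU-utilization metric, the 60% target, the service names, and the min/max bounds are assumptions for illustration; the article does not name an orchestrator.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=20, tolerance=0.1):
    """Replica count for ONE service, following the ratio rule documented for
    the Kubernetes Horizontal Pod Autoscaler:
        desired = ceil(current_replicas * current_metric / target_metric)
    The count is left unchanged when the ratio is within `tolerance` of 1.0,
    and the result is clamped to [min_replicas, max_replicas]."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical services, each scaled on its own average CPU % (target 60).
    services = {"ingestion": (4, 90.0), "transformation": (6, 20.0), "reporting": (2, 62.0)}
    for name, (replicas, cpu) in services.items():
        print(name, replicas, "->", desired_replicas(replicas, cpu, target_metric=60.0))
```

Because each service is evaluated on its own metric, a busy ingestion service can grow while a quiet one shrinks, which is the independence the bullet describes.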

How can microservices be used to handle big data?

Below are two ways in which microservices can be used to handle big data.

  1. Integrating microservices with big data tools

The most straightforward way to handle big data with microservices is to integrate them with big data analytics tools. For example, TARS, an open source microservices framework, can integrate many well-known big data tools to store, process, analyze, and visualize huge amounts of data: the client application can print a log that is then sent to such tools, or it can send the data to them directly. Below are four well-known big data tools that can be integrated with TARS:

  • Apache Hadoop is a framework for storing and processing data at large scale. It runs on commodity hardware, which makes it easy to use in an existing data center or in the cloud.
  • Apache Spark keeps the data being processed in memory (rather than on disk), which can make some analyses run much faster.
  • Apache Kafka allows users to publish and subscribe to real-time data feeds.
  • Elasticsearch indexes structured and unstructured data so it can be searched and aggregated quickly to generate insights.
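The two integration paths mentioned for TARS, printing a log that is then sent to a big data tool or sending data to the tool directly, can be sketched in Python. This is not a TARS API. Path 1 uses the standard `logging` module to emit one JSON object per line, which a separate log shipper (assumed, not shown) could forward to a tool such as Elasticsearch. Path 2 uses a toy in-memory broker that mimics Kafka's publish/subscribe model of append-only topics with per-consumer-group read offsets; it is a stand-in, not Kafka's client API, and it omits partitions, replication, and persistence. Service, topic, and group names are hypothetical.

```python
import json
import logging
from collections import defaultdict
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Path 1: render each log record as one JSON object per line, so a
    separate log shipper can parse it without custom regexes."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        entry.update(getattr(record, "fields", {}))  # from extra={"fields": {...}}
        return json.dumps(entry)


class InMemoryTopicLog:
    """Path 2 (toy stand-in for a broker like Kafka): each topic is an
    append-only list, and each consumer group keeps its own read offset."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> messages, in publish order
        self._offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def publish(self, topic, message):
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1  # offset assigned to the message

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for `group` and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    # Path 1: a structured log line (written to stderr by StreamHandler).
    logger = logging.getLogger("order-service")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order processed", extra={"fields": {"order_id": "A-1001"}})

    # Path 2: publish an event; two consumer groups read it independently.
    broker = InMemoryTopicLog()
    broker.publish("orders", {"order_id": "A-1001", "status": "processed"})
    print(broker.poll("analytics", "orders"))
    print(broker.poll("search-indexer", "orders"))
```

Independent offsets are what let several downstream tools (for example, an analytics job and a search indexer) consume the same feed at their own pace.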

  2. Speeding up offline data analysis

There are two classes of technologies that support big data: online and offline big data technologies. Online big data systems can perform real-time analysis, i.e., data is prepared and analyzed as soon as it enters the database, allowing users to gain insights and draw conclusions immediately [4]. Social networking news feeds, real-time ad servers, and Customer Relationship Management (CRM) applications are examples of online big data systems. Because such analyses need to be done quickly, individual servers may be used to run them.
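To make "analyzed as soon as it enters" concrete, here is a minimal sketch of an online computation: a real-time ad server counting impressions per ad over the last 60 seconds, updating the result on every event instead of rescanning stored data. The class name, window length, and ad identifiers are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Per-key event counts over the last `window_seconds`, updated as each
    event arrives. Assumes timestamps arrive in non-decreasing order."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, key), oldest first
        self.counts = Counter()

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def top(self, n=3, now=None):
        """Most frequent keys in the window, optionally advancing the clock to `now`."""
        if now is not None:
            self._evict(now)
        return self.counts.most_common(n)

    def _evict(self, now):
        # Remove events at least `window_seconds` old and decrement their counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


if __name__ == "__main__":
    impressions = SlidingWindowCounter(window_seconds=60)
    for ts, ad in [(0, "ad-1"), (10, "ad-2"), (20, "ad-1"), (65, "ad-2")]:
        impressions.add(ts, ad)
        print(ts, impressions.top(2))
```

Each event costs a constant amount of work plus evictions, which is why this style of computation fits on a single dedicated server.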

In contrast, offline big data systems offer analytical capabilities for retrospective analyses that usually touch most or all of the data [5]. Apache Hadoop is an example of an offline big data technology. Because of the large amount of data processed, these analyses can take days. Although their results are less urgent than those of online systems, such analyses still require good performance and scalability, and this is where microservices come into play! With microservices, such analyses can be run at night, when most people are not using the system and computational resources are free, and because microservices scale easily, the analysis work can quickly be spread across idle services, so it finishes much faster than it would without microservices.
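A minimal sketch of the night-time batch idea follows. It assumes an off-peak window of 22:00 to 06:00, hypothetical service names, and an analysis whose per-chunk results can be merged by adding them (for example, counts). Threads stand in for idle services; in a real deployment each chunk would be sent to a separate service instance rather than a thread.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import time


def in_offpeak_window(now, start=time(22, 0), end=time(6, 0)):
    """True if `now` (a datetime.time) falls inside [start, end), which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def plan_batch(total_records, chunk_size, idle_workers):
    """Split the job into (start, end) record ranges, dealt round-robin to the
    currently idle workers (`idle_workers` must be a non-empty sequence)."""
    if chunk_size <= 0 or not idle_workers:
        raise ValueError("need a positive chunk size and at least one idle worker")
    chunks = [(s, min(s + chunk_size, total_records))
              for s in range(0, total_records, chunk_size)]
    plan = {w: [] for w in idle_workers}
    for i, chunk in enumerate(chunks):
        plan[idle_workers[i % len(idle_workers)]].append(chunk)
    return plan


def run_batch(records, plan, analyze):
    """Run each worker's chunks concurrently and merge the per-chunk results
    by summing them (assumes `analyze` returns an additive number)."""
    def work(chunks):
        return sum(analyze(records[s:e]) for s, e in chunks)

    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        return sum(pool.map(work, plan.values()))


if __name__ == "__main__":
    records = list(range(1_000))
    plan = plan_batch(len(records), chunk_size=100, idle_workers=["svc-a", "svc-b", "svc-c"])
    print({w: len(c) for w, c in plan.items()})
    print(run_batch(records, plan, analyze=len))  # total records processed
```

A scheduler would typically check `in_offpeak_window(datetime.now().time())` before launching the job, and recompute the plan whenever the set of idle services changes.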

TL;DR: Companies that leverage big data will have advantages in the market. Working with big data poses many challenges that can be addressed with microservices. A variety of open source tools can be used to analyze big data and can be integrated with microservices. Furthermore, offline big data systems may benefit from the scalability of microservices.

About the author:

Isabella Ferreira is an Ambassador at TARS Foundation, a cloud-native open-source microservice foundation under the Linux Foundation.

References:

[1] https://tarscloud.org/feeds/2113758633994533

[2] https://opensource.com/resources/big-data

[3] https://www.appdynamics.com/topics/benefits-of-microservices#~3-increased-scalability

[4] https://www.sisense.com/glossary/real-time-analytics/

[5] https://www.mongodb.com/scale/online-vs-offline-big-data

Isabella Ferreira

Data Scientist | Machine Learning Engineer | Software Engineering Researcher