
Do you really need to implement Big Data technologies in your ecosystem?

For many years now, “Big Data” has been widespread and trendy. Big Data technologies emerged to fill the gap between traditional data technologies (RDBMS, file systems …) and the rapid evolution of data and business needs.

While implementing these technologies is a must for many large-scale organizations to ensure business continuity, many organizations aim to adopt them without really knowing whether they will improve their business.

Before making your decision, there are many things you should take into consideration.


Knowing what Big Data is

Before asking whether your business needs Big Data technologies, you first have to know what Big Data is. I know this may look weird, but it really does happen.

Many times we were asked to develop a Big Data ecosystem for a company just because the size of its databases was close to 1 TB after years of storing data, when all it needed was to optimize its data storage and adopt some best practices.

Once, I was asked to build a “Big Data solution” just so that a company manager could search over hundreds of Excel files containing different information. In the end, the goal was achieved using some SQL Server Integration Services packages, a small data warehouse and a Power BI project.

Just googling the term “Big Data” will lead you to many articles describing it, which should clear up any misunderstanding.

Big Data is not only data with a huge volume; it has other characteristics as well, such as velocity, veracity and variety.


What technologies do you need?

If you are sure that you have Big Data, then you have to choose the right technology for your needs.

There are some myths about Big Data technologies: many people think that Big Data always means Hadoop, while others think that Hadoop is a database management system.

As a data engineer, before choosing the right technologies I have to study what type of data we need to handle: real-time or at rest, online or offline, structured or unstructured. Then I need to specify the ecosystem layers and choose the relevant technologies for each layer, after studying other factors such as available resources and cost.


Big Data Ecosystem example (Project called ORADIEX)


In general, there are some common ecosystem layers:

  • Data ingestion layer (reading data from data sources): there are many tools, such as Apache Kafka, Sqoop and others.

  • Data Processing layer (Data cleansing, aggregation): Apache Spark, Storm, Hive, Pig, MapReduce …

  • Raw Data storage (a data lake that stores ingested data as it comes, without processing): Hadoop

  • Processed Data storage (a database or data warehouse that stores data after it has been processed and cleaned): Apache Cassandra, InfluxDB, MongoDB, Apache Hive …

  • Data Visualization layer (Drawing real-time graphs): Kibana, Grafana …

  • Data Retrieval layer (searching over huge amounts of data): you can use distributed search engines such as Solr, Elasticsearch, Sphinx, Manticore …
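
The layers above can be written down as a simple lookup table, which may help when deciding which parts of the ecosystem a project actually needs. The following Python snippet is only a minimal sketch: the tool names are copied from the list above, while the dictionary keys and the candidate_tools helper are hypothetical names introduced for illustration, not part of any real library or of the ORADIEX project.

```python
# Illustrative only: the tools are those named in the list above.
# The layer keys and the helper function are assumptions made for this sketch.
ECOSYSTEM_LAYERS = {
    "ingestion": ["Apache Kafka", "Sqoop"],
    "processing": ["Apache Spark", "Storm", "Hive", "Pig", "MapReduce"],
    "raw_storage": ["Hadoop"],
    "processed_storage": ["Apache Cassandra", "InfluxDB", "MongoDB", "Apache Hive"],
    "visualization": ["Kibana", "Grafana"],
    "retrieval": ["Solr", "Elasticsearch", "Sphinx", "Manticore"],
}


def candidate_tools(required_layers):
    """Return the candidate tools for each layer the project needs."""
    unknown = [layer for layer in required_layers if layer not in ECOSYSTEM_LAYERS]
    if unknown:
        raise ValueError(f"Unknown layer(s): {unknown}")
    return {layer: ECOSYSTEM_LAYERS[layer] for layer in required_layers}


if __name__ == "__main__":
    # Hypothetical project that only needs ingestion, processing and dashboards.
    needed = ["ingestion", "processing", "visualization"]
    for layer, tools in candidate_tools(needed).items():
        print(f"{layer}: {', '.join(tools)}")
```

Listing only the layers a project requires makes it easier to see that not every ecosystem needs every layer; the final choice within each layer still depends on factors such as available resources and cost.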


Big Data technologies implementation cost

The second important thing is to know the implementation cost (money, time, staff …).

Even though most Big Data technologies are open source and free of charge, they require expensive resources and specialized staff, and clear technical documentation for them is often lacking online.

You should not think only about the price of the technologies without taking other factors into consideration.


Conclusion

In the end, there are other factors that you may take into consideration, but you have to know that Big Data technologies are not magic wands.

Sometimes your company only needs to implement some best practices or perform some optimizations, and the current system will serve you for many years before you need to adopt new technologies.

On the other hand, when you really need Big Data technologies, you will not need much time to adopt them.


Source: Do you really need to implement Big Data technologies in your ecosystem? | by Hadi Fadlallah | Towards Data Science
