Data Lakehouse: Benefits in small and medium enterprises

Authors

  • Blaž Čuš
  • Darko Golec, PhD (IBM Slovenija)

DOI:

https://doi.org/10.32015/JIBM.2022.14.2.5

Keywords:

Data Lakehouse, Data Warehouse, Data Lake, Data Analysis, Enterprise

Abstract

This article argues that small and medium-sized enterprises need to collect and process both structured and unstructured data in order to advance their business. The Data Warehouse architecture as it is known today is being superseded by the broader Data Lakehouse, which is based on open data formats with direct access and offers strong support for machine learning and data science. Many enterprises still process only structured data, which leaves them technologically behind the competition. For these enterprises, a Data Lakehouse can help solve major challenges that arise with Data Warehouses and Data Lakes, including data obsolescence, unmanageable data, and misplaced data. The article presents how small and medium-sized enterprises are choosing to make the technological shift toward a Data Lakehouse, and how this shift supports their strategic decisions and further steps toward faster growth and development.

Published

2023-04-11

Section

Professional article

How to Cite

Čuš, B., & Golec, D. (2023). Data Lakehouse: Benefits in small and medium enterprises. Mednarodno Inovativno Poslovanje = Journal of Innovative Business and Management, 14(2), 1-10. https://doi.org/10.32015/JIBM.2022.14.2.5