Data Lakehouse: Benefits in small and medium enterprises
DOI:
https://doi.org/10.32015/JIBM.2022.14.2.5
Keywords:
Data Lakehouse, Data Warehouse, Data Lake, Data Analysis, Enterprise
Abstract
This article argues that collecting and processing structured and unstructured data is necessary for small and medium-sized enterprises to advance their business. The Data Warehouse architecture as it is known today is being superseded by the broader Data Lakehouse, which is based on open data formats with direct access and has excellent support for machine learning and data science. For enterprises that currently process only structured data and are therefore technologically behind the competition, a Data Lakehouse can help solve major challenges associated with Data Warehouses and Data Lakes, including data obsolescence, unmanageable data, and misplaced data. The article presents how small and medium-sized enterprises are choosing to make a technological shift toward the Data Lakehouse, which supports their strategic decisions and further steps toward faster growth and development.
License
The authors retain rights under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). Authors assign copyright or license the publication rights in their articles, including abstracts, to MIP=JIBM. This enables us to ensure full copyright protection and to disseminate the article, and of course MIP=JIBM, to the widest possible readership in electronic format. Authors are themselves responsible for obtaining permission to reproduce copyrighted material from other sources.