Data Lakes vs Data Warehouses

Data warehouses, traditionally popular for business intelligence tasks, are being replaced by less-structured data lakes, which allow more flexibility.

By Sundeep Sanghavi (Co-founder & CEO, DataRPM), June 2014.

A major shift is under way in the enterprise data world: data warehouses, which have been the foundation for business intelligence and data discovery for several decades, are being made obsolete by the emergence of data lakes.

The limitation of data warehouses is that they store data from various sources in specific static structures and categories, and those structures dictate, at the very point of entry, the kind of analysis that is possible on that data. This was sufficient during the early stages of business intelligence, when analysis was done primarily on proprietary databases and the scope was restricted to canned reports and dashboards with limited, pre-defined interaction paths. The approach has started to fall apart in the world of big data discovery, where it is very difficult to ascertain upfront all the intelligence and insights one could derive from the variety of sources that keep cropping up, including proprietary databases, files, third-party tools, social media and the web.

One may have an initial list of questions to answer during the setup phase, but the real questions only emerge once one starts analyzing the data. The ability to navigate from a starting question or data point in different directions, slicing and dicing the data in whatever ad-hoc way the train of thought of the analysis demands, is essential for real data discovery.
For example, someone may start with a question like “What was the total revenue last year from North America?” on a transaction database, then slice the result by “North American states” and further by the “Demographics of the buyer” from the CRM database. They may then correlate it with the “Ads Campaigns” from the ad platforms to analyze the effectiveness of marketing spend, and navigate from there to evaluate the impact of the efficiency and timelines of their “Delivery Logistics” on repeat sales, using the GPS data of vehicles. All of these analyses happen on an ad-hoc basis, with new data sources added on the fly as the user’s thought process requires for decision-making. The traditional approach of manually curated data warehouses, which provide a limited window onto the data and are designed to answer only the specific questions identified at design time, therefore no longer makes sense for data discovery in today’s big data world.
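The first few steps of that drill-down can be sketched in a few lines of Python. This is only an illustration of ad-hoc, cross-source slicing over records kept close to their raw form; the record layouts, field names (`region`, `state`, `age_band`) and values are hypothetical and are not taken from any real system or from DataRPM.

```python
from collections import defaultdict

# Hypothetical raw records, kept close to the form each source emits them.
transactions = [
    {"order_id": 1, "customer_id": "c1", "region": "North America", "state": "CA", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "region": "North America", "state": "NY", "amount": 80.0},
    {"order_id": 3, "customer_id": "c3", "region": "Europe", "state": None, "amount": 200.0},
    {"order_id": 4, "customer_id": "c1", "region": "North America", "state": "CA", "amount": 50.0},
]

# Hypothetical CRM source, brought in only when a question needs it.
crm = {
    "c1": {"age_band": "25-34"},
    "c2": {"age_band": "35-44"},
    "c3": {"age_band": "25-34"},
}


def total_by(records, key_fn):
    """Sum 'amount' grouped by whatever key the analyst picks at query time."""
    totals = defaultdict(float)
    for record in records:
        totals[key_fn(record)] += record["amount"]
    return dict(totals)


# Starting question: total revenue from North America.
north_america = [t for t in transactions if t["region"] == "North America"]
total_revenue = sum(t["amount"] for t in north_america)

# Follow-up: slice the same result by state.
by_state = total_by(north_america, lambda t: t["state"])

# Next follow-up: join the CRM source on the fly and slice by buyer demographics.
by_age_band = total_by(north_america, lambda t: crm[t["customer_id"]]["age_band"])

print(total_revenue)
print(by_state)
print(by_age_band)
```

The grouping key is passed in as a function, so each new question is a new `key_fn` over the same raw records instead of a change to a pre-defined schema. A production data lake would do this with a query engine, not in-memory lists.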
This is where data lakes excel, and why the world is now shifting away from data warehouses to data lakes. A data lake is a hub or repository of all the data that an organization has access to, where the data is ingested and stored in as close to its raw form as possible, without enforcing any restrictive schema. This provides an unlimited window onto the data for anyone to run ad-hoc queries and perform cross-source navigation and analysis on the fly. Successful data lake implementations respond to queries in real time and provide users with an easy, uniform access interface to the disparate sources of data.

DataRPM offers an integrated data lake and data discovery platform for the modern business. We are to the world of big data discovery for enterprises what Google is to the world of information discovery for the web. We have pioneered a Smart Machine that delivers the following:
With DataRPM, the machines do all the heavy lifting that implementing a data lake and big data discovery platform entails. All that a business needs to do is specify the connection parameters for the different data sources and then start asking questions in natural language, to get answers that they can interact with and collaborate on in place with stakeholders, anytime and anywhere. DataRPM empowers any user to become a data scientist and leverage the true power of data intelligence in the fastest, easiest, most affordable, most scalable and most natural way, thereby delivering data democracy in any organization.

Source: https://www.kdnuggets.com/2014/06/data-lakes-vs-data-warehouses.html

Disclaimer: The above article is published here, in addition to being linked from other pages of the Big Data Space website, so that visitors can still read it if the link to the original article breaks.