Data Lakes Described

An image of the data lake is found here: but understanding it first requires laying a proper conceptual foundation in modern business. There are two terms to address:

  1. Data Lake
  2. Single Layer

A data lake is a repository of data that can be accessed in a flow-type model. A pipeline or aqueduct from a lake to a city is exactly this analogy. The lake is only useful if the pipeline is built and the distribution system gives cheap, affordable access to anyone who needs it. The pipeline must also match what the receiver can absorb: it would be useless to a home that can consume only 100 gallons of water a month if it supplies 1,000 gallons/second. Without the proper infrastructure to support the delivery of such vast quantities of water, the home would be destroyed and its foundation eroded.
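To put a number on the mismatch in the analogy above, here is a minimal sketch of the arithmetic. The 30-day month and the function names are assumptions made for illustration; only the two rates come from the text.

```python
# Assumption: a month is treated as 30 days for this rough comparison.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def monthly_supply(gallons_per_second: float) -> float:
    """Convert a per-second delivery rate into gallons delivered per month."""
    return gallons_per_second * SECONDS_PER_MONTH


def oversupply_factor(supply_per_second: float, demand_per_month: float) -> float:
    """How many times the monthly supply exceeds the monthly demand."""
    return monthly_supply(supply_per_second) / demand_per_month


# Figures from the analogy: 1,000 gallons/second supplied, 100 gallons/month consumed.
factor = oversupply_factor(1_000, 100)
print(f"The pipeline delivers {factor:,.0f} times what the home can use.")
```

Under these assumptions the pipeline delivers roughly 26 million times what the home can consume, which is why the delivery infrastructure matters as much as the lake itself.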

Similarly, in the current year we face the problem that every end user is a kind of small pond of data, but the data generally flows only one way: from small ponds to big lakes, from user to producer. The producer is similarly challenged to distribute data in a way that is equitable to both producer and consumer without violating some concept of privacy. This is the importance of having a ‘single layer’.

This concept challenges normal thinking in that producers should have their hands on data solely at the consumer’s request. The current arrangement creates the inefficiency previously described: data is pooled at the producer level and does not flow back to the consumer, creating an unequal exchange of equity in our day-to-day transactions that neither enhances service offerings nor supports any further exploration of the problem.

The proposal is that the data lake exist as an intermediary between producer and consumer, able to direct the flow of data, and access to it, toward proper infrastructure that allows its efficient use.

We are further challenged in modernity in that data flows only one way, from consumer to producer, in the form of transactions, likes, dislikes, and images. Nothing flows back from producer to consumer in equivalent exchange. As a result, these lakes are pooling in vast, inaccessible amounts that equate to stagnant waters.

A home that uses only 100 gallons/month but is delivered 1,000 gallons per second is destroyed and eroded. A home that returns only 10 gallons to a 10,000-gallon lake never fills it back up. It is with these concepts in mind that the data lake proposal comes to fruition. The pipelines need to carry consistent and equal amounts of data interchange so that their total equity can be accessed by both consumer and producer.