Lifecycle of Data

Photo by rawpixel on Unsplash

Everything that lives has three stages in its lifespan: birth, death, and the time between the two. Let’s see if we can define the same stages for data.

Birth — Recorded

Any piece of data is born when observations are recorded. These observations can be a doctor’s observations about a patient’s health, recorded on a health report; a poet’s thoughts, recorded as poetry; a scientist’s calculations, recorded as equations and proofs; or an IoT sensor’s temperature readings, recorded in a cloud database. The term “recorded” is important because data exists only when it can be consumed. It can be consumed when it is accessible, and it is accessible when it is recorded and stored. The thoughts in our heads are not data, but once we write them down they become data. For example, until yesterday, while I was still thinking it through, this piece of text was not data; today it is.

Life — Storage and Analysis

Once data is born (recorded), it is ready for consumption. Throughout its life, while it is being consumed, data is transformed, analysed, tampered with, sometimes corrupted, and restored, and the cycle continues. The one thing common to all of these phases is storage. Homeless data is no data. As soon as data is recorded, it is already stored (be it on a piece of paper or in a data center under the ocean).

Side note: Most of the time, storage defines how the data lives. If the storage is not digital, the consumption options are limited, so we convert the data into digital form so that it can be consumed by digital and online systems. When the storage is not secure, the data becomes vulnerable to tampering and corruption. When the storage is not scalable, it becomes hard for the data to reach its consumers. Hence, it’s important that we give our data a home that is easy to reach, secure, and scalable.

Death — Discarded

After all the analysis, once we have extracted the results we wanted, the data is mostly useless to us. It is either archived or deleted. Sometimes it simply becomes meaningless (we find another dataset that tells a better story). This stage is a bit vague, because a dataset that was discarded after its initial use is sometimes picked up again for a completely different purpose. Loosely, I think we can say that data dies when it can no longer be used.

Data, Lifecycle, Data Analytics, Data Science