Friday, June 19, 2020

Dark Data and ML

Data can be divided into three categories:
  • Critical Business Data: the data required for the day-to-day operations of the business. It allows the business to run and grow.
  • Trivial Data: data that is never used and has no importance to the business.
  • Dark Data: data hidden within your internal systems, folders, sources and networks. It can hold a large amount of information that is useful to the business, and once identified it can be moved into the Critical Business Data category.

According to a recent IBM study, over 80% of all data is dark and unstructured, and IBM estimated that this would rise to 93% by 2020. Examples of Dark Data include:

  • Spreadsheets
  • Analytics reports and survey data
  • Multiple old versions of documents
  • Email attachments that are downloaded and then ignored
  • Inactive databases with unused customer information
  • Project notes and learnings

In short, Dark Data is the data left behind by various processes and scattered across every level of the business. Some people consider it unnecessary and ignore it, yet it can be highly valuable for making business decisions.

Dark Data is spread across spreadsheets, emails, zip files, documents and images stored on various servers, and Machine Learning algorithms can help to handle it. AI, Machine Learning and analytics can systematically identify data that is rarely used and flag data that is obsolete.
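As a rough illustration of the "identify rarely used data" step, the sketch below walks a folder and lists files that have not been accessed or modified for a long time. It is a simple age-based heuristic, not a Machine Learning model, and it could serve as a first filter before any learned approach is applied. The function name `find_stale_files` and the one-year threshold are assumptions made for this example. Note that some file systems do not update access times reliably, so the result should be read as a hint only.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days=365, now=None):
    """Return (path, idle_days) pairs for files idle longer than max_idle_days.

    Hypothetical helper: "idle" means neither accessed nor modified.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the later of last access and last modification as "last used".
            last_used = max(info.st_atime, info.st_mtime)
            idle_days = (now - last_used) / 86400
            if idle_days > max_idle_days:
                stale.append((str(path), round(idle_days)))
    # Longest-idle files first, since these are the likeliest dark or obsolete data.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling `find_stale_files("/path/to/shared/folder")` would return the candidates for review; deciding whether a file is trivial, dark or still critical remains a business decision.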

Answering a query may require aggregating data, which in turn requires integration so that data from different sources can be accessed together. Machine Learning can make this process more efficient by automatically mapping fields between the sources and the data repository.
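To make the idea of automatic mapping concrete, here is a minimal sketch that matches column names from a source (for example a spreadsheet) to field names in a repository using fuzzy string similarity from Python's standard `difflib` module. This is a stand-in for a learned matcher, not a Machine Learning model; the function names, the sample column names and the 0.6 similarity cutoff are all assumptions made for illustration.

```python
import difflib


def normalise(name):
    """Lower-case a name and drop spaces, underscores and punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_columns(source_columns, repository_fields, cutoff=0.6):
    """Map each source column to its closest repository field, or None.

    Hypothetical helper: similarity is plain string similarity, so
    synonyms such as "client" and "customer" will not be matched.
    """
    targets = {normalise(field): field for field in repository_fields}
    mapping = {}
    for column in source_columns:
        matches = difflib.get_close_matches(
            normalise(column), list(targets), n=1, cutoff=cutoff
        )
        mapping[column] = targets[matches[0]] if matches else None
    return mapping


if __name__ == "__main__":
    # Illustrative names only; real sources will differ.
    print(map_columns(["Cust Name", "E-mail", "zzz"],
                      ["customer_name", "email", "phone"]))
```

Columns mapped to `None` are the ones a person would still need to review, which keeps the manual effort focused on the ambiguous cases.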
 
