AI will drive demand for unstructured data management in the enterprise 

British mathematician Clive Humby declared in 2006 that “data is the new oil,” but at the time, the statement had far less meaning than it does today.

AI is now the processing power refining the world's vast data stores into the fuel that will drive business (and society) forward. Humby was ahead of his time, but no one could have predicted how quickly the data and AI landscape would change with the launch of ChatGPT in the fall of 2022.

The distinction today is that unstructured data is at the heart of everything. It is both oil and electricity for modern enterprises. It comprises at least 80% of all data created and stored today, and it carries both enormous potential and significant liabilities. On the potential side, AI and machine learning engines rely on large quantities of unstructured data to work their magic. Leveraging this data for new value is imperative for organizations in every sector and geography.

The liabilities of unstructured data come from the sky-high costs of managing and protecting it and from the security and compliance risks that arise when data is stored and shared without proper guardrails. A 2023 survey on unstructured data management conducted by Komprise found that 32% of organizations manage 10PB of data or more. That equates to 110,000 ultra-high-definition (UHD) movies, or half of the data stored by the U.S. Library of Congress. Further, most (73%) organizations spend over 30% of their IT budget on data storage.

As you may imagine, respondents told us that cost optimization was one of their top data storage priorities, although we also learned that preparing for AI now ranks ahead of it for enterprise IT organizations. Delivering data services such as file search and tagging for departmental users was also identified as a top priority.

What this all means is that IT leaders have challenging and complex mandates concerning unstructured data: They need to optimize costs by reducing waste and overspending on storage and backup technologies while at the same time creating the right infrastructure and tools for departments and users to safely leverage data for new value, including the latest generation of AI technologies. 

Here are a few additional details from the research: 

  • Most organizations (90%) allow employee use of AI, and the majority (65%) have a policy in place to govern it.
  • Violation of privacy and security is the top concern for corporate AI use, followed by data provenance and risks from inaccurate or biased data. 
  • The top unstructured data management challenge is moving data without disrupting users and applications (47%), followed closely by preparing for AI and cloud services (46%).  
  • 44% of the respondents identified monitoring and alerting for capacity issues and anomalies as the most important future capability of software solutions in 2023.  
  • Policy-based automation, such as moving data to cold storage or confinement for deletion, was the second-highest future software need (41%), followed by the familiar theme of self-service access for line-of-business IT teams and researchers.
  • Data protection was the top new use case identified for unstructured data management in 2023, similar to the 2022 survey.

The potential of unstructured data in AI

Across sectors, there is vast potential for using unstructured data in AI to improve the bottom line and create new valuable products and services. In healthcare, AI is delivering more accurate, faster analysis of common scans such as mammograms, cardiograms, and colonoscopies and helping medical leaders deliver better preventive care by analyzing chronic disease data. Generative AI solutions are reducing clinicians’ paperwork burden and even improving communications between physicians and their patients.  

In the automotive sector, AI will play an increasing role in improving safety and performance by analyzing sensor data (another exploding type of unstructured data) continuously collected during rides. In agriculture, mining data on weather, soil conditions, and more with generative AI tools can identify risks and opportunities for farm operators, according to McKinsey.

In the volatile energy sector, AI is analyzing climate data so that utilities can see when and where disaster might strike, and it is supporting proactive grid management and predictive maintenance to prevent outages.

The potential use cases for AI in every industry are limitless. 

Why unstructured data needs a management strategy 

However, production AI deployments like these are still in the very early stages at most enterprise IT organizations. So, where do you start? First, get a handle on the data.

According to IDC’s 2023 report, “What Every Organization Needs to Know About Unstructured Data,” only half of an organization’s unstructured data is analyzed to extract value. Only 58% of unstructured data is reused more than once after its initial use. This is a missed opportunity, considering that most enterprises have petabytes of data under management. 

Beyond the ever-present need to reduce the costs of unstructured data storage and protection, enterprises need to corral this data better so it can be reused in AI. Both issues have been challenging to resolve because unstructured data is spread across storage silos, from the data center to the edge to the cloud. IT directors need help understanding how much data they have, how fast it is growing, what types of data they have, and how important it is to users. These data points are vital to making the right decisions about how that data is managed.

For example, by understanding data access patterns, you can move cold data that has not been accessed for more than a year to lower-cost secondary storage, such as the cloud. This can dramatically cut annual storage and backup costs, especially as archival data may need fewer or zero backup copies. 
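As a rough illustration of the access-pattern idea, the sketch below walks a directory tree and lists files whose last-access time is more than a year old, along with the total bytes they occupy. It is a minimal example under stated assumptions, not how any particular product works: the function names `find_cold_files` and `summarize` are hypothetical, the 365-day threshold simply mirrors the "more than a year" example above, and it relies on file system access times, which some systems update lazily or not at all.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_cold_files(root, max_idle_days=365, now=None):
    """Return (path, size_in_bytes) for files under `root` whose last-access
    time is older than `max_idle_days`.

    Assumption: st_atime is meaningful on this file system. Volumes mounted
    with options such as noatime will not give reliable results.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            if info.st_atime < cutoff:
                cold.append((path, info.st_size))
    return cold


def summarize(cold_files):
    """Return (file_count, total_bytes) for a list from find_cold_files."""
    total_bytes = sum(size for _path, size in cold_files)
    return len(cold_files), total_bytes
```

Running `summarize(find_cold_files("/some/share"))` on a share (the path here is a placeholder) gives a first estimate of how much data is a candidate for lower-cost storage. A real assessment would also weigh ownership, file type, and compliance rules before anything is moved.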

Unstructured data management solutions can play a pivotal role here. They index data across all storage vendors, including the cloud, analyze data usage and spending by department, model the savings of moving data to different storage, and empower users with deep search capabilities to quickly find specific data sets as needed for analytics projects.

These solutions also offer ways to create automated workflows so that data tagged with certain identifiers, such as “Project X,” can be managed differently from other data sets. Or, when integrated with third-party AI tools, unstructured data management platforms can scan, analyze, and augment files with new metadata to classify them further. 
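To make the tagging idea concrete, here is a small sketch of a tag-driven policy lookup. Everything in it is illustrative: the `FileRecord` class, the rule table, and the action names are hypothetical and do not describe any vendor's API. Only the "Project X" tag comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """A file plus the metadata tags attached to it (hypothetical model)."""
    path: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical rules, checked in order; the first matching tag wins.
POLICY_RULES = [
    ("Project X", "keep-on-primary-storage"),
    ("archive-candidate", "move-to-cold-storage"),
]
DEFAULT_ACTION = "standard-lifecycle"


def action_for(record, rules=POLICY_RULES, default=DEFAULT_ACTION):
    """Return the action of the first rule whose tag is on the record,
    or the default action when no rule matches."""
    for tag, action in rules:
        if tag in record.tags:
            return action
    return default
```

For instance, `action_for(FileRecord("/data/report.pdf", frozenset({"Project X"})))` returns `"keep-on-primary-storage"`, while an untagged file falls through to the default. Ordering the rules gives a simple way to resolve conflicts when a file carries more than one tag.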

The point is that we can no longer treat all unstructured data the same. Data should be managed across its lifecycle and according to changing organizational needs and priorities, including compliance requirements and research and analytics needs.  

Platforms that can look across storage and manage unstructured data independently of where it is stored represent a valuable and exciting software sector. With modern data management and AI tools, we are finally entering the era that Clive Humby foretold, where data is the new oil (or perhaps even energy) that will sustain the next generation of innovation.

Prateek Kansal is the Head of Engineering & India Operations at Komprise. 

Image Source: Freepik
