Unstructured data management has a new job description 

For most of the last decade, unstructured data management was a storage problem. Move it, tier it, archive it, forget it. The tools built to handle file and object data were designed around one core objective: keep costs down and keep the lights on. That era is over. 

Unstructured data, which includes user documents, multimedia files, logs, emails, research and instrument data and anything else not in a database, now represents roughly 80–90% of all enterprise data. Now that enterprises are storing petabytes, or even dozens of petabytes, of this data, its needs and requirements have shifted dramatically: 

  • AI, which runs on this data, requires it to be clean, high quality and cataloged. As CXOs demand AI strategies and clear pathways to ROI without incurring risk, unstructured data has quickly become both a critical enterprise asset and a liability. 
  • Costs to store and back it up escalate every year, with annual growth rates of 20% or higher. The situation has worsened significantly in 2026 due to SSD and DRAM shortages and 30–100% price surges from IT infrastructure and hardware vendors. 
  • Security teams understand the compounding risk of unstructured data because it is often unmanaged, widely and easily accessed, and shared across teams and geographies.  

The software category built to manage this data is evolving rapidly to meet these demands.  

What independent unstructured data management actually means 

Given the commonplace use of this term across storage and data management vendors today, let’s review the category as it has developed in recent years. Independent unstructured data management software operates across storage environments, including on-premises NAS and object stores, cloud storage and edge locations, to deliver analysis, movement and workflows holistically. 

These platforms are distinct from the management consoles bundled with your NetApp, Pure Storage, Dell, Qumulo or VAST systems, or even your Amazon S3 buckets. This independence matters: when your unstructured data spans three or four vendor-native tools, you must patch together partial views of a problem that requires one complete picture. Nor can you execute data management policies across hybrid storage environments with a storage- or cloud-vendor-centric tool. And storage vendors' methods for tiering data to low-cost storage are proprietary, disruptive to users, and limit savings and flexibility. 

Modern, storage-agnostic platforms can turn unstructured data liability into a cost-efficient, governed, AI-ready asset that is the foundation for organizational success. 

Four forces driving the new face of unstructured data management 

Cost optimization in hybrid IT 

Few organizations have accurate, deep analytics and visibility into their unstructured data: its types, sizes, growth rates, locations, departmental trends, access patterns, and what it costs to store and move. This lack of visibility makes it difficult or impossible to right-place data as it ages or becomes less valuable to the organization. Duplicate and orphaned data can also be rampant and ripe for deletion, yet too many organizations lack insight here as well. The time to get deep visibility for data lifecycle management is now, as the SSD price surge driven by widespread memory shortages may only get worse. 

  • Policy-driven, flexible tiering based on actual access patterns and showback reporting that ties storage costs to business units are standard capabilities in mature platforms.  
  • This gives IT a credible basis for cost-accountability conversations with finance and enables analytics-driven lifecycle management and capacity reclamation.   
  • File-based tiering, unlike storage vendors' block-level tiering, allows full file access at the destination and zero rehydration costs when moving tiered data to new storage.   
  • Data access for tiered data should be simple and transparent, and the solution should never impede hot data performance. 
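To make policy-driven tiering concrete, here is a minimal sketch of an access-pattern-based cold-data scan. It is illustrative only: the threshold, function names and hot/cold plan are hypothetical, not any product's API, and a real platform would also weigh file type, department and cost data.

```python
import os
import time

# Hypothetical policy: files untouched for more than COLD_AFTER_DAYS
# are candidates for a low-cost object tier.
COLD_AFTER_DAYS = 365

def is_cold(path, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Return True if the file's last access is older than the threshold."""
    now = now or time.time()
    age_days = (now - os.stat(path).st_atime) / 86400
    return age_days > cold_after_days

def plan_tiering(paths):
    """Split files into hot/cold sets based on actual access times."""
    plan = {"keep_hot": [], "tier_cold": []}
    for p in paths:
        (plan["tier_cold"] if is_cold(p) else plan["keep_hot"]).append(p)
    return plan
```

The point of the sketch is the shape of the policy: the decision is driven by observed access behavior, not by which array the file happens to sit on.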

AI data preparation & workflows 

Every enterprise AI initiative eventually runs into the same wall: the data isn’t ready. Models need clean, labeled, contextually rich training data; unstructured data in most organizations is a sprawling mess of inconsistent metadata, redundant files, and content that nobody has touched in years. The newest generation of unstructured data management tools is attacking this problem directly, with automated metadata enrichment, content classification engines, and governance capabilities to protect sensitive data from ingestion into AI agentic workflows and RAG pipelines. 

  • IT teams that can reliably index, tag, curate and deliver high-quality unstructured data become genuine enablers of AI projects rather than a bottleneck.   
  • IT also needs ways to prevent AI processing and storage waste. This comes down to automated workflows that curate exactly the right amount of data for AI and no more, and that delete copies sent to storage for AI processing once the job is complete. 
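A curation workflow of this kind can be sketched as a tag-and-budget filter over a metadata catalog. Everything here is hypothetical (the catalog schema, the function name, the size budget); it simply shows the idea of sending an AI job only the data it needs and no more.

```python
def curate_for_ai(catalog, required_tags, max_bytes):
    """Select files whose enriched metadata matches the AI job's tags,
    stopping once a storage/processing budget is reached.

    `catalog` is a hypothetical list of {"path", "size", "tags"} records
    produced by metadata enrichment and classification.
    """
    selected, total = [], 0
    for item in sorted(catalog, key=lambda i: i["size"]):
        if not required_tags.issubset(item["tags"]):
            continue  # wrong content class: never reaches the AI pipeline
        if total + item["size"] > max_bytes:
            break  # budget reached: stop to avoid processing waste
        selected.append(item["path"])
        total += item["size"]
    return selected
```

The same filter is where governance hooks in: a tag such as `"contains_pii"` can be used to exclude sensitive content from RAG pipelines before ingestion rather than after.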

Ransomware and cybersecurity resilience 

Unstructured file stores are among the most attractive ransomware targets in the enterprise. They’re massive, often inconsistently monitored, and frequently connected to systems across the organization. Recovery from a ransomware event hitting unstructured data has historically been slow, expensive, and incomplete. 

Modern unstructured data management platforms are building security capabilities directly into the data layer:  

  • Sensitive data detection and mitigation; 
  • Behavioral anomaly detection on file access patterns; 
  • Tighter integration with immutable snapshot and air-gap technologies; and  
  • Ransomware defense by tiering cold data to immutable object storage that attackers can’t touch. This can shrink the attack surface by up to 80%. 
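Behavioral anomaly detection on file access patterns can be as simple in principle as comparing each user's current activity to a historical baseline. The sketch below is a deliberately crude stand-in (real platforms use richer signals such as entropy of written data, rename storms and time-of-day profiles); the names and the 10x factor are assumptions.

```python
from collections import Counter

def flag_anomalies(access_log, baseline_per_user, factor=10):
    """Flag users whose file-touch count exceeds `factor` times their
    historical per-window baseline.

    `access_log` is a hypothetical list of (user, path) events for one
    monitoring window; `baseline_per_user` maps user -> typical count.
    """
    counts = Counter(user for user, _path in access_log)
    return sorted(
        user for user, n in counts.items()
        if n > factor * baseline_per_user.get(user, 1)
    )
```

A user suddenly touching hundreds of files in a window where they normally touch a handful is exactly the signature of an encryption run in progress, which is why this check belongs in the data layer rather than only at the perimeter.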

Unstructured data governance and compliance 

The compliance surface area for unstructured data has expanded significantly. GDPR, HIPAA, CMMC, NIST, SOC 2, the EU AI Act, and a growing body of state-level data privacy regulation all touch unstructured content in ways that legacy storage tools were never designed to address. 

Watertight compliance programs require policy-based automated retention and deletion, sensitive data discovery that can identify PII and regulated content at scale, full audit trails tied to content classification, and role-based permissions that follow data across environments.  
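As one illustration of policy-based retention and sensitive-data discovery, consider the sketch below. The retention schedule, classification labels and SSN regex are all hypothetical and far simpler than production classifiers; the sketch only shows how retention decisions and PII flags key off content classification.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical retention schedule: days to keep, by classification label.
RETENTION_DAYS = {"financial": 7 * 365, "hr": 3 * 365, "general": 365}

# Simplistic US SSN-shaped pattern, for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def retention_action(label, created, now=None):
    """Return 'retain' or 'delete' for a record, per its classification."""
    now = now or datetime.now(timezone.utc)
    keep = timedelta(days=RETENTION_DAYS.get(label, 365))
    return "delete" if now - created > keep else "retain"

def contains_pii(text):
    """Flag obvious SSN-shaped strings so the file can be classified
    as regulated content (a real system uses far richer detectors)."""
    return bool(SSN_PATTERN.search(text))
```

In practice the interesting part is the audit trail: each `delete` or PII flag would be logged against the classification that triggered it, which is what auditors ask to see.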

What IT managers need to watch  

The capabilities described above represent genuine progress, but they also raise the bar for the teams deploying them. These are no longer tools that a storage administrator configures once and monitors occasionally. Teams must think beyond traditional storage metrics and capacity planning to data policy, classification logic and cross-functional governance. 

There is also an organizational dimension that IT leaders would be unwise to underestimate. Decisions about data retention, AI readiness, and compliance posture now intersect directly with legal, security, and finance. IT teams that position themselves as strategic partners in those conversations will have influence over outcomes, processes and resources. 

Addressing unstructured data management objections 

Enterprise adoption of new data management solutions can stall around a predictable set of objections. They’re worth addressing directly. 

  • “We already have tools from our storage vendors so why add another layer?” Storage vendor tools are optimized for their own platforms. In a hybrid environment where data moves across multiple systems and clouds, they provide partial visibility at best, and tiering capabilities that don’t save enough. 
  • “We don’t have the budget or headcount for this right now.” The cost of a ransomware recovery event, a failed compliance audit, doubled prices for new flash storage, or an AI project delayed by data readiness failures is increasingly quantifiable. Build a business case with concrete numbers from your own environment to tell the story. 

Seizing the unstructured data business opportunity 

The teams that will navigate the next three to five years of enterprise IT most effectively are those that stop treating unstructured data management as a housekeeping function and start treating it as the lever to create new organizational value with AI. The practical question for any IT leader is this: does your current toolset give you genuine visibility and control across all four of these dimensions? If it doesn’t, the gap between where you are and where your organization needs you to be is your roadmap. 

Authored by Prateek Kansal, Senior VP of Engineering & Operations, Komprise. 
