The Danger Lurking in Dark Data and the Importance of Strong Information Governance

Astrophysicists hypothesize that as much as 85 percent of the matter in our universe is dark matter, an enigmatic material that we cannot see directly yet whose existence we can infer.

Today's modern enterprise, with its sprawling, interconnected systems and distributed applications that extend across geographies, ecosystems, and supply chains, has a counterpart in "dark data": data that exists everywhere yet is often obscured from sight. In fact, dark data is estimated to comprise 80 percent or more of most companies' total data volume.

According to Gartner, dark data is "the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes." Dark data is most often composed of unstructured data, meaning data that is user-generated and not stored in tidy applications. Examples of unstructured dark data range from documents and messages in Slack or Teams to file metadata, multimedia files, and legacy data from former employees or mergers and acquisitions (M&As).

And just as the cosmos is continuously expanding, so too is the universe of enterprise data. However, just because it's hard to see all this dark unstructured data doesn't mean organizations can afford to ignore it.

As too many businesses have learned the hard way, large accumulations of unmanaged dark data can lead to major privacy violations, data breaches, and reputational damage, not to mention the significant expense of needlessly storing so much data. Unstructured dark data represents an existential challenge for the modern enterprise and, as such, requires focused attention. It is a discrete issue for two reasons: few vendors address it, and most companies mistakenly assign ownership to the IT department instead of a well-versed information governance or compliance team.

When Big Data Goes Dark

Complicating efforts to rein in dark data is the simple fact that data volumes continue to grow rapidly. It wasn't all that long ago that a terabyte was considered unwieldy. Now it's estimated that organizations will collect and store tens of millions of petabytes of data over the next few years. Given that the vast majority of data housed by businesses is unstructured, meaning it cannot be stored and organized in a traditional relational database, the places where data can go dark will only expand. Common sources of unchecked dark data growth include:

Employee data. When an employee leaves an organization, IT will typically take a snapshot of their laptop before wiping it clean and will dump the data on a server somewhere to deal with later. These mini-caches of personal data are an attractive target for an opportunistic threat actor.

Post-M&A. Following a merger or acquisition, the acquiring company will work to quickly integrate the acquired company's data into its environment. Typically, in such a scenario, the data of the acquired business will also be copied, moved off-site, and (all too often) forgotten. Since this mirrored source of data often contains personally identifiable information, or PII, this represents another potential dark data risk.

Big data mashups. Not only are data volumes growing at exponential rates, they're also propagating in ways that can be hard to predict. As advanced analytics takes hold of the enterprise, we see more departments mixing and mashing datasets together to fuel business insights. But with every mashup, there is the potential for more data to go dark.

While most enterprises have some type of information governance in place that articulates their data policy guidelines, enforcing these policies is a challenge. Employees are already overwhelmed with their day-to-day tasks, so asking individual users with minimal training to properly tag and classify their data is unrealistic at best. At worst, the tagging will be inconsistent, riddled with manual errors, or simply ignored.

Even small errors can expose a company to a security breach or trigger a data privacy complaint. With cybercriminals now spending an average of 280 days inside a compromised network, they have ample time to identify and exfiltrate high-value data targets.

To meet these challenges, many organizations believe they can rely on automated tagging features found in platform offerings to identify sensitive PII. While helpful, these tools solve only 60 percent of the problem and come with opaque rules engines, hidden costs, and a lack of scalable review support.

These platforms also don't provide the advanced visualization and workflow support that information professionals often require to manage the results. And again, both vendors and organizations tend to hand management to the IT team, which is not the right owner: the data is not theirs, and they have other priorities.
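To make the idea of automated tagging concrete, here is a minimal sketch of a rule-based tagger whose rules are fully visible, in contrast to an opaque rules engine. Everything in it is an assumption made for illustration: the rule names, the two patterns, and the `tag_pii` function are hypothetical and do not describe any vendor's product. Real PII detection needs far broader coverage plus human review.

```python
import re

# Hypothetical, deliberately simple rules for illustration only.
# Each rule is a named regular expression that anyone can read and audit.
PII_RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tag_pii(text):
    """Return the sorted names of every rule that matches the text."""
    return sorted(
        name for name, pattern in PII_RULES.items() if pattern.search(text)
    )
```

Calling `tag_pii` on a document's text returns a list of tags such as `["email", "us_ssn"]`, which a reviewer could then confirm or reject. Keeping the rules in a plain dictionary is one way to let an information governance team see exactly why a file was flagged.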

So, who should be responsible and how?

Strategies for Illuminating Dark Data

While it might seem like a daunting task, there are a few practical steps that can be universally applied to begin the process of illuminating and tackling the dark data that exists across your environment:

1. Establish an information governance (IG) and/or compliance team. The people who should be making decisions about data are the information owners and data custodians, who need to be accountable and need scalable ways of making those decisions. The best way to accomplish this is to form a dedicated IG/compliance team armed with technology that empowers it to push decisions to information owners.

2. Prioritize unstructured data mapping. Every mature data discovery process begins with a data mapping exercise, which serves as the definitive inventory of all the data that lives within an organization and details the process by which each set of source data is mapped to its target destination. However, given the frenetic pace at which new data sources are introduced, integrated, and updated, a good data map should not be a one-and-done exercise but rather something that is regularly and continuously updated.

3. Establish clear, easy-to-understand data-retention policies. Cheap cloud storage options have made it all too easy to be a data hoarder. However, data privacy laws such as Europe's General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA) spell out different retention requirements for certain types of digital assets. For this reason, companies need to take an assertive approach to updating their data retention policies, especially in noncritical areas where holding onto certain data types longer than necessary can expose the organization to unnecessary compliance and security risks.
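As a rough illustration of how steps 2 and 3 can work together, the sketch below builds a very simple data map of a file share and then flags files older than a retention limit. It rests on stated assumptions: the `FileRecord` type, the function names, and the use of last-modified time as the age signal are all hypothetical choices for this example. Actual retention periods depend on the data type and the applicable law, and should come from the IG/compliance team.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    modified: float  # last-modified time, in seconds since the epoch


def build_data_map(root):
    """Walk a directory tree and record every file found under it."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            records.append(FileRecord(full_path, info.st_size, info.st_mtime))
    return records


def past_retention(records, max_age_days, now=None):
    """Return records last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return [record for record in records if record.modified < cutoff]
```

Because a data map should be refreshed continuously rather than built once, a function like `build_data_map` would typically be rerun on a schedule, with the output of `past_retention` routed to the relevant information owner for a keep-or-delete decision rather than deleted automatically.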

While no one can predict the future, it's not a great stretch to say that the unstructured dark data problem is only going to grow more acute over time. However, with the right team, approach, and a solid plan in place, it's possible to shine a bright light on all your data to transform it into new opportunities rather than worry about how it might be used against you.

Rich Hale is the CTO of ActiveNav, where he focuses on developing the market-leading File Analysis software. Rich spent 16 years as a Royal Air Force Engineer Officer, deployed across numerous countries including the United States, Saudi Arabia, Kuwait, and Canada. He is a product and information evangelist, with experience hard-won through many years of developing information governance programs for enterprises and government agencies. Rich holds a B.Eng. Honors Degree in Aeronautical Engineering from London University, as well as an MBA from the British Open University.

From: Legaltech News
