Short message data and eDiscovery: how to keep focus?

Published: 07 March 2024

Written by:

Harry Trick

Director

Forensic Services Leeds

One need only look at high-profile events such as the ongoing national Covid inquiry or the recent ‘Wagatha Christie’ trial to see how important ‘short message’ data – conversations by text, or on WhatsApp, Teams or Slack channels – is becoming in determining who did what, and when.

But the sheer volume of these messages, and the way these platforms are used for communication, mean forensic investigators and legal teams frequently come up against a ‘Goldilocks’ dilemma: how do you get the balance ‘just right’ between capturing detail and context in a sea of information and delivering an efficient investigation process?

The long and (short) of it

‘Short messages’ are an increasingly common part of how people communicate at work today, particularly following the substantial rise of remote and hybrid working.

While a large proportion of formal communication still takes place by email, by telephone or in printed materials, a great deal of information is now shared on messaging platforms. Microsoft’s Teams boasts 300m users worldwide; Slack has 32m daily active users; WhatsApp, and even Facebook Messenger, are now used as business channels.

This trend is only set to accelerate. Between 2022 and 2023, Relativity’s RelativityOne eDiscovery platform recorded a 430% increase in short message data volumes. It’s anticipated that short message data will soon surpass email volumes on the platform, if it has not already done so at the time of writing.

And the fact that short message data is enmeshed within our working lives means it must be processed as part of any contemporary investigation. But it’s raising new, and specific, challenges that eDiscovery teams and legal teams have to both anticipate and manage.

One such challenge is sheer volume. Short messages are pithy by their nature. As a result, points are traded rapidly, and in large numbers. This often means investigators have particularly large data sets to work through – one case that we worked on last year resulted in us collecting almost three terabytes (3,000 GB) of short message data alone.

Another is content format and diversity. Take a Teams chat as a simple, single example. One thread, plucked at random from a business, is likely to predominantly contain written messages being traded between colleagues. But it will also have attachments, images, GIFs and emoji-based reactions to messages. All of these encode meaning and constitute communication, and all therefore need to be considered as part of the investigation at hand.

And then there are the issues of ‘conversational cross-over’ and channel shift.

Anyone with experience of a group chat will know how regularly multiple conversations are conducted simultaneously in a single thread, or how conversations can jump back and forth between channels.

Where an exchange might once have been carried out entirely over email, it may now feasibly progress from email to Teams, to WhatsApp, back to email and then to a video call in quick succession.

This creates a complex, convoluted, conversational ‘trail’ that forensic investigators need to identify, map, and then follow.

Changing approach

The fact that there is so much of this data to manage, and that it’s likely to be fragmented and multiform, poses a risk that forensic investigators miss information, waste time going down conversational ‘rabbit holes’, or extract fragments of information that are missing critical context.

Effectively untangling short messaging data comes down to always making sure that the investigation team is fully focused on the ‘right’ thing, and that context can be mapped and maintained throughout.

Fundamentally, this requires a very specific, human, mindset from investigators.

When faced with a sea of potential data, avenues, and connections, the starting point and guiding principle for the investigation must be: ‘how and when are people likely to have communicated?’

It’s an obvious point but one that I find is so often overlooked, and that has unique implications, and transformative potential, in a short message context.

For example, a little bit of knowledge and real-world experience around messaging platforms will bring an investigator to the first principle of short message data: for every group chat, there’s an equally busy, and separate, group chat tied to it.

Applying this to the data identification process at the start of an investigation can help investigators map out the whole ‘universe’ of short message data they may need to analyse, that at first may have been hidden, or seemed unconnected.

Similarly, considering how people are likely to change their method of communication as they go about their day can help investigators focus their efforts by identifying when different data sources are likely to be most relevant. Someone may be most likely to use email as their primary communication channel during working hours, an instant messaging platform like Teams during lunch or their commute, and WhatsApp out of hours, for example.

This sort of ‘what would they have done’ planning ultimately means forensic investigators can minimise the risk of blindly stumbling into dead ends or following red herrings, and helps develop far more useful hypotheses that they can then test through questioning the data, and witnesses.

A (digital) helping hand

This is not to say that technology is not helpful. It’s the other part of the puzzle. The latest generation of eDiscovery platforms – like RelativityOne – have powerful capabilities to help investigators work more efficiently and generate better results.

For example, it can be very hard to thoroughly search short messages using a basic ‘keyword’ approach.

Some, potentially critical, information for the investigation could be coded; conveyed through emoji reactions, pictures or GIFs; or communicated indirectly through linguistic features like implication or sarcasm – all things that a text-based search function may not uncover, but which nonetheless ‘communicate’.

To avoid having to manually read entire conversations, investigation teams can instead use tools such as sentiment analysis to search in a different way. This identifies shifts in the emotion of conversations, picking out points of anger or desire, for example, which could indicate particularly ‘hot’ points in exchanges. These can then be homed in on for further manual analysis.
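To make the idea concrete, here is a deliberately simplified sketch of flagging emotional shifts in a thread. It is an illustration only, not how RelativityOne or any other eDiscovery platform implements sentiment analysis: the word scores in TOY_LEXICON, the function names and the threshold are all hypothetical, and a real tool would use a trained model rather than a handful of keywords.

```python
# Toy word scores, invented for illustration; not a real sentiment model.
TOY_LEXICON = {
    "thanks": 1, "great": 1, "happy": 1,
    "worried": -1, "angry": -2, "unacceptable": -2, "furious": -3,
}


def score(text):
    """Sum the toy scores of the words in one message."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return sum(TOY_LEXICON.get(w, 0) for w in words)


def flag_shifts(messages, threshold=3):
    """Return the positions of messages whose score differs from the
    previous message's score by at least `threshold`."""
    scores = [score(m) for m in messages]
    return [
        i for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) >= threshold
    ]


thread = [
    "Thanks, great work",
    "See you at lunch",
    "This is unacceptable, I am furious",
]
flagged = flag_shifts(thread)  # positions a reviewer might read first
```

The flagged positions are only a prompt for where to start manual review; they do not replace reading the surrounding context.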

Investigators can also use timelines – and timeline-building tools – to focus. If you identify one message of interest, you can then look at all other conversations that the sender or the recipients were involved in immediately around that timepoint, or shortly before or after – even if they were on other channels.

Using an eDiscovery platform, message fragments from different conversations, or even different platforms, can then be stitched together to create a unified, digital, and, crucially, focused timeline of an event, highlighting everything from who was involved, to what exactly was said or done.
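As a rough sketch of the timeline approach described above, the snippet below takes one message of interest and pulls together every message, on any channel, that involves the same people within a time window around it. The Message structure, the timeline_around function and the 30-minute window are assumptions made for illustration; they do not represent the data model or API of any particular eDiscovery platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    """Hypothetical, simplified record of one short message."""
    sent_at: datetime
    channel: str
    sender: str
    recipients: tuple
    text: str


def timeline_around(messages, anchor, window=timedelta(minutes=30)):
    """Return messages within `window` of the anchor that involve any of
    the anchor's participants, ordered by time, across all channels."""
    people = {anchor.sender, *anchor.recipients}
    nearby = [
        m for m in messages
        if abs(m.sent_at - anchor.sent_at) <= window
        and people & {m.sender, *m.recipients}
    ]
    return sorted(nearby, key=lambda m: m.sent_at)


anchor = Message(datetime(2024, 1, 10, 12, 0), "Teams", "alice", ("bob",), "Can we talk?")
collected = [
    Message(datetime(2024, 1, 10, 12, 10), "WhatsApp", "alice", ("dave",), "Call me"),
    Message(datetime(2024, 1, 10, 12, 5), "Slack", "erin", ("frank",), "Unrelated"),
    anchor,
    Message(datetime(2024, 1, 10, 15, 0), "Teams", "alice", ("bob",), "Much later"),
    Message(datetime(2024, 1, 10, 11, 50), "Email", "bob", ("carol",), "FYI"),
]
timeline = timeline_around(collected, anchor)
```

Here the resulting timeline runs from the email, to the Teams message, to the WhatsApp message, while the unrelated Slack exchange and the much later Teams message are left out. Widening or narrowing the window is a judgment call for the investigator, which is where the ‘how and when are people likely to have communicated’ question comes back in.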

The road ahead

But technology really is a tool, and a tool that’s only as good as the strategy behind it. Any investigation involving short message data – which will become more and more prevalent – must have technology and strategy working in harmony.

eDiscovery is about data and information, but ultimately that data is just a record and reflection of how humans have behaved.

To keep focus with short message eDiscovery, a human-first strategy must be in place.