The trouble with Big Data: unstructured data & legacy systems

How can businesses get better insights from unstructured data by migrating from legacy systems? Graeme King, GTM Lead for Data Strategy and Advisory, talks us through the process.

The way the world generates data has changed. As individuals digitise more aspects of their lives than ever before – from social media profiles to wearable fitness tech, IoT kitchenware to mobile banking – we’re creating data in phenomenal volumes, at breakneck speed. It’s estimated that 2.5 quintillion bytes of data are produced every day, with 74 zettabytes of data expected to be created by the end of this year.

Most of this data – around 90% – is unstructured. As a result, the vast majority of data no longer fits ‘traditional’ or structured data definitions: neat, clearly defined and readable fields and values that can be stored in a relational database management system (RDBMS). We’ve moved away from data that fits into pre-defined data models – think Excel spreadsheet rows and columns – towards richer, more complex data sets that have the potential to give us deeper and more nuanced insights into everything from product origins to customer behaviour and sentiment on social media.

Yet while businesses are actively collecting this data, the technology they use to process, store and analyse it isn’t always up to the task. Legacy systems that were designed to process structured data aren’t fully compatible with unstructured data, yet it’s estimated that they still account for 31% of organisations’ technology systems worldwide. Sixty-one percent of financial firms surveyed by Adobe last year, for instance, cited legacy technology as a factor holding back their marketing and customer experience.

In short, legacy technology is preventing organisations from fully extracting value from their unstructured data – as well as creating additional strain on the organisation. If an organisation is dependent on legacy systems to house big data, it’s likely that it’s accumulating a growing number of physical systems to cope with its growing unstructured data footprint, resulting in an increased need for management, monitoring and security to support them. As a result, most organisations have opted for – or ended up with – a hybrid of cloud and legacy architecture, which adds further strain if the systems are not integrated effectively.

If you want to embrace a data-driven future, you need to be able to access and analyse most of the data at your disposal, not just a fraction of it. That means having the ability to understand, process and act on unstructured data at scale.

What is unstructured data?

Unstructured data doesn’t follow the ‘Excel spreadsheet’ format of information, which makes it harder to search and analyse. It can comprise text files, photographs, documents, videos, audio files, social media profiles, survey responses, blog posts, information compiled from wearables and IoT gadgets, sensor data, website analytics; the list goes on.

To understand the difference between structured and unstructured data in terms of its usefulness, think of an entry in an old-fashioned Yellow Pages phone book compared to a Facebook profile. The phone book entry may be easier to search and extract meaning from, but the profile gives a much richer representation of the person, sharing more insight than a simple name, address and telephone number. For businesses in today’s competitive digital market, that richer analysis is key to understanding their market, their customers’ behaviour and the commercial opportunities that are available with data.
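To make the phone book versus profile contrast concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample entries, the sample posts and the helper names `find_by_town` and `mentions` are hypothetical, and the keyword scan is a deliberately naive stand-in for real text analysis.

```python
import re

# Structured: phone-book style entries with fixed, named fields (made-up data).
phone_book = [
    {"name": "A. Baker", "town": "Leeds", "phone": "01632 960001"},
    {"name": "C. Dale", "town": "York", "phone": "01632 960002"},
]

# Unstructured: free text, such as social media posts (made-up data).
posts = [
    "Loving my new running shoes, already did 10k around Leeds this morning!",
    "Delivery to York took two weeks. Not impressed.",
]


def find_by_town(entries, town):
    """Structured lookup: an exact match on a known field."""
    return [entry["name"] for entry in entries if entry["town"] == town]


def mentions(texts, keyword):
    """Unstructured lookup: a naive, case-insensitive whole-word scan."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [text for text in texts if pattern.search(text)]


if __name__ == "__main__":
    print(find_by_town(phone_book, "Leeds"))
    print(mentions(posts, "york"))
```

The structured lookup is a one-line filter because the field already exists. The free-text scan can find a place name, but it cannot tell that the second post is a complaint; getting at that kind of meaning is where the extra processing effort for unstructured data goes.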

How do legacy systems impact unstructured data analysis?

There are three main issues that hold back unstructured data processing and analysis when combined with legacy systems:

Volume

Managing escalating amounts of unstructured data in legacy systems just isn’t feasible. Most legacy systems are on-premises solutions, so they lack the flexibility to cope with an influx of large amounts of data and require manual storage expansion to meet demand. Migration becomes a costly exercise, with data needing to be moved to new systems as space is filled. This is a drain on time, money and IT resources, for a solution that still can’t deliver the full results you need.

Speed

Legacy systems don’t offer real-time data processing, and struggle to process Big Data with speed and consistency. For many organisations, this is a cause of frustration, but for those with ‘mission-critical’ data operations – such as logistics, travel or healthcare – it poses a serious risk.

Diversity

Unstructured data comes in a variety of formats, while legacy systems were designed with a rigid idea of what ‘data’ should look like. As a result, legacy systems often need to be manually adapted to incorporate unstructured data, and lack the full capability needed to search, analyse and extract insights from these disparate data formats. Adding an image to an Excel spreadsheet, for instance, might give you more information on a particular entry, but it won’t allow you to extract insights from those images at scale.

How can businesses modernise their data systems to incorporate unstructured data?

The answer to incorporating unstructured data lies in moving from on-premises to cloud architecture to get the storage and processing flexibility you need – but it’s crucial that you choose the right cloud migration strategy for your organisation if you want unstructured data to become a viable commercial asset.

At Agile, we offer a four-step approach to cloud migration: identifying your pain points and vision for data, building a better architecture, phasing delivery with our agile methodology, and supporting your migration after implementation, allowing you to adapt and evolve your data strategy as and when you need to.

If legacy systems are preventing you from extracting value from unstructured data, contact our team to discuss cloud migration and integration.