The ‘Big’ in Big Data
by Craig Carpenter, Vice President Marketing at Recommind
Thursday, August 16, 2012
Craig Carpenter, Vice President Marketing at Recommind, discusses how the ‘Big’ in Big Data refers not just to the quantity of data, but also to the much larger variety of sources being analysed – especially unstructured data.
As data continues to be created at an exponential rate, the task of capturing, storing, searching, sharing, analysing and visualising it is ramping up to be a big business opportunity. According to IDC, companies will spend $120bn on analytics software between now and 2015, so they can extract better value from the vast quantities of data they hold, as well as reduce some of the associated IT headaches. The type of information that businesses have to manage and interpret is also changing – specifically, unstructured data requires a different approach. However, before companies dive into their vast ocean of data, they should understand what Big Data is if they are to extract maximum value from it.
Gartner defines Big Data as “extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets”. It has long been understood that organisations can use Business Intelligence (BI) tools to analyse internal, structured data and transform it into meaningful and useful information. However, businesses need to expand their analysis beyond the enterprise data warehouse and look at a much larger variety of sources, including unstructured data both within and outside the organisation, if they are to enable more effective strategic, tactical and operational insights and decision-making. After all, unstructured data accounts for a large share of all new content being created today. Merrill Lynch estimated that as much as 80% of all potentially useful business information is unstructured. Because of this, the real ‘Big’ in Big Data is the sheer variety of information being created and stored.
Unstructured data can take the form of blog posts, call transcripts, instant messages, tweets or other social media, to name but a few. It typically refers to information that has very little or no identified structure and does not easily fit into a relational database. A subset of unstructured data is semi-structured data: information that has some structure, but not the formal structure of a relational database. Emails and Word or PowerPoint documents are examples. Few companies currently have the technology in place to apply the same degree of sophistication to unstructured data as they can to structured data. Thus, the majority of businesses are focusing on as little as 20% of potentially useful information – a very ‘Big’ problem indeed.
If businesses rely on internal structured information alone, they are missing out on key information spread across disparate systems as well as conversations taking place beyond their four walls. The true value of Big Data is the ability to identify and extract meaning from myriad data sources – especially unstructured ones – in order to support better business decision-making. These decisions must be based on the totality of information available and not merely a subset of transactional data from a relational database. Therefore, businesses that want to extract maximum value from their data can no longer ignore the ‘Big’ in Big Data – unstructured data.