Data might be the new oil, but making sense of that data is the critical step in extracting value
Without context, data cannot provide useful insights. Connecting data is one of the most important aspects to provide context, enabling analytics to deliver value. Businesses must ensure they can facilitate this effectively, so that data remains an enabler and not a hindrance. Graph analytics allows businesses to construct relationships between separate data points to build context and empower business leaders to make more informed and beneficial business decisions.
What is graph analytics?
Graph analytics, also known as network analytics, is an analytic method based on representing data through nodes and links (known as edges). This allows the complexity and variety of real-world relationships to be recorded and analysed without losing information. Questions can then be asked of the data, such as the strength and direction of relationships between objects in the graph. Graphs are mathematical structures utilised to model numerous forms of relationships and processes in information systems.
One important distinction between real-world graphs and many theoretical models is that real-world graphs are heterogeneous, meaning they contain many different types of nodes and edges. Nodes within graphs can represent people (for example, customers and employees), companies and institutions, places, and so on. Edges can be used to represent any relationship, contact or transaction – for example, emails, employment, payments, family structures, or likes and dislikes.
Edges can be directed, whereby they run one way from one node to another, such as a payment made to Jane from Tom. They can also be non-directed – for example, Kelly is married to Farid. Edges can also be weighted, for example by the number of payments between two accounts. Graph analysis can be used to highlight dominant edges, allowing one to identify, for example, that an unusually high number of payments between bank accounts may indicate money laundering activity. It is imperative that the graph analytics you deploy can account for, and leverage, this variety.
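To make the three edge types concrete, here is a minimal sketch in plain Python. The payment records, the threshold and the helper name dominant_edges are illustrative assumptions rather than anything prescribed by a particular graph product.

```python
from collections import defaultdict

# Hypothetical payments: (payer, payee, amount). Names follow the examples above.
payments = [
    ("Tom", "Jane", 120.0),
    ("Tom", "Jane", 80.0),
    ("Tom", "Jane", 45.0),
    ("Jane", "Kelly", 30.0),
]

# Directed, weighted edges: the weight is the number of payments from payer to payee.
payment_counts = defaultdict(int)
for payer, payee, _amount in payments:
    payment_counts[(payer, payee)] += 1

# Non-directed edge: a frozenset has no order, so Kelly-Farid equals Farid-Kelly.
marriages = {frozenset({"Kelly", "Farid"})}

def dominant_edges(counts, min_count):
    """Return the directed edges whose weight is at least min_count."""
    return {edge: n for edge, n in counts.items() if n >= min_count}

# Assumed threshold of 3; a real deployment would tune this to the business problem.
heavy = dominant_edges(payment_counts, min_count=3)
```

A dominant edge found this way is only a prompt for further investigation, not proof of wrongdoing.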
Examples of widely used types of graph analytics are:
- Shortest path analysis, which can indicate how closely two objects are related;
- Community detection, which can detect trends among groups of closely connected people when signals are too weak to be applied to a single person with any certainty;
- Centrality measures, which provide important information about the shape of a graph and the importance of particular nodes. They can be used to answer questions such as whether a set of different records capturing different combinations of data relates to the same person;
- Graph queries, where a particular shape in the graph can be searched for.
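As a rough illustration of two of these techniques, the sketch below computes a shortest path with breadth-first search and a simple degree centrality. The five-node graph is made up, and degree centrality is only one of several centrality measures; it is used here because it is the simplest to show.

```python
from collections import deque

# Hypothetical undirected graph: each node maps to the set of its neighbours.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

hops = shortest_path_length(graph, "A", "E")
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
```

A small hop count suggests two objects are closely related, while the node with the highest centrality is the one most directly connected to the rest of the graph.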
Advantages of graph analytics
Because graph systems represent data, not as elements in tables, but as nodes linked to one another by edges with a set of properties that denote a relationship between the nodes, data analysts can navigate data sets without needing to create and run complex queries that connect combinations of tables together.
Another key advantage is that new sources of data and new relationships can be added with ease. New data can be merged and diverse datasets can be unified without significant investment in data modelling. Consequently, less time is spent organising and adding new data sources, and analysis can occur with greater efficiency.
Giving all of the data a common structure through a graph database makes it easy to analyse other data. If you’ve invested time and money in linking data to known lists, you can reuse this structure to analyse new data in the same way without needing to begin again from scratch. By contrast, with traditional coding, it takes much longer to match each piece of data to each checklist, and there is no unified structure that can be used multiple times.
Graph analytic techniques also allow you to identify which transactions are significant. Much of analytics comes down to building features on top of data held in graph databases, using contextual intelligence to understand, for example, which connections have a link to a known bad actor when identifying instances of financial crime.
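One way such a contextual feature could be built is sketched below: a breadth-first search that starts from every known bad actor at once and records how many hops away each account is. The account graph, the watchlist and the two-hop cut-off are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical account graph (undirected): an edge means the accounts have transacted.
links = {
    "acc1": {"acc2"},
    "acc2": {"acc1", "acc3"},
    "acc3": {"acc2", "bad1"},
    "bad1": {"acc3"},
    "acc4": {"acc5"},
    "acc5": {"acc4"},
}
known_bad = {"bad1"}  # assumed watchlist of known bad actors

def hops_to_known_bad(links, known_bad):
    """Multi-source BFS: distance from each reachable node to its nearest bad actor."""
    distance = {node: 0 for node in known_bad}
    queue = deque(known_bad)
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

distance = hops_to_known_bad(links, known_bad)

# Feature per account: is it within an assumed cut-off of 2 hops of a bad actor?
features = {
    account: {"near_bad_actor": distance.get(account, float("inf")) <= 2}
    for account in links
    if account not in known_bad
}
```

The resulting flag can sit alongside ordinary transaction features in a scoring model, so the model sees the context around an account and not just the account itself.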
Within data, connections are everywhere, so in order to garner meaningful insights, businesses need to locate reliable and relevant connections by understanding the context around relationships within such data.
How to gain value from your data using entity resolution
Entity resolution involves disambiguating manifestations of real-world entities in records or mentions by linking and grouping them. For example, there could be multiple addresses for one business or many photos of a specific object. As the volume and velocity of data grow, inference across networks and semantic relationships between entities becomes a bigger challenge. Entity resolution reduces this complexity by creating references to specific entities and untangling duplicates and linked entities.
Many businesses make the mistake of taking what initially might seem a sensible shortcut: building graphs where each data item is represented by a separate node, rather than nodes representing real-world objects. The savings obtained this way will be strictly short-term and it will be difficult to obtain ongoing benefits from such graphs. Businesses that apply graph analytics in this way often hold the approach, rather than the application of it, responsible for the lack of returns. Without entity resolution, businesses create a view of their data that simply replicates the raw data rather than enabling efficient analysis, resulting in a mass of data that is of little use.
The nodes, however, are vital to gaining value from graph analytics; it is critical to build a graph model that represents the real world that analysis will then query, and this is achieved through entity resolution.
Indeed, entity resolution is vital in graph analytics for most business problems. Each data item is resolved to the underlying real-world object – an entity. Most entities are represented by many different labels, such as names, customer numbers and national identifiers. Innovative technology is required to ensure appropriate levels of precision and recall for the business problem being addressed. For example, the correct model for finding terrorists would not be appropriate for making credit assessments.
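The idea of resolving many labels to one entity can be sketched with a simple union-find grouping, shown below. This is a deliberately naive illustration that links records only when an identifier matches exactly. The record fields and the function resolve_entities are hypothetical, and production entity resolution would typically add fuzzy matching and tunable precision and recall.

```python
# Hypothetical source records; None marks a missing identifier.
records = [
    {"id": "r1", "name": "Abe Smith", "customer_no": "C100", "national_id": None},
    {"id": "r2", "name": "A. Smith", "customer_no": "C100", "national_id": "N-77"},
    {"id": "r3", "name": "Abraham Smith", "customer_no": None, "national_id": "N-77"},
    {"id": "r4", "name": "Debbie Jones", "customer_no": "C200", "national_id": "N-12"},
]

def find(parent, x):
    """Follow parent pointers to the group representative, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def resolve_entities(records, keys):
    """Group records that share an exact value for any of the given identifier keys."""
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}  # (key, value) -> id of the first record carrying that value
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(parent, r["id"])] = find(parent, first_seen[token])
            else:
                first_seen[token] = r["id"]
    groups = {}
    for r in records:
        groups.setdefault(find(parent, r["id"]), set()).add(r["id"])
    return sorted(groups.values(), key=sorted)

entities = resolve_entities(records, keys=("customer_no", "national_id"))
```

Note that r1 and r3 share no identifier directly. They end up in the same entity only because r2 links to both, which is the kind of transitive grouping that a record-per-node graph would miss.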
Using these resolved entities as nodes within your graph model enables you to get meaningful results from graph analytics. For example, if person one, Abe, a customer making a payment, is resolved to the same Abe who lives with person two, Debbie, we can write our graph query and understand the results. If our ‘Address’ is well built as a node in the network, then our query will also find person three, Xavier, who has a joint account with person four, Yvonne. Our path distances will then be indicative of real-world relationships, rather than of the shape of the data models in our source systems.
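The difference a resolved 'Address' node makes can be shown with a small sketch. It assumes, purely for illustration, that Abe, Debbie and Xavier all live at one address that is spelled three different ways in the source data, and that an entity resolution step has already mapped those spellings to a single entity, here called Address#1.

```python
from collections import defaultdict, deque

# Hypothetical raw links between people and the labels found in source records.
raw_edges = [
    ("Abe", "1 High St"),
    ("Debbie", "1 High Street"),
    ("Xavier", "1 high street"),
    ("Xavier", "Joint account"),
    ("Yvonne", "Joint account"),
]

# Assumed output of entity resolution: three spellings, one real-world address.
address_entity = {
    "1 High St": "Address#1",
    "1 High Street": "Address#1",
    "1 high street": "Address#1",
}

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

raw_graph = build_graph(raw_edges)
resolved_graph = build_graph((a, address_entity.get(b, b)) for a, b in raw_edges)

raw_result = path_length(raw_graph, "Abe", "Yvonne")
resolved_result = path_length(resolved_graph, "Abe", "Yvonne")
```

In the raw graph there is no path from Abe to Yvonne because each spelling of the address is its own node. In the resolved graph the path runs through the shared address, Xavier and the joint account.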
Businesses can also extend models and merge new information into the graph without rebuilding the analytics. Once an organisation has defined a graph data model, this will become a common language for data management going forward, and new data sources will be matched to one set of organisational data.
Overall, the benefits of graph analytic techniques are evident; businesses can gain meaningful insights by using the context around relationships within data to make reliable and relevant connections. Several sophisticated graph analytics methods are possible, but it is clear that entity resolution is critical to extracting real value from data. Moreover, by extending graph analysis, business leaders can also add new data without significant investment in data modelling.
About the Author
Felix Hoddinott is CAO at Quantexa. With over 21 years working in data science, Felix is experienced in designing the algorithms that drive business-critical decisions to gain real value from big data. Following his role as Head of Detica’s Datalab Academy, Felix has trained and developed a team of over 75 data science experts at Quantexa. He creates the innovative techniques behind contextual analysis and works with business experts to deploy ground-breaking solutions.
Featured image: ©Ipopba