Extending the World Cup Graph Domain Model

Last week I wrote a post explaining the initial data model for the Neo4j World Cup graph, which looked like this:

[Image: World Cup graph model]

This version of the model covered World Cups, teams, stadiums and hosts, but it was missing the players who played in those matches.

Over the weekend I've worked on getting the player data into shape and have added it to the graph. The domain model with players included looks like this:

[Image: World Cup graph model with players included]

So the things that have been added are:

  • A Squad node which represents the players who were named to represent their country at a specific World Cup. Each Squad is connected to a specific WorldCup and has multiple Players connected to it.
  • A PlayerPerformance node which represents a player's performance in a specific match. This node has an incoming STARTED or SUBSTITUTE relationship from a Player.
  • A Goal node which represents a goal that was scored in a match. This is connected to the PlayerPerformance node and also to a Team node, which lets us indicate which team the goal counts for. I modelled it like this specifically to handle own goals, but it does seem a bit overkill for normal goals.
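To make the own goal case concrete, here is a minimal in-memory sketch of those nodes in Python. It isn't the actual import code or the real schema: only STARTED and SUBSTITUTE are relationship names from the model above, while IN_SQUAD, FOR_WORLD_CUP, SCORED and FOR, the relationship directions, and all the sample names are assumptions made up for illustration.

```python
from collections import namedtuple

# A relationship is a (start, type, end) triple; a node is a (label, name) tuple.
Rel = namedtuple("Rel", ["start", "type", "end"])


def node(label, name):
    return (label, name)


# Hypothetical sample data, not taken from the real data set.
player = node("Player", "Player A")
squad = node("Squad", "Team X squad for 2014")
world_cup = node("WorldCup", "2014")
team_x = node("Team", "Team X")  # the team Player A plays for
team_y = node("Team", "Team Y")  # the opposition
performance = node("PlayerPerformance", "Player A in Team X v Team Y")
own_goal = node("Goal", "Own goal by Player A")

rels = [
    Rel(player, "IN_SQUAD", squad),          # assumed relationship name
    Rel(squad, "FOR_WORLD_CUP", world_cup),  # assumed relationship name
    Rel(player, "STARTED", performance),     # name taken from the model above
    Rel(performance, "SCORED", own_goal),    # assumed relationship name
    Rel(own_goal, "FOR", team_y),            # assumed name; credits the goal to the opposition
]


def outgoing(rels, start, rel_types):
    """Return the end nodes of relationships leaving `start` with one of the given types."""
    return [r.end for r in rels if r.start == start and r.type in rel_types]


def goals_by(rels, player):
    """Goals reached via Player -> PlayerPerformance -> Goal."""
    goals = []
    for perf in outgoing(rels, player, {"STARTED", "SUBSTITUTE"}):
        goals.extend(outgoing(rels, perf, {"SCORED"}))
    return goals


def goals_for(rels, team):
    """Goals credited to a team, regardless of who scored them."""
    return [r.start for r in rels if r.type == "FOR" and r.end == team]
```

With this data, goals_by(rels, player) finds the goal through Player A's performance, while goals_for(rels, team_y) credits it to the opposition and goals_for(rels, team_x) stays empty. Keeping "who scored" and "who it counts for" as separate links is what makes the own goal representable.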

Instructions on how to import the data set into your local Neo4j are on the GitHub repository. You can have a look at all the scripts and the data set there as well.

Ideas for data sets to add to the graph next:

  • Historic weather data - to see whether England only do well if it's cold and miserable. Perhaps via wunderground.
  • GDP data - to see performance based on wealth. Maddison's Historical GDP Data could be helpful here.
  • Countries -> Continents - to see how well countries do in World Cups on other continents. Wikipedia can help out here.
  • Club teams that players played for at the time of a World Cup - I'm not sure of a data source for this.

Let me know if you have other ideas and come along and join us on Wednesday if you're in London.
