Contrary to what most people think, event data is not a new concept. You may not know it by name, but that doesn’t mean you aren’t already familiar with it. Let’s look at it in detail.
For starters, it is one of the most powerful types of data there is, and it is ubiquitous.
A lot of events are happening around us, whether or not we actively take part in them: in our phone applications, servers, cars, appliances, and even our brains. This is great. Because most things are connected to the internet these days, it is easier than ever to collect these events, structure them well, and analyze them at scale. Imagine the kind of discoveries we can make about people’s behavior, society, applications, and even ourselves. Wouldn’t that be something?
So what exactly is event data? Read on.
The simplest way to understand the concept of event data is to compare it with entity data. Have you ever worked with an Excel spreadsheet or an application database? If you have, then you already know about entity data. It looks something like this:
Entity data is stored in tables. In simple terms, entities are nothing but users, accounts, posts, products, levels, and the like. For every type of entity, there is a separate table; and every table has columns that hold information about these entities. Also, one row is dedicated to each entity. In the example above, the entities are enemies.
Databases that are designed to store entity data in this way are usually called relational databases.
Nothing works as well as entity data to capture your applications’ current state, say accounts payable, number of each kind of product, users, etc. Whatever information you are looking for, you can find it quickly and easily.
One of the major attributes of entity databases is that they are usually normalized: data is seldom replicated. Say you have an ‘Accounts’ table with columns like account name, category, type, etc. These accounts have several users allocated to them, but you need not save those users’ information in the ‘Accounts’ table. Instead, you simply store a key on each user that links the user to its account. This saves storage space, because each piece of information is stored only once.
The main drawback of the entity data model is that answering a question often means combining data from several tables. For example, if you want to sort customers by the products they purchase, you must cross-reference several tables. This can be time-consuming.
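To make the normalization and the cross-table lookup concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and only mirror the ‘Accounts’ and users example above.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        account_id INTEGER REFERENCES accounts(id)  -- key linking a user to its account
    );
    INSERT INTO accounts VALUES (1, 'Acme', 'enterprise'), (2, 'Globex', 'startup');
    INSERT INTO users VALUES (1, 'alice', 1), (2, 'bob', 1), (3, 'carol', 2);
""")

# Account details live only in 'accounts', so counting users per account
# requires joining the two tables.
rows = conn.execute("""
    SELECT a.name, COUNT(u.id)
    FROM accounts AS a
    JOIN users AS u ON u.account_id = a.id
    GROUP BY a.id, a.name
    ORDER BY a.name
""").fetchall()
print(rows)
```

Each account name is stored once, and every question that spans users and accounts pays for that with a join.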
Let us now understand the attributes of Event Data.
Event data example: ‘Pageview’ event
One of the outstanding attributes of event data is that it doesn’t describe entities alone: it also describes the actions performed by those entities. The above example describes the action of publishing this article. Now imagine a collection of events called ‘Pageview’ that records each time a visitor comes to your website and views a page.
So what makes this ‘Pageview’ an ‘Event Data’? Simple. Every event carries three pieces of data: an action, a timestamp, and a state.
The ‘Action’ is the thing that’s happening, say, ‘viewing a page’. The ‘Timestamp’ is the exact time at which it happens. And finally, ‘State’ refers to everything relevant we know about the event, including data about the entities associated with it, such as the author.
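As a rough sketch, the three parts of an event can be represented as a plain Python dictionary. The helper name make_event and all property names and values below are assumptions made for illustration; they are not taken from any particular tracking library.

```python
from datetime import datetime, timezone

def make_event(action, state, timestamp=None):
    """Build an event with the three parts: action, timestamp, state."""
    when = timestamp or datetime.now(timezone.utc)
    return {
        "action": action,               # what happened
        "timestamp": when.isoformat(),  # when it happened
        "state": state,                 # what we knew at that moment
    }

# Hypothetical 'Pageview' event; every value here is made up.
pageview = make_event(
    "pageview",
    {
        "page": {"title": "What is event data?", "author": "example-author"},
        "visitor": {"id": "v-123", "device": "mobile"},
    },
    timestamp=datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(pageview["timestamp"])
```

Note that the state carries data about the related entities (the page and the visitor), not just the action itself.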
Note: Take a look at Ben Johnson’s take on Event Data in his Speaker Deck (he calls it ‘Behavior Data’)
Different use-cases of event data:
- Social media: Think of a social media application that lets users register, log in, follow each other, like someone’s post, or post new content. Each of these actions is a user event. The application tracks all the events performed by its users and analyzes the data to increase retention. Since the company usually spends money on marketing to acquire users, it wants those users to stick with the app. So it analyzes the users who do stick and looks for the “North Star metric”, which here is the set of actions performed by the power users. It then optimizes the app so that new users are guided toward those same actions and stay.
- E-commerce: E-commerce applications usually want to optimize their revenue, so they collect all user actions from sign-up to transaction in order to build funnels. If a user adds an item to the basket but doesn’t complete the transaction, they want to find out where the workflow loses users and streamline it so that users move through the transaction funnel seamlessly.
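The funnel idea from the e-commerce case can be sketched in a few lines of Python. The step names, the event fields, and the sample events are all hypothetical; a real funnel would be computed inside the event database rather than in application code.

```python
from collections import defaultdict

# Hypothetical funnel steps and events.
FUNNEL = ["signup", "add_to_basket", "transaction"]

events = [
    {"user": "u1", "action": "signup", "ts": 1},
    {"user": "u1", "action": "add_to_basket", "ts": 2},
    {"user": "u1", "action": "transaction", "ts": 3},
    {"user": "u2", "action": "signup", "ts": 1},
    {"user": "u2", "action": "add_to_basket", "ts": 5},
    {"user": "u3", "action": "signup", "ts": 2},
    {"user": "u3", "action": "transaction", "ts": 4},  # skipped the basket step
]

def funnel_counts(events, steps):
    """Count how many users reached each step, in the given order."""
    actions_by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        actions_by_user[event["user"]].append(event["action"])

    counts = [0] * len(steps)
    for actions in actions_by_user.values():
        next_step = 0
        for action in actions:
            if next_step < len(steps) and action == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return dict(zip(steps, counts))

result = funnel_counts(events, FUNNEL)
print(result)
```

The drop between two consecutive steps shows where users leave the workflow, which is the part worth optimizing.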
We can easily find out all of these things with the help of the event data model. Some of its special characteristics are as follows:
- Data is Rich
- Data is Denormalized
- Data is Nested
- Data is Schemaless
Let’s go through the above one by one.
Event Data is Rich
In order to analyze the data and create meaningful insights, you usually want to make your data as rich as possible so that you don’t miss any data points. It is easy to delete data later if you decide you don’t need it, but if you never collected data that turns out to be useful for your business, there is usually no easy way to recover it.
Event Data is Denormalized
Unlike in a relational database, you will see the same information repeated multiple times in an event database. User attributes, difficulty settings, app versions, etc., could be repeated in every single event, even if they seldom change. This is not intuitive at first, but repeating these values is the only way to capture what the application state looked like at the time of the event.
Now compare this with relational databases. When a property (a player setting, for instance) is updated, the previous value is overwritten and, unless you keep a separate history, lost. Event databases, by contrast, let us capture entity data as it was at a particular point in time. That said, event databases are not a replacement for entity databases; they are companions to them.
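The difference can be illustrated with a small Python sketch. The player record, the track helper, and the property names are hypothetical; the point is only that each event stores its own copy of the entity's attributes.

```python
import copy

# Hypothetical entity record, as it would sit in a relational table row.
player = {"id": 7, "difficulty": "easy", "app_version": "1.0"}
events = []

def track(action, ts, entity):
    # Denormalization: copy the entity's current attributes into the event.
    events.append({"action": action, "ts": ts, "player": copy.deepcopy(entity)})

track("level_started", 1, player)
player["difficulty"] = "hard"  # the entity is updated in place; the old value is gone
track("level_started", 2, player)

# The entity only knows the current value, the events remember both.
history = [e["player"]["difficulty"] for e in events]
print(player["difficulty"], history)
```

The repeated attributes cost storage, but they let you ask what the setting was when each event happened.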
Event Data is Nested
As explained before, an event can have many properties. Most databases that are optimized for event data let you store these properties as nested JSON. This is especially helpful when you have a lot of properties and several entities to describe.
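Here is a minimal sketch of a nested event in Python, using the standard json module. The event shape and property names are assumptions, and the flatten helper is just one illustrative way to address nested properties with dotted paths; it is not a feature of any specific database.

```python
import json

# Hypothetical nested event: each entity gets its own sub-object.
event = {
    "action": "level_completed",
    "timestamp": "2020-01-15T09:30:00+00:00",
    "player": {"id": 7, "settings": {"difficulty": "hard"}},
    "device": {"os": "ios", "app_version": "1.2.0"},
}

def flatten(obj, prefix=""):
    """Turn nested dicts into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

payload = json.dumps(event)          # what would be sent to the event store
flat = flatten(json.loads(payload))  # dotted paths for querying
print(flat["player.settings.difficulty"])
```

Grouping properties under the entity they describe keeps events readable even when there are many properties.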
Event Data is Schemaless
Event data usually doesn’t follow a specific, strict schema. You can add or remove properties based on your business needs. Your data warehouse should be able to handle schema evolution without hassle, since businesses often change their event schemas as their needs change. Unlike application databases, which are mostly transactional, event databases are designed to handle any number of properties you send.
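As a small illustration, the Python sketch below accepts events with differing properties and derives the schema from what was actually sent. The events and the observed_schema helper are hypothetical; real event warehouses handle schema evolution in their own ways.

```python
# Hypothetical events: properties differ between events, and new ones
# (referrer, invited_by) appear over time without any migration step.
events = [
    {"action": "pageview", "url": "/pricing"},
    {"action": "pageview", "url": "/docs", "referrer": "search"},
    {"action": "signup", "plan": "free", "invited_by": 42},
]

def observed_schema(events):
    """Collect every property seen so far and the value types it carried."""
    schema = {}
    for event in events:
        for name, value in event.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema

schema = observed_schema(events)
print(sorted(schema))
```

Deriving the schema from the data, rather than declaring it up front, is what lets the event shape change as the business changes.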
However, it is good practice to keep the schema as organized as possible. You may have 500 different event types, but you shouldn’t get lost when trying to find a specific data set. You need a taxonomy to organize your data, with descriptions for each event type and even for some of your event properties. You also need to give the different teams in your organization appropriate access so that they can analyze the event data themselves.
Event Data at Scale
Event data is different from your application data. It is append-only, immutable, denormalized, streaming data. It is also time-series data in some cases, but because the data is collected on the client, events can reach the server some time after they occurred because of network issues, especially on mobile.
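The late-arrival problem can be sketched in Python as follows. The field names occurred_at (set on the client) and received_at (set by the server) are assumptions, as are the sample events; the log is only ever appended to.

```python
from datetime import datetime, timezone

log = []  # append-only: events are added, never updated or deleted

def ingest(action, occurred_at, received_at):
    log.append({"action": action, "occurred_at": occurred_at, "received_at": received_at})

def at(minute, second=0):
    # Helper for readable sample timestamps on a made-up day.
    return datetime(2020, 1, 15, 9, minute, second, tzinfo=timezone.utc)

ingest("pageview", at(0), at(0, 1))
ingest("purchase", at(5), at(5, 1))
ingest("add_to_basket", at(2), at(30))  # mobile client was offline, sent late

arrival_order = [e["action"] for e in log]
event_order = [e["action"] for e in sorted(log, key=lambda e: e["occurred_at"])]
max_lag = max((e["received_at"] - e["occurred_at"]).total_seconds() for e in log)
print(arrival_order, event_order, max_lag)
```

Keeping both timestamps lets you order events by when they actually happened, while still knowing how late each one arrived.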
Even though you may only have a few thousand users, if they spend a lot of time in your application you may end up with millions of events. In most cases you want to track all user actions so that you can analyze them later. If you miss user events there is usually no way to recover them, so it is usually better practice to collect all the data and drop whatever you don’t need.
Since the characteristics of event data are different from those of application data, you usually want to store it in an appropriate database so that you can ingest and query it easily. However, event data warehouse solutions require a different set of expertise, since they are optimized for different workloads: you either hire data engineers to maintain them or use a solution such as Rakam API to avoid starting from scratch. There are also managed alternatives such as Segment Warehouse, but if you have billions of events, the cost is usually too high.
In a Nutshell
Event data is a very powerful data model. It lets us track and analyze the things around us effectively.
Take a look at the table below that compares entity data and event data.