Think of your favorite website: how did you find it? Did you use a search engine? Or click on an ad? Or follow a link in a blog post?

Web developers, marketers, and data analysts use that information to improve their sources (sometimes called channels or touchpoints) online. If an ad campaign drives a lot of visits to their site, then they know that source is working! We say that those visits are attributed to the ad campaign.

But how do websites capture that information? The answer is UTM parameters. These parameters capture when and how a user finds the site. Site owners use special links containing UTM parameters in their ads, blog posts, and other sources. When a user clicks one, a row is added to a database describing their page visit.

The problem is when a multi-touch attribution comes into play. The process gets messy because analytics tools like Google Analytics has limitations, because segmenting data takes talent, time and money to do, and because attribution tools are expensive. That is why, we created an analytics tool that works wonders once a pixel is installed.

To make it clearer, here is Datagran’s schema for a “page visits” table below.

Datagran web pixel generates raw logs in the form of a table. Below is the Schema of the basic elements:

  • dg_ user_id - A unique identifier for each visitor to a page (it is defined by Datagran)
  • hit_time_utc - The time at which the visitor came to the page
  • hit_type - Datagran has 4 different ways to identify the action: a) pv: Page View b) click: click c) ce: Custom Event, d) fs: FormSubmission
  • dynamic_payload - this column is JSON-formated string and contains querystring of landing page. Use JSON_EXTRACT function to transform into JSON i.e. JSON_EXTRACT(dynamic_payload, '$.utm_source'). You can find utm parameters here.
  • utm_source
  • utm_medium
  • utm_campaign
  • hit_payload - this column contains information about the events described in hit_type column.
  • if the event is pv (page view), an example of typical hit_payload is {"title": "Datagran", "url": "[<https://www.datagran.io/>](<https://www.datagran.io/>)", "ref": "[<https://www.google.no/>](<https://www.google.no/>)"}
  • if the event is ce (custom event), an example of typical hit_payload is {"en": "Visit GT 1m", "ep": {"User id": "Capture from Cookie"}}
  • if the event is click, an example of typical hit_payload is {"tag": "a", "class": "navlink w-inline-block", "id": "", "text": "", "anchor": "", "pu": "[<https://www.datagran.io/>](<https://www.datagran.io/>)"}
  • f_source - this column contains source name where visitor came from.

User table:

  • dg_ user_id - A unique identifier for each visitor to a page (it is defined by Datagran)
  • custom_user_id - A unique identifier for each visitor to a page set by the client. This identifier will be in a different table ending with user. (this id is defined by the customer with dg_tracker.identify('useridentify') function)

By collecting specific elements in the life of a customer, we can create attribution models that are as simple or complex as we need them to be. Here is an example:

First Touch Example

Imagine June. She wants to buy a new t-shirt for her mother, who is visiting from out of town. She reads about CoolTShirts.com in a Buzzfeed article, and clicks a link to their landing page. June finds a fabulous Ninja Turtle t-shirt and adds it to her cart. Before she can advance to the checkout page her mom calls, asking for directions. June navigates away from CoolTShirts.com to look up directions.

June’s initial visit is logged in the ....__pixel_raw_logs table as follows:

Notion doc

  • June’s first touch — the first time she was exposed to CoolTShirts.com — is attributed to buzzfeed
  • June is assigned a user id of 10069
  • She visited the landing page at 11:14:01 and the shopping cart at 11:55:01

Last Touch Example

Two days later, CoolTShirts.com runs an ad on June’s Facebook page. June remembers how much she wanted that Ninja Turtles t-shirt, and follows the ad back to CoolTShirts.com.

She now has the following rows in page_visits table:

.._raw_logs

· June’s last touch — the exposure to CoolTShirts.com that led to a purchase — is attributed to Facebook

· She visited the checkout page at 08:12:01 and the purchase page at 08:13:01

Having this in mind, we can compare attribution based on the first and last touchpoint June had prior to making a purchase.

First vs Last

If you want to increase sales at CoolTShirts.com, would you count on buzzfeed or increase facebook ads? The real question is: should June’s purchase be attributed to buzzfeed or to facebook?

There are two ways of analyzing this:

· First-touch attribution only considers the first utm_source for each customer, which would be buzzfeed in this case. This is a good way of knowing how visitors initially discover a website.

· Last-touch attribution only considers the last utm_source for each customer, which would be facebook in this case. This is a good way of knowing how visitors are drawn back to a website, especially for making a final purchase.

The results can be crucial to improving a company’s marketing and online presence. Most companies analyze both first- and last-touch attribution and display the results separately.

The Attribution query

We just learned how to attribute a user’s first and last touches. What if we want to attribute the first and last touches for ALL users? This is where SQL comes in handy — with one query we can find all first- or last-touch attributions (the first and last versions are nearly identical). We can save this query to run it later, or modify it for a subset of users. Let’s learn the query…

In order to get first-touch attributions, we need to find the first time that a user interacted with our website. We do this by using a GROUP BY. Let’s call this table first_touch:

If user is identified

A line of code


This tells us the first time that each user visited our site, but does not tell us how they got to our site — the query results have no UTM parameters! We’ll see how to get those in the next exercise.

The Attribution query II

To get the UTM parameters, we’ll need to JOIN these results back with the original table.

An attribution query

SQL:

SELECT dg_user_id, MIN(hit_time_utc) AS first_touch_atFROM bright-bda.workspace_5e8a03edda47df95e42a0165.Datagran_coreOS_Pixel_V3__pixel_raw_logs as raw_logGROUP BY user.custom_user_id

We’ll join tables first_touch_tbl aka ft and ..._raw_logs as rl on user_id and ht_time_utc.

An SQL query


Remember that first_touch_at is the earliest timestamp for each user. Here’s the simplified query:

An SQL query


Now fill in the WITH clause using the first_touch query from the previous exercise. We’ll also specify the columns to SELECT.

An SQL query

Here, you can get utm parameters in dynamic_payload.

The Attribution Query III

If you have identified a user, identity of the user is saved in custom_user_id column in ..._user table. Here, we have joined ..._user aka u on dg_user_id field.

An SQL query


The Attribution Query IV

We can easily modify the first-touch attribution query to get last-touch attribution: use MAX(timestamp) instead of MIN(timestamp).

So by making use of a query, we can map our customers journey from beginning to end, giving us a microscopic view of what each customer is doing, and most importantly, where it came from. This opens doors to distribution channels for marketing initiatives we may be unaware of and that actually become a brand new source of leads.