DIY Page Analytics

This site is hosted on GitHub Pages. Because I’d like to have a rough idea of which of my posts are receiving the most engagement and which sites are driving the most traffic, and since GitHub Pages doesn’t provide me with this by default, I decided to add some page view analytics to my site. I’m not particularly keen on privacy-invasive services like Google Analytics, which collect far more information than I need and share it with third parties, so I decided to build my own minimalist service. Also, it just seemed like a fun yak-shaving exercise, and everything else on this page is hand-made.

In this post we’ll walk through how to build a very basic page analytics service using Azure Functions and Table Storage, which you can host yourself for a few cents per year—my last twelve invoices reached a grand total of 49 cents.

Is this going to be a genuine substitute for Google Analytics? Definitely not. It won’t track session flows, time spent on the page, conversions, audience details, or anything fancy like that. It’s just going to do the absolute basics, which is to record which pages get viewed and what the referrer was. For a site that receives as little traffic as this one, that’s perfectly adequate.

The way that it works is that a bit of JavaScript will run on each page view of your website, which sends some information (the path of the page being viewed, and the referrer) to an Azure Function, which then stores these details into Table Storage. You can then query Table Storage to see which pages are being viewed and where the traffic is coming from.

Before we get started, you’re going to need to make sure you’ve installed the Azure CLI, the Azure Functions Core Tools, and the .NET Core CLI.

Function setup

Let’s get started by creating our project directory and adding the required boilerplate. Open up your favourite terminal and run the following commands to create a blank .NET function app.

mkdir logger
cd logger
func init --worker-runtime dotnet

Next we need to add our function to it, and install the required package dependencies.

func new --language 'C#' --template HttpTrigger --name Log
dotnet add package Microsoft.Azure.WebJobs.Extensions.Storage

This will create a Log.cs file containing a standard C# function. At this point, you could test the function locally by running the following.

func host start

Function implementation

The first thing we need to add is a class to represent a log entry. In Table Storage, each entry must have a PartitionKey and a RowKey, which combined must uniquely identify each entry. Entries will also automatically have a Timestamp property which records the time that they were last modified.

We’ll use the PartitionKey to record the page being visited, but unfortunately we can’t use the RowKey to record the referrer, since the (page, referrer) pair (hopefully) wouldn’t be unique. Instead we’ll add a new property, Referrer, and use a random GUID as the RowKey to guarantee uniqueness. This means that our LogEntry class will look as follows.

public class LogEntry
{
  // The path of the page that was viewed, with '/' replaced by '|' (see below)
  public string PartitionKey { get; set; }

  // A random GUID, so that the (PartitionKey, RowKey) pair is always unique
  public string RowKey { get; set; }

  // Where the visitor came from
  public string Referrer { get; set; }
}

Next we need to change the function implementation. I’ll supply the full code snippet first and then we’ll walk through it together.

[FunctionName("Log")]
[return: Table("LogEntries")]
public static async Task<LogEntry> Run(
  [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)]
  HttpRequest req,
  ILogger log)
{
  log.LogInformation("C# HTTP trigger function processed a request.");

  string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
  dynamic data = JsonConvert.DeserializeObject(requestBody);

  string pathname = data?.pathname;
  string referrer = data?.referrer;

  log.LogInformation($"{pathname} -> {referrer}");

  return new LogEntry
  {
      PartitionKey = pathname.Replace("/", "|"), // '/' not supported
      RowKey = Guid.NewGuid().ToString(),
      Referrer = referrer
  };
}

The first change is to add the [return: Table("LogEntries")] attribute and change the return type from Task<IActionResult> to Task<LogEntry>. This allows us to return a new entry for the LogEntries table in Table Storage, which will be inserted automatically.

Next we changed the access level from Function to Anonymous, because we’re going to be hitting this function from publicly viewable JavaScript where we can’t hide an API key anyway, and changed the allowed methods to only allow POST, since that’s the method which best describes our action.

Finally, we changed the body of the function to retrieve two properties from the body of the request, pathname and referrer, and use them to construct a new LogEntry object as described above. Note that we’ve had to replace forward slashes (/) with vertical bars (|), because the PartitionKey property doesn’t support values containing forward slashes. We are expecting the body of incoming requests to look something like this.

{
  "pathname": "/posts/diy-page-analytics",
  "referrer": "/posts"
}

Putting it all together, we can now send an unauthenticated POST request to this function with a body as described above; it will pick out the pathname and referrer properties and add an entry to the LogEntries table of Table Storage. “Which Table Storage account?”, you might ask. It turns out Azure Functions already requires a Storage Account to record function invocations, so when you test this locally it will likely go through the Azure Storage Emulator, and when running in Azure it will use the Storage Account that we’ll create shortly.

If you want to test this now, you can run the following.

func host start --cors '*'

You can then use a tool like Postman to send a request, and Azure Storage Explorer to verify that the entry is added to Table Storage.
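If you’d prefer to test from the command line instead, something like the following PowerShell snippet should do the trick (assuming the function is running locally on the default port of 7071).

# Send a test page view to the locally running function
Invoke-RestMethod `
  -Method Post `
  -Uri 'http://localhost:7071/api/Log' `
  -ContentType 'application/json' `
  -Body '{ "pathname": "/posts/diy-page-analytics", "referrer": "/posts" }'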

Updating your website

Next up we’ll add some JavaScript to your website to trigger the Azure Function and send the payload containing the pathname and referrer.

All you’ll need to do is add the following script to each page, whether by adding it to an existing widely used script file or by creating a new file and including it in the header of each page. I’d advise against in-lining it in a script element on each page, in case you need to change any of the behaviour (especially the URL) later.

(function () {
  if (navigator.doNotTrack === '1') return;
  const payload = {
    pathname: document.location.pathname,
    referrer: document.referrer
  };
  navigator.sendBeacon(
    'http://localhost:7071/api/Log',
    JSON.stringify(payload)
  );
}());

Let’s step through this. The first step is to check the doNotTrack property on the global navigator object. This will be '1' if the user has enabled Do Not Track in their browser, which is also sent as the DNT HTTP header, signifying that they do not wish to be tracked. We should be respectful of the user’s privacy wishes, so if they’ve set DNT then we abort.

Next we grab document.location.pathname as the pathname and document.referrer as the referrer, and use the Beacon API (navigator.sendBeacon) to send a POST request to our function. Finally, we wrap all of this in a self-invoking anonymous function to avoid polluting the global namespace.

We’ll have to come back and update the URL after we deploy to Azure, but for now this should be testable locally.

Deploying to Azure

Now we’re going to deploy our function to Azure using the Azure CLI and the Azure Functions Core Tools in PowerShell. As I said above, we will use a Function App and Table Storage. Our usage falls comfortably within the Function App’s free grant, so it will cost nothing, and Table Storage has no up-front pricing and very low usage costs, so it should only cost you a few cents per year. If you’re concerned about cost, I would suggest putting a budget alert on the resource group we create. I added one for 5c per month and it’s only notified me once.

The first step is to log in to your Azure account and set the default subscription. When you log in, you’ll see a list of available subscriptions, so grab the ID of the one you want to use.

az login
az account set --subscription 'your-subscription-id'

Next we will set a few variables for values that we’ll need to use several times throughout the process.

$location = 'australiasoutheast'
$rg = 'resource-group-name'
$sa = 'storageaccountname'
$fa = 'function-app-name'

You’ll need to supply globally unique values for the Storage Account name and the Function App name, noting that Storage Account names must be 3-24 characters using only lower-case letters and digits, and Function App names can’t use special characters other than hyphens. You can pick whichever location you prefer—I’ve gone with the data centre in Melbourne. You can use the following command to get a list of available locations, and you’ll need to grab the value of the Name property.

az account list-locations --out table

Next we will create the resource group, the Storage Account and the Function App.

az group create `
  --name $rg `
  --location $location

az storage account create `
  --name $sa `
  --location $location `
  --resource-group $rg `
  --sku Standard_LRS

az functionapp create `
  --name $fa `
  --resource-group $rg `
  --storage-account $sa `
  --consumption-plan-location $location

Finally we need to configure CORS for our Function App, and then deploy the function we created earlier.

az functionapp cors add `
  --name $fa `
  --resource-group $rg `
  --allowed-origins https://your-website.com http://localhost:8080

func azure functionapp publish $fa

It’s up to you whether you want to allow CORS from localhost. It will come in handy while testing to make sure you’ve set everything up correctly, but you might find it floods your logs while you’re testing content changes on your website.
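If you do allow localhost and later change your mind, the matching remove command should clean it up, reusing the variables from above.

az functionapp cors remove `
  --name $fa `
  --resource-group $rg `
  --allowed-origins http://localhost:8080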

Your function should now be live and running at https://$fa.azurewebsites.net/api/Log, where $fa is the Function App name you chose. Don’t forget to substitute this URL back into the JavaScript snippet on your website.

At this point, you might want to add a custom domain to your Function App. This is also completely free, and you can do this by following the instructions to add a Custom Domain and then create a free certificate. If you do this, don’t forget again to substitute the new URL back into the JavaScript snippet on your website.
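For what it’s worth, the domain mapping half of that process can also be done from the CLI. Here’s a sketch, assuming a hypothetical subdomain analytics.your-website.com with a CNAME record already pointing at $fa.azurewebsites.net; the free certificate is still easiest to create by following the instructions linked above.

az functionapp config hostname add `
  --webapp-name $fa `
  --resource-group $rg `
  --hostname analytics.your-website.com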

Querying the logs

At this point your website should be set up and recording page views in Table Storage, and now you want to run queries on your data.

You can browse the records directly in the Azure Portal or via Azure Storage Explorer, or you can use the Azure CLI to retrieve the entries programmatically. The following command will fetch all entries from the last seven days and hydrate them as PowerShell objects.

$time = [DateTime]::UtcNow.AddDays(-7).ToString("o")
$logs = az storage entity query `
  --account-name $sa `
  --table-name LogEntries `
  --filter "Timestamp ge datetime'$time'" `
  | ConvertFrom-Json `
  | Select-Object -ExpandProperty items

We can then group by pathname (PartitionKey) or referrer to see which pages get the most hits and what drives the most traffic, for example, using the following two commands.

$logs `
  | Group-Object -Property PartitionKey `
  | Select-Object -Property Count,Name `
  | Sort-Object -Property Count -Descending

$logs `
  | Group-Object -Property Referrer `
  | Select-Object -Property Count,Name `
  | Sort-Object -Property Count -Descending
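Since we swapped forward slashes for vertical bars when storing the PartitionKey, you may also want to map the group names back to real paths when reporting. One way is a small variation on the first command above, using a calculated property.

$logs `
  | Group-Object -Property PartitionKey `
  | Select-Object -Property Count,@{ Name = 'Path'; Expression = { $_.Name.Replace('|', '/') } } `
  | Sort-Object -Property Count -Descending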

Wrap-up

If you’ve made it to the end, you should now have your very own basic analytics service running in Azure which records page views with the referrer, and your website hooked up to use it. Depending on your needs you could look at expanding the payload to include other basic details you care about, like screen size, but honestly, for any complex requirements this is not going to be the right tool.

I’ve been using it on this website for over a year now, I have never had to do any maintenance or code changes, and it has cost me around 50 cents in total to operate. But more importantly, it was fun to build, I get some small feeling of satisfaction in knowing that this website remains 100% hand-made, and I learnt about the basics of Azure Functions and Table Storage in the process.