DIY Page Analytics
This site is hosted on GitHub Pages. I’d like to have a rough idea of which of my posts are receiving the most engagement and which sites are driving the most traffic, but GitHub Pages doesn’t provide this by default, so I decided to add some page view analytics to my site. I’m not particularly keen on privacy-invasive services like Google Analytics, which collect far more information than I need and share it with third parties, so I decided to build my own minimalist service. Also, it just seemed like a fun yak-shaving exercise, and everything else on this page is hand-made.
In this post we’ll walk through how to build a very basic page analytics service using Azure Functions and Table Storage, which you can host yourself for a few cents per year.
Is this going to be a genuine substitute for Google Analytics? Definitely not. It won’t track session flows, time spent on the page, conversions, audience details, or anything fancy like that. It’s just going to do the absolute basics, which is to record which pages get viewed and what the referrer was. For a website which receives as little traffic as this one, that’s perfectly adequate.
The way that it works is that a bit of JavaScript will run on each page view of your website, which sends some information (the path of the page being viewed, and the referrer) to an Azure Function, which then stores these details into Table Storage. You can then query Table Storage to see which pages are being viewed and where the traffic is coming from.
Before we get started, you’re going to need to make sure you’ve installed the Azure CLI, the Azure Functions Core Tools, and the .NET Core CLI.
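You can verify that each of these is installed and on your PATH by checking that the following commands all print a version number.

az --version
func --version
dotnet --version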
Function setup
Let’s get started by creating our project directory and adding the required boilerplate. Open up your favourite terminal and run the following commands to create a blank .NET function app.
mkdir logger
cd logger
func init --worker-runtime dotnet
Next we need to add our function to it, and install the required package dependencies.
func new --language 'C#' --template HttpTrigger --name Log
dotnet add package Microsoft.Azure.WebJobs.Extensions.Storage
This will create a Log.cs file containing a standard C# function. At this point, you could test the function locally by running the following.
func host start
Function implementation
The first thing we need to add is a class to represent a log entry.
In Table Storage, each entry must have a PartitionKey and a RowKey, which combined must uniquely identify the entry. Entries will also automatically have a Timestamp property which records the time that they were last modified. We’ll use the PartitionKey to record the page being visited, but unfortunately we can’t use the RowKey to record the referrer, since a page and its referrer (hopefully) wouldn’t form a unique pair. Instead we’ll add a new property, Referrer, and use a random GUID to ensure the uniqueness of the RowKey property. This means that our LogEntry class will look as follows.
public class LogEntry
{
    // The path of the page being viewed, with '/' replaced by '|'
    public string PartitionKey { get; set; }
    // A random GUID, so that (PartitionKey, RowKey) is always unique
    public string RowKey { get; set; }
    // The referrer reported by the browser, if any
    public string Referrer { get; set; }
}
Next we need to change the function implementation. I’ll supply the full code snippet first and then we’ll walk through it together.
[FunctionName("Log")]
[return: Table("LogEntries")]
public static async Task<LogEntry> Run(
[HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)]
HttpRequest req,
ILogger log)
{
log.LogInformation("C# HTTP trigger function processed a request.");
string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic data = JsonConvert.DeserializeObject(requestBody);
string pathname = data?.pathname;
string referrer = data?.referrer;
log.LogInformation($"{pathname} -> {referrer}");
return new LogEntry
{
PartitionKey = pathname.Replace("/", "|"), // '/' not supported
RowKey = Guid.NewGuid().ToString(),
Referrer = referrer
};
}
The first change is to add the [return: Table("LogEntries")] attribute and change the return type from Task<IActionResult> to Task<LogEntry>. This allows us to return a new entry for the LogEntries table in Table Storage, which will be inserted automatically.
Next, we changed the access level from Function to Anonymous, because we’re going to be hitting this function from publicly viewable JavaScript where we can’t hide an API key anyway, and we changed the allowed methods to only allow POST, since that’s the method which best describes our action.
Finally, we changed the body of the function to retrieve two properties from the body of the request, pathname and referrer, and use them to construct a new LogEntry object as described above. Note that we’ve had to replace forward slashes (/) with vertical bars (|), because the PartitionKey property doesn’t support values containing forward slashes. We are expecting the body of incoming requests to look something like this.
{
"pathname": "/posts/diy-page-analytics",
"referrer": "/posts"
}
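For illustration, that body would be stored as an entity along these lines (the GUID here is made up, and the Timestamp is added by Table Storage automatically).

PartitionKey: |posts|diy-page-analytics
RowKey: 7f6e2c1a-0b4d-4f3e-9a8c-5d1e2f3a4b5c
Referrer: /posts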
Putting it all together, we can now send an unauthenticated POST request to this function with a body as described above; it will pick out the pathname and referrer properties and add an entry to the LogEntries table in Table Storage. “Which Table Storage account?”, you might ask. It turns out Azure Functions already require a Storage Account in order to record function invocations, so when you test this locally it will likely go through the Azure Storage Emulator, and when running in Azure it will be the Storage Account that we’ll create later.
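For reference, the local storage connection lives in the local.settings.json file that func init generated, with AzureWebJobsStorage pointed at the emulator. A minimal sketch (your file may differ slightly):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet"
  }
}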
If you want to test this now, you can run the following.
func host start --cors '*'
You can then use a tool like Postman to send a request, and Azure Storage Explorer to verify that the entry is added to Table Storage.
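If you’d rather stay in the terminal than reach for Postman, a curl request like the following will also exercise the function (assuming the Functions host is listening on its default port of 7071).

curl -X POST -H 'Content-Type: application/json' -d '{"pathname": "/posts/diy-page-analytics", "referrer": "/posts"}' http://localhost:7071/api/Log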
Updating your website
Next up we’ll add some JavaScript to your website to trigger the Azure Function and send the payload containing the pathname and referrer.
All you’ll need to do is add the following script to each page, whether by adding it to an existing widely used script file or by creating a new file and including it in the header of each page (an example include is shown below). I’d advise against in-lining it in a script element on each page, in case you need to change any of the behaviour (especially the URL) later.
(function () {
if (navigator.doNotTrack === '1') return;
const payload = {
pathname: document.location.pathname,
referrer: document.referrer
};
navigator.sendBeacon(
'http://localhost:7071/api/Log',
JSON.stringify(payload)
);
}());
Let’s step through this. The first step is to check the doNotTrack property on the global navigator object. This will be set if the user included the DNT HTTP header in the request, signifying that they do not wish to be tracked. We should be respectful of our users’ privacy wishes, so if they’ve set DNT then we abort.
Next we grab the location.pathname property as the pathname and the referrer, and use navigator.sendBeacon from the Beacon API to send a POST request containing both to our function. Finally, we wrapped all of this in a self-invoking anonymous function to avoid polluting the global namespace.
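If you went down the separate-file route, the include on each page might look like this (the path and filename are just placeholders).

<script src="/js/analytics.js" defer></script>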
We’ll have to come back and update the URL after we deploy to Azure, but for now this should be testable locally.
Deploying to Azure
Now we’re going to deploy our function to Azure using the Azure CLI and the Azure Functions Core Tools in PowerShell. As I said above, we will use a Function App and Table Storage. Our usage falls well within the free tier of the Function App, and Table Storage has no up-front cost and very low usage charges, so the whole thing should only cost you a few cents per year. If you’re concerned about cost, I would suggest putting a budget alert on the resource group we create. I added one for 5c per month and it’s only notified me once.
The first step is to log in to your Azure account and set the default subscription. When you log in, you’ll see a list of available subscriptions, so grab the ID of the one you want to use.
az login
az account set --subscription 'your-subscription-id'
Next we will set a few variables for values that we’ll need to use several times throughout the process.
$location = 'australiasoutheast'
$rg = 'resource-group-name'
$sa = 'storageaccountname'
$fa = 'function-app-name'
You’ll need to supply globally unique values for the Storage Account name and the Function App name, noting that Storage Account names must be 3–24 characters using only lower-case letters and digits, and Function App names can’t use special characters other than hyphens. You can pick whichever location you prefer; the following command lists the available locations, and the value you need is in the Name property.
az account list-locations --out table
Next we will create the resource group, the Storage Account and the Function App.
az group create `
--name $rg `
--location $location
az storage account create `
--name $sa `
--location $location `
--resource-group $rg `
--sku STANDARD_LRS
az functionapp create `
--name $fa `
--resource-group $rg `
--storage-account $sa `
--consumption-plan-location $location
Finally we need to configure CORS for our Function App, and then deploy the function we created earlier.
az functionapp cors add `
--name $fa `
--resource-group $rg `
--allowed-origins https://your-website.com http://localhost:8080
func azure functionapp publish $fa
It’s up to you whether you want to allow CORS from localhost. It will come in handy while testing to make sure you’ve set everything up correctly, but you might find it floods your logs while you’re testing content changes on your website.
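If you decide to drop localhost later on, the matching remove command looks like this.

az functionapp cors remove `
  --name $fa `
  --resource-group $rg `
  --allowed-origins http://localhost:8080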
Your function should now be live and running at https://$fa.azurewebsites.net/api/Log, based on your choice of the Function App name. Don’t forget to substitute this back into the JavaScript snippet on your website.
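That is, the sendBeacon call from earlier becomes something like the following, with function-app-name replaced by your own choice of $fa.

navigator.sendBeacon(
  'https://function-app-name.azurewebsites.net/api/Log',
  JSON.stringify(payload)
);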
At this point, you might want to add a custom domain to your Function App. This is also completely free, and you can do it by following the instructions to add a Custom Domain and then create a free certificate. If you do this, once again don’t forget to substitute the new URL back into the JavaScript snippet on your website.
Querying the logs
At this point your website should be set up and collecting page view records to Table Storage, and now you want to run queries on your data.
You can browse the records directly in the Azure Portal or via Azure Storage Explorer, or you can use the Azure CLI to retrieve the entries programmatically. The following command will fetch all entries from the last seven days and hydrate them as PowerShell objects.
$time = [DateTime]::UtcNow.AddDays(-7).ToString("o")
$logs = az storage entity query `
--account-name $sa `
--table-name LogEntries `
--filter "Timestamp ge datetime'$time'" `
| ConvertFrom-Json `
| Select-Object -ExpandProperty items
We can then group by pathname (PartitionKey) or referrer to see which pages get the most hits and what drives the most traffic, for example using the following two commands.
$logs `
| Group-Object -Property PartitionKey `
| Select-Object -Property Count,Name `
| Sort-Object -Property Count -Descending
$logs `
| Group-Object -Property Referrer `
| Select-Object -Property Count,Name `
| Sort-Object -Property Count -Descending
Wrap-up
If you’ve made it to the end, you should now have your very own basic analytics service running in Azure which records page views with the referrer, and your website hooked up to use it. Depending on your needs you could look at expanding the payload to include other basic details you care about, like screen size (see the sketch below), but for any complex requirements this is honestly not going to be the right tool for you.
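For example, capturing the screen size would be a one-line addition to the payload. A sketch (note that the LogEntry class and the function above would need a matching property before this extra field gets stored):

const payload = {
  pathname: document.location.pathname,
  referrer: document.referrer,
  // Hypothetical extra field; requires a matching property server-side.
  screen: window.screen.width + 'x' + window.screen.height
};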
I’ve been using it on this website for over a year now, I have never had to do any maintenance or code changes, and it has cost me around 50 cents in total to operate. But more importantly, it was fun to build, I get some small feeling of satisfaction in knowing that this website remains 100% hand-made, and I learnt about the basics of Azure Functions and Table Storage in the process.