Keeping things fresh: Adding [Latest Snapshot Date]

One of the key additions I make to our team's staging tables is a Time attribute called [Latest Snapshot Date], which I append to each subject matter dataset in the process of processing it from a .xlsx file that's originated out of Business Objects into a .yxdb file that's going to be used as an input for our more involved data prep workflows.

In most situations, analysts are keen to cite their sources, all the way down to the snapshot timestamp used.  This helps immensely when questions of freshness come up.

Personally, I like to provide a small table that lists each dataset and a timestamp.  This is trivial to produce in a presentation layer if each dataset is equipped with something like a [Latest Snapshot Date] attribute. I resort to two different ways to derive this attribute:

1) Transactional / Live Data

With a transactional or pseudotransactional dataset, where snapshots aren't provided, I resort to capturing the write time on the xlsx file written by Business Objects and append it as the [Latest Snapshot Date].

 

2) Snapshot Data

It's a bit different when a dataset has a series of snapshots.  I'll  use the Summarize to to find the maximum value within the snapshot field of the dataset and append that max snapshot as the [Latest Snapshot Date].

In some cases, the timestamp data is provided in a string and in such a situation I'll resort to a regex tool to convert it from a string into the time format preferred by Alteryx.  Regular expressions are a topic worthy of their own series of articles.  More on them in a future post!