Adventures in Marketing: Connecting to an API and the Download Tool

This is one of the cooler workflows I've built in Alteryx:

A finger on the pulse of the internet...  Can you spot the month with no Reddit data?

When working with the Download tool, start by imagining you're building one or more URLs. Each URL will likely need to identify who you are, what you want, and the time periods you want it for.

In this case, I want a count of mentions of Seagate that are positive, negative, or neutral, across various forums and social platforms on the internet, for six complete historical months plus the current month.

The Sysomos API has great documentation on how to write the various API calls that, when combined, give me all of these details without hitting a row limit on their responses.

The green boxes on the left are built to quickly reconfigure the API requests.

For ease of maintenance, I break out the various API inputs such that I only ever need to edit one cell to make a given change to the API call.  In total, I create 21 distinct API calls to get all the data I need for this workflow: requests for three sentiment types across seven calendar months, (positive, negative, neutral) x (month offsets -6 through 0).
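Outside Alteryx, those 21 combinations amount to a simple cross product. A minimal Python sketch (the sentiment labels and month offsets mirror the ones described above; nothing here is the actual Sysomos syntax):

```python
from itertools import product

# The three sentiment filters and seven month offsets described above.
sentiments = ["positive", "negative", "neutral"]
month_offsets = range(-6, 1)  # -6 through 0 (the current month)

# Every (sentiment, offset) pair becomes one API call.
calls = list(product(sentiments, month_offsets))
print(len(calls))  # 21 distinct API calls
```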


These are all the "common" payload parameters for the Sysomos API

These are common inputs.  You should always expect a root URL, and you'd usually also expect at least one or two API keys that uniquely identify you as the requester, though some publicly available APIs don't require a key.  Notice that I don't use all of these arguments; I filter out any rows that resolve to null.
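That keep-only-what's-set pattern can be sketched in Python. The root URL and parameter names below are hypothetical placeholders, not the actual Sysomos inputs:

```python
from urllib.parse import urlencode

# Hypothetical common inputs; unused arguments resolve to None (null).
base_url = "https://api.example.com/v1/mentions"  # placeholder root URL
params = {
    "apiKey": "MY_KEY",   # uniquely identifies the requester
    "query": "Seagate",
    "geo": None,          # not used in this workflow...
    "language": None,
}

# ...so filter out the nulls before building the request,
# just like the null filter in the workflow.
active = {k: v for k, v in params.items() if v is not None}
url = f"{base_url}?{urlencode(active)}"
```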

Humans like many rows, machines prefer columns...

You might notice I enter each API parameter on its own row and transform them using the Cross Tab tool.  I do this because rows of data are very friendly to read and manage in Alteryx, which stops being true once you have more than a couple of columns.  So I cross tab the rows to turn the variable names into column headers, leaving a single row of data (the values).

You'll find that if you have one row of common data, you can append the varying data to get every unique combination you need for a complete data set.  I do this twice: once for the 3 sentiments, and again for the 7 months.
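The crosstab-then-append pattern translates roughly like this in Python: pivot the (name, value) rows into one wide record, then cross-join it with each set of varying inputs. The parameter names are made up for illustration:

```python
from itertools import product

# Parameters entered one per row, as in the green boxes on the left.
param_rows = [("apiKey", "MY_KEY"), ("query", "Seagate")]

# "Cross tab": variable names become columns, leaving a single row of values.
common = dict(param_rows)

# "Append" the varying inputs twice: 3 sentiments x 7 month offsets = 21 rows.
request_specs = [
    {**common, "sentiment": s, "monthOffset": m}
    for s, m in product(["positive", "negative", "neutral"], range(-6, 1))
]
```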

Now, for a simple GET request, it's been my experience that configuring the Download tool is pretty straightforward. I've had success leaving most settings at their defaults, but review your own use case (and the API manual) to make sure you're complying with the provider's requirements.

Here's how I'm set up.  First, I tell it where to find the URL, and I do tell it to encode the URL text.  That converts special characters into URL-compatible ones, including the colon I use to delimit the various forum types.
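That encoding step is the same percent-encoding `urllib.parse.quote` performs in Python. A colon-delimited forum list, for example (the forum names here are invented), becomes:

```python
from urllib.parse import quote

# Colon-delimited forum types, as described above.
forums = "BLOG:FORUM:TWITTER"

# Percent-encode reserved characters, including the colon delimiter.
encoded = quote(forums, safe="")
print(encoded)  # BLOG%3AFORUM%3ATWITTER
```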

Configuring the payload based on the parameters I decided to use.

Now, in this case, everything I need is static; it never changes, so I just pre-compile it into my URL.  If you need to vary your headers, the Headers tab applies to you; headers are name/value pairs sent alongside the request, separate from the URL itself.  So, skipping to the Payload tab, I have the following attributes that I call, each of which is a unique column in my data.

You'll notice this is also where you would change the API type from GET (or FTP) to POST, etc.

As an alternative to checking multiple values by field, you can also take the whole payload from a single field, or type it into one big text input box.  I could see that being useful for more unusual APIs that require extra customization.


So far, every API I've worked with has returned JSON, XML, or CSV.  Most only offer the first two, so check out the JSON Parse and XML Parse tools to wrap up your workflow.  Once out the other side, you'll likely need to refine the data further, particularly with JSON output.  Oftentimes the data isn't structured in a tabular way because it's nested into groups of records.  If that's the case, the good news is that it's still programmatically structured, so you can programmatically restructure it!
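As a sketch of that restructuring, here is one way nested JSON records can be flattened into tabular rows in Python. The response shape below is invented for illustration and is not the actual Sysomos format:

```python
import json

# A hypothetical nested response: counts grouped under each month.
raw = json.loads("""
{
  "month": "2016-03",
  "sentiments": [
    {"label": "positive", "count": 120},
    {"label": "negative", "count": 45},
    {"label": "neutral",  "count": 310}
  ]
}
""")

# Flatten the nested group into one row per (month, sentiment) pair.
rows = [
    {"month": raw["month"], "sentiment": s["label"], "count": s["count"]}
    for s in raw["sentiments"]
]
```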

For my part, when I can, I use a CSV output, writing the data to as many temporary files as I need, one per API call, and then I use a Dynamic Input tool to pick those files back up and continue my work.  The reason I prefer CSV is that the data arrives already in columnar format, obviating quite a few steps, particularly compared to JSON, which requires a handful of transforms before heading to Tableau for visualization.
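The temp-file-then-dynamic-input pattern looks roughly like this in Python: write one CSV per API call, then glob them all back up and stack them. File names and fields are illustrative:

```python
import csv
import glob
import os
import tempfile

outdir = tempfile.mkdtemp()

# One CSV per API call, as with the workflow's temporary files.
results = [{"sentiment": "positive", "count": 120},
           {"sentiment": "negative", "count": 45}]
for i, row in enumerate(results):
    path = os.path.join(outdir, f"call_{i}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sentiment", "count"])
        writer.writeheader()
        writer.writerow(row)

# The Dynamic Input step: pick every file back up and append the rows.
combined = []
for path in sorted(glob.glob(os.path.join(outdir, "*.csv"))):
    with open(path, newline="") as f:
        combined.extend(csv.DictReader(f))
```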