If you have Office 2013 Professional (or Pro Plus, which is part of certain Office 365 offerings), you’ll be entitled to use a set of business intelligence add-ins for Excel 2013 which I refer to as the Power Tools. The full set consists of:
- Power Pivot – pivot tables on steroids, providing advanced data modelling and deep analysis and reporting
- Power View – presentation and exploration of results (dashboards)
- Power Map – visual representation of location-based data on maps
- Power Query – to be discussed here
I describe Power Query as the part of the process that comes before the modelling, analysis and presentation provided by Power Pivot and Power View. But first, a question… why use Excel for business intelligence?
- Familiarity – people know how to use Excel and it’s not a big leap to get comfortable with the BI tools
- Value-for-money – some BI solutions are expensive, and although the Power Tools require Office Professional that will still work out cheaper per seat than many other solutions
- It’s a way to provide BI to the masses, not just for the chosen few – all-too-often we hear that there’s a time lag in people getting data and reports because they have to wait for someone who has access to the tools to generate the output (and by then it’s often out-of-date)
- ‘Final resting place’ – often we hear that whatever the BI solution, most people just want to get it into Excel anyway… so why not do it with Excel in the first place?
Okay, back to Power Query… as I said, it often comes first in the BI process, and that’s because it can be used to obtain data, filter it and generally tidy it up before moving on to the modelling, analysis and presentation stages. It’s very easy to get side-tracked by the fun side of Power Query (really, it can be fun) and we’ll get onto that, but let’s look at the basics. Firstly, Power Query allows you to grab data from a source:
- A file (Excel, XML, CSV, text)
- A database (SQL Server, Access, Oracle, DB2, and others)
- Other miscellaneous sources (SharePoint, OData, Azure Marketplace, Hadoop, Active Directory and others… including Facebook)
- From the web or via on online search (here’s the fun part)
Once you have secured your data, you can clean it up – this may involve filtering to get down to a subset of records, removing unwanted columns, adding custom columns, and transforming text. You can also append the data to an existing table or merge with another data set (example coming up). Click on the image on the right to see the full query editor.
Obviously there are levels of complexity to getting to the data – grabbing data from another Excel file is pretty simple. If you’re grabbing data from a relational database you probably need to know something about the databases and tables. Having said that, I was able to query UK cash point (ATM) data from the Azure Marketplace and filter it to the Guildford postcode area very quickly. Once the querying and clean-up is complete you’re left with an Excel table – this can be analysed further (e.g. using a pivot table) or added to a data model for use in Power Pivot.
I mentioned the fun part, and here it is. An online search will look for tables of data related to your search term. Most of the results seem to be Wikipedia pages, and indeed there is some clever API stuff going on to directly extract a table from a Wikipedia page. Depending on the search term you’ll also see results from data.gov. Using the online search it’s incredibly quick and easy to pull in data on just about any subject – the first timed I saw this demoed the presenter asked someone in the audience to shout out the name of a movie star (Brad Pitt, a rather obvious choice) and within seconds we were looking at a table of Mr Pitt’s filmography.
My favourite demo goes like this… I search on ‘premier league’ and select the Wikipedia page for ‘Managers – Premier League’. In comes a table listing the managers, in order of their appointment (it’s out of date at the moment, because no-one has updated the page since Villas-Boas, Jol, Mackay and Clarke were shown the door, and it doesn’t list Tony Pulis at Crystal Palace). Remove the first column which is blank, we don’t need that. Then I perform another search, this time for ‘premier league stadia’, select the 2013-14 page, and we have a second table listing the clubs’ stadium names and capacities.
Now I can select the Merge option – we have the two tables available (created by the earlier queries) and it’s just a matter of selecting the columns they have in common (one is Team, the other is Club, but that doesn’t matter). The penultimate step is to select which columns from one table we want added to the other (we don’t need the club name twice so the second instance doesn’t need to be included). A final bit of tidying up (renaming columns, swapping their positions, maybe changing the sort order) and we’re done. After clicking on ‘Apply and Close’ we have a table which contains the Premier League clubs, details on their managers and their stadia. Very quick and simple.
You can download the Power Query preview from here.