Friday, 16 November 2012

Population Stability Index (PSI) in SQL

So dealing with probability is quite a fun thing to play around with in SQL if you're into that sort of thing.
It's easy when you're building a data warehouse and cubes and you need to compare values to a report vs the values that you're getting from a Transactional system like Syspro or SAP. (think ERP, POS systems, but you get the point.)
Recently I've had to build a report that compares values of data that might change, or might not change, but it's not financial figures, it's Population figures. So for example in the Credit Risk industry the percentage of people from last month that now have credit cards paid, to the percentage this month.
Do this calculation there's a formula called a Population Stability Index (PSI).
The formula looks like this:
All fancy and complicated, but once you break it down and you realize that you can substitute the Sigma sign actually just means SUM() in SQL, then your life becomes a whole lot easier.

So let's put together some Sample Data and create the formula in SQL.

 All of a sudden the mathematical formula is less intimmidating, and you're able to write the SQL code with ease.

The PSI will show you a shift in the population from 1 month to next if any, then you can create alerts based on the value that you get back.

I hope this has been informative, and I've been asked to blog a little more about writing mathematical algorithms into SQL code, so I'll see if I can find some good and applicable examples to use, and post them.

If you want to learn more about the PSI, then read this SAS document that I found online. It gives you quite a good explination.