Thursday, 21 May 2015

GSoC 2015 with pgmpy

This year I am working with pgmpy, a Python library for Probabilistic Graphical Models (PGMs). My project is to add state name support to pgmpy.

Graphical Models are a fairly new technique in machine learning. They allow us to compactly represent a joint distribution over a set of random variables, and to efficiently compute marginal and conditional distributions over these variables.

Each random variable has a set of states that it can take. For example, the random variable for the result of a coin toss can take two states: heads or tails. Similarly, when working with Graphical Models, each random variable has a specified set of states that it can be in. Let's take the famous student example of a Bayesian Network:

The student network

In the above figure you can see a set of random variables connected to each other by directed arrows. Each variable has an associated table known as a Conditional Probability Table, or CPT. Here the states of the variables are represented by numbers, as in d0, d1 and so on. For example, the variable Difficulty has two states, easy and hard, which are represented by 0 and 1.
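To make the idea of a CPT indexed by state numbers concrete, here is a small sketch in plain Python. It does not use pgmpy, and the probability values and the variable names `p_intelligence` and `p_sat_given_intelligence` are illustrative assumptions rather than values read from the figure. It stores two tables as lists indexed by state number and computes a marginal by summing over the parent's states.

```python
# Illustrative only: plain Python lists, not pgmpy objects.
# State numbers are list indices, e.g. Intelligence: 0 -> i0, 1 -> i1.

# P(Intelligence), indexed by the state number of Intelligence (assumed values).
p_intelligence = [0.7, 0.3]

# P(SAT | Intelligence), indexed as [intelligence_state][sat_state] (assumed values).
p_sat_given_intelligence = [
    [0.95, 0.05],  # Intelligence = 0
    [0.20, 0.80],  # Intelligence = 1
]

def marginal_sat(sat_state):
    """Return P(SAT = sat_state) by summing over the states of Intelligence."""
    return sum(
        p_intelligence[i] * p_sat_given_intelligence[i][sat_state]
        for i in range(len(p_intelligence))
    )

p_sat = [marginal_sat(s) for s in range(2)]
print(p_sat)
```

Notice that nothing in these tables says what state 0 or state 1 actually means; that knowledge lives only in the reader's head.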

pgmpy also represents states using numbers internally. For a user, however, it is much more convenient to give input and read output using state names rather than numbers. pgmpy lacks this functionality right now, so users have to manually keep track of which number represents which state.
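The manual bookkeeping described above might look something like the following sketch. This is plain Python and not pgmpy's API: the dictionary `state_names`, the helpers `state_to_index` and `index_to_state`, and the probability values are all hypothetical names and numbers I made up for illustration.

```python
# Hypothetical bookkeeping a user might maintain by hand (not pgmpy's API).
# The position of each name in the list is its state number.
state_names = {
    "Difficulty": ["easy", "hard"],  # easy -> 0, hard -> 1
}

def state_to_index(variable, state):
    """Translate a state name into the state number used internally."""
    return state_names[variable].index(state)

def index_to_state(variable, index):
    """Translate a state number back into its human-readable name."""
    return state_names[variable][index]

# P(Difficulty), indexed by state number (assumed values).
p_difficulty = [0.6, 0.4]

# Look up a probability by name instead of by number.
p_hard = p_difficulty[state_to_index("Difficulty", "hard")]
print(index_to_state("Difficulty", 1), p_hard)
```

Every user ends up writing some version of this mapping, which is why it makes sense for the library to handle it.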

This summer I will be adding this functionality to pgmpy so that the user can work with either state names or state numbers. I am still discussing the best way to do this with my mentors, and will write about the exact implementation details in my next blog post.