Basic Concepts in ZhuSuan

Distribution

Distributions are the basic building blocks of probabilistic models. The Distribution class is the base class for the various probability distributions, which support batch inputs, generating batches of samples, and evaluating probabilities at batches of given values.

We can create a univariate Normal distribution in ZhuSuan by:

>>> import zhusuan as zs
>>> dist_a = zs.distributions.Normal(mean=0., logstd=0.)

The typical input shape for a Distribution is batch_shape + input_shape, where input_shape is the shape of a non-batch input parameter and batch_shape represents how many independent inputs are fed into the distribution. In general, distributions support broadcasting for inputs.

Samples can be generated by calling the sample() method of a distribution object. The shape of the result is ([n_samples] + )batch_shape + value_shape. The leading axis is omitted only when n_samples is None (the default), in which case one sample is generated. value_shape is the non-batch value shape of the distribution; for a univariate distribution it is [].

An example of a univariate distribution (Normal):

>>> import jittor as jt

>>> dist_b = zs.distributions.Normal(mean=jt.array([[-1., 1.], [0., -2.]]), std=jt.array([0., 1.]))

>>> dist_b.sample().shape
[2,2,]

>>> dist_b.sample(10).shape
[10,2,2,]
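
The shape rule can be checked without ZhuSuan. The following is a minimal pure-Python sketch, not ZhuSuan code; the helper names broadcast_shape and sample_shape are hypothetical. It assumes NumPy-style broadcasting of the parameter shapes to obtain batch_shape, and it treats any integer n_samples as adding a leading axis, as described above.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the NumPy-style broadcast of several shapes (hypothetical helper)."""
    result = []
    # Align the shapes on their trailing axes and walk from right to left.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return list(reversed(result))


def sample_shape(batch_shape, value_shape, n_samples=None):
    """Shape of a sample: ([n_samples] +) batch_shape + value_shape."""
    lead = [] if n_samples is None else [n_samples]
    return lead + list(batch_shape) + list(value_shape)


# Parameters of dist_b: mean has shape [2, 2], std has shape [2].
batch_shape = broadcast_shape([2, 2], [2])
print(sample_shape(batch_shape, []))      # one sample, no leading axis
print(sample_shape(batch_shape, [], 10))  # ten samples, leading axis of 10
```

The two printed shapes correspond to the results of dist_b.sample() and dist_b.sample(10) shown above.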

There are cases where a batch of random variables is grouped into a single event so that their probabilities can be computed together. This is achieved by setting the group_ndims argument, which defaults to 0. The last group_ndims axes of batch_shape are grouped into a single event. For example, Normal(..., group_ndims=1) treats the last axis of its batch_shape as a single event, i.e., a multivariate Normal with identity covariance matrix.

The log probability density (mass) function can be evaluated by passing the given values to the log_prob() method of a distribution object. The given Tensor should be broadcastable to shape (... + )batch_shape + value_shape. The returned Tensor has shape (... + )batch_shape[:-group_ndims]. For example:

>>> dist_c = zs.distributions.Normal(mean=jt.array([[-1., 1.], [0., -2.]]), std=1.,
...                                  group_ndims=1)

>>> dist_c.log_prob(jt.zeros([1]))
jt.Var([-2.837877  -3.8378773], dtype=float32)

>>> dist_d = zs.distributions.Normal(mean=jt.zeros([2, 1, 3]), std=1.,
...                                  group_ndims=2)

>>> dist_d.log_prob(jt.zeros([5, 1, 1, 3])).shape
[5,2,]
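
As a sanity check on group_ndims, the values printed for dist_c can be reproduced by hand. The sketch below is plain Python using only the standard library, not ZhuSuan code; normal_log_pdf and grouped_log_prob are hypothetical helper names. It evaluates the univariate Normal log density elementwise and then sums over the last axis, which is what grouping one axis into a single event amounts to.

```python
import math


def normal_log_pdf(value, mean, std):
    """Log density of a univariate Normal evaluated at value."""
    z = (value - mean) / std
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * z * z


def grouped_log_prob(value, means, std):
    """Sum elementwise log densities over the last axis (group_ndims=1)."""
    return [sum(normal_log_pdf(value, m, std) for m in row) for row in means]


# Same parameters as dist_c, evaluated at 0 for every component.
means = [[-1.0, 1.0], [0.0, -2.0]]
log_probs = grouped_log_prob(0.0, means, 1.0)
print(log_probs)  # approximately [-2.837877, -3.837877]
```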

BayesianNet

In ZhuSuan we support building probabilistic models as Bayesian networks, i.e., directed graphical models. Below we use a simple Bayesian linear regression example to illustrate this. The generative process of the model is

\[ \begin{align}\begin{aligned}w &\sim N(0, \alpha^2 I)\\y &\sim N(w^\top x, \beta^2)\end{aligned}\end{align} \]

where \(x\) denotes the input feature in the linear regression. We apply a Bayesian treatment and assume a Normal prior distribution over the regression weights \(w\). Suppose the input feature has 5 dimensions. For simplicity we use a randomly generated input and fix the hyper-parameters:

x = jt.rand([5])
alpha = 1.
beta = 0.1
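
To make the generative process concrete, here is a minimal sketch of one ancestral-sampling pass written with Python's standard library only. It is not ZhuSuan code: sample_regression is a hypothetical helper, and the feature vector below is an arbitrary fixed input chosen for illustration.

```python
import random


def sample_regression(x, alpha, beta, rng):
    """Draw (w, y_mean, y) from w ~ N(0, alpha^2 I), y ~ N(w^T x, beta^2)."""
    # Each weight is an independent Normal with standard deviation alpha.
    w = [rng.gauss(0.0, alpha) for _ in x]
    # The predicted mean is the inner product of w and x.
    y_mean = sum(wi * xi for wi, xi in zip(w, x))
    # The observation adds Normal noise with standard deviation beta.
    y = rng.gauss(y_mean, beta)
    return w, y_mean, y


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
features = [0.1, 0.2, 0.3, 0.4, 0.5]  # illustrative 5-dimensional input
w_draw, y_mean_draw, y_draw = sample_regression(features, 1.0, 0.1, rng)
print(len(w_draw), y_mean_draw, y_draw)
```

The order of the draws matters: w has to be sampled first because the mean of y depends on it.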

To define the model, the first step is to define a subclass of BayesianNet:

class Net(BayesianNet):
    def __init__(self):
        # Initialize the base class, then any model attributes...
        super().__init__()

    def execute(self, observed):
        # Forward propagation...
        return self

A Bayesian network describes the dependency structure of the joint distribution over a set of random variables as directed graphs. To support this, a BayesianNet instance can keep two kinds of nodes:

  • Stochastic nodes. They are random variables in graphical models. The w node can be constructed as:

    w = self.stochastic_node('Normal', name="w", mean=jt.zeros([x.shape[-1]]), std=alpha)
    

    Here w is a StochasticTensor that follows the Normal distribution. It is registered in the nodes property of the class.

    >>> print(self.nodes['w'])
    <zhusuan.framework.stochastic_tensor.StochasticTensor object at ...
    

    For any distribution available in zhusuan.distributions, we can pass the name of the distribution to the stochastic_node method of BayesianNet to create the corresponding stochastic node. The returned variable is a sample from the stochastic node, which means that you can mix it with any Jittor operations. For example, the predicted mean of the linear regression is an inner product between w and the input x:

    y_mean = jt.sum(w * x, dim=-1)
    
  • Deterministic nodes. As the above code shows, deterministic nodes can be constructed directly with Jittor operations, in which case BayesianNet does not keep track of them. However, in some cases it is convenient to track them through the cache property:

    self.cache['y_mean'] = y_mean
    

    This allows you to fetch the y_mean Var whenever you want it.

The full code for building the Bayesian linear regression model is:

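# Import path assumed from the zhusuan.framework package layout.
from zhusuan.framework.bn import BayesianNet

class bayesian_linear_regression(BayesianNet):
    def __init__(self, alpha, beta):
        # Initialize the base class before storing the hyper-parameters
        super().__init__()
        self.alpha = alpha
        self.beta = beta

    def execute(self, observed):
        # Record observations and training data
        self.observe(observed)
        # Read the input features passed in through the observed dictionary
        x = self.observed['x']
        w = self.stochastic_node('Normal', name="w", mean=jt.zeros([x.shape[-1]]), std=self.alpha)
        y_mean = jt.sum(w * x, dim=-1)
        y = self.stochastic_node('Normal', name="y", mean=y_mean, std=self.beta)
        return self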

Then we can construct an instance of the model:

model = bayesian_linear_regression(alpha, beta)

In ZhuSuan-Jittor, we use a dictionary variable observed and the method observe() to assign observations to certain stochastic nodes or to pass training data to the model. For example:

model({'w': w_obs, 'x': x})

will cause the random variable \(w\) to be observed as w_obs. The result is that y_mean is computed from the observed value of w (w_obs) and the training data x passed by the dictionary variable.

For stochastic nodes that are not given observations, their samples will be used when the corresponding StochasticTensor is involved in computation with Vars or fed into Jittor operations. In this example it means that if we don’t pass any observation of \(w\) to the model, the samples of w will be used to compute y_mean.
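
The observe-or-sample rule can be illustrated with a toy stand-in. The class below is a hypothetical pure-Python sketch and not the BayesianNet implementation; TinyNet and its methods only mimic the behaviour described above for scalar Normal nodes.

```python
import random


class TinyNet:
    """Toy illustration: use the observed value if given, else draw a sample."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.observed = {}
        self.nodes = {}

    def observe(self, observed):
        # Store a copy of the observation dictionary for later lookups.
        self.observed = dict(observed)

    def normal_node(self, name, mean, std):
        if name in self.observed:
            value = self.observed[name]        # clamp to the observation
        else:
            value = self.rng.gauss(mean, std)  # fall back to a fresh sample
        self.nodes[name] = value
        return value


net = TinyNet()
net.observe({"w": 0.5})
w_clamped = net.normal_node("w", mean=0.0, std=1.0)

net.observe({})
w_sampled = net.normal_node("w", mean=0.0, std=1.0)
print(w_clamped, w_sampled)
```

In the first call the node takes the observed value unchanged; in the second call nothing is observed, so a sample is drawn and used in its place.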

After construction, BayesianNet supports queries about the current state of the network, such as:

# get named node(s) from the model instance after a forward pass
w = model.nodes['w'].tensor
y = model.nodes['y'].tensor

# get log joint probability given the current values of all stochastic nodes
log_joint_value = model.log_joint()
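
For this model the log joint is the sum of the log densities of the two stochastic nodes, log p(w) + log p(y | w, x). The following standard-library sketch computes it by hand under that assumption; it is not the ZhuSuan implementation of log_joint(), and the function names and input values are hypothetical.

```python
import math


def normal_log_pdf(value, mean, std):
    """Log density of a univariate Normal evaluated at value."""
    z = (value - mean) / std
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * z * z


def manual_log_joint(w, x, y, alpha, beta):
    """log p(w) + log p(y | w, x) for Bayesian linear regression."""
    # Prior: independent Normal(0, alpha) on every weight.
    log_prior = sum(normal_log_pdf(wi, 0.0, alpha) for wi in w)
    # Likelihood: Normal centred on the inner product of w and x.
    y_mean = sum(wi * xi for wi, xi in zip(w, x))
    log_likelihood = normal_log_pdf(y, y_mean, beta)
    return log_prior + log_likelihood


weights = [0.0] * 5
features = [0.1, 0.2, 0.3, 0.4, 0.5]
joint = manual_log_joint(weights, features, y=0.0, alpha=1.0, beta=0.1)
print(joint)
```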