MeanEmbeddingTest
- class hyppo.ksample.MeanEmbeddingTest(num_randfreq=5)
- Mean Embedding test statistic and p-value.
- The Mean Embedding test is a two-sample test that uses differences in (analytic) mean embeddings of two data distributions in a reproducing kernel Hilbert space [1].
- Parameters
- num_randfreq (int) -- Used to construct a random array of size (p, q), where p is the number of dimensions of the data and q is the number of random frequencies at which the test is performed. These are the random test points at which the test is evaluated (see Notes).
 - Notes
 - The test statistic, like the Smooth CF statistic, takes the following form:
\[S_n = n W_n^T \Sigma_n^{-1} W_n\]
 - This has the same form as the Hotelling \(T^2\) statistic found in hyppo.ksample.Hotelling, but the components are defined differently. Given data sets X and Y, define \(Z_i\), the vector of differences, as:
\[Z_i = (k(X_i, T_1) - k(Y_i, T_1), \ldots, k(X_i, T_J) - k(Y_i, T_J)) \in \mathbb{R}^J\]
 - This is the vector of differences between kernel evaluations at the test points \(T_j\). The kernel maps into the reproducing kernel Hilbert space. This is the formulation used in the Mean Embedding test. Moving forward, \(W_n\) can be defined as the mean of these vectors:
\[W_n = \frac{1}{n} \sum_{i = 1}^n Z_i\]
 - This leaves \(\Sigma_n\), the covariance matrix:
\[\Sigma_n = \frac{1}{n}ZZ^T\]
 - Once \(S_n\) is calculated, a threshold \(r_{\alpha}\) corresponding to the \(1 - \alpha\) quantile of a chi-squared distribution with J degrees of freedom is chosen. The null hypothesis is rejected if \(S_n\) is larger than this threshold.
 - References
 - [1] Kacper P. Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. Advances in Neural Information Processing Systems, 2015.
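The formulas in the Notes can be sketched directly in NumPy. This is a minimal illustration, not hyppo's implementation: the Gaussian kernel, its `bandwidth` parameter, and the helper names `gaussian_kernel` and `mean_embedding_statistic` are assumptions made for the example, and the two samples are assumed to have the same number of rows so that the differences \(Z_i\) are defined.

```python
import numpy as np


def gaussian_kernel(x, test_points, bandwidth=1.0):
    """Illustrative kernel (an assumption): exp(-||x - t||^2 / (2 * bandwidth^2))."""
    # Pairwise squared distances between the n samples and the J test points.
    sq_dists = ((x[:, None, :] - test_points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mean_embedding_statistic(x, y, test_points):
    """Return n * W_n^T Sigma_n^{-1} W_n for two samples with equal row counts."""
    n = x.shape[0]
    # Row i holds Z_i: the kernel differences at the J test points.
    z = gaussian_kernel(x, test_points) - gaussian_kernel(y, test_points)
    w_n = z.mean(axis=0)
    # (1/n) * sum_i Z_i Z_i^T, the J x J matrix written as (1/n) Z Z^T in the Notes.
    sigma_n = z.T @ z / n
    # Solve the linear system instead of forming the explicit inverse.
    return float(n * w_n @ np.linalg.solve(sigma_n, w_n))


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
y = rng.standard_normal((200, 3))
test_points = rng.standard_normal((5, 3))  # J = 5 random test points
stat = mean_embedding_statistic(x, y, test_points)
```

Using `np.linalg.solve` avoids computing \(\Sigma_n^{-1}\) explicitly, which is usually the more stable choice when only the quadratic form is needed.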
 
Methods Summary
| MeanEmbeddingTest.statistic(x, y, random_state) | Calculates the mean embedding test statistic. |
| MeanEmbeddingTest.test(x, y, random_state=None) | Calculates the mean embedding test statistic and p-value. |
- MeanEmbeddingTest.statistic(x, y, random_state)
- Calculates the mean embedding test statistic.
- Parameters
- Returns
- stat (float) -- The computed mean embedding statistic.
 
- MeanEmbeddingTest.test(x, y, random_state=None)
- Calculates the mean embedding test statistic and p-value.
- Parameters
- Returns
 - Examples
>>> import numpy as np
>>> from hyppo.ksample import MeanEmbeddingTest
>>> np.random.seed(1234)
>>> x = np.random.randn(500, 10)
>>> y = np.random.randn(500, 10)
>>> stat, pvalue = MeanEmbeddingTest().test(x, y, random_state=1234)
>>> '%.2f, %.3f' % (stat, pvalue)
'5.33, 0.377'
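The decision rule from the Notes can be applied to the statistic printed in the example. The sketch below uses SciPy's chi-squared distribution and assumes that the degrees of freedom J equal the default `num_randfreq=5`; the significance level `alpha = 0.05` is likewise an illustrative choice, and the statistic is the rounded value from the example rather than a fresh call to hyppo.

```python
from scipy.stats import chi2

stat = 5.33        # rounded statistic printed in the example
num_randfreq = 5   # default value; assumed here to equal J, the degrees of freedom
alpha = 0.05       # illustrative significance level

# r_alpha: the (1 - alpha) quantile of a chi-squared distribution with J degrees of freedom.
threshold = chi2.ppf(1 - alpha, df=num_randfreq)
# Upper-tail probability of the statistic under the same distribution.
pvalue = chi2.sf(stat, df=num_randfreq)
# The null hypothesis is rejected only if the statistic exceeds the threshold.
reject_null = bool(stat > threshold)
```

Under these assumptions the threshold is about 11.07, so a statistic of 5.33 does not lead to rejection, which is what one would expect for two samples drawn from the same distribution.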
