DcorrX
- class hyppo.time_series.DcorrX(compute_distance='euclidean', max_lag=0, **kwargs)
- Cross Distance Correlation (DcorrX) test statistic and p-value.
- DcorrX is an independence test between two (paired) time series of not necessarily equal dimensions. The population parameter is 0 if and only if the time series are independent. It is based upon energy distance between distributions.
- Parameters
- compute_distance (str, callable, or None, default: "euclidean") -- A function that computes the distance among the samples within each data matrix. Valid strings for compute_distance are, as defined in sklearn.metrics.pairwise_distances:
 - From scikit-learn: ["euclidean", "cityblock", "cosine", "l1", "l2", "manhattan"]. See the documentation for scipy.spatial.distance for details on these metrics.
 - From scipy.spatial.distance: ["braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"]. See the documentation for scipy.spatial.distance for details on these metrics.
 Set to None or "precomputed" if x and y are already distance matrices. To call a custom function, either create the distance matrix beforehand or create a function of the form metric(x, **kwargs), where x is the data matrix for which pairwise distances are calculated and **kwargs are extra arguments to send to your custom function.
- max_lag (int, default: 0) -- The maximum number of lags in the past to check dependence between x and the shifted y. Also the M hyperparameter below.
- **kwargs -- Arbitrary keyword arguments for compute_distance.
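A custom distance function of the form metric(x, **kwargs) might look like the sketch below. It uses only NumPy; the name minkowski_metric and its p keyword are hypothetical choices made for illustration and are not part of hyppo.

```python
import numpy as np

def minkowski_metric(x, p=3, **kwargs):
    """Hypothetical custom metric: pairwise Minkowski distances between rows of x.

    Returns an (n, n) distance matrix for an (n, d) data matrix.
    Any other keyword arguments are accepted and ignored.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as shape (n, 1)
    # Broadcast to all row pairs: diff has shape (n, n, d)
    diff = np.abs(x[:, None, :] - x[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

x = np.arange(6, dtype=float).reshape(3, 2)
distx = minkowski_metric(x, p=3)
```

A function like this would be supplied as compute_distance, with extra keywords such as p forwarded through **kwargs. Alternatively, the matrix it returns can be computed beforehand and passed with compute_distance set to None or "precomputed".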
 
 - Notes
 The statistic can be derived as follows [1]:
 Let \(x\) and \(y\) be \((n, p)\) and \((n, q)\) series respectively, which each contain \(n\) observations of the series \((X_t)\) and \((Y_t)\). Similarly, let \(x[j:n]\) be the \((n-j, p)\) last \(n-j\) observations of \(x\). Let \(y[0:(n-j)]\) be the \((n-j, q)\) first \(n-j\) observations of \(y\). Let \(M\) be the maximum lag hyperparameter. The cross distance correlation is
 \[\mathrm{DcorrX}_n (x, y) = \sum_{j=0}^M \frac{n-j}{n} \mathrm{Dcorr}_n (x[j:n], y[0:(n-j)])\]
 The p-value returned is calculated using a permutation test.
 - References
 [1] Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, and Joshua T Vogelstein. Independence testing for temporal data. Transactions on Machine Learning Research, 2024.
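The lag-weighted sum in the Notes can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not hyppo's implementation: the helper names are hypothetical, and the inner Dcorr term uses the simple biased (double-centered) sample estimate with Euclidean distances, so values may differ from what DcorrX returns.

```python
import numpy as np

def pairwise_euclidean(a):
    """(n, n) Euclidean distance matrix for an (n,) or (n, d) array."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcorr_biased(x, y):
    """Biased sample distance correlation (ratio of squared-scale quantities)."""
    a = double_center(pairwise_euclidean(x))
    b = double_center(pairwise_euclidean(y))
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else (a * b).mean() / denom

def dcorrx_sketch(x, y, max_lag=0):
    """Sum over lags j = 0..M of (n - j) / n * Dcorr(x[j:n], y[0:n-j])."""
    n = len(x)
    total = 0.0
    for j in range(max_lag + 1):
        total += (n - j) / n * dcorr_biased(x[j:n], y[0:n - j])
    return total

x = np.arange(7)
stat0 = dcorrx_sketch(x, x, max_lag=0)  # single lag-0 term
stat1 = dcorrx_sketch(x, x, max_lag=1)  # adds the lag-1 term weighted by 6/7
```

Each term pairs the last n-j observations of x with the first n-j observations of y, so larger values of max_lag accumulate dependence between x and progressively earlier values of y.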
 
Methods Summary
| DcorrX.statistic(x, y) | Helper function that calculates the DcorrX test statistic. |
| DcorrX.test(x, y, reps=1000, workers=1, random_state=None) | Calculates the DcorrX test statistic and p-value. |
- DcorrX.statistic(x, y)
- Helper function that calculates the DcorrX test statistic.
- Parameters
- x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
- Returns
 
- DcorrX.test(x, y, reps=1000, workers=1, random_state=None)
- Calculates the DcorrX test statistic and p-value.
- Parameters
- x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
- reps (int, default: 1000) -- The number of replications used to estimate the null distribution for the permutation test that computes the p-value.
- workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
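To illustrate how reps feeds into a permutation p-value, here is a minimal sketch in plain NumPy. Several things are assumptions rather than hyppo's behavior: contiguous blocks of y are shuffled to roughly preserve short-range temporal structure, the default block size is the square root of n, and a placeholder statistic (absolute Pearson correlation) stands in for DcorrX. All function names are hypothetical.

```python
import numpy as np

def abs_corr(x, y):
    """Placeholder statistic standing in for DcorrX (assumption for brevity)."""
    return abs(np.corrcoef(x, y)[0, 1])

def block_permutation_pvalue(x, y, statistic, reps=1000, block_size=None, seed=None):
    """Estimate a p-value by shuffling contiguous blocks of y `reps` times."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    if block_size is None:
        block_size = int(np.ceil(np.sqrt(n)))  # assumed default, not from hyppo
    observed = statistic(x, y)
    # Index blocks: [0..b-1], [b..2b-1], ... (the last block may be shorter)
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    exceed = 0
    for _ in range(reps):
        order = rng.permutation(len(blocks))
        idx = np.concatenate([blocks[k] for k in order])
        if statistic(x, y[idx]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive
    return observed, (1 + exceed) / (1 + reps)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = x + 0.1 * rng.normal(size=100)
stat, pvalue = block_permutation_pvalue(x, y, abs_corr, reps=200, seed=1)
```

With the add-one correction, the smallest attainable p-value is 1 / (reps + 1), so a larger reps gives a finer-grained estimate of the null distribution at higher computational cost.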
 
- Returns
 - Examples
 >>> import numpy as np
 >>> from hyppo.time_series import DcorrX
 >>> np.random.seed(456)
 >>> x = np.arange(7)
 >>> y = x
 >>> stat, pvalue, dcorrx_dict = DcorrX().test(x, y, reps=100)
 >>> '%.1f, %.2f, %d' % (stat, pvalue, dcorrx_dict['opt_lag'])
 '1.0, 0.04, 0'
