GraphicalLasso#

class sklearn.covariance.GraphicalLasso(alpha=0.01, *, mode='cd', covariance=None, tol=0.0001, enet_tol=0.0001, max_iter=100, verbose=False, eps=np.float64(2.220446049250313e-16), assume_centered=False)[source]#

Sparse inverse covariance estimation with an l1-penalized estimator.

For a usage example see Visualizing the stock market structure.

Read more in the User Guide.

Changed in version 0.20: GraphLasso has been renamed to GraphicalLasso.

Parameters:
alpha : float, default=0.01

The regularization parameter: the higher alpha, the more regularization, the sparser the inverse covariance. Range is (0, inf].

mode : {'cd', 'lars'}, default='cd'

The Lasso solver to use: coordinate descent or LARS. Use LARS for very sparse underlying graphs, where p > n. Elsewhere prefer cd which is more numerically stable.

covariance : "precomputed", default=None

If covariance is β€œprecomputed”, the input data in fit is assumed to be the covariance matrix. If None, the empirical covariance is estimated from the data X.

Added in version 1.3.

tol : float, default=1e-4

The tolerance to declare convergence: if the dual gap goes below this value, iterations are stopped. Range is (0, inf].

enet_tol : float, default=1e-4

The tolerance for the elastic net solver used to calculate the descent direction. This parameter controls the accuracy of the search direction for a given column update, not of the overall parameter estimate. Only used for mode='cd'. Range is (0, inf].

max_iter : int, default=100

The maximum number of iterations.

verbose : bool, default=False

If verbose is True, the objective function and dual gap are printed at each iteration.

eps : float, default=np.finfo(np.float64).eps

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Default is np.finfo(np.float64).eps.

Added in version 1.3.

assume_centered : bool, default=False

If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False, data are centered before computation.
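
As a quick illustration of how these options fit together, here is a minimal sketch; the specific alpha, mode, and max_iter values below are arbitrary choices for illustration:

>>> from sklearn.covariance import GraphicalLasso
>>> # Larger alpha means stronger regularization and a sparser precision
>>> # matrix; 'lars' can be preferable for very sparse graphs where p > n.
>>> model = GraphicalLasso(alpha=0.05, mode='lars', max_iter=200)
>>> # With covariance="precomputed", fit expects a covariance matrix
>>> # instead of raw data samples.
>>> model_pre = GraphicalLasso(covariance='precomputed')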

Attributes:
location_ : ndarray of shape (n_features,)

Estimated location, i.e. the estimated mean.

covariance_ : ndarray of shape (n_features, n_features)

Estimated covariance matrix.

precision_ : ndarray of shape (n_features, n_features)

Estimated pseudo inverse matrix.

n_iter_ : int

Number of iterations run.

costs_ : list of (objective, dual_gap) pairs

The list of values of the objective function and the dual gap at each iteration. Returned only if return_costs is True.

Added in version 1.3.

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

See also

graphical_lasso

L1-penalized covariance estimator.

GraphicalLassoCV

Sparse inverse covariance with cross-validated choice of the l1 penalty.

Examples

>>> import numpy as np
>>> from sklearn.covariance import GraphicalLasso
>>> true_cov = np.array([[0.8, 0.0, 0.2, 0.0],
...                      [0.0, 0.4, 0.0, 0.0],
...                      [0.2, 0.0, 0.3, 0.1],
...                      [0.0, 0.0, 0.1, 0.7]])
>>> np.random.seed(0)
>>> X = np.random.multivariate_normal(mean=[0, 0, 0, 0],
...                                   cov=true_cov,
...                                   size=200)
>>> cov = GraphicalLasso().fit(X)
>>> np.around(cov.covariance_, decimals=3)
array([[0.816, 0.049, 0.218, 0.019],
       [0.049, 0.364, 0.017, 0.034],
       [0.218, 0.017, 0.322, 0.093],
       [0.019, 0.034, 0.093, 0.69 ]])
>>> np.around(cov.location_, decimals=3)
array([0.073, 0.04 , 0.038, 0.143])
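
Building on the example above, the covariance="precomputed" option (added in version 1.3) lets fit consume a covariance matrix rather than raw samples; a minimal sketch:

>>> from sklearn.covariance import empirical_covariance
>>> emp_cov = empirical_covariance(X)
>>> # fit now receives the covariance matrix itself, not the data X
>>> cov_pre = GraphicalLasso(covariance='precomputed').fit(emp_cov)
>>> cov_pre.covariance_.shape
(4, 4)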
error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)[source]#

Compute the Mean Squared Error between two covariance estimators.

Parameters:
comp_cov : array-like of shape (n_features, n_features)

The covariance to compare with.

norm : {"frobenius", "spectral"}, default="frobenius"

The type of norm used to compute the error. Available error types:

- 'frobenius' (default): sqrt(tr(A^t.A))
- 'spectral': sqrt(max(eigenvalues(A^t.A)))

where A is the error (comp_cov - self.covariance_).

scaling : bool, default=True

If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.

squared : bool, default=True

Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:
result : float

The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
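
Continuing the class example above, error_norm can quantify how far the fitted covariance is from the generating true_cov (a sketch; the exact values depend on the random sample, so no output is shown):

>>> mse = cov.error_norm(true_cov)          # scaled, squared Frobenius error
>>> spec = cov.error_norm(true_cov, norm='spectral',
...                       squared=False)    # spectral norm of the error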

fit(X, y=None)[source]#

Fit the GraphicalLasso model to X.

Parameters:
X : array-like of shape (n_samples, n_features)

Data from which to compute the covariance estimate.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : object

Returns the instance itself.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.
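
A quick illustration on the estimator fitted in the class example (deep=True only matters for estimators with nested sub-estimators, which GraphicalLasso does not have):

>>> cov.get_params()['alpha']
0.01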

get_precision()[source]#

Getter for the precision matrix.

Returns:
precision_ : array-like of shape (n_features, n_features)

The precision matrix associated with the current covariance object.
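
As a quick sanity check on the class example (a sketch), the getter returns the same matrix that is stored in precision_:

>>> prec = cov.get_precision()
>>> np.allclose(prec, cov.precision_)
True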

mahalanobis(X)[source]#

Compute the squared Mahalanobis distances of given observations.

For a detailed example of how outliers affect the Mahalanobis distance, see Robust covariance estimation and Mahalanobis distances relevance.

Parameters:
X : array-like of shape (n_samples, n_features)

The observations, the Mahalanobis distances of which we compute. Observations are assumed to be drawn from the same distribution as the data used in fit.

Returns:
dist : ndarray of shape (n_samples,)

Squared Mahalanobis distances of the observations.
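
Reusing X and the fitted cov from the class example, the squared distances can serve as a rough outlier score under the fitted Gaussian (a sketch; the cutoff of five is arbitrary):

>>> d2 = cov.mahalanobis(X)
>>> d2.shape
(200,)
>>> # indices of the 5 observations farthest from the estimated location
>>> suspects = np.argsort(d2)[-5:]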

score(X_test, y=None)[source]#

Compute the log-likelihood of X_test under the estimated Gaussian model.

The Gaussian model is defined by its mean and covariance matrix which are represented respectively by self.location_ and self.covariance_.

Parameters:
X_test : array-like of shape (n_samples, n_features)

Test data of which we compute the likelihood, where n_samples is the number of samples and n_features is the number of features. X_test is assumed to be drawn from the same distribution as the data used in fit (including centering).

y : Ignored

Not used, present for API consistency by convention.

Returns:
res : float

The log-likelihood of X_test with self.location_ and self.covariance_ as estimators of the Gaussian model mean and covariance matrix respectively.
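
A sketch reusing X from the class example (the alpha values are arbitrary): heavier regularization typically lowers the in-sample log-likelihood, which is one way to see the bias the penalty introduces.

>>> ll_default = GraphicalLasso().fit(X).score(X)
>>> ll_strong = GraphicalLasso(alpha=0.5).fit(X).score(X)
>>> # usually ll_strong <= ll_default on the training data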

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
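
Because set_params returns the estimator itself, calls can be chained with fit; a sketch reusing the class example (the alpha value is arbitrary):

>>> refit = cov.set_params(alpha=0.1).fit(X)
>>> refit.get_params()['alpha']
0.1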