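Let's find the most negative and the most positive correlation values between the features of the California housing dataset, ignoring each feature's correlation with itself (which is always 1.0).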

from pandas import DataFrame
from sklearn.datasets import fetch_california_housing

housing_data = fetch_california_housing()
housing = DataFrame(housing_data.data, columns=housing_data.feature_names)

corr = housing.corr()

corr
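A correlation matrix is symmetric and has 1.0 on its diagonal. That is why the self-correlations have to be ignored, and why every pair of features appears twice. The sketch below checks both properties on a tiny synthetic DataFrame. The columns a, b and c are hypothetical stand-ins, not part of the housing data.

```python
import numpy as np
from pandas import DataFrame

# Hypothetical, hand-made data so the correlations are easy to verify by hand
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

# A correlation matrix equals its own transpose...
is_symmetric = np.allclose(corr.to_numpy(), corr.T.to_numpy())
# ...and every column correlates perfectly with itself
diagonal = np.diag(corr.to_numpy())

print(is_symmetric)                  # True
print(np.allclose(diagonal, 1.0))    # True
```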

Most negative correlation

Find the most negative correlation for each column:

corr.min()

MedInc       -0.119034
HouseAge     -0.296244
AveRooms     -0.153277
AveBedrms    -0.077747
Population   -0.296244
AveOccup     -0.006181
Latitude     -0.924664
Longitude    -0.924664
dtype: float64
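By default corr.min() reduces over the rows (axis=0), giving one minimum per column. The matrix is symmetric, so reducing along the other axis gives the same numbers. Here is a sketch on the same hypothetical a/b/c stand-in data:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical stand-in data (not the housing dataset) with known correlations
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

column_mins = corr.min()       # smallest value in each column (axis=0, the default)
row_mins = corr.min(axis=1)    # smallest value in each row

# For a symmetric matrix the two reductions agree
print(np.allclose(column_mins.to_numpy(), row_mins.to_numpy()))  # True
print(column_mins)
```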

Find the column that contains the most negative value overall:

corr.min().idxmin()

'Latitude'
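The minimum -0.924664 appears twice in corr.min(), once for Latitude and once for Longitude, because the matrix is symmetric. idxmin() breaks the tie by returning the first label that holds the minimum. The sketch below shows that behaviour using the rounded values copied from the output above.

```python
from pandas import Series

# Column minima copied (rounded) from the corr.min() output above
mins = Series({
    "MedInc": -0.119034, "HouseAge": -0.296244, "AveRooms": -0.153277,
    "AveBedrms": -0.077747, "Population": -0.296244, "AveOccup": -0.006181,
    "Latitude": -0.924664, "Longitude": -0.924664,
})

print(mins.idxmin())              # Latitude  (first label holding the minimum)
print(mins.iloc[::-1].idxmin())   # Longitude (same tie, reversed order)
```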

Extract the Latitude column and get the index of the most negative value in it:

corr[corr.min().idxmin()].idxmin()

'Longitude'

The most negative correlation is therefore between:

corr.min().idxmin(), corr[corr.min().idxmin()].idxmin()

('Latitude', 'Longitude')
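You can also collapse the two chained idxmin() calls into one. stack() turns the matrix into a Series indexed by (row, column) pairs, so a single idxmin() returns both labels at once. This is an alternative to the approach used above, sketched here on the hypothetical a/b/c data.

```python
from pandas import DataFrame

# Hypothetical stand-in data (not the housing dataset)
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

# stack() produces a Series with a (row, column) MultiIndex, in row-major order
pairs = corr.stack()

print(pairs.idxmin())   # ('a', 'c')
print(pairs.min())      # approximately -0.8
```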

with the value:

corr.min().min()

-0.9246644339150366
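If positions are more convenient than labels, NumPy can run the same search on the raw array. np.argmin() gives an index into the flattened matrix, and np.unravel_index() converts it back into a (row, column) pair. A sketch, again on the hypothetical stand-in data:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical stand-in data (not the housing dataset)
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()
values = corr.to_numpy()

flat = np.argmin(values)                          # index into the row-major flattened array
row, col = np.unravel_index(flat, values.shape)   # back to 2-D positions

print(corr.index[row], corr.columns[col], values[row, col])
```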

Most positive correlation

First we need to remove the 1.0 values on the diagonal (each feature's correlation with itself), since otherwise they would always be the maximum:

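import numpy as np

# Replace the self-correlations (the diagonal) with NaN. A boolean identity mask
# avoids writing into corr.values, which pandas with copy-on-write enabled may
# expose as a read-only array.
corr = corr.mask(np.eye(len(corr), dtype=bool))
corr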
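Blanking the diagonal still leaves every pair twice, once above and once below the diagonal. One alternative (not what this notebook does) is to keep only the strict upper triangle with np.triu(..., k=1), so that each pair appears exactly once. Here is a sketch on hypothetical a/b/c data:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical stand-in data (not the housing dataset)
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

# True strictly above the diagonal (k=1 excludes the diagonal itself)
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(upper_mask)   # everything else becomes NaN

print(upper.notna().sum().sum())  # 3 pairs for 3 columns
print(upper.max().max())          # approximately 0.8, the most positive pair
```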

The most positive correlation is then between:

corr.max().idxmax(), corr[corr.max().idxmax()].idxmax()

('AveRooms', 'AveBedrms')
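The NaN diagonal does not get in the way of max() or idxmax(), because pandas skips missing values by default (skipna=True). A minimal sketch with made-up values:

```python
import numpy as np
from pandas import Series

# Made-up column with a NaN standing in for the blanked self-correlation
col = Series([np.nan, 0.3, 0.8, -0.1], index=["a", "b", "c", "d"])

print(col.max())               # 0.8 (NaN is skipped by default)
print(col.idxmax())            # c
print(col.max(skipna=False))   # nan
```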

with the value:

corr.max().max()

0.8476213257130424
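Both extremes can also be confirmed in one pass. Keep each pair once, flatten the result, and sort: the first entry is the most negative pair and the last is the most positive. This is a sketch of that alternative on hypothetical a/b/c data. dropna() is included because newer pandas versions keep NaN when stacking.

```python
import numpy as np
from pandas import DataFrame

# Hypothetical stand-in data (not the housing dataset)
df = DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

# Keep each pair once (strict upper triangle), then flatten to (row, column) -> value
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
ranked = upper.stack().dropna().sort_values()

print(ranked)
# First entry: most negative pair; last entry: most positive pair
print(ranked.index[0], ranked.index[-1])
```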

Output of corr (the full correlation matrix):

	MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude
MedInc	1.000000	-0.119034	0.326895	-0.062040	0.004834	0.018766	-0.079809	-0.015176
HouseAge	-0.119034	1.000000	-0.153277	-0.077747	-0.296244	0.013191	0.011173	-0.108197
AveRooms	0.326895	-0.153277	1.000000	0.847621	-0.072213	-0.004852	0.106389	-0.027540
AveBedrms	-0.062040	-0.077747	0.847621	1.000000	-0.066197	-0.006181	0.069721	0.013344
Population	0.004834	-0.296244	-0.072213	-0.066197	1.000000	0.069863	-0.108785	0.099773
AveOccup	0.018766	0.013191	-0.004852	-0.006181	0.069863	1.000000	0.002366	0.002476
Latitude	-0.079809	0.011173	0.106389	0.069721	-0.108785	0.002366	1.000000	-0.924664
Longitude	-0.015176	-0.108197	-0.027540	0.013344	0.099773	0.002476	-0.924664	1.000000

Output of corr after the diagonal has been set to NaN:

	MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude
MedInc	NaN	-0.119034	0.326895	-0.062040	0.004834	0.018766	-0.079809	-0.015176
HouseAge	-0.119034	NaN	-0.153277	-0.077747	-0.296244	0.013191	0.011173	-0.108197
AveRooms	0.326895	-0.153277	NaN	0.847621	-0.072213	-0.004852	0.106389	-0.027540
AveBedrms	-0.062040	-0.077747	0.847621	NaN	-0.066197	-0.006181	0.069721	0.013344
Population	0.004834	-0.296244	-0.072213	-0.066197	NaN	0.069863	-0.108785	0.099773
AveOccup	0.018766	0.013191	-0.004852	-0.006181	0.069863	NaN	0.002366	0.002476
Latitude	-0.079809	0.011173	0.106389	0.069721	-0.108785	0.002366	NaN	-0.924664
Longitude	-0.015176	-0.108197	-0.027540	0.013344	0.099773	0.002476	-0.924664	NaN

Applied Data Analysis in Python
