Applied Data Analysis in Python
Let's find the most negative and the most positive correlation values, ignoring each column's correlation with itself.
In [1]:
from pandas import DataFrame
from sklearn.datasets import fetch_california_housing
housing_data = fetch_california_housing()
housing = DataFrame(housing_data.data, columns=housing_data.feature_names)
corr = housing.corr()
corr
Out[1]:
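For reference, `DataFrame.corr()` computes the Pearson coefficient by default, and each diagonal entry is a column's correlation with itself. The sketch below checks one entry against NumPy on made-up data; the `toy` frame and its column names `x` and `y` are illustrative assumptions, not part of the housing set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate what corr() returns
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 1.0, 4.0, 3.0, 5.0],
})

# Pairwise Pearson correlation between all columns
toy_corr = toy.corr()

# The same coefficient computed directly with NumPy for comparison
manual = np.corrcoef(toy["x"], toy["y"])[0, 1]

print(toy_corr)
print(manual)
```

The off-diagonal entry of `toy_corr` should agree with `manual`, and the matrix is symmetric, so every pair appears twice.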
Most negative correlation
Find the most negative correlation for each column:
In [2]:
corr.min()
Out[2]:
Find the column that contains the most negative correlation overall:
In [3]:
corr.min().idxmin()
Out[3]:
Extract that column (Latitude here) and get the index label of its most negative value:
In [4]:
corr[corr.min().idxmin()].idxmin()
Out[4]:
The most negative correlation is therefore between:
In [5]:
corr.min().idxmin(), corr[corr.min().idxmin()].idxmin()
Out[5]:
with the value:
In [6]:
corr.min().min()
Out[6]:
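The same chain can be written with named intermediates so that `corr.min()` is evaluated only once. This is a minimal sketch on made-up data; the `toy` frame and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: "b" is the exact reverse of "a"
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
})
toy_corr = toy.corr()

col_min = toy_corr.min()             # most negative value in each column
first = col_min.idxmin()             # column holding the overall minimum
second = toy_corr[first].idxmin()    # row label where that column reaches it
value = col_min.min()                # the most negative correlation itself

print(first, second, value)
```

Because the matrix is symmetric, the pair can come out in either order depending on which column `idxmin` encounters first.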
Most positive correlation
First we need to remove the 1.0 values on the diagonal:
In [7]:
import numpy as np
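# Mask the diagonal with NaN without writing through corr.values,
# which newer pandas versions may expose as a read-only array
corr = corr.mask(np.eye(len(corr), dtype=bool))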
corr
Out[7]:
The most positive correlation is therefore between:
In [8]:
corr.max().idxmax(), corr[corr.max().idxmax()].idxmax()
Out[8]:
The value of the most positive correlation is:
In [9]:
corr.max().max()
Out[9]:
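Both searches can also be folded into one helper that leaves the correlation matrix unmodified. The sketch below keeps only the entries strictly above the diagonal, which drops the self-correlations and the duplicate of each symmetric pair, then flattens what remains. The function name `extreme_pairs` and the toy data are hypothetical, introduced here for illustration.

```python
import numpy as np
import pandas as pd

def extreme_pairs(corr):
    """Return ((pair, value) of the minimum, (pair, value) of the maximum),
    ignoring the diagonal of a square correlation matrix."""
    # True strictly above the diagonal, False on and below it
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # where() keeps entries under the mask and sets the rest to NaN;
    # unstack() flattens to a Series indexed by (column, row) label pairs
    pairs = corr.where(upper).unstack().dropna()
    return (pairs.idxmin(), pairs.min()), (pairs.idxmax(), pairs.max())

# Hypothetical toy data: "b" is the exact reverse of "a"
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
})

(lo_pair, lo_val), (hi_pair, hi_val) = extreme_pairs(toy.corr())
print(lo_pair, lo_val)
print(hi_pair, hi_val)
```

Applied to the housing matrix, this would be called as `extreme_pairs(housing.corr())`, so the diagonal never has to be overwritten in place.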