Visualizing data using Principal Component Analysis

Question

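I am trying to visualize my data using PCA. I have a dataset with 4 variables (date, groundwater level, rainfall, temperature). I want to see whether there is a relationship between rainfall and storms, and between temperature and storms.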

I have heard that I could try this with PCA or regression. I am new to this and a bit confused about how to go about it.

I followed a tutorial online, but ended up with an error:

> **TypeError: invalid type comparison**

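I am confused because the target is not a simple 0 or 1 but the groundwater level (gwl). I am trying to see whether there is any correlation between groundwater level and temperature.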

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_excel("C:/Users/Tamarie/Downloads/joined1.xlsx")
print(df.dtypes)
df.head()
df.info()

from sklearn.preprocessing import StandardScaler
features = ['rainfall', 'temperature', 'Evapotrans']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['gwl']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])

finalDf = pd.concat([principalDf, df[['gwl']]], axis = 1)
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['gwl']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['gwl'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()

Answer 1

targets = ['gwl']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['gwl'] == target

In this line, why are you comparing finalDf['gwl'], which is a column of floats, with a string object?
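To illustrate the point: since gwl is continuous, there are no class labels to filter on, so one option is to drop the loop and color each point by its gwl value instead. The sketch below assumes the same column names as in the question; the data is synthetic (made-up values standing in for the spreadsheet), and the `viridis` colormap and colorbar are my own choices, not something from the tutorial. Depending on the pandas version, the original comparison either raises the TypeError or simply matches no rows.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's spreadsheet (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rainfall": rng.gamma(2.0, 5.0, 200),
    "temperature": rng.normal(15.0, 6.0, 200),
    "Evapotrans": rng.normal(3.0, 1.0, 200),
    "gwl": rng.normal(10.0, 2.0, 200),
})

# Standardize the features and project them onto two principal components.
features = ["rainfall", "temperature", "Evapotrans"]
x = StandardScaler().fit_transform(df[features].values)
pcs = PCA(n_components=2).fit_transform(x)

finalDf = pd.DataFrame(
    pcs, columns=["principal component 1", "principal component 2"]
)
finalDf["gwl"] = df["gwl"].values

# Color every point by its continuous gwl value instead of filtering by label.
fig, ax = plt.subplots(figsize=(8, 8))
sc = ax.scatter(
    finalDf["principal component 1"],
    finalDf["principal component 2"],
    c=finalDf["gwl"],
    cmap="viridis",
    s=50,
)
fig.colorbar(sc, ax=ax, label="gwl")
ax.set_xlabel("Principal Component 1", fontsize=15)
ax.set_ylabel("Principal Component 2", fontsize=15)
ax.set_title("2 component PCA", fontsize=20)
ax.grid()
fig.savefig("pca_gwl.png")
```

If points with similar colors cluster together, the features carry some information about gwl; a uniform mix of colors suggests they do not.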

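That said, if you want to see whether there is any correlation between groundwater level and temperature, you can simply draw a scatter plot of the groundwater level column against the temperature column.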
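A minimal sketch of that scatter plot, again on synthetic data (the negative dependence of gwl on temperature is built into the made-up values purely for illustration, and says nothing about the real dataset). Adding the Pearson coefficient via `Series.corr` is my own addition to put a number on what the plot shows.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data: gwl falls slightly as temperature rises.
rng = np.random.default_rng(1)
temperature = rng.normal(15.0, 6.0, 200)
gwl = 12.0 - 0.1 * temperature + rng.normal(0.0, 0.5, 200)
df = pd.DataFrame({"temperature": temperature, "gwl": gwl})

# Series.corr computes the Pearson correlation coefficient by default.
r = df["temperature"].corr(df["gwl"])

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(df["temperature"], df["gwl"], s=20)
ax.set_xlabel("temperature")
ax.set_ylabel("gwl")
ax.set_title(f"gwl vs temperature (Pearson r = {r:.2f})")
ax.grid()
fig.savefig("gwl_vs_temperature.png")
```

With the real spreadsheet, replace the synthetic DataFrame with the one loaded by `pd.read_excel`; the same two columns can be passed to `ax.scatter` unchanged.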
