There is a nice sklearn example of linear regression that uses the diabetes dataset.
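I copied the notebook version and played around with it a bit in JupyterLab. Of course, it works just like the example. But I wanted to know what I was actually looking at.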
So I used a handy IPython/Jupyter feature and inspected the dataset's description:
diabetes.DESCR
Diabetes dataset
================
Notes
-----
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of
n = 442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
Data Set Characteristics:
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attributes:
:Age:
:Sex:
:Body mass index:
:Average blood pressure:
:S1:
:S2:
:S3:
:S4:
:S5:
:S6:
Note: Each of these 10 feature variables have been mean centered and scaled by the standard
deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004)
"Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
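To see what that Note means for the arrays you actually get back, a quick check along these lines should work. It is a sketch that assumes a scikit-learn version where `load_diabetes()` returns the scaled features by default; it checks the parenthetical claim (each column has mean 0 and a sum of squares of 1).

```python
import numpy as np
from sklearn.datasets import load_diabetes

# The diabetes data is bundled with scikit-learn, so no download is needed.
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

print(X.shape, y.shape)          # 442 samples, 10 features; one target per sample
print(diabetes.feature_names)    # short names for the 10 attributes listed in DESCR

# Check the Note: every feature column should be mean-centered ...
print(np.round(X.mean(axis=0), 6))
# ... and scaled so that its sum of squares is 1.
print(np.round((X ** 2).sum(axis=0), 6))
```

What you see are the already-transformed values, not ages in years or blood pressure in mmHg, which is why the raw file behind the source URL is needed to interpret them.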
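Following the source URL leads to the raw data, a tab-separated, un-normalized copy of the dataset. That page also explains what the "S" features mean in the problem domain.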
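With the raw file in hand, the scaling described in the Note can be reproduced and, more importantly, undone, as long as the per-column parameters are kept. Below is a minimal NumPy sketch. It follows the parenthetical "sum of squares totals 1", which corresponds to dividing the centered column by the standard deviation times the square root of `n_samples`. It assumes the population standard deviation (ddof=0). The helper names `diabetes_style_scale` and `diabetes_style_unscale` are hypothetical, and the numbers are made up rather than taken from the real file.

```python
import numpy as np


def diabetes_style_scale(raw):
    """Center each column and divide by std * sqrt(n_samples).

    Afterwards every column has mean 0 and a sum of squares of 1.
    Returns the scaled array plus the parameters needed to undo it.
    """
    raw = np.asarray(raw, dtype=float)
    n_samples = raw.shape[0]
    mean = raw.mean(axis=0)
    # Assumption: population standard deviation (ddof=0).
    scale = raw.std(axis=0) * np.sqrt(n_samples)
    return (raw - mean) / scale, mean, scale


def diabetes_style_unscale(scaled, mean, scale):
    """Invert diabetes_style_scale; impossible without mean and scale."""
    return np.asarray(scaled) * scale + mean


# Made-up stand-in for two raw columns (think AGE and BMI); not the real data.
raw = np.array([[50.0, 28.0], [40.0, 22.5], [65.0, 31.0], [30.0, 24.0]])

scaled, mean, scale = diabetes_style_scale(raw)
restored = diabetes_style_unscale(scaled, mean, scale)

print((scaled ** 2).sum(axis=0))     # each entry is 1 up to rounding
print(np.allclose(restored, raw))    # True: the round trip recovers the raw values
```

The inverse only works because `mean` and `scale` are returned alongside the scaled data. Throw them away and there is nothing left to invert with.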
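But my real question is: is there a way in sklearn to determine what the original, un-normalized values were, or is this just a demonstration of linear regression?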
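It is not possible to un-normalize the data without some information about it from before the normalization. Note, however, that the sklearn.preprocessing classes MinMaxScaler, StandardScaler, etc. do include an inverse_transform method (example), so it would have been easy if that information had also been provided with the example. As you say, it's just a regression demo.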
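A minimal sketch of the point about inverse_transform, using made-up data (`X_raw` and `round_trip_ok` are illustrative names, not from the sklearn example). A fitted StandardScaler or MinMaxScaler stores the parameters it learned, and inverse_transform uses them to map scaled values back to the originals.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up raw data: rows are samples, columns are features.
X_raw = np.array([[50.0, 28.0], [40.0, 22.5], [65.0, 31.0], [30.0, 24.0]])

round_trip_ok = {}
for scaler in (StandardScaler(), MinMaxScaler()):
    # fit_transform learns the parameters (mean_/scale_ for StandardScaler,
    # data_min_/data_max_ for MinMaxScaler) and stores them on the object.
    X_scaled = scaler.fit_transform(X_raw)
    # inverse_transform uses those stored parameters to map back.
    X_back = scaler.inverse_transform(X_scaled)
    round_trip_ok[type(scaler).__name__] = bool(np.allclose(X_back, X_raw))

print(round_trip_ok)  # {'StandardScaler': True, 'MinMaxScaler': True}
```

The arrays returned by `load_diabetes()` don't come with such a fitted scaler object, which is exactly the missing piece of information.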