I am performing some scaling with Spark MLlib on the dataset below:
+---+--------------+
| id| features|
+---+--------------+
| 0|[1.0,0.1,-1.0]|
| 1| [2.0,1.1,1.0]|
| 0|[1.0,0.1,-1.0]|
| 1| [2.0,1.1,1.0]|
| 1|[3.0,10.1,3.0]|
+---+--------------+
After applying standard scaling, I get the following result:
+---+--------------+------------------------------------------------------------+
|id |features      |stdScal_06f7a85f98ef__output                                |
+---+--------------+------------------------------------------------------------+
|0  |[1.0,0.1,-1.0]|[1.1952286093343936,0.02337622911060922,-0.5976143046671968]|
|1  |[2.0,1.1,1.0] |[2.390457218668787,0.2571385202167014,0.5976143046671968]   |
|0  |[1.0,0.1,-1.0]|[1.1952286093343936,0.02337622911060922,-0.5976143046671968]|
|1  |[2.0,1.1,1.0] |[2.390457218668787,0.2571385202167014,0.5976143046671968]   |
|1  |[3.0,10.1,3.0]|[3.5856858280031805,2.3609991401715313,1.7928429140015902]  |
+---+--------------+------------------------------------------------------------+
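For example, taking the first feature across the five rows (1.0, 2.0, 1.0, 2.0, 3.0), the mean is 1.8 and the sample standard deviation is sqrt(0.7) ≈ 0.8367, so applying (x - mean)/stddev to the first row I would expect (1.0 - 1.8)/0.8367 ≈ -0.956, yet the output shows 1.1952286093343936.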
And if I apply min/max scaling **(setting `val minMax = new MinMaxScaler().setMin(5).setMax(10).setInputCol("features")`)**, I get the following:
+---+--------------+-------------------------------+
| id|      features|minMaxScal_21493d63e2bf__output|
+---+--------------+-------------------------------+
|  0|[1.0,0.1,-1.0]|                  [5.0,5.0,5.0]|
|  1| [2.0,1.1,1.0]|                  [7.5,5.5,7.5]|
|  0|[1.0,0.1,-1.0]|                  [5.0,5.0,5.0]|
|  1| [2.0,1.1,1.0]|                  [7.5,5.5,7.5]|
|  1|[3.0,10.1,3.0]|               [10.0,10.0,10.0]|
+---+--------------+-------------------------------+
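For instance, I do not see why the whole first row [1.0,0.1,-1.0] becomes [5.0,5.0,5.0] while the last row [3.0,10.1,3.0] becomes [10.0,10.0,10.0].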
Please find the code below:
```
// loading dataset
val scaleDF = spark.read.parquet("/data/simple-ml-scaling")

// using StandardScaler
import org.apache.spark.ml.feature.StandardScaler
val ss = new StandardScaler().setInputCol("features")
ss.fit(scaleDF).transform(scaleDF).show(false)

// using min/max scaler
import org.apache.spark.ml.feature.MinMaxScaler
val minMax = new MinMaxScaler().setMin(5).setMax(10).setInputCol("features")
val fittedminMax = minMax.fit(scaleDF)
fittedminMax.transform(scaleDF).show()
```
I know the formulas for standardization and min/max scaling, but I am unable to understand how the values in the third column are derived. Please help me with the math behind them.
`MinMaxScaler` in Spark works on each feature separately. From the documentation we can find:
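> Rescaled(e_i) = (e_i - E_min) / (E_max - E_min) * (max - min) + min

Here `E_min` and `E_max` are the minimum and maximum of that feature over the whole column, while `min` and `max` are the targets you set via `setMin(5)` and `setMax(10)`. For the first feature the column values are (1.0, 2.0, 1.0, 2.0, 3.0), so `E_min = 1.0` and `E_max = 3.0`, and the formula maps 1.0 to 5.0, 2.0 to 7.5 and 3.0 to 10.0, which is exactly the first component of each output vector. As a quick sanity check, here is a minimal plain-Scala sketch (no Spark needed; `col1` simply copies the first components from your data) that applies the formula by hand:

```
// Reapply the MinMaxScaler formula by hand to the first feature column.
val col1 = Seq(1.0, 2.0, 1.0, 2.0, 3.0)  // first component of each feature vector
val (eMin, eMax) = (col1.min, col1.max)  // per-feature extremes: 1.0 and 3.0
val (tMin, tMax) = (5.0, 10.0)           // the setMin(5) / setMax(10) target range

val rescaled = col1.map(e => (e - eMin) / (eMax - eMin) * (tMax - tMin) + tMin)
println(rescaled)  // List(5.0, 7.5, 5.0, 7.5, 10.0), matching the scaler's output
```

The second and third components follow the same per-feature logic, each using its own column minimum and maximum.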