Fitting a logarithmic curve with lmfit and scipy.curve_fit - not working! (Dataset: The Ultimate Film Statistics Dataset - for ML)


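This dataset is great! However, I am stuck. There appears to be a logarithmic relationship between the number of votes (x-axis) and the approval index (y-axis), but when I try to fit the data with scipy.curve_fit and lmfit I do not get a usable equation: either the fit returns a wrong equation, and therefore a wrong plot, or I get a truth-value error when I try lmfit.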

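However, I generated a curve with an online tool and it worked, sort of. I sampled x values at constant intervals together with the corresponding y values, plotted those points, and had the site generate a curve from them. The equation I got is y = 0.986203*np.log(12.6695*x + 64567.3) - 8.30291. When I pass these values as initial guesses (p0) or as make_params values, it still does not seem to work. I have added my code below for reference!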

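If anyone can tell me what I am doing wrong I would be very grateful, as I have spent about 2 hours on this. Thanks :)

Here is the link to the dataset as well: https://www.kaggle.com/datasets/alessandrolobello/the-ultimate-film-statistics-dataset-for-ml

Here is the code I tried: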

import pandas as pd
import numpy as np
import sklearn
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

from sklearn.metrics import mean_squared_error

import os
import plotly.express as px

df=pd.read_csv('movie_statistic_dataset.csv')

#number of votes vs approval index

data = newdf[['movie_numerOfVotes','approval_Index']].reset_index(drop=True)
data.columns = ['Votes','Approval']

X = data[['Votes']].values
Y = data[['Approval']].values


#trying curve_fit

from scipy.optimize import curve_fit

def logfunction(a,b,x):
    return a*np.log(b*x)

popt,pcov = curve_fit(logfunction,X.ravel(),Y.ravel())

a,b = popt
ypred_log = logfunction(a,b,X.ravel())

plt.scatter(X.ravel(),ypred_log)
plt.scatter(X.ravel(),Y.ravel())

#the curve doesn't fit

#trying lmfit

from lmfit import Model, Minimizer

def logfunc(x):
    return a*np.log(b*x + c)

mod = Model(logfunc)

params = mod.make_params(a=1,b=15,c=6000)

result = mod.fit(np.array(data['Approval']),params=params,x=np.array(data['Votes']))

result.fit_report()   #gives an error! 

#equation I found online:
x = np.arange(0,6000001,10000)
y = 0.986203*np.log(12.6695*x + 64567.3) - 8.30291


#trying again with lmfit

def logfunc(x):
    return y = a*np.log(b*x + c) - d

mod = Model(logfunc)

params = mod.make_params(a=0.982, b=12.7,c=64567,d=8.302)

result = mod.fit(np.array(data['Approval']),params=params,x=np.array(data['Votes']))
#also doesn't work!!

parameters scipy curve-fitting logarithm lmfit
1 Answer

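The main changes I made are:

  • For curve_fit, the first argument of the model function must be the independent variable (x in this case)
  • I adjusted the form of the equation to get a better fit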
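
To see the argument-order rule in isolation, here is a minimal sketch that does not need the CSV. The data are synthetic, and the model a*log(x) + c, the name log_model, and the true values 2.0 and -1.0 are assumptions chosen only for illustration; this is not the equation used for the movie data.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, c):
    # curve_fit passes the independent variable first, then the parameters
    return a * np.log(x) + c

# Synthetic data (illustrative only): a = 2.0, c = -1.0 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(1.0, 1000.0, 200)
y = log_model(x, 2.0, -1.0) + rng.normal(scale=0.05, size=x.size)

# p0 gives one starting value per parameter, in signature order (a, c)
popt, pcov = curve_fit(log_model, x, y, p0=[1.0, 0.0])
a_fit, c_fit = popt
print(a_fit, c_fit)
```

With a signature like logfunction(a, b, x) from the question, curve_fit would instead treat a as the independent variable and b and x as the parameters to fit, which is why that fit could not match the data.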

import pandas as pd
import numpy as np
import sklearn
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import mean_squared_error

import os

df = pd.read_csv('movie_statistic_dataset.csv')

#number of votes vs approval index

data = df[['movie_numerOfVotes','approval_Index']].reset_index(drop=True)
data.columns = ['Votes','Approval']

X = data[['Votes']].values
Y = data[['Approval']].values


#trying curve_fit

from scipy.optimize import curve_fit

def logfunction(x, a, c): #first argument must be the independent variable
    return a * np.log(x + 1e-8)**3.8 + c
    # return a * np.log(x + 1e-6)**b + c #could estimate b instead

popt,pcov = curve_fit(logfunction, X.ravel(), Y.ravel())
a, c = popt

#Fitted curve at the data points
ypred_log = logfunction(X.ravel(), a, c)

#Fitted curve on a new axis
x_fine = np.linspace(X.min(), X.max(), num=300)
ypred_log_fine = logfunction(x_fine, a, c)

#Plot
plt.scatter(X.ravel(), Y.ravel(), marker='.', s=20, color='darkgray', label='data')

plt.plot(x_fine, ypred_log_fine, color='black', label='fit')
#Plot the fit at data points:
# plt.scatter(X.ravel(), ypred_log, marker='|', color='black', label='fit')
plt.legend()

plt.gcf().set_size_inches(8, 3)
plt.xlabel('X')
plt.ylabel('Y')