Trying to complete R statements, getting errors [closed]

Question Votes: 0 Answers: 1

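I am working on an assignment and trying to write R statements, but I keep getting errors. I have spent a long time trying to figure it out. The statements below are what I have, and nearly all of them produce errors. These are the instructions I am trying to follow: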

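  1. Download a dataset from the web. You may use any source, but cite the source in your code. Also make sure the data has a mix of quantitative and qualitative (categorical) variables.
  2. Import the dataset into R.
  3. Print out descriptive statistics for a selection of quantitative and categorical variables.
  4. Transform at least one variable. It does not matter what the transformation is.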

Below are the R statements I have been working on, together with the errors they produce. I would really appreciate any help!

# https://www.kaggle.com/datasets/joebeachcapital/nba-player-statistics #
# Import the dataset into R #
dataset <- read.csv("~/Downloads/nba_data_processed.csv", header=FALSE)
head(dataset)

 # Print out descriptive statistics for a selection of quantitative and categorical   variables. #
# Quantitative variables #
quant_vars <- c("Tm", "Age")
cat_vars <- c("POS", "PTS")

# Frequency table for categorical variables #
for (PTS in cat_vars) {
cat_freq <- table(dataset[PTS])
print(paste("Frequency table for", PTS))
print(cat_freq)}

Error in `[.data.frame`(dataset, PTS) : undefined columns selected

# Transform a variable #
dataset$LogAge <- log(dataset$Age)

Error in log(dataset$Age) : non-numeric argument to mathematical function

# Histogram for Games Started #
dataset$GS<- as.integer(dataset$GS)

Error in `$<-.data.frame`(`*tmp*`, GS, value = integer(0)) : 
replacement has 0 rows, data has 706

## Warning: NAs introduced by coercion
hist(dataset$Tm, main="Tm Distribution", xlab="Tm")

Error in hist.default(dataset$Tm, main = "Tm Distribution", xlab = "Tm") : 
'x' must be numeric

# Scatterplot for Tm vs. Age #
plot(dataset$Tm, dataset$Age, main="Scatterplot: Tm vs. Age", xlab="Tm", ylab="Age")

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
> 

I don't know how to deal with these errors.

> dput(head(df))
structure(list(V1 = c("Player", "Precious Achiuwa", "Steven Adams", 
"Bam Adebayo", "Ochai Agbaji", "Santi Aldama", "Nickeil Alexander-Walker", 
"Nickeil Alexander-Walker", "Nickeil Alexander-Walker", "Grayson Allen", 
"Jarrett Allen", "Jose Alvarado", "Kyle Anderson", "Giannis Antetokounmpo", 
"Thanasis Antetokounmpo", "Cole Anthony", "OG Anunoby", "Ryan Arcidiacono", 
"Ryan Arcidiacono", "Ryan Arcidiacono"), V2 = c("Pos", "C", "C", 
"C", "SG", "PF", "SG", "SG", "SG", "SG", "C", "PG", "PF", "PF", 
"PF", "PG", "SF", "PG", "PG", "PG"), V3 = c("Age", "23", "29", 
"25", "22", "22", "24", "24", "24", "27", "24", "24", "29", "28", 
"30", "22", "25", "28", "28", "28"), V4 = c("Tm", "TOR", "MEM", 
"MIA", "UTA", "MEM", "TOT", "UTA", "MIN", "MIL", "CLE", "NOP", 
"MIN", "MIL", "MIL", "ORL", "TOR", "TOT", "NYK", "POR"), V5 = c("G", 
"55", "42", "75", "59", "77", "59", "36", "23", "72", "68", "61", 
"69", "63", "37", "60", "67", "20", "11", "9"), V6 = c("GS", 
"12", "42", "75", "22", "20", "3", "3", "0", "70", "68", "10", 
"46", "63", "0", "4", "67", "4", "0", "4"), V7 = c("MP", "20.7", 
"27.0", "34.6", "20.5", "21.8", "15.0", "14.7", "15.5", "27.4", 
"32.6", "21.5", "28.4", "32.1", "5.6", "25.9", "35.6", "8.6", 
"2.4", "16.2"), V8 = c("FG", "3.6", "3.7", "8.0", "2.8", "3.2", 
"2.2", "2.3", "2.1", "3.4", "5.9", "3.3", "3.7", "11.2", "0.5", 
"4.6", "6.3", "0.5", "0.1", "0.9"), V9 = c("FGA", "7.3", "6.3", 
"14.9", "6.5", "6.8", "5.0", "4.7", "5.4", "7.7", "9.2", "8.0", 
"7.2", "20.3", "1.2", "10.2", "13.2", "1.9", "0.5", "3.6"), V10 = c("FG%", 
".485", ".597", ".540", ".427", ".470", ".444", ".488", ".384", 
".440", ".644", ".411", ".509", ".553", ".435", ".454", ".476", 
".243", ".200", ".250"), V11 = c("3P", "0.5", "0.0", "0.0", "1.4", 
"1.2", "1.0", "1.0", "1.1", "2.0", "0.0", "1.4", "0.6", "0.7", 
"0.0", "1.3", "2.1", "0.4", "0.1", "0.8"), V12 = c("3PA", "2.0", 
"0.0", "0.2", "3.9", "3.5", "2.7", "2.4", "3.1", "5.1", "0.1", 
"4.0", "1.5", "2.7", "0.2", "3.4", "5.5", "1.2", "0.3", "2.2"
), V13 = c("3P%", ".269", ".000", ".083", ".355", ".353", ".384", 
".402", ".361", ".399", ".100", ".336", ".410", ".275", ".000", 
".364", ".387", ".348", ".333", ".350"), V14 = c("2P", "3.0", 
"3.7", "8.0", "1.4", "2.0", "1.2", "1.3", "1.0", "1.4", "5.9", 
"1.9", "3.0", "10.5", "0.5", "3.4", "4.2", "0.1", "0.0", "0.1"
), V15 = c("2PA", "5.4", "6.2", "14.7", "2.7", "3.4", "2.3", 
"2.3", "2.3", "2.7", "9.1", "4.0", "5.7", "17.6", "1.0", "6.7", 
"7.7", "0.7", "0.2", "1.3"), V16 = c("2P%", ".564", ".599", ".545", 
".532", ".591", ".515", ".578", ".415", ".518", ".653", ".488", 
".536", ".596", ".526", ".500", ".539", ".071", ".000", ".083"
), V17 = c("eFG%", ".521", ".597", ".541", ".532", ".560", ".547", 
".591", ".488", ".571", ".645", ".496", ".553", ".572", ".435", 
".516", ".556", ".351", ".300", ".359"), V18 = c("FT", "1.6", 
"1.1", "4.3", "0.9", "1.4", "0.7", "0.8", "0.6", "1.6", "2.4", 
"1.1", "1.4", "7.9", "0.3", "2.5", "2.1", "0.0", "0.0", "0.0"
), V19 = c("FTA", "2.3", "3.1", "5.4", "1.2", "1.9", "1.0", "1.1", 
"0.9", "1.8", "3.3", "1.3", "2.0", "12.3", "0.6", "2.8", "2.5", 
"0.0", "0.0", "0.0"), V20 = c("FT%", ".702", ".364", ".806", 
".812", ".750", ".667", ".692", ".619", ".905", ".733", ".813", 
".735", ".645", ".500", ".894", ".838", "", "", ""), V21 = c("ORB", 
"1.8", "5.1", "2.5", "0.7", "1.1", "0.3", "0.2", "0.3", "0.8", 
"3.3", "0.5", "1.0", "2.2", "0.4", "0.8", "1.4", "0.0", "0.0", 
"0.0"), V22 = c("DRB", "4.1", "6.5", "6.7", "1.3", "3.7", "1.5", 
"1.4", "1.5", "2.4", "6.5", "1.9", "4.4", "9.6", "0.8", "4.0", 
"3.5", "0.8", "0.4", "1.2"), V23 = c("TRB", "6.0", "11.5", "9.2", 
"2.1", "4.8", "1.7", "1.6", "1.8", "3.3", "9.8", "2.3", "5.3", 
"11.8", "1.2", "4.8", "5.0", "0.8", "0.4", "1.2"), V24 = c("AST", 
"0.9", "2.3", "3.2", "1.1", "1.3", "1.8", "2.1", "1.4", "2.3", 
"1.7", "3.0", "4.9", "5.7", "0.4", "3.9", "2.0", "1.2", "0.2", 
"2.3"), V25 = c("STL", "0.6", "0.9", "1.2", "0.3", "0.6", "0.5", 
"0.7", "0.3", "0.9", "0.8", "1.1", "1.1", "0.8", "0.1", "0.6", 
"1.9", "0.3", "0.2", "0.3"), V26 = c("BLK", "0.5", "1.1", "0.8", 
"0.3", "0.6", "0.4", "0.4", "0.3", "0.2", "1.2", "0.2", "0.9", 
"0.8", "0.1", "0.5", "0.7", "0.0", "0.0", "0.0"), V27 = c("TOV", 
"1.1", "1.9", "2.5", "0.7", "0.8", "0.9", "1.3", "0.4", "1.0", 
"1.4", "1.3", "1.5", "3.9", "0.3", "1.5", "2.0", "0.4", "0.1", 
"0.7"), V28 = c("PF", "1.9", "2.3", "2.8", "1.7", "1.9", "1.5", 
"1.6", "1.3", "1.6", "2.3", "2.0", "2.1", "3.1", "0.6", "2.6", 
"3.0", "0.9", "0.3", "1.6"), V29 = c("PTS", "9.2", "8.6", "20.4", 
"7.9", "9.0", "6.2", "6.3", "5.9", "10.4", "14.3", "9.0", "9.4", 
"31.1", "1.4", "13.0", "16.8", "1.3", "0.3", "2.6")), row.names = c(NA, 
20L), class = "data.frame")

The columns are:

Player: string - name of the player
Pos (Position): string - position played by the player
Age: integer - age of the player as of February 1, 2023
Tm (Team): string - team the player belongs to
G (Games Played): integer - number of games played by the player
GS (Games Started): integer - number of games started by the player
MP (Minutes Played): integer - total minutes played by the player
FG (Field Goals): integer - number of field goals made by the player
FGA (Field Goal Attempts): integer - number of field goal attempts by the player
FG% (Field Goal Percentage): float - percentage of field goals made by the player
3P (3-Point Field Goals): integer - number of 3-point field goals made by the player
3PA (3-Point Field Goal Attempts): integer - number of 3-point field goal attempts by the player
3P% (3-Point Field Goal Percentage): float - percentage of 3-point field goals made by the player
2P (2-Point Field Goals): integer - number of 2-point field goals made by the player
2PA (2-point Field Goal Attempts): integer - number of 2-point field goal attempts by the player
2P% (2-Point Field Goal Percentage): float - percentage of 2-point field goals made by the player
eFG% (Effective Field Goal Percentage): float - effective field goal percentage of the player
FT (Free Throws): integer - number of free throws made by the player
FTA (Free Throw Attempts): integer - number of free throw attempts by the player
FT% (Free Throw Percentage): float - percentage of free throws made by the player
ORB (Offensive Rebounds): integer - number of offensive rebounds by the player
DRB (Defensive Rebounds): integer - number of defensive rebounds by the player
TRB (Total Rebounds): integer - total rebounds by the player
AST (Assists): integer - number of assists made by the player
STL (Steals): integer - number of steals made by the player
BLK (Blocks): integer - number of blocks made by the player
TOV (Turnovers): integer - number of turnovers made by the player
PF (Personal Fouls): integer - number of personal fouls made by the player
PTS (Points): integer - total points scored by the player
r dataframe data-cleaning
1 Answer
0 votes

TL;DR (added after the fact):

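header=FALSE is wrong: it makes R read the header row as data, so the columns get default names (V1, V2, ...) and every column becomes character. Remove it.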

dataset <- read.csv("~/Downloads/nba_data_processed.csv")

Then try again.
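
To make the effect of header=FALSE concrete, here is a minimal sketch. It assumes a small inline CSV (the csv_text string is a hypothetical stand-in for the real file, using three rows from the question's dput output) so it can run without the download.

```r
# Hypothetical three-column sample standing in for the real CSV file.
csv_text <- "Player,Pos,Age\nPrecious Achiuwa,C,23\nSteven Adams,C,29\n"

# header = FALSE: the header line is read as the first data row.
bad <- read.csv(text = csv_text, header = FALSE)
print(names(bad))         # default names: V1, V2, V3
print(is.numeric(bad$V3)) # FALSE, because the text "Age" sits among the numbers
print(is.null(bad$Age))   # TRUE: there is no column called Age

# Default header = TRUE: the first line supplies the column names.
good <- read.csv(text = csv_text)
print(names(good))
print(class(good$Age))    # integer
```

Because bad$Age is NULL, calls such as log(bad$Age) or as.integer(bad$GS) receive nothing to work on, which is consistent with the "non-numeric argument" and "replacement has 0 rows" errors in the question.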


  1. R (like almost all programming languages) is case-sensitive, so "POS" does not work; the column is named "Pos".

    cat_vars <- c("Pos", "PTS")  # "Pos", not "POS"
    for (PTS in cat_vars) {
      cat_freq <- table(dataset[PTS])
      print(paste("Frequency table for", PTS))
      print(cat_freq)
    }
    # [1] "Frequency table for Pos"
    # Pos
    #  C PF SG 
    #  3  1  2 
    # [1] "Frequency table for PTS"
    # PTS
    #  6.2  7.9  8.6    9  9.2 20.4 
    #    1    1    1    1    1    1 
    

    Running table on continuous data such as PTS is unlikely to be fruitful; I am not sure what you intend to do with it.

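  2. Your log(dataset$Age) error does not occur with this dataset once it is read with its header. With header=FALSE there is no column named Age, so dataset$Age is NULL and log() rejects it.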

    log(dataset$Age)
    # [1] 3.135494 3.367296 3.218876 3.091042 3.091042 3.178054
    dataset$LogAge <- log(dataset$Age)
    
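  3. Your as.integer error does not occur with this dataset either. However, GS is already numeric, so converting it to integer may not make much sense, depending on how you intend to use it.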

    class(dataset$GS)
    # [1] "numeric"
    dataset$GS <- as.integer(dataset$GS)
    
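  4. hist(dataset$Tm, ...) requires numeric data, but Tm is character, so you cannot draw a histogram of it. You can, however, draw a barplot of its table, though I don't know whether that is what you want.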

    barplot(table(dataset$Tm), main="Tm Distribution", xlab="Tm")
    

  5. plot(dataset$Tm, ...) does not handle character data well (again), though you can plot against the sequence of rows and change the x-axis labels.

    plot(seq_along(dataset$Tm), dataset$Age, main="Scatterplot: Tm vs. Age", 
         xlab="Tm", ylab="Age", xaxt="n", pch=16)
    axis(1, at = seq_along(dataset$Tm), labels = dataset$Tm)
    

    As an introduction, ggplot2 handles categorical data more intuitively.

    library(ggplot2)
    ggplot(dataset, aes(Tm, Age)) +
      geom_point() +
      labs(title = "Scatterplot: Tm vs. Age")
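
Putting the points above together, the assignment's import, describe, and transform steps could be sketched roughly as follows. This is only an illustration under stated assumptions: csv_text is a hypothetical inline sample built from five rows of the question's dput output, and the choice of variables (Age, GS, PTS as quantitative; Pos, Tm as categorical) and of log(PTS) as the transformation is mine, not something the assignment prescribes.

```r
# Hypothetical sample rows taken from the dput() output in the question;
# with the real file, use read.csv("~/Downloads/nba_data_processed.csv").
csv_text <- paste(
  "Player,Pos,Age,Tm,GS,PTS",
  "Precious Achiuwa,C,23,TOR,12,9.2",
  "Steven Adams,C,29,MEM,42,8.6",
  "Bam Adebayo,C,25,MIA,75,20.4",
  "Ochai Agbaji,SG,22,UTA,22,7.9",
  "Santi Aldama,PF,22,MEM,20,9.0",
  sep = "\n"
)
dataset <- read.csv(text = csv_text)  # header = TRUE is the default

# Assumed split of variables by type.
quant_vars <- c("Age", "GS", "PTS")
cat_vars <- c("Pos", "Tm")

# Quantitative variables: min, quartiles, mean, max.
print(summary(dataset[quant_vars]))

# Categorical variables: one frequency table per column.
for (v in cat_vars) {
  cat("Frequency table for", v, "\n")
  print(table(dataset[[v]]))
}

# Transform a variable: natural log of points (all values here are positive).
dataset$LogPTS <- log(dataset$PTS)
print(head(dataset[c("Player", "PTS", "LogPTS")]))
```

Using summary() for numeric columns and table() for character columns keeps each statistic matched to the variable type, which avoids the one-count-per-value tables seen for PTS above.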
    
