Mapping example data to actual csv data


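Thanks David and everyone, I think I've made progress. It still won't produce the line chart, but nothing in the code looks logically wrong. I take no credit here; I just cut and pasted what people smarter than me had already worked out, but I still don't get a chart. The GitHub csv is linked at the end.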

data = read.csv("C:/Users/12083/Desktop/librarydata.csv") # Read the data into R

head(data)                                            # Quality control, looks good
str(data)
data$dates = as.Date(data$dates, format = "%d/%m/%Y") # This converts the dates column to R's Date class
library(tidyverse)                                    # This will import some functions that you need, specifically %>% and ggplot
# Step 0: check that the data makes sense to you
summary(data$dates)
summary(data$city)

# Step 1: filter the right data
start.date = as.Date("2003-01-02")
end.date   = as.Date("2010-05-04")

filtered = data %>% 
  filter(dates >= start.date & 
           dates <= end.date) # This will only take rows between those dates
summary(filtered)
colnames(filtered)

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
summary(filtered_agg)
# Step 2: Plotting
# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete



# The problem is in here - somewhere
Plot = ggplot(filtered_agg, aes(x=dates, y=Location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot

https://github.com/karl1776/chart

colnames(filtered)
 [1] "ï..Class.ID"                "city"                       "dates"                      "year"                       "month"                     
 [6] "day"                        "cit"                        "Department.College"         "Course.Level"               "Course.Title"              
[11] "Tour."                      "TILT."                      "Date.Taught"                "Session.Number"             "AM.PM"                     
[16] "Hour.Count"                 "Library.Instructor"         "Other.Library.Instructor"   "Duplicate."                 "Course.Instructor"         
[21] "ACRL"                       "IPED"                       "Location"                   "Building.Room"              "Distance.Class."           
[26] "Location.of.Site.1"         "Site.1.Number.of.Students"  "Location.of.Site.2"         "Site.2.Number.of.Students"  "Location.of.Site.3"        
[31] "Site.3.Number.of.Students"  "Location.of.Site.4"         "Site.4.Number.of.Students"  "Location.of.Site.5"         "Site.5.Number.of.Students" 
[36] "Location.of.Site.6"         "Site.6.Number.of.Students"  "Location.of.Site.7"         "Site.7.Number.of.Students"  "Location.of.Site.8"        
[41] "Site.8.Number.of.Students"  "Location.of.Site.9"         "Site.9.Number.of.Students"  "Location.of.Site.10"        "Site.10.Number.of.Students"
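The "ï.." at the start of the first column name is typical of a CSV that begins with a UTF-8 byte-order mark (BOM), which Excel's "CSV UTF-8" export writes; on Windows, read.csv() reads those bytes as ordinary characters and check.names turns them into "ï..". Assuming that is the cause here, a minimal sketch: it writes a small hypothetical file with a BOM to a temporary path and reads it back with fileEncoding = "UTF-8-BOM", which strips the mark before the header is parsed.

```r
# Hypothetical two-column CSV that starts with a UTF-8 byte-order mark (BOM)
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                  # the three BOM bytes
writeBin(charToRaw("Class ID,city\n4438,Pocatello\n"), con)  # header + one row
close(con)

# "UTF-8-BOM" makes R drop the mark, so the first column name comes back
# as "Class.ID" rather than something like "ï..Class.ID"
clean <- read.csv(path, fileEncoding = "UTF-8-BOM")
names(clean)
```

If re-reading is inconvenient, renaming the first column after import (for example names(data)[1] <- "Class.ID") has the same effect on the column name.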

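Maybe I'm just not seeing it, but I'm having a hard time looking at examples that use dummy data and translating them into how to load the actual data from a csv file. The image showed my output from the dummy data, which is exactly what I want. When I use the actual data, nothing happens. Am I missing a ggplot command to print the plot?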

library(readxl)
require(tidyverse)
require(ggplot2)
require(lubridate)
#load data
df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")
#plot data
df_example %>%
  ggplot(aes(date,city, color=city))+
  geom_line(aes(linetype=lt))+ #you can use single string for the same linetype for all lines or a vector of strings for each data point
  scale_linetype_identity()+ #this removes the linetype from the legend
  theme_minimal()

df_example

I get this output, which is exactly right, but no plot comes with it:

city      dates classes       lt
1       Boise 2020-01-01      52    solid
2       Boise 2020-02-01      36    solid
3       Boise 2020-03-01      69    solid
4       Boise 2020-04-01     100    solid
5       Boise 2020-05-01      72    solid
6   Pocatello 2020-01-01      82   dashed
7   Pocatello 2020-02-01      15   dashed
8   Pocatello 2020-03-01      68   dashed
9   Pocatello 2020-04-01      17   dashed
10  Pocatello 2020-05-01      51   dashed
11  Salt Lake 2020-01-01      71   dotted
12  Salt Lake 2020-02-01      65   dotted
13  Salt Lake 2020-03-01      33   dotted
14  Salt Lake 2020-04-01      44   dotted
15  Salt Lake 2020-05-01      16   dotted
16 Twin Falls 2020-01-01       3  dotdash
17 Twin Falls 2020-02-01      30  dotdash
18 Twin Falls 2020-03-01      19  dotdash
19 Twin Falls 2020-04-01      34  dotdash
20 Twin Falls 2020-05-01      69  dotdash
21  Elsewhere 2020-01-01      62 longdash
22  Elsewhere 2020-02-01      14 longdash
23  Elsewhere 2020-03-01      59 longdash
24  Elsewhere 2020-04-01      35 longdash
25  Elsewhere 2020-05-01      91 longdash
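On the question of a missing print command: typing a ggplot object's name at the interactive console draws it automatically, but inside a function, a loop, or a file run with source() at its default settings, the plot is only drawn when print() is called on it. A minimal sketch with two hypothetical rows; show_plot() is a hypothetical helper, not part of ggplot2.

```r
library(ggplot2)

# Two hypothetical rows, just enough to draw one line
d <- data.frame(
  dates = as.Date(c("2020-01-01", "2020-02-01")),
  classes = c(52, 36)
)

# Hypothetical helper: building a ggplot object inside a function does not
# draw anything; the explicit print() call is what renders the plot
show_plot <- function(data) {
  p <- ggplot(data, aes(x = dates, y = classes)) + geom_line()
  print(p)
  invisible(p)
}

p <- show_plot(d)
```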

dput() output of the real data (20 rows):

structure(list(`Class ID` = c(4438, 4439, 4428, 4437, 4430, 4431, 
4432, 4433, 4434, 4435, 4436, 4427, 4440, 4417, 4414, 4407, 4413, 
4412, 4418, 4410), city = c("Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Meridian", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Idaho Falls"), date = structure(c(1468972800, 1468972800, 
1468886400, 1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 1461283200, 
1460592000, 1460419200, 1460419200, 1460073600, 1460073600, 1459987200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), year = c(2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), month = c(7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4, 4, 4, 4, 4), day = c(20, 
20, 29, 18, 14, 14, 13, 13, 13, 12, 12, 22, 22, 22, 13, 12, 12, 
8, 8, 7), cit = c("Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Meridian", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Idaho Falls"), `Department/College` = c("College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "Library", "Library", "Library", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Education", "Library", "Division of Health Sciecnes", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters"), 
    `Course Level` = c("Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division", "K-12", 
    "K-12", "K-12", "Lower Division", "Lower Division", "Lower Division", 
    "K-12", "Graduate", "Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division"), `Course Title` = c("ACAD 1111", 
    "ACAD 1111", "POLS 1110", "ENGL 1123", "ACAD 1111", "ACAD 1111", 
    "Kid University", "Kid University", "Kid University", "ACAD 1111", 
    "ACAD 1111", "EDUC 1110", "Kid University", "Nursing_Orientation", 
    "ENGL 1102", "ENGL 1101", "ENGL 1101", "ENGL 1102", "ENGL 1102", 
    "ENGL 1102"), `Tour?` = c(FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE), `TILT?` = c(FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), `Date Taught` = structure(c(1468972800, 1468972800, 1468886400, 
    1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
    1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 
    1461283200, 1460592000, 1460419200, 1460419200, 1460073600, 
    1460073600, 1459987200), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Session Number` = c("Third Session", "Third Session", 
    "Single Session", NA, "Second Session", "Second Session", 
    "Single Session", "Single Session", "Single Session", "First Session", 
    "First Session", "Single Session", "Single Session", "Single Session", 
    "Single Session", "Single Session", "First Session", "Third Session", 
    "Third Session", "Second Session"), `AM/PM` = c("AM", "PM", 
    "PM", "PM", "AM", "PM", "PM", "PM", "PM", "AM", "PM", "PM", 
    "PM", "AM", "PM", "PM", "AM", "AM", "AM", "AM"), `Hour Count` = c(1.5, 
    1.5, 1, 1.5, 1.5, 1.5, 0.5, 0.5, 1, 1.5, 1.5, 1.5, 1, 1, 
    1.5, 1.5, 1.5, 1, 1, 1.5), 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Cathy Gray", 
    NA, NA, NA, NA, "Monte Asche", "Philip Homan", NA), `Duplicate?` = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE), ACRL = c(0, 0, 7, 5, 0, 0, 7, 7, 7, 22, 9, 
    8, 13, 35, 19, 6, 8, 0, 0, 0), IPED = c(22, 9, 7, 5, 23, 
    9, 7, 7, 7, 22, 9, 8, 13, 35, 19, 6, 8, 19, 19, 22), `Location of Instructor` = c("Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Meridian", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Idaho Falls"), `Building/Room` = c("LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", 
    "Special Collections", "LIBR 212", "LIBR 212", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "Meridian", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "CHE 306"
    ), `Distance Class?` = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), `Location of Site 1` = c("Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise"), `Site 1 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 2` = c("Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls"), `Site 2 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 3` = c("Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls"), 
    `Site 3 Number of Students` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 4` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `Site 4 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 5` = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), `Site 5 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 6` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 6 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 7` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 7 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 8` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 8 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 9` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 9 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 10` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 10 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))
2 answers

1 vote

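OP, it looks like you're generally having some trouble with how to import data from a *.csv and turn it into the plot you want. Since you seem able to create a plot, I'll skip that part and walk you through an example of a good way to import data, and then make sure you can turn it into your plot.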

Importing the .csv file and preparing the data

I'll start with a .csv file created from the output of df_example that you posted in the question. I exported that data to a *.csv file, and now we can import it:

df <- read.csv('OP_example.csv')

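The first step after importing data is to make sure it "looks right" and to understand its structure. Even if you created the file yourself, it is very important to make sure that df looks the way it should. Here, head(), str() and summary() are your friends.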

> head(df)
  X      city      dates classes     lt
1 1     Boise 2020-01-01      52  solid
2 2     Boise 2020-02-01      36  solid
3 3     Boise 2020-03-01      69  solid
4 4     Boise 2020-04-01     100  solid
5 5     Boise 2020-05-01      72  solid
6 6 Pocatello 2020-01-01      82 dashed

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : chr  "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

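You can see that writing the *.csv file created an "X" column, which is just the row number. No big deal. Everything else looks fine, except that you'll notice df$dates was read in as chr rather than as Date or another date-like class. Since I'm going to use this column to create a plot, I need it to be a date: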

> df$dates <- as.Date(df$dates, format='%Y-%m-%d')

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : Date, format: "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

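Note that I specified format= for the dates. You'll find information on the % nomenclature used with format= in the documentation for the strptime() function. When I run str() on df again, you can see that df$dates is now of class Date rather than chr.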
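To illustrate why the format string matters, here is a small sketch with hypothetical date strings: the same text gives different dates, or NA, depending on format=. The question's code uses format = "%d/%m/%Y"; if the dates column in the CSV were actually stored as year-month-day, every value would become NA and the later filter() would keep no rows. That is one possible cause worth checking with summary(data$dates), not a confirmed diagnosis.

```r
# Hypothetical day/month/year strings
x <- c("02/01/2003", "04/05/2010")

dmy <- as.Date(x, format = "%d/%m/%Y")  # 2 Jan 2003 and 4 May 2010
mdy <- as.Date(x, format = "%m/%d/%Y")  # same text read as 1 Feb 2003 and 5 Apr 2010

# A year-month-day string does not match "%d/%m/%Y", so the result is NA
wrong <- as.Date("2003-01-02", format = "%d/%m/%Y")

dmy
mdy
wrong
```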

Plotting

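Now, for the plot, just make sure you are reading in and plotting the correct data frame. In your code example, you read the data into df but plot df_example. Not sure whether that's a typo.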

You seem to prefer using the pipe (%>%) rather than declaring the data frame inside ggplot(), so I'll do that here:

library(ggplot2)
library(dplyr)  # provides the %>% pipe

df %>%
  ggplot(aes(x=dates, y=classes, color=city)) +
  geom_line() + geom_point() + theme_bw()

This draws one line (with points) per city.
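For a fully self-contained check, the sketch below rebuilds df_example from the table printed in the question (same cities, months, class counts and line types) and plots it the way this answer does, adding the scale_linetype_identity() line from the question's code so the lt strings are used as the actual line types. It assumes only that ggplot2 is installed.

```r
library(ggplot2)

# df_example rebuilt from the table printed in the question
cities <- c("Boise", "Pocatello", "Salt Lake", "Twin Falls", "Elsewhere")
df_example <- data.frame(
  city    = rep(cities, each = 5),
  dates   = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 5), times = 5),
  classes = c(52, 36, 69, 100, 72,   # Boise
              82, 15, 68, 17, 51,    # Pocatello
              71, 65, 33, 44, 16,    # Salt Lake
              3, 30, 19, 34, 69,     # Twin Falls
              62, 14, 59, 35, 91),   # Elsewhere
  lt      = rep(c("solid", "dashed", "dotted", "dotdash", "longdash"), each = 5)
)

p <- ggplot(df_example, aes(x = dates, y = classes, color = city)) +
  geom_line(aes(linetype = lt)) +
  geom_point() +
  scale_linetype_identity() +   # use the lt strings as line types, no legend
  theme_bw()
print(p)
```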

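Hope that helps. Since we don't have your specific *.csv file, and you have no trouble plotting a data frame once you have one, the most likely place you're getting stuck is making sure that, when you read in the file, your data's columns and classes are in the format you expect. Also, make sure your code plots the correct data frame.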
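One way to get the classes right at import time, sketched under the assumption that the file looks like the exported example (an ISO-formatted dates column): write the CSV without row names so no X column appears, and tell read.csv() the class of the dates column through colClasses. The file goes to a temporary path and holds two hypothetical rows.

```r
# Two hypothetical rows in the same shape as df_example
example <- data.frame(
  city = c("Boise", "Pocatello"),
  dates = c("2020-01-01", "2020-01-01"),
  classes = c(52L, 82L)
)
path <- tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from adding the extra "X" column
write.csv(example, path, row.names = FALSE)

# A named colClasses entry parses "dates" as Date while reading
# (this relies on the strings being in "%Y-%m-%d" form)
df <- read.csv(path, colClasses = c(dates = "Date"))
str(df)
```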


0 votes

Aggregating and plotting

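dplyr makes aggregating data easy. This code creates a new dataset containing the number of times each value of the Location variable appears in each unique combination of city and date: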

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
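To see what n() counts here, a sketch on a few hypothetical rows (cities and dates loosely modelled on the dput() output): each output row is one city/date/Location combination, and location_sum is the number of input rows that fall in it. dplyr's count() is a shorter spelling of the same group_by() plus summarize(n()) step.

```r
library(dplyr)

# Hypothetical rows standing in for `filtered`
toy <- data.frame(
  city = c("Pocatello", "Pocatello", "Pocatello", "Meridian"),
  dates = as.Date(c("2016-07-20", "2016-07-20", "2016-07-19", "2016-04-22")),
  Location = c("Pocatello", "Pocatello", "Pocatello", "Meridian")
)

# Long form, as in the answer; .groups = "drop" returns an ungrouped result
agg1 <- toy %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum = n(), .groups = "drop")

# Equivalent short form
agg2 <- toy %>% count(city, dates, Location, name = "location_sum")

agg1
```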

For the plot, something like this should give you a result:

Plot = ggplot(filtered_agg, aes(x=dates, y=location_sum, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city)) 

Plot 

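But you seem to have too many dimensions for a simple line chart. If the number of cities isn't too large (you could also swap city and the location variable), facet_wrap will make the plot easier to read: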

ggplot(filtered_agg, aes(x=dates, y=location_sum)) + geom_line(aes(linetype=Location, color = Location)) + geom_point(aes(color=Location)) + facet_wrap(~city) 
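If the counts differ a lot between cities, facet_wrap()'s scales argument lets each panel use its own y range. A minimal sketch on hypothetical aggregated data with the same column names as filtered_agg; the numbers are invented for illustration only.

```r
library(ggplot2)

# Hypothetical aggregated counts with the same columns as filtered_agg
agg <- data.frame(
  city = rep(c("Pocatello", "Meridian"), each = 3),
  dates = rep(as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")), times = 2),
  Location = rep(c("Pocatello", "Meridian"), each = 3),
  location_sum = c(12L, 9L, 15L, 1L, 2L, 1L)
)

p <- ggplot(agg, aes(x = dates, y = location_sum)) +
  geom_line(aes(color = Location)) +
  geom_point(aes(color = Location)) +
  facet_wrap(~city, scales = "free_y")  # one panel per city, each with its own y axis
print(p)
```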

Loading the data

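Does the line log = df$city work (if it doesn't, it will return an error message)? If it does, it looks like you're overthinking this. You can skip the steps involved in creating df_example and use df directly in the ggplot command: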

library(readxl)
library(ggplot2)
library(dplyr)  # provides the %>% pipe; library(ggplot2) alone does not

df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")

df %>%
   ggplot(aes(dates,classes, color=city))+
   geom_line(aes(linetype=lt))+ 
   scale_linetype_identity()+ #this removes the linetype from the legend
   theme_minimal()

If that doesn't work, you may need to adjust the options in the read_excel command.
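The dput() output in the question shows that read_excel() brings the Excel dates in as a POSIXct column named date (time zone UTC), while the plotting code refers to dates. A sketch of aligning the two, using two timestamps copied from that dput() output instead of reading the real file; as.Date() drops the time-of-day part.

```r
# Two rows shaped like the read_excel() result in the dput() output
df <- data.frame(
  city = c("Pocatello", "Meridian"),
  date = as.POSIXct(c(1468972800, 1461283200), origin = "1970-01-01", tz = "UTC")
)

# Create the Date column that the plotting code expects
df$dates <- as.Date(df$date)
df$dates
```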

© www.soinside.com 2019 - 2024. All rights reserved.