
[R] Book-reading group: R for Data Science, Chapter 1

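Posted on 2018-9-3 21:49:34
I'm glad to take part in this round of the book-reading group. I have worked through many kinds of R tutorials before. For a deeper treatment of the material in Chapter 1, I think the book 数据分析与图形艺术 (ggplot2: Elegant Graphics for Data Analysis) is a good reference.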

# 1.1 Install and load the tidyverse ----------------------------------------

# install.packages("tidyverse")
library(tidyverse)
# Check whether updates are available
tidyverse_update()
# Install the other packages the book needs
# install.packages(c("nycflights13", "gapminder", "Lahman"))

devtools::session_info(c("tidyverse"))

# 1.2 The mpg data frame ----------------------------------------------------

head(mpg)
#Creating a ggplot
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

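# 1.2 Exercises
# Run ggplot(data = mpg). What do you see?
ggplot(data = mpg)  # an empty plot: no layers have been added yet

# How many rows are in mpg? How many columns?
Q2 <- c(nrow(mpg), ncol(mpg))
Q2  # 234 rows, 11 columns

# What does the drv variable describe? Read the help for ?mpg to find out.
variable.names(mpg)
mpg_factor <- factor(mpg$drv)
head(mpg_factor)
?mpg  # drv is the drive train type: f = front-wheel, r = rear-wheel, 4 = four-wheel drive

# Make a scatterplot of hwy vs cyl.
ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point()

# What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
ggplot(mpg, aes(x = class, y = drv)) +
  geom_point()
# Both variables are categorical, so many observations land on the same point
# and the scatterplot cannot show how many overlap. Counting the combinations does:
count(mpg, drv, class)
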
# 1.3 Aesthetic mappings with the color argument ----------------------------

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

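# Map class to size. Not advisable: class is an unordered categorical variable,
# and ggplot2 warns about using size for a discrete variable.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))

# Transparency: alpha
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

# Shape: note that ggplot2 only uses 6 shapes at a time by default,
# so the seventh class is left unplotted.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

# aes() pairs an aesthetic name with the variable to display. A color set
# outside aes() conveys nothing about the data; it only changes the appearance.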
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

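# 1.3 Exercises
# 1.3.1 What's gone wrong with this code? Why are the points not blue?

# ggplot(data = mpg) +
#   geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
# Inside aes(), "blue" is treated as data: a constant variable mapped to color,
# so ggplot2 picks its default color and adds a legend. Anything inside aes()
# is mapped data, for example a vector of row numbers:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = 1:234))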

# 1.3.2 Which variables in mpg are categorical? Which variables are continuous?
#   (Hint: type ?mpg to read the documentation for the dataset.)
#   How can you see this information when you run mpg?
glimpse(mpg)  # shows each variable with its type: <chr> is categorical, <dbl>/<int> is numeric

# 1.3.3 Map a continuous variable to color, size, and shape.
#   How do these aesthetics behave differently for categorical vs. continuous variables?
color_cty <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) +
  geom_point()
color_cty
size_cty <- ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point()
size_cty
shape_cty <- ggplot(mpg, aes(x = displ, y = hwy, shape = cty)) +
  geom_point()
try(print(shape_cty))  # error: a continuous variable cannot be mapped to shape

# 1.3.4 What happens if you map the same variable to multiple aesthetics?
ggplot(mpg, aes(x = displ, y = hwy, colour = hwy, size = displ)) +
  geom_point()
# The plot works, but the extra aesthetics only repeat information already
# shown on the axes, so in most cases avoid mapping one variable to several aesthetics.

# 1.3.5 What does the stroke aesthetic do? What shapes does it work with?
#   (Hint: use ?geom_point)
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)
# stroke sets the border width for shapes that have a border, such as shape 21.
# Shape, colour, fill, size and border of geom_point can all be set to taste.


# 1.3.6 What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)?
ggplot(mpg, aes(x = displ, y = hwy, colour = displ < 5)) +
  geom_point()
# The expression is evaluated first, producing a logical variable (TRUE/FALSE),
# and the aesthetic takes its values from that result.


# 1.4 Common problems -------------------------------------------------------

# Put the + at the end of a line, not at the start of the next one.
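# A minimal sketch of this pitfall (the name p_ok is made up for illustration).
# When a line starts with +, R has already finished evaluating the previous
# line, so the geom is never added to the plot and R reports an error:
#   ggplot(data = mpg)
#   + geom_point(mapping = aes(x = displ, y = hwy))
# Ending the line with + tells R that the expression continues:
library(ggplot2)
p_ok <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p_ok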


# 1.5 Facets ----------------------------------------------------------------

# Facet on a single variable with facet_wrap(): put the variable name after ~.
# The variable should be discrete.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 4)
# Facet on two variables with facet_grid(): separate the two names with ~.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
# A . can stand in for a variable name to skip faceting in the row or column dimension.

# 1.5 Exercises

# 1. What happens if you facet on a continuous variable?
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ cty)
# The continuous variable is treated as categorical, and the plot gets one
# facet per distinct value.

# 2. What do the empty cells in plot with facet_grid(drv ~ cyl) mean?
#   How do they relate to this plot?

ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = cyl))
# They are cells in which there are no values of the combination of drv and cyl.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = cyl)) +
  facet_grid(drv ~ cyl)
# The locations in the above plot without points are the same cells in
# facet_grid(drv ~ cyl) that have no points.

# 3. What plots does the following code make? What does . do?
#   ... + facet_grid(drv ~ .)   vs   ... + facet_grid(. ~ cyl)
#   The first facets by values of drv in rows (along the y-axis);
#   the second facets by values of cyl in columns (along the x-axis).
#   The . means "do not facet in this dimension".
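# To see what . does, the two calls can be run side by side. This is only a
# sketch; the names p_rows and p_cols are made up for illustration.
library(ggplot2)
p_rows <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)   # one row of panels per drv value, no column split
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)   # one column of panels per cyl value, no row split
p_rows
p_cols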

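# 4. Take the first faceted plot in this section:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
# What are the advantages to using faceting instead of the colour aesthetic?
#   What are the disadvantages? How might the balance change if you had a larger dataset?
# Advantage: faceting separates the categories cleanly. Colour does not work
# well for data with many categories (more than about 9), and overlapping
# points are hard to tell apart.
# Disadvantage: direct comparison is harder, because the points sit in
# different panels and each panel shows only its own category.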
   
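# 5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels?
#   Why doesn't facet_grid() have nrow and ncol arguments?
# nrow (ncol) sets the number of rows (columns) to use when faceting, which
# controls the layout of the panels.
# They are needed because facet_wrap() facets on a single variable, so the
# data alone does not fix the shape of the grid.
# facet_grid() does not need them: its number of rows and columns is set by
# the number of unique values of the variables given.

# 6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?
# Plots are usually laid out in landscape, so there is more room across than
# down, and panels in the same row share a y-axis, which makes y values easier
# to compare. The variable with more unique values therefore reads better in the columns.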


# 1.6 Geometric objects -----------------------------------------------------
# ggplot2 provides over 30 geoms (bars, lines, curves, points, rectangles and
# so on); the core of each is the mapping argument.

# geom_point: points
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# geom_smooth: smooth line
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# geom_smooth with linetype: one line type per drv value
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

# group or color; show.legend controls whether a legend is drawn
ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy, color = drv),
    show.legend = FALSE
  )

# Mappings passed to ggplot() are global: they apply to every layer.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Mappings placed in a geom function are local: they apply to that layer only.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

# Layer-specific data: the smooth line is fitted to subcompact cars only.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
  
# What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

# line chart: geom_line()  boxplot: geom_boxplot()  histogram: geom_histogram()  area chart: geom_area()

# What does the se argument to geom_smooth() do?

# It controls whether the confidence band, based on the standard error, is
# drawn around the smooth line (se = TRUE is the default).
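
# A quick way to see the effect of se is to draw the same smooth with and
# without the band. A sketch only; p_band and p_line are names made up here.
library(ggplot2)
p_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(se = TRUE)    # line plus shaded band
p_line <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(se = FALSE)   # line only
p_band
p_line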

# 1.7 Statistical transformations -------------------------------------------

# ggplot2 provides over 20 stats; see ?stat_bin for one example.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))  # stat_count() can be used instead of geom_bar()
# Three reasons to use a stat explicitly:
# a. Override the default stat,
#    e.g. change the stat of geom_bar() from count (the default) to identity.
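# A sketch of case (a), assuming the bar heights are already available in a
# summary table. The names cut_counts, p_identity and p_stat are made up for
# illustration; the counts are computed from diamonds rather than typed in.
library(ggplot2)
cut_counts <- as.data.frame(table(cut = diamonds$cut))  # columns: cut, Freq
p_identity <- ggplot(data = cut_counts) +
  geom_bar(mapping = aes(x = cut, y = Freq), stat = "identity")
# The default bar chart written through its stat instead of its geom:
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))
p_identity
p_stat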

# b. Override the default mapping from transformed variables to aesthetics.
#    (after_stat(prop) needs ggplot2 >= 3.3.0; older versions write ..prop..)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
# c. Draw greater attention to the statistical transformation in your code.
#    (fun.min / fun.max / fun replace fun.ymin / fun.ymax / fun.y from ggplot2 3.3.0 on)
ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
# In our proportion bar chart, we need to set group = 1. Why?
# In other words, what is the problem with these two graphs?

# If group is not set to 1, all the bars have prop == 1.
# geom_bar() treats each x value as its own group, and the stat computes
# the proportions within each group.
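
# The difference can be checked on the computed layer data. A sketch, assuming
# ggplot2 >= 3.3.0 for after_stat(); p_no_group and p_grouped are names made
# up for illustration.
library(ggplot2)
p_no_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
p_grouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
layer_data(p_no_group)$prop  # every bar is its own group, so each prop is 1
layer_data(p_grouped)$prop   # one group overall, so the props sum to 1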
   

# 1.8 Position adjustments --------------------------------------------------

# Five options: identity, stack (the default for bars), dodge and fill for bar
# charts; jitter for scatterplots.
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
  geom_bar(fill = NA, position = "identity")
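# The other adjustments named above, sketched on the same data. The names
# p_fill, p_dodge and p_jitter are made up for illustration.
library(ggplot2)
p_fill <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill")     # stacked bars rescaled to the same height
p_dodge <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(position = "dodge")    # bars placed side by side
p_jitter <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(position = "jitter") # small random noise to reduce overplotting
p_fill
p_dodge
p_jitter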
# 1.9 Coordinate systems
# coord_flip() switches the x and y axes.
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()
# coord_quickmap() sets the aspect ratio correctly for maps.
# install.packages('maps')
library(maps)
nz <- map_data("nz")

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

# coord_polar() uses polar coordinates.
bar <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    show.legend = FALSE,
    width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar
bar + coord_flip()
bar + coord_polar()
   
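# 1.10 The layered grammar of graphics
# General template: fill in these 7 parameters to draw a plot made to order.
# ggplot2's default settings are also very capable.
# ggplot(data = <DATA>) +
#   <GEOM_FUNCTION>(
#     mapping = aes(<MAPPINGS>),
#     stat = <STAT>,
#     position = <POSITION>
#   ) +
#   <COORDINATE_FUNCTION> +
#   <FACET_FUNCTION>

# Main idea: dataset -> statistical transformation -> choose a geom -> choose a
# suitable coordinate system.
# Then adjust position and facets, and add further layers; each layer uses one
# dataset, one geom, one stat and one position adjustment.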
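
# One way the template might be filled in, with all 7 slots written out. This
# is a sketch; the particular choices and the name p_template are made up for
# illustration.
library(ggplot2)
p_template <- ggplot(data = diamonds) +      # <DATA>
  geom_bar(                                  # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "fill"                        # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>
p_template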
   
Posted on 2018-9-3 22:39:07
Just finished reading Chapter 1.

Original poster | Posted on 2018-9-5 09:53:01

Some of the chapters are actually quite short; the main thing is being able to apply them to your own data.

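Posted on 2018-9-5 14:10:21
乔士达 posted on 2018-9-5 09:53
Some of the chapters are actually quite short; the main thing is being able to apply them to your own data.

I'm on Chapter 5 now; not sure whether I can finish it this week.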

Posted on 2018-9-6 17:05:03
Just dropping a like and moving on.