[R] Xiaojie Explains "R for Data Science" -- Chapter 11: Working with Factors Using forcats

Posted on 2018-11-19 11:06:19
Last edited by hijack on 2018-11-19 11:11

1. Preparation
library(tidyverse)
library(forcats)
2. Creating factors
# Create a character vector
x1 <- c("Dec", "Apr", "Jan", "Mar")
# Define the valid levels
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
# Create the factor
y1 <- factor(x1, levels = month_levels)
y1
# Values that are not in the levels vector are silently converted to NA
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)
y2
# To get a warning instead of a silent NA, use readr::parse_factor():
y2 <- parse_factor(x2, levels = month_levels)
y2
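One reason to bother with explicit levels is sorting. The following is a minimal base R sketch, reusing the x1, x2 and month_levels values from above, that compares how a character vector and a factor sort and counts the NA produced by the typo "Jam". The names sorted_chr, sorted_fct and n_missing are only illustrative.

```r
# Levels in calendar order, as in the snippet above
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")

# A character vector sorts alphabetically
sorted_chr <- sort(x1)

# A factor sorts by the position of each value in its levels
y1 <- factor(x1, levels = month_levels)
sorted_fct <- as.character(sort(y1))

# The typo "Jam" is not a valid level, so it becomes NA without any message
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)
n_missing <- sum(is.na(y2))

print(sorted_chr)
print(sorted_fct)
print(n_missing)
```

Checking sum(is.na(...)) after the conversion is a cheap way to catch typos when parse_factor() is not used.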

Level order
# Default: alphabetical order
factor(x1)
# Custom order: set it with levels
f1 <- factor(x1, levels = month_levels)
# Order of first appearance in the data
f2 <- factor(x1, levels = unique(x1))
# Inspect the levels
levels(f2)
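As an alternative to levels = unique(x1), forcats provides fct_inorder(), which reorders the levels of an existing factor by first appearance. A small sketch, assuming forcats is installed (f3 is an illustrative name):

```r
library(forcats)

x1 <- c("Dec", "Apr", "Jan", "Mar")

# Approach from the text: set the levels when the factor is created
f2 <- factor(x1, levels = unique(x1))

# Alternative: create the factor first, then reorder its levels
# by the order in which each value first appears
f3 <- fct_inorder(factor(x1))

print(levels(f2))
print(levels(f3))
```

The second form is handy when the factor already exists, for example as a column in a data frame.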
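3. Example
Dataset: forcats::gss_cat
Viewing the levels of a factor column in a data frame
# Method 1: count()
gss_cat %>%
  count(race)
# Method 2: a bar chart with geom_bar()
## A bar chart needs only one mapped column, used for the x axis, because the y axis is the count. geom_bar() is essentially a visualization of count().
ggplot(gss_cat, aes(race)) +
  geom_bar()
## Levels with no data are dropped by default; use drop = FALSE to force them to be shown:
ggplot(gss_cat, aes(race)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)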
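The count() table can also be made to show empty levels, much like drop = FALSE does for the plot. This sketch assumes a dplyr version in which count() accepts the .drop argument (0.8.0 or later). race_counts and empty_levels are illustrative names.

```r
library(dplyr)
library(forcats)

# Keep levels that have zero rows in the count table
race_counts <- gss_cat %>%
  count(race, .drop = FALSE)

# Levels that are defined on the factor but never occur in the data
empty_levels <- race_counts %>%
  filter(n == 0) %>%
  pull(race) %>%
  as.character()

print(race_counts)
print(empty_levels)
```

With .drop = FALSE the table has one row per level of race, so any level with n equal to 0 is one the bar chart would otherwise hide.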


4. Modifying factor levels
Recoding: fct_recode
# View the levels of a factor column in the data frame
gss_cat %>% count(partyid)
# Recode: new name = old name, written like an assignment
gss_cat %>%
  mutate(partyid = fct_recode(partyid,
    "Republican, strong" = "Strong republican",
    "Republican, weak" = "Not str republican",
    "Independent, near rep" = "Ind,near rep",
    "Independent, near dem" = "Ind,near dem",
    "Democrat, weak" = "Not str democrat",
    "Democrat, strong" = "Strong democrat"
  )) %>%
  count(partyid)
# mutate() overwrites the original partyid column here
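Two details of fct_recode() are easy to miss: levels that are not mentioned are left unchanged, and referring to a level that does not exist produces a warning rather than an error. The toy factor below is made up for illustration and is not from gss_cat.

```r
library(forcats)

# Toy factor; its levels are sorted alphabetically: apple, banana, bear, dear
f <- factor(c("apple", "bear", "banana", "dear"))

# Only the two mentioned levels are renamed; "bear" and "dear" stay as they are
recoded <- fct_recode(f, fruit_a = "apple", fruit_b = "banana")
print(levels(recoded))

# A misspelled old level ("aple") triggers a warning; capture its message here
msg <- tryCatch(
  fct_recode(f, fruit_a = "aple"),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

Because the misspelling only warns, it is worth reading the console output after a long recode like the partyid one above.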


Assigning several old levels to the same new level
# (see the last three arguments to fct_recode)
gss_cat %>%
  mutate(partyid = fct_recode(partyid,
    "Republican, strong" = "Strong republican",
    "Republican, weak" = "Not str republican",
    "Independent, near rep" = "Ind,near rep",
    "Independent, near dem" = "Ind,near dem",
    "Democrat, weak" = "Not str democrat",
    "Democrat, strong" = "Strong democrat",
    "Other" = "No answer",
    "Other" = "Don't know",
    "Other" = "Other party"
  )) %>%
  count(partyid)
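There seems to be no simple way to reverse this merge: once several levels share one name, the factor no longer records which original level each value came from.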
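Since the merge cannot easily be undone, one workaround is to write the merged factor to a new column instead of overwriting the original. This is only a sketch on a made-up four-row tibble. The names survey and partyid_merged are hypothetical.

```r
library(dplyr)
library(forcats)

# Small made-up data set standing in for gss_cat
survey <- tibble(
  partyid = factor(c("No answer", "Don't know", "Other party", "Independent"))
)

# Store the merged version next to the original rather than replacing it
survey <- survey %>%
  mutate(partyid_merged = fct_recode(partyid,
    "Other" = "No answer",
    "Other" = "Don't know",
    "Other" = "Other party"
  ))

# The original column still tells the merged values apart
survey %>% count(partyid, partyid_merged)
```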
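A more specialized function for merging levels is fct_collapse: for each new level, the old levels to merge are given as a character vector.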

gss_cat %>%
  mutate(partyid = fct_collapse(partyid,
    other = c("No answer", "Don't know", "Other party"),
    rep = c("Strong republican", "Not str republican"),
    ind = c("Ind,near rep", "Independent", "Ind,near dem"),
    dem = c("Not str democrat", "Strong democrat")
  )) %>%
  count(partyid)
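To see that fct_collapse() is a shorthand for repeating the same new name in fct_recode(), the sketch below applies both to a small made-up factor and compares the results. The variable names are illustrative.

```r
library(forcats)

# Made-up factor using a few of the partyid labels
f <- factor(c(
  "No answer", "Don't know", "Other party",
  "Strong democrat", "Not str democrat"
))

# One vector of old levels per new level
via_collapse <- fct_collapse(f,
  other = c("No answer", "Don't know", "Other party"),
  dem = c("Not str democrat", "Strong democrat")
)

# The same merge written out one old level at a time
via_recode <- fct_recode(f,
  other = "No answer",
  other = "Don't know",
  other = "Other party",
  dem = "Not str democrat",
  dem = "Strong democrat"
)

# Compare the resulting values and tabulate the merged factor
same_values <- identical(as.character(via_collapse), as.character(via_recode))
print(same_values)
print(table(via_collapse))
```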



