1 R绘图

1.1 斜率图与newggslopegraph

最近在制作分疾病别的患病率历年变化趋势时,看到有的研究采用多列排序图来展示,每列代表某年的疾病患病率排序,但不同列进行比较的时候只能看出排序变化而很难直观看出率的变化。

经过检索后发现斜率图可以满足前述需求,而newggslopegraph可以实现该需求,故记录该网页给出的代码,方便之后使用。

library(CGPfunctions)
df <- newgdp[16:30, ]
newggslopegraph(df, Year, GDP, Country,
                Title = "GDP evolution",
                SubTitle = "1970-1979",
                Caption = "By R CHARTS",
                XTextSize = 18,    # Size of the times
                YTextSize = 2,     # Size of the groups
                TitleTextSize = 14,
                SubTitleTextSize = 12,
                CaptionTextSize = 10,
                TitleJustify = "right",
                SubTitleJustify = "right",
                CaptionJustify = "left",
                DataTextSize = 2.5, # Size of the data
                ReverseYAxis = TRUE,
                ReverseXAxis = FALSE) # reverse axes

1.2 ggplot坐标轴刻度放缩

最近在用ggplot绘图,遇到了一个需求,就是对坐标轴进行放缩并不均等地展示刻度。

经过检索后发现简书有位大佬王诗翔科普了相关函数,可以实现该目的,故将该网页介绍的函数和给出的代码部分记录在此。

首先展示原图,很常规的散点图。

# Convert dose column dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
library(ggplot2)
# scatter plot
sp<-ggplot(cars, aes(x = speed, y = dist)) + geom_point()
sp

想要改变连续轴的范围,可以使用xlim()和ylim()函数,min和max是每个轴的最小值和最大值。

# scatter plots : change x and y limits
sp + xlim(0, 40)+ylim(0, 150)

注意,函数 expand_limits() 可以用于:快速设置在x和y轴在 (0,0) 处的截距项,改变x和y轴范围。

# set the intercept of x and y axis at (0,0)
sp + expand_limits(x=0, y=0)

# change the axis limits
sp + expand_limits(x=c(0,30), y=c(0, 150))

也可以使用函数scale_x_continuous()和scale_y_continuous()分别改变x和y轴的刻度范围(该函数还可以通过trans参数控制轴转换实现对数化)。

# Change x and y axis labels, and limits
sp + scale_x_continuous(name="Speed of cars", limits=c(0, 30)) + 
  scale_y_continuous(name="Stopping distance", limits=c(0, 150))

其他实现轴转换(如对数化和开方)的内置转换函数:

  • scale_x_log10(), scale_y_log10() : for log10 transformation
  • scale_x_sqrt(), scale_y_sqrt() : for sqrt transformation
  • scale_x_reverse(), scale_y_reverse() : to reverse coordinates
  • coord_trans(x =“log10”, y=“log10”) : possible values for x and y are “log2”, “log10”, “sqrt”, …
  • scale_x_continuous(trans=‘log2’), scale_y_continuous(trans=‘log2’) : another allowed value for the argument trans is ‘log10’

使用示例:

# Log transformation using scale_xx()
# possible values for trans : 'log2', 'log10','sqrt'
sp + scale_x_continuous(trans='log2') +
  scale_y_continuous(trans='log2')

# Sqrt transformation
sp + scale_y_sqrt()

# Reverse coordinates
sp + scale_y_reverse() 

# Possible values for x and y : "log2", "log10", "sqrt", ...
sp + coord_trans(x="log2", y="log2") 

格式化刻度标签,需要加载scales包:

# Log2 scaling of the y axis (with visually-equal spacing)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
sp + scale_y_continuous(trans = log2_trans())

# show exponents
sp + scale_y_continuous(trans = log2_trans(),
    breaks = trans_breaks("log2", function(x) 2^x),
    labels = trans_format("log2", math_format(2^.x)))

# Percent
sp + scale_y_continuous(labels = percent)

# dollar
sp + scale_y_continuous(labels = dollar)

# scientific
sp + scale_y_continuous(labels = scientific)

scales包也可以实现对数化刻度标记:

library(MASS) # to access Animals data sets
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(scales) # to access break formatting functions
# x and y axis are transformed and formatted
p2 <- ggplot(Animals, aes(x = body, y = brain)) + geom_point() +
     scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     theme_bw()
# log-log plot without log tick marks
p2

# Show log tick marks
p2 + annotation_logticks() 

设置显示的位置:

# Log ticks on left and right
p2 + annotation_logticks(sides="lr")

# All sides
p2+annotation_logticks(sides="trbl")

使用函数scale_x_date()和scale_y_date()实现日期轴格式化:

df <- data.frame(
  date = seq(Sys.Date(), len=100, by="1 day")[sample(100, 50)],
  price = runif(50)
)
df <- df[order(df$date), ]
# Plot with date
dp <- ggplot(data=df, aes(x=date, y=price)) + geom_line()
dp

library(scales)
# Format : month/day
dp + scale_x_date(labels = date_format("%m/%d")) +
  theme(axis.text.x = element_text(angle=45))

# Format : Week
dp + scale_x_date(labels = date_format("%W"))

# Months only
dp + scale_x_date(breaks = date_breaks("months"),
  labels = date_format("%b"))

设置日期轴范围:

# Plot with dates
dp <- ggplot(data=economics, aes(x=date, y=psavert)) + geom_line()
dp

# Axis limits c(min, max)
min <- as.Date("2002-1-1")
max <- max(economics$date)
dp+ scale_x_date(limits = c(min, max))
## Warning: Removed 414 rows containing missing values or values outside the scale range
## (`geom_line()`).

还可以阅读函数 scale_x_datetime()和scale_y_datetime()的说明进一步学习。

1.3 ggplot坐标轴截断

除却上方提到的ggplot坐标轴放缩,最近的另一种需求是对坐标轴部分进行截断,以使得坐标轴刻度从零开始,但又不留太多空白。发现两种解决措施,一是压缩部分刻度,一是删除部分刻度。

压缩部分刻度

###################
library(ggplot2)
library(scales)

squash_axis <- function(from, to, factor) { 
  # Args:
  #   from: left end of the axis
  #   to: right end of the axis
  #   factor: the compression factor of the range [from, to]
  trans <- function(x) {    
    # get indices for the relevant regions
    isq <-  !is.na(x) & x > from & x < to
    ito <-  !is.na(x) & x >= to
    # apply transformation
    x[isq] <- from + (x[isq] - from)/factor
    x[ito] <- from + (to - from)/factor + (x[ito] - to)
    return(x)
  }
  inv <- function(x) {
    # get indices for the relevant regions
    isq <-  !is.na(x) & x > from & x < from + (to - from)/factor
    ito <-  !is.na(x) & x >= from + (to - from)/factor
    # apply transformation
    x[isq] <- from + (x[isq] - from) * factor
    x[ito] <- to + (x[ito] - (from + (to - from)/factor))
    return(x)
  }
  # return the transformation
  return(trans_new("squash_axis", trans, inv))
}

#################
data <- data.frame(
  x <- seq(1,10,1),
  y <- seq(30,120,10)
)

ggplot(data=data)+
  geom_point(aes(x=x,y=y))+
  scale_y_continuous(limits=c(0,120),breaks=c(seq(0,120,10)))+
  coord_trans(y=squash_axis(0,30,5))+
  theme_classic()

删除部分刻度

library(ggbreak)
## ggbreak v0.1.2
## 
## If you use ggbreak in published research, please cite the following
## paper:
## 
## S Xu, M Chen, T Feng, L Zhan, L Zhou, G Yu. Use ggbreak to effectively
## utilize plotting space to deal with large datasets and outliers.
## Frontiers in Genetics. 2021, 12:774846. doi: 10.3389/fgene.2021.774846
ggplot(data=data)+
  geom_point(aes(x=x,y=y))+
  scale_y_continuous(limits=c(0,120),breaks=c(seq(0,120,10)))+
  scale_y_break(c(10,30),
                ticklabels=c(seq(30,120,10)))+
  theme_classic()+
  theme(axis.line.y.right  = element_line(colour = "white"),
        axis.text.y.right  = element_text(colour = 'white'),
        axis.ticks.y.right  = element_line(colour = 'white'))

此处遇到问题是ggrepel包和ggbreak包冲突,该问题出现在 ggrepel R 包的最新更新中(CRAN 中为 2024-09-07,Github 中为 2024-09-08,0.9.6),使用旧版本(Github中为2024-01-11,0.9.5)即可解决。

1.4 地理空间包的退役

参加中山大学传染病预测预警建模培训班,听赖颖斯教授讲解传染病的时空预测模型构建时,用到了很多地理空间相关的R软件包,安装时发现很多包已经无法下载,检索发现R-spatial相关的rgdal,rgeos和maptools包都在2023年退役了,因为这些软件包的开发者Roger Bivand已经退休,不过在历史版本的R包资源中还是可以找到并下载这些软件包的,不知道未来会有哪些包代替它们。