The code below selects the record with the highest daily death count and the record with the highest daily recovered count for the latest day in the data.
## call the dplyr library
library(dplyr)
## read the data into R
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## determine the max date contained within the data
max.date <- df[which.max(as.Date(df$day)),"day"]
## copy the data to preserve original
df1 <- df
## filter the data to only entries from the max day
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(max.date, "%Y/%m/%d"))
## determine the entry with the most deaths
max.deaths <- df1[which.max(df1$death),]
## format the number of deaths as given in the example
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine the entry with the most recovered
max.recovered <- df1[which.max(df1$recovered),]
## format the number recovered to match the format of the example
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create a data frame containing our max death and max recovered entries
max.records <- rbind(max.deaths, max.recovered)
## set the day column to the max date, which corresponds to the date of the entries selected
max.records$day <- max.date
## organize the data as shown in the example
max.records <- select(max.records, c("day","countryName","death","recovered"))
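The filter-then-`which.max` pattern used above can be tried offline on a small data frame. This is only a sketch: the `toy` data and its values are invented for illustration, and it assumes the same column names (`day`, `countryName`, `death`, `recovered`) as the downloaded report.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- data.frame(
  day = c("2020/04/01", "2020/04/01", "2020/04/02", "2020/04/02", "2020/04/02"),
  countryName = c("A", "B", "A", "B", "C"),
  death = c(5, 9, 7, 3, 1),
  recovered = c(10, 20, 2, 8, 30),
  stringsAsFactors = FALSE
)

# Latest day present in the data
latest_day <- toy$day[which.max(as.Date(toy$day, "%Y/%m/%d"))]

# Keep only the rows reported on the latest day
latest <- filter(toy, day == latest_day)

# Row with the most deaths and row with the most recovered on that day
top_deaths <- latest[which.max(latest$death), ]
top_recovered <- latest[which.max(latest$recovered), ]

# Combine both rows and keep the columns of interest
result <- select(rbind(top_deaths, top_recovered), day, countryName, death, recovered)
print(result)
```

Note that `which.max` returns only the first maximum, so ties between countries are resolved by row order.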
This second snippet computes the aggregate (total) deaths as totalDeaths and the aggregate recovered as totalRecovered for each country. It then returns the record with the maximum totalDeaths and the record with the maximum totalRecovered, labeled with the latest day in the data.
## call the dplyr library
library(dplyr)
## read the data into R
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## determine the max date contained within the data
max.date <- df[which.max(as.Date(df$day)),"day"]
## copy the data to preserve the original
df1 <- df
## group the data by countries
df1 <- group_by(df1, countryName)
## sum the death and recovered of each country
df1 <- summarise(df1, totalDeaths = sum(death), totalRecovered = sum(recovered))
## ungroup your data to avoid errors
df1 <- ungroup(df1)
## determine country with most total deaths reported
max.deaths <- df1[which.max(df1$totalDeaths),]
## format death numbers to match example
max.deaths$totalDeaths <- paste0("**",max.deaths$totalDeaths,"**")
## determine country with most total recovered reported
max.recovered <- df1[which.max(df1$totalRecovered),]
## format recovered numbers to match example
max.recovered$totalRecovered <- paste0("**",max.recovered$totalRecovered,"**")
## create a data frame containing our max entries
max.records <- rbind(max.deaths, max.recovered)
## attach a day column with the max date, which is the most current date the data reports
max.records$day <- max.date
## organize the data as shown in the example (the summarised data only has the total columns)
max.records <- select(max.records, c("day","countryName","totalDeaths","totalRecovered"))
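The per-country aggregation can also be checked on a small data frame. A minimal sketch, assuming `death` and `recovered` hold per-day counts as the description above implies; the `toy` data and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- data.frame(
  countryName = c("A", "A", "B", "B", "C"),
  death = c(5, 7, 9, 4, 1),
  recovered = c(10, 2, 20, 8, 50),
  stringsAsFactors = FALSE
)

# Sum deaths and recovered per country
totals <- toy %>%
  group_by(countryName) %>%
  summarise(totalDeaths = sum(death), totalRecovered = sum(recovered)) %>%
  ungroup()

# Country with the highest total for each measure
top_deaths <- totals[which.max(totals$totalDeaths), ]
top_recovered <- totals[which.max(totals$totalRecovered), ]

print(top_deaths)
print(top_recovered)
```

After `summarise`, only `countryName`, `totalDeaths`, and `totalRecovered` remain, so any later `select` has to use those names.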
Note: both methods rely on the dplyr package for R. dplyr can be installed by running install.packages("dplyr") in R or RStudio.
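If it is unclear whether dplyr is already installed, a quick check along these lines may help. It is a sketch only: `has_dplyr` is a name chosen here for illustration, and the code reports the situation rather than installing anything.

```r
# Check whether dplyr is available without attaching it
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

if (!has_dplyr) {
  # The package name must be quoted when passed to install.packages()
  message('dplyr is not installed; run install.packages("dplyr") first')
}
```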
Hope this helps!