ADP (R)

[ADP practical exam study log] ADP 23rd exam: COVID-19 time series data

멋쟁이천재사자 2022. 7. 14. 10:06

1. Problem

ADP 23rd exam (Source 1: https://cafe.naver.com/sqlpd/28193)

Source 2: https://www.kaggle.com/code/kukuroo3/problem-r-base?scriptVersionId=87642636

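COVID-19 time series data: about 50,000 observations and 3 variables, including date and cumulative confirmed cases
1. Compute distances using the ACF (10 points)
2. Draw a dendrogram for hierarchical cluster analysis (10 points)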

 

2. Answer

 

1. Compute distances using the ACF (10 points)

rm(list=ls())

pacman::p_load(tidyverse,magrittr,lubridate,reshape2,recipes,forecast, factoextra, dtw)
library(TSdist)

temp <- read.csv("adp23/problem3_covid.csv")

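# Reshape to one column per location, then transpose so each row is one series
temp %>%
  na.omit() %>%                       # drop rows with missing values
  group_by(location) %>%
  mutate(row = row_number()) %>%      # running index within each location
  pivot_wider(names_from = location,
              values_from = new_cases,
              values_fill = 0) %>%
  ungroup() %>%
  select(-date, -row) %>%
  t() %>%                             # TSDatabaseDistances() reads each matrix row as a series
  TSdist::TSDatabaseDistances(distance = "acf") -> dist
dist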
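
To see what shape the pipeline above hands to TSDatabaseDistances(), here is a toy version of the reshape step. The data frame toy and its values are made up, and the per-location row index is left out for brevity, so this is a simplified sketch rather than the exact exam pipeline.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data: two locations, three dates each
toy <- data.frame(
  location  = rep(c("A", "B"), each = 3),
  date      = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 2),
  new_cases = c(1, 2, 3, 10, 20, 30)
)

# Long -> wide: one column per location
wide <- toy %>%
  pivot_wider(names_from = location,
              values_from = new_cases,
              values_fill = 0)

# Drop the date column and transpose: one row per location (one series per row)
series_by_row <- wide %>%
  select(-date) %>%
  t()

series_by_row
```

The result is a 2 x 3 numeric matrix with row names A and B, which is the row-per-series layout the answer code passes on.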

 

 


2. Draw a dendrogram for hierarchical cluster analysis (10 points)

dist %>%
 hclust(method ="average")  %>%
 plot(hang=-1)

3. Study log

Compute distances using the ACF??

https://cafe.naver.com/sqlpd/28807

For time series, there seems to be a TSdist package that computes ACF-based distances: TSdist::TSDatabaseDistances(distance = "acf")..

 

https://blog.naver.com/meta_com/221571968119

I have no idea what this is saying. Too hard.

install.packages("TSdist")  # not in the ADP packages list
library(TSdist)

ACFDistance {TSdist} R Documentation
Autocorrelation-based Dissimilarity

Details
This is simply a wrapper for the diss.ACF function of package TSclust. 
As such, all the functionalities of the diss.ACF function are also available when using this function.

TSclust -- not in the ADP packages list either

 

-- Skimmed the ADP packages that looked possibly related. Found nothing usable.

bigdist
bridgedist
disttools
ecodist
emdist
epandist
FAdist
freqdist
Newdistns
NFWdist
p2distance
parallelDist
DACF
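
Since neither TSdist nor TSclust is on the ADP packages list, one fallback is to compute an ACF-based distance with base R only. The sketch below assumes, from the ACFDistance documentation quoted above and my reading of diss.ACF with its default uniform weighting, that the distance is the Euclidean distance between the estimated autocorrelation coefficients of two series. The helper name acf_dist and the lag_max default of 10 are my own choices for illustration, so the numbers need not match TSdist exactly.

```r
# Hypothetical helper: ACF-based distance using only base R (stats).
# series_matrix: numeric matrix with one time series per row, no NA values.
acf_dist <- function(series_matrix, lag_max = 10) {
  # Estimated autocorrelations at lags 1..lag_max for each row (lag 0 is dropped)
  rho <- apply(series_matrix, 1, function(x) {
    as.numeric(acf(x, lag.max = lag_max, plot = FALSE)$acf)[-1]
  })
  # apply() returns one column per series, so transpose before calling dist()
  stats::dist(t(rho), method = "euclidean")
}

# Made-up data: two white-noise series and one random walk
set.seed(1)
toy <- rbind(noise1 = rnorm(100),
             noise2 = rnorm(100),
             walk   = cumsum(rnorm(100)))

d <- acf_dist(toy, lag_max = 5)
d

# The resulting "dist" object can go straight into hclust()
plot(hclust(d, method = "average"), hang = -1)
```

Because the return value is an ordinary "dist" object, the dendrogram step from the answer works on it unchanged.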

 

https://journal.r-project.org/archive/2016-2/mori-mori-mendiburu-etal.pdf

<=== 2022-07-14

 

Studied the 슬통 YouTube video "Interview with a passer of the 23rd ADP practical exam" (https://youtu.be/eSRZrkDpAow)

https://it-freelancer.tistory.com/28

 

 

rm(list=ls())
pacman::p_load(tidyverse,magrittr,lubridate,reshape2,recipes,forecast, factoextra, dtw)

temp <- read.csv("adp23/problem3_covid.csv")

temp %>%
 group_by(location) %>%
 mutate (row =row_number()) %>%
 pivot_wider(names_from = location,
             values_from = new_cases,
     values_fill =0) %>%
 ungroup() %>%
 select (-date, -row) %>%
 t() %>%
 TSdist::TSDatabaseDistances( distance = "acf") %>% # <simpleError in .common.ts.sanity.check(x): NA in the series>
 hclust(method ="average") %>% #NA/NaN/Inf in foreign function call (arg 10)
 plot()
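
The "NA in the series" error above suggests the transposed matrix still contains missing values when it reaches TSDatabaseDistances(). A quick way to check for that, shown on a made-up matrix (the names m and m_filled are only for illustration):

```r
# Toy matrix, one series per row; the NA mimics a missing new_cases value
m <- rbind(A = c(1, 2, NA, 4),
           B = c(0, 1, 2, 3))

anyNA(m)               # TRUE: at least one series contains NA
rowSums(is.na(m))      # NA count per series

# One option (an assumption, not the only choice): replace NA with 0,
# in the same spirit as values_fill = 0
m_filled <- m
m_filled[is.na(m_filled)] <- 0
anyNA(m_filled)        # FALSE
```

The later attempt in this log removes the missing rows with na.omit() on the long data instead, before reshaping.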

<=== 2022-07-17

 

 

1. Compute distances using the ACF (10 points)

rm(list=ls())

pacman::p_load(tidyverse,magrittr,lubridate,reshape2,recipes,forecast, factoextra, dtw)
library(TSdist)

temp <- read.csv("adp23/problem3_covid.csv")

temp %>%
  na.omit() %>%
  group_by(location) %>%
  mutate (row =row_number()) %>%
  pivot_wider(names_from = location,
              values_from = new_cases,
              values_fill =0) %>%
  ungroup() %>%
  select (-date, -row) %>%
  t() %>%
  TSdist::TSDatabaseDistances( distance = "acf") -> dist
head(dist)

Error in .Primitive("[")(x, 1:6, , drop = FALSE) : 
  incorrect number of dimensions
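
The error above appears to come from calling head() on a "dist" object, which stores only the pairwise distances as a vector and is not a real matrix. A small sketch with made-up data: converting with as.matrix() first gives something head() can slice.

```r
set.seed(1)
d <- dist(matrix(rnorm(20), nrow = 5))  # toy "dist" object for 5 series

class(d)            # "dist"
length(d)           # 10 pairwise distances (5 * 4 / 2)

m <- as.matrix(d)   # full symmetric 5 x 5 matrix
head(m, 3)          # first 3 rows
```

Printing the object directly, as the final answer does with dist, also works because print() has its own method for "dist" objects.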

 


2. Draw a dendrogram for hierarchical cluster analysis (10 points)

dist %>%
 hclust(method ="average")  %>%
 plot()

 

 

 

<=== 2022-08-08

 

plot() -> plot(hang=-1)
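
For reference, hang = -1 in plot() for an hclust object makes the leaf labels line up at the bottom of the plot instead of hanging just below the height where each leaf is merged. A small side-by-side comparison on the built-in USArrests data (not the exam data):

```r
# Average-linkage clustering on the first 10 states of a built-in dataset
hc <- hclust(dist(USArrests[1:10, ]), method = "average")

op <- par(mfrow = c(1, 2))                 # two plots side by side
plot(hc, main = "default hang")            # labels hang just below each leaf's merge
plot(hc, hang = -1, main = "hang = -1")    # labels aligned along the bottom
par(op)                                    # restore the previous layout
```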

<=== 2022-09-03