I have a data frame with a number of URLs. I am writing some code to tell R to go to each URL and download it. However, I want to be a little organised, so I want to save the downloaded documents in folders according to the year each one was collected. That is, I have a column in the data named filing_date_year.
So if a URL came from the year 2003, I want to save the document in a folder named 2003. If the year was 2010, I want to save the document in a folder named 2010.
########################################################################
I have the following code:
library(purrr)
walk2(data_information_documents_toget$href.y, data_information_documents_toget$CIKAccNumFileDate_web_extension,
function(x, y) {
download.file(x, destfile = paste0("c:/USER/directory/",year_to_filter, "/", y), quiet = FALSE)
})
This takes, from the data frame named data_information_documents_toget, the URL where the document is located (href.y). I want to download that URL and save it under the unique identifier CIKAccNumFileDate_web_extension.
I am trying to add a year_to_filter condition, which would essentially be an index saying that if the URL came from a row with year 2003, the file should be saved in the 2003 folder, and so on.
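To make the intended index concrete, here is a minimal sketch of how the destination path could be built from the filing_date_year column itself rather than from a separate year_to_filter variable. The two-row data frame d_small and the name base_dir are assumptions for illustration only; the file names and years are copied from the sample data.

```r
# Hypothetical two-row stand-in for data_information_documents_toget
d_small <- data.frame(
  CIKAccNumFileDate_web_extension = c(
    "0000040704_0000950109-03-001224_2003-03-07.htm",
    "0000314808_0000950123-10-019013_2010-03-01.htm"
  ),
  filing_date_year = c(2003L, 2010L),
  stringsAsFactors = FALSE
)

# Assumed base directory, taken from the path used in the code above
base_dir <- "c:/USER/directory"

# file.path() is vectorised, so each row gets <base_dir>/<year>/<file name>
dest <- file.path(
  base_dir,
  as.character(d_small$filing_date_year),
  d_small$CIKAccNumFileDate_web_extension
)
print(dest)
```

Each element of dest ends in the year folder followed by the file name, which is the layout I am after.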
Sample data:
data_information_documents_toget <- structure(list(href.y = c("https://www.sec.gov/Archives/edgar/data/1578845/000156459019003111/agn-10k_20181231.htm",
"https://www.sec.gov/Archives/edgar/data/81033/000093041308001260/c52299_10k.htm",
"https://www.sec.gov/Archives/edgar/data/704051/000070405115000045/lm_10kx3312015.htm",
"https://www.sec.gov/Archives/edgar/data/5133/000119312513209085/d460905d10k.htm",
"https://www.sec.gov/Archives/edgar/data/915912/000095012310019013/w77522e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/823768/000095012311015242/h76657e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/12978/000104746905006771/a2153651z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/12659/000095013707009521/c16312e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/941548/000095012904001055/h13049e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1800/000104746913001180/a2212523z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/1004155/000100415506000097/form10ka.htm",
"https://www.sec.gov/Archives/edgar/data/5272/000000527215000002/maindocument001.htm",
"https://www.sec.gov/Archives/edgar/data/1308161/000156459018021493/fox-10k_20180630.htm",
"https://www.sec.gov/Archives/edgar/data/915389/000091538917000014/emn2016123110k.htm",
"https://www.sec.gov/Archives/edgar/data/1326380/000132638015000078/form10k-fy14.htm",
"https://www.sec.gov/Archives/edgar/data/85408/000095012907001047/h43875e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1224608/000122460816000053/cno1231201510-k.htm",
"https://www.sec.gov/Archives/edgar/data/836106/000089161804000704/f95884e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1040971/000110465905011116/a05-4733_110k.htm",
"https://www.sec.gov/Archives/edgar/data/909832/000119312505223245/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/723254/000110465906053974/a06-16851_110k.htm",
"https://www.sec.gov/Archives/edgar/data/1037038/000103703815000006/rl-20150328x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1113169/000095013308000389/w47962e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/808450/000119312509257118/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/909832/000119312511271844/d203874d10k.htm",
"https://www.sec.gov/Archives/edgar/data/319201/000144530511002394/klac10k2011.htm",
"https://www.sec.gov/Archives/edgar/data/915912/000091591218000004/a201710-k.htm",
"https://www.sec.gov/Archives/edgar/data/95304/000095010903001224/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/3153/000009212211000013/g24641xxe10vk.htm",
"https://www.sec.gov/Archives/edgar/data/12659/000095013706004022/c03876e10vkza.htm",
"https://www.sec.gov/Archives/edgar/data/63541/000119312506027038/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1585689/000158568914000006/a2013hwh10-k.htm",
"https://www.sec.gov/Archives/edgar/data/1099800/000104746908001956/a2183020z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/49196/000095015208001408/l29571ae10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1101215/000110121519000048/ads-20181231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1310067/000119312510055594/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1174922/000119312512195995/d340198d10ka.htm",
"https://www.sec.gov/Archives/edgar/data/69970/000095015208004633/l32075ae10vkza.htm",
"https://www.sec.gov/Archives/edgar/data/5272/000104746914001096/a2218248z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/1058090/000105809016000058/cmg-20151231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/885639/000088563913000004/kohls_10kx2012.htm",
"https://www.sec.gov/Archives/edgar/data/354964/000035496413000002/hbio12311210-k.htm",
"https://www.sec.gov/Archives/edgar/data/1075531/000110465911010302/a11-2103_110k.htm",
"https://www.sec.gov/Archives/edgar/data/54480/000119312511028728/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1004434/000104746903011288/a2106221z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/1526520/000119312514045532/d654086d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1310067/000131006715000009/shld201410k.htm",
"https://www.sec.gov/Archives/edgar/data/4962/000119312513070554/d486442d10k.htm",
"https://www.sec.gov/Archives/edgar/data/354950/000104746907002295/a2176777z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/823768/000119312516467957/d83265d10k.htm",
"https://www.sec.gov/Archives/edgar/data/50104/000095013409004250/d66470e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1437107/000095013309000442/w72867e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/791519/000104746905004527/a2152243z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/1136893/000089256908000207/a38312e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1141391/000119312511320907/d258542d10ka.htm",
"https://www.sec.gov/Archives/edgar/data/1365135/000136513518000013/wu-12312017x10k.htm",
"https://www.sec.gov/Archives/edgar/data/60667/000006066706000141/lowesform10ka02032006.htm",
"https://www.sec.gov/Archives/edgar/data/1090727/000119312512081067/d274494d10k.htm",
"https://www.sec.gov/Archives/edgar/data/80424/000095015205007351/l15436ae10vk.htm",
"https://www.sec.gov/Archives/edgar/data/108772/000010877218000012/xrx-123117x10xk.htm",
"https://www.sec.gov/Archives/edgar/data/1075531/000110465904007430/a04-3266_110k.htm",
"https://www.sec.gov/Archives/edgar/data/318154/000031815417000004/amgn-12312016x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1442145/000095012311019814/y89886e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/5513/000000551318000016/unm12312017-10xk.htm",
"https://www.sec.gov/Archives/edgar/data/1437107/000143710714000016/disca-2013123110k.htm",
"https://www.sec.gov/Archives/edgar/data/1466258/000146625819000073/ir-10kx12312018.htm",
"https://www.sec.gov/Archives/edgar/data/50104/000005010417000056/tso201610-k.htm",
"https://www.sec.gov/Archives/edgar/data/1166691/000119312506036698/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1141982/000095012311016589/h78025e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/37785/000003778517000011/fmc201610k.htm",
"https://www.sec.gov/Archives/edgar/data/1040971/000104746909005369/a2192961z10-ka.htm",
"https://www.sec.gov/Archives/edgar/data/39911/000119312509066067/d10k.htm",
"https://www.sec.gov/Archives/edgar/data/1045810/000104581018000010/nvda-2018x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1370946/000137094617000006/oc-20161231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/936340/000095012405001542/k91838e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/316709/000031670916000067/schw-20151231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/25445/000144530514000574/cr-20131231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1336917/000133691718000009/ua-20171231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/6281/000095013507007253/b67578ade10vk.htm",
"https://www.sec.gov/Archives/edgar/data/879169/000110465907015059/a07-5374_110k.htm",
"https://www.sec.gov/Archives/edgar/data/1039684/000103968412000027/form_10-k.htm",
"https://www.sec.gov/Archives/edgar/data/31235/000003123511000025/ek2010_10k.htm",
"https://www.sec.gov/Archives/edgar/data/1004434/000104746909002123/a2190957z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/818479/000081847909000034/q40810k.htm",
"https://www.sec.gov/Archives/edgar/data/1121788/000161577419002739/s116041_10k.htm",
"https://www.sec.gov/Archives/edgar/data/766704/000095015209002082/l35635ae10vk.htm",
"https://www.sec.gov/Archives/edgar/data/29534/000104746913003283/a2213303z10-k.htm",
"https://www.sec.gov/Archives/edgar/data/865436/000086543614000161/wfm10k2014.htm",
"https://www.sec.gov/Archives/edgar/data/5272/000110465912013132/a11-32502_410ka.htm",
"https://www.sec.gov/Archives/edgar/data/931336/000095013403009830/d06474a1e10vkza.htm",
"https://www.sec.gov/Archives/edgar/data/1037646/000095012311014519/l41517e10vk.htm",
"https://www.sec.gov/Archives/edgar/data/1020569/000110465906017231/a06-2602_110k.htm",
"https://www.sec.gov/Archives/edgar/data/1496048/000149604817000018/ggp12311610k.htm",
"https://www.sec.gov/Archives/edgar/data/1169055/000162828018002128/noblecorpplc-201710xk.htm",
"https://www.sec.gov/Archives/edgar/data/920760/000162828018000562/len-20171130x10k.htm",
"https://www.sec.gov/Archives/edgar/data/28917/000002891718000159/dds-02032018x10k.htm",
"https://www.sec.gov/Archives/edgar/data/875320/000087532019000006/a201810k-main.htm",
"https://www.sec.gov/Archives/edgar/data/1359841/000135984117000040/hbi-20161231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/20520/000002052015000011/ftr-20141231x10k.htm",
"https://www.sec.gov/Archives/edgar/data/1495569/000119312511040013/d10k.htm"
), CIKAccNumFileDate_web_extension = c("0000054480_0001564590-19-003111_2019-02-15.htm",
"0000788784_0000930413-08-001260_2008-02-28.htm", "0001000180_0000704051-15-000045_2015-05-22.htm",
"0001094093_0001193125-13-209085_2013-05-09.htm", "0000314808_0000950123-10-019013_2010-03-01.htm",
"0000029534_0000950123-11-015242_2011-02-17.htm", "0001585689_0001047469-05-006771_2005-03-16.htm",
"0000028917_0000950137-07-009521_2007-06-29.htm", "0000721683_0000950129-04-001055_2004-03-08.htm",
"0000001800_0001047469-13-001180_2013-02-15.htm", "0001141982_0001004155-06-000097_2006-06-01.htm",
"0001115222_0000005272-15-000002_2015-02-20.htm", "0001272547_0001564590-18-021493_2018-08-13.htm",
"0001166691_0000915389-17-000014_2017-02-27.htm", "0001053507_0001326380-15-000078_2015-03-30.htm",
"0000095521_0000950129-07-001047_2007-02-28.htm", "0000785161_0001224608-16-000053_2016-02-19.htm",
"0000819692_0000891618-04-000704_2004-03-12.htm", "0000006201_0001104659-05-011116_2005-03-15.htm",
"0000860730_0001193125-05-223245_2005-11-10.htm", "0000020520_0001104659-06-053974_2006-08-11.htm",
"0000915912_0001037038-15-000006_2015-05-15.htm", "0000006281_0000950133-08-000389_2008-02-07.htm",
"0000063541_0001193125-09-257118_2009-12-21.htm", "0000860730_0001193125-11-271844_2011-10-14.htm",
"0001400891_0001445305-11-002394_2011-08-05.htm", "0000314808_0000915912-18-000004_2018-02-23.htm",
"0000040704_0000950109-03-001224_2003-03-07.htm", "0000092122_0000092122-11-000013_2011-02-25.htm",
"0000028917_0000950137-06-004022_2006-03-31.htm", "0000026780_0001193125-06-027038_2006-02-10.htm",
"0001598014_0001585689-14-000006_2014-02-27.htm", "0001385187_0001047469-08-001956_2008-02-29.htm",
"0000812074_0000950152-08-001408_2008-02-26.htm", "0000851968_0001101215-19-000048_2019-02-26.htm",
"0001310067_0001193125-10-055594_2010-03-12.htm", "0000818479_0001193125-12-195995_2012-04-30.htm",
"0000883980_0000950152-08-004633_2008-06-16.htm", "0001115222_0001047469-14-001096_2014-02-20.htm",
"0001364742_0001058090-16-000058_2016-02-05.htm", "0001007456_0000885639-13-000004_2013-03-22.htm",
"0000006201_0000354964-13-000002_2013-03-04.htm", "0001274494_0001104659-11-010302_2011-02-25.htm",
"0000018926_0001193125-11-028728_2011-02-09.htm", "0001168054_0001047469-03-011288_2003-03-31.htm",
"0000935703_0001193125-14-045532_2014-02-11.htm", "0001310067_0001310067-15-000009_2015-03-17.htm",
"0001122304_0001193125-13-070554_2013-02-22.htm", "0000714154_0001047469-07-002295_2007-03-29.htm",
"0000029534_0001193125-16-467957_2016-02-18.htm", "0001571949_0000950134-09-004250_2009-03-02.htm",
"0000046765_0000950133-09-000442_2009-02-26.htm", "0000875570_0001047469-05-004527_2005-02-24.htm",
"0000816284_0000892569-08-000207_2008-02-29.htm", "0001430602_0001193125-11-320907_2011-11-23.htm",
"0001156375_0001365135-18-000013_2018-02-22.htm", "0001037949_0000060667-06-000141_2006-09-29.htm",
"0000352510_0001193125-12-081067_2012-02-27.htm", "0000080424_0000950152-05-007351_2005-08-29.htm",
"0000108772_0000108772-18-000012_2018-02-23.htm", "0001274494_0001104659-04-007430_2004-03-15.htm",
"0000043362_0000318154-17-000004_2017-02-14.htm", "0001166691_0000950123-11-019814_2011-02-28.htm",
"0000091576_0000005513-18-000016_2018-02-21.htm", "0000916076_0001437107-14-000016_2014-02-20.htm",
"0000896159_0001466258-19-000073_2019-02-12.htm", "0001571949_0000050104-17-000056_2017-02-21.htm",
"0001275283_0001193125-06-036698_2006-02-22.htm", "0001466258_0000950123-11-016589_2011-02-22.htm",
"0001087423_0000037785-17-000011_2017-02-28.htm", "0000006201_0001047469-09-005369_2009-05-11.htm",
"0000053117_0001193125-09-066067_2009-03-27.htm", "0000792985_0001045810-18-000010_2018-02-28.htm",
"0001370946_0001370946-17-000006_2017-02-08.htm", "0000936340_0000950124-05-001542_2005-03-15.htm",
"0000721371_0000316709-16-000067_2016-02-24.htm", "0000107681_0001445305-14-000574_2014-02-25.htm",
"0000850209_0001336917-18-000009_2018-02-28.htm", "0000764622_0000950135-07-007253_2007-11-30.htm",
"0001681459_0001104659-07-015059_2007-02-28.htm", "0001039684_0001039684-12-000027_2012-02-21.htm",
"0000934612_0000031235-11-000025_2011-02-25.htm", "0001168054_0001047469-09-002123_2009-03-02.htm",
"0001378946_0000818479-09-000034_2009-02-20.htm", "0000029534_0001615774-19-002739_2019-02-20.htm",
"0001020569_0000950152-09-002082_2009-03-02.htm", "0001593538_0001047469-13-003283_2013-03-25.htm",
"0001339947_0000865436-14-000161_2014-11-21.htm", "0001115222_0001104659-12-013132_2012-02-27.htm",
"0001652044_0000950134-03-009830_2003-07-03.htm", "0001659166_0000950123-11-014519_2011-02-16.htm",
"0000812074_0001104659-06-017231_2006-03-16.htm", "0001393612_0001496048-17-000018_2017-02-22.htm",
"0000711065_0001628280-18-002128_2018-02-23.htm", "0000820027_0001628280-18-000562_2018-01-25.htm",
"0001613103_0000028917-18-000159_2018-03-30.htm", "0001037868_0000875320-19-000006_2019-02-13.htm",
"0001101239_0001359841-17-000040_2017-02-03.htm", "0001017008_0000020520-15-000011_2015-02-25.htm",
"0001702780_0001193125-11-040013_2011-02-18.htm"), name = c("KANSAS CITY SOUTHERN",
"PUBLIC SERVICE ENTERPRISE GROUP INC", "SANDISK CORP", "PROGRESS ENERGY INC",
"Ensco plc", "DOLLAR GENERAL CORP", "Hilton Worldwide Holdings Inc.",
"DILLARD'S, INC.", "TOTAL SYSTEM SERVICES INC", "ABBOTT LABORATORIES",
"Cooper Industries plc", "DUN & BRADSTREET CORP/NW", "FREESCALE SEMICONDUCTOR INC",
"COMCAST CORP", "AMERICAN TOWER CORP /MA/", "SUPERVALU INC",
"Encompass Health Corp", "CHARTER ONE FINANCIAL INC", "American Airlines Group Inc.",
"HCA Healthcare, Inc.", "FRONTIER COMMUNICATIONS CORP", "AVALONBAY COMMUNITIES INC",
"ANALOG DEVICES INC", "MAYTAG CORP", "HCA Healthcare, Inc.",
"iHeartMedia, Inc.", "Ensco plc", "GENERAL MILLS INC", "SOUTHERN CO",
"DILLARD'S, INC.", "DANA INC", "IHS Markit Ltd.", "Covidien plc",
"OWENS ILLINOIS INC /DE/", "MOHAWK INDUSTRIES INC", "SEARS HOLDINGS CORP",
"DENTSPLY SIRONA Inc.", "FIRST DATA CORP", "DUN & BRADSTREET CORP/NW",
"BlackRock Inc.", "ELECTRONIC DATA SYSTEMS CORP /DE/", "American Airlines Group Inc.",
"FIRST SOLAR, INC.", "CENTURYLINK, INC", "CIMAREX ENERGY CO",
"DOLLAR TREE INC", "SEARS HOLDINGS CORP", "AETNA INC /PA/", "COMPAQ COMPUTER CORP",
"DOLLAR GENERAL CORP", "Intercontinental Exchange, Inc.", "Helmerich & Payne, Inc.",
"PEOPLESOFT INC", "CELGENE CORP /DE/", "Scripps Networks Interactive, Inc.",
"CME GROUP INC.", "QWEST COMMUNICATIONS INTERNATIONAL INC", "NORTH FORK BANCORPORATION INC",
"PROCTER & GAMBLE Co", "XEROX CORP", "FIRST SOLAR, INC.", "GREAT LAKES CHEMICAL CORP",
"COMCAST CORP", "KEYCORP /NEW/", "MARTIN MARIETTA MATERIALS INC",
"Chubb Ltd", "Intercontinental Exchange, Inc.", "REYNOLDS AMERICAN INC",
"Ingersoll-Rand plc", "RED HAT INC", "American Airlines Group Inc.",
"FORT JAMES CORP", "HEALTH MANAGEMENT ASSOCIATES, INC", "Owens Corning",
"DTE ENERGY CO", "CARDINAL HEALTH INC", "WINN DIXIE STORES INC",
"FOOT LOCKER, INC.", "PINNACLE WEST CAPITAL CORP", "TechnipFMC plc",
"ONEOK INC /NEW/", "BURLINGTON NORTHERN SANTA FE, LLC", "CIMAREX ENERGY CO",
"People's United Financial, Inc.", "DOLLAR GENERAL CORP", "IRON MOUNTAIN INC",
"NAVIENT CORP", "Viacom Inc.", "DUN & BRADSTREET CORP/NW", "Alphabet Inc.",
"Fortive Corp", "OWENS ILLINOIS INC /DE/", "Discover Financial Services",
"APPLIED MICRO CIRCUITS CORP", "AMERIPRISE FINANCIAL INC", "Medtronic plc",
"AMETEK INC/", "EQUINIX INC", "UNIVISION COMMUNICATIONS INC",
"Altice USA, Inc."), filing_date_year = c(2019L, 2008L, 2015L,
2013L, 2010L, 2011L, 2005L, 2007L, 2004L, 2013L, 2006L, 2015L,
2018L, 2017L, 2015L, 2007L, 2016L, 2004L, 2005L, 2005L, 2006L,
2015L, 2008L, 2009L, 2011L, 2011L, 2018L, 2003L, 2011L, 2006L,
2006L, 2014L, 2008L, 2008L, 2019L, 2010L, 2012L, 2008L, 2014L,
2016L, 2013L, 2013L, 2011L, 2011L, 2003L, 2014L, 2015L, 2013L,
2007L, 2016L, 2009L, 2009L, 2005L, 2008L, 2011L, 2018L, 2006L,
2012L, 2005L, 2018L, 2004L, 2017L, 2011L, 2018L, 2014L, 2019L,
2017L, 2006L, 2011L, 2017L, 2009L, 2009L, 2018L, 2017L, 2005L,
2016L, 2014L, 2018L, 2007L, 2007L, 2012L, 2011L, 2009L, 2009L,
2019L, 2009L, 2013L, 2014L, 2012L, 2003L, 2011L, 2006L, 2017L,
2018L, 2018L, 2018L, 2019L, 2017L, 2015L, 2011L)), row.names = c(NA,
-100L), class = "data.frame")
EDIT:
If the data is called d and the directory is D:/SPY_data/, then the following code starts downloading the data.
library(purrr)
walk2(d$href.y, d$CIKAccNumFileDate_web_extension,
function(x, y) {
download.file(x, destfile = paste0("D:/SPY_data/", y), quiet = FALSE)
})
This downloads all the files into a single folder; however, I would like the files to go into separate folders by year.
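A possible way to get one folder per year is sketched below, assuming the data frame has the columns href.y, CIKAccNumFileDate_web_extension and filing_date_year as in the sample data. The function name download_by_year and its downloader argument are hypothetical; downloader defaults to download.file and exists only so the function can be tried without network access. This is a sketch under those assumptions, not a tested solution for the full data set.

```r
library(purrr)

# Hypothetical helper: saves each URL to <base_dir>/<year>/<file name>.
# `downloader` is an illustrative parameter; by default it is download.file.
download_by_year <- function(d, base_dir, downloader = download.file) {
  # One folder path per row, derived from the filing_date_year column
  year_dirs <- file.path(base_dir, as.character(d$filing_date_year))

  # Create each year folder once; folders that already exist are left alone
  walk(unique(year_dirs), function(p) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  })

  # Full destination path for every document
  dest <- file.path(year_dirs, d$CIKAccNumFileDate_web_extension)

  # Download each URL to its own destination
  walk2(d$href.y, dest, function(x, y) {
    downloader(x, destfile = y, quiet = FALSE)
  })

  invisible(dest)
}

# Intended call with the data and directory from the edit above:
# download_by_year(d, "D:/SPY_data")
```

The year folders have to exist before download.file writes into them, which is why dir.create is called first.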