What you want, essentially, is a full outer join of the two datasets, using the date as the key.
Consider the following example:
%# vector of dates (serial datetime)
days = datenum( num2str((1:31)','2011-10-%02d') ); %'# one month (October 2011)
%# let's build two datasets similar to what you described
idx1 = rand(size(days)) > 0.2; %# randomly pick dates for 1st
M1 = [days(idx1) rand(sum(idx1),2)*1000]; %# stocks: days,opening,closing
idx2 = rand(size(days)) > 0.5; %# randomly pick dates for 2nd
M2 = [days(idx2) rand(sum(idx2),2)*1000]; %# bonds: days,opening,closing
%# get the full range of dates, and convert them to indices starting at 1
[allDays,~,ind] = unique( [M1(:,1);M2(:,1)] );
indM1 = ind(1:size(M1,1));
indM2 = ind(size(M1,1)+1:end);
%# merge the two datasets (days,opening,closing,opening,closing)
M = nan(numel(allDays),size(M1,2)+size(M2,2)-1);
M(:,1) = allDays; %# available days from both
M(indM1,2:3) = M1(:,2:3); %# insert 1st dataset values
M(indM2,4:5) = M2(:,2:3); %# insert 2nd dataset values
%# final merged dataset formatted
C = [cellstr(datestr(M(:,1),'yyyy-mm-dd')) num2cell(M(:,2:end))]
The result:
C =
'2011-10-01' [ NaN] [ NaN] [332.5714] [241.5017]
'2011-10-03' [941.9189] [ 86.8151] [ NaN] [ NaN]
'2011-10-04' [655.9138] [429.3973] [ NaN] [ NaN]
'2011-10-05' [451.9457] [257.2828] [853.0636] [243.1452]
'2011-10-06' [839.6974] [297.5554] [ NaN] [ NaN]
'2011-10-07' [532.6235] [424.8584] [ NaN] [ NaN]
'2011-10-09' [553.8871] [119.2073] [ NaN] [ NaN]
'2011-10-11' [680.0655] [495.0669] [442.3979] [154.1594]
'2011-10-13' [367.1899] [706.4072] [904.3555] [956.4164]
'2011-10-14' [ NaN] [ NaN] [ 33.1794] [935.6614]
'2011-10-15' [239.2906] [243.5734] [ NaN] [ NaN]
'2011-10-16' [578.9235] [785.0701] [532.4265] [818.7144]
'2011-10-17' [866.8871] [ 74.0896] [716.4973] [728.2618]
'2011-10-18' [406.7768] [393.8834] [179.3018] [175.8117]
'2011-10-19' [112.6151] [ 3.3941] [336.5329] [360.3710]
'2011-10-20' [443.8458] [220.6769] [ NaN] [ NaN]
'2011-10-21' [ NaN] [ NaN] [187.7129] [188.7900]
'2011-10-22' [300.1844] [ 1.3006] [ NaN] [ NaN]
'2011-10-23' [401.3869] [189.1797] [ NaN] [ NaN]
'2011-10-24' [833.3636] [142.4841] [321.9272] [ 1.1984]
'2011-10-25' [ NaN] [ NaN] [403.8567] [316.4195]
'2011-10-26' [403.6287] [268.0760] [ NaN] [ NaN]
'2011-10-27' [390.1759] [174.8921] [ NaN] [ NaN]
'2011-10-28' [ NaN] [ NaN] [548.5663] [699.6170]
'2011-10-29' [360.4489] [138.6490] [ 48.7386] [625.2552]
'2011-10-30' [140.2554] [598.8856] [552.7321] [543.0622]
'2011-10-31' [260.1302] [901.0579] [274.8114] [439.0372]
The merged result contains the opening/closing prices from both datasets. When one of them is not available for a particular date, it is replaced with NaN.
Note that a few days are missing from the result entirely; that is because neither dataset had prices for those days.
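To see how the index mapping works, here is a minimal sketch on two tiny made-up matrices (the names A and B and their values are illustrative only, not taken from the data above). The third output of unique gives, for every key in the stacked key column, its row in the sorted list of distinct keys, which is exactly the row of the merged matrix where that record's values belong.

```matlab
% two toy datasets: first column is the key, second a value (made-up numbers)
A = [1 10; 3 30];
B = [2 200; 3 300];

% distinct sorted keys, plus the position of every stacked key in that list
[keys,~,ind] = unique([A(:,1); B(:,1)]);
indA = ind(1:size(A,1));        % rows of M that receive A's values
indB = ind(size(A,1)+1:end);    % rows of M that receive B's values

% start from all-NaN so that keys missing from one side stay NaN
M = nan(numel(keys), 3);
M(:,1) = keys;
M(indA,2) = A(:,2);
M(indB,3) = B(:,2);
disp(M)
```

Key 1 exists only in A and key 2 only in B, so the other side stays NaN; key 3 exists in both and gets both values. This sketch assumes the keys are unique within each dataset; with repeated keys, later rows would overwrite earlier ones.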
Alternatively, you can use the dataset class from the Statistics Toolbox (which is designed for cases like this). Using the same example:
%# build dataset object for the two sets
varNames1 = {'days' 'stock_open' 'stock_close'};
varNames2 = {'days' 'bond_open' 'bond_close'};
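d1 = dataset([{M1} varNames1]); %# same as dataset({M1,'days','stock_open','stock_close'})
d2 = dataset([{M2} varNames2]); %# same as dataset({M2,'days','bond_open','bond_close'})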
%# join on days (full-outer join)
d = join(d1,d2, 'keys','days', 'type','fullouter', 'MergeKeys',true);
d.days = datestr(d.days,'yyyy-mm-dd'); %# format the days column as string
The result:
d =
days stock_open stock_close bond_open bond_close
2011-10-01 NaN NaN 332.57 241.5
2011-10-03 941.92 86.815 NaN NaN
2011-10-04 655.91 429.4 NaN NaN
2011-10-05 451.95 257.28 853.06 243.15
2011-10-06 839.7 297.56 NaN NaN
2011-10-07 532.62 424.86 NaN NaN
2011-10-09 553.89 119.21 NaN NaN
2011-10-11 680.07 495.07 442.4 154.16
2011-10-13 367.19 706.41 904.36 956.42
2011-10-14 NaN NaN 33.179 935.66
2011-10-15 239.29 243.57 NaN NaN
2011-10-16 578.92 785.07 532.43 818.71
2011-10-17 866.89 74.09 716.5 728.26
2011-10-18 406.78 393.88 179.3 175.81
2011-10-19 112.62 3.3941 336.53 360.37
2011-10-20 443.85 220.68 NaN NaN
2011-10-21 NaN NaN 187.71 188.79
2011-10-22 300.18 1.3006 NaN NaN
2011-10-23 401.39 189.18 NaN NaN
2011-10-24 833.36 142.48 321.93 1.1984
2011-10-25 NaN NaN 403.86 316.42
2011-10-26 403.63 268.08 NaN NaN
2011-10-27 390.18 174.89 NaN NaN
2011-10-28 NaN NaN 548.57 699.62
2011-10-29 360.45 138.65 48.739 625.26
2011-10-30 140.26 598.89 552.73 543.06
2011-10-31 260.13 901.06 274.81 439.04
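In newer MATLAB releases (R2013b and later), the table type in base MATLAB offers the same kind of join without the Statistics Toolbox. The following is a minimal sketch of that alternative rather than part of the approach above; it uses small made-up matrices in place of M1 and M2 so that it runs on its own.

```matlab
% toy stand-ins for M1 and M2: serial date, opening, closing (made-up prices)
d0 = datenum(2011,10,1);
M1 = [d0 10 11; d0+2 12 13];
M2 = [d0+1 20 21; d0+2 22 23];

% wrap the matrices in tables with named columns
t1 = array2table(M1, 'VariableNames',{'days','stock_open','stock_close'});
t2 = array2table(M2, 'VariableNames',{'days','bond_open','bond_close'});

% full outer join on the date; MergeKeys keeps a single 'days' column
t = outerjoin(t1, t2, 'Keys','days', 'MergeKeys',true);

% show the dates as text; rows absent from one side hold NaN
t.days = cellstr(datestr(t.days,'yyyy-mm-dd'));
disp(t)
```

As with the dataset version, a date that appears in only one input produces NaN in the other input's columns.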
EDIT:
Say you had the following two files containing the data:
bonds.csv
10/6/1977 7.72 7.72
10/7/1977 7.73 7.73
10/11/1977 7.77 7.77
10/12/1977 7.79 7.79
10/13/1977 7.79 7.79
10/14/1977 7.79 7.79
10/17/1977 7.79 7.79
10/18/1977 7.8 7.8
stocks.csv
10/06/77 95.68 96.05
10/07/77 96.05 95.97
10/10/77 95.97 95.75
10/11/77 95.75 94.93
10/12/77 94.82 94.04
10/13/77 94.04 93.46
10/14/77 93.46 93.56
10/17/77 93.56 93.47
You can read the data using the TEXTSCAN function:
%# read bonds data
fid = fopen('bonds.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M1 = [datenum(C{1},'mm/dd/yyyy') C{2}];
%# read stocks data
fid = fopen('stocks.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M2 = [datenum(C{1},'mm/dd/yy') C{2}];
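Now you can use the same code as above (starting at "get the full range of dates ..."), or use the DATASET class. Note that here M1 holds the bonds and M2 the stocks, so the columns come out as bond open/close followed by stock open/close. After joining them, this gives me: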
C =
'1977-10-06' [7.72] [7.72] [95.68] [96.05]
'1977-10-07' [7.73] [7.73] [96.05] [95.97]
'1977-10-10' [ NaN] [ NaN] [95.97] [95.75]
'1977-10-11' [7.77] [7.77] [95.75] [94.93]
'1977-10-12' [7.79] [7.79] [94.82] [94.04]
'1977-10-13' [7.79] [7.79] [94.04] [93.46]
'1977-10-14' [7.79] [7.79] [93.46] [93.56]
'1977-10-17' [7.79] [7.79] [93.56] [93.47]
'1977-10-18' [ 7.8] [ 7.8] [ NaN] [ NaN]
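One caveat about the two-digit years in stocks.csv: by default datenum resolves them against a pivot year that depends on the current date (the current year minus 50), so '77' will not always be read as 1977. A small sketch, assuming a pivot of 1950 suits your data, passes the pivot year explicitly:

```matlab
% two-digit years are mapped into the 100-year window starting at the pivot
pivot = 1950;                                  % assumed value; covers 1950-2049
d2 = datenum('10/06/77', 'mm/dd/yy', pivot);   % stocks.csv style date
d4 = datenum('10/6/1977', 'mm/dd/yyyy');       % bonds.csv style date

% both spellings should now give the same serial date
disp(datestr([d2; d4], 'yyyy-mm-dd'))
```

With the pivot fixed, the two files keep producing matching keys for the join regardless of when the script is run.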