Как найти совпадения между CSV-файлами на основе двух столбцов в диапазоне - PullRequest
0 голосов
/ 15 марта 2019

В настоящее время я пытаюсь собрать некоторый код, который найдет совпадения значений в двух разных столбцах в двух csv файлах в пределах диапазона. Я попытался использовать приведенный ниже код, но он не выдает того, что я пытаюсь выполнить. По сути, я хочу вывести новый файл, который содержит все строки во втором файле, которые совпадают с одинаковыми столбцами в первом файле, а не объединять их вместе. Я добавил более подробные пояснения ниже моего кода. Я чувствую, что то, что я сделал до сих пор, возможно, совершенно неправильно. Что мне нужно изменить, чтобы мой код давал результаты, которые я ищу?

import csv

with open('F435W.csv') as csvF435:
    readCSV1 = csv.reader(csvF435, delimiter=',')
    with open("F550Mnew.csv", "w") as new_F550M:
        pass
    with open("F550Mnew.csv", "a") as new_F550M:
        for header in readCSV1:
            new_F550M.write(','.join(header)+'\n')
            break
        for l435 in readCSV1:
            with open('F550M.csv') as csvF550:
                readCSV2 = csv.reader(csvF550, delimiter=',')
                for l550 in readCSV2:
                    if isfloat(l435[12]) and isfloat(l550[12]) and abs(float(l435[12])-float(l550[12])) < 0.002778:
                        if isfloat(l435[13]) and isfloat(l550[13]) and abs(float(l435[13])-float(l550[13])) < 0.002778:
                            new_F550M.write(','.join(l550)+'\n')

Для пояснения, каждый файл имеет столбец X и столбец Y, поэтому в основном каждая строка соответствует точке (X,Y). Кроме того, есть еще 21 столбец данных, которые не нужны для поиска совпадений, но должны быть включены в окончательный выходной файл. Я пытаюсь найти точки во втором файле, которые соответствуют точкам в первом файле в радиусе. Это потому, что я знаю, что ни одно из моих очков не будет точным совпадением. По моим данным, мой X - это столбец 13, а мой Y - это столбец 14.

Способ, которым я пытался это сделать, - найти различия между каждым X в первом файле и каждым X во втором файле (например, X1-X2) и различиями между каждым Y в первом файле и каждый Y во втором файле (например, Y1-Y2). Тогда каждая строка во втором файле, которая соответствует разностям для X и Y, которые меньше моего значения радиуса (0,0002778), будет считаться соответствием первому файлу.

К сожалению, мой код создает файл с более чем 300 000 баллов, когда мои исходные файлы имеют только 7 000 баллов. Там должно быть меньше данных, а не больше данных. Он также включает в себя множество повторений данных, когда вообще не должно быть повторений. Спасибо за ваше время!

Пример того, как выглядят данные: я прошу прощения за длину, но я боюсь, что они не будут содержать достаточно совпадений, чтобы быть полезными, если я не включу достаточно данных.

F435W.csv (файл 1)

1,2017.013,0.01242859,-8.2618,0,51434.12,0.3269918,-11.7781,0,0.01957931,1387.9406,541.916,49.9898514,41.5266996,8.81E+01,1.63E+03,1.44E+02,40.535,8.65,84.72,0.00061,0.00035,62.14
2,84.73392,0.01245409,-4.8201,0.0002,112.9723,0.04012135,-5.1324,0.0004,-0.002142646,150.306,146.7986,49.9942613,41.5444392,4.92E+00,5.60E+00,-2.02E-01,2.379,2.206,-74.69,0.00339,0.0029,88.88
3,215.1939,0.01242859,-5.8321,0.0001,262.2751,0.03840466,-6.0469,0.0002,-0.002961465,3248.686,52.8478,50.003155,41.5019044,4.77E+00,5.05E+00,-1.63E-01,2.263,2.166,-65.29,0.002,0.0019,-66.78
4,0.3796681,0.01240305,1.0515,0.0355,0.5823653,0.05487975,0.587,0.1023,-0.00425157,3760.344,11.113,50.0051049,41.4949256,1.93E+00,1.02E+00,-7.42E-02,1.393,1.007,-4.61,0.05461,0.03818,-6.68
5,0.9584663,0.01249223,0.0461,0.0142,1.043696,0.0175857,-0.0464,0.0183,-0.004156116,4013.2063,9.1225,50.0057256,41.4914444,1.12E+00,9.75E-01,1.09E-01,1.085,0.957,28.34,0.01934,0.01745,44.01
6,2.379565,0.01249223,-0.9412,0.0057,0.231205,0.02710035,1.59,0.1273,-0.004135321,3824.3706,9.0756,50.0052903,41.4940468,7.81E-01,6.99E-02,4.27E-02,0.885,0.26,3.42,0.01265,0.00622,15.52
7,0.3171223,0.01250492,1.2469,0.0428,0.5233852,0.05406558,0.7029,0.1122,-0.00399635,4097.3604,7.0301,50.0059585,41.4902884,9.61E-01,1.63E+00,-3.94E-01,1.346,0.883,-65.16,0.06171,0.04005,-65.05
8,0.289245,0.0125176,1.3468,0.047,0.2744479,0.02238134,1.4039,0.0886,-0.004173243,3904.7402,7.3912,50.0055069,41.4929422,7.90E-01,2.38E-01,7.13E-02,0.894,0.479,7.24,0.04501,0.02071,8.29
9,0.3543034,0.01247953,1.1266,0.0383,0.7666836,0.06376094,0.2885,0.0903,-0.004009248,4107.0684,3.259,50.0060503,41.4901611,3.53E+00,1.28E+00,-4.60E-01,1.903,1.09,-11.12,0.06873,0.03955,-11.22
10,1.308331,0.01250492,-0.2918,0.0104,-0.005209296,0.004877397,99,99,-0.004193406,3933.9834,6,50.0056001,41.4925416,5.78E-01,8.33E-02,0.00E+00,0.76,0.289,0,0.01272,0.00424,0
11,3.995717,0.01250492,-1.504,0.0034,0.1589517,0.007450347,1.9968,0.0509,-0.003990021,4069.0469,3.0234,50.0059668,41.4906855,8.03E-01,2.29E-02,1.02E-02,0.896,0.151,0.75,0.00888,0.00361,5.59
12,1.067634,0.01250492,-0.0711,0.0127,0.1260926,0.02787585,2.2483,0.2401,-0.004042602,4048.9148,4,50.0059023,41.4909612,7.40E-01,8.33E-02,0.00E+00,0.86,0.289,0,0.02449,0.00576,0
13,0.2808423,0.01162418,1.3788,0.0449,0.4633991,0.02235104,0.8351,0.0524,-0.004015559,4114.6655,2.0641,50.0060898,41.4900585,9.65E-01,5.88E-01,-9.47E-02,0.994,0.752,-13.34,0.05405,0.03814,-15.13
14,1.067291,0.01245409,-0.0707,0.0127,1.081617,0.01516444,-0.0852,0.0152,-0.004168633,3960.8787,18.0524,50.0054405,41.4921501,6.84E-01,8.29E-01,-6.18E-02,0.923,0.813,-69.77,0.01468,0.01229,-78.83
15,0.5216251,0.0125176,0.7066,0.0261,0.584776,0.01824955,0.5825,0.0339,-0.003026338,2661.6533,58.4563,50.0016952,41.5099844,8.51E-01,1.17E+00,-7.27E-02,1.089,0.914,-77.72,0.03244,0.02498,-81.68
16,0.6062042,0.01249223,0.5435,0.0224,0.8726375,0.05509822,0.1479,0.0686,-0.003950399,4149.8169,31.0127,50.0056384,41.489524,9.30E-01,3.48E+00,2.03E-01,1.87,0.956,85.48,0.05307,0.0241,86.01
17,0.1324067,0.01242859,2.1952,0.1019,0.1208224,0.01290438,2.2946,0.116,-0.004166729,3911.6807,12.661,50.005426,41.4928374,2.17E-01,2.24E-01,-1.08E-01,0.574,0.335,-45.89,0.0721,0.04162,-44.98
18,0.2136006,0.01247953,1.676,0.0634,0.3511444,0.02471001,1.1363,0.0764,-0.003978713,4096.9111,15.6285,50.0057993,41.4902797,1.00E+00,4.37E-01,2.85E-01,1.058,0.564,22.64,0.07548,0.03957,23.17
19,0.1470979,0.01244135,2.081,0.0919,0.1216703,0.0168958,2.287,0.1508,-0.004147241,3695.311,13.7044,50.004907,41.4958173,2.14E-01,2.08E-01,9.20E-02,0.551,0.345,44.05,0.07073,0.04115,45.12
20,0.5434682,0.01250492,0.6621,0.025,0.5819249,0.01592951,0.5878,0.0297,-0.004136056,3866.6416,24.8316,50.0050981,41.493437,8.34E-01,9.96E-01,2.74E-01,1.096,0.793,53.22,0.02966,0.02055,58.08
21,0.2259093,0.01249223,1.6152,0.0601,0.2848583,0.01867901,1.3634,0.0712,-0.00409535,3645.521,20.0162,50.0046759,41.4964926,5.71E-01,4.26E-01,-1.11E-02,0.756,0.652,-4.34,0.03735,0.0305,0.08
22,0.9499883,0.01247953,0.0557,0.0143,0.9711754,0.01891141,0.0318,0.0211,-0.003134006,3378.7927,19.5305,50.0040686,41.5001691,8.66E-01,4.09E-01,3.57E-03,0.931,0.639,0.45,0.01623,0.01142,-1.19
23,1.125635,0.01240305,-0.1285,0.012,1.050538,0.02402694,-0.0535,0.0248,-0.003295973,3132.9458,24.9024,50.0034018,41.5035477,9.65E-01,7.83E-01,-1.44E-01,1.022,0.839,-28.88,0.01702,0.01288,-21
24,0.168302,0.01249223,1.9348,0.0806,0.2447732,0.01930529,1.5281,0.0857,-0.004140488,3904.7268,27.0386,50.0051454,41.4929084,4.47E-01,4.56E-01,-1.28E-02,0.682,0.662,-54.61,0.04399,0.04068,89.66
25,0.0542859,0.01244135,3.1633,0.2489,0.08799078,0.007964755,2.6389,0.0983,-0.003241792,3454.2612,25.2749,50.0041373,41.4991191,1.93E-01,1.99E-01,-7.18E-02,0.518,0.353,-46.27,0.06408,0.03839,-44.76
26,0.4379335,0.01242859,0.8965,0.0308,0.4661828,0.01542368,0.8286,0.0359,-0.00336337,3478.7058,32.3355,50.0040639,41.4987701,6.15E-01,8.96E-01,-2.91E-02,0.948,0.782,-84.15,0.02891,0.02521,-70.04
27,0.1515608,0.01249223,2.0485,0.0895,0.1935181,0.01712885,1.7832,0.0961,-0.002904789,2982.0017,29.9904,50.0029594,41.505619,3.46E-01,3.61E-01,1.55E-05,0.601,0.588,89.94,0.05241,0.05241,-80.48
28,0.6658883,0.01250492,0.4415,0.0204,0.718064,0.01780974,0.3596,0.0269,-0.00324104,3408.0103,36.2539,50.0038284,41.4997375,9.45E-01,1.11E+00,1.98E-01,1.115,0.902,56.45,0.02706,0.02147,51.52
29,0.7244126,0.01244135,0.35,0.0187,1.030102,0.02744665,-0.0322,0.0289,-0.00280412,3259.0889,37.3165,50.0034648,41.5017879,8.65E-01,1.01E+00,5.85E-02,1.017,0.919,70.87,0.02225,0.02011,55.79
30,0.1651701,0.01247953,1.9552,0.0821,0.163293,0.01641976,1.9676,0.1092,-0.003909466,3595.4846,31.9761,50.0043403,41.4971614,2.50E-01,4.42E-01,2.21E-01,0.766,0.324,56.75,0.08087,0.03087,58.28

F550M.csv (файл 2)

2,1921.566,0.01258874,-8.2091,0,37128.06,0.2618096,-11.4243,0,0.01455503,4617.5225,554.576,49.9887896,41.5264699,6.09E+01,8.09E+02,1.78E+01,28.459,7.779,88.63,0.00054,0.00036,77.04
3,1.055918,0.01256313,-0.0591,0.0129,9.834856,0.1109255,-2.4819,0.0122,-0.002955142,3936.4946,85.3255,49.9949149,41.5370016,3.98E+01,1.23E+01,1.54E+01,6.83,2.336,24.13,0.06362,0.01965,23.98
4,151.2355,0.01260153,-5.4491,0.0001,184.0693,0.03634057,-5.6625,0.0002,-0.002626019,3409.2642,76.9891,49.9931935,41.5442109,4.02E+00,4.35E+00,-1.47E-03,2.086,2.005,-89.75,0.00227,0.00198,66.61
5,0.3506025,0.01258874,1.138,0.039,0.3466277,0.01300407,1.1503,0.0407,-0.002441164,3351.9893,8.9147,49.9942299,41.5451727,4.97E-01,5.07E-01,7.21E-03,0.715,0.702,62.75,0.02,0.01989,82.88
6,1.166133,0.01257594,-0.1669,0.0117,0.005819145,0.009692424,5.5879,1.8089,-0.003201006,3476.9932,10,49.9946543,41.5434658,5.88E-01,8.33E-02,0.00E+00,0.767,0.289,0,0.01497,0.00499,0
7,0.1372164,0.0125503,2.1565,0.0993,0.1238123,0.02608246,2.2681,0.2288,-0.003556473,3535.5281,13.4586,49.9947993,41.5426587,2.49E-01,2.48E-01,-7.69E-03,0.506,0.491,-43.27,0.05264,0.05237,-55.87
8,0.6174777,0.01260153,0.5234,0.0222,0.6206718,0.01300407,0.5178,0.0228,-0.002441164,3357.0044,20.0487,49.9940449,41.5450748,5.10E-01,5.22E-01,-6.28E-03,0.724,0.712,-66.7,0.01194,0.01192,84.82
9,1.46848,0.01260153,-0.4172,0.0093,0.001897994,0.009688255,6.8043,5.5435,-0.003612399,3584.0171,16,49.9949252,41.5419909,5.87E-01,8.33E-02,0.00E+00,0.766,0.289,0,0.01175,0.00392,0
10,1.452348,0.01258874,-0.4052,0.0094,3.124427,0.04807406,-1.2369,0.0167,-0.003148756,3805.6069,39.5791,49.9952831,41.5389075,2.25E+00,3.87E+00,-6.77E-01,2.03,1.416,-70.08,0.0302,0.01891,-67.61
11,0.1548658,0.01260153,2.0251,0.0884,0.1777253,0.01630147,1.8756,0.0996,-0.002919044,3459.7681,25.6248,49.9943085,41.5436591,4.64E-01,2.34E-01,8.40E-02,0.701,0.455,18.09,0.05739,0.03321,18.33
12,0.5046132,0.01253746,0.7426,0.027,0.7798272,0.04462456,0.27,0.0621,-0.00261193,3418.9119,65.5326,49.9934365,41.5441099,6.87E-01,2.77E+00,-2.92E-01,1.678,0.804,-82.19,0.05363,0.02182,-83.28
13,0.380733,0.01260153,1.0484,0.0359,0.4313257,0.01605258,0.913,0.0404,-0.003497544,3548.8484,34.5602,49.9944623,41.542421,8.27E-01,8.51E-01,8.92E-02,0.964,0.865,48.75,0.03776,0.03252,30.61
14,0.1643925,0.01258874,1.9603,0.0832,0.2181225,0.01839054,1.6532,0.0916,-0.003121084,3710.6785,33.3215,49.9950598,41.5402182,2.18E-01,2.18E-01,1.03E-01,0.567,0.339,45,0.0757,0.04376,45
15,0.3959635,0.01260153,1.0059,0.0346,0.9984215,0.0763398,0.0017,0.083,-0.003106286,3805.9988,48.3363,49.995125,41.5388789,1.87E+00,3.12E+00,4.86E-01,1.813,1.304,71.09,0.0559,0.04105,67.61
16,0.1625628,0.01260153,1.9724,0.0842,0.3490304,0.02234424,1.1428,0.0695,-0.002472953,3410.77,38.0388,49.9939083,41.544294,1.77E-01,4.75E-01,8.92E-03,0.689,0.421,88.29,0.0769,0.04707,89.86
17,0.1725209,0.01260153,1.9079,0.0793,0.2965718,0.02357189,1.3197,0.0863,-0.003454017,3629.0247,40.9706,49.9946304,41.541311,3.73E-01,7.91E-01,-3.73E-01,1.004,0.393,-59.65,0.09781,0.03734,-58.27
18,0.3034717,0.01260153,1.2947,0.0451,0.5031242,0.02774418,0.7458,0.0599,-0.003073985,4079.0825,42,49.9962105,41.5351731,6.68E-01,8.33E-02,0.00E+00,0.818,0.289,0,0.06348,0.02106,0
19,1.593927,0.01260153,-0.5062,0.0086,1.860803,0.0219809,-0.6743,0.0128,-0.003038161,4065.9434,58.3703,49.9958657,41.5353087,1.75E+00,1.41E+00,-7.15E-03,1.323,1.188,-1.21,0.01697,0.01464,-0.43
20,0.5464995,0.01258874,0.656,0.025,0.5661472,0.0144696,0.6177,0.0278,-0.003053429,4045.0474,54.439,49.9958631,41.535604,5.43E-01,8.46E-01,-1.22E-03,0.92,0.737,-89.77,0.02257,0.01649,-89.72
21,1.303251,0.01253746,-0.2876,0.0104,1.296672,0.01418861,-0.2821,0.0119,-0.00259741,4240.1406,55.2714,49.9965409,41.5329423,6.05E-01,6.81E-01,7.89E-03,0.826,0.777,84.15,0.00892,0.00852,69.62
22,0.5174786,0.01260153,0.7153,0.0264,0.5260691,0.01390194,0.6974,0.0287,-0.003019847,3828.95,55.19,49.9950817,41.5385478,5.18E-01,7.56E-01,-6.34E-02,0.879,0.709,-75.96,0.0236,0.01643,-75.02
23,0.1551826,0.01260153,2.0229,0.0882,0.166565,0.01726119,1.946,0.1125,-0.003271136,3504.7439,52.7386,49.9939745,41.5429739,1.91E-01,6.86E-01,1.89E-01,0.866,0.356,71.33,0.10376,0.04235,71.56
24,0.2214222,0.01260153,1.6369,0.0618,0.2389908,0.01360924,1.554,0.0618,-0.00285033,3750.3167,54.0027,49.994824,41.5396229,4.32E-01,5.51E-01,1.68E-03,0.742,0.657,89.18,0.04862,0.04505,89.94
25,0.1336059,0.01253746,2.1854,0.1019,0.1320868,0.009830156,2.1979,0.0808,-0.002921393,3459.6851,51.7091,49.9938331,41.5435908,2.16E-01,2.06E-01,-9.16E-02,0.55,0.345,-43.52,0.06231,0.03626,-45.19
26,0.1703959,0.01260153,1.9214,0.0803,0.1577456,0.0152816,2.0051,0.1052,-0.002779523,3446.95,49,49.9938372,41.5437717,7.29E-01,8.33E-02,0.00E+00,0.854,0.289,0,0.11183,0.03721,0
27,1.896325,0.01258874,-0.6948,0.0072,1.941203,0.0152816,-0.7202,0.0085,-0.00306097,3809.6836,57.8143,49.9949655,41.5388035,7.38E-01,6.80E-01,7.46E-03,0.86,0.824,7.18,0.00713,0.00678,59.71
28,0.6522877,0.01260153,0.4639,0.021,0.1713469,0.01312423,1.9153,0.0832,-0.002447558,4271.9614,52,49.9967135,41.5325172,5.92E-01,8.33E-02,0.00E+00,0.77,0.289,0,0.0274,0.00913,0
29,0.1370073,0.0125503,2.1581,0.0995,0.101415,0.02614047,2.4847,0.2799,-0.002207851,4324.667,55.3374,49.99684,41.5317898,2.22E-01,2.24E-01,1.12E-01,0.579,0.332,45.18,0.07753,0.04476,45
30,0.2240251,0.01253746,1.6243,0.0608,0.2254432,0.01360924,1.6174,0.0656,-0.003037372,3960.3042,58.9024,49.9954807,41.5367473,4.18E-01,4.81E-01,-1.07E-02,0.695,0.645,-80.65,0.03802,0.03492,-88.86

1 Ответ

1 голос
/ 15 марта 2019

Вы усложняете программу, вкладывая все циклы и условные выражения. Разбейте его на простые шаги.

Выполните следующие действия.
1. Прочитайте оба файла csv и преобразуйте их в 2d списки.
2. Сравните столбцы / значения списков в цикле на основе заданного индекса, добавьте строки из второго списка в новый список вывода.
3. Запишите список вывода в файл CSV.

def read_file(filepath):
    with open(filepath,'r') as f:
        x = csv.reader(f)
        l = list(x)  
    return l  

l435 = read_file('F435W.csv')
l550 = read_file('F550M.csv')
new_F550M = []
r = 0.002778

for i in l550:
    for j in l435:
       # I did't exactly get your if condition, so I am putting it down based on what I understood, so if it is wrong, modify it accordingly.
       if isfloat(i[12]) and isfloat(j[12]) and abs(float(i[12]) float(j[12])) < r:
            if isfloat(i[13]) and isfloat(j[13]) and abs(float(i[13]) float(j[13])) < r:
                new_F550M.append(i)

with open('new_F550M.csv','w') as f:
    out = csv.writer(f)
    out.writerows(new_F550M)
...