Does scaling a data frame break ordering by a specific column (dplyr, R)? - PullRequest
0 votes
/ November 8, 2018

I have a scaled data frame (the result of applying scale() to it).

Here is a sample of the data frame, followed by its str() output.

df <- structure(list(user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), obs_id = c(1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 12L, 12L), scroll_id = c(3L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 
20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 5L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L), timestamp = c(-1.74966971796047, -1.70403832189443, 
-1.70379906928687, -1.70361867040459, -1.70347088963619, -1.70319128699835, 
-1.70294111235573, -1.70276028812429, -1.70258838640936, -1.70240339828655, 
-1.70222294730529, -1.70203831619891, -1.70184580908597, -1.70164818318538, 
-1.70149237212696, -1.7013640296891, -1.70107398661715, -1.70085023899303, 
-1.70074099926259, -1.70044208036186, -1.69930770675872, -1.69667227598748, 
-1.69639939688651, -1.69618314094462, -1.69601502194278, -1.69582576157782, 
-1.69559191392476, -1.69535447622416, -1.69505896439534, -1.69482541373163, 
-1.69460777288812, -1.69424368835751, -1.69400465270319, -1.69366103187144, 
-1.69327144514213, -1.69302531987381, -1.68783534954733, -1.68768506047735, 
-1.68752547752833, -1.68736590892541, -1.68712878394453, -1.68687829142263, 
-1.6865139129229, -1.68613167693171, -1.68580213389856, -1.68559918345711, 
-1.68526117320724, -1.67762462071637, -1.67743949844498, -1.67719347385101, 
-1.67700070322293, -1.67679554914567, -1.67656637580297, -1.67627753717385, 
-1.67586088842253, -1.67553297492968, -1.67524660357733, -1.66751967924049, 
-1.66734441626049, -1.66718372060815, -1.66695586421188, -1.66670939904175, 
-1.66646604382834, -1.66622975192775, -1.66573512056842, -1.66551213794357, 
-1.66527869688907, -1.65824249264058, -1.65804336329432, -1.6577609338464, 
-1.6574903921523, -1.65724853258229, -1.65700644737587, -1.65672791100531, 
-1.65651480591907, -1.65401190903667, -1.65381700511898, -1.65366132770577, 
-1.6534762649581, -1.65332788630987, -1.65305013457006, -1.64033030580623, 
-1.64012725129267, -1.63989790705525, -1.639762598707, -1.63959055894236, 
-1.63945035366759, -1.63927849801117, -1.63895297868036, -1.63865614273206, 
0.106686899084076, 0.118459441302061, 0.118859882614618, 0.119257328235862, 
0.119720814598829, 0.120355746999422, 0.12094259028028, 0.121517254355051, 
0.133461022259665, 0.133935049982786), row_num = 1:100, scroll_length = c(6, 
9, 14, 12, 13, 26, 12, 13, 11, 11, 12, 12, 11, 9, 10, 22, 13, 
4, 18, 7, 20, 9, 11, 13, 9, 11, 11, 12, 12, 13, 21, 19, 17, 28, 
18, 19, 6, 13, 8, 7, 14, 11, 10, 21, 7, 19, 16, 8, 13, 13, 10, 
12, 17, 29, 25, 18, 9, 7, 9, 11, 8, 13, 13, 24, 5, 12, 27, 13, 
16, 16, 10, 13, 20, 11, 18, 5, 11, 13, 11, 12, 20, 8, 12, 14, 
12, 14, 13, 22, 15, 7, 6, 6, 7, 9, 8, 9, 9, 26, 4, 7), x_mean = c(-1.74134749014902, 
-1.19087086808828, 1.36178725012622, -1.32786301490502, 1.24184201608646, 
-1.31953110973881, 1.26803503941515, -1.37457737398187, 1.26237762807268, 
1.19840722052349, 1.16504720433086, 1.0462654080771, 0.968758625683449, 
0.956775697244848, -1.47633515785, 0.899679425904551, 0.90696778043178, 
0.954448103151732, 0.880652150680739, 1.12902825876581, -1.25263840905782, 
-1.38688943772197, 0.876719500649841, -1.43919039141226, 0.912305509523784, 
-1.37582892077241, -1.29700634806947, -1.3516030947317, -1.33988639467635, 
1.03719166555639, -1.25281297086212, -1.36837727945264, -1.3013481499417, 
-1.12367601244846, -1.1867366485039, -1.1447906384705, 1.46822755722482, 
-1.51069448012566, 1.38801519536057, -1.28775740368453, -1.39288792410805, 
-1.3628256538654, -1.27243867579661, -1.34361528369304, -1.17198780192935, 
-1.13670671078087, 1.29697666340619, -1.49967886217826, 1.37352505680918, 
-1.43215994845407, 1.31551055589611, -1.42165675521735, -1.46000307590573, 
-1.45208194113707, -1.6500854625997, -1.51590332700378, -1.34525096166829, 
1.34594255443042, -1.31379823751357, 1.20241056443821, -1.27481075401918, 
-1.33709175501284, -1.28290838125703, -1.2599521767185, -1.12280098523898, 
-1.08865832426231, 1.2372036022589, -1.46650874196874, 1.20057767733893, 
1.1397720268145, 1.26778761577712, -1.32497265377788, -1.37730118917685, 
-1.20783694513472, -1.21720091894545, 1.34512212727258, -1.44648059935705, 
1.24875675994755, -1.32534469783767, 1.21272438356554, 1.15243129274637, 
-1.47296088831071, 1.38315156564797, -1.35831310138532, 1.28103755957798, 
-1.51953863958324, 1.29371236787117, -1.45963878156923, -1.54319041798057, 
-1.33782684342955, -0.116027859942214, 1.19005265031903, 1.08011437268818, 
0.980708462028398, 1.04375473285892, 1.01695992937272, 0.900330071630953, 
1.05795887600219, -1.51367047422977, -1.28054519452972), y_mean = c(-4.93507461932646, 
0.0304680987883223, 0.140001980341645, 0.61911843405746, 0.434230282460559, 
0.438563278736709, 0.293631671334964, 0.154410306899388, 0.401744019451561, 
0.36128426810253, 0.241124960593543, 0.600782688122651, 0.493847384568541, 
0.65439419165184, -0.0681595249734346, 1.05950714592312, 0.761975569044308, 
0.282570484077489, 0.718419364467949, 1.64069575643528, -0.127848923653606, 
-0.146545841721022, 1.1912951282035, 0.144509086604169, 1.04463101098425, 
0.430268926793686, 0.314223957247001, 0.100897716667591, -0.0425031965566726, 
0.258438186158059, 0.678490366423787, 0.0858853029684605, -0.27945335364965, 
0.404453412742001, 0.887555897576977, 0.814299169432573, 0.413160301258902, 
0.356957716046724, 0.340839665399864, 0.322899575010365, 0.178329477430451, 
0.544292645643499, 0.893841483905461, -0.091019994498439, 1.1446151574792, 
0.088616218629274, 0.726086962212147, -0.388415706321866, 1.04321403715794, 
-0.0913412290296168, 0.7948571888564, -0.00372758425509035, -0.083551562474271, 
-0.68421340952266, 0.90000676791651, 0.401793927802844, -1.96055350952763, 
0.55822918052635, -0.143841284430801, 1.00404150680558, -0.351892547202368, 
-0.604496829235394, -0.423149556817751, -1.11618728019758, -0.14440791809138, 
-0.546248843820815, 1.1900100419081, -0.255287284511152, 0.873424917731293, 
0.995161278018525, 0.860852090311174, 0.388011020049902, 0.0999905602296405, 
1.03970973386284, 0.922025425144326, 0.0481642325381772, 0.285344044119462, 
1.162491122371, 0.362963176802066, 1.35490345691353, 1.16751581329336, 
0.307862842322703, 1.13076805560918, 0.485805486499478, 1.33522908768678, 
0.161684382128358, 1.47294941368768, -0.220913762365118, -0.823915124538901, 
0.550574584061472, -4.77449848953265, 0.0087192624676276, 0.0920267383275781, 
0.889483318331135, 0.522059898914639, 0.695254653958987, 0.840851680499038, 
1.33767244868339, 0.351307449724814, 1.14901452631645), dx_mean = c(-0.514034686928457, 
-0.709482080612108, 0.924636289935977, -0.702980646737082, 0.515080876392673, 
-0.359676884238743, 0.201670657817143, -0.703758861104736, 0.35524796882122, 
0.497291976059282, 0.529203528197022, 0.0371434207718647, 0.0913592689813406, 
0.160055768463605, -1.08791332093132, 0.0575134393069876, -0.112562890376558, 
0.099525581091022, -0.0713384254907926, -0.0513862425691886, 
-0.221532705188051, -0.775713460273571, -0.0736903239126327, 
-0.640211495841536, 0.169096063758674, -0.898458071334162, -0.63798105266279, 
-0.963447672083935, -0.955212412672127, 0.363467923032555, -0.394906159490777, 
-0.559917114779459, -0.574268084479777, -0.0273902254763013, 
-0.221835669438393, -0.0861804165999153, 0.994694629331326, -0.559917136531156, 
0.960935318870694, -0.145062952802999, -0.863251386037286, -0.748103526151045, 
-0.416360550339528, -0.202627736267389, -0.32727144437964, -0.040680303825919, 
0.0540256116223241, -0.368985686778189, 0.369423610176312, -0.276975122295657, 
0.0727048818072388, -0.3828577763849, -0.415684687030015, -0.297152750489994, 
-0.267563324747934, -0.149993332685809, -0.155768866169288, 0.659889137622368, 
-0.542758875898267, 0.231874144004707, 0.039324053803897, -0.585920971267586, 
-0.67271157639439, -0.270857489417409, -0.0785113579510327, -0.00197437376628199, 
0.0814818233744693, -0.268892264196792, 0.518245748305908, 0.0871609014380659, 
0.493653609250937, -0.456681552419427, -0.199229820187941, -0.269965359999644, 
-0.282408596987534, 1.17173326215537, -0.153984610717356, 0.164643636394461, 
-0.513078687761111, 0.160567290293988, 0.0314570772696466, -0.770407551101201, 
0.413909349850608, -0.449289368704779, 0.206371255312031, -0.117406086493973, 
0.22964379750017, -0.19102861289208, -0.595943303115521, -0.104915809400637, 
0.00214323363018966, 1.98638306270689, 2.49461369261821, 2.03395773633152, 
2.87489646225022, 2.31219926685024, 0.85593965510982, -0.596594826526322, 
-4.61418725388315, -2.9486739997395), dy_mean = c(0.972265996197407, 
-0.692113718739584, -0.162463490249733, -0.373682612876388, -0.0663766957581004, 
0.293619375985922, 0.122073685940586, -0.285020148233188, -0.172842432309118, 
-0.219978162459523, -0.115892678260361, 0.0489675598198674, -0.160950008562538, 
-0.111002834150848, -0.453615467401099, 0.451722377264225, 0.546742515974247, 
0.127398077458348, 0.731724212982357, 1.01053722821425, 0.690956554920146, 
-0.337459424957412, 0.019162432997017, -0.243330321718604, -0.231688622572203, 
-0.547945830545732, -0.480308752606583, -0.667755977515433, -0.568560406165882, 
-0.0513258952443496, 0.0535542858714361, 0.167792931838639, -0.1355661374848, 
0.99033911463052, 1.71911716403875, 1.68255305598213, -1.31818510289039, 
-0.511877097332388, -0.873580216225893, -0.98280735956548, -0.346511138345665, 
-0.677488522915486, -0.702938120854123, 0.312636624174749, 2.41718837592081, 
1.35925888758152, 1.39418510791406, -0.8028921662872, -0.631446880430544, 
-0.603987414346105, -0.691380124718193, -0.849381925680468, -0.175581009201558, 
1.24601604290671, 1.71224193470548, 1.45050208434776, 1.04777040581163, 
-0.586576101243231, -1.0218449931103, -0.437929321738847, -1.27668853261677, 
-0.570258317292519, -0.642907214350154, 0.165431841427611, 2.4978548012476, 
2.08597194873166, 1.39130283430992, 0.0462477350626993, 0.154367086822897, 
0.294450305064071, -0.41008238161085, -0.435334843937996, 0.268249368449487, 
2.53922192675486, 1.69999861875844, -0.593505715413604, -0.684638051698653, 
-0.188842482031514, -0.686151622747976, -0.219193941042949, 0.545012560914077, 
-0.649719520780133, -0.159203425763482, -0.500287722903654, -0.133170252897988, 
-0.522082922608582, 0.0459739039693266, 0.104184366639482, 0.266292833518668, 
1.01734829793621, 1.02415910435903, -0.991064372354605, -0.324413168579542, 
-0.281373072326476, -0.638727085368933, -0.421121612312956, 0.210475166278368, 
1.66052247394878, -0.747122230059145, -0.612744264625772)), .Names = c("user_id", 
"obs_id", "scroll_id", "timestamp", "row_num", "scroll_length", 
"x_mean", "y_mean", "dx_mean", "dy_mean"), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

and the str() output:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 356 obs. of  28 variables: 
$ user_id                : int 1 1 1 1 1 1 1 1 1 1 ... 
$ obs_id                 : int 1 2 2 2 2 2 2 2 2 2 ... 
$ scroll_id              : int 3 1 2 3 4 5 6 7 8 9 ... 
$ timestamp              : num [1:356, 1] -1.75 -1.7 -1.7 -1.7 -1.7 ...   ..- attr(*, "scaled:center")= num 1.54e+18   ..- attr(*, "scaled:scale")= num 2.03e+12 
$ row_num                : int 1 2 3 4 5 6 7 8 9 10 ... 
$ scroll_length          : num 6 9 14 12 13 26 12 13 11 11 ... 
$ x_mean                 : num [1:356, 1] -1.74 -1.19 1.36 -1.33 1.24 ...   ..- attr(*, "scaled:center")= num 538   ..- attr(*, "scaled:scale")= num 256 
$ y_mean                 : num [1:356, 1] -4.9351 0.0305 0.14 0.6191 0.4342 ...   ..- attr(*, "scaled:center")= num 949   ..- attr(*, "scaled:scale")= num 185 
$ dx_mean                : num [1:356, 1] -0.514 -0.709 0.925 -0.703 0.515 ...   ..- attr(*, "scaled:center")= num 0.506   ..- attr(*, "scaled:scale")= num 9.85 
$ dy_mean                : num [1:356, 1] 0.9723 -0.6921 -0.1625 -0.3737 -0.0664 ...   ..- attr(*, "scaled:center")= num -30   ..- attr(*, "scaled:scale")= num 29.8 
$ press_lin_acc_mag      : num [1:356, 1] 1.0569 -0.5687 -0.2669 0.0235 -0.5489 ...   ..- attr(*, "scaled:center")= num 0.853   ..- attr(*, "scaled:scale")= num 0.963 
$ press_vel_ang_unc_mag  : num [1:356, 1] 0.0164 0.7285 0.1068 -0.1189 -0.6445 ...   ..- attr(*, "scaled:center")= num 0.359   ..- attr(*, "scaled:scale")= num 0.277 
$ release_lin_acc_mag    : num [1:356, 1] 0.0055 2.0082 -0.3155 -0.347 -0.9376 ...   ..- attr(*, "scaled:center")= num 0.669   ..- attr(*, "scaled:scale")= num 0.522 
$ release_vel_ang_unc_mag: num [1:356, 1] 1.069 0.909 -0.671 -0.586 -0.179 ...   ..- attr(*, "scaled:center")= num 0.29   ..- attr(*, "scaled:scale")= num 0.267 
$ intra_scroll_time      : num [1:356, 1] -0.774 -0.227 0.459 0.17 0.254 ...   ..- attr(*, "scaled:center")= num 1.75e+08   ..- attr(*, "scaled:scale")= num 1.28e+08 
$ inter_scroll_time      : num [1:356, 1] 2.628 -0.218 -0.057 -0.155 -0.169 ...   ..- attr(*, "scaled:center")= num 4.62e+08   ..- attr(*, "scaled:scale")= num 2.12e+09 
$ press_size             : num [1:356, 1] 1.746 0.33 -0.614 -1.086 0.33 ...   ..- attr(*, "scaled:center")= num 0.0365   ..- attr(*, "scaled:scale")= num 0.00831 
$ release_size           : num [1:356, 1] 1.086 0.124 -0.518 -0.518 -1.16 ...   ..- attr(*, "scaled:center")= num 0.0495   ..- attr(*, "scaled:scale")= num 0.0122 
$ size_mean              : num [1:356, 1] 1.4397 0.0256 -0.6225 -0.6678 -0.6751 ...   ..- attr(*, "scaled:center")= num 0.0421   ..- attr(*, "scaled:scale")= num 0.00887 
$ mid_size_mean          : num [1:356, 1] 1.019 0.18 -0.239 -0.239 -0.658 ...   ..- attr(*, "scaled:center")= num 0.0415   ..- attr(*, "scaled:scale")= num 0.00935 
$ acc_start_4_avg        : num [1:356, 1] 0.228 0.314 0.319 0.335 0.317 ...   ..- attr(*, "scaled:center")= num -3.29e-13   ..- attr(*, "scaled:scale")= num 1.16e-12 
$ L                      : num [1:356, 1] -1.84 0.741 0.959 0.917 0.449 ...   ..- attr(*, "scaled:center")= num 380   ..- attr(*, "scaled:scale")= num 187 
$ vel_mean               : num [1:356, 1] -1.158 0.171 -0.333 -0.155 -0.4 ...   ..- attr(*, "scaled:center")= num 3.16e-06   ..- attr(*, "scaled:scale")= num 2.31e-06 
$ gra_x_mean             : num [1:356, 1] 0.0867 0.276 -0.1085 0.1246 -0.0252 ...   ..- attr(*, "scaled:center")= num -0.161   ..- attr(*, "scaled:scale")= num 1.28 
$ gra_y_mean             : num [1:356, 1] -0.966 -1.038 -1.031 -1.202 -1.215 ...   ..- attr(*, "scaled:center")= num 5.46   ..- attr(*, "scaled:scale")= num 2.32 
$ gra_z_mean             : num [1:356, 1] 0.702 0.718 0.716 0.755 0.757 ...   ..- attr(*, "scaled:center")= num 7   ..- attr(*, "scaled:scale")= num 3.22 
$ lin_acc_mag_mean       : num [1:356, 1] 0.753 0.334 -0.431 -0.514 -0.683 ...   ..- attr(*, "scaled:center")= num 0.722   ..- attr(*, "scaled:scale")= num 0.493 
$ vel_ang_unc_mag_mean   : num [1:356, 1] 0.939 0.982 0.864 0.221 0.22 ...   ..- attr(*, "scaled:center")= num 0.316   ..- attr(*, "scaled:scale")= num 0.233

The df above contains only the first 10 columns and the first 100 rows.

Now, if I try to arrange by timestamp (the other columns let me do this without any problem):
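The `num [1:356, 1]` entries in the str() output suggest that each scaled column is stored as a one-column matrix rather than a plain numeric vector, because base R's scale() returns a matrix carrying "scaled:center" and "scaled:scale" attributes. A minimal sketch on a made-up vector (the names x, scaled, and toy are illustrative and not taken from the data above):

```r
# Hypothetical toy vector, not the data from the question.
x <- c(10, 20, 30, 40)

# scale() centers and scales its input and returns a matrix,
# even when it is given a plain vector.
scaled <- scale(x)

is.matrix(scaled)            # is the result a matrix?
dim(scaled)                  # its dimensions (rows, columns)
names(attributes(scaled))    # includes "scaled:center" and "scaled:scale"

# Assigning the result into a data frame keeps it as a matrix column.
toy <- data.frame(id = 1:4)
toy$x_scaled <- scaled
is.matrix(toy$x_scaled)
```

The same check, is.matrix(df$timestamp), could be run on the real data frame to confirm whether the scaled columns are matrices.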

df %>% dplyr::arrange(timestamp)

I get:

Error in arrange_impl(.data, dots) : Argument 1 is of unsupported type matrix

But if I convert it to a data.table, it works fine:

df %>% as.data.table() %>% dplyr::arrange(timestamp)

If I run arrange() on the unscaled df, it works fine, but if I run:

df_unscaled %>%
  mutate_at(vars(-"user_id", -"obs_id",
                 -"scroll_id", -"row_num",
                 -"scroll_length"),
            scale) %>% arrange(timestamp)

I get the same error:

Error in arrange_impl(.data, dots) : Argument 1 is of unsupported type matrix

Can you please advise what is wrong here? Is it the column type that breaks dplyr::arrange()?

If you need sample data that I can share, please tell me where to send it.
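On the question of whether the column type is what breaks arrange(): the error text names a matrix, so one plausible workaround is to strip the matrix structure right after scaling. The sketch below uses base R only and a made-up data frame (toy and sorted are hypothetical names). It is an assumption about the cause, not a confirmed diagnosis.

```r
# Hypothetical toy data standing in for df_unscaled.
toy <- data.frame(user_id   = c(1L, 1L, 2L, 2L),
                  timestamp = c(400, 100, 300, 200))

# as.numeric() drops the dim and "scaled:*" attributes that scale() adds,
# leaving an ordinary numeric vector in the column.
toy$timestamp <- as.numeric(scale(toy$timestamp))

is.matrix(toy$timestamp)             # no longer a matrix
is.null(attributes(toy$timestamp))   # no leftover attributes

# Ordering by the scaled column with base R.
sorted <- toy[order(toy$timestamp), ]
sorted
```

In the dplyr pipeline from the question, the analogous change would be to pass function(x) as.numeric(scale(x)) to mutate_at() in place of scale. This has not been tested against the dplyr version used in the question.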

...