I am using Dask to manipulate a dataset. I want to bin the data, whose values are not unique, by quantile and then label each bin.
In pandas this is fairly simple:
tags = range(4, 0, -1)
groups = pd.qcut(df.column_name.rank(method='first'), q = 4, labels = tags)
df['ranks'] = groups.values
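To make the snippet above concrete, here is a minimal self-contained sketch on a small made-up frame (the data is hypothetical; `column_name` just mirrors the snippet, and the labels are written as a plain list). It shows why the `rank(method='first')` step is there: with repeated values the raw quantile edges can coincide, whereas the ranks are always distinct.

```python
import pandas as pd

# Hypothetical sample data with repeated values (not from my real dataset).
df = pd.DataFrame({"column_name": [5, 5, 5, 5, 1, 2, 3, 9]})

tags = [4, 3, 2, 1]  # lowest quartile is labelled 4, highest is labelled 1

# Calling qcut on the raw values fails here: two quantile edges are both 5.
try:
    pd.qcut(df["column_name"], q=4, labels=tags)
    raw_qcut_failed = False
except ValueError:
    raw_qcut_failed = True

# rank(method='first') breaks ties by order of appearance, so every row
# gets a distinct rank and the quantile edges are guaranteed to be unique.
ranks = df["column_name"].rank(method="first")
groups = pd.qcut(ranks, q=4, labels=tags)
df["ranks"] = groups.values

print(df)
```

With eight rows and four bins, each label ends up on exactly two rows, even though four of the rows share the value 5.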
But I have no idea how to do this in Dask, or whether to use map or map_partitions.
I have read the documentation over and over to no avail. I also found a similar question, but its answer lacks the explanation I need.
My initial code:
data['tot_top_amt'].map_partitions(pd.qcut,4, duplicates='drop')
But I get an error:
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/utils.py", line 162, in raise_on_meta_error
yield
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/core.py", line 3740, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 306, in qcut
dtype=dtype, duplicates=duplicates)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 350, in _bins_to_cuts
dtype=dtype)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 457, in _format_labels
v = adjust(labels[0].left)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/indexes/interval.py", line 1303, in __getitem__
mask = self._isnan[value]
IndexError: index 0 is out of bounds for axis 0 with size 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dask_eda.py", line 7, in <module>
a = data['tot_top_amt'].map_partitions(pd.qcut,4, duplicates='drop')
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/core.py", line 568, in map_partitions
return map_partitions(func, self, *args, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/core.py", line 3779, in map_partitions
meta = _emulate(func, *args, udf=True, **kwargs2)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/core.py", line 3740, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/utils.py", line 179, in raise_on_meta_error
raise ValueError(msg)
ValueError: Metadata inference failed in `qcut`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
IndexError('index 0 is out of bounds for axis 0 with size 0',)
Traceback:
---------
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/utils.py", line 162, in raise_on_meta_error
yield
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/dataframe/core.py", line 3740, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 306, in qcut
dtype=dtype, duplicates=duplicates)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 350, in _bins_to_cuts
dtype=dtype)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/reshape/tile.py", line 457, in _format_labels
v = adjust(labels[0].left)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/pandas/core/indexes/interval.py", line 1303, in __getitem__
mask = self._isnan[value]