I have a pandas DataFrame that consists of a single column of numpy arrays. I can use the numpy.mean function
to compute the mean of the arrays.
import numpy
import pandas
f = pandas.DataFrame({"a":[numpy.array([1.0, 2.0]), numpy.array([3.0, 4.0])]})
numpy.mean(f["a"]) # returns array([2., 3.])
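As far as I understand it, this works because the column has object dtype, so the mean is computed by adding the stored arrays element by element and dividing by the number of rows. Here is a minimal NumPy-only sketch of that assumption (the names column, manual_mean and object_mean are mine, for illustration only):

```python
import numpy

# Build a 1-D object array whose two elements are themselves arrays,
# mimicking what the "a" column holds.
column = numpy.empty(2, dtype=object)
column[0] = numpy.array([1.0, 2.0])
column[1] = numpy.array([3.0, 4.0])

# Adding the object elements adds the inner arrays element-wise;
# dividing by the number of rows gives the per-position mean.
manual_mean = (column[0] + column[1]) / len(column)

# numpy.mean on the object array (no dtype forced) should agree.
object_mean = numpy.mean(column)

print(manual_mean, object_mean)
```

If that reading is right, the result is a per-position mean across rows rather than a single scalar.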
I want to do the same thing in Dask.
import dask.dataframe
import dask.array
g = dask.dataframe.from_pandas(f, npartitions=1)
dask.array.mean(g["a"], dtype="float64")
(You have to specify dtype; otherwise you get the exception TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'.)
The call to dask.array.mean returns the following, which looks correct.
dask.array<mean_agg-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>
However, when I run dask.array.mean(g["a"], dtype="float64").compute() to get the final value, I get the exception
ValueError: setting an array element with a sequence. The full traceback is as follows.
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/base.py", line 165, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/base.py", line 436, in compute
results = schedule(dsk, keys, **kwargs)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/threaded.py", line 81, in get
**kwargs
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/local.py", line 486, in get_async
raise_exception(exc, tb)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/local.py", line 316, in reraise
raise exc
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
result = _execute_task(task, data)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/core.py", line 118, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/core.py", line 118, in <listcomp>
args2 = [_execute_task(a, cache) for a in args]
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/core.py", line 119, in _execute_task
return func(*args2)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/optimization.py", line 982, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/core.py", line 119, in _execute_task
return func(*args2)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/utils.py", line 29, in apply
return func(*args, **kwargs)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/dask/array/reductions.py", line 539, in mean_chunk
total = sum(x, dtype=dtype, **kwargs)
File "<__array_function__ internals>", line 6, in sum
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2229, in sum
initial=initial, where=where)
File "/Users/wmcneill/src.private/radius_limit/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: setting an array element with a sequence.
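The last frames suggest the failure comes from mean_chunk calling sum with dtype="float64" on the object-dtype chunk. The NumPy-only sketch below assumes the chunk is a 1-D object array shaped like my column (the names chunk and error are mine) and seems to hit the same kind of error without Dask:

```python
import numpy

# A 1-D object array whose elements are arrays, like the "a" column.
chunk = numpy.empty(2, dtype=object)
chunk[0] = numpy.array([1.0, 2.0])
chunk[1] = numpy.array([3.0, 4.0])

error = None
try:
    # Forcing float64 asks NumPy to turn each element (a whole array)
    # into a single float, which it cannot do.
    numpy.sum(chunk, dtype="float64")
except (ValueError, TypeError) as exc:
    error = exc

print(type(error).__name__, error)
```

So the forced dtype, which I needed to avoid the TypeError, appears to be what breaks the reduction on a column of arrays.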
Is it possible to perform an equivalent operation in Dask?