distributed-dataset-0.0.1.0: A distributed data processing framework in pure Haskell

Safe Haskell: None
Language: Haskell2010

Control.Distributed.Dataset.Aggr

Documentation

data Aggr a b

Represents an aggregation which takes many values of type a and returns a single value of type b.

Use the dAggr and dGroupedAggr functions to apply them to Datasets.

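You can use the StaticFunctor, StaticApply and StaticProfunctor instances to compose Aggrs. For example, the following computes the average of its inputs (the static keyword requires the StaticPointers extension):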

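dAvg :: Aggr Double Double
dAvg =
  aggrConst (static (/))                                  -- (/) lifted into an Aggr that ignores its input
    `staticApply` aggrSum (static Dict)                   -- numerator: the sum of the inputs
    `staticApply` staticMap (static realToFrac) aggrCount -- denominator: the count, converted to Double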

Alternatively, you can use the aggrFrom* functions to create Aggrs.
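To make the shape of the dAvg composition concrete, here is a purely local sketch. LocalAggr, runLocal, localSum, localCount and localAvg are hypothetical names invented for illustration; the model drops Closures, serialisation and partitioning entirely and only shows how an applicative combination runs several aggregations over the same inputs in one pass. pure plays the role of aggrConst, and fmap the role of staticMap.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.List (foldl')

-- Hypothetical local model: a step function, an initial state and an
-- extraction function. Not the library's representation of Aggr.
data LocalAggr a b = forall s. LocalAggr (s -> a -> s) s (s -> b)

instance Functor (LocalAggr a) where
  fmap f (LocalAggr step z done) = LocalAggr step z (f . done)

instance Applicative (LocalAggr a) where
  -- Like aggrConst: ignore every input and yield a fixed value.
  pure b = LocalAggr (\_ _ -> ()) () (const b)
  -- Run both aggregations side by side over the same inputs.
  LocalAggr stepF zF doneF <*> LocalAggr stepX zX doneX =
    LocalAggr
      (\(sf, sx) a -> (stepF sf a, stepX sx a))
      (zF, zX)
      (\(sf, sx) -> doneF sf (doneX sx))

runLocal :: LocalAggr a b -> [a] -> b
runLocal (LocalAggr step z done) = done . foldl' step z

localSum :: Num a => LocalAggr a a
localSum = LocalAggr (+) 0 id

localCount :: LocalAggr a Integer
localCount = LocalAggr (\n _ -> n + 1) 0 id

-- Same shape as dAvg: const (/) <*> sum <*> (realToFrac <$> count)
localAvg :: LocalAggr Double Double
localAvg = pure (/) <*> localSum <*> fmap realToFrac localCount

main :: IO ()
main = print (runLocal localAvg [1, 2, 3, 4])
```

Running main prints 2.5. In the real library the same combination additionally has to be shipped to remote nodes, which is why every function goes through a Closure.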

Instances
StaticProfunctor Aggr

Defined in Control.Distributed.Dataset.Internal.Aggr

Methods

staticDimap :: (Typeable a, Typeable b, Typeable c, Typeable d) => Closure (a -> b) -> Closure (c -> d) -> Aggr b c -> Aggr a d

staticLmap :: (Typeable a, Typeable b, Typeable c) => Closure (a -> b) -> Aggr b c -> Aggr a c

staticRmap :: (Typeable a, Typeable c, Typeable d) => Closure (c -> d) -> Aggr a c -> Aggr a d
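Judging from the signatures, staticLmap presumably transforms each input before it is aggregated, staticRmap transforms the final result, and staticDimap does both. The sketch below models that on a hypothetical local aggregation type (LocalAggr, localDimap, localLmap, localRmap and localSum are illustrative names, not library functions), with plain functions standing in for Closures.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.List (foldl')

-- Hypothetical local model: step function, initial state, extraction.
data LocalAggr a b = forall s. LocalAggr (s -> a -> s) s (s -> b)

runLocal :: LocalAggr a b -> [a] -> b
runLocal (LocalAggr step z done) = done . foldl' step z

-- dimap f g: apply f to every input before the step function sees it,
-- and g to the extracted result. Same argument order as staticDimap.
localDimap :: (a -> b) -> (c -> d) -> LocalAggr b c -> LocalAggr a d
localDimap f g (LocalAggr step z done) =
  LocalAggr (\s a -> step s (f a)) z (g . done)

localLmap :: (a -> b) -> LocalAggr b c -> LocalAggr a c
localLmap f = localDimap f id

localRmap :: (c -> d) -> LocalAggr a c -> LocalAggr a d
localRmap = localDimap id

localSum :: Num a => LocalAggr a a
localSum = LocalAggr (+) 0 id

-- Total length of a list of words, reported as a String.
main :: IO ()
main = putStrLn (runLocal (localDimap length show localSum) ["map", "reduce"])
```

Running main prints 9: each word is mapped to its length before summing, and the sum is rendered with show afterwards.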

Typeable m => StaticApply (Aggr m)

Defined in Control.Distributed.Dataset.Internal.Aggr

Methods

staticApply :: (Typeable a, Typeable b) => Aggr m (a -> b) -> Aggr m a -> Aggr m b

Typeable m => StaticFunctor (Aggr m)

Defined in Control.Distributed.Dataset.Internal.Aggr

Methods

staticMap :: (Typeable a, Typeable b) => Closure (a -> b) -> Aggr m a -> Aggr m b

aggrConst :: forall a t. (Typeable a, Typeable t) => Closure a -> Aggr t a

An aggregation which ignores the input data and always yields the given value.

aggrCount :: Typeable a => Aggr a Integer

Returns the number of inputs.

aggrSum :: StaticSerialise a => Closure (Dict (Num a)) -> Aggr a a

Returns the sum of the inputs.

aggrMean :: Aggr Double Double

Calculates the mean of the inputs.

aggrMax :: StaticSerialise a => Closure (Dict (Ord a)) -> Aggr a (Maybe a)

Returns the maximum of the inputs.

Returns Nothing on empty Datasets.

aggrMin :: StaticSerialise a => Closure (Dict (Ord a)) -> Aggr a (Maybe a)

Returns the minimum of the inputs.

Returns Nothing on empty Datasets.

aggrCollect :: StaticSerialise a => Aggr a [a]

Collects the inputs as a list.

Warning: Ordering of the resulting list is non-deterministic.

aggrDistinct :: forall a. (StaticSerialise a, StaticHashable a) => Aggr a (HashSet a)

Collects the distinct inputs into a HashSet.

aggrTopK
  :: (StaticSerialise a, Typeable k)
  => Closure (Dict (Ord k))
  -> Int                -- ^ Number of rows to return
  -> Closure (a -> k)   -- ^ Sorting key
  -> Aggr a [a]

Returns the n greatest elements according to a key function f. Similar to: take n . sortOn (Down . f)

Warning: The ordering of elements with equal keys is non-deterministic.
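A top-k query fits a distributed setting because it can plausibly be computed in two phases: each partition keeps only its own n greatest rows, and the surviving candidates are merged and trimmed again. The sketch below shows that idea locally; topK and topKPartitioned are hypothetical names, and the library's actual strategy is not confirmed here.

```haskell
module Main where

import Data.List (sortOn)
import Data.Ord (Down (..))

-- Reference semantics from the documentation: the n greatest elements by key.
topK :: Ord k => Int -> (a -> k) -> [a] -> [a]
topK n f = take n . sortOn (Down . f)

-- Hypothetical two-phase version: every partition keeps its own top n,
-- then the candidates are merged and trimmed to n again. Any row ranked
-- among the n greatest overall is also among the n greatest of its own
-- partition (up to ties), so the result has the same keys; which of
-- several tied rows survives may differ, as the warning above says.
topKPartitioned :: Ord k => Int -> (a -> k) -> [[a]] -> [a]
topKPartitioned n f partitions = topK n f (concatMap (topK n f) partitions)

main :: IO ()
main = do
  let partitions = [[5, 1, 9], [7, 3], [8, 2, 6]] :: [[Int]]
  print (topKPartitioned 3 id partitions) -- [9,8,7]
  print (topK 3 id (concat partitions))   -- [9,8,7]
```

Only n rows per partition need to cross the network, regardless of partition size.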

aggrBottomK
  :: (StaticSerialise a, Typeable k)
  => Closure (Dict (Ord k))
  -> Int                -- ^ Number of rows to return
  -> Closure (a -> k)   -- ^ Sorting key
  -> Aggr a [a]

Returns the n least elements according to a key function f. Similar to: take n . sortOn f

Warning: The ordering of elements with equal keys is non-deterministic.
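The local reference semantics of a bottom-k query sorts ascending by key before taking n rows. The sketch below (bottomK and topK are hypothetical local helpers, not library functions) also shows that wrapping the key in Down turns a top-k query into a bottom-k one.

```haskell
module Main where

import Data.List (sortOn)
import Data.Ord (Down (..))

-- Reference semantics: the n least elements by key.
bottomK :: Ord k => Int -> (a -> k) -> [a] -> [a]
bottomK n f = take n . sortOn f

-- Reference semantics: the n greatest elements by key.
topK :: Ord k => Int -> (a -> k) -> [a] -> [a]
topK n f = take n . sortOn (Down . f)

main :: IO ()
main = do
  let rows = [("x", 3), ("y", 1), ("z", 2)] :: [(String, Int)]
  print (bottomK 2 snd rows)
  -- Reversing the key order turns a top-k query into a bottom-k query.
  print (topK 2 (Down . snd) rows)
```

Both lines print [("y",1),("z",2)].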

aggrFiltered :: Closure (a -> Bool) -> Aggr a b -> Aggr a b

Returns a new Aggr which only aggregates rows matching the predicate, discarding others.
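A filtered aggregation can be modelled by a step function that leaves the accumulator untouched for rows failing the predicate. The sketch below uses a hypothetical local Fold type and helper names (runFold, filtered, countF) for illustration only. One plausible benefit of filtering inside the aggregation rather than filtering the Dataset is that a filtered Aggr can be composed with unfiltered ones that still see every row.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical local stand-in for an aggregation: step function + initial state.
data Fold a b = Fold (b -> a -> b) b

runFold :: Fold a b -> [a] -> b
runFold (Fold step z) = foldl' step z

-- Rows failing the predicate are skipped; the accumulator is unchanged.
filtered :: (a -> Bool) -> Fold a b -> Fold a b
filtered p (Fold step z) = Fold (\acc a -> if p a then step acc a else acc) z

-- Counts the rows it sees.
countF :: Fold a Int
countF = Fold (\n _ -> n + 1) 0

main :: IO ()
main = print (runFold (filtered even countF) [1 .. 10 :: Int])
```

Running main prints 5, the number of even values between 1 and 10.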

Creating Aggrs

aggrFromMonoid :: StaticSerialise a => Closure (Dict (Monoid a)) -> Aggr a a

Creates an aggregation from a Monoid instance.
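A Monoid is enough for a distributed aggregation because associativity lets each partition be combined on its own, and mempty gives a well-defined result for empty partitions (which is why the result type is a rather than Maybe a). The sketch below evaluates that idea locally; aggregateMonoid is a hypothetical helper, and the partial results that a cluster would shuffle are just an in-memory list here. Since ordering across partitions is presumably not guaranteed, a commutative Monoid such as Sum is the safe choice.

```haskell
module Main where

import Data.Monoid (Product (..), Sum (..))

-- Hypothetical two-phase evaluation: reduce each partition with mconcat,
-- then combine the partial results (shuffled, in a real cluster) with mconcat.
aggregateMonoid :: Monoid a => [[a]] -> a
aggregateMonoid = mconcat . map mconcat

main :: IO ()
main = do
  let partitions = [[Sum 1, Sum 2], [], [Sum 3, Sum 4]] :: [[Sum Int]]
  print (getSum (aggregateMonoid partitions))                            -- 10
  print (getProduct (aggregateMonoid [[Product 2, Product 3], [Product 4 :: Product Int]])) -- 24
```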

aggrFromReduce :: StaticSerialise a => Closure (a -> a -> a) -> Aggr a (Maybe a)

Creates an aggregation from a reduce function.

Returns Nothing on empty Datasets.
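Unlike a Monoid, a bare reduce function has no identity element, so an empty partition has nothing to return; that is why the result is a Maybe. The local sketch below (reducePartition and aggregateReduce are hypothetical names) reduces each partition, discards empty ones and reduces the partial results again. It assumes the reduce function is associative, and since ordering across partitions is presumably not fixed, commutative as well (max and (+) are both).

```haskell
module Main where

import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Reduce one partition; an empty partition has nothing to reduce.
reducePartition :: (a -> a -> a) -> [a] -> Maybe a
reducePartition _ [] = Nothing
reducePartition f (x : xs) = Just (foldl' f x xs)

-- Hypothetical two-phase evaluation: reduce each partition, drop the empty
-- ones, then reduce the surviving partial results the same way.
aggregateReduce :: (a -> a -> a) -> [[a]] -> Maybe a
aggregateReduce f = reducePartition f . mapMaybe (reducePartition f)

main :: IO ()
main = do
  print (aggregateReduce max [[3, 8], [], [5 :: Int]]) -- Just 8
  print (aggregateReduce max ([[], []] :: [[Int]]))    -- Nothing
```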

aggrFromFold
  :: (StaticSerialise t, Typeable a, Typeable b)
  => Closure (Fold a t)   -- ^ Fold to run before the shuffle
  -> Closure (Fold t b)   -- ^ Fold to run after the shuffle
  -> Aggr a b

Creates an aggregation from two Folds.

This is the most primitive way to create an aggregation; use the other functions if possible.

The first Fold is applied to each partition, and its results are shuffled and fed to the second Fold.
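The two-Fold scheme can be simulated locally. The Fold type in the signature is presumably the one from the foldl package (a step function, an initial state and an extraction function); the sketch below defines an equivalent local type so it has no dependencies. runFold, runTwoPhase, sumCount and meanOfPartials are hypothetical names. The first fold turns each partition into an intermediate value of type t (here a (sum, count) pair, which is what would need StaticSerialise to be shuffled), and the second fold combines those intermediates into the final mean.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.List (foldl')

-- Local stand-in with the same shape as Control.Foldl.Fold.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

runFold :: Fold a b -> [a] -> b
runFold (Fold step z done) xs = done (foldl' step z xs)

-- Hypothetical evaluation: the first fold runs on each partition, and the
-- second fold runs over the partial results (shuffled, in a real cluster).
runTwoPhase :: Fold a t -> Fold t b -> [[a]] -> b
runTwoPhase before after partitions = runFold after (map (runFold before) partitions)

-- Before the shuffle: per-partition (sum, count).
sumCount :: Fold Double (Double, Int)
sumCount = Fold (\(s, n) x -> (s + x, n + 1)) (0, 0) id

-- After the shuffle: add up the partial pairs, then divide.
meanOfPartials :: Fold (Double, Int) Double
meanOfPartials =
  Fold (\(s, n) (s', n') -> (s + s', n + n')) (0, 0) (\(s, n) -> s / fromIntegral n)

main :: IO ()
main = print (runTwoPhase sumCount meanOfPartials [[1, 2], [3], [4, 5, 6]])
```

Running main prints 3.5 (a sum of 21 over 6 rows). Shipping only a small (sum, count) pair per partition, instead of the raw rows, is what makes this kind of aggregation cheap to shuffle.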