public class MergingDigest extends AbstractTDigest
This can be very fast because the cost of sorting and merging is amortized over several insertions. If we keep N centroids in total and the input array is k long, then the amortized cost is something like
N/k + log k
These costs even out when N/k = log k. Balancing costs is often a good place to start when optimizing an algorithm. For different values of the compression factor, the following table shows estimated asymptotic values of N and suggested values of k:
| Compression | N | k |
|---|---|---|
| 50 | 78 | 25 |
| 100 | 157 | 42 |
| 200 | 314 | 73 |
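The rows of the table can be reproduced approximately with a short calculation. The sketch below rests on two assumptions that the text does not state: that the logarithm in the cost model is the natural log, and that N is roughly (π/2) × compression, which is what the three rows suggest. The class and method names are hypothetical and are not part of the t-digest library.

```java
// Hypothetical helper for illustration; not part of the t-digest library.
class BufferSizeEstimate {
    // Assumption: N is about (pi / 2) * compression, inferred from the table rows.
    static double estimatedCentroids(double compression) {
        return Math.PI / 2 * compression;
    }

    // Balances the two cost terms by solving N / k = ln(k), i.e. k * ln(k) = N.
    // Assumption: "log" in the cost model means the natural logarithm.
    static long suggestedBufferSize(double compression) {
        double n = estimatedCentroids(compression);
        double lo = 1;               // 1 * ln(1) = 0, which is below n
        double hi = Math.max(n, 3);  // hi * ln(hi) >= n whenever hi >= e
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (mid * Math.log(mid) < n) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return Math.round(lo);
    }

    public static void main(String[] args) {
        for (double c : new double[] {50, 100, 200}) {
            System.out.printf("compression=%.0f N=%.1f k=%d%n",
                    c, estimatedCentroids(c), suggestedBufferSize(c));
        }
    }
}
```

Under these assumptions the balanced k comes out at the table's suggested values for compression 50, 100 and 200. A different logarithm base or a different estimate of N would shift the result.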
The virtues of this kind of t-digest implementation include:
The current implementation takes the liberty of using ping-pong buffers to implement the merge, which incurs a substantial memory penalty. An in-place merge was not considered worth the added complexity: even with this overhead, the memory cost is less than 40 bytes per centroid, which is much less than half of what the AVLTreeDigest uses, and no dynamic allocation is required at all.
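To make the ping-pong idea concrete, here is a minimal sketch of merging a sorted buffer into a sorted array by writing into a second, preallocated array and then swapping the two. It is an illustration only, not the library's code: the class name and methods are hypothetical, and it omits the step in which a t-digest combines neighboring centroids during the merge.

```java
import java.util.Arrays;

// Hypothetical illustration of ping-pong buffers; not the library's code.
class PingPongMerge {
    private double[] front; // holds the current sorted values
    private double[] back;  // preallocated scratch space for the next merge
    private int size;       // number of valid entries in front

    PingPongMerge(int capacity) {
        front = new double[capacity];
        back = new double[capacity];
    }

    // Sorts the incoming buffer in place, merges it with front into back,
    // then swaps the roles of the two arrays. No array is allocated here.
    void merge(double[] incoming) {
        if (size + incoming.length > back.length) {
            throw new IllegalArgumentException("capacity exceeded");
        }
        Arrays.sort(incoming);
        int i = 0, j = 0, out = 0;
        while (i < size && j < incoming.length) {
            back[out++] = front[i] <= incoming[j] ? front[i++] : incoming[j++];
        }
        while (i < size) back[out++] = front[i++];
        while (j < incoming.length) back[out++] = incoming[j++];
        double[] tmp = front;
        front = back;
        back = tmp;
        size = out;
    }

    // Returns a copy of the merged values for inspection.
    double[] values() {
        return Arrays.copyOf(front, size);
    }
}
```

The second array doubles the storage for the merged values, which is the memory penalty the text refers to. In exchange, the merge is a single forward pass with no shifting of elements.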
| Modifier and Type | Class and Description |
|---|---|
| static class | MergingDigest.Encoding |
| Modifier and Type | Field and Description |
|---|---|
| boolean | useAlternatingSort |
| boolean | useTwoLevelCompression |
| static boolean | useWeightLimit |
| Constructor and Description |
|---|
| MergingDigest(double compression): Allocates a buffer merging t-digest. |
| MergingDigest(double compression, int bufferSize): If you know the size of the temporary buffer for incoming points, you can use this entry point. |
| MergingDigest(double compression, int bufferSize, int size): Fully specified constructor. |
| Modifier and Type | Method and Description |
|---|---|
| void | add(double x, int w): Adds a sample to a histogram. |
| void | add(List<? extends TDigest> others) |
| void | asBytes(ByteBuffer buf): Serialize this TDigest into a byte buffer. |
| void | asSmallBytes(ByteBuffer buf): Serialize this TDigest into a byte buffer. |
| int | byteSize(): Returns the number of bytes required to encode this TDigest using asBytes(ByteBuffer). |
| double | cdf(double x): Returns the fraction of all points added which are ≤ x. |
| int | centroidCount() |
| Collection<Centroid> | centroids(): A Collection that lets you go through the centroids in ascending order by mean. |
| void | compress(): Merges any pending inputs and compresses the data down to the public setting. |
| double | compression(): Returns the current compression factor. |
| static MergingDigest | fromBytes(ByteBuffer buf) |
| ScaleFunction | getScaleFunction() |
| double | quantile(double q): Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff. |
| TDigest | recordAllData(): Turns on internal data recording. |
| void | setScaleFunction(ScaleFunction scaleFunction) |
| long | size(): Returns the number of points that have been added to this TDigest. |
| int | smallByteSize(): Returns the number of bytes required to encode this TDigest using asSmallBytes(ByteBuffer). |
| String | toString() |
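The descriptions of cdf and quantile are easier to read against an exact reference. The sketch below computes both quantities directly from a sorted, non-empty array of samples, following the wording in the table. It is a hypothetical helper for comparison only: a t-digest estimates these values from its centroids and may interpolate, so its results will generally differ slightly.

```java
// Hypothetical exact reference; a t-digest only approximates these values.
class ExactQuantiles {
    // Fraction of samples that are <= x. Assumes a sorted, non-empty array.
    static double cdf(double[] sorted, double x) {
        int count = 0;
        while (count < sorted.length && sorted[count] <= x) {
            count++;
        }
        return (double) count / sorted.length;
    }

    // Smallest sample v such that at least a fraction q of the samples are <= v.
    // Assumes a sorted, non-empty array.
    static double quantile(double[] sorted, double q) {
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For the samples 1, 2, 3, 4, half of the points are ≤ 2, so cdf at 2 is 0.5, and the cutoff for q = 0.5 is 2. The two functions are inverse to each other in this sense.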
Methods inherited from class AbstractTDigest: add, add, createCentroid, isRecording

Methods inherited from class TDigest: createAvlTreeDigest, createDigest, createMergingDigest, getMax, getMin

public boolean useAlternatingSort

public boolean useTwoLevelCompression

public static boolean useWeightLimit

public MergingDigest(double compression)
compression - The compression factor

public MergingDigest(double compression, int bufferSize)
compression - Compression factor for t-digest. Same as 1/δ in the paper.
bufferSize - How many samples to retain before merging.

public MergingDigest(double compression, int bufferSize, int size)
compression - Compression factor
bufferSize - Number of temporary centroids
size - Size of main buffer

public TDigest recordAllData()
recordAllData in class AbstractTDigest

public void add(double x, int w)
Description copied from class TDigest.

public void compress()

public long size()
Description copied from class TDigest.

public double cdf(double x)
Description copied from class TDigest.

public double quantile(double q)
Description copied from class TDigest.

public int centroidCount()
centroidCount in class TDigest

public Collection<Centroid> centroids()
Description copied from class TDigest.
A Collection that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.

public double compression()
Description copied from class TDigest.
compression in class TDigest

public int byteSize()
Description copied from class TDigest.

public int smallByteSize()
Description copied from class TDigest.
smallByteSize in class TDigest

public ScaleFunction getScaleFunction()

public void setScaleFunction(ScaleFunction scaleFunction)
setScaleFunction in class TDigest

public void asBytes(ByteBuffer buf)
Description copied from class TDigest.

public void asSmallBytes(ByteBuffer buf)
Description copied from class TDigest.
asSmallBytes in class TDigest
buf - The byte buffer into which the TDigest should be serialized.

public static MergingDigest fromBytes(ByteBuffer buf)
Copyright © 2021. All rights reserved.