Struct cmb_dataset

Struct Documentation

struct cmb_dataset

An automatically resizing array of (possibly unordered) sample values, each sample a double.

Public Functions

struct cmb_dataset *cmb_dataset_create(void)

Allocate memory for a dataset.

Remember to call a matching cmb_dataset_destroy when done to avoid memory leakage.

Returns:

A freshly allocated dataset object.

void cmb_dataset_initialize(struct cmb_dataset *dsp)

Initialize the dataset, clearing any data values.

Parameters:
  • dsp – Pointer to an already allocated dataset object.

void cmb_dataset_reset(struct cmb_dataset *dsp)

Re-initialize the dataset, returning it to a newly initialized state.

Parameters:
  • dsp – Pointer to an already allocated dataset object.

void cmb_dataset_terminate(struct cmb_dataset *dsp)

Un-initialize the dataset, returning it to a newly created state.

Parameters:
  • dsp – Pointer to an already allocated dataset object.

uint64_t cmb_dataset_copy(struct cmb_dataset *tgt, const struct cmb_dataset *src)

Copy src into tgt, overwriting whatever was in tgt.

Parameters:
  • tgt – Pointer to the target dataset object.

  • src – Pointer to the source dataset object.

Returns:

Number of data points copied.

uint64_t cmb_dataset_merge(struct cmb_dataset *tgt, const struct cmb_dataset *s1, const struct cmb_dataset *s2)

Merge datasets s1 and s2 into dataset tgt. The target may be one of the two sources, but it must not be NULL.

Parameters:
  • tgt – Pointer to the target dataset object.

  • s1 – Pointer to the first source dataset object.

  • s2 – Pointer to the second source dataset object.

Returns:

Number of data points in the merged data set.

void cmb_dataset_destroy(struct cmb_dataset *dsp)

Free memory allocated by cmb_dataset_create for the dataset and its arrays.

Do not call unless the dataset was created on the heap by cmb_dataset_create. Otherwise, only use cmb_dataset_terminate to free the internal data array.

Parameters:
  • dsp – Pointer to a previously allocated dataset object.
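The split between create/destroy (the object itself) and initialize/terminate (the internal data array) can be illustrated with a minimal sketch. The sketch_ds type and functions below are hypothetical stand-ins, not the library's code. In particular, this sketch's create also initializes the object so that destroy is always safe; whether cmb_dataset_create does the same is not stated here.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cmb_dataset; the cookie member is omitted. */
struct sketch_ds {
    uint64_t cursize;  /* allocated capacity, in samples */
    uint64_t count;    /* samples currently stored */
    double min;
    double max;
    double *xa;        /* data array, NULL until first use */
};

/* Put an already allocated object into the empty state. */
void sketch_ds_initialize(struct sketch_ds *dsp)
{
    assert(dsp != NULL);
    dsp->cursize = 0;
    dsp->count = 0;
    dsp->min = DBL_MAX;
    dsp->max = -DBL_MAX;
    dsp->xa = NULL;
}

/* Release the internal array only; the object itself stays allocated. */
void sketch_ds_terminate(struct sketch_ds *dsp)
{
    assert(dsp != NULL);
    free(dsp->xa);          /* free(NULL) is a no-op */
    dsp->xa = NULL;
    dsp->cursize = 0;
    dsp->count = 0;
}

/* Heap allocation. Assumption of this sketch: the new object is also
 * initialized, so destroying it right away is safe. */
struct sketch_ds *sketch_ds_create(void)
{
    struct sketch_ds *dsp = malloc(sizeof *dsp);
    if (dsp != NULL) {
        sketch_ds_initialize(dsp);
    }
    return dsp;
}

/* Free the internal array and the object. Only valid for heap objects. */
void sketch_ds_destroy(struct sketch_ds *dsp)
{
    sketch_ds_terminate(dsp);
    free(dsp);
}
```

With this pattern, an object declared on the stack or embedded in another struct uses only the initialize/terminate pair, while a heap object is bracketed by create and destroy.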

void cmb_dataset_sort(const struct cmb_dataset *dsp)

Sort the data array in ascending order.

Parameters:
  • dsp – Pointer to a dataset object.

uint64_t cmb_dataset_add(struct cmb_dataset *dsp, double x)

Add a single value to a dataset, resizing the array as needed.

Parameters:
  • dsp – Pointer to a dataset object.

  • x – The new sample value to add.

Returns:

The new number of data values in the array.
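As an illustration of "resizing the array as needed", the following sketch appends to a growable array and keeps the running minimum and maximum up to date. The sketch_ds type, the initial capacity of 8, the doubling policy, and the 0 return on allocation failure are all assumptions made for the example, not documented behavior of cmb_dataset_add.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in mirroring the documented public members. */
struct sketch_ds {
    uint64_t cursize;  /* allocated capacity, in samples */
    uint64_t count;    /* samples currently stored */
    double min;        /* DBL_MAX while empty */
    double max;        /* -DBL_MAX while empty */
    double *xa;        /* data array, NULL while nothing is allocated */
};

#define SKETCH_INIT_SIZE 8u  /* hypothetical initial capacity */

/* Append x, growing the array when it is full.
 * Returns the new sample count, or 0 if allocation fails (sketch convention). */
uint64_t sketch_ds_add(struct sketch_ds *dsp, double x)
{
    if (dsp->count == dsp->cursize) {
        /* Doubling keeps the amortized cost of an append constant. */
        uint64_t newsize = (dsp->cursize == 0) ? SKETCH_INIT_SIZE
                                               : 2 * dsp->cursize;
        double *tmp = realloc(dsp->xa, newsize * sizeof(double));
        if (tmp == NULL) {
            return 0;  /* the old array is still valid and unchanged */
        }
        dsp->xa = tmp;
        dsp->cursize = newsize;
    }

    dsp->xa[dsp->count++] = x;
    if (x < dsp->min) {
        dsp->min = x;
    }
    if (x > dsp->max) {
        dsp->max = x;
    }

    return dsp->count;
}
```

Starting min at DBL_MAX and max at -DBL_MAX, as the member documentation describes, means the very first sample updates both without a special case.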

uint64_t cmb_dataset_summarize(const struct cmb_dataset *dsp, struct cmb_datasummary *dsump)

Calculate summary statistics of the data series.

Parameters:
  • dsp – Pointer to a dataset object.

  • dsump – Pointer to a data summary object to store the results.

Returns:

The number of data values included in the summary.
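The members of struct cmb_datasummary are not described in this section. As a rough sketch of what a single summarizing pass over the data array could compute, the example below accumulates the mean and sample variance with Welford's method. The sketch_summary type and its fields are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical result type; not the layout of struct cmb_datasummary. */
struct sketch_summary {
    uint64_t count;
    double mean;
    double variance;  /* sample variance, n - 1 in the denominator */
};

/* One pass over the data using Welford's update, which avoids the
 * cancellation problems of the naive sum-of-squares formula.
 * Returns the number of values included. */
uint64_t sketch_summarize(const double *xa, uint64_t count,
                          struct sketch_summary *out)
{
    double mean = 0.0;
    double m2 = 0.0;  /* running sum of squared deviations from the mean */

    for (uint64_t i = 0; i < count; i++) {
        double delta = xa[i] - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * (xa[i] - mean);
    }

    out->count = count;
    out->mean = mean;
    out->variance = (count > 1) ? m2 / (double)(count - 1) : 0.0;

    return count;
}
```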

double cmb_dataset_median(const struct cmb_dataset *dsp)

Calculate and return the median of the dataset.

May be somewhat time-consuming, since it first needs to sort the data array. Calling it on an empty dataset will generate a warning and return zero.

Parameters:
  • dsp – Pointer to a dataset object.

Returns:

The median of the data set, zero if no data yet.
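A minimal sketch of a sort-based median follows. It works on a plain array rather than a dataset object, sorts a private copy (whether the library sorts in place or a copy is not stated here), and mirrors the documented convention of returning zero for empty input. The name sketch_median is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Ascending comparison for qsort. */
static int sketch_cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of count values; 0.0 for an empty array or on allocation failure. */
double sketch_median(const double *xa, uint64_t count)
{
    if (count == 0) {
        return 0.0;
    }

    double *tmp = malloc(count * sizeof(double));
    if (tmp == NULL) {
        return 0.0;
    }
    memcpy(tmp, xa, count * sizeof(double));
    qsort(tmp, count, sizeof(double), sketch_cmp_double);

    /* Odd count: the middle element. Even count: mean of the two middle ones. */
    uint64_t mid = count / 2;
    double med = (count % 2 == 1) ? tmp[mid]
                                  : 0.5 * (tmp[mid - 1] + tmp[mid]);
    free(tmp);

    return med;
}
```

The sort dominates the cost at O(n log n), which is why the call can be slow on large datasets.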

void cmb_dataset_fivenum_print(const struct cmb_dataset *dsp, FILE *fp, bool lead_ins)

Calculate and print the “five-number” summary of dataset quantiles, i.e., minimum, first quartile, median, third quartile, and maximum.

Parameters:
  • dsp – Pointer to a dataset object.

  • fp – A valid file pointer, possibly stdout.

  • lead_ins – Flag for whether to add lead-in texts or just print the numeric values.
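Several quartile definitions are in common use, and this section does not say which one the library applies. The sketch below assumes linear interpolation between neighboring order statistics of an already sorted array. Both function names are hypothetical.

```c
#include <stdint.h>

/* Quantile p in [0, 1] of an ascending-sorted array, interpolating
 * linearly between the two nearest order statistics (an assumed rule). */
double sketch_quantile_sorted(const double *sorted, uint64_t count, double p)
{
    if (count == 0) {
        return 0.0;
    }

    double pos = p * (double)(count - 1);
    uint64_t lo = (uint64_t)pos;
    if (lo >= count - 1) {
        return sorted[count - 1];
    }

    double frac = pos - (double)lo;
    return sorted[lo] + frac * (sorted[lo + 1] - sorted[lo]);
}

/* Fill out[] with minimum, first quartile, median, third quartile, maximum. */
void sketch_fivenum(const double *sorted, uint64_t count, double out[5])
{
    static const double probs[5] = { 0.0, 0.25, 0.5, 0.75, 1.0 };

    for (int i = 0; i < 5; i++) {
        out[i] = sketch_quantile_sorted(sorted, count, probs[i]);
    }
}
```

For the sorted input 1, 2, 3, 4 this rule gives 1, 1.75, 2.5, 3.25, 4. Other quartile conventions would give slightly different first and third quartiles for the same data.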

void cmb_dataset_histogram_print(const struct cmb_dataset *dsp, FILE *fp, unsigned num_bins, double low_lim, double high_lim)

Print a simple character-based histogram. Autoscales to the dataset range if low_lim == high_lim.

Will print the symbol ‘#’ for a full bar “pixel”, ‘=’ for one that is more than half full, and ‘-’ for one that is less than half full.

Adds overflow bins to the ends of the range to catch anything outside.

Parameters:
  • dsp – Pointer to a dataset object.

  • fp – A valid file pointer, possibly stdout.

  • num_bins – The number of bins, not including the two overflow bins.

  • low_lim – The lower limit for the bin range.

  • high_lim – The upper limit for the bin range.
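The binning described above (num_bins regular bins, one overflow bin at each end, autoscaling when both limits are equal) can be sketched as a counting function. The name sketch_hist_counts, the bin index layout, and the choice to count a value equal to high_lim in the last regular bin are assumptions of this example.

```c
#include <stdint.h>

/* Count samples into bins[0 .. num_bins + 1]:
 *   bins[0]               values below low_lim
 *   bins[1 .. num_bins]   equal-width bins covering [low_lim, high_lim]
 *   bins[num_bins + 1]    values above high_lim
 * If low_lim == high_lim, the range is taken from the data instead. */
void sketch_hist_counts(const double *xa, uint64_t count, unsigned num_bins,
                        double low_lim, double high_lim, uint64_t *bins)
{
    for (unsigned b = 0; b < num_bins + 2; b++) {
        bins[b] = 0;
    }
    if (count == 0 || num_bins == 0) {
        return;
    }

    if (low_lim == high_lim) {
        /* Autoscale to the observed minimum and maximum. */
        low_lim = xa[0];
        high_lim = xa[0];
        for (uint64_t i = 1; i < count; i++) {
            if (xa[i] < low_lim) low_lim = xa[i];
            if (xa[i] > high_lim) high_lim = xa[i];
        }
    }

    double width = (high_lim - low_lim) / (double)num_bins;
    for (uint64_t i = 0; i < count; i++) {
        double x = xa[i];
        unsigned b;
        if (x < low_lim) {
            b = 0;
        } else if (x > high_lim) {
            b = num_bins + 1;
        } else if (width <= 0.0) {
            b = 1;  /* all values identical: everything lands in one bin */
        } else {
            b = 1 + (unsigned)((x - low_lim) / width);
            if (b > num_bins) {
                b = num_bins;  /* x == high_lim goes in the last regular bin */
            }
        }
        bins[b]++;
    }
}
```

When autoscaling, the two overflow bins stay empty by construction, since the limits are the observed extremes.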

void cmb_dataset_print(const struct cmb_dataset *dsp, FILE *fp)

Print the raw data values in a single column.

Parameters:
  • dsp – Pointer to a dataset object.

  • fp – A valid file pointer, possibly stdout.

void cmb_dataset_ACF(const struct cmb_dataset *dsp, unsigned n, double *acf)

Calculate autocorrelation coefficients.

Parameters:
  • dsp – Pointer to a dataset object.

  • n – The highest lag value to calculate.

  • acf – The array where the ACFs will be stored, size n + 1.
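The exact estimator is not given in this section. A common choice, assumed in the sketch below, divides the lag-k sum of products of deviations from the mean by the lag-0 sum, so that acf[0] is 1 for any non-constant series. The name sketch_acf is hypothetical, and the function takes a plain array instead of a dataset object.

```c
#include <stdint.h>

/* Sample autocorrelation for lags 0..n, written to acf[0..n].
 * Assumed estimator:
 *   r_k = sum_{t=0}^{N-k-1} (x_t - m)(x_{t+k} - m) / sum_{t=0}^{N-1} (x_t - m)^2
 * A constant or empty series has no defined autocorrelation; this sketch
 * writes 0.0 for every lag in that case. */
void sketch_acf(const double *xa, uint64_t count, unsigned n, double *acf)
{
    double mean = 0.0;
    for (uint64_t i = 0; i < count; i++) {
        mean += xa[i];
    }
    if (count > 0) {
        mean /= (double)count;
    }

    double c0 = 0.0;  /* lag-0 sum of squared deviations */
    for (uint64_t i = 0; i < count; i++) {
        c0 += (xa[i] - mean) * (xa[i] - mean);
    }

    for (unsigned k = 0; k <= n; k++) {
        double ck = 0.0;
        for (uint64_t t = 0; t + k < count; t++) {
            ck += (xa[t] - mean) * (xa[t + k] - mean);
        }
        acf[k] = (c0 > 0.0) ? ck / c0 : 0.0;
    }
}
```

Note the output size: lags 0 through n inclusive need n + 1 elements, matching the size stated for the acf parameter.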

void cmb_dataset_PACF(const struct cmb_dataset *dsp, unsigned n, double *pacf, double *acf)

Calculate partial autocorrelation coefficients.

The first and most time-consuming step in the algorithm is to calculate the ACFs. If these have already been calculated, they can be passed as the last argument, acf. If this argument is NULL, they will be calculated directly from the dataset during the call.

Parameters:
  • dsp – Pointer to a dataset object.

  • n – The highest lag value to calculate.

  • pacf – The array where the PACFs will be stored, size n + 1.

  • acf – Array of ACFs if already calculated, size n + 1, otherwise NULL.
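One standard way to obtain partial autocorrelations from a vector of ACFs is the Durbin-Levinson recursion. The sketch below shows that recursion. Whether the library uses this particular algorithm is not stated here, and the name sketch_pacf_from_acf, the convention pacf[0] = 1, and the int return code are assumptions of the example.

```c
#include <stdlib.h>

/* Durbin-Levinson recursion: fill pacf[0..n] from acf[0..n].
 * Returns 0 on success, -1 if the work arrays cannot be allocated. */
int sketch_pacf_from_acf(const double *acf, unsigned n, double *pacf)
{
    pacf[0] = 1.0;  /* assumed convention for lag 0 */
    if (n == 0) {
        return 0;
    }

    /* prev[j] and cur[j] hold the AR coefficients phi(k-1, j) and phi(k, j). */
    double *prev = calloc((size_t)n + 1, sizeof(double));
    double *cur = calloc((size_t)n + 1, sizeof(double));
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1;
    }

    prev[1] = acf[1];
    pacf[1] = acf[1];

    for (unsigned k = 2; k <= n; k++) {
        double num = acf[k];
        double den = 1.0;
        for (unsigned j = 1; j < k; j++) {
            num -= prev[j] * acf[k - j];
            den -= prev[j] * acf[j];
        }
        double phi_kk = (den != 0.0) ? num / den : 0.0;

        /* Update the lower-order coefficients for the next step. */
        for (unsigned j = 1; j < k; j++) {
            cur[j] = prev[j] - phi_kk * prev[k - j];
        }
        cur[k] = phi_kk;
        pacf[k] = phi_kk;

        double *swap = prev;
        prev = cur;
        cur = swap;
    }

    free(prev);
    free(cur);
    return 0;
}
```

As a sanity check, ACFs that decay geometrically (1, 0.5, 0.25, 0.125), as for a first-order autoregressive process, give a PACF of 0.5 at lag 1 and zero at higher lags.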

void cmb_dataset_correlogram_print(const struct cmb_dataset *dsp, FILE *fp, unsigned n, double *acf)

Print a simple correlogram of the autocorrelation coefficients previously calculated, either ACFs or PACFs.

If the data vector acf[] is NULL, ACFs will be calculated directly from the dataset by calling cmb_dataset_ACF.

To print PACFs, give a vector of PACFs as the acf argument.

Parameters:
  • dsp – Pointer to a dataset object.

  • fp – A valid file pointer, possibly stdout.

  • n – The highest lag value to include.

  • acf – Array of ACFs or PACFs to print, size n + 1, or NULL to have the ACFs calculated from the dataset.

Public Members

uint64_t cookie

A “magic cookie” to catch uninitialized objects

uint64_t cursize

The currently allocated space as a number of samples

uint64_t count

The current number of samples in the array

double min

Smallest sample, initially DBL_MAX

double max

Largest sample, initially -DBL_MAX

double *xa

Pointer to the actual data array, initially NULL

Public Static Functions

static inline uint64_t cmb_dataset_count(const struct cmb_dataset *dsp)

Count the number of data values.

Parameters:
  • dsp – Pointer to a dataset object.

Returns:

The number of data values in the data set.

static inline double cmb_dataset_min(const struct cmb_dataset *dsp)

The minimum sample value in the dataset.

Parameters:
  • dsp – Pointer to a dataset object.

Returns:

The minimum data value in the data set, DBL_MAX if no data yet.

static inline double cmb_dataset_max(const struct cmb_dataset *dsp)

The maximum sample value in the dataset.

Parameters:
  • dsp – Pointer to a dataset object.

Returns:

The maximum data value in the data set, -DBL_MAX if no data yet.