Package 'bipl5'

Title: Construct Reactive Calibrated Axes Biplots
Description: A modern view on the principal component analysis biplot with calibrated axes. Create principal component analysis biplots rendered in HTML with significant reactivity embedded within the plot. Furthermore, the traditional biplot view is enhanced by translated axes with inter-class kernel densities superimposed. For more information on biplots, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0).
Authors: Ruan Buys [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-8527-8631>), Carel van der Merwe [aut, ths] (ORCID: <https://orcid.org/0000-0003-0676-8240>), Delia Sandilands [ctb] (ORCID: <https://orcid.org/0000-0001-9419-7286>), Sugnet Lubbe [ctb] (ORCID: <https://orcid.org/0000-0003-2762-9944>)
Maintainer: Ruan Buys
License: MIT + file LICENSE
Version: 1.1.0
Built: 2026-05-19 16:55:15 UTC
Source: https://github.com/ruanbuys/bipl5

Help Index


Append an mdsDisplay to a bipl5_biplot object

Description

Adds a new biplot layer for a specified pair of principal components. The pair is sorted automatically (e.g. c(5, 3) becomes c(3, 5)). Both PC indices must be between 1 and p (the number of variables), and the pair must not already exist.

Usage

append_mdsDisplay(object, eigenvectors)

## S3 method for class 'bipl5_biplot'
append_mdsDisplay(object, eigenvectors)

Arguments

object

A bipl5_biplot object

eigenvectors

Integer vector of length 2 giving the PC pair (e.g. c(4, 5))

Value

A new bipl5_biplot with the additional mdsDisplay appended
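The validation rules in the Description can be mirrored in a few lines of base R. The helper check_pc_pair() below is hypothetical. It only illustrates the documented checks (sort the pair, keep both indices in 1..p, reject a pair that already exists) and is not the code that append_mdsDisplay() runs internally.

```r
# Hypothetical helper that mirrors the documented rules for append_mdsDisplay();
# it is an illustration, not the package's internal validation code.
check_pc_pair <- function(eigenvectors, p, existing = list()) {
  if (length(eigenvectors) != 2) stop("eigenvectors must have length 2")
  pair <- sort(as.integer(eigenvectors))            # c(5, 3) becomes c(3, 5)
  if (any(pair < 1L | pair > p)) {
    stop("both PC indices must lie between 1 and p")
  }
  # 'existing' holds the PC pairs already stored as mdsDisplays
  taken <- vapply(existing,
                  function(e) identical(sort(as.integer(e)), pair),
                  logical(1))
  if (any(taken)) stop("this PC pair already exists")
  pair
}

check_pc_pair(c(5, 3), p = 6)                                  # returns c(3L, 5L)
try(check_pc_pair(c(1, 7), p = 6))                             # index out of range
try(check_pc_pair(c(2, 1), p = 4, existing = list(c(1, 2))))   # pair already present
```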


bipl5: Constructing Reactive Calibrated Axes Biplots

Description

A modern view on the PCA biplot with calibrated axes. Create PCA biplots rendered in HTML with significant reactivity embedded within the plot. Furthermore, the traditional biplot view is enhanced by translated axes with inter-class kernel densities superimposed. For more information on biplots, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0).

Details

Package: bipl5
Type: Package
Version: 1.1
Date: 18-12-2023
License: MIT

Author(s)

  • Ruan Buys (Maintainer)

  • Carel van der Merwe

  • Delia Sandilands (contributor)

  • Sugnet Lubbe (contributor)

Core Functions

Code Availability

The newest version of the package can be obtained on GitHub: https://github.com/RuanBuys/bipl5


bipl5 default color scales

Description

bipl5 default color scales

Usage

colorpal(number = 16)

Arguments

number

Integer - number of distinct colors to return. Ranges from 1 to 16.

Value

Character vector of default colors in bipl5. There are sixteen unique colors defined.

Examples

colorpal(number=7)

Extract nested components from a bipl5_biplot object

Description

Three calling styles are supported:

  1. mdsDisplay subset: extract(bp, mdsDisplay_12) — returns a new bipl5_biplot containing only that mdsDisplay (plottable).

  2. Two-level: extract(bp, from = mdsDisplay_12, what = sample_coordinates) — returns the nested data element.

  3. Arbitrary depth: extract(bp, mdsDisplay_12$Data$sample_coordinates) — returns the nested data element.

Usage

extract(object, expr, from, what)

## S3 method for class 'bipl5_biplot'
extract(object, expr, from, what)

Arguments

object

A bipl5_biplot object

expr

An unquoted mdsDisplay name (e.g. mdsDisplay_12) or a path expression using $ (e.g. mdsDisplay_12$Data$sample_coordinates or fit_measures$CumPred)

from

Unquoted name of the top-level element

what

Unquoted name of the nested element

Details

In addition to the mdsDisplay access patterns above, graph-based fit measures can be extracted directly with calls such as extract(bp, fit_measures, CumPred) or extract(bp, fit_measures$CumPred). Supported graph-based fit measures are CumPred, CumAd, VarExp, and Scree. These calls return a bipl5_fit object that can be passed to plot() to obtain a static ggplot2 version of the corresponding fit graph.

Value

A bipl5_biplot (mdsDisplay subset), a bipl5_fit object for graph-based fit measures, or the requested sub-element.

Examples

bp <- biplotEZ::biplot(iris[, 1:4]) |>
  biplotEZ::PCA() |>
  wrap_bipl5()

only_12 <- extract(bp, mdsDisplay_12)
data_obj <- extract(bp, from = mdsDisplay_12, what = Data)
coords <- extract(bp, mdsDisplay_12$Data$sample_coordinates)

fit_plot <- extract(bp, fit_measures, CumPred)
plot(fit_plot)

Format sample aesthetics on a bipl5_biplot

Description

format_samples() rebuilds the sample-trace block inside each mdsDisplay so that observations are grouped according to the by argument and rendered with one trace per visual class. As a result, the visible trace structure, legend labels, and stored sample-format metadata all stay aligned.

Usage

format_samples(
  x,
  stratify = c("col", "symbol"),
  by = NULL,
  col = NULL,
  pch = NULL
)

Arguments

x

A bipl5_biplot object.

stratify

Which aesthetic to change: "col" for marker colour or "symbol" for marker symbol.

by

Optional grouping variable for the sample traces. This can be:

  • a bare (unquoted) column name from the dataset supplied to init_biplot(),

  • a character string naming a column stored in the object, or

  • a vector or factor of length n, with one value per observation.

When NULL, the current sample grouping in x$meta$group is reused.

col

Optional vector of colours. When stratify = "col", this must have one value per visual class defined by by. If omitted, a default palette is used.

pch

Optional vector of plotting symbols. When stratify = "symbol", this must have one value per visual class defined by by. Numeric base-R pch codes are converted internally to plotly symbols; character plotly symbol names are also accepted.

Details

The function is intended for sample formatting only. It does not refit the underlying ordination model. In particular, for CVA biplots the fitted CVA classes are preserved and only the sample traces are reformatted.

A first call to format_samples() creates one sample legend section for the requested aesthetic. For example, format_samples(stratify = "col", by = Species) will colour the observations by Species and create a legend section headed Species with one entry per class.

A second call can be used to add a second, independent sample stratification. If the second call uses the same grouping variable as the first call, both aesthetics are applied to the same set of classes and the legend remains unified. If the second call uses a different grouping variable, format_samples() creates a second legend section and internally splits the observation layer into all observed combinations of the two grouping variables.

For example, the sequence

init_biplot(iris2) |> scale_mds("pca") |> format_samples(stratify = "col", by = Species) |> format_samples(stratify = "symbol", by = Band)

will produce two sample legend sections:

  • Species for the colour grouping

  • Band for the symbol grouping

The visible observation traces are then split by Species x Band, but these combination traces are hidden from the legend. Instead, format_samples() inserts legend-only sample entries so the legend remains easy to read.

If translated axes are available in the mdsDisplay, a colour stratification also rebuilds the kernel-density traces on the translated axes so that those densities reflect the colour classes. Symbol-only stratification does not change the translated-axis densities. This means:

  • format_samples(stratify = "col", ...) recalculates translated-axis densities by the colour grouping

  • format_samples(stratify = "symbol", ...) leaves the existing translated densities unchanged

The legend toggles operate across the full dual stratification:

  • clicking a colour legend entry hides or shows all observations belonging to that colour class, across every symbol class

  • clicking a symbol legend entry hides or shows all observations belonging to that symbol class, across every colour class

The formatting is applied to every mdsDisplay currently stored in the object. If additional displays are later added with append_mdsDisplay(), the stored sample-format state is reused so the new displays inherit the same sample legend structure.

format_samples() supports two complementary workflows.

Single stratification

A single call to format_samples() rebuilds the sample layer so that one trace is created per class in by. This updates the marker appearance, the legend entries, and the stored sample-format metadata consistently.

Second stratification

A second call to format_samples() can be used to add a second sample aesthetic. This is most useful when colour and plotting symbol represent different variables.

If the second call uses the same grouping structure as the first, the result is still one legend section with one entry per class, but each class now carries both a colour and a plotting symbol.

If the second call uses a different grouping structure, the object stores two independent sample legend sections. Internally, the observation layer is rebuilt as one hidden trace per observed combination of the two grouping variables. The visible legend then shows one section for each stratifying variable.

Translated-axis densities

When translated axes are present, the kernel-density traces on those axes are tied to the current colour grouping. Applying format_samples(stratify = "col", ...) rebuilds the translated-axis density traces so they match the colour classes. Applying format_samples(stratify = "symbol", ...) does not rebuild those densities.

So:

  • a first colour stratification updates both the sample layer and the translated-axis densities

  • a later symbol stratification leaves those densities as they are

  • if a symbol stratification is applied first and a colour stratification is added later, the translated-axis densities are rebuilt when the colour stratification is added

Legend click behaviour

When two different stratifications are active, the legend entries behave like filters:

  • clicking a class in the first legend section toggles all observations in that class, regardless of their membership in the second stratification

  • clicking a class in the second legend section toggles all observations in that class, regardless of their membership in the first stratification

So if colours represent Species and symbols represent Band, clicking setosa hides all setosa observations, while clicking class1 hides all class1 observations across every species.

Non-standard evaluation

If by is supplied as a bare column name, format_samples() looks for that column in the dataset stored by init_biplot(). If by is supplied as a character string, it is interpreted as the name of a stored column. If by is supplied as a vector, it must have one value per observation; in that case the legend title defaults to "Data" because there is no stored column name to display.

CVA note

format_samples() does not change the fitted CVA model. It only reformats the sample traces. The grouping used to fit the CVA model should therefore be specified in scale_mds(), not in format_samples().

Value

A modified bipl5_biplot.

Examples

bp <- init_biplot(iris) |>
  scale_mds(type = "pca", eigenvectors = c(1, 2))

bp_species <- format_samples(
  bp,
  stratify = "col",
  by = Species,
  col = c("tomato", "steelblue", "darkgreen")
)

sample_idx <- vapply(
  bp_species$mdsDisplay_12$mdsDisplay$trace_data,
  function(tr) "data" %in% unlist(tr$meta),
  logical(1)
)

vapply(
  bp_species$mdsDisplay_12$mdsDisplay$trace_data[sample_idx],
  `[[`,
  character(1),
  "name"
)

bp_symbol <- format_samples(
  bp,
  stratify = "symbol",
  by = Species,
  pch = c(16, 17, 15)
)

iris2 <- iris
iris2$Band <- factor(
  rep(c("class1", "class2", "class3", "class4"), length.out = nrow(iris2))
)

bp_dual <- init_biplot(iris2) |>
  scale_mds(type = "pca", eigenvectors = c(1, 2)) |>
  format_samples(
    stratify = "col",
    by = Species,
    col = c("tomato", "steelblue", "darkgreen")
  ) |>
  format_samples(
    stratify = "symbol",
    by = Band,
    pch = c(12, 13, 14, 15)
  )

# When plotted, the legend now has one section for Species and one for Band.
# Clicking a Species entry hides that species across all Band classes.
# Clicking a Band entry hides that Band class across all Species classes.
if (interactive()) {
  plot(bp_dual)
}

bp_species_13 <- append_mdsDisplay(bp_species, c(1, 3))

Create a bipl5 specification object

Description

init_biplot() stores the raw data and preprocessing options needed to construct a biplot later with scale_mds(). It does not perform any ordination itself. When data is a data frame containing both numeric and non-numeric columns, only the numeric columns are used for the biplot calculation, while the full data frame is retained for later formatting steps such as format_samples().

Usage

init_biplot(data, center = TRUE, scale = FALSE)

Arguments

data

A matrix or data frame. If a data frame contains non-numeric columns, they are stored but excluded from the ordination input.

center

Logical; should numeric variables be centered before analysis?

scale

Logical; should numeric variables be scaled before analysis?

Value

An object of class bipl5_spec.


Store a default fit-measure display mode on a biplot

Description

overlay_fit() is a convenience helper for pipelines. It does not refit the underlying ordination; it only stores whether fit measures should default to the right-hand panel or an overlay view when plot() is called.

Usage

overlay_fit(x, overlay = TRUE)

Arguments

x

A bipl5_biplot object with fit measures.

overlay

Logical scalar. TRUE stores "overlay"; FALSE stores "panel".

Details

A later call to plot(x, fit_display = ...) always takes precedence over the stored default.

Value

A modified bipl5_biplot.


Plot a bipl5_biplot object

Description

Initialises a plotly graph, populates it with the first available mdsDisplay traces and annotations, then attaches the remaining mdsDisplays and fit measures to the JavaScript event handler.

Usage

## S3 method for class 'bipl5_biplot'
plot(x, y = NULL, fit_display = c("inherit", "panel", "overlay"), ...)

Arguments

x

A bipl5_biplot object

y

Ignored (for S3 consistency)

fit_display

How fit measures should be shown for biplots that support them: inherit the object's stored preference, render them in the right-hand panel, or render them as an overlay over the full plot width.

...

Additional arguments (ignored)

Value

A plotly htmlwidget


Plot a single extracted fit graph as a ggplot

Description

Reconstructs one of the PCA fit graphs from its stored plotly traces. The fit type is inferred from the trace metadata and trace types, then translated into a ggplot2 chart with matching title, legend titles, and axes.

Usage

## S3 method for class 'bipl5_fit'
plot(x, y = NULL, ...)

Arguments

x

A bipl5_fit object, typically returned by extract(bp, fit_measures, CumPred) or a similar fit graph extraction.

y

Ignored (for S3 consistency)

...

Additional arguments (ignored)

Details

Supported fit graphs are cumulative predictivity (CumPred), cumulative adequacy (CumAd), variance explained (VarExp), and the scree plot (Scree). The summary-table fit objects are not handled by this plotting method.

Value

A ggplot object.

Examples

bp <- biplotEZ::biplot(iris[, 1:4]) |>
  biplotEZ::PCA() |>
  wrap_bipl5()

fit_plot <- extract(bp, fit_measures, Scree)
plot(fit_plot)

Print a bipl5_biplot object as a tree diagram

Description

Print a bipl5_biplot object as a tree diagram

Usage

## S3 method for class 'bipl5_biplot'
print(x, ...)

Arguments

x

A bipl5_biplot object

...

Additional arguments (ignored)

Value

Invisibly returns x


Print a bipl5_data object

Description

Print a bipl5_data object

Usage

## S3 method for class 'bipl5_data'
print(x, ...)

Arguments

x

A bipl5_data object

...

Additional arguments (ignored)

Value

Invisibly returns x


Print a bipl5_fitmeasures object

Description

Print a bipl5_fitmeasures object

Usage

## S3 method for class 'bipl5_fitmeasures'
print(x, ...)

Arguments

x

A bipl5_fitmeasures object

...

Additional arguments (ignored)

Value

Invisibly returns x


Print a bipl5_mdsDisplay object

Description

Print a bipl5_mdsDisplay object

Usage

## S3 method for class 'bipl5_mdsDisplay'
print(x, ...)

Arguments

x

A bipl5_mdsDisplay object

...

Additional arguments (ignored)

Value

Invisibly returns x


Remove an mdsDisplay from a bipl5_biplot object

Description

Returns a new bipl5_biplot with the specified mdsDisplay (and its corresponding fit table) removed. At least one mdsDisplay must remain.

Usage

remove_mdsDisplay(object, mdsDisplay)

## S3 method for class 'bipl5_biplot'
remove_mdsDisplay(object, mdsDisplay)

Arguments

object

A bipl5_biplot object

mdsDisplay

Unquoted name of the mdsDisplay to remove (e.g. mdsDisplay_13)

Value

A new bipl5_biplot without the removed mdsDisplay


Scale a biplot specification into a bipl5_biplot

Description

scale_mds() turns a bipl5_spec created by init_biplot() into a fully formed bipl5_biplot by dispatching to one of the underlying biplotEZ::PCA(), biplotEZ::CVA(), biplotEZ::PCO(), or biplotEZ::regress() methods and then compiling only the requested mdsDisplay. Any additional displays can be added later with append_mdsDisplay().

Usage

scale_mds(x, type = c("pca", "cva", "pco", "regress"), ...)

## S3 method for class 'bipl5_spec'
scale_mds(x, type = c("pca", "cva", "pco", "regress"), ...)

Arguments

x

A bipl5_spec created by init_biplot().

type

The biplot method to construct. One of "pca", "cva", "pco", "regress", or "regression".

...

Additional named arguments for the chosen method.

Details

The type argument chooses the underlying biplot method. Additional arguments are method-specific and should be supplied via ....

Supported aliases in ...:

  • Common: classes, group_aes / group.aes, title / Title

  • PCA: dimensions / dim.biplot, eigenvectors / e.vects, show_class_means / show.class.means / show_group_means / show.group.means, correlation_biplot / correlation.biplot

  • CVA: classes, dimensions / dim.biplot, eigenvectors / e.vects, weighted_cva / weightedCVA, show_class_means / show.class.means / show_group_means / show.group.means, low_dim / low.dim

  • PCO: Dmat / dist_mat, dist_func / dist.func, dist_func_cat / dist.func.cat, dimensions / dim.biplot, eigenvectors / e.vects, show_class_means / show.class.means / show_group_means / show.group.means, axes

  • regress: Z / z, show_group_means / show.group.means / show_class_means / show.class.means, axes

For type = "pco", any remaining named arguments in ... are forwarded to the chosen distance function.

Value

A fully formed bipl5_biplot.


Retrieve all valid plotting symbols for the plotly library

Description

Retrieve all valid plotting symbols for the plotly library

Usage

Symbol_List()

Value

A vector of all valid plotting symbols accepted by plotly's plot_ly() function.

Examples

Symbol_List()

Convert a biplotEZ object to a bipl5_biplot

Description

Convert a biplotEZ object to a bipl5_biplot

Usage

wrap_bipl5(x)

Arguments

x

A biplotEZ biplot object

Value

An object of class bipl5_biplot


Construct a bipl5_biplot from a CVA biplot

Description

Builds mdsDisplays for the user's CV pair and available supplementary pairs, along with a dropdown menu. Fit measures are not yet computed for CVA biplots and will be NULL. Plotting is deferred to plot.bipl5_biplot.

Usage

## S3 method for class 'CVA'
wrap_bipl5(x)

Arguments

x

An object of class biplot from the biplotEZ package with CVA method applied.

Value

An object of class c("bipl5_biplot", "cva")

Examples

## Not run: 
library(biplotEZ)
bp <- biplot(iris[, 1:4]) |> CVA(classes = iris[, 5]) |> wrap_bipl5()
bp
plot(bp)

## End(Not run)

Construct a bipl5_biplot from a PCA biplot

Description

Builds the mdsDisplay(s) used for a principal component analysis (PCA) biplot and documents the associated PCA-biplot fit, predictivity and direct-reading measures. In contrast to a regression biplot, the low-dimensional sample map is obtained internally from the singular value decomposition of the processed data matrix. If the wrapped object stores more than one principal-component pair as separate mdsDisplays, the same formulas below apply to each mdsDisplay separately, with the active mdsDisplay determined by the displayed pair of principal components.

Usage

## S3 method for class 'PCA'
wrap_bipl5(x)

Arguments

x

An object of class biplot from the biplotEZ package with PCA() method applied.

Details

For the PCA biplot handled by this method, let $\mathbf{X}\in\mathbb{R}^{n\times p}$ denote the processed data matrix stored in the input biplot object after centring and any optional scaling performed by biplot(). Thus $\mathbf{X}$ is the matrix on which PCA is actually carried out. Write the singular value decomposition as

$$\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^{\top},$$

where $\mathbf{U}^{\top}\mathbf{U}=\mathbf{I}$, $\mathbf{V}^{\top}\mathbf{V}=\mathbf{I}$, and $\mathbf{D}=\mathrm{diag}(d_1,\ldots,d_q)$ with $q=\mathrm{rank}(\mathbf{X})$ and $d_1\ge \cdots \ge d_q \ge 0$. The columns of $\mathbf{V}$ are the principal directions, and the corresponding principal component score vectors are $\mathbf{z}_t = d_t\mathbf{u}_t = \mathbf{X}\mathbf{v}_t$, $t=1,\ldots,q$. This is the standard PCA biplot construction underlying Gabriel's original formulation and subsequent calibrated-axis developments (Gabriel, 1971; Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011; Greenacre, 2010).
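As a quick numerical check of these definitions, the base-R sketch below assumes that the iris measurements, centred but not scaled, stand in for the processed matrix $\mathbf{X}$; bipl5 and biplotEZ may obtain the decomposition differently internally. It confirms that the two expressions for the score vectors agree and that $\mathbf{U}$ and $\mathbf{V}$ have orthonormal columns.

```r
# Assumed stand-in for the processed matrix X: iris measurements, centred only
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

s <- svd(X)          # thin SVD: X = U D V^T
U <- s$u
d <- s$d             # singular values, in decreasing order
V <- s$v

# Principal component scores computed two ways: z_t = d_t u_t and z_t = X v_t
Z_from_U <- U %*% diag(d)
Z_from_X <- X %*% V

max(abs(Z_from_U - Z_from_X))   # numerically zero
```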

Suppose the user has selected two principal components $a<b$. Let $\mathbf{J}_{ab}$ denote the diagonal $q\times q$ selector matrix with ones in positions $a$ and $b$ and zeros elsewhere. Then the two-dimensional PCA fitted matrix is

$$\widehat{\mathbf{X}}_{ab} = \mathbf{U}\mathbf{D}\mathbf{J}_{ab}\mathbf{V}^{\top} = \mathbf{X}\mathbf{V}\mathbf{J}_{ab}\mathbf{V}^{\top}.$$

Equivalently, if $\mathbf{U}_{ab}$ and $\mathbf{V}_{ab}$ denote the submatrices containing columns $a$ and $b$, and $\mathbf{D}_{ab}=\mathrm{diag}(d_a,d_b)$, then

$$\widehat{\mathbf{X}}_{ab} = \mathbf{U}_{ab}\mathbf{D}_{ab}\mathbf{V}_{ab}^{\top} = d_a\mathbf{u}_a\mathbf{v}_a^{\top} + d_b\mathbf{u}_b\mathbf{v}_b^{\top}.$$

When $(a,b)=(1,2)$, this is the best rank-2 approximation to $\mathbf{X}$ in Frobenius norm by the Eckart–Young theorem. For any other selected pair, the same formula gives the orthogonal projection of $\mathbf{X}$ onto the chosen two-dimensional principal-component subspace, but it is not generally the globally optimal rank-2 approximation (Eckart and Young, 1936; Gabriel, 1971; Greenacre, 2010).
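The selector-matrix form and the rank-1 form of $\widehat{\mathbf{X}}_{ab}$ can be compared directly. In this sketch, centred iris data are an assumed stand-in for $\mathbf{X}$, and fit_pair() and rank1_sum() are illustrative helper names, not bipl5 functions. The squared Frobenius error of the $(1,2)$ fit equals $d_3^2+d_4^2$ and is no larger than that of the $(1,3)$ fit, as the Eckart–Young theorem predicts.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
U <- s$u; d <- s$d; V <- s$v
q <- length(d)

# Fitted matrix X V J_ab V^T, with J_ab the diagonal 0/1 selector matrix
fit_pair <- function(a, b) {
  J <- diag(0, q)
  J[a, a] <- 1
  J[b, b] <- 1
  X %*% V %*% J %*% t(V)
}

# The same matrix written as d_a u_a v_a^T + d_b u_b v_b^T
rank1_sum <- function(a, b) {
  d[a] * U[, a] %o% V[, a] + d[b] * U[, b] %o% V[, b]
}

X12 <- fit_pair(1, 2)
X13 <- fit_pair(1, 3)

# Squared Frobenius error of each two-component approximation
err12 <- sum((X - X12)^2)
err13 <- sum((X - X13)^2)
c(err12 = err12, err13 = err13)
```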

The calibrated-axis PCA biplot may be written in the general form

$$\widehat{\mathbf{X}}_{ab} = \mathbf{Z}_{ab}\mathbf{H}_{ab}^{\top},$$

where the exact factorization depends on the type of PCA biplot being displayed.

For the ordinary PCA biplot, which prioritizes the Euclidean geometry of the sample points, take

$$\mathbf{Z}_{ab} = \mathbf{U}_{ab}\mathbf{D}_{ab}, \qquad \mathbf{H}_{ab} = \mathbf{V}_{ab}.$$

Thus the displayed sample coordinates are the selected principal component scores, and the fitted matrix is

$$\widehat{\mathbf{X}}_{ab} = \mathbf{Z}_{ab}\mathbf{H}_{ab}^{\top}.$$

If $\mathbf{h}_{(j)}$ denotes the $j$th row of $\mathbf{H}_{ab}$ written as a column vector in $\mathbb{R}^2$, and $\mathbf{z}_i$ denotes the $i$th row of $\mathbf{Z}_{ab}$ written likewise, then for sample $i$

$$\widehat{x}_{ij} = \mathbf{z}_{i}^{\top}\mathbf{h}_{(j)}.$$

Hence the calibrated axis for variable $j$ has direction $\mathbf{h}_{(j)}$, and the point on that axis corresponding to marker value $\mu$ is

$$\mathbf{p}_{\mu j} = \frac{\mu}{\mathbf{h}_{(j)}^{\top}\mathbf{h}_{(j)}}\mathbf{h}_{(j)}.$$

This is the standard calibrated-axis formula used to place tick marks and to recover predicted values from projections onto the displayed axis (Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011).
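The marker formula can be sketched in base R as follows. Centred iris data are an assumed stand-in for $\mathbf{X}$, marker_position() is a hypothetical helper rather than a bipl5 function, and marker values are on the centred scale of $\mathbf{X}$. The final lines check that the marker for $\widehat{x}_{ij}$ is the foot of the perpendicular from $\mathbf{z}_i$ to the axis, so projecting a sample onto an axis and reading the calibration recovers the fitted value.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
ab <- c(1, 2)
Z <- s$u[, ab] %*% diag(s$d[ab])   # Z_ab = U_ab D_ab: displayed sample coordinates
H <- s$v[, ab]                     # H_ab = V_ab: row j is h_(j)

# Position in the 2-D display of marker value mu on the axis of variable j
marker_position <- function(mu, j) {
  h <- H[j, ]
  mu / sum(h * h) * h
}

# Tick marks for Sepal.Length (variable 1) at centred values -2, -1, 0, 1, 2;
# each column of 'ticks' is one tick position in the display
ticks <- sapply(-2:2, marker_position, j = 1)

# Reading sample i on axis j: fitted value and the marker where it sits
i <- 10; j <- 1
x_hat_ij <- sum(Z[i, ] * H[j, ])
p <- marker_position(x_hat_ij, j)
sum((Z[i, ] - p) * H[j, ])   # numerically zero: z_i - p is orthogonal to h_(j)
```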

A second important special case is the correlation biplot, obtained when the processed matrix $\mathbf{X}$ is standardized and the display is chosen so that correlations between variables are approximated by the cosines of the angles between the displayed variable directions. In that case one may equivalently factorize

$$\widehat{\mathbf{X}}_{ab} = \mathbf{U}_{ab}(\mathbf{V}_{ab}\mathbf{D}_{ab})^{\top}.$$

Hence the displayed sample coordinates are the rows of $\mathbf{U}_{ab}$ and the variable directions are the rows of $\mathbf{V}_{ab}\mathbf{D}_{ab}$. If $\mathbf{c}_{(j)}$ denotes the $j$th such row written as a column vector, and $\mathbf{u}_{i,ab}$ denotes the $i$th row of $\mathbf{U}_{ab}$ written likewise, then

$$\widehat{x}_{ij} = \mathbf{u}_{i,ab}^{\top}\mathbf{c}_{(j)}.$$

In this standardized setting the coordinates $\mathbf{c}_{(j)}$ are proportional to the correlations of variable $j$ with the selected principal components, and the geometry of the displayed variable directions is therefore tuned to the correlation structure rather than to the raw score geometry of the samples. This is the sense in which correlation.biplot = TRUE preserves variable-correlation information in the display (Gabriel, 1971; Greenacre, 2010; biplotEZ manual and vignette).
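The proportionality claim can be checked numerically. This sketch standardizes the iris measurements with base R's scale(), which divides by the $n-1$ standard deviation; under that assumed convention each $\mathbf{c}_{(j)}$ equals $\sqrt{n-1}$ times the vector of correlations between variable $j$ and the two selected principal components. A different scaling convention in the biplot object would change the constant but not the proportionality.

```r
# Assumed stand-in: iris measurements centred and scaled to unit variance
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
n <- nrow(X)
s <- svd(X)
ab <- c(1, 2)

U_ab <- s$u[, ab]
C <- s$v[, ab] %*% diag(s$d[ab])   # row j is c_(j) = row j of V_ab D_ab
scores <- X %*% s$v[, ab]          # principal component scores for PCs a and b
R <- cor(X, scores)                # correlation of each variable with each PC

# The correlation factorization reproduces the same fitted matrix
X_hat <- U_ab %*% t(C)

# c_(j) is sqrt(n - 1) times the correlations of variable j with the two PCs
max(abs(C - sqrt(n - 1) * R))
```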

In either factorization, all predicted values in $\widehat{\mathbf{X}}_{ab}$ are on the same centred/scaled scale as the stored matrix $\mathbf{X}$. If required, predictions can be back-transformed to the original variable scale using the means and standard deviations stored in the input biplot object.
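A minimal sketch of the back-transformation, assuming the means and standard deviations are available as the attributes that base R's scale() attaches. The field names under which a biplotEZ object stores them are not assumed here. Each fitted column is multiplied by its standard deviation and then shifted by its mean.

```r
raw <- as.matrix(iris[, 1:4])
X <- scale(raw, center = TRUE, scale = TRUE)
centre <- attr(X, "scaled:center")   # column means of the raw data
spread <- attr(X, "scaled:scale")    # column standard deviations

s <- svd(X)
ab <- c(1, 2)
X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])   # fitted values, standardized scale

# Back-transform: undo the scaling first, then the centring
X_hat_raw <- sweep(sweep(X_hat, 2, spread, `*`), 2, centre, `+`)
head(X_hat_raw, 3)
```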

A fundamental feature of the PCA biplot is that both sample-side and variable-side orthogonal decompositions hold. Writing $\mathbf{E}_{ab}=\mathbf{X}-\widehat{\mathbf{X}}_{ab}$, one has

$$\mathbf{X}\mathbf{X}^{\top} = \widehat{\mathbf{X}}_{ab}\widehat{\mathbf{X}}_{ab}^{\top} + \mathbf{E}_{ab}\mathbf{E}_{ab}^{\top},$$

and

$$\mathbf{X}^{\top}\mathbf{X} = \widehat{\mathbf{X}}_{ab}^{\top}\widehat{\mathbf{X}}_{ab} + \mathbf{E}_{ab}^{\top}\mathbf{E}_{ab}.$$

The first is the Type A orthogonality, which justifies sample-side measures of fit. The second is the Type B orthogonality, which justifies variable-side measures of fit. For PCA both orthogonality relations hold simultaneously because $\widehat{\mathbf{X}}_{ab}$ is obtained from an orthogonal principal-component projection (Gabriel, 1971; Gower, Lubbe and le Roux, 2011; Gardner-Lubbe, le Roux and Gower, 2008).
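Both decompositions can be verified for an arbitrary pair, here $(a,b)=(1,3)$, using centred iris data as an assumed stand-in for $\mathbf{X}$. The cross terms vanish because $\widehat{\mathbf{X}}_{ab}^{\top}\mathbf{E}_{ab}=\mathbf{0}$ and $\widehat{\mathbf{X}}_{ab}\mathbf{E}_{ab}^{\top}=\mathbf{0}$.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
ab <- c(1, 3)                       # any pair works, not only (1, 2)
X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])
E <- X - X_hat

# Type A: X X^T = X_hat X_hat^T + E E^T  (n x n, sample side)
type_a <- max(abs(tcrossprod(X) - tcrossprod(X_hat) - tcrossprod(E)))
# Type B: X^T X = X_hat^T X_hat + E^T E  (p x p, variable side)
type_b <- max(abs(crossprod(X) - crossprod(X_hat) - crossprod(E)))
c(type_a = type_a, type_b = type_b)   # both numerically zero
```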

Let $\mathbf{x}_{i\cdot}^{\top}$ denote row $i$ of $\mathbf{X}$ and $\widehat{\mathbf{x}}_{i\cdot}^{\top}$ the corresponding row of $\widehat{\mathbf{X}}_{ab}$. The sample predictivity of sample $i$ is then

$$\psi_i = \frac{\|\widehat{\mathbf{x}}_{i\cdot}\|^2}{\|\mathbf{x}_{i\cdot}\|^2} = 1 - \frac{\|\mathbf{x}_{i\cdot} - \widehat{\mathbf{x}}_{i\cdot}\|^2}{\|\mathbf{x}_{i\cdot}\|^2}, \qquad i=1,\ldots,n.$$

Thus $\psi_i$ is the proportion of the sum of squares of sample $i$ reproduced by the chosen two-dimensional PCA display. Because of Type A orthogonality, $0\le \psi_i \le 1$. Samples with $\psi_i$ near one lie close to the displayed PCA plane, whereas samples with $\psi_i$ near zero lie largely orthogonal to it. This is the sample-side fit measure used in the PCA biplot literature and in biplotEZ (Gardner-Lubbe, le Roux and Gower, 2008; biplotEZ vignette).
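Sample predictivities follow directly from row sums of squares. This base-R sketch uses centred iris data as an assumed stand-in for $\mathbf{X}$ and computes both forms of $\psi_i$ for the $(1,2)$ display; they agree, and every value lies in $[0,1]$.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
ab <- c(1, 2)
X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])

# psi_i: share of each sample's sum of squares reproduced by the display
psi <- rowSums(X_hat^2) / rowSums(X^2)
psi_alt <- 1 - rowSums((X - X_hat)^2) / rowSums(X^2)

range(psi)        # within [0, 1]
which.min(psi)    # sample least well represented in the PC1-PC2 plane
```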

Let $\mathbf{x}_{(j)}$ denote column $j$ of $\mathbf{X}$ and $\widehat{\mathbf{x}}_{(j)}$ the corresponding column of $\widehat{\mathbf{X}}_{ab}$. The axis predictivity of variable $j$ is

$$\phi_j = \frac{\|\widehat{\mathbf{x}}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2} = 1 - \frac{\|\mathbf{x}_{(j)} - \widehat{\mathbf{x}}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2}, \qquad j=1,\ldots,p.$$

Thus $\phi_j$ is the proportion of the sum of squares of variable $j$ reproduced by the chosen PCA plane. Because of Type B orthogonality, $0\le \phi_j \le 1$. In a calibrated-axis display, $\phi_j$ is the natural sum-of-squares measure of how well the axis for variable $j$ reproduces the underlying processed values. This is the quantity reported in biplotEZ as “axis predictivity” (Gardner-Lubbe, le Roux and Gower, 2008; biplotEZ vignette; Greenacre, 2010).
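Axis predictivities are the column-wise analogue of sample predictivities. This sketch uses centred iris data as an assumed stand-in for $\mathbf{X}$ and the $(1,2)$ display, and computes both forms of $\phi_j$.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
ab <- c(1, 2)
X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])

# phi_j: share of each variable's sum of squares reproduced by the display
phi <- colSums(X_hat^2) / colSums(X^2)
phi_alt <- 1 - colSums((X - X_hat)^2) / colSums(X^2)

round(phi, 3)     # one value per variable, each within [0, 1]
```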

The overall quality of the displayed PCA subspace is

$$R^2_{\mathrm{disp},ab} = \frac{\|\widehat{\mathbf{X}}_{ab}\|_F^2}{\|\mathbf{X}\|_F^2} = 1 - \frac{\|\mathbf{X} - \widehat{\mathbf{X}}_{ab}\|_F^2}{\|\mathbf{X}\|_F^2}.$$

Since $\|\widehat{\mathbf{X}}_{ab}\|_F^2 = d_a^2+d_b^2$ and $\|\mathbf{X}\|_F^2 = \sum_{t=1}^{q} d_t^2$, this may also be written as

$$R^2_{\mathrm{disp},ab} = \frac{d_a^2+d_b^2}{\sum_{t=1}^{q} d_t^2}.$$

In particular, when $(a,b)=(1,2)$, this is the familiar proportion of total sum of squares explained by the first two principal components. More generally, it is the quality of the specific displayed pair chosen by the user, matching the biplotEZ quality measure (Gabriel, 1971; Greenacre, 2010; biplotEZ vignette).
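The Frobenius-norm and singular-value expressions for $R^2_{\mathrm{disp},ab}$ can be compared numerically. This sketch uses centred iris data as an assumed stand-in for $\mathbf{X}$ and the pair $(1,2)$.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
ab <- c(1, 2)
X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])

# Quality of the displayed plane, computed two ways
quality_frob <- sum(X_hat^2) / sum(X^2)            # ||X_hat||_F^2 / ||X||_F^2
quality_sv   <- sum(s$d[ab]^2) / sum(s$d^2)        # (d_a^2 + d_b^2) / sum_t d_t^2
c(quality_frob = quality_frob, quality_sv = quality_sv)
```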

Because both Type A and Type B orthogonality hold, the overall quality can be expressed as a weighted average on either the sample side or the variable side. On the variable side,

$$R^2_{\mathrm{disp},ab} = \sum_{j=1}^{p} w_j\phi_j,$$

where

$$w_j = \frac{\|\mathbf{x}_{(j)}\|^2}{\|\mathbf{X}\|_F^2}, \qquad \sum_{j=1}^{p} w_j = 1.$$

Hence variables with larger sums of squares contribute more to the overall display quality. If the original call to biplot() used scale = TRUE, so that all processed variables have equal sums of squares, then

$$R^2_{\mathrm{disp},ab} = \frac{1}{p}\sum_{j=1}^{p}\phi_j.$$

Thus, for a standardized PCA biplot, the overall quality is the simple average of the individual axis predictivities. This is the weighted-average interpretation adopted by the present wrapper.

Similarly, on the sample side,

$$R^2_{\mathrm{disp},ab} = \sum_{i=1}^{n} m_i\psi_i,$$

where

$$m_i = \frac{\|\mathbf{x}_{i\cdot}\|^2}{\|\mathbf{X}\|_F^2}, \qquad \sum_{i=1}^{n} m_i = 1.$$

Hence the same overall display quality may be read as a weighted average of sample predictivities or as a weighted average of axis predictivities (Gardner-Lubbe, le Roux and Gower, 2008).
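Both weighted-average identities, and the simple average in the standardized case, can be checked with one small function. quality_parts() is a hypothetical helper written for this illustration. The iris measurements serve as assumed example data: centred only in the first call, centred and scaled in the second.

```r
# Hypothetical helper: overall quality and its sample- and variable-side averages
quality_parts <- function(X, ab = c(1, 2)) {
  s <- svd(X)
  X_hat <- X %*% s$v[, ab] %*% t(s$v[, ab])
  phi <- colSums(X_hat^2) / colSums(X^2)   # axis predictivities phi_j
  psi <- rowSums(X_hat^2) / rowSums(X^2)   # sample predictivities psi_i
  w <- colSums(X^2) / sum(X^2)             # variable weights w_j
  m <- rowSums(X^2) / sum(X^2)             # sample weights m_i
  c(quality   = sum(X_hat^2) / sum(X^2),
    var_side  = sum(w * phi),
    samp_side = sum(m * psi),
    mean_phi  = mean(phi))
}

raw <- as.matrix(iris[, 1:4])
q_centred <- quality_parts(scale(raw, center = TRUE, scale = FALSE))
q_std     <- quality_parts(scale(raw, center = TRUE, scale = TRUE))
q_centred   # mean_phi need not equal quality when variances differ
q_std       # all four entries agree
```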

Unlike the regression-biplot case, no ordered orthogonalization is required to decompose the quality of a PCA display into separate contributions from the two displayed dimensions, because principal components are already mutually orthogonal. Indeed,

$$\widehat{\mathbf{X}}_{ab} = d_a\mathbf{u}_a\mathbf{v}_a^{\top} + d_b\mathbf{u}_b\mathbf{v}_b^{\top},$$

and these two rank-1 parts are orthogonal in Frobenius inner product. Hence

$$R^2_{\mathrm{disp},ab} = R^2_a + R^2_b,$$

where

$$R^2_a = \frac{d_a^2}{\|\mathbf{X}\|_F^2}, \qquad R^2_b = \frac{d_b^2}{\|\mathbf{X}\|_F^2}.$$

Thus the contribution of each displayed principal component is obtained directly from its singular value.
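A short numerical check of this additivity for the non-leading pair $(a,b)=(2,3)$, with centred iris data as an assumed stand-in for $\mathbf{X}$: the Frobenius inner product of the two rank-1 parts is numerically zero, so their squared norms add.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
a <- 2; b <- 3
total <- sum(X^2)                         # ||X||_F^2

R2_a <- s$d[a]^2 / total
R2_b <- s$d[b]^2 / total

part_a <- s$d[a] * s$u[, a] %o% s$v[, a]  # d_a u_a v_a^T
part_b <- s$d[b] * s$u[, b] %o% s$v[, b]  # d_b u_b v_b^T
inner  <- sum(part_a * part_b)            # Frobenius inner product, numerically zero

c(R2_a = R2_a, R2_b = R2_b, sum = R2_a + R2_b,
  direct = sum((part_a + part_b)^2) / total)
```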

The same orthogonal decomposition yields a per-component breakdown of each axis predictivity. Since

$$\widehat{\mathbf{x}}_{(j)} = d_a v_{ja}\mathbf{u}_a + d_b v_{jb}\mathbf{u}_b,$$

with $\mathbf{u}_a$ and $\mathbf{u}_b$ orthonormal, one has

$$\phi_j = \phi_{ja} + \phi_{jb},$$

where

$$\phi_{ja} = \frac{d_a^2 v_{ja}^2}{\|\mathbf{x}_{(j)}\|^2}, \qquad \phi_{jb} = \frac{d_b^2 v_{jb}^2}{\|\mathbf{x}_{(j)}\|^2}.$$

Hence the predictivity of axis $j$ can be decomposed exactly into the separate contributions of the two displayed principal components. This is the PCA analogue of the dimension-wise decomposition used elsewhere in the biplot literature, but here it is especially simple because the components are orthogonal from the outset. In particular, if the same variable is well aligned with one selected principal direction but not the other, this will be visible in the separate values $\phi_{ja}$ and $\phi_{jb}$.

Likewise, each sample predictivity decomposes as

$$\psi_i = \psi_{ia} + \psi_{ib},$$

where

$$\psi_{ia} = \frac{d_a^2 u_{ia}^2}{\|\mathbf{x}_{i\cdot}\|^2}, \qquad \psi_{ib} = \frac{d_b^2 u_{ib}^2}{\|\mathbf{x}_{i\cdot}\|^2}.$$

Thus the contribution of each displayed principal component may be read globally through $R^2_a$ and $R^2_b$. It may also be read locally, through the sample-wise contributions $\psi_{ia}$ and $\psi_{ib}$ and the axis-wise contributions $\phi_{ja}$ and $\phi_{jb}$ (Gardner-Lubbe, le Roux and Gower, 2008).
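The sample-side breakdown mirrors the axis-side one, with left singular vectors in place of right ones. The sketch below uses the same illustrative assumptions: standardized iris, `svd()`, and PCs 1 and 2.

```r
# Sketch under assumptions: standardized iris, base-R SVD, displayed pair a = 1, b = 2
X <- scale(as.matrix(iris[, 1:4]))
s <- svd(X)
a <- 1; b <- 2
ss_row <- rowSums(X^2)                        # ||x_i.||^2 for each sample

psi_a <- s$d[a]^2 * s$u[, a]^2 / ss_row       # contribution of PC a to each sample
psi_b <- s$d[b]^2 * s$u[, b]^2 / ss_row       # contribution of PC b to each sample

Xhat_ab <- s$u[, c(a, b)] %*% diag(s$d[c(a, b)]) %*% t(s$v[, c(a, b)])
psi <- rowSums(Xhat_ab^2) / ss_row            # sample predictivities psi_i

head(round(cbind(psi_a, psi_b, psi), 3))
```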

In addition to the sum-of-squares fit measures above, this method may also report direct-reading diagnostics in the sense of Alves (2012). These quantities serve a different purpose from the predictivities $\phi_j$ and $\psi_i$:

- The predictivities measure how much of the variation in $\mathbf{X}$ is reproduced by the selected PCA plane in a sum-of-squares sense.
- The Alves diagnostics measure how accurately values can be read directly from the displayed calibrated axes in the current two-dimensional map.

This distinction is central in the predictive biplot literature (Alves, 2012).

Let $\mathbf{g}_{(j)}\in\mathbb{R}^2$ denote the displayed direction of variable axis $j$ under the active PCA factorization. That is, $\mathbf{g}_{(j)}=\mathbf{h}_{(j)}$ for the ordinary PCA biplot and $\mathbf{g}_{(j)}=\mathbf{c}_{(j)}$ for the correlation biplot. Let $\mathbf{z}_i\in\mathbb{R}^2$ denote the corresponding displayed coordinate of sample $i$. Then the value read from the graph on axis $j$ for sample $i$ is

$$\widehat{x}_{ij} = \mathbf{z}_{i}^{\top}\mathbf{g}_{(j)},$$

and the point on the calibrated axis corresponding to that reading is

$$\mathbf{p}_{ij} = \frac{\widehat{x}_{ij}}{\mathbf{g}_{(j)}^{\top}\mathbf{g}_{(j)}}\mathbf{g}_{(j)}.$$

This point is the orthogonal projection of $\mathbf{z}_i$ onto the line spanned by $\mathbf{g}_{(j)}$.

Thus the direct reading from the displayed PCA axis coincides exactly with the fitted value from the active two-dimensional PCA approximation.
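The sketch below illustrates reading a value off an axis. This section does not define $\mathbf{h}_{(j)}$ explicitly, so the sketch assumes an ordinary PCA biplot with sample coordinates $\mathbf{Z} = \mathbf{X}\mathbf{V}_{ab}$ and axis directions given by the rows of $\mathbf{V}_{ab}$. It checks three things:

- the readings equal the rank-2 PCA fit;
- projecting $\mathbf{p}_{ij}$ back onto $\mathbf{g}_{(j)}$ recovers $\widehat{x}_{ij}$;
- $\mathbf{z}_i - \mathbf{p}_{ij}$ is perpendicular to the axis.

The helper `axis_point()` is hypothetical and introduced only for illustration.

```r
# Sketch under assumptions: ordinary PCA biplot, Z = X V_ab, g_(j) = row j of V_ab
X <- scale(as.matrix(iris[, 1:4]))
s <- svd(X)
V2 <- s$v[, 1:2]
Z <- X %*% V2                     # displayed sample coordinates z_i (one per row)
G <- V2                           # row j holds the displayed axis direction g_(j)

Xhat_read <- Z %*% t(G)           # readings xhat_ij = z_i' g_(j)
Xhat_fit  <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(V2)   # rank-2 PCA fit

# Hypothetical helper: point p_ij on axis j carrying the reading for sample i
axis_point <- function(i, j) {
  g <- G[j, ]
  (Xhat_read[i, j] / sum(g^2)) * g
}

p <- axis_point(1, 3)
c(reading = Xhat_read[1, 3], recovered = sum(p * G[3, ]))
```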

Let $s_j$ denote the standard deviation used to standardize variable $j$; when the processed matrix $\mathbf{X}$ is already standardized, $s_j=1$. The pointwise direct-reading error for sample $i$ on axis $j$ is then

$$\delta_{ij} = \frac{|x_{ij}-\widehat{x}_{ij}|}{s_j}.$$

If $\mathbf{X}$ is already standardized, then $\delta_{ij}=|x_{ij}-\widehat{x}_{ij}|$. The corresponding axis-level mean direct-reading error is

$$\bar{\delta}_j = \frac{1}{n}\sum_{i=1}^{n}\delta_{ij} = \frac{1}{n}\sum_{i=1}^{n}\frac{|x_{ij}-\widehat{x}_{ij}|}{s_j}.$$

This is the two-dimensional PCA-biplot analogue of the mean standard predictive error of Alves (2012).

Let $\tau_{\mathrm{axis}} > 0$ be a user-specified tolerance parameter for axis-level direct-reading error. Then the Alves selection rule becomes

$$\text{retain axis } j \quad\Longleftrightarrow\quad \bar{\delta}_j \le \tau_{\mathrm{axis}}.$$

Likewise, for an observation-level tolerance parameter $\tau_{\mathrm{units}} > 0$, sample $i$ is flagged with respect to axis $j$ whenever

$$\delta_{ij} > \tau_{\mathrm{units}}.$$

Hence $\bar{\delta}_j$ may be used for axis selection and $\delta_{ij}$ for observation-level checking, as in the predictive PCA-biplot framework of Alves (2012).
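The two Alves rules can be sketched in a few lines of base R. This sketch assumes standardized iris data, so that $s_j = 1$, and PCs 1 and 2 as the display. It uses illustrative tolerances of 0.5 and 0.75, the starting points attributed to Alves (2012) in the regression-biplot section of this manual. Which axes are retained depends entirely on these assumed inputs.

```r
# Sketch under assumptions: standardized iris (s_j = 1), PCs 1 and 2 displayed,
# illustrative tolerances tau_axis = 0.5 and tau_units = 0.75
X <- scale(as.matrix(iris[, 1:4]))
s <- svd(X)
Xhat <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(s$v[, 1:2])

s_j <- rep(1, ncol(X))                                  # X already standardized
delta <- sweep(abs(X - Xhat), 2, s_j, "/")              # delta_ij
delta_bar <- colMeans(delta)                            # mean error per axis

tau_axis <- 0.5
tau_units <- 0.75
retain <- delta_bar <= tau_axis                         # axis selection rule
flagged <- which(delta > tau_units, arr.ind = TRUE)     # flagged (sample, axis) pairs

round(delta_bar, 3)
retain
head(flagged)
```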

The Alves diagnostics and the PCA predictivities are complementary. Each answers a different question:

- $\phi_j$: “How much of variable $j$'s sum of squares is reproduced by the selected PCA plane?”
- $\psi_i$: “How much of sample $i$'s sum of squares is reproduced by the selected PCA plane?”
- $\bar{\delta}_j$: “How accurately can values of variable $j$ be read directly from the displayed calibrated axis?”

Consequently, an axis may have high $\phi_j$ yet still have non-negligible direct-reading error in the current display. Conversely, an axis with only moderate $\phi_j$ may still admit acceptable direct readings. In this implementation, the $\phi_j$ and $\psi_i$ families are the primary sum-of-squares fit measures. The Alves quantities $\bar{\delta}_j$ and $\delta_{ij}$ provide supplementary, display-specific diagnostics.

In the wrapped bipl5_biplot object, these formulas drive:

- the hover-time fitted values $\widehat{\mathbf{X}}_{ab}$;
- the calibrated tick markers for each active PCA mdsDisplay;
- the bottom display-quality label;
- the axis and sample fit summaries attached to the active two-dimensional principal-component view.

If several PC pairs are stored as separate mdsDisplays, the same construction applies to each mdsDisplay separately.

Value

An object of class c("bipl5_biplot", "PCA")

References

Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.

Gabriel, K. R. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58(3), 453–467. doi:10.1093/biomet/58.3.453

Gower, J. C. and Hand, D. J. (1996). Biplots. London: Chapman & Hall.

Gower, J. C., Lubbe, S. and le Roux, N. J. (2011). Understanding Biplots. Chichester: Wiley.

Greenacre, M. (2010). Biplots in Practice. Bilbao: BBVA Foundation.

Gardner-Lubbe, S., le Roux, N. J. and Gower, J. C. (2008). Measures of fit in principal component and canonical variate analyses. Journal of Applied Statistics, 35(9), 947–965. doi:10.1080/02664760802185399

Lubbe, S., le Roux, N. J., Nienkemper-Swanepoel, J., Ganey, R., Buys, R., Adams, Z.-M. and Manefeldt, P. (2025). biplotEZ: EZ-to-Use Biplots. R package version 2.2.

Alves, M. R. (2012). Evaluation of the predictive power of biplot axes to automate the construction and layout of biplots based on the accuracy of direct readings from common outputs of multivariate analyses: 1. application to principal component analysis. Journal of Chemometrics, 26(5), 180–190. doi:10.1002/cem.2433

Examples

## Not run: 
library(biplotEZ)
bp <- biplot(iris[, 1:4], scale = TRUE) |>
  PCA(e.vects = c(1, 2), group.aes = iris[, 5]) |>
  wrap_bipl5()
bp
plot(bp)

bp_cor <- biplot(iris[, 1:4], scale = TRUE) |>
  PCA(
    e.vects = c(1, 2),
    group.aes = iris[, 5],
    correlation.biplot = TRUE
  ) |>
  wrap_bipl5()
plot(bp_cor)

## End(Not run)

Construct a bipl5_biplot from a PCO biplot

Description

Handles two cases depending on the axis type stored in x$PCOaxes:

Linear axes

Built identically to regression biplots via build_one_mdsDisplay(), including translated density axes.

Spline axes

Uses a custom mdsDisplay builder (build_spline_mdsDisplay()) that places only sample points, the spline axis curves with tick marks, and a bounding circle. A custom JavaScript handler is attached at plot time.

In both cases there is a single mdsDisplay (mdsDisplay_12), no fit measures, and append_mdsDisplay() / remove_mdsDisplay() are disabled.

Usage

## S3 method for class 'PCO'
wrap_bipl5(x)

Arguments

x

An object of class biplot from the biplotEZ package with PCO() method applied.

Value

An object of class c("bipl5_biplot", "pco")

Examples

## Not run: 
library(biplotEZ)
bp <- biplot(iris[, 1:4]) |>
  PCO(dist.func = stats::dist) |>
  wrap_bipl5()
bp
plot(bp)

## End(Not run)

Construct a bipl5_biplot from a regression biplot

Description

Builds the single mdsDisplay used for a linear regression biplot and documents the associated regression-biplot fit and predictivity measures. Regression biplots do not use the multi-mdsDisplay fit machinery available for PCA/CVA displays: they have one fixed mdsDisplay (mdsDisplay_12), append_mdsDisplay() and remove_mdsDisplay() are not supported, and the only active toggle button is “Translated Axes”.

Usage

## S3 method for class 'regress'
wrap_bipl5(x)

Arguments

x

An object of class biplot from the biplotEZ package with regress() method applied.

Details

This section uses the following notation for the linear regression biplot handled by this method:

- $\mathbf{X}\in\mathbb{R}^{n\times p}$ is the processed data matrix stored in the biplot object, after centring and any optional scaling performed by biplot().
- $\mathbf{Z}\in\mathbb{R}^{n\times 2}$ holds the externally supplied display coordinates of the $n$ samples.
- $\mathbf{Z} = [\mathbf{z}_1\ \mathbf{z}_2]$, where $\mathbf{z}_1$ and $\mathbf{z}_2$ are the first and second displayed coordinates respectively.

In contrast to a PCA biplot, the sample map is taken as given, and the variable axes are then fitted to that map by multivariate least squares. This is the regression-biplot point of view used in the biplot literature for general low-dimensional sample maps (Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011).

The fitted linear model is

$$\mathbf{X} = \mathbf{Z}\mathbf{H}^{\top} + \mathbf{E},$$

where, when $\mathbf{Z}$ has full column rank,

$$\mathbf{H}^{\top} = (\mathbf{Z}^{\top}\mathbf{Z})^{-1}\mathbf{Z}^{\top}\mathbf{X}.$$

Hence the fitted values are

$$\widehat{\mathbf{X}} = \mathbf{Z}\mathbf{H}^{\top} = \mathbf{Z}(\mathbf{Z}^{\top}\mathbf{Z})^{-1}\mathbf{Z}^{\top}\mathbf{X} = \mathbf{P}_Z\mathbf{X},$$

where $\mathbf{P}_Z$ is the orthogonal projector onto the column space of $\mathbf{Z}$. If the supplied display coordinates are rank-deficient, $(\mathbf{Z}^{\top}\mathbf{Z})^{-1}$ does not exist. However, $\widehat{\mathbf{X}} = \mathbf{P}_Z\mathbf{X}$ remains well defined, with $\mathbf{P}_Z$ still interpreted as the orthogonal projector onto $\mathrm{col}(\mathbf{Z})$. The regression biplot therefore displays the variables through the least-squares predictions obtained from the supplied 2D sample map (Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011).
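The least-squares fit can be sketched in base R. The sketch loosely follows the Examples section but makes two assumptions:

- $\mathbf{X}$ is the centred-only iris matrix (it is not claimed that biplot() produces exactly this matrix);
- $\mathbf{Z}$ holds the first two `prcomp()` scores.

It computes $\mathbf{H}^{\top}$ by solving the normal equations. It then checks the result against `qr.fitted()`, which applies the orthogonal projector onto $\mathrm{col}(\mathbf{Z})$ through a QR factorization.

```r
# Sketch under assumptions: centred-only iris as X, prcomp() scores as the supplied map Z
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)   # centred processed matrix
Z <- prcomp(iris[, 1:4])$x[, 1:2]                    # externally supplied 2D map

Ht <- solve(crossprod(Z), crossprod(Z, X))           # H' = (Z'Z)^{-1} Z'X, a 2 x p matrix
H <- t(Ht)                                           # p x 2; row j is h_(j)
Xhat <- Z %*% Ht                                     # fitted values Z H'

# Same fit via the orthogonal projector onto col(Z), computed from a QR factorization
Xhat_qr <- qr.fitted(qr(Z), X)

round(head(Xhat), 3)
```

Using the QR route avoids forming $(\mathbf{Z}^{\top}\mathbf{Z})^{-1}$ explicitly, which is numerically preferable when the two display coordinates are nearly collinear.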

Let $\mathbf{h}_{(j)}\in\mathbb{R}^2$ denote the $j$th row of $\mathbf{H}$, equivalently the $j$th column of the $2\times p$ matrix $\mathbf{H}^{\top}$. Let $\mathbf{z}_i\in\mathbb{R}^2$ denote the display coordinates of sample $i$ (row $i$ of $\mathbf{Z}$, written as a column vector; not to be confused with the columns $\mathbf{z}_1$ and $\mathbf{z}_2$). Then the predicted value of variable $j$ for sample $i$ is

$$\widehat{x}_{ij} = \mathbf{z}_i^{\top}\mathbf{h}_{(j)}.$$

The calibrated axis for variable $j$ has direction $\mathbf{h}_{(j)}$, and the point on that axis corresponding to marker value $\mu$ is

$$\mathbf{p}_{\mu j} = \frac{\mu}{\mathbf{h}_{(j)}^{\top}\mathbf{h}_{(j)}}\mathbf{h}_{(j)}.$$

Because $\mathbf{p}_{\mu j}^{\top}\mathbf{h}_{(j)} = \mu$, this calibration formula serves two purposes: it places the tick marks, and it recovers predicted values from orthogonal projections onto the displayed axis. This parallels other calibrated-axis biplot constructions (Gabriel, 1971; Gower, Lubbe and le Roux, 2011). All such predicted values are on the same centred/scaled scale as the stored matrix $\mathbf{X}$. If needed, they can be back-transformed to the original variable scale using the means and standard deviations stored in the input biplot object.
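The sketch below places tick marks labelled on the original measurement scale. It makes the same assumptions as the previous regression sketch: centred-only iris as $\mathbf{X}$ and `prcomp()` scores as $\mathbf{Z}$. Each label is shifted to the centred scale by subtracting the column mean, so $\mu = \text{label} - \bar{x}_j$. The tick is then placed with the calibration formula. The helper `tick_points()` is hypothetical.

```r
# Sketch under assumptions: centred-only iris as X, prcomp() scores as Z
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2]
H <- t(solve(crossprod(Z), crossprod(Z, X)))      # row j is h_(j)
centre <- attr(X, "scaled:center")                # column means removed by scale()

# Hypothetical helper: tick positions p_mu = mu / (h'h) * h, one row per tick
tick_points <- function(mu, h) outer(mu / sum(h^2), h)

j <- 1                                            # Sepal.Length
labels <- c(5, 6, 7)                              # tick labels on the original scale
ticks <- tick_points(labels - centre[j], H[j, ])  # mu lives on the centred scale

# Projecting each tick back onto h_(j) and adding the mean recovers its label
ticks %*% H[j, ] + centre[j]
```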

A regression biplot admits a natural family of predictivity measures on the variable side. Let $\mathbf{x}_{(j)}$ denote column $j$ of $\mathbf{X}$, let $\widehat{\mathbf{x}}_{(j)}$ denote column $j$ of $\widehat{\mathbf{X}}$, and let $\mathbf{e}_{(j)} = \mathbf{x}_{(j)} - \widehat{\mathbf{x}}_{(j)}$. Since $\widehat{\mathbf{X}} = \mathbf{P}_Z\mathbf{X}$ and $\mathbf{P}_Z$ is a symmetric idempotent projector, the residual matrix $\mathbf{E} = \mathbf{X} - \widehat{\mathbf{X}} = (\mathbf{I} - \mathbf{P}_Z)\mathbf{X}$ satisfies

$$\widehat{\mathbf{X}}^{\top}\mathbf{E} = \mathbf{X}^{\top}\mathbf{P}_Z(\mathbf{I} - \mathbf{P}_Z)\mathbf{X} = \mathbf{0},$$

and therefore

$$\mathbf{X}^{\top}\mathbf{X} = \widehat{\mathbf{X}}^{\top}\widehat{\mathbf{X}} + (\mathbf{X} - \widehat{\mathbf{X}})^{\top}(\mathbf{X} - \widehat{\mathbf{X}}).$$

This is the variable-side, or Type B, orthogonality that justifies variance-accounted-for ratios for the columns of $\mathbf{X}$. It is the same side of the orthogonality argument that underlies column-wise predictivities in the biplot literature (Gower, Lubbe and le Roux, 2011; Greenacre, 2010).

The predictivity of variable $j$ is therefore defined by

$$\phi_j = \frac{\|\widehat{\mathbf{x}}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2} = 1 - \frac{\|\mathbf{x}_{(j)} - \widehat{\mathbf{x}}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2}, \qquad j=1,\ldots,p.$$

Thus $\phi_j$ is the proportion of the sum of squares of variable $j$ reproduced by the regression biplot. Equivalently, it is the $R^2$ of the no-intercept least-squares regression of variable $j$ on the displayed coordinates $\mathbf{Z}$. This coincides with the ordinary $R^2$ when both $\mathbf{X}$ and $\mathbf{Z}$ are column-centred. Each $\phi_j$ lies in $[0,1]$. Values near one indicate that the variable is well predicted by the displayed map; values near zero indicate that it is poorly reproduced by the chosen display (Greenacre, 2010).
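As a check on both statements, the sketch below makes two comparisons. First, it confirms that $\widehat{\mathbf{X}}^{\top}\mathbf{E}$ vanishes numerically. Second, it compares $\phi_j$ with the `r.squared` reported by `summary(lm(... - 1))`; for a model without an intercept, that value is computed about zero. The data choices are illustrative assumptions: centred-only iris as $\mathbf{X}$ and `prcomp()` scores as $\mathbf{Z}$.

```r
# Sketch under assumptions: centred-only iris as X, prcomp() scores as Z
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2]
Xhat <- qr.fitted(qr(Z), X)
E <- X - Xhat

# Type B orthogonality: Xhat'E should be numerically zero
max(abs(crossprod(Xhat, E)))

phi <- colSums(Xhat^2) / colSums(X^2)     # variable predictivities phi_j

# R^2 of no-intercept regressions of each variable on Z
phi_lm <- sapply(seq_len(ncol(X)), function(j) summary(lm(X[, j] ~ Z - 1))$r.squared)
round(rbind(phi = phi, phi_lm = phi_lm), 4)
```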

A natural overall quality-of-display measure is the proportion of total sum of squares reproduced by the display,

$$R^2_{\mathrm{disp}} = \frac{\|\widehat{\mathbf{X}}\|_F^2}{\|\mathbf{X}\|_F^2} = 1 - \frac{\|\mathbf{X} - \widehat{\mathbf{X}}\|_F^2}{\|\mathbf{X}\|_F^2}.$$

Because the column-wise decomposition above is orthogonal, this overall quality can be written as a weighted average of the variable predictivities:

$$R^2_{\mathrm{disp}} = \sum_{j=1}^{p} w_j \phi_j,$$

where

$$w_j = \frac{\|\mathbf{x}_{(j)}\|^2}{\|\mathbf{X}\|_F^2}, \qquad \sum_{j=1}^{p} w_j = 1.$$

Hence variables with larger sums of squares contribute more to the overall quality. In particular, if the original call to biplot() used scale = TRUE, so that all processed variables have equal sums of squares, then the weights are equal and

$$R^2_{\mathrm{disp}} = \frac{1}{p}\sum_{j=1}^{p}\phi_j.$$

This weighted-average interpretation is often the most natural way to read the overall regression-biplot quality, since it combines the separate variable predictivities into a single display-wide summary (Greenacre, 2010).
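The equal-weights case is easy to confirm numerically. The sketch below standardizes iris as a stand-in for scale = TRUE and uses unscaled `prcomp()` scores as a hypothetical supplied map. It then checks that $R^2_{\mathrm{disp}}$ equals both the weighted and the plain mean of the $\phi_j$.

```r
# Sketch under assumptions: standardized iris as a stand-in for scale = TRUE,
# unscaled prcomp() scores as a hypothetical supplied map Z
X <- scale(as.matrix(iris[, 1:4]))
Z <- prcomp(iris[, 1:4])$x[, 1:2]
Xhat <- qr.fitted(qr(Z), X)

phi <- colSums(Xhat^2) / colSums(X^2)        # variable predictivities
w <- colSums(X^2) / sum(X^2)                 # each weight equals 1/p here
R2_disp <- sum(Xhat^2) / sum(X^2)

c(R2_disp = R2_disp, weighted = sum(w * phi), plain_mean = mean(phi))
```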

The quantities $\phi_j$ and $R^2_{\mathrm{disp}}$ depend only on the fitted projection $\mathbf{P}_Z\mathbf{X}$ and therefore only on the subspace $\mathrm{col}(\mathbf{Z})$. They do not depend on any particular basis chosen for that subspace. In particular, the variable predictivities $\phi_j$ do not require any QR decomposition.

To decompose the total display quality into separate contributions for the two displayed dimensions, this package applies an ordered orthogonalization of the supplied display coordinates. Specifically, define

$$\mathbf{u}_1 = \mathbf{z}_1, \qquad \mathbf{q}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}$$

whenever $\mathbf{u}_1 \neq \mathbf{0}$, and then define

$$\mathbf{u}_2 = \mathbf{z}_2 - \mathbf{q}_1\mathbf{q}_1^{\top}\mathbf{z}_2, \qquad \mathbf{q}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|}$$

whenever $\mathbf{u}_2 \neq \mathbf{0}$. Equivalently, $\mathbf{Q} = [\mathbf{q}_1\ \mathbf{q}_2]$ is obtained, up to the signs of its columns, from the QR decomposition of $\mathbf{Z}$ with the supplied column order preserved. When both are defined, the vectors $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthonormal and span $\mathrm{col}(\mathbf{Z})$.

Because $\mathbf{Q}$ and $\mathbf{Z}$ span the same subspace, the orthogonal projector may also be written as

$$\mathbf{P}_Z = \mathbf{Q}\mathbf{Q}^{\top}.$$

Consequently,

$$\widehat{\mathbf{X}} = \mathbf{Q}\mathbf{Q}^{\top}\mathbf{X} = \mathbf{q}_1\mathbf{q}_1^{\top}\mathbf{X} + \mathbf{q}_2\mathbf{q}_2^{\top}\mathbf{X}$$

whenever both orthogonalized directions are present. Since $\mathbf{q}_1^{\top}\mathbf{q}_2 = 0$, the two fitted parts are orthogonal and their sums of squares add. This yields the dimension-specific contributions

$$R^2_1 = \frac{\|\mathbf{q}_1\mathbf{q}_1^{\top}\mathbf{X}\|_F^2}{\|\mathbf{X}\|_F^2},$$

and

$$R^2_{2\mid 1} = \frac{\|\mathbf{q}_2\mathbf{q}_2^{\top}\mathbf{X}\|_F^2}{\|\mathbf{X}\|_F^2},$$

so that

$$R^2_{\mathrm{disp}} = R^2_1 + R^2_{2\mid 1}$$

whenever the display space is two-dimensional.
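The sketch below implements the Gram–Schmidt steps above in base R. It uses a hypothetical non-orthogonal map built by shearing the first two `prcomp()` scores of centred iris, so that $\mathbf{z}_1 = \mathrm{PC1} + 0.5\,\mathrm{PC2}$ and $\mathbf{z}_2 = \mathrm{PC2}$. It also swaps the columns to show that the split depends on the order while the total does not. The function `ordered_R2()` is a hypothetical helper, not a package function.

```r
# Sketch under assumptions: centred-only iris as X; hypothetical sheared map Z
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
Z0 <- prcomp(iris[, 1:4])$x[, 1:2]
Z <- Z0 %*% matrix(c(1, 0.5, 0, 1), 2, 2)    # z1 = PC1 + 0.5 PC2, z2 = PC2

# Hypothetical helper: ordered contributions R2_1 and R2_{2|1}
ordered_R2 <- function(X, Z) {
  tot <- sum(X^2)
  q1 <- Z[, 1] / sqrt(sum(Z[, 1]^2))          # q1 = u1 / ||u1||
  u2 <- Z[, 2] - q1 * sum(q1 * Z[, 2])        # remove the overlap with q1
  q2 <- u2 / sqrt(sum(u2^2))
  # ||q q' X||_F^2 equals ||X' q||^2 because q has unit length
  c(R2_1 = sum(crossprod(X, q1)^2) / tot,
    R2_2given1 = sum(crossprod(X, q2)^2) / tot)
}

R2_disp <- sum(qr.fitted(qr(Z), X)^2) / sum(X^2)
fwd <- ordered_R2(X, Z)                       # Dim 1 first
swapped <- ordered_R2(X, Z[, 2:1])            # Dim 2 first
rbind(fwd, swapped)
```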

Care should be taken when interpreting this decomposition. If the columns of $\mathbf{Z}$ are already orthogonal, the two displayed contributions correspond directly to the first and second supplied display axes. If they are not orthogonal, the decomposition is ordered:

- $R^2_1$ is attributable to the first supplied display coordinate $\mathbf{z}_1$.
- $R^2_{2\mid 1}$ is attributable to the component of the second supplied display coordinate $\mathbf{z}_2$ that is orthogonal to the first.

Thus $R^2_{2\mid 1}$ should be interpreted as the additional contribution of “Dim 2 given Dim 1”, not as the contribution of the raw second column of $\mathbf{Z}$ considered in isolation. The ordering of the columns of $\mathbf{Z}$ therefore matters: swapping them generally changes $R^2_1$ and $R^2_{2\mid 1}$, although their sum $R^2_{\mathrm{disp}}$ is unchanged.

The same ordered orthogonalization yields a decomposition of each variable's predictivity:

$$\phi_j = \phi_{j1} + \phi_{j,2\mid 1},$$

where

$$\phi_{j1} = \frac{\|\mathbf{q}_1\mathbf{q}_1^{\top}\mathbf{x}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2} = \frac{(\mathbf{q}_1^{\top}\mathbf{x}_{(j)})^2}{\|\mathbf{x}_{(j)}\|^2},$$

and

$$\phi_{j,2\mid 1} = \frac{\|\mathbf{q}_2\mathbf{q}_2^{\top}\mathbf{x}_{(j)}\|^2}{\|\mathbf{x}_{(j)}\|^2} = \frac{(\mathbf{q}_2^{\top}\mathbf{x}_{(j)})^2}{\|\mathbf{x}_{(j)}\|^2}.$$

Thus $\phi_{j1}$ is the part of variable $j$'s predictivity explained by the first supplied display dimension. $\phi_{j,2\mid 1}$ is the additional part explained by the second display dimension after removing its overlap with the first.
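Per variable, the ordered split needs only the inner products $\mathbf{q}_k^{\top}\mathbf{x}_{(j)}$. This sketch assumes the same hypothetical sheared map of centred iris as before. It checks that $\phi_{j1} + \phi_{j,2\mid 1}$ equals $\phi_j$ computed from the projection.

```r
# Sketch under assumptions: centred-only iris as X; hypothetical sheared map Z
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2] %*% matrix(c(1, 0.5, 0, 1), 2, 2)

q1 <- Z[, 1] / sqrt(sum(Z[, 1]^2))
u2 <- Z[, 2] - q1 * sum(q1 * Z[, 2])
q2 <- u2 / sqrt(sum(u2^2))

ss_col <- colSums(X^2)
phi_1   <- drop(crossprod(X, q1))^2 / ss_col   # (q1' x_(j))^2 / ||x_(j)||^2
phi_2g1 <- drop(crossprod(X, q2))^2 / ss_col   # (q2' x_(j))^2 / ||x_(j)||^2
phi     <- colSums(qr.fitted(qr(Z), X)^2) / ss_col

round(cbind(phi_1, phi_2g1, phi), 3)
```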

If the supplied display coordinates are collinear, then $\mathbf{u}_2 = \mathbf{0}$ and the effective display space is one-dimensional. In that case $R^2_{2\mid 1} = 0$ and $\phi_{j,2\mid 1} = 0$ for all variables.
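In floating point, $\mathbf{u}_2$ is rarely exactly zero, so an implementation needs a tolerance. The sketch below uses a deliberately collinear hypothetical map with $\mathbf{z}_2 = 2\mathbf{z}_1$. The relative threshold of `1e-8` is an illustrative assumption, not the package's setting. `qr()` reports the same effective rank of one.

```r
# Sketch: a deliberately collinear (hypothetical) map, z2 = 2 * z1
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
z1 <- prcomp(iris[, 1:4])$x[, 1]
Z <- cbind(z1, 2 * z1)

qr(Z)$rank                                    # effective display dimension

q1 <- z1 / sqrt(sum(z1^2))
u2 <- Z[, 2] - q1 * sum(q1 * Z[, 2])          # zero up to rounding error
tol <- 1e-8 * sqrt(sum(Z[, 2]^2))             # assumed relative tolerance
R2_2given1 <- if (sqrt(sum(u2^2)) <= tol) 0 else
  sum(crossprod(X, u2 / sqrt(sum(u2^2)))^2) / sum(X^2)
R2_2given1
```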

In addition to the sum-of-squares fit measures above, this method may also report direct-reading error diagnostics in the sense of Alves (2012). The purpose of these diagnostics is different from that of the predictivities $\phi_j$:

- The quantities $\phi_j$, $\phi_{j1}$, $\phi_{j,2\mid 1}$ and $R^2_{\mathrm{disp}}$ measure how much of the variation in $\mathbf{X}$ is reproduced by the fitted regression biplot.
- The Alves diagnostics measure how accurately values can be read directly from a displayed calibrated axis in the current two-dimensional map.

Alves (2012) proposed this idea for predictive PCA biplots. In the present two-dimensional regression-biplot setting, the same principle applies in a particularly simple form because there is only one displayed map.

For each sample $i=1,\ldots,n$ and variable axis $j=1,\ldots,p$, the reading taken from the displayed axis of variable $j$ is precisely the fitted value

$$\widehat{x}_{ij} = \mathbf{z}_i^{\top}\mathbf{h}_{(j)}.$$

The corresponding point on the calibrated axis is

$$\mathbf{p}_{ij} = \frac{\widehat{x}_{ij}}{\mathbf{h}_{(j)}^{\top}\mathbf{h}_{(j)}}\mathbf{h}_{(j)},$$

obtained by substituting $\mu = \widehat{x}_{ij}$ into the calibration formula. Thus the direct reading from the graph and the fitted value from the regression model coincide.

Let $s_j$ denote the standard deviation used to standardize variable $j$. When scale = TRUE, the processed matrix $\mathbf{X}$ already has unit-variance columns and hence $s_j = 1$. The pointwise direct-reading error for sample $i$ on variable axis $j$ is defined by

$$\delta_{ij} = \frac{|x_{ij} - \widehat{x}_{ij}|}{s_j}.$$

If the processed matrix is already standardized, then $\delta_{ij} = |x_{ij} - \widehat{x}_{ij}|$; in that case $\delta_{ij}$ is the direct analogue of Alves's standard predictive error. More generally, dividing by $s_j$ expresses the discrepancy on a comparable variable-wise scale. The quantity $\delta_{ij}$ is therefore a sample-by-axis direct-reading error.

The corresponding axis-level mean direct-reading error is

$$\bar{\delta}_j = \frac{1}{n}\sum_{i=1}^{n}\delta_{ij} = \frac{1}{n}\sum_{i=1}^{n}\frac{|x_{ij} - \widehat{x}_{ij}|}{s_j}.$$

This is the two-dimensional regression-biplot analogue of the mean standard predictive error of Alves (2012). Small values of $\bar{\delta}_j$ indicate that the calibrated axis for variable $j$ supports accurate direct readings on average across the displayed observations. Large values indicate that direct readings from that axis are unreliable in the current display.

Let $\tau_{\mathrm{axis}} > 0$ be a user-specified tolerance parameter for axis-level direct-reading error. Then the Alves selection rule specialized to the present two-dimensional regression biplot is

$$\text{retain axis } j \quad\Longleftrightarrow\quad \bar{\delta}_j \le \tau_{\mathrm{axis}}.$$

Thus an axis is shown only when its average direct-reading error is at most the allowed tolerance. Larger values of $\tau_{\mathrm{axis}}$ retain more axes and therefore produce denser displays. Smaller values enforce stricter axis selection and lead to sparser, more conservative displays. In Alves (2012), values around 0.5 are discussed as a practical starting point in conventional settings, but no universal default should be assumed.

In addition to axis selection, Alves (2012) proposed a second tolerance parameter for individual sample-axis discrepancies. Let $\tau_{\mathrm{units}} > 0$. A sample $i$ is then flagged as an outlier with respect to axis $j$ whenever

$$\delta_{ij} > \tau_{\mathrm{units}}.$$

Such a flag indicates that the direct reading for sample $i$ from axis $j$ is poor in the current display, even if the axis is acceptable on average. Alves (2012) discusses values around 0.75 as a practical starting point for this tolerance parameter, again subject to the application and the scale of the analysis.
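The sketch below applies both rules to an unscaled regression biplot, so that $s_j$ is not identically one. It makes these illustrative assumptions:

- $\mathbf{X}$ is centred-only iris, and $s_j$ is taken as the sample standard deviation of each column;
- $\mathbf{Z}$ holds the first two `prcomp()` scores;
- the tolerances are 0.5 and 0.75.

How the package obtains $s_j$ internally is not shown here. Which axes survive depends entirely on these inputs.

```r
# Sketch under assumptions: centred-only iris as X, s_j = column standard deviations,
# prcomp() scores as Z, tolerances 0.5 and 0.75
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2]
Xhat <- qr.fitted(qr(Z), X)

s_j <- apply(X, 2, sd)                                   # per-variable scale
delta <- sweep(abs(X - Xhat), 2, s_j, "/")               # delta_ij
delta_bar <- colMeans(delta)                             # axis-level mean error

tau_axis <- 0.5
tau_units <- 0.75
shown <- names(delta_bar)[delta_bar <= tau_axis]        # axes kept by the rule
flags <- which(delta > tau_units, arr.ind = TRUE)       # poorly read (sample, axis) pairs

round(delta_bar, 3)
shown
table(colnames(X)[flags[, 2]])                          # flag counts per axis
```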

Because this wrapper is tied to a single two-dimensional regression-biplot display, the quantities $\delta_{ij}$ and $\bar{\delta}_j$ are display-specific diagnostics. They are not measures of the quality of the underlying fitted subspace in the sum-of-squares sense. Rather, they quantify the numerical accuracy of direct readings from the currently displayed axes. This distinction is central in Alves (2012), who emphasizes that direct-reading error is conceptually different from earlier axis-predictivity measures.

The Alves diagnostics and the regression-biplot predictivities are therefore complementary:

- $\phi_j$ is a variance-accounted-for ratio justified by Type B orthogonality. It answers the question: “How much of variable $j$'s sum of squares is reproduced by the displayed regression biplot?”
- $\bar{\delta}_j$ is a mean absolute direct-reading error. It answers a different question: “How accurately can values of variable $j$ be read from the displayed calibrated axis?”

Consequently, a variable may have high $\phi_j$ and still have a non-negligible direct-reading error. Conversely, a variable with moderate $\phi_j$ may still support acceptable average direct readings. In this implementation, the $\phi_j$ family is the primary set of sum-of-squares fit measures. The Alves quantities $\bar{\delta}_j$ and $\delta_{ij}$ provide supplementary, display-specific diagnostics for axis selection and observation-level checking.

In contrast, a regression biplot does not in general satisfy the sample-side decomposition

$$\mathbf{X}\mathbf{X}^{\top} = \widehat{\mathbf{X}}\widehat{\mathbf{X}}^{\top} + (\mathbf{X} - \widehat{\mathbf{X}})(\mathbf{X} - \widehat{\mathbf{X}})^{\top}.$$

The obstruction is the cross term $\widehat{\mathbf{X}}\mathbf{E}^{\top} = \mathbf{P}_Z\mathbf{X}\mathbf{X}^{\top}(\mathbf{I} - \mathbf{P}_Z)$. It vanishes only when $\mathrm{col}(\mathbf{Z})$ is invariant under $\mathbf{X}\mathbf{X}^{\top}$, as it is in PCA, where $\mathrm{col}(\mathbf{Z})$ is spanned by left singular vectors of $\mathbf{X}$. Consequently, PCA-style sample predictivities are not generally justified for a regression biplot. The principled sum-of-squares fit measures are:

- the variable predictivities $\phi_j$;
- the overall quality $R^2_{\mathrm{disp}}$;
- the ordered dimension-specific contributions described above.

The Alves direct-reading errors provide a distinct, supplementary perspective on the quality of the displayed axes.
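The failure of the sample-side decomposition can be observed directly by measuring the cross term $\widehat{\mathbf{X}}\mathbf{E}^{\top}$ for two hypothetical maps of centred iris:

- the first two `prcomp()` scores, which span leading left singular vectors of $\mathbf{X}$;
- the two centred sepal measurements, an arbitrary choice.

The helper `cross_term()` is introduced only for this illustration.

```r
# Sketch: sample-side cross term Xhat E' for two hypothetical maps of centred iris
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)

# Hypothetical helper: largest absolute entry of Xhat E'
cross_term <- function(X, Z) {
  Xhat <- qr.fitted(qr(Z), X)
  E <- X - Xhat
  max(abs(Xhat %*% t(E)))
}

Z_pca   <- prcomp(iris[, 1:4])$x[, 1:2]            # spans leading left singular vectors
Z_other <- X[, c("Sepal.Length", "Sepal.Width")]   # an arbitrary 2D map

c(pca = cross_term(X, Z_pca), other = cross_term(X, Z_other))
```

The first value is zero up to rounding, so sample predictivities are justified there. The second is clearly non-zero.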

In the wrapped bipl5_biplot object, these formulas drive:

- the bottom display-quality label;
- the hover-time predicted values $\widehat{\mathbf{X}}$;
- the calibrated linear axes stored in mdsDisplay_12.

Since the regression display is tied to one externally supplied map, wrap_bipl5.regress() produces a single mdsDisplay only. There is no PC/CV toggle and no separate PCA-style sample-fit panel.

Value

An object of class c("bipl5_biplot", "reg")

References

Gabriel, K. R. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58(3), 453–467. doi:10.1093/biomet/58.3.453

Gower, J. C. and Hand, D. J. (1996). Biplots. London: Chapman & Hall.

Gower, J. C., Lubbe, S. and le Roux, N. J. (2011). Understanding Biplots. Chichester: Wiley.

Greenacre, M. (2010). Biplots in Practice. Bilbao: BBVA Foundation.

la Grange, A., le Roux, N. and Gardner-Lubbe, S. (2009). BiplotGUI: Interactive Biplots in R. Journal of Statistical Software, 30(12), 1–37. doi:10.18637/jss.v030.i12

Alves, M. R. (2012). Evaluation of the predictive power of biplot axes to automate the construction and layout of biplots based on the accuracy of direct readings from common outputs of multivariate analyses: 1. application to principal component analysis. Journal of Chemometrics, 26(5), 180–190. doi:10.1002/cem.2433

Examples

## Not run: 
library(biplotEZ)
bp <- biplot(iris[, 1:4]) |>
  regress(Z = prcomp(iris[, 1:4])$x[, 1:2], group.aes = iris[, 5]) |>
  wrap_bipl5()
bp
plot(bp)

## End(Not run)