| Title: | Construct Reactive Calibrated Axes Biplots |
|---|---|
| Description: | A modern view on the principal component analysis biplot with calibrated axes. Create principal component analysis biplots rendered in HTML with significant reactivity embedded within the plot. Furthermore, the traditional biplot view is enhanced by translated axes with inter-class kernel densities superimposed. For more information on biplots, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0). |
| Authors: | Ruan Buys [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-8527-8631>), Carel van der Merwe [aut, ths] (ORCID: <https://orcid.org/0000-0003-0676-8240>), Delia Sandilands [ctb] (ORCID: <https://orcid.org/0000-0001-9419-7286>), Sugnet Lubbe [ctb] (ORCID: <https://orcid.org/0000-0003-2762-9944>) |
| Maintainer: | Ruan Buys <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-19 16:55:15 UTC |
| Source: | https://github.com/ruanbuys/bipl5 |
Adds a new biplot layer for a specified pair of principal components.
The pair is sorted automatically (e.g. c(5, 3) becomes
c(3, 5)). Both PC indices must be between 1 and p
(the number of variables), and the pair must not already exist.
append_mdsDisplay(object, eigenvectors)
## S3 method for class 'bipl5_biplot'
append_mdsDisplay(object, eigenvectors)
| object | A bipl5_biplot object. |
| eigenvectors | Integer vector of length 2 giving the PC pair. |
A new bipl5_biplot with the additional mdsDisplay appended
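The sorting and validation rules described above can be sketched in base R. This is only an illustration: `normalise_pair()` is a hypothetical helper, not a bipl5 function, and the `mdsDisplay_<a><b>` naming scheme is assumed from names such as mdsDisplay_12 used elsewhere in this manual.

```r
# Hypothetical helper (not part of bipl5): normalise and validate a PC pair.
normalise_pair <- function(pair, p, existing = character()) {
  if (length(pair) != 2 || any(pair != round(pair))) {
    stop("`pair` must contain exactly two whole numbers.")
  }
  pair <- sort(as.integer(pair))          # c(5, 3) becomes c(3, 5)
  if (any(pair < 1L | pair > p)) {
    stop("Both PC indices must lie between 1 and p.")
  }
  # Assumed naming scheme for stored displays.
  name <- paste0("mdsDisplay_", pair[1], pair[2])
  if (name %in% existing) {
    stop("Display ", name, " already exists.")
  }
  pair
}

normalise_pair(c(5, 3), p = 6)
```

Sorting before the existence check means that c(2, 1) and c(1, 2) are treated as the same display.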
A modern view on PCA biplot with calibrated axes. Create PCA biplots rendered in HTML with significant reactivity embedded on the plot. Furthermore, the traditional biplot view is enhanced by translated axes with interclass kernel densities superimposed. For more information on biplots, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0)
| Package: | bipl5 |
| Type: | Package |
| Version: | 1.1 |
| Date: | 18-12-2023 |
| License: | MIT |
Ruan Buys (Maintainer)
Carel van der Merwe
Delia Sandilands (contributor)
Sugnet Lubbe (contributor)
The newest version of the package can be obtained on GitHub: https://github.com/RuanBuys/bipl5
bipl5 default color scales
colorpal(number = 16)
| number | Integer: number of distinct colors to return. Ranges from 1 to 16. |
Character vector of default colors in bipl5. There are
sixteen unique colors defined.
colorpal(number = 7)
Three calling styles are supported:
mdsDisplay subset: extract(bp, mdsDisplay_12) — returns a
new bipl5_biplot containing only that mdsDisplay (plottable).
Two-level: extract(bp, from = mdsDisplay_12, what = sample_coordinates)
— returns the nested data element.
Arbitrary depth: extract(bp, mdsDisplay_12$Data$sample_coordinates)
— returns the nested data element.
extract(object, expr, from, what)
## S3 method for class 'bipl5_biplot'
extract(object, expr, from, what)
| object | A bipl5_biplot object. |
| expr | An unquoted mdsDisplay name (e.g. mdsDisplay_12). |
| from | Unquoted name of the top-level element. |
| what | Unquoted name of the nested element. |
In addition to the mdsDisplay access patterns above, graph-based fit measures
can be extracted directly with calls such as
extract(bp, fit_measures, CumPred) or
extract(bp, fit_measures$CumPred). Supported graph-based fit measures
are CumPred, CumAd, VarExp, and Scree. These
calls return a bipl5_fit object that can be passed to plot() to
obtain a static ggplot2 version of the corresponding fit graph.
A bipl5_biplot (mdsDisplay subset), a bipl5_fit object
for graph-based fit measures, or the requested sub-element.
bp <- biplotEZ::biplot(iris[, 1:4]) |>
  biplotEZ::PCA() |>
  wrap_bipl5()
only_12 <- extract(bp, mdsDisplay_12)
data_obj <- extract(bp, from = mdsDisplay_12, what = Data)
coords <- extract(bp, mdsDisplay_12$Data$sample_coordinates)
fit_plot <- extract(bp, fit_measures, CumPred)
plot(fit_plot)
format_samples() rebuilds the sample-trace block inside each
mdsDisplay so that observations are grouped by by and rendered with one
trace per visual class. This means the visible trace structure, legend
labels, and stored sample-format metadata all stay aligned.
format_samples(
  x,
  stratify = c("col", "symbol"),
  by = NULL,
  col = NULL,
  pch = NULL
)
| x | A bipl5_biplot object. |
| stratify | Which aesthetic to change: "col" or "symbol". |
| by | Optional grouping variable for the sample traces. This can be a bare column name, a character string naming a stored column, or a vector with one value per observation. |
| col | Optional vector of colours. |
| pch | Optional vector of plotting symbols. |
The function is intended for sample formatting only. It does not refit the underlying ordination model. In particular, for CVA biplots the fitted CVA classes are preserved and only the sample traces are reformatted.
A first call to format_samples() creates one sample legend section for the
requested aesthetic. For example, format_samples(stratify = "col", by = Species) will colour the observations by Species and create a legend
section headed Species with one entry per class.
A second call can be used to add a second, independent sample
stratification. If the second call uses the same grouping variable as the
first call, both aesthetics are applied to the same set of classes and the
legend remains unified. If the second call uses a different grouping
variable, format_samples() creates a second legend section and internally
splits the observation layer into all observed combinations of the two
grouping variables.
For example, the sequence
init_biplot(iris2) |> scale_mds("pca") |> format_samples(stratify = "col", by = Species) |> format_samples(stratify = "symbol", by = Band)
will produce two sample legend sections:
Species for the colour grouping
Band for the symbol grouping
The visible observation traces are then split by Species x Band, but these
combination traces are hidden from the legend. Instead, format_samples()
inserts legend-only sample entries so the legend remains easy to read.
If translated axes are available in the mdsDisplay, a colour
stratification also rebuilds the kernel-density traces on the translated
axes so that those densities reflect the colour classes. Symbol-only
stratification does not change the translated-axis densities. This means:
format_samples(stratify = "col", ...) recalculates translated-axis
densities by the colour grouping
format_samples(stratify = "symbol", ...) leaves the existing translated
densities unchanged
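To make the idea of per-class densities on an axis concrete, here is a minimal base-R sketch. It assumes that a translated-axis density is a kernel density estimate of the values each colour class attains along one variable's axis in the first two principal components. The package's actual bandwidth, scaling and placement are not documented here, so treat every choice below as illustrative.

```r
# Processed data and a two-dimensional PCA map (illustrative choices).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
V <- svd(X)$v[, 1:2]
Z <- X %*% V                              # displayed sample coordinates

# Values read along the axis of variable j (here Sepal.Length).
j <- 1
axis_values <- as.numeric(Z %*% V[j, ])

# One kernel density per colour class; a colour stratification would
# recompute this list, a symbol-only stratification would leave it alone.
dens_by_colour <- lapply(split(axis_values, iris$Species), stats::density)

# Location of each class's density peak along the axis.
peak <- vapply(dens_by_colour, function(d) d$x[which.max(d$y)], numeric(1))
```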
The legend toggles operate across the full dual stratification:
clicking a colour legend entry hides or shows all observations belonging to that colour class, across every symbol class
clicking a symbol legend entry hides or shows all observations belonging to that symbol class, across every colour class
The formatting is applied to every mdsDisplay currently stored in the
object. If additional displays are later added with append_mdsDisplay(),
the stored sample-format state is reused so the new displays inherit the same
sample legend structure.
format_samples() supports two complementary workflows.
Single stratification
A single call to format_samples() rebuilds the sample layer so that one
trace is created per class in by. This updates the marker appearance, the
legend entries, and the stored sample-format metadata consistently.
Second stratification
A second call to format_samples() can be used to add a second sample
aesthetic. This is most useful when colour and plotting symbol represent
different variables.
If the second call uses the same grouping structure as the first, the result is still one legend section with one entry per class, but each class now carries both a colour and a plotting symbol.
If the second call uses a different grouping structure, the object stores two independent sample legend sections. Internally, the observation layer is rebuilt as one hidden trace per observed combination of the two grouping variables. The visible legend then shows one section for each stratifying variable.
Translated-axis densities
When translated axes are present, the kernel-density traces on those axes are
tied to the current colour grouping. Applying format_samples(stratify = "col", ...) rebuilds the translated-axis density traces so they match the
colour classes. Applying format_samples(stratify = "symbol", ...) does not
rebuild those densities.
So:
a first colour stratification updates both the sample layer and the translated-axis densities
a later symbol stratification leaves those densities as they are
if a symbol stratification is applied first and a colour stratification is added later, the translated-axis densities are rebuilt when the colour stratification is added
Legend click behaviour
When two different stratifications are active, the legend entries behave like filters:
clicking a class in the first legend section toggles all observations in that class, regardless of their membership in the second stratification
clicking a class in the second legend section toggles all observations in that class, regardless of their membership in the first stratification
So if colours represent Species and symbols represent Band, clicking
setosa hides all setosa observations, while clicking class1 hides all
class1 observations across every species.
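The filter-like behaviour can be mimicked outside the widget with a small base-R sketch. It assumes that an observation stays visible only while both its colour class and its symbol class are switched on; the variable names are illustrative, and the real toggling happens in the plot's JavaScript handler.

```r
iris2 <- iris
iris2$Band <- factor(
  rep(c("class1", "class2", "class3", "class4"), length.out = nrow(iris2))
)

# One hidden trace per observed combination of the two grouping variables.
combos <- unique(iris2[, c("Species", "Band")])

# Legend state after clicking "setosa" and "class1" (assumed semantics).
hidden_species <- "setosa"
hidden_band <- "class1"

# An observation is shown only if neither of its classes is hidden.
visible <- !(iris2$Species %in% hidden_species) &
  !(iris2$Band %in% hidden_band)

table(iris2$Species[visible], iris2$Band[visible])
```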
Non-standard evaluation
If by is supplied as a bare column name, format_samples() looks for that
column in the dataset stored by init_biplot(). If by is supplied as a
character string, it is interpreted as the name of a stored column. If by
is supplied as a vector, it must have one value per observation; in that case
the legend title defaults to "Data" because there is no stored column name
to display.
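The three ways of supplying by can be illustrated with a base-R sketch built on substitute(). `resolve_by()` is a hypothetical helper written for this example; it is not the package's implementation, and only the fallback title "Data" is taken from the description above.

```r
# Hypothetical helper: resolve `by` against a stored data frame.
resolve_by <- function(data, by) {
  expr <- substitute(by)
  # Case 1: bare column name, looked up in the stored data.
  if (is.name(expr) && as.character(expr) %in% names(data)) {
    nm <- as.character(expr)
    return(list(values = data[[nm]], title = nm))
  }
  value <- by
  # Case 2: character string naming a stored column.
  if (is.character(value) && length(value) == 1 && value %in% names(data)) {
    return(list(values = data[[value]], title = value))
  }
  # Case 3: a vector with one value per observation; no column name exists.
  stopifnot(length(value) == nrow(data))
  list(values = value, title = "Data")
}

grp <- rep(c("a", "b"), length.out = nrow(iris))
resolve_by(iris, Species)$title
resolve_by(iris, "Species")$title
resolve_by(iris, grp)$title
```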
CVA note
format_samples() does not change the fitted CVA model. It only reformats
the sample traces. The grouping used to fit the CVA model should therefore be
specified in scale_mds(), not in format_samples().
A modified bipl5_biplot.
bp <- init_biplot(iris) |>
  scale_mds(type = "pca", eigenvectors = c(1, 2))

bp_species <- format_samples(
  bp,
  stratify = "col",
  by = Species,
  col = c("tomato", "steelblue", "darkgreen")
)

sample_idx <- vapply(
  bp_species$mdsDisplay_12$mdsDisplay$trace_data,
  function(tr) "data" %in% unlist(tr$meta),
  logical(1)
)
vapply(
  bp_species$mdsDisplay_12$mdsDisplay$trace_data[sample_idx],
  `[[`, character(1), "name"
)

bp_symbol <- format_samples(
  bp,
  stratify = "symbol",
  by = Species,
  pch = c(16, 17, 15)
)

iris2 <- iris
iris2$Band <- factor(
  rep(c("class1", "class2", "class3", "class4"), length.out = nrow(iris2))
)

bp_dual <- init_biplot(iris2) |>
  scale_mds(type = "pca", eigenvectors = c(1, 2)) |>
  format_samples(
    stratify = "col",
    by = Species,
    col = c("tomato", "steelblue", "darkgreen")
  ) |>
  format_samples(
    stratify = "symbol",
    by = Band,
    pch = c(12, 13, 14, 15)
  )

# When plotted, the legend now has one section for Species and one for Band.
# Clicking a Species entry hides that species across all Band classes.
# Clicking a Band entry hides that Band class across all Species classes.
if (interactive()) {
  plot(bp_dual)
}

bp_species_13 <- append_mdsDisplay(bp_species, c(1, 3))
init_biplot() stores the raw data and preprocessing options needed to
construct a biplot later with scale_mds(). It does not perform any
ordination itself. When data is a data frame containing both numeric and
non-numeric columns, only the numeric columns are used for the biplot
calculation, while the full data frame is retained for later formatting
steps such as format_samples().
init_biplot(data, center = TRUE, scale = FALSE)
| data | A matrix or data frame. If a data frame contains non-numeric columns, they are stored but excluded from the ordination input. |
| center | Logical; should numeric variables be centered before analysis? |
| scale | Logical; should numeric variables be scaled before analysis? |
An object of class bipl5_spec.
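The split between "columns used for the ordination" and "columns kept for later formatting" can be sketched as follows. `spec_sketch()` and its element names are hypothetical and only mirror the behaviour described above; the real bipl5_spec object may be organised differently.

```r
# Hypothetical stand-in for a specification object (not the bipl5 class).
spec_sketch <- function(data, center = TRUE, scale = FALSE) {
  data <- as.data.frame(data)
  is_num <- vapply(data, is.numeric, logical(1))
  structure(
    list(
      data = data,                                          # full data, kept for formatting
      numeric = as.matrix(data[, is_num, drop = FALSE]),    # ordination input only
      center = center,                                      # options stored, not yet applied
      scale = scale
    ),
    class = "spec_sketch"
  )
}

spec <- spec_sketch(iris)
colnames(spec$numeric)   # numeric measurement columns only
names(spec$data)         # still includes Species
```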
overlay_fit() is a convenience helper for pipelines. It does not refit the
underlying ordination; it only stores whether fit measures should default to
the right-hand panel or an overlay view when plot() is called.
overlay_fit(x, overlay = TRUE)
| x | A bipl5_biplot object. |
| overlay | Logical scalar. |
A later call to plot(x, fit_display = ...) always takes precedence over the
stored default.
A modified bipl5_biplot.
Initialises a plotly graph, populates it with the first available mdsDisplay traces and annotations, then attaches the remaining mdsDisplays and fit measures to the JavaScript event handler.
## S3 method for class 'bipl5_biplot'
plot(x, y = NULL, fit_display = c("inherit", "panel", "overlay"), ...)
| x | A bipl5_biplot object. |
| y | Ignored (for S3 consistency). |
| fit_display | How fit measures should be shown for biplots that support them: inherit the object's stored preference, render them in the right-hand panel, or render them as an overlay over the full plot width. |
| ... | Additional arguments (ignored). |
A plotly htmlwidget
Reconstructs one of the PCA fit graphs from its stored plotly traces. The fit type is inferred from the trace metadata and trace types, then translated into a ggplot2 chart with matching title, legend titles, and axes.
## S3 method for class 'bipl5_fit'
plot(x, y = NULL, ...)
| x | A bipl5_fit object. |
| y | Ignored (for S3 consistency). |
| ... | Additional arguments (ignored). |
Supported fit graphs are cumulative predictivity (CumPred),
cumulative adequacy (CumAd), variance explained (VarExp), and
the scree plot (Scree). The summary-table fit objects are not handled
by this plotting method.
A ggplot object.
bp <- biplotEZ::biplot(iris[, 1:4]) |>
  biplotEZ::PCA() |>
  wrap_bipl5()
fit_plot <- extract(bp, fit_measures, Scree)
plot(fit_plot)
Print a bipl5_biplot object as a tree diagram
## S3 method for class 'bipl5_biplot'
print(x, ...)
| x | A bipl5_biplot object. |
| ... | Additional arguments (ignored). |
Invisibly returns x
Print a bipl5_data object
## S3 method for class 'bipl5_data'
print(x, ...)
| x | A bipl5_data object. |
| ... | Additional arguments (ignored). |
Invisibly returns x
Print a bipl5_fitmeasures object
## S3 method for class 'bipl5_fitmeasures'
print(x, ...)
| x | A bipl5_fitmeasures object. |
| ... | Additional arguments (ignored). |
Invisibly returns x
Print a bipl5_mdsDisplay object
## S3 method for class 'bipl5_mdsDisplay'
print(x, ...)
| x | A bipl5_mdsDisplay object. |
| ... | Additional arguments (ignored). |
Invisibly returns x
Returns a new bipl5_biplot with the specified mdsDisplay (and its
corresponding fit table) removed. At least one mdsDisplay must remain.
remove_mdsDisplay(object, mdsDisplay)
## S3 method for class 'bipl5_biplot'
remove_mdsDisplay(object, mdsDisplay)
| object | A bipl5_biplot object. |
| mdsDisplay | Unquoted name of the mdsDisplay to remove. |
A new bipl5_biplot without the removed mdsDisplay
scale_mds() turns a bipl5_spec created by init_biplot() into a fully
formed bipl5_biplot by dispatching to one of the underlying
biplotEZ::PCA(), biplotEZ::CVA(), biplotEZ::PCO(), or
biplotEZ::regress() methods and then compiling only the requested
mdsDisplay. Any additional displays can be added later with
append_mdsDisplay().
scale_mds(x, type = c("pca", "cva", "pco", "regress"), ...)
## S3 method for class 'bipl5_spec'
scale_mds(x, type = c("pca", "cva", "pco", "regress"), ...)
| x | A bipl5_spec object created by init_biplot(). |
| type | The biplot method to construct. One of "pca", "cva", "pco", or "regress". |
| ... | Additional named arguments for the chosen method. |
The type argument chooses the underlying biplot method. Additional
arguments are method-specific and should be supplied via ....
Supported aliases in ...:
Common: classes, group_aes / group.aes, title / Title
PCA: dimensions / dim.biplot, eigenvectors / e.vects,
show_class_means / show.class.means / show_group_means /
show.group.means, correlation_biplot / correlation.biplot
CVA: classes, dimensions / dim.biplot, eigenvectors / e.vects,
weighted_cva / weightedCVA, show_class_means / show.class.means /
show_group_means / show.group.means, low_dim / low.dim
PCO: Dmat / dist_mat, dist_func / dist.func,
dist_func_cat / dist.func.cat, dimensions / dim.biplot,
eigenvectors / e.vects, show_class_means / show.class.means /
show_group_means / show.group.means, axes
regress: Z / z, show_group_means / show.group.means /
show_class_means / show.class.means, axes
For type = "pco", any remaining named arguments in ... are forwarded to
the chosen distance function.
A fully formed bipl5_biplot.
Retrieve all valid plotting symbols for the plotly library
Symbol_List()
A vector of all the valid plotting symbols used in the
plotly library.
Symbol_List()
Convert a biplotEZ object to a bipl5_biplot
wrap_bipl5(x)
| x | A biplotEZ biplot object. |
An object of class bipl5_biplot
Builds mdsDisplays for the user's CV pair and available supplementary pairs,
along with a dropdown menu. Fit measures are not yet computed for CVA
biplots and will be NULL.
Plotting is deferred to plot.bipl5_biplot.
## S3 method for class 'CVA'
wrap_bipl5(x)
| x | An object of class CVA. |
An object of class c("bipl5_biplot", "cva")
## Not run:
library(biplotEZ)
bp <- biplot(iris[, 1:4]) |>
  CVA(classes = iris[, 5]) |>
  wrap_bipl5()
bp
plot(bp)
## End(Not run)
Builds the mdsDisplay(s) used for a principal component analysis (PCA) biplot and documents the associated PCA-biplot fit, predictivity and direct-reading measures. In contrast to a regression biplot, the low-dimensional sample map is obtained internally from the singular value decomposition of the processed data matrix. If the wrapped object stores more than one principal-component pair as separate mdsDisplays, the same formulas below apply to each mdsDisplay separately, with the active mdsDisplay determined by the displayed pair of principal components.
## S3 method for class 'PCA'
wrap_bipl5(x)
| x | An object of class PCA. |
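For the PCA biplot handled by this method, let
$\mathbf{X}: n \times p$ denote the processed data
matrix stored in the input biplot object after centring and any
optional scaling performed by biplot(). Thus $\mathbf{X}$ is the
matrix on which PCA is actually carried out. Write the singular value
decomposition as
$$\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^\top,$$
where $\mathbf{U}: n \times r$ satisfies $\mathbf{U}^\top\mathbf{U} = \mathbf{I}_r$,
$\mathbf{V}: p \times r$ satisfies $\mathbf{V}^\top\mathbf{V} = \mathbf{I}_r$, and
$\mathbf{D} = \mathrm{diag}(d_1, \ldots, d_r)$ with
$d_1 \ge \cdots \ge d_r > 0$ and
$r = \mathrm{rank}(\mathbf{X})$. The columns $\mathbf{v}_k$ of $\mathbf{V}$ are the
principal directions, and the corresponding principal component score vectors
are $\mathbf{X}\mathbf{v}_k = d_k\mathbf{u}_k$,
$k = 1, \ldots, r$. This is the standard PCA biplot construction underlying
Gabriel's original formulation and subsequent calibrated-axis developments
(Gabriel, 1971; Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011;
Greenacre, 2010).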
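Suppose the user has selected two principal components
$a < b$. Let $\mathbf{J}_{ab}$ denote the $r \times r$ diagonal
selector matrix with ones in positions $(a, a)$ and
$(b, b)$ and zeros elsewhere. Then the two-dimensional PCA fitted matrix is
$$\hat{\mathbf{X}} = \mathbf{U}\mathbf{D}\mathbf{J}_{ab}\mathbf{V}^\top.$$
Equivalently, if $\mathbf{U}_{ab}$ and $\mathbf{V}_{ab}$ denote the
submatrices containing columns $a$ and $b$, and
$\mathbf{D}_{ab} = \mathrm{diag}(d_a, d_b)$, then
$$\hat{\mathbf{X}} = \mathbf{U}_{ab}\mathbf{D}_{ab}\mathbf{V}_{ab}^\top = \mathbf{X}\mathbf{V}_{ab}\mathbf{V}_{ab}^\top.$$
When $(a, b) = (1, 2)$, this is the best rank-2 approximation to
$\mathbf{X}$ in Frobenius norm by the Eckart–Young theorem. For any
other selected pair, the same formula gives the orthogonal projection of
$\mathbf{X}$ onto the chosen two-dimensional principal-component subspace,
but it is not generally the globally optimal rank-2 approximation
(Eckart and Young, 1936; Gabriel, 1971; Greenacre, 2010).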
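The fitted matrix for a selected pair of components can be checked numerically with base R's svd(). The sketch below uses the standardized iris measurements as an illustrative processed matrix; `fit_pair()` is a hypothetical helper written for this example, not a bipl5 function.

```r
# Processed matrix: centred and scaled iris measurements (illustrative choice).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)

# Hypothetical helper: fitted matrix built from the selected pair of components.
fit_pair <- function(s, pair) {
  s$u[, pair] %*% diag(s$d[pair]) %*% t(s$v[, pair])
}

X_hat_12 <- fit_pair(s, c(1, 2))
X_hat_13 <- fit_pair(s, c(1, 3))

# Equivalent form: project X onto the span of the selected principal directions.
V12 <- s$v[, c(1, 2)]
proj_12 <- X %*% V12 %*% t(V12)

# Residual sums of squares: the leading pair is never worse than another pair.
rss_12 <- sum((X - X_hat_12)^2)
rss_13 <- sum((X - X_hat_13)^2)
c(rss_12 = rss_12, rss_13 = rss_13)
```

Comparing the two residual sums of squares shows why a display of components 1 and 3 is still a valid projection but not the optimal rank-2 fit.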
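The calibrated-axis PCA biplot may be written in the general form
$$\hat{\mathbf{X}} = \mathbf{Z}\mathbf{H}^\top,$$
where $\mathbf{Z}: n \times 2$ holds the displayed sample coordinates, $\mathbf{H}: p \times 2$ holds the displayed variable directions, and the exact factorization depends on the type of PCA biplot being displayed.
For the ordinary PCA biplot, which prioritizes the Euclidean geometry of the sample points, take
$$\mathbf{Z} = \mathbf{X}\mathbf{V}_{ab} = \mathbf{U}_{ab}\mathbf{D}_{ab}, \qquad \mathbf{H} = \mathbf{V}_{ab}.$$
Thus the displayed sample coordinates are the selected principal component scores, and the fitted matrix is
$$\hat{\mathbf{X}} = \mathbf{X}\mathbf{V}_{ab}\mathbf{V}_{ab}^\top.$$
If $\mathbf{h}_j$ denotes the $j$th row of $\mathbf{H}$
written as a column vector in $\mathbb{R}^2$, and $\mathbf{z}_i$ the $i$th row of $\mathbf{Z}$ written likewise,
then for sample $i$
$$\hat{x}_{ij} = \mathbf{z}_i^\top\mathbf{h}_j.$$
Hence the calibrated axis for variable $j$ has direction
$\mathbf{h}_j$, and the point on that axis corresponding to marker
value $\mu$ is
$$\frac{\mu}{\mathbf{h}_j^\top\mathbf{h}_j}\,\mathbf{h}_j.$$
This is the standard calibrated-axis formula used to place tick marks and to recover predicted values from projections onto the displayed axis (Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011).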
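As a worked illustration of axis calibration, the base-R sketch below places tick markers along one variable's axis and reads a sample's predicted value off that axis. It assumes the ordinary PCA biplot with components 1 and 2; the choice of variable, sample and marker values is arbitrary.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
pair <- c(1, 2)
Z <- X %*% s$v[, pair]          # displayed sample coordinates (n x 2)
H <- s$v[, pair]                # one axis direction per variable (p x 2)

j <- 1                          # Sepal.Length
h <- H[j, ]

# Marker values on the centred scale; add the column mean to label in cm.
mu <- pretty(X[, j])
ticks <- outer(mu / sum(h^2), h)   # each row is a marker position in the plane

# Reading sample i off the axis: its orthogonal projection onto the axis
# lands on the marker whose value is the inner product of z_i and h.
i <- 1
reading <- sum(Z[i, ] * h)
foot <- reading / sum(h^2) * h     # position of that projection in the plane
```

Because each marker sits at the marker value divided by the squared length of the direction, projecting a tick back onto the direction recovers its marker value.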
A second important special case is the correlation biplot, obtained
when the processed matrix $\mathbf{X}$ is standardized and the display is
chosen so that correlations between variables are approximated by the cosines
of the angles between the displayed variable directions. In that case one may
equivalently factorize
$$\hat{\mathbf{X}} = \mathbf{U}_{ab}\,(\mathbf{V}_{ab}\mathbf{D}_{ab})^\top.$$
Hence the displayed sample coordinates are $\mathbf{U}_{ab}$ and the
variable directions are the rows of $\mathbf{V}_{ab}\mathbf{D}_{ab}$.
If $\mathbf{h}_j = (d_a v_{ja}, d_b v_{jb})^\top$ denotes the $j$th such row written as a column
vector, then
$$\hat{x}_{ij} = (u_{ia}, u_{ib})\,\mathbf{h}_j.$$
In this standardized setting the coordinates
$d_a v_{ja}$ and $d_b v_{jb}$ are proportional to the correlations of variable
$j$ with the selected principal components, and the geometry of the
displayed variable directions is therefore tuned to the correlation structure
rather than to the raw score geometry of the samples. This is the sense in
which correlation.biplot = TRUE preserves variable-correlation
information in the display (Gabriel, 1971; Greenacre, 2010; biplotEZ
manual and vignette).
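The proportionality between the variable coordinates and the correlations can be verified directly. The sketch assumes variable coordinates of the form "principal direction times singular value" for components 1 and 2; any constant rescaling a package applies to the display would only change the constant of proportionality.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
n <- nrow(X)
s <- svd(X)
pair <- c(1, 2)

scores <- X %*% s$v[, pair]                   # principal component scores
H_cor <- s$v[, pair] %*% diag(s$d[pair])      # variable coordinates (assumed scaling)

# Correlation of each standardized variable with each selected component.
cors <- cor(X, scores)

# The ratio is the same constant for every variable and component.
ratio <- H_cor / cors
range(ratio)
```

With scale() standardizing by the n - 1 standard deviation, the constant works out to the square root of n - 1.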
In either factorization, all predicted values in
$\hat{\mathbf{X}}$ are on the same centred/scaled scale as the
stored matrix $\mathbf{X}$. If required, predictions can be
back-transformed to the original variable scale using the means and standard
deviations stored in the input biplot object.
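A fundamental feature of the PCA biplot is that both sample-side and
variable-side orthogonal decompositions hold. Writing
$\mathbf{E} = \mathbf{X} - \hat{\mathbf{X}}$ for the residual matrix, one has
$$\mathbf{X}\mathbf{X}^\top = \hat{\mathbf{X}}\hat{\mathbf{X}}^\top + \mathbf{E}\mathbf{E}^\top$$
and
$$\mathbf{X}^\top\mathbf{X} = \hat{\mathbf{X}}^\top\hat{\mathbf{X}} + \mathbf{E}^\top\mathbf{E}.$$
The first is the Type A orthogonality, which justifies sample-side
measures of fit. The second is the Type B orthogonality, which
justifies variable-side measures of fit. For PCA both orthogonality relations
hold simultaneously because $\hat{\mathbf{X}}$ is obtained from
an orthogonal principal-component projection
(Gabriel, 1971; Gower, Lubbe and le Roux, 2011; Gardner-Lubbe, le Roux and
Gower, 2008).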
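Both orthogonality relations are easy to confirm numerically. The base-R sketch below forms the fitted and residual matrices for components 1 and 2 of the standardized iris data (an illustrative choice) and checks that the cross-products between fit and residual vanish on both the sample side and the variable side.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)
pair <- c(1, 2)
X_hat <- s$u[, pair] %*% diag(s$d[pair]) %*% t(s$v[, pair])
E <- X - X_hat                               # residual matrix

# Sample side: fitted rows are orthogonal to residual rows.
max_sample_cross <- max(abs(X_hat %*% t(E)))
# Variable side: fitted columns are orthogonal to residual columns.
max_variable_cross <- max(abs(t(X_hat) %*% E))

# Consequently both sums-of-squares decompositions hold.
lhs_rows <- tcrossprod(X)
rhs_rows <- tcrossprod(X_hat) + tcrossprod(E)
lhs_cols <- crossprod(X)
rhs_cols <- crossprod(X_hat) + crossprod(E)
```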
Let $\mathbf{x}_i^\top$ denote row $i$ of $\mathbf{X}$
and $\hat{\mathbf{x}}_i^\top$ the corresponding row of
$\hat{\mathbf{X}}$. The sample predictivity of sample
$i$ is then
$$\psi_i = \frac{\hat{\mathbf{x}}_i^\top\hat{\mathbf{x}}_i}{\mathbf{x}_i^\top\mathbf{x}_i}.$$
Thus $\psi_i$ is the proportion of the sum of squares of sample
$i$ reproduced by the chosen two-dimensional PCA display. Because of Type A
orthogonality, $0 \le \psi_i \le 1$. Samples with $\psi_i$ near one
lie close to the displayed PCA plane, whereas samples with $\psi_i$ near
zero lie largely orthogonal to it. This is the sample-side fit measure used in
the PCA biplot literature and in biplotEZ
(Gardner-Lubbe, le Roux and Gower, 2008; biplotEZ vignette).
Let $\mathbf{x}_{(j)}$ denote column $j$ of $\mathbf{X}$ and
$\hat{\mathbf{x}}_{(j)}$ the corresponding column of
$\hat{\mathbf{X}}$. The axis predictivity of variable
$j$ is
$$\pi_j = \frac{\hat{\mathbf{x}}_{(j)}^\top\hat{\mathbf{x}}_{(j)}}{\mathbf{x}_{(j)}^\top\mathbf{x}_{(j)}}.$$
Thus $\pi_j$ is the proportion of the sum of squares of variable
$j$ reproduced by the chosen PCA plane. Because of Type B orthogonality,
$0 \le \pi_j \le 1$. In a calibrated-axis display, $\pi_j$ is the
natural sum-of-squares measure of how well the axis for variable $j$
reproduces the underlying processed values. This is the quantity reported in
biplotEZ as “axis predictivity”
(Gardner-Lubbe, le Roux and Gower, 2008; biplotEZ vignette;
Greenacre, 2010).
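Sample and axis predictivities are both ratios of fitted to total sums of squares, taken over rows and columns respectively. A minimal base-R sketch, using components 1 and 2 of the standardized iris data as an illustrative display:

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)
pair <- c(1, 2)
X_hat <- s$u[, pair] %*% diag(s$d[pair]) %*% t(s$v[, pair])

# Sample predictivity: fitted over total sum of squares, row by row.
sample_pred <- rowSums(X_hat^2) / rowSums(X^2)

# Axis predictivity: fitted over total sum of squares, column by column.
axis_pred <- colSums(X_hat^2) / colSums(X^2)
names(axis_pred) <- colnames(X)

round(axis_pred, 3)
summary(sample_pred)
```

Samples with a small value lie far from the displayed plane, so their plotted positions should be interpreted with care.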
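The overall quality of the displayed PCA subspace is
$$Q = \frac{\|\hat{\mathbf{X}}\|_F^2}{\|\mathbf{X}\|_F^2}.$$
Since
$\|\mathbf{X}\|_F^2 = \sum_{k=1}^{r} d_k^2$ and $\|\hat{\mathbf{X}}\|_F^2 = d_a^2 + d_b^2$, this may also be
written as
$$Q = \frac{d_a^2 + d_b^2}{\sum_{k=1}^{r} d_k^2}.$$
In particular, when $(a, b) = (1, 2)$, this is the familiar proportion of
total sum of squares explained by the first two principal components. More
generally, it is the quality of the specific displayed pair chosen by the
user, matching the biplotEZ quality measure
(Gabriel, 1971; Greenacre, 2010; biplotEZ vignette).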
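Because both Type A and Type B orthogonality hold, the overall quality $Q$ can be expressed as a weighted average on either the sample side or the variable side. On the variable side,
$$Q = \sum_{j=1}^{p} w_j\,\pi_j,$$
where $\pi_j$ is the axis predictivity of variable $j$ and
$$w_j = \frac{\mathbf{x}_{(j)}^\top\mathbf{x}_{(j)}}{\|\mathbf{X}\|_F^2}.$$
Hence variables with larger sums of squares contribute more to the overall
display quality. If the original call to biplot() used
scale = TRUE, so that all processed variables have equal sums of
squares, then
$$w_j = \frac{1}{p} \quad\text{and}\quad Q = \frac{1}{p}\sum_{j=1}^{p}\pi_j.$$
Thus, for a standardized PCA biplot, the overall quality is the simple average of the individual axis predictivities. This is the weighted-average interpretation requested by the present wrapper.
Similarly, on the sample side,
$$Q = \sum_{i=1}^{n} \omega_i\,\psi_i,$$
where $\psi_i$ is the sample predictivity of sample $i$ and
$$\omega_i = \frac{\mathbf{x}_i^\top\mathbf{x}_i}{\|\mathbf{X}\|_F^2}.$$
Hence the same overall display quality may be read as a weighted average of sample predictivities or as a weighted average of axis predictivities (Gardner-Lubbe, le Roux and Gower, 2008).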
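The equivalent expressions for the overall quality can be compared numerically. The base-R sketch below computes the quality from the fitted matrix, from the singular values, and as weighted averages of the axis and sample predictivities; the standardized iris data and components 1 and 2 are illustrative choices.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)
pair <- c(1, 2)
X_hat <- s$u[, pair] %*% diag(s$d[pair]) %*% t(s$v[, pair])

sample_pred <- rowSums(X_hat^2) / rowSums(X^2)
axis_pred <- colSums(X_hat^2) / colSums(X^2)

# Overall quality from sums of squares and from singular values.
quality <- sum(X_hat^2) / sum(X^2)
quality_sv <- sum(s$d[pair]^2) / sum(s$d^2)

# Weights: each variable's (or sample's) share of the total sum of squares.
w_var <- colSums(X^2) / sum(X^2)
w_obs <- rowSums(X^2) / sum(X^2)
quality_var <- sum(w_var * axis_pred)
quality_obs <- sum(w_obs * sample_pred)

# With standardized variables the weights are equal, so a plain mean suffices.
quality_mean <- mean(axis_pred)
```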
Unlike the regression-biplot case, no ordered orthogonalization is required to decompose the quality of a PCA display into separate contributions from the two displayed dimensions, because principal components are already mutually orthogonal. Indeed,
$$\hat{\mathbf{X}} = d_a\mathbf{u}_a\mathbf{v}_a^\top + d_b\mathbf{u}_b\mathbf{v}_b^\top,$$
and these two rank-1 parts are orthogonal in Frobenius inner product. Hence
$$Q = Q_a + Q_b,$$
where
$$Q_k = \frac{d_k^2}{\sum_{l=1}^{r} d_l^2}, \qquad k \in \{a, b\}.$$
Thus the contribution of each displayed principal component is obtained directly from its singular value.
The same orthogonal decomposition yields a per-component breakdown of each axis predictivity. Since
$$\hat{\mathbf{x}}_{(j)} = d_a v_{ja}\mathbf{u}_a + d_b v_{jb}\mathbf{u}_b$$
with orthogonal components $d_a v_{ja}\mathbf{u}_a$ and $d_b v_{jb}\mathbf{u}_b$, one has
$$\pi_j = \pi_{j,a} + \pi_{j,b},$$
where
$$\pi_{j,k} = \frac{d_k^2 v_{jk}^2}{\mathbf{x}_{(j)}^\top\mathbf{x}_{(j)}}, \qquad k \in \{a, b\}.$$
Hence the predictivity of axis $j$ can be decomposed exactly into the
separate contributions of the two displayed principal components. This is the
PCA analogue of the dimension-wise decomposition used elsewhere in the biplot
literature, but here it is especially simple because the components are
orthogonal from the outset. In particular, if the same variable is well
aligned with one selected principal direction but not the other, this will be
visible in the separate values $\pi_{j,a}$ and $\pi_{j,b}$.
Likewise, each sample predictivity decomposes as
$$\psi_i = \psi_{i,a} + \psi_{i,b},$$
where
$$\psi_{i,k} = \frac{d_k^2 u_{ik}^2}{\mathbf{x}_i^\top\mathbf{x}_i}, \qquad k \in \{a, b\}.$$
Thus the contribution of each displayed principal component may be read not
only globally through $Q_a$ and $Q_b$, but also locally through
the sample-wise contributions $\psi_{i,a}$ and $\psi_{i,b}$ and the
axis-wise contributions $\pi_{j,a}$ and $\pi_{j,b}$
(Gardner-Lubbe, le Roux and Gower, 2008).
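The per-component breakdown can be reproduced from the singular value decomposition alone. In the base-R sketch below, each displayed component's share of the quality, of every axis predictivity and of every sample predictivity is computed separately and then summed back; the data and the pair of components are illustrative.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)
pair <- c(1, 2)
X_hat <- s$u[, pair] %*% diag(s$d[pair]) %*% t(s$v[, pair])

# Global contribution of each displayed component.
quality_k <- s$d[pair]^2 / sum(s$d^2)

# Axis-wise contributions: squared loading times squared singular value,
# divided by the variable's total sum of squares (one row per variable).
axis_k <- sweep(s$v[, pair]^2, 2, s$d[pair]^2, `*`) / colSums(X^2)

# Sample-wise contributions: same construction on the sample side.
sample_k <- sweep(s$u[, pair]^2, 2, s$d[pair]^2, `*`) / rowSums(X^2)

# Reference values computed from the full two-dimensional fit.
axis_pred <- unname(colSums(X_hat^2) / colSums(X^2))
sample_pred <- rowSums(X_hat^2) / rowSums(X^2)
```

Summing each row of `axis_k` or `sample_k` recovers the corresponding total predictivity, which is the additivity that the orthogonality of the components guarantees.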
In addition to the sum-of-squares fit measures above, this method may also
report direct-reading diagnostics in the sense of Alves (2012). These
quantities serve a different purpose from the axis predictivities
$\pi_j$ and the sample predictivities $\psi_i$. The predictivities measure how much of the
variation in $\mathbf{X}$ is reproduced by the selected PCA plane in a
sum-of-squares sense. By contrast, the Alves diagnostics measure how
accurately values can be read directly from the displayed calibrated axes in
the current two-dimensional map. This distinction is central in the predictive
biplot literature (Alves, 2012).
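Let $\mathbf{h}_j \in \mathbb{R}^2$ denote the displayed direction of
variable axis $j$ under the active PCA factorization. Thus
$\mathbf{h}_j = (v_{ja}, v_{jb})^\top$ for the ordinary PCA biplot and
$\mathbf{h}_j = (d_a v_{ja}, d_b v_{jb})^\top$ for the correlation biplot. Let
$\mathbf{z}_i \in \mathbb{R}^2$ denote the corresponding displayed sample
coordinate of sample $i$. Then the value read from the graph on axis
$j$ for sample $i$ is
$$\hat{x}_{ij} = \mathbf{z}_i^\top\mathbf{h}_j,$$
and the point on the calibrated axis corresponding to that reading is
$$\frac{\hat{x}_{ij}}{\mathbf{h}_j^\top\mathbf{h}_j}\,\mathbf{h}_j.$$
Thus the direct reading from the displayed PCA axis coincides exactly with the fitted value from the active two-dimensional PCA approximation.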
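Let $s_j$ denote the standard deviation used to standardize variable
$j$. When the processed matrix $\mathbf{X}$ is already standardized,
$s_j = 1$. The pointwise direct-reading error for sample $i$ on axis
$j$ is then
$$e_{ij} = \frac{|x_{ij} - \hat{x}_{ij}|}{s_j},$$
where $\hat{x}_{ij}$ is the value read from axis $j$ for sample $i$.
If $\mathbf{X}$ is already standardized, then
$e_{ij} = |x_{ij} - \hat{x}_{ij}|$. The corresponding axis-level mean
direct-reading error is
$$\bar{e}_j = \frac{1}{n}\sum_{i=1}^{n} e_{ij}.$$
This is the two-dimensional PCA-biplot analogue of the mean standard predictive error of Alves (2012).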
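Let $\tau_{\mathrm{axis}} > 0$ be a user-specified
tolerance parameter for axis-level direct-reading error. Then the Alves
selection rule becomes
$$\text{retain axis } j \iff \bar{e}_j \le \tau_{\mathrm{axis}},$$
where $\bar{e}_j$ is the mean direct-reading error of axis $j$.
Likewise, for an observation-level tolerance parameter
$\tau_{\mathrm{obs}} > 0$, sample $i$ is flagged with respect to
axis $j$ whenever
$$e_{ij} > \tau_{\mathrm{obs}},$$
where $e_{ij}$ is the pointwise direct-reading error.
Hence $\bar{e}_j$ may be used for axis selection and
$e_{ij}$ for observation-level checking, exactly as in the predictive
PCA-biplot framework of Alves (2012).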
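A base-R sketch of the direct-reading diagnostics follows. It assumes that the pointwise error is the absolute difference between observed and read-off values in standard-deviation units, and that the axis-level error is its mean over samples. The tolerance values are hypothetical, chosen only to show the selection and flagging rules.

```r
X_raw <- as.matrix(iris[, 1:4])
X <- scale(X_raw, center = TRUE, scale = FALSE)   # centred, not standardized
s_j <- apply(X_raw, 2, sd)                        # per-variable standard deviations

s <- svd(X)
pair <- c(1, 2)
Z <- X %*% s$v[, pair]                            # displayed sample coordinates
H <- s$v[, pair]                                  # displayed axis directions

# Values read from the calibrated axes (equal to the rank-2 fitted values).
readings <- Z %*% t(H)

# Pointwise errors in standard-deviation units (assumed absolute-value form).
err <- abs(sweep(X - readings, 2, s_j, `/`))
mean_err <- colMeans(err)

tol_axis <- 0.25   # hypothetical axis-level tolerance
tol_obs <- 0.50    # hypothetical observation-level tolerance

keep_axis <- mean_err <= tol_axis                 # axis selection rule
flagged <- which(err > tol_obs, arr.ind = TRUE)   # (sample, axis) pairs to inspect
```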
The Alves diagnostics and the PCA predictivities are complementary. The
axis predictivity $\pi_j$ answers the question:
“How much of variable $j$'s sum of squares is reproduced by the
selected PCA plane?” The sample predictivity $\psi_i$ answers the corresponding
sample-side question:
“How much of sample $i$'s sum of squares is reproduced by the
selected PCA plane?” By contrast, the mean direct-reading error $\bar{e}_j$ answers the distinct
question:
“How accurately can values of variable $j$ be read directly from
the displayed calibrated axis?” Consequently, an axis may have high
$\pi_j$ yet still have non-negligible direct-reading error in the current
display, while an axis with only moderate $\pi_j$ may nevertheless admit
acceptable direct readings. In this implementation, the
$\pi$- and $\psi$-families are the primary sum-of-squares fit
measures, whereas the Alves quantities $e_{ij}$ and $\bar{e}_j$
provide supplementary, display-specific diagnostics.
In the wrapped bipl5_biplot object, these formulas drive the
hover-time fitted values $\hat{x}_{ij}$, the calibrated tick
markers for each active PCA mdsDisplay, the bottom display-quality label, and
the axis/sample fit summaries attached to the active two-dimensional
principal-component view. If several PC pairs are stored as separate mdsDisplays,
the same construction applies to each mdsDisplay separately.
An object of class c("bipl5_biplot", "PCA")
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
Gabriel, K. R. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58(3), 453–467. doi:10.1093/biomet/58.3.453
Gower, J. C. and Hand, D. J. (1996). Biplots. London: Chapman & Hall.
Gower, J. C., Lubbe, S. and le Roux, N. J. (2011). Understanding Biplots. Chichester: Wiley.
Greenacre, M. (2010). Biplots in Practice. Bilbao: BBVA Foundation.
Gardner-Lubbe, S., le Roux, N. J. and Gower, J. C. (2008). Measures of fit in principal component and canonical variate analyses. Journal of Applied Statistics, 35(9), 947–965. doi:10.1080/02664760802185399
Lubbe, S., le Roux, N. J., Nienkemper-Swanepoel, J., Ganey, R., Buys, R., Adams, Z.-M. and Manefeldt, P. (2025). biplotEZ: EZ-to-Use Biplots. R package version 2.2.
Alves, M. R. (2012). Evaluation of the predictive power of biplot axes to automate the construction and layout of biplots based on the accuracy of direct readings from common outputs of multivariate analyses: application to principal component analysis. Journal of Chemometrics, 26(5), 180–190. doi:10.1002/cem.2433
## Not run:
library(biplotEZ)
bp <- biplot(iris[, 1:4], scale = TRUE) |>
  PCA(e.vects = c(1, 2), group.aes = iris[, 5]) |>
  wrap_bipl5()
bp
plot(bp)

bp_cor <- biplot(iris[, 1:4], scale = TRUE) |>
  PCA(
    e.vects = c(1, 2),
    group.aes = iris[, 5],
    correlation.biplot = TRUE
  ) |>
  wrap_bipl5()
plot(bp_cor)
## End(Not run)
Handles two cases depending on the axis type stored in x$PCOaxes:
- Regression (linear) axes: built identically to regression biplots via build_one_mdsDisplay(), including translated density axes.
- Spline axes: uses a custom mdsDisplay builder (build_spline_mdsDisplay()) that places only sample points, the spline axis curves with tick marks, and a bounding circle. A custom JavaScript handler is attached at plot time.
In both cases there is a single mdsDisplay (mdsDisplay_12), no fit
measures are reported, and append_mdsDisplay() / remove_mdsDisplay() are
disabled.
    ## S3 method for class 'PCO'
    wrap_bipl5(x)
| x | An object of class PCO, such as the result of biplotEZ's PCO(). |
An object of class c("bipl5_biplot", "pco")
    ## Not run:
    library(biplotEZ)
    bp <- biplot(iris[, 1:4]) |>
      PCO(dist.func = stats::dist) |>
      wrap_bipl5()
    bp
    plot(bp)
    ## End(Not run)
Builds the single mdsDisplay used for a linear regression biplot and documents
the associated regression-biplot fit and predictivity measures.
Regression biplots do not use the multi-mdsDisplay fit machinery available for
PCA/CVA displays: they have one fixed mdsDisplay (mdsDisplay_12),
append_mdsDisplay() and remove_mdsDisplay() are not supported,
and the only active toggle button is “Translated Axes”.
    ## S3 method for class 'regress'
    wrap_bipl5(x)
| x | An object of class regress, such as the result of biplotEZ's regress(). |
For the linear regression biplot handled by this method, let
$\mathbf{X}$ ($n \times p$) denote the processed data
matrix stored in the biplot object after centring and any optional
scaling performed by biplot(), and let
$\mathbf{Z}$ ($n \times 2$) denote the externally supplied
display coordinates of the samples. Write
$\mathbf{Z} = [\mathbf{z}_{(1)}, \mathbf{z}_{(2)}]$, where
$\mathbf{z}_{(1)}$ and $\mathbf{z}_{(2)}$ are the first and second displayed
coordinates respectively, and let $\mathbf{z}_i$ denote the $i$th row of
$\mathbf{Z}$ written as a column vector. In contrast to a PCA biplot, the sample
map is taken as given and the variable axes are then fitted to that map by
multivariate least squares. This is the regression-biplot point of view used in
the biplot literature for general low-dimensional sample maps (Gower and Hand,
1996; Gower, Lubbe and le Roux, 2011).
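The fitted linear model is
$$\mathbf{X} = \mathbf{Z}\mathbf{B} + \mathbf{E},$$
where, when $\mathbf{Z}$ has full column rank, the least-squares estimate of $\mathbf{B}$ is
$$\hat{\mathbf{B}} = (\mathbf{Z}^\top \mathbf{Z})^{-1} \mathbf{Z}^\top \mathbf{X}.$$
Hence the fitted values are
$$\hat{\mathbf{X}} = \mathbf{Z}\hat{\mathbf{B}} = \mathbf{P}_Z \mathbf{X}, \qquad \mathbf{P}_Z = \mathbf{Z}(\mathbf{Z}^\top \mathbf{Z})^{-1}\mathbf{Z}^\top,$$
where $\mathbf{P}_Z$ is the orthogonal projector onto the column space
of $\mathbf{Z}$. More generally, if the supplied display coordinates are
rank-deficient, the same fitted matrix $\hat{\mathbf{X}} = \mathbf{P}_Z \mathbf{X}$ is obtained
by interpreting $\mathbf{P}_Z$ as the orthogonal projector onto
$\mathcal{C}(\mathbf{Z})$. The regression biplot therefore displays the
variables through the least-squares predictions obtained from the supplied
2D sample map (Gower and Hand, 1996; Gower, Lubbe and le Roux, 2011).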
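The least-squares fit described above can be sketched in base R. This is an illustration only, not the package's internal code: the object names (X, Z, B_hat, X_hat) are hypothetical, the iris data and its first two PCA scores stand in for a real data matrix and an externally supplied map, and the QR route is one possible way to handle a rank-deficient map.

```r
# Base-R sketch of the regression-biplot fit (illustrative names, not bipl5 internals).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)  # centred data matrix
Z <- prcomp(iris[, 1:4])$x[, 1:2]                                  # stand-in 2D sample map

# Full-column-rank case: coefficients (Z'Z)^{-1} Z'X, a 2 x p matrix.
B_hat <- solve(crossprod(Z), crossprod(Z, X))
X_hat <- Z %*% B_hat                                               # fitted values

# Rank-safe alternative: project X onto an orthonormal basis of the column space of Z.
qrZ <- qr(Z)
Q <- qr.Q(qrZ)[, seq_len(qrZ$rank), drop = FALSE]
X_hat_qr <- Q %*% crossprod(Q, X)
```

Both routes give the same fitted matrix when the map has full column rank; only the projection route remains defined when the two display coordinates are collinear.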
If $\mathbf{b}_j$ denotes the $j$th column of
$\hat{\mathbf{B}}$, then the predicted value of variable $j$ for sample
$i$ is
$$\hat{x}_{ij} = \mathbf{z}_i^\top \mathbf{b}_j.$$
The calibrated axis for variable $j$ has direction
$\mathbf{b}_j$, and the point on that axis corresponding to marker
value $\mu$ is
$$\mu \, \frac{\mathbf{b}_j}{\mathbf{b}_j^\top \mathbf{b}_j}.$$
This is the calibration formula used to place tick marks and to recover
predicted values from projections onto the displayed axis, in direct analogy
with calibrated-axis biplot constructions (Gabriel, 1971; Gower, Lubbe and
le Roux, 2011). All such predicted values are on the same centred/scaled scale
as the stored matrix $\mathbf{X}$; if needed, they can be back-transformed
to the original variable scale using the means and standard deviations stored
in the input biplot object.
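The calibration step can be illustrated with a short base-R sketch. It assumes a centred (unscaled) iris matrix and its first two PCA scores as the supplied map; the names ticks, proj, and reading are hypothetical and do not correspond to bipl5 internals.

```r
# Base-R sketch of axis calibration (illustrative names, not bipl5 internals).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2]
B_hat <- solve(crossprod(Z), crossprod(Z, X))

j <- 1                                    # first variable (Sepal.Length)
b_j <- B_hat[, j]                         # axis direction for variable j
mu <- pretty(X[, j])                      # marker values on the centred scale

# Tick positions: each row is mu * b_j / (b_j' b_j) for one marker value.
ticks <- outer(mu, b_j) / sum(b_j^2)

# Back-transform the tick labels to the original scale (centring only here).
labels <- mu + mean(iris[, j])

# Reading a sample from the axis: project z_i orthogonally onto the axis.
z_i <- Z[1, ]
proj <- (sum(z_i * b_j) / sum(b_j^2)) * b_j   # projected point on the axis
reading <- sum(z_i * b_j)                     # marker value at that point
```

The marker value at the projected point equals the inner product of the sample coordinates with the axis direction, which is why reading from the axis and evaluating the fitted model give the same number.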
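A regression biplot admits a natural family of predictivity measures on
the variable side. Let $\mathbf{x}_j$ denote column $j$ of
$\mathbf{X}$, let $\hat{\mathbf{x}}_j$ denote column $j$
of $\hat{\mathbf{X}}$, and let
$\mathbf{e}_j = \mathbf{x}_j - \hat{\mathbf{x}}_j$.
Since $\mathbf{P}_Z$ is an orthogonal
projection, the residual matrix $\mathbf{E} = \mathbf{X} - \hat{\mathbf{X}}$ satisfies
$$\hat{\mathbf{X}}^\top \mathbf{E} = \mathbf{0},$$
and therefore
$$\|\mathbf{x}_j\|^2 = \|\hat{\mathbf{x}}_j\|^2 + \|\mathbf{e}_j\|^2, \qquad j = 1, \ldots, p.$$
This is the variable-side, or Type B, orthogonality that justifies
variance-accounted-for ratios for the columns of $\mathbf{X}$; it is the
same side of the orthogonality argument that underlies column-wise
predictivities in the biplot literature (Gower, Lubbe and le Roux, 2011;
Greenacre, 2010).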
The predictivity of variable $j$ is therefore defined by
$$R_j^2 = \frac{\|\hat{\mathbf{x}}_j\|^2}{\|\mathbf{x}_j\|^2} = 1 - \frac{\|\mathbf{e}_j\|^2}{\|\mathbf{x}_j\|^2}.$$
Thus $R_j^2$ is the proportion of the sum of squares of variable
$j$ reproduced by the regression biplot, equivalently the ordinary
multiple-regression $R^2$ obtained by regressing variable $j$ on the
displayed coordinates $\mathbf{Z}$. Each $R_j^2$ lies in
$[0, 1]$; values near one indicate that the variable is well predicted by
the displayed map, while values near zero indicate that the variable is poorly
reproduced by the chosen display (Greenacre, 2010).
A natural overall quality-of-display measure is the proportion of total sum of squares reproduced by the display,
$$R^2 = \frac{\|\hat{\mathbf{X}}\|_F^2}{\|\mathbf{X}\|_F^2} = \frac{\sum_{j=1}^{p} \|\hat{\mathbf{x}}_j\|^2}{\sum_{j=1}^{p} \|\mathbf{x}_j\|^2}.$$
Because the column-wise decomposition above is orthogonal, this overall quality can be written as a weighted average of the variable predictivities:
$$R^2 = \sum_{j=1}^{p} w_j R_j^2,$$
where
$$w_j = \frac{\|\mathbf{x}_j\|^2}{\sum_{k=1}^{p} \|\mathbf{x}_k\|^2}.$$
Hence variables with larger sums of squares contribute more to the overall
quality. In particular, if the original call to biplot() used
scale = TRUE, so that all processed variables have equal sums of
squares, then the weights are equal and
$$R^2 = \frac{1}{p} \sum_{j=1}^{p} R_j^2.$$
This weighted-average interpretation is often the most natural way to read the overall regression-biplot quality, since it combines the separate variable predictivities into a single display-wide summary (Greenacre, 2010).
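As a minimal base-R sketch of these measures, the following computes the variable predictivities, the overall quality, and the weights, again using the centred iris data and its first two PCA scores as a stand-in map. The names R2_j, quality, and w are illustrative and are not taken from the package.

```r
# Base-R sketch of variable predictivities and overall quality (illustrative names).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
Z <- prcomp(iris[, 1:4])$x[, 1:2]
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))

ss_total <- colSums(X^2)                  # sum of squares of each processed variable
ss_fit <- colSums(X_hat^2)                # sum of squares reproduced by the display

R2_j <- ss_fit / ss_total                 # variable predictivities
quality <- sum(ss_fit) / sum(ss_total)    # overall quality of display
w <- ss_total / sum(ss_total)             # weights in the weighted-average form

weighted_quality <- sum(w * R2_j)         # agrees with quality up to rounding
```

Because the data here are centred but not scaled, the weights differ between variables; with scale = TRUE they would all equal 1/p.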
The quantities $R_j^2$ and $R^2$ depend only on the
fitted projection $\mathbf{P}_Z \mathbf{X}$ and therefore only on the
subspace $\mathcal{C}(\mathbf{Z})$. They do not depend on any
particular basis chosen for that subspace. In particular, the variable
predictivities do not require any QR decomposition.
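To decompose the total display quality into separate contributions for the two displayed dimensions, this package applies an ordered orthogonalization of the supplied display coordinates. Specifically, define
$$\mathbf{q}_1 = \frac{\mathbf{z}_{(1)}}{\|\mathbf{z}_{(1)}\|}$$
whenever $\mathbf{z}_{(1)} \neq \mathbf{0}$, and then define
$$\mathbf{r}_2 = \mathbf{z}_{(2)} - (\mathbf{q}_1^\top \mathbf{z}_{(2)})\,\mathbf{q}_1, \qquad \mathbf{q}_2 = \frac{\mathbf{r}_2}{\|\mathbf{r}_2\|}$$
whenever $\mathbf{r}_2 \neq \mathbf{0}$. Equivalently,
$\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2]$ is obtained, up to the signs of its columns, from the QR
decomposition of $\mathbf{Z}$, preserving the supplied column order. The
vectors $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthonormal and span
the same display subspace as the nonzero columns of $\mathbf{Z}$.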
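Because $\mathbf{Q}$ and $\mathbf{Z}$ span the same subspace, the
orthogonal projector may also be written as
$$\mathbf{P}_Z = \mathbf{Q}\mathbf{Q}^\top = \mathbf{q}_1 \mathbf{q}_1^\top + \mathbf{q}_2 \mathbf{q}_2^\top.$$
Consequently,
$$\hat{\mathbf{X}} = \mathbf{q}_1 \mathbf{q}_1^\top \mathbf{X} + \mathbf{q}_2 \mathbf{q}_2^\top \mathbf{X}$$
whenever both orthogonalized directions are present. Since
$\mathbf{q}_1^\top \mathbf{q}_2 = 0$, the two fitted parts are
orthogonal and their sums of squares add. This yields the dimension-specific
contributions
$$R^2_{(1)} = \frac{\|\mathbf{q}_1^\top \mathbf{X}\|^2}{\|\mathbf{X}\|_F^2}$$
and
$$R^2_{(2)} = \frac{\|\mathbf{q}_2^\top \mathbf{X}\|^2}{\|\mathbf{X}\|_F^2},$$
so that
$$R^2 = R^2_{(1)} + R^2_{(2)}$$
whenever the display space is two-dimensional.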
Care should be taken when interpreting this decomposition. If the columns of
$\mathbf{Z}$ are already orthogonal, then the two displayed contributions
correspond directly to the first and second supplied display axes. If the
columns of $\mathbf{Z}$ are not orthogonal, however, the decomposition is
ordered. The first contribution $R^2_{(1)}$ is attributable to the
first supplied display coordinate $\mathbf{z}_{(1)}$. The second contribution
$R^2_{(2)}$ is attributable to the component of the second supplied
display coordinate that is orthogonal to the first.
Thus $R^2_{(2)}$ should be interpreted as the additional contribution
of “Dim 2 given Dim 1”, not as the contribution of the raw second
column of $\mathbf{Z}$ considered in isolation. The ordering of the
columns of $\mathbf{Z}$ is therefore important for this decomposition.
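The same ordered orthogonalization yields a decomposition of each variable's predictivity:
$$R_j^2 = R^2_{j(1)} + R^2_{j(2)},$$
where
$$R^2_{j(1)} = \frac{(\mathbf{q}_1^\top \mathbf{x}_j)^2}{\|\mathbf{x}_j\|^2}$$
and
$$R^2_{j(2)} = \frac{(\mathbf{q}_2^\top \mathbf{x}_j)^2}{\|\mathbf{x}_j\|^2}.$$
Thus $R^2_{j(1)}$ is the part of variable $j$'s predictivity explained
by the first supplied display dimension, while
$R^2_{j(2)}$ is the additional part explained by the second display
dimension after removing its overlap with the first.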
If the supplied display coordinates are collinear, then
$\mathbf{r}_2 = \mathbf{0}$ and the effective display space is
one-dimensional. In that case $R^2_{(2)} = 0$ and
$R^2_{j(2)} = 0$ for all variables.
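The ordered decomposition can be sketched with base R's qr(). The helper ordered_contrib() below is hypothetical and only illustrates the idea; the map Z is deliberately made non-orthogonal so that the effect of column order is visible. Note that qr() may pivot columns when the map is rank-deficient, so this sketch is only meant for the full-rank case.

```r
# Base-R sketch of the ordered (QR-based) decomposition; ordered_contrib() is hypothetical.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
pc <- prcomp(iris[, 1:4])$x[, 1:2]
Z <- cbind(dim1 = pc[, 1] + 0.5 * pc[, 2], dim2 = pc[, 2])  # non-orthogonal columns

ordered_contrib <- function(Z, X) {
  qrZ <- qr(Z)
  Q <- qr.Q(qrZ)[, seq_len(qrZ$rank), drop = FALSE]  # orthonormal basis in column order
  C <- crossprod(Q, X)                               # row k holds q_k' x_j for each variable j
  list(
    by_dim = rowSums(C^2) / sum(X^2),                # dimension-specific contributions
    by_var = t(C^2) / colSums(X^2)                   # p x 2 per-variable contributions
  )
}

fit <- ordered_contrib(Z, X)                 # Dim 1 first, then "Dim 2 given Dim 1"
swapped <- ordered_contrib(Z[, 2:1], X)      # same subspace, opposite ordering
```

Both orderings reproduce the same total quality and the same variable predictivities, but they split them differently between the two dimensions, which is the ordering effect described above. Column signs returned by qr.Q() do not matter because only squared inner products are used.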
In addition to the sum-of-squares fit measures above, this method may also
report direct-reading error diagnostics in the sense of Alves (2012). The
purpose of these diagnostics is different from that of the predictivities
$R_j^2$. The quantities $R_j^2$, $R^2$,
and $R^2_{(1)}, R^2_{(2)}$ measure how much of the
variation in $\mathbf{X}$ is reproduced by the fitted regression biplot.
By contrast, the Alves diagnostics measure how accurately values can be read
directly from a displayed calibrated axis in the current two-dimensional map.
Alves (2012) proposed this idea for predictive PCA biplots; in the present
two-dimensional regression-biplot setting the same principle applies in a
particularly simple form because there is only one displayed map.
For each sample $i = 1, \ldots, n$ and variable axis
$j = 1, \ldots, p$, the reading taken from the displayed axis of variable
$j$ is precisely the fitted value
$$\hat{x}_{ij} = \mathbf{z}_i^\top \mathbf{b}_j.$$
The corresponding point on the calibrated axis is
obtained by substituting $\mu = \hat{x}_{ij}$ into the calibration
formula. Thus the direct reading from the graph and the fitted value from the
regression model coincide.
Let $s_j$ denote the standard deviation used to standardize variable
$j$. When scale = TRUE, the processed matrix $\mathbf{X}$
already has unit-variance columns and hence $s_j = 1$. The
pointwise direct-reading error for sample $i$ on variable axis
$j$ is defined by
$$\delta_{ij} = \frac{|x_{ij} - \hat{x}_{ij}|}{s_j}.$$
If the processed matrix is already standardized, then
$s_j = 1$; in that case
$\delta_{ij} = |x_{ij} - \hat{x}_{ij}|$ is the direct analogue of Alves's standard predictive error.
More generally, dividing by $s_j$ expresses the discrepancy on a
comparable variable-wise scale. The quantity $\delta_{ij}$ is therefore a
sample-by-axis direct-reading error.
The corresponding axis-level mean direct-reading error is
$$\bar{\delta}_j = \frac{1}{n} \sum_{i=1}^{n} \delta_{ij}.$$
This is the two-dimensional regression-biplot analogue of the mean standard
predictive error of Alves (2012). Small values of $\bar{\delta}_j$
indicate that the calibrated axis for variable $j$ supports accurate
direct readings on average across the displayed observations, whereas large
values indicate that direct readings from that axis are unreliable in the
current display.
Let $\tau_{\mathrm{axis}} > 0$ be a user-specified
tolerance parameter for axis-level direct-reading error. Then the
Alves selection rule specialized to the present two-dimensional regression
biplot is
$$\text{show axis } j \iff \bar{\delta}_j \le \tau_{\mathrm{axis}}.$$
Thus an axis is shown only when its average direct-reading error is at most
the allowed tolerance. Larger values of $\tau_{\mathrm{axis}}$ retain
more axes and therefore produce denser displays; smaller values enforce
stricter axis selection and lead to sparser, more conservative displays.
In Alves (2012), values around 0.5 are discussed as a practical starting
point in conventional settings, but no universal default should be assumed.
In addition to axis selection, Alves (2012) proposed a second
tolerance parameter for individual sample-axis discrepancies. Let
$\tau_{\mathrm{out}} > 0$. A sample $i$ is then flagged as an
outlier with respect to axis $j$ whenever
$$\delta_{ij} > \tau_{\mathrm{out}}.$$
Such a flag indicates that, even if axis $j$ is acceptable on average,
the direct reading for sample $i$ from that axis is poor in the current
display. Alves (2012) discusses values around 0.75 as a practical starting
point for this tolerance parameter, again subject to the application and the
scale of the analysis.
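A minimal base-R sketch of the direct-reading diagnostics follows. It assumes standardized iris data (the scale = TRUE case) with the first two standardized PCA scores as the map, and uses 0.5 and 0.75 purely as the starting values discussed above, not as package defaults. The names delta, delta_bar, tau_axis, and tau_out are illustrative.

```r
# Base-R sketch of Alves-style direct-reading diagnostics (illustrative names and tolerances).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)   # standardized data
Z <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:2]                  # stand-in 2D sample map
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))

s <- apply(X, 2, sd)                          # equals 1 for every column after scaling
delta <- sweep(abs(X - X_hat), 2, s, "/")     # pointwise direct-reading errors (n x p)
delta_bar <- colMeans(delta)                  # axis-level mean direct-reading errors

tau_axis <- 0.5                               # assumed axis tolerance
tau_out <- 0.75                               # assumed outlier tolerance

show_axis <- delta_bar <= tau_axis            # axes retained by the selection rule
outlier <- delta > tau_out                    # logical n x p matrix of flagged readings
flagged <- which(outlier, arr.ind = TRUE)     # (sample, axis) index pairs
```

Inspecting delta_bar next to the variable predictivities shows how the two kinds of measure can rank axes differently.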
Because this wrapper is tied to a single two-dimensional regression-biplot
display, the quantities $\delta_{ij}$ and $\bar{\delta}_j$ are
display-specific diagnostics. They are not measures of the quality of the
underlying fitted subspace in the sum-of-squares sense; rather, they quantify
the numerical accuracy of direct readings from the currently displayed axes.
This distinction is central in Alves (2012), who emphasizes that direct-reading
error is conceptually different from earlier axis-predictivity measures.
The Alves diagnostics and the regression-biplot predictivities are therefore
complementary. The quantity $R_j^2$ is a variance-accounted-for ratio
justified by Type B orthogonality and answers the question:
“How much of variable $j$'s sum of squares is reproduced by the
displayed regression biplot?” The quantity $\bar{\delta}_j$ is a mean
absolute direct-reading error and answers the different question:
“How accurately can values of variable $j$ be read from the
displayed calibrated axis?” Consequently, a variable may have high
$R_j^2$ and still have a non-negligible direct-reading error, while a
variable with moderate $R_j^2$ may nevertheless support acceptable
average direct readings. In this implementation, the $R^2$-family is
the primary set of sum-of-squares fit measures, whereas the Alves quantities
$\delta_{ij}$ and $\bar{\delta}_j$ provide supplementary,
display-specific diagnostics for axis selection and observation-level
checking.
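In contrast, a regression biplot does not in general satisfy the sample-side decomposition
$$\|\mathbf{x}_{[i]}\|^2 = \|\hat{\mathbf{x}}_{[i]}\|^2 + \|\mathbf{x}_{[i]} - \hat{\mathbf{x}}_{[i]}\|^2, \qquad i = 1, \ldots, n,$$
where $\mathbf{x}_{[i]}$ and $\hat{\mathbf{x}}_{[i]}$ denote row $i$ of $\mathbf{X}$ and $\hat{\mathbf{X}}$ respectively.
Consequently, PCA-style sample predictivities are not generally justified for
a regression biplot. The principled sum-of-squares fit measures are the
variable predictivities $R_j^2$, the overall quality
$R^2$, and the ordered dimension-specific contributions
described above, with the Alves direct-reading errors providing a distinct
supplementary perspective on the quality of the displayed axes.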
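The difference between the variable side and the sample side can be checked numerically. The sketch below is a base-R illustration under an assumed, deliberately non-PCA map (the first two centred iris variables); the names col_cross and row_cross are hypothetical.

```r
# Base-R sketch: column-wise orthogonality holds, row-wise orthogonality generally does not.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
Z <- X[, 1:2]                                  # assumed non-PCA sample map
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
E <- X - X_hat                                 # residual matrix

col_cross <- colSums(X_hat * E)                # fitted-by-residual cross terms per variable
row_cross <- rowSums(X_hat * E)                # fitted-by-residual cross terms per sample

max(abs(col_cross))                            # numerically zero
max(abs(row_cross))                            # generally far from zero
```

The per-variable cross terms vanish, so the column sums of squares split cleanly into fitted and residual parts; the per-sample cross terms do not vanish, which is why row-wise predictivities are not reported for this display.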
In the wrapped bipl5_biplot object, these formulas drive the bottom
display-quality label, the hover-time predicted values
$\hat{x}_{ij}$, and the calibrated linear axes stored in
mdsDisplay_12. Since the regression display is tied to one externally
supplied map, wrap_bipl5.regress() produces a single mdsDisplay only.
There is no PC/CV toggle and no separate PCA-style sample-fit panel.
An object of class c("bipl5_biplot", "reg")
Gabriel, K. R. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58(3), 453–467. doi:10.1093/biomet/58.3.453
Gower, J. C. and Hand, D. J. (1996). Biplots. London: Chapman & Hall.
Gower, J. C., Lubbe, S. and le Roux, N. J. (2011). Understanding Biplots. Chichester: Wiley.
Greenacre, M. (2010). Biplots in Practice. Bilbao: BBVA Foundation.
la Grange, A., le Roux, N. and Gardner-Lubbe, S. (2009). BiplotGUI: Interactive Biplots in R. Journal of Statistical Software, 30(12), 1–37. doi:10.18637/jss.v030.i12
Alves, M. R. (2012). Evaluation of the predictive power of biplot axes to automate the construction and layout of biplots based on the accuracy of direct readings from common outputs of multivariate analyses: application to principal component analysis. Journal of Chemometrics, 26(5), 180–190. doi:10.1002/cem.2433
    ## Not run:
    library(biplotEZ)
    bp <- biplot(iris[, 1:4]) |>
      regress(Z = prcomp(iris[, 1:4])$x[, 1:2], group.aes = iris[, 5]) |>
      wrap_bipl5()
    bp
    plot(bp)
    ## End(Not run)