Title: | Three-Way / Multigroup Data Analysis Through Densities |
---|---|
Description: | The data consist of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides tools to create or manage such data and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities. |
Authors: | Rachid Boumaza [aut], Pierre Santagostini [aut, cre], Smail Yousfi [aut], Gilles Hunault [ctb], Julie Bourbeillon [ctb], Besnik Pumo [ctb], Sabine Demotes-Mainard [aut] |
Maintainer: | Pierre Santagostini <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.1.5 |
Built: | 2024-11-23 03:16:19 UTC |
Source: | https://github.com/cran/dad |
Three-way data consist of a set of variables measured on several groups of individuals. An estimated probability density function is associated with each group. The package provides functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.
Package: | dad |
Type: | Package |
Version: | 4.1.2 |
Date: | 2023-08-28 |
License: | GPL-2 |
URL: | https://forgemia.inra.fr/dad/dad |
BugReports: | https://forgemia.inra.fr/dad/dad/issues |
To cite dad, use citation("dad").
The main functions applying to the probability densities are:
fpcad: functional principal component analysis,
fpcat: functional principal component analysis applied to data indexed according to time,
fmdsd: multidimensional scaling,
fhclustd: hierarchical clustering,
fdiscd.misclass: functional discriminant analysis in order to compute the misclassification ratio with the leave-one-out method,
fdiscd.predict: discriminant analysis in order to predict the class (synonymous with cluster, not to be confused with the class attribute of an R object) of each probability density whose class is unknown,
mdsdd: multidimensional scaling of discrete probability distributions,
discdd.misclass: functional discriminant analysis of discrete probability distributions, in order to compute the misclassification ratio with the leave-one-out method,
discdd.predict: discriminant analysis of discrete probability distributions, in order to predict the class of each probability distribution whose class is unknown.
The above functions are complemented by:
A print() method for objects of class fpcad, fmdsd, fdiscd.misclass, fdiscd.predict or mdsdd, in order to display the results of the corresponding function,
A plot() method for objects of class fpcad, fmdsd, fhclustd or mdsdd, in order to display some useful graphics attached to the corresponding function,
A generic function interpret that applies to objects of class fpcad, fmdsd or mdsdd and helps the user to interpret the scores returned by the corresponding function, in terms of moments (fpcad or fmdsd) or in terms of marginal probability distributions (mdsdd).
We also introduce classes of objects and tools in order to handle collections of data frames:
folder creates an object of class folder, that is a list of data frames which have the same columns.
The following functions apply to a folder and compute some statistics on the columns of its elements: mean.folder, var.folder, cor.folder, skewness.folder or kurtosis.folder.
folderh creates an object of class folderh, that is a list of data frames with a hierarchical relation between each pair of consecutive data frames.
foldert creates an object of class foldert, that is a list of data frames indexed according to time, which may or may not concern the same individuals and variables.
read.mtg creates an object of class foldermtg from an MTG (Multiscale Tree Graph) file containing plant architecture data.
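The per-element statistics listed above (mean.folder, var.folder, ...) can be pictured with a minimal base-R sketch, assuming a folder behaves like a named list of data frames with identical numeric columns. The toy list `fold` is hypothetical, and `lapply()` with `colMeans()`/`var()` only stands in for the package's own methods, which may differ in details.

```r
# Hypothetical stand-in for a folder: a named list of data frames
# sharing the same numeric columns.
fold <- list(
  g1 = data.frame(a = c(1, 2, 3), b = c(10, 20, 30)),
  g2 = data.frame(a = c(4, 6),    b = c(1, 3))
)

# One mean vector per element (compare mean.folder).
means <- lapply(fold, colMeans)
# One covariance matrix per element (compare var.folder).
vars <- lapply(fold, var)

print(means)
```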
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard, with contributions from Gilles Hunault, Julie Bourbeillon and Besnik Pumo
Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an approach. Computational Statistics & Data Analysis, 47, 823-843.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Deza, M.M. and Deza, E. (2013). Encyclopedia of distances. Springer.
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
Rachev, S.T., Klebanov, L.B., Stoyanov, S.V. and Fabozzi, F.J. (2013). The methods of distances in the theory of probability and statistics. Springer.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
Creates an object of class folderh by appending a data frame to an object of class folderh. The appended data frame will be the first or last element of the returned folderh.
appendtofolderh(fh, df, key, after = FALSE)
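fh | object of class folderh. |
df | data frame to be appended to fh. |
key | character string. The key defining the relation between df and fh. |
after | logical. If FALSE (default), df becomes the first element of the returned folderh; if TRUE, it becomes the last element. |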
Returns an object of class folderh, that is a list of n+1 data frames, where n is the number of data frames of fh.
The value of its "keys" attribute is c(key, attr(fh, "keys")) if after = FALSE, and c(attr(fh, "keys"), key) otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Builds a data frame from an object of class folder.
## S3 method for class 'folder'
as.data.frame(x, row.names = NULL, optional = FALSE, ..., group.name = "group")
x | object of class folder. |
row.names, optional | for consistency with the generic function as.data.frame. |
... | further arguments passed to or from other methods. |
group.name | the name of the grouping variable. It is the name of the last column of the returned data frame. |
The data frame is obtained by row binding the data frames of the folder and adding a factor as the last column. The name of this column is given by the group.name argument. The levels of this factor are the names of the elements of the folder.
as.data.frame.folder returns a data frame.
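The row binding described above can be sketched in base R, treating a folder as a named list of data frames with the same columns. The helper `folder_to_df()` is hypothetical and is not the package's implementation; it only illustrates how the grouping factor is built from the element names.

```r
# Hypothetical folder: a named list of data frames with the same columns.
fold <- list(
  setosa    = data.frame(x = c(1.0, 1.5), y = c(0.2, 0.3)),
  virginica = data.frame(x = c(6.1, 6.4, 5.9), y = c(2.0, 2.2, 1.8))
)

folder_to_df <- function(fold, group.name = "group") {
  # Stack the elements row-wise (unname() keeps automatic row names).
  df <- do.call(rbind, unname(fold))
  # Grouping factor whose levels are the element names, in folder order.
  sizes <- vapply(fold, nrow, integer(1))
  df[[group.name]] <- factor(rep(names(fold), sizes), levels = names(fold))
  df
}

df <- folder_to_df(fold)
str(df)
```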
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: object of class folder.
as.folder.data.frame: build an object of class folder from a data frame.
data(iris)
iris.fold <- as.folder(iris, "Species")
print(iris.fold)
iris.df <- as.data.frame(iris.fold)
print(iris.df)
Builds a data frame from a folderh.
## S3 method for class 'folderh'
as.data.frame(x, row.names = NULL, optional = FALSE, ..., elt = names(x)[2], key = attr(x, "keys")[1])
x | object of class folderh. |
row.names, optional | for consistency with the generic function as.data.frame. |
... | further arguments passed to or from other methods. |
elt | string. The name of one element of x. |
key | string. The name of an element of attr(x, "keys"). |
as.data.frame.folderh returns a data frame whose row names are those of x[[elt]] (that is x[[j]], where j is the position of elt in names(x); similarly, k denotes the position of key in attr(x, "keys")). The data frame contains the values of x[[elt]] and the corresponding values of the data frames x[[k]], these correspondences being defined by the keys of the hierarchical folder.
The column names of the returned data frame are organized in three parts.
The first part consists of the key names keys[k], ..., keys[j-1].
The second part consists of the columns of x[[j]].
The third part consists of the columns of x[[k]] except the key keys[k].
See the examples to view these details.
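For a two-level folderh such as castles.dated (periods and stones linked by the key "castle"), the flattening amounts to looking up, for each row of the child data frame, the matching row of the parent. The sketch below uses hypothetical toy data and base-R `match()`; it keeps the child's row names but does not reproduce the exact column order described above.

```r
# Hypothetical two-level hierarchy: each stone belongs to one castle,
# and each castle has one building period.
periods <- data.frame(castle = factor(c("A", "B")),
                      period = factor(c("p1", "p2")))
stones <- data.frame(height = c(30, 32, 41),
                     castle = factor(c("A", "A", "B")),
                     row.names = c("s1", "s2", "s3"))

# For each stone (child row), find its castle's row in the parent table.
idx <- match(stones$castle, periods$castle)
# Append the parent's columns, except the key, to the child's columns.
flat <- cbind(stones,
              periods[idx, setdiff(names(periods), "castle"), drop = FALSE])
# Keep the child's row names.
rownames(flat) <- rownames(stones)
print(flat)
```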
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder, folderh, as.folder.folderh.
# First example: rose flowers
data(roseflowers)
flg <- roseflowers$variety
flx <- roseflowers$flower
flfh <- folderh(flg, "rose", flx)
print(flfh)
fldf <- as.data.frame(flfh)
print(fldf)
# Second example: castles
data(castles.dated)
cag <- castles.dated$periods
cax <- castles.dated$stones
cafh <- folderh(cag, "castle", cax)
print(cafh)
cadf <- as.data.frame(cafh)
print(summary(cadf))
# Third example: leaves (example of a folderh with more than two data frames)
data(roseleaves)
lvr <- roseleaves$rose
lvs <- roseleaves$stem
lvl <- roseleaves$leaf
lvll <- roseleaves$leaflet
lfh <- folderh(lvr, "rose", lvs, "stem", lvl, "leaf", lvll)
lf1 <- as.data.frame(lfh, elt = "lvs", key = "rose")
print(lf1)
lf2 <- as.data.frame(lfh, elt = "lvl", key = "rose")
print(lf2)
lf3 <- as.data.frame(lfh, elt = "lvll", key = "rose")
print(lf3)
lf4 <- as.data.frame(lfh, elt = "lvll", key = "stem")
print(lf4)
Builds a data frame from an object of class foldert.
## S3 method for class 'foldert'
as.data.frame(x, row.names = NULL, optional = FALSE, ..., group.name = "time")
x | object of class foldert. |
row.names, optional | for consistency with the generic function as.data.frame. |
... | further arguments passed to or from other methods. |
group.name | the name of the grouping variable. It is the name of the last column of the returned data frame. As the observations are indexed by time, the default value is "time". |
as.data.frame.foldert uses as.data.frame.folder.
as.data.frame.foldert returns a data frame.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: object of class foldert.
as.foldert.data.frame: build an object of class foldert from a data frame.
as.foldert.array: build an object of class foldert from a 3-array.
data(floribundity)
ftflor <- foldert(floribundity, cols.select = "union", rows.select = "union")
print(ftflor)
dfflor <- as.data.frame(ftflor)
summary(dfflor)
Coerces a data frame or an object of class "folderh" to an object of class "folder".
as.folder(x, ...)
x | an object of class data.frame or folderh. |
... | further arguments passed to or from other methods. |
an object of class folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: objects of class folder.
as.data.frame.folder: build a data frame from an object of class folder.
as.folder.data.frame: build an object of class folder from a data frame.
as.folder.folderh: build an object of class folder from an object of class folderh.
Builds an object of class folder from a data frame.
## S3 method for class 'data.frame'
as.folder(x, groups = tail(colnames(x), 1), ...)
x | data frame. |
groups | string. The name of the column of x containing the grouping variable. If omitted, the last column of x is used. |
... | further arguments passed to or from other methods. |
as.folder.data.frame returns an object of class folder, that is a list of data frames with the same column names.
Each element of the folder contains the data corresponding to one level of x[, groups].
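The splitting step can be sketched with base R's `split()`. The helper `df_to_folder()` is hypothetical; in particular, dropping the grouping column from each element is an assumption of this sketch, not something stated above.

```r
df <- data.frame(len = c(5.1, 4.9, 6.3, 5.8),
                 grp = factor(c("a", "a", "b", "b")))

df_to_folder <- function(x, groups = utils::tail(colnames(x), 1)) {
  # Assumption of this sketch: the grouping column is removed ...
  vals <- x[, setdiff(colnames(x), groups), drop = FALSE]
  # ... and the remaining rows are split by the levels of x[, groups].
  split(vals, x[[groups]])
}

fold <- df_to_folder(df)   # groups defaults to the last column, "grp"
print(fold)
```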
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: objects of class folder.
as.data.frame.folder: build a data frame from an object of class folder.
as.folder.folderh: build an object of class folder from an object of class folderh.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
print(iris.fold)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
print(roses.fold)
Creates an object of class folder, that is a list of data frames with the same column names, from a folderh.
## S3 method for class 'folderh'
as.folder(x, elt = names(x)[2], key = attr(x, "keys")[1], ...)
x | object of class folderh. |
elt | string. The name of one element of x. |
key | string. The name of an element of attr(x, "keys"). |
... | further arguments passed to or from other methods. |
as.folder.folderh returns an object of class folder, a list of data frames with the same columns. These data frames contain the values of x[[elt]] (or x[[j]]) and the corresponding values of the data frames x[[j-1]], ..., x[[k]], these correspondences being defined by the keys of the hierarchical folder. The names of these data frames are given by the levels of the key attr(x, "keys")[k].
The rows of the data frame x[[elt]] (or x[[j]]) are distributed among the data frames of the returned folder according to the levels of the key attr(x, "keys")[k]. So the row names of the l-th data frame of the returned folder are those of the rows of x[[j]] corresponding to the l-th level of the key attr(x, "keys")[k].
The column names of the data frames of the returned folder are the union of the column names of the data frames x[[k]], ..., x[[j]] and are organized in two parts.
The first part consists of the columns of x[[k]] except the column corresponding to the key attr(x, "keys")[k].
For each i = k+1, ..., j, the columns of the data frame x[[i]] are reorganized so that the key attr(x, "keys")[i] is its first column. The columns of the reorganized data frames x[[k+1]], ..., x[[j]] are concatenated. The result forms the second part.
Notice that if:
the folderh has two data frames df1 and df2, where the factor corresponding to the key has T levels, and one column of df2, say df2[, "Fa"], is a factor with levels "a1", ..., "ap",
and the folder returned by as.folder includes data frames dat1, ..., datT,
then each of dat1, ..., datT has a column named "Fa" which is a factor with the same levels "a1", ..., "ap" as df2[, "Fa"].
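The two-level case can be sketched in base R with hypothetical toy data. The sketch builds the two column parts described above (parent columns without the key, then child columns with the linking key first) and distributes the rows with `split()`. Because `split()` keeps factor levels, including unused ones such as "a3", every element retains the full level set of "Fa", in line with the note above. This is an illustration, not the package's code.

```r
periods <- data.frame(castle = factor(c("A", "B")),
                      period = factor(c("p1", "p2")))
stones <- data.frame(height = c(30, 32, 41),
                     Fa = factor(c("a1", "a1", "a2"), levels = c("a1", "a2", "a3")),
                     castle = factor(c("A", "A", "B")),
                     row.names = c("s1", "s2", "s3"))
key <- "castle"

# Part 1: parent columns except the key, looked up for each child row.
idx <- match(stones[[key]], periods[[key]])
part1 <- periods[idx, setdiff(names(periods), key), drop = FALSE]
# Part 2: child columns, reordered so that the key comes first.
part2 <- stones[, c(key, setdiff(names(stones), key))]
flat <- cbind(part1, part2)
rownames(flat) <- rownames(stones)

# Distribute the rows among one data frame per level of the key.
fold <- split(flat, flat[[key]])
print(fold)
```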
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder, folderh.
as.folder.folderh to build an object of class folder from an object of class folderh.
as.data.frame.folder to build a data frame from an object of class folder.
as.data.frame.folderh to build a data frame from an object of class folderh.
# First example: flowers
data(roseflowers)
flg <- roseflowers$variety
flx <- roseflowers$flower
flfh <- folderh(flg, "rose", flx)
print(flfh)
flf <- as.folder(flfh)
print(flf)
# Second example: castles
data(castles.dated)
cag <- castles.dated$periods
cax <- castles.dated$stones
cafh <- folderh(cag, "castle", cax)
print(cafh)
caf <- as.folder(cafh)
print(caf)
# Third example: leaves (example of a folderh of more than two data frames)
data(roseleaves)
lvr <- roseleaves$rose
lvs <- roseleaves$stem
lvl <- roseleaves$leaf
lvll <- roseleaves$leaflet
lfh <- folderh(lvr, "rose", lvs, "stem", lvl, "leaf", lvll)
lf1 <- as.folder(lfh, elt = "lvs", key = "rose")
print(lf1)
lf2 <- as.folder(lfh, elt = "lvl", key = "rose")
print(lf2)
lf3 <- as.folder(lfh, elt = "lvll", key = "rose")
print(lf3)
lf4 <- as.folder(lfh, elt = "lvll", key = "stem")
print(lf4)
Coerces an object to an object of class folderh.
as.folderh(x, classes)
x | an object to be coerced to an object of class folderh. |
classes | argument useful for as.folderh.foldermtg. |
an object of class folderh.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
as.folderh.foldermtg: build an object of class folderh from an object of class foldermtg.
Creates an object of class folderh from an object of class foldermtg.
## S3 method for class 'foldermtg'
as.folderh(x, classes)
x | object of class foldermtg. |
classes | character vector. Codes of the vertex classes in the returned folderh. These codes are the names of the elements (data frames) of x containing the vertex features. They must be distinct, and the corresponding classes must have distinct scales. These codes, except the one with the highest scale, are the keys of the returned folderh. |
This function uses folderh.
An object of class folderh. Its elements are the data frames of x containing the features on vertices. Hence, each data frame corresponds to a class of vertex and to a scale. These data frames are sorted in increasing order of scale.
A column (factor) containing the identifier of the vertex is added to the first data frame. Two columns are added to the second data frame:
the first one is a factor which gives, for each vertex, the name of the vertex of the first data frame which is its "parent",
and the second one is also a factor and contains the vertex's identifier.
And so on for the third and following data frames, if relevant.
The column containing the vertex identifiers is redundant with the row names; however, it is required by folderh.
The key of the relationship between the first two data frames is given by the first column of each of these data frames.
If there are more than two data frames, the key of the relationship between the $i$-th and $(i+1)$-th data frames ($i \geq 2$) is given by the second column of the $i$-th data frame and the first column of the $(i+1)$-th data frame.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
read.mtg: reads an MTG file and creates an object of class "foldermtg".
folderh: object of class folderh.
mtgfile <- system.file("extdata/plant1.mtg", package = "dad")
x <- read.mtg(mtgfile)
# folderh containing the plant ("P") and the stems ("A")
as.folderh(x, classes = c("P", "A"))
# folderh containing the plant ("P"), axes ("A") and phytomers ("M")
as.folderh(x, classes = c("P", "A", "M"))
# folderh containing the plant ("P") and the phytomers ("M")
as.folderh(x, classes = c("P", "M"))
# folderh containing the axes and phytomers
fhPM <- as.folderh(x, classes = c("A", "M"))
# coerce this folderh into a folder, and compute statistics on this folder
fPM <- as.folder(fhPM)
mean(fPM)
Coerces a data frame or array to an object of class foldert.
as.foldert(x, ...)
x | an object of class data.frame or array. |
... | arguments passed to as.foldert.data.frame or as.foldert.array. |
an object of class foldert.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Builds an object of class foldert from a 3-array.
## S3 method for class 'array'
as.foldert(x, ind = 1, var = 2, time = 3, ...)
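x | a 3-array. |
ind, var, time | three distinct integers among 1, 2 and 3, giving the dimensions of x that contain the individuals, the variables and the times, respectively. |
... | further arguments passed to or from other methods. |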
an object ft of class foldert, that is a list of data frames, each of them corresponding to a time of observation; these data frames have the same column names.
They necessarily have the same row names (attr(ft, "same.rows") = TRUE).
The "times" attribute of ft, attr(ft, "times"), is a numeric vector, an ordered factor or an object of class Date, and contains the values of the dimension of x given by the time argument.
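A minimal base-R sketch of the array-to-list conversion, assuming the dimensions are first permuted with `aperm()` so that individuals, variables and times come in that order. The helper `array_to_time_list()` is hypothetical; it returns a plain named list and does not set the "times" or "same.rows" attributes of a real foldert.

```r
array_to_time_list <- function(x, ind = 1, var = 2, time = 3) {
  # Permute so that dim 1 = individuals, dim 2 = variables, dim 3 = times.
  y <- aperm(x, c(ind, var, time))
  d <- dim(y)
  slices <- lapply(seq_len(d[3]), function(t) {
    # Rebuild each time slice as an individuals x variables matrix;
    # matrix() avoids dropping to a vector when there is one individual.
    m <- matrix(y[, , t], nrow = d[1], ncol = d[2], dimnames = dimnames(y)[1:2])
    as.data.frame(m)
  })
  names(slices) <- dimnames(y)[[3]]
  slices
}

# Hypothetical 3-array: 4 individuals x 2 variables x 3 times.
x <- array(seq_len(24), dim = c(4, 2, 3),
           dimnames = list(paste0("i", 1:4), c("z1", "z2"), c("t1", "t2", "t3")))
ft <- array_to_time_list(x)
str(ft$t1)
```

Passing a permuted array together with matching ind, var and time values gives the same list, which is the point of those three arguments.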
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: objects of class foldert.
as.foldert.data.frame: build an object of class foldert from a data frame.
x <- array(c(rep(0, 5), rep(0, 5), rep(0, 5), rnorm(5, 2, 1), rnorm(5, 3, 2), rnorm(5, -2, 0.5), rnorm(5, 4, 1), rnorm(5, 5, 3), rnorm(5, -3, 1)), dim = c(5, 3, 3), dimnames = list(1:5, c("z1", "z2", "z3"), c("t1", "t2", "t3")))
# The individuals which were observed are on the 1st dimension,
# the variables are on the 2nd dimension and the times are on the 3rd dimension.
ft <- as.foldert(x, ind = 1, var = 2, time = 3)
Builds an object of class foldert from a data frame.
## S3 method for class 'data.frame'
as.foldert(x, method = 1, ind = 1, timecol = 2, nvar = NULL, same.rows = TRUE, ...)
x | data frame. |
method | 1 or 2. Indicates the layout of the data frame x and, therefore, the method used to extract the data and build the foldert. |
ind | string or numeric. The name of the column of x containing the identifiers of the measured objects, or the number of this column. |
timecol | string or numeric. |
nvar | integer. If method = 2, the number of variables observed at each time. Omitted if method = 1. |
same.rows | logical. If TRUE, the data frames of the returned foldert have the same row names. Necessarily TRUE if method = 2. |
... | further arguments passed to or from other methods. |
an object ft of class foldert, that is a list of data frames organised according to time; these data frames have the same column names.
If method = 1, they can have the same row names (attr(ft, "same.rows") = TRUE) or not (attr(ft, "same.rows") = FALSE).
The time attribute attr(ft, "times") has the same class as x[, timecol] (numeric vector, ordered factor or object of class "Date", "POSIXlt" or "POSIXct") and contains the values of x[, timecol].
If method = 2, they necessarily have the same row names: attr(ft, "same.rows") = TRUE and attr(ft, "times") is 1:length(ft).
The row names of each data frame are the identifiers of the individuals, as given by x[, ind].
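For method = 2, the wide layout used in the second example below (an identifier column followed by one block of nvar columns per time) can be sliced as in this hypothetical sketch. Reading timecol as the index of the first measurement column matches that example but is an assumption here, and `wide_to_time_list()` returns a plain list rather than a foldert.

```r
wide_to_time_list <- function(y, ind = 1, timecol = 2, nvar) {
  ids <- as.character(y[[ind]])
  # Assumption: the measurement columns start at position timecol.
  ntimes <- (ncol(y) - timecol + 1) %/% nvar
  first <- timecol - 1 + seq_len(nvar)        # columns of the first time block
  lapply(seq_len(ntimes), function(t) {
    block <- y[, first + (t - 1) * nvar, drop = FALSE]
    names(block) <- names(y)[first]           # same variable names at every time
    rownames(block) <- ids                    # individuals as row names
    block
  })
}

# Hypothetical wide data: 3 individuals, 2 variables, 2 times.
y <- data.frame(ind = 1:3,
                z1 = c(0, 0, 0), z2 = c(1, 1, 1),   # time 1
                z1 = c(2, 3, 4), z2 = c(5, 6, 7),   # time 2
                check.names = FALSE)
ft <- wide_to_time_list(y, ind = "ind", timecol = 2, nvar = 2)
str(ft)
```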
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: objects of class foldert.
as.data.frame.foldert: build a data frame from an object of class foldert.
as.foldert.array: build an object of class foldert from a 3-array.
# First example: method = 1
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01"))
x1 <- data.frame(t=times[1], ind=1:6, f=c("a","a","a","b","b","b"), z1=rep(0,6), z2=rep(0,6), stringsAsFactors = TRUE)
x2 <- data.frame(t=times[2], ind=c(1,4,6), f=c("a","b","b"), z1=rnorm(3,1,1), z2=rnorm(3,3,2), stringsAsFactors = TRUE)
x3 <- data.frame(t=times[3], ind=c(1,3:6), f=c("a","a","a","b","b"), z1=rnorm(5,3,2), z2=rnorm(5,6,3), stringsAsFactors = TRUE)
x <- rbind(x1, x2, x3)
ft1 <- as.foldert(x, method = 1, ind = "ind", timecol = "t", same.rows = TRUE)
print(ft1)
ft2 <- as.foldert(x, method = 1, ind = "ind", timecol = "t", same.rows = FALSE)
print(ft2)
data(castles.dated)
periods <- castles.dated$periods
stones <- castles.dated$stones
stones$stone <- rownames(stones)
castledf <- merge(periods, stones, by = "castle")
castledf$period <- as.numeric(castledf$period)
castledf$stone <- as.factor(paste(as.character(castledf$castle), as.character(castledf$stone), sep = "_"))
castfoldt1 <- as.foldert(castledf, method = 1, ind = "stone", timecol = "period", same.rows = FALSE)
summary(castfoldt1)
# Second example: method = 2
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01"))
y1 <- data.frame(z1=rep(0,6), z2=rep(0,6))
y2 <- data.frame(z1=rnorm(6,1,1), z2=rnorm(6,3,2))
y3 <- data.frame(z1=rnorm(6,3,2), z2=rnorm(6,6,3))
y <- cbind(ind = 1:6, y1, y2, y3)
ft3 <- as.foldert(y, method = 2, ind = "ind", timecol = 2, nvar = 2)
print(ft3)
Computes pairwise association measures (Cramer's V, Pearson's contingency coefficient, phi, Tschuprow's T) between the categorical variables of a data frame, using functions of the package DescTools (see Assocs).
cramer.data.frame(x, check = TRUE)
pearson.data.frame(x, check = TRUE)
phi.data.frame(x, check = TRUE)
tschuprow.data.frame(x, check = TRUE)
x |
a data frame (can also be a tibble). Its columns should be factors. |
check |
logical. If |
A square matrix whose elements are the pairwise association measures.
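The shape of this pairwise matrix can be illustrated for Cramér's V with a base-R stand-in for DescTools, using V = sqrt(chi2 / (n (min(r, c) - 1))) for an r x c contingency table. The helpers `cramer_v()` and `cramer_matrix()` are hypothetical; the package's results may differ in details such as the handling of the continuity correction or of missing values.

```r
cramer_v <- function(a, b) {
  tab <- table(a, b)
  # Pearson chi-squared statistic, without continuity correction.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

cramer_matrix <- function(x) {
  p <- ncol(x)
  m <- matrix(NA_real_, p, p, dimnames = list(names(x), names(x)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) m[i, j] <- cramer_v(x[[i]], x[[j]])
  }
  m
}

x <- data.frame(u = factor(c("a", "a", "b", "b", "a", "b")),
                v = factor(c("a", "a", "b", "b", "a", "b")),   # identical to u
                w = factor(c("x", "y", "x", "y", "x", "y")))
m <- cramer_matrix(x)
print(m)
```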
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xr$Sha = cut(xr$Sha, breaks = c(0, 5, 7, 10))
xr$Den = cut(xr$Den, breaks = c(0, 4, 6, 10))
xr$Sym = cut(xr$Sym, breaks = c(0, 6, 8, 10))
cramer.data.frame(xr)
pearson.data.frame(xr)
phi.data.frame(xr)
tschuprow.data.frame(xr)
Computes the pairwise association measures (Cramer's V, Pearson's contingency coefficient, phi, Tschuprow's T) between the categorical variables of an object of class folder. The computation is carried out using the functions cramer.data.frame, tschuprow.data.frame, pearson.data.frame or phi.data.frame. These functions are built from corresponding functions of the package DescTools (see Assocs).
cramer.folder(xf)
tschuprow.folder(xf)
pearson.folder(xf)
phi.folder(xf)
xf | an object of class folder. |
A list whose length is equal to the number of data frames of the folder. Each element of the list is a square matrix giving the pairwise association measures of the variables of the corresponding data frame.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xr$Sha = cut(xr$Sha, breaks = c(0, 5, 7, 10))
xr$Den = cut(xr$Den, breaks = c(0, 4, 6, 10))
xr$Sym = cut(xr$Sym, breaks = c(0, 6, 8, 10))
xfolder = as.folder(xr, groups = "rose")
cramer.folder(xfolder)
pearson.folder(xfolder)
phi.folder(xfolder)
tschuprow.folder(xfolder)
Computation of the parameter of the normal reference rule in order to estimate the (matrix) bandwidth.
bandwidth.parameter(p, n)
p | sample dimension. |
n | sample size. |
The parameter is equal to:
$$\left(\frac{4}{(p+2)\,n}\right)^{1/(p+4)}$$
It is based on the minimisation of the asymptotic mean integrated squared error in density estimation when using the Gaussian kernel method (Wand and Jones, 1995).
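The normal reference rule h = (4/((p+2) n))^(1/(p+4)) (Wand and Jones, 1995) is short enough to re-implement for illustration, assuming this is the quantity bandwidth.parameter returns. For p = 1 it reduces to the familiar univariate factor (4/(3n))^(1/5), roughly 1.06 n^(-1/5). The function `bandwidth_param()` is a hypothetical stand-in, not the package's code.

```r
# Hypothetical re-implementation of the normal reference rule parameter.
bandwidth_param <- function(p, n) {
  (4 / ((p + 2) * n))^(1 / (p + 4))
}

h <- bandwidth_param(p = 3, n = 20)   # 3 variables, 20 observations
print(h)
```

The parameter shrinks as n grows and, for fixed n, approaches 1 as the dimension p increases.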
Returns the value required by the functions fpcad, fmdsd, fdiscd.misclass and fdiscd.predict when their argument windowh is set to NULL.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Wand, M. P., Jones, M. C. (1995). Kernel Smoothing. Boca Raton, FL: Chapman and Hall.
# Sample size:
n <- 20
# Number of variables:
p <- 3
bandwidth.parameter(p, n)
The data were collected by J.M. Rudrauf on Alsatian castles whose building year is known (even approximately). For each castle, he measured 4 structural parameters on a sample of building stones.
These data are about the same castles as in the castles.dated data set.
data(castles)
castles is a list of 46 data frames. Each of these data frames corresponds to one year (between 1136 and 1510) and contains measurements on one or several castles which have been built since that year.
Each data frame has 5 to 101 rows (stones) and 5 columns: height, width, edging, boss (numeric) and castle (factor).
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles)
foldert(castles)
The data were collected by J.M. Rudrauf on Alsatian castles whose building period is known (even approximately). For each castle, he measured 4 structural parameters on a sample of building stones.
data(castles.dated)
castles.dated is a list of two data frames:
castles.dated$stones: this first data frame has 1262 cases (rows) and 5 variables (columns) named height, width, edging, boss (numeric) and castle (factor).
castles.dated$periods: this second data frame has 68 cases and 2 variables named castle and period. The column castle corresponds to the levels of the factor castle of the first data frame. The column period is a factor with 6 levels indicating the approximate building period. Thus this factor defines 6 classes of castles.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
summary(castles.dated$stones)
summary(castles.dated$periods)
The data were collected by J.M. Rudrauf on Alsatian castles whose building period is unknown. For each castle, he measured 4 structural parameters on a sample of building stones.
data(castles.nondated)
castles.nondated is a list of two data frames:
castles.nondated$stones: this first data frame has 1280 cases (rows) and 5 variables (columns) named height, width, edging, boss (numeric) and castle (factor).
castles.nondated$periods: this second data frame has 67 cases and 2 variables named castle and period. The column castle corresponds to the levels of the factor castle of the first data frame. The column period is a factor whose values are all NA, since the building periods are unknown.
Notice that the data frames corresponding to the castles whose building period is known are those in castles.dated.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.nondated)
summary(castles.nondated$stones)
summary(castles.nondated$periods)
Computes the correlation matrices of the elements of an object of class folder.
cor.folder(x, use = "everything", method = "pearson")
x | an object of class folder. |
use | an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs" (see cor). |
method | a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated. |
It uses cor to compute the correlation matrix of the numeric columns of each element of the folder. If some columns of the data frames are not numeric, there is a warning, and the correlations are computed on the numeric columns only.
A list whose elements are the correlation matrices of the elements of the folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder to create an object of class folder.
mean.folder, var.folder, skewness.folder, kurtosis.folder for other statistics for folder objects.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
iris.cor <- cor.folder(iris.fold)
print(iris.cor)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
roses.cor <- cor.folder(roses.fold)
print(roses.cor)
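Conceptually, cor.folder applies cor to each data frame of the folder. If a folder is treated as a named list of data frames (one per group), the computation can be approximated in base R with split and lapply. This is a sketch, not dad's implementation: it selects the numeric columns explicitly instead of issuing a warning.

```r
# Base-R approximation of a per-group list of correlation matrices (not dad's code).
data(iris)
groups <- split(iris, iris$Species)            # one data frame per species
iris_cor <- lapply(groups, function(d) {
  num <- vapply(d, is.numeric, logical(1))     # keep numeric columns only
  cor(d[num], use = "everything", method = "pearson")
})
names(iris_cor)
iris_cor$setosa
```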
This function changes the numeric columns of a data frame x into factors. The range of each of these columns is divided into intervals, and its values are recoded according to the interval into which they fall.
To do so, cut is applied to each column of x.
## S3 method for class 'data.frame'
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3L, ordered_result = FALSE, cutcol = NULL, ...)
x | data frame (can also be a tibble). |
breaks | list or numeric. |
labels | list of character vectors. If given, its length is equal to the number of columns of x. See |
include.lowest | logical, indicating if, for each column |
right | logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa (see cut). |
dig.lab | integer or integer vector, which is used when labels are not given. It determines the number of digits used in formatting the break numbers. |
ordered_result | logical: should the results be ordered factors? (see cut) |
cutcol | numeric vector: indices of the columns to be converted into factors. These columns must all be numeric. Otherwise, there is a warning. |
... | further arguments passed to or from other methods. |
A data frame with the same column and row names as x.
If cutcol is given, each numeric column x[, j] whose index is contained in cutcol is replaced by a factor. The other columns are unmodified. If any column x[, j] whose index is in cutcol is not numeric, it is unmodified.
If cutcol is omitted, all numeric columns are replaced by factors.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data("roses")
x <- roses[roses$rose %in% c("A", "B"), c("Sha", "Sym", "Den", "rose")]
cut(x, breaks = 3)
cut(x, breaks = 5)
cut(x, breaks = c(0, 4, 6, 10))
cut(x, breaks = list(c(0, 6, 8, 10), c(0, 5, 7, 10), c(0, 6, 7, 10)))
cut(x, breaks = list(c(0, 6, 8, 10), c(0, 5, 7, 10)), cutcol = 1:2)
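The column-wise recoding can be sketched in base R as below. This is a simplified illustration under stated assumptions, not dad's implementation:
- the function name `cut_numeric_columns` is hypothetical;
- `breaks` is either one specification reused for every selected column, or a list with one entry per selected column;
- non-numeric columns listed in `cutcol` are silently left unchanged, whereas dad issues a warning.

```r
# Hypothetical sketch of column-wise cutting (not dad's cut.data.frame).
cut_numeric_columns <- function(x, breaks, cutcol = NULL) {
  # By default, every numeric column is converted
  if (is.null(cutcol)) cutcol <- which(vapply(x, is.numeric, logical(1)))
  # A single break specification is reused for each selected column
  if (!is.list(breaks)) breaks <- rep(list(breaks), length(cutcol))
  for (k in seq_along(cutcol)) {
    j <- cutcol[k]
    # Non-numeric columns are left unchanged
    if (is.numeric(x[[j]])) x[[j]] <- cut(x[[j]], breaks = breaks[[k]])
  }
  x
}

dat <- data.frame(a = c(1, 5, 9), b = c(2, 6, 8), g = c("u", "v", "w"),
                  stringsAsFactors = FALSE)
res <- cut_numeric_columns(dat, breaks = c(0, 4, 6, 10))
str(res)
```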
This function applies to a folder. For each element (data frame) of this folder, it changes the numeric columns into factors, using cut.data.frame.
## S3 method for class 'folder'
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3L, ordered_result = FALSE, cutcol = NULL, ...)
x | an object of class folder. |
breaks | list or numeric, defining the intervals into which the variables of each element of the folder are to be cut. See |
labels | list of character vectors. If not omitted, it gives the labels for the intervals of each column of the elements of |
include.lowest | logical, indicating if a value equal to the lowest (or highest, for |
right | logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa (see cut). |
dig.lab | integer or integer vector, which is used when labels are not given. It determines the number of digits used in formatting the break numbers. See |
ordered_result | logical: should the results be ordered factors? (see cut) |
cutcol | numeric vector: indices of the columns of the elements of |
... | further arguments passed to or from other methods. |
An object of class folder with the same length and names as x. Its elements (data frames) have the same column and row names as the elements of x.
For more details, see cut.data.frame.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data("roses")
x <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")], groups = "rose")
summary(x)
x3 <- cut(x, breaks = 3)
summary(x3)
x7 <- cut(x, breaks = 7)
summary(x7)
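cut.folder delegates to cut.data.frame for each element, so the idea can be sketched with lapply over a list of data frames. The sketch rests on these assumptions:
- the folder is modelled as a named list produced by split;
- the helper name is hypothetical;
- `breaks` is a vector of breakpoints shared by all elements.

Sharing the breakpoints guarantees that the resulting factors have identical levels across groups. That is a design choice of this sketch, not a statement about how dad handles a number of intervals.

```r
# Hypothetical sketch: cut the numeric columns of every data frame in a list.
cut_list_of_frames <- function(fold, breaks) {
  lapply(fold, function(d) {
    num <- vapply(d, is.numeric, logical(1))
    d[num] <- lapply(d[num], cut, breaks = breaks)  # same breaks for every element
    d
  })
}

data(iris)
fold <- split(iris[, c("Sepal.Length", "Petal.Length", "Species")], iris$Species)
res <- cut_list_of_frames(fold, breaks = c(0, 2, 4, 6, 8))
lapply(res, summary)
```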
Symmetrized chi-squared distance between two multivariate ($q > 1$) or univariate ($q = 1$) discrete probability distributions, estimated from samples.
ddchisqsym(x1, x2)
x1, x2 | vectors or data frames of $q$ columns. If they are data frames and do not have the same column names, there is a warning. |
Let $p_1$ and $p_2$ denote the estimated probability distributions of the discrete samples x1 and x2. The symmetrized chi-squared distance between the discrete probability distributions of the samples is computed using the ddchisqsympar function.
The distance between the two probability distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddchisqsympar: symmetrized chi-squared distance between two discrete distributions, given the probabilities on their common support.
Other distances: ddhellinger, ddjeffreys, ddjensen, ddlp.
# Example 1
x1 <- c("A", "A", "B", "B")
x2 <- c("A", "A", "A", "B", "B")
ddchisqsym(x1, x2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
ddchisqsym(x1, x2)
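The sample-based dd* functions estimate two frequency distributions on a common support and pass them to the corresponding *par function. That estimation step can be sketched in base R as follows. The helper `joint_freq_common_support` is hypothetical and is not part of dad. It assumes that both samples have the same variables in the same order.

```r
# Hypothetical helper: relative-frequency tables of two samples on a common support.
joint_freq_common_support <- function(x1, x2) {
  x1 <- as.data.frame(x1)
  x2 <- as.data.frame(x2)
  for (j in seq_along(x1)) {
    # Union of the observed states, so that both tables have the same cells
    lev <- union(levels(factor(x1[[j]])), levels(factor(x2[[j]])))
    x1[[j]] <- factor(x1[[j]], levels = lev)
    x2[[j]] <- factor(x2[[j]], levels = lev)
  }
  list(p1 = table(x1) / nrow(x1), p2 = table(x2) / nrow(x2))
}

# Multivariate case (q = 2), same samples as Example 2 above
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")),
                 y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")),
                 y = factor(c("a", "a", "b", "a", "b")))
est <- joint_freq_common_support(x1, x2)
est$p1
est$p2
```

Here the states (A, b) and (B, a) have probability 0 in the first sample but not in the second. Such cells are what make ratio-based measures, such as the Jeffreys divergence, infinite.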
Symmetrized chi-squared distance between two discrete probability distributions on the same support (which can be a Cartesian product of $q$ sets), given the probabilities of the states (which are $q$-tuples) of the support.
ddchisqsympar(p1, p2)
p1 | array (or table) the dimension of which is $q$. |
p2 | array (or table) the dimension of which is $q$. |
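The chi-squared distance between two discrete distributions $p_1$ and $p_2$ is given by:
$$\chi^2(p_1 \mid p_2) = \sum_x \frac{\left(p_1(x) - p_2(x)\right)^2}{p_2(x)}$$
Then the symmetrized chi-squared distance is given by the formula:
$$\chi^2_{sym}(p_1, p_2) = \chi^2(p_1 \mid p_2) + \chi^2(p_2 \mid p_1) = \sum_x \frac{\left(p_1(x) - p_2(x)\right)^2 \left(p_1(x) + p_2(x)\right)}{p_1(x)\, p_2(x)}$$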
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddchisqsym: symmetrized chi-squared distance between two estimated discrete distributions, given samples.
Other distances: ddhellingerpar, ddjeffreyspar, ddjensenpar, ddlppar.
# Example 1
p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
ddchisqsympar(p1, p2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
p1 <- table(x1)/nrow(x1)
p2 <- table(x2)/nrow(x2)
ddchisqsympar(p1, p2)
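A direct base-R transcription of a symmetrized chi-squared distance is shown below. It is a sketch that assumes the symmetrized distance is the sum of the two directed chi-squared distances, $\sum_x (p_1(x)-p_2(x))^2/p_2(x) + \sum_x (p_1(x)-p_2(x))^2/p_1(x)$. The function name `chisq_sym` is hypothetical. Check the result against ddchisqsympar before relying on it.

```r
# Hypothetical transcription of the symmetrized chi-squared distance,
# assuming chi2_sym(p1, p2) = chi2(p1 | p2) + chi2(p2 | p1).
chisq_sym <- function(p1, p2) {
  stopifnot(identical(dim(p1), dim(p2)))
  keep <- p1 > 0 | p2 > 0   # states with zero mass in both contribute nothing
  a <- p1[keep]
  b <- p2[keep]
  d2 <- (a - b)^2
  sum(d2 / b) + sum(d2 / a) # Inf if a state has probability 0 in one distribution only
}

p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
chisq_sym(p1, p2)   # 1/3 + 1/4 = 7/12 under this assumption
```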
Hellinger (or Matusita) distance between two multivariate ($q > 1$) or univariate ($q = 1$) discrete probability distributions, estimated from samples.
ddhellinger(x1, x2)
x1, x2 | vectors or data frames of $q$ columns. If they are data frames and do not have the same column names, there is a warning. |
Let $p_1$ and $p_2$ denote the estimated probability distributions of the discrete samples x1 and x2. The Hellinger (Matusita) distance between the discrete probability distributions of the samples is computed using the ddhellingerpar function.
The distance between the two probability distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddhellingerpar: Hellinger metric (Matusita distance) between two discrete distributions, given the probabilities on their common support.
Other distances: ddchisqsym, ddjeffreys, ddjensen, ddlp.
# Example 1
x1 <- c("A", "A", "B", "B")
x2 <- c("A", "A", "A", "B", "B")
ddhellinger(x1, x2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
ddhellinger(x1, x2)
Hellinger (or Matusita) distance between two discrete probability distributions on the same support (which can be a Cartesian product of $q$ sets), given the probabilities of the states (which are $q$-tuples) of the support.
ddhellingerpar(p1, p2)
p1 | array (or table) the dimension of which is $q$. |
p2 | array (or table) the dimension of which is $q$. |
The Hellinger distance between two discrete distributions $p_1$ and $p_2$ is given by:
$$H(p_1, p_2) = \sqrt{\sum_x \left(\sqrt{p_1(x)} - \sqrt{p_2(x)}\right)^2}$$
Notice that some authors divide this expression by $\sqrt{2}$.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddhellinger: Hellinger distance between two estimated discrete distributions, given samples.
Other distances: ddchisqsympar, ddjeffreyspar, ddjensenpar, ddlppar.
# Example 1
p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
ddhellingerpar(p1, p2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
p1 <- table(x1)/nrow(x1)
p2 <- table(x2)/nrow(x2)
ddhellingerpar(p1, p2)
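Assuming the Hellinger distance is $\sqrt{\sum_x (\sqrt{p_1(x)}-\sqrt{p_2(x)})^2}$, it translates directly into base R. Expanding the square gives the equivalent form $\sqrt{2 - 2\sum_x \sqrt{p_1(x)p_2(x)}}$. This form shows that the distance lies between 0 and $\sqrt{2}$, which is why some authors normalise it. The function name `hellinger` is hypothetical: this is a sketch of the formula, not dad's source.

```r
# Hypothetical transcription of the Hellinger (Matusita) distance:
# H(p1, p2) = sqrt(sum((sqrt(p1) - sqrt(p2))^2))
hellinger <- function(p1, p2) {
  stopifnot(identical(dim(p1), dim(p2)))
  sqrt(sum((sqrt(p1) - sqrt(p2))^2))
}

p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
hellinger(p1, p2)
# Distributions with disjoint supports reach the maximum value sqrt(2)
hellinger(array(c(1, 0)), array(c(0, 1)))
```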
Jeffreys divergence (symmetrized Kullback-Leibler divergence) between two multivariate ($q > 1$) or univariate ($q = 1$) discrete probability distributions, estimated from samples.
ddjeffreys(x1, x2)
x1, x2 | vectors or data frames of $q$ columns. If they are data frames and do not have the same column names, there is a warning. |
Let $p_1$ and $p_2$ denote the estimated probability distributions of the discrete samples x1 and x2. The Jeffreys divergence between the discrete probability distributions of the samples is computed using the ddjeffreyspar function.
The divergence between the two probability distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddjeffreyspar: Jeffreys divergence between two discrete distributions, given the probabilities on their common support.
Other distances: ddchisqsym, ddhellinger, ddjensen, ddlp.
# Example 1
x1 <- c("A", "A", "B", "B")
x2 <- c("A", "A", "A", "B", "B")
ddjeffreys(x1, x2)
# Example 2 (the divergence can be infinite: Inf)
x1 <- c("A", "A", "B", "C")
x2 <- c("A", "A", "A", "B", "B")
ddjeffreys(x1, x2)
# Example 3
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
ddjeffreys(x1, x2)
Jeffreys divergence (symmetrized Kullback-Leibler divergence) between two discrete probability distributions on the same support (which can be a Cartesian product of $q$ sets), given the probabilities of the states (which are $q$-tuples) of the support.
ddjeffreyspar(p1, p2)
p1 | array (or table) the dimension of which is $q$. |
p2 | array (or table) the dimension of which is $q$. |
Jeffreys divergence between two discrete distributions $p_1$ and $p_2$ is given by the formula:
$$J(p_1, p_2) = \sum_x \left(p_1(x) - p_2(x)\right) \log\frac{p_1(x)}{p_2(x)}$$
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddjeffreys: Jeffreys divergence between two estimated discrete distributions, given samples.
Other distances: ddchisqsympar, ddhellingerpar, ddjensenpar, ddlppar.
# Example 1
p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
ddjeffreyspar(p1, p2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
p1 <- table(x1)/nrow(x1)
p2 <- table(x2)/nrow(x2)
ddjeffreyspar(p1, p2)
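Assuming the usual definition $J(p_1,p_2)=\sum_x (p_1(x)-p_2(x))\log(p_1(x)/p_2(x))$, the divergence can be sketched in base R as below. States with zero probability in both distributions are dropped. A state with zero probability in only one of them makes the divergence infinite. This is the situation of Example 2 of ddjeffreys, where state C is absent from x2. The function name is hypothetical.

```r
# Hypothetical transcription of Jeffreys divergence (symmetrized Kullback-Leibler).
jeffreys <- function(p1, p2) {
  stopifnot(identical(dim(p1), dim(p2)))
  keep <- p1 > 0 | p2 > 0                # 0 * log(0 / 0) is taken as 0
  a <- p1[keep]
  b <- p2[keep]
  sum((a - b) * log(a / b))              # Inf when a state has mass in one distribution only
}

p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
jeffreys(p1, p2)   # equals log(3) / 4 for these two distributions
# Support mismatch: the third state has probability 0 in the second distribution
jeffreys(array(c(0.5, 0.25, 0.25)), array(c(0.6, 0.4, 0)))
```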
Jensen-Shannon divergence between two multivariate ($q > 1$) or univariate ($q = 1$) discrete probability distributions, estimated from samples.
ddjensen(x1, x2)
x1, x2 | vectors or data frames of $q$ columns. If they are data frames and do not have the same column names, there is a warning. |
Let $p_1$ and $p_2$ denote the estimated probability distributions of the discrete samples x1 and x2. The Jensen-Shannon divergence between the discrete probability distributions of the samples is computed using the ddjensenpar function.
The distance between the two probability distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddjensenpar: Jensen-Shannon divergence between two discrete distributions, given the probabilities on their common support.
Other distances: ddchisqsym, ddhellinger, ddjeffreys, ddlp.
# Example 1
x1 <- c("A", "A", "B", "B")
x2 <- c("A", "A", "A", "B", "B")
ddjensen(x1, x2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
ddjensen(x1, x2)
Jensen-Shannon divergence between two discrete probability distributions on the same support (which can be a Cartesian product of $q$ sets), given the probabilities of the states (which are $q$-tuples) of the support.
ddjensenpar(p1, p2)
p1 | array (or table) the dimension of which is $q$. |
p2 | array (or table) the dimension of which is $q$. |
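The Jensen-Shannon divergence between two discrete distributions $p_1$ and $p_2$ is given by the formula:
$$JS(p_1, p_2) = \frac{1}{2} \sum_x p_1(x) \log\frac{2\,p_1(x)}{p_1(x) + p_2(x)} + \frac{1}{2} \sum_x p_2(x) \log\frac{2\,p_2(x)}{p_1(x) + p_2(x)}$$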
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddjensen: Jensen-Shannon divergence between two estimated discrete distributions, given samples.
Other distances: ddchisqsympar, ddhellingerpar, ddjeffreyspar, ddlppar.
# Example 1
p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
ddjensenpar(p1, p2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
p1 <- table(x1)/nrow(x1)
p2 <- table(x2)/nrow(x2)
ddjensenpar(p1, p2)
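The sketch below assumes the standard Jensen-Shannon definition, $\frac12 \sum_x p_1(x)\log\frac{2p_1(x)}{p_1(x)+p_2(x)} + \frac12 \sum_x p_2(x)\log\frac{2p_2(x)}{p_1(x)+p_2(x)}$. This is the average Kullback-Leibler divergence of each distribution from their mixture. Unlike the Jeffreys divergence, it stays finite (at most $\log 2$) when the supports differ. The function name is hypothetical; check its normalisation against ddjensenpar.

```r
# Hypothetical transcription of the Jensen-Shannon divergence (natural log).
jensen_shannon <- function(p1, p2) {
  stopifnot(identical(dim(p1), dim(p2)))
  m <- (p1 + p2) / 2                                  # mixture distribution
  kl <- function(a, b) sum(a[a > 0] * log(a[a > 0] / b[a > 0]))  # 0 * log(0) := 0
  kl(p1, m) / 2 + kl(p2, m) / 2
}

p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
jensen_shannon(p1, p2)
# Disjoint supports give the maximum value log(2)
jensen_shannon(array(c(1, 0)), array(c(0, 1)))
```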
$L^p$ distance between two multivariate ($q > 1$) or univariate ($q = 1$) discrete probability distributions, estimated from samples.
ddlp(x1, x2, p = 1)
x1, x2 | vectors or data frames of $q$ columns. If they are data frames and do not have the same column names, there is a warning. |
p | integer. Parameter of the distance. |
Let $p_1$ and $p_2$ denote the estimated probability distributions of the discrete samples x1 and x2. The $L^p$ distance between the discrete probability distributions of the samples is computed using the ddlppar function.
The distance between the two discrete probability distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddlppar: $L^p$ distance between two discrete distributions, given the probabilities on their common support.
Other distances: ddchisqsym, ddhellinger, ddjeffreys, ddjensen.
# Example 1
x1 <- c("A", "A", "B", "B")
x2 <- c("A", "A", "A", "B", "B")
ddlp(x1, x2)
ddlp(x1, x2, p = 2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
ddlp(x1, x2)
$L^p$ distance between two discrete probability distributions on the same support (which can be a Cartesian product of $q$ sets), given the probabilities of the states (which are $q$-tuples) of the support.
ddlppar(p1, p2, p = 1)
p1 | array (or table) the dimension of which is $q$. |
p2 | array (or table) the dimension of which is $q$. |
p | integer. Parameter of the distance. |
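The $L^p$ distance between two discrete distributions $p_1$ and $p_2$ is given by the formula:
$$\|p_1 - p_2\|_p = \left(\sum_x \left|p_1(x) - p_2(x)\right|^p\right)^{1/p}$$
If $p = 1$, it is the variational distance.
If $p = 2$, it is the Patrick-Fisher distance.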
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M., Deza, E. (2013). Encyclopedia of Distances. Springer.
ddlp: $L^p$ distance between two estimated discrete distributions, given samples.
Other distances: ddchisqsympar, ddhellingerpar, ddjeffreyspar, ddjensenpar.
# Example 1
p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
ddlppar(p1, p2)
ddlppar(p1, p2, p = 2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
p1 <- table(x1)/nrow(x1)
p2 <- table(x2)/nrow(x2)
ddlppar(p1, p2)
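Assuming the formula $\left(\sum_x |p_1(x)-p_2(x)|^p\right)^{1/p}$, the $L^p$ distance is one line of base R. With the probabilities of Example 1, $|p_1-p_2| = (1/4, 1/4)$. So $p=1$ gives $1/2$ (variational distance) and $p=2$ gives $\sqrt{1/8}$ (Patrick-Fisher distance). The function name `lp_dist` is hypothetical.

```r
# Hypothetical transcription of the L^p distance between two probability arrays.
lp_dist <- function(p1, p2, p = 1) {
  stopifnot(identical(dim(p1), dim(p2)), p >= 1)
  sum(abs(p1 - p2)^p)^(1 / p)
}

p1 <- array(c(1/2, 1/2), dimnames = list(c("a", "b")))
p2 <- array(c(1/4, 3/4), dimnames = list(c("a", "b")))
lp_dist(p1, p2)          # variational distance (p = 1)
lp_dist(p1, p2, p = 2)   # Patrick-Fisher distance (p = 2)
```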
Departments and regions of metropolitan France.
data(departments)
departments is a data frame with 96 rows and 4 columns (factors):
coded: department numbers
named: department names
coder: region ISO codes
namer: region names
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
INSEE. Code officiel géographique au 1er janvier 2018.
data(departments)
print(departments)
Computes the leave-one-out misclassification ratio of the rule assigning groups of individuals, one group after another, to the class of groups (among $K$ classes of groups) which achieves the minimum of the distances or divergences between the probability distribution associated with the group to assign and the $K$ probability distributions associated with the $K$ classes.
discdd.misclass(xf, class.var, distance = c("l1", "l2", "chisqsym", "hellinger", "jeffreys", "jensen", "lp"), crit = 1, p)
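xf | object of class "folderh", or list of arrays (or tables); see Details. |
class.var | string (if |
distance | The distance or dissimilarity used to compute the distance matrix between the densities. It can be "l1" (default), "l2", "chisqsym", "hellinger", "jeffreys", "jensen" or "lp". |
crit | 1 or 2. Selects how the probability distributions associated with the classes are estimated. See Details. |
p | integer. Optional. When |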
If xf is an object of class "folderh" containing the data:
The probability distributions $f_t$ corresponding to the $T$ groups of individuals are estimated by frequency distributions within each group.
To the class $G_k$ consisting of $T_k$ groups is associated the probability distribution $g_k$. In the leave-one-out method, the group to assign is not included in its class $G_k$.
The crit argument selects the estimation method of the $g_k$'s.
crit=1
The probability distribution $g_k$ is estimated using the whole data of this class, that is the rows of the data in xf corresponding to the groups of the class $G_k$.
The estimation of the $g_k$'s uses the same method as the estimation of the $f_t$'s.
crit=2
The probability distributions $f_t$ are estimated using the corresponding data from xf. Then they are averaged to obtain an estimation of $g_k$, that is $g_k = \frac{1}{T_k} \sum_{t \in G_k} f_t$.
If xf is a list of arrays (or list of tables):
The array xf[[t]] is the joint frequency distribution of the $t$-th group. The frequencies can be absolute or relative.
To the class $G_k$ consisting of $T_k$ groups is associated the probability distribution $g_k$. In the leave-one-out method, the group to assign is not included in its class $G_k$.
The crit argument selects the estimation method of the $g_k$'s.
crit=1
$g_k = \frac{\sum_{t \in G_k} x_t}{\sum_{t \in G_k} n_t}$, where $x_t$ denotes the array xf[[t]] and $n_t$ is the total of xf[[t]].
Notice that when xf[[t]] contains relative frequencies, its total is 1. In that case, crit=1 is equivalent to crit=2.
crit=2
$g_k = \frac{1}{T_k} \sum_{t \in G_k} \frac{x_t}{n_t}$.
Returns an object of class discdd.misclass, that is a list including:
classification | data frame with 4 columns: |
confusion.mat | confusion matrix, |
misalloc.per.class | the misclassification ratio per class, |
misclassed | the misclassification ratio, |
distances | matrix with |
proximities | matrix of the proximity indices (in percents) between the groups and the classes. The proximity between the group |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
# Example 1 with a folderh obtained by converting numeric variables into factors
data("castles.dated")
stones <- castles.dated$stones
periods <- castles.dated$periods
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)
castlefh <- folderh(periods, "castle", stones)
# Default: distance = "l1", crit = 1
discdd.misclass(castlefh, "period")
# Hellinger distance, crit = 2
discdd.misclass(castlefh, "period", distance = "hellinger", crit = 2)
# Example 2 with a list of 96 arrays
data("dspgd2015")
data("departments")
classes <- departments[, c("coded", "namer")]
names(classes) <- c("group", "class")
# Default: distance = "l1", crit = 1
discdd.misclass(dspgd2015, classes)
# Hellinger distance, crit = 2
discdd.misclass(dspgd2015, classes, distance = "hellinger", crit = 2)
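The leave-one-out procedure can be illustrated on toy data in base R. The sketch below is a simplified model of the rule described above, not dad's implementation. Its simplifications:
- each group is represented directly by a probability vector on a common support;
- class distributions are built by averaging, as in the crit=2 scheme;
- the L1 distance is used;
- the function name and the toy data are hypothetical.

```r
# Hypothetical sketch of the leave-one-out misclassification ratio of the
# minimum-distance rule (L1 distance, class distributions averaged as with crit = 2).
loo_misclass <- function(dists, classes) {
  l1 <- function(a, b) sum(abs(a - b))
  pred <- character(length(dists))
  for (t in seq_along(dists)) {
    d <- sapply(levels(classes), function(k) {
      members <- setdiff(which(classes == k), t)  # group t is left out of its class
      if (length(members) == 0) return(Inf)       # class would be empty
      g_k <- Reduce(`+`, dists[members]) / length(members)
      l1(dists[[t]], g_k)
    })
    pred[t] <- names(which.min(d))
  }
  mean(pred != as.character(classes))
}

groups <- list(c(0.8, 0.2), c(0.7, 0.3), c(0.2, 0.8), c(0.3, 0.7))
loo_misclass(groups, factor(c("a", "a", "b", "b")))  # well-separated classes
loo_misclass(groups, factor(c("a", "b", "b", "b")))  # group 2 mislabelled
```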
Assigns several groups of individuals, one group after another, to the class of groups (among $K$ classes of groups) which achieves the minimum of the distances or divergences between the probability distribution associated with the group to assign and the $K$ probability distributions associated with the $K$ classes.
discdd.predict(xf, class.var, distance = c("l1", "l2", "chisqsym", "hellinger", "jeffreys", "jensen", "lp"), crit = 1, misclass.ratio = FALSE, p)
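xf | object of class "folderh", or list of arrays (or tables); see Details. |
class.var | string (if |
distance | The distance or dissimilarity used to compute the distance matrix between the densities. It can be "l1" (default), "l2", "chisqsym", "hellinger", "jeffreys", "jensen" or "lp". |
crit | 1 or 2. Selects how the probability distributions associated with the classes are estimated. See Details. |
misclass.ratio | logical (default FALSE). |
p | integer. Optional. When |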
If xf is an object of class "folderh" containing the data:
The probability distributions $f_t$ corresponding to the $T$ groups of individuals are estimated by frequency distributions within each group.
To the class $G_k$ consisting of $T_k$ groups is associated the probability distribution $g_k$.
The crit argument selects the estimation method of the $g_k$'s.
crit=1
The probability distribution $g_k$ is estimated using the whole data of this class, that is the rows of x corresponding to the groups of the class $G_k$.
The estimation of the $g_k$'s uses the same method as the estimation of the $f_t$'s.
crit=2
The probability distributions $f_t$ are estimated using the corresponding data from xf. Then they are averaged to obtain an estimation of the probability distribution $g_k$, that is $g_k = \frac{1}{T_k} \sum_{t \in G_k} f_t$.
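The difference between the two values of crit in the "folderh" case can be sketched in base R on a toy categorical variable. This is a minimal illustration, not dad's implementation: the data, the class assignment and the helper names g_crit1 and g_crit2 are hypothetical.

```r
# Hypothetical toy data: one categorical variable `size` observed on
# individuals from 3 groups; groups g1 and g2 form class "A", g3 forms class "B"
ind <- data.frame(
  group = factor(c("g1", "g1", "g2", "g2", "g2", "g2", "g3", "g3")),
  size  = factor(c("s", "l", "s", "s", "s", "l", "l", "l"), levels = c("s", "l"))
)
class_of_group <- c(g1 = "A", g2 = "A", g3 = "B")

# f_t: relative frequency distribution of `size` within each group
f <- lapply(split(ind$size, ind$group), function(v) prop.table(table(v)))

# crit = 1: pool all rows of the groups of class k, then estimate once
g_crit1 <- function(k) {
  members <- names(class_of_group)[class_of_group == k]
  prop.table(table(ind$size[ind$group %in% members]))
}

# crit = 2: average the per-group distributions f_t of class k
g_crit2 <- function(k) {
  members <- names(class_of_group)[class_of_group == k]
  Reduce(`+`, f[members]) / length(members)
}

g_crit1("A")  # s = 4/6, l = 2/6: g2 has more rows, so it weighs more
g_crit2("A")  # s = 0.625, l = 0.375: each group weighs the same
```

With crit=1 a group with many individuals dominates its class; with crit=2 every group contributes equally.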
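If xf is a list of arrays (or list of tables):
The array xf[[t]] is the joint frequency distribution of the $t$-th group, and $f_t$ denotes the corresponding probability distribution. The frequencies can be absolute or relative.
To the class $G_k$ consisting of $T_k$ groups is associated the probability distribution $g_k$.
The crit argument selects the estimation method of the $g_k$'s.
crit=1
$g_k = \frac{\sum_{t \in G_k} n_t f_t}{\sum_{t \in G_k} n_t}$, where $n_t$ is the total of xf[[t]].
Notice that when xf[[t]] contains relative frequencies, its total is 1, so that crit=1 is then equivalent to crit=2.
crit=2
$g_k = \frac{1}{T_k} \sum_{t \in G_k} f_t$.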
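For a list of frequency tables, the two criteria and their equivalence for relative frequencies can be checked with a short base-R sketch. The helper class_distribution and the example tables are hypothetical; the sketch follows the weighting by totals $n_t$ described for crit=1 and the plain average described for crit=2.

```r
# Hypothetical helper (not part of dad): probability distribution g_k of a class
# whose groups are given as a list of frequency tables xf[[t]]
class_distribution <- function(tables, crit = 1) {
  n <- vapply(tables, sum, numeric(1))          # totals n_t of xf[[t]]
  f <- Map(function(a, nt) a / nt, tables, n)   # f_t = xf[[t]] / n_t
  if (crit == 1) {
    Reduce(`+`, Map(`*`, n, f)) / sum(n)        # weighted by n_t: pooled counts
  } else {
    Reduce(`+`, f) / length(f)                  # plain average of the f_t
  }
}

# Two hypothetical 2 x 2 tables of absolute counts for the groups of one class
counts <- list(t1 = matrix(c(10, 0, 0, 10), 2), t2 = matrix(c(1, 1, 1, 1), 2))
class_distribution(counts, crit = 1)   # (t1 + t2) / 24
class_distribution(counts, crit = 2)   # (t1 / 20 + t2 / 4) / 2

# With relative frequencies every total is 1, so both criteria coincide
rel <- lapply(counts, function(a) a / sum(a))
all.equal(class_distribution(rel, crit = 1), class_distribution(rel, crit = 2))
```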
Returns an object of class discdd.predict
, that is a list including:
prediction |
data frame with 3 columns:
|
distances |
matrix with |
proximities |
matrix of the proximities (in percents). The proximity of a group |
confusion.mat |
the confusion matrix (if |
misclassed |
the misclassification ratio (if |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
data(castles.nondated)
stones <- rbind(castles.dated$stones, castles.nondated$stones)
periods <- rbind(castles.dated$periods, castles.nondated$periods)
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)
castlesfh <- folderh(periods, "castle", stones)
# Default: distance = "l1", crit = 1
discdd.predict(castlesfh, "period")
# With the calculation of the confusion matrix and misclassification ratio
discdd.predict(castlesfh, "period", misclass.ratio = TRUE)
# Hellinger distance
discdd.predict(castlesfh, "period", distance = "hellinger")
# crit = 2
discdd.predict(castlesfh, "period", crit = 2)
L^2 distance between probability densities
Computes the L^2 distance between two multivariate (dimension: $p > 1$) or univariate (dimension: $p = 1$) probability densities, estimated from samples.
distl2d(x1, x2, method = "gaussiand", check = FALSE, varw1 = NULL, varw2 = NULL)
x1 , x2
|
the samples from the probability densities (see |
method |
string. It can be:
|
check |
logical. When Notice that if |
varw1 , varw2
|
the bandwidths when the densities are estimated by the kernel method (see |
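The function distl2d computes the L^2 distance between $f_1$ and $f_2$ from the formula
$d(f_1, f_2) = \|f_1 - f_2\| = \sqrt{\langle f_1, f_1 \rangle + \langle f_2, f_2 \rangle - 2 \langle f_1, f_2 \rangle}$.
For some information about the method used to compute the inner product or about the arguments, see l2d.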
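The L^2 distance only needs three inner products: $d(f_1, f_2) = \sqrt{\langle f_1, f_1 \rangle + \langle f_2, f_2 \rangle - 2 \langle f_1, f_2 \rangle}$. The sketch below illustrates this with known univariate Gaussian densities instead of densities estimated from samples, so that each inner product has the standard closed form $\langle f_1, f_2 \rangle = \varphi(m_1; m_2, s_1^2 + s_2^2)$; the result is checked against numerical integration of $(f_1 - f_2)^2$. The helper names are hypothetical and not part of dad.

```r
# <f1, f2> for two univariate Gaussian densities N(m1, s1^2) and N(m2, s2^2):
# the integral of their product equals the N(m2, s1^2 + s2^2) density at m1
inner_gauss <- function(m1, s1, m2, s2) dnorm(m1, mean = m2, sd = sqrt(s1^2 + s2^2))

# L2 distance from the inner-product formula
l2_dist_gauss <- function(m1, s1, m2, s2) {
  sq <- inner_gauss(m1, s1, m1, s1) + inner_gauss(m2, s2, m2, s2) -
    2 * inner_gauss(m1, s1, m2, s2)
  sqrt(max(0, sq))  # guard against tiny negative rounding errors
}

# Same quantity by numerical integration of (f1 - f2)^2
l2_dist_numeric <- function(m1, s1, m2, s2) {
  integrand <- function(x) (dnorm(x, m1, s1) - dnorm(x, m2, s2))^2
  sqrt(integrate(integrand, -Inf, Inf)$value)
}

l2_dist_gauss(0, 1, 1, 2)
l2_dist_numeric(0, 1, 1, 2)  # agrees up to the integration error
```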
The distance between the two densities.
Be careful! If check = FALSE and one smoothing bandwidth matrix is degenerate, the returned result cannot be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2d
in order to compute pairwise distances between several densities.
require(MASS)
m1 <- c(0, 0)
v1 <- matrix(c(1, 0, 0, 1), ncol = 2)
m2 <- c(0, 1)
v2 <- matrix(c(4, 1, 1, 9), ncol = 2)
x1 <- mvrnorm(n = 3, mu = m1, Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
distl2d(x1, x2, method = "gaussiand")
distl2d(x1, x2, method = "kern")
distl2d(x1, x2, method = "kern", varw1 = v1, varw2 = v2)
L^2 distance between L^2-normed probability densities
Computes the L^2 distance between two multivariate (dimension: $p > 1$) or univariate (dimension: $p = 1$) L^2-normed probability densities, estimated from samples, where an L^2-normed probability density is the original probability density function divided by its L^2-norm.
distl2dnorm(x1, x2, method = "gaussiand", check = FALSE, varw1 = NULL, varw2 = NULL)
x1 , x2
|
the samples from the probability densities (see |
method |
string. It can be:
|
check |
logical. When Notice that if |
varw1 , varw2
|
the bandwidths when the densities are estimated by the kernel method (see |
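Given densities $f_1$ and $f_2$, the function distl2dnorm computes the L^2 distance between the L^2-normed densities $f_1 / \|f_1\|$ and $f_2 / \|f_2\|$:
$d\left(\frac{f_1}{\|f_1\|}, \frac{f_2}{\|f_2\|}\right) = \sqrt{2 - 2 \, \frac{\langle f_1, f_2 \rangle}{\|f_1\| \, \|f_2\|}}$.
For some information about the method used to compute the inner product or about the arguments, see l2d.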
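Because both normed densities have unit norm, their distance reduces to $\sqrt{2 - 2\cos\theta}$ with $\cos\theta = \langle f_1, f_2 \rangle / (\|f_1\| \|f_2\|)$, so it always lies between 0 and $\sqrt{2}$. A minimal sketch, assuming known univariate Gaussian densities (closed-form inner products) rather than densities estimated from samples; the helper names are hypothetical.

```r
inner_gauss <- function(m1, s1, m2, s2) dnorm(m1, mean = m2, sd = sqrt(s1^2 + s2^2))

# L2 distance between f1/||f1|| and f2/||f2||, i.e. sqrt(2 - 2 * cosine)
l2norm_dist_gauss <- function(m1, s1, m2, s2) {
  cosine <- inner_gauss(m1, s1, m2, s2) /
    sqrt(inner_gauss(m1, s1, m1, s1) * inner_gauss(m2, s2, m2, s2))
  sqrt(max(0, 2 - 2 * cosine))
}

# Numerical check: divide each density by its L2 norm, then integrate
l2norm_dist_numeric <- function(m1, s1, m2, s2) {
  n1 <- sqrt(integrate(function(x) dnorm(x, m1, s1)^2, -Inf, Inf)$value)
  n2 <- sqrt(integrate(function(x) dnorm(x, m2, s2)^2, -Inf, Inf)$value)
  integrand <- function(x) (dnorm(x, m1, s1) / n1 - dnorm(x, m2, s2) / n2)^2
  sqrt(integrate(integrand, -Inf, Inf)$value)
}

l2norm_dist_gauss(0, 1, 1, 2)
l2norm_dist_numeric(0, 1, 1, 2)
l2norm_dist_gauss(0, 1, 100, 1)  # close to sqrt(2): the densities barely overlap
```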
The L^2 distance between the two L^2-normed densities.
Be careful! If check = FALSE and one smoothing bandwidth matrix is degenerate, the returned result cannot be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
distl2d
for the distance between two probability densities.
matdistl2dnorm
in order to compute pairwise distances between several L^2-normed densities.
require(MASS)
m1 <- c(0, 0)
v1 <- matrix(c(1, 0, 0, 1), ncol = 2)
m2 <- c(0, 1)
v2 <- matrix(c(4, 1, 1, 9), ncol = 2)
x1 <- mvrnorm(n = 3, mu = m1, Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
distl2dnorm(x1, x2, method = "gaussiand")
distl2dnorm(x1, x2, method = "kern")
distl2dnorm(x1, x2, method = "kern", varw1 = v1, varw2 = v2)
L^2 distance between L^2-normed Gaussian densities given their parameters
Computes the L^2 distance between two multivariate (dimension: $p > 1$) or univariate (dimension: $p = 1$) L^2-normed Gaussian densities, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate), where an L^2-normed probability density is the original probability density function divided by its L^2-norm.
distl2dnormpar(mean1, var1, mean2, var2, check = FALSE)
mean1 , mean2
|
means of the probability densities. |
var1 , var2
|
variances ( |
check |
logical. When If the variables are univariate, it checks if the variances are not zero. |
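Given densities $f_1$ and $f_2$, the function distl2dnormpar computes the L^2 distance between the L^2-normed densities $f_1 / \|f_1\|$ and $f_2 / \|f_2\|$:
$d\left(\frac{f_1}{\|f_1\|}, \frac{f_2}{\|f_2\|}\right) = \sqrt{2 - 2 \, \frac{\langle f_1, f_2 \rangle}{\|f_1\| \, \|f_2\|}}$.
For some information about the method used to compute the inner product or about the arguments, see l2dpar; the norm of the multivariate Gaussian density $f$ with covariance matrix $\Sigma$ is equal to $\|f\| = \frac{1}{2^{p/2} \, \pi^{p/4} \, |\Sigma|^{1/4}}$.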
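With Gaussian parameters everything is in closed form. The sketch below assumes the standard formula for the inner product of two Gaussian densities, $\langle f_1, f_2 \rangle = (2\pi)^{-p/2} |\Sigma_1 + \Sigma_2|^{-1/2} \exp\left(-\tfrac{1}{2} (\mu_1 - \mu_2)^\top (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2)\right)$, together with the norm $\|f\| = 1 / (2^{p/2} \pi^{p/4} |\Sigma|^{1/4})$. It is an illustration with hypothetical helper names, not the code of distl2dnormpar; it reuses the parameters of the distl2dnormpar example.

```r
# <f1, f2> for multivariate Gaussian densities N(m1, v1) and N(m2, v2)
gauss_inner <- function(m1, v1, m2, v2) {
  s <- as.matrix(v1 + v2)
  d <- m1 - m2
  q <- drop(crossprod(d, solve(s, d)))   # (m1 - m2)' (v1 + v2)^{-1} (m1 - m2)
  exp(-q / 2) / sqrt((2 * pi)^length(d) * det(s))
}

# ||f|| = 1 / (2^(p/2) * pi^(p/4) * |v|^(1/4))
gauss_norm <- function(v) {
  v <- as.matrix(v)
  p <- nrow(v)
  1 / (2^(p / 2) * pi^(p / 4) * det(v)^(1 / 4))
}

l2norm_dist_par <- function(m1, v1, m2, v2) {
  cosine <- gauss_inner(m1, v1, m2, v2) / (gauss_norm(v1) * gauss_norm(v2))
  sqrt(max(0, 2 - 2 * cosine))
}

u1 <- c(1, 1, 1); v1 <- diag(c(4, 16, 25))
u2 <- c(0, 1, 0); v2 <- diag(3)
l2norm_dist_par(u1, v1, u2, v2)
```

The norm formula is consistent with the inner product: gauss_norm(v) equals sqrt(gauss_inner(m, v, m, v)) for any mean m.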
The L^2 distance between the two L^2-normed Gaussian densities.
Be careful! If check = FALSE and one variance matrix is degenerate (or one variance is zero if the densities are univariate), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
distl2dpar
for the distance between two probability densities.
matdistl2d
in order to compute pairwise distances between several densities.
u1 <- c(1, 1, 1)
v1 <- matrix(c(4, 0, 0, 0, 16, 0, 0, 0, 25), ncol = 3)
u2 <- c(0, 1, 0)
v2 <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), ncol = 3)
distl2dnormpar(u1, v1, u2, v2)
L^2 distance between Gaussian densities given their parameters
Computes the L^2 distance between two multivariate (dimension: $p > 1$) or univariate (dimension: $p = 1$) Gaussian densities, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate).
distl2dpar(mean1, var1, mean2, var2, check = FALSE)
mean1 , mean2
|
means of the probability densities. |
var1 , var2
|
variances ( |
check |
logical. When If the variables are univariate, it checks if the variances are not zero. |
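The function distl2dpar computes the L^2 distance between two densities, say $f_1$ and $f_2$, from the formula:
$d(f_1, f_2) = \sqrt{\langle f_1, f_1 \rangle + \langle f_2, f_2 \rangle - 2 \langle f_1, f_2 \rangle}$.
For some information about the method used to compute the inner product or about the arguments, see l2dpar.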
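A minimal sketch of the same formula for Gaussian parameters, assuming the standard closed-form inner product of two Gaussian densities, $\langle f_1, f_2 \rangle = (2\pi)^{-p/2} |\Sigma_1 + \Sigma_2|^{-1/2} \exp\left(-\tfrac{1}{2} (\mu_1 - \mu_2)^\top (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2)\right)$. The helper names are hypothetical; a univariate case is cross-checked by numerical integration.

```r
# <f1, f2> for multivariate Gaussian densities N(m1, v1) and N(m2, v2)
gauss_inner <- function(m1, v1, m2, v2) {
  s <- as.matrix(v1 + v2)
  d <- m1 - m2
  q <- drop(crossprod(d, solve(s, d)))   # (m1 - m2)' (v1 + v2)^{-1} (m1 - m2)
  exp(-q / 2) / sqrt((2 * pi)^length(d) * det(s))
}

# L2 distance between N(m1, v1) and N(m2, v2) from the inner-product formula
l2_dist_par <- function(m1, v1, m2, v2) {
  sq <- gauss_inner(m1, v1, m1, v1) + gauss_inner(m2, v2, m2, v2) -
    2 * gauss_inner(m1, v1, m2, v2)
  sqrt(max(0, sq))   # guard against tiny negative rounding errors
}

u1 <- c(1, 1, 1); v1 <- diag(c(4, 16, 25))
u2 <- c(0, 1, 0); v2 <- diag(3)
l2_dist_par(u1, v1, u2, v2)

# Univariate check (v1 = 1, v2 = 4 are variances) against numerical integration
num <- sqrt(integrate(function(x) (dnorm(x, 0, 1) - dnorm(x, 1, 2))^2, -Inf, Inf)$value)
l2_dist_par(0, 1, 1, 4) - num   # close to 0
```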
The distance between the two densities.
Be careful! If check = FALSE and one variance matrix is degenerate (or one variance is zero if the densities are univariate), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2d
in order to compute pairwise distances between several densities.
u1 <- c(1, 1, 1)
v1 <- matrix(c(4, 0, 0, 0, 16, 0, 0, 0, 25), ncol = 3)
u2 <- c(0, 1, 0)
v2 <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), ncol = 3)
distl2dpar(u1, v1, u2, v2)
Contingency tables of the counts of Diploma x Socio-professional group in France
data(dspg)
dspg
is a list of 7 arrays (each one corresponding to a year: 1968, 1975, 1982, 1990, 1999, 2010, 2015) of 4 rows (each one corresponding to a level of diploma) and 6 columns (each one corresponding to a socio-professional group).
csp:
Socio professional group
diplome:
Diploma
agri:
farmer (agriculteur)
arti:
craftsperson (artisan)
cadr:
senior manager (cadre supérieur)
pint:
middle manager (profession intermédiaire)
empl:
employee (employé)
ouvr:
worker (ouvrier)
bepc:
brevet
cap:
NVQ (cap)
bac:
baccalaureate
sup:
higher education (supérieur)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data(dspg)
names(dspg)
print(dspg[[1]])
Contingency tables of the counts of Diploma x Socio-professional group by department of metropolitan France in 2015.
data(dspgd2015)
dspgd2015
is a list of 96 arrays (each one corresponding to a department, designated by its official geographical code) of 4 rows (each one corresponding to a level of diploma) and 6 columns (each one corresponding to a socio-professional group).
csp:
Socio professional group
diplome:
Diploma
agri:
farmer (agriculteur)
arti:
craftsperson (artisan)
cadr:
senior manager (cadre supérieur)
pint:
middle manager (profession intermédiaire)
empl:
employee (employé)
ouvr:
worker (ouvrier)
bepc:
brevet
cap:
NVQ (cap)
bac:
baccalaureate
sup:
higher education (supérieur)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
data(dspgd2015)
names(dspgd2015)
print(dspgd2015[[1]])
Performs the first stage (interstructure) of the dual STATIS method in order to describe a data folder, consisting of $T$ groups of individuals on which $p$ variables are observed. It returns an object of class dstatis.
dstatis.inter(xf, normed = TRUE, centered = TRUE, data.scaled = FALSE, nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, group.name = "group", filename = NULL)
xf |
object of class |
normed |
logical. If |
centered |
logical. If |
data.scaled |
logical. If |
nb.factors |
numeric. Number of returned principal scores (default |
nb.values |
numerical. Number of returned eigenvalues (default |
sub.title |
string. If provided, the subtitle for the graphs. |
plot.eigen |
logical. If |
plot.score |
logical. If |
nscore |
numeric vector. If |
group.name |
string. Name of the grouping variable. Default: |
filename |
string. Name of the file in which the results are saved. By default ( |
The covariance matrices (if data.scaled is FALSE) or correlation matrices (if data.scaled is TRUE) per group are computed. The matrix of the scalar products between these matrices is then computed.
To perform the STATIS method, see the function DSTATIS of the multigroup package.
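A rough base-R sketch of this interstructure stage, under stated assumptions: the scalar product between two group matrices is taken to be the Hilbert-Schmidt product $\mathrm{tr}(V_i V_j)$, normed = TRUE is read as dividing by the norms (so the scalar products become cosines), centering is omitted, and the iris data stand in for a data folder. It is not the code of dstatis.inter.

```r
# Stand-in data folder: the iris data split by species (3 groups, 4 variables)
groups <- split(iris[, 1:4], iris$Species)
mats <- lapply(groups, cov)      # use cor() instead to mimic data.scaled = TRUE
nT <- length(mats)

# Scalar products <V_i, V_j> = trace(V_i V_j) = sum(V_i * V_j) for symmetric matrices
W <- matrix(0, nT, nT, dimnames = list(names(mats), names(mats)))
for (i in seq_len(nT)) {
  for (j in seq_len(nT)) W[i, j] <- sum(mats[[i]] * mats[[j]])
}

# Assumed reading of normed = TRUE: divide by the norms, giving cosines
nrm <- sqrt(diag(W))
cosmat <- W / outer(nrm, nrm)

# Interstructure: eigen-decomposition and principal scores of the groups
eig <- eigen(cosmat, symmetric = TRUE)
scores <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))
rownames(scores) <- names(mats)
round(eig$values / sum(eig$values) * 100, 1)   # percentages of inertia
```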
Returns an object of class dstatis
, that is a list including:
inertia |
data frame of the eigenvalues and percentages of inertia. |
contributions |
data frame of the contributions to the first |
qualities |
data frame of the qualities on the first |
scores |
data frame of the first |
norm |
vector of the |
means |
list of the means. |
variances |
list of the covariance matrices. |
correlations |
list of the correlation matrices. |
skewness |
list of the skewness coefficients. |
kurtosis |
list of the kurtosis coefficients. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18, 97-119.
print.dstatis, plot.dstatis, interpret.dstatis.
data(roses)
rosesf <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
# Dual STATIS on the covariance matrices
result1 <- dstatis.inter(rosesf, data.scaled = FALSE, group.name = "rose")
print(result1)
plot(result1)
# Dual STATIS on the correlation matrices
result2 <- dstatis.inter(rosesf, data.scaled = TRUE, group.name = "rose")
print(result2)
plot(result2)
Computes the leave-one-out misclassification ratio of the rule assigning groups of individuals, one group after another, to the class of groups (among $K$ classes of groups) that achieves the minimum of the distances or divergences between the density function associated with the group to assign and the $K$ density functions associated with the $K$ classes.
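The leave-one-out logic can be sketched in base R for a single variable. This is a simplified illustration, not fdiscd.misclass: it assumes Gaussian fits, a crit = 1 style class density estimated from the pooled rows of the class, and the closed-form L^2 distance between univariate Gaussian densities. The data and all names are hypothetical.

```r
set.seed(1)
# Hypothetical data: 6 groups of 20 individuals, one variable; groups g1-g3 are
# in class "A" (centred near 0), groups g4-g6 in class "B" (centred near 3)
grp_means <- c(0, 0.2, -0.1, 3, 3.2, 2.9)
dat <- data.frame(group = rep(paste0("g", 1:6), each = 20),
                  x = rnorm(120, mean = rep(grp_means, each = 20)))
cls <- c(g1 = "A", g2 = "A", g3 = "A", g4 = "B", g5 = "B", g6 = "B")

# Closed-form L2 distance between univariate Gaussian densities
inner_gauss <- function(m1, s1, m2, s2) dnorm(m1, m2, sqrt(s1^2 + s2^2))
l2 <- function(m1, s1, m2, s2) {
  sqrt(max(0, inner_gauss(m1, s1, m1, s1) + inner_gauss(m2, s2, m2, s2) -
              2 * inner_gauss(m1, s1, m2, s2)))
}

# Leave-one-out: group grp is removed from the training data, then assigned to
# the class whose density (fitted on the pooled rows of the remaining groups
# of that class) is nearest to the density of grp
predicted <- sapply(names(cls), function(grp) {
  xg <- dat$x[dat$group == grp]
  d <- sapply(c("A", "B"), function(k) {
    members <- setdiff(names(cls)[cls == k], grp)
    xk <- dat$x[dat$group %in% members]
    l2(mean(xg), sd(xg), mean(xk), sd(xk))
  })
  names(which.min(d))
})
table(true = cls, predicted = predicted)   # confusion matrix
misclass_ratio <- mean(predicted != cls)
misclass_ratio
```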
fdiscd.misclass(xf, class.var, gaussiand = TRUE, distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"), crit = 1, windowh = NULL)
xf |
object of class
|
class.var |
string. The name of the class variable. |
distance |
The distance or dissimilarity used to compute the distance matrix between the densities. It can be:
If |
crit |
1, 2 or 3. In order to select the densities associated to the classes. See Details. If |
gaussiand |
logical. If If |
windowh |
strictly positive numeric value. If Omitted when |
The probability densities $f_t$ corresponding to the $T$ groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case ($p > 1$), the bandwidths are positive-definite matrices.
If windowh is a numerical value, the matrix bandwidth is of the form $hC$, where $C$ is either the square root of the covariance matrix ($p > 1$) or the standard deviation of the estimated density ($p = 1$).
If windowh = NULL (default), $h$ in the above formula is computed using the bandwidth.parameter function.
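A bandwidth matrix of the form $hC$ can be sketched as follows. This assumes that "square root of the covariance matrix" means the symmetric square root obtained from the eigendecomposition (a Cholesky factor would be another possible reading); the covariance matrix S and the value of h are hypothetical.

```r
# Symmetric square root C of a covariance matrix S, via its eigendecomposition
sqrtm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow(S)) %*% t(e$vectors)
}

S <- matrix(c(4, 1, 1, 9), 2)   # hypothetical estimated covariance matrix (p = 2)
h <- 0.5                        # hypothetical value of windowh
H <- h * sqrtm_sym(S)           # bandwidth matrix h C
H
# Univariate case (p = 1): C is the standard deviation, so the bandwidth is h * sd
```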
To the class $G_k$ consisting of $T_k$ groups is associated the density denoted $g_k$. The crit argument selects the estimation method of the densities $g_k$.
crit=1
The density $g_k$ is estimated using the whole data of this class, that is the rows of x corresponding to the groups of the class $G_k$.
The estimation of the densities $g_k$ uses the same method as the estimation of the $f_t$.
crit=2
The densities $f_t$ are estimated using the corresponding data from x. Then they are averaged to obtain an estimation of the density $g_k$, that is $g_k = \frac{1}{T_k} \sum_{t \in G_k} f_t$.
crit=3
Each previous density $f_t$ is weighted by $n_t$ (the number of rows of x corresponding to $f_t$). Then they are averaged, that is $g_k = \frac{\sum_{t \in G_k} n_t f_t}{\sum_{t \in G_k} n_t}$.
The last two methods are only available for the L^2 distance. If the divergences between densities are computed using the Hellinger or Wasserstein distance or Jeffreys measure, only the first of these methods is available.
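The difference between crit=2 and crit=3 can be sketched with two hypothetical univariate Gaussian fits. Both averages are still probability densities, but they are mixtures rather than Gaussian densities; this is presumably why they are only offered with the L^2 distance, since the other measures treat the densities as Gaussian. All values and names are hypothetical.

```r
# Hypothetical class with two groups: Gaussian fits f_t and group sizes n_t
fits <- list(
  list(mean = 0, sd = 1,   n = 10),
  list(mean = 2, sd = 0.5, n = 30)
)

# crit = 2: unweighted average of the f_t
g_crit2 <- function(x) {
  Reduce(`+`, lapply(fits, function(f) dnorm(x, f$mean, f$sd))) / length(fits)
}

# crit = 3: average of the f_t weighted by n_t
g_crit3 <- function(x) {
  w <- vapply(fits, function(f) f$n, numeric(1))
  Reduce(`+`, Map(function(f, wt) wt * dnorm(x, f$mean, f$sd), fits, w)) / sum(w)
}

g_crit2(2)   # each group weighs the same
g_crit3(2)   # the larger group (n = 30) dominates
integrate(g_crit3, -Inf, Inf)$value   # still integrates to 1
```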
The distance or dissimilarity between the estimated densities is either the L^2 distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.
If it is the L^2 distance (distance="l2" or distance="l2norm"), the densities can be either parametrically estimated or estimated using the Gaussian kernel.
If it is the Hellinger distance (distance="hellinger"), Jeffreys measure (distance="jeffreys") or the Wasserstein distance (distance="wasserstein"), the densities are considered Gaussian and necessarily parametrically estimated.
Returns an object of class fdiscd.misclass
, that is a list including:
classification |
data frame with 4 columns:
|
confusion.mat |
confusion matrix, |
misalloc.per.class |
the misclassification ratio per class, |
misclassed |
the misclassification ratio, |
distances |
matrix with |
proximities |
matrix of the proximity indices (in percents) between the groups and the classes. The proximity of the group |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
castles.stones <- castles.dated$stones
castles.periods <- castles.dated$periods
castlesfh <- folderh(castles.periods, "castle", castles.stones)
result <- fdiscd.misclass(castlesfh, "period")
print(result)
Assigns several groups of individuals, one group after another, to the class of groups (among $K$ classes of groups) that achieves the minimum of the distances or divergences between the density function associated with the group to assign and the $K$ density functions associated with the $K$ classes.
fdiscd.predict(xf, class.var, gaussiand = TRUE, distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"), crit = 1, windowh = NULL, misclass.ratio = FALSE)
xf |
object of class
Notice that for the versions earlier than 2.0, fdiscd.predict applied to two data frames. |
class.var |
string. The name of the class variable. |
distance |
The distance or divergence used to compute the distance matrix between the densities. It can be:
If |
crit |
1, 2 or 3. In order to select the densities associated to the classes. See Details. If |
gaussiand |
logical. If If |
windowh |
strictly positive number. If Omitted when |
misclass.ratio |
logical (default |
To the group $t$ is associated the density denoted $f_t$. To the class $G_k$ consisting of $T_k$ groups is associated the density denoted $g_k$. The crit argument selects the estimation method of the densities $g_k$.
crit=1
The density $g_k$ is estimated using the whole data of this class, that is the rows of x corresponding to the groups of the class $G_k$.
crit=2
The densities $f_t$ are estimated using the corresponding data from x. Then they are averaged to obtain an estimation of the density $g_k$, that is $g_k = \frac{1}{T_k} \sum_{t \in G_k} f_t$.
crit=3
Each previous density $f_t$ is weighted by $n_t$ (the number of rows of x corresponding to $f_t$). Then they are averaged, that is $g_k = \frac{\sum_{t \in G_k} n_t f_t}{\sum_{t \in G_k} n_t}$.
The last two methods are available only for the L^2 distance. If the divergences between densities are computed using the Hellinger or Wasserstein distance or Jeffreys measure, only the first of these methods is available.
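The assignment rule itself (pick the class whose density is nearest) can be sketched for univariate Gaussian densities with the Hellinger distance. This sketch assumes the convention $H = \sqrt{1 - BC}$, where $BC$ is the Bhattacharyya coefficient; dad may scale the Hellinger distance differently, which would not change which class is nearest. The class and group parameters are hypothetical.

```r
# Bhattacharyya coefficient between two univariate Gaussian densities
bc_gauss <- function(m1, s1, m2, s2) {
  v <- s1^2 + s2^2
  sqrt(2 * s1 * s2 / v) * exp(-(m1 - m2)^2 / (4 * v))
}
# Hellinger distance, assumed convention sqrt(1 - BC)
hellinger_gauss <- function(m1, s1, m2, s2) sqrt(max(0, 1 - bc_gauss(m1, s1, m2, s2)))

# Hypothetical class densities g_k (one Gaussian fit per class, as with crit = 1)
classes <- list(A = c(mean = 0, sd = 1), B = c(mean = 3, sd = 1.5))
# Hypothetical groups of unknown class, summarised by their Gaussian fits
new_groups <- list(u1 = c(mean = 0.4, sd = 0.9), u2 = c(mean = 2.6, sd = 1.2))

assign_class <- function(g) {
  d <- vapply(classes, function(ck) {
    hellinger_gauss(g[["mean"]], g[["sd"]], ck[["mean"]], ck[["sd"]])
  }, numeric(1))
  names(which.min(d))   # class achieving the minimum distance
}
prediction <- vapply(new_groups, assign_class, character(1))
prediction   # u1 -> "A", u2 -> "B"
```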
Returns an object of class fdiscd.predict
, that is a list including:
prediction |
data frame with 3 columns:
|
distances |
matrix with |
proximities |
matrix of the proximities (in percents). The proximity of a group |
confusion.mat |
the confusion matrix (if |
misclassed |
the misclassification ratio (if |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
data(castles.nondated)
castles.stones <- rbind(castles.dated$stones, castles.nondated$stones)
castles.periods <- rbind(castles.dated$periods, castles.nondated$periods)
castlesfh <- folderh(castles.periods, "castle", castles.stones)
# With the L^2-distance
# - crit=1
resultl2.1 <- fdiscd.predict(castlesfh, "period", distance = "l2", crit = 1)
print(resultl2.1)
# - crit=2
## Not run:
resultl2.2 <- fdiscd.predict(castlesfh, "period", distance = "l2", crit = 2)
print(resultl2.2)
## End(Not run)
# - crit=3
resultl2.3 <- fdiscd.predict(castlesfh, "period", distance = "l2", crit = 3)
print(resultl2.3)
# With the Hellinger distance
resulthelling <- fdiscd.predict(castlesfh, "period", distance = "hellinger")
print(resulthelling)
# With Jeffreys measure
resultjeff <- fdiscd.predict(castlesfh, "period", distance = "jeffreys")
print(resultjeff)
Performs functional hierarchical cluster analysis of probability densities. It returns an object of class fhclustd. It applies hclust to the distance matrix between the densities.
fhclustd(xf, group.name = "group", gaussiand = TRUE, distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"), windowh=NULL, data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE, sub.title = "", filename = NULL, method.hclust = "complete")
xf |
object of class
|
group.name |
string.
|
gaussiand |
logical. If If |
distance |
The distance or divergence used to compute the distance matrix between the densities. It can be:
If |
windowh |
either a list of Omitted when |
data.centered |
logical. If |
data.scaled |
logical. If |
common.variance |
logical. If |
sub.title |
string. If provided, the subtitle for the graphs. |
filename |
string. Name of the file in which the results are saved. By default ( |
method.hclust |
the agglomeration method to be used for the clustering. See the |
In order to compute the distances/dissimilarities between the groups, the probability densities $f_t$ corresponding to the $T$ groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case ($p > 1$), the bandwidths are positive-definite matrices.
The distances between the $T$ groups of individuals are given by the distances or dissimilarities (selected by the distance argument) between the $T$ probability densities $f_t$ corresponding to these groups. The hclust function is then applied to the distance matrix to perform the hierarchical clustering on the groups.
If windowh is a numerical value, the matrix bandwidth is of the form $hC$, where $C$ is either the square root of the covariance matrix ($p > 1$) or the standard deviation of the estimated density ($p = 1$).
If windowh = NULL (default), $h$ in the above formula is computed using the bandwidth.parameter function.
The distance or dissimilarity between the estimated densities is either the L^2 distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.
If it is the L^2 distance (distance="l2" or distance="l2norm"), the densities can be either parametrically estimated or estimated using the Gaussian kernel.
If it is the Hellinger distance (distance="hellinger"), Jeffreys measure (distance="jeffreys") or the Wasserstein distance (distance="wasserstein"), the densities are considered Gaussian and necessarily parametrically estimated.
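The two steps (distance matrix between densities, then hclust) can be sketched in base R with Jeffreys measure between univariate Gaussian densities, using the standard closed form of the symmetrised Kullback-Leibler divergence. The group fits are hypothetical, and this is an illustration rather than the code of fhclustd. Jeffreys measure does not satisfy the triangle inequality, but hclust accepts any dissimilarity.

```r
# Jeffreys measure (symmetrised Kullback-Leibler divergence) between
# N(m1, s1^2) and N(m2, s2^2), in closed form
jeffreys_gauss <- function(m1, s1, m2, s2) {
  v1 <- s1^2
  v2 <- s2^2
  0.5 * (v1 / v2 + v2 / v1 - 2) + 0.5 * (m1 - m2)^2 * (1 / v1 + 1 / v2)
}

# Hypothetical Gaussian fits (mean, standard deviation) of 5 groups
fits <- data.frame(mean = c(0, 0.1, 0.2, 3, 3.1), sd = c(1, 1.1, 0.9, 1, 1.2),
                   row.names = paste0("g", 1:5))
n <- nrow(fits)
D <- matrix(0, n, n, dimnames = list(rownames(fits), rownames(fits)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- jeffreys_gauss(fits$mean[i], fits$sd[i], fits$mean[j], fits$sd[j])
  }
}

# "complete" is also the default method.hclust of fhclustd
clust <- hclust(as.dist(D), method = "complete")
cutree(clust, k = 2)   # g1, g2, g3 versus g4, g5
```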
Returns an object of class fhclustd
, that is a list including:
distances |
matrix of the |
clust |
an object of class |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
fdiscd.predict, fdiscd.misclass
data(castles.dated)
stones <- castles.dated$stones
periods <- castles.dated$periods
periods123 <- periods[periods$period %in% 1:3, "castle"]
stones123 <- stones[stones$castle %in% periods123, ]
stones123$castle <- as.factor(as.character(stones123$castle))
yf <- as.folder(stones123)
# Jeffreys measure (default):
resultjef <- fhclustd(yf)
print(resultjef)
print(resultjef, dist.print = TRUE)
plot(resultjef)
plot(resultjef, hang = -1)
# Use cutree (stats package) to get the partition
cutree(resultjef$clust, k = 1:4)
cutree(resultjef$clust, k = 5)
cutree(resultjef$clust, h = 0.041)
# Applied to a data frame (Jeffreys measure):
resultjefdf <- fhclustd(stones123, group.name = "castle")
# Use cutree (stats package) to get the partition
cutree(resultjefdf$clust, k = 1:4)
cutree(resultjefdf$clust, k = 5)
cutree(resultjefdf$clust, h = 0.041)
# Hellinger distance:
resulthel <- fhclustd(yf, distance = "hellinger")
print(resulthel)
print(resulthel, dist.print = TRUE)
plot(resulthel)
plot(resulthel, hang = -1)
# Use cutree (stats package) to get the partition
cutree(resulthel$clust, k = 1:4)
cutree(resulthel$clust, k = 5)
cutree(resulthel$clust, h = 0.041)
## Not run:
# L2-distance:
xf <- as.folder(stones)
result <- fhclustd(xf, distance = "l2")
print(result)
print(result, dist.print = TRUE)
plot(result)
plot(result, hang = -1)
# Use cutree (stats package) to get the partition
cutree(result$clust, k = 1:5)
cutree(result$clust, k = 5)
cutree(result$clust, h = 0.18)
## End(Not run)
# L2-distance on the castles of periods 1 to 3 (yf defined above):
result123 <- fhclustd(yf, distance = "l2")
print(result123)
print(result123, dist.print = TRUE)
plot(result123)
plot(result123, hang = -1)
# Use cutree (stats package) to get the partition
cutree(result123$clust, k = 1:4)
cutree(result123$clust, k = 5)
cutree(result123$clust, h = 0.041)
These data were collected on eight rosebushes of four varieties during summer 2010 in Angers, France. They give measurements of flowering.
data("floribundity")
floribundity
is a list of 16 data frames, each corresponding to an observation date. Each one of these data frames has 3 or 4 columns:
rose
: the number of the rosebush, that is an identifier.
variety
: factor. The variety of the rosebush.
area
(when available): numeric. The ratio of flowering area to the whole plant area, measured on the photograph of the rosebush.
nflowers
(when available): integer. The number of flowers on the rosebush.
The row names of these data frames are the rose identifiers.
data(floribundity)
foldt <- foldert(floribundity, times = as.Date(names(floribundity)), rows.select = "union")
summary(foldt)
Applies the multidimensional scaling (MDS) method to probability densities in order to describe a data folder, consisting of $T$ groups of individuals on which $p$ variables are observed. It returns an object of class fmdsd. It applies cmdscale to the distance matrix between the densities.
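The final step (classical MDS on a distance matrix between densities) can be sketched with base R's cmdscale. The sketch assumes univariate Gaussian fits and the closed-form L^2 distance between Gaussian densities; add = TRUE mirrors the default of fmdsd's add argument, which cmdscale implements as an additive constant making the dissimilarities Euclidean. The fits are hypothetical.

```r
inner_gauss <- function(m1, s1, m2, s2) dnorm(m1, m2, sqrt(s1^2 + s2^2))
l2 <- function(m1, s1, m2, s2) {
  sqrt(max(0, inner_gauss(m1, s1, m1, s1) + inner_gauss(m2, s2, m2, s2) -
              2 * inner_gauss(m1, s1, m2, s2)))
}

# Hypothetical Gaussian fits (mean, sd) of 4 groups
fits <- data.frame(mean = c(0, 1, 2, 5), sd = c(1, 1, 2, 1))
n <- nrow(fits)
D <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j) {
  l2(fits$mean[i], fits$sd[i], fits$mean[j], fits$sd[j])
}))

# Classical MDS on the distance matrix between the densities
mds <- cmdscale(as.dist(D), k = 2, eig = TRUE, add = TRUE)
mds$points   # coordinates of the 4 densities on the first 2 axes
mds$eig      # eigenvalues
```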
fmdsd(xf, group.name = "group", gaussiand = TRUE, distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"), windowh=NULL, data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE, add = TRUE, nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, filename = NULL)
xf |
object of class
|
group.name |
string.
|
gaussiand |
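logical. If TRUE (default), the probability densities are assumed Gaussian and parametrically estimated. If FALSE, they are estimated using the Gaussian kernel method. |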
distance |
The distance or divergence used to compute the distance matrix between the densities. If
If |
windowh |
either a list of Omitted when |
data.centered |
logical. If |
data.scaled |
logical. If |
common.variance |
logical. If |
add |
logical indicating whether an additive constant should be computed and added to the non-diagonal dissimilarities so that the modified dissimilarities are Euclidean (default TRUE). |
nb.factors |
numeric. Number of returned principal coordinates (default Warning: The |
nb.values |
numeric. Number of returned eigenvalues (default 10). |
sub.title |
string. Subtitle for the graphs (default "", no subtitle). |
plot.eigen |
logical. If |
plot.score |
logical. If |
nscore |
numeric vector. If |
filename |
string. Name of the file in which the results are saved. By default ( |
In order to compute the distances/dissimilarities between the groups, the probability densities $f_t$ corresponding to the $T$ groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case ($p > 1$), the bandwidths are positive-definite matrices.
If windowh is a numerical value, the matrix bandwidth is of the form $hS$, where $S$ is either the square root of the covariance matrix ($p > 1$) or the standard deviation of the estimated density.
If windowh = NULL (default), $h$ in the above formula is computed using the bandwidth.parameter function.
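A Gaussian kernel estimate with a matrix bandwidth of the form $hS$ can be sketched in base R as follows. This is an illustrative sketch, not dad's code: the helper kde_matrix_bw is hypothetical, $S$ is taken as the symmetric square root of the sample covariance matrix, the kernel covariance is assumed to be $(hS)^2$ (so that in the univariate case the kernel standard deviation is $h$ times the sample standard deviation), and the value of $h$ is chosen arbitrarily rather than by bandwidth.parameter.

```r
# Hypothetical helper: Gaussian kernel density estimate at point x0,
# with bandwidth matrix H = h * S
kde_matrix_bw <- function(data, x0, h) {
  data <- as.matrix(data)
  p <- ncol(data)
  # S: symmetric square root of the covariance matrix (the sd when p = 1)
  e <- eigen(cov(data), symmetric = TRUE)
  S <- e$vectors %*% diag(sqrt(e$values), p) %*% t(e$vectors)
  H <- h * S
  # Assumption: the covariance matrix of each kernel is H %*% H
  Sigma <- H %*% H
  Sinv <- solve(Sigma)
  cst <- 1 / sqrt((2 * pi)^p * det(Sigma))
  # Average of the Gaussian kernels centred on each observation
  dens <- apply(data, 1, function(xi) {
    u <- x0 - xi
    cst * exp(-0.5 * drop(t(u) %*% Sinv %*% u))
  })
  mean(dens)
}

set.seed(2)
xy <- cbind(rnorm(50), rnorm(50, sd = 2))
kde_matrix_bw(xy, x0 = c(0, 0), h = 0.6)   # h chosen arbitrarily here
```

In the univariate case the estimate reduces to an average of dnorm values with standard deviation h * sd(x), which is a convenient way to check it.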
The distance or dissimilarity between the estimated densities is either the $L^2$ distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.
If it is the L^2
distance (distance="l2"
or distance="l2norm"
), the densities can be either parametrically estimated or estimated using the Gaussian kernel.
If it is the Hellinger distance (distance="hellinger"
), Jeffreys measure (distance="jeffreys"
) or the Wasserstein distance (distance="wasserstein"
), the densities are considered Gaussian and necessarily parametrically estimated.
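For univariate Gaussian densities, several of these measures have closed forms, which helps to see what each one compares. The sketch below is illustrative only: the helper names are hypothetical, and dad works with multivariate densities and may use different normalising conventions. For Jeffreys measure the log terms of the two Kullback-Leibler divergences cancel; for the $L^2$ distance, $\int f_1 f_2$ is the normal density of $m_1 - m_2$ with variance $s_1^2 + s_2^2$.

```r
# Hypothetical helpers for N(m1, s1^2) and N(m2, s2^2)

# Jeffreys measure: KL(f1 || f2) + KL(f2 || f1)
jeffreys_gauss <- function(m1, s1, m2, s2) {
  d2 <- (m1 - m2)^2
  (s1^2 + d2) / (2 * s2^2) + (s2^2 + d2) / (2 * s1^2) - 1
}

# Wasserstein-2 distance between two univariate Gaussians
wasserstein_gauss <- function(m1, s1, m2, s2) {
  sqrt((m1 - m2)^2 + (s1 - s2)^2)
}

# L2 distance: ||f1 - f2||^2 = <f1, f1> + <f2, f2> - 2 <f1, f2>
l2_gauss <- function(m1, s1, m2, s2) {
  ip <- function(ma, sa, mb, sb) dnorm(ma, mb, sqrt(sa^2 + sb^2))
  sqrt(max(0, ip(m1, s1, m1, s1) + ip(m2, s2, m2, s2) - 2 * ip(m1, s1, m2, s2)))
}

jeffreys_gauss(0, 1, 1, 1)    # 1
wasserstein_gauss(0, 1, 3, 2)
l2_gauss(0, 1, 1, 2)
```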
Returns an object of class fmdsd
, i.e. a list including:
inertia |
data frame of the eigenvalues and percentages of inertia. |
scores |
data frame of the |
means |
list of the means. |
variances |
list of the covariance matrices. |
correlations |
list of the correlation matrices. |
skewness |
list of the skewness coefficients. |
kurtosis |
list of the kurtosis coefficients. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling, second ed. Chapman & Hall/CRC.
fpcad print.fmdsd, plot.fmdsd, interpret.fmdsd, bandwidth.parameter
data(roses)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
# MDS on Gaussian densities (on sensory data)
# using Jeffreys measure (default):
resultjeff <- fmdsd(rosesf, distance = "jeffreys")
print(resultjeff)
plot(resultjeff)
## Not run:
# Applied to a data frame:
resultjeffdf <- fmdsd(roses[,c("Sha","Den","Sym","rose")], distance = "jeffreys", group.name = "rose")
print(resultjeffdf)
plot(resultjeffdf)
## End(Not run)
# using the Hellinger distance:
resulthellin <- fmdsd(rosesf, distance = "hellinger")
print(resulthellin)
plot(resulthellin)
# using the Wasserstein distance:
resultwass <- fmdsd(rosesf, distance = "wasserstein")
print(resultwass)
plot(resultwass)
# Gaussian case, using the L2-distance:
resultl2 <- fmdsd(rosesf, distance = "l2")
print(resultl2)
plot(resultl2)
# Gaussian case, using the L2-distance between normed densities:
resultl2norm <- fmdsd(rosesf, distance = "l2norm")
print(resultl2norm)
plot(resultl2norm)
## Not run:
# Non Gaussian case, using the L2-distance,
# the densities are estimated using the Gaussian kernel method:
result <- fmdsd(rosesf, distance = "l2", gaussiand = FALSE, group.name = "rose")
print(result)
plot(result)
## End(Not run)
Creates an object of class "folder"
(called folder below), that is a list of data frames with the same column names. Thus, these data sets are on the same variables. They can be on the same individuals or not.
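The requirement that all data frames share the same column names can be mimicked with base R list operations. This is a hypothetical sketch (same_columns is not a dad function): it returns a plain list rather than an object of class folder, and filling the columns absent from a data frame with NA under "union" is an assumption about the behaviour of cols.select.

```r
# Hypothetical helper illustrating cols.select = "intersect" / "union":
# every data frame of the result ends up with the same column names
same_columns <- function(dfs, cols.select = c("intersect", "union")) {
  cols.select <- match.arg(cols.select)
  cols <- Reduce(if (cols.select == "intersect") intersect else union,
                 lapply(dfs, names))
  lapply(dfs, function(d) {
    absent <- setdiff(cols, names(d))
    # Columns absent from this data frame are filled with NA
    if (length(absent) > 0) d[absent] <- NA
    d[cols]
  })
}

x1 <- data.frame(x = c(1, 2, 3), y = 1:3)
x2 <- data.frame(x = c(4, 5), z = c(0.1, 0.2))
lapply(same_columns(list(x1, x2), "union"), names)
```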
folder(x1, x2 = NULL, ..., cols.select = "intersect", rows.select = "")
x1 |
data frame (can also be a tibble) or list of data frames.
|
x2 |
data frame. Must be provided if |
... |
optional. One or several data frames. When |
cols.select |
string. Gives the method used to choose the column names of the data frames of the folder. This argument can be:
If |
rows.select |
string. Gives the method used to choose the row names of the data frames of the folder. This argument can be:
|
The class folder
has a logical attribute attr(,"same.rows")
.
The data frames in the returned folder all have the same column names. That means that the same variables are observed in every data set.
If the rows.select
argument is "union"
or "intersect"
, the elements of the returned folder have the same rows. That means that the same individuals are present in every data set. This makes it possible to follow the evolution of each individual over time.
If rows.select
is ""
, all the rows of this folder are considered different, and the row names are made unique by adding the name of the data frame to the row names. In this case, the individuals of the data sets are assumed to be all different, or at least the user does not mind whether they are the same or not.
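The three rows.select behaviours can be sketched in base R. align_rows is a hypothetical helper returning a plain list; filling absent individuals with NA under "union" and the "name.rowname" pattern used under "" are assumptions about the exact output, not the folder implementation.

```r
# Hypothetical helper (not part of dad) illustrating the rows.select options
align_rows <- function(dfs, rows.select = "") {
  stopifnot(rows.select %in% c("", "union", "intersect"))
  if (rows.select == "") {
    # Rows unchanged; row names made unique by prefixing the data frame name
    nm <- if (is.null(names(dfs))) paste0("df", seq_along(dfs)) else names(dfs)
    for (k in seq_along(dfs)) {
      rownames(dfs[[k]]) <- paste(nm[k], rownames(dfs[[k]]), sep = ".")
    }
    return(dfs)
  }
  # Same individuals in every data frame: union or intersection of row names
  rows <- Reduce(if (rows.select == "union") union else intersect,
                 lapply(dfs, rownames))
  lapply(dfs, function(d) {
    out <- d[rows, , drop = FALSE]  # an absent row name yields a row of NA
    rownames(out) <- rows
    out
  })
}

a <- data.frame(v = 1:3, row.names = c("i1", "i2", "i3"))
b <- data.frame(v = 4:5, row.names = c("i2", "i4"))
align_rows(list(a = a, b = b), "union")
```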
Returns an object of class "folder"
, that is a list of data frames.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
is.folder
to test if an object is of class folder
.
folderh
to build a folder of several data frames with a hierarchic relation between each pair of consecutive data frames.
# First example
x1 <- data.frame(x = rnorm(10), y = 1:10)
x2 <- data.frame(x = rnorm(10), z = runif(10, 1, 10))
f1 <- folder(x1, x2)
print(f1)
f2 <- folder(x1, x2, cols.select = "union")
print(f2)
# Second example
data(iris)
iris.set <- iris[iris$Species == "setosa", 1:4]
iris.ver <- iris[iris$Species == "versicolor", 1:4]
iris.vir <- iris[iris$Species == "virginica", 1:4]
irisf1 <- folder(iris.set, iris.ver, iris.vir)
print(irisf1)
listofdf <- list(df1 = iris.set, df2 = iris.ver, df3 = iris.vir)
irisf2 <- folder(listofdf, x2 = NULL)
print(irisf2)
Creates an object of class folderh
, that is a list of data frames whose rows are related by (n-1) keys, each key defining a relation "1 to N" between the two adjacent data frames passed as arguments of the function.
folderh(df1, key1, df2, ..., na.rm = TRUE)
df1 |
data frame (can also be a tibble) with at least two columns. It contains a factor (whose name is given by |
key1 |
character string. The name of the factor of the data frames |
df2 |
data frame (or tibble) with at least two columns. It contains a factor column (named by |
... |
optional. One or several supplementary character strings and data frames, ordered as follows: |
na.rm |
logical. If |
The object of class folderh
is a list of data frames.
If no optional arguments are given via ..., that is $n = 2$, the two data frames of the list have a column named by the attribute attr(, "keys") (argument key1), which is a factor with the same levels. Each one of these levels occurs exactly once in the first data frame of the list.
If some supplementary data frames and supplementary strings key2, df3, ... are given as optional arguments, $n$ is the number of data frames given as arguments. Then the attribute attr(, "keys") is a vector of character strings. For $i = 1, \dots, n-1$, its $i$-th element is the name of a column of the $i$-th and $(i+1)$-th data frames of the folderh, which are factors with the same levels. Each one of these levels occurs exactly once in the $i$-th data frame.
If there are more than two data frames, folderh
computes a folderh with the last two data frames, and then uses the function appendtofolderh
to append each one of the other data frames to the folderh.
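The "1 to N" relation between two adjacent data frames can be checked and exploited in base R. This is a hypothetical sketch (check_one_to_n and the toy data are not part of dad): it tests that each key value occurs exactly once in the first data frame and, as an additional assumption, that every key value of the second data frame refers to a row of the first, then joins them with merge.

```r
# Hypothetical helper: check the "1 to N" relation between df1 and df2
# on the column named by key, then join the two data frames
check_one_to_n <- function(df1, key, df2) {
  k1 <- as.character(df1[[key]])
  k2 <- as.character(df2[[key]])
  # Each level must occur exactly once in the first data frame ...
  if (anyDuplicated(k1) > 0) stop("key values are not unique in df1")
  # ... and (assumption) every key value of df2 must refer to a row of df1
  if (!all(k2 %in% k1)) stop("some key values of df2 are absent from df1")
  merge(df1, df2, by = key)
}

varieties <- data.frame(rose = c("r1", "r2"), variety = c("A", "B"))
flowers <- data.frame(rose = c("r1", "r1", "r2"), diameter = c(5.1, 4.8, 6.0))
check_one_to_n(varieties, "rose", flowers)
```

Swapping the two data frames fails, since "r1" occurs twice in flowers: the order of the arguments carries the direction of the relation.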
Returns an object of class folderh
. Its elements are the data frames passed as arguments, and the attribute attr(, "keys")
contains the character arguments.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
is.folderh
to test if an object is of class folderh
.
folder
for a folder of data frames with no hierarchic relation between them.
as.folder.folderh
(or as.data.frame.folderh
) to build an object of class folder
(or a data frame) from an object of class folderh
.
# First example: rose flowers
data(roseflowers)
df1 <- roseflowers$variety
df2 <- roseflowers$flower
fh1 <- folderh(df1, "rose", df2)
print(fh1)
# Second example
data(roseleaves)
roses <- roseleaves$rose
stems <- roseleaves$stem
leaves <- roseleaves$leaf
leaflets <- roseleaves$leaflet
fh2 <- folderh(roses, "rose", stems, "stem", leaves, "leaf", leaflets)
print(fh2)
An object of S3 class "foldermtg" is built and returned by the function read.mtg
.
An object of this S3 class is a list of at least 5 data frames (see the Value section in read.mtg
):
classes
, description
, features
, topology
, coordinates
...
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
print.foldermtg
mtgorder
mtgfile1 <- system.file("extdata/plant1.mtg", package = "dad")
x1 <- read.mtg(mtgfile1)
print(x1)
mtgfile2 <- system.file("extdata/plant2.mtg", package = "dad")
x2 <- read.mtg(mtgfile2)
print(x2)
Creates an object of class "foldert"
(called foldert below), that is a list of data frames, each of them corresponding to a time of observation. These data sets are on the same variables. They can be on the same individuals or not.
foldert(x1, x2 = NULL, ..., times = NULL, cols.select = "intersect", rows.select = "")
x1 |
data frame (can also be a tibble) or list of data frames.
|
x2 |
data frame. Must be provided if |
... |
optional. One or several data frames when |
times |
Vector of the “times” of observations. It can be either numeric, or an ordered factor or an object of class So there is an order relationship between these times. |
cols.select |
string or character vector. Gives the method used to choose the column names of the data frames of the foldert. This argument can be:
If |
rows.select |
string. Gives the method used to choose the row names of the data frames of the foldert. This argument can be:
|
The class "foldert"
has an attribute attr(,"times")
(the times
argument, when provided) and a logical attribute
attr(,"same.rows")
.
The data frames in the returned foldert all have the same column names. That means that the same variables are observed in every data set.
If the rows.select
argument is "union"
or "intersect"
, the elements of the returned foldert have the same rows. That means that the same individuals are present in every data set. This makes it possible to follow the evolution of each individual over time.
If rows.select
is ""
, all the rows of this foldert are considered different, and the row names are made unique by adding the name of the data frame to the row names. In this case, the individuals of the data sets are assumed to be all different, or at least the user does not mind whether they are the same or not.
Returns an object of class "foldert"
, that is a list of data frames. The elements of this list are ordered according to time.
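Ordering the elements according to time amounts to sorting the list by the times argument. The snippet below is a hypothetical base-R illustration on a plain list (not an object of class foldert); storing the sorted times in a "times" attribute mirrors the attribute described above.

```r
# Hypothetical illustration: a list of data frames reordered by observation time
dfs <- list(may = data.frame(v = 3), march = data.frame(v = 1),
            april = data.frame(v = 2))
times <- as.Date(c("2018-05-01", "2018-03-01", "2018-04-01"))
o <- order(times)                 # chronological order of the observations
dfs_sorted <- dfs[o]
attr(dfs_sorted, "times") <- times[o]
names(dfs_sorted)                 # "march" "april" "may"
```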
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
is.foldert
to test if an object is of class foldert
.
as.foldert.data.frame
: build an object of class foldert
from a data frame.
as.foldert.array
: build an object of class foldert
from an array.
x <- data.frame(xyz = rep(c("A", "B", "C"), each = 2), xy = letters[1:6], x1 = rnorm(6), x2 = rnorm(6, 2, 1), row.names = paste0("i", 1:6), stringsAsFactors = TRUE)
y <- data.frame(xyz = c("A", "A", "B", "C"), xy = c("a", "b", "a", "c"), y1 = rnorm(4, 4, 2), row.names = c(paste0("i", c(1, 2, 4, 6))), stringsAsFactors = TRUE)
z <- data.frame(xyz = c("A", "B", "C"), z1 = rnorm(3), row.names = c("i1", "i2", "i5"), stringsAsFactors = TRUE)
# Columns selected by the user
ftc. <- foldert(x, y, z, cols.select = c("xyz", "x1", "y1", "z1"))
print(ftc.)
# cols.select = "union": all the variables (columns) of each data frame are kept
ftcun <- foldert(x, y, z, cols.select = "union")
print(ftcun)
# cols.select = "intersect": only variables common to all data frames
ftcint <- foldert(x, y, z, cols.select = "intersect")
print(ftcint)
# rows.select = "": the rows of the data frames are unchanged
# and the rownames are made unique
ftr. <- foldert(x, y, z, rows.select = "")
print(ftr.)
# rows.select = "union": all the individuals (rows) of each data frame are kept
ftrun <- foldert(x, y, z, rows.select = "union")
print(ftrun)
# rows.select = "intersect": only individuals common to all data frames
ftrint <- foldert(x, y, z, rows.select = "intersect")
print(ftrint)
# Define the times (times argument)
ftimes <- foldert(x, y, z, times = as.Date(c("2018-03-01", "2018-04-01", "2018-05-01")))
print(ftimes)
Performs functional principal component analysis of probability densities in order to describe a data folder consisting of $T$ groups of individuals on which $p$ variables are observed. It returns an object of class
fpcad
.
fpcad(xf, group.name = "group", gaussiand = TRUE, windowh = NULL, normed = TRUE, centered = TRUE, data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE, nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, filename = NULL)
xf |
object of class
|
group.name |
string.
|
gaussiand |
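logical. If TRUE (default), the probability densities are assumed Gaussian and parametrically estimated. If FALSE, they are estimated using the Gaussian kernel method. |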
windowh |
either a list of |
normed |
logical. If |
centered |
logical. If |
data.centered |
logical. If |
data.scaled |
logical. If |
common.variance |
logical. If |
nb.factors |
numeric. Number of returned principal scores (default Warning: The |
nb.values |
numeric. Number of returned eigenvalues (default 10). |
sub.title |
string. If provided, the subtitle for the graphs. |
plot.eigen |
logical. If |
plot.score |
logical. If |
nscore |
numeric vector. If |
filename |
string. Name of the file in which the results are saved. By default ( |
The probability densities $f_t$ corresponding to the $T$ groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Notice that in the multivariate case ($p > 1$), the bandwidths are positive-definite matrices.
If windowh is a numerical value, the matrix bandwidth is of the form $hS$, where $S$ is either the square root of the covariance matrix ($p > 1$) or the standard deviation of the estimated density.
If windowh = NULL (default), $h$ in the above formula is computed using the bandwidth.parameter function.
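The idea behind a PCA of densities (Delicado, 2011; Boumaza et al., 2015) is an eigendecomposition of the matrix of $L^2$ inner products between the densities. The sketch below assumes univariate Gaussian densities, for which $\langle f_i, f_j \rangle$ is the normal density of $m_i - m_j$ with variance $s_i^2 + s_j^2$. It mimics the normed and centered options by dividing each density by its $L^2$ norm and by double-centring the inner-product matrix; pca_gauss_densities is hypothetical and is not the fpcad code.

```r
# Hypothetical sketch: PCA of univariate Gaussian densities N(m_i, s_i^2)
# based on the L2 inner products <f_i, f_j>
pca_gauss_densities <- function(m, s, normed = TRUE, centered = TRUE, k = 2) {
  n <- length(m)
  W <- outer(seq_len(n), seq_len(n),
             function(i, j) dnorm(m[i], m[j], sqrt(s[i]^2 + s[j]^2)))
  if (normed) {
    # Each density divided by its L2 norm: W_ij / (||f_i|| ||f_j||)
    nrm <- sqrt(diag(W))
    W <- W / outer(nrm, nrm)
  }
  if (centered) {
    # Densities centred on their mean: double centring of the Gram matrix
    J <- diag(n) - matrix(1 / n, n, n)
    W <- J %*% W %*% J
  }
  e <- eigen(W, symmetric = TRUE)
  vals <- pmax(e$values, 0)         # remove tiny negative rounding errors
  # Principal scores: eigenvectors scaled by the square roots of eigenvalues
  scores <- e$vectors[, seq_len(k), drop = FALSE] %*%
    diag(sqrt(vals[seq_len(k)]), k)
  list(eigenvalues = vals, inertia = 100 * vals / sum(vals), scores = scores)
}

res <- pca_gauss_densities(m = c(0, 0.5, 2, 3), s = c(1, 1.2, 0.8, 1.5))
round(res$inertia, 2)
```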
Returns an object of class fpcad
, that is a list including:
inertia |
data frame of the eigenvalues and percentages of inertia. |
contributions |
data frame of the contributions to the first |
qualities |
data frame of the qualities on the first |
scores |
data frame of the first |
norm |
vector of the |
means |
list of the means. |
variances |
list of the covariance matrices. |
correlations |
list of the correlation matrices. |
skewness |
list of the skewness coefficients. |
kurtosis |
list of the kurtosis coefficients. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
print.fpcad, plot.fpcad, interpret.fpcad, bandwidth.parameter
data(roses)
# Case of a normed and centred (default) PCA of Gaussian densities (on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym))
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
result3 <- fpcad(rosesf, group.name = "rose")
print(result3)
plot(result3)
# Applied to a data frame:
result3df <- fpcad(roses[,c("Sha","Den","Sym","rose")], group.name = "rose")
print(result3df)
plot(result3df)
# Flower colors of the roses
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
colours <- scores$rose
colours <- factor(c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red", F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow"))
levels(scores$color) <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red", F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow")
# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)
Performs functional principal component analysis of probability densities in order to describe a data “foldert”, consisting of individuals on which $p$ variables are observed at $T$ times. It returns an object of class
fpcat
.
fpcat(xf, group.name="time", method = 1, ind = 1, nvar = NULL, gaussiand = TRUE, windowh = NULL, normed=TRUE, centered=TRUE, data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE, nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, filename = NULL)
xf |
object of class
|
group.name |
string or numeric.
|
method |
if If
|
ind |
if The name of the column of x containing the identifiers of the measured objects, or the number of this column.
See the |
nvar |
if The number of variables measured at each observation time.
See the |
All other arguments are the same as for fpcad
.
gaussiand |
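logical. If TRUE (default), the probability densities are assumed Gaussian and parametrically estimated. If FALSE, they are estimated using the Gaussian kernel method. |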
windowh |
either a list of |
normed |
logical. If |
centered |
logical. If |
data.centered |
logical. If |
data.scaled |
logical. If |
common.variance |
logical. If |
nb.factors |
numeric. Number of returned principal scores (default Warning: The |
nb.values |
numeric. Number of returned eigenvalues (default 10). |
sub.title |
string. Subtitle for the graphs (default "", no subtitle). |
plot.eigen |
logical. If |
plot.score |
logical. If |
nscore |
numeric vector. If |
filename |
string. Name of the file in which the results are saved. By default ( |
The probability densities $f_t$ corresponding to the $T$ times of observation are either parametrically estimated or estimated using the Gaussian kernel method (see fpcad for the use of the arguments indicating the method used to estimate these densities).
Returns an object of class fpcat
, that is a list including:
times |
vector of the times of observation. |
inertia |
data frame of the eigenvalues and percentages of inertia. |
contributions |
data frame of the contributions to the first |
qualities |
data frame of the qualities on the first |
scores |
data frame of the first |
norm |
vector of the |
means |
list of the means. |
variances |
list of the covariance matrices. |
correlations |
list of the correlation matrices. |
skewness |
list of the skewness coefficients. |
kurtosis |
list of the kurtosis coefficients. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
print.fpcat, plot.fpcat, bandwidth.parameter
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01"))
x1 <- data.frame(z1=rnorm(6,1,5), z2=rnorm(6,3,3))
x2 <- data.frame(z1=rnorm(6,4,6), z2=rnorm(6,5,2))
x3 <- data.frame(z1=rnorm(6,7,2), z2=rnorm(6,8,4))
x4 <- data.frame(z1=rnorm(6,9,3), z2=rnorm(6,10,2))
ft <- foldert(x1, x2, x3, x4, times = times, rows.select="intersect")
print(ft)
result <- fpcat(ft)
print(result)
plot(result)
Select columns in all data frames of a folder.
getcol.folder(object, name)
object |
object of class |
name |
character vector. The names of the columns to be selected in each data frame of the folder. |
A folder with the same number of elements as object
. Its k-th element is a data frame, and its columns are the columns of
object[[k]]
given by name
.
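On a plain list of data frames, the same selection is a single lapply call. The snippet is a hypothetical base-R analogue (getcol_sketch is not a dad function); unlike getcol.folder, it returns a plain list and does not keep the class or attributes of a folder.

```r
# Hypothetical base-R analogue of getcol.folder on a plain list of data frames
getcol_sketch <- function(object, name) {
  # drop = FALSE keeps a data frame even when a single column is selected
  lapply(object, function(d) d[, name, drop = FALSE])
}

irisl <- split(iris[, 1:4], iris$Species)   # one data frame per species
sel <- getcol_sketch(irisl, c("Petal.Length", "Petal.Width"))
sapply(sel, ncol)
```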
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder
: object of class folder
.
rmcol.folder
: remove columns in all elements of a folder.
getrow.folder
: select rows in all elements of a folder.
rmrow.folder
: remove rows in all elements of a folder.
data(iris)
iris.fold <- as.folder(iris, "Species")
getcol.folder(iris.fold, "Sepal.Length")
getcol.folder(iris.fold, c("Petal.Length", "Petal.Width"))
Select columns in all data frames of a foldert.
getcol.foldert(object, name)
object |
object of class |
name |
character vector. The names of the columns to be selected in each data frame of the foldert. |
A foldert with the same number of elements as object
. Its k-th element is a data frame, and its columns are the columns of
object[[k]]
given by name
.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert
: object of class foldert
.
rmcol.foldert
: remove columns in all elements of a foldert.
getrow.foldert
: select rows in all elements of a foldert.
rmrow.foldert
: remove rows in all elements of a foldert.
data(floribundity)
ft0 <- foldert(floribundity, cols.select = "union")
getcol.foldert(ft0, c("rose", "variety"))
Select rows in all data frames of a folder.
getrow.folder(object, name)
object |
object of class |
name |
character vector. The names of the rows to be selected in each data frame of the folder. |
A folder with the same number of elements as object
. Its k-th element is a data frame, and its rows are the rows of
object[[k]]
given by name
.
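Row selection by name can likewise be sketched with lapply on a plain list of data frames. getrow_sketch is hypothetical, not the dad implementation; because it uses %in%, row names absent from a data frame are silently ignored rather than producing rows of NA, which is a design choice of this sketch.

```r
# Hypothetical base-R analogue of getrow.folder on a plain list of data frames
getrow_sketch <- function(object, name) {
  # Keep, in each data frame, the rows whose names are listed in name
  lapply(object, function(d) d[rownames(d) %in% name, , drop = FALSE])
}

irisl <- split(iris[, 1:4], iris$Species)   # row names "1" to "150" are kept
sel <- getrow_sketch(irisl, as.character(c(1:5, 51:55, 101:105)))
sapply(sel, nrow)
```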
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder
: object of class folder
.
rmrow.folder
: remove rows in all elements of a folder.
getcol.folder
: select columns in all elements of a folder.
rmcol.folder
: remove columns in all elements of a folder.
data(iris)
iris.fold <- as.folder(iris, "Species")
getrow.folder(iris.fold, c(1:5, 51:55, 101:105))
Select rows in all data frames of a foldert.
getrow.foldert(object, name)
object |
object of class |
name |
character vector. The names of the rows to be selected in each data frame of the foldert. |
A foldert with the same number of elements as object
. Its k-th element is a data frame, and its rows are the rows of
object[[k]]
given by name
.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert
: object of class foldert
.
rmrow.foldert
: remove rows in all elements of a foldert.
getcol.foldert
: select columns in all elements of a foldert.
rmcol.foldert
: remove columns in all elements of a foldert.
data(floribundity)
ft0 <- foldert(floribundity, cols.select = "union", rows.select = "union")
getrow.foldert(ft0, c("16", "51"))
Performs functional hierarchic cluster analysis of discrete probability distributions. It returns an object of class hclustdd
. It applies hclust
to the distance matrix between the distributions.
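The principle (estimated discrete distributions per group, a distance matrix, then hclust) can be sketched in base R. This is a hypothetical illustration with simulated data and the $L^1$ distance only, not the hclustdd implementation; the groups, levels and probabilities are invented for the example.

```r
# Hypothetical sketch: hierarchical clustering of the discrete distributions
# of a factor observed in several groups, using the L1 distance
set.seed(3)
g <- factor(rep(paste0("G", 1:5), each = 30))
x <- factor(sample(c("low", "mid", "high"), 150, replace = TRUE,
                   prob = c(0.5, 0.3, 0.2)),
            levels = c("low", "mid", "high"))
# Row k: estimated probability distribution of x in group k
p <- prop.table(table(g, x), margin = 1)
d <- dist(unclass(p), method = "manhattan")   # L1 distance between rows
cl <- hclust(d, method = "complete")
cutree(cl, k = 2)
```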
hclustdd(xf, group.name = "group", distance = c("l1", "l2", "chisqsym", "hellinger", "jeffreys", "jensen", "lp"), sub.title = "", filename = NULL, method.hclust = "complete")
xf |
object of class
|
group.name |
string. Name of the grouping variable. Default: |
distance |
The distance or divergence used to compute the distance matrix between the discrete distributions (see Details). It can be:
|
sub.title |
string. If provided, the subtitle for the graphs. |
filename |
string. Name of the file in which the results are saved. By default (filename = NULL), the results are not saved. |
method.hclust |
the agglomeration method to be used for the clustering. See the method argument of the hclust function. |
To compute the distances (or dissimilarities) between the groups, the probability distributions of the groups of individuals are first estimated from the observations. The distances between the estimated distributions are then computed with the distance or divergence given by the distance argument:
If distance is "l1", "l2" or "lp", the distances are computed by the function matddlppar.
Otherwise, they are computed by matddchisqsympar ("chisqsym"), matddhellingerpar ("hellinger"), matddjeffreyspar ("jeffreys") or matddjensenpar ("jensen").
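The two steps (estimate each group's distribution, then cluster the distance matrix with hclust) can be sketched in base R. This is an illustration under stated assumptions, not dad's code: the data are hypothetical, and the distance is the unnormalised L1 distance sum(|p - q|), whereas dad's matddlppar may scale it differently.

```r
# Toy data (hypothetical): one factor observed in four groups of 3 individuals
df <- data.frame(
  x = factor(c("a", "a", "b",  "a", "b", "b",  "a", "a", "a",  "b", "b", "b")),
  group = rep(c("g1", "g2", "g3", "g4"), each = 3)
)

# 1. Estimate each group's distribution on the common support (factor levels)
probs <- lapply(split(df$x, df$group), function(v) prop.table(table(v)))

# 2. Pairwise L1 distances between the estimated distributions
groups <- names(probs)
d <- matrix(0, length(groups), length(groups), dimnames = list(groups, groups))
for (i in seq_along(groups)) {
  for (j in seq_along(groups)) {
    d[i, j] <- sum(abs(probs[[i]] - probs[[j]]))
  }
}

# 3. Agglomerative clustering of the groups, complete linkage
clust <- hclust(as.dist(d), method = "complete")
plot(clust)
```

Because the factor keeps all its levels, table() returns zero counts for unobserved states, so every distribution is defined on the same support.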
Returns an object of class hclustdd, that is a list including:
distances |
matrix of the distances between the distributions. |
clust |
an object of class hclust. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
# Example 1 with a folder (10 groups) of 3 factors
# obtained by converting numeric variables
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xr = cut(xr, breaks = list(c(0, 5, 7, 10), c(0, 4, 6, 10), c(0, 6, 8, 10)))
xf = as.folder(xr, groups = "rose")
af = hclustdd(xf)
print(af)
print(af, dist.print = TRUE)
plot(af)
plot(af, hang = -1)
# Example 2 with a data frame obtained by converting numeric variables
ar = hclustdd(xr, group.name = "rose")
print(ar)
print(ar, dist.print = TRUE)
plot(ar)
plot(ar, hang = -1)
# Example 3 with a list of 7 arrays
data(dspg)
xl = dspg
hclustdd(xl)
Hellinger distance between two multivariate (p > 1) or univariate (p = 1) Gaussian densities, estimated from samples (see Details).
hellinger(x1, x2, check = FALSE)
x1 |
a matrix or data frame of n1 rows (observations) and p columns (variables). |
x2 |
matrix or data frame (or tibble) of n2 rows (observations) and p columns (variables). |
check |
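logical. When TRUE, the function checks whether the estimated covariance matrices are degenerate before computing the distance. Default: FALSE. |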
The Hellinger distance between the two Gaussian densities is computed with the hellingerpar function, using the density parameters (mean vectors and covariance matrices) estimated from the samples.
Returns the distance between the two probability densities.
Be careful! If check = FALSE and one of the estimated covariance matrices is degenerate, the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. John Wiley & Sons, New York.
hellingerpar: Hellinger distance between Gaussian densities, given their parameters.
require(MASS)
m1 <- c(0,0)
v1 <- matrix(c(1,0,0,1),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(4,1,1,9),ncol = 2)
x1 <- mvrnorm(n = 3,mu = m1,Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
hellinger(x1, x2)
Hellinger distance between two multivariate (p > 1) or univariate (p = 1) Gaussian densities, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate) (see Details).
hellingerpar(mean1, var1, mean2, var2, check = FALSE)
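mean1 |
numeric vector of length p: the mean of the first Gaussian density (a number if univariate). |
var1 |
p x p matrix: the covariance matrix of the first Gaussian density (a positive number if univariate). |
mean2 |
numeric vector of length p: the mean of the second Gaussian density (a number if univariate). |
var2 |
p x p matrix: the covariance matrix of the second Gaussian density (a positive number if univariate). |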
check |
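logical. When TRUE, the function checks whether the covariance matrices are degenerate (multivariate case) or the variances are zero (univariate case) before computing the distance. Default: FALSE. |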
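The mean vectors ($\mu_1$ and $\mu_2$) and covariance matrices ($V_1$ and $V_2$) given as arguments (mean1, mean2, var1 and var2) are used to compute the Hellinger distance between the two Gaussian densities, equal to:
$$\sqrt{2\left(1 - 2^{p/2}\,\frac{\det(V_1)^{1/4}\,\det(V_2)^{1/4}}{\det(V_1+V_2)^{1/2}}\,\exp\left(-\frac{1}{4}\,(\mu_1-\mu_2)^{t}\,(V_1+V_2)^{-1}\,(\mu_1-\mu_2)\right)\right)}$$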
If the means and variances are numbers, the formula is the same ignoring the following operators: t (transpose of a matrix or vector) and det (determinant of a square matrix).
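A base-R sketch of the closed form, assuming the convention H = sqrt(2 (1 - BC)), where BC is the Bhattacharyya coefficient of the two Gaussian densities (equivalently, H squared is the integral of (sqrt(f1) - sqrt(f2))^2). The helper hellinger_gauss is hypothetical and is not dad's hellingerpar; in the univariate case it is checked against numerical integration with integrate().

```r
# Hypothetical helper: closed-form Hellinger distance between N(mu1, V1) and
# N(mu2, V2), assuming H = sqrt(2 * (1 - BC)) with BC the Bhattacharyya coefficient.
hellinger_gauss <- function(mu1, V1, mu2, V2) {
  V1 <- as.matrix(V1); V2 <- as.matrix(V2)
  p <- length(mu1)
  V12 <- V1 + V2
  d <- mu1 - mu2
  bc <- 2^(p / 2) * (det(V1) * det(V2))^(1 / 4) / sqrt(det(V12)) *
    exp(-0.25 * drop(t(d) %*% solve(V12) %*% d))
  sqrt(max(0, 2 * (1 - bc)))  # max() guards against tiny negative rounding errors
}

# Bivariate case, with the parameters of the hellingerpar example
h2d <- hellinger_gauss(c(1, 1), matrix(c(4, 1, 1, 9), ncol = 2), c(0, 1), diag(2))

# Univariate check: integrate (sqrt(f1) - sqrt(f2))^2 numerically
f1 <- function(x) dnorm(x, mean = 0, sd = 1)
f2 <- function(x) dnorm(x, mean = 1, sd = 2)   # variance 4
num <- integrate(function(x) (sqrt(f1(x)) - sqrt(f2(x)))^2,
                 -Inf, Inf, rel.tol = 1e-8)$value
h1d <- hellinger_gauss(0, 1, 1, 4)             # h1d^2 should match num
```

With this convention the distance lies between 0 (identical densities) and sqrt(2) (densities with disjoint supports, approached in the limit).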
The Hellinger distance between two Gaussian densities.
Be careful! If check = FALSE and one covariance matrix is degenerate (multivariate case) or one variance is zero (univariate case), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. John Wiley & Sons, New York.
hellinger: Hellinger distance between Gaussian densities estimated from samples.
m1 <- c(1,1)
v1 <- matrix(c(4,1,1,9),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(1,0,0,1),ncol = 2)
hellingerpar(m1,v1,m2,v2)
Scores of fmdsd, dstatis, fpcad or fpcat vs. moments, or scores of mdsdd vs. marginal distributions or association measures
This generic function is a tool for interpreting the results of the fmdsd, dstatis, fpcad, fpcat and mdsdd functions.
interpret(x, nscore = 1:3, ...)
x |
object of class fmdsd, dstatis, fpcad, fpcat or mdsdd. |
nscore |
numeric vector. Selects the columns of the data frame x$scores to be interpreted. Warning: its components cannot be greater than nb.factors, the number of scores computed by the function that created x. |
... |
Arguments to be passed to the methods, such as moment (for fmdsd, dstatis, fpcad and fpcat objects) or mma (for mdsdd objects). |
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and moments, probabilities or associations. |
spearman |
matrix of Spearman correlations between selected scores and moments, probabilities or associations. |
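In essence, the pearson and spearman matrices cross the selected score columns with one column per moment (or probability, or association measure) computed group by group. Assuming the scores and the moments are already arranged with one row per group in the same order, base R's cor() produces both matrices. The data below are random toy values, not output of dad, and the column names are illustrative.

```r
set.seed(42)
# Toy inputs: 6 groups, 2 scores and 3 per-group moments (illustrative only)
scores  <- data.frame(PC.1 = rnorm(6), PC.2 = rnorm(6))
moments <- cbind(mean.x = rnorm(6), sd.x = runif(6, 1, 2), mean.y = rnorm(6))

nscore <- 1:2
# drop = FALSE keeps a one-column data frame when a single score is selected
pearson  <- cor(scores[, nscore, drop = FALSE], moments, method = "pearson")
spearman <- cor(scores[, nscore, drop = FALSE], moments, method = "spearman")
round(pearson, 2)
```

The Spearman matrix is the Pearson correlation computed on ranks, so it is less sensitive to a few groups with extreme moments.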
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
interpret.fmdsd; interpret.dstatis; interpret.fpcad; interpret.fpcat; interpret.mdsdd.
Scores of dstatis function vs. moments of the densities
Applies to an object of class "dstatis": plots the principal scores against the moments of the densities (means, standard deviations, variances, correlations, skewness and kurtosis coefficients), and computes the correlations between these scores and moments.
## S3 method for class 'dstatis' interpret(x, nscore = 1, moment=c("mean", "sd", "var", "cov", "cor", "skewness", "kurtosis"), ...)
x |
object of class dstatis. |
nscore |
numeric. Selects the column of the data frame x$scores to be interpreted. Warning: it cannot be greater than nb.factors (see Details). |
moment |
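character string. Selects the moments to cross with the scores: "mean" (means), "sd" (standard deviations), "var" (variances), "cov" (covariances), "cor" (correlations), "skewness" (skewness coefficients) or "kurtosis" (kurtosis coefficients). |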
... |
Arguments to be passed to methods. |
A graphics device can contain up to 9 graphs. If there are too many (more than 36) graphs for each score, one can display the graphs in a multipage PDF file.
The number of principal scores to be interpreted cannot be greater than nb.factors, the number of scores included in the data frame x$scores returned by the function dstatis.inter.
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and moments. |
spearman |
matrix of Spearman correlations between selected scores and moments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18, 97-119.
data(roses)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
# Dual STATIS on the covariance matrices
## Not run: 
result <- dstatis.inter(rosesf, group.name = "rose")
interpret(result)
interpret(result, moment = "var")
interpret(result, moment = "cor")
interpret(result, nscore = 2)
## End(Not run)
Scores of fmdsd function vs. moments of the densities
Applies to an object of class "fmdsd": plots the scores against the moments of the densities (means, standard deviations, variances, correlations, skewness and kurtosis coefficients), and computes the correlations between these scores and moments.
## S3 method for class 'fmdsd' interpret(x, nscore = 1, moment=c("mean", "sd", "var", "cov", "cor", "skewness", "kurtosis"), ...)
x |
object of class fmdsd. |
nscore |
numeric. Selects the column of the data frame x$scores to be interpreted. Warning: it cannot be greater than nb.factors (see Details). |
moment |
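character string. Selects the moments to cross with the scores: "mean" (means), "sd" (standard deviations), "var" (variances), "cov" (covariances), "cor" (correlations), "skewness" (skewness coefficients) or "kurtosis" (kurtosis coefficients). |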
... |
Arguments to be passed to methods. |
A graphics device can contain up to 9 graphs. If there are too many (more than 36) graphs for each score, one can display the graphs in a multipage PDF file.
The number of principal scores to be interpreted cannot be greater than nb.factors, the number of scores included in the data frame x$scores returned by the function fmdsd.
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and moments. |
spearman |
matrix of Spearman correlations between selected scores and moments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
data(roses)
x <- roses[,c("Sha","Den","Sym","rose")]
rosesfold <- as.folder(x)
result1 <- fmdsd(rosesfold)
interpret(result1)
## Not run: 
interpret(result1, moment = "var")
## End(Not run)
interpret(result1, nscore = 2)
Scores of fpcad function vs. moments of the densities
Applies to an object of class "fpcad": plots the principal scores against the moments of the densities (means, standard deviations, variances, correlations, skewness and kurtosis coefficients), and computes the correlations between these scores and moments.
## S3 method for class 'fpcad' interpret(x, nscore = 1, moment=c("mean", "sd", "var", "cov", "cor", "skewness", "kurtosis"), ...)
x |
object of class fpcad. |
nscore |
numeric. Selects the column of the data frame x$scores to be interpreted. Warning: it cannot be greater than nb.factors (see Details). |
moment |
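character string. Selects the moments to cross with the scores: "mean" (means), "sd" (standard deviations), "var" (variances), "cov" (covariances), "cor" (correlations), "skewness" (skewness coefficients) or "kurtosis" (kurtosis coefficients). |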
... |
Arguments to be passed to methods. |
A graphics device can contain up to 9 graphs. If there are too many (more than 36) graphs for each score, one can display the graphs in a multipage PDF file.
The number of principal scores to be interpreted cannot be greater than nb.factors, the number of scores included in the data frame x$scores returned by the function fpcad.
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and moments. |
spearman |
matrix of Spearman correlations between selected scores and moments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
data(roses)
rosefold <- as.folder(roses[,c("Sha","Den","Sym","rose")])
result1 <- fpcad(rosefold)
interpret(result1)
## Not run: 
interpret(result1, moment = "var")
## End(Not run)
interpret(result1, moment = "cor")
interpret(result1, nscore = 2)
Scores of fpcat function vs. moments of the densities
This function applies to an object of class "fpcat" and does the same as for an object of class "fpcad": it plots the principal scores against the moments of the densities (means, standard deviations, variances, correlations, skewness and kurtosis coefficients), and computes the correlations between these scores and moments.
## S3 method for class 'fpcat' interpret(x, nscore = 1, moment=c("mean", "sd", "var", "cov", "cor", "skewness", "kurtosis"), ...)
x |
object of class fpcat. |
nscore |
numeric. Selects the column of the data frame x$scores to be interpreted. Warning: it cannot be greater than nb.factors (see Details). |
moment |
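character string. Selects the moments to cross with the scores: "mean" (means), "sd" (standard deviations), "var" (variances), "cov" (covariances), "cor" (correlations), "skewness" (skewness coefficients) or "kurtosis" (kurtosis coefficients). |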
... |
Arguments to be passed to methods. |
A graphics device can contain up to 9 graphs. If there are too many (more than 36) graphs for each score, one can display the graphs in a multipage PDF file.
The number of principal scores to be interpreted cannot be greater than nb.factors, the number of scores included in the data frame x$scores returned by the function fpcat.
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and moments. |
spearman |
matrix of Spearman correlations between selected scores and moments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
# Alsatian castles with their building year
data(castles)
castyear <- foldert(lapply(castles, "[", 1:4))
fpcayear <- fpcat(castyear, group.name = "year")
interpret(fpcayear)
## Not run: 
interpret(fpcayear, moment="var")
## End(Not run)
Scores of mdsdd function vs. marginal probability distributions or association measures
Applies to an object of class "mdsdd": plots the scores against the marginal probability distributions or pairwise association measures of the discrete variables, and computes the correlations between these scores and the probabilities or association measures (see Details).
## S3 method for class 'mdsdd' interpret(x, nscore = 1, mma = c("marg1", "marg2", "assoc"), ...)
x |
object of class mdsdd. |
nscore |
numeric. Selects the column of the data frame x$scores to be interpreted. Warning: it cannot be greater than nb.factors (see Details). |
mma |
character. Indicates which measures are considered: "marg1" or "marg2" (marginal probability distributions) or "assoc" (pairwise association measures). |
... |
Arguments to be passed to methods. |
A graphics device can contain up to 9 graphs. If there are too many (more than 36) graphs for each score, one can display the graphs in a multipage PDF file.
The number of principal scores to be interpreted cannot be greater than nb.factors, the number of scores included in the data frame x$scores returned by the function mdsdd.
Returns a list including:
pearson |
matrix of Pearson correlations between selected scores and probabilities or association measures. |
spearman |
matrix of Spearman correlations between selected scores and probabilities or association measures. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
# INSEE (France): Diploma x Socio professional group, seven years.
data(dspg)
xlista = dspg
a <- mdsdd(xlista)
interpret(a)
# Example 3 with a list of 96 arrays (departments)
## Not run: 
data(dspgd2015)
xd = dspgd2015
res = mdsdd(xd, group.name = "coded")
interpret(res)
plot(res, fontsize.points = 0.7)
# Each department is represented by its name
data(departments)
coor = merge(res$scores, departments, by = "coded")
dev.new()
plot(coor$PC.1, coor$PC.2, type ="n")
text(coor$PC.1, coor$PC.2, coor$named, cex = 0.5)
# Each department is represented by its region
dev.new()
plot(coor$PC.1, coor$PC.2, type ="n")
text(coor$PC.1, coor$PC.2, coor$coder, cex = 0.7)
## End(Not run)
discdd.misclass
Tests if its argument is an object of class discdd.misclass (see Details of the function discdd.misclass).
is.discdd.misclass(x)
x |
object to be tested. |
TRUE if its argument is of class discdd.misclass, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
discdd.predict
Tests if its argument is an object of class discdd.predict (see Details of the function discdd.predict).
is.discdd.predict(x)
x |
object to be tested. |
TRUE if its argument is of class discdd.predict, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
dstatis
Tests if its argument is an object of class dstatis (see Details of the function dstatis.inter).
is.dstatis(x)
x |
object to be tested. |
TRUE if its argument is of class dstatis, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
fdiscd.misclass
Tests if its argument is an object of class fdiscd.misclass (see Details of the function fdiscd.misclass).
is.fdiscd.misclass(x)
x |
object to be tested. |
TRUE if its argument is of class fdiscd.misclass, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
fdiscd.predict
Tests if its argument is an object of class fdiscd.predict (see Details of the function fdiscd.predict).
is.fdiscd.predict(x)
x |
object to be tested. |
TRUE if its argument is of class fdiscd.predict, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
fhclustd
Tests if its argument is an object of class fhclustd (see Details of the function fhclustd).
is.fhclustd(x)
x |
object to be tested. |
TRUE if its argument is of class fhclustd, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
fmdsd
Tests if its argument is an object of class fmdsd (see Details of the function fmdsd).
is.fmdsd(x)
x |
object to be tested. |
TRUE if its argument is of class fmdsd, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder
Tests if its argument is an object of class folder (see folder).
is.folder(x)
x |
object to be tested. |
TRUE if its argument is of class folder, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder to create an object of class folder.
folderh
Tests if its argument is an object of class folderh (see folderh).
is.folderh(x)
x |
object to be tested. |
TRUE if its argument is of class folderh, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folderh to create an object of class folderh.
foldermtg
Tests if its argument is an object of class foldermtg (see read.mtg).
is.foldermtg(x)
x |
object to be tested. |
TRUE if its argument is of class foldermtg, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
read.mtg to read an MTG file and create an object of class foldermtg.
foldert
Tests if its argument is an object of class foldert (see foldert).
is.foldert(x)
x |
object to be tested. |
TRUE if its argument is of class foldert, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert to create an object of class foldert.
fpcad
Tests if its argument is an object of class fpcad (see Details of the function fpcad).
is.fpcad(x)
x |
object to be tested. |
TRUE if its argument is of class fpcad, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
mdsdd
Tests if its argument is an object of class mdsdd (see Details of the function mdsdd).
is.mdsdd(x)
x |
object to be tested. |
TRUE if its argument is of class mdsdd, and FALSE otherwise.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Jeffreys measure (or symmetrised Kullback-Leibler divergence) between two multivariate (p > 1) or univariate (p = 1) Gaussian densities, estimated from samples (see Details).
jeffreys(x1, x2, check = FALSE)
x1 |
a matrix or data frame of n1 rows (observations) and p columns (variables). |
x2 |
matrix or data frame (or tibble) of n2 rows (observations) and p columns (variables). |
check |
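logical. When TRUE, the function checks whether the estimated covariance matrices are degenerate before computing the measure. Default: FALSE. |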
The Jeffreys measure between the two Gaussian densities is computed with the jeffreyspar function, using the density parameters (mean vectors and covariance matrices) estimated from the samples.
Returns the Jeffreys measure between the two probability densities.
Be careful! If check = FALSE and one of the estimated covariance matrices is degenerate, the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Thabane, L., Safiul Haq, M. (1999). On Bayesian selection of the best population using the Kullback-Leibler divergence measure. Statistica Neerlandica, 53(3): 342-360.
jeffreyspar: Jeffreys measure between Gaussian densities, given their parameters.
require(MASS)
m1 <- c(0,0)
v1 <- matrix(c(1,0,0,1),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(4,1,1,9),ncol = 2)
x1 <- mvrnorm(n = 3,mu = m1,Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
jeffreys(x1, x2)
Jeffreys measure (or symmetrised Kullback-Leibler divergence) between two multivariate (p > 1) or univariate (p = 1) Gaussian densities, given their parameters (mean vectors and covariance matrices if they are multivariate, means and variances if univariate) (see Details).
jeffreyspar(mean1, var1, mean2, var2, check = FALSE)
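mean1 |
numeric vector of length p: the mean of the first Gaussian density (a number if univariate). |
var1 |
p x p matrix: the covariance matrix of the first Gaussian density (a positive number if univariate). |
mean2 |
numeric vector of length p: the mean of the second Gaussian density (a number if univariate). |
var2 |
p x p matrix: the covariance matrix of the second Gaussian density (a positive number if univariate). |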
check |
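logical. When TRUE, the function checks whether the covariance matrices are degenerate (multivariate case) or the variances are zero (univariate case) before computing the measure. Default: FALSE. |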
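Let $\mu_1$ and $\mu_2$ be the mean vectors and $V_1$ and $V_2$ the covariance matrices of the two Gaussian densities. Jeffreys measure between them is equal to:
$$\frac{1}{2}\,(\mu_1-\mu_2)^{t}\,(V_1^{-1}+V_2^{-1})\,(\mu_1-\mu_2) + \frac{1}{2}\,\mathrm{tr}\left(V_1^{-1}V_2 + V_2^{-1}V_1\right) - p$$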
If the means and variances are numbers, the formula is the same ignoring the following operators: t (transpose of a matrix or vector) and tr (trace of a square matrix).
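The Jeffreys measure is the sum of the two Kullback-Leibler divergences KL(f1 || f2) + KL(f2 || f1), so the log-determinant terms of the two divergences cancel. The hypothetical helper jeffreys_gauss below is a base-R sketch of that closed form (it is not dad's jeffreyspar); the univariate case is checked against the integral of (f1 - f2)(log f1 - log f2), computed with integrate().

```r
# Hypothetical helper: Jeffreys measure J = KL(f1 || f2) + KL(f2 || f1)
# between N(mu1, V1) and N(mu2, V2)
jeffreys_gauss <- function(mu1, V1, mu2, V2) {
  V1 <- as.matrix(V1); V2 <- as.matrix(V2)
  p <- length(mu1)
  iV1 <- solve(V1); iV2 <- solve(V2)
  d <- mu1 - mu2
  0.5 * drop(t(d) %*% (iV1 + iV2) %*% d) +   # mean term
    0.5 * sum(diag(iV1 %*% V2 + iV2 %*% V1)) - # trace term
    p
}

# N(0, 1) vs N(1, 4): 0.5 * 1.25 + 0.5 * 4.25 - 1 = 1.75
j1d <- jeffreys_gauss(0, 1, 1, 4)

# Numerical check: J equals the integral of (f1 - f2) * (log f1 - log f2)
integrand <- function(x) {
  (dnorm(x, 0, 1) - dnorm(x, 1, 2)) *
    (dnorm(x, 0, 1, log = TRUE) - dnorm(x, 1, 2, log = TRUE))
}
num <- integrate(integrand, -Inf, Inf, rel.tol = 1e-8)$value
```

Using dnorm(..., log = TRUE) avoids taking the logarithm of densities that underflow to zero in the tails.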
Jeffreys measure between two Gaussian densities.
Be careful! If check = FALSE and one covariance matrix is degenerate (multivariate case) or one variance is zero (univariate case), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. John Wiley & Sons, New York.
Thabane, L., Safiul Haq, M. (1999). On Bayesian selection of the best population using the Kullback-Leibler divergence measure. Statistica Neerlandica, 53(3): 342-360.
jeffreys: Jeffreys measure of two parametrically estimated Gaussian densities, given samples.
m1 <- c(1,1)
v1 <- matrix(c(4,1,1,9),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(1,0,0,1),ncol = 2)
jeffreyspar(m1,v1,m2,v2)
Computes the kurtosis coefficient of each column of the elements of an object of class folder.
kurtosis.folder(x, na.rm = FALSE, type = 3)
x |
an object of class folder. |
na.rm |
logical. Should missing values be omitted from the calculations? (see kurtosis) |
type |
an integer between 1 and 3 (see kurtosis). |
It uses kurtosis to compute the kurtosis coefficient of each numeric column of each element of the folder. If some columns of the data frames are not numeric, a warning is issued and the coefficients are computed on the numeric columns only.
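The computation can be sketched in base R. This assumes that the kurtosis function referred to is e1071::kurtosis (whose type argument ranges from 1 to 3, with 3 as default); its type 3 estimator is b2 = m4 / s^4 - 3, with m4 the fourth central moment (divisor n) and s the usual standard deviation (divisor n - 1). The helpers kurtosis3 and kurtosis_by_group are hypothetical, and a data frame split by a grouping column stands in for a folder.

```r
# Hypothetical helper: type-3 sample kurtosis b2 = m4 / s^4 - 3
kurtosis3 <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  m4 <- mean((x - mean(x))^4)   # 4th central moment, divisor n
  m4 / sd(x)^4 - 3              # sd() uses divisor n - 1
}

# Hypothetical helper: kurtosis of every numeric column, group by group
kurtosis_by_group <- function(df, group) {
  num <- vapply(df, is.numeric, logical(1))   # keep numeric columns only
  lapply(split(df[num], df[[group]]), function(d) sapply(d, kurtosis3))
}

k <- kurtosis_by_group(iris, "Species")
k$setosa
```

The result is a list with one named vector per group, mirroring the list returned by kurtosis.folder.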
A list whose elements are the kurtosis coefficients by column of the elements of the folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder to create an object of class folder.
mean.folder, var.folder, cor.folder, skewness.folder for other statistics on folder objects.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
iris.kurtosis <- kurtosis.folder(iris.fold)
print(iris.kurtosis)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
roses.kurtosis <- kurtosis.folder(roses.fold)
print(roses.kurtosis)
L2 inner product of probability densities
L2 inner product of two multivariate (p > 1) or univariate (p = 1) probability densities, estimated from samples.
l2d(x1, x2, method = "gaussiand", check = FALSE, varw1 = NULL, varw2 = NULL)
x1 |
a matrix or data frame of n1 rows (observations) and p columns (variables). |
x2 |
matrix or data frame (or tibble) of n2 rows (observations) and p columns (variables). |
method |
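string. It can be "gaussiand" if the densities are considered Gaussian (their parameters are estimated from the samples), or "kern" if they are estimated by the Gaussian kernel method (see Details). |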
check |
logical. When Notice that if |
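varw1, varw2 |
the smoothing bandwidth matrices used for the kernel estimation of the first and second densities when method = "kern". If omitted, they are computed by the normal reference rule (see Details). |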
If method = "gaussiand", the mean vectors and covariance matrices of the two samples are computed and used to compute the inner product with the l2dpar function.
If method = "kern", the densities of both samples are estimated by the Gaussian kernel method, and these estimates are used to compute the inner product.
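If the varw1 and varw2 arguments are omitted, the smoothing bandwidth matrices are computed with the normal reference rule:
$$H_1 = \left(\frac{4}{(p+2)\,n_1}\right)^{2/(p+4)} \hat{V}_1$$
where $n_1$ is the number of observations and $\hat{V}_1$ the empirical covariance matrix of the first sample. The bandwidth matrix of the second density is obtained in the same way.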
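A sketch of the kernel route, under the assumption that both densities are Gaussian kernel estimates whose bandwidth matrices follow the normal reference rule H = (4 / ((p + 2) n))^(2 / (p + 4)) V, with V the sample covariance matrix. Because the product of two Gaussian kernels integrates in closed form, the inner product reduces to a double sum over pairs of observations. The helpers nrr_bandwidth and kernel_l2 are hypothetical and are not dad's internals.

```r
# Hypothetical helper: normal reference rule bandwidth matrix for a sample
# x with n rows (observations) and p columns (variables)
nrr_bandwidth <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  (4 / ((p + 2) * n))^(2 / (p + 4)) * var(x)
}

# Hypothetical helper: inner product of two Gaussian kernel density estimates,
# <f1, f2> = 1 / (n1 * n2) * sum_i sum_j phi_{H1 + H2}(x1_i - x2_j)
kernel_l2 <- function(x1, x2, H1 = nrr_bandwidth(x1), H2 = nrr_bandwidth(x2)) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  S <- as.matrix(H1 + H2)
  Sinv <- solve(S)
  p <- ncol(x1)
  total <- 0
  for (i in seq_len(nrow(x1))) {
    D <- sweep(x2, 2, x1[i, ])        # row j holds x2_j - x1_i
    q <- rowSums((D %*% Sinv) * D)    # quadratic forms with S^-1
    total <- total + sum(exp(-0.5 * q))
  }
  total / (nrow(x1) * nrow(x2) * (2 * pi)^(p / 2) * sqrt(det(S)))
}

set.seed(1)
x1 <- matrix(rnorm(10), ncol = 2)            # 5 bivariate observations
x2 <- matrix(rnorm(8, mean = 1), ncol = 2)   # 4 bivariate observations
kernel_l2(x1, x2)
```

The cost grows as n1 * n2, which is fine for groups of moderate size.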
The inner product of the two probability densities.
Be careful! If check = FALSE and one covariance matrix or smoothing bandwidth matrix is degenerate, the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Wand, M., Jones, M. (1995). Kernel smoothing. Chapman and Hall/CRC, London.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
l2dpar for Gaussian densities whose parameters are given.
require(MASS)
m1 <- c(0,0)
v1 <- matrix(c(1,0,0,1),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(4,1,1,9),ncol = 2)
x1 <- mvrnorm(n = 3,mu = m1,Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
l2d(x1, x2, method = "gaussiand")
l2d(x1, x2, method = "kern")
l2d(x1, x2, method = "kern", varw1 = v1, varw2 = v2)
L2 inner product of Gaussian densities given their parameters
L2 inner product of two multivariate (p > 1) or univariate (p = 1) Gaussian densities, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate).
l2dpar(mean1, var1, mean2, var2, check = FALSE)
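mean1 |
numeric vector of length p: the mean of the first Gaussian density (a number if univariate). |
var1 |
p x p matrix: the covariance matrix of the first Gaussian density (a positive number if univariate). |
mean2 |
numeric vector of length p: the mean of the second Gaussian density (a number if univariate). |
var2 |
p x p matrix: the covariance matrix of the second Gaussian density (a positive number if univariate). |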
check |
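logical. When TRUE, the function checks whether the covariance matrices are degenerate (multivariate case) or the variances are zero (univariate case) before computing the inner product. Default: FALSE. |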
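Computes the inner product of two Gaussian densities $f_1$ and $f_2$ with mean vectors $\mu_1$, $\mu_2$ and covariance matrices $V_1$, $V_2$, equal to:
$$\langle f_1, f_2 \rangle = \frac{1}{(2\pi)^{p/2}\,\det(V_1+V_2)^{1/2}}\,\exp\left(-\frac{1}{2}\,(\mu_1-\mu_2)^{t}\,(V_1+V_2)^{-1}\,(\mu_1-\mu_2)\right)$$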
If the means and variances are numbers, the formula is the same ignoring the following operators: t (transpose of a matrix or vector) and det (determinant of a square matrix).
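For Gaussian densities the inner product (the integral of f1 times f2) is the N(mu2, V1 + V2) density evaluated at mu1. The hypothetical helper l2_gauss below sketches this in base R (it is not dad's l2dpar), checks the univariate case numerically, and shows how three inner products give the squared L2 distance ||f1 - f2||^2 = <f1, f1> + <f2, f2> - 2 <f1, f2>.

```r
# Hypothetical helper: L2 inner product of N(mu1, V1) and N(mu2, V2)
l2_gauss <- function(mu1, V1, mu2, V2) {
  V12 <- as.matrix(V1) + as.matrix(V2)
  p <- length(mu1)
  d <- mu1 - mu2
  exp(-0.5 * drop(t(d) %*% solve(V12) %*% d)) / ((2 * pi)^(p / 2) * sqrt(det(V12)))
}

# Bivariate case, with the parameters of the l2dpar example
ip2d <- l2_gauss(c(1, 1), matrix(c(4, 1, 1, 9), ncol = 2), c(0, 1), diag(2))

# Univariate check by numerical integration
ip1d <- l2_gauss(0, 1, 1, 4)
num  <- integrate(function(x) dnorm(x, 0, 1) * dnorm(x, 1, 2),
                  -Inf, Inf, rel.tol = 1e-8)$value

# Squared L2 distance between N(0, 1) and N(1, 4) from three inner products
d2 <- l2_gauss(0, 1, 0, 1) + l2_gauss(1, 4, 1, 4) - 2 * ip1d
```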
The inner product between two Gaussian densities.
Be careful! If check = FALSE and one covariance matrix is degenerate (multivariate case) or one variance is zero (univariate case), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Wand, M., Jones, M. (1995). Kernel smoothing. Chapman and Hall, London.
l2d for parametrically estimated Gaussian densities or nonparametrically estimated densities, given samples.
m1 <- c(1,1)
v1 <- matrix(c(4,1,1,9),ncol = 2)
m2 <- c(0,1)
v2 <- matrix(c(1,0,0,1),ncol = 2)
l2dpar(m1,v1,m2,v2)
Computes the matrix of the symmetric Chi-squared distances between several multivariate or univariate discrete probability distributions, estimated from samples.
matddchisqsym(x)
matddchisqsym(x)
x |
object of class folder. |
Positive symmetric matrix whose order is equal to the number of data frames (or distributions), consisting of the pairwise symmetric chi-squared distances between the distributions.
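A base-R sketch of such a matrix for a single categorical variable. It assumes the symmetric chi-squared distance sum((p - q)^2 / (p + q)) over the states with p + q > 0; this normalisation is an assumption, and dad's matddchisqsym may scale it differently (for instance by a constant factor or a square root). The helper chisqsym_matrix is hypothetical, and the samples are those of Example 1, written as character vectors.

```r
# Hypothetical helper: symmetric chi-squared distances between the empirical
# distributions of one categorical variable observed in several samples
chisqsym_matrix <- function(samples) {
  lev <- sort(unique(unlist(lapply(samples, as.character))))
  # One column of probabilities per sample, on the common support `lev`
  P <- sapply(samples, function(s) prop.table(table(factor(s, levels = lev))))
  k <- ncol(P)
  D <- matrix(0, k, k, dimnames = list(colnames(P), colnames(P)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      tot <- P[, i] + P[, j]
      keep <- tot > 0                       # skip states absent from both
      D[i, j] <- sum((P[keep, i] - P[keep, j])^2 / tot[keep])
    }
  }
  D
}

xs <- list(x1 = c("A", "A", "B", "B"),
           x2 = c("A", "A", "A", "B", "B"),
           x3 = c("A", "A", "B", "B", "B", "B"))
D <- chisqsym_matrix(xs)
D
```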
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddchisqsympar for discrete probability distributions, given the probabilities on the same support.
# Example 1
x1 <- data.frame(x = factor(c("A", "A", "B", "B")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")))
xf <- folder(x1, x2, x3)
matddchisqsym(xf)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")),
                 y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")),
                 y = factor(c("a", "a", "b", "a", "b")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")),
                 y = factor(c("a", "b", "a", "b", "a", "b")))
xf <- folder(x1, x2, x3)
matddchisqsym(xf)
Computes the matrix of the symmetric chi-squared distances between several multivariate or univariate discrete probability distributions on the same support (which can be a Cartesian product of sets), given the probabilities of the states (which are tuples) of the support.
matddchisqsympar(freq)
freq |
list of arrays. Their |
Positive symmetric matrix whose order is equal to the number of distributions, consisting of the pairwise symmetric chi-squared distances between these distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddchisqsym
for discrete probability densities which are estimated from the data.
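Given probability arrays on a common support, a matrix like the one returned by matddchisqsympar can be sketched as a loop over pairs. The sketch flattens each array, so the Cartesian-product structure only fixes the order of the states. The formula sum((p - q)^2 / (p + q)) is an assumption (several symmetric chi-squared variants exist), and `chisqsym()` and `pairwise_matrix()` are hypothetical helpers, not dad's implementation.

```r
# Symmetric chi-squared distance between two probability arrays on the same
# support. ASSUMPTION: the form sum((p - q)^2 / (p + q)); dad's exact
# normalisation may differ (Deza and Deza list several variants).
chisqsym <- function(p, q) {
  p <- as.numeric(p)
  q <- as.numeric(q)
  stopifnot(length(p) == length(q))
  keep <- (p + q) > 0  # states with zero mass in both arrays contribute 0
  sum((p[keep] - q[keep])^2 / (p[keep] + q[keep]))
}

# Hypothetical helper: fills a symmetric matrix with a pairwise measure.
pairwise_matrix <- function(freq, measure) {
  n <- length(freq)
  out <- matrix(0, n, n, dimnames = list(names(freq), names(freq)))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      out[i, j] <- out[j, i] <- measure(freq[[i]], freq[[j]])
    }
  }
  out
}

# Two distributions on the product {A, B} x {a, b}, stored as 2 x 2 arrays
f1 <- array(c(0.5, 0, 0, 0.5), dim = c(2, 2))
f2 <- array(c(0.4, 0.2, 0.2, 0.2), dim = c(2, 2))
pairwise_matrix(list(f1 = f1, f2 = f2), chisqsym)
```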
Computes the matrix of the Hellinger (or Matusita) distances between several multivariate or univariate discrete probability distributions, estimated from samples.
matddhellinger(x)
x |
object of class "folder" containing the data (one data frame of factors per group, as in the examples). |
Positive symmetric matrix whose order is equal to the number of data frames (or distributions), consisting of the pairwise Hellinger distances between the distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddhellingerpar
for discrete probability densities, given the probabilities on the same support.
# Example 1
x1 <- data.frame(x = factor(c("A", "A", "B", "B")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")))
xf <- folder(x1, x2, x3)
matddhellinger(xf)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")), y = factor(c("a", "b", "a", "b", "a", "b")))
xf <- folder(x1, x2, x3)
matddhellinger(xf)
Computes the matrix of the Hellinger (or Matusita) distances between several multivariate or univariate discrete probability distributions on the same support (which can be a Cartesian product of sets), given the probabilities of the states of the support (which are q-tuples, q being the number of variables).
matddhellingerpar(freq)
freq |
list of arrays. Their |
Positive symmetric matrix whose order is equal to the number of distributions, consisting of the pairwise Hellinger distances between these distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddhellinger
for discrete probability densities which are estimated from the data.
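A similar sketch for the Hellinger (Matusita) distance, assuming the form sqrt(sum((sqrt(p) - sqrt(q))^2)). Some references scale it by 1/sqrt(2), and the normalisation used by matddhellingerpar is not stated here. `hellinger_dd()` is a hypothetical helper; `outer()` with `Vectorize()` is just one compact way to fill the matrix.

```r
# Hellinger (Matusita) distance between two probability arrays on the same
# support. ASSUMPTION: the form sqrt(sum((sqrt(p) - sqrt(q))^2)); some
# authors divide by sqrt(2) so that the distance lies in [0, 1].
hellinger_dd <- function(p, q) {
  p <- as.numeric(p)
  q <- as.numeric(q)
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2))
}

freq <- list(d1 = c(0.5, 0.5), d2 = c(0.6, 0.4), d3 = c(1/3, 2/3))
n <- length(freq)
# outer() needs a vectorised function, hence Vectorize() over the indices
hmat <- outer(seq_len(n), seq_len(n),
              Vectorize(function(i, j) hellinger_dd(freq[[i]], freq[[j]])))
dimnames(hmat) <- list(names(freq), names(freq))
hmat
```

When both arrays sum to 1, the squared distance equals 2 - 2 * sum(sqrt(p * q)), that is, it is a decreasing function of the Bhattacharyya coefficient.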
Computes the matrix of Jeffreys divergences between several multivariate or univariate discrete probability distributions, estimated from samples.
matddjeffreys(x)
x |
object of class "folder" containing the data (one data frame of factors per group, as in the examples). |
Positive symmetric matrix whose order is equal to the number of data frames (or distributions), consisting of the pairwise Jeffreys divergences between the distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddjeffreyspar
for discrete probability densities, given the probabilities on the same support.
# Example 1
x1 <- data.frame(x = factor(c("A", "A", "B", "B")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")))
xf <- folder(x1, x2, x3)
matddjeffreys(xf)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")), y = factor(c("a", "b", "a", "b", "a", "b")))
xf <- folder(x1, x2, x3)
matddjeffreys(xf)
Computes the matrix of Jeffreys divergences between several multivariate or univariate discrete probability distributions on the same support (which can be a Cartesian product of sets), given the probabilities of the states of the support (which are q-tuples, q being the number of variables).
matddjeffreyspar(freq)
freq |
list of arrays. Their |
Positive symmetric matrix whose order is equal to the number of distributions, consisting of the pairwise Jeffreys divergences between these distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddjeffreys
for discrete probability densities which are estimated from the data.
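The Jeffreys divergence is the symmetrised Kullback-Leibler divergence, which reduces to sum((p - q) * log(p / q)). The sketch below assumes the natural logarithm, and returns Inf when a state has positive probability in one distribution only. How dad handles that case is not stated in this section. `jeffreys_dd()` is a hypothetical helper.

```r
# Jeffreys divergence between two probability arrays on the same support:
# J(p, q) = KL(p || q) + KL(q || p) = sum((p - q) * log(p / q)).
# ASSUMPTION: natural logarithm; dad may use another convention.
jeffreys_dd <- function(p, q) {
  p <- as.numeric(p)
  q <- as.numeric(q)
  stopifnot(length(p) == length(q))
  keep <- (p + q) > 0       # states absent from both arrays contribute 0
  p <- p[keep]
  q <- q[keep]
  if (any(p == 0 | q == 0)) return(Inf)  # a state present in only one array
  sum((p - q) * log(p / q))
}

jeffreys_dd(c(0.5, 0.5), c(0.6, 0.4))
jeffreys_dd(c(1, 0), c(0.5, 0.5))   # Inf: the second state has p = 0 < q
```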
Computes the matrix of the Jensen-Shannon divergences between several multivariate or univariate discrete probability distributions, estimated from samples.
matddjensen(x)
x |
object of class "folder" containing the data (one data frame of factors per group, as in the examples). |
Positive symmetric matrix whose order is equal to the number of data frames (or distributions), consisting of the pairwise Jensen-Shannon divergences between the distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddjensenpar
for discrete probability densities, given the probabilities on the same support.
# Example 1
x1 <- data.frame(x = factor(c("A", "A", "B", "B")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")))
xf <- folder(x1, x2, x3)
matddjensen(xf)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")), y = factor(c("a", "b", "a", "b", "a", "b")))
xf <- folder(x1, x2, x3)
matddjensen(xf)
Computes the matrix of the Jensen-Shannon divergences between several multivariate or univariate discrete probability distributions on the same support (which can be a Cartesian product of sets), given the probabilities of the states of the support (which are q-tuples, q being the number of variables).
matddjensenpar(freq)
freq |
list of arrays. Their |
Positive symmetric matrix whose order is equal to the number of distributions, consisting of the pairwise Jensen-Shannon divergences between these distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddjensen
for discrete probability densities which are estimated from the data.
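The Jensen-Shannon divergence compares each distribution with their average m = (p + q) / 2, so it stays finite even when the supports differ. The sketch assumes the natural logarithm, which gives a maximum of log(2); with base-2 logarithms the maximum would be 1. `jensen_dd()` and `kl_to_mixture()` are hypothetical helpers, not dad functions.

```r
# Jensen-Shannon divergence between two probability arrays on the same
# support: JS(p, q) = KL(p || m) / 2 + KL(q || m) / 2, with m = (p + q) / 2.
# ASSUMPTION: natural logarithm (with log base 2 the maximum is 1, not log(2)).
kl_to_mixture <- function(a, m) {
  k <- a > 0                 # 0 * log(0) is taken as 0
  sum(a[k] * log(a[k] / m[k]))
}

jensen_dd <- function(p, q) {
  p <- as.numeric(p)
  q <- as.numeric(q)
  stopifnot(length(p) == length(q))
  m <- (p + q) / 2           # m > 0 wherever p > 0 or q > 0
  kl_to_mixture(p, m) / 2 + kl_to_mixture(q, m) / 2
}

jensen_dd(c(0.5, 0.5), c(0.6, 0.4))
jensen_dd(c(1, 0), c(0, 1))   # disjoint supports: the maximum, log(2)
```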
Computes the matrix of the L^p distances between several multivariate or univariate discrete probability distributions, estimated from samples.
matddlp(x, p = 1)
x |
object of class "folder" containing the data (one data frame of factors per group, as in the examples). |
p |
integer. Parameter of the distance. |
Positive symmetric matrix whose order is equal to the number of data frames (or distributions), consisting of the pairwise distances between the distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
ddlp.
matddlppar
for discrete probability distributions, given the probabilities on the same support.
# Example 1
x1 <- data.frame(x = factor(c("A", "A", "B", "B")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")))
xf <- folder(x1, x2, x3)
matddlp(xf)
matddlp(xf, p = 2)
# Example 2
x1 <- data.frame(x = factor(c("A", "A", "A", "B", "B", "B")), y = factor(c("a", "a", "a", "b", "b", "b")))
x2 <- data.frame(x = factor(c("A", "A", "A", "B", "B")), y = factor(c("a", "a", "b", "a", "b")))
x3 <- data.frame(x = factor(c("A", "A", "B", "B", "B", "B")), y = factor(c("a", "b", "a", "b", "a", "b")))
xf <- folder(x1, x2, x3)
matddlp(xf, p = 1)
Computes the matrix of the L^p distances between several multivariate or univariate discrete probability distributions on the same support (which can be a Cartesian product of q sets), given the probabilities of the states of the support (which are q-tuples, q being the number of variables).
matddlppar(freq, p = 1)
freq |
list of arrays. Their |
p |
integer. Parameter of the distance. |
Positive symmetric matrix whose order is equal to the number of distributions, consisting of the pairwise distances between these distributions.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Deza, M.M. and Deza E. (2013). Encyclopedia of distances. Springer.
matddlp
for discrete probability distributions which are estimated from samples.
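The parameter p of matddlp and matddlppar is the exponent of the distance. Assuming the usual definition (sum |f1 - f2|^p)^(1/p), a sketch for two probability vectors follows. `lp_dd()` is hypothetical, and the vectors are named f1 and f2 so that p keeps the meaning it has in the usage line.

```r
# L^p distance between two probability arrays on the same support:
# (sum |f1 - f2|^p)^(1/p). The arrays are named f1, f2 so that p can
# denote the exponent, as in matddlp(x, p = 1).
lp_dd <- function(f1, f2, p = 1) {
  f1 <- as.numeric(f1)
  f2 <- as.numeric(f2)
  stopifnot(length(f1) == length(f2), p >= 1)
  sum(abs(f1 - f2)^p)^(1 / p)
}

lp_dd(c(0.5, 0.5), c(0.6, 0.4), p = 1)  # sum of absolute differences
lp_dd(c(0.5, 0.5), c(0.6, 0.4), p = 2)  # Euclidean distance
```

With p = 1 the distance between two distributions is at most 2, reached when their supports are disjoint.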
L^2 distances between probability densities
Computes the matrix of the L^2 distances between several multivariate (p > 1) or univariate (p = 1) probability densities, p being the number of variables, estimated from samples.
matdistl2d(x, method = "gaussiand", varwL = NULL)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
method |
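string. It can be:
"gaussiand" (default): the densities are assumed Gaussian and their parameters are estimated from the data;
"kern": the densities are estimated using the Gaussian kernel method.
|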
varwL |
list of matrices. The smoothing bandwidths for the estimation of each probability density. If they are omitted, the smoothing bandwidths are computed using the normal reference rule matrix bandwidth (see details of the |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise distances between the probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2dpar
when the probability densities are Gaussian, given the parameters (means and variances).
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
mean.X <- mean(X)
var.X <- var.folder(X)
# Parametrically estimated Gaussian densities:
matdistl2d(X)
## Not run:
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matdistl2d(X, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matdistl2d(X, method = "kern", varwL = var.X)
## End(Not run)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
mean.X1 <- mean(X1)
var.X1 <- var.folder(X1)
# Parametrically estimated Gaussian densities:
matdistl2d(X1)
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matdistl2d(X1, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matdistl2d(X1, method = "kern", varwL = var.X1)
L^2 distances between L^2-normed probability densities
Computes the matrix of the L^2 distances between several multivariate (p > 1) or univariate (p = 1) L^2-normed probability densities, p being the number of variables, estimated from samples, where an L^2-normed probability density is the original probability density function divided by its L^2-norm.
matdistl2dnorm(x, method = "gaussiand", varwL = NULL)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
method |
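string. It can be:
"gaussiand" (default): the densities are assumed Gaussian and their parameters are estimated from the data;
"kern": the densities are estimated using the Gaussian kernel method.
|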
varwL |
list of matrices. The smoothing bandwidths for the estimation of each probability density. If they are omitted, the smoothing bandwidths are computed using the normal reference rule matrix bandwidth (see details of the |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise L^2 distances between the L^2-normed probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2d
for the distance matrix between probability densities.
matdistl2dnormpar
when the probability densities are Gaussian, given the parameters (means and variances).
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
mean.X <- mean(X)
var.X <- var.folder(X)
# Parametrically estimated Gaussian densities:
matdistl2dnorm(X)
## Not run:
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matdistl2dnorm(X, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matdistl2dnorm(X, method = "kern", varwL = var.X)
## End(Not run)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
mean.X1 <- mean(X1)
var.X1 <- var.folder(X1)
# Parametrically estimated Gaussian densities:
matdistl2dnorm(X1)
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matdistl2dnorm(X1, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matdistl2dnorm(X1, method = "kern", varwL = var.X1)
L^2 distances between L^2-normed Gaussian densities given their parameters
Computes the matrix of the L^2 distances between several multivariate (p > 1) or univariate (p = 1) L^2-normed Gaussian densities, p being the number of variables, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate), where an L^2-normed Gaussian density is the original probability density function divided by its L^2-norm.
matdistl2dnormpar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise L^2 distances between the L^2-normed probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2dpar
for the distance matrix between Gaussian densities, given their parameters.
matdistl2dnorm
for the distance matrix between normed probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
# Gaussian densities, given parameters
matdistl2dnormpar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
# Gaussian densities, given parameters
matdistl2dnormpar(mean.X1, var.X1)
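For univariate Gaussian densities the normed distance has a closed form. The inner product of N(m1, v1) and N(m2, v2) (v denoting variances) is the N(0, v1 + v2) density evaluated at m1 - m2, and the squared distance between f / ||f|| and g / ||g|| is 2 - 2 <f, g> / (||f|| ||g||). The sketch covers the univariate case only; `ip_gauss1()` and `dist_l2norm_gauss1()` are hypothetical helpers, not the code of matdistl2dnormpar.

```r
# Inner product <f, g> = integral of f * g for univariate Gaussian densities
# N(m1, v1) and N(m2, v2) (v = variance): equals the N(0, v1 + v2) density
# evaluated at m1 - m2.
ip_gauss1 <- function(m1, v1, m2, v2) dnorm(m1 - m2, mean = 0, sd = sqrt(v1 + v2))

# L^2 distance between the L^2-normed densities f / ||f|| and g / ||g||:
# squared distance = 2 - 2 <f, g> / (||f|| ||g||).
dist_l2norm_gauss1 <- function(m1, v1, m2, v2) {
  nf <- sqrt(ip_gauss1(m1, v1, m1, v1))
  ng <- sqrt(ip_gauss1(m2, v2, m2, v2))
  cosine <- ip_gauss1(m1, v1, m2, v2) / (nf * ng)
  sqrt(max(0, 2 - 2 * cosine))  # max() guards against tiny negative round-off
}

dist_l2norm_gauss1(0, 1, 1, 4)  # N(0, 1) versus N(1, 4)
```

Because Gaussian densities are positive, their inner product is positive and the normed distance is always below sqrt(2).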
L^2 distances between Gaussian densities given their parameters
Computes the matrix of the L^2 distances between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate).
matdistl2dpar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise distances between the probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matdistl2d
for the distance matrix between probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
# Gaussian densities, given parameters
matdistl2dpar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
# Gaussian densities, given parameters
matdistl2dpar(mean.X1, var.X1)
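The L^2 distance between Gaussian densities follows from inner products alone: ||f - g||^2 = <f, f> + <g, g> - 2 <f, g>, and each inner product of two Gaussians has a closed form. The sketch below handles the univariate case, with lists of means and variances as in the usage line; `l2dist_gauss1()` and `matdistl2_sketch()` are hypothetical names, and the code is not taken from matdistl2dpar.

```r
# Inner product of univariate Gaussian densities N(m1, v1), N(m2, v2).
ip_gauss1 <- function(m1, v1, m2, v2) dnorm(m1 - m2, mean = 0, sd = sqrt(v1 + v2))

# L^2 distance: ||f - g||^2 = <f, f> + <g, g> - 2 <f, g>.
l2dist_gauss1 <- function(m1, v1, m2, v2) {
  d2 <- ip_gauss1(m1, v1, m1, v1) + ip_gauss1(m2, v2, m2, v2) -
    2 * ip_gauss1(m1, v1, m2, v2)
  sqrt(max(0, d2))  # guard against tiny negative round-off
}

# Hypothetical matrix version taking lists of means and variances
matdistl2_sketch <- function(meanL, varL) {
  n <- length(meanL)
  out <- matrix(0, n, n, dimnames = list(names(meanL), names(meanL)))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      out[i, j] <- out[j, i] <-
        l2dist_gauss1(meanL[[i]], varL[[i]], meanL[[j]], varL[[j]])
    }
  }
  out
}

matdistl2_sketch(list(g1 = 0, g2 = 1, g3 = 0), list(g1 = 1, g2 = 4, g3 = 2))
```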
Computes the matrix of the Hellinger distances between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given samples, using hellinger.
mathellinger(x)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise Hellinger distances between the probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
mathellingerpar
when the probability densities are Gaussian, given the parameters (means and variances).
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
mathellinger(X)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
mathellinger(X1)
Computes the matrix of the Hellinger distances between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given their means and variances, using hellingerpar.
mathellingerpar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise Hellinger distances between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
mathellinger
for the distance matrix between probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
mathellingerpar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
mathellingerpar(mean.X1, var.X1)
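For univariate Gaussian densities, Hellinger-type distances can be written through the Bhattacharyya coefficient BC, the integral of sqrt(f * g), which has a closed form. The sketch assumes the Matusita convention sqrt(2 - 2 * BC), i.e. the L^2 distance between sqrt(f) and sqrt(g); other references use sqrt(1 - BC), and which one hellingerpar uses is not stated in this section. `bc_gauss1()` and `hellinger_gauss1()` are hypothetical helpers.

```r
# Bhattacharyya coefficient BC = integral of sqrt(f * g) for
# f = N(m1, v1) and g = N(m2, v2), v = variance (closed form).
bc_gauss1 <- function(m1, v1, m2, v2) {
  sqrt(2 * sqrt(v1 * v2) / (v1 + v2)) * exp(-(m1 - m2)^2 / (4 * (v1 + v2)))
}

# Hellinger distance. ASSUMPTION: the Matusita form sqrt(2 - 2 * BC), i.e.
# the L^2 distance between sqrt(f) and sqrt(g); other conventions use
# sqrt(1 - BC). Not necessarily the definition used by hellingerpar.
hellinger_gauss1 <- function(m1, v1, m2, v2) {
  sqrt(max(0, 2 - 2 * bc_gauss1(m1, v1, m2, v2)))
}

hellinger_gauss1(0, 1, 1, 4)  # N(0, 1) versus N(1, 4)
```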
L^2 inner products of probability densities
Computes the matrix of the L^2 inner products between several multivariate (p > 1) or univariate (p = 1) probability densities, p being the number of variables, estimated from samples, using l2d.
matipl2d(x, method = "gaussiand", varwL = NULL)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
method |
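string. It can be:
"gaussiand" (default): the densities are assumed Gaussian and their parameters are estimated from the data;
"kern": the densities are estimated using the Gaussian kernel method.
|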
varwL |
list of matrices. The smoothing bandwidths for the estimation of each probability density. If they are omitted, the smoothing bandwidths are computed using the normal reference rule matrix bandwidth (see details of the |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise inner products between the probability densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
l2d.
matipl2dpar
when the probability densities are Gaussian, given the parameters (means and variances).
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
mean.X <- mean(X)
var.X <- var.folder(X)
# Parametrically estimated Gaussian densities:
matipl2d(X)
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matipl2d(X, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matipl2d(X, method = "kern", varwL = var.X)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
mean.X1 <- mean(X1)
var.X1 <- var.folder(X1)
# Parametrically estimated Gaussian densities:
matipl2d(X1)
# Estimated densities using the Gaussian kernel method (normal reference rule bandwidth):
matipl2d(X1, method = "kern")
# Estimated densities using the Gaussian kernel method (bandwidth provided):
matipl2d(X1, method = "kern", varwL = var.X1)
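With method = "kern", each density is a Gaussian kernel estimate, and the inner product of two such estimates reduces to a double sum over the observations, because the integral of a product of two Gaussian kernels is again a Gaussian density. The sketch is univariate and takes the squared bandwidths (kernel variances) as given; the values below are arbitrary, and no claim is made about the normal reference rule used by dad. `ip_kern1()` is a hypothetical helper.

```r
# Inner product of two univariate Gaussian kernel density estimates
#   f(t) = mean(dnorm(t, x_i, sqrt(w1))),  g(t) = mean(dnorm(t, y_j, sqrt(w2))),
# where w1, w2 are the squared bandwidths (variances of the kernels).
# Integrating the product of two Gaussian kernels gives a Gaussian density
# with variance w1 + w2, evaluated at x_i - y_j; the result averages these.
ip_kern1 <- function(x, y, w1, w2) {
  mean(dnorm(outer(x, y, "-"), mean = 0, sd = sqrt(w1 + w2)))
}

x <- c(0.1, 0.9, 2.3, 1.4)   # sample of the first group (illustrative values)
y <- c(1.5, 3.0, 2.2)        # sample of the second group
ip_kern1(x, y, w1 = 0.5, w2 = 0.3)
```

The cost grows with the product of the two sample sizes, since every pair of observations contributes one term.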
L^2 inner products of Gaussian densities
Computes the matrix of the L^2 inner products between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate).
matipl2dpar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise inner products between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matipl2d
for the inner product matrix between probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
# Gaussian densities, given parameters
matipl2dpar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
# Gaussian densities, given parameters
matipl2dpar(mean.X1, var.X1)
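In any dimension k, the L^2 inner product of N(m1, S1) and N(m2, S2) is the N(0, S1 + S2) density evaluated at m1 - m2. The base-R sketch below writes that density explicitly (to avoid extra packages) and fills a matrix from lists of means and covariance matrices. `ip_gauss()` and `matip_sketch()` are hypothetical names, not dad's code.

```r
# Inner product of two Gaussian densities N(m1, S1) and N(m2, S2) in
# dimension k: the N(0, S1 + S2) density evaluated at m1 - m2.
ip_gauss <- function(m1, S1, m2, S2) {
  S <- as.matrix(S1) + as.matrix(S2)
  d <- as.numeric(m1) - as.numeric(m2)
  k <- length(d)
  quad <- sum(d * solve(S, d))          # d' S^{-1} d
  exp(-quad / 2) / sqrt((2 * pi)^k * det(S))
}

# Hypothetical matrix version over lists of means and covariance matrices
matip_sketch <- function(meanL, varL) {
  n <- length(meanL)
  out <- matrix(0, n, n, dimnames = list(names(meanL), names(meanL)))
  for (i in seq_len(n)) {
    for (j in seq_len(i)) {               # includes the diagonal <f, f>
      out[i, j] <- out[j, i] <- ip_gauss(meanL[[i]], varL[[i]], meanL[[j]], varL[[j]])
    }
  }
  out
}

meanL <- list(a = c(0, 0), b = c(1, 2))
varL  <- list(a = diag(2), b = diag(c(4, 1)))
matip_sketch(meanL, varL)
```

With diagonal covariance matrices the inner product factorises into a product of univariate inner products, which gives a simple check.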
Computes the matrix of Jeffreys measures between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given samples.
matjeffreys(x)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
Positive symmetric matrix whose order is equal to the number of densities, consisting of pairwise Jeffreys measures between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matjeffreyspar
if the parameters of the Gaussian densities are known.
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
matjeffreys(X)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
matjeffreys(X1)
Computes the matrix of Jeffreys measures between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate), using jeffreyspar.
matjeffreyspar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of pairwise Jeffreys measures between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matjeffreys
for the matrix of Jeffreys divergences between probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
matjeffreyspar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
matjeffreyspar(mean.X1, var.X1)
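For Gaussian densities the Jeffreys measure has a closed form. The sketch assumes it is defined as KL(f1 || f2) + KL(f2 || f1), which gives tr(S2^-1 S1 + S1^-1 S2) / 2 - k + d' (S1^-1 + S2^-1) d / 2 with d = m1 - m2; some authors use half of this sum, and the scaling used by jeffreyspar is not stated in this section. `jeffreys_gauss()` is a hypothetical helper.

```r
# Jeffreys measure between N(m1, S1) and N(m2, S2), taken here as
# KL(f1 || f2) + KL(f2 || f1). ASSUMPTION: dad may scale it differently.
# Closed form: tr(S2^-1 S1 + S1^-1 S2) / 2 - k + d' (S1^-1 + S2^-1) d / 2.
jeffreys_gauss <- function(m1, S1, m2, S2) {
  S1 <- as.matrix(S1)
  S2 <- as.matrix(S2)
  d <- as.numeric(m1) - as.numeric(m2)
  k <- length(d)
  i1 <- solve(S1)
  i2 <- solve(S2)
  sum(diag(i2 %*% S1 + i1 %*% S2)) / 2 - k + sum(d * ((i1 + i2) %*% d)) / 2
}

jeffreys_gauss(0, 1, 1, 4)   # univariate: means 0 and 1, variances 1 and 4
```

In the univariate case this reduces to (v1 + d^2) / (2 v2) + (v2 + d^2) / (2 v1) - 1, here 2/8 + 5/2 - 1 = 1.75.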
Computes the matrix of the 2-Wasserstein distances between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given samples.
matwasserstein(x)
x |
object of class "folder" containing the data. Its elements have only numeric variables (observations of the probability densities). If any variable is non-numeric, an error is raised. |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise 2-Wasserstein distances between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matwassersteinpar
if the parameters of the Gaussian densities are known.
data(roses)
# Multivariate:
X <- as.folder(roses[,c("Sha","Den","Sym","rose")], groups = "rose")
summary(X)
matwasserstein(X)
# Univariate:
X1 <- as.folder(roses[,c("Sha","rose")], groups = "rose")
summary(X1)
matwasserstein(X1)
Computes the matrix of the 2-Wasserstein distances between several multivariate (p > 1) or univariate (p = 1) Gaussian densities, p being the number of variables, given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate), using wassersteinpar.
matwassersteinpar(meanL, varL)
meanL |
list of the means ( |
varL |
list of the variances ( |
Positive symmetric matrix whose order is equal to the number of densities, consisting of the pairwise 2-Wasserstein distances between the Gaussian densities.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
matwasserstein
for the matrix of 2-Wasserstein distances between probability densities which are estimated from the data.
data(roses)
# Multivariate:
X <- roses[,c("Sha","Den","Sym","rose")]
summary(X)
mean.X <- as.list(by(X[, 1:3], X$rose, colMeans))
var.X <- as.list(by(X[, 1:3], X$rose, var))
matwassersteinpar(mean.X, var.X)
# Univariate:
X1 <- roses[,c("Sha","rose")]
summary(X1)
mean.X1 <- by(X1$Sha, X1$rose, mean)
var.X1 <- by(X1$Sha, X1$rose, var)
matwassersteinpar(mean.X1, var.X1)
Applies the multidimensional scaling (MDS) method to discrete probability distributions in order to describe groups of individuals on which categorical variables are observed. It returns an object of class mdsdd. It applies cmdscale to the distance matrix between the distributions.
mdsdd(xf, group.name = "group", distance = c("l1", "l2", "chisqsym", "hellinger", "jeffreys", "jensen", "lp"), nb.factors = 3, nb.values = 10, association = c("cramer", "tschuprow", "pearson", "phi"), sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, filename = NULL, add = TRUE, p)
xf |
object of class
|
group.name |
string. Name of the grouping variable. Default: |
distance |
The distance or divergence used to compute the distance matrix between the discrete distributions (see Details). It can be:
|
nb.factors |
numeric. Number of returned principal coordinates (default Warning: The |
nb.values |
numeric. Number of returned eigenvalues (default |
association |
The association measure between two discrete distributions to be used (see Details). It can be:
|
sub.title |
string. Subtitle for the graphs (default |
plot.eigen |
logical. If |
plot.score |
logical. If |
nscore |
numeric vector. If |
filename |
string. Name of the file in which the results are saved. By default ( |
add |
logical indicating if an additive constant should be computed and added to the non-diagonal dissimilarities such that the modified dissimilarities are Euclidean (default |
p |
integer. Optional. When |
If a folder is given as argument, the discrete probability distributions corresponding to the groups of individuals are estimated from the observations.
Then the distances/dissimilarities between the estimated distributions are computed, using the distance or divergence defined by the distance argument:
If the distance is "l1", "l2" or "lp", the distances are computed by the function matddlppar.
Otherwise, they are computed by matddchisqsympar ("chisqsym"), matddhellingerpar ("hellinger"), matddjeffreyspar ("jeffreys") or matddjensenpar ("jensen").
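As an illustration of the "l1", "l2" and "lp" case, the sketch below computes an L^p distance matrix between discrete distributions stored as probability vectors or arrays of identical dimensions, then passes it to base R's cmdscale, which documents an add argument in the same terms as the add argument above. The helpers lp_dist and lp_matrix are hypothetical; this is not matddlppar itself, and it assumes the probabilities have already been estimated.

```r
# L^p distance between two discrete distributions given as probability
# vectors or arrays with the same dimensions (same categories, same order)
lp_dist <- function(p, q, r = 1) {
  sum(abs(p - q)^r)^(1 / r)
}

# Pairwise L^p distance matrix for a named list of distributions
lp_matrix <- function(distL, r = 1) {
  n <- length(distL)
  D <- matrix(0, n, n, dimnames = list(names(distL), names(distL)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      D[i, j] <- lp_dist(distL[[i]], distL[[j]], r)
    }
  }
  D
}

# Three distributions of a binary variable
distL <- list(p1 = c(0.5, 0.5), p2 = c(1, 0), p3 = c(0, 1))
D1 <- lp_matrix(distL, r = 1)   # L1 distances
D2 <- lp_matrix(distL, r = 2)   # L2 distances
# Classical MDS on the L1 matrix; these three points are collinear,
# so a single coordinate is enough
coords <- cmdscale(as.dist(D1), k = 1)
```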
The association measures are computed according to the value of the parameter association. The computation uses the corresponding function of the package DescTools (see Assocs). Note that the association measure between a constant variable and any other variable is set to zero. The association measure between each variable and itself is not computed, and the diagonal of the returned association matrices is set to NA.
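The "cramer" option can be illustrated in base R with the usual formula V = sqrt(chi2 / (n (min(r, c) - 1))), where chi2 is the Pearson chi-squared statistic of the r x c contingency table and n the number of observations. This sketch is not the DescTools function used by the package; the helpers cramer_v and assoc_matrix are hypothetical, and they mimic the two conventions stated above (zero for a constant variable, NA on the diagonal).

```r
# Cramér's V between two categorical vectors
cramer_v <- function(x, y) {
  tab <- table(factor(x), factor(y))        # factor() drops unused levels
  k <- min(dim(tab)) - 1
  if (k == 0) return(0)                     # constant variable: set to zero
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# Association matrix between the columns of a data frame of categorical
# variables, with NA on the diagonal
assoc_matrix <- function(df) {
  q <- ncol(df)
  A <- matrix(NA_real_, q, q, dimnames = list(names(df), names(df)))
  for (i in seq_len(q)) {
    for (j in seq_len(q)) {
      if (i != j) A[i, j] <- cramer_v(df[[i]], df[[j]])
    }
  }
  A
}

df <- data.frame(u = c("a", "a", "b", "b"),
                 v = c("x", "x", "y", "y"),
                 w = c("x", "y", "x", "y"))
assoc_matrix(df)
```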
Returns an object of class mdsdd
, that is a list including:
inertia |
data frame of the eigenvalues and the percentages of their sum. |
scores |
data frame of the coordinates along the |
jointp |
list of arrays. The joint probability distribution for each group. |
margins |
list of two data frames giving respectively:
|
associations |
list of |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling, second ed. Chapman & Hall/CRC.
Saporta, G. (2006). Probabilités, Analyse des données et Statistique. Editions Technip, Paris.
print.mdsdd, plot.mdsdd, interpret.mdsdd
# Example 1 with a folder (10 groups) of 3 factors
# obtained by converting numeric variables
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xf = as.folder(xr, groups = "rose")
xf = cut(xf, breaks = list(c(0, 5, 7, 10), c(0, 4, 6, 10), c(0, 6, 8, 10)), cutcol = 1:3)
af = mdsdd(xf)
print(af)
print(af$jointp)
print(af$margins[[1]]) # equivalent to print(af$margins$margin1)
print(af$margins[[2]])
print(af$associations)
# Example 2 with a data frame obtained by converting numeric variables
data(roses)
xr = roses[,c("Sha", "Den", "Sym", "rose")]
xr = cut(xr, breaks = list(c(0, 5, 7, 10), c(0, 4, 6, 10), c(0, 6, 8, 10)), cutcol = 1:3)
ar = mdsdd(xr, group.name = "rose")
print(ar)
print(ar$jointp)
print(ar$margins[[1]]) # equivalent to print(ar$margins$margin1)
print(ar$margins[[2]])
print(ar$associations)
# Example 3 with a list of 7 arrays
data(dspg)
xl = dspg
mdsdd(xl)
Computes the means by column of the elements of an object of class folder
.
## S3 method for class 'folder' mean(x, ..., na.rm = FALSE)
x |
an object of class |
... |
further arguments passed to or from other methods. |
na.rm |
logical. Should missing values (including NaN) be omitted from the calculations? (see |
It uses colMeans
to compute the mean by numeric column of each element of the folder. If some columns of the data frames are not numeric, there is a warning, and the means are computed on the numeric columns only.
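The behaviour described above can be sketched in base R, assuming a folder behaves like a named list of data frames (the real folder class carries extra attributes). folder_means is a hypothetical helper, not the package's mean.folder.

```r
# Column means of the numeric columns of each data frame in a list;
# non-numeric columns trigger a warning and are skipped
folder_means <- function(fold, na.rm = FALSE) {
  lapply(fold, function(df) {
    num <- vapply(df, is.numeric, logical(1))
    if (!all(num)) {
      warning("non-numeric columns are ignored")
    }
    colMeans(df[, num, drop = FALSE], na.rm = na.rm)
  })
}

# Analogue of the iris example: one data frame per species
iris_list <- split(iris[, 1:4], iris$Species)
folder_means(iris_list)
```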
A list whose elements are the column means of the elements of the folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder
to create an object of class folder
.
var.folder
, cor.folder
, skewness.folder
, kurtosis.folder
for other statistics for folder
objects.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
iris.means <- mean(iris.fold)
print(iris.means)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
roses.means <- mean(roses.fold)
print(roses.means)
For a vertex in an object of class foldermtg
, computes its decomposition into vertices of a higher scale.
mtgcomponents(x, vertex, scale)
x |
an object of class |
vertex |
character. The identifier of a vertex. These identifiers are the rownames of the data frame |
scale |
integer. The scale of the components of |
If vertex
is a vertex of scale i
, then scale
(the scale of the returned components of vertex
) must be higher than i
. For example, if vertex
is a vertex of scale 2, then scale > 2
, for instance scale = 3
. The returned components are then vertices of scale 3 which have a decomposition relationship with vertex
.
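The walk from a vertex down to its components at a higher scale can be sketched as follows. The vertex table (columns vertex, scale and complex, the latter holding the enclosing vertex one scale lower) is a hypothetical simplification, not the internal layout of a foldermtg object, and components_at is a hypothetical helper.

```r
# Components at a higher scale of a given vertex, following the
# decomposition links one scale at a time
components_at <- function(vtab, vertex, scale) {
  s <- vtab$scale[vtab$vertex == vertex]
  stopifnot(length(s) == 1, scale > s)
  current <- vertex
  while (s < scale) {
    current <- vtab$vertex[vtab$complex %in% current]
    s <- s + 1
  }
  current   # character(0) when there is no component
}

vtab <- data.frame(
  vertex  = c("v01", "v02", "v03", "v04", "v05", "v06"),
  scale   = c(1, 2, 2, 3, 3, 3),
  complex = c(NA, "v01", "v01", "v02", "v02", "v03"),
  stringsAsFactors = FALSE
)
components_at(vtab, "v01", 2)
components_at(vtab, "v01", 3)
```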
A character vector containing the identifiers of the components of vertex
.
If there is no component, then the returned vector is empty.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
: reads an MTG file and builds an object of class foldermtg
.
mtgfile <- system.file("extdata/plant1.mtg", package = "dad")
xmtg <- read.mtg(mtgfile)
# Vertex of class "P" (plant, of scale 1), components of scale 2 (axes: "A")
mtgcomponents(xmtg, vertex = "v01", scale = 2)
# Vertex of class "P" (plant, of scale 1), components of scale 3 ("O", "M" and "I")
mtgcomponents(xmtg, vertex = "v01", scale = 3)
# Vertex of class "A" (stem, of scale 2), components of scale 3 ("O", "M" and "I")
mtgcomponents(xmtg, vertex = "v12", scale = 3)
Computes the branching order of vertices contained in an object of class foldermtg
. The order of a vertex is the number of the column of topology
that contains this vertex.
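Following this definition, the order of each row of a topology-like table is the index of the orderk column in which its code appears. The table below is hypothetical and branching_order is an illustrative helper, not mtgorder.

```r
# Branching order of each row: index of the first non-empty "order" column
branching_order <- function(topo) {
  cols <- grep("^order", names(topo))
  m <- as.matrix(topo[, cols, drop = FALSE])
  filled <- !is.na(m) & m != ""
  unname(apply(filled, 1, function(r) which(r)[1]))
}

topo <- data.frame(
  order1 = c("v01", "v02", "", ""),
  order2 = c("", "", "v03", ""),
  order3 = c("", "", "", "v04"),
  stringsAsFactors = FALSE
)
branching_order(topo)
```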
mtgorder(x, classes = "all", display = FALSE)
x |
an object of class |
classes |
character vector. The classes of entities for which the branching order is computed. If omitted, the branching orders are computed for all entities. |
display |
logical. If |
Returns x
after appending the branching orders of the vertices of the classes given in the argument classes
. The branching orders
are appended to the data frames containing the vertices (one data frame per class) and the values of their corresponding features.
Returns an object of class foldermtg
, that is a list of data frames.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
: reads an MTG file and builds an object of class foldermtg
.
mtgfile <- system.file("extdata/plant1.mtg", package = "dad")
xmtg <- read.mtg(mtgfile)
# The branching orders
ymtg <- mtgorder(xmtg)
print(ymtg)
# Add the branching orders to the 'foldermtg'
zmtg <- mtgorder(xmtg, display = TRUE)
print(zmtg)
foldermtg
These data, produced by the SAGAH team (Sciences Agronomiques Appliquées à l'Horticulture, now Research Institute on Horticulture and Seeds), provide the topological structure of a rosebush.
data("mtgplant1")
This object of class foldermtg
is a list of 10 data frames:
mtgplant1$classes
: data frame with 6 rows and 5 columns named
SYMBOL
(factor: the classes of the vertices), SCALE
(integer: the scale at which they appear), DECOMPOSITION
(factor), INDEXATION
(factor) and DEFINITION
(factor).
The vertex classes are:
P
: the whole plant (scale 1)
A
: the axes (scale 2)
O
, M
, I
: the ..., metamers (phytomers) and inflorescences (scale 3)
mtgplant1$description
: data frame with 8 rows and 4 columns (factors) named LEFT
, RIGHT
, RELTYPE
and MAX
.
mtgplant1$features
: data frame with 13 rows and 2 columns (factors) named NAME
and TYPE
.
mtgplant1$topology
: data frame with 88 rows and 4 columns:
order1
, order2
and order3
(factors): the codes of the vertices, as they are found in the MTG table of the MTG file. The column on which a code appears gives the branching order of the corresponding vertex.
vertex
(character): the same codes of vertices, on a single column.
mtgplant1$coordinates
: data frame with 86 rows and 6 columns (numeric) named XX
, YY
and ZZ
: cartesian coordinates of the vertices, and AA
, BB
and CC
: another coordinate system.
mtgplant1$P
, mtgplant1$A
, mtgplant1$M
, mtgplant1$I
: data frames of the features on the vertices (all numeric).
This object of class foldermtg
can be built by reading the data from an MTG file (see examples).
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
: to read an MTG file and build an object of class foldermtg.
mtgplant2
: another example of such data.
data(mtgplant1)
print(mtgplant1)
# To read these data from a MTG file:
mtgfile1 <- system.file("extdata/plant1.mtg", package = "dad")
mtgplant1 <- read.mtg(mtgfile1)
print(mtgplant1)
foldermtg
These data provide the topology of a bushy plant.
data("mtgplant2")
This object of class foldermtg
is a list of 9 data frames:
mtgplant2$classes
: data frame with 6 rows and 5 columns named
SYMBOL
(factor: the classes of the vertices), SCALE
(integer: the scale at which they appear), DECOMPOSITION
(factor), INDEXATION
(factor) and DEFINITION
(factor).
The vertex classes are:
P
: the whole plant (scale 1)
A
: the axes (scale 2)
F
, I
: the flowers and internodes (scale 3)
mtgplant2$description
: data frame with 4 rows and 4 columns (factors) named LEFT
, RIGHT
, RELTYPE
and MAX
.
mtgplant2$features
: data frame with 9 rows and 2 columns (factors) named NAME
and TYPE
.
mtgplant2$topology
: data frame with 14 rows and 3 columns:
order1
and order2
(factors): the codes of the vertices, as they are found in the MTG table of the MTG file. The column on which a code appears gives the branching order of the corresponding vertex.
vertex
(character): the same codes of vertices, on a single column.
mtgplant2$coordinates
: data frame with 0 rows and 0 columns (there are no spatial coordinates in these MTG data).
mtgplant2$P
, mtgplant2$A
, mtgplant2$F
and mtgplant2$I
: data frames of the features on the vertices (all numeric).
This object of class foldermtg
can be built by reading the data from an MTG file (see examples).
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
: to read an MTG file and build an object of class foldermtg.
mtgplant1
: another example of such data.
data(mtgplant2)
print(mtgplant2)
# To read these data from a MTG file:
mtgfile2 <- system.file("extdata/plant2.mtg", package = "dad")
mtgplant2 <- read.mtg(mtgfile2)
print(mtgplant2)
Computes the rank of the vertices contained in an object of class foldermtg
. For vertex sequences resulting from the decomposition of other vertices, the ranks of the vertices making up each sequence are computed from the beginning of the sequence or from its end. These ranks can be absolute or relative.
For example: ranks of the phytomers and inflorescences in each stem.
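A minimal sketch of absolute and relative ranks within sequences, assuming the vertices are already listed in sequence order and that each one is labelled with its parent (for instance its stem). The relative rank is assumed here to be the absolute rank divided by the sequence length; the package's exact definition may differ. seq_rank is a hypothetical helper.

```r
# Rank of each vertex within its sequence (vertices sharing the same parent),
# assuming the vertices are listed in sequence order
seq_rank <- function(parent, from = c("origin", "end"), relative = FALSE) {
  from <- match.arg(from)
  idx <- seq_along(parent)
  len <- ave(idx, parent, FUN = length)       # length of each sequence
  r <- ave(idx, parent, FUN = seq_along)      # 1, 2, ... from the origin
  if (from == "end") r <- len - r + 1         # count from the end instead
  if (relative) r / len else r                # assumed relative rank: r / length
}

parent <- c("A1", "A1", "A1", "A2", "A2")
seq_rank(parent)
seq_rank(parent, from = "end")
seq_rank(parent, relative = TRUE)
```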
mtgrank(x, classe, parent.class = NULL, sibling.classes = NULL, relative = FALSE, from = c("origin", "end"), rank.name = "Rank", display = FALSE)
x |
an object of class |
classe |
character. The class of the vertices for which the ranks are computed. |
parent.class |
character. The class of the parent entities of those for which the ranks are computed. If omitted, the entities of scale |
sibling.classes |
character vector. The classes of vertices appearing at the same scale as If omitted, only the vertices of class |
relative |
logical. If |
from |
character. It can be If |
rank.name |
character. Name of the rank column that is appended to |
display |
logical. If |
If the branching orders of the entities given by classe
, parent.class
and, if relevant, sibling.classes
are not contained in x
, mtgrank()
uses mtgorder
to compute them. The ranks are appended to the data frames containing the vertices (one data frame per class) and the values of their corresponding features.
Returns an object of class foldermtg
, that is a list of data frames.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg
: reads an MTG file and builds an object of class foldermtg
.
mtgfile <- system.file("extdata/plant1.mtg", package = "dad")
xmtg <- read.mtg(mtgfile)
ymtg <- mtgrank(xmtg, "M")
print(ymtg)
mtgrank(xmtg, "M", display = TRUE)
mtgrank(xmtg, "M", parent.class = "A", display = TRUE)
mtgrank(xmtg, "M", parent.class = "A", sibling.classes = c("O", "I"), display = TRUE)
mtgrank(xmtg, "M", relative = TRUE, display = TRUE)
mtgrank(xmtg, "M", from = "origin", display = TRUE)
mtgrank(xmtg, "M", from = "end", display = TRUE)
Applies to an object of class "dstatis"
(see details of the
dstatis.inter
function). Plots the scores.
## S3 method for class 'dstatis' plot(x, nscore = c(1, 2), sub.title = NULL, color = NULL, fontsize.points = 1.5, ...)
x |
object of class |
nscore |
a length 2 numeric vector. The numbers of the score vectors to be plotted. Warning: Its components cannot be greater than the |
sub.title |
string. Subtitle to be added to each graph. |
color |
When provided, the colour of the symbols of each group. Can be a vector with length equal to the number of groups. |
fontsize.points |
Numeric. Expansion of the characters (or symbols) of the groups on the graph. This works as a multiple of |
... |
optional arguments to |
Plots the principal scores returned by the dstatis.inter
function.
A new graphics window is opened for each pair of principal axes defined by the nscore
argument.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18 (1994), 97-119.
dstatis.inter; print.dstatis; interpret.dstatis.
data(roses)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
# Dual STATIS on the covariance matrices
result <- dstatis.inter(rosesf, data.scaled = FALSE, group.name = "rose")
plot(result)
Applies to an object of class fhclustd
(see details of the
fhclustd
function). Plots the dendrogram.
## S3 method for class 'fhclustd' plot(x, labels = NULL, hang = 0.1, check = TRUE, axes = TRUE, frame.plot = FALSE, ann = TRUE, main = "HCA of probability density functions", sub = NULL, xlab = NULL, ylab = "Height", ...)
x |
object of class |
labels , hang , check , axes , frame.plot , ann , main , sub , xlab , ylab
|
Arguments concerning the graphical representation of the dendrogram. See |
... |
Further graphical arguments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data(castles.dated)
xf <- as.folder(castles.dated$stones)
## Not run:
result <- fhclustd(xf)
plot(result)
plot(result, hang = -1)
## End(Not run)
Applies to an object of class "fmdsd"
(see the details section of the
fmdsd
function). Plots the scores.
## S3 method for class 'fmdsd' plot(x, nscore = c(1, 2), main="MDS of probability density functions", sub.title = NULL, color = NULL, fontsize.points = 1.5, ...)
x |
object of class |
nscore |
a length 2 numeric vector. The numbers of the score vectors to be plotted. Warning: Its components cannot be greater than the |
main |
this argument to title has a useful default here. |
sub.title |
string. Subtitle to be added to each graph. |
color |
When provided, the colour of the symbols of each group. Can be a vector with length equal to the number of groups. |
fontsize.points |
Numeric. Expansion of the characters (or symbols) of the groups on the graph. This works as a multiple of |
... |
optional arguments to |
Plots the principal scores returned by the function fmdsd
.
A new graphics window is opened for each pair of principal score vectors defined by the
nscore
argument.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
fmdsd; print.fmdsd; interpret.fmdsd.
data(roses)
x <- roses[,c("Sha","Den","Sym","rose")]
rosesfold <- as.folder(x)
result <- fmdsd(rosesfold)
plot(result)
Applies to an object of class foldert
(called foldert below) that is a list.
Plots the longitudinal evolution of a numeric variable for every individual.
## S3 method for class 'foldert' plot(x, which, na.inter = TRUE, type = "l", ylim = NULL, ylab = which, main = "", ...)
x |
object of class |
which |
character. Name of a column of the data frames of For each element |
na.inter |
logical. If |
type |
character string (length 1 vector) or vector of 1-character strings (default |
ylim |
range of the y axis. |
ylab |
a label for the |
main |
an overall title for the plot: see |
... |
optional arguments to |
Internally, plot.foldert builds a matrix mdata containing the values of the variable given by the which argument.
The element mdata[ind, t] of this matrix is the value of the variable which for the individual ind at time t: x[[t]][ind, which].
If the ylim
argument is omitted, the range of y
axis is given by range(mdata, na.rm = TRUE)*c(0, 1.2)
.
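The matrix construction and the default y range can be sketched in base R, assuming a foldert behaves like a list of data frames indexed by time whose row names identify the individuals. build_mdata is a hypothetical helper, not the internal code of plot.foldert.

```r
# Build the individuals x times matrix used for a longitudinal plot:
# m[ind, t] holds x[[t]][ind, which], NA when ind is absent at time t
build_mdata <- function(x, which) {
  ids <- unique(unlist(lapply(x, rownames)))
  m <- matrix(NA_real_, nrow = length(ids), ncol = length(x),
              dimnames = list(ids, names(x)))
  for (t in seq_along(x)) {
    df <- x[[t]]
    m[rownames(df), t] <- df[[which]]
  }
  m
}

x <- list(
  t1 = data.frame(nflowers = c(1, 2), row.names = c("i1", "i2")),
  t2 = data.frame(nflowers = c(3, 4), row.names = c("i1", "i3"))
)
mdata <- build_mdata(x, "nflowers")
# Default y range described above
ylim <- range(mdata, na.rm = TRUE) * c(0, 1.2)
```

Because the minimum is multiplied by 0, this default lower limit is always 0, which suits non-negative variables such as flower counts. The matrix can then be drawn with, for instance, matplot(t(mdata), type = "l", ylim = ylim).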
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
foldert
: object of class foldert
.
as.foldert.data.frame
: build an object of class foldert
from a data frame.
as.foldert.array
: build an object of class foldert
from an array.
data(floribundity)
ftflor <- foldert(floribundity, cols.select = "union", rows.select = "union")
plot(ftflor, which = "nflowers", ylab = "Number of flowers per plant", main = "Floribundity of rosebushes, 2010, Angers (France)")
Applies to an object of class "fpcad"
(see details of the
fpcad
function). Plots the scores.
## S3 method for class 'fpcad' plot(x, nscore = c(1, 2), main = "PCA of probability density functions", sub.title = NULL, color = NULL, fontsize.points = 1.5, ...)
x |
object of class |
nscore |
a length 2 numeric vector. The numbers of the score vectors to be plotted. Warning: Its components cannot be greater than the |
main |
this argument to title has a useful default here. |
sub.title |
string. Subtitle to be added to each graph. |
color |
When provided, the colour of the symbols of each group. Can be a vector with length equal to the number of groups. |
fontsize.points |
Numeric. Expansion of the characters (or symbols) of the groups on the graph. This works as a multiple of |
... |
optional arguments to |
Plots the principal scores returned by the fpcad
function.
A new graphics window is opened for each pair of principal axes defined by the nscore
argument.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
fpcad; print.fpcad; interpret.fpcad.
data(roses)
rosefold <- as.folder(roses[,c("Sha","Den","Sym","rose")])
result <- fpcad(rosefold)
plot(result)
Applies to an object of class "fpcat"
(see details of the
fpcat
function). Plots the scores.
## S3 method for class 'fpcat' plot(x, nscore=c(1, 2), main = "PCA of probability density functions", sub.title = NULL, ...)
x |
object of class |
nscore |
numeric or length 2 numeric vector. If it is a length 2 numeric vector (default), it contains the numbers of the score vectors to be plotted. If it is a single value, it is the number of the score which is plotted against time. Warning: The components of |
main |
this argument to title has a useful default here. |
sub.title |
string. Subtitle to be added to each graph. |
... |
optional arguments to |
Plots:
if nscore
is a length 2 vector (default): the principal scores returned by the fpcat
function with arrows from the point corresponding to each time to the next one.
if nscore
is a single value: the corresponding principal score plotted against time, with arrows from each time to the next one.
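Both displays can be sketched with base graphics, given a matrix of principal scores with one row per time. plot_trajectory is a hypothetical helper written for illustration; it does not reproduce the layout of plot.fpcat or its handling of the fpcat object.

```r
# Draw principal scores as a trajectory, with an arrow from each time to the next
plot_trajectory <- function(scores, nscore = c(1, 2)) {
  s <- as.matrix(scores)
  n <- nrow(s)
  labs <- if (is.null(rownames(s))) seq_len(n) else rownames(s)
  if (length(nscore) == 2) {
    # score nscore[2] against score nscore[1]
    xy <- s[, nscore, drop = FALSE]
    xlab <- paste("Score", nscore[1])
    ylab <- paste("Score", nscore[2])
  } else {
    # a single score against the time index
    xy <- cbind(seq_len(n), s[, nscore])
    xlab <- "Time index"
    ylab <- paste("Score", nscore)
  }
  plot(xy, xlab = xlab, ylab = ylab, pch = 16)
  text(xy, labels = labs, pos = 3)
  if (n > 1) {
    arrows(xy[-n, 1], xy[-n, 2], xy[-1, 1], xy[-1, 2], length = 0.1)
  }
  invisible(xy)
}

sc <- matrix(c(0, 1, 2, 0, 1, 0), ncol = 2,
             dimnames = list(c("t1", "t2", "t3"), c("PC1", "PC2")))
```

Calling plot_trajectory(sc) gives the first display and plot_trajectory(sc, nscore = 1) the second.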
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01"))
x1 <- data.frame(z1=rnorm(6,1,5), z2=rnorm(6,3,3))
x2 <- data.frame(z1=rnorm(6,4,6), z2=rnorm(6,5,2))
x3 <- data.frame(z1=rnorm(6,7,2), z2=rnorm(6,8,4))
x4 <- data.frame(z1=rnorm(6,9,3), z2=rnorm(6,10,2))
ft <- foldert(x1, x2, x3, x4, times = times, rows.select="intersect")
print(ft)
result <- fpcat(ft)
plot(result)
plot(result, nscore = c(1, 2))
plot(result, nscore = 1)
plot(result)
Applies to an object of class hclustdd
(see details of the
hclustdd
function). Plots the dendrogram.
## S3 method for class 'hclustdd' plot(x, labels = NULL, hang = 0.1, check = TRUE, axes = TRUE, frame.plot = FALSE, ann = TRUE, main = "HCA of probability density functions", sub = NULL, xlab = NULL, ylab = "Height", ...)
x |
object of class |
labels , hang , check , axes , frame.plot , ann , main , sub , xlab , ylab
|
Arguments concerning the graphical representation of the dendrogram. See |
... |
Further graphical arguments. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data(dspg)
xl = dspg
result <- hclustdd(xl)
plot(result)
plot(result, hang = -1)
Applies to an object of class "mdsdd"
(see the details section of the
mdsdd
function). Plots the scores.
## S3 method for class 'mdsdd' plot(x, nscore = c(1, 2), main="MDS of probability density functions", sub.title = NULL, color = NULL, fontsize.points = 1.5, ...)
x |
object of class |
nscore |
a length 2 numeric vector. The numbers of the score vectors to be plotted. Warning: Its components cannot be greater than the |
main |
this argument to title has a useful default here. |
sub.title |
string. Subtitle to be added to each graph. |
color |
When provided, the colour of the symbols of each group. Can be a vector with length equal to the number of groups. |
fontsize.points |
Numeric. Expansion of the characters (or symbols) of the groups on the graph. This works as a multiple of |
... |
optional arguments to |
Plots the principal scores returned by the function mdsdd
.
A new graphics window is opened for each pair of principal score vectors defined by the
nscore
argument.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
mdsdd; print.mdsdd; interpret.mdsdd.
# INSEE (France): Diploma x Socio professional group, seven years.
data(dspg)
xlista = dspg
a <- mdsdd(xlista)
plot(a)
Plots a set of numeric variables vs. another set and prints the pairwise correlations. It uses the ggplot2 package.
plotframes(x, y, xlab = NULL, ylab = NULL, font.size = 12, layout = NULL)
x |
data frame (can also be a tibble). Variables on x coordinates. |
y |
data frame (or tibble). Variables on y coordinates. |
xlab |
a label for the x axis, by default the column names of |
ylab |
a label for the y axis (by default there is no label). |
font.size |
integer. Size of the characters in the strips. |
layout |
numeric vector of length 2 or 3 giving the number of columns, rows, and optionally pages of the lattice. If omitted, the graphs are displayed in 3 rows and 3 columns, on as many pages as required. |
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
require(MASS)
mx <- c(0,0)
vx <- matrix(c(1,0,0,1),ncol = 2)
my <- c(0,1)
vy <- matrix(c(4,1,1,9),ncol = 2)
x <- as.data.frame(mvrnorm(n = 10, mu = mx, Sigma = vx))
y <- as.data.frame(mvrnorm(n = 10, mu = my, Sigma = vy))
colnames(x) <- c("x1", "x2")
colnames(y) <- c("y1", "y2")
plotframes(x, y)
Applies to an object of class "discdd.misclass"
. Prints the numerical results of discdd.misclass
.
## S3 method for class 'discdd.misclass' print(x, dist.print=FALSE, prox.print=FALSE, digits=2, ...)
x |
object of class |
dist.print |
logical. Its default value is |
prox.print |
logical. Its default value is |
digits |
numeric. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
By default, the following are printed: the overall misallocation ratio, the confusion matrix (allocations versus origins) with the misallocation ratio per class, and the data frame whose rows are the groups and whose columns are the origin class, the allocation class and a logical variable indicating misclassification.
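The ratios listed above can be sketched in base R from two vectors giving, for each group, its origin class and its allocation class. The vectors below are hypothetical, and the sketch only mirrors the definitions; it does not reproduce the internals of discdd.misclass.

```r
# Hypothetical origin and allocation classes of six groups
origin     <- factor(c("A", "A", "A", "B", "B", "B"))
allocation <- factor(c("A", "B", "A", "B", "B", "B"), levels = levels(origin))

# Confusion matrix: rows = allocations, columns = origins
confusion <- table(allocation = allocation, origin = origin)
print(confusion)

# Logical indicator of misclassification for each group
misclassed <- allocation != origin

# Overall misallocation ratio: share of groups allocated to a wrong class
overall <- mean(misclassed)

# Misallocation ratio per origin class
per.class <- tapply(misclassed, origin, mean)
print(overall)
print(per.class)
```

Only the second group is misallocated, so the overall ratio is 1/6, the ratio for class A is 1/3 and the ratio for class B is 0.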
If dist.print = TRUE or prox.print = TRUE, the distances or proximity indices (in percent) between groups and classes are displayed.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data("castles.dated")
stones <- castles.dated$stones
periods <- castles.dated$periods
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)
castlefh <- folderh(periods, "castle", stones)
res <- discdd.misclass(castlefh, "period")
print(res)
The print function, applied to an object of class "discdd.predict", prints the numerical results of discdd.predict.
## S3 method for class 'discdd.predict'
print(x, dist.print = TRUE, prox.print = FALSE, digits = 2, ...)
x |
object of class |
dist.print |
logical. If |
prox.print |
logical. Its default value is |
digits |
numerical. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
By default, the following are printed:
if available (that is, if the misclass.ratio argument of discdd.predict was TRUE), the overall misallocation ratio, the confusion matrix (allocations versus origins) and the misallocation ratio per class;
the data frame whose rows are the groups and whose columns are the origin class (NA if not available) and the allocation class.
If dist.print = TRUE or prox.print = TRUE, the distances or proximity indices between groups and classes are displayed.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data(castles.dated)
data(castles.nondated)
stones <- rbind(castles.dated$stones, castles.nondated$stones)
periods <- rbind(castles.dated$periods, castles.nondated$periods)
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)
castlesfh <- folderh(periods, "castle", stones)
result <- discdd.predict(castlesfh, "period")
print(result)
print(result, prox.print = TRUE)
Applies to an object of class "dstatis". Prints the numeric results returned by the dstatis.inter function.
## S3 method for class 'dstatis'
print(x, mean.print = FALSE, var.print = FALSE, cor.print = FALSE, skewness.print = FALSE, kurtosis.print = FALSE, digits = 2, ...)
x |
object of class |
mean.print |
logical. If |
var.print |
logical. If |
cor.print |
logical. If |
skewness.print |
logical. If |
kurtosis.print |
logical. If |
digits |
numeric. Number of significant digits for the display of numeric results. |
... |
optional arguments to |
By default, the following are printed: the inertia explained by the first nb.values principal components (see dstatis.inter), the contributions and the qualities of representation of the densities along the first nb.factors principal components (see dstatis.inter), and the principal scores.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18, 97-119.
dstatis.inter; plot.dstatis; interpret.dstatis; print.dstatis.
data(roses)
rosesf <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
# Dual STATIS on the covariance matrices
result <- dstatis.inter(rosesf, data.scaled = FALSE, group.name = "rose")
print(result)
Applies to an object of class "fdiscd.misclass". Prints the numerical results of fdiscd.misclass.
## S3 method for class 'fdiscd.misclass'
print(x, dist.print = FALSE, prox.print = FALSE, digits = 2, ...)
x |
object of class |
dist.print |
logical. Its default value is |
prox.print |
logical. Its default value is |
digits |
numeric. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
By default, the following are printed: the overall misallocation ratio, the confusion matrix (allocations versus origins) with the misallocation ratio per class, and the data frame whose rows are the groups and whose columns are the origin class, the allocation class and a logical variable indicating misclassification.
If dist.print = TRUE or prox.print = TRUE, the distances or proximity indices (in percent) between groups and classes are displayed.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an approach. Computational Statistics & Data Analysis, 47, 823-843.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens, Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
castlesfh <- folderh(castles.dated$periods, "castle", castles.dated$stones)
result <- fdiscd.misclass(castlesfh, "period")
print(result)
print(result, dist.print = TRUE)
print(result, prox.print = TRUE)
The print function, applied to an object of class "fdiscd.predict", prints the numerical results of fdiscd.predict.
## S3 method for class 'fdiscd.predict'
print(x, dist.print = TRUE, prox.print = FALSE, digits = 2, ...)
x |
object of class |
dist.print |
logical. If |
prox.print |
logical. Its default value is |
digits |
numerical. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
By default, the following are printed:
if available (that is, if the misclass.ratio argument of fdiscd.predict was TRUE), the overall misallocation ratio, the confusion matrix (allocations versus origins) and the misallocation ratio per class;
the data frame whose rows are the groups and whose columns are the origin class (NA if not available) and the allocation class.
If dist.print = TRUE or prox.print = TRUE, the distances or proximity indices between groups and classes are displayed.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an approach. Computational Statistics & Data Analysis, 47, 823-843.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens, Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
data(castles.dated)
data(castles.nondated)
castles.stones <- rbind(castles.dated$stones, castles.nondated$stones)
castles.periods <- rbind(castles.dated$periods, castles.nondated$periods)
castlesfh <- folderh(castles.periods, "castle", castles.stones)
result <- fdiscd.predict(castlesfh, "period")
print(result)
print(result, prox.print = TRUE)
The print function, applied to an object of class "fhclustd", prints the numerical results of fhclustd.
## S3 method for class 'fhclustd'
print(x, dist.print = FALSE, digits = 2, ...)
x |
object of class |
dist.print |
logical. If |
digits |
numerical. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
If dist.print = TRUE, the distances between groups are displayed.
By default, the result of the clustering is printed. The display is the same as that of the print.hclust function.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data(castles.dated)
xf <- as.folder(castles.dated$stones)
## Not run:
result <- fhclustd(xf)
print(result)
print(result, dist.print = TRUE)
## End(Not run)
Applies to an object of class "fmdsd". Prints the numeric results returned by the fmdsd function.
## S3 method for class 'fmdsd'
print(x, mean.print = FALSE, var.print = FALSE, cor.print = FALSE, skewness.print = FALSE, kurtosis.print = FALSE, digits = 2, ...)
x |
object of class |
mean.print |
logical. If |
var.print |
logical. If |
cor.print |
logical. If |
skewness.print |
logical. If |
kurtosis.print |
logical. If |
digits |
numeric. Number of significant digits for the display of numeric results. |
... |
optional arguments to |
By default, the following are printed: the inertia explained by the first nb.values coordinates (see fmdsd) and the first nb.factors coordinates of the densities (see fmdsd).
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
fmdsd; plot.fmdsd; interpret.fmdsd; print.
data(roses)
x <- roses[, c("Sha", "Den", "Sym", "rose")]
rosesfold <- as.folder(x)
result <- fmdsd(rosesfold)
print(result)
print(result, mean.print = TRUE)
foldermtg
The print function, applied to an object of class "foldermtg", prints an MTG (Multiscale Tree Graph) folder, as returned by the foldermtg function.
## S3 method for class 'foldermtg'
print(x, classes = TRUE, description = FALSE, features = TRUE, topology = FALSE, coordinates = FALSE, ...)
x |
an object of class |
classes |
logical. If |
description |
logical. If |
features |
logical. If |
topology |
logical. If |
coordinates |
logical. If |
... |
optional arguments to |
If classes, description or features is TRUE, the corresponding data frame is displayed.
If topology = TRUE, the plant structure is displayed; if coordinates = TRUE, the spatial coordinates are displayed.
By default, the data frames containing the features on the vertices per class are printed.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg: reads an MTG file and creates an object of class "foldermtg".
mtgfile1 <- system.file("extdata/plant1.mtg", package = "dad")
xmtg1 <- read.mtg(mtgfile1)
print(xmtg1)
print(xmtg1, topology = TRUE)
print(xmtg1, coordinates = TRUE)
mtgfile2 <- system.file("extdata/plant2.mtg", package = "dad")
xmtg2 <- read.mtg(mtgfile2)
print(xmtg2)
print(xmtg2, topology = TRUE)
print(xmtg2, coordinates = TRUE)
foldert
The print function, applied to an object of class "foldert", prints a foldert, as returned by the foldert or as.foldert function.
## S3 method for class 'foldert'
print(x, ...)
x |
an object of class |
... |
optional arguments to |
The foldert is printed. In any data frame x[[t]] of this foldert, if a row is entirely NA (which means that the corresponding individual was not observed at time t), this row is not printed.
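A minimal sketch of this row-filtering rule, assuming a single hypothetical data frame observed at one time t (this is not the print.foldert source): rows that are entirely NA are dropped before display.

```r
# Hypothetical data frame observed at one time t; individual "i2" was not observed
d <- data.frame(z1 = c(1.2, NA, 3.4), z2 = c(0.5, NA, NA),
                row.names = c("i1", "i2", "i3"))

# Count the non-missing values of each row and keep rows with at least one
observed <- d[rowSums(!is.na(d)) > 0, , drop = FALSE]
print(observed)
```

Row "i3" is kept even though z2 is missing, because only rows that are NA in every column are hidden.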
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: object of class foldert.
as.foldert.data.frame: build an object of class foldert from a data frame.
as.foldert.array: build an object of class foldert from a -array.
data(floribundity)
ft <- foldert(floribundity, cols.select = "union", rows.select = "union")
print(ft)
Applies to an object of class "fpcad". Prints the numeric results returned by the fpcad function.
## S3 method for class 'fpcad'
print(x, mean.print = FALSE, var.print = FALSE, cor.print = FALSE, skewness.print = FALSE, kurtosis.print = FALSE, digits = 2, ...)
x |
object of class |
mean.print |
logical. If |
var.print |
logical. If |
cor.print |
logical. If |
skewness.print |
logical. If |
kurtosis.print |
logical. If |
digits |
numeric. Number of significant digits for the display of numeric results. |
... |
optional arguments to |
By default, the following are printed: the inertia explained by the first nb.values principal components (see fpcad), the contributions and the qualities of representation of the densities along the first nb.factors principal components (see fpcad), and the principal scores.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
fpcad; plot.fpcad; interpret.fpcad; print.
data(roses)
rosefold <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
result <- fpcad(rosefold)
print(result)
print(result, mean.print = TRUE)
Applies to an object of class "fpcat". Prints the numeric results returned by the fpcat function.
## S3 method for class 'fpcat'
print(x, mean.print = FALSE, var.print = FALSE, cor.print = FALSE, skewness.print = FALSE, kurtosis.print = FALSE, digits = 2, ...)
x |
object of class |
mean.print |
logical. If |
var.print |
logical. If |
cor.print |
logical. If |
skewness.print |
logical. If |
kurtosis.print |
logical. If |
digits |
numeric. Number of significant digits for the display of numeric results. |
... |
optional arguments to |
By default, the following are printed: the vector of observation times (numeric, ordered factor or object of class "Date"), the inertia explained by the first nb.values principal components (see fpcat), the contributions and the qualities of representation of the densities along the first nb.factors principal components (see fpcat), and the principal scores.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
fpcat; plot.fpcat; print.
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01"))
x1 <- data.frame(z1 = rnorm(6, 1, 5), z2 = rnorm(6, 3, 3))
x2 <- data.frame(z1 = rnorm(6, 4, 6), z2 = rnorm(6, 5, 2))
x3 <- data.frame(z1 = rnorm(6, 7, 2), z2 = rnorm(6, 8, 4))
x4 <- data.frame(z1 = rnorm(6, 9, 3), z2 = rnorm(6, 10, 2))
ft <- foldert(x1, x2, x3, x4, times = times, rows.select = "intersect")
print(ft)
result <- fpcat(ft)
print(result)
print(result, mean.print = TRUE, var.print = TRUE)
The print function, applied to an object of class "hclustdd", prints the numerical results of hclustdd.
## S3 method for class 'hclustdd'
print(x, dist.print = FALSE, digits = 2, ...)
x |
object of class |
dist.print |
logical. If |
digits |
numerical. Number of significant digits for the display of numerical results. |
... |
optional arguments to |
If dist.print = TRUE, the distances between groups are displayed.
By default, the result of the clustering is printed. The display is the same as that of the print.hclust function.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
data(dspg)
xl <- dspg
result <- hclustdd(xl)
print(result)
print(result, dist.print = TRUE)
Applies to an object of class "mdsdd". Prints the numeric results returned by the mdsdd function.
## S3 method for class 'mdsdd'
print(x, joint = FALSE, margin1 = FALSE, margin2 = FALSE, association = FALSE, ...)
x |
object of class |
joint |
logical. If |
margin1 |
logical. If |
margin2 |
logical. If |
association |
logical. If |
... |
optional arguments to |
By default, the following are printed: the inertia explained by the first nb.values coordinates (see mdsdd) and the first nb.factors coordinates of the densities (see mdsdd).
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard
mdsdd; plot.mdsdd; interpret.mdsdd
# INSEE (France): Diploma x Socio professional group, seven years.
data(dspg)
xlista <- dspg
a <- mdsdd(xlista)
print(a, joint = TRUE, margin1 = TRUE, margin2 = TRUE)
Reads an MTG (Multiscale Tree Graph) file and returns an object of class foldermtg, that is, a list of data frames (see Details).
read.mtg(file, ...)
file |
character. Path of the MTG file. |
... |
optional arguments to |
Recall that an MTG file is a text file that can be opened with a spreadsheet (Excel, LibreOffice Calc...). It contains four tables:
CLASSES: In this table the first column, named SYMBOL, contains the symbolic character denoting each botanical entity (or vertex class, plant component...) used in the MTG (for example, P for plant, A for axis...). The second column, named SCALE, gives the scale at which each entity appears in the MTG (for example, 1 for P, 2 for axis...).
DESCRIPTION: This table displays the relations between the vertices: + (branching relationship) or < (successor relationship).
FEATURES: This table contains the features that can be attached to the vertices and their types: INT (integer), REAL (real numbers), STRING (character)...
MTG: This table describes the plant topology, that is the vertices (one vertex per row) and their relations, the spatial coordinates of each vertex and the values taken by each vertex on the above listed features.
Each vertex is labelled by its class, designating its botanical entity, and its index, designating its position among its immediate neighbours having the same scale. Each vertex label is preceded by + or < (see above), or by the symbol / (decomposition relationship), which means that the corresponding vertex is the first vertex of the decomposition of the vertex preceding the /.
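To make the label syntax concrete, here is a hypothetical base-R helper (not part of dad; read.mtg does the actual parsing) that splits labels of the form relation symbol + class symbol + index into their parts and names the relation. The labels themselves are made up for illustration.

```r
# Hypothetical vertex labels: relation symbol, class symbol, index
labels <- c("/P1", "/A1", "/M1", "<M2", "+A1")

# Meaning of each relation symbol, as described above
relations <- c("/" = "decomposition", "<" = "successor", "+" = "branching")

# regexec() captures the three parts; regmatches() extracts them
parts <- regmatches(labels, regexec("^([/<+])([A-Za-z]+)([0-9]+)$", labels))

parsed <- data.frame(
  relation = unname(relations[vapply(parts, `[`, character(1), 2)]),
  class    = vapply(parts, `[`, character(1), 3),
  index    = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
print(parsed)
```

A label that does not match the pattern yields NA in every column rather than an error.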
Notice that the column number of a vertex matches its branching order. The vertices of scale k resulting from the decomposition of a vertex of scale k-1, called the parent vertex, have the same order as the parent vertex.
See the example below.
read.mtg returns an object, say x, of class foldermtg, that is, a list of at least 6 data frames:
classes |
the table |
description |
the table |
features |
the table |
topology |
data frame containing the first columns of the If the |
coordinates |
data frame of the spatial coordinates of the entities. It has six columns: |
The sixth and following elements are nclass data frames, nclass being the number of classes in the MTG file. Each data frame matches a vertex class, such as "P" (plant), "A" (axes) or "M" (metamers or phytomers), and contains the features of the corresponding vertices.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
mtgfile1 <- system.file("extdata/plant1.mtg", package = "dad")
x1 <- read.mtg(mtgfile1)
print(x1)
mtgfile2 <- system.file("extdata/plant2.mtg", package = "dad")
x2 <- read.mtg(mtgfile2)
print(x2)
Remove some columns in all data frames of a folder.
rmcol.folder(object, name)
object |
object of class |
name |
character vector. The names of the columns to be removed in each data frame of the folder. |
A folder with the same number of elements as object. Its k-th element is a data frame whose columns are the columns of object[[k]], except those given by name.
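The semantics described above can be mimicked on a plain list of data frames, used here as a stand-in for a folder. The helper name drop_cols is hypothetical and is not a dad function; it only illustrates the rule "keep every column except those named".

```r
data(iris)
# A plain list of data frames, one per species (stand-in for a folder)
fold <- split(iris[, 1:4], iris$Species)

# Drop the named columns from every data frame of the list
drop_cols <- function(folder, name) {
  lapply(folder, function(d) d[, setdiff(names(d), name), drop = FALSE])
}

res <- drop_cols(fold, c("Petal.Length", "Petal.Width"))
str(res)
```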
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: object of class folder.
getcol.folder: select columns in all elements of a folder.
getrow.folder: select rows in all elements of a folder.
rmrow.folder: remove rows in all elements of a folder.
data(iris)
iris.fold <- as.folder(iris, "Species")
rmcol.folder(iris.fold, c("Petal.Length", "Petal.Width"))
Remove some columns in all data frames of a foldert.
rmcol.foldert(object, name)
object |
object of class |
name |
character vector. The names of the columns to be removed in each data frame of the foldert. |
A foldert with the same number of elements as object. Its k-th element is a data frame whose columns are the columns of object[[k]], except those given by name.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: object of class foldert.
getcol.foldert: select columns in all elements of a foldert.
getrow.foldert: select rows in all elements of a foldert.
rmrow.foldert: remove rows in all elements of a foldert.
data(floribundity)
ft0 <- foldert(floribundity, cols.select = "union", rows.select = "union")
ft0
rmcol.foldert(ft0, c("area"))
Remove some rows in all data frames of a folder.
rmrow.folder(object, name)
object |
object of class |
name |
character vector. The names of the rows to be removed in each data frame of the folder. |
A folder with the same number of elements as object. Its k-th element is a data frame whose rows are the rows of object[[k]], except those given by name.
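Likewise, a hedged sketch of row removal by row name on a plain list of data frames (a stand-in for a folder). The helper drop_rows is hypothetical; it reproduces the example below, which removes the odd-numbered rows of iris from each species.

```r
data(iris)
# split() keeps the original row names "1" to "150"
fold <- split(iris[, 1:4], iris$Species)

# Drop rows whose row names appear in `name`, in every data frame
drop_rows <- function(folder, name) {
  lapply(folder, function(d) d[!(rownames(d) %in% name), , drop = FALSE])
}

res <- drop_rows(fold, as.character(seq(1, 150, by = 2)))
sapply(res, nrow)
```

Each species occupies 50 consecutive rows, so 25 rows remain per species.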
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: object of class folder.
getrow.folder: select rows in all elements of a folder.
getcol.folder: select columns in all elements of a folder.
rmcol.folder: remove columns in all elements of a folder.
data(iris)
iris.fold <- as.folder(iris, "Species")
rmrow.folder(iris.fold, as.character(seq(1, 150, by = 2)))
Remove some rows in all data frames of a foldert.
rmrow.foldert(object, name)
object |
object of class |
name |
character vector. The names of the rows to be removed in each data frame of the foldert. |
A foldert with the same number of elements as object. Its k-th element is a data frame whose rows are the rows of object[[k]], except those given by name.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: object of class foldert.
getrow.foldert: select rows in all elements of a foldert.
getcol.foldert: select columns in all elements of a foldert.
rmcol.foldert: remove columns in all elements of a foldert.
data(floribundity)
ft0 <- foldert(floribundity, cols.select = "union", rows.select = "union")
ft0
rmrow.foldert(ft0, c("rose", c("16", "51")))
The data are extracted from measurements on roses from an agronomic experiment conducted in a greenhouse and outdoors.
data(roseflowers)
roseflowers is a list of two data frames:
roseflowers$variety: this first data frame has 5 rows and 3 columns (factors) named place, rose and variety.
roseflowers$flower: this second data frame has 11 rows and 5 columns named numflower (the order number of the flower), rose, diameter and height (the diameter and height of the flower), and nleaves (the number of leaves of the axis).
data(roseflowers)
summary(roseflowers$variety)
summary(roseflowers$flower)
The data are extracted from measurements on roses from an agronomic experiment conducted in a greenhouse and outdoors.
data("roseleaves")
roseleaves is a list of four data frames:
roseleaves$rose: data frame with 7 rows and 3 columns (factors) named rose, place and variety.
roseleaves$stem: data frame with 12 rows and 5 columns named rose, stem, date, order (the ramification order of the stem) and nleaves (the number of leaves of the stem).
roseleaves$leaf: data frame with 35 rows and 5 columns named stem, leaf, rank (the rank of the leaf on the stem), nleaflets and lrachis (the number of leaflets of the leaf and the length of its rachis).
roseleaves$leaflet: data frame with 221 rows and 4 columns named leaf, leaflet, lleaflet and wleaflet (the length and width of the leaflet).
Each row (rose) in roseleaves$rose corresponds to several rows (stems) in roseleaves$stem.
Each row (stem) in roseleaves$stem corresponds to several rows (leaves) in roseleaves$leaf.
Each row (leaf) in roseleaves$leaf corresponds to several rows (leaflets) in roseleaves$leaflet.
data(roseleaves)
summary(roseleaves$rose)
summary(roseleaves$stem)
summary(roseleaves$leaf)
summary(roseleaves$leaflet)
These data are extracted from measurements on rosebushes during a study of leaf and internode expansion dynamics. For four rosebushes, the length of the terminal leaflet and the length of the internode were measured on each metamer on several days, from 24 April 2010 to 19 July 2010.
The metamers which have no leaflets are omitted.
data("rosephytomer")
A data frame with 643 rows (4 plants, 7, 8 or 9 metamers per plant, 37 days of observation) and 6 columns:
date: a POSIXct.
nplant: a factor with levels 113, 114, 118 and 121. Numbers of the plants.
rank: numeric. Rank of the metamer on the stem.
lleaflet, linternode: numeric. Length of the terminal leaflet, length of the internode.
phytomer: factor. Identifiers of the metamers.
Demotes-Mainard, S., Bertheloot, J., Boumaza, R., Huché-Thélier, L., Guéritaine, G., Guérin, V. and Andrieu, B. (2013). Rose bush leaf and internode expansion dynamics: analysis and development of a model capturing interplant variability. Frontiers in Plant Science 4: 418. Doi: 10.3389/fpls.2013.00418
data(rosephytomer)
as.foldert(rosephytomer, method = 1, ind = "phytomer", timecol = "date", same.rows = TRUE)
Sensory data characterising the visual aspect of 10 rosebushes
data(roses)
roses
is a data frame of sensory data with 420 rows (10 products, 14 assessors, 3 sessions) and 17 columns. The first 16 columns are numeric and correspond to 16 visual characteristics of rosebushes. The last column is a factor giving the name of the corresponding rosebush.
Sha:
top sided shape
Den:
foliage thickness
Sym:
plant symmetry
Vgr:
stem vigour
Qrm:
quantity of stems
Htr:
branching level
Qfl:
quantity of flowers
Efl:
staggering of flowering
Mvfl:
flower enhancement
Difl:
flower size
Qfr:
quantity of faded flowers/fruits
Qbt:
quantity of floral buds
Defl:
density of flower petals
Vcfl:
intensity of flower colour
Tfe:
leaf size
Vfe:
darkness of leaf colour
rose:
factor with 10 levels: A, B, C, D, E, F, G, H, I and J.
Boumaza, R., Huché-Thélier, L., Demotes-Mainard, S., Le Coz, E., Leduc, N., Pelleschi-Travier, S., Qannari, E.M., Sakr, S., Santagostini, P., Symoneaux, R., Guérin, V. (2010). Sensory profile and preference analysis in ornamental horticulture: The case of rosebush. Food Quality and Preference, 21, 987-997.
data(roses)
summary(roses)
Computes the skewness coefficient by column of the elements of an object of class folder.
skewness.folder(x, na.rm = FALSE, type = 3)
x | an object of class folder. |
na.rm | logical. Should missing values be omitted from the calculations? (see skewness) |
type | an integer between 1 and 3 (see skewness). |
It uses skewness to compute the skewness coefficient of each numeric column of each element of the folder. If some columns of the data frames are not numeric, there is a warning, and the skewness coefficients are computed on the numeric columns only.
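As a rough illustration of what a per-element skewness computation involves, the following base-R sketch splits a data frame on a grouping column and computes, for each group, the skewness coefficient of every numeric column. It assumes that `type = 3` follows the convention of `e1071::skewness`, namely b1 = m3 / s^3 with s the sample standard deviation (denominator n - 1). The helper names `skew_type3` and `skewness_by_group` are hypothetical; this is not the package's implementation.

```r
# Skewness coefficient, assuming the e1071 "type 3" convention:
# b1 = m3 / s^3, where m3 is the third central moment (denominator n)
# and s is the sample standard deviation (denominator n - 1).
skew_type3 <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x)) return(NA_real_)
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

# Hypothetical helper: one named vector of skewness coefficients per group,
# computed on the numeric columns only (with a warning otherwise).
skewness_by_group <- function(df, group, na.rm = FALSE) {
  parts <- split(df[setdiff(names(df), group)], df[[group]])
  lapply(parts, function(d) {
    is_num <- vapply(d, is.numeric, logical(1))
    if (!all(is_num)) warning("non-numeric columns ignored")
    vapply(d[is_num], skew_type3, numeric(1), na.rm = na.rm)
  })
}

iris_skew <- skewness_by_group(iris, "Species")
print(iris_skew)
```

Each element of the returned list corresponds to one level of the grouping factor, mirroring the one-element-per-data-frame structure of a folder.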
A list whose elements are the skewness coefficients by column of the elements of the folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder to create an object of class folder.
mean.folder, var.folder, cor.folder, kurtosis.folder for other statistics on folder objects.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
iris.skewness <- skewness.folder(iris.fold)
print(iris.skewness)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
roses.skewness <- skewness.folder(roses.fold)
print(roses.skewness)
Calculation of the square root of a positive semi-definite matrix (see Details for the definition of such a matrix).
sqrtmatrix(mat)
mat | numeric matrix. |
The matrix mat must be symmetric and positive semi-definite. Otherwise, there is an error.
The square root of the matrix mat is the positive semi-definite matrix M such that t(M) %*% M = mat.
Do not confuse with sqrt(mat), which returns the square root of each element of mat.
The computation is based on the diagonalisation of mat. Eigenvalues smaller than 10^-16 are treated as null.
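The diagonalisation approach can be sketched in base R as follows. This is an illustrative reimplementation, not the package's code: the function name `sqrt_psd` is hypothetical, the symmetry test uses `isSymmetric()`, and zeroing eigenvalues whose absolute value is below 1e-16 before checking for negative ones is an assumption about how the threshold is applied.

```r
# Square root of a symmetric positive semi-definite matrix via the
# eigendecomposition mat = V diag(lambda) t(V):
# M = V diag(sqrt(lambda)) t(V), so that t(M) %*% M = M %*% M = mat.
sqrt_psd <- function(mat, tol = 1e-16) {
  if (!is.matrix(mat) || !isSymmetric(mat)) {
    stop("'mat' must be a symmetric matrix")
  }
  e <- eigen(mat, symmetric = TRUE)
  lambda <- e$values
  lambda[abs(lambda) < tol] <- 0    # treat tiny eigenvalues as null
  if (any(lambda < 0)) stop("'mat' must be positive semi-definite")
  e$vectors %*% diag(sqrt(lambda), nrow = length(lambda)) %*% t(e$vectors)
}

M2 <- matrix(c(5, 4, 4, 5), nrow = 2)
M <- sqrt_psd(M2)
M                             # approximately matrix(c(2, 1, 1, 2), nrow = 2)
all.equal(t(M) %*% M, M2)     # TRUE, up to floating-point tolerance
```

Here M2 has eigenvalues 9 and 1, so its square root has eigenvalues 3 and 1 with the same eigenvectors, which gives the matrix with 2 on the diagonal and 1 off the diagonal.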
Matrix: the square root of the matrix mat.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
M2 <- matrix(c(5, 4, 4, 5), nrow = 2)
M <- sqrtmatrix(M2)
M
Summarize an object of class folder.
## S3 method for class 'folder'
summary(object, ...)
object | object of class folder. |
... | further arguments passed to or from other methods. |
A list, each element of which contains the summary of the corresponding element of object. This list has an attribute attr(, "same.rows").
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder: object of class folder.
as.folder.data.frame: build an object of class folder from a data frame.
data(iris)
iris.fold <- as.folder(iris, "Species")
summary(iris.fold)
Summarize an object of class folderh.
## S3 method for class 'folderh'
summary(object, ...)
object | object of class folderh. |
... | further arguments passed to or from other methods. |
A list, each element of which contains the summary of the corresponding element of object. This list has an attribute attr(, "keys") (see folderh).
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folderh: object of class folderh.
# First example
mtgfile <- system.file("extdata/plant1.mtg", package = "dad")
x <- read.mtg(mtgfile)
fh1 <- as.folderh(x, classes = c("P", "A", "M"))
summary(fh1)
# Second example
data(roseleaves)
roses <- roseleaves$rose
stems <- roseleaves$stem
leaves <- roseleaves$leaf
leaflets <- roseleaves$leaflet
fh2 <- folderh(roses, "rose", stems, "stem", leaves, "leaf", leaflets)
summary(fh2)
Summary method for S3 class foldermtg.
## S3 method for class 'foldermtg'
summary(object, ...)
object | an object of class foldermtg. |
... |
optional arguments to |
The summary of the data frames containing the vertices of each class and the values of the features on these vertices.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Pradal, C., Godin, C. and Cokelaer, T. (2023). MTG user guide
read.mtg: reads an MTG file and creates an object of class "foldermtg".
mtgfile1 <- system.file("extdata/plant1.mtg", package = "dad")
x1 <- read.mtg(mtgfile1)
summary(x1)
mtgfile2 <- system.file("extdata/plant2.mtg", package = "dad")
x2 <- read.mtg(mtgfile2)
summary(x2)
Summarize an object of class foldert.
## S3 method for class 'foldert'
summary(object, ...)
object | object of class foldert. |
... | further arguments passed to or from other methods. |
A list, each element of which contains the summary of the corresponding element of object. This list has two attributes, attr(, "times") and attr(, "same.rows").
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
foldert: object of class foldert.
as.foldert.data.frame: build an object of class foldert from a data frame.
as.foldert.array: build an object of class foldert from a three-dimensional array.
# 1st example
data(floribundity)
ftflor <- foldert(floribundity, cols.select = "union", rows.select = "union")
summary(ftflor)
Computes the variance matrices of the elements of an object of class folder.
var.folder(x, na.rm = FALSE, use = "everything")
x | an object of class folder. |
na.rm | logical. Should missing values be removed? (see var) |
use | an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs" (see var). |
It uses var to compute the variance matrix of the numeric columns of each element of the folder. If some columns of the data frames are not numeric, there is a warning, and the variances are computed on the numeric columns only.
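The behaviour described above (one variance matrix per element, restricted to numeric columns) can be mimicked in base R by splitting a data frame on its grouping factor. The helper `var_by_group` below is hypothetical and only approximates `var.folder`; in particular, passing `use` straight through to `var()` is an assumption.

```r
# Hypothetical helper: one variance matrix per group, numeric columns only.
var_by_group <- function(df, group, use = "everything") {
  parts <- split(df[setdiff(names(df), group)], df[[group]])
  lapply(parts, function(d) {
    is_num <- vapply(d, is.numeric, logical(1))
    if (!all(is_num)) warning("non-numeric columns ignored")
    var(d[is_num], use = use)   # var() converts the data frame to a matrix
  })
}

iris_vars <- var_by_group(iris, "Species")
iris_vars$setosa    # 4 x 4 variance matrix of the setosa measurements
```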
A list whose elements are the variance matrices of the elements of the folder.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
folder to create an object of class folder.
mean.folder, cor.folder, skewness.folder, kurtosis.folder for other statistics on folder objects.
# First example: iris (Fisher)
data(iris)
iris.fold <- as.folder(iris, "Species")
iris.vars <- var.folder(iris.fold)
print(iris.vars)
# Second example: roses
data(roses)
roses.fold <- as.folder(roses, "rose")
roses.vars <- var.folder(roses.fold)
print(roses.vars)
The data are extracted from measures on roses from an agronomic experiment in a greenhouse and outdoors.
data("varietyleaves")
varietyleaves is an object of class "folderh", that is, a list of two data frames:
varietyleaves$variety: data frame with 31 rows and 2 columns (factors) named rose and variety.
varietyleaves$leaves: data frame with 581 rows and 5 columns named rose, nleaflet (number of leaflets), lrachis (length of the rachis), lleaflet (length of the principal leaflet) and wleaflet (width of the principal leaflet).
data(varietyleaves)
summary(varietyleaves)
The 2-Wasserstein distance between two multivariate (p > 1) or univariate (p = 1) Gaussian densities (see Details).
wasserstein(x1, x2, check = FALSE)
x1 |
a matrix or data frame of |
x2 |
matrix or data frame (or tibble) of |
check |
logical. When |
The Wasserstein distance between the two Gaussian densities is computed by using the wassersteinpar function and the density parameters estimated from samples.
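The estimation step can be sketched as below: the sample mean vector and the sample covariance matrix of each data set are the plug-in parameters of the fitted Gaussian densities. The helper `gauss_params` is hypothetical, and the use of `var()` (denominator n - 1) is an assumption about which covariance estimator the package uses.

```r
# Hypothetical helper: plug-in Gaussian parameters estimated from a sample.
# Rows are observations, columns are variables; a vector is treated as p = 1.
gauss_params <- function(x) {
  x <- as.matrix(x)
  list(mean = colMeans(x), var = var(x))
}

set.seed(1)
x1 <- matrix(rnorm(20), ncol = 2)               # 10 observations, 2 variables
x2 <- matrix(rnorm(30, mean = 1), ncol = 2)     # 15 observations, 2 variables
p1 <- gauss_params(x1)
p2 <- gauss_params(x2)
str(p1)
# With dad installed, the estimates could then be passed on, e.g.:
# wassersteinpar(p1$mean, p1$var, p2$mean, p2$var)
```

With very few observations (for instance fewer rows than columns) the estimated covariance matrix is singular, which is the degenerate situation the warning below refers to.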
Returns the 2-Wasserstein distance between the two probability densities.
Be careful! If check = FALSE and one estimated covariance matrix is degenerate (or one estimated variance is zero in the univariate case), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Petersen, A., Müller, H.-G. (2016). Functional Data Analysis for Density Functions by Transformation to a Hilbert Space. The Annals of Statistics, 44 (1), 183-218. DOI: 10.1214/15-AOS1363
Dowson, D.C., Landau, B.V. (1982). The Fréchet Distance between Multivariate Normal Distributions. Journal of Multivariate Analysis, 12, 450-455.
wassersteinpar: 2-Wasserstein distance between Gaussian densities, given their parameters.
require(MASS)
m1 <- c(0, 0)
v1 <- matrix(c(1, 0, 0, 1), ncol = 2)
m2 <- c(0, 1)
v2 <- matrix(c(4, 1, 1, 9), ncol = 2)
x1 <- mvrnorm(n = 3, mu = m1, Sigma = v1)
x2 <- mvrnorm(n = 5, mu = m2, Sigma = v2)
wasserstein(x1, x2)
The 2-Wasserstein distance between two multivariate (p > 1) or univariate (p = 1) Gaussian densities given their parameters (mean vectors and covariance matrices if the densities are multivariate, or means and variances if univariate) (see Details).
wassersteinpar(mean1, var1, mean2, var2, check = FALSE)
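mean1 | numeric vector: the mean vector of the first Gaussian density (a single number in the univariate case). |
var1 | numeric matrix: the covariance matrix of the first Gaussian density (a single number, its variance, in the univariate case). |
mean2 | numeric vector: the mean vector of the second Gaussian density (a single number in the univariate case). |
var2 | numeric matrix: the covariance matrix of the second Gaussian density (a single number, its variance, in the univariate case). |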
check |
logical. When |
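The mean vectors ($\mu_1$ and $\mu_2$) and variance matrices ($\Sigma_1$ and $\Sigma_2$) given as arguments (mean1, mean2, var1 and var2) are used to compute the 2-Wasserstein distance between the two Gaussian densities $f_1$ and $f_2$, equal to:

$$W_2(f_1, f_2) = \sqrt{\lVert \mu_1 - \mu_2 \rVert^2 + \operatorname{tr}\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\right)^{1/2}\right)}$$

If $p = 1$, with variances $\sigma_1^2$ and $\sigma_2^2$:

$$W_2(f_1, f_2) = \sqrt{(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2}$$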
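A direct base-R transcription of the closed-form 2-Wasserstein distance between Gaussian densities, W2^2 = ||mu1 - mu2||^2 + tr(Sigma1 + Sigma2 - 2 (Sigma1^(1/2) Sigma2 Sigma1^(1/2))^(1/2)), may help check results by hand. It is a sketch, not the package's implementation: the names `psd_sqrt` and `wasserstein_gauss` are hypothetical, matrix square roots are taken by eigendecomposition, and tiny negative eigenvalues caused by round-off are clipped to zero.

```r
# Symmetric square root of a positive semi-definite matrix.
psd_sqrt <- function(m) {
  e <- eigen(m, symmetric = TRUE)
  lambda <- pmax(e$values, 0)          # clip round-off below zero
  e$vectors %*% diag(sqrt(lambda), nrow = length(lambda)) %*% t(e$vectors)
}

# 2-Wasserstein distance between N(mean1, var1) and N(mean2, var2).
# A univariate variance is handled as a 1 x 1 covariance matrix, so the
# p = 1 case reduces to sqrt((mu1 - mu2)^2 + (sigma1 - sigma2)^2).
wasserstein_gauss <- function(mean1, var1, mean2, var2) {
  var1 <- as.matrix(var1)
  var2 <- as.matrix(var2)
  s1 <- psd_sqrt(var1)
  cross <- psd_sqrt(s1 %*% var2 %*% s1)
  d2 <- sum((mean1 - mean2)^2) + sum(diag(var1 + var2 - 2 * cross))
  sqrt(max(d2, 0))
}

m1 <- c(1, 1); v1 <- matrix(c(4, 1, 1, 9), ncol = 2)
m2 <- c(0, 1); v2 <- matrix(c(1, 0, 0, 1), ncol = 2)
wasserstein_gauss(m1, v1, m2, v2)
wasserstein_gauss(0, 1, 3, 4)          # sqrt(10): variances 1 and 4
```

When var2 is the identity matrix, the cross term reduces to the square root of var1, so the first call can be checked against sqrt(16 - 2 * sqrt(13 + 2 * sqrt(35))).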
The 2-Wasserstein distance between two Gaussian densities.
Be careful! If check = FALSE and one covariance matrix is degenerate (multivariate case) or one variance is zero (univariate case), the returned result must not be trusted.
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Petersen, A., Müller, H.-G. (2016). Functional Data Analysis for Density Functions by Transformation to a Hilbert Space. The Annals of Statistics, 44 (1), 183-218. DOI: 10.1214/15-AOS1363
Dowson, D.C., Landau, B.V. (1982). The Fréchet Distance between Multivariate Normal Distributions. Journal of Multivariate Analysis, 12, 450-455.
wasserstein: 2-Wasserstein distance between Gaussian densities estimated from samples.
m1 <- c(1, 1)
v1 <- matrix(c(4, 1, 1, 9), ncol = 2)
m2 <- c(0, 1)
v2 <- matrix(c(1, 0, 0, 1), ncol = 2)
wassersteinpar(m1, v1, m2, v2)