Fixes an unescaped equation in the documentation and an incorrect type used in 2 print statements.
jkrijthe committed Nov 27, 2023
1 parent ca6f630 commit 53fb776
Showing 4 changed files with 5 additions and 4 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -10,6 +10,7 @@ Description: An R wrapper around the fast T-distributed Stochastic
Neighbor Embedding implementation by Van der Maaten (see <https://github.com/lvdmaaten/bhtsne/> for more information on the original implementation).
License: file LICENSE
URL: https://github.com/jkrijthe/Rtsne
+Encoding: UTF-8
Imports:
Rcpp (>= 0.11.0),
stats
2 changes: 1 addition & 1 deletion R/Rtsne.R
@@ -2,7 +2,7 @@
#'
#' Wrapper for the C++ implementation of Barnes-Hut t-Distributed Stochastic Neighbor Embedding. t-SNE is a method for constructing a low dimensional embedding of high-dimensional data, distances or similarities. Exact t-SNE can be computed by setting theta=0.0.
#'
-#' Given a distance matrix \eqn{D} between input objects (which by default, is the euclidean distances between two objects), we calculate a similarity score in the original space p_ij. \deqn{ p_{j | i} = \frac{\exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)}{\sum_{k \neq i} \exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)} } which is then symmetrized using: \deqn{ p_{i j}=\frac{p_{j|i} + p_{i|j}}{2n}.} The \eqn{\sigma} for each object is chosen in such a way that the perplexity of p_{j|i} has a value that is close to the user defined perplexity. This value effectively controls how many nearest neighbours are taken into account when constructing the embedding in the low-dimensional space.
+#' Given a distance matrix \eqn{D} between input objects (which by default, is the euclidean distances between two objects), we calculate a similarity score in the original space: \deqn{ p_{j | i} = \frac{\exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)}{\sum_{k \neq i} \exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)} } which is then symmetrized using: \deqn{ p_{i j}=\frac{p_{j|i} + p_{i|j}}{2n}.} The \eqn{\sigma} for each object is chosen in such a way that the perplexity of \eqn{p_{j|i}} has a value that is close to the user defined perplexity. This value effectively controls how many nearest neighbours are taken into account when constructing the embedding in the low-dimensional space.
#' For the low-dimensional space we use the Cauchy distribution (t-distribution with one degree of freedom) as the distribution of the distances to neighbouring objects:
#' \deqn{ q_{i j} = \frac{(1+ \| y_i-y_j\|^2)^{-1}}{\sum_{k \neq l} 1+ \| y_k-y_l\|^2)^{-1}}.}
#' By changing the location of the objects y in the embedding to minimize the Kullback-Leibler divergence between these two distributions \eqn{ q_{i j}} and \eqn{ p_{i j}}, we create a map that focusses on small-scale structure, due to the asymmetry of the KL-divergence. The t-distribution is chosen to avoid the crowding problem: in the original high dimensional space, there are potentially many equidistant objects with moderate distance from a particular object, more than can be accounted for in the low dimensional representation. The t-distribution makes sure that these objects are more spread out in the new representation.
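
The similarity definitions in the documentation above can be illustrated with a small sketch. It is not the package's implementation: `conditional_p` and `symmetrize` are hypothetical helper names, the bandwidths `sigma` are taken as given rather than found by a perplexity search, and the denominator is assumed to sum over the distances D_ik for k != i, as in the usual t-SNE formulation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper (not part of Rtsne): conditional similarities p_{j|i}
// from a dense n x n distance matrix D, with one fixed bandwidth per point.
// Row i of the result holds p_{j|i} for all j; the diagonal stays 0.
Matrix conditional_p(const Matrix& D, const std::vector<double>& sigma) {
  const std::size_t n = D.size();
  Matrix P(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    double norm = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
      if (k == i) continue;  // the sum runs over k != i
      P[i][k] = std::exp(-D[i][k] * D[i][k] / (2.0 * sigma[i] * sigma[i]));
      norm += P[i][k];
    }
    for (std::size_t k = 0; k < n; ++k) P[i][k] /= norm;
  }
  return P;
}

// Hypothetical helper: symmetrized similarities p_ij = (p_{j|i} + p_{i|j}) / (2n).
Matrix symmetrize(const Matrix& P) {
  const std::size_t n = P.size();
  Matrix S(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      S[i][j] = (P[i][j] + P[j][i]) / (2.0 * static_cast<double>(n));
  return S;
}
```

Each row of the conditional matrix sums to 1, so the symmetrized matrix sums to 1 over all pairs, which is what lets it be compared with \eqn{q_{i j}} through the KL divergence.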
2 changes: 1 addition & 1 deletion man/Rtsne.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions src/Rtsne.cpp
@@ -18,7 +18,7 @@ Rcpp::List Rtsne_cpp(NumericMatrix X, int no_dims, double perplexity,
size_t N = X.ncol(), D = X.nrow();
double * data=X.begin();

-if (verbose) Rprintf("Read the %i x %i data matrix successfully!\n", N, D);
+if (verbose) Rprintf("Read the %zu x %zu data matrix successfully!\n", N, D);
std::vector<double> Y(N * no_dims), costs(N), itercosts(static_cast<int>(std::ceil(max_iter/50.0)));

// Providing user-supplied solution.
@@ -60,7 +60,7 @@ Rcpp::List Rtsne_nn_cpp(IntegerMatrix nn_dex, NumericMatrix nn_dist,
double eta, double exaggeration_factor, unsigned int num_threads) {

size_t N = nn_dex.ncol(), K=nn_dex.nrow(); // transposed - columns are points, rows are neighbors.
-if (verbose) Rprintf("Read the NN results for %i points successfully!\n", N);
+if (verbose) Rprintf("Read the NN results for %zu points successfully!\n", N);
std::vector<double> Y(N * no_dims), costs(N), itercosts(static_cast<int>(std::ceil(max_iter/50.0)));

// Providing user-supplied solution.
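
The switch from `%i` to `%zu` in the two `Rprintf` calls matters because `N`, `D`, and `K` are declared as `size_t`. Passing a `size_t` where a printf-style format expects an `int` is undefined behavior, and on common 64-bit platforms the two types differ in width. The sketch below shows the corrected format string in isolation. It assumes `std::snprintf` as a stand-in for `Rprintf`, which takes the same printf-style format, and `format_dims` is a hypothetical helper, not a function in the package.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the verbose message from Rtsne_cpp.
// std::snprintf stands in for Rprintf here (an assumption for illustration).
std::string format_dims(std::size_t n, std::size_t d) {
  char buf[128];
  // %zu is the conversion specifier for size_t (C99 / C++11 and later).
  std::snprintf(buf, sizeof buf,
                "Read the %zu x %zu data matrix successfully!\n", n, d);
  return std::string(buf);
}
```

For example, `format_dims(150, 4)` yields the string `"Read the 150 x 4 data matrix successfully!\n"`.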