Macros | Functions
csr.c File Reference

Various routines with dealing with CSR matrices. More...

#include <GKlib.h>
Include dependency graph for csr.c:

Go to the source code of this file.

Macros

#define OMPMINOPS   50000
 

Functions

void gk_csr_CompactColumns (gk_csr_t *mat)
 
float gk_csr_ComputeSimilarity (gk_csr_t *mat, int i1, int i2, int what, int simtype)
 
void gk_csr_ComputeSquaredNorms (gk_csr_t *mat, int what)
 
void gk_csr_ComputeSums (gk_csr_t *mat, int what)
 
gk_csr_tgk_csr_Create ()
 
void gk_csr_CreateIndex (gk_csr_t *mat, int what)
 
gk_csr_tgk_csr_Dup (gk_csr_t *mat)
 
gk_csr_tgk_csr_ExtractPartition (gk_csr_t *mat, int *part, int pid)
 
gk_csr_tgk_csr_ExtractRows (gk_csr_t *mat, int nrows, int *rind)
 
gk_csr_tgk_csr_ExtractSubmatrix (gk_csr_t *mat, int rstart, int nrows)
 
void gk_csr_Free (gk_csr_t **mat)
 
void gk_csr_FreeContents (gk_csr_t *mat)
 
int gk_csr_GetSimilarRows (gk_csr_t *mat, int nqterms, int *qind, float *qval, int simtype, int nsim, float minsim, gk_fkv_t *hits, int *i_marker, gk_fkv_t *i_cand)
 
void gk_csr_Init (gk_csr_t *mat)
 
gk_csr_tgk_csr_LowFilter (gk_csr_t *mat, int what, int norm, float fraction)
 
void gk_csr_Normalize (gk_csr_t *mat, int what, int norm)
 
gk_csr_tgk_csr_Prune (gk_csr_t *mat, int what, int minf, int maxf)
 
gk_csr_tgk_csr_Read (char *filename, int format, int readvals, int numbering)
 
void gk_csr_Scale (gk_csr_t *mat, int type)
 
void gk_csr_SortIndices (gk_csr_t *mat, int what)
 
gk_csr_t ** gk_csr_Split (gk_csr_t *mat, int *color)
 
gk_csr_tgk_csr_TopKPlusFilter (gk_csr_t *mat, int what, int topk, float keepval)
 
void gk_csr_Write (gk_csr_t *mat, char *filename, int format, int writevals, int numbering)
 
gk_csr_tgk_csr_ZScoreFilter (gk_csr_t *mat, int what, float zscore)
 

Detailed Description

Various routines with dealing with CSR matrices.

Author
George Karypis
Version
$Id: csr.c 13437 2013-01-11 21:54:10Z karypis $ 

Definition in file csr.c.

Macro Definition Documentation

◆ OMPMINOPS

#define OMPMINOPS   50000

Definition at line 12 of file csr.c.

Function Documentation

◆ gk_csr_CompactColumns()

void gk_csr_CompactColumns ( gk_csr_t mat)

Compacts the column-space of the matrix by removing empty columns. As a result of the compaction, the column numbers are renumbered. The compaction operation is done in place and only affects the row-based representation of the matrix. The new columns are ordered in decreasing frequency.

Parameters
matthe matrix whose empty columns will be removed.

Definition at line 1098 of file csr.c.

◆ gk_csr_ComputeSimilarity()

float gk_csr_ComputeSimilarity ( gk_csr_t mat,
int  i1,
int  i2,
int  what,
int  simtype 
)

Computes the similarity between two rows/columns

Parameters
matthe matrix itself. The routine assumes that the indices are sorted in increasing order.
i1is the first row/column,
i2is the second row/column,
whatis either GK_CSR_ROW or GK_CSR_COL indicating the type of objects between the similarity will be computed,
simtypeis the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN
Returns
the similarity between the two rows/columns.

Definition at line 1694 of file csr.c.

◆ gk_csr_ComputeSquaredNorms()

void gk_csr_ComputeSquaredNorms ( gk_csr_t mat,
int  what 
)

Computes the squared of the norms of the rows/columns

Parameters
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which squared norms to compute.

Definition at line 1643 of file csr.c.

◆ gk_csr_ComputeSums()

void gk_csr_ComputeSums ( gk_csr_t mat,
int  what 
)

Computes the sums of the rows/columns

Parameters
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which sums to compute.

Definition at line 1597 of file csr.c.

◆ gk_csr_Create()

gk_csr_t* gk_csr_Create ( )

Allocate memory for a CSR matrix and initializes it

Returns
the allocated matrix. The various fields are set to NULL.

Definition at line 19 of file csr.c.

◆ gk_csr_CreateIndex()

void gk_csr_CreateIndex ( gk_csr_t mat,
int  what 
)

Creates a row/column index from the column/row data.

Parameters
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which index will be created.

Definition at line 1223 of file csr.c.

◆ gk_csr_Dup()

gk_csr_t* gk_csr_Dup ( gk_csr_t mat)

Returns a copy of a matrix.

Parameters
matis the matrix to be duplicated.
Returns
the newly created copy of the matrix.

Definition at line 80 of file csr.c.

◆ gk_csr_ExtractPartition()

gk_csr_t* gk_csr_ExtractPartition ( gk_csr_t mat,
int part,
int  pid 
)

Returns a submatrix corresponding to a specified partitioning of rows.

Parameters
matis the original matrix.
partis the partitioning vector of the rows.
pidis the partition ID that will be extracted.
Returns
the row structure of the newly created submatrix.

Definition at line 230 of file csr.c.

◆ gk_csr_ExtractRows()

gk_csr_t* gk_csr_ExtractRows ( gk_csr_t mat,
int  nrows,
int rind 
)

Returns a submatrix containing a certain set of rows.

Parameters
matis the original matrix.
nrowsis the number of rows to extract.
rindis the set of row numbers to extract.
Returns
the row structure of the newly created submatrix.

Definition at line 191 of file csr.c.

◆ gk_csr_ExtractSubmatrix()

gk_csr_t* gk_csr_ExtractSubmatrix ( gk_csr_t mat,
int  rstart,
int  nrows 
)

Returns a submatrix containint a set of consecutive rows.

Parameters
matis the original matrix.
rstartis the starting row.
nrowsis the number of rows from rstart to extract.
Returns
the row structure of the newly created submatrix.

Definition at line 135 of file csr.c.

◆ gk_csr_Free()

void gk_csr_Free ( gk_csr_t **  mat)

Frees all the memory allocated for matrix.

Parameters
matis the matrix to be freed.

Definition at line 48 of file csr.c.

◆ gk_csr_FreeContents()

void gk_csr_FreeContents ( gk_csr_t mat)

Frees only the memory allocated for the matrix's different fields and sets them to NULL.

Parameters
matis the matrix whose contents will be freed.

Definition at line 63 of file csr.c.

◆ gk_csr_GetSimilarRows()

int gk_csr_GetSimilarRows ( gk_csr_t mat,
int  nqterms,
int qind,
float *  qval,
int  simtype,
int  nsim,
float  minsim,
gk_fkv_t *  hits,
int i_marker,
gk_fkv_t *  i_cand 
)

Finds the n most similar rows (neighbors) to the query using cosine similarity.

Parameters
matthe matrix itself
nqtermsis the number of columns in the query
qindis the list of query columns
qvalis the list of correspodning query weights
simtypeis the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN
nsimis the maximum number of requested most similar rows. If -1 is provided, then everything is returned unsorted.
minsimis the minimum similarity of the requested most similar rows
hitsis the result set. This array should be at least of length nsim.
i_markeris an array of size equal to the number of rows whose values are initialized to -1. If NULL is provided then this array is allocated and freed internally.
i_candis an array of size equal to the number of rows. If NULL is provided then this array is allocated and freed internally.
Returns
the number of identified most similar rows, which can be smaller than the requested number of nnbrs in those cases in which there are no sufficiently many neighbors.

Definition at line 1866 of file csr.c.

◆ gk_csr_Init()

void gk_csr_Init ( gk_csr_t mat)

Initializes the matrix

Parameters
matis the matrix to be initialized.

Definition at line 36 of file csr.c.

◆ gk_csr_LowFilter()

gk_csr_t* gk_csr_LowFilter ( gk_csr_t mat,
int  what,
int  norm,
float  fraction 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight entries whose sum accounts for a certain fraction of the overall weight of the row/column.

Parameters
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
normindicates the norm that will be used to aggregate the weights and possible values are 1 or 2,
fractionis the fraction of the overall norm that will be retained by the kept entries.
Returns
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 759 of file csr.c.

◆ gk_csr_Normalize()

void gk_csr_Normalize ( gk_csr_t mat,
int  what,
int  norm 
)

Normalizes the rows/columns of the matrix to be unit length.

Parameters
matthe matrix itself,
whatindicates what will be normalized and is obtained by specifying GK_CSR_ROW, GK_CSR_COL, GK_CSR_ROW|GK_CSR_COL.
normindicates what norm is to normalize to, 1: 1-norm, 2: 2-norm

Definition at line 1319 of file csr.c.

◆ gk_csr_Prune()

gk_csr_t* gk_csr_Prune ( gk_csr_t mat,
int  what,
int  minf,
int  maxf 
)

Prunes certain rows/columns of the matrix. The prunning takes place by analyzing the row structure of the matrix. The prunning takes place by removing rows/columns but it does not affect the numbering of the remaining rows/columns.

Parameters
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
minfis the minimum number of rows (columns) that a column (row) must be present in order to be kept,
maxfis the maximum number of rows (columns) that a column (row) must be present at in order to be kept.
Returns
the prunned matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 669 of file csr.c.

◆ gk_csr_Read()

gk_csr_t* gk_csr_Read ( char *  filename,
int  format,
int  readvals,
int  numbering 
)

Reads a CSR matrix from the supplied file and stores it the matrix's forward structure.

Parameters
filenameis the file that stores the data.
formatis either GK_CSR_FMT_METIS, GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL specifying the type of the input format. The GK_CSR_FMT_CSR does not contain a header line, whereas the GK_CSR_FMT_BINROW is a binary format written by gk_csr_Write() using the same format specifier.
readvalsis either 1 or 0, indicating if the CSR file contains values or it does not. It only applies when GK_CSR_FMT_CSR is used.
numberingis either 1 or 0, indicating if the numbering of the indices start from 1 or 0, respectively. If they start from 1, they are automatically decreamented during input so that they will start from 0. It only applies when GK_CSR_FMT_CSR is used.
Returns
the matrix that was read.

Definition at line 349 of file csr.c.

◆ gk_csr_Scale()

void gk_csr_Scale ( gk_csr_t mat,
int  type 
)

Applies different row scaling methods.

Parameters
matthe matrix itself,
typeindicates the type of row scaling. Possible values are: GK_CSR_MAXTF, GK_CSR_SQRT, GK_CSR_LOG, GK_CSR_IDF, GK_CSR_MAXTF2.

Definition at line 1386 of file csr.c.

◆ gk_csr_SortIndices()

void gk_csr_SortIndices ( gk_csr_t mat,
int  what 
)

Sorts the indices in increasing order

Parameters
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which set of indices to sort.

Definition at line 1146 of file csr.c.

◆ gk_csr_Split()

gk_csr_t** gk_csr_Split ( gk_csr_t mat,
int color 
)

Splits the matrix into multiple sub-matrices based on the provided color array.

Parameters
matis the original matrix.
coloris an array of size equal to the number of non-zeros in the matrix (row-wise structure). The matrix is split into as many parts as the number of colors. For meaningfull results, the colors should be numbered consecutively starting from 0.
Returns
an array of matrices for each supplied color number.

Definition at line 277 of file csr.c.

◆ gk_csr_TopKPlusFilter()

gk_csr_t* gk_csr_TopKPlusFilter ( gk_csr_t mat,
int  what,
int  topk,
float  keepval 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight top-K entries along each row/column and those entries whose weight is greater than a specified value.

Parameters
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
topkis the number of the highest weight entries to keep.
keepvalis the weight of a term above which will be kept. This is used to select additional terms past the first topk.
Returns
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 901 of file csr.c.

◆ gk_csr_Write()

void gk_csr_Write ( gk_csr_t mat,
char *  filename,
int  format,
int  writevals,
int  numbering 
)

Writes the row-based structure of a matrix into a file.

Parameters
matis the matrix to be written,
filenameis the name of the output file.
formatis one of: GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL.
writevalsis either 1 or 0 indicating if the values will be written or not. This is only applicable when GK_CSR_FMT_CSR is used.
numberingis either 1 or 0 indicating if the internal 0-based numbering will be shifted by one or not during output. This is only applicable when GK_CSR_FMT_CSR is used.

Definition at line 591 of file csr.c.

◆ gk_csr_ZScoreFilter()

gk_csr_t* gk_csr_ZScoreFilter ( gk_csr_t mat,
int  what,
float  zscore 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the terms whose contribution to the total length of the document is greater than a user-splied multiple over the average.

This routine assumes that the vectors are normalized to be unit length.

Parameters
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
zscoreis the multiplicative factor over the average contribution to the length of the document.
Returns
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 1031 of file csr.c.



gtsam
Author(s):
autogenerated on Sat Nov 16 2024 04:09:46