similarity

The similarity utility calculates a score between 0 and 1 representing how similar two strings are, ignoring case. A score of 1 means the strings are identical, while 0 means every character would have to be edited to turn one string into the other.

Implementation

ts
import { assert } from '../function/assert';

/**
 * Calculate the similarity between two strings using the Levenshtein distance algorithm.
 *
 * @example
 * ```ts
 * similarity('abc', 'abc') // 1
 * similarity('a', 'b') // 0
 * similarity('ab', 'ac') // 0.5
 * similarity('doe', 'John Doe') // 0.375 (inputs are lowercased, so only 5 insertions are needed)
 * similarity('abc', 'axc') // 0.6666666666666667
 * similarity('kitten', 'sitting') // 0.5714285714285714
 * ```
 *
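 * @param str1 - The first value (a string, or a number that is converted to a string).
 * @param str2 - The second value (a string, or a number that is converted to a string).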
 *
 * @returns A number between 0 and 1 representing the similarity between the two strings.
 */
export function similarity(str1: unknown, str2: unknown): number {
  assert(
    ['string', 'number'].includes(typeof str1) && ['string', 'number'].includes(typeof str2),
    'Invalid arguments',
    {
      args: { str1, str2 },
      type: TypeError,
    },
  );

  const a = String(str1).toLowerCase();
  const b = String(str2).toLowerCase();

  if (a === b) return 1;
  if (a.length === 0) return b.length === 0 ? 1 : 0;
  if (b.length === 0) return 0;

  // Swap to ensure we use the smaller string for columns (O(min(A,B)) space)
  const [shorter, longer] = a.length < b.length ? [a, b] : [b, a];
  const shorterLength = shorter.length;
  const longerLength = longer.length;

  let prevRow = Array.from({ length: shorterLength + 1 }, (_, i) => i);
  let currRow = new Array(shorterLength + 1);

  for (let i = 1; i <= longerLength; i++) {
    currRow[0] = i;
    for (let j = 1; j <= shorterLength; j++) {
      const cost = longer[i - 1] === shorter[j - 1] ? 0 : 1;
      currRow[j] = Math.min(
        currRow[j - 1] + 1, // insertion
        prevRow[j] + 1, // deletion
        prevRow[j - 1] + cost, // substitution
      );
    }
    // Swap rows for the next iteration (avoid allocation)
    [prevRow, currRow] = [currRow, prevRow];
  }

  // After the loop, the result is in prevRow because of the final swap
  const distance = prevRow[shorterLength];

  return 1 - distance / Math.max(a.length, b.length);
}

Features

  • Isomorphic: Works in both Browser and Node.js.
  • Normalized Output: Returns a float between 0 and 1.
  • Case-Insensitive: Both inputs are lowercased before comparison, so 'ABC' and 'abc' score 1.

API

ts
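function similarity(str1: unknown, str2: unknown): number;

Parameters

  • str1: The first value to compare. Must be a string or a number; numbers are converted to strings.
  • str2: The second value to compare. Must be a string or a number; numbers are converted to strings.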

Returns

  • A similarity score between 0 and 1.

Examples

Basic Comparison

ts
import { similarity } from '@vielzeug/toolkit';

similarity('apple', 'apply'); // 0.8
similarity('kitten', 'sitting'); // ~0.57
similarity('hello', 'world'); // ~0.2 (floating-point result of 1 - 4/5)
similarity('same', 'same'); // 1

Case-Insensitive Matching

ts
import { similarity } from '@vielzeug/toolkit';

const s1 = 'Vielzeug';
const s2 = 'vielzeug';

similarity(s1, s2); // 1, because both inputs are lowercased internally
similarity(s1.toLowerCase(), s2.toLowerCase()); // 1, explicit lowercasing is redundant

Implementation Notes

  • Internally uses the Levenshtein distance algorithm to determine the number of single-character edits required to change one string into another.
  • The score is normalized by dividing the distance by the length of the longer string and subtracting from 1.
  • Throws TypeError if either argument is neither a string nor a number.
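
The distance and normalization steps in these notes can be checked by hand. The sketch below is an illustration only, not the library's code: it swaps the two-row optimization for a plain full-matrix Levenshtein distance, skips lowercasing and argument validation, and the names `levenshtein` and `normalizedScore` are hypothetical.

```ts
// Hypothetical helper: full-matrix Levenshtein distance (not the library's two-row version).
function levenshtein(a: string, b: string): number {
  // dp[i][j] = edits needed to turn the first i chars of a into the first j chars of b
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j] + 1, // deletion
        dp[i - 1][j - 1] + cost, // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Hypothetical helper: normalize by the longer string's length, as the notes describe.
function normalizedScore(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

levenshtein('kitten', 'sitting'); // 3 (k→s, e→i, insert g)
normalizedScore('kitten', 'sitting'); // 1 - 3/7 ≈ 0.571
```

Dividing by the longer length keeps the score within 0 and 1, since the distance can never exceed the number of characters in the longer string.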

See Also

  • search: Use similarity to perform fuzzy searches in arrays.
  • seek: Use similarity to find values in deep objects.
  • truncate: Shorten long strings.
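
As a rough idea of how a similarity score can drive fuzzy matching over an array, here is a minimal sketch. It is not how `search` is implemented; `Scorer`, `rankBySimilarity`, and the default threshold of 0.5 are all assumptions made for illustration. The scorer is passed in as a parameter so the snippet stays self-contained.

```ts
// Hypothetical scorer type: any function returning a score in [0, 1], such as `similarity`.
type Scorer = (a: string, b: string) => number;

// Hypothetical helper: keep candidates scoring at or above `threshold`, best match first.
function rankBySimilarity(
  query: string,
  candidates: string[],
  score: Scorer,
  threshold = 0.5, // arbitrary starting point, tune per use case
): Array<{ value: string; score: number }> {
  return candidates
    .map((value) => ({ value, score: score(query, value) }))
    .filter((entry) => entry.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

With the toolkit installed, passing `similarity` as the `score` argument would be one way to use it, for example `rankBySimilarity('aple', ['apple', 'maple', 'grape'], similarity)`.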