High pass / low pass selection

You have a bunch of strings presented on the screen as options, and you have another input string. You would like to find the best matches for that input among the options on the screen. Plenty of algorithms can score the matches for you, such as Levenshtein ratio, Jaccard distance, Ratcliff/Obershelp, … But once you have a score for each match, how do you actually select the result or results? Do you just pick the highest one? What if that's not the right thing to do because the confidence in the score isn't that good? Suppose you had the following scores:

A - 30%

B - 35%

C - 10%

Do you think B is the correct answer? Maybe you should request some more information. But what is the right threshold? Is 64% a good match? These are some of the questions I have been thinking about lately. I think I found a somewhat acceptable approach that gives you 2 knobs to play with: a 2-pass normalizing filter. The idea is that the first pass keeps recall high and the second pass increases precision. The steps are simple:

Set a low threshold t1
Keep every candidate that scores at or above t1
Normalize the remaining scores by dividing each one by the highest remaining score
Set a high threshold t2
Keep every candidate whose normalized score is at or above t2
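
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that normalization means dividing by the best surviving score (which is what the example tables in this post do) and that the thresholds are inclusive. The function name two_pass_select and the dict-of-scores input are hypothetical choices for the sketch, not part of any library.

```python
def two_pass_select(scores, low=0.3, high=0.8):
    """Return the candidates that survive both passes, best first.

    scores: dict mapping candidate -> raw match score in [0, 1].
    low:    first-pass threshold t1 (keeps recall high).
    high:   second-pass threshold t2, applied to normalized scores.
    """
    # Pass 1: keep everything at or above the low threshold.
    survivors = {item: s for item, s in scores.items() if s >= low}

    # Nothing usable survived (or every survivor scored 0): select nothing.
    top = max(survivors.values(), default=0.0)
    if top <= 0:
        return []

    # Normalize so the best surviving candidate scores exactly 1.0.
    normalized = {item: s / top for item, s in survivors.items()}

    # Pass 2: keep everything at or above the high threshold.
    selected = [item for item, s in normalized.items() if s >= high]
    return sorted(selected, key=lambda item: normalized[item], reverse=True)


print(two_pass_select({"A": 0.8, "B": 0.3, "C": 0.3, "D": 0.1}))  # ['A']
print(two_pass_select({"A": 0.0, "B": 0.3, "C": 0.3, "D": 0.1}))  # ['B', 'C']
```

A result with exactly one item can be accepted directly. More than one item means the top candidates are too close to call and need further disambiguation, and an empty list means nothing scored well enough to consider.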

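Here are two example scenarios and the results each approach produces, first with the 2-pass selection and then with a single-pass selection for comparison.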

2-pass selection

Low: 0.3+ / High: 0.8+

Case 1

“1 high confidence, 2 mid, 1 low”

ITEM	CONFIDENCE
A	    0.8
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold: 0.3+ is [A,B,C]

Normalization step

ITEM	CONFIDENCE
A	    1.0
B	    0.375
C	    0.375

Result of 2nd pass with threshold: 0.8+ is [A]

Chosen item is A

Case 2

“2 mid confidence, 2 low”

ITEM	CONFIDENCE
A	    0
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold 0.3+ is [B,C]

After normalization

ITEM	CONFIDENCE
B	    1
C	    1

Result of 2nd pass with threshold: 0.8+ is [B,C]

Disambiguate further between [B, C]

1-pass selection

Threshold: 0.5+

Case 1

“1 high confidence, 2 mid, 1 low”

ITEM	CONFIDENCE
A	    0.8
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold: 0.5+ is [A]

Chosen item is A.

Case 2

“2 mid confidence, 2 low”

ITEM	CONFIDENCE
A	    0
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold 0.5+ is []

Nothing is selected.
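
For comparison, the single-pass selection is just one inclusive threshold over the raw scores. The sketch below uses the 0.5 threshold from the results above. The name one_pass_select is a hypothetical helper for illustration only.

```python
def one_pass_select(scores, threshold=0.5):
    """Keep every candidate whose raw score is at or above the threshold."""
    return [item for item, s in scores.items() if s >= threshold]


case_1 = {"A": 0.8, "B": 0.3, "C": 0.3, "D": 0.1}
case_2 = {"A": 0.0, "B": 0.3, "C": 0.3, "D": 0.1}

print(one_pass_select(case_1))  # ['A']
print(one_pass_select(case_2))  # []
```

In Case 2 the single pass returns nothing at all, whereas the 2-pass filter still surfaces B and C as candidates worth disambiguating.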

Metadata

First published on 2023-12-06
