High pass / low pass selection


You have a bunch of strings presented on the screen as options, and you have another input string. You would like to find the best matches among the options on the screen. Several algorithms can score the matches for you, such as Levenshtein ratio, Jaccard distance, Ratcliff/Obershelp, … But once you have a score for each match, how do you actually select the result or results? Do you just pick the highest one? What if that’s not the right thing to do because the confidence in the score isn’t that good? Suppose you had the following scores:

A - 30%

B - 35%

C - 10%
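Scores like these could come from any of the scorers mentioned above. As a minimal sketch, Python's standard-library `difflib.SequenceMatcher` (which is based on the Ratcliff/Obershelp approach) returns a ratio between 0 and 1. The function name `score_options`, the case folding, and the sample strings are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def score_options(query: str, options: list[str]) -> dict[str, float]:
    """Score each on-screen option against the input string, in [0, 1]."""
    folded_query = query.casefold()  # assumption: matching ignores case
    return {
        option: SequenceMatcher(None, folded_query, option.casefold()).ratio()
        for option in options
    }


# Hypothetical on-screen options and a slightly misspelled input.
scores = score_options("setings", ["Settings", "Sign out", "Search"])
for option, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option}: {score:.2f}")
```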

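Do you think B is the correct answer? Maybe you should request some more information. But what is the right threshold? Is 64% a good match? These are some of the questions I have been thinking about lately.

I think I found a somewhat acceptable approach that gives 2 knobs to play with: a 2-pass normalizing filter. The idea is that in the first pass you want to keep recall high, and then in a second pass you want to increase precision. The steps are simple:

1. Keep every candidate whose score is at or above the low threshold.
2. Normalize the remaining scores by dividing each one by the highest remaining score.
3. Keep every candidate whose normalized score is at or above the high threshold.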
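The two-pass filter described here can be sketched in a few lines of Python. This is only a sketch under one assumption: normalization divides each surviving score by the highest surviving score, which is what the numbers in the worked cases suggest (0.3 / 0.8 = 0.375). The name `two_pass_select` and its parameter names are made up for illustration.

```python
def two_pass_select(
    scores: dict[str, float], low: float = 0.3, high: float = 0.8
) -> list[str]:
    """Return the items that survive both passes, in their original order."""
    # Pass 1 (recall): keep every candidate scoring at least `low`.
    survivors = {item: s for item, s in scores.items() if s >= low}
    if not survivors:
        return []

    # Normalization (assumed): rescale so the best survivor scores 1.0.
    top = max(survivors.values())
    if top <= 0:
        return []  # avoids dividing by zero when `low` is 0 and all scores are 0
    normalized = {item: s / top for item, s in survivors.items()}

    # Pass 2 (precision): keep only candidates close to the best one.
    return [item for item, s in normalized.items() if s >= high]


print(two_pass_select({"A": 0.8, "B": 0.3, "C": 0.3, "D": 0.1}))  # ['A']
print(two_pass_select({"A": 0.30, "B": 0.35, "C": 0.10}))         # ['A', 'B']
```

With these thresholds, the opening example (A 30%, B 35%, C 10%) returns both A and B, because A normalizes to about 0.86. That matches the intuition that B alone should not be picked without more information.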

Here are two example scenarios and the output each produces.

2 - pass selection

Low: 0.3+ / High: 0.8+

Case 1

“1 high confidence, 2 mid, 1 low”

ITEM	CONFIDENCE
A	    0.8
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold: 0.3+ is [A,B,C]

Normalization step

ITEM	CONFIDENCE
A	    1.0
B	    0.375
C	    0.375

Result of 2nd pass with threshold: 0.8+ is [A]

Chosen item is A

Case 2

“2 mid confidence, 2 low”

ITEM	CONFIDENCE
A	    0
B	    0.3
C	    0.3
D	    0.1

Result of 1st pass with threshold 0.3+ is [B,C]

After normalization

ITEM	CONFIDENCE
B	    1
C	    1

Result of 2nd pass with threshold: 0.8+ is [B,C]

Disambiguate further between [B, C]
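The cases in this post end in one of three outcomes: a single chosen item, a set that needs further disambiguation, or nothing selected. A small, hypothetical helper (the name `describe_outcome` is an assumption, not part of the approach itself) makes that mapping explicit:

```python
def describe_outcome(selected: list[str]) -> str:
    """Turn the list of surviving items into the next action to take."""
    if not selected:
        # No candidate survived: ask the user for more information.
        return "Nothing is selected."
    if len(selected) == 1:
        # Exactly one candidate survived: it is the answer.
        return f"Chosen item is {selected[0]}"
    # Several candidates are too close to call.
    return f"Disambiguate further between [{', '.join(selected)}]"


print(describe_outcome(["A"]))
print(describe_outcome(["B", "C"]))
print(describe_outcome([]))
```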

1 - pass selection

Threshold: 0.8+

Case 1

“1 high confidence, 2 mid, 1 low”

ITEM	CONFIDENCE
A	    0.8
B	    0.3
C	    0.3
D	    0.1

Result of the single pass with threshold: 0.8+ is [A]

Chosen item is A.

Case 2

“2 mid confidence, 2 low”

ITEM	CONFIDENCE
A	    0
B	    0.3
C	    0.3
D	    0.1

Result of the single pass with threshold: 0.8+ is []

Nothing is selected.
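For comparison, the 1-pass selection is just a single cutoff on the raw scores. A minimal sketch, using the 0.8 threshold stated at the top of this section (the function name `one_pass_select` is an assumption for illustration):

```python
def one_pass_select(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Keep only the items whose raw score reaches the threshold."""
    return [item for item, s in scores.items() if s >= threshold]


case_1 = {"A": 0.8, "B": 0.3, "C": 0.3, "D": 0.1}
case_2 = {"A": 0, "B": 0.3, "C": 0.3, "D": 0.1}

print(one_pass_select(case_1))  # ['A']
print(one_pass_select(case_2))  # []
```

The difference shows up in Case 2: the single cutoff returns nothing, while the 2-pass version surfaces B and C as candidates to disambiguate.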


Metadata

First published on 2023-12-06

Generated on May 29, 2024, 10:01 PM
