Working through the basics before getting onto networks
Published
March 19, 2023
This is an exploration of Bayesian theorem. I’m going to work through the theorem and the derivation of it in different ways. The wikipedia page has the different derivations that I will be working through.
The Equation
The core equation is:
\[
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}
\]
To write this out in English, it says that the probability of \(A\) if \(B\) is true is derivable using the probability of \(B\) if \(A\) is true with the independent probabilities of \(A\) and \(B\).
The Drug Testing Example
On the wikipedia page they talk about a drug test to show how to work through this. The probabilities that they state are:
With this we can then calculate the value of \(P(\text{Positive} \mid \text{Non-user})\) (i.e. the chance that a non user will be flagged by the test):
If we are interested in finding the chance that a person is a user given a positive test result (\(P(\text{User} \mid \text{Positive})\)) then we can use the rule. To do so we need to first calculate the overall chance of \(P(\text{Positive})\), which we can calculate as follows:
The wikipedia page agrees with this result. As a drug test this is not encouraging though! You are far more likely to identify non-users when you get a positive result.
Precision and Recall
I’ve often seen classifier accuracy referred to using the precision and recall terms. Breaking these down we have four different classifications:
True Positive (TP) - the classification is positive and correct
True Negative (TN) - the classification is negative and correct
False Positive (FP) - the classification is positive and incorrect
False Negative (FN) - the classification is negative and incorrect
Precision is the ratio of positive drug tests that identify drug users compared to all positive drug tests (\(\frac{TP}{TP + FP}\)). Recall is the ratio of positive drug tests that identify drug users compared to all drug users (\(\frac{TP}{TP + FN}\)).
We can break down the drug test into these four classifications.
I do not think it is possible. The key problem here is that we have two known values, \(Precision\) and \(Recall\) and three unknown values, \(TP\), \(FP\) and \(FN\). If one of the unknown values were known then it would be possible to calculate the others.
We can demonstrate this inability by showing two sets of values that produce the same precision and recall values.
For this first table we can see that precision and recall are both 0.5:
This is a simultaneous equation problem and my rule of thumb is that you need at least as many equations as there are unknown variables.
Interactive Expression Solver
So since I went to the trouble of learning Objective JS last time, is it possible to create an interactive Bayesian solver? Ideally you would be able to express some facts and then ask it to derive another expression.
Let’s imagine the input. The original challenge was to calculate \(P(\text{User} \mid \text{Positive})\) given \(P(\text{Positive} \mid \text{User})\), \(P(\lnot \text{Positive} \mid \lnot \text{User})\) and \(P(\text{User})\). We can imagine providing the values as separate inputs and then asking it to calculate a given probability. Implementing this should be a fun way to explore the underlying problem.
To make the parser simpler I have separated the input into two types of expression - facts and queries. The facts are statements that include the probability, such as \(P(\text{Positive} \mid \text{User}) = 0.9\). The queries are statements that do not include the probability, such as \(P(\text{User} \mid \text{Positive})\).
When working like this the original drug testing example can be stated as:
Broadly this works by parsing the expressions that you have entered, and then deriving new facts until all of the queries are answered by a fact. To derive a new fact the following transformations can be applied.
Now that I write this out I realise that this doesn’t feel right. The inverse should be all of the things that are not expressed by the original, so the expression should be:
Given that we can negate \(P(\text{User})\) that means we need \(P(\text{Positive} \mid \text{User})\), \(P(\text{Positive} \mid \lnot \text{User})\) and \(P(\text{User})\) (or the inverse) to calculate this.
I thought that this was legit and now I’m torn. The example shows the problem - the distribution of users could differ between the young and the old, and so the multiplication using the overall distribution doesn’t relate to the subset distribution. So, once again, this solver is incorrect when operating over multiple givens.
Then to double down on this I also allowed it to work with unrelated conditions that are not altered by the operation:
easy. So if we use the chain rule in a more general solver then it should be possible to operate accurately over terms with more expressions.
Conclusion
I came up with what I thought were reasonable extensions to the equations to operate over multiple conditions or givens. Now that I think about it I can negate each one.
It’s unfortunate that my initial instincts around this were inaccurate. I’m more pleased about my ability to disprove each extension. I know I need to work on this, and finding the areas where I am weak is the best way to do so. Onward!
Why leave it?
I’ve worked on this post quite a bit. I have learned by doing so. Changing the solver is more of a programming problem than a data science problem.
I might well work on the solver to change it for another post. To fix these operations would be reasonably involved and I want to move to a better way to express the operations that can be performed.
Observable JS Insanity
So how does the solver work? How does it understand what you put in and create a series of steps to solve it? Here I am going to explain how it all works.
This blog post is on a website and websites use javascript. In the evaluation of different interactivity solutions that I did I found that Observable JS was the best way to add interactivity. That means this solver is written in javascript.
The flow of the solver is as follows:
The full code is available below. It is quite long.
Parsed Representation
The parser creates Queries and Facts out of what you write. A statement that has a probability is a Fact, while a statement without a probability is a Query. I named them this way as the solver is finding solutions to the Queries that you make based on the Facts you enter and what can be derived from them. I created a class to represent each of them.
The words used for the condition or given within a probability are Terms. They can be negated so they get a class too.
It’s a copy of the combinator parser in this post which is really nice to work with. Combinator parsers are cool.
It works around consuming the input a single character at a time. A parser can either accept that input, returning some encoded representation of it, or reject it by returning undefined.
Some parsers are created by combining other parsers. The seq parser takes several child parsers which are run in sequence over the input. If any of them fail then the created seq parser fails.
Combinations like this are how you create the overall parser. You are combining functions until you end up with something that can consume the entire input. After working with this I find it easy to work with, and it leads to code like this:
Which shows how a fact is matched. You can see the entire code below:
Code
// inspired by https://github.com/dabeaz/blog/blob/main/2023/three-problems.mdparse = {/** * the signature of the parser is input -> (x, xs) | undefined * if the parser accepts the input then the output will be of the form (x, xs) * if the parser rejectes the input then the output will be undefined *//** * Shift if a parser that just tests if the input is acceptable at all, * consuming a single letter from it if it is. */const shift = ([input, index]) => {if (index < input.length) {return [input[index], [input, index +1]]; }returnundefined; };/** * Nothing just returns undefined as the first element. * This is the inverse of shift. */const nothing = (input) => [undefined, input];/** * Applies the predicate to the first element issued by the parser. * If the predicate passes then the parser output is returned. */const filter = (predicate) => (parser) => (input) => {const parsed =parser(input);if (parsed ===undefined) {returnundefined; }if (predicate(parsed[0])) {return parsed; }returnundefined; };/** * Applies the function to the first element issued by the parser. */const map = (f) => (parser) => (input) => {const parsed =parser(input);if (parsed ===undefined) {returnundefined; }return [f(parsed[0]), parsed[1]]; };/** * This repeatedly applies a parser until it rejects the input. * If the parser is applied at least once then this returns all of the accepted values. * Otherwise it returns undefined. */const oneOrMore = (parser) => (input) => {const result = [];let current = input;let match;while ((match =parser(current)) !==undefined) {let value; [value, current] = match; result.push(value); }if (result.length===0) {returnundefined; }return [result, current]; };/** * This applies each parser in turn unless one rejects the input. */const seq = (...parsers) => (input) => {const result = [];let current = input;for (let i =0, count = parsers.length; i < count; i +=1) {const parser = parsers[i];const match =parser(current);if (match ===undefined) {returnundefined; }let value; [value, current] = match; result.push(value); }return [result, current]; };/** * This applies each parser in turn, returning the first that accepts the input. */const or = (...parsers) => (input) => {for (let i =0, count = parsers.length-1; i < count; i +=1) {const parser = parsers[i];const match =parser(input);if (match !==undefined) {return match; } }return parsers[parsers.length-1](input); };const zeroOrMore = (parser) =>or(oneOrMore(parser),seq());const join = (letters) => letters.join('');const left = (first, second) =>map((pair) => pair[0])(seq(first, second));const right = (first, second) =>map((pair) => pair[1])(seq(first, second));const maybe = (parser) =>or(parser, nothing);const isDigit = (char) => char >='0'&& char <='9';const isLetter = (char) => (char >='a'&& char <='z') || (char >='A'&& char <='Z');const isLiteral = (value) => (char) => char === value;const digit =filter(isDigit)(shift);const letter =filter(isLetter)(shift);const literal = (value) =>filter(isLiteral(value))(shift);const dot =literal('.');const digits =oneOrMore(digit);const letters =oneOrMore(letter);const number =map(parseFloat)(map(join)(or(map((result) => [...result[0], result[1],...result[2]])(seq(digits, dot, digits)),map((result) => [...result[0], result[1]])(seq(digits, dot)),map((result) => [result[0],...result[1]])(seq(dot, digits)), digits, )), );const word =map(join)(letters);const whitespace =zeroOrMore(or(literal(' '),literal('\t'),literal('\r'),literal('\n')));const token = (parser) =>right(whitespace, parser);const P =token(literal('P'));const OPEN =token(literal('('));const CLOSE =token(literal(')'));const EQUALS =token(literal('='));const NOT =token(literal('¬'));const WORD =token(word);const GIVEN =token(literal('|'));const COMMA =token(literal(','));const NUMBER =token(number);const SEMICOLON =token(literal(';'));const TERM =or(map((term) => (newTerm({ term,not:true })))(right(NOT, WORD)),map((term) => (newTerm({ term,not:false })))(WORD), );const combine = (result) => [result[0],...result[1]];const TERM_LIST =map(combine)(seq( TERM,zeroOrMore(right(COMMA, TERM)),maybe(COMMA), ));const extractWithGiven = (result) => ({ condition: result[0],given: result[2] });const WITH_GIVEN =map(extractWithGiven)(seq(TERM_LIST, GIVEN, TERM_LIST));const extractWithoutGiven = (result) => ({ condition: result,given: [] });const WITHOUT_GIVEN =map(extractWithoutGiven)(TERM_LIST);const extractFact = (result) => (newFact({ ...result[2],probability: result[5] }));const FACT =seq( P, OPEN,or(WITH_GIVEN, WITHOUT_GIVEN), CLOSE, EQUALS, NUMBER, );const extractQuery = (result) => (newQuery(result[2]));const QUERY =seq( P, OPEN,or(WITH_GIVEN, WITHOUT_GIVEN), CLOSE, );const FACT_OR_QUERY =or(map(extractFact)(FACT),map(extractQuery)(QUERY));functionparse(text) {const grammar =map(combine)(seq( FACT_OR_QUERY,zeroOrMore(right(SEMICOLON, FACT_OR_QUERY)),maybe(SEMICOLON), ));const match =grammar([text,0]);if (match ===undefined) {return { facts: [],queries: [],error:`unable to parse '${text}'` }; }const [expressions, [_text, index]] = match;const remaining = _text.substring(index);const facts = expressions.filter((expression) => expression.probability!==undefined);const queries = expressions.filter((expression) => expression.probability===undefined);if (remaining.trim().length>0) {return { facts, queries,error:`unable to parse '${remaining}'` }; }return { facts, queries,error:undefined }; }return parse;}
Solver
The solver is relatively simple. It works as a breadth first search, where a given set of facts are expanded according to the three operations, above. The generated facts are tested to see if they have been considered before, and those that have not are added to a queue for evaluation. Before expanding the facts the current set of facts is tested to see if it answers the queries posed.
While the process is conceptually simple the implementation is quite messy. One reason that I do not want to address the issues with the expansions is that it is quite involved to work with these representations.
Code
functiongetMissingTerms({ facts, queries }) {const getTerms = ({ condition, given }) => condition.concat(given).map(({ term }) => term);const termsInFacts =newSet(facts.flatMap(getTerms));const termsInQueries = [...newSet(queries.flatMap(getTerms))];const missingTerms = termsInQueries.filter((term) =>!termsInFacts.has(term));return missingTerms;}functiongetDuplicates(entries) {const seen =newSet();const duplicates = entries.filter((entry) => {const identifier = entry.toId();if (seen.has(identifier)) {returntrue; } seen.add(identifier);returnfalse; });return duplicates;}functionvalidate({ facts, queries }) {if (queries.length===0) {return { error:'no query to solve' }; }const missingTerms =getMissingTerms({ facts, queries });if (missingTerms.length>0) {return { error:`cannot establish value of query terms ${missingTerms.join(', ')} as they do not appear in the conditions` }; }const duplicatedFacts =getDuplicates(facts);if (duplicatedFacts.length>0) {return { error:`duplicate facts stated: ${duplicatedFacts.map((fact) => fact.toString()).join(', ')}` }; }const duplicatedQueries =getDuplicates(queries);if (duplicatedQueries.length>0) {return { error:`duplicate queries stated: ${duplicatedQueries.map((query) => query.toString()).join(', ')}` }; }return { error:undefined };}/** * This creates a comparable key that encodes the current state. * The key is used to determine if this state has been visited before. */functiontoStateIdentifier({ facts }) {const identifiers = facts.map((fact) => fact.toId());return identifiers.sort().join(', ');}functionmakeNextStates({ facts, expressions }) {const hasGiven = (fact) => fact.given.length>0;const singleCondition = (fact) => fact.given.length===0&& fact.condition.length===1;const ungivenFacts =newMap( facts.filter(singleCondition).map((fact) => [fact.condition[0].toId(), fact]), );const invert = (fact) => fact.invert({ expressions });const removeGiven = (fact) => fact.removeGiven({ facts, expressions });const bayes = (fact) => fact.bayes({ ungivenFacts, expressions });const current = facts.map((fact) => fact.toId());const nextFacts = [...facts.flatMap(invert),...facts.filter(hasGiven).flatMap(bayes),...facts.flatMap(removeGiven), ];return nextFacts.filter(({ fact }) =>!current.includes(fact.toId())).map((row) => ({ facts: [...facts, row.fact],expressions: row.expressions }));}functiongetSolution({ facts, queries }) {return queries.map((query) => facts.filter((fact) => fact.solves(query))).filter((solutions) => solutions.length>0);}functionsolve({ facts, queries }) {const { error } =validate({ facts, queries });if (error !==undefined) {return { error,solutions:undefined,expressions:undefined }; }const expressions = facts.map((fact) => fact.toString());let states = [{ facts, queries, expressions }];const evaluated = [];while (states.length>0) {const state = states.shift();const solutions =getSolution(state);if (solutions.length=== queries.length) {return {solutions: solutions.map((list) => list[0]),error:undefined,expressions: state.expressions, }; } evaluated.push(toStateIdentifier(state));const current = states.map(toStateIdentifier);const children =makeNextStates(state).map((child) => ({ ...child,queries: state.queries })).filter((childState) => {const id =toStateIdentifier(childState);return!(evaluated.includes(id) || current.includes(id)); }); states = states.concat(children); }return { error:'no solution found',solutions:undefined, expressions };}
If I were to do this again I would write a language for the substitutions which is then parsed into the expander. It would also be nice to have a directed search, although that might be a bit much.