Ouissal/tutorials babi6 #63

ouissal-moumou · 2025-05-29T13:21:03Z

The bAbI6 Tutorial with DisCoCirc

This PR introduces the bAbI6 tutorial. Please add any comments or questions you may have.

AnnaNPearson · 2025-05-30T09:19:48Z

Some general comments on the text in the preprocessing notebook:

- Change 'subjects' to 'people', as it is more readable.
- The sandwich functor was introduced by Laakonnen, so that paper should be cited as well.
- "addresses this issue by introducing a novel construction that breaks down a frame into a sequence of boxes with the frame's content. Now that we have these different frames": we need to change this wording a bit. Can discuss in our meeting.
- "text for optimization purposes": reword this (discuss in meeting).
- Section 4: add the reference to Tiffany's paper to explain why we choose the Sim4 Ansatz with three layers.
- "guess the answer to the question by performing postselections.": needs rephrasing (discuss in meeting).
- "question asking circuits" -> "question circuits"

AnnaNPearson · 2025-05-30T09:30:28Z

Please double check Tiffany's original comments and incorporate these as there are several which are not addressed

DNA386 · 2025-05-30T10:21:03Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "    print(\"the minimum size is: \" + str(min_size))\n",
+    "    \n",
+    "    # Randomly sample from each group to balance\n",
+    "    balanced_positive = random.sample(positive_items, min_size)\n",


Should this be seeded?
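
One way to make the balancing step reproducible is to draw from a dedicated, seeded `random.Random` instance instead of the module-level generator. The sketch below is illustrative only: `balance_groups`, `negative_items`, and the `seed` parameter are assumed names and are not taken from the notebook.

```python
import random


def balance_groups(positive_items, negative_items, seed=0):
    """Downsample both groups to the size of the smaller one, reproducibly."""
    # A local generator keeps the sampling independent of any other code
    # that touches the global `random` state.
    rng = random.Random(seed)
    min_size = min(len(positive_items), len(negative_items))
    balanced_positive = rng.sample(positive_items, min_size)
    balanced_negative = rng.sample(negative_items, min_size)
    return balanced_positive, balanced_negative
```

Calling `balance_groups` twice with the same inputs and the same `seed` returns the same selection, so the balanced dataset no longer changes between notebook runs.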

DNA386 · 2025-05-30T10:26:47Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "TRAINING_SAMPLE_SIZE = 120\n",
+    "VALIDATION_SAMPLE_SIZE = 30\n",
+    "TEST_SAMPLE_SIZE = 30\n",


You define these here, but I don't think they're used later?
In the next tutorial the validation set has 36 items in it.

Indeed. These are redundant variables from a previous version where we tweaked the number of entries.


ouissal-moumou · 2025-06-02T00:21:05Z

We noticed a reproducibility problem in this experiment. While the accuracies obtained are generally good (between 80% and 100%), the results are not reproducible across runs on the same data, even though the same seed is enforced. Here is what we know:

- the same seed is applied and there is no shuffling between epochs
- we get the same initial weights
- we get the same accuracy and loss in the first epoch across all runs; the values only start to differ after the first epoch

Looking at the Lambeq code, it seems as though fixing the seed on our side should do the job. Two possible explanations have been suggested: the tensor contraction algorithm used by Lambeq may be responsible (suggested by Richie), or loading items from dictionaries that do not have the same ordering may be behind the problem (suggested by Tiffany). Both are currently under investigation.
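
For reference, the two things that can be controlled from the notebook side are the random seeds and the iteration order of any dictionaries. The sketch below is a minimal illustration under stated assumptions: `seed_everything` and `in_stable_order` are hypothetical helper names, not part of Lambeq, and the sketch does not address the contraction-order hypothesis. Seeding alone also does not guarantee bitwise-identical results in PyTorch.

```python
import random


def seed_everything(seed):
    """Seed the random number generators commonly involved in a training run."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


def in_stable_order(mapping):
    """Return (key, value) pairs sorted by key, independent of insertion order."""
    return sorted(mapping.items(), key=lambda kv: str(kv[0]))
```

Iterating over `in_stable_order(d)` instead of `d.items()` removes any dependence on the order in which entries were inserted, which is one way to test the dictionary-ordering explanation.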

…ts to reduce verbosity of the tutorial by simplifying sentences, rewriting them, or making them shorter. The reproducibility problem persists, and some changes have been added to the training part to help debug this particular problem. They will be removed once the problem is solved.

AnnaNPearson · 2025-06-03T10:14:44Z

Please can you do another pass to replace all instances of the word 'subject' with 'person'? The same applies to 'context' or 'story', which should instead be 'text' for consistency (including e.g. `ctx` in the code). Please also fix any remaining instances of 'question asking', e.g. "For more details about question asking in {term}DisCoCirc," -> "for more details on the choice of the implementation of questions in DisCoCirc see ....".


AnnaNPearson · 2025-06-03T10:24:54Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "\n",
+    "After extracting the texts, they are filtered to only keep the ones whose number of sentences is less than or equal to `TEXT_LENGTH`, which we set in the previous cell to determine the maximum number of sentences that we want in a text for better efficiency. This is to make sure that we do not get huge circuits later on when we convert the texts into circuits, which might slow down the experiment. \n",
+    "\n",
+    "After this filtering, the last step is to convert the list of texts from a list of arrays of sentences, to a list of sentences. In other words, we concatenate the sentences in each text (which is an array) to obtain a string."


The function returns the texts as strings.
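
The filtering and joining steps described in the quoted cell can be sketched as follows. This is an assumed reconstruction, not the notebook's code: `filter_and_join` is a hypothetical name, the value of `TEXT_LENGTH` is chosen only for illustration, and joining with a single space is an assumption.

```python
TEXT_LENGTH = 4  # assumed value, for illustration only


def filter_and_join(texts, max_sentences=TEXT_LENGTH):
    """Keep texts with at most `max_sentences` sentences and join each into one string."""
    # Each text is a list of sentences; longer texts are dropped to avoid huge circuits.
    kept = [sentences for sentences in texts if len(sentences) <= max_sentences]
    # Concatenate the sentences of each remaining text into a single string.
    return [" ".join(sentences) for sentences in kept]
```

The return value is a list of strings, one per text, which matches the reviewer's point that the function returns texts as strings and not a flat list of sentences.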

AnnaNPearson · 2025-06-03T10:25:25Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "    texts = []\n",
+    "    qnas = []\n",
+    "    text_length = []\n",
+    "    for story in stories:\n",


Why the switch from text to story? Also, `ctx` as a variable name is outdated, since it refers to context.


AnnaNPearson · 2025-06-03T10:29:49Z

docs/tutorials/discocirc_babi6_prep.ipynb

+   "id": "b04a8ff9",
+   "metadata": {},
+   "source": [
+    "Now that the text circuits are post-processed for optimization, it is time to make the assertion circuits to later sequentially compose the latter with the former. \n",


This can be said in a simpler way


AnnaNPearson · 2025-06-03T10:35:30Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "We start by creating a layer composed of either identities (to link with the wires corresponding to the question nouns), or discards (for the rest of the wires). Once we sequentially compose this layer with the text circuit, this leaves us with a circuit whose codomain has two wires corresponding to the question nouns. In order for us to attach the assertion boxes, we have to make sure that the wires from the assertion circuits are linked to the right wires from the text circuit. To achieve this, we check the question ids of the wires in the text circuits (to see whether the nouns in the text circuits are in the right order). This helps us decide whether to use the assertion boxes that come with swaps, or the ones without swaps (if the question wires are in the wrong order, we would need a swap to bring them back to the right order for the questions. Remember, we already created assertion boxes that are also equiped with swaps for this purpose).\n",
+    "\n",
+    "Notice that, throughout the next cell, we always have two circuits. The circuit names ending in \"pos\" signal the circuits corrsponding to the affirmative assertions, while their counterparts ending in \"neg\" signal the ones corresponding to the negative assertions.\n",
+    "\n",


Does it make more sense to use 'aff' for affirmative instead of 'pos'?
I think the three paragraphs can be shortened to a simple statement saying that the wires have to match: if the location comes before the person in the text circuit, then a swap is needed for the question to be composed with the correct wires.

Thank you. Should I delete the paragraph that explains the implementation details (the discards and the swaps), and just keep it simple by saying that the wires have to match and they can come in any order so we have to provide boxes that accommodate both possible cases?
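
The wire-matching rule under discussion can be stated in a few lines. The sketch below assumes, purely for illustration, that the open wires of a text circuit are available as an ordered list of noun strings; `needs_swap` is a hypothetical helper and does not use the Lambeq API.

```python
def needs_swap(text_wires, person, location):
    """Return True if `location` precedes `person` among the text circuit's wires."""
    # The question expects the person wire first and the location wire second,
    # so a swap is required whenever the text circuit has them the other way round.
    return text_wires.index(location) < text_wires.index(person)
```

For example, with wires `["kitchen", "Emily", "Bob"]` the assertion box with a swap would be chosen for the question about Emily and the kitchen, while with `["Emily", "Bob", "kitchen"]` the plain box would be used.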


neiljdo · 2025-06-03T12:53:16Z

Hi @ouissal-moumou, does this PR supersede the previous PR #42? I see that both refer to using the experimental DisCoCirc module on the bAbI6 task.

EDIT: Closed the other ticket as all work done is on this PR now.



ouissal-moumou · 2025-06-04T22:33:41Z

Further changes have been made to the data-prep part of the tutorial both in terms of the text and code. The training notebook is still under investigation since it seems there is a bug in Lambeq. In a nutshell, it seems that reproducibility is only possible when we make sure that the training dataset is unshuffled.

AnnaNPearson · 2025-06-05T11:28:31Z

docs/tutorials/discocirc_babi6_prep.ipynb

+    "Now that we already have the circuits representing the texts, we need to make the circuits representing the assertions. Remember, in our experiment, we need to have a pair of circuits, one for the affirmative case, and the other for the negative case. However, when adding the box corresponding to the assertion, we have to make sure that the wires of the assertion box match with the wires representing the nouns from the text.  \n",
+    "\n",
+    "Below, the function `return_noun_list` returns all the nouns in a text. The function `return_q_nouns` return all the nouns in a question. In the latter, we take the third and sixth word as the person and location in the question respectively. This works because of the simple case of the bAbI6 experiments, all the questions are of the format \"Is the person in the location?\".\n",
+    "\n",


I think the questions in the dataset look like "Is Emily in the kitchen?", so a slight rephrasing is needed: the format should be "Is person in the location?", with no "the" in front of "person". The person and location are then the second and fifth words of the question, which also corresponds to the code using indices 1 and 4.
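
The indexing described in this comment can be checked with a short sketch. `question_nouns` is a hypothetical stand-in for the notebook's `return_q_nouns`, whose actual signature is not shown here, and stripping the trailing question mark is an assumption about how punctuation is handled.

```python
def question_nouns(question):
    """Extract (person, location) from a question such as 'Is Emily in the kitchen?'."""
    # Drop the trailing question mark, then split on whitespace:
    # ['Is', 'Emily', 'in', 'the', 'kitchen']
    words = question.rstrip("?").split()
    # Index 1 is the second word (person); index 4 is the fifth word (location).
    return words[1], words[4]
```

This relies on every question following the fixed five-word pattern, which holds for the simple bAbI6 questions discussed above but would not generalise to other question formats.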

ouissal-moumou and others added 14 commits February 11, 2025 17:28

first version of the tutorial

e022b89

minor changes: errors eliminated

2356f8c

Rename NBs

73d0317

Add index and hooks

fb08232

Move and rename data files

3aa48ac

Change dataset filename in NB

39f4d55

resolving some of Tiffany's comments

350514a

some conflicts. used git merge rebase false

5717bc9

pushing some of the changes so far: may 27th 2025

45c4ee1

Merge remote-tracking branch 'upstream/main' into discocirc-babi6

ccf98a9

more changes + seeding for pytorch to improve accuracy

e6b9033

removing redundant files

3f1a24f

changes to glossary added

c6ffbd4

cleaning up the notebooks from unecessary outputs

11e47dd

DNA386 reviewed May 30, 2025


fixing the comments raised todfay except the point on reproducibility…

79431e1

… as it is still under investigation

ouissal-moumou added 2 commits June 2, 2025 22:42

minor chanes to the presentation

955c26a

AnnaNPearson reviewed Jun 3, 2025


ouissal-moumou added 4 commits June 3, 2025 12:29

all Anna's recent comments have been fixed except the one on the stor…

9afdc9b

…ies and the one on reqriting the three paragraphs justifying the swaps

one more pass on the tutorial

a83ec62

another pass

7be5030

removing errors with training due to outdated naming conventions

8dd3359

another pass was done on the tutorial to make sure the latest comment…

7764cdf

…s have been addressed + more changes to wording

neiljdo mentioned this pull request Jun 4, 2025

Add DisCoCirc tutorial on babi task #42

Closed

more updates on the prep notebook + training notebook is still under …

fc74c82

…the works since there seems to be a problem with Lambeq

AnnaNPearson reviewed Jun 5, 2025


ouissal-moumou and others added 2 commits June 5, 2025 14:41

Fixing Anna's comment

5f4b70a

Ignore pickle files

5414013
