This repository has been archived by the owner on Jun 8, 2024. It is now read-only.

Issue in BaseChunker.ts #49

Merged on Sep 4, 2023 (2 commits)
@Jaikant Jaikant commented Aug 30, 2023

The createChunks function runs an asynchronous operation inside a forEach loop. forEach does not wait for the promises returned by an async callback, so the loop finishes before they resolve. As a result, the function returns empty documents, ids, and metadatas arrays before they are populated.
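
A minimal sketch of the behavior described above, not the project's actual code. The names `splitText` and `collectWithForEach` are hypothetical stand-ins; `splitText` here is a toy function, not the langchain splitter.

```typescript
// Hypothetical stand-in for an async text splitter (not the langchain API).
async function splitText(text: string): Promise<string[]> {
  // Defer to a later microtask, the way real asynchronous work would.
  await Promise.resolve();
  return text.split(' ');
}

// Reproduces the pattern from the report: an async callback passed to forEach.
async function collectWithForEach(texts: string[]): Promise<string[]> {
  const chunks: string[] = [];
  texts.forEach(async (text) => {
    // forEach discards the promise this callback returns, so nothing waits here.
    const parts = await splitText(text);
    chunks.push(...parts);
  });
  // Snapshot of the array at return time: no callback has pushed anything yet.
  return chunks.slice();
}
```

Awaiting `collectWithForEach(['a b', 'c d'])` yields an empty array, because the function returns before any `splitText` call has resolved.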


@cachho cachho left a comment

Good point, thank you, but this is eslint restricted syntax. Do you not see the warnings?

cachho commented Aug 30, 2023

How does this look as an alternative?

import { createHash } from 'crypto';
import type { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

import type { BaseLoader } from '../loaders';
import type { Input, LoaderResult } from '../models';
import type { ChunkResult } from '../models/ChunkResult';

class BaseChunker {
  textSplitter: RecursiveCharacterTextSplitter;

  constructor(textSplitter: RecursiveCharacterTextSplitter) {
    this.textSplitter = textSplitter;
  }

  async createChunks(loader: BaseLoader, url: Input): Promise<ChunkResult> {
    const documents: ChunkResult['documents'] = [];
    const ids: ChunkResult['ids'] = [];
    const datas: LoaderResult = await loader.loadData(url);
    const metadatas: ChunkResult['metadatas'] = [];

    const dataPromises = datas.map(async (data) => {
      const { content, metaData } = data;
      const chunks: string[] = await this.textSplitter.splitText(content);
      chunks.forEach((chunk) => {
        const chunkId = createHash('sha256')
          .update(chunk + metaData.url)
          .digest('hex');
        ids.push(chunkId);
        documents.push(chunk);
        metadatas.push(metaData);
      });
    });

    await Promise.all(dataPromises);

    return {
      documents,
      ids,
      metadatas,
    };
  }
}

export { BaseChunker };
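
One property of the version above: each callback pushes its chunks only after its own `splitText` call resolves, so the three arrays stay aligned with each other, but the order of inputs in the output can follow completion order rather than input order. If input order matters, a possible variant is sketched below. It is an illustration only: `LoadedItem`, `Chunked`, `Split`, and `chunkInOrder` are hypothetical names, the shapes are simplified, and id hashing is left out to keep the focus on ordering.

```typescript
// Hypothetical, simplified shapes for illustration (the real types live in ../models).
interface LoadedItem {
  content: string;
  metaData: { url: string };
}

interface Chunked {
  documents: string[];
  metadatas: { url: string }[];
}

// Hypothetical signature standing in for textSplitter.splitText.
type Split = (text: string) => Promise<string[]>;

async function chunkInOrder(datas: LoadedItem[], split: Split): Promise<Chunked> {
  // Promise.all resolves to an array in input order, regardless of which
  // promise settles first.
  const perItem = await Promise.all(
    datas.map(async ({ content, metaData }) => ({
      metaData,
      chunks: await split(content),
    })),
  );

  // Flatten synchronously after everything has resolved.
  const result: Chunked = { documents: [], metadatas: [] };
  perItem.forEach(({ metaData, chunks }) => {
    chunks.forEach((chunk) => {
      result.documents.push(chunk);
      result.metadatas.push(metaData);
    });
  });
  return result;
}
```

The trade-off is one intermediate array per input; in exchange, the output order no longer depends on how long each split takes.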

Jaikant commented Aug 30, 2023

Yes this should work too!

cachho commented Aug 30, 2023

Do you want to update it? It's your PR :)

cachho commented Sep 4, 2023

Made the changes, merging now. Please check on your end that everything works as intended. Thanks.

@cachho cachho merged commit 87ccf74 into mem0ai:main Sep 4, 2023