
Add Firecrawl integration for website content crawling #6


Open · wants to merge 1 commit into main

Conversation

@malinosqui (Member) commented Jun 9, 2025

Description



This pull request introduces a new integration with the Firecrawl API to enhance the web crawling capabilities of the kodustech/onlook project. The key changes include:

  1. Chat API Enhancements: A new processUrls function is added to the chat API endpoint in apps/web/client/src/app/api/chat/route.ts. It processes URLs embedded in user messages using an AI model, modifying the last user message before further processing. New imports for URL extraction and AI tooling are included, along with logic in the POST handler to invoke the function (see the first sketch after this list).
  2. AI Tool Handling: In apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx, handling for a new AI tool, CRAWL_URL_TOOL_NAME, is introduced. This includes importing the necessary constants and adding a conditional block to the tool dispatcher, currently as a placeholder.
  3. Crawler Service Implementation: A new CrawlerService module is added in packages/ai/src/tools/crawler.ts for web crawling via the Firecrawl API. It includes TypeScript interfaces for crawl options, API responses, and crawled content, plus a validation function that checks the integrity of crawler responses. The CrawlerService class is implemented as a singleton that handles API key initialization and exposes an async crawlUrl method with error handling (sketched after the summary below).
  4. Utility Function for URL Extraction: A utility function extractUrls is introduced in packages/ai/src/tools/helpers.ts to parse text, identify and validate various forms of URLs, and return a unique set of valid URLs (see the first sketch after this list).
  5. AI Toolset Expansion: A new crawlUrlTool is added to the AI toolset in packages/ai/src/tools/index.ts, defining its parameters with Zod and specifying options such as crawl limits and content formats (sketched after the summary below).
  6. Chat Message Context Extension: The chat message context system in packages/models/src/chat/message/context.ts gains a new type, LinkMessageContext, to represent hyperlinks. This involves updating the MessageContextType enum and including the new type in the ChatMessageContext union type (covered in the same sketch as item 5).
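
As a rough illustration of items 1 and 4, here is a minimal sketch of an extractUrls helper. The regex, the normalization step, and the exact return shape are assumptions for illustration; the actual implementation in packages/ai/src/tools/helpers.ts may differ.

    // Find URL-like substrings in free text, validate them with the URL
    // constructor, and return a de-duplicated list of valid URLs.
    export function extractUrls(text: string): string[] {
        // Match http(s) URLs as well as bare "www." forms (this regex is an assumption).
        const candidates = text.match(/\bhttps?:\/\/[^\s<>"')]+|\bwww\.[^\s<>"')]+/gi) ?? [];
        const valid = new Set<string>();
        for (const candidate of candidates) {
            // Prefix bare "www." matches so the URL constructor can parse them.
            const normalized = /^www\./i.test(candidate) ? `https://${candidate}` : candidate;
            try {
                valid.add(new URL(normalized).toString()); // throws on invalid input
            } catch {
                // Not a parseable URL; skip it.
            }
        }
        return [...valid];
    }

A processUrls function in route.ts would then call extractUrls on the last user message and pass any hits to the crawl tool; the exact wiring is not reproduced here.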

These changes collectively enhance the project's ability to process and analyze web content through improved URL handling and web crawling functionalities.
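
To make item 3 concrete, here is a hedged sketch of the singleton shape it describes. The Firecrawl endpoint path, request body, response fields, and environment variable name are assumptions for illustration; the real packages/ai/src/tools/crawler.ts may differ.

    interface CrawlOptions {
        limit?: number;                       // maximum number of pages to crawl
        formats?: Array<'html' | 'markdown'>; // desired content formats
    }

    interface CrawledContent {
        html?: string;
        markdown?: string;
    }

    export class CrawlerService {
        private static instance: CrawlerService | null = null;

        private constructor(private readonly apiKey: string) {}

        // Lazily create the single shared instance, reading the API key once.
        static getInstance(): CrawlerService {
            if (!CrawlerService.instance) {
                const apiKey = process.env.FIRECRAWL_API_KEY; // env var name is an assumption
                if (!apiKey) {
                    throw new Error('Firecrawl API key is not configured');
                }
                CrawlerService.instance = new CrawlerService(apiKey);
            }
            return CrawlerService.instance;
        }

        async crawlUrl(url: string, options: CrawlOptions = {}): Promise<CrawledContent[]> {
            const res = await fetch('https://api.firecrawl.dev/v1/crawl', {
                method: 'POST',
                headers: {
                    Authorization: `Bearer ${this.apiKey}`,
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ url, ...options }),
            });
            if (!res.ok) {
                throw new Error(`Firecrawl request failed with status ${res.status}`);
            }
            const payload = (await res.json()) as { data?: CrawledContent[] };
            return payload.data ?? [];
        }
    }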
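
Items 5 and 6 are small enough to sketch together. CRAWL_URL_TOOL_NAME, CRAWL_URL_TOOL_PARAMETERS, and the urls field appear in the diff further down; the limit and formats options, the LINK enum member, and the exact LinkMessageContext fields are assumptions.

    import { tool } from 'ai';
    import { z } from 'zod';

    export const CRAWL_URL_TOOL_NAME = 'crawl_url';
    export const CRAWL_URL_TOOL_PARAMETERS = z.object({
        urls: z.array(z.string()).describe('Array of URLs to crawl'),
        // Option names below are assumptions based on the description above.
        limit: z.number().optional().describe('Maximum number of pages to crawl'),
        formats: z.array(z.enum(['html', 'markdown'])).optional().describe('Content formats to return'),
    });

    export const crawlUrlTool = tool({
        description: 'Crawl one or more URLs and return their content',
        parameters: CRAWL_URL_TOOL_PARAMETERS,
    });

    // Item 6: extending the chat message context with a hyperlink type.
    export enum MessageContextType {
        // ...existing members elided...
        LINK = 'link', // member name and value are assumptions
    }

    export interface LinkMessageContext {
        type: MessageContextType.LINK;
        url: string;
    }

    // The ChatMessageContext union is then widened, e.g.:
    // export type ChatMessageContext = /* existing members */ | LinkMessageContext;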

Co-Authored-By: kiet@onlook.dev <kiet@onlook.dev>

kodus-qa bot commented Jun 9, 2025

Code Review Completed! 🔥

The code review was successfully completed based on your current configurations.

Kody Guide: Usage and Configuration
Interacting with Kody
  • Request a Review: Ask Kody to review your PR manually by adding a comment with the @kody start-review command at the root of your PR.

  • Provide Feedback: Help Kody learn and improve by reacting to its comments with a 👍 for helpful suggestions or a 👎 if improvements are needed.

Current Kody Configuration
Review Options

The following review options are enabled:

  • Security
  • Code Style
  • Kody Rules
  • Refactoring
  • Error Handling
  • Maintainability
  • Potential Issues
  • Documentation And Comments
  • Performance And Optimization
  • Breaking Changes

Access your configuration settings here.

@@ -10,6 +10,8 @@ import {
     ONLOOK_INSTRUCTIONS_TOOL_NAME,
     READ_FILES_TOOL_NAME,
     READ_FILES_TOOL_PARAMETERS,
+    CRAWL_URL_TOOL_NAME,
+    CRAWL_URL_TOOL_PARAMETERS,

kody code-review Maintainability critical

// If CRAWL_URL_TOOL_PARAMETERS are not used by the CRAWL_URL_TOOL_NAME handler,
// this import might be unnecessary. Ensure it's used if intended, otherwise consider removing:
// CRAWL_URL_TOOL_PARAMETERS,

Unused or improperly utilized CRAWL_URL_TOOL_PARAMETERS in apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx. The parameter is imported but not used in the handler logic (lines 14, 66-67).

This issue appears in multiple locations:

  • apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx: Lines 14-14
  • apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx: Lines 66-67
    Please either use CRAWL_URL_TOOL_PARAMETERS in the handler logic or remove the import to avoid clutter.


content: content,
},
],
tools: { [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME] as any },

kody code-review Maintainability medium

tools: { [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME] as any /* TODO: Replace 'any' with the specific type for this tool if known. The original suggestion's 'SpecificToolType' was a placeholder. */ },

Multiple instances of using any type in TypeScript code, which bypasses type checking and reduces maintainability. In apps/web/client/src/app/api/chat/route.ts, lines 26 and 46 use as any and (m: any) respectively.

This issue appears in multiple locations:

  • apps/web/client/src/app/api/chat/route.ts: Lines 26-26
  • apps/web/client/src/app/api/chat/route.ts: Lines 46-46
    Please replace any with specific types or interfaces to maintain type safety. If the type is complex or dynamic, add a comment explaining the rationale for using any.

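
One hedged way to remove the cast, assuming chatToolSet is an ordinary typed object (the import path below is illustrative, not the repository's actual layout):

    // Assumed import path for illustration.
    import { chatToolSet, CRAWL_URL_TOOL_NAME } from '@onlook/ai';

    // Deriving the entry's type from the toolset itself removes the need for `as any`;
    // `satisfies` keeps the literal key while still type-checking against the toolset.
    const tools = {
        [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME],
    } satisfies Partial<typeof chatToolSet>;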

Comment on lines +66 to +67
} else if (toolName === CRAWL_URL_TOOL_NAME) {
return 'Web content crawled successfully.';

kody code-review Refactoring medium

        } else if (toolName === CRAWL_URL_TOOL_NAME) {
            const params = toolCall.args as z.infer<typeof CRAWL_URL_TOOL_PARAMETERS>; // Assuming parameters are parsed and used
            return await handleCrawlUrlTool(params, editorEngine); // Call to a new dedicated handler function
        }

The logic for CRAWL_URL_TOOL_NAME is implemented inline within handleToolCall. Other tools in this function (e.g., LIST_FILES_TOOL_NAME, READ_FILES_TOOL_NAME) delegate to separate handler functions like handleListFilesTool. For consistency, improved readability, and easier maintenance (especially if the crawl logic becomes complex), consider extracting the CRAWL_URL_TOOL_NAME handling into its own dedicated async function, similar to the pattern used for other tools.

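
A sketch of what that dedicated handler could look like, mirroring the handleListFilesTool pattern. The handler name, the EditorEngine parameter, and the CrawlerService calls are assumptions based on the PR description, and the import paths are illustrative:

    import { z } from 'zod';
    // Assumed import paths for illustration.
    import { CRAWL_URL_TOOL_PARAMETERS } from '@onlook/ai';
    import { CrawlerService } from '@onlook/ai/tools/crawler';
    import type { EditorEngine } from '@/components/store/editor/engine';

    async function handleCrawlUrlTool(
        args: z.infer<typeof CRAWL_URL_TOOL_PARAMETERS>,
        editorEngine: EditorEngine, // kept for signature parity with the other handlers
    ): Promise<string> {
        const crawler = CrawlerService.getInstance();
        // Crawl each requested URL; failures propagate to the dispatcher's error handling.
        const results = await Promise.all(args.urls.map((url) => crawler.crawlUrl(url)));
        return `Crawled ${results.length} URL(s) successfully.`;
    }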

Comment on lines +52 to +59
const firstItem = response.data[0];
return (
typeof firstItem === 'object' &&
firstItem !== null &&
('html' in firstItem || 'markdown' in firstItem) &&
(firstItem.html === undefined || typeof firstItem.html === 'string') &&
(firstItem.markdown === undefined || typeof firstItem.markdown === 'string')
);

kody code-review Potential Issues high

    // This loop executes only if response.data is not empty,
    // due to the check on lines 48-50.
    for (const item of response.data) {
        if (
            !(
                typeof item === 'object' &&
                item !== null &&
                ('html' in item || 'markdown' in item) &&
                (item.html === undefined || typeof item.html === 'string') &&
                (item.markdown === undefined || typeof item.markdown === 'string')
            )
        ) {
            return false; // Found an invalid item
        }
    }
    return true; // All items are valid

The validateCrawlerResponse function currently only validates the structure of the first item in the response.data array. If response.data contains multiple items, subsequent items are not validated. This could lead to runtime errors if other items in the array do not conform to the expected { html?: string; markdown?: string; } structure, as the function would incorrectly return true. Consider iterating over all items in response.data to ensure each one meets the validation criteria.

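
Folding that loop into a complete function for clarity. This is a sketch: the CrawlerResponse shape is inferred from the snippet above, and only the data field is modeled.

    interface CrawlerResponse {
        data: Array<{ html?: unknown; markdown?: unknown }>;
    }

    function validateCrawlerResponse(response: CrawlerResponse): boolean {
        // Mirrors the existing non-empty check on lines 48-50.
        if (!Array.isArray(response.data) || response.data.length === 0) {
            return false;
        }
        // Validate every item, not just the first one.
        return response.data.every(
            (item) =>
                typeof item === 'object' &&
                item !== null &&
                ('html' in item || 'markdown' in item) &&
                (item.html === undefined || typeof item.html === 'string') &&
                (item.markdown === undefined || typeof item.markdown === 'string'),
        );
    }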

}
return response;
} catch (error) {
console.error('Error during crawling:', error);

kody code-review Maintainability medium

            // Let the caller handle logging and any error wrapping
            throw error;

The crawlUrl method currently logs errors to console.error before re-throwing them. While helpful for debugging, this direct logging can make the service harder to integrate into applications with specific logging strategies. Consider removing the console.error call and letting the caller decide how to handle/log the thrown error. This improves modularity and makes the service more versatile.


@@ -30,8 +31,44 @@ export const onlookInstructionsTool = tool({
     parameters: z.object({}),
 });
 
+export const CRAWL_URL_TOOL_NAME = 'crawl_url';
+export const CRAWL_URL_TOOL_PARAMETERS = z.object({
+    urls: z.array(z.string()).describe('Array of URLs to crawl'),

kody code-review Potential Issues high

urls: z.array(z.string().url()).describe('Array of URLs to crawl'),

The urls parameter is described as 'Array of URLs to crawl', but the schema z.array(z.string()) allows any string, not just valid URLs. To ensure data integrity and prevent potential downstream errors in CrawlerService if it expects valid URL formats, consider using z.string().url().

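
For illustration, the stricter schema rejects malformed strings at parse time (shown with a local schema of the same shape):

    import { z } from 'zod';

    const params = z.object({
        urls: z.array(z.string().url()).describe('Array of URLs to crawl'),
    });

    console.log(params.safeParse({ urls: ['https://example.com'] }).success); // true
    console.log(params.safeParse({ urls: ['not a url'] }).success);           // false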
