Add Firecrawl integration for website content crawling #6
base: main
Conversation
Co-Authored-By: kiet@onlook.dev <kiet@onlook.dev>
Code Review Completed! 🔥 The code review was successfully completed based on your current configurations.
@@ -10,6 +10,8 @@ import {
    ONLOOK_INSTRUCTIONS_TOOL_NAME,
    READ_FILES_TOOL_NAME,
    READ_FILES_TOOL_PARAMETERS,
    CRAWL_URL_TOOL_NAME,
    CRAWL_URL_TOOL_PARAMETERS,
// If CRAWL_URL_TOOL_PARAMETERS are not used by the CRAWL_URL_TOOL_NAME handler,
// this import might be unnecessary. Ensure it's used if intended, otherwise consider removing:
// CRAWL_URL_TOOL_PARAMETERS,
Unused or improperly utilized CRAWL_URL_TOOL_PARAMETERS in apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx. The parameter is imported but not used in the handler logic (lines 14, 66-67).
This issue appears in multiple locations:
- apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx: Lines 14-14
- apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx: Lines 66-67
Please ensure CRAWL_URL_TOOL_PARAMETERS is either used in the handler logic or remove the import to avoid clutter.
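If the import is intended to be used, one option (a sketch, not the PR's code) is to validate the tool call's raw arguments with the schema before acting on them. The schema below is assumed to mirror the one exported from the AI tools package:

```typescript
import { z } from 'zod';

// Assumed to mirror CRAWL_URL_TOOL_PARAMETERS from packages/ai/src/tools/index.ts.
const CRAWL_URL_TOOL_PARAMETERS = z.object({
    urls: z.array(z.string()).describe('Array of URLs to crawl'),
});
type CrawlUrlArgs = z.infer<typeof CRAWL_URL_TOOL_PARAMETERS>;

// Hypothetical dispatcher helper: parsing the raw args with the schema
// yields a typed, validated object and makes the import meaningful.
function parseCrawlArgs(rawArgs: unknown): CrawlUrlArgs {
    return CRAWL_URL_TOOL_PARAMETERS.parse(rawArgs);
}
```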
        content: content,
    },
],
tools: { [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME] as any },
tools: { [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME] as any /* TODO: Replace 'any' with the specific type for this tool if known. The original suggestion's 'SpecificToolType' was a placeholder. */ },
Multiple instances of using the any type in TypeScript code, which bypasses type checking and reduces maintainability. In apps/web/client/src/app/api/chat/route.ts, lines 26 and 46 use as any and (m: any) respectively.
This issue appears in multiple locations:
- apps/web/client/src/app/api/chat/route.ts: Lines 26-26
- apps/web/client/src/app/api/chat/route.ts: Lines 46-46
Please replace any with specific types or interfaces to maintain type safety. If the type is complex or dynamic, add a comment explaining the rationale for using any.
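One way to avoid the cast, sketched under the assumption that the toolset is built with the AI SDK's tool() helper (the exact types and package layout in this repo may differ):

```typescript
import { tool } from 'ai'; // Vercel AI SDK; assumed source of this repo's tool() helper
import { z } from 'zod';

// Assumed to mirror the PR's crawl tool definition.
const CRAWL_URL_TOOL_NAME = 'crawl_url';
const CRAWL_URL_TOOL_PARAMETERS = z.object({
    urls: z.array(z.string()).describe('Array of URLs to crawl'),
});

const crawlUrlTool = tool({
    description: 'Crawl one or more URLs and return their content',
    parameters: CRAWL_URL_TOOL_PARAMETERS,
});

// Letting TypeScript infer the toolset's type keeps each tool's parameter
// schema attached, so selecting a single tool needs no `as any` cast.
export const chatToolSet = {
    [CRAWL_URL_TOOL_NAME]: crawlUrlTool,
};

// Call-site equivalent of the flagged line, now fully typed:
export const tools = { [CRAWL_URL_TOOL_NAME]: chatToolSet[CRAWL_URL_TOOL_NAME] };
```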
} else if (toolName === CRAWL_URL_TOOL_NAME) {
    return 'Web content crawled successfully.';
} else if (toolName === CRAWL_URL_TOOL_NAME) {
const params = toolCall.args as z.infer<typeof CRAWL_URL_TOOL_PARAMETERS>; // Assuming parameters are parsed and used
return await handleCrawlUrlTool(params, editorEngine); // Call to a new dedicated handler function
}
The logic for CRAWL_URL_TOOL_NAME is implemented inline within handleToolCall. Other tools in this function (e.g., LIST_FILES_TOOL_NAME, READ_FILES_TOOL_NAME) delegate to separate handler functions like handleListFilesTool. For consistency, improved readability, and easier maintenance (especially if the crawl logic becomes complex), consider extracting the CRAWL_URL_TOOL_NAME handling into its own dedicated async function, similar to the pattern used for other tools.
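A minimal sketch of what such a dedicated handler could look like. The crawler is injected as a parameter here purely to keep the example self-contained, so the signature differs slightly from the reviewer's suggested call; the schema and types below are assumptions, not the PR's actual code:

```typescript
import { z } from 'zod';

// Assumed to mirror the schema exported from packages/ai/src/tools/index.ts.
const CRAWL_URL_TOOL_PARAMETERS = z.object({
    urls: z.array(z.string()).describe('Array of URLs to crawl'),
});

// Minimal stand-ins for types that live elsewhere in the app.
interface Crawler {
    crawlUrl(url: string): Promise<unknown>;
}
type EditorEngine = unknown;

// Hypothetical dedicated handler mirroring handleListFilesTool /
// handleReadFilesTool: takes parsed parameters, performs the crawl, and
// returns a string result for the chat tool call.
export async function handleCrawlUrlTool(
    params: z.infer<typeof CRAWL_URL_TOOL_PARAMETERS>,
    crawler: Crawler,
    _editorEngine?: EditorEngine,
): Promise<string> {
    const pages = await Promise.all(params.urls.map((url) => crawler.crawlUrl(url)));
    // Collapse the crawled pages into a single string for the model.
    return pages
        .map((page, i) => `# ${params.urls[i]}\n${JSON.stringify(page)}`)
        .join('\n\n');
}
```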
const firstItem = response.data[0];
return (
    typeof firstItem === 'object' &&
    firstItem !== null &&
    ('html' in firstItem || 'markdown' in firstItem) &&
    (firstItem.html === undefined || typeof firstItem.html === 'string') &&
    (firstItem.markdown === undefined || typeof firstItem.markdown === 'string')
);
// This loop executes only if response.data is not empty,
// due to the check on lines 48-50.
for (const item of response.data) {
if (
!(
typeof item === 'object' &&
item !== null &&
('html' in item || 'markdown' in item) &&
(item.html === undefined || typeof item.html === 'string') &&
(item.markdown === undefined || typeof item.markdown === 'string')
)
) {
return false; // Found an invalid item
}
}
return true; // All items are valid
The validateCrawlerResponse function currently only validates the structure of the first item in the response.data array. If response.data contains multiple items, subsequent items are not validated. This could lead to runtime errors if other items in the array do not conform to the expected { html?: string; markdown?: string; } structure, as the function would incorrectly return true. Consider iterating over all items in response.data to ensure each one meets the validation criteria.
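Equivalently, the same all-items check can be expressed with Array.prototype.every; a sketch assuming the { html?: string; markdown?: string } item shape described above:

```typescript
interface CrawledItem {
    html?: string;
    markdown?: string;
}

// Returns true only if every element of data looks like a CrawledItem.
function allItemsValid(data: unknown[]): data is CrawledItem[] {
    return data.every(
        (item) =>
            typeof item === 'object' &&
            item !== null &&
            ('html' in item || 'markdown' in item) &&
            ((item as CrawledItem).html === undefined ||
                typeof (item as CrawledItem).html === 'string') &&
            ((item as CrawledItem).markdown === undefined ||
                typeof (item as CrawledItem).markdown === 'string'),
    );
}
```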
    }
    return response;
} catch (error) {
    console.error('Error during crawling:', error);
// Let the caller handle logging and any error wrapping
throw error;
The crawlUrl method currently logs errors to console.error before re-throwing them. While helpful for debugging, this direct logging can make the service harder to integrate into applications with specific logging strategies. Consider removing the console.error call and letting the caller decide how to handle/log the thrown error. This improves modularity and makes the service more versatile.
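A minimal sketch of what the call site could then look like (the names here are illustrative, not from the PR):

```typescript
// Hypothetical call site: the service only throws; the caller owns logging
// and any error wrapping appropriate to the application.
async function crawlWithLogging(
    crawler: { crawlUrl: (url: string) => Promise<unknown> },
    url: string,
) {
    try {
        return await crawler.crawlUrl(url);
    } catch (error) {
        // Application-specific logging strategy lives here, not in the service.
        console.error(`Crawl failed for ${url}:`, error);
        throw new Error(`Failed to crawl ${url}`, { cause: error });
    }
}
```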
@@ -30,8 +31,44 @@ export const onlookInstructionsTool = tool({
    parameters: z.object({}),
});

export const CRAWL_URL_TOOL_NAME = 'crawl_url';
export const CRAWL_URL_TOOL_PARAMETERS = z.object({
    urls: z.array(z.string()).describe('Array of URLs to crawl'),
urls: z.array(z.string().url()).describe('Array of URLs to crawl'),
The urls parameter is described as 'Array of URLs to crawl', but the schema z.array(z.string()) allows any string, not just valid URLs. To ensure data integrity and prevent potential downstream errors in CrawlerService if it expects valid URL formats, consider using z.string().url().
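With the stricter schema, malformed entries are rejected at parse time; a small sketch:

```typescript
import { z } from 'zod';

// Stricter schema: each entry must be a well-formed URL.
const CRAWL_URL_TOOL_PARAMETERS = z.object({
    urls: z.array(z.string().url()).describe('Array of URLs to crawl'),
});

console.log(
    CRAWL_URL_TOOL_PARAMETERS.safeParse({ urls: ['https://example.com/docs'] }).success,
); // true
console.log(
    CRAWL_URL_TOOL_PARAMETERS.safeParse({ urls: ['not a url'] }).success,
); // false — rejected before it ever reaches CrawlerService
```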
Description
Related Issues
Type of Change
Testing
Screenshots (if applicable)
Additional Notes
This pull request introduces a new integration with the Firecrawl API to enhance the web crawling capabilities of the kodustech/onlook project. The key changes include:

- Chat API Enhancements: A processUrls function is added to the chat API endpoint in apps/web/client/src/app/api/chat/route.ts. This function processes embedded URLs in user messages using an AI model, modifying the last user message before further processing. New imports for URL extraction and AI tooling are included, along with logic in the POST handler to invoke this function.
- AI Tool Handling: In apps/web/client/src/app/project/[id]/_hooks/use-chat.tsx, handling for a new AI tool, CRAWL_URL_TOOL_NAME, is introduced. This includes importing the necessary constants and adding a conditional block in the tool dispatcher, currently as a placeholder.
- Crawler Service Implementation: A CrawlerService module is added in packages/ai/src/tools/crawler.ts for web crawling using the Firecrawl API. It includes TypeScript interfaces for crawl options, API responses, and crawled content structure. A validation function checks the integrity of crawler responses. The CrawlerService class is implemented as a singleton, handling API key initialization and providing an async crawlUrl method with error handling (see the sketch after this summary).
- Utility Function for URL Extraction: A helper function, extractUrls, is introduced in packages/ai/src/tools/helpers.ts to parse text strings, identify and validate various forms of URLs, and return a unique set of valid URLs.
- AI Toolset Expansion: A new crawlUrlTool is added to the AI toolset in packages/ai/src/tools/index.ts, defining its parameters with Zod for URL crawling and specifying options like crawl limits and content formats.
- Chat Message Context Extension: The message context model in packages/models/src/chat/message/context.ts is extended with a new type, LinkMessageContext, to represent hyperlinks. This involves updating the MessageContextType enum and including the new type in the ChatMessageContext union type.

These changes collectively enhance the project's ability to process and analyze web content through improved URL handling and web crawling functionalities.
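To make the shape of these pieces concrete, here is a rough sketch of the singleton CrawlerService and the extractUrls helper described above. The method names, the FIRECRAWL_API_KEY environment variable, and the exact Firecrawl client calls are assumptions based on this summary, not the PR's actual code:

```typescript
import FirecrawlApp from '@mendable/firecrawl-js'; // Firecrawl SDK; exact usage is an assumption

interface CrawlOptions {
    limit?: number;
    formats?: Array<'html' | 'markdown'>;
}

interface CrawledContent {
    html?: string;
    markdown?: string;
}

// Singleton wrapper around the Firecrawl API, as described in the summary.
export class CrawlerService {
    private static instance: CrawlerService;
    private client: FirecrawlApp;

    private constructor(apiKey: string) {
        this.client = new FirecrawlApp({ apiKey });
    }

    static getInstance(): CrawlerService {
        if (!CrawlerService.instance) {
            const apiKey = process.env.FIRECRAWL_API_KEY; // env var name is an assumption
            if (!apiKey) throw new Error('Missing Firecrawl API key');
            CrawlerService.instance = new CrawlerService(apiKey);
        }
        return CrawlerService.instance;
    }

    async crawlUrl(url: string, options: CrawlOptions = {}): Promise<CrawledContent[]> {
        const response = await this.client.crawlUrl(url, {
            limit: options.limit ?? 10,
            scrapeOptions: { formats: options.formats ?? ['markdown'] },
        });
        // The response shape is assumed: an object with a `data` array on success.
        if (!('data' in response) || !Array.isArray(response.data)) {
            throw new Error(`Crawl failed for ${url}`);
        }
        return response.data as CrawledContent[];
    }
}

// Extracts unique, valid URLs from free-form text.
export function extractUrls(text: string): string[] {
    const candidates = text.match(/https?:\/\/[^\s<>"')]+/g) ?? [];
    const valid = candidates.filter((candidate) => {
        try {
            new URL(candidate); // throws on malformed URLs
            return true;
        } catch {
            return false;
        }
    });
    return [...new Set(valid)];
}
```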