Instruct search engines not to crawl the /catalog folder #3585
juanluisrp added a commit to GeoCat/core-geonetwork that referenced this issue on Sep 9, 2023:
* Some search engines are indexing the Angular JS HTML template files. Exclude the ${context_path}/catalog and ${context_path}/static from being crawled by robots. Fix geonetwork#3585.
juanluisrp added a commit that referenced this issue on Sep 11, 2023:
…vided (#7327)
* Return 200 OK for robots.txt and sitemap. Return 200 OK for /robots.txt and /srv/api/sitemap instead of 500 "service not found". Previously, if the request didn't contain "Accept: text/plain" or "Accept: application/xml", the server returned a 500 error. Now the server accepts any "Accept" header without complaining, returning a 200 response with "Content-Type: text/plain" or "Content-Type: application/xml" and the right content.
* Disallow /catalog and /static in robots.txt. Some search engines are indexing the Angular JS HTML template files. Exclude the ${context_path}/catalog and ${context_path}/static from being crawled by robots. Fix #3585.
Reproduce:
Search on Google for "mdView.current.record.title"
-> you'll find hits from template files in various GeoNetwork installations.
Fix it by excluding the /catalog folder from Google indexing in robots.txt.
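As a sketch, the corresponding robots.txt for a GeoNetwork deployed at the root path might look like this (the /static entry is taken from the referenced commit message; adjust the paths to match your deployment):

```
# Sketch: robots.txt at the web root, GeoNetwork deployed at /
User-agent: *
Disallow: /catalog/
Disallow: /static/
Disallow: /doc/api/
```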
It would also be good to hide other pages from crawlers, such as /doc/api.
Note that the robots.txt included in GeoNetwork will only work if you deploy GeoNetwork at the root folder. If not, create your own robots.txt in the root folder; in that scenario, include /geonetwork in the disallowed paths. Read more at https://webmasters.stackexchange.com/questions/89395/can-robots-txt-be-in-a-servers-sub-directory and http://www.robotstxt.org.
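The subdirectory case can be checked locally with Python's standard urllib.robotparser before deploying; the /geonetwork context path and the sample URLs below are assumptions for illustration only:

```python
from urllib import robotparser

# Hypothetical robots.txt at the server root, assuming GeoNetwork
# is deployed under the /geonetwork context path.
rules = """\
User-agent: *
Disallow: /geonetwork/catalog/
Disallow: /geonetwork/static/
Disallow: /geonetwork/doc/api/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Template files are blocked for all crawlers...
print(rp.can_fetch("*", "/geonetwork/catalog/templates/view.html"))  # False
# ...while the search UI itself stays crawlable.
print(rp.can_fetch("*", "/geonetwork/srv/eng/catalog.search"))       # True
```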