Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try/improve URI detection in <LinkControl /> #19816

Closed
wants to merge 2 commits into from

Conversation

getdave
Copy link
Contributor

@getdave getdave commented Jan 22, 2020

Description

As part of #19775 it's important that URIs are matched more carefully in LinkControl. This PR introduces a new method into @wordpress/url calledisURI() (not to be confused with isURL) which is then utilised in LinkControl to determine whether to return results for:

  1. Entity searches (eg: Posts, Pages...etc)
  2. Direct Entry URLs

URI matching is a tricky business so this PR relies on an external package ported from a PEARL module which is well tested:

Replicates the functionality of Richard Sonnen sonnen@richardsonnen.com perl module : http://search.cpan.org/~sonnen/Data-Validate-URI-0.01/lib/Data/Validate/URI.pm full code here into a nodejs module. Translated practically line by line from perl. It passes all the original tests.
https://github.com/ogt/valid-url#readme

Note that despite being comprehensive in other areas, the @wordpress/url package doesn't have a general util to check for a "valid URI" so this is why we're introducing one.

If accepted we will need to write unit tests around URI matching to provide our own coverage against the package.

How has this been tested?

Manual for now. Unit tests will be introduced.

Screenshots

Types of changes

New feature (non-breaking change which adds functionality)

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
  • I've updated all React Native files affected by any refactorings/renamings in this PR. .

Currently @wordpress/url doesn’t have a standard check for “anything that is a valid url”. This packages provides a well tested version
Previously the matching was very loose which lead to a lot of false positives and confusion. Now some stricter criteria are applied to check for valid URIs, internal links or (by request) anything starting with `www.`.
@getdave getdave added the [Package] Url /packages/url label Jan 22, 2020
@getdave getdave self-assigned this Jan 22, 2020
@getdave getdave changed the title Try/improve link control URI detection Try/improve URI detection in <LinkControl /> Jan 22, 2020
Copy link
Member

@WunderBart WunderBart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like something that could be achieved using the native URL interface. I know we can't just use it because of IE11 support, but maybe it would make sense to include the polyfill, e.g. whatwg-url? It's a pretty useful thing to have at hand.

cc @youknowriad

Edit:
I created an example PR of how this could be achieved using the whatwg-url package here: #19819.

@aduth
Copy link
Member

aduth commented Jan 22, 2020

This looks like something that could be achieved using the native URL interface. I know we can't just use it because of IE11 support, but maybe it would make sense to include the polyfill, e.g. whatwg-url? It's a pretty useful thing to have at hand.

Coincidentally, I have a local branch on my machine which aims to refactor the existing url module to use the native URL constructor. As you noted, it's something we'd need to polyfill, but we do have prior art for this sort of thing. See also #13386 and related discussion in Automattic/wp-calypso#37606.

In how it ties to the needs of this pull request, I do think it's a case where we may not actually need a new function isURI, but rather improve (moreso "fix", in my view) the existing isURL to be more tolerant about the sorts of strings it accepts.

For example, I like this idea for a way to check if a URL is valid:

function isURL( value ) {
	try {
		new URL( value );
		return true;
	} catch {
		return false;
	}
}

This would accept many more inputs as considered valid URLs, including the one in the example given for the new isURI.

isURL( 'mailto:hello@wordpress.org' );
// true

The differences in considering what specifically is and isn't a "URL" vs. a "URI" are a bit confusing, so I'd welcome other thoughts, but my interpretation is that the above is an expected output, at least in normalizing toward one (where "URL" already has precedent of being used).

Related:

  • RFC 6068 ("The mailto URI scheme")
  • RFC 2368 ("The mailto URL scheme", superseded by RFC 6068)
  • RFC 3986
  • RFC 3305
    • Over time, the importance of this additional level of hierarchy seemed to lessen; the view became that an individual scheme did not need to be cast into one of a discrete set of URI types, such as "URL", "URN", "URC", etc. and understandings of the specific distinctions between "URI" and "URL" is that this would be an expected output.

  • URL Living Standard
    • Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

@aduth
Copy link
Member

aduth commented Jan 22, 2020

Coincidentally, I have a local branch on my machine which aims to refactor the existing url module to use the native URL constructor. As you noted, it's something we'd need to polyfill, but we do have prior art for this sort of thing. See also #13386 and related discussion in Automattic/wp-calypso#37606.

I opened #19823 which includes a polyfill for the URL constructor. It's not exactly what I'd been working on (since @wordpress/url refactor is a bit more involved), but would unblock this pull request if we decide to try to use URL constructor.

Another option is to simply cherry-pick the commit e2c0f74 from that branch.

@getdave
Copy link
Contributor Author

getdave commented Jan 23, 2020

@aduth @WunderBart Thanks for your input and PRs. I'm going to close this PR in favour of your improved versions.

I agree with @aduth that we ought to be looking at improving isURL() so it matches more widely. I'd sound a note of caution regarding the fact that this function has up until now been only matching http and so broadening that scope could cause knock-on effects (this is why I created a new method).

I think based on URL Living Standard we avoid URI in favour of URL and move towards better checking.

This is critical for #19775 so I'm happy to help move this forward a-pace.

@aduth
Copy link
Member

aduth commented Jan 24, 2020

I agree with @aduth that we ought to be looking at improving isURL() so it matches more widely.

Follow-up at: #19871

@aristath aristath deleted the try/improve-link-control-url-detection branch November 10, 2020 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Package] Url /packages/url
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants