Reduce filename length #499

Closed
skymoo opened this issue Sep 2, 2020 · 28 comments
Comments

@skymoo

skymoo commented Sep 2, 2020

I am using gocryptfs to reverse encrypt some files and then sync the resulting encrypted files to OneDrive for remote backup. Everything seems to be working fine but when I sync to OneDrive using rclone I get a few errors of the form:

2020-09-01 21:54:13 ERROR : 2Ddc9ywAOFEbIdlgPXI19Q/ZUyJffdGZg_dBIZ-0zuFVg/42PW1SMQDlCuh68QjKLUeEBsTRCsnCSWSMO7606Kaq4/fOOjbi8sU9D8N2VQz1ZLtHFIz9iBL1BpZmVgRHEzL50oxuTvstwTezFRvSVRFs2JuF4UsKicdQMXzaUcBqgQ6Q/lP_OT__s6GQ_CkoghPVuKtQfbWeaP1iGunypwq-BpSWBn4D_hQPM9fdNhhXjzd2Lv7oG__zNDr8N4O9Kt4y4TGlx0u-aawIDoG2E-LzMwbp_mSz5Mw1jEmGTwvSXeKc2SFXuhcoaNHFIEidTbLBUZV7UzfVdyUshSF1xYhK3XmiDB4lqxV16RzLVYLWditAo: Failed to copy: pathIsTooLong: Path is too long

I'm currently using the default feature flags as applied with the -reverse option, i.e.:

        "FeatureFlags": [
                "GCMIV128",
                "HKDF",
                "DirIV",
                "EMENames",
                "LongNames",
                "Raw64",
                "AESSIV"
        ]

Is there a way to make the encrypted filenames shorter?

I'm using gocryptfs-1.8.0 as packaged in Fedora 32.

@rfjakob rfjakob added this to the v2.1 milestone Oct 6, 2020
@rfjakob
Owner

rfjakob commented Aug 17, 2021

Hmm, looking at this in detail, I don't understand why you get this error. The mentioned path

2Ddc9ywAOFEbIdlgPXI19Q/ZUyJffdGZg_dBIZ-0zuFVg/42PW1SMQDlCuh68QjKLUeEBsTRCsnCSWSMO7606Kaq4/fOOjbi8sU9D8N2VQz1ZLtHFIz9iBL1BpZmVgRHEzL50oxuTvstwTezFRvSVRFs2JuF4UsKicdQMXzaUcBqgQ6Q/lP_OT__s6GQ_CkoghPVuKtQfbWeaP1iGunypwq-BpSWBn4D_hQPM9fdNhhXjzd2Lv7oG__zNDr8N4O9Kt4y4TGlx0u-aawIDoG2E-LzMwbp_mSz5Mw1jEmGTwvSXeKc2SFXuhcoaNHFIEidTbLBUZV7UzfVdyUshSF1xYhK3XmiDB4lqxV16RzLVYLWditAo

is 369 characters long. According to https://support.microsoft.com/en-us/office/restrictions-and-limitations-in-onedrive-and-sharepoint-64883a5d-228e-48f5-b3d2-eb39e07630fa#filenamepathlengths :

The entire decoded file path, including the file name, can't contain more than 400 characters

@rfjakob rfjakob removed this from the v2.1 milestone Aug 17, 2021
@Satlinker

Satlinker commented Aug 20, 2021

@rfjakob
Owner

rfjakob commented Sep 14, 2021

This contradicts the microsoft article

@page-down

page-down commented Sep 16, 2021

Please note the definition of path length in the documentation. For example, for Sharepoint paths, the longer the company URL domain name and site name, the shorter the actual length the user can upload.

Take the example in the documentation:

The prefix has taken up 38 characters and is counted in the total path length.

personal/meganb_contoso_com/Documents/

This means that there are only 362 characters left to use.

In my actual tests, I can only upload path lengths up to 342 characters under my account.
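The arithmetic above can be checked with a minimal Python sketch. It only uses numbers already quoted in this thread: the 400-character limit from the Microsoft article, the example prefix from the documentation, and the 369-character length of the failing path. The variable names are made up for illustration, and the real prefix depends on the account.

```python
# Limit quoted from the Microsoft article (characters in the decoded path).
ONEDRIVE_PATH_LIMIT = 400

# Example prefix from the documentation; a real account has its own prefix.
prefix = "personal/meganb_contoso_com/Documents/"

# Length of the failing path, as counted earlier in this thread.
reported_path_len = 369

# Characters left for the user's own path once the prefix is counted.
budget = ONEDRIVE_PATH_LIMIT - len(prefix)

# Positive means the path does not fit.
overshoot = reported_path_len - budget

print(len(prefix), budget, overshoot)  # prints: 38 362 7
```

Under these assumptions the 369-character path is 7 characters over budget, which would explain why it is rejected even though it is shorter than 400.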


Although gocryptfs shortens long filenames and keeps track of the actual filenames in a separate file, the names are still too long.

Cryptomator also shortens filenames, but it starts the conversion as soon as the encrypted file name reaches 220 characters (1 ASCII character = 1 byte), whereas gocryptfs only converts when the filename exceeds 255 bytes.

https://docs.cryptomator.org/en/1.5/security/architecture/#name-shortening

However there is still the problem of the whole path being too long.

@BrsyRockSs

https://www.boxcryptor.com/en/blog/post/our-encrypted-filename-encoding-is-now-open-source/

@page-down

page-down commented Sep 18, 2021

Thank you for referring to this solution.

The above mentioned approach uses Unicode characters with printable ranges for the file names.

In my testing, using Unicode characters for encoding effectively addresses the file name and path length limitations of macOS APFS, Windows NTFS file systems, and some cloud vendors (specifically, Microsoft OneDrive).

  • APFS (macOS / iOS)

    • name: 255 UTF-8 characters
    • path: no limit
    • notes: macOS has a limit of 1024 bytes PATH_MAX
  • NTFS (Windows)

    • name: 255 UTF-16 characters
    • path:
      • default: 260 UTF-16 characters
      • opt-in Long Path: 32767 UTF-16 characters

For OneDrive, the path length is counted in decoded Unicode characters, so each character can easily carry more information.

In my tests, the problem of long file names can be solved in this way and works with local file systems and cloud storage providers that support Unicode characters.
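To make the different units concrete, here is a small Python sketch that measures the same name in code points, UTF-8 bytes, and UTF-16 code units. The helper name `lengths` and the sample names are hypothetical; which unit a given file system or cloud provider actually counts is as described in the comments of this thread, not something the code verifies.

```python
def lengths(name: str) -> dict:
    """Return the length of a name in three common units."""
    return {
        "code_points": len(name),
        "utf8_bytes": len(name.encode("utf-8")),
        # Two bytes per UTF-16 code unit; "-le" avoids a byte order mark.
        "utf16_units": len(name.encode("utf-16-le")) // 2,
    }

ascii_name = "a" * 100
cjk_name = "文" * 100  # U+6587, a character in the Basic Multilingual Plane

print(lengths(ascii_name))  # 100 in every unit
print(lengths(cjk_name))    # 100 code points, 300 UTF-8 bytes, 100 UTF-16 units
```

A limit counted in characters or UTF-16 units treats both names the same, while a limit counted in bytes sees the second name as three times longer.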

@litori

litori commented Sep 29, 2021

I too have encountered issues with file names being too long and not working with gocryptfs on macOS. This happens a lot when the file names are in foreign languages. If the solution above solves it, that is great news.

@dumblob

dumblob commented Oct 16, 2021

Yep, the boxcryptor's approach of using 4000 carefully chosen unicode chars seems viable.

@rfjakob
Owner

rfjakob commented Oct 17, 2021

The boxcryptor thing is windows-only, won't use that one :)

@dumblob

dumblob commented Oct 17, 2021

The boxcryptor thing is windows-only, won't use that one :)

There is a smiley, so this is probably some jokey nudge. Let's get more to the point then - is there any technical concern why the "higher density encoding" (e.g. using those 4000 carefully chosen characters) couldn't be used by gocryptfs?

@rfjakob
Owner

rfjakob commented Oct 17, 2021

Essentially, the boxcryptor trick works because on Windows, the length limit is counted in characters. That means you can have as many Chinese characters as you could have ASCII characters.

That is not the case on Linux, where the limit is counted in bytes. And a Chinese character takes more bytes than an ASCII one, which eats up the gains of having a larger alphabet.
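A rough way to quantify this is to compare how many bits of encrypted name fit into one counted unit. The Python sketch below assumes a 4096-character alphabet (the thread mentions roughly 4000 characters) whose characters each take 3 bytes in UTF-8, which holds for code points from U+0800 to U+FFFF. The function name is hypothetical and the numbers illustrate the argument; they are not a measurement of any real encoding.

```python
import math

def bits_per_unit(alphabet_size: int, units_per_char: int) -> float:
    """Information carried per counted unit (character or byte)."""
    return math.log2(alphabet_size) / units_per_char

# base64: 64 symbols, each one ASCII character and one byte.
base64_per_byte = bits_per_unit(64, 1)

# Assumed large alphabet: 4096 symbols, each counted as one character.
big_per_char = bits_per_unit(4096, 1)

# The same alphabet when the limit is counted in UTF-8 bytes.
bytes_per_cjk_char = len("中".encode("utf-8"))  # 3
big_per_byte = bits_per_unit(4096, bytes_per_cjk_char)

print(base64_per_byte, big_per_char, big_per_byte)  # prints: 6.0 12.0 4.0
```

Under these assumptions the larger alphabet doubles the capacity where the limit is counted in characters (12 vs 6 bits), but falls below base64 where the limit is counted in bytes (4 vs 6 bits).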

@dumblob

dumblob commented Oct 17, 2021

Yes, this is all clear. Though my intuition says it wouldn't eat up the gains on Linux (simply because Linux filesystem APIs generally have much higher limits than Windows filesystem APIs), and second, it would solve the OP's problem.

@rfjakob
Owner

rfjakob commented Oct 19, 2021

My plan of action is to add a command-line parameter to -init called something like -longnamemax.

gocryptfs already stores names that are longer than 255 bytes in a .name file (see https://nuetzlich.net/gocryptfs/forward_mode_crypto/#long-file-name-handling ). The -longnamemax parameter will allow changing the value from 255 to something lower (100 for example). This will guarantee that each path component is at most 100 bytes long.
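To get a feel for what a per-component cap means for whole paths, the following Python sketch computes how many worst-case components fit into a given path budget, counting one separator between components. The function name `max_depth` is hypothetical, the 342-character budget is the value one commenter measured for their own account, and 100 is the example value from the comment above. It is an illustration, not a recommendation for any provider.

```python
def max_depth(path_budget: int, component_max: int) -> int:
    """Number of components of length component_max that fit into
    path_budget when joined with single '/' separators."""
    # d components use d * component_max + (d - 1) characters,
    # so d <= (path_budget + 1) / (component_max + 1).
    return (path_budget + 1) // (component_max + 1)

print(max_depth(342, 255))  # prints: 1
print(max_depth(342, 100))  # prints: 3
```

In the worst case, where every component hits the cap, a cap of 100 allows three nested levels within 342 characters, while the hardcoded 255 allows only one.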

@dumblob

dumblob commented Oct 19, 2021

@rfjakob does this mean many more people would also need to back up the .name files in each directory along with gocryptfs.conf to avoid accidental loss of access? Isn't the trick with Chinese chars a much better heuristic, without such risks and with a high probability of solving most practical cases (sure, it just "delays" the problem, but I'd guess it surpasses the "critical" threshold of practical cases, so it's much more worth it)?

@BrsyRockSs

If you do choose to add the -longnamemax parameter to solve the problem, could it take a value? For example, a number after the parameter could define the limit.
This may prevent the length of the path from exceeding the limit in some special cases,

such as
-longnamemax 20
-longnamemax 55

@rfjakob
Owner

rfjakob commented Oct 20, 2021

@dumblob
(1) one .name file for each file with a long name
(2) I won't consider the chinese char trick, as it only works on Windows

@BrsyRockSs Yes exactly, -longnamemax NUMBER. Note that values below 67 do not make sense, as a .name file is 67 bytes long by itself. Looks like this:

gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10.name

rfjakob added a commit that referenced this issue Oct 21, 2021
Determines when to start hashing long names instead
of hardcoded 255. Will be used to alleviate "name too long"
issues some users see on cloud storage.

#499
rfjakob added a commit that referenced this issue Oct 21, 2021
Feature flag + numeric paramater

#499
@bailey27
Contributor

@rfjakob I started working on adding longnamemax to cppcryptfs.

I found that I still have this bug in cppcryptfs #143 that you fixed in gocryptfs years ago. I just wanted to point out that the gocryptfs man page still says 176 bytes where I think it should say 175.

I don't understand why the gocryptfs man page says the minimum value for LongNameMax is 62, but the .name files are 67 chars (with the .name extension). Shouldn't the minimum LongNameMax value be 67 to reflect that files with names that long will be created?

rfjakob added a commit that referenced this issue Nov 1, 2021
Quoting fusefrontend_reverse/node_helpers.go :

	// File names are padded to 16-byte multiples, encrypted and
	// base64-encoded. We can encode at most 176 bytes to stay below the 255
	// bytes limit:
	// * base64(176 bytes) = 235 bytes
	// * base64(192 bytes) = 256 bytes (over 255!)
	// But the PKCS#7 padding is at least one byte. This means we can only use
	// 175 bytes for the file name.

Noticed by @bailey27 at #499 (comment)
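The 175-byte figure in the quoted comment can be reproduced with a short Python sketch. It assumes exactly what the comment states: names get PKCS#7 padding to a 16-byte multiple (always at least one byte) and are then base64-encoded without padding characters. The function and constant names are made up for illustration and are not taken from the gocryptfs source.

```python
BLOCK = 16      # padding block size in bytes
NAME_MAX = 255  # file name limit in bytes

def encrypted_name_len(plain_len: int) -> int:
    """Length of the base64 name for a plaintext name of plain_len bytes."""
    # PKCS#7 always adds between 1 and 16 bytes of padding.
    padded = (plain_len // BLOCK + 1) * BLOCK
    # Unpadded base64: 4 characters per 3 bytes, rounded up.
    return (padded * 4 + 2) // 3

# Longest plaintext name whose encrypted form still fits into NAME_MAX.
max_plain = max(n for n in range(1, NAME_MAX + 1)
                if encrypted_name_len(n) <= NAME_MAX)

print(encrypted_name_len(175), encrypted_name_len(176), max_plain)
# prints: 235 256 175
```

A 176-byte name is padded to 192 bytes and encodes to 256 characters, one over the limit, so 175 is the largest usable length.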
@rfjakob
Owner

rfjakob commented Nov 1, 2021

@bailey27

(1) 175 vs 176: Yes, this should read 175, thanks! Fixed in d530fbd .

(2) 62 vs 67: Yes, as you have observed, the .name file is 67 bytes, example:

gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10           = 62 bytes
gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10.name      = 67 bytes

But, if you have nested directories, only the first file becomes part of the path. So with -longnamemax=62 you can get a shorter complete path. But the basename will still need 67 bytes.
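The 62 and 67 byte figures can be reproduced with a minimal Python sketch. It assumes that the hashed name is the prefix `gocryptfs.longname.` followed by the unpadded URL-safe base64 of a SHA-256 digest of the encrypted name, which matches the length of the example above. The exact hash input and the function name `longname` are assumptions for illustration, not taken from the gocryptfs source.

```python
import base64
import hashlib

PREFIX = "gocryptfs.longname."
SUFFIX = ".name"

def longname(encrypted_name: str) -> str:
    """Hypothetical helper: derive a fixed-length stand-in for a long name."""
    digest = hashlib.sha256(encrypted_name.encode("ascii")).digest()
    # A 32-byte digest encodes to 43 base64 characters once '=' is stripped.
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return PREFIX + b64

# Any over-long encrypted name works as input; this one is made up.
stand_in = longname("x" * 300)

print(len(stand_in), len(stand_in + SUFFIX))  # prints: 62 67
```

Because the digest has a fixed size, the stand-in is always 62 bytes and its companion .name file always 67 bytes, whatever the length of the original name.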

@ccchan234

@dumblob (1) one .name file for each file with a long name (2) I won't consider the chinese char trick, as it only works on Windows

@BrsyRockSs Yes exactly, -longnamemax NUMBER. Note that values below 67 do not make sense, as a .name file is 67 bytes long by itself. Looks like this:

gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10.name

Hi, thanks for the feature.
It would be nice if you (or others who can contribute) could suggest -longnamemax values for laymen like me, at least for the big three: OneDrive, Box, and Dropbox. Google supports very long file names and paths, so it is never an issue there.

Or, if possible, mention the values in the manual (even if listed as unknown).

thanks

@BrsyRockSs

BrsyRockSs commented May 17, 2022

@ccchan234
Shorter file names give shorter paths, but at the same time more .name files need to be created, which may add a performance burden on frequently accessed file systems. I think you should weigh this against the files you want to store:
do you want better path compatibility or better access performance?

@MartinMiller23

This comment was marked as spam.

@ccchan234

ccchan234 commented Mar 30, 2024

@ccchan234 Shorter file names give shorter paths, but at the same time more .name files need to be created, which may add a performance burden on frequently accessed file systems. I think you should weigh this against the files you want to store: do you want better path compatibility or better access performance?

My next PC is going to be faster than most supercomputers of 5 years ago...

@Rickeymarchram

This comment was marked as spam.

@ccchan234

Longpath files are too irritating - to make my life easier, I use LongPath Tool.

can you tell more?

even in win10 i enabled very very long path in registry, it's not enough for me. thanks

@ccchan234

Longpath files are too irritating - to make my life easier, I use LongPath Tool.

Personal license
( non-commercial usage)

$59.97

with gpt4 i could duplicate many small programs in a day.

@Rickeymarchram

This comment was marked as spam.

@John-davis78

This comment was marked as spam.

@rfjakob
Owner

rfjakob commented Apr 6, 2024

This "Long Path Tool" stuff is useless spam. Please don't feed the spammers.
