
Add hints to random access availability to archives #708

Open
vpenades opened this issue Nov 28, 2022 · 11 comments

Comments

@vpenades
Contributor

vpenades commented Nov 28, 2022

Following #707

I've noticed you really don't know if you need to read sequentially or you can access randomly until you have opened the archive... correct me if I'm wrong:

Plain .TAR files can actually be read randomly, but tar archives inside a gzip archive can't.

So, if that's the case, the SupportsRandomAccess property should go into the IArchive, not in the IArchiveFactory interface, right?

Also, wouldn't the IsSolid property be enough for this? (In that case, IsSolid should be true for .tar.gz files; I've checked and currently it's false.)

If IsSolid serves a different purpose, then IArchive definitely needs a SupportsRandomAccess property.

Any thoughts?

@adamhathcock
Owner

IsSolid means a specific thing for RAR.

On Streams there's IsSeekable, which basically means what we want, but we need it on IArchive. An alternative is simply not to have IArchive interfaces for non-seekable situations and tell people to use IReader.

@vpenades
Contributor Author

Got it... I'll do a new PR with that knowledge

@vpenades
Contributor Author

vpenades commented Nov 28, 2022

Hmm... I'm having a hard time exposing IsSeekable on IArchive: opening a tar.gz reports IsSeekable as true in all the streams I can see around.

Anyway, the problem seems trickier to handle:

On one hand, most archives, including plain TAR, can be opened as IArchive and accessed randomly.

On the other hand, TAR.XX is a special case and needs to be opened using an IReader, because if you go through the IArchive path, it's just a Gzip with a single entry.

To explain the problem: I am trying to write a general archive reader that relies on IArchiveFactory and does not know the specifics of each archive format. Whenever possible it should use random access, and only fall back to IReader when that's not possible. For that, I need a way to know which archives support random access and which don't.

The alternative is to also provide public static IReader Open(Stream stream, ReaderOptions? options = null) to IArchiveFactory
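
The fallback described above could be sketched roughly as follows. This is only an illustration: `IEntrySourceSketch`, `IFormatFactorySketch`, and `GenericOpenerSketch` are hypothetical names defined here for the example, not SharpCompress types, and the `SupportsRandomAccess` member is the hint being proposed in this issue, assumed to be known from the format alone. The only real API used is .NET's `Stream.CanSeek`.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for "something that lists entries"; not a SharpCompress type.
public interface IEntrySourceSketch
{
    IEnumerable<string> EntryKeys();
}

// Hypothetical per-format factory carrying the proposed hint.
public interface IFormatFactorySketch
{
    string Name { get; }

    // Assumption: answered from the archive/compression format, before opening anything.
    bool SupportsRandomAccess { get; }

    IEntrySourceSketch OpenRandomAccess(Stream stream); // archive-style access
    IEntrySourceSketch OpenForwardOnly(Stream stream);  // reader-style access
}

public static class GenericOpenerSketch
{
    // Prefer random access; fall back to forward-only reading otherwise.
    public static IEntrySourceSketch Open(IFormatFactorySketch factory, Stream stream)
    {
        // Random access needs a format that allows it AND a seekable stream.
        if (factory.SupportsRandomAccess && stream.CanSeek)
        {
            return factory.OpenRandomAccess(stream);
        }

        return factory.OpenForwardOnly(stream);
    }
}
```

The caller never inspects the format itself; it only asks the factory for the hint, which is what a format-agnostic reader would need.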

@adamhathcock
Owner

IReaderFactory has that Open I think.

You don't want to put the stream's IsSeekable on IArchive. You want to return true/false based on the archive/compression format. File streams are always seekable, but decompressed data usually is not. Zip/Rar compress each file individually, so they can seek. TarGz/TarBz are one continuous compressed stream, so they're not seekable.
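
As a minimal sketch of a format-based answer, using only the formats and claims mentioned in this thread: `ContainerKindSketch` and `RandomAccessHintSketch` are hypothetical names made up for this example and are not part of the library.

```csharp
using System;

// Hypothetical enumeration of the formats discussed in this thread.
public enum ContainerKindSketch
{
    Zip,
    Rar,
    Tar,
    TarGz,
    TarBz
}

public static class RandomAccessHintSketch
{
    // The answer depends on the archive/compression format,
    // not on whether the underlying file stream can seek.
    public static bool SupportsRandomAccess(ContainerKindSketch kind)
    {
        switch (kind)
        {
            case ContainerKindSketch.Zip:   // entries compressed individually
            case ContainerKindSketch.Rar:   // entries compressed individually
            case ContainerKindSketch.Tar:   // plain tar, no outer compression
                return true;

            case ContainerKindSketch.TarGz: // one continuous compressed stream
            case ContainerKindSketch.TarBz: // one continuous compressed stream
                return false;

            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```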

@vpenades
Contributor Author

vpenades commented Nov 28, 2022

Yes, ReaderFactory is the one that has that Open, so I think an IReaderFactory interface is needed, in the same way that I introduced IArchiveFactory. That way the spaghetti code can be removed and new readers can be registered.

If I follow that path, I would like to avoid having to register factories in both Archive* and Reader*, so it could be good to have a single factory class for each archive type, implementing both IArchiveFactory and IReaderFactory.

Maybe a Factory folder, moving all the factories like ZipArchiveFactory into it and renaming them to ZipFactory, etc.?
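
One possible shape for that consolidation, as a hedged sketch: every type below (`IArchiveSketch`, `IReaderSketch`, `IArchiveFactorySketch`, `IReaderFactorySketch`, `ZipFactorySketch`, `FactoryRegistrySketch`) is hypothetical and only illustrates registering one class once while exposing it through both interfaces. The actual interfaces and their signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical marker types standing in for the archive and reader abstractions.
public interface IArchiveSketch { }
public interface IReaderSketch { }

public interface IArchiveFactorySketch
{
    IArchiveSketch OpenArchive(Stream stream);
}

public interface IReaderFactorySketch
{
    IReaderSketch OpenReader(Stream stream);
}

// One class per format, implementing both roles.
public sealed class ZipFactorySketch : IArchiveFactorySketch, IReaderFactorySketch
{
    // Format-specific opening is out of scope for this sketch.
    public IArchiveSketch OpenArchive(Stream stream) =>
        throw new NotImplementedException("format-specific archive opening");

    public IReaderSketch OpenReader(Stream stream) =>
        throw new NotImplementedException("format-specific reader opening");
}

public static class FactoryRegistrySketch
{
    private static readonly List<object> Factories = new List<object>();

    // A factory is registered once, regardless of how many roles it implements.
    public static void Register(object factory) => Factories.Add(factory);

    public static IEnumerable<IArchiveFactorySketch> ArchiveFactories =>
        Factories.OfType<IArchiveFactorySketch>();

    public static IEnumerable<IReaderFactorySketch> ReaderFactories =>
        Factories.OfType<IReaderFactorySketch>();
}
```

After `FactoryRegistrySketch.Register(new ZipFactorySketch())`, the same instance shows up in both `ArchiveFactories` and `ReaderFactories`. A format that only supports one access mode would implement only the matching interface.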

@adamhathcock
Owner

That's not a bad idea: just have singular factory classes, or something similar, to consolidate things.

@vpenades
Contributor Author

vpenades commented Nov 29, 2022

So, SevenZip doesn't have an implementation in the readers factory? Is there a reason for it?

@vpenades
Contributor Author

Added a PR: #709

@adamhathcock
Owner

> So, SevenZip doesn't have an implementation in the readers factory? Is there a reason for it?

From memory, this is because 7Zip requires random access to the file. The streams need to seek around to find the headers and decompress the streams in the format. Readers are meant for forward-only (non-seekable) streams.

@vpenades
Contributor Author

@adamhathcock expanding on this topic a bit further: what would be the recommended way to open archives in a generic way?

What I'm trying to achieve is to traverse a number of directories containing all sorts of archives (zip, rar, 7z, etc.), open them, and scan their contents.

@adamhathcock
Owner

You'll have to implement that yourself if you can't guarantee that a Reader will work. It's beyond the scope of the library.

I've been away for personal reasons.
