Add more file importers to JabRef #13310

Draft: wants to merge 14 commits into main

@InAnYan (Member) commented Jun 12, 2025

Changes made:

  • Add Apache Tika dependencies.
  • Add importers for ODF documents (ODT, ODS, ODP).
  • Add tests for the importers.

Steps to test

  1. Run Java tests.
  2. Import ODF files.

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • [/] Screenshots added in PR description (if change is visible to the user)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@InAnYan changed the title "feat(importers): add importers for ODF files" to "Add importers for ODF files" on Jun 12, 2025

@InAnYan (Member, Author) commented Jun 12, 2025

Currently getting errors while building:

Could not determine the dependencies of task ':jablib:test'.
> Could not resolve all dependencies for configuration ':jablib:testRuntimeClasspath'.
   > Could not resolve commons-logging:commons-logging:1.3.5.
     Required by:
         project :jablib
         project :jablib > org.apache.pdfbox:pdfbox:3.0.5
      > Module 'commons-logging:commons-logging' has been rejected:
           Cannot select module with conflict on capability 'org.gradlex:commons-logging-impl:1.0' also provided by [org.slf4j:jcl-over-slf4j:2.0.17(runtime)]
   > Could not resolve org.slf4j:jcl-over-slf4j:2.0.17.
     Required by:
         project :jablib > org.apache.tika:tika-parsers-standard-package:3.2.0
         project :jablib > org.slf4j:slf4j-api:2.0.17 > org.slf4j:slf4j-bom:2.0.17
      > Module 'org.slf4j:jcl-over-slf4j' has been rejected:
           Cannot select module with conflict on capability 'org.gradlex:commons-logging-impl:1.0' also provided by [commons-logging:commons-logging:1.3.5(runtime)]
> There are 3 more failures with identical causes.

@jabref-machine (Collaborator)

Your code currently does not meet JabRef's code guidelines. We use Checkstyle to identify issues. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Checkstyle (pull_request)" and click on it.

In case of issues with the import order, double-check that you activated Auto Import. You can fix imports by pressing Ctrl+Alt+O (Optimize Imports).

Please carefully follow the setup guide for the code style. Afterwards, run Checkstyle locally, fix the issues, commit, and push.

@Siedlerchr (Member)

Exclude commons-logging from the dependency.

@InAnYan (Member, Author) commented Jun 12, 2025

@jabref-machine (Collaborator)

Your pull request needs to link an issue correctly.

To ease organizational workflows, please link this pull request to the issue using the syntax described in https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue:

Linking a pull request to an issue using a keyword

You can link a pull request to an issue by using a supported keyword in the pull request's description or in a commit message.

Examples

  • Fixes #xyz links the pull request to the issue. Merging the PR will close the issue.
  • Fixes https://github.com/JabRef/jabref/issues/xyz links the pull request to the issue. Merging the PR will close the issue.
  • Fixes https://github.com/Koppor/jabref/issues/xyz links the pull request to the issue. Merging the PR will close the issue.
  • Fixes [#xyz](https://github.com/JabRef/jabref/issues/xyz) links the pull request to the issue. Merging the PR will NOT close the issue.

@jabref-machine (Collaborator)

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failures locally. A quick how-to for running tests is in the section "Final build system checks" of our setup guide.

@InAnYan mentioned this pull request on Jun 16, 2025
@InAnYan changed the title "Add importers for ODF files" to "Add more file importers to JabRef" on Jun 16, 2025

@Override
protected void extractAdditionalMetadata(BibEntry entry, TikaMetadataParser metadataParser) {
entry.setType(BiblatexNonStandardTypes.Image);

This uses a setter method instead of the recommended wither pattern for BibEntry modifications. Should use the withType() method to maintain immutability principles.
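The general idea behind the wither pattern can be sketched with a small immutable type. EntrySketch, its string-typed fields, and its withType/withField methods are hypothetical illustrations, not JabRef's BibEntry API: each wither returns a modified copy and leaves the receiver untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable entry used only to illustrate the wither pattern;
// this is NOT JabRef's BibEntry.
final class EntrySketch {
    private final String type;
    private final Map<String, String> fields;

    EntrySketch(String type, Map<String, String> fields) {
        this.type = type;
        this.fields = Map.copyOf(fields); // unmodifiable defensive copy
    }

    String type() {
        return type;
    }

    Map<String, String> fields() {
        return fields;
    }

    // Returns a new entry with the given type; this instance is unchanged.
    EntrySketch withType(String newType) {
        return new EntrySketch(newType, fields);
    }

    // Returns a new entry with one field added or replaced; this instance is unchanged.
    EntrySketch withField(String name, String value) {
        Map<String, String> copy = new LinkedHashMap<>(fields);
        copy.put(name, value);
        return new EntrySketch(type, copy);
    }
}
```

Chained calls such as new EntrySketch("Misc", Map.of()).withType("Image").withField("author", "Doe, Jane") produce a new entry at each step. Since extractAdditionalMetadata returns void, a copy-returning wither only helps if the caller uses the returned entry; whether JabRef's BibEntry withers return a copy or the same mutated instance should be checked in BibEntry itself.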

protected void extractAdditionalMetadata(BibEntry entry, TikaMetadataParser metadataParser) {
List<String> authors = ListUtil.concat(metadataParser.getDcCreators(), metadataParser.getDcContributors());

entry.setField(StandardField.AUTHOR, TikaMetadataParser.formatBibtexAuthors(authors));

Using setField instead of withField violates the immutability principle. Should use entry.withField(StandardField.AUTHOR, ...) for better consistency.
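The snippet above passes the concatenated Dublin Core creators and contributors to TikaMetadataParser.formatBibtexAuthors, whose body is not shown. A minimal sketch, assuming the helper only needs to join names with BibTeX's " and " separator (AuthorFormatSketch is a hypothetical name):

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for TikaMetadataParser.formatBibtexAuthors;
// the PR's actual implementation is not shown in this conversation.
final class AuthorFormatSketch {
    // BibTeX separates the names in an author field with " and ".
    static String formatBibtexAuthors(List<String> authors) {
        return authors.stream()
                .filter(Objects::nonNull)        // tolerate null entries from metadata
                .map(String::strip)              // drop surrounding whitespace
                .filter(name -> !name.isEmpty()) // skip blank names
                .collect(Collectors.joining(" and "));
    }

    public static void main(String[] args) {
        // prints: Jane Doe and John Smith
        System.out.println(formatBibtexAuthors(List.of("Jane Doe", " John Smith ", "")));
    }
}
```

A corporate author whose name itself contains "and" (for example "Smith and Sons") would additionally need to be wrapped in braces so BibTeX does not split it; the sketch does not handle that case.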

import org.apache.tika.metadata.Property;

public class TikaMetadataParser {
private final static Pattern imageDatePattern = Pattern.compile("(year|month|day|hour|minute|second)=(\\d+)");

Incorrect modifier order. According to Java conventions, 'static' should come before 'final'. Should be 'private static final Pattern'.
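A sketch of the field with the conventional modifier order, plus a hypothetical helper showing how such a pattern could be applied. The input format "year=2024, month=5, day=17" is an assumption about what the image metadata looks like, and ImageDateSketch and parseDateParts are illustrative names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageDateSketch {
    // Conventional order: private static final (constant named in UPPER_SNAKE_CASE).
    private static final Pattern IMAGE_DATE_PATTERN =
            Pattern.compile("(year|month|day|hour|minute|second)=(\\d+)");

    // Collects every "unit=number" pair found in the input, in order of appearance.
    // Note: a digit run longer than int range would make parseInt throw.
    static Map<String, Integer> parseDateParts(String text) {
        Map<String, Integer> parts = new LinkedHashMap<>();
        Matcher matcher = IMAGE_DATE_PATTERN.matcher(text);
        while (matcher.find()) {
            parts.put(matcher.group(1), Integer.parseInt(matcher.group(2)));
        }
        return parts;
    }
}
```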

Comment on lines +52 to +54
/**
* Concatenate two {@link List}s. Does not modify the original lists.
*/

The Javadoc comment is trivial and simply restates what is obvious from the method signature and name. It doesn't provide any additional information or reasoning about the implementation.
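One way to address this comment is to document what the signature does not tell the reader, such as whether the result is a new list and whether it is mutable. A sketch, assuming a generic signature for ListUtil.concat (the real signature is not shown here; ListConcatSketch is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

final class ListConcatSketch {
    /**
     * Returns the elements of {@code first} followed by those of {@code second}.
     *
     * <p>The result is a new, mutable {@link ArrayList}: changing it does not affect
     * the arguments, and later changes to the arguments do not affect it.
     * Null elements are kept; null lists are not accepted.
     */
    static <T> List<T> concat(List<? extends T> first, List<? extends T> second) {
        List<T> result = new ArrayList<>(first.size() + second.size());
        result.addAll(first);
        result.addAll(second);
        return result;
    }
}
```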

@trag-bot (bot) commented Jun 17, 2025

@trag-bot didn't find any issues in the code! ✅✨

@@ -181,6 +181,33 @@ dependencies {

implementation("de.rototor.snuggletex:snuggletex-jeuclid")

// region for document importing

Member

Please rework with the new scheme.

Put the concrete versions in https://github.com/JabRef/jabref/blob/main/versions/build.gradle.kts.


requires javafx.base;
requires javafx.graphics; // because of javafx.scene.paint.Color
requires afterburner.fx;
requires com.tobiasdiez.easybind;

// for java.awt.geom.Rectangle2D required by org.jabref.logic.pdf.TextExtractor
requires java.desktop;

Member

Is org.jabref.logic.pdf.TextExtractor gone? If so, also delete the comment line above it.

@@ -106,16 +106,14 @@
exports org.jabref.logic.git;
exports org.jabref.logic.pseudonymization;
exports org.jabref.logic.citation.repository;

requires java.base;

Member

Oh, really, no java.base any more?
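For context on this question: every module other than java.base implicitly depends on java.base (Java Language Specification, module declarations), so an explicit "requires java.base;" is redundant and dropping it should not change module resolution. The module API makes the implicit dependence visible. JavaBaseSketch and javaBaseModifiers are hypothetical names, and the example assumes java.sql is resolved in the boot layer, as it is in a standard JDK run from the class path:

```java
import java.lang.module.ModuleDescriptor;
import java.util.Optional;
import java.util.Set;

final class JavaBaseSketch {
    // Looks up a module in the boot layer and returns the modifiers of its
    // dependence on java.base, or empty if the module is not in the boot layer.
    static Optional<Set<ModuleDescriptor.Requires.Modifier>> javaBaseModifiers(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName)
                .flatMap(module -> module.getDescriptor().requires().stream()
                        .filter(requires -> requires.name().equals("java.base"))
                        .findFirst())
                .map(ModuleDescriptor.Requires::modifiers);
    }

    public static void main(String[] args) {
        // java.sql does not declare "requires java.base;", yet its descriptor lists it,
        // flagged MANDATED because the dependence is implicit.
        System.out.println(javaBaseModifiers("java.sql"));
    }
}
```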

@@ -252,5 +250,7 @@
requires mslinks;
requires org.antlr.antlr4.runtime;
requires org.libreoffice.uno;
requires org.apache.tika.core;
requires org.jetbrains.annotations;

Member

@@ -49,6 +49,10 @@ public abstract class Importer implements Comparable<Importer> {
* @throws IOException Signals that an I/O exception has occurred.
*/
public boolean isRecognizedFormat(Path filePath) throws IOException {
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {

Member

The latter includes the former, doesn't it?

Suggested change
- if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
+ if (!Files.isRegularFile(filePath)) {
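The reviewer's point can be checked directly: Files.isRegularFile returns false when the path does not exist, and, like Files.exists, it follows symbolic links by default, so the extra exists check adds nothing. A small demonstration (RegularFileCheckSketch and isImportCandidate are hypothetical names; the program creates and then deletes temporary files):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RegularFileCheckSketch {
    // Mirrors the suggested guard: a single isRegularFile check also rejects missing paths.
    static boolean isImportCandidate(Path filePath) {
        return Files.isRegularFile(filePath);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("importer-check");
        Path file = Files.createTempFile(dir, "sample", ".odt");
        Path missing = dir.resolve("does-not-exist.odt");
        try {
            System.out.println(isImportCandidate(file));    // true: regular file
            System.out.println(isImportCandidate(dir));     // false: directory
            System.out.println(isImportCandidate(missing)); // false: does not exist
        } finally {
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
        }
    }
}
```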

import org.jabref.logic.util.FileType;
import org.jabref.logic.util.StandardFileType;

public class DjvuImporter extends TikaImporter {

Member

A reference to the format would be nice.
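As an example of the kind of format detail such a reference could record: as commonly documented (for example in the DjVu v3 specification), a DjVu file starts with the four bytes "AT&T" followed by an IFF "FORM" chunk whose form type is "DJVU" (single page) or "DJVM" (multi-page). The check below is a hypothetical sketch of that layout, not part of the PR or of Tika's API:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a DjVu signature check.
 *
 * <p>Assumed layout: bytes 0-3 "AT&T", bytes 4-7 "FORM" (IFF chunk id),
 * bytes 8-11 chunk length (big-endian), bytes 12-15 form type:
 * "DJVU" for a single-page document, "DJVM" for a multi-page one.
 */
final class DjvuSignatureSketch {
    static boolean looksLikeDjvu(byte[] header) {
        if (header.length < 16) {
            return false; // too short to hold magic, length, and form type
        }
        String magic = new String(header, 0, 8, StandardCharsets.US_ASCII);
        String formType = new String(header, 12, 4, StandardCharsets.US_ASCII);
        return magic.equals("AT&TFORM") && (formType.equals("DJVU") || formType.equals("DJVM"));
    }
}
```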

import org.jabref.logic.util.FileType;
import org.jabref.logic.util.StandardFileType;

public class EpubImporter extends TikaImporter {

Member

A reference to the format would be nice.

(Even if it is self-explanatory here, maybe some non-obvious information can be found there.)
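For EPUB, one non-obvious detail worth recording comes from the EPUB 3.3 specification (https://www.w3.org/TR/epub-33/): the file is a ZIP container whose first entry must be named "mimetype" and contain exactly "application/epub+zip", while the bibliographic metadata lives in the package document as Dublin Core elements such as dc:title and dc:creator. The check below is a hypothetical sketch of the container rule (EpubSignatureSketch and its methods are illustrative names, not part of the PR):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class EpubSignatureSketch {
    static final String EPUB_MEDIA_TYPE = "application/epub+zip";

    // True if the first ZIP entry is "mimetype" holding the EPUB media type.
    // Closes the given stream. Does not verify that the entry is stored uncompressed.
    static boolean looksLikeEpub(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry first = zip.getNextEntry();
            if (first == null || !"mimetype".equals(first.getName())) {
                return false;
            }
            String content = new String(zip.readAllBytes(), StandardCharsets.US_ASCII);
            return EPUB_MEDIA_TYPE.equals(content);
        }
    }

    // Test helper: builds an in-memory ZIP whose first entry has the given name and content.
    static byte[] zipWithFirstEntry(String name, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

ODF packages use the same "mimetype first" convention with their own media types (for example application/vnd.oasis.opendocument.text for ODT), which is why the ODT case below is rejected by an EPUB-only check.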

4 participants