Description
JabRef version
Other (please describe below)
Operating system
GNU / Linux
Details on version and operating system
JabRef 5.16--2024-07-25--771c4cd Linux 6.12.20-2-manjaro amd64 Java 21.0.2 JavaFX 22.0.2+4
Checked with the latest development build (copy version output from About dialog)
- I made a backup of my libraries before testing the latest development version.
- I have tested the latest development version and the problem persists
Steps to reproduce the behaviour
JabRef 5.16
There is a problem with the plain-text citation parser: it replaces citations with completely unrelated ones.
I created a test case; see the two attached files.
How can such completely different citations be matched? Is there a way to parse the strings without GROBID? I think blindly trusting GROBID's result is wrong. Even just verifying the whole title string would already help to detect that something is not right.
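As a rough illustration of that title check, here is a minimal Ruby sketch. It assumes the parser (GROBID or any other) hands back a candidate title for the pasted string. `normalize_for_match` and `title_in_citation?` are hypothetical helpers, not existing JabRef or GROBID API. The candidate is rejected unless its full title, ignoring case and punctuation, appears as whole words in the raw citation text.

```ruby
# Sketch of a plausibility check (hypothetical helper names, not JabRef API):
# only accept a parsed or fetched entry if its full title actually occurs
# in the raw citation text the user pasted.

# Lowercase and replace every run of non-alphanumeric characters with a
# single space, so punctuation and case differences do not matter.
def normalize_for_match(text)
  text.downcase.gsub(/[^[:alnum:]]+/, " ").strip
end

# True if the whole normalized title appears as a run of complete words
# inside the normalized citation string.
def title_in_citation?(raw_citation, candidate_title)
  title = normalize_for_match(candidate_title)
  return false if title.empty?

  citation = normalize_for_match(raw_citation)
  # Pad both sides with spaces so "right" does not match inside "enright".
  " #{citation} ".include?(" #{title} ")
end

raw = "W. H. Enright. Improving the efficiency of matrix operations in the " \
      "numerical solution of stiff ordinary differential equations. " \
      "ACM Trans. Math. Softw., 4(2), 127-136, June 1978."

puts title_in_citation?(raw, "Improving the Efficiency of Matrix Operations in " \
                             "the Numerical Solution of Stiff Ordinary Differential Equations")
# prints true
puts title_in_citation?(raw, "Fast and accurate flow-insensitive points-to analysis")
# prints false
```

Requiring the whole title is deliberately strict; a real check might need some tolerance for OCR noise or abbreviated titles, but even this version would flag a match whose title shares nothing with the input.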
More test cases
At https://github.com/inukshuk/anystyle/blob/main/spec/benchmark.rb, anystyle uses the following tagged references for benchmarking:
```ruby
data = <<-END_REFERENCES
<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> <title> Formalising Dijkstra's development strategy within Stark's formalism. </title> <editor> In C. B. Jones, R. C. Shaw, and T. Denvir, editors, </editor> <container-title> Proc. 5th. BCS-FACS Refinement Workshop, </container-title> <date> 1992. </date>
<author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. </author> <title> Application of hash to data base machine and its architecture. </title> <journal> New Generation Computing, </journal> <volume> 1(1), </volume> <date> 1983. </date>
<author> Alexander Vrchoticky. </author> <title> Modula/R language definition. </title> <tech> Technical Report TU Wien rr-02-92, version 2.0, </tech> <institution> Dept. for Real-Time Systems, Technical University of Vienna, </institution> <date> May 1993. </date>
<author> Marc Shapiro and Susan Horwitz. </author> <title> Fast and accurate flow-insensitive points-to analysis. </title> <container-title> In Proceedings of the 24th Annual ACM Symposium on Principles of Programming Languages, </container-title> <date> January 1997. </date>
<author> W. Landi and B. G. Ryder. </author> <title> Aliasing with and without pointers: A problem taxonomy. </title> <institution> Center for Computer Aids for Industrial Productivity </institution> <tech> Technical Report CAIP-TR-125, </tech> <institution> Rutgers University, </institution> <date> September 1990. </date>
<author> W. H. Enright. </author> <title> Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations. </title> <journal> ACM Trans. Math. Softw., </journal> <volume> 4(2), </volume> <pages> 127-136, </pages> <date> June 1978. </date>
<author> Gmytrasiewicz, P. J., Durfee, E. H., & Wehe, D. K. </author> <date> (1991a). </date> <title> A decision theoretic approach to coordinating multiagent interaction. </title> <container-title> In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, </container-title> <pages> pp. 62-68 </pages> <location> Sydney, Australia. </location>
<author> A. Bookstein and S. T. Klein, </author> <title> Detecting content-bearing words by serial clustering, </title> <container-title> Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, </container-title> <pages> pp. 319327, </pages> <date> 1995. </date>
<author> U. Dayal, H. Garcia-Molina, M. Hsu, B. Kao, and M.- C. Shan. </author> <title> Third generation TP monitors: A database challenge. </title> <container-title> In ACM SIGMOD Conference on Management of Data, </container-title> <pages> pages 393-397, </pages> <location> Washington, D. C., </location> <date> May 1993. </date>
<author> C. Qiao and R. Melhem, </author> <title> "Reducing Communication Latency with Path Multiplexing in Optically Interconnected Multiprocessor Systems", </title> <container-title> Proc. of HPCA-1, </container-title> <date> 1995. </date>
END_REFERENCES
```
We could reuse these references for our regex-based parser tests.
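One possible way to reuse the anystyle data, sketched in Ruby under the assumption that each reference sits on its own tagged line: strip the tags to get the plain input string, and keep the tagged segments as the expected fields. `fixture_from_tagged_line` and `extract_year` are hypothetical names; `extract_year` is only a stand-in for whatever regex parser is actually under test.

```ruby
# Sketch (hypothetical names): reuse anystyle-style tagged references as
# fixtures. Each line holds one reference; every segment is wrapped in a
# tag such as <author>...</author> or <container-title>...</container-title>.

# Captures the tag name and its whitespace-trimmed content; \1 makes sure
# the closing tag matches the opening one.
TAGGED_SEGMENT = %r{<([a-z-]+)>\s*(.*?)\s*</\1>}

# Returns [plain citation string, { "author" => [...], "title" => [...], ... }].
def fixture_from_tagged_line(line)
  segments = line.scan(TAGGED_SEGMENT)            # [[tag, value], ...]
  plain = segments.map(&:last).join(" ")
  fields = segments.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
  [plain, fields]
end

# Stand-in for the parser under test: a simple regex that pulls the first
# four-digit year between 1800 and 2099 out of the plain string.
def extract_year(plain)
  plain[/\b(?:18|19|20)\d{2}\b/]
end

DATA_LINES = <<~END_REFERENCES.lines(chomp: true)
  <author> W. H. Enright. </author> <title> Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations. </title> <journal> ACM Trans. Math. Softw., </journal> <volume> 4(2), </volume> <pages> 127-136, </pages> <date> June 1978. </date>
  <author> C. Qiao and R. Melhem, </author> <title> "Reducing Communication Latency with Path Multiplexing in Optically Interconnected Multiprocessor Systems", </title> <container-title> Proc. of HPCA-1, </container-title> <date> 1995. </date>
END_REFERENCES

DATA_LINES.each do |line|
  plain, expected = fixture_from_tagged_line(line)
  expected_year = expected.fetch("date").first[/\d{4}/]
  status = extract_year(plain) == expected_year ? "ok" : "MISMATCH"
  puts "#{status}: #{expected.fetch('title').first}"
end
```

The same loop could compare titles and authors as well; the tagged values already carry anystyle's trailing punctuation, so a comparison would likely need to trim it first.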