Juxta-CL Text Comparison Tool is Available
Juxta-CL is a command-line text comparison tool based on the online JuxtaCommons tool. Performant Software Solutions created it for eMOP to do ground-truth comparison for testing the accuracy of our OCR processes. It is now available as open source through our GitHub page. Have fun.
Juxta-CL is a command-line text comparison tool based on the online JuxtaCommons tool. Juxta-CL was created by eMOP collaborators Performant Software Solutions to do ground-truth comparison for testing the accuracy of our OCR processes. It compares two separate pages of text and generates a score between 0 and 1 indicating how closely they correlate with each other. Juxta-CL can use one of several different distance algorithms for this purpose:
- Jaro-Winkler
- Levenshtein
- native Juxta compare
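To make the idea of a 0-to-1 score concrete, here is a minimal Python sketch of a Levenshtein-based similarity. It is not Juxta-CL's code: the function names are hypothetical, and both the normalization (dividing the edit distance by the length of the longer text) and the convention that 1 means identical are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical texts, 0.0 when every
    character differs. Juxta-CL may scale its score differently."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Under these assumptions, comparing an OCR page against its ground-truth transcription gives a score close to 1 when only a few characters were misread.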
Juxta-CL also has command-line options to:
- ignore punctuation
- ignore case
- ignore end of line hyphenation
- normalize file encoding to UTF-8
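The effect of the first three options can be approximated with a small preprocessing step applied to both texts before they are compared. The Python sketch below is illustrative only: the function and parameter names are hypothetical and are not Juxta-CL's actual flags, and the exact rules Juxta-CL applies (for example, which characters count as punctuation) may differ.

```python
import re
import string


def normalize(text: str,
              ignore_punctuation: bool = False,
              ignore_case: bool = False,
              ignore_hyphenation: bool = False) -> str:
    """Return a copy of text with the selected differences removed."""
    if ignore_hyphenation:
        # Rejoin a word split across a line break: "compari-\nson" -> "comparison".
        # This runs first, because stripping punctuation would remove the hyphen
        # and leave the line break in the middle of the word.
        text = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    if ignore_punctuation:
        # Assumption: only ASCII punctuation is stripped here.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if ignore_case:
        text = text.lower()
    return text
```

Applying the same normalization to both the OCR output and the ground truth means the score reflects misread characters rather than differences in case, punctuation, or line breaking.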
Juxta-CL is now available as an open-source project from eMOP via our GitHub page under the Apache License, Version 2.0.
Installing Juxta-CL
Juxta-CL is a Java-based tool, so it can be run on any platform that has the Java SE Development Kit (JDK) installed. To install Juxta-CL without building it yourself:
- Download and unzip the Juxta-CL.zip file.
- For Windows users, download or copy the juxta.bat file and put it in your new Juxta-CL folder.
- Type sh juxta.sh (or juxta.bat on Windows) for Juxta-CL help information.