# # spec file for package python-tesserocr # # Copyright (c) 2025 SUSE LLC # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed # upon. The license for this file, and modifications and additions to the # file, is the same license as for the pristine package itself (unless the # license for the pristine package is not an Open Source License, in which # case the license is the MIT License). An "Open Source License" is a # license that conforms to the Open Source Definition (Version 1.9) # published by the Open Source Initiative. # Please submit bugfixes or comments via https://bugs.opensuse.org/ # %{?sle15_python_module_pythons} Name: python-tesserocr Version: 2.8.0 Release: 1.1 Summary: A Python wrapper around tesseract-ocr License: MIT Group: Development/Languages/Python URL: https://github.com/sirfz/tesserocr Source: https://files.pythonhosted.org/packages/source/t/tesserocr/tesserocr-%{version}.tar.gz BuildRequires: %{python_module Cython} BuildRequires: %{python_module Pillow} BuildRequires: %{python_module devel} BuildRequires: %{python_module pip} BuildRequires: %{python_module pytest} BuildRequires: %{python_module setuptools} BuildRequires: %{python_module wheel} BuildRequires: fdupes BuildRequires: gcc-c++ BuildRequires: pkgconfig BuildRequires: python-rpm-macros BuildRequires: tesseract-ocr-traineddata-english BuildRequires: tesseract-ocr-traineddata-orientation_and_script_detection BuildRequires: pkgconfig(libcurl) BuildRequires: pkgconfig(tesseract) Requires: tesseract-ocr-traineddata-english Requires: tesseract-ocr-traineddata-orientation_and_script_detection Recommends: python-Pillow %python_subpackages %description A wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). tesserocr integrates directly with Tesseract's C++ API using Cython which allows for Pythonic source code. It enables real concurrent execution when used with Python's threading module by releasing the GIL while processing an image in tesseract. %prep %setup -q -n tesserocr-%{version} %build %pyproject_wheel %install %pyproject_install %fdupes %{buildroot} %check export TESSDATA_PREFIX=/usr/share/tessdata %python_exec setup.py develop --user # test_LSTM_choices failure: https://github.com/sirfz/tesserocr/issues/214 # https://github.com/sirfz/tesserocr/issues/295 donttest="test_LSTM_choices" donttest+=" or test_detect_os" donttest+=" or test_init" %pytest -k "not ($donttest)" %files %{python_files} %license LICENSE %doc README.rst %{python_sitearch}/tesserocr* %changelog * Mon Mar 10 2025 John Paul Adrian Glaubitz <adrian.glaubitz@suse.com> - Update to 2.8.0 * Build Python 3.13 wheels by @nijel in (#357) * chore(ci): Modernize wheel builds by @nijel in (#362) - Switch build system from setuptools to pyproject.toml * Add python-pip and python-wheel to BuildRequires * Replace %%python_build with %%pyproject_wheel * Replace %%python_install with %%pyproject_install - Use Python 3.11 on SLE-15 by default * Mon Sep 23 2024 Dirk Müller <dmueller@suse.com> - update to 2.7.1: * bugfix: `set_leptonica_log_level` expects int * revert: disable tesseract's logging by default * Sun Apr 28 2024 Mia Herkt <mia@0x0.st> - Update to 2.7.0: * Allow passing configs/variables on initialization gh#sirfz/tesserocr#349 * Stub file for completion gh#sirfz/tesserocr#350 * Expose leptonica's log level setting via set_leptonica_log_level function * Keep tesseract's default debug_file setting * Tue Apr 2 2024 Dirk Müller <dmueller@suse.com> - update to 2.6.3: * Clarified the comments for tessdata path * skip unit test for GetComponentImages if Pillow is missing * Build with C++17 for Tesseract>=5.3.4 * Tue Nov 7 2023 Mia Herkt <mia@0x0.st> - Update to 2.6.2 (no user-facing changes) * Sun Jun 11 2023 Mia Herkt <mia@0x0.st> - Fix build: Add libcurl dependency * Fri Mar 17 2023 Mia Herkt <mia@0x0.st> - Update to 2.6.0 * _pix_to_image now works with binary images gh#sirfz/tesserocr#274 * SetImage with alpha channels support gh#sirfz/tesserocr#280 * Leptonica 1.83.0 support gh#sirfz/tesserocr#306 * Pointsize should be returned even if fontname doesn't exist gh#sirfz/tesserocr#308 * Added Python 3.10, 3.11 setup classifiers - Drop 1441bec703cf68161acce5e85907ddd71c47fdc3.patch * Mon Feb 27 2023 Daniel Garcia <daniel.garcia@suse.com> - Disable current broken tests, test_LSTM_choices, test_detect_os and or test_init, gh#sirfz/tesserocr#295 * Sat Jan 14 2023 Hans-Peter Jansen <hpj@urpla.net> - Apply 1441bec703cf68161acce5e85907ddd71c47fdc3.patch from upstream project in order to build with Leptonica 1.83.0 - Make tests work again * Fri Nov 11 2022 pgajdos@suse.com - silent rpmlint * Fri Nov 11 2022 pgajdos@suse.com - python-six is not required * Wed Jun 23 2021 Mia Herkt <mia@0x0.st> - Update to 2.5.2 * Support new Tesseract 5 API (gh#sirfz/tesserocr#242) * GetBestLSTMSymbolChoices crash fix (gh#sirfz/tesserocr#241) * Fallback to BMP instead of PNG * Create pix from a BMP image bytes (gh#sirfz/tesserocr#156) * Thu Mar 26 2020 pgajdos@suse.com - version update to 2.5.1 * Fix order of linker arguments (#211) * Fix memory leaks in GetComponentImages (#213) * Mon Jan 13 2020 pgajdos@suse.com - disable test_LSTM_choices temporarily https://github.com/sirfz/tesserocr/issues/214 * Tue Nov 26 2019 Martin Herkt <9+suse@cirno.systems> - Update to version 2.5.0 * Support for RowAttributes method in LTRResultIterator * SetImage: use PNG instead of JPEG fallback * Replace STRING::string() by c_str() * Don't use assignment operator for TessBaseAPI * Fri Aug 23 2019 Martin Herkt <9+suse@cirno.systems> - Update to version 2.4.1 * fix pixa_to_list python3 segfault * fix BlockPolygon python3 segfault * Thu Dec 6 2018 Jan Engelhardt <jengelh@inai.de> - Trim bias and filler wording. * Wed Dec 5 2018 Martin Herkt <9+suse@cirno.systems> - Update to version 2.4.0 Tesseract v4 new API methods supported: * GetBestLSTMSymbolChoices * BlanWksBeforeWord * Mon Aug 13 2018 9+suse@cirno.systems - Update to version 2.3.1 * Python 3.7 support release * Thu Aug 2 2018 tchvatal@suse.com - Ensure we require some of the tesseract data so we can do at least some basic ocr operations * Thu Aug 2 2018 tchvatal@suse.com - Drop unused bcond * Tue Jun 26 2018 9+suse@cirno.systems - Run tests - Use %%license macro - Update to version 2.3.0 * Support for Tesseract 4 + New OCR engines LSTM_ONLY and TESSERACT_LSTM_COMBINED + New default tessdata path handling * Fixed compilation against Tesseract v3.05.02 which required c++11 * Fallback to 'eng' as default language when default language returned by the API is empty * Fri Aug 11 2017 9@cirno.systems - Add doc files * Sun Aug 6 2017 9@cirno.systems - Switch to PyPI source URL - Add Pillow (PIL) to Recommends * Wed Jul 26 2017 9@cirno.systems - v2.2.2 * Support timeout in Recognize API methods * Fixed typo in _Enum initialization error message formatting * Display tessdata path in init exception message * Fixed version check in Python 3 when reading the version number from the tesseract executable * Thu Jun 1 2017 9@cirno.systems v2.2.1 * Fixed setup bug that affects gcc versions with no -std=c++11 option support (which should be required by tesseract 4.0+ and not older versions). * Sun May 28 2017 9@cirno.systems v2.2.0 * Improved setup script * Tesseract 4.0 support: - Two new OEM enums: OEM.LSTM_ONLY and OEM.TESSERACT_LSTM_COMBINED (tesseract 4.0+) - Two new API methods: GetTSVText and DetectOrientationScript (tesseract 4.0+) - PyTessBaseApi.__init__ now accepts a new attribute oem (OCR engine mode: OEM.DEFAULT by default). - file_to_text and image_to_text functions now also accept the oem attribute as above. * Fixed segfault on API Init* failure * Fixed segfault when pixa_to_list returns NULL * Documentation fixes and other minor improvments * Sat Apr 1 2017 9@cirno.systems - Initial commit