Authors: Angelov, S. A., Lazarova, M. K.
Title: Using LLM for Image Correction Proposals
Keywords: image processing, multimodal LLMs, printing quality control, vector databases, vision transformers

Abstract: The paper proposes an automated pipeline for printing quality control, leveraging multimodal large language models (LLMs) and vector databases to detect and mitigate defects such as misalignment, graininess, and offsetting in printed images. The pipeline integrates the CLIP-ViT model for feature extraction, ChromaDB for efficient embedding storage and retrieval, and LLaVA for generating actionable recommendations based on statistical metrics, including Structural Similarity Index (SSIM), histogram difference, and Mean Squared Error (MSE), alongside visual inputs. Technical challenges, such as memory constraints on Apple Silicon devices and floating-point image processing, are addressed to ensure scalability. Experimental validation using synthetic 512 × 512 images demonstrates the pipeline's efficacy with recommendations accurately corresponding to induced defects.
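The three statistical metrics named in the abstract (MSE, histogram difference, and SSIM) can be illustrated on synthetic floating-point images. The following is a minimal, dependency-free sketch and not the authors' implementation. The record does not specify how the defects or metrics were computed, so everything here is an assumption made for illustration: the helper names (`make_reference`, `add_grain`, `shift_right`), the gradient test image, the noise level, the 32-bin histogram, and the single-window (global) form of SSIM. The original SSIM formulation averages the index over local windows.

```python
import random
from statistics import fmean

SIZE = 512  # side length of the synthetic images mentioned in the abstract


def make_reference(size=SIZE):
    """Hypothetical reference image: horizontal gradient, floats in [0, 1]."""
    return [[x / (size - 1) for x in range(size)] for _ in range(size)]


def add_grain(img, sigma=0.05, seed=0):
    """Simulate graininess with clipped Gaussian noise (assumed defect model)."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def shift_right(img, pixels=4):
    """Simulate misalignment by shifting columns and replicating the left edge."""
    return [[row[0]] * pixels + row[:-pixels] for row in img]


def flatten(img):
    return [p for row in img for p in row]


def mse(a, b):
    """Mean squared error between two equally sized images."""
    return fmean((x - y) ** 2 for x, y in zip(flatten(a), flatten(b)))


def histogram(img, bins=32):
    """Normalised intensity histogram for pixel values in [0, 1]."""
    counts = [0] * bins
    for p in flatten(img):
        counts[min(bins - 1, int(p * bins))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_difference(a, b, bins=32):
    """Half the L1 distance between normalised histograms, in [0, 1]."""
    ha, hb = histogram(a, bins), histogram(b, bins)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def global_ssim(a, b, data_range=1.0):
    """SSIM over the whole image as one window (a simplification)."""
    fa, fb = flatten(a), flatten(b)
    mu_a, mu_b = fmean(fa), fmean(fb)
    var_a = fmean((x - mu_a) ** 2 for x in fa)
    var_b = fmean((y - mu_b) ** 2 for y in fb)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(fa, fb))
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the SSIM paper
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


if __name__ == "__main__":
    reference = make_reference()
    defects = {"grain": add_grain(reference), "shift": shift_right(reference)}
    for name, printed in defects.items():
        print(name,
              f"mse={mse(reference, printed):.5f}",
              f"hist_diff={histogram_difference(reference, printed):.5f}",
              f"ssim={global_ssim(reference, printed):.5f}")
```

In a pipeline like the one described, values such as these would be passed to the multimodal model together with the images. A production version would more plausibly rely on an array library and a windowed SSIM implementation than on nested Python lists.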

Issue

2025 13th International Scientific Conference on Computer Science, COMSCI 2025 - Proceedings, 2025, Albania, https://doi.org/10.1109/COMSCI67172.2025.11225234

Type: publication in an international forum, publication in a refereed edition, indexed in Scopus