Research Article

A Method Proposal for Automatic Directional Description and Captioning of Images in Turkish Using Deep Learning Methods, with a New Turkish Dataset

Year 2024, Volume: 17 Issue: 1, 48 - 55
https://doi.org/10.54525/bbmd.1454524

Abstract

With image processing in widespread use today, the automatic description and captioning of images is of great importance. This study aims to produce directional descriptions and captions, in Turkish, for the objects in images. To this end, 1,500 images were selected from the Microsoft Common Objects in Context (MS-COCO) dataset. For each selected image, captions were created that include the relative positions of some of the objects detected in the image, yielding a new dataset. Using this dataset, a method was proposed for the automatic directional description and captioning of images, and the six best models obtained for this method were selected. Experimental results show that the proposed method produces successful results for the automatic directional description and captioning of images in Turkish.
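The abstract describes captions built from the relative positions of detected objects. As a rough illustration of that idea only (not the authors' implementation; the function names `center` and `direction` and the box coordinates are hypothetical), a coarse directional relation between two detected bounding boxes can be derived from the dominant axis of the offset between their centers:

```python
# Hypothetical sketch: deriving a coarse directional relation between two
# detected objects from their bounding boxes. Not the paper's actual code.

def center(box):
    # box = (x_min, y_min, x_max, y_max) in pixel coordinates
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def direction(box_a, box_b):
    """Relation of object A with respect to object B, chosen by the
    dominant axis of the offset between the box centers."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "right of" if dx > 0 else "left of"
    return "below" if dy > 0 else "above"  # image y grows downward

# Example: a dog whose box center lies to the left of a person's box center
dog = (10, 40, 60, 90)
person = (120, 20, 170, 95)
print(f"the dog is {direction(dog, person)} the person")  # → left of
```

In a full pipeline, phrases like this would be composed with the detected object labels (in Turkish) to form the positional captions the dataset pairs with each image.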

References

  • Öztemel, E., Yapay Sinir Ağları, Papatya Yayıncılık, İstanbul, 2006.
  • Nabiyev, V., Yapay Zeka (3rd ed.), Seçkin Yayıncılık, Ankara, 2010.
  • Patterson, J. and A. Gibson, Deep Learning: A Practitioner's Approach, O'Reilly Media, Inc., 2017.
  • Unal, M.E., et al., TasvirEt: Görüntülerden otomatik Türkçe açıklama oluşturma için bir denektaşı veri kümesi (TasvirEt: A benchmark dataset for automatic Turkish description generation from images), IEEE Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2016.
  • Antol, S., et al., VQA: Visual question answering, in Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • Lin, X. and D. Parikh, Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • Vinyals, O., et al., Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): p. 652-663, 2016.
  • Agrawal, H., et al., nocaps: Novel object captioning at scale, in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  • Wu, Y., et al., Decoupled novel object captioner, in Proceedings of the 26th ACM International Conference on Multimedia, 2018.
  • Venugopalan, S., et al., Captioning images with diverse objects, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • Kuyu, M., A. Erdem, and E. Erdem, Altsözcük öğeleri ile Türkçe görüntü altyazılama (Image captioning in Turkish with subword units), IEEE, 2018.
  • Yılmaz, B.D., et al., Image captioning in Turkish language, in 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 2019.
  • Lin, T.-Y., et al., Microsoft COCO: Common objects in context, in European Conference on Computer Vision, Springer, 2014.
  • He, K., et al., Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • Ren, S., et al., Faster R-CNN: Towards real-time object detection with region proposal networks, arXiv preprint arXiv:1506.01497, 2015.
  • Hochreiter, S. and J. Schmidhuber, Long short-term memory, Neural Computation, 9(8): p. 1735-1780, 1997.
  • Chung, J., et al., Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
  • Faulder, D. and T.T. Cladouhos, Physics-guided deep learning for prediction of geothermal reservoir performance, 47th Workshop on Geothermal Reservoir Engineering, Stanford University, 2022.
  • The TensorFlow Hub Authors, "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1", accessed 05.05.2022.
  • Pedersen, M.E.H., "https://colab.research.google.com/github/Hvass-Labs/TensorFlowTutorials", accessed 15.12.2021.
  • Mikolov, T., et al., Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013.
  • Mikolov, T., et al., Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 2013.
  • Samet, N., "https://github.com/nerminsamet", accessed 05.05.2022.
  • Wang, H., K. Ren, and J. Song, A closer look at batch size in mini-batch training of deep auto-encoders, in 2017 3rd IEEE International Conference on Computer and Communications (ICCC), IEEE, 2017.
  • Tieleman, T. and G. Hinton, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012.
  • Papineni, K., et al., BLEU: A method for automatic evaluation of machine translation, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
  • Lin, C.-Y. and E. Hovy, Automatic evaluation of summaries using n-gram co-occurrence statistics, in Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003.
  • Lin, C.-Y., ROUGE: A package for automatic evaluation of summaries, in Text Summarization Branches Out, 2004.
  • Banerjee, S. and A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005.
  • Vedantam, R., C. Lawrence Zitnick, and D. Parikh, CIDEr: Consensus-based image description evaluation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • Anderson, P., et al., SPICE: Semantic propositional image caption evaluation, in European Conference on Computer Vision, Springer, 2016.

Derin Öğrenme Yöntemleri Yardımıyla Görüntülerin Otomatik Konumlu Betimlenmesi ve Türkçe Alt Yazı Oluşturulması için Yeni Veri Kümesi ile bir Yöntem Önerisi


Abstract

With the widespread use of image processing today, the automatic description and captioning of images is considered important. This study aims at automatic directional description and caption generation in Turkish. For this purpose, 1,500 images were selected from the MS-COCO dataset. For each selected image, captions were created to include the positions of some of the detected objects relative to one another, and a new dataset was obtained. Using this dataset, a method was proposed for automatic directional description and captioning, and the six best models obtained for this method were selected for automatic caption generation. Experimental results show that the proposed method yields successful results for automatic directional description and caption generation in Turkish.

There are 31 citations in total.

Details

Primary Language Turkish
Subjects Information Modelling, Management and Ontologies
Journal Section Research Articles
Authors

Esin Erguvan Etgin 0000-0002-2607-6076

Erdal Güvenoğlu 0000-0003-1333-5953

Early Pub Date March 18, 2024
Publication Date
Submission Date October 24, 2023
Acceptance Date December 7, 2023
Published in Issue Year 2024 Volume: 17 Issue: 1

Cite

IEEE E. Erguvan Etgin and E. Güvenoğlu, “Derin Öğrenme Yöntemleri Yardımıyla Görüntülerin Otomatik Konumlu Betimlenmesi ve Türkçe Alt Yazı Oluşturulması için Yeni Veri Kümesi ile bir Yöntem Önerisi”, bbmd, vol. 17, no. 1, pp. 48–55, 2024, doi: 10.54525/bbmd.1454524.