Regular Expressions for Web Advertising Detection Based on an Automatic Sliding Algorithm

Programming and Computer Software

Abstract

This paper presents the automation of a Web advertising recognition algorithm based on regular expressions. Regular expressions, optical character recognition (OCR), databases, and test automation have become critical to many software implementations. The tests were carried out in three Web browsers. As a result, the algorithm detected Spanish-language advertisements that distract users' attention and, above all, extract information from them. The main features of the algorithm are that its execution is automatic and versatile, that it does not require access to the source code of the page being analyzed, and that it could in the future become an application running in the background. Relying on OCR gives the algorithm acceptable efficiency in detecting advertising. This identification could make it possible to build different applications that benefit both users and brands, with the aim of improving current online marketing models.
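To make the detection step more concrete, the following is a minimal Python sketch. It assumes that page text has already been obtained by scrolling the browser viewport and running OCR on each capture, for example with a browser-automation tool such as Selenium WebDriver and an OCR engine such as Tesseract. Neither the capture nor the OCR call is shown. The keyword patterns, the match threshold, and the names `scroll_offsets`, `find_ad_matches`, and `is_advertising` are hypothetical illustrations, not the authors' regular expressions. Reading "sliding" as overlapping viewport scrolls is likewise an assumption.

```python
import re
import unicodedata

# Hypothetical patterns for Spanish advertising phrases; NOT the paper's actual
# regular expressions. They are written without accents because the input
# text is accent-stripped by normalize() before matching.
AD_PATTERNS = [
    r"\bofertas?\b",                      # oferta / ofertas
    r"\bdescuentos?\b",                   # descuento / descuentos
    r"\bcompra(?:r)?\s+ahora\b",          # compra ahora / comprar ahora
    r"\bgratis\b",
    r"\bpromocion(?:es)?\b",              # promoción / promociones
    r"\bhasta\s+(?:un\s+)?\d{1,3}\s*%",   # hasta 50 % / hasta un 70%
]
COMPILED = [re.compile(p) for p in AD_PATTERNS]


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace (including OCR line breaks)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return re.sub(r"\s+", " ", no_accents)


def find_ad_matches(ocr_text: str) -> list[str]:
    """Return every pattern match in the OCR text, grouped in pattern order."""
    text = normalize(ocr_text)
    matches: list[str] = []
    for pattern in COMPILED:
        matches.extend(m.group(0) for m in pattern.finditer(text))
    return matches


def is_advertising(ocr_text: str, threshold: int = 2) -> bool:
    """Flag a capture as advertising if it has at least `threshold` matches (hypothetical rule)."""
    return len(find_ad_matches(ocr_text)) >= threshold


def scroll_offsets(page_height: int, viewport_height: int, overlap: int = 100) -> list[int]:
    """Hypothetical 'sliding' step: vertical scroll offsets at which to capture the
    viewport so consecutive captures overlap by `overlap` pixels, and text shorter
    than the overlap that lies on a capture boundary appears whole in one capture."""
    if viewport_height <= overlap:
        raise ValueError("viewport_height must be larger than overlap")
    step = viewport_height - overlap
    last = max(page_height - viewport_height, 0)
    offsets = list(range(0, last, step))
    offsets.append(last)  # always capture the bottom of the page
    return offsets


if __name__ == "__main__":
    captured = "¡OFERTA especial! Descuentos de hasta 50 % en toda la tienda. Compra ahora."
    print(scroll_offsets(2000, 800))   # [0, 700, 1200]
    print(find_ad_matches(captured))   # ['oferta', 'descuentos', 'compra ahora', 'hasta 50 %']
    print(is_advertising(captured))    # True
```

In a full capture loop, `scroll_offsets` would be called once per page, each offset would be captured and passed through OCR, and every resulting text would go to `is_advertising`. Matching on accent-stripped text is a design choice that tolerates OCR output that drops diacritics. The cost is that it conflates words differing only by an accent.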





ACKNOWLEDGMENTS

This work was supported by UNAM-PAPIIT IA105320.

Author information


Correspondence to D. Riaño, R. Piñon, G. Molero-Castillo, E. Bárcenas or A. Velázquez-Mena.


About this article


Cite this article

Riaño, D., Piñon, R., Molero-Castillo, G. et al. Regular Expressions for Web Advertising Detection Based on an Automatic Sliding Algorithm. Program Comput Soft 46, 652–660 (2020). https://doi.org/10.1134/S0361768820080162


  • DOI: https://doi.org/10.1134/S0361768820080162
