5 Web automation techniques for web sources
Most web sources are designed to be used by humans and therefore do not provide interfaces that allow software programs to interact with them.
In recent years there has been great interest in automating interactions with websites through web automation applications, and many researchers have proposed techniques to address this problem.
Most of the proposed techniques have focused on wrappers, which hide the complexity of automating a task on a web source and expose an interface to external applications.
However, ad hoc solutions still predominate in web automation applications. One reason for this is that most proposals have focused on query wrappers, which turn a web source into a special kind of database: queries are issued through a query form, and the results are returned as a set of structured data records.
Although the query wrapper model is often useful, it is not appropriate for applications that make decisions based on the data they obtain, or for processes whose forms are better modeled as insert, update, or delete operations.
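To make the query wrapper model concrete, the following Python sketch shows the kind of interface such a wrapper exposes. It is a minimal illustration under stated assumptions, not the API of any particular system: the class `QueryWrapper`, its `form_fields` and `fetch_results` members, and the bookstore source are hypothetical, and the step that fills the form and extracts records is replaced by an in-memory stub so that the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A result record is modeled as a flat mapping from field name to value.
Record = Dict[str, str]


@dataclass
class QueryWrapper:
    """Hypothetical query wrapper: exposes a web source as a queryable 'table'."""

    # Names of the fields in the source's query form (assumed known in advance).
    form_fields: List[str]
    # Hypothetical callback that fills and submits the form and extracts the
    # structured records from the result pages. A real wrapper would drive a
    # browser or an HTTP client here; in this sketch it is injected.
    fetch_results: Callable[[Dict[str, str]], List[Record]]

    def query(self, **params: str) -> List[Record]:
        # Only fields that exist in the query form can be used as conditions.
        unknown = set(params) - set(self.form_fields)
        if unknown:
            raise ValueError(f"fields not in query form: {sorted(unknown)}")
        return self.fetch_results(dict(params))


# In-memory stand-in for a hypothetical online bookstore.
_CATALOG: List[Record] = [
    {"title": "Dune", "author": "Herbert", "format": "paperback"},
    {"title": "Dune", "author": "Herbert", "format": "ebook"},
    {"title": "Emma", "author": "Austen", "format": "paperback"},
]


def _fake_bookstore_search(params: Dict[str, str]) -> List[Record]:
    # Mimics a search form: return the records matching every submitted field.
    return [r for r in _CATALOG if all(r.get(k) == v for k, v in params.items())]


bookstore = QueryWrapper(form_fields=["title", "author"],
                         fetch_results=_fake_bookstore_search)
```

Here `bookstore.query(author="Herbert")` returns the two matching records, while `bookstore.query(format="ebook")` raises `ValueError` because `format` is not a field of the query form. Note that the interface is read-only: it offers no way to submit a form that changes state on the server, nor to choose the next step based on the records returned.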
Another crucial aspect of web automation applications is the ability to easily generate navigation sequences and later play them back. This problem has been addressed in previous work, but those systems assume a navigation model that is now obsolete.
This obsolete model allows only a very restricted set of user actions (mainly clicking on elements, entering text in form fields, and choosing options in selection menus) and assumes that the effect of most of these actions is simply the loading of a new page in the browser.
With the emergence of Web 2.0, websites increasingly resemble desktop applications: they can respond to a much wider range of user actions (hovering the mouse over an element, dragging and dropping an element, etc.) by executing arbitrary code that manipulates the content of the page. In addition, AJAX technology allows a page to request information from the web server in a presentation-independent format such as XML or JSON and to update only certain parts of the current page based on the response.
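The difference between the two navigation models can be made concrete by representing a recorded navigation sequence as data. The following Python sketch is illustrative only: the `ActionType` and `Effect` enumerations, the `Action` class, the `fits_classic_model` check, and the XPath-style target strings are hypothetical names and examples introduced here. It encodes the classic model as a small set of allowed action types whose effect is either nothing or a new page load, and flags sequences that use Web 2.0 interactions or rely on partial page updates.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class ActionType(Enum):
    CLICK = auto()
    SET_TEXT = auto()
    SELECT_OPTION = auto()
    MOUSE_OVER = auto()
    DRAG_AND_DROP = auto()


class Effect(Enum):
    NONE = auto()        # no navigation effect (e.g. typing into a field)
    PAGE_LOAD = auto()   # the browser loads a new page
    DOM_UPDATE = auto()  # script (e.g. an AJAX callback) changes part of the current page


@dataclass(frozen=True)
class Action:
    kind: ActionType
    target: str    # locator of the element acted upon (hypothetical XPath-style string)
    effect: Effect


# Action types supported by the classic (pre-Web 2.0) navigation model.
CLASSIC_ACTIONS = {ActionType.CLICK, ActionType.SET_TEXT, ActionType.SELECT_OPTION}


def fits_classic_model(sequence: List[Action]) -> bool:
    """True if every action is a classic one and none relies on a partial page update."""
    return all(a.kind in CLASSIC_ACTIONS and a.effect != Effect.DOM_UPDATE
               for a in sequence)


# A traditional search: type a query, then submit the form (new page loaded).
search_form = [
    Action(ActionType.SET_TEXT, "//input[@name='q']", Effect.NONE),
    Action(ActionType.CLICK, "//button[@type='submit']", Effect.PAGE_LOAD),
]

# A Web 2.0 menu: hovering triggers a script that inserts a submenu into the page.
ajax_menu = [
    Action(ActionType.MOUSE_OVER, "//li[@id='products']", Effect.DOM_UPDATE),
    Action(ActionType.CLICK, "//a[text()='Offers']", Effect.PAGE_LOAD),
]
```

With these definitions, `fits_classic_model(search_form)` is `True` and `fits_classic_model(ajax_menu)` is `False`. A player for sequences like `ajax_menu` would presumably need a different waiting condition for `DOM_UPDATE` actions than "wait for the next page to load", which is the case the classic model does not cover.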
As a result, many Web 2.0 websites fall outside the support offered by existing automatic navigation systems. New techniques are therefore needed to build an automatic web navigation system capable of handling the full complexity of Web 2.0 sites.
The objectives of this thesis are twofold. On the one hand, it aims to define a new language for specifying web automation processes, based on the study of a wide range of real-world web automation tasks used by corporations in different business areas.
On the other hand, it addresses the problem of automatically generating and reproducing complex actions on state-of-the-art websites. Finally, to validate the proposed ideas, a functional prototype will be implemented.