The Semantic Web requires information to be formally represented according to reference ontologies, which give Web content semantics that computer systems can process.
There is general agreement that this is achieved through standard markup languages. However, it also requires a sufficient number of such semantic annotations: a certain “critical mass” is needed for them to make sense globally on the Web. This has not yet been achieved, mainly because annotating content manually is complex.
Only when semantic annotations can be generated in sufficient quantity, either automatically or semi-automatically, will it be possible to extend semantics across Web content. From that point on, it will be possible to develop applications that exploit these semantics, that is, semantic applications. This is the problem our research addresses.
Thus, the main contribution of this thesis is a procedure that helps extend ontology population. It provides an active user with semantic annotation of the information he or she manages and has already described in an HTML page, according to the ontology or ontologies that the system has identified as most closely related to its contents. Our work explicitly considers the case of several ontologies: the content to be annotated may refer to different topics or may be interpreted from different points of view, which in this work we call generating different “semantic views”.
A semantic website must also remain compatible with the current Web; that is, the annotation process must not affect how existing search engines work. Consequently, when a website is transformed into a semantic website, it gains semantic functionality that a semantic search engine can exploit, while remaining fully compatible with an ordinary search engine, which treats it as just another website. This thesis also takes this requirement into account: the semantic views are kept separate from the HTML page, accessible but without affecting conventional search engines.
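The abstract does not say how the separate semantic views are attached to a page. One common convention, assumed here purely for illustration, is to advertise each view through a `<link rel="alternate">` element in the page head, which is not rendered as page content. The helper `add_semantic_view_links` and the file names are hypothetical; the sketch only shows that the visible body of the page stays untouched.

```python
import html


def add_semantic_view_links(page: str, views: dict[str, str]) -> str:
    """Return `page` with one <link> per semantic view inserted before </head>.

    `views` maps a (hypothetical) view name, e.g. an ontology label, to the URL
    of the separate OWL document holding that view. The <body> is not modified,
    so an ordinary search engine sees the same visible content as before.
    """
    links = "".join(
        f'<link rel="alternate" type="application/rdf+xml" '
        f'title="{html.escape(name)}" href="{html.escape(url)}">\n'
        for name, url in views.items()
    )
    marker = "</head>"
    idx = page.lower().find(marker)
    if idx == -1:
        # No <head> element: leave the page unchanged rather than guess a position.
        return page
    return page[:idx] + links + page[idx:]


# Usage with a made-up page and view file.
page = "<html><head><title>Dept.</title></head><body><p>Prof. Smith teaches AI.</p></body></html>"
print(add_semantic_view_links(page, {"university": "views/university.owl"}))
```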
We have defined three transformation stages that must be carried out in sequence. The first, which we call identification, associates the web page with the ontology or ontologies closest to its content. This selection of ontologies is essential for the next stage, which we call extraction, in which the text is processed at the morphological and syntactic levels. Finally, the last stage, which we call interpretation, performs the semantic annotation. In our study the annotation is written in OWL DL, because it is the standard language for describing semantics on the Web and supports the inferences of the description logic SROIQ(D) on which it is based.
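The three stages can be pictured as a simple pipeline. The following is a toy sketch under strong assumptions, not the sw2sws implementation: an ontology is reduced to a set of class labels (the `Ontology` class is hypothetical), identification scores vocabulary overlap, extraction replaces real morphological and syntactic analysis with a single regular-expression pattern, and interpretation emits OWL class assertions in functional syntax using an assumed default prefix `:`. Because each selected ontology yields its own list of assertions, the result also illustrates the idea of separate semantic views.

```python
import re
from dataclasses import dataclass


@dataclass
class Ontology:
    """Hypothetical, minimal stand-in for an OWL ontology: a name plus lower-case class labels."""
    name: str
    classes: set[str]


def identify(text: str, ontologies: list[Ontology], min_score: float = 0.0) -> list[Ontology]:
    """Stage 1 (identification): rank ontologies by the share of their labels found in the page."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for onto in ontologies:
        score = len(tokens & onto.classes) / len(onto.classes) if onto.classes else 0.0
        if score > min_score:
            scored.append((score, onto))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [onto for _, onto in scored]


def extract(text: str, onto: Ontology) -> list[tuple[str, str]]:
    """Stage 2 (extraction): crude stand-in for morphological/syntactic analysis.

    Only recognises "<class label> <Capitalised name>", e.g. "Professor Smith";
    the label is matched case-insensitively, the name must start with a capital.
    """
    pairs = []
    for label in sorted(onto.classes):
        pattern = rf"\b(?i:{re.escape(label)})\s+([A-Z][a-z]+)"
        for match in re.finditer(pattern, text):
            pairs.append((label, match.group(1)))
    return pairs


def interpret(pairs: list[tuple[str, str]]) -> list[str]:
    """Stage 3 (interpretation): emit OWL class assertions in functional syntax."""
    return [f"ClassAssertion(:{label.capitalize()} :{name})" for label, name in pairs]


def annotate(text: str, ontologies: list[Ontology]) -> dict[str, list[str]]:
    """Run the stages in order, producing one semantic view per selected ontology."""
    return {onto.name: interpret(extract(text, onto)) for onto in identify(text, ontologies)}


# Usage with made-up ontologies: only "university" shares vocabulary with the text.
university = Ontology("university", {"professor", "course", "student"})
sports = Ontology("sports", {"team", "player", "match"})
print(annotate("Professor Smith teaches the course Logic to student Ana.", [university, sports]))
```

In a real system, extraction would rely on a proper NLP toolkit, and interpretation would check the extracted entities against the ontology's properties and restrictions instead of emitting class assertions only.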
The methodology followed during development has been to simplify the problem without losing the conceptual scope needed to cover the entire proposal, which consists of a sequence of processes developed throughout the thesis. That is, we propose a simplified scenario that recreates the fundamental elements of the current Web in order to define a migration, or transformation, strategy towards the Semantic Web. The conclusions reached are the result of an experimental self-correction process. The proposal of this thesis has been fully implemented and can be verified by any researcher following the instructions in the thesis annex.
To carry out this transformation or migration, we have implemented a prototype tool (sw2sws) that automates the three stages presented above, and we have tested it on real websites. The prototype automates annotation with the ontologies used in the thesis, but it can easily be adapted to support others. In addition, our approach allows user intervention (a semi-automatic process) to complete or improve any phase of the overall process.
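The abstract leaves the form of user intervention open. The following sketch simply assumes that the automatically proposed annotations pass through a review step in which the user can reject some and add others. The function `review` and its `accept` callback are hypothetical names, not part of sw2sws.

```python
from collections.abc import Callable, Iterable


def review(
    proposed: list[str],
    accept: Callable[[str], bool],
    extra: Iterable[str] = (),
) -> list[str]:
    """Semi-automatic step: keep the proposals the user accepts, then append user additions.

    Order is preserved and an added annotation is skipped if it is already kept.
    """
    kept = [axiom for axiom in proposed if accept(axiom)]
    for axiom in extra:
        if axiom not in kept:
            kept.append(axiom)
    return kept
```

In an interactive session, `accept` could be `lambda a: input(f"Keep {a}? [y/n] ").strip().lower() == "y"`; in batch mode, passing `lambda a: True` makes the process fully automatic again.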
The quality of the resulting annotation depends on several factors: the quality of the ontology against which the content is annotated (affinity, precision, standardization, completeness, etc.), the clarity of the content, and the capacity for extraction and analysis, which is largely conditioned by natural language processing (NLP). This thesis does not aim to solve the NLP problem for annotation; however, to test the process, we have built a small NLP module that demonstrates the viability of the approach for active users, that is, users who contribute content but have no experience with Semantic Web techniques.
Having reached the main objective, and in order to show how to exploit this information that now carries semantics and to close the whole sequence of the process, we have seen