Semantic Web Implementation Using Jena

Abstract:

This paper investigates the use of standardized Semantic Web languages such as RDF and OWL. Jena is a Java toolkit for developing Semantic Web applications based on the W3C recommendations for RDF and OWL. It provides an RDF API; ARP, an RDF parser; SPARQL, the W3C RDF query language; an OWL API; and rule-based inference for RDFS and OWL. Jena is open source and grew out of work in the HP Labs Semantic Web Program.

Keywords:

RDF, OWL, SPARQL, W3C, API, ARP, XML, HTML, Ontology, Metadata, Model, Factory, Jena, Java, JDK

Introduction:

Some aspects of W3C's RDF Model and Syntax Specification require careful reading and interpretation to produce a conformant implementation. Issues have arisen around anonymous resources, reification, and RDF graphs. These and other issues are identified and discussed, and an interpretation of each is proposed. Jena, an RDF API in Java based on this interpretation, is described. Jena works on Linux, Windows, and FreeBSD, and requires JRE 1.2 or above. Included with the Jena toolset is an RDF parser, ARP (an acronym for Another RDF Parser), which is also accessible as a standalone product.

A traditional system implies the existence of multiple data sources, many of them open source: relational database servers (e.g., MySQL or PostgreSQL); real-time multimedia streams (audio, video, or animated documents); XML-based documents (on the server, the content can easily be stored in XML documents without the layout, or even within native XML databases such as eXist or Apache Xindice); and plain text files (e.g., used for storing configuration parameters or log information).

The current Web consists mainly of pages (documents that contain markup) with information in the form of natural language text and multimedia (still images, sound, animations, video clips, etc.) intended for humans to read and understand. Computers are principally used to render this hypermedia information, not to reason about it. Information retrieval has become omnipresent, and information no longer needs to be intended for human readers only, but also for machine processing, enabling intelligent information services, personalized Web sites, and semantically empowered search engines. This is the seminal idea of the Semantic Web.

Semantic Web technologies are based on XML (Extensible Markup Language) and are structured in three main layers:

• The metadata layer offers an extensible framework for expressing simple semantic assertions (e.g., vocabularies or taxonomies); this conceptual model can be used to attach metadata (data about data) to each Web resource.

• The schema layer helps to specify simple ontologies that define a hierarchical description of the concepts and properties for a given resource.

• The logical layer introduces ontological languages capable of modeling complex ontologies; at this layer, different reasoning services are expected to become available to applications oriented to the Semantic Web.

The Jena toolkit provides what is needed to develop such applications in Java.

XML Technologies:

Any Semantic Web application is based on XML (Extensible Markup Language), a recommendation of the World Wide Web Consortium for a meta-language that defines markup (annotations) for content publishing, particularly on the World Wide Web. The main objective of the XML meta-language is to provide benefits not available in HTML (HyperText Markup Language), such as arbitrary extension of a document's elements (tags) and their attributes, support for documents with complex structure, and validation of document structure against an optional document-structure grammar called a DTD (Document Type Definition). Instead of a DTD, an object-oriented method for validating XML documents can also be used: an XML Schema.

As a standard recommended by the W3C, XML is considered the data format for information interchange between various Internet and Web applications. Its popularity is primarily due to its flexibility in representing many data types. The use of markup makes XML self-describing, and its extensible nature makes it possible to define new document types for a particular purpose (e.g., user profiles, business rules, multimedia, data flow, etc.). Using XML, the semantics and the structure of the data exchanged by diverse Web business applications are preserved. One of the key advantages is that the data can be organized as in an object-oriented database. As XML is format-independent, it is possible to generate multiple outputs (XHTML, SMIL, WML, or XUL) by transforming XML documents via XSL (Extensible Stylesheet Language) constructs. Like CSS (Cascading Style Sheets), XSL documents separate content from presentation. Since 1998, XML has grown into a large family of standards integrating key technologies from three previously independent domains: documents, databases, and the Internet.

To move towards the Semantic Web, a series of XML-based languages specialized in modeling knowledge was developed, for example RDF (Resource Description Framework) and OWL (Web Ontology Language).
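As a rough illustration of XML as a self-describing interchange format, the sketch below parses a small document with the XML parser bundled in the JDK (javax.xml.parsers). The user-profile document, its element names, and the class name XmlProfileDemo are hypothetical and invented for this example; they are not taken from Jena or from any schema discussed here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlProfileDemo {

    // Hypothetical user-profile document; tag and attribute names are illustrative only.
    static final String PROFILE_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<profile role=\"student\">"
      + "<name>John Smith</name>"
      + "<preference key=\"layout\">compact</preference>"
      + "</profile>";

    // Parses an XML string into a DOM tree using the JDK's built-in parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the text of the first element with the given tag name, or null if absent.
    static String textOfFirst(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(PROFILE_XML);
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName() + " role=" + root.getAttribute("role"));
        System.out.println("name=" + textOfFirst(doc, "name"));
    }
}
```

The markup carries the names of its own fields, so a consumer can pull out a value by name without knowing the layout in advance; RDF/XML builds on the same mechanism.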

Jena Model:

Jena’s API architecture focuses on the RDF model: the set of statements that comprises an RDF document, graph, or instantiation of a vocabulary. A basic RDF/XML document is created by instantiating one of the model classes and adding at least one statement (triple) to it. To view the RDF/XML, read it into a model and then access the individual elements, either through the API or through the query engine.

The ModelMem class creates an RDF model in memory. It extends ModelCom, the class incorporating common model methods used by all models, and implements the key interface, Model. In addition, the DAML class DAMLModelImpl subclasses ModelMem. The ModelRDB class is an implementation of Model used to manipulate an RDF store within a relational database such as MySQL or Oracle. Unlike the memory model, ModelRDB persists the RDF data for later access; the basic difference in functionality between it and ModelMem is that it opens and maintains a connection to a relational database in addition to managing the data.

In-memory versus persistent model storage: an interesting additional aspect of this implementation is the choice of how the data is stored within the relational database:

• As a flat table of statements

• As a hash

• Through stored procedures

Once the data is stored in the model, the next step is querying it.

Jena Query:

We can access data in a stored RDF model directly through specific API function calls, or via RDQL, an RDF query language. Querying data with an SQL-like syntax is a very effective way of pulling data from an RDF model, whether that model is stored in memory or in a relational database. Jena’s RDQL is implemented as an object called Query. Once instantiated, it can be passed to a query engine, and the results are stored in a query result. To access specific returned values, program variables are bound to the result sets using ResultBinding.

Once the data has been retrieved from the RDF/XML, you can iterate through it with any of a number of iterators. Whether you query the data using the Query object or access all RDF/XML elements of a specific class, you can assign the results to an iterator object and iterate through the set, displaying the results or looking for specific values. Each of the several iterator classes within Jena is focused on specific RDF/XML classes, such as NodeIterator for general RDF nodes, ResIterator for resources, and StmtIterator for statements.
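To make the idea of pulling statements out of a model more concrete, here is a minimal plain-Java sketch of triple-pattern matching, where a null argument acts as a wildcard. It deliberately does not use Jena's classes: Triple, match, and sampleModel are hypothetical names introduced for illustration, and the second person in the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TriplePatternDemo {

    static final String VCARD_NS = "http://www.w3.org/2001/vcard-rdf/3.0#";

    /** One statement: subject, predicate and object, all held as plain strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Returns the statements matching the pattern; a null argument matches anything. */
    static List<Triple> match(List<Triple> model, String s, String p, String o) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : model) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                hits.add(t);
            }
        }
        return hits;
    }

    /** Illustrative in-memory "model" held as a flat list of statements. */
    static List<Triple> sampleModel() {
        return Arrays.asList(
            new Triple("http://somewhere/JohnSmith", VCARD_NS + "FN", "John Smith"),
            new Triple("http://somewhere/JohnSmith", VCARD_NS + "Family", "Smith"),
            new Triple("http://somewhere/JaneDoe", VCARD_NS + "FN", "Jane Doe"));
    }

    public static void main(String[] args) {
        // "Select every full name": fix the predicate, leave subject and object open.
        for (Triple t : match(sampleModel(), null, VCARD_NS + "FN", null)) {
            System.out.println(t.subject + " -> " + t.object);
        }
    }
}
```

A query language such as RDQL expresses the same kind of pattern declaratively, with named variables in place of the null wildcards, and an iterator walks the resulting bindings.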

DAML+OIL:

Starting with later versions of Jena, support for DAML+OIL was added to the tool suite. DAML+OIL is a language for describing ontologies, a way of describing constraints and refinements for a given vocabulary that are beyond the sophistication of RDFS. Much of the ontology effort on behalf of the Semantic Web is based on the Web Ontology Language (OWL) at the W3C, which owes much to DAML+OIL. The principal DAML+OIL class within Jena, outside of DAMLModel, is the DAMLOntology class.

RDF implementation:

The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources. What is a resource? That is rather a deep question and the precise definition is still the subject of debate. For our purposes we can think of it as anything we can identify. You are a resource, as is your home page, this tutorial, the number one and the great white whale in Moby Dick. Our examples in this tutorial will be about people. They use an RDF representation of VCARDS. RDF is best thought of in the form of node and arc diagrams. A simple vcard might look like this in RDF:

The resource, John Smith, is shown as an ellipse and is identified by a Uniform Resource Identifier (URI), in this case "http://.../JohnSmith". If you try to access that resource using your browser, you are unlikely to be successful; April the first jokes notwithstanding, you would be rather surprised if your browser were able to deliver John Smith to your desktop. If you are unfamiliar with URIs, think of them simply as rather strange-looking names.

Resources have properties. In these examples we are interested in the sort of properties that would appear on John Smith's business card. Figure 1 shows only one property, John Smith's full name. A property is represented by an arc, labeled with the name of a property. The name of a property is also a URI, but as URIs are rather long and cumbersome, the diagram shows it in XML qname form. The part before the ':' is called a namespace prefix and represents a namespace. The part after the ':' is called a local name and represents a name in that namespace. Properties are usually represented in this qname form when written as RDF/XML, and it is a convenient shorthand for representing them in diagrams and in text. Strictly, however, properties are identified by a URI. The nsprefix:localname form is a shorthand for the URI of the namespace concatenated with the local name. There is no requirement that the URI of a property resolve to anything when accessed by a browser.

Each property has a value. In this case the value is a literal, which for now we can think of as a string of characters. Literals are shown in rectangles.

Jena is a Java API which can be used to create and manipulate RDF graphs like this one. Jena has object classes to represent graphs, resources, properties, and literals. The interfaces representing resources, properties, and literals are called Resource, Property, and Literal respectively. In Jena, a graph is called a model and is represented by the Model interface.
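The qname shorthand described above amounts to a simple string concatenation, which the following plain-Java sketch spells out. The class QNameDemo and its methods are hypothetical helpers written for this illustration and are not part of Jena; the two namespace URIs are the usual ones for the vCard RDF vocabulary and for RDF itself.

```java
import java.util.HashMap;
import java.util.Map;

class QNameDemo {

    /** Prefix table mapping a namespace prefix to its namespace URI. */
    static Map<String, String> defaultPrefixes() {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("vcard", "http://www.w3.org/2001/vcard-rdf/3.0#");
        prefixes.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        return prefixes;
    }

    /** Expands "prefix:localName" into the namespace URI concatenated with the local name. */
    static String expand(Map<String, String> prefixes, String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("not a qname: " + qname);
        }
        String prefix = qname.substring(0, colon);
        String localName = qname.substring(colon + 1);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("unknown prefix: " + prefix);
        }
        return namespace + localName;
    }

    public static void main(String[] args) {
        // prints http://www.w3.org/2001/vcard-rdf/3.0#FN
        System.out.println(expand(defaultPrefixes(), "vcard:FN"));
    }
}
```

The expanded string is what actually identifies the property; the prefix is only a local abbreviation and can differ from one document to another.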

The code to create this graph, or model, is simple:

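    // imports for Jena 2.x; Apache Jena 3 and later use the org.apache.jena prefix instead
    import com.hp.hpl.jena.rdf.model.*;
    import com.hp.hpl.jena.vocabulary.VCARD;

    // some definitions
    static String personURI = "http://somewhere/JohnSmith";
    static String fullName = "John Smith";

    // create an empty Model
    Model model = ModelFactory.createDefaultModel();

    // create the resource
    Resource johnSmith = model.createResource(personURI);

    // add the property
    johnSmith.addProperty(VCARD.FN, fullName);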

It begins with some constant definitions and then creates an empty Model, using the ModelFactory method createDefaultModel() to create a memory-based model. Jena contains other implementations of the Model interface, e.g. one which uses a relational database: these types of Model are also available from ModelFactory.

The John Smith resource is then created and a property added to it. The property is provided by a "constant" class VCARD which holds objects representing all the definitions in the VCARD schema. Jena provides constant classes for other well-known schemas, such as RDF and RDF Schema themselves, Dublin Core, and DAML. The code to create the resource and add the property can be written more compactly in a cascading style:

    Resource johnSmith =
        model.createResource(personURI)
             .addProperty(VCARD.FN, fullName);

In the first example, the property value was a literal. RDF properties can also take other resources as their value. Using a common RDF technique, this example shows how to represent the different parts of John Smith's name:

Here we have added a new property, vcard:N, to represent the structure of John Smith's name. There are several things of interest about this Model. Note that the vcard:N property takes a resource as its value. Note also that the ellipse representing the compound name has no URI. It is known as a blank Node.

The Jena code to construct this example is again very simple.

    // some definitions
    String personURI = "http://somewhere/JohnSmith";
    String givenName = "John";
    String familyName = "Smith";
    String fullName = givenName + " " + familyName;

    // create an empty Model
    Model model = ModelFactory.createDefaultModel();

    // create the resource
    // and add the properties cascading style
    Resource johnSmith
        = model.createResource(personURI)
               .addProperty(VCARD.FN, fullName)
               .addProperty(VCARD.N,
                            model.createResource()
                                 .addProperty(VCARD.Given, givenName)
                                 .addProperty(VCARD.Family, familyName));

Use of Metadata:

Each component of the e-learning system can be described with the help of metadata. The metadata level is the first level of a Semantic Web application. This metadata can be attached to each software component of the e-learning system in order to store several important characteristics (e.g., information regarding uptime, ownership, execution platform, etc.). Also, for each user we can retain information about his or her status. For example, we can store the user role: administrator, database manager, security monitor, regular user (tutor, student, or visitor), etc. The system can also retain personal data (e.g., age, e-mail address, location, etc.) and user-interface preferences (layout, chromatic and interaction preferences, etc.). To associate and store metadata, we use RDF, an XML-based model for processing metadata. The RDF standard provides interoperability between applications that exchange machine-understandable information on the World Wide Web. RDF is intended to capture and express the conceptual structure of information offered by the Web. RDF metadata can also describe client’s

    rdf:resource="http://www.infoiasi.ro/courses/web" />
    rdf:resource="http://www.cs.pub.ro/teach/os" />
    rdf:about="http://students.infoiasi.ro/~stud">
    ...

The namespace prefix t refers to a specific namespace chosen by the author of the RDF expression and defined in an XML namespace declaration.

Using Ontologies:

A higher level of modeling is to create or (re)use ontologies to represent the knowledge within the application. The semantic structure achieved by ontologies differs from the superficial composition and formatting of information (viewed as data) afforded by relational and native XML databases. Ontologies are able to provide an objective specification of domain information by representing a consensual agreement on the concepts, characteristics, and relations that characterize the way knowledge in that domain is expressed. The facilities of RDF (Resource Description Framework) and OWL (Web Ontology Language) are significant in modeling a knowledge-based e-learning system. Using RDF and OWL statements, we can represent, in a standardized way, several kinds of information, such as:

• instructional content divided into different modules for multiple use and re-use,

• abstract pedagogical entities,

• tutor and student profiles,

• domain, pedagogical, and student knowledge bases (facts and rules) used by the system’s inference engines.

Also, the system can use several upper-level ontologies, which can be helpful for modeling concepts, properties, and the relations between these concepts. Such an ontology is designed as a primitive (general) ontology so that individual communities are able to build more specific ontological constructs on top of it. The primitive category at the core of the ontology is an entity. At the next level, three main categories are offered: temporality, actuality, and abstraction. RDF and/or OWL can be used to improve the inter-connectivity between the components of the given application.
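The layered categories described above can be pictured as a small subclass hierarchy. The plain-Java sketch below, which does not use Jena's ontology API, stores child-to-parent links and answers whether one category falls under another by walking up the chain. The class OntologySketch and the extra category "course" are hypothetical and added only to show a community-specific extension; placing it under "abstraction" is an arbitrary choice for the example.

```java
import java.util.HashMap;
import java.util.Map;

class OntologySketch {

    // Maps each category to its direct parent; the root has no entry.
    private final Map<String, String> parentOf = new HashMap<>();

    /** Records that child is a direct subclass of parent. */
    void addSubClass(String child, String parent) {
        parentOf.put(child, parent);
    }

    /**
     * Returns true if child equals ancestor or lies below it in the hierarchy.
     * Assumes the hierarchy contains no cycles.
     */
    boolean isSubClassOf(String child, String ancestor) {
        String current = child;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /** Builds the upper-level categories named in the text plus one hypothetical extension. */
    static OntologySketch upperLevel() {
        OntologySketch ontology = new OntologySketch();
        ontology.addSubClass("temporality", "entity");
        ontology.addSubClass("actuality", "entity");
        ontology.addSubClass("abstraction", "entity");
        ontology.addSubClass("course", "abstraction"); // hypothetical domain-specific category
        return ontology;
    }

    public static void main(String[] args) {
        OntologySketch ontology = upperLevel();
        System.out.println(ontology.isSubClassOf("course", "entity"));      // true
        System.out.println(ontology.isSubClassOf("course", "temporality")); // false
    }
}
```

Following the parent chain is the simplest form of the subclass reasoning that RDFS and OWL inference engines perform over much richer class descriptions.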

Conclusion:

This paper presented a way of using standardized Semantic Web languages to model information with the Jena toolkit. The main focus was on using the RDF and OWL languages to attach metadata and on using ontologies to denote knowledge within such a Web-based application. The XML family of languages was proposed for information exchange and interoperability. XML is considered the best solution for information interchange between the diverse components of the e-learning system and for the semantic representation of data. The Jena API as a whole provides the packages needed to implement Semantic Web applications in Java.

