In the last decade, there has been increasing interest in making government data open and easily accessible to the public. Advances in Semantic Web technologies have given rise to the so-called Web of Data, which has further intensified such efforts. More recently, advances in natural language processing driven by deep learning have provided technologies that enable the effective processing of textual data.
An important kind of government data is the data related to legislation. Legislation applies to every aspect of people's lives and evolves continuously, building a huge network of interlinked legal documents. It is therefore important for a government to offer services that make legislation easily accessible to the public, so as to inform citizens, enable them to defend their rights, or help them use legislation in their work. In this direction, many European Union (EU) countries have already automated the digitization process by developing platforms that archive legislative documents and offer online access to them.
Semantic Web standards, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL), facilitate the separation between content, presentation and metadata, thus helping to make better use of the information contained in these documents. A popular model and vocabulary used to codify the structure and content of legal and ancillary documents is ELI (European Legislation Identifier), an EU initiative to standardize the way legislation is identified and described. ELI serves as an RDF format for data exchange, and more recently a corresponding ontology has emerged (http://publications.europa.eu/mdr/resource/eli/eli.owl), which is expressed in OWL and used as a vocabulary for expressing the metadata of legal documents.
In Greece, the legislative process is still computerized to a very limited degree, so even finding legislation related to a particular subject can be quite difficult. A recent law and its accompanying program, called Di@vgeia (https://diavgeia.gov.gr), has tried to correct this situation by obliging all government bodies to publish their acts and decisions on the Web. Di@vgeia offers basic keyword search over the content and metadata of its legal documents, as well as a service that enables developers to search this collection of documents based on a predefined set of metadata fields. Neither this service nor other government web services, such as the National Printing House site (http://www.et.gr), analyze or codify the legal text in order to exploit the information it contains.
In this project, we follow the successful efforts of other European countries and aim at modernizing the way legislation is offered to the public. In line with the goal of Di@vgeia, we envision a new state of affairs in which people have advanced search capabilities over the content of legislation at their disposal. We also envision legislation being distributed in a form that developers can consume, so that it can be combined with other open data to increase its value for the public. To the best of our knowledge, no other effort in Greece, whether by government institutions or by the public administration, approaches legislation from this perspective.
We perceive legislation as a collection of legal documents with a formal structure. Legal documents are linked to other documents through amendments and references, which reflects rich semantic information and the interdependencies between documents. A modern country needs intelligent services that not only present the text of legal documents but can also answer complex questions such as "Which legal documents did a particular minister sign during his term as Finance Minister?", "Which legal documents have been amended, and by whom?" or "Retrieve the 10 most frequently amended legal documents between 2008 and 2013." To enable such complex questions to be formulated and answered, we have designed and developed a prototype web application, called Nomothesi@, which publishes Greek legislation, as it appears in the Government Gazette, as RDF. In Nomothesi@, Greek legislation is modeled according to an OWL ontology that reuses the ELI ontology and extends it where necessary to capture the specificities of Greek legislation. Nomothesi@ offers advanced presentation and search services over the metadata and content of Greek legislative texts. An important feature of Nomothesi@ is that it offers a SPARQL endpoint and a RESTful service that developers can use to consume its content and combine it with other open data. From this point of view, Nomothesi@ opens up a whole new market for services based on Semantic Web technologies, with immediate social benefits and significant business opportunities.
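As an illustration of the third question above, the Python sketch below assembles a SPARQL query over the ELI vocabulary. The ELI namespace and the properties eli:amends and eli:date_document come from the ELI ontology, but whether Nomothesi@ records amendments with exactly these properties is an assumption, as is reading "amended between 2008 and 2013" as counting amending acts dated in that period. The helper name most_amended_query is hypothetical.

```python
# The ELI namespace is real; that Nomothesi@ uses eli:amends and
# eli:date_document for amendments is an assumption of this sketch.
ELI_NS = "http://data.europa.eu/eli/ontology#"


def most_amended_query(start_year: int, end_year: int, limit: int = 10) -> str:
    """Build a SPARQL query returning the `limit` legal documents amended
    by the largest number of amending acts dated within [start_year, end_year]."""
    return f"""PREFIX eli: <{ELI_NS}>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?doc (COUNT(DISTINCT ?amending) AS ?amendments)
WHERE {{
  ?amending eli:amends ?doc ;
            eli:date_document ?date .
  FILTER (?date >= "{start_year}-01-01"^^xsd:date &&
          ?date <= "{end_year}-12-31"^^xsd:date)
}}
GROUP BY ?doc
ORDER BY DESC(?amendments)
LIMIT {limit}
"""
```

Calling most_amended_query(2008, 2013) yields a query that could be submitted to a SPARQL endpoint; changing the period only changes the two date literals.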
Nomothesi@ provides open access to a collection of over 12,000 legislative projects, interconnected with approximately 124,400 links and over 8,000 unique entities, such as geopolitical entities, persons, organizations, and geographical points of reference.
Greek legislation is published through different types of documents, depending on the government body that issues each of them under a specific legislative procedure. Each document has a standardized structure that follows the prescribed encoding, and its content may be reformed by subsequent modifications.
There are five primary sources of Greek legislation that we consider in this work: the constitution, presidential decrees, laws, acts of the ministerial cabinet, and ministerial decisions. These sources of legislation materialize as legal documents, which are encoded following specific standards. Legislation is an event-driven process: legal documents are published in the Government Gazette, may later have their content modified by subsequent legal documents, and eventually cease to be in force. Throughout this process, we need to capture the structural information of legislation and the evolution of its content over time, as determined by the legislative modifications applied to the primary legal document.
Nowadays, the encoding of Greek legislation follows the rules set out in the "Manual Directives for encoding of legislation", which were designed by the Central Committee of Encoding Standards and legislated in Law 3133/2003. The encoding of a legal document is organized as a tree hierarchy built around the concept of fragments, namely articles, paragraphs, cases, and passages, which are described below. Articles are the basic divisions of the text of a legal document and are numbered using Arabic numerals (1, 2, 3, …) or, when a new article is inserted into an existing legal document, by combining an Arabic numeral with upper-case Greek letters (A, B, Γ, ...). An article may contain a list of paragraphs, numbered using Arabic numerals; if an article has a single paragraph, the numbering of that paragraph is omitted. Paragraphs may contain a list of cases. Cases are numbered using lower-case Greek letters (α, β, γ, ...) and may have sub-cases, numbered using double lower-case Greek letters (αα, ββ, ...). A sentence between two full stops is termed a passage. Passages are the elementary fragments of legal documents and are written contiguously, i.e., without line breaks between them; they are the building blocks of cases and paragraphs. Finally, depending on their size, legal documents may be subdivided into larger units, such as books, chapters, or sections, which are numbered using upper-case Greek letters. The larger units and the articles may have a title, which must be general yet concise so as to reflect their content, and which is used in the systematic classification of the substance of legal documents. In addition to these structural elements, legal documents are accompanied by metadata.
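The fragment hierarchy maps naturally onto a tree. The following sketch is a hypothetical in-memory model (the Fragment class and its fields are illustrative, not the platform's own), shown only to make the nesting of articles, paragraphs, cases and passages concrete; passages are the only nodes that carry text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fragment:
    """One node of a hypothetical tree model of a legal document.

    kind: e.g. 'chapter', 'article', 'paragraph', 'case', 'subcase', 'passage'.
    label: printed numbering such as '5', '5A', 'β' or 'αα'; None for the
    unnumbered single paragraph of an article.
    """
    kind: str
    label: Optional[str] = None
    text: str = ""  # only passages carry text in this model
    children: List["Fragment"] = field(default_factory=list)

    def add(self, child: "Fragment") -> "Fragment":
        """Append a child fragment and return it, to allow chained building."""
        self.children.append(child)
        return child

    def passages(self) -> List[str]:
        """Collect passage texts in document order (depth-first traversal)."""
        if self.kind == "passage":
            return [self.text]
        out: List[str] = []
        for child in self.children:
            out.extend(child.passages())
        return out


# Article 5, paragraph 1, containing cases α and β with one passage each.
article = Fragment("article", "5")
par = article.add(Fragment("paragraph", "1"))
par.add(Fragment("case", "α")).add(Fragment("passage", text="First passage."))
par.add(Fragment("case", "β")).add(Fragment("passage", text="Second passage."))
# article.passages() returns ['First passage.', 'Second passage.']
```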
This includes the title of the legal document, which must be general enough yet concise so as to reflect its content, the type (e.g., law, presidential decree), the year of publication, and the number (i.e., the serial number counted from the beginning of the year for each type). The last three of these (type, year and number) together serve as a unique identifier of the legal document. Equally important are the issue and the sheet number of the Government Gazette in which the legal document is published.
When a reference to other legislation is necessary, it should be made uniformly throughout the text. Specifically, for accuracy and readability, each reference must state the number of the legal document and its year of publication. At the first reference to a legal document, the issue and sheet number of the Government Gazette in which it was published must be stated in brackets. Where the reference concerns a specific fragment of the document, that fragment must also be stated.
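A citation that follows this convention can be recognized with a regular expression. The sketch below assumes a simplified English rendering, "<type> <number>/<year>" optionally followed by the Gazette issue and sheet in brackets; the pattern, the type codes it accepts and the sample sentence are illustrative assumptions, since real Greek citations use Greek abbreviations and more variants.

```python
import re
from typing import List, NamedTuple, Optional


class Reference(NamedTuple):
    doc_type: str
    number: int
    year: int
    gazette: Optional[str]  # e.g. "A 56"; stated only at the first occurrence


# Hypothetical pattern: "<type> <number>/<year>" plus optional "(<issue> <sheet>)".
REF_PATTERN = re.compile(
    r"\b(?P<type>Law|PD)\s+(?P<number>\d+)/(?P<year>\d{4})"
    r"(?:\s*\((?P<gazette>[A-Z]\s*\d+)\))?"
)


def extract_references(text: str) -> List[Reference]:
    """Return every citation found in `text`, in order of appearance."""
    return [
        Reference(m["type"], int(m["number"]), int(m["year"]), m["gazette"])
        for m in REF_PATTERN.finditer(text)
    ]


# Made-up sample sentence, not taken from a real legal document.
sample = "Article 2 of Law 1234/2010 (A 56) is replaced. Law 1234/2010 is also amended by PD 15/2011."
```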
Amending a legal document through subsequent legal documents is common international practice. Unfortunately, the encoding rules for legal documents prescribe no standard methodology for codifying this legislative concept, which makes processing amendments very challenging from our perspective. Through systematic observation, we concluded that there are three main types of legislative modification: 1) the substitution of a specific fragment by another one introduced by a subsequent legal document, 2) the insertion of a new fragment, and 3) the deletion of a specific fragment. All these kinds of modifications produce new versions of the original legal document. At any point in time, the state of a legal document corresponds to the original document reformed by all subsequent modifications applied to it up to that point.
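The definition of a document's state at a given time can be made concrete with a small sketch. It assumes a hypothetical flat model in which a document is a mapping from fragment identifiers to text and each modification carries the date from which it applies; the real platform keeps versions in RDF, and fragment ordering is ignored here for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Modification:
    """One amendment introduced by a later legal document (hypothetical model)."""
    kind: str                # "substitute", "insert" or "delete"
    fragment: str            # fragment identifier, e.g. "art2/par1"
    new_text: Optional[str]  # None for deletions
    applies_from: date       # date from which the modification takes effect


def version_at(original: Dict[str, str], mods: List[Modification], when: date) -> Dict[str, str]:
    """Return the document state at `when`: the original text reformed by every
    modification taking effect on or before that date, in chronological order."""
    state = dict(original)  # never mutate the original version
    for mod in sorted(mods, key=lambda m: m.applies_from):
        if mod.applies_from > when:
            break  # this and all later modifications are not yet in effect
        if mod.kind == "substitute":
            if mod.fragment not in state:
                raise KeyError(f"cannot substitute missing fragment {mod.fragment}")
            state[mod.fragment] = mod.new_text
        elif mod.kind == "insert":
            if mod.fragment in state:
                raise KeyError(f"fragment {mod.fragment} already exists")
            state[mod.fragment] = mod.new_text
        elif mod.kind == "delete":
            state.pop(mod.fragment, None)
        else:
            raise ValueError(f"unknown modification kind: {mod.kind}")
    return state
```

The enacted version is simply the original mapping, the current version is version_at(..., date.today()), and any historical version is obtained by passing an earlier date.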
Uniform Resource Identifiers (URIs) are short strings that identify resources on the Web: documents, images, downloadable files, services, electronic mailboxes, and other resources. Such identification enables interaction with representations of a resource over a network, typically the World Wide Web, using specific protocols. Besides using HTTP requests appropriately, resource naming is arguably the most debated and most important concept to grasp when creating an understandable, easily leveraged Web service API. When resources are named well, an API is intuitive and easy to use; done poorly, the same API can feel clumsy and be difficult to use and understand. Essentially, a RESTful API ends up being simply a collection of URIs. In our platform, each resource has its own address, or URI: every interesting piece of information the platform can provide is exposed as a resource. In other words, the RESTful principle of addressability is covered by the URIs. We have chosen that each resource in a service suite has at least one URI identifying it, and it is to our benefit when that URI makes sense and adequately describes the resource. URIs should follow a predictable, hierarchical structure to enhance understandability and, therefore, usability: predictable in the sense that they are consistent, and hierarchical in the sense that the data has structure, i.e., relationships.
Persistent URIs for the divisions of legislation are as important as they are for legal documents in general. Various initiatives aim to bring reliable identification of legislation into existing bibliographic schemes. Their aim is to facilitate the creation of URIs for legal sources, regardless of whether a document is available on the Web, where it is located, and how it is accessed. Based on international practice and the particularities of Greek legislation, we propose a URI scheme that is very similar to the one used in the UK. In our platform, every single legal document, its subparts, its versions, and its services are resources that need to be addressed through a dedicated URI scheme. The scheme must be persistent and built so that the URI of any kind of resource is highly guessable. There are a number of ways to assign an unequivocal identifier to a legislative document; we have decided to use HTTP URIs. These URIs follow our own design guidelines, but we hope that our work will establish a new way of describing Greek legislation on the Web.
We now present the URI scheme that one encounters when using our REST services.
Any field within curly brackets must be filled in with a specific value. The type of legislation can be any of the types of Greek legislation, expressed using the codes of our platform (e.g., con, law, pd). The year is the year of publication (e.g., 2012), and the ID is the number of the specific legal document. For example, to address Law 12 of 2014, the corresponding URI is:
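Filling the template and parsing it back can be sketched in a few lines; this round trip is what makes such URIs guessable. The sketch uses a placeholder host (legislation.example.org) rather than the platform's actual address, assumes the fields appear in the order type/year/ID, and the function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Placeholder host: the platform's real base address is not assumed here.
BASE = "http://legislation.example.org"


def legal_document_uri(doc_type: str, year: int, number: int, base: str = BASE) -> str:
    """Fill the {type}, {year} and {ID} fields of the (assumed) URI template."""
    return f"{base}/{doc_type}/{year}/{number}"


def parse_legal_document_uri(uri: str) -> Tuple[str, int, int]:
    """Recover (type, year, number) from a URI built by legal_document_uri."""
    parts = urlsplit(uri).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"not a legal-document URI: {uri}")
    doc_type, year, number = parts
    return doc_type, int(year), int(number)
```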
A further goal of this project, and at the same time part of its vision, is for the platform to be regarded as a RESTful (Representational State Transfer) Web platform that provides key building blocks for other, possibly more specialized, applications. We have developed, and hope to continue developing, services that allow our API users to retrieve Greek legislation in many formats (PDF, RDF, XML and JSON) by sending HTTP GET requests to specific URIs (Uniform Resource Identifiers).
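A client retrieves a document by issuing an HTTP GET to its URI. The sketch below assumes that the desired format is selected through the standard Accept header (HTTP content negotiation); the text does not state whether Nomothesi@ uses this mechanism or format-specific URIs, so treat the media-type mapping and the function names as assumptions.

```python
import urllib.request

# Standard media types for the formats listed above. Selecting the format
# through the Accept header is an assumption about the service.
MEDIA_TYPES = {
    "pdf": "application/pdf",
    "rdf": "application/rdf+xml",
    "xml": "application/xml",
    "json": "application/json",
}


def build_get(uri: str, fmt: str) -> urllib.request.Request:
    """Prepare (without sending) a GET request for `uri` in format `fmt`."""
    return urllib.request.Request(uri, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")


def fetch(uri: str, fmt: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_get(uri, fmt), timeout=timeout) as resp:
        return resp.read()
```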
As stated above, Nomothesi@ exposes a RESTful API. One of its goals is to serve a series of Web services over Greek legislation that encourage further, more specialized projects. We describe some of these services to underline the benefits of working with a single RDF data model through Sesame Server, hoping to inspire other projects to adopt this method and thus shape a simpler Semantic Web. Through our API, one can request any legal document in PDF, XML, RDF, HTML and JSON format, in its enacted version, its updated version, or a specific version as of a given date. From an engineering standpoint, the RDF data model was decisive. It eliminated unnecessary calls to the database, and RDF annotations on the text allow us to extract the parts of a legal document's text that each request needs. In this way, we not only keep request complexity low (a single query fetches everything, and we then decide what to use), but also provide an extensible model that can easily accommodate more languages in the future.
SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL supports querying required and optional graph patterns along with their conjunctions and disjunctions, as well as extensible value testing and constraining queries by source RDF graph. For the first time in Greek legislation and, to our knowledge, worldwide, our project serves a SPARQL endpoint in a RESTful manner. Users can thus benefit from legal documents not only by reading them but also by querying all of their semantic information. This opens a new path in the way legislation is served and represented by organizations and official institutions, and we hope that many others will follow until government archives of all types are unified. The main query operations supported by the system are:
• Obtaining the legislative document valid at a given date or within a time interval.
• Historical evolution of a legislative document.
• Full-text search in the text of the legislative document.
• Laws repealed by a law.
• Obtaining all kinds of metadata (the model allows new metadata to be easily included).
• Description queries on the RDF graph.
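Queries such as these can be sent from any HTTP client. The sketch below follows the W3C SPARQL 1.1 Protocol (the query is passed in the `query` parameter of a GET request) and the SPARQL 1.1 JSON results format; the endpoint URL is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def sparql_get_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via GET': URL-encode the query text."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Variables left unbound in a row are simply absent from its dict."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

Sending the prepared request with urllib.request.urlopen and passing the decoded body to bindings yields plain rows that are easy to combine with other open data.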
In conclusion, our architecture is type-free, which is key to REST services. We have a single, unified request-response architecture in which the handling of each media type differs only in the final method that constructs the response. In our experience, this shows that RDF data models and Spring MVC are very well suited to a RESTful API such as Nomothesi@. We avoided writing long and messy methods for each media type and ended up with an extensible, simple and intelligent sequence of request-building methods.
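The idea of one shared request pipeline with a format-specific final step can be sketched in a language-neutral way. The platform itself is built on Spring MVC; the Python below is not its code but a minimal illustration, assuming a fetch function that returns the triples for a URI in a single query. In this design, adding a media type means registering one more serializer.

```python
import json
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_json(triples: List[Triple]) -> str:
    return json.dumps([{"s": s, "p": p, "o": o} for s, p, o in triples])


def to_ntriples(triples: List[Triple]) -> str:
    # Simplification: every term is written as an IRI (no literals).
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


# Hypothetical registry: the only part of the pipeline that varies by type.
SERIALIZERS: Dict[str, Callable[[List[Triple]], str]] = {
    "application/json": to_json,
    "application/n-triples": to_ntriples,
}


def respond(fetch: Callable[[str], List[Triple]], uri: str, media_type: str) -> str:
    """Shared pipeline: one fetch for every media type; only the last step differs."""
    try:
        serialize = SERIALIZERS[media_type]
    except KeyError:
        raise ValueError(f"unsupported media type: {media_type}") from None
    triples = fetch(uri)  # a single query, independent of the requested format
    return serialize(triples)
```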
The following dates appear in legal documents:
The date on which the document was signed by the Greek government. This date is extracted directly from the Official Gazette through text processing and analysis, and it precedes all of the following dates.
The date on which the document was published in the Official Gazette. This date is also extracted directly from the text and usually follows the signature date by a few days, although the two may coincide.
The date on which the provisions begin to apply. It is also extracted through text analysis and marks the entry into force of the provisions of the law.
The date on which the law and its provisions are amended by another law. Amending legal documents is common international practice, and this date indicates that the document has been amended.
The date as of which the current, updated state of the law is given. This corresponds to the original legal document with all subsequent amendments applied to it up to the present day.
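These dates can be held in a simple record. The sketch below is a hypothetical structure (the class and field names are illustrative) that encodes the one ordering constraint stated above, namely that publication never precedes signature, and derives the most recent amendment date; in this sketch the current state of the law is not stored but would be computed from the amendments.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DocumentDates:
    signed: date
    published: date
    in_force: Optional[date] = None  # entry into force, if stated in the text
    amended: List[date] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Publication follows signature, possibly on the same day.
        if self.published < self.signed:
            raise ValueError("publication date precedes signature date")

    def last_amended(self) -> Optional[date]:
        """Most recent amendment date, or None if never amended."""
        return max(self.amended) if self.amended else None
```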
The basic subsystems of the platform consist of:
The ontology of the Nomothesi@ platform (Figure 1), which adopts the ELI framework (http://www.eli.fr/en/), the model for codifying legislation in the EU Member States.
The subsystem that processes Issue A (the first series) of the Greek Government Gazette (FEK) from 1990 to 2018, providing a collection of more than 12,000 structured legislative acts.
The subsystem used to analyze the legal text, extract legislative amendments and provide an up-to-date codification of Greek legislation.
The subsystem that recognizes references to entities of several kinds, such as geopolitical entities, persons, geographical reference points and legislation, and links these references to open public datasets (Greek administrative units and Greek politicians) in order to enrich the knowledge base of Nomothesi@ and increase users' search capabilities.
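The text does not describe how entities are recognized. As one simple possibility, a dictionary (gazetteer) lookup can link known names to URIs of open datasets; the sketch below assumes such a gazetteer, with placeholder URIs, and prefers the longest matching name. It is not presented as the subsystem's actual method, which may rely on statistical recognition.

```python
import re
from typing import Dict, List, Tuple


def link_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (surface form, URI) pairs for gazetteer names found in `text`.

    Names are tried longest first, so a longer name wins over a shorter one
    starting at the same position; matches do not overlap."""
    if not gazetteer:
        return []
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]
```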
The subsystem that recognizes the links between European Directives and the Greek legal documents that incorporate them into Greek legislation.
The datastore that holds more than 4.5 million triples forming the knowledge base. The Nomothesia Datastore is the backbone of the platform.
The RESTful API that provides REST services to both human and machine clients. Through our API, one can request any legal document in PDF, XML, RDF, HTML and JSON format, in its original version, its current version, or a specific version as of a given date.
The endpoint that allows SPARQL queries to be run. Our project innovates by offering a SPARQL endpoint in a RESTful way, so that users can benefit from legal documents not only by reading them but also by querying all of their semantic information.
The subsystem that will classify laws and their articles according to the hierarchy of thematic units defined by the Code of Legislation (Raptarchis). (UNDER DEVELOPMENT)