Establishing of a Slovenian open access infrastructure: a technical point of view

Purpose – The purpose of this paper is to present a technical perspective when implementing the Slovenian open access infrastructure that consists of four institutional repositories (IRs) and a national portal (NP) that aggregates content from the repositories in order to provide a common search engine, recommendations of similar documents, and similar text detection. Design/methodology/approach – During the project, the necessary legal background and processes for mandatory submissions of final study works, research publications and research data were established, as well as processes for data exchange between the IRs and the NP, and processes for similar text detection. Findings – The consortium consisted of four Slovenian universities that significantly differ in size, organisation, and workflows. It was anticipated that exactly the same legal background and software would be used for the four repositories. It turned out that complete unification was impossible due to the differences. Practical implica...


Introduction
The possibilities for publishing research results have greatly improved since the development of the Internet. Search engines can provide links to publications within the research areas in less than a second. In the early 1980s, research organisations around the world realised that they could improve their recognition by publishing their works on the Internet. Initially the majority of institutions published the final works of studies, research articles and technical reports on their websites or FTP servers, but later found that an institutional repository would be required to enable searching, browsing, and archiving (Kim, 2011). As the publications would be accessible to a wider array of stakeholders, science would evolve faster and knowledge would no longer be a monopoly of the wealthy (Suber, 2012). In addition, students and mentors would be aware of open access to their final works of studies, and thus strive to improve their work. The open availability of final study works in digital format would also allow the detection of similar content.
A national network of repositories increases the visibility and impact of national research activities and provides additional services for visitors such as federated searches across all involved repositories and the archiving of publications. A more detailed presentation of this is given in »Recommendations for Implementation of Open Access in Denmark« [1] and in the recommendations of the European Commission for the EU Member States on establishing national open access policies and infrastructures [2]. In Europe, a few such national infrastructures are already established. These are the Dutch NARCIS [3], the Irish RIAN [4], the Norwegian NORA [5], the German OA-Netzwerk [6], the Greek openarchives.gr [7], the Italian PLEIADI [8], the Polish »Poland Digital Libraries Federation« [9], the Portuguese RCAPP [10], the Spanish RECOLECTA [11] and the Swedish DiVA [12]. These portals aggregate metadata and allow searches across all repositories included within their own infrastructures. In 2008, the formation of such an infrastructure took place in Hungary (Karacsony, 2013). At the time of writing this article, Hungary had not yet established a national aggregator and common search engine.
The Slovenian open access infrastructure, presented in this paper, consists of institutional repositories (IR) for each Slovenian university and a national portal (NP) that aggregates content from the repositories. This open access infrastructure has similar functionalities as the previously cited national open access infrastructures. A national portal aggregates metadata and full texts of research publications, final study works and research data from institutional repositories. It enables federated search, plagiarism detection, and recommendation of similar documents among institutional repositories. Institutional repositories enable the submissions of final works of study, research publications and research data, Web 2.0 functionalities (social bookmarking, RSS, JSON, JavaScript API, and the ranking of and commenting on publications), archiving, searching, taxonomy browsing, OAI-PMH and statistics (about archived publications, their authors and usage of metadata and content). This infrastructure does not yet support the enhanced publications, which are supported in [3], however this is planned for the future. It is also planned to offer the use of metadata for citation purposes in different citation styles (APA, Harvard, IEEE, etc.) and their exporting to different reference management tools (Zotero, Endnote, RefMan, RefWorks), as can be seen in [3] and [7]. Additionally, each item, deposited within one of the four repositories that currently comprise the Slovenian national open access infrastructure, is checked against similar content (plagiarism detection) within the existing corpus of texts. Amongst the previously cited national open access infrastructures, only the German OA-Netzwerk [6] supports the detection of similar documents. The Slovenian national infrastructure is the only one that has a recommendation system that provides recommendations across all institutional repositories. 
These recommendations consist of titles from within the same IR and titles from other IRs and participating digital archives (Digital Library of Slovenia [13], VideoLectures.NET [14], and DKMORS [15]). The authors believe that the Slovenian open access infrastructure has some distinct advantages: all four institutional repositories use the same custom-built software, which is integrated with the information and authentication systems of the universities, with the COBISS.SI national bibliographic system [16], the SICRIS national current research information system [17] and the national portal Open Science Slovenia. The national portal and the four repositories are also available from mobile applications for Android, Windows Phone and iOS devices. These features are not available in any of the other previously cited national infrastructures.
Prior to the implementation of this project, there were already some open access repositories and digital archives in Slovenia that contained the final study works and research publications. These contents were, in most cases, inserted by librarians and not by the authors. Most institutions did not define adequate processes and legal background that would enable authors to deposit their publications within the repositories. An exception was the Digital Library of the University of Maribor, where such a process had already been implemented, thus enabling students to submit their final study works to the digital library. As it was impossible to simultaneously search the contents from different repositories and digital archives and with an absence of processes and policies for the submission of content, the consortium of Slovenian universities has decided to sign up to a call from the Ministry of Education, Science and Sport and has received funding for the establishment of a national infrastructure of open access (from the Ministry and the European Regional Development Fund). This project started in February 2013. During its establishment valuable experiences were learned as described in the articles of numerous authors (Bevan, 2005;Barwick, 2007;Doctor and Ramachandran, 2008;Jantz and Wilson, 2008;Tripathi and Jeevan, 2011;Paul, 2012).
The national open access infrastructure is the underpinning element for the introduction of open access policies in any country (for EU Member States see the Commission Recommendation on access to and preservation of scientific information [2]). No open access policies were implemented in Slovenia prior to the establishment of the national open access infrastructure. The Slovenian Research Agency requires that final reports of co-financed projects are deposited into the Digital Library of Slovenia [13], as well as copies of co-financed scientific and professional journals. In 2013 Slovenia named a National Point of Reference for Scientific Information, who is responsible for coordinating the implementation of measures from the Commission Recommendation [2]. The European Commission expects that by 2014 the policies for open access to scientific articles and data will have been established in all Member States at all relevant levels [18]. Anecdotal evidence tells the authors that the discussions have already started in Slovenia about the formation of open access policies regarding research funding institutions and research performing institutions.
The aim of the infrastructure presented in this article is to enable open access to the intellectual production of Slovenian universities for interested stakeholders in Slovenia and worldwide. This article focuses on a description of the Slovenian open access infrastructure and functionalities offered by the software in use. The second section of the paper presents the state of open access in Slovenia before the establishment of the national infrastructure. The architecture of the Slovenian infrastructure of open access is covered in the third section. The organisational structure and processes for the mandatory depositing of publications within institutional repositories are described in the fourth section. The document similarity and plagiarism detection approach is presented in the fifth section. The sixth section describes the recommendation system, used in the repositories for providing content-based recommendations. The seventh section covers some conclusions, and guidance for further work.

Overview of the Slovenian open access repositories and document archives before the establishment of a national open access infrastructure
Prior to setting up the national infrastructure for open access, the sources that could be incorporated within the infrastructure were examined. Different Slovenian organisations were issuing over 40 openly accessible journals recorded in the Directory of Open Access Journals [19]. A few organisations were publishing openly accessible monographs (e.g. Educational Research Digital Library, which contained a few dozen works).
In addition to metadata of full-text publications from outside the repositories, RUL also contains full-text publications and metadata from ePrints.FRI, PeFprints, DRUGG and Social Science Data Archives (SSDA) [21]. Metadata is gathered via OAI-PMH servers. The national portal Open Science Slovenia [22] aggregates content and metadata from all institutional repositories and partially from dLib.si [13], VideoLectures.NET [14] and DKMORS [15]. Additionally, SSDA, ePrints.FRI, PeFprints, DRUGG, dLib.si, DKMORS and VideoLectures.NET all have defined publication submission processes. Metadata can be gathered from these sources in XML and JSON formats. The Open Science Slovenia portal only takes metadata and full texts of publications for the purposes of federated search, recommendations regarding publications and the detection of similar publications.
Institutional repositories store full-text versions of final study works, and research publications as well as other intellectual productions from universities (teaching materials, project reports, studies, monographs, research data ...). Institutional repositories are powered by the same software used to run the Digital Library of the University of Maribor (Brezovnik and Ojsteršek, 2011). The software has been substantially upgraded due to the need for new functionalities and regarding process support for student/staff submissions of publications.
The national infrastructure of open access exchanges metadata with the Slovenian bibliographic system COBISS.SI (Seljak and Seljak, 2002) via SRU/SRW services. COBISS.SI [22] consists of the local databases of libraries and a union catalogue that aggregates metadata from the local databases. Local databases belong to the libraries of individual academic institutions; however, exceptions do exist where one database is linked to several academic institutions and where several databases are linked to a single academic institution. A part of COBISS.SI also contains authority control, a normalisation database of personal and corporate names (CONOR.SI), where information is stored about authors and their affiliations to organisations. CONOR.SI also holds records of the Slovenian Research Agency's identity numbers for researchers, which can be used for establishing connections with SICRIS (Južnič et al., 2010). SICRIS holds data about researchers, research organisations, research groups and research projects, and is therefore of service during the evaluations of researchers' performances. Researchers can have their publications catalogued in COBISS.SI and by doing so improve their grades in SICRIS, which in turn affects their rankings when they apply for research funding or seek promotion.
Institutional repositories consist of administrator and visitor interfaces. The administrator interface is intended for the Study Office personnel, librarians, and system administrators. The Study Office personnel oversee the submission of the final study works. Librarians catalogue the student and staff publications into COBISS.SI and transfer the catalogued metadata from COBISS.SI into institutional repositories. The administrator user interface enables the import of publication metadata using a COBISS.SI identification number. In this way, all metadata of catalogued publications within COBISS.SI can be transferred into institutional repositories. This is used for all publications in digital format that were submitted to different archives before the establishment of institutional repositories, and are adequately copyrighted for the university to make them accessible through the university repository.
The user interface can be used by registered (students and university staff) and non-registered visitors (general public). Students and researchers can deposit their publications in the repository and review the content (metadata and similar documents found by the detector of similar documents). They can also use Web 2.0 functionalities (social bookmarking, ranking of and commenting on publications). The user interface for the general public is bilingual (Slovenian and English) and is available as a Web application and as native mobile applications for Android, iPhone and Windows Phone platforms. The interface is adjusted to persons with disabilities in accordance with the WAI specifications [26]. Faculties can include some of the functionalities from the repositories using a JavaScript API, which is also used by mobile applications. Metadata can be exported in a variety of formats including RSS, JSON, and RDF.
Green open access (publishing in traditional or open scientific journals and submitting to institutional or subject repositories) is encouraged by the European Commission. The Horizon 2020 funding programme makes open access mandatory for all publications from co-financed EU projects. The European Commission co-financed the European infrastructure of open access for research publications (projects OpenAIRE (2009-2012) and OpenAIREplus (2011-2014)). Therefore, the established institutional repositories include OAI-PMH servers that return OpenAIRE compatible XML [27], thus enabling OpenAIRE servers to gather metadata about Slovenian publications that were produced as the results of projects financed by the EU.
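To illustrate how a harvester might talk to such an OAI-PMH server, the following minimal Python sketch builds a ListRecords request and extracts identifiers and titles from a response. The verb and the `metadataPrefix` and `set` arguments are defined by the OAI-PMH protocol; the base URL, the set name `ec_fundedresources` and the sample response are assumptions made for illustration and are not taken from the Slovenian repositories.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Hypothetical response, shortened to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai",
                       set_spec="ec_fundedresources"))
print(parse_records(SAMPLE))
```

A production harvester would also have to follow resumption tokens and handle protocol errors, which this sketch leaves out.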

Organisational structure and processes for the mandatory depositing of publications in Slovenian institutional repositories
The ODUN project had a project manager and project team heads at each university. Each university project team head supervised a project group for policies and processes' definitions, and a project group for the information infrastructure. Each of the project group heads chose their own project co-workers. The project groups for policies and processes' definitions cooperated with offices responsible for research or teaching at the respective universities. The same organisational structure is still in place today. Additionally, a group has been established consisting of representatives from all four universities. This group is responsible for the continued operation and development of the infrastructure. Each university also employs an IT specialist who is responsible for technical maintenance and support of the IR. Questions and problems are handled by the content administrators at universities who, if necessary, contact the student offices or librarians at faculties or academies. The changes regarding libraries after establishing the national infrastructure included assistance to students and researchers in understanding the newly established open access environment (understanding processes regarding mandatory depositing of publications within institutional repositories, plagiarism prevention and detection, preparation of final study works according to university templates, etc.) and assistance to researchers in clearing copyright issues. Actual operational activities, based on organisational changes, both at the national level and at individual repositories, are not fully determined as yet. Each university has also appointed a group for the advocacy and dissemination of information on open access to all stakeholders (university managements, researchers, students, librarians).
Within the project, the legal background of final study work submissions was analysed with regard to their archiving within the institutional repositories and open access. Appropriate rules were newly created or revised. This included rules about a mandatory copy and instructions on the preparation of bachelor, masters and doctoral theses.
The process, which was established at universities, allows authors to submit their work into the institutional repository, after which the submission is reviewed and catalogued into COBISS.SI by a librarian. Based on the experience accumulated since 2008, it was found that a manual review of metadata is mandatory, as in many cases the submissions are incomplete or contain spelling errors. In addition, only librarians can normalise author names using the CONOR.SI authority database. After the publication metadata have been successfully catalogued within COBISS.SI, they can be transferred to the institutional repository via SRU/SRW services.
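As a rough illustration of the last step, the sketch below builds an SRU `searchRetrieve` request for a single catalogued record. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) come from the SRU standard; the endpoint URL, the CQL index `rec.id` and the `marcxml` schema name are assumptions for illustration only and do not describe the actual COBISS.SI service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, record_id, schema="marcxml", max_records=1):
    """Build an SRU searchRetrieve URL for one catalogue record.

    The CQL index 'rec.id' and the default schema are hypothetical;
    a real server advertises its supported indexes and schemas in
    its SRU explain response.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'rec.id="{record_id}"',
        "maximumRecords": str(max_records),
        "recordSchema": schema,
    }
    return f"{base_url}?{urlencode(params)}"


url = sru_search_url("https://catalogue.example.org/sru", "12345")
print(url)
# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query)["query"])
```

The repository would then parse the returned record and map its fields onto its own metadata schema.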
The working group, which consisted of librarians, legal experts and IT professionals from all four Slovenian universities, suggested the submission processes of the final study works. Slovenia lacks a common university academic information system (UAIS), therefore each university provided its own variation of this process. Students at the universities of Maribor and of Nova Gorica submit their final study works into IR, which is filled with some metadata from UAIS. A sequence diagram of the final study work publications of the universities of Maribor and of Nova Gorica is presented in Figure 2. The submissions at the universities of Ljubljana and of Primorska take place at UAIS, the sequence of operations is otherwise the same as described for the universities of Maribor and of Nova Gorica.

Figure 2. A sequence diagram of final study work submission and publication at the universities of Maribor and of Nova Gorica

Figure 3. A sequence diagram of research item submission and publication
All four Slovenian universities have established the same process for research publication submissions, which is presented in the sequence diagram in Figure 3. Researchers can submit articles, monographic chapters, monographs, conference papers, study materials, publications about patents, research data, and other types of publications. The types of publications are adapted according to the COBISS.SI typology, which is used by the Slovenian Research Agency for the evaluation of researchers' bibliographies. Once researchers are logged into the IR, they can submit new content as a whole, or they can use metadata from the catalogued record in COBISS.SI (an optional request and reply at the beginning of the process). In the latter case, the IR takes care of the metadata transfer from COBISS.SI using the SRU/SRW protocol, and researchers only need to provide the electronic version of their publications. A link to the SHERPA/RoMEO portal is also provided to the publication authors, so they can check what type of access they can use depending on the publisher's copyright transfer agreement. During the insertion of names and surnames, suggestions from the CONOR.SI database are provided. These suggestions include the year of birth, if available in CONOR.SI, and the researcher identifier from SICRIS, which simplifies the determination of the correct author. This can greatly simplify the librarian's work of cataloguing the publication in COBISS.SI. Authors can also determine the copyright holder and the type of access to the full-text publication. They can choose between immediate publication, closed access or delayed publication with embargo (these metadata are part of OpenAIRE compliance).
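The three access choices can be expressed with the `info:eu-repo` vocabulary used in the OpenAIRE guidelines. The following sketch shows one possible mapping; the function names and the dictionary layout are hypothetical and are not taken from the repository software.

```python
from datetime import date

# Access-right terms from the info:eu-repo vocabulary.
ACCESS_TERMS = {
    "open": "info:eu-repo/semantics/openAccess",
    "closed": "info:eu-repo/semantics/closedAccess",
    "embargoed": "info:eu-repo/semantics/embargoedAccess",
}


def access_metadata(choice, embargo_end=None):
    """Map an author's access choice to dc:rights and dc:date values."""
    if choice not in ACCESS_TERMS:
        raise ValueError(f"unknown access choice: {choice}")
    fields = {"dc:rights": [ACCESS_TERMS[choice]], "dc:date": []}
    if choice == "embargoed":
        if embargo_end is None:
            raise ValueError("an embargoed item needs an embargo end date")
        fields["dc:date"].append(
            f"info:eu-repo/date/embargoEnd/{embargo_end.isoformat()}")
    return fields


def is_publicly_available(choice, today, embargo_end=None):
    """Decide whether the full text may be shown on a given day."""
    if choice == "open":
        return True
    if choice == "embargoed" and embargo_end is not None:
        return today >= embargo_end
    return False


print(access_metadata("embargoed", date(2015, 12, 31)))
print(is_publicly_available("embargoed", date(2015, 6, 1), date(2015, 12, 31)))
```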

Similar text detection
Increases in copying from published texts without proper citation (i.e. plagiarism) have been noticed over the recent years. This problem has increased due to the growing number of documents available on the internet. The reason for improper citation is either lack of knowledge or deliberate copying of individual sentences or even chapters from the works of other authors (Alzahrani et al., 2012).
Finding similarities between documents is performed in two steps. The first step is document fingerprinting. Those documents that are similar to the document to be checked are found using algorithms which compare different document features. These features can be retrieved from the document itself (from chapters, paragraphs, sentences, fixed number of words) using hashing algorithms or using frequency analysis of words or phrases, which are obtained by semantic tagging. Features retrieved using hashing algorithms (Stein, 2007) are compared and the result is a list of documents and the percentages of coverage between features of document pairs. Vector-space model algorithms (e.g. LSA, BM25, Random indexing) and text classification algorithms (k-nearest neighbours, Naive Bayes, SVM) use frequency analysis of words or phrases (Alzahrani et al., 2012). The result is a set of candidate documents. This can be a fixed number of documents (e.g., the first 50 documents ordered by similarity or by coverage percentage between document features). In the second step, called "pairwise feature-based exhaustive analysis", pairs of documents are checked using the longest common substrings. Several algorithms exist for this task, as described in a survey article (Navarro, 2001).
Document similarity detection (or plagiarism detection) is done within the national portal. The results of detections are available to students and staff. They can only see those documents where they had the roles of author, co-author or mentor. The detection is performed for each publication submitted into the university repositories. Both coarse-checking (step 1) and fine-checking (step 2) analyses are supported by the custom-built software of the authors. The software does not check the similarities between images.
Software for coarse-checking texts returns similar sentences that are longer than forty characters (an example of a result is shown in Figure 4, above). It enables coarse-checking due to its use of an upgraded TextProc (Brezovnik and Ojsteršek, 2011). The language infrastructure used during the plagiarism detection process consists of a morphological dictionary for the Slovenian language, which contains approximately 8,000,000 word forms and 320,000 lemmas. Wikipedia labels from article titles from the Slovenian, English and German Wikipedia are also used. They were extracted from DBpedia (Morsey et al., 2012). A domain-specific semantic dictionary was created using keywords from publication metadata within the Open Science Slovenia portal. Sentences marked by the coarse-checking software as similar are undoubtedly the same in both texts. They differ only if the authors used synonyms, used a different grammatical person or used filler words (e.g. »therefore«, »however«). The software detects similarity even if the word order is changed or if any of the words are misspelled. The coarse-checking algorithm (Brezovnik and Ojsteršek, 2011), which has been subsequently updated, first converts the text into UTF-8 format and eliminates extra whitespace and new line characters (CR, LF). Then it splits the content into sentences. Words from these sentences are then lemmatised. Common words (e.g. »and«, »or«, »that«, etc.) are filtered out and all the remaining words are sorted alphabetically. This step also carries out spelling corrections using a morphological dictionary and POS tagger. In order to correct spelling errors the "Symmetric Delete Spelling Correction Algorithm" [28] is used. After the lemmatisation, the algorithm normalises those synonyms that are stored within the dictionary and can be transformed into single forms without changing the semantic meanings.
A good example of this is the normalisation of the verbs »to present«, »to describe« and »to show«, which are synonyms in most cases. The algorithm then calculates hashes for these sentences. Finally, it compares them with the hashes of sentences from other documents within the corpus of documents, calculates the coverage of hashes for document pairs and provides a list of similar documents for every document within the corpus.
Figure 4. An example of coarse-checking (above) and fine-checking (below) outputs between two texts
Those documents that are found to be more than 1 % similar to the reviewed document during the coarse-checking step become candidates for entering into the fine-checking step. If the number of these candidates is less than 50, the rest of the similar documents are retrieved using the BM25 ranking function (Robertson et al., 2004). The fine-checking algorithm (an example of a result is shown in Figure 4, below) finds the longest common subsequences between two texts. Kärkkäinen's algorithm (Kärkkäinen et al., 2009) is used for finding common subsequences longer than 14 characters. Another algorithm (Ferme and Ojsteršek, 2011) is used in cases where the pairs of documents have more than 60 % of coarse-checking coverage of hashes. This algorithm has a time complexity of nearly O(N) if the pairs of documents are very similar. No spelling corrections are carried out during fine-checking; the texts are used 'as-is'. If two documents are fine-checked (Figure 4, below), the software marks those phrases or parts of sentences that are the same within both documents. A reviewer's task is to determine whether a sentence or a paragraph has been copied. Some sentences can be semantically identical as a whole or in part if the author has paraphrased the copied content.
Similarity constants (1 % of coarse-checking coverage, 50 candidate documents for fine-checking, sentence length of 40 characters, subsequence length of more than 14 characters, etc.) were selected on the basis of experience from examining plagiarism cases.
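The pairwise step can be illustrated with a textbook dynamic-programming search for the longest common substring of two texts. This is only a simple stand-in with quadratic running time, assuming plain character-level comparison; it is not one of the faster algorithms covered by the cited survey.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring shared by a and b.

    Uses a row-by-row dynamic programme: cur[j] holds the length of the
    common suffix of a[:i] and b[:j]. If several substrings share the
    maximum length, the first one found in `a` is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


checked = "the quick brown fox"
candidate = "a quick brown dog"
print(repr(longest_common_substring(checked, candidate)))
```

For long documents a suffix-based structure is usually preferred, because the quadratic table quickly becomes too slow.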
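The coarse-checking steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: lemmatisation, spelling correction and synonym normalisation are omitted, the stop-word list is a hypothetical placeholder, SHA-1 is an arbitrary choice of hash, and coverage is assumed to be the share of the checked document's sentence hashes that also occur in the other document.

```python
import hashlib
import re

# Hypothetical, tiny stop-word list used only for this example.
STOP_WORDS = {"and", "or", "that", "the", "a", "of"}
MIN_SENTENCE_CHARS = 40  # only sentences longer than this are compared


def sentence_hashes(text):
    """Return the set of hashes of normalised sentences in a text."""
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace, CR, LF
    hashes = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence) <= MIN_SENTENCE_CHARS:
            continue
        words = re.findall(r"\w+", sentence.lower())
        # A full system would lemmatise, correct spelling and
        # normalise synonyms at this point.
        kept = sorted(w for w in words if w not in STOP_WORDS)
        key = " ".join(kept)
        hashes.add(hashlib.sha1(key.encode("utf-8")).hexdigest())
    return hashes


def coverage(checked, other):
    """Share of the checked document's sentence hashes found in other."""
    h_checked, h_other = sentence_hashes(checked), sentence_hashes(other)
    if not h_checked:
        return 0.0
    return len(h_checked & h_other) / len(h_checked)


doc_a = ("The repository stores final study works and research publications. "
         "Short one.")
doc_b = ("Research publications and final study works the repository stores. "
         "Something completely different is written in this second sentence.")
print(coverage(doc_a, doc_b), coverage(doc_b, doc_a))
```

Sorting the remaining words before hashing is what makes the comparison insensitive to word order, as in the reordered first sentence of `doc_b`.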

Recommendation system
The main goal of recommendation systems is to provide the visitors of the institutional repository with contents covering their interests. There are several approaches to recommendations, mostly divided into two categories. The first category of approaches operates exclusively over visitor's activities (Su and Khoshgoftaar, 2009). It uses the known activities of a group of visitors to make recommendations or predictions of the unknown activities for other visitors. Amongst the most widely used recommendation algorithms are collaborative filtering, binary vector approaches and SlopeOne. The second category of recommendation algorithms work solely with content, whilst the user activities are not of such importance but can be used to further weight the results during ranking. Similar text search falls within the area of text classification. Some of the algorithms that enable similar content recommendations are BM25, k-nearest neighbours, and latent semantic analysis. In addition, recommendation systems can differentiate between whether recommendations are calculated in real-time (memory based recommendation), or are pre-calculated (model based recommendation), and even a hybrid approach can be used (Bobadilla et al., 2013).
The objective of the recommendation system is to enable the visitors of institutional repositories to find similar documents after clicking onto a document within the institutional repository. Partial duplicates are omitted from the recommendations. Partial duplicates are determined using similar sentence and substring detection. If two documents have a sentence-based or substring-based coverage value of more than 60 %, they are marked as partial duplicates. The recommendation software includes content-based document recommendation that uses the BM25 ranking function and utilises additional weights during ranking (Borovič, 2012). Firstly, the metadata for each publication is obtained (authors, title, keywords, and abstract). Secondly, the metadata and the full-text of the publication are lemmatised. By utilising Wikipedia articles, a semantic tagging process (Burjek, 2011) of metadata and full-text of publications is used during the third step. Term frequencies (TF) and inverse document frequencies (IDF) are calculated for each publication during the fourth step. TF and IDF weights are determined from semantically tagged metadata and full-text. Similarity with other publications is then calculated using a BM25 ranking function suggested by Robertson, Zaragoza and Taylor (Robertson et al., 2004). Those pairs of documents with similarity values of 0 are discarded, as they are dissimilar. The result is a list of similar document pairs, which is then stored within the database. The recommendation threshold is set depending on the BM25 values of the documents on the recommendation list. Thus, the recommendation of similar publications is a result of selecting the top five highest ranking publications that exceed the threshold on the list, as ordered by the BM25 value. Additionally, other criteria such as the issue year, number of downloads, number of views and average rating, are used during the ranking process.
A recommendation list could also be empty if the system cannot find any similar publications. The essential task is to maintain the database with up-to-date similarities when new documents are added to the system.
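The ranking step can be illustrated with a plain Okapi BM25 implementation. The sketch below is an assumption-laden simplification: documents are already tokenised lists of terms, the parameters `k1 = 1.2` and `b = 0.75` are common defaults rather than values reported here, the IDF uses a variant that is never negative, and the additional weights (issue year, downloads, views, rating) and partial-duplicate filtering are left out.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every document in docs (id -> list of terms) against a query.

    docs must be non-empty. Returns a dict id -> BM25 score.
    """
    n_docs = len(docs)
    avgdl = sum(len(terms) for terms in docs.values()) / n_docs
    df = Counter()  # number of documents containing each term
    for terms in docs.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for q in set(query_terms):
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            norm = tf[q] + k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def recommend(query_terms, docs, threshold=0.0, top_n=5):
    """Return up to top_n document ids whose score exceeds the threshold."""
    scores = bm25_scores(query_terms, docs)
    ranked = sorted(((s, d) for d, s in scores.items() if s > threshold),
                    reverse=True)
    return [d for _, d in ranked[:top_n]]


corpus = {
    "a": ["open", "access", "repository"],
    "b": ["plagiarism", "detection"],
    "c": ["open", "access", "policy", "open"],
}
print(recommend(["open", "access"], corpus))
```

Documents that share no terms with the query score 0 and are dropped, which is why a recommendation list can be empty.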

Conclusions and future work
The open accessibility of Slovenian research publications and data will increase the visibility and impact of Slovenian researchers and research organisations. It is expected that more opportunities for cooperation with universities worldwide will appear in higher education and research activities. Slovenian and foreign companies will be able to use the results of publicly funded research from Slovenian research organisations, therefore improving the efficiency of public funding. The national open access infrastructure will also influence the Slovenian research policy. At all four universities, the legal backgrounds have been prepared or revised to support a mandatory process of a final study works and research publications submission and plagiarism detection process. Using the plagiarism detector, the number of those wishing to attain academic achievements in a non-ethical way is reduced. Students and researchers are encouraged to produce new, original ideas and to cite previously published ideas accordingly. Open access to the final works of studies influences the quality of peer reviewing the theses by their supervisors and other faculty staff involved during the research process. The qualities of the final works have increased since the universities have established theses submission processes. Plagiarism is being detected and prevented, thus ensuring better technical, citation and linguistic qualities of theses.
The national open access infrastructure enables permanent access to knowledge from anywhere, via any mobile or embedded device or a web browser. Anyone can benefit from the results of education and research, especially those who have limited possibilities to access the benefits of the information society (people with disabilities, the elderly and other social groups that have limited access to e-services due to physical or other barriers).
Compared to establishing several repositories for individual faculties or academies, the implemented solution with a single repository for each university is more effective from the financial, time, and human resource perspectives. The links between the repositories and the information systems of the universities are operational, and a national portal has been developed. In addition, the appropriate compatibility and interoperability are available for the direct participation of the repositories and the portal in international initiatives. The national portal will need to be promoted to the visitors of the institutional repositories. The institutional repositories are used more, since researchers and students are more familiar with them. The recommendation system is frequently used, because visitors tend to look for similar publications. Between 1st March and 31st March 2014, on average 2.63 recommended publications were viewed. The statistics exclude requests from all known web crawlers, which are stored in our crawler database.
Following on from this project, the universities intend to expand the initial infrastructure and integrate the remaining higher education and research institutions within the national open access infrastructure. An additional repository will be set up for organisations without their own repositories. The main future activities of the Slovenian national open access infrastructure will be:
- inclusion of new features for registered visitors, such as personalisation of content in search results, personal bookmarks, repairing of OCR generated text, etc.,
- enabling the display of publication metadata in different citation styles (APA, Harvard, IEEE, etc.) or export to different reference management tools (Zotero, Endnote, RefMan, RefWorks),
- establishment of digital preservation of publications,
- enabling enhanced publications,
- including metadata of research datasets within the national portal from the national infrastructure of open research data (once it is built),
- publishing metadata as linked open data (firstly, the data will be linked with the UDC taxonomy and info:eu-repo ontology [29]; furthermore, linkages with other taxonomies and ontologies will be established, such as Eurovoc, Agrovoc, MESH, and Dbpedia (Morsey et al., 2012)),
- enabling the partially automatic extraction of keywords and publication classification using known taxonomies (CERIF, ACM, UDC, ISCED, LCH, DDC, Eurovoc, Agrovoc, MESH…),
- enabling the extraction of additional metadata and knowledge from publications stored within the repositories, and enabling the linkage of publications archived in the national infrastructure according to time, spatial or semantic features.