Curriculum Data Association Organization and Knowledge Management Method for Unstructured Learning Resources

To address the problems of organizing unstructured learning resources in online courses, a data association organization and knowledge management method is proposed. First, an associated course data system was designed according to functional requirements. Then, a table conversion test and a storage index test were performed. Finally, an entity merger test was carried out. The results showed that the data association organization and knowledge management method effectively addressed the problem of organizing unstructured learning resources in courses. In summary, the online learning environment provides favorable conditions for unstructured learning resources.


Introduction
The online course is implemented through the network under the guidance of curriculum theory, learning theory, and teaching theory. It is the sum of the teaching content and teaching activities in an online learning environment designed to achieve the curriculum objectives of a subject area. An online course consists of six components: teaching content, learning resources, teaching strategies, learning support, learning evaluation, and teaching and learning activities. As one of the basic elements of the online course, learning resources reflect the advantages of network and multimedia teaching, and they are an important part of online course design. According to their degree of structure, learning resources in online courses can be divided into structured and unstructured learning resources. Structured learning resources are learning materials that teachers have carefully designed and organized according to a predetermined structure. They typically have a defined source, a good structure, and stable content, and they use linear or hierarchical knowledge organization, as in lesson plans, handouts, supplementary materials, exercises, and test questions.
Unstructured learning resources, by contrast, have uncertain sources, an ambiguous structure, dynamically changing content, and poor stability. "Unstructured" does not mean that such learning resources have no structure. In fact, the internal contents of every learning resource are correlated with one another and with other learning resources, so these resources have various structures, some easy to express and some not. In general, linear and tree structures are regular and easy to express, and resources with these structures are also easy to acquire and store. Mesh structures, in contrast, are complex, irregular, and unstable; resources with such structures are weakly related and difficult to acquire and store in a structured form such as a relational database. This kind of structural ambiguity is therefore summarized as "unstructured". Unstructured learning resources are driven by the Web 2.0 network application model and are mainly embedded in Web 2.0 social networks such as blogs, wikis, and forums. As providers of unstructured learning resources, online teachers and online learning partners are also included in the scope of unstructured resource research.
The autonomy and broad interactivity of the e-learning environment provide conditions for the development of learning resources, especially unstructured learning resources. However, all of these resources are generated by people. From this perspective, unstructured learning resources are in essence human intellectual resources. They can be materialized or non-materialized, explicit or implicit. Both their internal structure and the order in which they transmit knowledge present a non-linear, "unstructured" form. They are dynamic and easy to share, spread, and develop, but their stability and controllability are poor.

State of the Art
At present, more research has been done on structured learning resources, both domestically and internationally. Anshari et al. [1] argued that, from the perspective of human resources, any materialized or non-materialized learning resource is ultimately a summary and expression of hidden human empirical or intellectual resources. Cruz-Benito et al. [2] stated that in the online learning environment, learners communicate with other members of the community through tools such as email, BBS, blogs, and wikis. In this process, learners continually generate new ideas, methods, and solutions to problems. These ideas, methods, and solutions have no fixed form or structure, are dynamically changed and updated, and can all be regarded as resources. The process of generating ideas (solving problems) can be seen as a process of making hidden intellectual resources visible. Once visible, these resources are preserved and disseminated, and other members use them to create further resources. In addition, Fathurrohman et al. [3] observed that online teachers often provide many external links based on the learning objectives and content of an online course so that learners can learn more about related topics. Learners also often add link resources and recommend relevant learning materials to each other. Because of this constant participation, the breadth of learning resources has greatly expanded, forming a complex network whose structure appears ambiguous.
Linked data and semantic technologies have been applied in some educational research. Gupta et al. [4] argued that open data based on Semantic Web and ontology technologies has become one of the most important ways to publish high-quality linked semantic data, and it is widely used in intelligent services such as semantic search and personalized recommendation. Huda et al. [5] studied an innovative big-data-based environment for online learning resources, in which unstructured documents are annotated as structured data with semantics so that both machines and users can understand their meaning and work together, and people can access digital resources directly through these mechanisms. Riley et al. [6] studied semantic-technology-based educational tools, that is, the semantic tools and services actually used in higher education in the UK, and on that basis proposed a three-stage development route. Under this route, linked data across higher education institutions has gradually been created, so that educational, teaching, and curriculum materials are shared among institutional alliances; an education ontology has been built, and ontology-based data analysis and educational perceptual reasoning have been implemented. In a review, Sanati-Mehrizy et al. [7] described the OREChem project, which was funded by Microsoft and carried out by Cambridge University, Cornell University, Indiana State University, Los Alamos National Laboratory, Pennsylvania State University, Queensland University, and Southampton University. In this project, grid computing was used to create new linked data resources, which were developed and integrated into a standard ontology for chemical knowledge representation. The core purpose of the project was to design and implement an interoperable architecture based on Semantic Web rules, so that chemistry researchers could share and reuse distributed institutional repositories, databases, and Web services; connections between disciplines were also supported. Wang et al. [8] proposed using Semantic Web technology to solve the problems of RLM (Resource List Management) tools, using existing ontologies to describe resources uniformly. Yeh et al. [9] used the interpretive structural model to explore the design and benefit analysis of professional courses. The principles of linked data were used to improve data interoperability, and existing schemas and ontologies were used to describe relationships. Students and teachers were encouraged to enrich the semantics of the data in order to support context-aware recommendation. The system not only describes learning resources uniformly but also enriches their semantic description. It was deployed at the University of Plymouth in September 2008.
In summary, linked data has had influential applications in some areas, but it has made little progress in the field of e-learning. This study proposes constructing Linked Course Data (LCD) and linking it with other linked knowledge data. Linked course data is designed to increase learners' efficiency and interest by allowing more users to discover more latent knowledge in the data. The key is to build an ontology based on LCD. Semantic Web technology is used to manage knowledge and to provide knowledge services over massive linked data. The remainder of this paper first introduces the relevant algorithms, then discusses the operating environment of the system, and finally tests the associated course dataset system. The results show that the proposed system is feasible.

High-level method
The high-level method is as follows. First, the corpus is scanned and all relationships declared as owl:sameAs are separated out.
Second, these relationships are loaded into an in-memory index that accounts for the transitive and symmetric semantics of owl:sameAs.
Third, a canonical term is chosen for each equivalence class in the index. Fourth, the corpus is scanned again, and the terms in the subject or object position of the rdf:type triples are canonicalized. Therefore, only a small subset of the corpus, namely the owl:sameAs statements, needs to be indexed, and the data are merged in two scans.
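The two-scan procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: triples are plain string tuples, owl:sameAs is written as a prefixed name, the transitive and symmetric closure is computed by a breadth-first search over an undirected graph, and the hypothetical rule "pick the lexicographically smallest term" stands in for whatever canonical-term choice the original uses. The sketch rewrites subject and object terms of every remaining triple and drops the owl:sameAs triples themselves; both choices are assumptions.

```python
from collections import defaultdict, deque

SAME_AS = "owl:sameAs"  # prefixed name used for brevity (assumption)


def build_sameas_index(triples):
    """Scan 1: keep only owl:sameAs triples and map each term to its equivalence class."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            # Symmetry: record the edge in both directions
            neighbours[s].add(o)
            neighbours[o].add(s)
    index = {}
    for start in neighbours:
        if start in index:
            continue
        # Transitivity: everything reachable from `start` is in the same class
        component, queue = {start}, deque([start])
        while queue:
            term = queue.popleft()
            for other in neighbours[term]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        members = frozenset(component)
        for term in members:
            index[term] = members
    return index


def consolidate(triples):
    """Scan 2: rewrite subject and object terms to their canonical term."""
    index = build_sameas_index(triples)
    # Hypothetical choice: the lexicographically smallest term is canonical
    canonical = {term: min(members) for term, members in index.items()}
    merged = []
    for s, p, o in triples:
        if p == SAME_AS:
            continue  # the equivalences are now encoded in the rewritten terms
        merged.append((canonical.get(s, s), p, canonical.get(o, o)))
    return merged


corpus = [
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:c", SAME_AS, "ex:b"),
    ("ex:a", "ex:name", '"CPU"'),
    ("ex:c", "rdf:type", "ex:KnowledgePoint"),
    ("ex:d", "ex:preorderOf", "ex:c"),
]
print(consolidate(corpus))
```

Only the owl:sameAs triples are held in memory, which reflects the point made above that just a small subset of the corpus has to be indexed.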

Union-find algorithm
To compute the transitive and symmetric closure in memory, the standard union-find algorithm is used to calculate the equivalence partitions. First, equivalent elements are stored in disjoint sets, so that each element is contained in exactly one set. Second, a lookup map is used to query which set an element belongs to. Third, when new equivalent elements are discovered, their sets are merged.
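A compact Python version of these union-find steps might look like this. It is a generic disjoint-set forest with path compression and union by size, offered as an illustrative sketch; the class name and the `partitions` helper are hypothetical, and the `parent` dictionary plays the role of the lookup structure.

```python
class UnionFind:
    """Disjoint-set forest: each element belongs to exactly one set."""

    def __init__(self):
        self.parent = {}  # element -> parent element (a root points to itself)
        self.size = {}    # root -> number of elements in its set

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        if x not in self.parent:
            # Unseen elements start as singleton sets
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets of a and b (called when an equivalence a = b is found)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # union by size: attach the smaller tree
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra

    def partitions(self):
        """Group all known elements by their root."""
        groups = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())


uf = UnionFind()
for s, o in [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:x", "ex:y")]:
    uf.union(s, o)
print(sorted(sorted(group) for group in uf.partitions()))
```

Path compression and union by size keep each `find` close to constant time in practice, which matters when millions of owl:sameAs pairs are merged.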

Semantic association data technology
With the development of Internet technology, large numbers of data resources are emerging and information is becoming more complex. How to obtain useful knowledge from massive information has become an urgent problem, and a data model based on semantic association is an effective way to address it. The approach has two main aspects. The first is the semantic association data model, that is, the expression and organization of data based on semantic associations. The second is a more effective and intelligent retrieval mechanism built on that model: retrieval results are extended to the knowledge entities most semantically related to the query, and the results are ranked according to the global evaluation value of each resource and its degree of association with the query. The model can thus support reasoning and extend results to semantically related entities, while also supporting the evaluation of knowledge entities and preventing a large number of unordered results from being returned.
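The retrieval side of this model, extending query hits to semantically related entities and then ranking them by global evaluation value and query association degree, can be sketched as below. The text does not give a scoring formula, so the linear weighting `alpha`, the one-hop expansion with a `decay` factor, and the `top_k` cut-off are all hypothetical choices made for illustration.

```python
def expand_and_rank(hits, related, global_value, alpha=0.5, decay=0.5, top_k=10):
    """Extend query hits to related entities and rank them.

    hits:         entity -> association degree with the query, in [0, 1]
    related:      entity -> semantically related entities (one hop)
    global_value: entity -> global evaluation value of the resource, in [0, 1]
    """
    association = dict(hits)
    for entity, degree in hits.items():
        for neighbour in related.get(entity, []):
            # A related entity inherits a decayed share of the hit's degree
            association[neighbour] = max(association.get(neighbour, 0.0), decay * degree)
    score = {
        e: alpha * a + (1 - alpha) * global_value.get(e, 0.0)
        for e, a in association.items()
    }
    # Sort by descending score (ties broken by name) and cap the result size
    return sorted(score, key=lambda e: (-score[e], e))[:top_k]


print(expand_and_rank(
    hits={"CPU": 1.0},
    related={"CPU": ["ALU", "Register"]},
    global_value={"CPU": 0.8, "ALU": 0.9, "Register": 0.1},
    top_k=2,
))
```

The `top_k` cap is one simple way to avoid returning a large number of unordered results.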

Literature research method
Literature related to the semantic technology of linked data is collected and screened. By sorting out this material, the situation relevant to this study is summarized, the current state and existing problems of previous studies are examined, the purpose, objectives, and requirements of the study are defined, its focus is identified, and the research plan is formulated. Comparing and analyzing the relevant literature reveals the current state and development trend of resource integration and knowledge retrieval using semantic association technology. The shortcomings of previous studies are summarized, and lessons are drawn from existing results to determine the research direction and ideas of this topic and to select appropriate research objects. The literature research method runs through the whole process.

Experiment method
In this study, two experiments, a table conversion experiment and a storage index experiment, are carried out to verify the effectiveness of semantic association technology in resource integration and knowledge retrieval.

Result Analysis and Discussion

Design of associated course data system
The development environment of the system is divided into two parts, the hardware environment and the software environment. The Jena development kit is used to build and manipulate the ontology in the system. On the client side, Internet Explorer 6.0 or later is installed.
The essence of the associated course data set is the knowledge points (concepts) and the semantic relationships between them. The system mainly gives users an overview of the knowledge in a particular subject. Its main idea is to read the associated course data (knowledge ontology) constructed for the computer interface course and to use program code to read and display the knowledge points in the data set and the relationships between them. In Figure 1, circles represent knowledge points, and arrows indicate the cognitive order between knowledge points. The subsequent knowledge points of the knowledge points semiconductor tube and MOS (metal-oxide-semiconductor) tube are gate circuits, flip-flops, and decoders, and the pre-order knowledge points of the ROM (Read-Only Memory) cell and the RAM (Random Access Memory) cell are gate circuits, flip-flops, and decoders. This gives the user an overview of the knowledge and helps the user learn better. Figure 1 shows part of the associated course data (in the form of a knowledge ontology) built during the research, as displayed in the browser. The most valuable aspect of the associated course data set is its associations. The interface also presents the pre-order knowledge points of the knowledge point CPU (in personal computers) and the owl:sameAs links from that knowledge point to other data sets (e.g., DBpedia and Freebase). Users can click on these links to explore further, multifaceted extensions of the knowledge.
The system also provides knowledge point retrieval, which lets users quickly and actively find the knowledge points and related content they want instead of passively receiving knowledge. The retrieval sub-system generates a SPARQL query from the search content typed by the user, and the matching knowledge points and relationships are retrieved from the background ontology and presented. The retrieval sub-system is not a general search engine; it uses SPARQL to retrieve, from the associated course data, the content related to a specific knowledge point in the knowledge point class. The result includes a summary of the knowledge point and a list of its pre-order knowledge points.
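As an illustration of how typed search content might be turned into a SPARQL query, the Python sketch below builds a SPARQL 1.1 SELECT query string. The namespace `http://example.org/lcd#` and the terms `lcd:KnowledgePoint`, `lcd:summary`, and `lcd:hasPreorder` are hypothetical placeholders, since the ontology vocabulary is not given; the user input is escaped so that it cannot break out of the string literal.

```python
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX lcd: <http://example.org/lcd#>\n"  # hypothetical namespace
)


def escape_literal(text):
    """Escape characters that are not allowed unescaped in a SPARQL "..." literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def build_knowledge_point_query(user_input):
    """Case-insensitive label match plus optional summary and pre-order points."""
    term = escape_literal(user_input.strip())
    return PREFIXES + (
        "SELECT ?point ?summary ?preorder WHERE {\n"
        "  ?point a lcd:KnowledgePoint ;\n"
        "         rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{term}")))\n'
        "  OPTIONAL { ?point lcd:summary ?summary }\n"
        "  OPTIONAL { ?point lcd:hasPreorder ?preorder }\n"
        "}"
    )


print(build_knowledge_point_query("ROM unit"))
```

The resulting string could then be executed against the course ontology, for example with Jena's ARQ query engine, since Jena is already used for ontology handling.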
The associated course data also serves upper-level applications. For example, a knowledge-system navigation map can generate different knowledge point structure maps for different learners.
In the knowledge navigation browser interface, if the user enters a knowledge point in the navigation search box, the system accesses the associated course data set in the background and automatically generates the corresponding navigation map. For example, the ROM unit is a pre-order knowledge point of the ROM memory chip, and the ROM memory chip is a pre-order knowledge point of the Pentium memory sub-system. That is, learners who want to learn the Pentium memory sub-system first need to learn knowledge points such as the Read-Only Memory (ROM) unit and the ROM memory chip as prerequisite knowledge. Such learning paths help learners study in depth and form theoretical, systematic knowledge, which in turn helps them build and transfer knowledge. One-sided, shallow, or inadequate learning is thus avoided.
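One way to derive such a learning path is to collect all transitive pre-order knowledge points of the target and order them topologically. The sketch below does this with Python's standard `graphlib` module (Python 3.9+); the dictionary-of-prerequisites representation is an assumption about how pre-order relations might be exported from the ontology, and the example uses the ROM chain described above.

```python
from graphlib import TopologicalSorter


def learning_path(target, preorder):
    """Return the target's prerequisites in a valid study order, ending with the target.

    preorder maps a knowledge point to its direct pre-order knowledge points.
    """
    needed, stack = set(), [target]
    while stack:
        point = stack.pop()
        if point not in needed:
            needed.add(point)
            stack.extend(preorder.get(point, []))
    # graphlib expects node -> predecessors; predecessors are emitted first
    graph = {point: preorder.get(point, []) for point in needed}
    return list(TopologicalSorter(graph).static_order())


preorder = {
    "ROM memory chip": ["ROM unit"],
    "Pentium memory sub-system": ["ROM memory chip"],
}
print(learning_path("Pentium memory sub-system", preorder))
```

If the pre-order relations ever contained a cycle, `static_order` would raise `graphlib.CycleError`, which is a useful consistency check on the course data.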

Table conversion experiment
For the sake of accuracy, the experiment was divided into two parts:
• The first part used 15 tables extracted from Google Squared and Wikipedia.
• The second part used the relevant tables of this project group's course data, with preference given to simple tables in English.
In total, class labels were assigned to 52 columns based on their column headers, and table cells were linked to a total of 611 entities. Table 1 gives an overview of the data set, and Table 2 shows the distribution of columns and entities over four categories: people, locations, organizations, and others (movies, songs, nationalities, etc.). Manual evaluation is used to judge the correctness of the class labels predicted by this method; the evaluation focuses mainly on class labels predicted from the DBpedia ontology.
To evaluate the algorithm that assigns class labels to columns, the system's ranked list of class labels was first compared with the evaluators' ranking. As shown in Table 3, 80.76% of the columns have an average precision (the per-column quantity averaged in Mean Average Precision, MAP) greater than 0, which means that at least one relevant label is ranked in the top three of the system's list. For 75% of the columns, the recall of the algorithm is greater than or equal to 0.6, indicating a good match between the system's top three labels and the evaluators' top three labels. Finally, the reasonableness of the predicted class labels was assessed manually. A given column may have a more accurate class label than the one predicted, so the evaluator must decide whether the predicted class is reasonable. For example, for a column named City, an evaluator might judge dbpedia-owl:City to be the most appropriate class, consider dbpedia-owl:PopulatedPlace and dbpedia-owl:Place acceptable, and reject other classes (for example, dbpedia-owl:Thing). By this criterion, the evaluators judged 76.92% of the predicted class labels to be correct. Figures 2 and 3 show the accuracy for each of the four categories. For class label assignment, accuracy is only moderate for organizations and other types of data, because these types of entities are sparse in the knowledge base. To evaluate the linking of table cells to entities, the 611 table cells were first manually linked to the corresponding Wikipedia/DBpedia pages, and these links were compared with the links generated by the system. The results show that 66.12% of the table cells are correctly linked.
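The per-column measures used above can be made concrete with a small sketch. It uses one common definition of average precision over the top three system labels (normalised by the smaller of three and the number of relevant labels) and recall against the evaluator's relevant labels; the exact normalisation used in this study is not stated, so both definitions are assumptions, and the City example labels are illustrative.

```python
def average_precision(system_ranking, relevant, k=3):
    """Average precision of the top-k system labels against the relevant set."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(system_ranking[:k], start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / min(len(relevant), k)


def recall_at_k(system_ranking, relevant, k=3):
    """Fraction of the relevant labels that appear in the system's top k."""
    if not relevant:
        return 0.0
    return len(set(system_ranking[:k]) & set(relevant)) / len(relevant)


def mean_average_precision(columns, k=3):
    """columns: list of (system_ranking, relevant) pairs, one per table column."""
    return sum(average_precision(r, rel, k) for r, rel in columns) / len(columns)


system = ["dbpedia-owl:Place", "dbpedia-owl:City", "dbpedia-owl:Thing"]
evaluator = {"dbpedia-owl:City", "dbpedia-owl:PopulatedPlace", "dbpedia-owl:Place"}
print(average_precision(system, evaluator), recall_at_k(system, evaluator))
```

A column's average precision is greater than 0 exactly when at least one relevant label appears in the system's top three, which is the reading of the 80.76% figure given above.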

Fig. 2. Accuracy of classes in four categories
As Figures 2 and 3 show, in terms of accuracy, linking persons achieved the highest precision (83.05%), followed by linking places (80.3%). Linking organizations had moderate success (61.90%), but the accuracy of linking other types of data, such as movies, nationalities, songs, and types of business and industry, was only 29.22%, because these types of entities are sparse in the knowledge base.
The data set has 24 entities that do not exist in the knowledge base.In all 24 cases, the system was able to correctly predict that the table cells should be linked to "empty".
This study also made a preliminary assessment of the relationships between columns. First, a human evaluator assessed the relationships between the columns in a table while the system made its own judgments on the same relationships; the two results were then compared. For the five tables evaluated, the system identified 25% of the correct inter-column relationships.

Storage index experiment
PostgreSQL and MonetDB are used to implement this indexing scheme. In PostgreSQL, the scheme follows its horizontal partitioning specification. MonetDB, a widely used and efficient column-oriented database, has no built-in horizontal partitioning, so it was extended; in the current implementation, MonetDB partitions are of equal size. The LCDDB is implemented on MonetDB as follows. The triple table is used as input to calculate statistics, N partitions are created, and each triple is inserted into its corresponding partition table. For example, all triples whose subjectID lies between 1001 and 2000 are selected and inserted into the table sIndex_2000. The oIndex tables are built in the same way, and for pIndex a separate table is created for each predicate, into which the triples with that predicate are inserted. The system was tested with the LUBM (Lehigh University Benchmark), YAGO, and BSBM (Berlin SPARQL Benchmark) data sets.
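The equal-size horizontal partitioning described above can be sketched as a function that maps an integer resource ID to its partition table: with a block size of 1,000, IDs 1001 to 2000 land in `sIndex_2000`, matching the example in the text. The naming of `pIndex` tables by predicate ID and the in-memory dictionary standing in for database tables are illustrative assumptions.

```python
from collections import defaultdict


def partition_table(prefix, resource_id, block=1000):
    """Name of the partition table holding resource_id (IDs are assumed to start at 1)."""
    if resource_id < 1:
        raise ValueError("resource IDs are assumed to be positive integers")
    upper = -(-resource_id // block) * block  # round up to the next multiple of block
    return f"{prefix}_{upper}"


def partition_triples(triples, s_block=1000, o_block=1000):
    """Distribute (subjectID, predicateID, objectID) triples over sIndex/oIndex/pIndex tables."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[partition_table("sIndex", s, s_block)].append((s, p, o))
        tables[partition_table("oIndex", o, o_block)].append((s, p, o))
        tables[f"pIndex_{p}"].append((s, p, o))  # one table per predicate
    return dict(tables)


print(partition_table("sIndex", 1001), partition_table("sIndex", 2000))
print(sorted(partition_triples([(1500, 7, 42), (2500, 7, 1001)])))
```

Separate `s_block` and `o_block` parameters mirror the YAGO setup, where subjects and objects are partitioned with different block sizes. In SQL, filling one partition would amount to an INSERT ... SELECT with a BETWEEN condition on subjectID.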
First, the LUBM data set. LUBM is a publicly available benchmark for testing the query performance of RDF (Resource Description Framework) data storage systems. It provides tools to generate data sets in the university domain, covering universities, professors, students, courses, and many other aspects, and its data generation tool can produce data of different sizes so that the SPARQL query performance of a storage system can be tested at different scales. This data set is currently very popular for testing storage systems that support SPARQL queries. For functional and performance testing, the LUBM tools were used to generate data for 200 universities, comprising 27,629,308 triples with 18 different predicates and occupying 3.2 GB of disk space. The sIndex and oIndex tables were partitioned in blocks of 1,000 resources, giving 6,576 sIndex tables and 6,576 oIndex tables. Loaded into MonetDB, the data occupy 3.4 GB.
Second, the YAGO data set. YAGO is a real-world data set containing information extracted from Wikipedia. It contains 93,193,669 triples with 93 different predicates and occupies 3.1 GB of disk space. The number of distinct subjects far exceeds the number of distinct objects, so the sIndex is partitioned in blocks of 2,000 resources and the oIndex in blocks of 1,000 resources. This gives 20,608 sIndex tables, 16,977 oIndex tables, and 93 pIndex tables. After the strings are mapped to IDs, the data are loaded into MonetDB, where they occupy 5.3 GB of disk space.
Third, the BSBM data set, a benchmark data set for the e-commerce domain. It consists of products, product descriptions, suppliers, and reviews. For the experiments, data for 1,000 products were generated, comprising 356,477 triples with 40 different predicates and occupying 805 MB of disk space. The numbers of subjects and objects are almost equal, so both the subject and object tables are partitioned in blocks of 1,000 resources, giving 1,008 sIndex tables, 1,008 oIndex tables, and 40 pIndex tables. Loaded into MonetDB, the data occupy 1 GB of disk space.
The hardware and software environments of the test are as follows. Hardware environment: there are 8 machines with Intel Xeon CPUs running at 2.5 GHz, with 1 GB of memory and a 270 GB disk. Software environment: the operating system is Red Hat Enterprise Linux AS release 4 (x86_64), with kernel version Linux LOG 2.6.9-42. To make results easy to view and compare, the interface to the test system was developed in Java, and sockets are used to transmit query statements and result sets. Apache Tomcat 6.0.29 was used for deployment and provides remote access for most of the tests; on the server, the command line is used to run queries and test performance.
Several RDF triple storage systems and indexing schemes were studied. Among these storage architectures, the best-performing DBMS-based triple store, SW-Store, and the best-performing file-system-based triple store, RDF-3X, were selected for comparison. For the sake of fairness, SW-Store was re-implemented on MonetDB, the same platform used to build the LCDDB (Linked Course Data Database) indexes.
These systems were compared with this study's associated course database, LCDDB, in two implementations: LCDDB (MonetDB) and LCDDB (PostgreSQL). They were also compared with the Triple table (MonetDB).
The experimental results on the LUBM data set were analyzed first, with LUBM used as a benchmark for query performance. LUBM provides 14 queries, which cover most types of queries. However, some of these queries use reasoning, which is not considered for the time being, so those queries were modified, and seven queries were compared. Figure 4 shows the LUBM query times. Of the seven queries, four execute faster on LCDDB than on RDF-3X, and all execute faster than on SW-Store. On this query set, RDF-3X performs roughly the same as the LCDDB, which is five times faster than SW-Store.
Next, the experimental results on the YAGO data set were analyzed. The same set of YAGO queries used to evaluate RDF-3X was adopted, excluding basic graph pattern (BGP) queries, and the results were compared with the proposed indexing scheme. Figure 5 shows the query performance on the YAGO data set. Three of the six queries performed better than on RDF-3X, and all queries performed better than on SW-Store.
Finally, the experimental results on the BSBM data set were analyzed. Figure 6 shows these results, with storage implemented on MonetDB. All six queries perform better than on RDF-3X and SW-Store, and the cold-buffer query results for BSBM show that the LCDDB is twice as fast as RDF-3X and SW-Store. Table 5 shows the run time of regular-expression-based filter queries and range queries; current storage systems cannot handle these types of SPARQL queries. Table 5 also shows the overall results as the geometric mean over all queries for each data set.
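The geometric mean used to summarise query times can be computed as below. Averaging logarithms keeps one very slow query from dominating the summary the way it would in an arithmetic mean; Python's standard library also offers `statistics.geometric_mean`. The run times in the usage line are made-up values, not measurements from this study.

```python
import math
import statistics


def geometric_mean(times):
    """Geometric mean of positive query run times, computed in log space."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("run times must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(t) for t in times) / len(times))


illustrative_times = [0.5, 2.0, 8.0]  # seconds, hypothetical
print(geometric_mean(illustrative_times), statistics.geometric_mean(illustrative_times))
```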

Entity merger experiment
Using this method, experiments were performed on a data set of 111.8 million triples, crawled from RDF/XML documents on the open Web. The crawler operates breadth-first: URIs are extracted from all positions of the RDF data, and the URI queues are assigned to different pay-level domains.
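A breadth-first crawl frontier with one queue per pay-level domain (PLD) might be organised as in the sketch below, which polls the PLD queues round-robin so that no single domain monopolises the crawl. Fetching and RDF/XML parsing are replaced by a hypothetical `links` dictionary, and the PLD is approximated by the last two host labels; a real crawler would consult the Public Suffix List (for example, to handle domains under `ac.uk`).

```python
from collections import deque
from urllib.parse import urlsplit


def pay_level_domain(uri):
    """Approximate PLD: the last two labels of the host name (a simplification)."""
    host = urlsplit(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


def crawl_order(seeds, links, max_uris=100):
    """Return URIs in the order a round-robin, per-PLD breadth-first crawler visits them.

    links maps a URI to the URIs extracted from its RDF document (stands in for fetching).
    """
    queues, seen, order = {}, set(), []

    def enqueue(uri):
        if uri not in seen:
            seen.add(uri)
            queues.setdefault(pay_level_domain(uri), deque()).append(uri)

    for uri in seeds:
        enqueue(uri)
    while queues and len(order) < max_uris:
        for pld in list(queues):  # one round: at most one URI per PLD
            if len(order) >= max_uris:
                break
            uri = queues[pld].popleft()
            if not queues[pld]:
                del queues[pld]
            order.append(uri)
            for target in links.get(uri, []):
                enqueue(target)
    return order


seeds = ["http://a.example.org/1", "http://b.example.org/2", "http://other.net/x"]
links = {"http://a.example.org/1": ["http://other.net/y", "http://a.example.org/3"]}
print(crawl_order(seeds, links))
```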
The system extracted 11.93 million raw owl:sameAs statements, which formed 2.16 million equivalence classes containing 5.75 million terms (6.24% of them URIs). There are only 4,156 blank nodes. Figure 9 shows the size distribution of the equivalence classes; 1.6 million of them (74.1%) contain at least two equal identifiers. The experiments show that extending the merge method significantly increases the number of equivalence classes, which indicates that the extended reasoning is effective to some extent. The system generated 2.82 million equivalence classes from the data, 1.31 times as many as the baseline method, involving 14.86 million terms, of which 9.03 million are blank nodes and 5.83 million are URIs. The average size of an equivalence class increased to 5.26 entities, and the largest equivalence class contains 33,052 terms.
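The equivalence-class figures reported above (number of classes, number of terms, average and maximum class size, size distribution) can be derived from the classes with a few lines of Python. The sketch assumes the classes are available as sets of terms, for instance the output of a union-find pass; the three toy classes are illustrative only.

```python
from collections import Counter


def class_statistics(classes):
    """Summary statistics over a non-empty collection of equivalence classes."""
    sizes = [len(c) for c in classes]
    return {
        "classes": len(sizes),
        "terms": sum(sizes),
        "average_size": sum(sizes) / len(sizes),
        "max_size": max(sizes),
        "size_distribution": dict(sorted(Counter(sizes).items())),
    }


toy_classes = [{"ex:a", "ex:b"}, {"ex:c", "ex:d", "ex:e"}, {"_:b1", "ex:f"}]
print(class_statistics(toy_classes))
```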

Conclusion
A system for associating course data has been designed, and table conversion and storage indexing have been introduced. The experimental scheme and results of data integration (entity consolidation) have also been discussed. Compared with traditional e-learning systems, the associated course data system makes some progress: it can present knowledge together with links to associated knowledge, leading learners from one source of knowledge to another. In terms of table conversion, the proposed four-step method achieves a high recall for class label prediction and for linking table cells to entities. However, the accuracy of identifying relationships between columns is low, which requires manual intervention and further in-depth research. In terms of storage indexing, experiments were conducted on large data sets. The results show that the proposed method is superior to the previously used or

Fig. 1. Example of associated course data in a browser

Fig. 3. Accuracy of entity links in four categories

Table 1. Overview of the data set

Table 2. Four categories of column and entity distribution

Table 3. Example table

Table 4. Example table
LCDDB (MonetDB) is the MonetDB implementation of the LCDDB, and LCDDB (PostgreSQL) is the PostgreSQL implementation. The Triple table (MonetDB) is an ordinary three-column triple table on MonetDB; compared with the LCDDB, it has a longer query time.

Table 5. Run time of filter and range query on BSBM (unit: second)