Coronaviruses: A patent dataset report for research and development (R&D) analysis

This work presents a patent database for coronaviruses that provides an overview of patenting activity and trends in focused antiviral therapy using triazole-based compounds, glycoprotein, and protease inhibitors as possible treatments. The patent data were obtained from Orbit Intelligence Software using a patent family structure to build a large database that can be used to prepare a patent landscape report (PLR), for market analysis, for technical and competitive intelligence, and for monitoring and surveying new ideas for the treatment of coronavirus diseases. The raw data are reported in four databases, which were classified according to different items: legal status (alive, dead), 1st application year (after 2015, 2011-2015, 2006-2010, 2001-2005), and Top 5 International Patent Classifications (IPC). The analysed data report shows the main players, the investment trend, markets, geographical distribution, technology overview, technology distribution, and patent citations.


Specifications
Subject: Infectious Diseases
Specific subject area: Patent landscape report, patent analysis, bioinformatics
Type of data: Chart, Graph, Figure
How
The raw data consist of four databases, each database has 12 files (XLSX format) and 11

• The data could help elaborate policies to determine the qualifications for investments in universities, research institutes, foundations, companies, and governments, thus allowing for better decision making in this regard.

Data Description
Patent data are of high importance because patents contain technical information about a specific area and have a high impact on the innovation process [1]. The database consists of two sections, raw data and analysed data.

CV AV GLYCO: The database contains the information of all patent families related to coronaviruses, antiviral therapy, and glycoprotein.
CV AV PRO: The database contains the information of all patent families related to coronaviruses, antiviral therapy, and protease inhibitors.

Raw data
The supporting information section has four databases; each database has 12 files (XLSX format) with information selected for specific items and the date of the search. Table 1 shows the distribution of data for each search, and Tables 2 to 5 show the information for each file in the database. All files contain information related to: Title, Images, Publication numbers, Publication kind codes, Publication dates, Original document, Earliest priority date, Abstract, Inventors, Latest standardized assignees - inventors removed, Representative, Advantages / Previous drawbacks, Independent claims, Object of invention, Technical concepts, Claims, Keywords in context, Technology domains, CPC - Cooperative classification, IPC - International classification, Citing patents - Standardized publication number, Citing patents - Raw information, Cited patents - Standardized publication number, Cited patents - Raw information, Non-Latin cited patents, Cited non-patent literature, Family legal status, Legal status (Pending, Granted, Revoked, Expired, Lapsed), Family legal state, Legal state (Alive, Dead), Legal actions, Independent claims, Dependent claims - Count.

(Table row fragment: CV AV TZ, 11-03-2020, IPC A61K-038.)

Table 4. Raw data list for CV AV GLYCO database.
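The classification items used to split the raw data (legal state and 1st application year) can be illustrated with a minimal Python sketch. It assumes each patent family has already been loaded as a dictionary; the keys `title`, `legal_state`, and `first_application_year` and the sample records are hypothetical and are not column names or rows taken from the XLSX files.

```python
from collections import defaultdict

# Closed year ranges named in the dataset description (bounds inclusive).
YEAR_BUCKETS = [
    ("2001-2005", 2001, 2005),
    ("2006-2010", 2006, 2010),
    ("2011-2015", 2011, 2015),
]


def year_bucket(year):
    """Return the 1st-application-year bucket label, or None if uncovered."""
    if year > 2015:
        return "after 2015"
    for label, start, end in YEAR_BUCKETS:
        if start <= year <= end:
            return label
    return None  # years before 2001 fall outside the listed buckets


def classify(records):
    """Group family titles by (legal state, year bucket)."""
    groups = defaultdict(list)
    for rec in records:
        state = rec["legal_state"].strip().lower()  # "alive" or "dead"
        bucket = year_bucket(rec["first_application_year"])
        groups[(state, bucket)].append(rec["title"])
    return dict(groups)


# Hypothetical records for illustration only.
sample = [
    {"title": "Family A", "legal_state": "Alive", "first_application_year": 2017},
    {"title": "Family B", "legal_state": "Dead", "first_application_year": 2004},
    {"title": "Family C", "legal_state": "Alive", "first_application_year": 2012},
    {"title": "Family D", "legal_state": "Alive", "first_application_year": 2018},
]

groups = classify(sample)
for key in sorted(groups, key=str):
    print(key, groups[key])
```

Keeping the bucket bounds in one table makes it easy to adjust the ranges if a different search date or period is used.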

Analysed data
Table 5. Raw data list for CV AV PRO database.

This section presents the processed data from Orbit Intelligence software. The main charts were selected for each database according to the visualizations recommended by the software. Figures 1 to 11 show the analysed data for the CV AV TH database. The supporting information (SI) contains the figures obtained for the CV AV TZ, CV AV GLYCO, and CV AV PRO databases.

For the CV AV TH database, Figure 1 and Figure 2 show the main key players (top 30) according to the size of their patent portfolios and their legal status (pending, granted, dead), respectively. The figures also show the size of the companies' portfolios in antiviral therapy treatment. Figure 3 illustrates the evolution of the investment trend from 2000 to 2019; these data show the dynamics of inventiveness of the portfolio in terms of patent families. Figure 4 shows the trend of applications over time by applicant; these data relate to the investment (relative size) of each company over time. Figure 5 illustrates the protection map of alive patents in the various national offices. Figure 6 and Figure 7 show heat maps over the technology domain according to IPC classifications and the distribution of companies. Further, Figure 8 and Figure 9 establish a relationship between inventors and a technological map based on IPC.
Finally, Figure 10 and Figure 11 show the relationship between the key patent families (legal status) and the companies based on citations.
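The aggregation behind a key-player ranking such as the one in Figure 1 and Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by Orbit Intelligence: the record keys `assignee` and `legal_status` and the organisation names are hypothetical.

```python
from collections import Counter, defaultdict


def top_applicants(records, n=30):
    """Return the n assignees with the most patent families, largest first."""
    counts = Counter(rec["assignee"] for rec in records)
    return counts.most_common(n)


def status_breakdown(records):
    """Count patent families per assignee and legal status."""
    table = defaultdict(Counter)
    for rec in records:
        table[rec["assignee"]][rec["legal_status"]] += 1
    return {assignee: dict(counter) for assignee, counter in table.items()}


# Hypothetical records for illustration only.
sample = [
    {"assignee": "Org A", "legal_status": "granted"},
    {"assignee": "Org A", "legal_status": "pending"},
    {"assignee": "Org A", "legal_status": "dead"},
    {"assignee": "Org B", "legal_status": "granted"},
    {"assignee": "Org B", "legal_status": "granted"},
    {"assignee": "Org C", "legal_status": "pending"},
]

ranking = top_applicants(sample, n=2)
breakdown = status_breakdown(sample)
print(ranking)
print(breakdown["Org A"])
```

Ranking by family count and splitting by legal status are kept as separate functions so that either view can be produced from the same records.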

Experimental Design, Materials, and Methods
The patent dataset was obtained and analysed using Orbit Intelligence Software (version 1.9.8) from Questel-Orbit. This software offers a comprehensive suite for searching, analysing, and managing inventions and IP assets [2, 3].
The advanced search option has nine fields: Title, Abstract, Claims, Description, Object of the Invention, Advantages of the Invention Over Previous Art, Independent Claims, Concepts, and Full text. The Fampat collection was used for searching and analysing the data; it covers worldwide patent publications from more than 100 patent authorities (Orbit Intelligence Software) [4].
The methodology for searching patents is similar to that reported in different areas: fisheries [5], petroleum bioremediation techniques [6], propagation in the sugar industry [7], analysis of dental caries in primary teeth [8], and hydrogen economic analysis [9].
The keywords were selected from a list of compounds that could be useful for the treatment of the disease [10][11][12]. The search strategy was based on a review of existing search strategies, after which a combination of keywords and IPC codes was used on the Orbit platform. The advanced search assistant was used with the terms related to antiviral therapy. So, the script for the equa-

The raw data were recorded in files (XLSX format), and the output profile was: Title, Images, Publication numbers, Publication kind codes, Publication dates, Original document, Earliest priority date, Abstract, Inventors, Representative, Latest standardized assignees - inventors removed, Advantages / Previous drawbacks, Independent claims, Object of invention, Technical concepts, Claims English, Description, Keywords in context, CPC - Cooperative classification, IPC - International classification, and PCL - US patent classification.
The data analysis was performed with the IP Business Intelligence module, a tool used for decision making. It allows large volumes of data to be analysed and produces different charts according to the analysis performed.
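As a rough illustration of the kind of aggregation that underlies a trend chart over the 2000 to 2019 period, the following Python sketch counts patent families per first application year. It is not the computation performed by the IP Business Intelligence module; the function name, the default year range as parameters, and the sample years are assumptions made for this example.

```python
from collections import Counter


def families_per_year(first_years, start=2000, end=2019):
    """Count families per first-application year, with zeros for empty years."""
    counts = Counter(y for y in first_years if start <= y <= end)
    return [(year, counts[year]) for year in range(start, end + 1)]


# Hypothetical first-application years, one per patent family.
sample_years = [2003, 2003, 2004, 2015, 2019, 1998]

trend = families_per_year(sample_years)
for year, count in trend:
    if count:
        print(year, count)
```

Years with no filings are kept with a count of zero so that the resulting series can be plotted on a continuous time axis.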