Algorithms for Analyzing and Forecasting in a Pharmaceutical Company

This paper presents some of the utilities of using SAS code for analyze and forecast, in a pharmaceutical company. First, data must be cleaned, in order to obtain representative results. Once this stage done, the input can be evaluated and models can be created in SAS, to extract the most valuable information from the initial data. To sustain management decisions, SAS code allows creating different types of reports and has the capability to recode the initial variables into new ones, but keeping the most part of the information contained, through the technique called “principal components analysis”. Another advantage of SAS code is its capability of classifying the initial variables into compact class. Based on scorecards and WOE (Weight of Evidence), notions used in banking analyses, it has been created a model which evaluates the suppliers of the company and ranks them, in order to find out what contracts must be continued and which contracts must be closed.


Introduction
Data analysis involves collecting, analyzing and presenting large amounts of data, in order to extract the most valuable information from the initial data and to discover patterns and trends which are used in decision-making process.This means that based on many information, it is important to know how to extract the essence from the available data and how to use it in order to improve the future activity and the results obtained.Data analysis is a process that involves statistical algorithms or logical techniques to describe and evaluate data.Many of the analyses performed become an iterative process where data is systematically collected and analyzed simultaneously, especially because patterns are discovered through the entire data collection 0. Subjecting data to statistical operations can relieve the relationship among variables and the level of trust the answers are getting.Data analysis is used to process and filter the information contained in the data studied, in order to capture the essence or extract the information contained.Another purpose is to highlight this information in an essentially form, more intelligible, suggestive, simplified and synthesizers.Achieving this goal involves the development of a sequence of transformations performed on the raw data and involves the use of specific methods and techniques.These changes aim to maximize the relevance and interpretability of data and requires, among other things, to eliminate redundant or meaningless information and generality.From this point of view, data analysis appears to be a specific transformation process that takes as input the raw data and as output the information synthesized 0.  10) Integer (10) Integer(3) Integer( 2) Char( 6) Table 1 presents the variables analyzed.Their description is presented below: Category refers to the status of the pharmaceutical product.This variable can have the values "OTC" -without a prescription, "Rx" -by prescription, "SN" -a dietary supplement, "Cosmetic" or "Biocides".Type describes the formulation of the product and can have the values "Tablets", "Capsule", "Syrup", "Solution" or "Spray".Unitary_price, Range_series, Stock, Sales, Production refer to the production.Term refers to the number of months of validity guaranteed for each product.Substitutes contain the number of products substitued (have the same properties and the same curative effect) for each product.Form refers to the physical nature of each pharmaceutical product.This variable can have the values "liquid" or solid ".Active_ingredients contains the number of active ingredients in each product, which influences the shelf life of the products.Data cleaning is one of the most important steps in processing data, by verifying that the data are correct or at least compliant with a certain set of rules.Data cleaning is concerned with detecting and removing errors and incompatibilities to improve data quality.Data quality problems are present in the individual data collections such as files and databases, caused by misspellings during data entry, lack of information or other types of errors.In order to verify the accuracy of data, it has been created formats for the analyzed variables:

Reports
Once the stage of accessing and cleaning data done, data can be summarized, analyzed and evaluated.Before transform data, it can be created reports to show the initial state.Regarding that data are unique, without missing values and with no incorrect format, the results obtained will be representatives.Reports are tools for management, which influence the decision making process.SAS offers the possibility of creating such reports, parameterize and customize them, as well as save the resulted reports in PDF or HTML.The database analyzed belongs to a pharmaceutical company, which means that the information analyzed refer to clients, manufacture, products and suppliers.For example, we can create a report based on information about productsunit price, category, type, unit stock, unit sales etc.To create a report about products and the situation of units, considering the category and the type for each product, we can use the code below: considering the category and the type for each product Fig. 1 presents the report which lists by category and type, products and unit prices, while achieving total units sold, the total stock and total units produced.For example, in the category "Biocide", there are 2 types of productsliquid and tablets; the liquid products are Betagin sol.ext.10% 500ml x1 and Betagin sol.ext.10% 30ml x1 and the tablets are Dezinfectant Forte compr.x50.For every product are listed total units sold, the total stock and total units produced.

Scorecard for Ranking Suppliers
Scorecard is a technique that allows a company to monitor and manage performance, considering some targets.

Measurements
are usually financial performance, customer value, internal business processes, the performance of innovation or employee performance 0. Suppliers have an important role in the proper functioning of the Biofarm Company.If suppliers do not fulfill their obligations under the contract, increases the risk that Biofarm reduce or even stop their work due to lack of raw materials or packaging.Scorecard is used in activities related to risk management of non-commitment to a supplier, in order to decide to conclude a new contract or not to continue to work with suppliers for which risk and losses are related.For these reasons, it is useful to review suppliers that Biofarm contracts.The analyze made regarded the top 20 suppliers by value of purchases made by the company.The analysis regards 20 vendors for which 9 attributes are known: Acquisition_value, Term (delivery time agreed in contract), Term(achieved) (delivery time realized), Product_category (the category for each product), Orders_history (the status for each order), Company_type (the type of every supplier), Contract (the length of the collaboration between Biofarm and each supplier), No_of_contracts (number of contracts closed by Biofarm with each supplier), Performance (show if a supplier is performing or not).Fig. 2 presents the formula for WOE (Weight of Evidence), which represents the power of discrimination of each attribute and helps to differentiate the model risk and must be calculated for each variable.

Fig. 2. Weight Of Evidence formula
Information Value formula, shown in Fig. 3, has the purpose to select variables during analysis and model building.This concept reflects the overall predictive power of the variables considered, so it can be used for comparing the predictive power among competing variables.As it is presented in Fig. 5, the biggest negative value for the variable Term is obtained for the "45" category, which means that for the category of providers who should deliver in 45 days, there are more underperforming suppliers than performing ones.The chart above shows that the most profitable are providers who delivered the order within 30 days.Ongropack Kft BC, Tipografia Polirom SRL and Morgan Thorpe SA.Considering the results obtained, Biofarm will continue to collaborate with those companies and can close the contracts with the suppliers which didn't fulfill their responsibilities, such as Xiamen Forever, Bostocke, Argorom Plastics, Linnea SA or DSM Nutritional Products Europe LTD.

Conclusions
In the beginning of this article, SAS has been presented as an important data analysis software tool, dedicated to highlight the power and importance of using such programs for statistical analysis of the data in a company.Following, it has been shown the importance of data cleaning and validation; this phase cannot miss because the analyze need accurate data in order to obtain representative results.For that, it has been created data validation formats and the records which didn't comply have been removed.Once the stage of accessing and cleaning data done, data can be summarized, analyzed and evaluated.Before transform data, it can be created reports to show the initial state.Regarding that data are unique, without missing values and with no incorrect format, the results obtained will be representatives.Reports defined in SAS allow visualization of key information about products and are useful tools for sustaining management decisions.SAS offers the possibility of creating such reports, parameterize and customize them, as well as save the resulted reports in PDF or HTML.Scorecard created for the company's suppliers has the purpose to sustain management decisions.The hierarchy resulted takes into account the performance indicators obtained in the last year, and historical data, such as length collaboration between the company and suppliers, the number of contracts and the degree of respect the contract by each supplier.

Fig. 1 .
Fig. 1.Report about products and the situation of stock, sales and production units, considering the category and the type for each product

Fig. 3 .Fig. 4 .
Fig. 3. Information Value formulaRegarding the notions explained above Error!Reference source not found., the following macro has been defined in order to calculate the Weight of Evidence and the Information Value for each attribute 0:

Fig. 5 .
Fig. 5. Weight of Evidence for the variable Term

Lucelia
DEDU has graduated the Faculty of Economic Cybernetics, Statistics and Informatics from the Bucharest Academy of Economic Studies in 2012 and she has graduated in 2014 the courses of "Databases for Business Support" master program organized by the Academy of Economic Studies of Bucharest.During her second year at the master program, she followed the course Credit Risk Management, held by BCR, where she discovered new features of using SAS code.Her career is centered on payroll domain, database's queries, ETL Tools for data warehouses and COBOL development.Her areas of interest are: Data Warehouses, Data Modeling, Risk Management, Databases and SAS development.

Table 1 .
Structure of initial data set 2 Accessing and Cleaning DataIn SAS, data can be accessed in 3 different ways: Using INFILE declaration;  Reading data directly from SAS code (for small data sets);  Using IMPORT procedure.

proc sort data=formats.products out=temp; by Product; run; data formats.duplicate formats.nonduplicate; set temp; by Product; if first.Product and last.Product then output formats.nonduplicate; else output formats.duplicate; run; title "Duplicate values: "; proc print data=formats.duplicate; id Product; run; title "Unique values: "; proc print data=formats.nonduplicate; id Product; run;
To eliminate records with duplicate values, we can use the code below: