Wavepress is a User-Friendly Toolkit for Computational Chemistry, Drug Design and Material Science

Wavepress is signal processing software with a simple, friendly interface that can be used by researchers, professors and students. The program uses the wavelet transform, a signal processing tool that handles a signal in the time-scale space. In this way, the program can reduce the amount of data in a signal without losing important information, and it produces graphs and tables with which the results can be explored comprehensively. In this work, the main features of Wavepress are explored with applications in computational chemistry and drug design; all examples are illustrated with graphs and tables obtained with Wavepress.


Introduction
The large volume of data obtained in computational simulations, and the large number of samples handled in several areas of science, is a big challenge. It should be kept in mind, however, that a well-executed big data strategy can reduce time and operational costs, leading to the development of new products with high impact on the market and society. An excessive volume of data can make a given analysis difficult or unfeasible; thus, it is necessary to reduce an enormous amount of data without the loss of important information. In fact, reducing large amounts of samples and data is a challenge in several areas of science and engineering. Considering this demand, a technique that has brought new perspectives in signal processing is the wavelet transform (WT). 1-4 Wavelets are functions capable of decomposing and describing other functions at different frequency and time scales, which makes them powerful tools for signal analysis and data compression. The decomposition of a function using wavelets is known as the wavelet transform, which has continuous and discrete variants. The present software (Wavepress) uses the discrete wavelet transform (DWT). 1,5 It is well known that the DWT is a mathematical procedure that converts a signal into a distinct form; this conversion is important because it can reveal hidden characteristics of the signal and can represent the original signal in a more succinct way. The WT can be applied to signal analysis and has the advantage of a great abundance of functions, for example, Coiflets, Daubechies, Discrete Meyer, Fejer-Korovkin, Morlet, Reverse Biorthogonal and Symlets. 6,7 Each function has a specific form and characteristics that suit different signal analyses. Thus, a computational tool based on the wavelet transform, such as the Wavepress program, can be used to analyze different types of signals, such as those from molecular dynamics (MD) simulations, 8,9 drug design, and so on.
MD simulations are amongst the most versatile and important computational techniques, with numerous applications in fields like materials science, biochemistry and biophysics. 10,11 In general, MD analysis is based on the trajectory of the energy as a function of time, which is important for a more realistic study of the system. Through MD simulations, it is possible to study the solvent effect by employing explicit molecules and to obtain time-averaged properties of the system, such as density, conductivity, and dipole moment, as well as different thermodynamic parameters, including entropy values and interaction energies between ligands and proteins, 12,13 which can be used for drug design.
In fact, the development of new drugs is one of the most important and arduous challenges of current science. Pharmaceutical industries, biotechnology companies, universities and other public and private sectors have been striving to develop new drugs, which is a very complex and demanding interdisciplinary process. 14-16 Computational chemistry can be applied at various stages in the development of new drugs: at an early stage, the focus is on reducing the number of possible ligands, while at the end, during lead-optimization stages, the emphasis is on decreasing experimental costs and reducing times. 17,18 Thus, selecting the main structures from the MD simulations is an important step to optimize the process, and the Wavepress program can select the main structures and help to optimize the development of new drugs. 19,20 The Wavepress program 21 is versatile and can be applied to different signal treatments using wavelet transforms. In this work, we consider three studies: selection of molecular dynamics (MD) structures, 22 material science and drug design. 14

Wavepress design
Wavepress is a user-friendly toolkit for molecular dynamics, with possible applications in spectroscopy, drug design and signal analysis. The Wavepress program was developed within the Matlab 23,24 environment and is based on the Optimal Wavelet Signal Compression Algorithm (OWSCA) 5,10,25 methodology. The primary purpose of Wavepress is to provide a tool with a simple interface for signal processing through the wavelet transform. Figure 1 shows the interface and summarizes all the functions that are currently implemented in Wavepress.
The input section of the program interface has two fields, Domain 1 (X) and Domain 2 (Y). Domain 1 (X) refers to the file whose data will be on the X axis of the graph, and Domain 2 (Y) refers to the file whose data will be on the Y axis. To enter these files, the user must save the data in plain text format (using Notepad, for example) and load the files with the Domain 1 (X) and Domain 2 (Y) buttons, respectively. In both input files (X and Y), the values must be vertically aligned.
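The input step can be pictured with a short Python sketch. This is not Wavepress code: it assumes that "vertically aligned" means one numeric value per line, and the function names (`read_column`, `load_domains`) and file names are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import tempfile


def read_column(path):
    """Read one number per line from a plain-text file, skipping blank lines."""
    values = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            values.append(float(line))
    return values


def load_domains(x_path, y_path):
    """Load the X and Y columns and check that they have the same length."""
    x = read_column(x_path)
    y = read_column(y_path)
    if len(x) != len(y):
        raise ValueError(f"X has {len(x)} rows but Y has {len(y)} rows")
    return x, y


# Usage with two small hypothetical files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    x_file = Path(tmp) / "domain1_x.txt"
    y_file = Path(tmp) / "domain2_y.txt"
    x_file.write_text("0\n1\n2\n3\n")
    y_file.write_text("-10.5\n-10.1\n-10.4\n-10.2\n")
    x, y = load_domains(x_file, y_file)
    print(len(x), len(y))
```

Checking that both columns have the same number of rows before any processing catches the most common input mistake, a mismatch between the X and Y files.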
After adding the files in the two domains, the next step is to choose which wavelet to use; the Wavepress program currently implements 93 wavelets, divided into 7 families (Biorthogonal, Coiflets, Daubechies, Discrete Meyer, Fejer-Korovkin, Reverse Biorthogonal and Symlets). The last step is to execute the program: the user clicks the 'run' button and waits for the process to finish. As already mentioned, the Wavepress program can be applied to different signal processing tasks. In the next section, we show two applications of the program: molecular dynamics (MD) simulations and drug design.
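To give a feel for what a wavelet-based compression does to a signal, the following is a minimal Python sketch using the Haar wavelet, whose filters coincide with those of bior1.1. It is not the OWSCA algorithm and not the Wavepress implementation; the function names and the choice of discarding all detail coefficients are assumptions made for illustration.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


def compress(signal, levels):
    """Decompose `levels` times, zero every detail coefficient, reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    for _ in range(levels):
        approx, _discarded = haar_step(approx)
    for _ in range(levels):
        approx = haar_inverse_step(approx, [0.0] * len(approx))
    return approx


signal = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
smooth = compress(signal, levels=2)
print([round(v, 6) for v in smooth])
```

With two levels, each block of four samples is replaced by its mean (4 and 3 in this example), so the reconstructed signal is a sequence of flat steps. Other wavelets, such as bior1.3, use longer filters and are not covered by this sketch.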
It is important to mention that computational performance depends on the size of the data sets and the hardware capability. The examples presented in this work were carried out on a laptop with a Core i5 processor and 8 GB of RAM. The Wavepress program can be downloaded from the website, 26 which also provides the user guide, a flowchart and the group's publications on the application of wavelets. It is also possible to run the script in Matlab without using the graphical interface. For this, the user must access Github 27 and manually download the file; the Github folder also contains a tutorial explaining how to execute the code.

Results and Discussion
Selection of structures from molecular dynamics simulations
One of the applications of the Wavepress program is the selection of structures from MD calculations. Generally, MD calculations generate thousands of conformations, and it is often necessary to perform quantum calculations on these conformations afterwards, for example, chemical shift calculations. Performing quantum calculations on all conformations obtained by MD simulations is computationally unfeasible; thus, it is necessary to select the main or representative conformations for further calculations. The Wavepress program uses wavelet transforms to select representative conformations from MD simulations, following the OWSCA methodology. 1 In this article, we show the selection of structures for the system containing magnetite (100) and water (Figure 2). The MD simulations were performed in the REAX-FF 28 program, using the FEOCH 29 force field (which was developed and validated for iron oxide materials). The simulation consisted of a thermalization stage of 500 ps, followed by an additional period of 2.0 ns, and the MD results generated 400000 conformations. Certainly, this number of conformations is unfeasible for a QM (Quantum Mechanics) treatment. Thus, the signal treatment was performed in the Wavepress program in order to reduce the number of conformations. For compounds containing transition metals, the most suitable wavelet is bior1.3. 30 Figure 3a shows the original signal (blue signal) and the signal treated with the bior1.3 wavelet (red signal); Figure 3b shows only the treated signal.
Analyzing the signals in Figure 3, it is possible to observe that each step of the signal corresponds to a conformation (labeled as interactions on the X axis of Figure 3). The number of initial structures was 6000 and the number of treated structures was 100. Thus, the number of treated structures (number of steps of the red signal) is much smaller than the number of initial structures (number of steps of the blue signal). This compression of the signal and, consequently, of the number of conformations is very important for the further calculations to be performed. The Wavepress program also reports the total number of treated structures, as well as the root mean square error (RMSE) and the coefficient of determination (R²) (Figure 4).
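The idea that each step of the compressed signal stands for one conformation can be sketched in Python as follows. The rule used here to pick a representative frame (the frame whose original value is closest to the step level) is an assumption for illustration only; the text does not state how OWSCA maps steps to structures, and both function names are hypothetical.

```python
def step_segments(compressed, tol=1e-9):
    """Split a piecewise-constant signal into (start, end) ranges, end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(compressed)):
        if abs(compressed[i] - compressed[i - 1]) > tol:
            segments.append((start, i))
            start = i
    if compressed:
        segments.append((start, len(compressed)))
    return segments


def pick_representatives(original, compressed):
    """For each step, return the index of the frame closest to the step level."""
    picks = []
    for start, end in step_segments(compressed):
        level = compressed[start]
        best = min(range(start, end), key=lambda i: abs(original[i] - level))
        picks.append(best)
    return picks


original = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
compressed = [4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0]
print(step_segments(compressed))                    # [(0, 4), (4, 8)]
print(pick_representatives(original, compressed))   # [1, 4]
```

In this toy case eight frames are reduced to two steps, so two structures would be carried forward; the number of steps is what the text refers to as the number of treated structures.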
RMSE is the standard deviation of the residuals (prediction errors). Residuals measure how far data points are from the regression line, and RMSE is one of the most commonly used measures for evaluating the quality of predictions: it shows how far predictions fall from the measured true values using Euclidean distance. Thus, the closer the RMSE is to 0, the better the wavelet, that is, the better the compressed data fit the original data.
For the studied system (magnetite), the RMSE value was 0.2402, which represents good agreement between the compressed and original signals.
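As a small illustration of the RMSE described above, the following Python sketch compares an original signal with a compressed one of the same length. The signals are made-up values, and the sketch makes no assumption about any scaling or normalization that Wavepress may apply before reporting its RMSE.

```python
import math


def rmse(original, compressed):
    """Root mean square error between two equally long sequences."""
    if len(original) != len(compressed) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((o - c) ** 2 for o, c in zip(original, compressed))
    return math.sqrt(squared / len(original))


original = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
compressed = [4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0]
print(round(rmse(original, compressed), 4))  # 1.7321
```

Here the squared residuals sum to 24 over 8 points, so the RMSE is the square root of 3; a value of 0 would mean the compressed signal reproduces the original exactly.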
R² is a statistical measure of how close the data are to the fitted regression line, so it is a measure of linear correlation between two sets of data. The closer R² is to 1, the more efficient the signal compression, since the compressed waveform is more similar to the original one. R² can be determined by equation 1, where n is the sample size.
The third variable calculated by the program is the number of structures selected from the MD calculations. This number corresponds to the total number of steps in the compressed signal. Thus, for magnetite (the compound studied), 100 structures were selected, as can be seen in Figure 4. Detailed information for each structure, such as energy and time, is calculated and summarized in Figure 5 (Table S1, Supplementary Information (SI) section).
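Equation 1 itself is not reproduced in this text. One expression consistent with the description of R² as a measure of linear correlation between the original signal x and the compressed signal y is the squared Pearson correlation coefficient; this form is an assumption, not a transcription of the original equation. Here n is the sample size, and the barred symbols are the means of the two signals.

```latex
R^{2} =
\left(
  \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\,
        \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
\right)^{2}
```

With this form, R² lies between 0 and 1, and it equals 1 only when the compressed signal is an exact linear function of the original one.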

Method validation
To validate the efficiency of the Wavepress program, the complex [Fe(H₂O)₆]²⁺ was employed as the reference model. Structures were selected by three different methods: the OWSCA method (implemented in the Wavepress program), the statistically uncorrelated structures (SU) method and the random (Rd) method. After the selection, QM calculations were carried out on the structures selected by each of the three methods to evaluate the hyperfine coupling constant of the iron atom. The theoretical findings were then compared to experimental data.
The MD simulations of [Fe(H₂O)₆]²⁺ were performed in the REAX-FF 28 program with the FEOCH 29 force field. The selection carried out by the Wavepress program yielded 100 structures, with an R² value of 0.7302 and an RMSE value of 0.1922. Figure 6 shows the original and treated signals (Figure 6a) and the treated signal alone (Figure 6b).
For further validation of our methodology, statistically uncorrelated (SU) structures of the [Fe(H₂O)₆]²⁺ system were also selected; 31,32 with this methodology, 111 structures were selected. The third methodology used for comparison is the random (Rd) 33 selection of structures, with which 80 structures were selected. The results obtained by the three methods are reported in Table 1.
The results show that the Aiso (hyperfine coupling constant) values calculated with the structures selected by the Wavepress program were closest to the experimental results: there is a difference of just 0.02 MHz between the theoretical and experimental data. With the SU method, the difference between theoretical and experimental results is 0.04 MHz, and with the Rd methodology it is 0.11 MHz. To better understand the differences between the three methods, the relative error (equation 2) was also calculated.
With the Wavepress methodology, the relative error was 4%; it increased to 8% with the SU methodology and to 22% with the Rd methodology. These results reinforce that the Wavepress program is well implemented and can provide, in some systems, a small margin of error between the calculated and experimental results.
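Equation 2 is not reproduced in this text. Assuming the usual definition of relative error, with the experimental value as the reference, it can be written as below; the superscripts "theo" and "exp" are notation introduced here for illustration.

```latex
\text{relative error}\,(\%) =
\frac{\left| A_{\mathrm{iso}}^{\mathrm{theo}} - A_{\mathrm{iso}}^{\mathrm{exp}} \right|}
     {\left| A_{\mathrm{iso}}^{\mathrm{exp}} \right|} \times 100
```

Under this assumption the three reported pairs are mutually consistent: 0.02 MHz with 4%, 0.04 MHz with 8% and 0.11 MHz with 22% all imply an experimental magnitude of about 0.5 MHz. That value is inferred from the ratios only and is not quoted from Table 1.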
Comparing the three methods, Wavepress gives the best agreement with experimental data, with results similar to those of the SU method, which shows the effectiveness of the methodology. In addition, the results indicate that the program has good accuracy in the selection of structures obtained through MD simulations and can therefore be used effectively. It is also important to mention that the program has an interactive interface that is easy for the user to handle.

Drug design
Drug discovery is the process through which potential new therapeutic entities are identified using a combination of computational, experimental, translational, and clinical models. The process of finding new drugs based on knowledge of a specific biological target is called drug design. It involves designing molecules that are complementary in shape and charge to the biomolecular target with which they interact and to which they will therefore bind. This process often relies on computational techniques such as computational modeling. 34,35 Biopharmaceuticals, and especially therapeutic antibodies, are an increasingly important class of drugs, and computational methods for improving the affinity, selectivity, and stability of these protein-based therapeutics have also advanced greatly. 36,37
Figure 7 shows the structure of chloroquine interacting with the main viral protease (Mpro) enzyme of coronavirus disease 2019 (Covid-19). 38,39 The MD simulation was performed in the GROMACS 40 program with the GROMOS54A7 all-atom force field 41 and generated 20000 interactions. For the ligand parametrization, the Automated Topology Builder (ATB) 42 was used, which facilitates the development of molecular force fields for MD. The Mpro complex was inserted into a 12 Å water box with the simple point-charge (SPC) solvation model, and sodium and chloride ions were added for charge neutralization under periodic boundary conditions. Electrostatic interactions were calculated with the Particle Mesh Ewald method with a cut-off of 12 Å and a time step of 1 fs. Initially, the complexes were minimized over 5000 cycles using the steepest descent algorithm. After the minimization, a 500 ps equilibration was done in the canonical (NVT) ensemble, slowly increasing the temperature from 50 to 300 K using the Berendsen thermostat. To equilibrate the pressure of the system, an isothermal-isobaric (NPT) ensemble equilibration was performed employing the Parrinello-Rahman barostat to maintain the system pressure at 1 bar. After equilibration, the systems were submitted to an MD production step of 20 ns with a 1 fs integration time.
The next step is to select the main interactions for the subsequent calculations, for which the Wavepress program was used. Figure 8 shows the original signal (Figure 8a) and the compressed signal (Figure 8b). The original signal has a total of 2000 structures and the treated signal corresponds to 90 structures. For this system the signal treatment was performed with the bior1.1 wavelet, which has been used successfully for protein systems and other systems with hundreds of atoms. The analysis gave an RMSE value of 0.1352 and an R² value of 0.7152. These values show that the treated signal is in good agreement with the original signal and can represent the whole system with a smaller number of structures. The structures of chloroquine with the protein selected by the Wavepress program can be seen in Table S2 (SI section).
These examples evaluate the applicability of Wavepress for treating big data, where it can gain accuracy and reduce time and operational costs. The idea is also to highlight the applicability of the program in different areas and fields of activity, making Wavepress a multidisciplinary program that can be used for different purposes.

Conclusions
As reported, Wavepress is software developed within the Matlab environment with a user-friendly interface. The program uses the wavelet transform and its main objective is to reduce large amounts of data without losing important information. The current version of the program has 93 wavelets distributed in 7 families (Biorthogonal, Coiflets, Daubechies, Discrete Meyer, Fejer-Korovkin, Reverse Biorthogonal and Symlets). New wavelet families and other features can be implemented in later versions.
In this work, two important applications of the program were presented: selection of molecular dynamics structures and drug design. In all applications, the program proved to be very efficient in reducing information, and the figures it produces can be exported to various high-resolution image formats. In addition to the applications shown, the Wavepress program can be extended to other areas of science that require efficient data compression.

Figure 2. Fe₃O₄ (100) with water, used in the application of the Wavepress program.

Figure 3. Energy of MD conformations for magnetite: (a) original and compressed signals; (b) compressed signal. X axis: energy; Y axis: interactions.
In equation 1, x̄ and ȳ are the statistical means of the original and compressed signals, respectively.

Figure 4. Parameters obtained by the Wavepress program.

Figure 5. Output file for selected structures.

Figure 7. Protein with chloroquine used in the application of the Wavepress program.

Table 1. Aiso values of Fe²⁺ atoms for the selected [Fe(H₂O)₆]²⁺ structures