Brief Report

Cloning data with unchanged estimates of estimable non-linear functions of parameters

[version 1; peer review: 1 approved with reservations]
PUBLISHED 11 Feb 2021

This article is included in the Research on Research, Policy & Culture gateway.

Abstract

Non-linear regression models arise in biology, banking, economics, and sociology, for example in describing population and biological growth. Absolute growth, human growth, and, most importantly, economic variables are often appropriately described by non-linear regression models. In this article, we present cloned datasets for bivariate and multivariate non-linear regression models that yield the same fitted non-linear regression as the original data. Such cloned datasets can be used to preserve the confidentiality of sensitive data for publication purposes.

Keywords

Cloned data, Linear regression, Non-linear regression, Residuals

Introduction

In situations where data are confidential and cannot be shown, there is a need for an alternative or matching set of data that can play the role of the actual data. Such an alternative or matching set of data is called cloned data. Cloned datasets therefore give a model-free way of representing confidential data. One possible use of cloned datasets is to preserve the confidentiality of sensitive data for publication purposes, where the main advantage is that the cloned data have the same fit as the original data. Anscombe (1973) provided four cloned datasets to show the importance of graphs in a statistical study. All of these datasets have identical summary statistics (e.g., mean, variance, and correlation) but different scatter plots. Chatterjee and Firat (2007) presented a technique, based on a genetic algorithm, for producing different bivariate datasets with the same summary statistics but dissimilar graphs. The idea of generating cloned data that have the same fit for simple and multiple linear regressions was explained by Govindaraju and Haslett (2008) and Haslett and Govindaraju (2009, 2012). Govindaraju and Haslett (2008) introduced the idea of cloning datasets using simple linear regression in the bivariate case. In all cases, the regression estimates are the same, and the variability decreases from one iteration to the next. Haslett and Govindaraju (2009) explained the procedure for generating matching or cloned datasets in the multivariate case.

The procedure of Haslett and Govindaraju gives an alternative way of presenting confidential data, so that a multiple regression analysis has the same fit in the original data and in the cloned data, while the data themselves have been changed so that they are no longer confidential. The advantage is that the parameter estimates from the cloned data and the original data do not involve any model error. Haslett and Govindaraju (2012) considered how to enhance the algorithm to produce several cloned datasets that generate the same fitted regression equations. The primary observation is that the fitted slope and intercept are merely estimates, and that somewhat dissimilar datasets can still generate the same estimates. Anscombe (1973) used four different fictitious datasets (see Table 1) to show that the regression estimates and their standard errors are similar, illustrating the importance of graphical display (Figure 1), but did not elaborate on how such data were obtained. Chatterjee and Firat (2007) used the data given in Anscombe (1973) and showed, by applying their algorithm to the same datasets, that all four datasets have identical summary statistics but different graphs. Govindaraju and Haslett (2008) explained the following procedure to generate cloned data for the simple linear regression model $y_i = a + bx_i + e_i$, assuming

$$E(e_i) = 0, \quad \operatorname{var}(e_i) = \sigma^2, \quad E(e_i e_j) = 0 \ (i \neq j), \quad E(X e_i) = 0$$

  • 1) Consider n pairs of observations for X and Y, i.e., $(x_i, y_i)$; obtain $\hat{Y}$ by simple regression of Y on X, and also obtain $\hat{X}$ by simple inverse regression.

  • 2) The simple regression of $\hat{Y}$ on $\hat{X}$, i.e., $\hat{\hat{Y}} = a + b\hat{X}$, has the same parameters.

  • 3) Further, obtain $\hat{\hat{X}}$ by inverse regression and observe that the simple regression of $\hat{\hat{Y}}$ on $\hat{\hat{X}}$ has the same parameters.
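The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the helper names `ols`, `variance`, and `clone_once` are hypothetical, and "inverse regression" is read here as the least-squares regression of X on Y, which is an assumption. The data are the first pair of columns (X, Y1) of Table 1.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def variance(v):
    """Sample variance with divisor n - 1."""
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)


def clone_once(x, y):
    """One cloning iteration: fitted values from the Y-on-X and X-on-Y fits."""
    a, b = ols(x, y)  # regression of Y on X
    c, d = ols(y, x)  # inverse regression, assumed to mean X on Y
    y_hat = [a + b * xi for xi in x]
    x_hat = [c + d * yi for yi in y]
    return x_hat, y_hat


# (X, Y1) from Table 1
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

fit_raw = ols(x, y)
x1, y1 = clone_once(x, y)        # first cloned dataset
fit_clone1 = ols(x1, y1)
x2, y2 = clone_once(x1, y1)      # second cloned dataset
fit_clone2 = ols(x2, y2)

print("raw    :", fit_raw, variance(x), variance(y))
print("clone 1:", fit_clone1, variance(x1), variance(y1))
print("clone 2:", fit_clone2, variance(x2), variance(y2))
```

Under this reading, the intercept and slope are reproduced at every iteration while the variances of both variables shrink, which is the regression-towards-the-mean behaviour described in the text.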

Figure 1. Scatter plots of Anscombe’s datasets with cloned simple regression models

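Table 1. Anscombe’s datasets with (X, Y1), (X, Y2), (X, Y3), and (X4, Y4) forming pairs.

| X | Y1 | Y2 | Y3 | X4 | Y4 |
|---|---|---|---|---|---|
| 10 | 7.46 | 9.14 | 8.04 | 8 | 6.58 |
| 8 | 6.77 | 8.14 | 6.95 | 8 | 5.76 |
| 13 | 12.74 | 8.74 | 7.58 | 8 | 7.71 |
| 9 | 7.11 | 8.77 | 8.81 | 8 | 8.84 |
| 11 | 7.81 | 9.26 | 8.33 | 8 | 8.47 |
| 14 | 8.84 | 8.10 | 9.96 | 8 | 7.04 |
| 6 | 6.08 | 6.13 | 7.24 | 8 | 5.25 |
| 4 | 5.39 | 3.10 | 4.26 | 19 | 12.5 |
| 12 | 8.15 | 9.13 | 10.84 | 8 | 5.56 |
| 7 | 6.42 | 7.26 | 4.82 | 8 | 7.91 |
| 5 | 5.73 | 4.74 | 5.68 | 8 | 6.89 |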

Iterating steps 1 and 2 can generate several fictitious or cloned datasets. Govindaraju and Haslett (2008) used the datasets given in Anscombe (1973) to generate several cloned datasets and showed that the cloned datasets from the first, second, third, fifth, and tenth iterations give the same regression estimates but different scatter plots. They also found that $S_Y^2 > S_{\hat{Y}}^2 > S_{\hat{\hat{Y}}}^2$ and $S_X^2 > S_{\hat{X}}^2 > S_{\hat{\hat{X}}}^2$. The cloned datasets generated from the four fictitious datasets given in Anscombe (1973) have the same means of X and Y, correlation between X and Y, coefficient of determination $R^2$, adjusted $R^2$, regression fit, and standard error of the slope. However, the variances of X and Y, the residual standard error, and the standard error of the intercept decrease as the iterations increase. This shows regression towards the mean, i.e., each successive cloned dataset is closer to the mean. Haslett and Govindaraju (2009) explained the procedure for generating cloned or matched datasets with the same fit in the multivariate case. They considered independent and identically distributed data for the multiple regression model

$$Y = X\beta + \varepsilon$$

where Y is the vector of responses, $X = (X_1, X_2, \dots, X_p)$ is the n × p design matrix, β is the unknown p × 1 vector of parameters, and ε is the n × 1 vector of errors. The OLS estimate of β is

$$\hat{\beta} = (X^{t}X)^{-1}X^{t}Y$$

They used the mean-corrected form of the response variable and of the independent variables $x_1, x_2, \dots, x_p$. Because of the mean correction, the fitted multiple regression model can be written as

$$\hat{y} = b_1x_1 + b_2x_2 + \dots + b_px_p$$

They explained the procedure in six steps and generated $y_{new}, x_{1,new}, \dots, x_{p,new}$, which have the same fit as the original model. The cloned dataset generated by Haslett and Govindaraju (2009) gave the same regression fit and the same sample means of Y, X1, and X2, but the variances of Y, X1, and X2 and the residual standard error were smaller than those of the raw data. Haslett and Govindaraju (2012) developed cloning algorithms for simple and multiple linear regression models. They fit the linear regression model of y on x (where x and y are mean-centred) to the original data and find its estimates and residuals. The residuals are added to the data y one by one to create $n^2$ data points $y_{new}$, and the linear regression model of $y_{new}$ on x is then fitted, resulting in identical estimates for the original and the cloned datasets. The same cloning algorithm can also be used in the multivariate case. In both cases, the parameter estimates of the original and cloned datasets agree. They explained the following methods to generate cloned datasets:

  • Cloning via supplementing data by zero-mean additions: bivariate case

  • Cloning via supplementing data by zero-mean additions: multivariate case

  • Bivariate data cloning by regression y on x and x on y

  • Cloning for multiple regression via pivots

Here, we use the model presented in Haslett and Govindaraju (2012) to provide cloned datasets for bivariate and multivariate non-linear regression models with the same non-linear regression fit.

Methods

We consider the non-linear regression of y on X, where both X and y are non-mean-centred, with n data points. R software was used for all analyses.

Cloning for bivariate non-linear regression

In general, the non-linear regression model is

$$y = h(X, \beta) + \varepsilon \qquad (2.1)$$

with y being the response variable, X the covariate data design matrix, which is often controlled by the researcher, β the model parameters characterizing the relationship between X and y through the regression function h, and ε the model errors, which are assumed to be normally distributed with zero mean and unknown variance $\sigma^2$.

When the regression function h is linear in the parameters β, it leads to linear regression analysis. However, linear models are not always appropriate, so one often needs to apply a non-linear regression model where h is non-linear in β.

Like in linear regression, non-linear regression provides parameter estimates based on the least squares criterion. However, unlike linear regression, no explicit mathematical solution is available, and specific algorithms involving iterative numerical approximations are needed to solve the minimization problem. Here, we consider a bivariate non-linear regression on non-mean-corrected data, so that X = x is a column vector. In general, provided X is of full column rank, the ordinary least squares estimate of β is denoted $\hat{\beta}$.

Now add the residuals $r = \hat{\varepsilon} = y - h(x, \hat{\beta})$ from the model fit to the data: the original data are replicated as a block n times to create an $n^2 \times 1$ vector, and one of the residuals is added to each block. The first block is $y + \mathbf{1}r_1$, where $\mathbf{1}$ is a vector of 1’s and $r_1$ is the first residual. The data are now $\mathbf{1} \otimes y + r \otimes \mathbf{1}$, and the design matrix becomes $\mathbf{1} \otimes x$. Since the model is still the same, i.e., a bivariate non-linear regression, if $\mathbf{1} \otimes y + r \otimes \mathbf{1}$ is now regressed on $\mathbf{1} \otimes x$, the OLS estimate becomes $\tilde{\beta}$, which is equal to $\hat{\beta}$. Thus, the non-linear regression estimates for the cloned data are unchanged because the sum of the residuals $\mathbf{1}^T r$ is zero. R software was used to obtain the numerical results.
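As a rough illustration of this construction (not the authors' R code), the following Python sketch fits the power curve $Y = aX^b$ of Example 1 by a hand-written Gauss-Newton iteration, builds the $n^2$ cloned points, and refits. The function names `fit_power` and `clone` are hypothetical. One assumption is made explicit: residuals from a non-linear least-squares fit without an additive constant need not sum exactly to zero, so the sketch centres them before adding them, which guarantees the zero-sum condition.

```python
import math


def fit_power(x, y, iters=100):
    """Least-squares fit of y = a * x**b by Gauss-Newton (x and y positive)."""
    # Starting values from the straight-line fit of log(y) on log(x).
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    for _ in range(iters):
        # Jacobian columns: dh/da = x**b and dh/db = a * x**b * ln(x).
        ja = [xi ** b for xi in x]
        jb = [a * xi ** b * math.log(xi) for xi in x]
        res = [yi - a * xi ** b for xi, yi in zip(x, y)]
        # Solve the 2x2 normal equations (J'J) d = J'res by Cramer's rule.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, res))
        gb = sum(v * r for v, r in zip(jb, res))
        det = saa * sbb - sab * sab
        da = (sbb * ga - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def clone(x, y, additions):
    """Stack one copy of (x, y) per addition: 1 (x) y + additions (x) 1."""
    x_clone = [xi for _ in additions for xi in x]
    y_clone = [yi + al for al in additions for yi in y]
    return x_clone, y_clone


x = [1, 2, 3, 4, 5, 6]
y = [2.98, 4.26, 5.21, 6.10, 6.80, 7.50]

a_hat, b_hat = fit_power(x, y)
resid = [yi - a_hat * xi ** b_hat for xi, yi in zip(x, y)]

# Assumption of this sketch: centre the residuals so they sum to zero exactly.
mean_r = sum(resid) / len(resid)
additions = [r - mean_r for r in resid]

x_clone, y_clone = clone(x, y, additions)
a_tilde, b_tilde = fit_power(x_clone, y_clone)

print("raw fit   :", a_hat, b_hat)
print("cloned fit:", a_tilde, b_tilde)
```

The cloned fit should reproduce the raw estimates up to the numerical tolerance of the iteration. Other curves from the examples would need their own Jacobian; the stacking step in `clone` is unchanged.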

Any set of values can be added, provided it sums to zero: if $\{a_l : l = 1, 2, \dots, m\}$ is added to each data point in the set $\{y_i : i = 1, 2, \dots, n\}$, the condition is that $\sum_l a_l = 0$. Some additions are more useful than others.
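A short derivation may help to see why the zero-sum condition is sufficient. The notation $S(\beta)$ for the residual sum of squares and $h_i$ for $h(x_i, \beta)$ is introduced here for illustration and is not the authors'. Expanding the least-squares criterion of the cloned data gives:

```latex
\begin{aligned}
S_{clone}(\beta)
  &= \sum_{l=1}^{m} \sum_{i=1}^{n} \left( y_i + a_l - h_i \right)^2 \\
  &= m \sum_{i=1}^{n} (y_i - h_i)^2
     + 2 \Big( \sum_{l=1}^{m} a_l \Big) \sum_{i=1}^{n} (y_i - h_i)
     + n \sum_{l=1}^{m} a_l^2 \\
  &= m\, S(\beta) + n \sum_{l=1}^{m} a_l^2
  \qquad \text{when } \sum_{l=1}^{m} a_l = 0 .
\end{aligned}
```

The last term does not depend on $\beta$, so $S_{clone}(\beta)$ and $S(\beta)$ are minimized by the same value, whatever the form of $h$.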

Example 1: The following cloned dataset (Table 2) is generated from the dataset $X = (1, 2, 3, 4, 5, 6)^T$ and $Y = (2.98, 4.26, 5.21, 6.10, 6.80, 7.50)^T$ for the non-linear regression model $Y = aX^b$, a geometric or power curve. The parameter estimates for the raw and cloned datasets are summarized in Table 2b. The procedure is suitable for data from any field whose plot follows the form $Y = aX^b$. The estimates obtained by the cloning procedure in Table 2b are the same as the actual estimates.

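Table 2. Cloned dataset having the same non-linear regression fit $Y = 2.974X^{0.5154}$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 1 | 2.990626 | 1 | 3.003097 | 1 | 2.988703 |
| 2 | 4.270626 | 2 | 4.283097 | 2 | 4.268703 |
| 3 | 5.220626 | 3 | 5.233097 | 3 | 5.218703 |
| 4 | 6.110626 | 4 | 6.123097 | 4 | 6.108703 |
| 5 | 6.810626 | 5 | 6.823097 | 5 | 6.808703 |
| 6 | 7.510626 | 6 | 7.523097 | 6 | 7.508703 |
| 1 | 2.962378 | 1 | 2.950562 | 1 | 2.985865 |
| 2 | 4.242378 | 2 | 4.230562 | 2 | 4.265865 |
| 3 | 5.192378 | 3 | 5.180562 | 3 | 5.215865 |
| 4 | 6.082378 | 4 | 6.070562 | 4 | 6.105865 |
| 5 | 6.782378 | 5 | 6.770562 | 5 | 6.805865 |
| 6 | 7.482378 | 6 | 7.470562 | 6 | 7.505865 |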

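Table 2b. Parameter estimates of the raw and cloned dataset in Table 2.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $a$ | 2.9740 | 0.015214 | $X$ | 3.5 | 3.5 | $Y \mid X$ | - |
| $b$ | 0.5154 | 0.003503 | $Y$ | 5.475 | 2.80367 | 0.021987 | 0.9931 |
| $a_{clone}$ | 2.9740 | 0.007380 | $X_{clone}$ | 3.5 | 3 | $Y_{clone} \mid X_{clone}$ | - |
| $b_{clone}$ | 0.5154 | 0.001699 | $Y_{clone}$ | 5.475 | 2.40378 | 0.026125 | 0.9931 |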
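A brief check, added here for illustration and not taken from the original article, of why the variance of $X_{clone}$ is 3 rather than 3.5: stacking n copies of x leaves the mean unchanged and multiplies the sum of squares by n, while the divisor of the sample variance becomes $n^2 - 1$.

```latex
s^2_{X_{clone}}
  = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^2}{n^2 - 1}
  = \frac{n (n - 1)}{n^2 - 1}\, s^2_X
  = \frac{n}{n + 1}\, s^2_X ,
\qquad
n = 6:\quad \frac{6}{7} \times 3.5 = 3 .
```

The same factor $n/(n+1)$ applies to the covariate in the later examples, for instance $\frac{9}{10} \times 7.5 = 6.75$ when $n = 9$.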

Example 2: The cloned dataset (Table 3) is generated from the dataset $X = (0, 1, 2, 3, 4, 5, 6, 7, 8)^T$ and $Y = (0.75, 1.20, 1.75, 2.50, 3.45, 4.70, 6.20, 8.25, 11.50)^T$ for the non-linear regression model $Y = ab^X$, an exponential curve. If the sensitive observed data follow the exponential curve $Y = ab^X$, this procedure can be useful for cloning the data. The estimates obtained by the cloning procedure in Table 3b are similar to the actual estimates.

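Table 3. Cloned dataset having the same non-linear regression fit $Y = ab^X$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 0 | 0.7898168 | 0 | 0.9124101 | 0 | 0.7033688 |
| 1 | 1.2398168 | 1 | 1.3624101 | 1 | 1.1533688 |
| 2 | 1.7898168 | 2 | 1.9124101 | 2 | 1.7033688 |
| 3 | 2.5398168 | 3 | 2.6624101 | 3 | 2.4533688 |
| 4 | 3.4898168 | 4 | 3.6124101 | 4 | 3.4033688 |
| 5 | 4.7398168 | 5 | 4.8624101 | 5 | 4.6533688 |
| 6 | 6.2398168 | 6 | 6.3624101 | 6 | 6.1533688 |
| 7 | 8.2898168 | 7 | 8.4124101 | 7 | 8.2033688 |
| 8 | 11.5398168 | 8 | 11.6624101 | 8 | 11.4533688 |
| 0 | 0.5847028 | 0 | 0.8680142 | 0 | 0.6307205 |
| 1 | 1.0347028 | 1 | 1.3180142 | 1 | 1.0807205 |
| 2 | 1.5847028 | 2 | 1.8680142 | 2 | 1.6307205 |
| 3 | 2.3347028 | 3 | 2.6180142 | 3 | 2.3807205 |
| 4 | 3.2847028 | 4 | 3.5680142 | 4 | 3.3307205 |
| 5 | 4.5347028 | 5 | 4.8180142 | 5 | 4.5807205 |
| 6 | 6.0347028 | 6 | 6.3180142 | 6 | 6.0807205 |
| 7 | 8.0847028 | 7 | 8.3680142 | 7 | 8.1307205 |
| 8 | 11.3347028 | 8 | 11.6180142 | 8 | 11.3807205 |
| 0 | 0.7705852 | 0 | 0.8032982 | 0 | 0.5312433 |
| 1 | 1.2205852 | 1 | 1.2532982 | 1 | 0.9812433 |
| 2 | 1.7705852 | 2 | 1.8032982 | 2 | 1.5312433 |
| 3 | 2.5205852 | 3 | 2.5532982 | 3 | 2.2812433 |
| 4 | 3.4705852 | 4 | 3.5032982 | 4 | 3.2312433 |
| 5 | 4.7205852 | 5 | 4.7532982 | 5 | 4.4812433 |
| 6 | 6.2205852 | 6 | 6.2532982 | 6 | 5.9812433 |
| 7 | 8.2705852 | 7 | 8.3032982 | 7 | 8.0312433 |
| 8 | 11.5205852 | 8 | 11.5532982 | 8 | 11.2812433 |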

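Table 3b. Parameter estimates of the raw and cloned dataset of Table 3.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $a$ | 0.97 | 0.037728 | $X$ | 4 | 7.50 | $Y \mid X$ | - |
| $b$ | 1.36 | 0.007540 | $Y$ | 4.5 | 12.95 | 0.139762 | 0.954 |
| $a_{clone}$ | 0.96 | 0.015889 | $X_{clone}$ | 4 | 6.75 | $Y_{clone} \mid X_{clone}$ | - |
| $b_{clone}$ | 1.36 | 0.003210 | $Y_{clone}$ | 4.5 | 11.67 | 0.177424 | 0.954 |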

Example 3: The cloned dataset (Table 4) is generated from the dataset $X = (1, 2, 3, 4, 5, 6)^T$ and $Y = (1.6, 4.5, 13.8, 40.2, 125.0, 363.0)^T$ for the non-linear regression model $Y = ae^{bX}$, an exponential curve. If the sensitive data follow the non-linear regression shape $Y = ae^{bX}$, this cloning procedure would be helpful. The estimates obtained by the cloning procedure in Table 4b agree with the actual estimates to within rounding (a = 0.56 versus 0.55; b = 1.08 in both).

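Table 4. Cloned dataset having the same non-linear regression fit $Y = ae^{bX}$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 1 | 1.3412568 | 1 | -0.219635 | 1 | 1.2394154 |
| 2 | 4.2412568 | 2 | 2.6803635 | 2 | 4.1394154 |
| 3 | 13.541257 | 3 | 11.980364 | 3 | 13.439415 |
| 4 | 39.941257 | 4 | 38.380364 | 4 | 39.839415 |
| 5 | 124.74126 | 5 | 123.18036 | 5 | 124.63942 |
| 6 | 362.74126 | 6 | 361.18036 | 6 | 362.63942 |
| 1 | 3.0524369 | 1 | 1.1087440 | 1 | 1.5468715 |
| 2 | 5.9524369 | 2 | 4.0087440 | 2 | 4.4468715 |
| 3 | 15.252437 | 3 | 13.308744 | 3 | 13.746872 |
| 4 | 41.652437 | 4 | 39.708744 | 4 | 40.146872 |
| 5 | 126.45244 | 5 | 124.50874 | 5 | 124.94687 |
| 6 | 364.45244 | 6 | 362.50874 | 6 | 362.94687 |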

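Table 4b. Parameter estimates of the raw and cloned dataset in Table 4.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $a$ | 0.56 | 0.027971 | $X$ | 3.5 | 3.5 | $Y \mid X$ | - |
| $b$ | 1.08 | 0.008450 | $Y$ | 91 | 19830.87 | 1.210555 | 0.8331 |
| $a_{clone}$ | 0.55 | 0.0137135 | $X_{clone}$ | 3.5 | 3 | $Y_{clone} \mid X_{clone}$ | - |
| $b_{clone}$ | 1.08 | 0.004206 | $Y_{clone}$ | 91 | 16998.83 | 1.469659 | 0.8331 |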

Example 4: The cloned dataset (Table 5) is generated from the dataset $X = (0, 1, 2, 3, 4, 5)^T$ and $Y = (58, 66, 72.5, 78, 82, 85)^T$ for the non-linear regression model $Y = ka^{b^X}$, the Gompertz curve. Parameter estimates of the raw and cloned datasets are shown in Table 5b.

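Table 5. Cloned dataset having the same non-linear regression fit $Y = ka^{b^X}$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 0 | 57.93007 | 0 | 58.13133 | 0 | 57.97689 |
| 1 | 65.93007 | 1 | 66.13133 | 1 | 65.97689 |
| 2 | 72.43007 | 2 | 72.63133 | 2 | 72.47689 |
| 3 | 77.93007 | 3 | 78.13133 | 3 | 77.97689 |
| 4 | 81.93007 | 4 | 82.13133 | 4 | 81.97689 |
| 5 | 84.93007 | 5 | 85.13133 | 5 | 84.97689 |
| 0 | 58.05172 | 0 | 57.87750 | 0 | 58.03288 |
| 1 | 66.05172 | 1 | 65.87750 | 1 | 66.03288 |
| 2 | 72.55172 | 2 | 72.37750 | 2 | 72.53288 |
| 3 | 78.05172 | 3 | 77.87750 | 3 | 78.03288 |
| 4 | 82.05172 | 4 | 81.87750 | 4 | 82.03288 |
| 5 | 85.05172 | 5 | 84.87750 | 5 | 85.03288 |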

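Table 5b. Parameter estimates of the raw and cloned dataset in Table 5.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $a$ | 0.615222 | 0.002761 | $X$ | 2.5 | 3.5 | $Y \mid X$ | - |
| $b$ | 0.7321193 | 0.004796 | $Y$ | 73.583 | 104.44 | 0.101776 | 0.9859 |
| $k$ | 0.9422 | 0.466140 | $X_{clone}$ | 2.5 | 3 | $Y_{clone} \mid X_{clone}$ | - |
| $a_{clone}$ | 0.615222 | 0.001339 | $Y_{clone}$ | 73.583 | 89.53 | 0.120928 | 0.9859 |
| $b_{clone}$ | 0.7321193 | 0.002326 | | | | | |
| $k_{clone}$ | 0.9422 | 0.226110 | | | | | |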

Example 5: The cloned dataset (Table 6) is generated from the dataset $X = (0.5, 0.5, 1, 1, 2, 2, 4, 4, 8, 8, 16, 16)^T$ and $Y = (0.96, 0.91, 0.86, 0.79, 0.63, 0.62, 0.48, 0.42, 0.17, 0.21, 0.03, 0.05)^T$ for the non-linear regression model $Y = ks^X b^{c^X}$, the Makeham curve. If the observed sensitive data follow the shape of the Makeham curve, this cloning procedure would be beneficial, as the estimates are close. The estimates obtained by the cloning procedure in Table 6b are the same as the actual estimates.

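Table 6. Cloned dataset having the same non-linear regression fit $Y = ks^X b^{c^X}$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|---|---|
| 0.5 | 0.96779 | 0.5 | 0.93143 | 0.5 | 0.93547 | 0.5 | 1.00684 |
| 0.5 | 0.91779 | 0.5 | 0.88143 | 0.5 | 0.88547 | 0.5 | 0.95684 |
| 1.0 | 0.86779 | 1.0 | 0.83143 | 1.0 | 0.83547 | 1.0 | 0.90684 |
| 1.0 | 0.79779 | 1.0 | 0.76143 | 1.0 | 0.76547 | 1.0 | 0.83684 |
| 2.0 | 0.63779 | 2.0 | 0.60143 | 2.0 | 0.60547 | 2.0 | 0.67684 |
| 2.0 | 0.62779 | 2.0 | 0.59143 | 2.0 | 0.59547 | 2.0 | 0.66684 |
| 4.0 | 0.48779 | 4.0 | 0.45143 | 4.0 | 0.45547 | 4.0 | 0.52684 |
| 4.0 | 0.42779 | 4.0 | 0.39143 | 4.0 | 0.39547 | 4.0 | 0.46684 |
| 8.0 | 0.17779 | 8.0 | 0.14143 | 8.0 | 0.14547 | 8.0 | 0.21684 |
| 8.0 | 0.21779 | 8.0 | 0.18143 | 8.0 | 0.18547 | 8.0 | 0.25684 |
| 16.0 | 0.03779 | 16.0 | 0.00143 | 16.0 | 0.00547 | 16.0 | 0.07684 |
| 16.0 | 0.05779 | 16.0 | 0.02143 | 16.0 | 0.02547 | 16.0 | 0.09684 |
| 0.5 | 0.94779 | 0.5 | 0.94873 | 0.5 | 0.94547 | 0.5 | 0.93146 |
| 0.5 | 0.89779 | 0.5 | 0.89873 | 0.5 | 0.89547 | 0.5 | 0.88146 |
| 1.0 | 0.84779 | 1.0 | 0.84873 | 1.0 | 0.84547 | 1.0 | 0.83146 |
| 1.0 | 0.77779 | 1.0 | 0.77873 | 1.0 | 0.77547 | 1.0 | 0.76146 |
| 2.0 | 0.61779 | 2.0 | 0.61873 | 2.0 | 0.61547 | 2.0 | 0.60146 |
| 2.0 | 0.60779 | 2.0 | 0.60873 | 2.0 | 0.60547 | 2.0 | 0.59146 |
| 4.0 | 0.46779 | 4.0 | 0.46873 | 4.0 | 0.46547 | 4.0 | 0.45146 |
| 4.0 | 0.40779 | 4.0 | 0.40873 | 4.0 | 0.40547 | 4.0 | 0.39146 |
| 8.0 | 0.15779 | 8.0 | 0.15873 | 8.0 | 0.15547 | 8.0 | 0.14146 |
| 8.0 | 0.19779 | 8.0 | 0.19873 | 8.0 | 0.19547 | 8.0 | 0.18146 |
| 16.0 | 0.01779 | 16.0 | 0.01873 | 16.0 | 0.01547 | 16.0 | 0.00146 |
| 16.0 | 0.03779 | 16.0 | 0.03873 | 16.0 | 0.03547 | 16.0 | 0.02146 |
| 0.5 | 0.97143 | 0.5 | 1.00873 | 0.5 | 0.93684 | 0.5 | 0.98146 |
| 0.5 | 0.92143 | 0.5 | 0.95873 | 0.5 | 0.88684 | 0.5 | 0.93146 |
| 1.0 | 0.87143 | 1.0 | 0.90873 | 1.0 | 0.83684 | 1.0 | 0.88146 |
| 1.0 | 0.80143 | 1.0 | 0.83873 | 1.0 | 0.76684 | 1.0 | 0.81146 |
| 2.0 | 0.64143 | 2.0 | 0.67873 | 2.0 | 0.60684 | 2.0 | 0.65146 |
| 2.0 | 0.63143 | 2.0 | 0.66873 | 2.0 | 0.59684 | 2.0 | 0.64146 |
| 4.0 | 0.49143 | 4.0 | 0.52873 | 4.0 | 0.45684 | 4.0 | 0.50146 |
| 4.0 | 0.43143 | 4.0 | 0.46873 | 4.0 | 0.39684 | 4.0 | 0.44146 |
| 8.0 | 0.18143 | 8.0 | 0.21873 | 8.0 | 0.14684 | 8.0 | 0.19146 |
| 8.0 | 0.22143 | 8.0 | 0.25873 | 8.0 | 0.18684 | 8.0 | 0.23146 |
| 16.0 | 0.04143 | 16.0 | 0.07873 | 16.0 | 0.00684 | 16.0 | 0.05146 |
| 16.0 | 0.06143 | 16.0 | 0.09873 | 16.0 | 0.02684 | 16.0 | 0.07146 |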

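Table 6b. Parameter estimates of the raw and cloned dataset in Table 6.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $b$ | 1.206 | 0.128403 | $X$ | 5.25 | 31.9773 | $Y \mid X$ | - |
| $c$ | 0.29 | 0.350857 | $Y$ | 0.510 | 0.1133 | 0.029115 | -0.91 |
| $k$ | 0.934 | 0.075154 | $X_{clone}$ | 5.25 | 29.5175 | $Y_{clone} \mid X_{clone}$ | - |
| $s$ | 0.824 | 0.014716 | $Y_{clone}$ | 0.510 | 0.1053 | 0.037855 | -0.91 |
| $b_{clone}$ | 1.206 | 0.048284 | | | | | |
| $c_{clone}$ | 0.29 | 0.131779 | | | | | |
| $k_{clone}$ | 0.934 | 0.028093 | | | | | |
| $s_{clone}$ | 0.824 | 0.005518 | | | | | |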

Example 6: The cloned dataset (Table 7) is generated from the dataset $X = (0, 1, 2, 3, 4, 5, 6, 7, 8)^T$ and $Y = (0.75, 1.20, 1.75, 2.50, 3.45, 4.70, 6.20, 8.25, 11.50)^T$ for the non-linear regression model $Y = k + ab^X$, a modified exponential curve. For sensitive data showing the pattern of a modified exponential curve, the procedure explained above, illustrated by the table and its estimates, would be beneficial. The estimates obtained by the cloning procedure in Table 7b are equal to the actual estimates.

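Table 7. Cloned dataset having the same non-linear regression fit $Y = k + ab^X$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 0 | 0.86760 | 0 | 0.84130 | 0 | 0.75791 |
| 1 | 1.31760 | 1 | 1.29130 | 1 | 1.20791 |
| 2 | 1.86760 | 2 | 1.84130 | 2 | 1.75791 |
| 3 | 2.61760 | 3 | 2.59130 | 3 | 2.50791 |
| 4 | 3.56760 | 4 | 3.54130 | 4 | 3.45791 |
| 5 | 4.81760 | 5 | 4.79130 | 5 | 4.70791 |
| 6 | 6.31760 | 6 | 6.29130 | 6 | 6.20791 |
| 7 | 8.36760 | 7 | 8.34130 | 7 | 8.25791 |
| 8 | 11.61760 | 8 | 11.59130 | 8 | 11.50791 |
| 0 | 0.54429 | 0 | 0.82987 | 0 | 0.73207 |
| 1 | 0.99429 | 1 | 1.27987 | 1 | 1.18207 |
| 2 | 1.54429 | 2 | 1.82987 | 2 | 1.73207 |
| 3 | 2.29429 | 3 | 2.57987 | 3 | 2.48207 |
| 4 | 3.24429 | 4 | 3.52987 | 4 | 3.43207 |
| 5 | 4.49429 | 5 | 4.77987 | 5 | 4.68207 |
| 6 | 5.99429 | 6 | 6.27987 | 6 | 6.18207 |
| 7 | 8.04429 | 7 | 8.32987 | 7 | 8.23207 |
| 8 | 11.29429 | 8 | 11.57987 | 8 | 11.48207 |
| 0 | 0.69160 | 0 | 0.80976 | 0 | 0.67560 |
| 1 | 1.14160 | 1 | 1.25976 | 1 | 1.12560 |
| 2 | 1.69160 | 2 | 1.80976 | 2 | 1.67560 |
| 3 | 2.44160 | 3 | 2.55976 | 3 | 2.42560 |
| 4 | 3.39160 | 4 | 3.50976 | 4 | 3.37560 |
| 5 | 4.64160 | 5 | 4.75976 | 5 | 4.62560 |
| 6 | 6.14160 | 6 | 6.25976 | 6 | 6.12560 |
| 7 | 8.19160 | 7 | 8.30976 | 7 | 8.17560 |
| 8 | 11.44160 | 8 | 11.55976 | 8 | 11.42560 |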

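Table 7b. Parameter estimates of the raw and cloned dataset in Table 7.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $a$ | 1.185529 | 0.127009 | $X$ | 4 | 7.5 | $Y \mid X$ | - |
| $b$ | 1.3319434 | 0.015977 | $Y$ | 4.48 | 12.95 | 0.109390 | 0.954 |
| $k$ | -0.361129 | 0.194051 | $X_{clone}$ | 4 | 6.75 | $Y_{clone} \mid X_{clone}$ | - |
| $a_{clone}$ | 1.185529 | 0.053467 | $Y_{clone}$ | 4.48 | 11.67 | 0.138149 | 0.954 |
| $b_{clone}$ | 1.3319434 | 0.006726 | | | | | |
| $k_{clone}$ | -0.361129 | 0.081689 | | | | | |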

Example 7: The following cloned dataset (Table 8) is generated from the dataset $X = (0, 1, 2, 3, 4, 5, 6, 7, 8)^T$ and $Y = (1225, 2879, 4994, 11525, 16190, 22573, 30677, 38517, 39003)^T$ for the non-linear regression model $Y = \dfrac{k}{1 + bc^X}$, the logistic curve. If the observed data follow a logistic curve, the cloning procedure illustrated in Table 8 would be suitable. The estimates obtained by the cloning procedure in Table 8b are identical to the actual estimates.

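Table 8. Cloned dataset having the same non-linear regression fit $Y = \dfrac{k}{1 + bc^X}$.

| $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ | $X_{clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|
| 0 | 1778.74 | 0 | -632.17 | 0 | 991.62 |
| 1 | 3432.74 | 1 | 1021.83 | 1 | 2645.62 |
| 2 | 5547.74 | 2 | 3136.83 | 2 | 4760.62 |
| 3 | 12078.74 | 3 | 9667.83 | 3 | 11291.62 |
| 4 | 16743.74 | 4 | 14332.83 | 4 | 15956.62 |
| 5 | 23126.74 | 5 | 20715.83 | 5 | 22339.62 |
| 6 | 31230.74 | 6 | 28819.83 | 6 | 30443.62 |
| 7 | 39070.74 | 7 | 36659.83 | 7 | 38283.62 |
| 8 | 39556.74 | 8 | 37145.83 | 8 | 38769.62 |
| 0 | 3919.88 | 0 | 789.96 | 0 | 1506.03 |
| 1 | 5573.88 | 1 | 2443.96 | 1 | 3160.03 |
| 2 | 7688.88 | 2 | 4558.96 | 2 | 5275.03 |
| 3 | 14219.88 | 3 | 11089.96 | 3 | 11806.03 |
| 4 | 18884.88 | 4 | 15754.96 | 4 | 16471.03 |
| 5 | 25267.88 | 5 | 22137.96 | 5 | 22854.03 |
| 6 | 33371.88 | 6 | 30241.96 | 6 | 30958.03 |
| 7 | 41211.88 | 7 | 38081.96 | 7 | 38798.03 |
| 8 | 41697.88 | 8 | 38567.96 | 8 | 39284.03 |
| 0 | 686.48 | 0 | 2912.89 | 0 | 1204.81 |
| 1 | 2340.48 | 1 | 4566.89 | 1 | 2858.81 |
| 2 | 4455.48 | 2 | 6681.89 | 2 | 4973.81 |
| 3 | 10986.48 | 3 | 13212.89 | 3 | 11504.81 |
| 4 | 15651.48 | 4 | 17877.89 | 4 | 16169.81 |
| 5 | 22034.48 | 5 | 24260.89 | 5 | 22552.81 |
| 6 | 30138.48 | 6 | 32364.89 | 6 | 30656.81 |
| 7 | 37978.48 | 7 | 40204.89 | 7 | 38496.81 |
| 8 | 38464.48 | 8 | 40690.89 | 8 | 38982.81 |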

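Table 8b. Parameter estimates of the raw and cloned dataset in Table 8.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE | Corr. |
|---|---|---|---|---|---|---|---|
| $b$ | 31.9624 | 8.38 | $X$ | 4 | 7.5 | $Y \mid X$ | - |
| $c$ | 0.46 | 0.032 | $Y$ | 18620 | 220578958 | 1438.26 | 0.98 |
| $k$ | 41044.63 | 1829.16 | $X_{clone}$ | 4 | 6.75 | $Y_{clone} \mid X_{clone}$ | - |
| $b_{clone}$ | 31.9624 | 3.72 | $Y_{clone}$ | 18857 | 200093197 | 1837.82 | 0.98 |
| $c_{clone}$ | 0.46 | 0.014 | | | | | |
| $k_{clone}$ | 41044.63 | 767.30 | | | | | |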

Cloning for multivariate non-linear regression

The algebra for the bivariate non-linear regression is unaltered for multivariate non-linear regression, except that the matrix X becomes $X_{n \times p} = (x_{i1} : x_{i2} : \dots : x_{ip})$, and the parameter vector $\beta$ and its estimates $\hat{\beta}$ and $\tilde{\beta}$ become $(p + 1) \times 1$ vectors.

Example 8: The following cloned dataset (Table 9) is generated from the dataset $X_1 = (23.81, 75.83, 9.46, 5.71, 85.78, 0.37, 8.82, 8.99, 37.65)^T$, $X_2 = (11.33, 25.92, 7.03, 29.68, 21.81, 0.57, 11.25, 19.01, 75.25)^T$ and $Y = (22.76, 76.73, 8.62, 10.98, 86.77, 0.97, 11.82, 16.63, 67.40)^T$ for the non-linear regression model $Y = A\left[aX_2^{-b} + (1 - a)X_1^{-b}\right]^{-1/b}$, the constant elasticity of substitution production function. Parameter estimates of the raw and cloned datasets are shown in Table 9b.

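Table 9. Cloned dataset having the same non-linear regression fit $Y = A\left[aX_2^{-b} + (1 - a)X_1^{-b}\right]^{-1/b}$.

| $X_{1,clone}$ | $X_{2,clone}$ | $Y_{clone}$ | $X_{1,clone}$ | $X_{2,clone}$ | $Y_{clone}$ | $X_{1,clone}$ | $X_{2,clone}$ | $Y_{clone}$ |
|---|---|---|---|---|---|---|---|---|
| 23.81 | 11.33 | 18.93 | 23.81 | 11.33 | 18.70 | 23.81 | 11.33 | 21.61 |
| 75.83 | 25.92 | 72.90 | 75.83 | 25.92 | 72.67 | 75.83 | 25.92 | 75.58 |
| 9.46 | 7.03 | 4.79 | 9.46 | 7.03 | 4.56 | 9.46 | 7.03 | 7.47 |
| 5.71 | 29.68 | 7.15 | 5.71 | 29.68 | 6.92 | 5.71 | 29.68 | 9.83 |
| 85.78 | 21.81 | 82.94 | 85.78 | 21.81 | 82.71 | 85.78 | 21.81 | 85.62 |
| 0.37 | 0.57 | -2.86 | 0.37 | 0.57 | -3.09 | 0.37 | 0.57 | -0.18 |
| 8.82 | 11.25 | 7.99 | 8.82 | 11.25 | 7.76 | 8.82 | 11.25 | 10.67 |
| 8.99 | 19.01 | 12.80 | 8.99 | 19.01 | 12.57 | 8.99 | 19.01 | 15.48 |
| 37.65 | 75.25 | 63.57 | 37.65 | 75.25 | 63.34 | 37.65 | 75.25 | 66.25 |
| 23.81 | 11.33 | 20.67 | 23.81 | 11.33 | 25.26 | 23.81 | 11.33 | 23.53 |
| 75.83 | 25.92 | 74.64 | 75.83 | 25.92 | 79.23 | 75.83 | 25.92 | 77.50 |
| 9.46 | 7.03 | 6.53 | 9.46 | 7.03 | 11.12 | 9.46 | 7.03 | 9.39 |
| 5.71 | 29.68 | 8.89 | 5.71 | 29.68 | 13.48 | 5.71 | 29.68 | 11.75 |
| 85.78 | 21.81 | 84.68 | 85.78 | 21.81 | 89.27 | 85.78 | 21.81 | 87.54 |
| 0.37 | 0.57 | -1.12 | 0.37 | 0.57 | 3.47 | 0.37 | 0.57 | 1.74 |
| 8.82 | 11.25 | 9.73 | 8.82 | 11.25 | 14.32 | 8.82 | 11.25 | 12.59 |
| 8.99 | 19.01 | 14.54 | 8.99 | 19.01 | 19.13 | 8.99 | 19.01 | 17.40 |
| 37.65 | 75.25 | 65.31 | 37.65 | 75.25 | 69.90 | 37.65 | 75.25 | 68.17 |
| 23.81 | 11.33 | 19.56 | 23.81 | 11.33 | 23.15 | 23.81 | 11.33 | 25.17 |
| 75.83 | 25.92 | 73.53 | 75.83 | 25.92 | 77.12 | 75.83 | 25.92 | 79.14 |
| 9.46 | 7.03 | 5.42 | 9.46 | 7.03 | 9.01 | 9.46 | 7.03 | 11.03 |
| 5.71 | 29.68 | 7.78 | 5.71 | 29.68 | 11.37 | 5.71 | 29.68 | 13.39 |
| 85.78 | 21.81 | 83.57 | 85.78 | 21.81 | 87.16 | 85.78 | 21.81 | 89.18 |
| 0.37 | 0.57 | -2.23 | 0.37 | 0.57 | 1.36 | 0.37 | 0.57 | 3.38 |
| 8.82 | 11.25 | 8.62 | 8.82 | 11.25 | 12.21 | 8.82 | 11.25 | 14.23 |
| 8.99 | 19.01 | 13.43 | 8.99 | 19.01 | 17.02 | 8.99 | 19.01 | 19.04 |
| 37.65 | 75.25 | 64.20 | 37.65 | 75.25 | 67.79 | 37.65 | 75.25 | 69.81 |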

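Table 9b. Parameter estimates of the raw and cloned dataset in Table 9.

| Parameter | Estimate | Std. Error | Variable | Mean | Variance | RSE |
|---|---|---|---|---|---|---|
| $A$ | 1.36 | 0.064478 | $X_1$ | 28.49111 | 1008.499 | - |
| $a$ | 0.30 | 0.029150 | $X_2$ | 22.42778 | 478.7497 | - |
| $b$ | -0.50 | 0.420536 | $Y$ | 33.63111 | 1113.739 | 2.924394 |
| $A_{clone}$ | 1.34 | 0.027909 | $X_{1,clone}$ | 28.49111 | 907.6489 | - |
| $a_{clone}$ | 0.30 | 0.012978 | $X_{2,clone}$ | 22.42778 | 430.8747 | - |
| $b_{clone}$ | -0.50 | 0.185972 | $Y_{clone}$ | 32.71486 | 1008.25 | 3.851240 |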

Conclusions

In this article, we presented cloned datasets for bivariate and multivariate non-linear regression models with the same non-linear regression fit. Such cloned datasets can be used to maintain the confidentiality of sensitive real data for publication purposes. In this context, new methods can be developed so that cloning is possible for further non-linear regression models. The question this study addresses is how cloning techniques are better than simulation and re-sampling. The simulation approach assumes that the model is known and then generates random data from the distribution of the response variable to illustrate the sampling variability in the estimates, whereas re-sampling estimates the precision of sample statistics by using a subset of the available data or by drawing randomly with replacement from a set of data points. Unfortunately, these approaches do not help to explain the concept of regression or the idea of ‘moving towards’ the mean. The methods presented in this study are intended to fill this gap by yielding a sequence of matching datasets with the same fitted regression equation, for which the variability in the response variable Y and the explanatory variable X progressively reduces. The tendency of moving towards the means rather than the conditional mean is also demonstrated.

Data Availability

All data underlying the results are available as part of the article and no additional source data are required.

Comments on this article Comments (1)

Version 2
VERSION 2 PUBLISHED 15 Mar 2022
Revised
Version 1
VERSION 1 PUBLISHED 11 Feb 2021
  • Author Response 19 Nov 2021
    Roseline Ogundokun, Department of Computer Science, Landmark University Omu Aran, Omu Aran, 251101, Nigeria
    19 Nov 2021
    Author Response
    • How does the method can be compared with classical approaches?
    • Response: It has been mentioned in the conclusion that classical methods like the simulation approach assume that the
    ...
how to cite this article
Hussain S, Daniyal M, Ogundokun RO et al. Cloning data with unchanged estimates of estimable non-linear functions of parameters [version 1; peer review: 1 approved with reservations] F1000Research 2021, 10:106 (https://doi.org/10.12688/f1000research.28297.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 15 Oct 2021
David Medina-Ortiz, Universidad de Chile, Región Metropolitana, Chile 
Approved with Reservations
The authors propose a novel approach to simulate datasets in case of confidential problems using non-linear regression approaches. The authors demonstrate your proposed strategy testing with different datasets and explain how your strategy could be applied on different case of ... Continue reading
How to cite this report
Medina-Ortiz D. Reviewer Report For: Cloning data with unchanged estimates of estimable non-linear functions of parameters [version 1; peer review: 1 approved with reservations]. F1000Research 2021, 10:106 (https://doi.org/10.5256/f1000research.31296.r95224)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
