Comparative assessment of freeform polynomials as optical surface descriptions

: Slow-servo single-point diamond turning as well as advances in computer controlled small lap polishing enables the fabrication of freeform optics, or more specifically, optical surfaces for imaging applications that are not rotationally symmetric. Various forms of polynomials for describing freeform optical surfaces exist in optical design and to support fabrication. A popular method is to add orthogonal polynomials onto a conic section. In this paper, recently introduced gradient-orthogonal polynomials are investigated in a comparative manner with the widely known Zernike polynomials. In order to achieve numerical robustness when higher-order polynomials are required to describe freeform surfaces, recurrence relations are a key enabler. Results in this paper establish the equivalence of both polynomial sets in accurately describing freeform surfaces under stringent conditions. Quantifying the accuracy of these two freeform surface descriptions is a critical step in the future application of these tools in both advanced optical system design and optical fabrication.

weight of an optical system. With the emergence of slow servo diamond turning, freeform optical elements are beginning to appear in rotationally nonsymmetric precision optics, e.g. head worn displays [1], IR-seekers [2], and illumination systems [3]. Freeform surfaces may be described by full aperture polynomials, as for example Zernike polynomials [4], or other descriptions such as splines [5], and radial basis functions (RBFs) [6,7]. Zernike polynomials and polynomials in that family are restricted to circular apertures, while splines and radial basis functions may handle generally shaped apertures. A widely used technique to represent optical surfaces with a circular aperture (which is currently most common) is to express the deviation from a conic in terms of a summation of orthogonal polynomials, where the orthogonal polynomials of choice are currently one of many forms of the Zernike polynomials (e.g. Born & Wolf, FRINGE, Noll, etc.).
Recently a new set of orthogonal polynomials over a circular aperture has been developed by Forbes, orthogonalized with respect to the mean square gradient over an enclosing circular aperture with the goal of facilitating measures of manufacturability, e.g. optical testing, pad polishing etc [8]. These polynomials will be referred to in this paper as gradient-orthogonal Q-polynomials following from the Q-polynomial form developed earlier for rotationally symmetric surfaces [9]. In this paper, we quantify the effectiveness of this new polynomial for describing a rotationally nonsymmetric surface. Specifically, we compare these gradientorthogonal Q-polynomials with the current most common form of describing a freeform optical surface as a summation of Zernike polynomials added to a conic base surface.
In the comparison, two sample surfaces form the basis for the presentation of results. These surfaces were chosen to represent a stressing example of departure from rotational symmetry to establish there are effectively no limits to the conclusions in the context of a rotationally nonsymmetric surface that might be developed for an imaging application on any scale as this new technology evolves and is leveraged by the optical design community. The surface departures are mathematically and conceptually simplified and do not directly represent any directly relevant surface at this time. They do however represent a representative case for spatial frequency in terms of cycles per aperture and in terms of an extreme asymmetry.
Orthogonal bases as freeform surface descriptors are well-behaved in terms of conditioning and provide independence and numerical robustness among basis elements. However, these orthogonal bases are defined only over specific aperture geometries constraining the aperture shapes of freeform elements. Freeform surfaces do not always conform to these aperture shapes. There are methods to overcome these obstacles, for example enclosing the aperture inside a circle is mentioned in [8].
This paper is organized as follows; Section 2 briefly summarizes two polynomial descriptions for characterizing the shape of freeform optical surfaces along with the recurrence relations for numerically robust computation. In Section 3, two benchmark test cases based on an F/1 parabolic surface with a generic asymmetric feature are investigated using two ray based data site sampling strategies. In Section 4, we show the results of performing a least-square fit of these two analytic surface test cases using the two sets of polynomials with two different ray-sampling grids. Also in this section, we show how the RMS fit residual grows with the height of the nonsymmetric feature that is placed on the F/1 parabolic surface at two different locations, before concluding the paper.

Gradient-orthogonal Q-polynomials and Zernike polynomials
One of the major parameters used to characterize a freeform optical surface is the rate of change of departure along the normals of a best-fit conic section. In earlier work to overcome the general ill conditioning of the historical power series polynomial as a descriptor of rotationally symmetric aspheric surfaces, Forbes derived two sets of orthogonal slope polynomials; Q con and Q bfs polynomials [9]. In extending his work on aspheric surfaces to freeform surfaces, which in general are not rotationally symmetric, he has proposed and derived gradient-orthogonal Q-polynomials [8]. With gradient-orthogonal Q-polynomials, a freeform surface is represented as follows where c is the curvature of the best fit sphere, u = ρ/ρ max with ρ being a position in the aperture and ρ max the radius of the enclosing circular aperture, 0 n Q represents the rotationally symmetric slope-orthogonal bfs Q polynomials, and m n Q represents the gradient-orthogonal Qpolynomials [8]. The first two terms of Eq. (1) account for the rotationally symmetric best-fit sphere and the summation of symmetric polynomial contributions to the surface departure from the best-fit sphere. The third term accounts for the rotationally nonsymmetric contributions of the surface departure.
where m m n u Z represents the standard Born and Wolf Zernike polynomials of order m [10]. In order to avoid numerical instabilities associated with the explicit expressions in computation of both Zernike and Q-polynomials of high orders, we use the recurrence relations defined for these polynomials in [8] and [11]. Zernike polynomials satisfy a standard three-term recurrence relation given as where n P represents the orthogonal Jacobi polynomial in a sequence related to Zernike polynomials as given in the paragraph before Eq. (4.1) in [11]. For each azimuthal order, this recurrence relation should be used to find the next polynomial in the set. For the coefficients a n ,, b n ,, and c n readers are referred to Eq. (4.1) of [11]. The rotationally nonsymmetric gradient-orthogonal Q-polynomials, much as earlier rotationally symmetric bfs Q polynomials, do not satisfy a standard three-term recurrence relation. Instead, they satisfy an unconventional three-term recurrence relation. First, for each freeform and bfs Q polynomial, an auxiliary polynomial must be computed with a standard three-term recurrence relation given as Eq. (3). The coefficients used for the auxiliary polynomials recurrence formula are defined in Eq. (A.2) and Eq. (A.3) in Appendix A of [8]. The resulting unconventional recurrence relation for the gradient-orthogonal Q-polynomials are given as where, m n Q is the Q-polynomial that we are computing, 1 m n Q − is the previously computed freeform polynomial, and m n A is the auxiliary polynomial that is precomputed. The  [8]. The effect of using recurrence relations with Zernike polynomials are given in Fig. 1 of [11] and Fig. 2 of [12].

Ray grids for data site sampling and test cases
We showed previously that, in the context of fitting a set of polynomials to a continuous analytical surface, edge clustered fitting grids demonstrate the best efficacy in terms of polynomial fitting optical surfaces in a least-squares sense [12]. Thus, we will make use of an edge clustered fitting grid in our performance evaluations. Edge clustered sampling is created by first generating random Halton points and then applying a sine function on the radial coordinate to move these points towards the boundary of the aperture.
To continue to illustrate the effectiveness of edge clustered ray grids and to enable comparison with earlier evaluations, we have also provided results using hexagonal subgrids centered on a uniform rectangular grid. We have sampled the optical surface with hexagonal subgrids rather than rectangular subgrids as the circular aperture is more uniformly covered with this strategy. In Fig. 1, we have illustrated two examples of the sample grids that will be used in the polynomial comparisons. Throughout the paper, when comparisons are made between two sets of polynomials with two different sampling grids, we make sure that we use about the same number of samples on each of the grids. In order to investigate the effectiveness of the gradient-orthogonal Q-polynomial versus the Zernike polynomial using the ray grids given in Fig. 1, we have formed a benchmark test suite consisting of analytical functions. The first test case is an F/1 parabola with a Gaussian bump away from the edge of the aperture. The second test case is again the same F/1 parabola with the same Gaussian bump placed now at the edge of the parabola. The aperture diameter for the F/1 parabola is chosen to be 80 mm. The Gaussian bump is 12.5 µm in height and has a 2.357 mm standard deviation. The analytical definitions for the test cases are given in Eq.
where 1 f represents the F/1 parabola with the Gaussian bump away from the edge, and 2 f represents the F/1 parabola with the Gaussian bump at the edge. To illustrate these two test cases, we plotted the sag departure from a best-fit sphere in Fig. 2. Similar to the Eq. (4.2) in [13], the curvature of the best-fit sphere (bfs) is computed as where the angle brackets denote the average of the sag at the edge of the aperture over θ . Because for freeform surfaces the sag also depends upon θ , correctly computing the curvature of the best-fit sphere has a profound effect on the computations associated with fitting of surfaces with gradient-orthogonal Q-polynomials, especially when the surface to be fitted has asymmetric components at the edge of the aperture. Figure 2(a) illustrates the sag departure of the first test case that is an 80 mm diameter aperture F/1 parabola with the Gaussian bump away from the edge of the aperture. Figure 2(b) shows the sag departure of the second test case that is an 80 mm diameter aperture F/1 parabola with the Gaussian bump at the edge of the aperture.

Numerical simulations
We investigated the fidelity of creating a freeform optical surface description based on the gradient-orthogonal Q-polynomials and the Zernike polynomials based on data sites placed on the hexagonal uniform and the edge clustered sampling grid. We have carried out the least squares fit with increasing numbers of basis elements (coefficients). The relation between the number of samples and the number of basis elements was established empirically as 9*k 2 , where k is the highest order of the polynomial in the polynomial fit. Truncation of the sums is carried out based upon the condition k<T, for some given integer T, and k equals m + 2n for the gradient-orthogonal Q-polynomials.
In Fig. 3, we have illustrated the effect of sampling on the fidelity of the surface representation with both sets of polynomials for the f 1 test case. We found that both polynomials performed almost identically for this test case. We have made use of approximately 54845 samples and 3320 polynomials with either set of polynomials. We have seen that for the f 1 test case, which is the 80 mm diameter F/1 parabola with a Gaussian bump placed away from the edge of the aperture, using hexagonal uniform ray grids both polynomials have peak-to-valley (PV) fit residuals ~10 nm (see Fig. 3(a)). Edge clustered ray grids result in a remarkable improvement on the overall fit residual profile, as shown in Fig.  3(b). Both polynomials produced PV fit residuals on the order of sub-nanometers with edge clustered ray grids. In Fig. 4, we have displayed the effect on the fit residual for test case f 2 , when the Gaussian bump is placed at the edge of the aperture. Also in this case, the fit residuals for gradient-orthogonal Q-polynomials and Zernike polynomials are quite indistinguishable. In Fig. 4(a), the hexagonal uniform ray grid is used to create data sites for the least squares fitting, and we see that the PV fit residuals are ~4 nm with the gradient-orthogonal Qpolynomials and Zernike polynomials. The outcome is more compelling with the edge clustered ray grid, which increases the density of data sites towards the edge of the aperture. As seen in earlier work, this ray grid strategy significantly reduces the PV fit residuals, as seen in Fig. 4(b). We observe that the Zernike polynomials and the gradient-orthogonal Qpolynomials produced a sub-nanometer fit residual with the edge clustered sampling as shown in Fig. 4(b).
In Fig. 5, we have compared RMS fit residuals that result in fitting test cases f 1 and f 2 with hexagonal uniform and edge clustered ray grids with the two polynomial sets. We have gradually increased the degree of the Zernike polynomials and the gradient-orthogonal Qpolynomials as the truncation parameter in the sum is moved from T = 5 to T = 80 in steps of 10. As the number of basis elements goes up from 19 to 3319, the number of data samples in the fit increases from 226 to 54845. As found previously, we have observed that the edge clustered sampling consistently produces better fits when compared to hexagonal uniform sampling as demonstrated by the solid black lines in Fig. 5(a) and Fig. 5(b). We have also shown that gradient-orthogonal Q-polynomials and Zernike polynomials produced effectively exact representations with edge clustered sampling for both the less stressing f 1 case, with the bump away from the edge and the more stressing f 2 case, with the bump at the edge as marked with solid black lines in Fig. 5(a) and Fig. 5(b).  In the case of fitting an analytical surface description using the intrinsically less effective hexagonal uniform data site sampling, the gradient-orthogonal Q-polynomials residual fits are slightly better than for the Zernike polynomials for both the f 1 and the f 2 test cases, as shown by the dash-dot blue curves in Fig. 5(a) and Fig. 5(b). Zernike polynomials and gradientorthogonal Q-polynomials combined with edge clustered sampling consistently produced significantly better fits as the maximum degree of the polynomial is increased from T = 5 to T = 80 (see the black solid curves in Fig. 5(a) and Fig. 5(b)). For both test cases f 1 and f 2 , fits with Zernike polynomials and gradient-orthogonal Q-polynomials reached the required subnanometer levels (see point B in Fig. 5(a) and Fig. 5(b)). The gradient-orthogonal Qpolynomials performed as well as Zernike polynomials in achieving the accuracy levels in describing the optical surfaces, as given here as test cases f 1 and f 2 .
We then expanded the test case study to quantify the fit residuals for Zernike polynomials and gradient-orthogonal Q-polynomials by systematically doubling the height of the Gaussian bump. We have quantified the minimal RMS fit residual in the fits for when the truncation point in the expansion is determined by T = 80, with hexagonal uniform and edge clustered sampling with both polynomial sets for the height of the bump set at 12.5µm, 25µm, 50µm, and 100µm as shown in Fig. 6. Dash-dot lines show the RMS fit residuals in the least-squares approximations with gradient-orthogonal Q-polynomials. Solid lines are used when the Zernike polynomials are used. Results show that there is a linear relationship between the minimum RMS fit residual and the height of the bump. Specifically, in Fig. 6(a) that addressed a bump away from the edge of the aperture (i.e. case f 1 ), Point A shows the RMS fit residual when the height of the bump is 12.5 µm using Zernike polynomials with edge clustered sampling. Point B shows the RMS fit residual when the height of the bump is 100 µm. The RMS fit residual increased from 4.5x10 −12 m to 3.6x10 −11 m that is 8 times. An equivalent relation is also found for the Points C and D. Moreover, we observe that with edge clustered sampling both polynomials produced two orders of magnitude better RMS fit residuals when compared with the performance of either polynomial with hexagonal uniform sampling (see blue and black curves in Fig. 6). The Point A records a RMS fit residual 4.5x10 −12 m, and Point C shows 1.8x10 −10 m residual fit departure.
The blue curves show RMS fit residuals in the approximants when hexagonal uniform sampling is used for both polynomial sets. In Fig. 6(a) the blue dash-dot curve is slightly lower than the solid blue line indicating the gradient-orthogonal Q-polynomials performed slightly better, while not significantly, with hexagonal uniform sampling for the test case f 1 . Similarly for Fig. 6(b), the Zernike polynomials performed slightly better, while not significantly, with hexagonal uniform sampling for the test case f 2 (see blue lines in Fig. 6(b)). Similarly the black curves demonstrate the improved performance with edge clustered sampling. We can see for both Fig. 6(a) and Fig. 6(b) that the black dash-dot line and the black solid line coincide, which suggest that with edge clustered sampling Zernike polynomial and gradient-orthogonal Q-polynomials provide fits with identical fidelity for the test cases f 1 and f 2 .

Conclusion and future work
In this work we have seen that in order to achieve an acceptable polynomial fit to an asymmetric localized feature any single additive polynomial requires many terms, on the order of thousands, if subnanometer accuracy is required, as is often the case in precision optics. We have also observed that Zernike and gradient-orthogonal Q-polynomials placed additively on a base conic section are able to equally represent the nonsymmetric features of the surface no matter where these features might be positioned over a significant range of feature height and slope. One crucial step working with Q-polynomials is to accurately calculate the curvature of the best fit sphere, which later on effects the sag computation significantly, see Eq. (1) and Eq. (7). Also, in both cases, the use of recurrence formula is a key enabler to nanometer accuracy when representing high frequency features in an aperture.
In all the analyses carried out, we have used least-squares methods in arriving at the coefficients of fit. In a real optical design environment, these approximations are the results of optimization procedures involving not only the polynomials, but also their first and second derivatives. Hence a next level of comparison takes into account the first and second derivatives of the polynomials under evaluation. Also, the offset Gaussian bump may be considered as a possible extreme feature to fall beyond a departure that would be seen in a freeform optical design for an imaging application. In addition, while representing a surface with thousands of coefficients such as given in this paper currently exceeds the capabilities of commercial optical design optimization, it does not exceed their analysis. An alternative to thousands of terms for representing a generic asymmetric feature, while perhaps not as narrow as in this paper, is to consider using a number (tens) of multicentric additive bases. An initial evaluation of this approach is found in [7].
The capability to fabricate rotationally nonsymmetric surfaces for imaging applications is a new capability for the industry and as a result there are currently few examples. However, the generation of aspheric surfaces with small tool grinding and polishing provides an early set of surface examples that often suffer from significant mid spatial frequencies. Also bump generation with small tools polishing may occur during the fabrication process. For this study, the stressing asymmetric surface was used to establish that are there no limits to the application of the results in the context of current or future rotationally nonsymmetric surfaces in image forming optical systems. The offset Gaussian bump may be considered as a possible feature during fabrication if considered as an isolated bump. However, there is some anticipation that the Gaussian bump used in this simulation could represent a limiting spatial frequency in the aperture, but, as part of an imaging surface departure, it would be expected that there may be tens of, or perhaps even hundreds of features with this limiting geometry on a future surface. Future work will investigate the application of the tools developed under this work to fitting mid spatial frequencies on measured surface data with the goal to set tolerances for fabrication. The application of freeform surfaces in advanced optical system design also requires establishing quantitatively the equivalence between various freeform surface descriptions.