Machine learning for analyzing and characterizing InAsSb-based nBn photodetectors

This paper discusses two cases of applying artificial neural networks to the capacitance–voltage characteristics of InAsSb-based barrier infrared detectors. In the first case, we discuss a methodology for training a fully-connected feedforward network to predict the capacitance of the device as a function of the absorber, barrier, and contact doping densities, the barrier thickness, and the applied voltage. We verify the model’s performance with physics-based justification of trends observed in single parameter sweeps, partial dependence plots, and two examples of gradient-based sensitivity analysis. The second case focuses on the development of a convolutional neural network that addresses the inverse problem, where a capacitance–voltage profile is used to predict the architectural properties of the device. The advantage of this approach is a more comprehensive characterization of a device by capacitance–voltage profiling than may be possible with other techniques. Finally, both approaches are material and device agnostic, and can be applied to other semiconductor device characteristics.


Introduction
Recent advances in computing capabilities have led to an unprecedented rise in data generation and availability. As such, there has been a need for rapid development of tools to aid researchers in identifying trends hidden deep within high-dimensional datasets. Machine learning [1][2][3], a subset of artificial intelligence concerned with studying techniques for building data-driven mathematical models, offers a set of tools that are well-suited to address this need. In addition to computing hardware becoming more powerful, improvements in machine learning algorithms and the development of user-friendly machine learning frameworks have made these types of tools more accessible than ever.
Machine learning has gained renewed interest recently due to new neural network architectures achieving state-of-the-art performance in numerous disciplines for applications in image and speech recognition, language processing, and data analysis. For instance, machine learning techniques have started to be routinely used, or researched for use, in many diverse scientific and engineering fields, including bioinformatics [4], materials science [5][6][7], and radiology [8,9]. In this paper, we discuss machine learning in the scope of semiconductor device development.
The semiconductor industry has been using machine learning to address topics in modeling, manufacturing, and recently, design. For example, artificial neural networks (ANNs) have been shown to be useful as a method of creating compact models for circuit simulators [10][11][12][13]. ANNs can reproduce device characteristics over wider ranges of operating conditions where analytic equations may no longer be valid. From another perspective, ANNs could also be used to create a statistically representative compact model for a batch of fabricated devices that have been characterized over temperature, voltage, and frequency, creating a more realistic representation of the product in the circuit simulation. In either case, once trained, ANNs are faster to evaluate than a numerical approach, and can offer higher accuracy and superior generalization compared with a look-up table. Another compelling example in the semiconductor industry is the application of machine learning to manufacturing. By maintaining logs of each fabrication cycle, facilities can generate vast amounts of data each year. These data can be leveraged using machine learning techniques to improve models for fault detection and predictive maintenance, correlate processing parameters with yield, and develop advanced metrology techniques [14][15][16][17][18]. A notable example is the use of machine learning to aid in wafer fault detection [19,20]. By monitoring the progress of a wafer through a series of processing steps, a trained convolutional neural network can use pattern recognition to find defects on the wafer, identify the faulty process, and prevent the wafer from continuing through the fabrication line, avoiding additional cost. The neural network can have higher detection accuracy than a human technician, and can also offer insight into ways to improve the fabrication line through analysis of the features that led the network to its prediction.
A widely encountered challenge in device development is understanding how each design parameter affects performance. It quickly becomes difficult to understand parameter-performance correlations as the number of free parameters increases. There has been recent progress in using approaches based on machine learning to address this problem [21]. However, these techniques have yet to be applied to infrared photodetectors, where the device architecture and material are extensively engineered to meet performance requirements in demanding applications.
As is the case in many areas of engineering, research regarding infrared sensor design and fabrication is predominantly driven by finding ways to reduce cost while simultaneously improving performance and adding new capabilities. The barrier detector design [22][23][24] has achieved success for midwave-infrared sensing in this aspect by allowing the detector to reach background-limited performance at higher operating temperatures, reducing the cooling requirements for the imaging system. Realization of these types of innovative designs requires sufficient theoretical knowledge of the device operation, often through physics-based analytical or numerical modeling. There has been much effort in modeling barrier infrared detectors in recent years [25,26], and as a result, the methodology and materials models are now relatively mature. As modeling capabilities and computing hardware continue to improve, we are able to more efficiently study complex device designs. Machine learning offers tools to both ameliorate analysis of the data to make definitive conclusions, and construct physics-based surrogate models for complex device phenomena. It is the aim of this work to demonstrate a methodology for applying machine learning to simulation results to create models for studying, analyzing, and optimizing device characteristics. To this end, we look at two approaches of applying neural networks to the capacitance-voltage (C-V) characteristics of InAsSb-based nBn photodetectors. To our knowledge, this is the first report of applying neural networks to this type of device and its C-V characteristics.
While capacitance is not a common figure of merit when discussing infrared photodetector performance, it is a common technique for characterizing semiconductor materials [27]. Moreover, it has been shown [28][29][30] that the C-V characteristics for barrier-style devices can elucidate more about the underlying materials. The C-V profile can be used to extract information about the doping density in all three critical device layers, the absorber, barrier, and contact, as well as the thickness of the barrier. To this end, this paper covers two approaches for applying neural networks to C-V characteristics.
First, we show that an ANN can be trained to accurately predict the capacitance based on a subset of the device's architectural parameters. In this case, we include the doping densities of the absorber, barrier, and contact layers, the thickness of the barrier layer, and the applied voltage as input features for the model. Once trained, the model is used as an analysis tool to understand the role of each feature in shaping the C-V characteristics.
Second, inspired by recent developments in image recognition, we address the inverse problem by using a convolutional neural network to create an enhanced characterization tool for C-V analysis. The convolutional network is used in place of conventional analytic techniques, and may be applied to other devices and cases where analytical models are not valid. In this paper, we train a convolutional neural network to predict the doping densities of the absorber, barrier, and contact layers, and the thickness of the barrier layer and show excellent generalization over a wide range of values for each parameter.
This manuscript is organized as follows: section 2 introduces the neural networks studied in this work, and discusses the C-V characteristics of barrier infrared detectors; section 3 presents the methodology used for data acquisition and training the networks; section 4 presents the results; section 5 concludes the article.

Neural networks: multilayer perceptron
Machine learning provides many algorithms for creating predictive models from data. One approach, and the one we adopt in this work, is the neural network. We briefly discuss a few of the basic concepts of neural networks in this section.
The first type of network we present in this work is based on the multilayer perceptron model, where information is propagated through a set of fully-connected layers. This type of feedforward network calculates a prediction vector, ŷ = {ŷ_1, . . . , ŷ_k}, by performing a series of simple calculations on an input vector, x = {x_1, . . . , x_p}. The vectors x and ŷ represent the input and output layers respectively, and the elements of x are referred to as features or predictors. The calculations that lead from the input to the output take place in the hidden portion of the network, which contains layers of fully-connected elements, or neurons. The number of layers, and the number of neurons in each layer, need to be optimized for the given problem. Consider two sequential layers H_1 and H_2 with n and m neurons respectively. The value calculated by the jth neuron in H_2 is given by

z_j^2 = σ( Σ_{i=1}^{n} w_ij^2 z_i^1 + b_j^2 ),

where w_ij^2, b_j^2, and z_i^1 denote the weight associated with the connection between the ith neuron in H_1 and the jth neuron in H_2, the jth neuron's bias, and the activated value from the ith neuron in H_1. The function σ is the activation function for this neuron, and is a way to introduce non-linearity into the model. There are many choices of activation functions, and more are being researched to improve computational efficiency, stability, convergence rate, and accuracy. In this work we use the hyperbolic tangent, and the exponential linear unit, or ELU, given by [31]

ELU(x) = x for x > 0, and α(e^x − 1) for x ≤ 0.

As we increase the complexity of the model by adding more neurons and hidden layers, the output becomes an increasingly complex composition of nested function evaluations. The sets of weights and biases for each hidden layer are represented as matrices of free variables that are adjusted by an optimization algorithm during the learning process. During training, the network is exposed to a set of N samples, X = {x_1, . . . , x_N}^T, with known outputs, Y = {y_1, . . . , y_N}^T, referred to as the training data.
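The layer computation above can be sketched directly in NumPy. This is a minimal illustration, not the implementation used in this work; the layer sizes loosely follow the five-input, ten-neuron architecture described in section 3, and the weights are random rather than trained:

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU activation: x for x > 0, alpha*(exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def dense_layer(z_prev, W, b, activation=np.tanh):
    # z_j = sigma(sum_i w_ij * z_i + b_j) for every neuron j in the layer
    return activation(z_prev @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5))                  # one sample with five input features
W1, b1 = rng.normal(size=(5, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)

h = dense_layer(x, W1, b1)                   # hidden layer with tanh activation
y_hat = dense_layer(h, W2, b2, activation=lambda v: v)  # linear output neuron
```

With trained weights, stacking such layers (non-linear activations in the hidden layers, a linear output) reproduces the forward pass of the multilayer perceptron.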
An important consideration when designing the network is the complexity of the model. If there are too many weights and biases compared to N, the model may overfit the data, and will not generalize well to unseen data. The weights and biases in the network are optimized to minimize a cost function J, expressed as

J(θ) = (1/N) Σ_{i=1}^{N} L(ŷ_i, y_i),     (1)

where L is the loss function used to calculate the error in each prediction for every sample, and θ denotes the set of all hyperparameters used in the model, for instance, the weights, biases, learning rates, and data processing choices. For regression, a common choice, and the one we use here, is the mean squared error,

L(ŷ_i, y_i) = (ŷ_i − y_i)^2.

As with σ, there are several choices of J and L for other types of problems. The recent reemergence of neural networks is due in part to improvements in optimization algorithms for updating the weights and biases. For large-scale applications, the number of weights and biases can be such that the task of optimizing hundreds of thousands of free parameters is non-trivial, either suffering from issues with stability or being simply computationally prohibitive. One of the most popular, and particularly intuitive, approaches is the method of steepest descent. The gradients of equation (1) with respect to the weights and biases are calculated by repeatedly applying the chain rule, an approach called backpropagation. These gradients are then used to update each weight and bias in the network in such a way that the cost function is, ideally, reduced with every iteration.
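As a toy illustration of minimizing a mean-squared-error cost by steepest descent, consider fitting a single line rather than a full network (all values below are illustrative, and the two gradients are derived by hand):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(64, 1))
y = 3.0 * X[:, 0] + 0.5                      # noiseless target: y = 3x + 0.5

w, b = 0.0, 0.0                              # free parameters to optimize
lr = 0.1                                     # learning rate
losses = []
for _ in range(200):
    y_hat = w * X[:, 0] + b                  # model prediction
    err = y_hat - y
    losses.append(np.mean(err ** 2))         # mean-squared-error cost
    # gradients of the cost with respect to w and b (chain rule)
    grad_w = 2.0 * np.mean(err * X[:, 0])
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w                          # steepest-descent update
    b -= lr * grad_b
```

Backpropagation generalizes the two hand-derived gradients here to every weight and bias in a deep network by repeated application of the chain rule.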

Convolutional neural networks
Convolutional neural networks, or convnets, are frequently used for pattern recognition due to their ability to identify relevant features in an image by learning the correspondence between neighboring pixels [32]. The key to the success of convnets is their use of learned sets of weights and biases, called filters, used to perform convolutions over clusters of pixels [1]. The outputs of these convolutions are referred to as feature maps. These maps can highlight pertinent information in the image; for example, feature maps have been shown to emphasize edges of objects with various orientations [3,33,34].
Basic convolutional networks are based on the following design. The input, considering for example a greyscale image, is simply a structured array of pixel intensity values. This array is passed to a series of convolutional blocks. These blocks are composed of convolutional and pooling layers. The purpose of the convolutional blocks is to extract the salient information from the image, which is then used by a final fully-connected network in its prediction.
Consider the following example of a convolutional layer in the network. Let I_ij denote the intensity of the pixel located in row i and column j, and consider a filter f with m rows and n columns. The resulting value of the feature map, z_ij^f, when the top-left of the filter is aligned with pixel ij, is

z_ij^f = σ( Σ_{p=1}^{m} Σ_{q=1}^{n} w_pq^f I_{i+p−1, j+q−1} + b^f ),

where w_pq^f and b^f are the weights and bias of the filter. The final feature map contains the outputs z^f for every position on the image where the filter is applied. Each filter contains n × m weights and a single bias. Notice that, unlike the multilayer perceptron model in the previous subsection, we have decoupled the number of free parameters in the model from the dimensionality of the preceding layer by sharing the weights and bias of each filter across every input position. The dimensions of the feature map are controlled by how the filter is stepped across the image and by the dimensions of the filter. The number of filters, the filter dimensions, and the way the filter is stepped across the preceding layer are treated as hyperparameters that are typically determined empirically. While the number of free parameters associated with the convolutional layers is controlled by the filters, the feature maps may still have high dimensionality. Recall that the output of the convolutional block will be given to a fully-connected network. Since the number of free parameters associated with each set of fully-connected layers is directly proportional to the number of inputs, it can be useful to further reduce the dimension of each layer in the convolutional block. One common approach is to use pooling. Pooling calculates an aggregate summary of a cluster of pixels, usually by finding the average, minimum, or maximum value. The size and stride of the pooling operation are again hyperparameters that must be chosen, but both are commonly fixed at two in order to cut each dimension in half.
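The convolution and pooling operations can be sketched as follows. This is a minimal NumPy version with a hand-chosen 2 × 2 filter rather than a learned one, and the activation σ is omitted for clarity:

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    # Slide the filter across the image ("valid" positions only, stride 1)
    m, n = kernel.shape
    rows = image.shape[0] - m + 1
    cols = image.shape[1] - n + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel) + bias
    return out

def avg_pool2(fmap):
    # 2x2 average pooling with stride 2: halves each dimension
    r, c = fmap.shape[0] // 2, fmap.shape[1] // 2
    return fmap[:2 * r, :2 * c].reshape(r, 2, c, 2).mean(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # a simple diagonal-difference filter
fmap = conv2d_valid(image, kernel)              # 6x6 image -> 5x5 feature map
pooled = avg_pool2(fmap)                        # 5x5 feature map -> 2x2 map
```

In a trained convnet the kernel entries are the learned weights, and many such filters run in parallel to produce a stack of feature maps.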
While we limited this brief discussion of convnets to images, the pattern recognition ability can be applied to any type of structured data where information can be found in localized clusters of points. For this reason, convnets continue to be researched for applications in signal processing. A few compelling examples include using convnets to formulate diagnoses based on medical signals, such as echocardiograms [35,36] and electroencephalograms [37]. Following similar logic, a semiconductor device's electrical characteristics, such as current-voltage or C-V, are representative of the underlying device structure and material quality. It is the goal of this paper to introduce a methodology based on convolutional neural networks to predict device-related parameters from C-V characteristics.

Capacitance-voltage characteristics
Capacitance-voltage profiling is a common approach to measure the doping density of a semiconducting layer within a device [27,38]. The capacitance of a barrier detector can be approximated as a series combination of capacitances from three regions of the device [28,29,39],

1/C = 1/C_j,A + 1/C_B + 1/C_j,C,     (2)

where C_j,A and C_j,C are the junction capacitances due to the absorber-barrier and contact-barrier interfaces, and C_B is the parallel plate capacitance when the barrier layer is fully depleted, given by

C_B = ε_r ε_0 / t_B,     (3)

where t_B, ε_r, and ε_0 are the thickness of the barrier layer, the relative dielectric constant, and the vacuum permittivity. Usually, assumptions are made regarding equation (2) to characterize the barrier detector. For the structure considered in this work, an n-type absorber, N-type barrier, and n-type contact, when a reverse bias is applied, equation (2) can be simplified to a series combination of C_j,A and C_B, since C_j,C will be a relatively large accumulation capacitance. Under this assumption, it is possible to extract the absorber doping density through the conventional technique from metal-oxide-semiconductor theory by analyzing 1/C^2 [27,39]. A more complete solution of equation (2) can be obtained by considering the three semiconducting layers and applying Poisson's equation at the two interfaces with appropriate boundary conditions [29,40]. This approach, however, can be challenging to implement because it requires a numerical solution of the resulting transcendental equation. An alternative is to use a drift-diffusion model to solve a set of coupled equations. Drift-diffusion alleviates many of the constraints imposed by the analytical approach, but the added computational complexity and need for advanced materials models can also be a hindrance. In this paper, we apply machine learning to this problem in two ways.
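The series approximation of equations (2) and (3) is simple to evaluate numerically. The sketch below uses illustrative placeholder values, not fitted device parameters, with capacitances per unit area in F/cm²:

```python
# Series-capacitance approximation of the nBn device, after eqs. (2)-(3).
EPS0 = 8.854e-14          # vacuum permittivity, F/cm

def barrier_capacitance(t_B_cm, eps_r):
    # Parallel-plate capacitance per unit area of the fully depleted barrier
    return eps_r * EPS0 / t_B_cm

def series_capacitance(c_jA, c_B, c_jC):
    # 1/C = 1/C_jA + 1/C_B + 1/C_jC
    return 1.0 / (1.0 / c_jA + 1.0 / c_B + 1.0 / c_jC)

c_B = barrier_capacitance(200e-7, 15.0)   # e.g. a 200 nm barrier, eps_r ~ 15
# Under reverse bias C_jC is a large accumulation capacitance, so the
# absorber junction capacitance and C_B dominate the total:
c = series_capacitance(c_jA=5e-8, c_B=c_B, c_jC=1e-5)
```

Because the smallest capacitance dominates a series combination, the total capacitance here is always below both C_j,A and C_B, which is the behavior exploited in the analysis that follows.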
First, by using drift-diffusion simulation results, we train a neural network to create a surrogate model for the C-V relationship that contains the added physical accuracy of drift-diffusion, while being quickly evaluated like analytical models, facilitating rapid analysis of the characteristics. This approach creates a model that can be easily used to study these types of complex problems, and can deepen our comprehension of performance-parameter correlations in multidimensional spaces. Second, we suggest a solution to the inverse case where a more complete characterization of the device is possible by training a convolutional neural network to analyze its C-V profile.

Methods
In a previous work, we used Synopsys Sentaurus TCAD to simulate the C-V characteristics of InAsSb-based nBn photodetectors [29,41]. In the same work, we showed through a combination of analytical and numerical modeling that we are able to reproduce measured C-V data of similar nBn devices. The detector structure used to perform the simulations is summarized in table 1. Simulations were performed assuming a temperature of 150 K, and with a frequency of 1 MHz. More details regarding the simulation methodology, materials parameters, and analysis can be found in other works [29,42].
For this study we created a database of drift-diffusion computed capacitances for varied values of the absorber doping density, N_d,A, barrier doping density, N_d,B, contact doping density, N_d,C, barrier thickness, t_B, and the applied voltage, V_a. One prominent issue in creating data-driven models is how the data are sampled. The easiest approach is a grid search over the parameters, where every permutation of a set of discrete values for each feature is sampled. However, while simple to implement, this causes the model to observe only a small number of values for each feature. Furthermore, if we consider using N discrete values for n features, the number of samples required is N^n, which quickly becomes infeasible for the large values of N needed to ensure adequate mapping of the parameter space. To address this issue, we opted for a quasi-random sampling approach based on Halton sequences [43][44][45]. Quasi-random techniques offer better coverage of the parameter space than gridded sampling, while avoiding the clustering that can occur with random sampling. In this instance, if we take N samples, there will be N different values for each parameter that are more evenly distributed through the space. In this study, we used Halton sequences to discretize N_d,A, N_d,B, N_d,C, and t_B, while the values of V_a were fixed at 101 evenly spaced points during each simulation.
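A Halton sequence is straightforward to generate with the radical-inverse construction, one prime base per dimension. The sketch below is a minimal implementation; the mapping of the first coordinate onto a log-spaced doping range is illustrative, not the exact transformation used in this work:

```python
def halton(index, base):
    # Radical inverse: the index-th element of the base-b Halton sequence
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

def halton_samples(n, bases=(2, 3, 5, 7)):
    # n quasi-random points in the unit hypercube, one prime base per dimension
    return [[halton(i, b) for b in bases] for i in range(1, n + 1)]

samples = halton_samples(128)
# Map the first unit-cube coordinate to a log-spaced doping range, cm^-3
doping_A = [10 ** (14 + 3 * s[0]) for s in samples]   # spans 1e14 - 1e17
```

Successive points fill the hypercube evenly (the base-2 sequence begins 1/2, 1/4, 3/4, 1/8, ...), which is what gives quasi-random sampling its superior space coverage over a grid of repeated discrete values.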
We used TensorFlow [46] as the framework for initializing, training, and evaluating the neural networks. The first neural network considered in this work is a fully-connected feedforward network. The input layer is comprised of five neurons for N_d,A, N_d,B, N_d,C, t_B, and V_a. The hidden portion of the network contains four layers with ten neurons each. We used the hyperbolic tangent as the activation function for each neuron. The output layer is a single neuron with a linear activation representing the predicted value of the capacitance, C. A summary of the network architecture is shown in figure 1. Weights were initialized using Glorot initialization [47]; biases were initialized as zeros. The weights and biases were optimized using the Adam optimization algorithm [48]. The recommended values for the hyperparameters β_1, β_2, and ε of 0.9, 0.999, and 10^-7 were verified to be reasonable by monitoring the validation loss during a small grid search of different values for each. Prior to training, we applied a Z-score normalization to the training inputs and labels to transform the sets to have zero mean and unit standard deviation, and used the base-10 logarithmic values of N_d,A, N_d,B, and N_d,C. For stability, the initial learning rate was set to 10^-4. The networks were trained using 128 quasi-random samples of N_d,A within 10^14-10^17 cm^-3.

The second considered case uses a convolutional neural network; a summary of the network architecture is shown in figure 2. In this case, a C-V profile is used as the input layer, and is represented as a one-dimensional array of 101 capacitances. The inputs were normalized prior to training using Z-score normalization. The convolutional block is made up of two sets of convolutional layers, each followed by an average pooling layer. Both of the convolutional layers have eight 5 × 1 filters with ELU activations. The pooling layers use a size and stride of two. We apply global average pooling to the output of the convolutional block.
The result is passed to a fully-connected network with three layers comprised of eight neurons each, with ELU activations. The output layer is four neurons, representing the values of N_d,A, N_d,B, N_d,C, and t_B. The Adam algorithm with the default hyperparameters was also used for this network, with an initial learning rate of 10^-4. Note, to better handle the orders of magnitude spanned by N_d,A, N_d,B, and N_d,C, we used the base-10 logarithmic values, but use the true values when reporting the results. A Z-score normalization was also applied to these training labels. A set of 768 quasi-random values of N_d,A, N_d,B, N_d,C, and t_B, plus the 16 corner values of the four-dimensional parameter hypercube, were used to simulate the C-V profiles for the training and validation sets, with 90% used for training and 10% for validation. Cases on the corners of the parameter hypercube were weighted higher during training.
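The preprocessing described above, taking base-10 logarithms of the doping densities and then applying a Z-score normalization, can be sketched as follows (the feature values below are hypothetical placeholders):

```python
import numpy as np

def zscore_fit(x):
    # Per-column mean and standard deviation of the training set
    return x.mean(axis=0), x.std(axis=0)

def zscore_apply(x, mean, std):
    # Transform features to zero mean and unit standard deviation
    return (x - mean) / std

# Hypothetical raw features: [N_dA, N_dB, N_dC (cm^-3), t_B (nm), V_a (V)]
raw = np.array([[1e15, 1e16, 1e17, 100.0, -0.5],
                [5e15, 2e16, 5e17, 200.0,  0.0],
                [1e16, 5e16, 1e18, 150.0,  0.5]])
feat = raw.copy()
feat[:, :3] = np.log10(feat[:, :3])    # doping densities span decades
mean, std = zscore_fit(feat)
norm = zscore_apply(feat, mean, std)   # zero-mean, unit-variance inputs
```

The fitted mean and std must be stored so that new inputs (and predicted labels) can be transformed with, and inverted from, the same statistics as the training set.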

Results

Multilayer perceptron

Learning curve and network generalization
The learning curve for the first neural network is shown in figure 3(a). The network exhibits low validation loss without overfitting, since it contains only 401 free variables and was trained with about 7000 samples. Figures 3(b) and (c) present prediction-expectation plots for the training and validation sets. An ideal predicted-expected curve would be a line with unity slope. The quality of fit for the neural network model is quite high for both datasets, with R² values near one. In this case, the validation set contains the same values of N_d,A, N_d,B, N_d,C, and t_B that are present in the training set, given how the two sets were constructed. The excellent performance of the network on the validation set confirms that the network is not overfitting the training data, and generalizes well to new voltage values. To confirm generalization to unknown values of N_d,A, N_d,B, N_d,C, and t_B, another dataset was generated. The same predicted-expected curves are shown in figure 3(d), and confirm the ability of the network to generalize well to unseen cases. Figure 4 presents examples of C-V profiles predicted by the network through univariate parameter sweeps. Note, the network was not trained using any of the exact combinations of N_d,A, N_d,B, N_d,C, and t_B shown in figure 4. For each univariate sweep, the predicted C-V curve smoothly follows the expected trend for each parameter, while matching the test data well. For example, in figure 4(a) N_d,A is varied between 10^15 and 10^17 cm^-3. The network correctly captures the trend where a larger reverse bias applied to the contact layer is required to expand the depletion region in the absorber for high doping densities, manifesting as a capacitance that drops more slowly with voltage.
Physically this is due to how the depletion region in the absorbing layer expands with reverse bias; generally, a larger reverse bias is required to extend the depletion region in highly doped semiconductors, and hence, the capacitance will drop at a lower rate with reverse bias. The trained neural network accurately captures this behavior.
The barrier doping density, N_d,B, is varied in figure 4(b). The barrier doping density affects the C-V profile by shifting the flatband voltages at which the absorber layer transitions from accumulation to depletion (under reverse bias) and the contact layer transitions from accumulation to depletion (under forward bias). These voltages mark the points at which the capacitance starts to decrease with bias, and indicate that the absorber (under reverse bias) or the contact layer (under forward bias) has a sufficiently large depletion region to dominate equation (2). As the barrier doping density increases, the magnitude of the crossover voltage should also increase, as noted in previous works [25,29]. Again, the neural network accurately reproduces the expected behavior. The trend in figure 4(c) is the same as in (a), except under forward bias. Under forward bias the contact layer depletes, and can limit the device capacitance. Unlike the absorber, however, the contact is only 100 nm thick, and can fully deplete under forward bias when the doping density is low. When this happens, the capacitance approaches a constant value. Once more, the network is able to reproduce both of these physical phenomena.
Finally, figure 4(d) varies the thickness of the barrier layer. The barrier thickness affects the capacitance by increasing the dielectric width of a parallel plate capacitor, as in equation (3). The parallel plate capacitance decreases as the barrier thickness increases, further limiting the total capacitance in equation (2). Just like the previous cases, the neural network is able to capture this trend with high fidelity.
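As a quick consistency check, the 401 free variables quoted in the learning-curve discussion follow directly from the layer sizes (5 inputs, four hidden layers of 10 neurons, 1 output):

```python
def count_params(layer_sizes):
    # Each pair of adjacent layers contributes n_in*n_out weights + n_out biases
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

n_free = count_params([5, 10, 10, 10, 10, 1])  # -> 401
```

The small parameter count relative to the number of training samples is what keeps the model from overfitting.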

Partial dependence plots
One way to gain a global perspective on how each feature shapes the output is to generate partial dependence plots [49][50][51]. Partial dependence can help the user understand the role of a feature in the prediction of any black-box function. Figure 5 shows bivariate partial dependence plots illustrating the average predicted capacitance as a function of the applied voltage and the other features. We computed partial dependence using the following approach. First, we create sets of discrete values for N_d,A, N_d,B, N_d,C, and t_B. For each value of the features of interest, x_s, the network's prediction is averaged over every combination of the remaining features, x_c,

C̄(x_s) = (1/N) Σ_{i=1}^{N} C(x_s, x_c,i),

which represents the partial dependence of the capacitance with respect to the fixed features by averaging out the effect of the others. Using figure 5(a) as an example, we can observe how the capacitance is shaped by the absorber doping density. N_d,A has little effect on the value of the forward bias capacitance, as shown by a constant value of C̄ with varied N_d,A. This is consistent with the previous discussion of figure 4; under forward bias the absorber-barrier interface is under accumulation, and has a larger capacitance than either the parallel plate capacitance determined by the barrier thickness or the contact layer's junction capacitance. On average, the capacitance rapidly drops under reverse bias for low values of N_d,A, consistent with a low-doped absorber layer rapidly depleting. As shown in figure 5(b), N_d,B broadens the C-V profile for large doping densities, consistent with how N_d,B affects the crossover voltages for the absorber and contact layers. Similarly to figure 5(a), figure 5(c) shows that N_d,C only affects the C-V under forward bias, and that for low values of N_d,C the capacitance quickly reaches a constant value, again consistent with the layer fully depleting. Figure 5(d) reveals more information about the t_B dependence of the C-V than figure 4(d). On average, t_B is important over the full voltage range, since t_B can directly limit the total capacitance in equation (2) through C_B.

[Figure 7 caption: (a) An example of a C-V profile for reference when considering the relative feature sensitivity in (b). Curves in (b) were computed using equation (4).]
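The averaging that produces a partial dependence curve can be sketched generically. The stand-in model below is a hypothetical sigmoid-like response, not the trained network, and the feature grids are illustrative:

```python
import numpy as np

def partial_dependence(model, grid_s, complement_samples):
    # For each value of the feature of interest, average the model output
    # over samples of the complement features:
    #   C_bar(x_s) = (1/N) * sum_i model(x_s, x_c_i)
    return np.array([np.mean([model(xs, xc) for xc in complement_samples])
                     for xs in grid_s])

# Hypothetical stand-in for the trained network: a smooth C-V-like response
net = lambda v, nd: nd / (1.0 + np.exp(-4.0 * (v + 1.0)))
volts = np.linspace(-2, 1, 7)             # feature of interest (voltage)
nd_samples = [0.5, 1.0, 1.5]              # complement-feature samples
pd_curve = partial_dependence(net, volts, nd_samples)
```

The same loop generalizes to bivariate plots such as figure 5 by sweeping a pair of features of interest and averaging over the rest.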

Sensitivity analysis
The partial dependence plots in the previous section are useful for providing insight into the average value of the prediction due to each feature. Another useful analysis is to quantify how sensitive the output is to changes in the features. With automatic differentiation we have direct access to the gradient of the capacitance with respect to the inputs. Using a similar approach to the partial dependence plots, we create a set comprised of every possible combination of features in the parameter space, compute the gradients, and find the interquartile range, median, and average values. The result is shown in figure 6, where for every voltage we have computed a distribution of partial derivatives of the capacitance with respect to each of the features. This enables both a qualitative and quantitative understanding of the extent of the effect that each feature has on the capacitance. For example, as previously discussed, figure 6(a) shows that changes in N_d,A have little impact on the capacitance under forward bias, and are most influential at low reverse bias. It also shows quantitatively that for an incremental change of N_d,A of 10^14 cm^-3, the capacitance will, on average, rise by 0.2 nF cm^-2 at about -2 V.
Of course, we can also study a specific case and look at the gradients to understand which parameter should be changed to meet a design requirement. For example, depending on the application it may be beneficial to reduce the capacitance under reverse bias to increase the speed of the device. Figure 7 shows an example of this analysis. The figure clearly shows that reducing N_d,A would have the largest effect in achieving a lower capacitance, but also offers information on the other features as well.
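This work uses automatic differentiation for the gradients; the same sensitivity analysis can be mimicked with central finite differences, shown here on a hypothetical smooth stand-in for the trained network:

```python
import numpy as np

def sensitivity(model, x, eps=1e-5):
    # Central-difference approximation to dC/dx_k for each input feature
    grads = np.empty_like(x)
    for k in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[k] += eps
        xm[k] -= eps
        grads[k] = (model(xp) - model(xm)) / (2 * eps)
    return grads

# Hypothetical smooth stand-in for the trained network
f = lambda x: np.tanh(x[0]) + 0.5 * x[1] ** 2
g = sensitivity(f, np.array([0.0, 1.0]))   # gradient at one operating point
```

Evaluating such gradients over many sampled operating points, and collecting the median and interquartile range at each voltage, yields distributions like those in figure 6; automatic differentiation simply makes the same quantity exact and cheap for a trained network.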

Convolutional neural network
The previous discussion established neural networks as a robust tool for predicting and analyzing device characteristics. In this section we consider the inverse case, where a convolutional neural network is trained to characterize the device's architectural properties. From this perspective, the neural network is used as an enhanced metrology technique for extracting more information from the C-V characteristic. The learning curve for training the convolutional neural network to predict N_d,A, N_d,B, N_d,C, and t_B is shown in figure 8, and the quality of the fit is summarized in table 2. Since the complexity of the model was reduced to accommodate the limited number of training samples, the validation error follows the training error without overfitting. The average and median errors in both the training and validation sets are low, with R² values near unity for all cases. With more samples and a more complex network, these errors could be reduced further.
The performance of the convolutional network is shown in figure 9. As shown in figure 9(a), the absorber doping density is accurately predicted for all cases within 10^14-10^17 cm^-3, with a majority of cases falling within 0.8 to 1.2× the expected value. While a 20% error in these outliers may seem high, to put it in perspective, it would mean that a doping density of 10^15 cm^-3 is over- or underestimated by only 2 × 10^14 cm^-3. Figure 9 also provides insight into cases where it is difficult to predict or obtain information about the features. Cases of low N_d,C and high t_B have wider distributions of predictions. For low N_d,C, the contact layer quickly depletes and the capacitance is constant without any distinctive features, similar to what was shown in figure 4(c). Similarly, a large value of t_B causes the C-V profile to flatten as the parallel plate capacitance from the barrier dominates, reducing the importance of distinctive aspects of the profile, as shown in figure 4(d).
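The predicted/expected ratio used in figure 9 is straightforward to compute for any evaluation set. The sketch below uses synthetic predictions with an assumed ±15% scatter (not the paper's results) purely to show how the within-0.8-to-1.2× fraction is obtained.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic evaluation set: expected absorber dopings over the studied
# range, with toy predictions scattered by up to ±15% (illustrative only).
expected = 10 ** rng.uniform(14, 17, size=500)            # cm^-3
predicted = expected * rng.uniform(0.85, 1.15, size=500)

ratio = predicted / expected
frac_within = np.mean((ratio > 0.8) & (ratio < 1.2))      # fraction inside 0.8-1.2x
worst_rel_err = np.max(np.abs(ratio - 1.0))               # largest relative error
```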

Conclusion
In this work, we introduced a methodology for applying machine learning to analyze and characterize semiconductor devices based on their electrical characteristics. Specifically, we looked at two cases using ANNs trained with C-V data obtained by drift-diffusion simulations of InAsSb barrier infrared detectors. First, we created a simple model to predict the capacitance based on supplied values for the absorber, barrier, and contact layer doping densities, and the barrier thickness. We demonstrated that this model provides a high quality fit, generalizes well to unseen data, and reproduces the behavior expected when considering the underlying device physics that determine the capacitance of the device. Moreover, we showed the usefulness of this approach in analyzing the global parameter space associated with the prediction through partial dependence plots. This approach was also shown to have potential use in helping with critical design decisions by analyzing the capacitance's sensitivity to each feature.
While capacitance is not typically the critical metric used when discussing photodetector parameters, it is a useful diagnostic tool, and it elucidates more about this particular type of infrared photodetector than about other competing designs. However, the methodology discussed here is agnostic of the characteristic, so while we focused on the C-V relationship, the approach could be extended to other figures of merit and used to optimize performance around different sets of features. For instance, a model could be created to include an additional metric, such as dark current, and used to understand the tradeoff between lowering capacitance and photodetector noise with the objective of meeting application-specific performance goals. Additional features could also be included to create a more comprehensive representation of the device for further optimization. Finally, it is worth noting that the methodology introduced in this paper is agnostic of the material or device under study, so a similar approach could be used to study other types of devices, such as those that use strained superlattices as absorbing layers, to understand their performance and the role of their associated design features. The key challenge in extending the methodology to other devices lies in the ability to generate or acquire additional data.
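One simple way to explore the capacitance/dark-current tradeoff mentioned above is a weighted scalarized objective over normalized metrics. This is a hypothetical sketch, not a method from the paper; the candidate designs and their (normalized) metric values are invented for illustration.

```python
# Hypothetical scalarized design objective trading off normalized
# capacitance against normalized dark current with a weight alpha.
def objective(c_norm, i_dark_norm, alpha=0.5):
    return alpha * c_norm + (1.0 - alpha) * i_dark_norm

# Invented candidate designs: (normalized capacitance, normalized dark current)
designs = {"A": (0.3, 0.9), "B": (0.5, 0.4), "C": (0.8, 0.2)}
scores = {name: objective(*metrics) for name, metrics in designs.items()}
best = min(scores, key=scores.get)  # lowest combined penalty
```

Sweeping alpha traces out the tradeoff between the two metrics; with models for both characteristics, gradient information could guide the search instead of enumeration.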
The second contribution from this work was the demonstration of using a trained convolutional neural network to analyze a barrier detector's C-V characteristics to predict N_d,A, N_d,B, N_d,C, and t_B. The model provides value as a characterization tool by alleviating constraints imposed by other analysis approaches and enabling the possibility of extracting additional diagnostic information from a measurement. The model was shown to perform well over a wide range of absorber, barrier, and contact layer doping densities, and barrier thicknesses. While this study was limited to a small number of device parameters, the complexity of the model could be increased by simulating additional cases to encompass more details about a device, including, for example, the thickness and composition of each semiconducting layer. Of course, inclusion of additional features may require a more complex network and acquisition of additional data, leading to a higher computational cost in training the model to achieve a reasonable accuracy. Another opportunity would be to train the model to characterize heterointerfaces and other semiconductor surface properties by characterizing band offsets, defect energy levels, or trap densities. In the future, it could prove useful to explore the feasibility of using data augmentation or synthetic data generated by the feedforward model, or both, to improve model generalization and prediction accuracy.
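The synthetic-data idea closing the paragraph above amounts to sampling device parameters, running them through the forward model, and perturbing the resulting curves with measurement-like noise. The sketch below uses a toy analytic stand-in for the trained feedforward surrogate; the noise level and parameter ranges are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the trained feedforward model C(V; params); in practice
# the surrogate from the first part of the paper would be used here.
def forward_cv(nd_a, t_b, voltages):
    return 1.0 / np.sqrt(1.0 + np.abs(voltages) * 1e15 / nd_a) + 10.0 / t_b

voltages = np.linspace(-3.0, 0.5, 64)
X, y = [], []
for _ in range(256):                      # synthetic training samples
    nd_a = 10 ** rng.uniform(14, 17)      # absorber doping, cm^-3
    t_b = rng.uniform(50, 500)            # barrier thickness, nm
    cv = forward_cv(nd_a, t_b, voltages)
    cv *= 1.0 + 0.01 * rng.standard_normal(cv.size)  # measurement-like noise
    X.append(cv)
    y.append((np.log10(nd_a), t_b))
X, y = np.array(X), np.array(y)           # augmented (C-V, labels) set
```

The augmented pairs could then be mixed into the convolutional network's training set, at far lower cost than running additional drift-diffusion simulations.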

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.