Classification of nucleic acid amplification on ISFET arrays using spectrogram-based neural networks

The COVID-19 pandemic has highlighted a significant research gap in the field of molecular diagnostics. This has brought forth the need for AI-based edge solutions that can provide quick diagnostic results whilst maintaining data privacy, security and high standards of sensitivity and specificity. This paper presents a novel proof-of-concept method to detect nucleic acid amplification using ISFET sensors and deep learning. This enables the detection of DNA and RNA on a low-cost and portable lab-on-chip platform for identifying infectious diseases and cancer biomarkers. We show that by using spectrograms to transform the signal to the time–frequency domain, image processing techniques can be applied to achieve reliable classification of the detected chemical signals. Transformation to spectrograms is beneficial as it makes the data compatible with 2D convolutional neural networks and yields a significant performance improvement over neural networks trained on the time-domain data. The trained network achieves an accuracy of 84% with a size of 30 kB, making it suitable for deployment on edge devices. This facilitates a new wave of intelligent lab-on-chip platforms that combine microfluidics, CMOS-based chemical sensing arrays and AI-based edge solutions for more intelligent and rapid molecular diagnostics.


Introduction
The COVID-19 pandemic has highlighted the need for rapid and accurate diagnosis of infectious diseases. AI-based edge solutions have shown significant promise towards the real-time classification of sensor data and have the potential to realise fully integrated sensing and processing systems which operate intelligently with their data. Lab-on-Chip (LoC) devices can bring tests to the point of need and therefore stand to benefit from the use of edge-based AI. Towards this goal, this work introduces a new methodology involving spectrograms and AI for portable CMOS-based LoC diagnostic platforms that can rapidly identify infectious diseases and cancers by using arrays of electrochemical sensors.
Infectious disease detection commonly relies on identifying two types of biomarkers. The first involves detecting antigens specific to the pathogen and is achieved by reading the signal associated with antigen-antibody binding. Lateral flow devices (LFD) are the most accessible readout technology, achieving low-cost diagnostics at the point of care thanks to their simplicity of use and manufacture, at the expense of test accuracy [1]. The second involves detecting nucleic acids using molecular methods to target and amplify a known sequence. Quantification can be achieved by analysing the speed of the reaction. Common nucleic acid amplification tests (NAATs) include the polymerase chain reaction (PCR) and loop-mediated isothermal amplification (LAMP). PCR is the gold-standard laboratory technique and is preferred for high-accuracy diagnostics. Both techniques are highly specific and commonly only result in amplification when their specific nucleic acid target sequence is present in the solution.
There are several readout methods associated with biomarker detection. Fluorescence methods are currently very common and well suited for applications in specialised laboratories, as they rely on bulky and expensive optical instrumentation [2]. However, they fall short of the requirements of point-of-care diagnostics in low-resource settings. In this context, electrochemical sensing in CMOS technology is a promising alternative that leverages the economies of scale of the semiconductor industry and combines scalability, miniaturisation and high accuracy. Ion-Sensitive Field-Effect Transistors (ISFETs) [3] are electrochemical sensors with the potential to combine the accuracy of molecular methods with the portability of the LFD by leveraging the evolution of CMOS technology in accordance with Moore's law [4]. ISFET sensors were first introduced by Bergveld in 1972 [3] by replacing the gate oxide of a transistor with an insulating membrane for sensing pH changes in the solution at the sensor surface. Since their implementation was achieved in unmodified CMOS technology [5], ISFETs have been adopted as an industry-standard method for next-generation DNA sequencing (Ion Torrent) [6] and have recently been reported for infectious disease and cancer diagnostics [7][8][9]. Coupled with LAMP, this technology allows for the detection of amplification in samples of small volume [10], with minor alterations to the assay (the reaction mix is only weakly buffered to allow for proton release during the reaction) and without the need for complex thermocycling. ISFETs also allow for the integration of instrumentation within the same system-on-chip. This technique has previously been used to mitigate device non-idealities (such as offset and drift) and further allow for accurate detection.
ISFETs suffer from non-idealities, with drift posing the main challenge to extracting the signal generated by nucleic acid amplification. Both the drift and the signal of interest lie within a frequency range of approximately 10 mHz to 100 mHz. Previous works have focussed on standard signal processing methods, identifying an inflection point in the time-series output signal as the amplification event [11,12]. However, this method is compromised by a high drift rate and low signal amplitude, and is highly sensitive to any instability or additional noise in the sensor output.
Deep neural networks are a popular tool in the field of healthcare, with medical imaging applications ranging from diagnostics to assistive surgery [13][14][15][16][17][18]. Deep learning [19] has become one of the greatest success stories in the field of computer vision [20]. However, a vast amount of data is required to effectively train neural networks [21,22]. This highlights a huge challenge in data collection for the deep learning community, as the crucial role of patient privacy and security needs to be respected [23]. Deep learning is helping physicians by demonstrating promising results for complex diagnostics in radiology [24], dermatology [25,26], ophthalmology [27], ureteroscopy [16] and pathology [28]. These advances bring forth the opportunity to extend deep learning solutions to point-of-care diagnostic devices for infectious diseases as illustrated in Fig. 1.
This paper proposes a framework to efficiently identify nucleic acid amplification from ISFET arrays, towards a fully integrated, portable diagnostic platform, using a novel combination of spectrogram processing and neural networks. It is the first proof-of-concept study for the use of deep learning techniques to improve on the accuracy of classification of ISFET-based nucleic acid amplification data by relying on its frequency spectrum. A dataset of nucleic acid amplification was selected from two categories of diagnostic assays, namely SARS-CoV-2 and three cancer biomarkers. To benchmark our work in the absence of earlier research on this dataset, a time-based approach is first proposed and then compared against the spectrogram approach to understand how neural networks can best visualise ISFET data for biomarker detection. Finally, an edge implementation with the potential to become a tool in routine diagnostics and future pandemics is proposed. The contributions of this work include the following:
• a benchmark for combining the field of deep learning with point-of-care lab-on-chip (LoC) diagnostics for infectious diseases and cancer
• a novel spectrogram-based approach that facilitates the application of image-processing ML techniques, including 2D-CNNs, for the classification of ISFET data
• an approach that can help train neural networks with a limited dataset of infectious diseases for high precision and accuracy
• a framework for machine-intelligent LoC diagnostic platforms that leverages the combined advances in the fields of spectrograms, deep learning and electrochemical-based detection to (1) improve upon the diagnostic accuracy of conventional methods, (2) enable transfer learning for faster deployment of tests, (3) reduce the bandwidth requirement for sensor data transfer, (4) reduce the carbon footprint, (5) lower the cost and (6) promote data privacy and security.

ISFETs
Various circuits have been proposed to achieve sensor readout, i.e. to translate the threshold voltage variation into a signal. An exhaustive review can be found in [29]. In the context of this paper, the ISFET interface leads to an output signal that is linearly related to pH. Measurement of pH in a solution allows for the detection of DNA amplification on-chip: this is because DNA amplification results in a release of protons during nucleotide incorporation (Eq. (2)). Hence, detecting a pH change indicates the amplification of DNA on the chip surface [7,30].
However, despite the great potential, ISFETs present a number of non-idealities that set great challenges for the interpretation of the data. Apart from the electrical noise, ISFETs suffer from certain sources of noise that are specific to the technology. Refer to [29] for an exhaustive discussion on the limitations of ISFET technology in relation to the physical properties of the sensor. In this context, the impact of the non-idealities on the output signal is described as:
• Trapped charge during fabrication at the interfaces of the sensing materials [31] causes an offset of the threshold voltage of the transistor. This offset induces a mismatch between sensors and may drive some out of the range of readout (Fig. 2(b)).
• Drift is a modification of the threshold voltage of the transistor over time, caused by the chemical interaction between the ISFET surface and the solution [32]. It is a complex phenomenon that is challenging to model mathematically because of its highly unpredictable behaviour. A dependence upon several factors has been shown, such as surface geometry and material, sensor size, pH of the solution and readout configuration [33]. In practice, drift results in a monotonic exponential decay in the signal, as shown in Fig. 2(b). The drift behaviour has been shown to demonstrate high stochasticity, and may often exceed the chemical signal in amplitude, adding to the challenge of output classification.
The limitations in ISFET performance motivate the integration of the sensors into large arrays [4]. This allows the collection of a large amount of spatio-temporal data for each experiment to compensate for the inaccuracies of the individual sensor outputs. The output can be interpreted as a series of frames of values, where each frame is equivalent to a chemical image and each sensor acts as a pixel. The data used in this paper is the output of a 78 × 56 sensing array fabricated in 350 nm CMOS technology, with 1 μm SiO2 and 1 μm Si3N4 on top as the sensing surface, obtained using the Serial Peripheral Interface (SPI) as shown in Fig. 2(c).

AI for ISFETs
AI techniques are slowly becoming a topic of interest in the field of ISFET sensing. Whilst circuit design has been the foremost method of improving sensor performance [29], recent efforts have been reported using deep learning networks and neuromorphic architectures to combat ISFET non-idealities. Multi-layer perceptrons, linear regression, support vector machines, decision trees, random forests, LSTMs and GRUs have been proposed to compensate for temperature and temporal drift in ISFETs [34,35]. Virtual training feature generation and subsequent training of SVMs and ANNs have also been proposed for separating light and pH signals in dual-gated ISFETs [36]. An alternative approach is embedding neuromorphic architectures and using backpropagation as part of the sensor front-end to help compensate for non-ideal effects [37] by using the spatial correlation among neighbouring sensors [38]. While the majority of these efforts have focused on improving sensor performance, we propose a method that extends the benefits of AI to the classification of ISFET data.

Spectrograms and AI
Section 2.1 presents a possible analogy between the ISFET array output and a set of frames, i.e. a video. In contrast, we propose an alternative analogy between an ISFET output and an audio signal: where each sensor output is treated independently as a time series. This allows us to explore an entirely new set of processing techniques for experiment classification. When treating audio signals, the choice of how to treat a signal is non-trivial. The signal can be processed in the original time-domain or transformed with a number of methods, including Mel Frequency Cepstral Coefficients (MFCCs), Magnitude spectra and Spectrograms [39,40]. Spectrograms have been used for animal audio classification in combination with Siamese neural networks (SNN), clustering techniques and Support vector machines (SVM) [41]. In the context of cough recognition from audio recordings, Mel-spectrograms have been proposed as data pre-processing technique for CNN classification [42].
The same question has been raised for ECG signals, where 99% accuracy in arrhythmia classification has been shown by transforming the signal into the time-frequency domain with Short-Time Fourier Transform (STFT)-based spectrograms and using 2D image processing techniques for classification [43]. Similarly for EEG signals, stacked multi-channel EEG spectrograms have been used for training DCNNs for REM Behaviour Disorder (RBD) diagnosis [44]. EEG spectrograms have also been used for Autism Spectrum Disorder classification using SVM classifiers [45]. The successful application of these algorithms to biological signals provides the justification for the approach proposed in this paper, that is, employing spectrograms and CNNs on ISFET outputs for classification.

Data collection
Nucleic acid amplification tests, such as the quantitative polymerase chain reaction (qPCR), result in amplification events where a specific nucleic acid sequence is detected. These amplification events result in the formation of large quantities of DNA. With each addition of a nucleotide to a double-stranded DNA sequence, one proton is released into the solution [46]. qPCR usage is limited within point-of-care settings, predominantly due to the requirement for thermal cycling [47]. Loop-mediated isothermal amplification (LAMP) is a rapid, isothermal and quantitative amplification technique [48]. The addition of reverse transcriptases to LAMP assays (RT-LAMP) also enables the detection of RNA sequences. As a result, RT-LAMP has been utilised in the detection of SARS-CoV-2 mRNA, prostate cancer mRNA and influenza mRNA for diagnostics [7,49,50]. Augmentation of LAMP assays to generate a pH readout has previously been established [51]. This work utilises augmented RT-LAMP assays (RT-pHLAMP) for detecting the gene from SARS-CoV-2. The specificity of this assay has been robustly validated [7]. Cancer markers 1 and 2 (mRNA) [8] and 3 (DNA) [9] show potential as circulating biomarkers for cancer diagnosis and prognosis. As such, RT-pHLAMP and pHLAMP assays for these markers were implemented on the ISFET sensing array; their data is also included in the dataset.
Microfluidic manifolds were used to house the reaction mixtures over the ISFET sensing array. SARS-CoV-2 samples were contained within a bio-compatible resin (MED-AMB-10) with two 5 μL chambers, one with a positive sample (i.e., one that would release protons) and one with a negative sample. The RT-pHLAMP and pHLAMP assays for cancer marker detection were housed in an acrylic manifold with one 20 μL chamber. Positive and negative samples in this instance were run separately. Each chamber of each assay type was affixed to the ISFET sensing array with an adhesive gasket (double-sided smooth lamination filmic tape, Tesa®) to avoid leakage of the RT-pHLAMP or pHLAMP solution. The reference electrode (AgCl/Ag, 0.03 mm chloridised silver wire) was secured between the adhesive gasket and the ISFET sensing array. A 100 mV voltage spike from the reference electrode occurs at the beginning of the experiment. This indicates whether the reference electrode is connected to the circuit and shows which pixels are responding to variations in pH. The reaction was heated to 63 °C with a Peltier heating module in contact with the chip.
The chambers were initially filled with nuclease-free water for 700 s to set a common voltage for the array surface. At 700 s the water was replaced with the RT-pHLAMP assay solution. All positive reactions were then run for a further 30 min. SARS-CoV-2 negative reactions were run for 30 min, whereas negative cancer biomarker assays were run for 20 min. Data was recorded in real time on a mobile phone. Table 1 presents the dataset distribution between training and testing based on the experiments.

Data pre-processing
Data preparation is required to address sensor non-idealities prior to network classification. The steps proposed in this method are individually discussed in the following paragraphs and illustrated in Fig. 3: identification of active pixels, extraction of relevant time interval, trapped charge compensation, spectrogram generation, and high-frequency noise compensation.
Identification of active pixels. This filtering step is computationally inexpensive and selects the sensors that carry significant information about the chemical reaction in the well. Pixels are generally considered inactive if they are covered by the microfluidic manifold or are out of the readout range due to non-ideal effects like trapped charge and drift. Active pixels are identified as the sensors that respond to the variation in reference electrode voltage at the beginning of each experiment. Moreover, the offset caused by uneven trapped charge in every sensor causes some of the pixels to have an output that falls outside the readout voltage range of 0 to 1 V. These pixels are disregarded.
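The two filtering criteria described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the processing code used in this work: the function name `find_active_pixels`, the array layout (time × pixels), the `spike_window` argument and the 50 mV response threshold are all assumptions introduced here; only the 0 to 1 V readout range and the existence of the reference-electrode spike come from the text.

```python
import numpy as np

def find_active_pixels(frames, spike_window, min_step=0.05, v_min=0.0, v_max=1.0):
    """Return a boolean mask with True for pixels considered active.

    frames       : array of shape (T, P), one column per pixel, in volts.
    spike_window : (start, stop) sample indices just before and just after
                   the reference-electrode spike (hypothetical argument).
    min_step     : minimum response to the spike, in volts (assumed value).
    v_min, v_max : readout range; 0 to 1 V as stated in the text.
    """
    frames = np.asarray(frames, dtype=float)
    start, stop = spike_window
    # Criterion 1: the pixel output must move when the reference voltage moves.
    step = np.abs(frames[stop] - frames[start])
    responds = step >= min_step
    # Criterion 2: the pixel must stay strictly inside the readout range,
    # otherwise the offset has pushed it onto one of the rails.
    in_range = np.all((frames > v_min) & (frames < v_max), axis=0)
    return responds & in_range
```

Both tests are element-wise comparisons, which is consistent with the description of this step as computationally inexpensive.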

Relevant time interval. As highlighted in Section 3, the relevant section of the experiment has to be identified after the sample is inserted in the well. Any earlier signal is then disregarded as irrelevant. Similarly, the underlying biological properties allow the end of the experiment to be set after 450 samples. In fact, it is possible to infer from the standard curve of COVID-19 and Cancer Biomarkers 1, 2 and 3 LAMP amplification that any positive sample would lead to detectable DNA amplification within the first 22 min [7]. Despite the uneven sampling across experiments, selecting 450 samples always captures the relevant interval whilst avoiding fitting irrelevant data after the 22 min timestamp.
Trapped charge compensation. Trapped charge compensation consists of subtracting the initial offset from the signal to ensure that all experiments start at the same voltage. Offset subtraction does not completely mitigate the effect of trapped charge, which also influences the drift behaviour and pH sensitivity of the ISFET. These dependencies have not been fully characterised, and any additional compensation would be non-trivial because of the lack of complete mathematical models that account for them. They are therefore not addressed in this work.
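The interval extraction and offset subtraction steps can be illustrated together. The sketch below is an assumption-laden example, not the authors' code: `extract_and_zero` and `start_index` (the index of the first sample after the assay solution is loaded) are hypothetical names, while the 450-sample length is taken from the text.

```python
import numpy as np

def extract_and_zero(signal, start_index, n_samples=450):
    """Keep n_samples after sample insertion and subtract the initial offset.

    signal      : array of shape (T,) for one pixel or (T, P) for many pixels.
    start_index : first sample after the sample is inserted (hypothetical).
    n_samples   : length of the relevant interval; 450 as stated in the text.
    """
    signal = np.asarray(signal, dtype=float)
    segment = signal[start_index:start_index + n_samples]
    if segment.shape[0] < n_samples:
        raise ValueError("recording too short for the requested interval")
    # Subtracting the first retained sample makes every trace start at 0 V.
    # For a (T, P) input this is done per pixel through broadcasting.
    return segment - segment[0]
```

Only the constant offset is removed here; any effect of trapped charge on drift or sensitivity is left untouched, as discussed above.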
Spectrogram generation. To explore the frequency content of a discrete ISFET output signal $x(n)$ of length $N$, the average behaviour of the entire signal could be explored by taking the DFT over the entire period $N$. However, a more appropriate tool for the non-stationary ISFET output is to look for local information by dividing the signal into small overlapping windows, where the signal can be assumed to be stationary. The time-frequency spectrogram is then found with the STFT as

$$X(k, m) = \sum_{n=0}^{W-1} x(n + mH)\, w(n)\, e^{-j 2\pi k n / W} \qquad (3)$$

where $W$ is the window size, $H$ is the shift between successive windows, $k \in [0 : W/2]$, $m \in [0 : \lfloor (N - W)/H \rfloor]$ and $w(n)$ is a Hanning window

$$w(n) = 0.5\left(1 - \cos\frac{2\pi n}{W - 1}\right), \quad n \in [0 : W - 1] \qquad (4)$$

with window size $W = 22$ and window shift $H = 11$.
The spectrogram is obtained individually for every pixel, so that each 450-sample signal is transformed into a 12 × 39 spectrogram array $X(k, m)$, with frequency estimates $f \in [0 : f_s/2]$ and a resolution of $f_s/W$, where $f_s$ is the sampling frequency. The spectrogram represents the ISFET readout in a redundant manner, such that each time sample is used twice to obtain a frequency estimate. This in turn allows training a neural network that can learn to classify these features. The underlying idea is that the frequency of drift at the time of amplification is lower than that of the sigmoid indicating amplification, thus aiding in classification.
High-frequency noise compensation. The chemical reaction associated with DNA hybridisation is a slow process, hence the ISFET signal is not expected to show abrupt changes. As such, the high frequencies are not expected to carry any significant information for classification. Therefore, only the 10 × 39 section of the spectrum corresponding to the lowest frequencies is used as input to the CNN, to avoid fitting data that is known to be irrelevant for classification.
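The array sizes quoted above follow directly from the window parameters: a 22-sample window gives 22/2 + 1 = 12 non-negative frequency bins, and a shift of 11 over 450 samples gives ⌊(450 − 22)/11⌋ + 1 = 39 frames. The NumPy sketch below reproduces these shapes. It is an illustration under assumptions: the function names are hypothetical, and taking the magnitude of the STFT (rather than power or a logarithmic scale) is a choice made here that the text does not specify.

```python
import numpy as np

def stft_spectrogram(x, window_size=22, hop=11):
    """Magnitude STFT of a 1-D signal, returned as (frequency, time)."""
    x = np.asarray(x, dtype=float)
    if len(x) < window_size:
        raise ValueError("signal shorter than one window")
    n_frames = (len(x) - window_size) // hop + 1
    window = np.hanning(window_size)  # symmetric Hanning window
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([x[m * hop:m * hop + window_size] * window
                       for m in range(n_frames)])
    # rfft keeps only the non-negative frequencies: window_size // 2 + 1 bins.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum).T

def keep_low_frequencies(spec, n_bins=10):
    """Discard the highest-frequency rows, keeping the first n_bins."""
    return spec[:n_bins, :]
```

For a 450-sample trace, `stft_spectrogram` returns a 12 × 39 array and `keep_low_frequencies` reduces it to the 10 × 39 network input.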

Implemented networks and results
The aim is to implement a neural network for binary classification, using both time-domain and spectrogram-based approaches to train neural networks. All networks are trained and compared using the same seed value, learning rate, number of epochs and batch size, with Adam as the optimiser, for a fair comparison. For the chosen spectrogram-based network, the hyperparameter optimisation is reported in Appendix C. The first approaches explored to achieve the binary classification of the ISFET data are based on processing the time-domain data directly. Sophisticated methods have been proposed for time-series classification in the literature, such as FCN [53], InceptionTime [52] and RESNET [53], which are reported to perform better than image classification using CNN-based models. This was the motivation behind the implementation of the following time-based networks on the time-series data from ISFETs:
• 1D-DCNN: a convolutional network was implemented as shown in Fig. 4a, showing some improvement in performance with the addition of convolutional filters. From Table 2, it can be inferred that 1D-DCNN has the highest accuracy for all the time domain approaches but the lowest recall, suggesting a very high probability of false negatives.
• FCN: this was implemented as shown in Fig. 4b. From Table 2, it can be inferred that FCN not only takes a longer training runtime but also has a very low recall, making it, like the 1D-DCNN, unsuitable for medical applications.
• InceptionTime: this was implemented as shown in Fig. 5. From Table 2, it can be inferred that InceptionTime has the highest accuracy of all the time domain approaches. However, the extremely high training runtime and the large number of trainable parameters make it unsuitable for edge implementations.
• RESNET: this was implemented as shown in Fig. 6. From Table 2, it can be inferred that while the RESNET has a more balanced performance across all the metrics, it still needs a longer training runtime and reaches an accuracy of only 66.89%.
• Autoencoder: this was implemented as shown in Fig. 7 and trained with the Mean Absolute Error [54] loss function, given in Eq. (5),

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (5)$$

where $y_i$ is the predicted value, $\hat{y}_i$ is the true value and $n$ is the number of samples. This improved the recall of the network but led to low accuracy and precision, essentially showing a bias towards the classification of samples as positive. This is because the network is trained only with the positively labelled data and then applies the principle of anomaly detection, using a threshold to treat the negative samples as anomalies. The encoder input, decoder output and reconstruction error for eight pixels in the test set are shown in Fig. 8.
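The anomaly-detection principle used with the autoencoder can be made concrete with a short sketch. The autoencoder itself is omitted; the functions below only show how a per-signal Mean Absolute Error between input and reconstruction is turned into a class label. The threshold rule (mean plus one standard deviation of the training errors) and all function names are assumptions chosen for illustration, since the text does not state how the threshold was set.

```python
import numpy as np

def mae_per_sample(inputs, reconstructions):
    """Mean absolute reconstruction error for each signal (row)."""
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    return np.mean(np.abs(inputs - reconstructions), axis=1)

def fit_threshold(train_errors, n_std=1.0):
    """Hypothetical rule: mean plus n_std standard deviations of the
    errors measured on the positive-only training set."""
    train_errors = np.asarray(train_errors, dtype=float)
    return float(np.mean(train_errors) + n_std * np.std(train_errors))

def classify(errors, threshold):
    """1 = positive (reconstructed well), 0 = negative (anomaly)."""
    return (np.asarray(errors) <= threshold).astype(int)
```

Because every signal that reconstructs reasonably well is labelled positive, a permissive threshold biases the classifier towards positives, which matches the high-recall, low-precision behaviour reported above.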
The overall unsatisfactory classification performance of the time-based networks reported in Table 2 resulted in the decision to feature engineer the sensor data into spectrograms, thus allowing the exploration of image-based approaches for training neural networks. A 2D-CNN was designed for classification as shown in Fig. 9. This consists of two convolutional layers with ReLU activation function, with each convolutional layer being followed by a 2 × 2 Max-Pooling function. The output is then flattened and passed through a dense layer with ReLU activation function. The output layer is based on a Sigmoid activation function for the ultimate classification of a healthy or infected patient. Because the aim is to solve a binary classification problem, the loss function is Binary Cross-entropy [55], reported in Eq. (6), with mean-based threshold classification in the neural networks:

$$L = -\frac{1}{n}\sum_{i=1}^{n} \left[ \hat{y}_i \log y_i + (1 - \hat{y}_i) \log (1 - y_i) \right] \qquad (6)$$

where $y_i$ is the predicted probability, $\hat{y}_i$ is the true label and $n$ is the number of samples.
The hyperparameter optimisation performed on the network led to the decision to train with a batch size of 16 for 40 epochs, with an initial learning rate of 0.001 and a step decay of 75% every 10 epochs. As the model is intended for point-of-care diagnostics on a microcontroller, the neural network implementation is aimed at compact structures that are compatible with the application and that can additionally be trained quickly to be deployed in emergencies. Furthermore, quantisation-aware training was performed.
Fig. 9. DCNN using spectrograms.
Training approach. Two conflicting arguments were considered in the training set preparation. On one hand, as seen in Section 2.1, the single-pixel signal is highly non-ideal: that is why experiments are often treated as the average of the active pixels, essentially using spatial averaging to compensate for the noise of the isolated signal. On the other hand, this is in contrast with the requirement for large amounts of data to successfully train a network. The proposed method thus uses all pixel data as independent samples to train the model. The underlying assumption is that all sensors in contact with a positive sample will eventually detect the reaction at some point. Thus, we present a data augmentation approach for LoC devices to help train with fewer experiments. The limited number of experiments available (83) led to the decision to use the single-pixel data in the model instead of the experiment average, giving 70,208 sets for training and 17,552 for testing. Each experiment comprises thousands of pixels that each provide a reading independently; the pixel signals from the same experiment do present some level of correlation, which introduces some redundancy in the dataset, but they were sufficient for training. In this method, the dataset is kept balanced during training by selecting an equal number of positive and negative active pixels. All ISFET pixels are independent and only share resources for communication protocols. The model is thus trained and validated using a 5-fold cross-validation strategy, treating each pixel as an independent training data point with its own spectrogram. This is followed by testing on Dataset-1, which comprises 5 new patient samples: 3 mRNA-based samples and 2 negatives. To further test the generalisability, an Inter-biomarker precision is defined for testing on 4 additional cancer samples of cancer marker 3, which is a DNA-based sample.
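Two of the training choices described above lend themselves to a short illustration: the step-decay learning-rate schedule and the balancing of positive and negative pixels. The sketch below is not the training code used in this work. In particular, "a step decay of 75%" is read here as multiplying the rate by 0.75 every 10 epochs, which is an assumption; if the intended meaning is a 75% reduction, `drop=0.25` should be passed instead. The function names and the random seed are hypothetical.

```python
import numpy as np

def step_decay(epoch, initial_lr=0.001, drop=0.75, epochs_per_drop=10):
    """Learning rate at a given epoch (0-based) under step decay.

    drop is the multiplicative factor applied every epochs_per_drop
    epochs; 0.75 is an assumed reading of the quoted schedule.
    """
    return initial_lr * drop ** (epoch // epochs_per_drop)

def balance_pixels(labels, seed=0):
    """Indices of a class-balanced subset of per-pixel samples.

    labels : 1-D array of 0/1 labels, one per active pixel.
    Randomly undersamples the larger class so both classes are equal.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    return np.sort(keep)
```

Over 40 epochs the schedule takes four values, one per block of 10 epochs, and the balanced index set can be used to select the spectrograms passed to each cross-validation fold.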
Fig. 11. Proposed edge implementation framework for new infectious diseases. Note that here edge devices will be microcontrollers like the Arduino Nano 33 BLE Sense.
Testing approach. 5-fold cross-validation is used for testing, resulting in an 80/20 split for train/test sets, with 21,105 pixels in turn used to evaluate the performance of the model. Performance metrics are introduced in Appendix B of the supplementary material and values are reported for each fold in Table 2. Error bar plots with tolerance intervals with respect to the random seed are also provided in Appendix C. Appendix C presents the hyperparameter optimisation, with hyperparameter tuning done through the scikit grid search. The performance metrics are reported across multiple hyperparameter values and the order of optimisation is also defined. Fig. 10 shows the t-SNE plot of the features from the penultimate layer in the DCNN applied to the time series (left) and to the spectrogram (right). This shows that the features extracted from the time-frequency data allow for better distinguishability of the negative and positive experiments, respectively labelled 0 and 1.
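For readers without access to the supplementary material, the standard definitions of accuracy, precision and recall for a binary classifier can be sketched as follows. These are the textbook definitions and are assumed, not confirmed, to match those of Appendix B; the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

A classifier that misses most positives can still score a high precision, which is why a low recall, i.e. many false negatives, is treated in this work as disqualifying for medical use.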
Although the spectrogram-based DCNN has proven to be a powerful approach for single-pixel classification, in the context of LoC diagnostics the interest is in the classification of data from entire experiments, that is, identifying samples as positive or negative using data from the whole array. Table 1 shows that SARS-CoV-2, Cancer Biomarker 1, Cancer Biomarker 2 and Negatives are used for both training and testing. On the other hand, the 4 samples of Cancer Biomarker 3 are used only for testing. This gives rise to the Inter-Biomarker Precision metric in Table 2. Here, the model is evaluated on the basis of single-pixel classification and then a majority voting approach is implemented to determine whether the entire experiment is classified as positive or negative. This was tested across the 5 folds for positive samples of Biomarker 3. The percentage of active pixels classified as positive in the positive Cancer Biomarker 3 experiments was 59.01% on average, with a standard deviation of 5.57%, with the majority voting always in favour of a positive outcome. This shows that the algorithm is robust enough to allow diagnostics on data not directly used in the training, thus potentially resulting in a universal diagnostics platform where only the assay preparation needs to be updated.
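The majority voting step that turns per-pixel predictions into an experiment-level result can be sketched as below. The function name, the strict-majority rule and the handling of an exact tie as negative are assumptions made for this example; the text only states that a majority vote over the single-pixel classifications is used.

```python
def classify_experiment(pixel_predictions, threshold=0.5):
    """Experiment-level label from per-pixel 0/1 predictions.

    Returns (label, positive_fraction). The experiment is called positive
    when more than `threshold` of the active pixels are classified as
    positive; a tie is treated as negative (assumed tie-break rule).
    """
    preds = list(pixel_predictions)
    if not preds:
        raise ValueError("no active pixels to vote")
    positive_fraction = sum(preds) / len(preds)
    return int(positive_fraction > threshold), positive_fraction
```

With roughly 59% of the active pixels voting positive, as reported for Cancer Biomarker 3, the vote resolves to a positive experiment even though a large share of individual pixels is misclassified.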

Edge implementation
When treating medical data, such as patient samples in the proposed application, patient privacy and security are essential to protect against malicious intent. Therefore, the use of edge devices is critical to support diagnostic testing while respecting patient confidentiality. The quantisation-aware training of the proposed spectrogram-based DCNN model showed a limited drop in performance, achieving an accuracy of 84.57% in the TFLite implementation, thus showing good metrics for deployment on edge devices. The converted TFLite-Micro model of the spectrogram-based DCNN is 30 kB, making it suitable for deployment on most microcontrollers. The microcontroller of choice is the Nano 33 BLE Sense, which is compatible with TFLite-Micro and can have a standby current of just 0.9 nA. It is expected that the edge implementation of the neural network-based inference engine will make a negligible contribution to power consumption compared with the heater required for the LAMP reaction, which is present on the current system and accounts for more than 80% of the power consumption. This will be tested in future work with a full system implementation. Fig. 11 illustrates a framework for the use of the proposed platform in a generic diagnostics context by leveraging the power of TinyML [56].

Conclusion
This paper presents a novel spectrogram-based DCNN approach that leverages the redundant feature space representation achieved by spectrograms for the classification of ISFET array data of nucleic acid amplification on LoC platforms. The proposed method provides a significant improvement from 57.58%, 60.33%, 55.8%, 57.24%, 69.67%, and 69.05% for the time-domain-based ANN, 1D-DCNN, autoencoder, FCN, RESNET, and InceptionTime approaches respectively, to 84.84% for the spectrogram-based 2D-DCNN approach. Our proposed method is the preferred approach because it improves accuracy, reduces the bandwidth requirement and allows faster time to market as a more generalised approach without the need for complex feature engineering, thus allowing for more efficient diagnostics. Furthermore, medical data from ISFET sensors is very different from the EMG-based and ECG-based time-domain data that has been used to prove the accuracy of time-domain-based networks in the past. This is because the signal indicating an infected sample is obscured by drift, offset and trapped charge in the CMOS-based ISFET sensors, making it difficult to identify a consistent feature in the time-series data. In addition, we have proposed a preprocessing approach to augment the dataset, allowing researchers to train with limited datasets of electrochemical readouts from nucleic acid amplification for infectious disease diagnosis. We also present an edge implementation framework that can help provide tests with high sensitivity and specificity to patients while addressing the need for data security. The accurate classification of Cancer Biomarker 3, which is excluded from the training set, indicates that the model can be generalised for future diagnostic applications. This work sets a benchmark for future AI approaches to point-of-care diagnostic devices using potentiometric sensors and presents spectrograms as a benchmark transform for implementing neural networks in the field of diagnostics for infectious diseases and cancers.

Limitations and future work
• Although some methods are used for active pixel identification, the performance of individual sensors has not been characterised in the overall positive and negative experiments. This means that more time spent on the annotation of individual sensor data could result in higher accuracy of the models.
• The pre-processing of data needs to be converted from Python to C++ for integration with microcontrollers. We intend to work on this and show real-time results in the future.
• Treating the sensors as pixels forming electrochemical image frames can help us apply image processing techniques without much pre-processing. This would require a large amount of frame-wise annotation and will be pursued in the future to compare performance.
• The ISFET output signal is low frequency and has information in terms of time, frequency and amplitude. This makes a time-frequency and amplitude transform like the spectrogram a well-suited approach for ISFET-based diagnostics, allowing the authors to present it as a benchmark. We expect to carry out future work to identify the pros and cons of alternative transforms. Among the alternative transforms, Wigner-Ville [57,58] and scalograms [59] have shown promise in the past in biomedical applications. However, Wigner-Ville is known to introduce cross terms between several frequency components. As is visible in the signal from ISFETs, there is a large number of frequency components residing in the low-frequency region, making Wigner-Ville unsuitable for this application. Scalograms, on the other hand, are computationally expensive and are not suitable for implementation on microcontrollers. Currently, the constant-length window of the STFT used for spectrograms appears to be the more suitable candidate.
• The models used in this paper have been chosen keeping in mind that the size of the model should be compact enough for deployment on a Nano 33 BLE. This means an effort will be made in the future to explore more sophisticated neural networks such as recurrent neural networks (RNNs) and transformer-based models.
P. Tripathi et al.

Declaration of competing interest
None Declared.