Label-free protein detection using terahertz time-domain spectroscopy

Protein analysis is fundamental to understanding the mechanisms of complex biological processes. As one of the most widely used techniques for determining protein species and content, the protein dot blot aids biological research but requires corresponding antibodies for labeling. A label-free detection method based on terahertz time-domain spectroscopy (THz-TDS) is proposed and demonstrated to improve this traditional technique. A membrane loaded with protein samples is scanned directly with a transmission THz-TDS system for spectral imaging. Different kinds of proteins can be distinguished by the refractive index extracted from the THz transmission spectrum, and the intensity (shade) of the image formed from the THz transmission spectrum supports quantitative protein detection. The feasibility of this new protein assay is demonstrated by systematic testing with actual samples prepared following the dot-blot protocol.


Introduction
Proteins play an important role in all living organisms. As enzymes [1], antibodies [2], signaling molecules [3], transport molecules [4], and so on, they are organic macromolecules that participate in the formation of cells and are principally responsible for the life activities of the organism. The quantity of different kinds of proteins is an important index in protein function analysis, clinical diagnosis, recovery monitoring, quality testing of biological products, etc. [5]. Common methods for determining protein quantity include the Kjeldahl method [6], the Biuret method [7], the ultraviolet absorption method [8], the Bradford method [9], the Lowry method [10], and the protein immunoblot [11]. Among these, the protein immunoblot, as the most significant semi-quantitative analysis method, is conventionally used in research on gene expression at the protein level, detection of antibody activity, and early diagnosis of disease, owing to its high specificity and sensitivity as a solid-phase immunoassay [12]. It exploits the specific recognition of a protein by its corresponding antibody to determine the presence or absence of that protein, together with its quantity. Generally, the use of antibodies involves several sophisticated and tedious processes that are time-consuming, labor-intensive, and costly [13,14]. A label-free method does not require antibodies and thus avoids these processes; it may also be the last resort when the corresponding antibodies are hard or even impossible to obtain. A reliable, label-free protein analysis method would therefore improve detection efficiency and save time and money.
The terahertz (THz) region of the electromagnetic spectrum ranges from tens of gigahertz (GHz) to several THz. The vibrational and rotational energy levels of large molecules, especially biological macromolecules, mostly lie in the THz band [15,16]. Biomolecular vibrations and rotations involve weak intra- and intermolecular hydrogen bonds, van der Waals forces, conformational changes, and non-bonded hydrophobic and hydrophilic interactions [17,18]. Moreover, THz radiation excites these vibrations and rotations, which are revealed in the refractive index spectrum and the absorption (or transmission) spectrum of biological molecules according to their individual symmetries and structures in the THz frequency range [19,20]. As an emerging tool for examining biomolecular structure and characterizing the absorption properties of biological materials, THz spectroscopy is an appropriate and effective candidate for identifying the structures and physical properties of macromolecules [21]. The refractive index, which is closely related to the electromagnetic properties of a material, characterizes the alignment of ions inside it, so an accurately measured refractive index can indirectly reflect conformational changes and other large-scale deformations involving charge movement and relocation. Several previous works have reported measurements of the dielectric functions of DNA components, such as thiabendazole and bactericides [22,23]. Proteins of different species have diverse amino acid structures that determine their individual spatial conformations and electromagnetic characteristics [24], such as refractive index and absorption coefficient, which are therefore directly related to the corresponding protein species.
Absorbance of a medium in the THz band arises mainly from resonance absorption and partly from the vibration and rotation of molecules and the interactions between them [15,25]. The greater the amount of the medium, the higher the energy loss caused by resonance absorption, and the more evident the particular vibrations, rotations, and interactions [26]. The THz energy transmitted through the medium, relative to that transmitted through the background, indirectly reflects the absorbance of the medium. The capacity of THz radiation to detect proteins qualitatively and quantitatively has been confirmed by Xie et al. [27] and Wang et al. [28]. Hence, the transmitted THz energy provides a way to probe the microscopic mechanisms of the medium and thereby obtain the protein quantity.
THz time-domain spectroscopy (THz-TDS) systems measure the THz electric field (both amplitude and phase) over a wide frequency range, rather than intensity or phase alone, with picosecond temporal resolution. In our method, the liquid protein solution to be measured was dropped onto nitrocellulose (NC) blotting membranes. After drying in a drying oven to eliminate the influence of bulk water, the membrane was scanned by THz-TDS in the through-transmission imaging mode. Information about protein species and protein quantity was obtained from the frequency-dependent refractive index extracted from the transmission spectrum and from the image formed by the average integrated relative energy of the THz radiation transmitted through the membrane. The refractive index provides the basis for differentiating kinds of proteins, and the intensity (level of shade) of the image is directly related to the quantity of protein participating in the absorption. Our experimental results agree with the dot blot results, which confirms the feasibility of the THz-TDS protein assay. Our method not only distinguishes the different kinds of proteins and acquires the protein quantity distribution, but also saves about 6 hours of processing time on average compared with the standard procedure of the conventional protein immunoblot [11], in addition to being label-free and nondestructive.
The remainder of this paper is organized as follows. The THz-TDS system used in our experiments and the sample information are introduced in Sec. 2. In Sec. 3, the theoretical basis of THz spectral imaging is described, and experimental results are presented and discussed. Sec. 4 presents concluding remarks and a summary.

THz-TDS system
The robust THz-TDS system deployed in our research is the T-Gauge 5000 unit [29] from Advanced Photonix, Inc. The schematic diagram of the THz-TDS system in the through-transmission mode is shown in Fig. 1. The femtosecond laser pulse produced by the Sapphire oscillator has a duration of 80 fs, a central wavelength of 1064 nm, and a repetition rate of 100 MHz; the average power of the probe beam is 20 mW. The system can work in both the reflection and the through-transmission modes, in which the transmitter and receiver are installed on a travelling rail. Regardless of the operating mode, the system provides a spectral bandwidth ranging from 0.1 to 3.5 THz, a signal-to-noise ratio better than 70 dB at the low-frequency end (down to a few tens of GHz), and not less than 35 dB at the high-frequency end. The sampling interval of the THz-TDS is 0.1 ps, the spectral resolution is 12.5 GHz, and the maximum scanning rate is 1 kHz. The beam width of the THz-TDS imaging system is about 10 mm in diameter at the focal spot, and the actual imaging resolution at 1 THz is about 1 mm [30].
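The sampling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the stated spectral resolution is simply the reciprocal of the time-domain scan window (the usual discrete Fourier transform relation, with no zero-padding); the variable names are illustrative and do not come from the instrument software.

```python
# Assumption: spectral resolution = 1 / (scan window), no zero-padding.
sampling_interval_ps = 0.1      # stated sampling interval
spectral_resolution_ghz = 12.5  # stated spectral resolution

# 1 / GHz = 1 ns = 1000 ps, so the implied scan window in picoseconds is:
window_ps = 1e3 / spectral_resolution_ghz

# Number of time samples that fill that window at the stated interval.
n_samples = round(window_ps / sampling_interval_ps)

# Nyquist frequency in THz (1 / ps = 1 THz).
nyquist_thz = 1.0 / (2.0 * sampling_interval_ps)

print(f"scan window : {window_ps:.1f} ps")
print(f"samples     : {n_samples}")
print(f"Nyquist     : {nyquist_thz:.1f} THz")
```

Under this assumption the Nyquist frequency lies above the quoted 3.5 THz upper band edge, so the sampling interval does not limit the usable bandwidth.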

Sample preparation
Optical photographs of Membrane 1 and Membrane 2, onto which the protein solutions were dropped as in the standard protein immunoblot method, are shown in Fig. 2 and Fig. 3, respectively. The membrane used in the experiment was a nitrocellulose (NC) blotting membrane purchased from BIO-RAD (Texas, USA), made of 100% nitrocellulose with a pore size of 0.2 µm. Its thickness was 0.148 ± 0.001 mm. The proteins under investigation are Protein rn21, Protein rn22, Protein rn28, Protein n42, Protein n43 and Protein n53, which are proteins encoded by new genes found in thermophiles from hot springs in Tengchong, Yunnan. Clear and detailed properties of the protein samples, such as amino acid sequence information, concentration, and loading quantity, help in analyzing the THz spectral signature of the tested samples. In addition, the chosen protein samples must have corresponding antibodies and be testable by traditional methods in order to verify the reliability of the new method. The samples used in this study meet these requirements. The molecular masses based on the gene sequences are given in Table 1. Each of the six proteins was prepared as five solutions with concentrations of 2 μg/μl, 1 μg/μl, 0.5 μg/μl, 0.25 μg/μl and 0.125 μg/μl, dissolved in 1 × PBS at pH 7.2. On Membrane 1, three 4 μl drops of each of the six proteins at a concentration of 2 μg/μl were applied vertically adjacent to one another, dried naturally, and kept on standby. Thus, the protein quantity of every sample point is 8 μg. On Membrane 2, three 4 μl drops of each protein solution were applied horizontally adjacent to one another, dried naturally, and kept on standby. The protein quantities of each set of three horizontally adjacent sample points are 8 μg, 4 μg, 2 μg, 1 μg and 0.5 μg. To avoid interference from water, the membranes were dried in a drying oven before being scanned by THz-TDS; the room temperature was 17 °C.
The drying mode was forced air (wind-blow), the drying temperature was 25 °C, and each drying cycle lasted 20 minutes; cycles were repeated until the change in the total weight of the membrane was less than 0.01 g (the initial and final total weights of Membrane 1 were 0.360 g and 0.326 g, and those of Membrane 2 were 0.968 g and 0.782 g, respectively). The diameter of the protein sample spots was 6.84 ± 0.42 mm and the separation between adjacent sample spots was 3.28 ± 0.44 mm.
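The loading quantities and drying losses quoted above follow from simple arithmetic, sketched here for convenience. The numbers are taken from the text; the variable names are illustrative only.

```python
# Protein quantity per sample point = drop volume x concentration.
drop_volume_ul = 4.0
concentrations_ug_per_ul = [2.0, 1.0, 0.5, 0.25, 0.125]
quantities_ug = [drop_volume_ul * c for c in concentrations_ug_per_ul]

# Total weight lost during drying, from the stated initial and final weights.
weights_g = {
    "Membrane 1": (0.360, 0.326),
    "Membrane 2": (0.968, 0.782),
}
mass_loss_g = {
    name: round(initial - final, 3)
    for name, (initial, final) in weights_g.items()
}

print("quantities per point (ug):", quantities_ug)
print("weight lost on drying (g):", mass_loss_g)
```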

Algorithm
The refractive index and the relative energy of the transmission spectrum are the optical parameters used in this paper. The refractive index is calculated by Eq. (1), derived from Fresnel's law of refraction and reflection together with the Beer-Lambert law of attenuation. Details of the derivation can be found in Ref. [31]. In Eq. (1), n_s is the refractive index, φ is the phase difference between the Fourier transforms of the sample signal and the reference signal, recorded with and without the measured sample between two pieces of polyethylene (PE), c is the propagation speed of the THz wave in air, ω is the angular frequency, and d is the sample thickness. The relative energy of the transmission spectrum is defined by Eq. (2), in which P_re is the relative energy of the transmission spectrum, and E_s and E_r are the frequency-dependent amplitudes of the Fourier transforms of the sample signal and the reference signal, respectively. Because the thickness of the measured samples lies between 100 μm and 160 μm, the samples act as Fabry-Pérot etalons, which superimpose oscillations on the parameters calculated by Eqs. (1) and (2). A dedicated band-stop filter is designed for each sample to remove the Fabry-Pérot ripples. The center frequency of each filter is the reciprocal of the Fabry-Pérot oscillation period, which can be obtained from the parameters mentioned above. We apply the filter to the raw refractive index and to the power of the transmission spectrum to eliminate the effect of the Fabry-Pérot oscillation on the accuracy of the parameters.
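The displayed forms of Eqs. (1) and (2) do not appear in this text. A plausible reconstruction from the symbol definitions above is sketched below. It assumes the standard thin-sample phase relation for Eq. (1) and a squared-amplitude (power) ratio for Eq. (2); both forms are assumptions and should be checked against Ref. [31].

```latex
\begin{align}
n_s(\omega) &= 1 + \frac{c\,\varphi(\omega)}{\omega d} \tag{1} \\
P_{re}(\omega) &= \frac{\left|E_s(\omega)\right|^{2}}{\left|E_r(\omega)\right|^{2}} \tag{2}
\end{align}
```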
In the graphical treatment of the spectral data, we constructed a criterion that proved more effective for presenting the imaging results: the integral of the relative energy of the transmission spectrum defined in Eq. (2) over a frequency range, as shown in Eq. (3). The effective frequency range for imaging differs between samples. For example, for Membrane 2 the most effective frequency range is 0.875-1.075 THz.
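Eq. (3) is not displayed in this text. Assuming a plain integral of the relative energy over the chosen imaging band, it would take the form below, where I, f_1 and f_2 are hypothetical symbols for the integrated value and the band edges (for Membrane 2, f_1 = 0.875 THz and f_2 = 1.075 THz):

```latex
\begin{equation}
I = \int_{f_1}^{f_2} P_{re}(f)\,\mathrm{d}f \tag{3}
\end{equation}
```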
Here, P_re is the frequency-dependent relative energy of the transmission spectrum defined in Eq. (2). Either the frequency-dependent refractive index or the relative energy of the transmission spectrum can be used to construct spectral images of samples. Extracting the refractive index requires the accurate thickness of the sample, and the influence of thickness error on the accuracy of the refractive index is exponential. Refractive index imaging would require the thickness at every pixel to be measured exactly, which is impractical for our uneven samples. The relative energy of the transmission spectrum, whose calculation does not require the sample thickness, is therefore the appropriate candidate for imaging. It is true that the relative energy of the transmission spectrum at any single frequency can be used to build an image, if one can tolerate the random fluctuations that may exist. However, the integral of the relative energy of the transmission spectrum over an appropriate frequency range is the most effective spectral attribute for imaging.
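The parameter extraction described in this section can be illustrated with a minimal sketch on synthetic data. It is not the authors' processing code. It assumes n_s = 1 + cφ/(ωd) for the refractive index, P_re = |E_s|²/|E_r|² for the relative energy, and a trapezoidal integral over the imaging band. The pulse shape, the function names, and the test values (n = 1.5, amplitude factor 0.8) are hypothetical, and the Fabry-Pérot filtering and the PE sandwich are ignored.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s (used for propagation in air)

def thz_pulse(t_ps, t0_ps=10.0, sigma_ps=0.3):
    """Hypothetical single-cycle pulse (first derivative of a Gaussian)."""
    x = (t_ps - t0_ps) / sigma_ps
    return -x * np.exp(-0.5 * x**2)

def extract_parameters(t_ps, e_ref, e_sam, d_m, band_thz=(0.1, 2.5)):
    """Return frequency (THz), refractive index and relative energy."""
    dt_s = (t_ps[1] - t_ps[0]) * 1e-12
    freq_hz = np.fft.rfftfreq(t_ps.size, d=dt_s)
    spec_ref = np.fft.rfft(e_ref)
    spec_sam = np.fft.rfft(e_sam)
    # Keep only the band where the spectrum is well above numerical noise.
    keep = (freq_hz >= band_thz[0] * 1e12) & (freq_hz <= band_thz[1] * 1e12)
    f = freq_hz[keep]
    ratio = spec_sam[keep] / spec_ref[keep]
    # With NumPy's FFT sign convention a delay gives a negative phase,
    # so the sign is flipped to obtain a positive phase difference.
    phi = -np.unwrap(np.angle(ratio))
    n_s = 1.0 + C * phi / (2.0 * np.pi * f * d_m)  # assumed form of Eq. (1)
    p_re = np.abs(ratio) ** 2                      # assumed form of Eq. (2)
    return f * 1e-12, n_s, p_re

def band_integral(f_thz, p_re, f_lo, f_hi):
    """Trapezoidal integral of p_re over [f_lo, f_hi] (assumed Eq. (3))."""
    sel = (f_thz >= f_lo) & (f_thz <= f_hi)
    x, y = f_thz[sel], p_re[sel]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Synthetic example: 148 um thick sample with n = 1.5 and 80 % amplitude.
t_ps = np.arange(800) * 0.1
d_m = 148e-6
n_true = 1.5
delay_ps = (n_true - 1.0) * d_m / C * 1e12
e_ref = thz_pulse(t_ps)
e_sam = 0.8 * thz_pulse(t_ps - delay_ps)

f_thz, n_s, p_re = extract_parameters(t_ps, e_ref, e_sam, d_m)
integral = band_integral(f_thz, p_re, 0.875, 1.075)
print(f"mean n_s = {n_s.mean():.3f}, mean P_re = {p_re.mean():.3f}")
print(f"band integral = {integral:.4f} THz")
```

In this sketch the thickness d enters only the refractive index, while the relative energy and its band integral are computed without it, which mirrors the argument above for preferring the integrated relative energy when imaging uneven samples.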

Protein immunoblot
As a control experiment, the dot blot was performed in the traditional way described in the reference. All the liquid samples specified in Section 2.2 underwent the dot blot procedure, except for all Protein rn22 samples and the Protein n42 and Protein n53 samples with a concentration of 0.5 μg/μl. After being applied to a membrane, the proteins are probed with the corresponding antibodies, prepared in advance, to detect the presence or absence of the specific proteins. In the experimental results, the area of shade is proportional to the protein quantity.

Results and discussion
The refractive indices of the six proteins rn21, rn22, rn28, n42, n43 and n53 on Membrane 1, with a concentration of 2.0 μg/μl and a quantity of 8 μg, are shown in Fig. 4. Each curve is the average of the refractive index at 27 single points, of which 9 were chosen at random from each of the 3 vertically adjacent sample points with the same protein quantity. The curves of the six protein species are separated from one another, which can be used to distinguish the species. Although the error bars overlap, the curves roughly follow the rule that, at every frequency, a protein with a higher molecular mass refracts more strongly, that is, has a greater refractive index. The refractive index is precisely the physical quantity that reflects the alignment of ions inside a material. Different proteins in the same family, which share similar amino acid structural domains, may nevertheless have their own tertiary structures as a result of evolution and differentiation, and these structures determine the differences in the ionic configurations of the proteins. Thus, for materials with similar structures, a higher molecular mass generally results in a denser population of ions [22,23]. The dot blot image of Membrane 2 and the pseudo-color image, derived from the gray image constructed pixel by pixel from the integrated relative energy of the transmission spectrum for Membrane 2, are shown in Fig. 5a and Fig. 5b, respectively. In the pseudo-color image every individual sample point is identified. Differences in color, that is, in the shade intensity of the gray image, indicate different quantities of protein. In the traditional dot blot method, the quantity of protein is identified mainly by estimating the imaged area and shade of the protein sample, and the image quality depends greatly on the quality of the antibody and the experimental skill of the laboratory personnel [32].
Figure 5a shows the image of the membrane with the different protein samples. The Protein n42 groups show significant differences in identified protein area between concentrations, with small variation among members within each group. The adjacent dots representing 4, 2 and 1 μg of Protein rn28 also show obvious differences in identified protein area. However, the areas of the sample points of 8 μg and 4 μg, and of 1 μg and 0.5 μg, are close to each other. Moreover, these areas cannot easily be differentiated for Protein rn21 and Protein n43. In addition, the 4 μg points of Protein n53 show a larger area than the 8 μg points. Reported studies show that when the quantity of the detected protein is lower than 1 μg, the final dot blot results are not very accurate [33]. This kind of misjudgment resulting from the protein quantity will not occur with our method. The detection and display of a protein sample with high concentration is also affected in the dot blot, because some of the proteins at the bottom cannot bind to the corresponding antibodies.
Unlike the monotone image of the dot blot, the pseudo-color image, which visualizes the THz energy, shows more nuance within each sample spot (Fig. 5b). The average grayscale of the gray image on which the pseudo-color image is based, for the same quantity of the same protein, is given in Table 2. With increasing protein quantity, the average grayscale of Protein rn21, Protein rn22 and Protein n43 first increases at 0.5 μg and then decreases at 1 μg, 2 μg, 4 μg and 8 μg (except for Protein n43 at 8 μg, where it increases instead). The average grayscales of Protein rn28 and Protein n42 exhibit the opposite trend. For the six proteins at the same quantity, the shade intensity of the gray image follows a tendency similar to that of the refractive index: the protein with a higher molecular mass has a higher absorption, which results in lower THz energy passing through the sample. For a single protein at graded quantities, the greater the protein quantity, the more energy is absorbed. Additionally, the center of each spot has the highest transmission, and more energy is lost around the edge [34]. Our THz-TDS method for detecting protein concentration displays the spatial density distribution of the proteins, which improves the accuracy of protein content determination. To compare our image explicitly with the dot blot results, line graphs of the integrated relative energy and of the protein area identified by dot blot are plotted together, along with an explanation of the other phenomena observed for the other proteins. These line graphs for Protein rn21, Protein rn22, Protein rn28, Protein n42, Protein n43 and Protein n53 are shown in Fig. 6(a)-6(f), respectively.
The four curves in each graph are the standardized value of the protein area identified by dot blot, the standardized value of the integrated relative energy, and the fitting curves of the two values. The identified positive protein area by dot blot is the average over the three horizontally adjacent sample points with the same quantity. The integrated relative energy is the average of the relative energy integrated over 0.875-1.075 THz across the whole area of the three horizontally adjacent sample points with the same quantity. Information on the fitting curves for the average integrated relative energy and for the identified protein area is given in Table 3 and Table 4, respectively. The fitting curves of the integrated relative energy and of the identified protein area are both second-order polynomials for Protein rn21, Protein n42, Protein n43 and Protein n53, and both third-order polynomials for Protein rn28. The goodness of fit of the integrated relative energy curves for Protein rn21 and Protein rn28 is far superior to that of the corresponding dot blot area curves. The goodness of fit of the two fitting curves for Protein n42 is almost the same, at 97% and 98%. However, the goodness of fit of the dot blot area curve for Protein is slightly higher than that of the integrated relative energy curve. On the whole, the orders of the fitting curves of the two parameters match each other exactly, and in terms of goodness of fit our method is more sensitive than the dot blot. Thus, the change in quantity derived from the fitting curves of the integrated relative energy is preferred over that derived from the protein area identified by dot blot, which departs from the actual change in quantity.
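The curve fitting described above can be sketched as follows. The text does not define how values are standardized or which goodness-of-fit statistic is reported, so this example assumes min-max scaling and the coefficient of determination (R²); the data points are made up for illustration and are not the measured values from Tables 2-4.

```python
import numpy as np

def standardize(values):
    """Min-max scale to [0, 1] (assumed definition of 'standardized')."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def polyfit_r2(x, y, order):
    """Least-squares polynomial fit; returns coefficients and R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, order)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

# Protein quantities used on Membrane 2 (ug).
quantity_ug = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

# Made-up integrated relative energy that falls off quadratically.
energy_demo = 1.0 - 0.05 * quantity_ug + 0.002 * quantity_ug**2
energy_std = standardize(energy_demo)

_, r2_linear = polyfit_r2(quantity_ug, energy_std, order=1)
_, r2_quadratic = polyfit_r2(quantity_ug, energy_std, order=2)
print(f"R^2 linear    = {r2_linear:.4f}")
print(f"R^2 quadratic = {r2_quadratic:.4f}")
```

With only five quantities per protein, a third-order polynomial leaves a single residual degree of freedom, so a high R² for the cubic fits should be read with some caution.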
The primary reason for the overall change in transmitted THz energy with increasing protein quantity is absorption by the proteins, which is stronger than that of the membrane. Four reasons may account for the nonlinearity of the four kinds of curves in Fig. 6. First, the content of free water [35,36], which strongly absorbs THz radiation: with varying solution concentration, the amount of free water that evaporates naturally and during forced drying cannot be controlled. Second, during the drying process the spatial conformations of the proteins are subject to change because of the hot-air flow and the loss of hydration water [37], since the pores of the nitrocellulose membrane are about three orders of magnitude larger than a protein molecule. Third, part of the measured proteins may have degraded into smaller, unstable fragments. Last, the approximations made in the calculation, which neglect dispersion, reflection, phase error, and the unevenness of the membrane, may be the main cause. On the other hand, the opposite trend at the smallest protein quantity may arise because the amount of protein is so small that, after drying, it forms clusters or islands and cannot cover the entire measured area. Overall, whatever the trends of the curves, the curves of the standardized average integrated relative energy run exactly counter to those of the standardized protein area identified by dot blot. Information on the fitting curves for the identified protein area accords with that for the average integrated relative energy. The reason for the opposite trends is that the identified area of positive protein is directly proportional to the protein quantity, whereas the energy transmitted through the membrane is inversely proportional to the protein quantity.
Thus, the dot blot results verify our method, and our method is shown to be superior to the protein dot blot. Immunology-based protein detection methods for quantitative and qualitative analysis of specific proteins include dot blot analysis, western blot, immunostaining, and immunocytochemistry, at the molecular, cell, and tissue levels, respectively. An antibody with high titer and sensitivity is the decisive factor for experimental success. At present, the accuracy of protein species classification by protein immunoblot is higher than that by THz spectroscopy because of the specific antigen-antibody reaction. However, the improved label-free method is superior to traditional methods in economy, adaptability, and simplicity. Unique THz spectral features ("fingerprints") of biomacromolecules such as DNA, RNA and proteins are increasingly becoming available, which, combined with improvements in the detection sensitivity and accuracy of THz spectroscopy and imaging, will expand the application areas of the THz label-free spectral analysis method and further improve the various blot-based approaches.

Conclusion
A label-free method for acquiring the quantitative distribution of different kinds of proteins using THz-TDS is presented. Compared with the protein immunoblot, it dispenses with several intricate processes, resulting in substantial time savings. The new method can obtain more accurate and detailed information for a large number of proteins. The refractive index can be used to discriminate different kinds of proteins, and images constructed by THz-TDS can distinguish the protein quantity with a resolution better than 0.5 μg. Experimental results, confirmed by conventional dot blot, show that the average integrated relative energy of Protein rn21, Protein n42, Protein n43 and Protein n53 follows a quadratic relationship with protein quantity, while that of Protein rn22 and Protein rn28 follows a cubic relationship. Our method is still at an early stage of development, and much more work remains, among other things on detecting the interaction of proteins in a membrane; nevertheless, it provides fundamental support for the detection of biological macromolecules in the THz band of electromagnetic radiation.