High Accuracy Protein Identification: Fusion of Solid-State Nanopore Sensing and Machine Learning
Proteins are arguably one of the most important classes of biomarkers for health diagnostics. Label-free solid-state nanopore sensing is a versatile technique for sensing and analyzing biomolecules such as proteins at the single-molecule level. While molecular-level information on the size, shape, and charge of proteins can be assessed by nanopores, the identification of proteins with comparable sizes remains a challenge. Here, solid-state nanopore sensing is combined with machine learning to address this challenge. The translocations of four similarly sized proteins are assessed through nanopores fabricated in <10 nm thick silicon nitride membranes, using amplifiers with bandwidths (BWs) of 100 kHz and 10 MHz, the latter being the highest bandwidth reported for protein sensing. F-values of up to 65.9% and 83.2% (without clustering of the protein signals) are achieved with 100 kHz and 10 MHz BW measurements, respectively, for identification of the four proteins. The accuracy of protein identification is further enhanced by classifying the signals into different clusters based on signal attributes, with an F-value and specificity of up to 88.7% and 96.4%, respectively, for combinations of four proteins. The combined use of high-bandwidth instruments, advanced clustering, and machine learning methods allows label-free identification of proteins with high accuracy.
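For readers unfamiliar with the reported metrics, the following is a minimal sketch of how an F-value and a specificity could be computed for a four-class identification task. It assumes that the F-value is the usual harmonic mean of precision and recall, evaluated one-vs-rest for each protein and then macro-averaged; the abstract does not state the exact definition or averaging scheme used. The function names and the confusion-matrix counts are hypothetical and serve only as illustration; they are not data from this work.

```python
from typing import List, Tuple


def per_class_metrics(confusion: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (F-value, specificity) for each class, treated one-vs-rest.

    confusion[i][j] is the number of events whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]                                   # true positives
        fn = sum(confusion[k]) - tp                            # missed events of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp       # other classes called k
        tn = total - tp - fn - fp                              # everything else
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f_value = 2 * precision * recall / denom if denom else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        results.append((f_value, specificity))
    return results


def macro_average(values: List[float]) -> float:
    """Unweighted mean over classes (assumed averaging scheme)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical counts for four proteins (rows: true, columns: predicted).
    confusion = [
        [80, 10, 5, 5],
        [10, 75, 10, 5],
        [5, 10, 80, 5],
        [5, 5, 5, 85],
    ]
    metrics = per_class_metrics(confusion)
    macro_f = macro_average([f for f, _ in metrics])
    macro_spec = macro_average([s for _, s in metrics])
    print(f"macro F-value: {macro_f:.3f}, macro specificity: {macro_spec:.3f}")
```

In a multi-class setting, specificity tends to be higher than the F-value because the true negatives of each class include every correctly rejected event from the other three classes.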