Identification of differentially expressed genes, transcription factors, microRNAs and pathways in neutrophils of sepsis patients through bioinformatics analysis

Sepsis has been recognized to be a life-threatening organ dysfunction caused by the dysregulation of the host response to infections. Our work aims to screen key biomarkers related to neutrophils in sepsis using bioinformatics analysis. For this purpose, the microarray datasets related to neutrophils in sepsis patients were downloaded from the Gene Expression Omnibus (GEO) database. According to the Bayesian test, the Limma package in R was used to screen differentially expressed genes (DEGs). Then, DEGs were uploaded to the DAVID online diagnostic tool for subsequent Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment on the selected DEGs. Next, protein-protein interaction (PPI) network was established based on the selected DEGs using the STRING website and the Cytoscape software. Furthermore, according to the function of the iRegulon plug-in in Cytoscape, our study further predicts and established regulatory networks related to transcription factors and regulatory genes. In addition, the miRWalk2.0 database was used to search for miRNA-DEG pairs, associated with the conduction of intersections of miRNAs predicted by TargetScan, Miranda, miRDB and RNA22 databases. Then, these miRNA-DEG pairs were also displayed in the form of a regulatory network through Cytoscape. Finally, two datasets were selected to verify the screened genes, regulatory factors, and miRNAs, to plot receiver operating characteristics (ROC) curves and compute the area under the curve (AUC) values. The results showed that AKT1, MMP9, ARG1, ETS1 targeting AKT1, and has-miR-124-3p targeting RPS6KA5 may have diagnostic value for patients with sepsis and septic shock. While further experimental studies are required to confirm their role in septic neutrophils.


Introduction
Sepsis causes life-threatening organ dysfunction caused by an imbalanced response of the host to infections. It has a high incidence, and the condition remains one of the leading causes of death worldwide. It is estimated that there are over 19 million sepsis (previously severe sepsis) cases and 5 million sepsisrelated deaths each year (1,2). It has been recognized to be a major health burden worldwide with considerable economic consequences (3). It is of great practical significance for exploring the molecular mechanism of sepsis to benefit the development of novel therapeutic approaches for this disease.
Neutrophils are the most abundant immune cells in sepsis and the main source of IL-1β and IL-18 (4). Neutrophils are considered to be major players in the host's immune response to sepsis (5). Prior research has documented the critical role of regulating neutrophils in sepsis treatment. For example, inducing neutrophil autophagy may enhance the production of neutrophil extracellular traps (NETs), thereby protecting mice from the lethality induced by sepsis (6).
Accumulated evidence has confirmed the role of important genes and key regulatory miRNAs in the pathogenesis and development of sepsis. For instance, in critical illness and sepsis, high levels of miR-133a are related to the severity of the disease and predict adverse outcomes in critically ill patients (7). Despite the above studies, there is still a lack of comprehensive understanding of the genes and miRNAs that are involved in neutrophils during sepsis. Furthermore, bioinformatics combines computer science and biological science to screen various molecules and signaling pathways, conduct data mining, and study diseases at the molecular level. Generally, bioinformatic analysis is performed first to identify potential molecules associated with diseases, followed by investigation and validation in clinical trials. It has been widely applied in the study of sepsis. Several biomarkers have previously been identified and tested in the treatment of sepsis. However, there are few studies on the miRNA in neutrophils during sepsis. Therefore, this study aims to conduct bioinformatics analysis on the gene expression profile in neutrophils of sepsis.

Materials and methods Data acquisition and preliminary processing
Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) database, a publicly accessible genomics database on PubMed, contains high-throughput gene expression data, chips and microarray data. It was visited to verify the difference of gene expressions in neutrophils between patients with sepsis and healthy subjects using "sepsis" or "sepsis shock" and "neutrophil" as searching terms. A total of 520 gene chip sets were obtained using "Homo sapiens" in Filters. The retrieval program was provided as follows: ( In this study, the inclusion criteria of the datasets we selected were: 1) gene expression dataset; and 2) dataset composed of the normal control group and sepsis or raw data obtained from cases in the septic shock group. At the same time, the exclusion criteria were: 1) datasets with non-encoding RNA and DNA methylation sequence data; and 2) sample data from non-sepsis datasets of patients and non-neutrophils. Finally, 3 gene datasets were selected for data analysis (GSE64457, GSE123729 and GSE100150) from the GEO database, and 2 gene expression profiles (GSE100159 and GSE101639) as the verification datasets. Meanwhile, neutrophil samples were selected from sepsis patients and normal subjects.
The microarray data of GSE64457 was obtained Among them, GSE64457 contained transcription data of neutrophils isolated from peripheral blood of 9 patients with sepsis and 8 healthy volunteers, GSE123729 included 15 sepsis patients and 11 normal subjects, and GSE100150 provided the data of 35 sepsis samples and 12 normal samples.
In the next step, we downloaded the original data of the microarray, the GPL platform files, including CEL files and gene probes information. Based on data in the GPL file, the probe information was converted into a corresponding gene symbol. All data can be obtained through GEO data. At the same time, the whole process of this research did not involve any experimental behavior on humans or animals, with no ethical approval required.

Identifying differentially expressed genes (DEGs)
First, the affy extension package (https://bioconductor.org/packages/3.3/bioc/html/affy.h tml) in R software (version 4.0.2, https://www.rproject.org/) was used to read the gene expression chip data-CEL format data and process them into an expression matrix. The matrix data were further processed, including background correction, normalization, etc., to form a robust multi-array average (RMA). In order to identify DEGs, the limma (linear model of microarray data) package in R software (https://bioconductor.org/packages/release/bioc/html/li mma.html) was selected to analyze and calculate corresponding data of each group (sepsis group and normal group). Value with P<0.05 and |Log2FC|> 1 was determined as the threshold. False discovery rate (FDR) was controlled by the Benjamini-Hochberg method. The DEGs data were uploaded to the Venn diagram website (http://bioinformatics.psb.ugent.be/webtools/Venn) to get the Venn diagrams of the DEGs from the three microarray datasets. All DEGs were displayed in the volcano map made by R software.

Gene Ontology (GO) Analysis and KEGG Pathway Enrichment Analysis
GO-based function enrichment analysis is a bioinformatics method including three common analyses of biological processes (BP), molecular functions (MF), and cellular components (CC). Through previous analysis, the overlapped DEGs among the three groups were submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov) with the species limited as "Homo sapiens" for BP, MF and CC analysis in GO analysis, and KEGG pathway analysis. DAVID provides a comprehensive set of functional annotation tools for investigators to have a functional interpretation of large lists of genes. DAVID can identify enriched biological themes, particularly GO terms. Besides, GO can provide a systematic language for the description of the attributes of genes and their product. KEGG is such a database that can benefit the understanding of the advanced functions and utility of biological systems (e.g., the cell, the organism and the ecosystem) based on molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. P<0.05 meant that the difference was statistically significant.

Protein-Protein Interaction (PPI) Network and Hub Gene Identification
In order to further explore the protein interaction relationships of DEGs, a PPI network was generated firstly by using the STRING database (V11.0, http://string-db.org). STRING can not only establish PPI Networks but also is available for functional enrichment analysis. We set the minimum required interaction score as >0.4, and the rest followed the default settings. In addition, Cytoscape software (http://www.cytoscape.org/, version 3.8.2) was used to visualize the data exported from STRING to construct the PPI network. By using the CytoHubba program in Cytoscape, the values of degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC) between genes in the PPI network were analyzed to find hub genes. Besides, the MCODE App in Cytoscape software was used to perform module analysis for the established network.

Prediction of transcription factor (TF)-DEG pairs and construction of a regulatory network
In order to predict the TFs in the PPI network, the iRegulon app in Cytoscape was used to process the constructed PPI network, with the normalized enrichment score (NES)>3 as the standard. Finally, a TF-DEG regulatory network was constructed, after which Cytoscape software was applied to visualize the predicted TF-DEG pairs. Genes related to key pathways were selected and processed by the miRWalk2.0 database (http://zmf.umm.uniheidelberg.de/apps/zmf/mirwalk2/ generetsysself.html) to predict their targeted miRNAs. In order to verify the accuracy of the results, TargetScan, Miranda, miRDB and RNA22 databases were also used for the intersection. Then, the screened targets were downloaded from the miRWalk2.0 database. By locating the target genes of miRNAs to the previously calculated DEGs, the miRNA-DEGs regulatory network related to neutrophils in sepsis patients were obtained and displayed in Cytoscape.

Receiver operating characteristic (ROC) analysis
To evaluate whether the DEGs, TFs, and miRNAs screened above were of important diagnostic value for sepsis or septic shock, package "PROC" in R software was utilized to perform ROC analysis on each gene. Meanwhile, the area under the ROC curve (AUC) was calculated to evaluate the accuracy of DEGs, TFs, and miRNAs in the diagnosis of sepsis/septic shock through the values of AUC. In other words, when the value of AUC was >0.8, the gene screened might have a better value in diagnosing sepsis/septic shock patients from normal controls.

Results and discussion Identification of DEGs
After the integration of 3 microarray data related to neutrophils of sepsis patients and normal subjects, the obtained DEGs were crossed with the screening of 603 DEGs in GSE64457, 695 in GSE123729, and 3,158 in GSE100150. [Figures S1(1), S1(2)-(6), S2(1), S2(2)-(6), S3(1) and S3(2)- (6)]. The DEGs of these three datasets are shown in the form of a volcano diagram (Fig 1. A-C). The overlap of the three microarray data contained 116 genes, as displayed in Fig 1.D, including 18 downregulated genes and 98 upregulated genes in the sepsis group when compared with those in the normal group.

PPI Network and Hub Gene Identification
The PPI network of DEGs was constructed by using Cytoscape, involving 116 nodes and 244 interactions (Fig 2. A). Fig 2. B shows the PPI network displayed by Cytoscape. The hub genes were found by the CytoHubba program in Cytoscape. According to the scores of DC, BC and CC, 10 hub genes were determined with the highest network connectivity. The top 10 hub genes identified were AKT serine/threonine kinase 1 (AKT1), matrix metalloproteinase (MMP)9, arginase 1 (ARG1), MMP8, C3AR1, LCN2, QSOX1, TLR5, MAPK14 and FCAR ( Table 5). The gene with the largest number of node networks (n=13) was AKT1. The 3 PPI network modules fulfilled the cutoffs of the degree cutoff (2), the maximum depth (100), the κcore (2) and node score cutoff (0.2) (Fig 2. C-E, Table  6). Among them, genes of module 1 (the highest score, score=10.105) were enriched in Staphylococcus aureus infection pathway (p=4.62E-02) ( The GO and KEGG pathway enrichment analyses of the genes in 3 modules (Table 7) showed that the genes in module 3 containing the hub gene AKT1 were mainly linked to postsynapse, cytosol, and cGMP-PKG signaling pathway.   (2), the maximum depth (100), the κ-core (2) and node score cutoff (0.2). PPI-Protein-protein interaction.

Prediction of TF-DEG pairs and construction of network
The regulatory network of TF-DEG by Cytoscape was constructed, in which TF ETS proto-oncogene 1 (ETS1) targeted 39 DEGs (such as AKT1) (Fig 3). In addition, a pathway analysis was performed on the genes involved in the TF-DEG regulatory network.  Through the iRegulon App in Cytoscape and verified by TargetScan, Miranda, miRDB and RNA22 databases, has-miR-124-3p was screened to be related to neutrophil function in sepsis. Then the targets of has-miR-124-3p were downloaded from the miRWalk2.0 database. Among these target genes, 9 genes (7 upregulated genes and 2 downregulated genes, such as ribosomal protein S6 kinase alpha-5 and RPS6KA5) were found to be differentially expressed. In addition, Fig 4 shows the miRNA-DEG network constructed by Cytoscape. Figure 4. The miRNA-DEG regulatory network and targets of has-miR-124-3p. The red ovals, green diamonds and orange hexagons represent upregulated genes, downregulated genes and miRNAs, respectively. miRNA-microRNA; DEGs-differentially expressed genes.

ROC analysis
According to the results of ROC curve analysis to verify the diagnostic significance of the screened genes and miRs in sepsis, MMP9 (AUC=0.955), ETS1 (AUC=0.950), RPS6KA5 (AUC=0.945), and ARG1 (AUC=0.917) were found to have the ability to clearly distinguish between sepsis patients and normal controls. However, AKT1 revealed a poor ability to recognize sepsis and corresponding controls (AUC=0.602) (Fig 5. A). Similarly, our findings showed that has-miR-124-3p had a significant diagnostic value of distinguishing septic shock (AUC=0.889) and sepsis (AUC=0.833) samples from normal controls (AUC=0.889) (Fig 5. B-D). In our study, 116 DEGs were identified on the basis of sepsis and normal samples from the screened datasets, including 98 upregulated genes and 18 downregulated genes. The PPI network was constructed involving these DEGs. Based on DC, BC and CC scores, AKT1, MMP9 and ARG1 were identified to be the hub genes. In addition, 3 modules were identified from the PPI network. Furthermore, we also constructed the ETS1-DEG and the has-miR-124-3p regulatory networks.
As one of the most versatile kinase families, AKT1 (also known as PKB) serine-threonine kinase is a key regulator of cell survival, proliferation, metabolism and migration. For instance, Ursula et al. constructed a mouse model and demonstrate that the overexpression of AKT1 in lymphocytes can prevent sepsis-induced apoptosis, cause Th1 cytokine propensity, and improve survival when compared with wild-type littermates (8).
Meanwhile, AKT1 phosphorylation could also be stimulated by lncRNA MALAT1 based on the recruitment of EZH2 and decrease of BRCA1 expression, thus accelerating inflammatory response to aggravating sepsis (9). Furthermore, in terms of MMP-9, it has been reported that the administration of inhibitors to inhibit the trimeric proteoform of MMP-9 selectively would increase the presence of neutrophils in the inflammation triggering site, suggesting a possible beneficial role of MMP-9 trimers in inhibiting excessive neutrophil migration or anti-inflammatory effect (10). Reckens et al. found that MMP-9(-/-) mice developed more severe damage to distant organs during infection, implying that MMP-9 may be significantly involved in the effective host response to E. coli peritonitis (11). In addition, Shi, H. et al. demonstrated in their study that through the upregulation of the expression of ARG1 and inducible nitric oxide synthase (iNOS) via Arctigenin, the immunosuppression activity of MDSC could be promoted on M1 macrophage polarization (12). Uhel F. et al found that genes (e.g., MMP8 and ARG1) involved in MDSC inhibitory function could be upregulated in the peripheral blood of patients with sepsis. Besides, the high level of ARG1 was clearly associated with subsequent nosocomial infections (13). GO enrichment analysis in our study showed that AKT1 and MMP9 were enriched in TNF signaling pathway. These evidences indicate that AKT1, MMP9 and ARG1 may all play important roles in neutrophils during sepsis.
Furthermore, in another study carried out by He, H., et al., aging-related inflammation could promote senescence of hepatic stellate cells through the TNFα→ETS1→IL-27ra pathway (14). Shiu, Y. T. and E. A. Jaimes' experimental evidence suggested that ETS1 plays a role in vascular and kidney damage (15). Moreover, ETS1 is involved in regulating the expression of cytokines, growth factors and adhesion molecules. The research of Hua, P., et al. showed that ETS1 is a TF that can transactivate the expression of fibronectin induced by Ang II. In addition, the phosphorylation of ETS1 induced by Ang II involves both the ERK-MAPK pathway and PI3K/Akt pathway (16). In our study, pathway analysis indicated that genes regulated by ETS1 were enriched in MAPK signaling pathway, TNF signaling pathway, Influenza A and HIF-1 signaling pathway. Among them, AKT1 was involved in the above signaling pathways. In the constructed TF-DEG network, AKT1 was the target gene of ETS1, indicating that ETS1 targeting AKT1 may play a role in septic neutrophils through MAPK and TNF signaling pathways.
Concerning the value of miR-124-3p, Liang, Y., et al. found that miR-124-3p treatment could alleviate the degree of lung injury caused by endotoxin (such as alveolar hemorrhage, and interstitial edema), accompanied by a significant reduction in the levels of IL-1β, IL-6 and TNF-alpha. Overexpression of miR-124-3p can significantly inhibit lipopolysaccharideinduced p65 expression in NR8383 cells and in mice and cell apoptosis in NC8383 cells, suggesting a protective role of miR-124-3p by targeting p65 (17). Besides, in a high-quality RCT study on out-ofhospital cardiac arrest, miR-124-3p levels can be used as a predictive tool for neurological prognosis and survival after out-of-hospital cardiac arrest (18). At the same time, it has been demonstrated that RPS6KA5 could regulate protein synthesis and affect cell division, proliferation and differentiation (19,20). In our study, in the miRNA-DEG regulatory network we constructed, RPS6KA5 was targeted by has-miR-124-3p, implying that has-miR-124-3p targeting RPS6KA5 may participate in mediating the functions of neutrophils in sepsis.

Conclusions
To sum up, 116 DEGs were identified from sepsis samples. AKT1, MMP9, and ARG1 are the hub genes. AKT1, MMP9, ARG1, ETS1 targeting AKT1, and has-miR-124-3p targeting RPS6KA5 may have potential values in the diagnosis of sepsis and septic shock. Nevertheless, our study do have some limitations that deserve attention. Firstly, it shall be noted that findings in our study were speculation and assumptions based on bioinformatics analysis. There is an absence of clinical experiments for validation, which may affect the reliability of our study and shall be carried out in the future. In addition, there may be some methodological overlaps considering the emergence of bioinformatic analysis at present. While findings in our study are still valuable for the diagnosis and potential interventions of sepsis. While their roles in neutrophils of sepsis still need further experimental studies to confirm.

Funding
The work in this paper was not funded.

Declaration of interest
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. Figure S1(1). The data of the GSE64457 dataset was divided into 2 groups (sepsis patients and normal adults) for analysis, with preliminary processing and analysis performed. A: Box plot after data standardization, different colors represent different datasets. B: PCA results before batch removal for multiple datasets. Different colors represent different datasets; C: PCA results after batch removal, which can be used as a batch of data for subsequent analysis; D: Volcano plots using fold-change values and adjusted P-value. The red point represents the over-expressed mRNAs and the blue point indicates the down-expressed mRNAs with statistical significance. E: Hierarchical clustering analysis of mRNAs, which were differentially expressed in neutrophils between sepsis patients and normal controls.

AKT1
ARG1 ETS1 MMP9 RPS6KA5 Figure S1(2)-(6  Figure S2(1). The data of the GSE123729 dataset was divided into 2 groups (sepsis patients and normal adults) for analysis, with preliminary processing and analysis performed.A: Box plot after data standardization, different colors represent different datasets. B: PCA results before batch removal for multiple datasets. Different colors represent different datasets; C: PCA results after batch removal, which can be used as a batch of data for subsequent analysis; D: Volcano plots using fold-change values and adjusted P values. The red point represents the over-expressed mRNAs and the blue point indicates the downexpressed mRNAs with statistical significance. E: Hierarchical clustering analysis of mRNAs, which were differentially expressed in neutrophils between sepsis patients and normal controls.

AKT1 ARG1 ETS1
MMP9 RPS6KA5 Figure S2(2)-(6). Comparison of AKT1, ARG1, ETS1, MMP9 and RPS6KA5 gene expression in the GSE123729 dataset in sepsis group and normal group (sepsis patients and normal adults). A: Box plot after data standardization, different colors represent different datasets. B: PCA results before batch removal for multiple datasets. Different colors represent different datasets.C: PCA results after batch removal, which can be used as a batch of data for subsequent analysis; D: The expression distribution of AKT1, ARG1, ETS1, MMP9 and RPS6KA5 in each group. The horizontal axis represents different groups of samples, and the vertical axis represents the gene expression distribution. Different colors represent different groups, and the upper left corner represents the p-value test method to indicate statistical significance. Figure S3(1). The data of the GSE100150 dataset was divided into 2 groups (sepsis patients and normal adults) for analysis, with preliminary processing and analysis performed. A: Box plot after data standardization, different colors represent different datasets. B: PCA results before batch removal for multiple datasets. Different colors represent different datasets; C: PCA results after batch removal, which can be used as a batch of data for subsequent analysis; D: Volcano plots using fold-change values and adjusted P values. The red point represents the over-expressed mRNAs and the blue point indicates the downexpressed mRNAs with statistical significance. E: Hierarchical clustering analysis of mRNAs, which were differentially expressed in neutrophils between sepsis patients and normal controls.

AKT1
ARG1 ETS1 MMP9 RPS6KA5 Figure S3(2)-(6). Comparison of AKT1, ARG1, ETS1, MMP9 and RPS6KA5 gene expression in the GSE100150 dataset in sepsis group and normal group (sepsis patients and normal adults). A: Box plot after data standardization, different colors represent different datasets. B: PCA results before batch removal for multiple datasets. Different colors represent different datasets.C: PCA results after batch removal, which can be used as a batch of data for subsequent analysis; D: The expression distribution of AKT1, ARG1, ETS1, MMP9 and RPS6KA5 in each group. The horizontal axis represents different groups of samples, and the vertical axis represents the gene expression distribution. Different colors represent different groups, and the upper left corner represents the p-value test method to indicate statistical significance.