Research on Personal Data Privacy Security in the Era of Big Data

ABSTRACT Big data privacy security has become a prominent research topic. Based on data relevance and the data life-cycle in the era of big data, this paper analyzes the causes of security problems in China’s data privacy and puts forward suggestions from three aspects to provide references for subsequent research. Drawing on current research progress, the paper first sorts out the definitions of data privacy and data privacy protection, then summarizes the causes of privacy security problems from the perspectives of technology and management, and reveals the consequences of data privacy security issues. These results motivate a discussion of solution strategies for data privacy problems and lead to management-oriented suggestions based on the data life-cycle model. Finally, starting from other stages of the data life-cycle and the application scenarios of big data, the paper looks ahead to future research directions. The study finds that current research needs to focus on the combination of system and technology, the improvement of laws and regulations, and the application of the data life-cycle model in both technical and institutional management.


Introduction
Since the computer-driven industrial revolution (also known as the third industrial revolution), continuously accumulated information has been retained in the virtual world at an exponentially growing rate, which has promoted the rapid development of information technology and driven all fields of society into the era of big data (Jin, 2020). As the core technology of big data, the data management system has the advantages of high compatibility, strong distribution, and excellent memory. It is used to process big data characterized by large capacity, multiple types, rapid change, and low quality (Du et al., 2019). Once processed by a data management system, big data has favorable properties such as relevance, non-expendability, shareability, and fidelity. These properties have gradually made data the "contemporary oil" in the minds of more and more people (Wen, 2010; Yan, 2020). While the value of data is constantly being explored, more and more data infringement cases are being recorded, and data privacy security has gradually become a focus of public attention.
In recent years, research on big data privacy has been carried out mainly from the perspectives of protection countermeasures, contradictory relations, and technical support, among others. The first direction is the analysis of protection countermeasures against personal privacy infringement in the digital context. Because big data has made infringement more diverse and sophisticated, this line of research seeks to provide reasonable suggestions for data privacy protection by examining the complexity of data privacy infringement and re-analyzing sensitive personal data (Zhou & Xu, 2015; Jiang, 2019). The second is research on personal data privacy protection and data sharing behavior. It traces the source of data privacy security issues by analyzing the objective contradiction between data sharing and privacy protection (Tian & Huang, 2014; Zhang & Zhu, 2014) and provides reference material for solving data privacy issues. The third is research on the expansionary theory of data privacy and the expected positioning of rights. It systematically improves the data privacy protection and countermeasure system on the basis of existing privacy protection laws and regulations (Wang & Zhao, 2015; Zhang, 2015) and then supplements the content of data privacy and data privacy protection from a theoretical perspective. The development of these three research directions has promoted the improvement of data security systems in many industries, signaled progress in adapting relevant laws and regulations to digitization, and is of great significance to the development of digital civilization.
Data privacy research is still in its infancy. In terms of timing, research on data privacy lags research on the development and application of big data by 2-4 years on average, and its volume is small by comparison. Research on the disadvantages of big data is far scarcer than research on its advantages, so current data privacy research is not sufficient to deal with the complex problems that arise in application. Beyond its small volume, the immaturity of data privacy research is reflected in three aspects. First, the academic community has not yet formed a unified interpretation of data privacy (Peng, 2021). Second, most studies on new data protection technologies have not yet reached the stage of practical application. Third, there have been no systematic suggestions on data privacy protection strategies (Zhu & Li, 2021). These phenomena show that data privacy research remains confined to specific areas and lacks a macroscopic overview of the complete system. Therefore, this paper summarizes recent research on data privacy. Section one sorts out the definitions of data privacy and data privacy protection. Section two summarizes the causes and impacts of new data privacy problems. Section three puts forward suggestions for solving the big data privacy security problem from different perspectives, in the hope of providing references for follow-up research.

A new interpretation of privacy and privacy protection in the era of big data
Data privacy and its protection are indispensable to research on data privacy security, and a basic understanding of both is the premise of studying it. The following discusses data privacy and data privacy protection in turn, focusing on their differences from and connections with traditional privacy.

Data Privacy and Data Privacy Rights
Data privacy has not only the complex nature of traditional privacy but also the unique nature of digitized content. With the rapid development of big data, the connotation of privacy is constantly changing. Traditional interpretations of privacy developed in fields such as psychology, sociology, economics, and law, and the era of big data has enriched them from the information systems perspective. Allen (2004) proposes that "control of all one's information is a privacy," which links privacy to information and serves as an early explanation of data privacy. The academic community offers many interpretations of data privacy, and views differ. Most scholars hold that data privacy is the sensitive data that an individual or organization does not allow outsiders to know, including associations that emerge after the data has been processed, such as an individual's browsing records or a company's financial status (e.g., Huang et al., 2015; Meng & Zhang, 2015; Xu & Hu, 2019). Compared with traditional privacy, the distinctive feature of data privacy is that it includes information and data that can be inferred. This supplements the connotation of privacy and has gradually become a new paradigm for defining data privacy (Lu, 2018; Ren et al., 2022). Such an expansion of the connotation of privacy leads to changes in related rights, and privacy rights, which are closely tied to that connotation, will also develop new theories and rights positioning.
Privacy and privacy rights are closely related, and changes in private data have expanded the connotation of privacy. From the jurisprudential perspective, privacy is essentially a right. This claim can be traced back to 1890, when Louis Brandeis and Samuel Warren (1890) proposed in the United States that the right to privacy is a right to be let alone. Although privacy and privacy rights should not be confused, the two are inextricably linked in the study of privacy security issues, and an increase in the content of privacy will even directly affect the scope of privacy rights protection (Gu & Fan, 2018). In terms of scope, privacy is secret information that individuals do not want to disclose, while data privacy refers to the digitized form of that confidential information (Gu, 2021). In essence, both privacy and data privacy are informational, and data privacy is the product of expressing private details in binary code. This close relationship has prompted the object of privacy rights to expand to all data that can be used to infer or identify personal information (Wang & Yang, 2017; Shang, 2020). In short, privacy is a right at some level, and its data-driven transformation indirectly affects the positioning of that right, which gives data privacy new connotations. Researching the security issues of data privacy rights therefore requires a preliminary understanding of those rights.
The era of big data has given data privacy new attributes and expanded the theoretical boundaries of traditional privacy, which makes data more difficult to protect. The traditional right to privacy refers to citizens' right to control information that is inherent to the individual, does not harm the public interest, and that the individual is unwilling to disclose (Wang, 1994). As with traditional privacy rights, the subject of data privacy rights is generally a natural person, and the goal is likewise to protect vital personal interests (Meng & Zhang, 2015). Still, as the definition of privacy rights has continued to expand in the era of big data, data privacy rights have gradually acquired new attributes. First, the economic weight of privacy rights has increased. This change in property attributes stems from the capacity of private data to create value in the era of big data. Because private data can be used for inference and has practical uses, the property function of personal data now carries more weight than it did under the traditional focus on personality rights (Mao, 2019). Second, compared with conventional privacy rights, the object of data privacy rights is characterized by a data life-cycle, diversified data types, and inferential relationships between data, which expands the scope of protection. Third, because the boundary of the right is difficult to identify, the consequences of infringement are difficult to control, and current laws and regulations are difficult to apply (Meng & Zhang, 2015), data privacy is much harder to protect than traditional privacy. These three changes come from the technological development of the information age and the big data era. The resulting data privacy and data privacy rights place higher requirements on privacy protection.

Data Privacy Protection and Protection Technologies
Unlike traditional privacy protection, data privacy protection needs to focus on the dynamic balance between privacy confidentiality and data availability. Legal privacy protection aims to protect private information and prevent personal privacy from being maliciously stolen (Zhu & Zeng, 2021). By comparison, data privacy protection aims to minimize the risk of private data leakage while maintaining a reasonable level of data availability. Of the two, data privacy protection acts more as a balancing mechanism. The main reason for this difference is that the object of privacy protection has two dimensions. One is the protection of the individual, which refers to keeping the individual's private data safe from leakage and providing technical operational security when the user accesses or uses data. The other is the protection of data, which is the use of relevant technologies to protect personal privacy data and maintain the normal availability of data during transmission or encryption (Qian, 2013). Given these two dimensions, data privacy protection builds on traditional privacy protection and imposes more refined requirements. These stricter requirements have made data privacy protection more difficult, and the protection technology closely tied to privacy protection also needs to be innovated.
The difficulty of big data privacy protection has spurred renewed growth in data privacy protection technology, which can be classified in different ways. First, technical methods of protecting data privacy are divided into active and passive types (Zhang, 2019). Active technology adds identification attributes to personal data so that the source of the data can be proved in the use stage. Passive technology, which is currently widely used, mainly refers to the encryption, reconstruction, and control of personal privacy data in order to interfere with the identification and inference of private data during the use stage. Second, based on the life-cycle model of data privacy, protection technology can be divided into technology for the data collection stage, the data release stage, and the data access stage. Among them, protection technology for the data release stage is currently a hot topic in academic research and can be roughly divided into three categories: grouping technology (Samarati & Sweeney, 1998), encryption protection technology (Qiu & Li, 2018), and release distortion technology (Li et al., 2013). These three types of technology have also been extended to other life-cycle stages, filling technical gaps in protection at the acquisition and access stages. Such complementary and continually derived technical theories have become the development trend in data privacy protection technology research.
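As one concrete reading of the grouping technology mentioned above, the following minimal sketch checks k-anonymity, the property commonly associated with the cited work of Samarati and Sweeney (1998): every combination of generalized quasi-identifiers must be shared by at least k records. The sample records, the choice of age and ZIP code as quasi-identifiers, and the helper names `generalize` and `is_k_anonymous` are hypothetical and serve only as an illustration.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: age to a 10-year band, ZIP code to its first 3 digits."""
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")


def is_k_anonymous(records, k):
    """Return True if every generalized quasi-identifier group holds at least k records."""
    groups = Counter(generalize(r) for r in records)
    return all(size >= k for size in groups.values())


# Hypothetical release table: "diagnosis" is the sensitive attribute.
records = [
    {"age": 23, "zip": "10012", "diagnosis": "flu"},
    {"age": 27, "zip": "10019", "diagnosis": "asthma"},
    {"age": 34, "zip": "10455", "diagnosis": "flu"},
    {"age": 38, "zip": "10451", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, 2))  # each generalized group has two members
print(is_k_anonymous(records, 3))  # no group reaches three members
```

Coarser generalization makes larger groups and stronger anonymity but lowers the usefulness of the released data, which is the confidentiality and availability balance discussed in this section.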

Data privacy security issues under the contradiction between "disclosure" and "protection"
The contradiction between developing big data technology and ensuring data privacy security is difficult to avoid. The development of big data technology depends on data disclosure. However, disclosing data objectively infringes the privacy of organizations or individuals, which creates an implicit tension between the development of big data technology and the protection of privacy and produces the paradox of "data disclosure" and "privacy protection" (Yan, 2020). This paradox is widely present in privacy security issues in the era of big data and greatly increases the difficulty of privacy protection. Furthermore, this contradictory relationship may become one of the motivations for privacy theft. On the one hand, the enhanced property attributes of data privacy have increased the proceeds of stealing private data. On the other hand, data disclosure makes private data relatively easy to obtain, and the paradox itself leaves room for privacy theft. The combined effect of the two reduces the cost of illegal data theft. Faced with such economic temptation, more and more people have chosen to steal private data unlawfully. Therefore, to protect personal privacy while developing science and technology, many scholars have studied the causes of privacy security issues around this contradiction (e.g., Tian & Huang, 2014; Loukis, 2016; Zhu & Li, 2021). Studies have shown that, taking data and data subjects as reference objects, privacy security issues can mainly be attributed to two aspects: technology and management.

Privacy data definition technology and data protection technology
Traditional privacy data identification technology has lost its original role with the spread of big data, and sensitive-data screening has failed as a result. At present, privacy data identification technology classifies highly identifying data, such as the data subject's name, residential address, and education, as sensitive data, and data with low identifiability, such as birthdays and hobbies, as ordinary data. With the development of network communication and big data technology, the two categories can be linked and converted into one another: attackers can collect a target's ordinary data, exploit the shifting content and blurred boundary of privacy, and obtain sensitive data through link attack models, so traditional privacy definition technology gradually loses the ability to isolate and distinguish private data effectively (Li et al., 2010; Yan, 2020; Zhu & Li, 2021). In addition, rapidly evolving technologies for large-scale data collection, modern data access and extraction, and later-stage data model analysis, together with the barriers to data sharing that these technologies have spawned (Meng, 2015; Yan, 2020), not only make privacy more difficult to define and protect but also widen the gap between individuals and the enterprises and governments that hold massive databases (Zhang, 2019; Zhu & Li, 2021). Traditional privacy identification technology fails because conventional privacy positioning and theory do not effectively cover the scope of privacy rights attribution under big data (Long, 2014). Failed privacy identification technologies and the widening data divide have upset the balance between individuals and data resource owners. Therefore, solid and effective privacy data identification technology, data protection technology, and relevant laws and regulations are needed to adjust the relationship between the two.
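The link attack described above can be illustrated with a minimal sketch. It assumes a hypothetical released table that omits names but keeps "ordinary" attributes (birthday, hobby) next to a sensitive one (address), plus hypothetical public profiles that expose the same ordinary attributes alongside names. The function name `link_attack` and all of the data are invented for illustration.

```python
def link_attack(released, public_profiles, keys, sensitive):
    """Join two datasets on shared ordinary attributes to recover sensitive values."""
    # Index the public profiles by their combination of ordinary attributes.
    index = {}
    for person in public_profiles:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    recovered = {}
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            recovered[candidates[0]] = row[sensitive]
    return recovered


# Hypothetical "de-identified" release: no names, but ordinary attributes remain.
released = [
    {"birthday": "1990-05-01", "hobby": "chess", "address": "12 Elm St"},
    {"birthday": "1985-11-23", "hobby": "running", "address": "7 Oak Ave"},
]

# Hypothetical public profiles collected by an attacker.
public_profiles = [
    {"name": "Alice", "birthday": "1990-05-01", "hobby": "chess"},
    {"name": "Bob", "birthday": "1985-11-23", "hobby": "running"},
    {"name": "Carol", "birthday": "1985-11-23", "hobby": "running"},
]

print(link_attack(released, public_profiles, ("birthday", "hobby"), "address"))
```

In this toy data only Alice is re-identified, because Bob and Carol share the same ordinary attributes and the match is ambiguous. This is why a fixed split between sensitive and ordinary fields stops working once enough ordinary attributes can be combined.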
Traditional privacy data protection technologies focus on small-scale data and are less effective at dealing with large, multi-type big data. On the one hand, the conventional anonymization methods in common use have become easy to crack under big data, fuzzing technology is cumbersome and poorly applicable to large amounts of data, and the traceability query method is too limited in function construction to adapt easily to complex real-world situations (Meng, 2015; Feng, 2019). On the other hand, the new data protection technologies that have emerged under big data are immature and difficult to popularize. Although the much-discussed differential privacy can effectively prevent data traceability, it is difficult to determine its input privacy parameters independently and appropriately (Meng, 2015). Although homomorphic encryption can encrypt private data, it places higher demands on computing performance and costs more (Liu et al., 2022). Although secure multi-party computation can improve the security of encryption and the efficiency of the protocol, it cannot effectively defend against attackers who break the protocol and is too costly (Li, 2007). Other new protection technologies, such as blockchain, are also bottlenecked by insufficient data throughput and limited application scope (Liu et al., 2022). As a result, old protection technologies are gradually being eliminated while new ones have not yet been rolled out. This shows that traditional protection technologies can no longer meet the needs of privacy protection in the era of big data, which is one of the leading causes of data privacy security issues.
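The difficulty of choosing a privacy parameter for differential privacy can be made concrete with a minimal sketch of the standard Laplace mechanism applied to a counting query. The parameter usually called epsilon sets the noise scale: small values give strong privacy but noisy answers, and large values give accurate answers with weaker protection. The data, the function names, and the two epsilon values compared here are assumptions chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Noisy count: a counting query has sensitivity 1, so the scale is 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials=1000, seed=0):
    """Average absolute error of the noisy count over repeated queries."""
    rng = random.Random(seed)
    ages = [23, 27, 34, 38, 41, 56, 62]  # hypothetical data
    true_count = sum(1 for a in ages if a >= 40)
    errors = [
        abs(private_count(ages, lambda a: a >= 40, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials


# Smaller epsilon means stronger privacy and larger expected error.
print(mean_abs_error(0.1), mean_abs_error(10.0))
```

The expected absolute error equals the noise scale 1 / epsilon, so the two settings differ by about two orders of magnitude. Nothing in the mechanism itself says which setting is appropriate for a given release, which is the parameter-selection problem noted above.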

Data subject awareness management and data privacy policy management
Internal factors at the management level mainly come from the lack of awareness on the part of data subjects. Taking the data subject as the reference object, this paper divides the factors at the management level into internal awareness management and external policy management. With the continuous enrichment of the big data industry and the increasingly clear industrial division of labor, data subjects can be divided into users, data owners, and data producers. In China, the privacy protection awareness of all three types of data subjects is deficient to varying degrees. First, data owners hold a large amount of information resources and should shoulder the responsibility for data disclosure and privacy protection. Still, in the face of economic temptation, some data owners, such as enterprises and governments, have shown a lack of responsibility and insufficient self-control, which in turn has spawned a variety of privacy cases that are complex and difficult to handle under existing laws (Zhang, 2019; Zhu & Li, 2021). Second, most users and data producers, who are the objects of data privacy protection, are exposed by their increasingly open values and weak awareness of data protection, with the result that their personal private information is snooped on, stolen, and exploited (Di, 2016; Feng, 2019). Finally, the privacy awareness of most data subjects is limited and lags behind events (Mao, 2019). On the one hand, the infringed party believes that an infringement does not deserve attention as long as it involves no loss of economic interests or threat to life. On the other hand, the infringed party often becomes aware of data privacy protection only after their data privacy has been violated. This lag and limitation leave a great deal of room for privacy violations. Together with data subjects' lack of awareness of data privacy management, they constitute the main internal management factor behind data privacy leakage, which objectively increases the difficulty of solving data privacy security issues.
External factors at the management level mainly come from shortcomings in the management of data privacy protection policies. Shang (2020) stressed that the fundamental cause of personal privacy leakage in the era of big data is the lack of a suitable data privacy legal system. In other words, because research on data privacy rights started late, other countries, compared with developed countries, still have problems such as unclear division of rights, imperfect laws and regulations, and poor channels for protecting rights (Wen, 2010). The flaws of this system have led to continuous contradictions and conflicts between traditional privacy protection strategies and big data-guided development trends, which have called the original normative order of the network into question (Zhang, 2019). The failure of the normative order is embodied in the declining moral cultivation of infringers and the increasing demand for privacy protection among the infringed (Xie, 2019). It also reflects, from another angle, the lack of attention paid by governments to data privacy issues. In the current judicial practice of most countries, examples of judicial authorities making expansive interpretations of privacy rights based on data characteristics are lacking (Shang, 2020). In international information management legislation, countries have long focused on ensuring the normal operation of the Internet and passively withstanding hacker attacks while neglecting the formulation of systematic laws and regulations, which has led to the current weakness in handling complex data privacy cases (Yin & Wang, 2016; Ning & Li, 2020; Zhu & Li, 2021). Whether in the implementation of the system or in legal attention, the degree of attention to data privacy and security needs to be improved.

Information security issues are becoming increasingly prominent in the era of big data
The security risks and harm associated with data privacy have been further amplified in the Internet era. The value of personal data has been excavated more deeply in the era of big data, and personal data has become a commodity with vast supply, demand, and markets. Organizations and individuals have been able to collect and analyze data on a larger scale, which has given rise to various "overlord clauses" for collecting personal data and even to profit from reselling it, constantly breeding all manner of illegal collection of personal data (Zhu & Li, 2021). Once personal data is collected and uploaded to databases in the online world, it cannot escape automated surveillance and secondary exploitation (Yuan, 2015; Zhu & Li, 2021). On the one hand, this leads to the misuse and dissemination of personal data and thus to the invasion of personal privacy (Xu & Dong, 2014). On the other hand, because information spreads quickly, in large volume, and over a wide scope, individuals in particular economic, cultural, and political situations who have weak privacy awareness are more likely to have their personal data illegally collected and repeatedly leaked (Shang, 2020). Once personal privacy data is leaked, citizens' right to know and right to informational self-determination are seriously infringed (Zhu & Li, 2021). This seriously interferes with the normal life of the victim, which in turn violates personal life, amplifies social contradictions, and endangers the public interest and infrastructure construction (Mao, 2019; Peng, 2021), ultimately causing losses to the real economy and threatening the safety of individual lives and national defense security. These phenomena show that data privacy security issues affect different individuals to varying degrees and that the harm they cause has touched the fundamental interests of the majority of people.
In the era of big data, the disclosure of personal data privacy involves multiple interests and affects all walks of life. Personal data includes primary personal data, personal privacy data, and non-primary information that falls between the two, and the consequences of privacy data and information theft are becoming increasingly severe. Information security has become an urgent problem for all walks of life (Gu, 2021). To obtain the convenience brought by big data, people need to disclose personal data continuously, which places data privacy under constant monitoring and significantly increases the risk of leakage; however, if people are cautious in their words and deeds to avoid leaking private information, they cannot enjoy the "freedom" brought by the era of big data (Chen & Huang, 2016; Feng, 2019). Moreover, more and more people find that their disclosure of personal data is shifting from active to passive. The fear of losing control of information affects the development of all walks of life, so people urgently need a safe and efficient information protection system that ensures the quality of data release and protects data that may leak private information. Solving the problem of personal privacy and security in the era of big data is therefore imperative, and a relatively complete data privacy protection management system is urgently needed. To this end, the following section proposes privacy management suggestions, based on the life-cycle model, that address the causes of data privacy security issues.

Data privacy protection recommendations based on the data life-cycle model
Resolving the dilemma of data privacy security requires a balance between privacy protection and data openness. This balance can take the data life-cycle model as a reference and ground every step of theoretical and technical research in the actual situation. From a management perspective, this section puts forward guiding suggestions for data privacy protection in three aspects, namely technical management, system management, and awareness management, with the aim of providing specific references for solving data privacy security issues.

Technical management level
In terms of technical management, maintaining data privacy security needs to start from three aspects: privacy protection technology, big data algorithms, and industry standards. First, privacy protection technology needs to be developed around the data life-cycle (Meng, 2015), with dedicated research and development for each data stage. On this basis, the advanced protection theories of all parties should be integrated and protection technology moved from theory to practical application, so as to establish a safe, comprehensive, efficient, and professional technology management system for computer security. Second, big data network algorithms involve data operations across the whole data life-cycle, so the identification and protection of personal privacy need more attention from the start of requirements design (Zhang, 2022). This would enhance people's control over high-order big data algorithms and ultimately establish more humane computer algorithms and algorithm management standards that run from data collection to data push. Finally, computer data technology and data protection technology also need to follow the data life-cycle model (Zhu & Li, 2021). Different technical means and guidelines should be adopted for different data stages in order to establish technical standards that maintain the harmonious development of people and technology. If the data life-cycle model can be introduced into the structural design and functional implementation of privacy protection technology, and people-oriented standards can be integrated into technology research, development, and standard-setting, data privacy security issues may be alleviated.

Institutional management level
In terms of institutional management, improving data privacy security requires a management system based on the data life-cycle model. Zhu & Li (2021) emphasized the importance of combining the data life-cycle with the management system and derived from the model an analysis framework that combines data sensitivity and data stage. Following this framework, this paper proposes five data privacy protection measures, one for each stage of the life-cycle. First, in the data deletion stage, it is necessary to establish a data operating system that data subjects themselves can operate, which reduces the uncontrollability of data removal and thus protects the individual's right to be forgotten (Wang & Zhao, 2020). Second, in the data release stage, the government must take the lead in formulating fair data transaction guidelines and a complete data management system. At the same time, enterprises must pay attention to personal data security protection, regulate the "overlord clauses" (one-sided terms imposed on users) in their applications, and cooperate in cleaning up the network environment of data release. Third, in the data application stage, the government needs to set up a dedicated agency for protecting sensitive information and build a multi-party reporting system covering the government, enterprises, and individuals, so that the three parties restrain and supervise one another and the user's data privacy is ensured. Fourth, in the data transportation stage, all parties can establish a confidentiality mechanism for data transportation under the supervision of an independent third-party agency, and the government should introduce policies to crack down severely on the theft of private data. Finally, in the data production and collection stage, all parties should first clarify the purpose of data collection, then classify the data by sensitivity, and ultimately apply protection technology hierarchically according to the sensitivity level. A management method based on the data life-cycle model refines data privacy protection across multiple facets and levels. To a certain extent, it can improve the overall framework of data protection management.
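The hierarchical protection described for the production and collection stage can be illustrated with a minimal sketch. It is not a method taken from the cited literature: the three sensitivity levels, the field names, and the pairing of each level with a technique (pass-through, masking, salted hashing) are assumptions chosen only for illustration, and a real system would likely rely on stronger controls such as encryption and access management.

```python
import hashlib
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical three-level sensitivity scale (an assumption)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical classification of collected fields by sensitivity.
FIELD_SENSITIVITY = {
    "city": Sensitivity.LOW,
    "phone": Sensitivity.MEDIUM,
    "national_id": Sensitivity.HIGH,
}


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a string."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest (one-way)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def protect_record(record: dict, salt: str) -> dict:
    """Apply a protection technique to each field by sensitivity level.

    All values are expected to be strings.
    """
    protected = {}
    for field, value in record.items():
        # Fields that have not been classified are treated as HIGH.
        level = FIELD_SENSITIVITY.get(field, Sensitivity.HIGH)
        if level is Sensitivity.LOW:
            protected[field] = value
        elif level is Sensitivity.MEDIUM:
            protected[field] = mask(value)
        else:
            protected[field] = pseudonymize(value, salt)
    return protected
```

Treating unclassified fields as highly sensitive is a deliberately conservative default: a field that has not yet been assigned a level receives the strongest protection rather than none.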

Awareness management level
In terms of awareness management, the essential task is to improve the awareness of data privacy protection among individuals and enterprises. The ultimate goal of addressing data privacy security issues is to protect privacy while preserving the positive impact of technology on productivity. Although the sound development of protection technology and management norms can effectively help realize this goal, this paper also emphasizes the importance of raising citizens' awareness of data privacy protection. First, individuals should cultivate healthy online habits, stay alert to and avoid illegal websites, reduce the public release of sensitive personal information on social software, and develop self-disciplined privacy protection habits (Gu, 2021). Second, enterprises and organizations should cultivate a sense of privacy and ethical responsibility (Li, 2022), fully understand the harm of data privacy leakage, and refuse illegal transactions of personal privacy data, so as to establish a sense of industry ethics and, ultimately, an industry benchmark and a corporate culture of moral responsibility. Finally, the government and enterprises should implement data privacy education, create an atmosphere of privacy protection, and teach a dialectical view of data privacy and data privacy protection. Such education can reduce the limitations and lag in people's awareness of protection, improve media literacy and personal quality in the era of big data, and ultimately help establish a correct view of privacy and values. Cultivating any awareness requires joint effort from individuals to organizations in all walks of life. Cultivating awareness of data protection likewise requires mutual help across society to break down the boundaries between the multiple parties involved in big data and, ultimately, to create opportunities for finding a balance that solves privacy problems.

Conclusion
Starting from the current state of research on data privacy protection, this paper examines the data-sharing stage of the data life-cycle and introduces the external and internal factors behind data privacy security issues. Based on these factors, the study summarizes the shortcomings of existing data protection technology and the omissions of current data privacy management, sums up the existing and potential harms of privacy security issues, and then puts forward suggestions from the perspective of the data life-cycle. A widespread problem in data privacy protection is that protection mechanisms cannot keep up with the development of technology. This problem still needs to be sorted out and discussed in depth, and it points to directions for future research.
Research on data privacy and security can still broaden its scope. First, future research needs to focus on other data life-cycle stages. Privacy objects at all data life-cycle stages deserve to be analyzed and studied, and follow-up research can be extended to the data production and destruction stages of the life-cycle. Second, future research should be more closely integrated with big data application scenarios. Big data has spawned many new business formats, and different industries and technologies have their own requirements for data privacy protection. Research on data privacy protection needs to extend to the specific issues of other privacy objects, and many topics remain worthy of further study.