High-concurrency systems avoid frequent Y-GC

2022-04-27

Merchants use your platform and generate a lot of data. From this data you need to provide them with reports, such as: How many visitors does each merchant have per day? How many transactions? What is the conversion rate? This calls for a BI (Business Intelligence) system. It lets merchants better understand their own operations, so that the owner can adjust business strategy “intelligently” and improve performance.

A BI system like this works in stages. First, it collects a large amount of daily operating data from the platform we provide for merchants' day-to-day use. Next, it runs computations over this data on big data platforms such as Hadoop, Spark, or Flink to produce data reports. Then the computed reports are persisted to storage such as MySQL, Elasticsearch, and HBase. Finally, a Java-based BI system is built on top of the reports stored in MySQL, HBase, and ES; it exposes the stored data to the front end and lets the front end run complex filtering and analysis over it under various conditions.

When the system was first deployed, there were not many merchants, perhaps a few thousand, so the deployment was simple. The BI system ran on a few 4-core, 8 GB machines, with about 1.5 GB of heap allocated to the young generation, of which about 1 GB was Eden.

When the number of merchants using the system suddenly surges, one kind of data report in the BI system deserves attention.
The front-end page runs a JS script that automatically sends a request to the back end every few seconds to refresh the data; this is the “real-time data report”. Suppose only a few tens of thousands of merchants use your system; thousands of them may still have the real-time report open at the same time. Once a merchant opens the report, the page asks the back end for the latest data every few seconds. As a result, each machine running the BI system receives hundreds of requests per second; assume 500.

Each request loads a fairly large amount of data for the report, because the BI system may need to perform in-memory field calculations on that data before returning it to the front end. Assume each request loads 100 KB of data for calculation. At 500 requests per second, about 50 MB of data is loaded into memory every second.

Frequent Y-GC

Under this load model, about 50 MB of data is loaded into Eden every second.
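
The arithmetic behind this load model can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, and the inputs are the article's assumed figures (500 requests per second, 100 KB per request, an Eden of about 1 GB).

```java
public class EdenFillEstimate {

    /** Allocation rate in MB/s for a given request rate and per-request payload in KB. */
    public static double allocationRateMbPerSec(int requestsPerSec, int kbPerRequest) {
        return requestsPerSec * (double) kbPerRequest / 1024.0;
    }

    /** Seconds until Eden is full, i.e. the approximate interval between two Y-GCs. */
    public static double secondsToFillEden(double edenMb, double allocationRateMbPerSec) {
        return edenMb / allocationRateMbPerSec;
    }

    public static void main(String[] args) {
        // Figures assumed in the article: 500 requests/s, 100 KB each, Eden of about 1 GB.
        double rate = allocationRateMbPerSec(500, 100);
        double seconds = secondsToFillEden(1024, rate);
        System.out.printf("allocation rate: %.1f MB/s%n", rate);
        System.out.printf("Eden fills in about %.0f s%n", seconds);
    }
}
```

With these inputs the rate comes out at a little under 50 MB/s, and a 1 GB Eden fills in roughly 20 seconds. The estimate ignores everything else the application allocates, so a real system would fill Eden somewhat faster.
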
With Eden at about 1 GB, it fills in roughly 20 seconds (1 GB divided by 50 MB per second), and a Y-GC is triggered. An Eden of around 1 GB can be collected quickly, typically within a few tens of milliseconds, so the impact on system performance is small. Moreover, in BI scenarios only a few MB to a few tens of MB of objects survive each Y-GC. So if this is all that happens, the BI system pauses for tens of milliseconds at most every 20 seconds or so, with almost no impact on end users or system performance.

Using large-memory machines

As the number of merchants grows, concurrency pressure keeps rising, and peak load may reach 100,000 requests per second. On 4-core, 8 GB machines, handling that load would take hundreds of machines. So the usual move is to upgrade the hardware: a BI system is memory-hungry by nature, so the deployment machines are upgraded across the board to 16 cores and 32 GB. Each machine can then handle several thousand requests per second, and 20 to 30 machines are enough.

On such large-memory machines, the young generation is given at least 20 GB, and Eden takes up more than 16 GB. With several thousand requests per second, hundreds of MB of data are loaded into memory each second, so Eden fills up in tens of seconds to about a minute and a Y-GC runs. Reclaiming that much memory is much slower and may pause the system for hundreds of milliseconds or even 1 second.

When the system pauses that long, a large number of requests instantly back up and queue. In severe cases, front-end requests time out frequently: the front end reports an error when a request has not returned within 1 or 2 seconds.

Optimizing Y-GC on large-memory machines with G1

G1 can be used to address slow Y-GC on large-memory machines. Set an expected pause time for G1, for example 100 ms, and G1 will try to keep each Y-GC pause within 100 ms so that users are not affected. Y-GC may then run more often, but each pause is short enough to have little impact on the system.

In summary, frequent Y-GC by itself does not have much impact on a system. But on machines with a lot of memory, be aware that Y-GC can also cause long pauses; for such machines, using G1 directly is recommended.
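
One way to check how often GC runs and how much time it takes is to read the JVM's standard `GarbageCollectorMXBean` counters. The following is a minimal sketch, assuming a HotSpot JVM. `GcPauseProbe`, `gcSummary`, and `simulateRequests` are hypothetical names for this example, the 100 KB per request is the article's assumed figure, and the request count is arbitrary.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcPauseProbe {

    /** One line per collector: name, number of collections, accumulated collection time in ms. */
    public static List<String> gcSummary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", totalMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** Allocates short-lived buffers to imitate requests that each load some report data. */
    public static long simulateRequests(int requests, int bytesPerRequest) {
        long checksum = 0;
        for (int i = 0; i < requests; i++) {
            byte[] reportData = new byte[bytesPerRequest]; // unreachable after this iteration
            reportData[0] = (byte) i;
            checksum += reportData[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // 50,000 simulated requests of 100 KB each, roughly 5 GB allocated in total.
        simulateRequests(50_000, 100 * 1024);
        gcSummary().forEach(System.out::println);
    }
}
```

To try it with G1 and a 100 ms pause target, compile the class and launch it with `java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 GcPauseProbe`. The pause time set this way is a goal that G1 tries to meet, not a hard limit, and the collector names in the output depend on the JVM and the collector in use. Dividing `totalMs` by `count` gives only a rough average per collection; GC logs are the better source for individual pause times.
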