Published on 2024-11-09

Tricky Golang interview questions - Part 7: Data Race

Here is another code review interview question for you. This question is more advanced than the previous ones and is targeted toward a more senior audience. The problem requires knowledge of slices and sharing data between parallel processes.

If you're not familiar with slices and how they are constructed, please check out my previous article about the Slice Header.

What is a Data Race?

A data race occurs when two or more threads (or goroutines, in the case of Go) concurrently access shared memory, and at least one of those accesses is a write operation. If there are no proper synchronization mechanisms (such as locks or channels) in place to manage access, the result can be unpredictable behavior, including corruption of data, inconsistent states, or crashes.

In essence, a data race happens when:

  • Two or more threads (or goroutines) access the same memory location at the same time.
  • At least one of the threads (or goroutines) is writing to that memory.
  • There is no synchronization to control the access to that memory.

Because of this, the order in which the threads or goroutines access or modify the shared memory is unpredictable, leading to non-deterministic behavior that can vary between runs.

      ----------------------        --------------------- 
     | Thread A: Write      |      | Thread B: Read      |
      ----------------------        --------------------- 
     | 1. Reads x           |      | 1. Reads x          |
     | 2. Adds 1 to x       |      |                     |
     | 3. Writes new value  |      |                     |
      ----------------------        --------------------- 

                    Shared variable x
                    (Concurrent access without synchronization)

Here, Thread A is modifying x (writing to it), while Thread B is reading it at the same time. If both threads are running concurrently and there’s no synchronization, Thread B could read x before Thread A has finished updating it. As a result, the data could be incorrect or inconsistent.
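The scenario above can be reproduced in a few lines of Go. The sketch below is not from the original article: the names racyCount and safeCount are hypothetical, and it uses several writers rather than one writer and one reader, which is still a data race because at least one access is a write. The second function shows one possible form of synchronization, a sync.Mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines with no
// synchronization. Each x++ is a read-modify-write, so updates can be lost.
func racyCount(n int) int {
	var wg sync.WaitGroup
	x := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			x++ // unsynchronized access to shared memory: data race
		}()
	}
	wg.Wait()
	return x
}

// safeCount does the same work but guards every access to x with a mutex.
func safeCount(n int) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	x := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			x++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return x
}

func main() {
	fmt.Println("racy:", racyCount(1000)) // result is not guaranteed
	fmt.Println("safe:", safeCount(1000)) // always 1000
}
```

The racy version may still print 1000 on many runs, which is exactly why data races are hard to spot by observation alone.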

Question: One of your teammates submitted the following code for a code review. Please review the code carefully and identify any potential issues.
Here is the code you have to review:

package main  

import (  
    "bufio"  
    "bytes"
    "io"
    "math/rand"
    "time"
)  

func genData() []byte {  
    r := rand.New(rand.NewSource(time.Now().Unix()))  
    buffer := make([]byte, 512)  
    if _, err := r.Read(buffer); err != nil {  
       return nil  
    }  
    return buffer  
}  

func publish(input []byte, output chan



What do we have here?

The publish() function is responsible for reading the input data chunk by chunk and sending each chunk to the output channel. It begins by using bytes.NewReader(input) to create a reader from the input data, which allows the data to be read sequentially. A buffer of size 8 is created to hold each chunk of data as it’s being read from the input. During each iteration, reader.Read(buffer) reads up to 8 bytes from the input, and the function then sends a slice of this buffer (buffer[:n]) containing up to 8 bytes to the output channel. The loop continues until reader.Read(buffer) either encounters an error or reaches the end of the input data.

The consume() function handles the data chunks received from the channel. It processes each chunk with a bufio.Scanner, which splits the chunk into tokens (lines, unless configured otherwise). The call b := scanner.Bytes() retrieves the current token. This function stands in for basic input processing.

The main() function creates a buffered channel chunkChannel with a capacity equal to workersCount, which is set to 4 in this case. It then launches 4 worker goroutines, each of which reads data from chunkChannel concurrently. Every time a worker receives a chunk of data, it processes the chunk by calling the consume() function. The publish() function reads the generated data, breaks it into chunks of up to 8 bytes, and sends them to the channel.

The program uses goroutines to create multiple consumers, allowing for concurrent data processing. Each consumer runs in a separate goroutine, processing chunks of data independently.
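The code listing above is cut off at the signature of publish(). Going only by the description in this section, the complete program might look like the sketch below. The bodies of publish(), consume(), and main() are reconstructed assumptions rather than the author's verbatim code, so line numbers will not necessarily match the race detector output shown later. Whether the original closes the channel or waits for the workers is not stated; this sketch does neither.

```go
package main

import (
	"bufio"
	"bytes"
	"io"
	"math/rand"
	"time"
)

// genData returns 512 pseudo-random bytes (unchanged from the listing above).
func genData() []byte {
	r := rand.New(rand.NewSource(time.Now().Unix()))
	buffer := make([]byte, 512)
	if _, err := r.Read(buffer); err != nil {
		return nil
	}
	return buffer
}

// publish reads input in chunks of up to 8 bytes and sends each chunk
// to output. Assumed shape: the same buffer is reused on every iteration.
func publish(input []byte, output chan<- []byte) {
	reader := bytes.NewReader(input)
	buffer := make([]byte, 8)
	for {
		n, err := reader.Read(buffer)
		if err == io.EOF {
			return // end of input
		}
		if err != nil {
			return // any other read error
		}
		output <- buffer[:n] // a view of the shared buffer, not a copy
	}
}

// consume scans one chunk token by token.
func consume(input []byte) {
	scanner := bufio.NewScanner(bytes.NewReader(input))
	for scanner.Scan() {
		b := scanner.Bytes()
		_ = b // placeholder for real processing
	}
}

func main() {
	data := genData()
	workersCount := 4
	chunkChannel := make(chan []byte, workersCount)

	for i := 0; i < workersCount; i++ {
		go func() {
			for chunk := range chunkChannel {
				consume(chunk)
			}
		}()
	}

	publish(data, chunkChannel)
}
```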

If you run this code, nothing suspicious will happen:

[Running] go run "main.go"

[Done] exited with code=0 in 0.94 seconds

But there is a problem: the code contains a data race. The publish() function reuses the same buffer slice for every chunk, and every slice it sends to the channel shares that buffer's underlying memory. While the consumers are still reading a chunk, the publisher may already be writing the next chunk into the same memory.

Go provides a built-in tool to detect data races: the race detector. You can enable it by running your program with the -race flag:

go run -race main.go

If we add the -race flag to the run command we will receive the following output:

[Running] go run -race "main.go"

==================
WARNING: DATA RACE
Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0xcc
  bufio.(*Scanner).Scan()
      /GOROOT/go1.22.0/src/bufio/scan.go:219 +0xef4
  main.consume()
      /GOPATH/example/main.go:40 +0x140
  main.main.func1()
      /GOPATH/example/main.go:55 +0x48

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0x168
  main.publish()
      /GOPATH/example/main.go:27 +0xe4
  main.main()
      /GOPATH/example/main.go:60 +0xdc

Goroutine 6 (running) created at:
  main.main()
      /GOPATH/example/main.go:53 +0x50
==================
Found 1 data race(s)
exit status 66

[Done] exited with code=0 in 0.94 seconds

The warning you’re seeing is a classic data race detected by Go’s race detector. The warning message indicates that two goroutines are accessing the same memory location (0x00c00011e018) concurrently. One goroutine is reading from this memory, while another goroutine is writing to it at the same time, without proper synchronization.

The first part of the warning tells us that Goroutine 6 (which is one of the worker goroutines in your program) is reading from the memory address 0x00c00011e018 during a call to bufio.Scanner.Scan() inside the consume() function.

Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0xcc
  bufio.(*Scanner).Scan()
      /GOROOT/go1.22.0/src/bufio/scan.go:219 +0xef4
  main.consume()
      /GOPATH/example/main.go:40 +0x140
  main.main.func1()
      /GOPATH/example/main.go:55 +0x48

The second part of the warning shows that the main goroutine previously wrote to the same memory location (0x00c00011e018) during a call to bytes.Reader.Read() inside the publish() function.

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0x168
  main.publish()
      /GOPATH/example/main.go:27 +0xe4
  main.main()
      /GOPATH/example/main.go:60 +0xdc

The final part of the warning explains that Goroutine 6 was created in the main function.

Goroutine 6 (running) created at:
  main.main()
      /GOPATH/example/main.go:53 +0x50

In this case, while one goroutine (Goroutine 6) is reading from the buffer in consume(), the publish() function in the main goroutine is simultaneously writing to the same buffer, leading to the data race.

 -------------------                 -------------------- 
|     Publisher     |               |      Consumer      |
 -------------------                 -------------------- 
        |                                   |
        v                                   |
1. Read data into buffer                    |
        |                                   |
        v                                   |
2. Send slice of buffer to chunkChannel     |
        |                                   |
        v                                   |
  ----------------                          |
 |  chunkChannel  |                         |
  ----------------                          |
        |                                   |
        v                                   |
3. Consume reads from slice                 |
                                            v
                                    4. Concurrent access
                                    (Data Race occurs)

Why the Data Race Occurs

The data race in this code arises because of how Go slices work and how memory is shared between goroutines when a slice is reused. To fully understand this, let’s break it down into two parts: the behavior of the buffer slice and the mechanics of how the race occurs.

When you pass a slice like buffer[:n] to a function or channel, what is actually copied is the slice header: a pointer to the underlying array, plus a length and a capacity. The array itself is not copied, so any change to its elements is visible through every slice that refers to it.

buffer = [ a, b, c, d, e, f, g, h ]  





func publish(input []byte, output chan



If you send buffer[:n] to a channel, both the publish() function and any consumer goroutines will be accessing the same memory. During each iteration, the reader.Read(buffer) function reads up to 8 bytes from the input data into this buffer slice. After reading, the publisher sends buffer[:n] to the output channel, where n is the number of bytes read in the current iteration.
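The sharing can be observed without any goroutines at all. In the sketch below, which is not part of the original article (sendThenOverwrite is a hypothetical helper), a slice is placed in a buffered channel, the buffer is overwritten, and only then is the slice received. Because only the slice header travelled through the channel, the receiver sees the new bytes.

```go
package main

import "fmt"

// sendThenOverwrite sends buffer[:4] into a channel, overwrites the buffer,
// and returns what the receiver finds in the slice it gets.
func sendThenOverwrite() string {
	ch := make(chan []byte, 1)
	buffer := []byte("abcdefgh")

	ch <- buffer[:4]         // only the slice header is copied into the channel
	copy(buffer, "ijklmnop") // the sender reuses the same backing array

	received := <-ch
	return string(received)
}

func main() {
	fmt.Println(sendThenOverwrite()) // prints "ijkl", not "abcd"
}
```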

The problem here is that buffer is reused across iterations. Every time reader.Read() is called, it overwrites the data stored in buffer.

  • Iteration 1: The publish() function reads the first 8 bytes into buffer and sends buffer[:n] (say, [a, b, c, d, e, f, g, h]) to the channel.
  • Iteration 2: The publish() function overwrites the buffer with the next 8 bytes, let’s say [i, j, k, l, m, n, o, p], and sends buffer[:n] again.

At this point, if one of the worker goroutines is still processing the first chunk, it no longer sees the data it was sent: the buffer has been overwritten by the second chunk, so the worker may read the new bytes or a mix of old and new ones. Reusing a slice means sharing the same memory.
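The two iterations above can be replayed deterministically by collecting the slices instead of sending them to workers. This is a minimal sketch with a hypothetical helper, collectWithoutCopy, which follows the read loop described earlier but is not taken from the original code.

```go
package main

import (
	"bytes"
	"fmt"
)

// collectWithoutCopy reads input in 8-byte chunks into one reused buffer and
// keeps buffer[:n] from every iteration, the way the publisher sends it.
func collectWithoutCopy(input []byte) [][]byte {
	reader := bytes.NewReader(input)
	buffer := make([]byte, 8)
	var chunks [][]byte
	for {
		n, err := reader.Read(buffer)
		if err != nil { // io.EOF once the input is exhausted
			return chunks
		}
		chunks = append(chunks, buffer[:n]) // every entry aliases buffer
	}
}

func main() {
	chunks := collectWithoutCopy([]byte("abcdefghijklmnop"))
	for i, c := range chunks {
		fmt.Printf("chunk %d: %s\n", i+1, c)
	}
	// Both lines show "ijklmnop": the first chunk was overwritten in place.
}
```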

How to fix the Data Race?

To avoid the race condition, we must ensure that each chunk of data sent to the channel has its own independent memory. This can be achieved by creating a new slice for each chunk and copying the data from the buffer to this new slice. The key fix is to copy the contents of the buffer into a new slice before sending it to the chunkChannel:

chunk := make([]byte, n)    // Step 1: Create a new slice with its own memory
copy(chunk, buffer[:n])     // Step 2: Copy data from buffer to the new slice
output <- chunk             // Step 3: Send the new slice to the channel



Why does this fix work? By creating a new slice (chunk) for each iteration, you ensure that each chunk has its own memory. This prevents the consumers from reading from the buffer that the publisher is still modifying. The copy() function copies the contents of the buffer into the newly allocated slice (chunk), which decouples the memory used by each chunk from the buffer. Now, when the publisher reads new data into the buffer, it doesn’t affect the chunks that have already been sent to the channel.

 -------------------------             ------------------------ 
|  Publisher (New Memory) |           | Consumers (Read Copy)  |
|  [ a, b, c ] --> chunk1 |           |  Reading: chunk1       |
|  [ d, e, f ] --> chunk2 |           |  Reading: chunk2       |
 -------------------------             ------------------------ 
         ↑                                    ↑
        (1)                                  (2)
   Publisher Creates New Chunk          Consumers Read Safely

This solution works because it breaks the connection between the publisher and the consumers by eliminating shared memory. Each consumer now works on its own copy of the data, which the publisher does not modify. Here’s how the modified publish() function looks:

func publish(input []byte, output chan
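The listing above is cut off in this copy of the article. A sketch of the fixed function, assembled from the three steps shown earlier, might look like the following. The loop structure, the chan<- []byte parameter type, and the small main() used to exercise it are assumptions, not the author's verbatim code.

```go
package main

import (
	"bytes"
	"fmt"
)

// publish sends the input in chunks of up to 8 bytes. Each chunk is copied
// into freshly allocated memory before it is sent, so no chunk aliases buffer.
func publish(input []byte, output chan<- []byte) {
	reader := bytes.NewReader(input)
	buffer := make([]byte, 8)
	for {
		n, err := reader.Read(buffer)
		if err != nil { // io.EOF or any other read error ends the loop
			return
		}
		chunk := make([]byte, n) // Step 1: new slice with its own memory
		copy(chunk, buffer[:n])  // Step 2: copy the data out of the buffer
		output <- chunk          // Step 3: send the copy
	}
}

func main() {
	out := make(chan []byte, 2) // large enough to hold both chunks
	publish([]byte("abcdefghijklmnop"), out)
	close(out)
	for chunk := range out {
		fmt.Println(string(chunk))
	}
	// prints "abcdefgh" then "ijklmnop"
}
```

The cost is one small allocation per chunk, which is usually an acceptable price for removing the shared buffer.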



Summary

Slices Are Reference Types:
As mentioned earlier, Go slices behave like reference types: a slice value is a small header that points to an underlying array. When you pass a slice to a channel or a function, you’re passing a copy of that header, not a copy of the data. This is why reusing a slice leads to a data race: multiple goroutines end up reading and modifying the same memory.

Memory Allocation:
When we create a new slice with make([]byte, n), Go allocates a separate block of memory for that slice. This means the new slice (chunk) has its own backing array, independent of the buffer. By copying the data from buffer[:n] into chunk, we ensure that each chunk has its own private memory space.
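The make-and-copy pair is not the only way to get an independent backing array. The sketch below compares it with two other idioms that, to my knowledge, have the same effect: appending onto a nil slice, and bytes.Clone from the standard library (available since Go 1.20). The helper name copies is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// copies returns three independent copies of src made with different idioms.
func copies(src []byte) (a, b, c []byte) {
	a = make([]byte, len(src)) // explicit allocation plus copy, as in the article
	copy(a, src)
	b = append([]byte(nil), src...) // appending to a nil slice allocates a new array
	c = bytes.Clone(src)            // standard library helper, Go 1.20+
	return a, b, c
}

func main() {
	buffer := []byte("abcdefgh")
	a, b, c := copies(buffer)
	copy(buffer, "ijklmnop") // reuse the buffer after copying
	fmt.Println(string(a), string(b), string(c))
	// prints "abcdefgh abcdefgh abcdefgh"
}
```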

Decoupling Memory:
By decoupling the memory of each chunk from the buffer, the publisher can continue to read new data into the buffer without affecting the chunks that have already been sent to the channel. Each chunk now has its own independent copy of the data, so the consumers can process the chunks without interference from the publisher.

Preventing Data Races:
The main source of the data race was the concurrent access to the shared buffer. By creating new slices and copying the data, we eliminate the shared memory, and each goroutine operates on its own data. This removes the possibility of a race condition because there’s no longer any contention over the same memory.

Conclusion

The core of the fix is simple but powerful: by ensuring that each chunk of data has its own memory, we eliminate the shared resource (the buffer) that was causing the data race. This is achieved by copying the data from the buffer into a new slice before sending it to the channel. With this approach, each consumer works on its own copy of the data, independent of the publisher’s actions, ensuring safe concurrent processing without race conditions. This method of decoupling shared memory is a fundamental strategy in concurrent programming. It prevents the unpredictable behavior caused by race conditions and ensures that your Go programs remain safe, predictable, and correct, even when multiple goroutines are accessing data concurrently.

It's that easy!

This article is republished from: https://dev.to/crusty0gphr/tricky-golang-interview-questions-part-7-data-race-753?1