How does gh-ost guarantee no race between chunk copying & binlog processing #1113

Open
wangzhe711 opened this issue Mar 31, 2022 · 2 comments

@wangzhe711 wangzhe711 commented Mar 31, 2022

Hi folks,

Thank you all for writing this awesome tool! There's one scenario I don't quite understand. How do we avoid the following situation?

  1. A row in the original table has the value (foo, 1).
  2. A migration run starts: the chunk-copying thread reads (foo, 1), but before the row is written into the ghost table, steps 3 and 4 happen.
  3. A user of the table updates the row, changing its value to (bar, 2).
  4. The binlog entry for the update is processed, but (bar, 2) is not written into the ghost table because no row there matches the update's query condition.
  5. Step 2 resumes, and (foo, 1) is written into the ghost table.

The problem with this scenario is that after step 5 the row holds an incorrect value, and future binlog entries cannot fix it, because the row won't match their update query condition.
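The interleaving in steps 1 to 5 can be replayed with a toy in-memory model. This is only a sketch of the hypothesized scenario, not of gh-ost itself: `Row`, `Table`, `applyUpdate`, `insertIgnore`, and `simulateRace` are hypothetical names, and the model assumes the applied update touches only rows that already exist in the ghost table. Whether gh-ost can actually interleave this way is exactly what the question asks.

```go
package main

import "fmt"

// Row is a hypothetical two-column row, e.g. (foo, 1).
type Row struct {
	Name  string
	Value int
}

// Table is a toy in-memory table keyed by a hypothetical primary key.
type Table map[int]Row

// applyUpdate mimics an UPDATE with a WHERE clause: it changes nothing
// when no row matches, and reports whether a row was modified.
func applyUpdate(t Table, id int, r Row) bool {
	if _, ok := t[id]; !ok {
		return false
	}
	t[id] = r
	return true
}

// insertIgnore mimics INSERT IGNORE: an existing row is left untouched.
func insertIgnore(t Table, id int, r Row) {
	if _, ok := t[id]; !ok {
		t[id] = r
	}
}

// simulateRace replays steps 1 to 5 sequentially and returns both tables.
func simulateRace() (source, ghost Table) {
	source = Table{1: Row{Name: "foo", Value: 1}} // step 1
	ghost = Table{}

	snapshot := source[1]                 // step 2: copier reads (foo, 1)
	source[1] = Row{Name: "bar", Value: 2} // step 3: user update
	applyUpdate(ghost, 1, source[1])      // step 4: no matching row, no-op
	insertIgnore(ghost, 1, snapshot)      // step 5: stale row is written
	return source, ghost
}

func main() {
	source, ghost := simulateRace()
	fmt.Println("source:", source[1], "ghost:", ghost[1])
}
```

Under these assumptions the ghost table ends up holding the stale row while the source holds the updated one, which is the divergence described above.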

Best,
Zhe

@wangzhe711 changed the title from "How does gh-ost guarantee chunk copying won't overwrite the latest value from binlog?" to "How does gh-ost guarantee no race between chunk copying & binlog processing" on Mar 31, 2022

@wangzhanbing wangzhanbing commented Apr 12, 2022

Migrating data from the original table to the ghost table uses an insert-select statement like this:

insert /* gh-ost %s.%s */ ignore into %s.%s (%s)
      (select %s from %s.%s force index (%s)
        where (%s and %s) lock in share mode
      )

The SQL statement is executed atomically, so no other SQL is executed between steps 2 and 5.
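To make the template concrete, here is a minimal sketch that fills its placeholders with `fmt.Sprintf`. The argument order, the function name `buildChunkQuery`, and all sample values (`mydb`, `users`, `_users_gho`, the range conditions) are assumptions for illustration; they are not taken from gh-ost's source.

```go
package main

import "fmt"

// insertSelectTemplate is the statement template quoted above.
const insertSelectTemplate = `insert /* gh-ost %s.%s */ ignore into %s.%s (%s)
      (select %s from %s.%s force index (%s)
        where (%s and %s) lock in share mode
      )`

// buildChunkQuery fills the template for one chunk. The mapping of
// arguments to placeholders is an assumption made for this sketch.
func buildChunkQuery(db, table, ghostTable, columns, index, rangeStart, rangeEnd string) string {
	return fmt.Sprintf(insertSelectTemplate,
		db, table, // comment tag
		db, ghostTable, columns, // insert target and its column list
		columns, db, table, index, // select list, source table, index
		rangeStart, rangeEnd, // chunk boundary conditions
	)
}

func main() {
	// Hypothetical database, table, and chunk range.
	fmt.Println(buildChunkQuery(
		"mydb", "users", "_users_gho",
		"id, name", "PRIMARY",
		"id >= 1", "id <= 1000",
	))
}
```

The point of the single statement is that the read of the chunk and the write into the ghost table are not two separate steps issued by the client.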

Best,
Zhanbing

@dragonly dragonly commented Apr 21, 2022

@wangzhanbing Thanks for your reply! But I still have some questions about the ordering.

See gh-ost/go/logic/migrator.go, lines 1267 to 1293 in 8f361f6:

    // We give higher priority to event processing, then secondary priority to
    // rowcopy
    select {
    case eventStruct := <-this.applyEventsQueue:
        {
            if err := this.onApplyEventStruct(eventStruct); err != nil {
                return err
            }
        }
    default:
        {
            select {
            case copyRowsFunc := <-this.copyRowsQueue:
                {
                    copyRowsStartTime := time.Now()
                    // Retries are handled within the copyRowsFunc
                    if err := copyRowsFunc(); err != nil {
                        return this.migrationContext.Log.Errore(err)
                    }
                    if niceRatio := this.migrationContext.GetNiceRatio(); niceRatio > 0 {
                        copyRowsDuration := time.Since(copyRowsStartTime)
                        sleepTimeNanosecondFloat64 := niceRatio * float64(copyRowsDuration.Nanoseconds())
                        sleepTime := time.Duration(time.Duration(int64(sleepTimeNanosecondFloat64)) * time.Nanosecond)
                        time.Sleep(sleepTime)
                    }
                }
            default:

In this code, applyEventsQueue takes precedence over copyRowsQueue, so the update to (bar, 2) could actually be processed earlier through the binlog applier, and then the problem would be as described by @wangzhe711.
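The nested `select` with `default` branches is a common Go idiom for prioritizing one channel over another. Below is a minimal, self-contained sketch of that idiom; `drainWithPriority` and the string payloads are hypothetical and stand in for the event and row-copy queues. It only shows dequeue ordering inside a single goroutine and says nothing about when the source database emits events.

```go
package main

import "fmt"

// drainWithPriority empties both queues, always preferring events:
// a copy task runs only when no event is immediately available.
func drainWithPriority(events, copies <-chan string) []string {
	var order []string
	for {
		select {
		case e := <-events:
			order = append(order, e)
		default:
			select {
			case c := <-copies:
				order = append(order, c)
			default:
				return order // both queues are empty
			}
		}
	}
}

func main() {
	events := make(chan string, 2)
	copies := make(chan string, 2)

	// Copy tasks are queued first, yet events are still handled first.
	copies <- "copy-1"
	copies <- "copy-2"
	events <- "event-1"
	events <- "event-2"

	fmt.Println(drainWithPriority(events, copies))
	// prints "[event-1 event-2 copy-1 copy-2]"
}
```

Because each event and each copy function runs to completion on the same loop, an event is never applied in the middle of a copy function in this model; it can only run before or after one.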
