slow 2nd+ backup with winbind as nss provider #5295

Open
biji opened this issue Aug 12, 2020 · 12 comments
@biji biji commented Aug 12, 2020

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Issue

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

client: borg 1.1.13

server: borg 1.1.13

Operating system (distribution) and version.

client: Centos 6

server: Centos 7

Hardware / network configuration, and filesystems used.

Backup via internet

How much data is handled by borg?

450GB

302366 files

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg create -s -p -C lzma --list --filter=AME --exclude-from=/root/exclude-backup.txt --files-cache=ctime,size borg@server-ip:backup/pdc::pdc-{now} /home/data/

Describe the problem you're observing.

The second and subsequent backups after a successful first backup are slow.

The first backup took 23 hours; later backups still take more than 20 hours.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I tried removing /root/.cache, but backups are still slow.

Thanks

Include any warning/errors/backtraces from the system logs

none

Member

@ThomasWaldmann ThomasWaldmann commented Aug 12, 2020

The easiest explanation is that something frequently or regularly changes the ctime of many filesystem items, e.g. chmod/chown -R ....
Or the full path is changing (that does not look like the case, since you wrote that you use /home/data/).

Of course, after removing borg's cache, next backup is expected to be slow, but the one after that should then be faster again.

Another reason why you maybe do not see a huge speedup could be that you have lots of small files, which are in general less efficient to deal with. In that case 20h < 23h is maybe all the speedup you'll get.
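
To illustrate why a changed ctime defeats the files cache, here is a minimal sketch of the general idea, not borg's actual implementation. The `FilesCache` type and both helper functions are hypothetical names introduced for this example; the sketch assumes the cache is keyed by full path and compares only ctime and size, as with `--files-cache=ctime,size`.

```python
import os
from typing import Dict, Tuple

# Hypothetical, simplified stand-in for a files cache: path -> (ctime_ns, size).
FilesCache = Dict[str, Tuple[int, int]]


def remember(cache: FilesCache, path: str) -> None:
    """Record the current ctime and size of path in the cache."""
    st = os.lstat(path)
    cache[path] = (st.st_ctime_ns, st.st_size)


def is_unchanged(cache: FilesCache, path: str) -> bool:
    """Return True only if path is cached and its ctime and size still match.

    A different path (e.g. a changed mount point), a chmod/chown (which
    updates ctime), or a size change all count as a cache miss, so the
    file would have to be read and chunked again.
    """
    st = os.lstat(path)
    return cache.get(path) == (st.st_ctime_ns, st.st_size)
```

In this model, a recursive chown or chmod over the tree between two backups would turn every file into a cache miss, even though no file content changed.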

Author

@biji biji commented Aug 13, 2020

Yes, there are lots of small files: roaming storage for Windows users. Any suggestions for speeding up backups of lots of small files?
I'll try again using mtime,size, starting with a fresh cache (so I will run the backup twice).

Thanks

Author

@biji biji commented Aug 16, 2020

Hi, I've run 3 backups after clearing borg's cache, using files-cache=ctime,size again.
After excluding some files, there are now around 130,000 files.

The first backup took 23 hours and the next backups took about 17 hours each, even though there are not many new files.

Is there any way to make subsequent backups faster than 17 hours?
Could xattrs be slowing down file enumeration? I checked, and the mount point uses the xattr option for Samba.
Or maybe Python dictionary lookups are that slow.

Archive name: pdc-2020-08-13
Archive fingerprint: 9c72ae92713cf9313f85393f57d49edd7dd78156007666fca081c2b7161eee54
Comment:
Hostname: pdc
Username: root
Time (start): Thu, 2020-08-13 21:03:48
Time (end): Fri, 2020-08-14 20:20:07
Duration: 23 hours 16 minutes 18.47 seconds
Number of files: 131817
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              236.07 GB            173.87 GB             21.28 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

Archive name: pdc-2020-08-14
Archive fingerprint: c04886c76b7f0468884eee5e014c65d914e16ae747a3103129ebcb97c45c6243
Comment:
Hostname: pdc
Username: root
Time (start): Fri, 2020-08-14 21:18:05
Time (end): Sat, 2020-08-15 15:07:00
Duration: 17 hours 48 minutes 54.85 seconds
Number of files: 132209
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              237.16 GB            174.94 GB             20.16 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

Archive name: pdc-2020-08-15
Archive fingerprint: 6ca991b167276e2f76b17b2d1284172faae0cee9bf5de6c7b419b30add9ff2eb
Comment:
Hostname: pdc
Username: root
Time (start): Sat, 2020-08-15 23:24:51
Time (end): Sun, 2020-08-16 16:40:31
Duration: 17 hours 15 minutes 39.20 seconds
Number of files: 132369
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              237.31 GB            175.09 GB            128.19 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

Member

@ThomasWaldmann ThomasWaldmann commented Aug 16, 2020

Something is "wrong" in your setup.

Here are some stats from one of my backups:

Archive name: server-2020-08-15-23:42:01
Archive fingerprint: ...
Time (start): Sat, 2020-08-15 23:42:11
Time (end):   Sat, 2020-08-15 23:45:23
Duration: 3 minutes 11.94 seconds
Number of files: 802035
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              669.61 GB            619.04 GB              4.72 MB
All archives:               86.02 TB             78.02 TB              1.14 TB

                       Unique chunks         Total chunks
Chunk index:                 7919874            241722469

So you see, for this backup, it took 3 minutes to process 800k files (most of them did not have changes, only a few new/with changes resulting in a deduped backup size of ~5MB).

Setup:

borg client:

  • server-grade machine with a fast Xeon CPU and lots of RAM.
  • the files are on HDD (not SSD) and the filesystem is ZFS (2-disk mirror).
  • ubuntu linux 20.04

borg repo server:

  • remote machine via ssh, ~40Mbit/s WAN connection
  • relatively old/slow machine, AMD Turion CPU, 4GB RAM
  • borg repo files on HDD, fs is ZFS (2-disk mirror).
  • ubuntu linux 20.04
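
To put the two sets of statistics above side by side, the following sketch computes files processed per second from the reported file counts and durations. The helper name `files_per_second` is made up for this example; the numbers are taken from the pdc-2020-08-14 archive and the server-2020-08-15 archive quoted in this thread.

```python
def files_per_second(num_files: int, hours: int, minutes: int, seconds: float) -> float:
    """Average number of files handled per second of wall-clock time."""
    duration = hours * 3600 + minutes * 60 + seconds
    return num_files / duration


# pdc-2020-08-14: 132209 files in 17 hours 48 minutes 54.85 seconds
slow = files_per_second(132209, 17, 48, 54.85)
# server-2020-08-15: 802035 files in 3 minutes 11.94 seconds
fast = files_per_second(802035, 0, 3, 11.94)

print(f"pdc:    {slow:.2f} files/s")
print(f"server: {fast:.0f} files/s")
print(f"ratio:  about {fast / slow:.0f}x")
```

The slow setup handles roughly 2 files per second, while the fast one handles several thousand. A gap of that size points to a per-file cost on the client rather than to the amount of changed data.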
Member

@ThomasWaldmann ThomasWaldmann commented Aug 16, 2020

Stuff to check:

  • can you reproduce on something more recent than centos 6 (which is rather old)? maybe try with a newer client and the same repo server.

  • are you running out of (real) memory while borg is running (check client and server)? how much paging ("swapping") activity do you see?

  • you already run borg with --list --filter AME, did you look into the output produced by that? when looking at the status chars at the beginning of the lines, you should see how many files borg considers as A)dded or M)odified, also check for any E)rrors. if it considers too much as A/M although it was not "really" (not intentionally) added/modified, there is something wrong and the files-cache thus can't work efficiently.

  • is there other load on the client or on the server that maybe slows down borg? e.g. if a HDD is already rather busy seeking due to other load, borg performance will be impacted.

  • are there serious network issues slowing down borg? minor network issues (like minor latency or throughput issues) should not be a problem here - as the deduplication works quite well for you, there is not much to transfer to the repo.

  • the input files, are they local or are they mounted via some network filesystem? if the latter, check performance of that network fs.
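
For the `--list --filter AME` check, a small script can tally the status characters instead of reading the output by eye. This is a sketch under two assumptions: the list output was saved to a text file, and each listed item is a line of the form status character, space, path. The function name `count_status` and the file name `backup.log` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_status(lines: Iterable[str]) -> Counter:
    """Count lines by their leading one-letter status (e.g. A, M, E).

    Lines that do not look like '<letter> <path>' are ignored, so other
    log messages mixed into the output do not disturb the tally.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ", 1)
        if len(parts) == 2 and len(parts[0]) == 1 and parts[0].isalpha():
            counts[parts[0]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_status.py backup.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for status, n in sorted(count_status(fh).items()):
                print(status, n)
```

If the A and M counts are close to the total number of files on a repeat backup, the files cache is not matching; if they are small and the backup is still slow, the time is being spent elsewhere.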

Member

@ThomasWaldmann ThomasWaldmann commented Aug 16, 2020

You did not specify where you got borg 1.1.13 from. Is it the binary from github releases?

If yes, maybe also try an older binary on the centos6 machine.

borg 1.1.11 was built on (old) debian wheezy and might work better for old linux dists. See #5220.
borg 1.1.13 was built on (more recent) debian jessie.

This only applies to the "fat binaries" offered on github releases (made with pyinstaller).

If you installed borg from source (including via pip), there is no reason to use an older borg release.

Author

@biji biji commented Aug 16, 2020

Wow, that's very fast, 3 minutes. Yes, I think something is wrong with my setup 😭

Borg was built using pip; Python is 3.6.10, also compiled from source.

The client is a virtual machine with 4 GB allocated. Load is below 1. Using iotop, I see no heavy reads or writes.

I will check again. Thanks

Author

@biji biji commented Aug 18, 2020

Hi, I think I found the problem.

When I stop Samba during a backup, borg skips many files very quickly; when I start Samba again, the backup becomes slow again. My system uses winbind as the NSS provider to resolve file owners.

Using borg diff I found many lines like this:

[DOMAIN\myuser:users -> None:users]

If the ext4 ACLs are already backed up by borg, I think that is safe.

Then I tried the backup again with --numeric-owner and Samba running, but it was still slow, unlike when I stop Samba. I hope the backup can be fast when using --numeric-owner, without stopping Samba.

Thanks

Member

@ThomasWaldmann ThomasWaldmann commented Aug 18, 2020

Interesting. So, which of --files-cache=ctime,size,inode does this trigger?

Author

@biji biji commented Aug 19, 2020

I'm using --files-cache=ctime,size

Member

@ThomasWaldmann ThomasWaldmann commented Aug 19, 2020

Hmm, maybe winbind does not produce cache misses (at least not if you always have it active), but it is just winbind being very slow compared to a normal stat() against the local filesystem?
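
One way to test that hypothesis outside of borg is to time the uid/gid to name lookups directly, since Python's `pwd` and `grp` modules go through the system's NSS configuration (and therefore through winbind when it is configured as a provider). This is a rough sketch; the function names are made up for the example, and it says nothing about whether or how often borg itself performs these lookups.

```python
import grp
import os
import pwd
import time
from typing import Set, Tuple


def collect_ids(root: str) -> Tuple[Set[int], Set[int]]:
    """Walk root and collect the distinct owner uids and gids found below it."""
    uids: Set[int] = set()
    gids: Set[int] = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            uids.add(st.st_uid)
            gids.add(st.st_gid)
    return uids, gids


def time_lookups(uids: Set[int], gids: Set[int]) -> float:
    """Return the seconds spent resolving each uid and gid to a name once."""
    start = time.perf_counter()
    for uid in uids:
        try:
            pwd.getpwuid(uid)
        except KeyError:
            pass  # no name known for this uid
    for gid in gids:
        try:
            grp.getgrgid(gid)
        except KeyError:
            pass  # no name known for this gid
    return time.perf_counter() - start


if __name__ == "__main__":
    import sys

    # Usage: python time_lookups.py /home/data/
    if len(sys.argv) > 1:
        uids, gids = collect_ids(sys.argv[1])
        elapsed = time_lookups(uids, gids)
        print(f"{len(uids)} uids, {len(gids)} gids resolved in {elapsed:.3f} s")
```

Running this once with Samba/winbind active and once with it stopped would show whether name resolution alone accounts for a large share of the per-file cost.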

@ThomasWaldmann ThomasWaldmann changed the title from "Next backups is slow like first backup" to "slow 2nd+ backup with winbind as nss provider" Oct 11, 2020
Member

@ThomasWaldmann ThomasWaldmann commented Oct 11, 2020

TODO: check if borg create --numeric-ids ... does uid/gid -> name lookups.

@ThomasWaldmann ThomasWaldmann added this to the 1.1.15 milestone Oct 11, 2020