12:03 < nemo> say, I just had to run "hg censor" on the mercurial repository for Hedgewars
12:03 < nemo> Which got me to wondering... Does svn have something like that?
12:04 < nemo> There's some changesets at work I've been thinking I should probably censor just out of caution.
12:04 < nemo> but I was under the impression I'd have to do a full repo dump with filtering
12:04 < nemo> was wondering if it is possible to target a changeset like that
12:05 < nemo> For context, I ran: hg censor -r f9f71cccb5c3 share/hedgewars/Data/Fonts/DejaVuSans-Bold.ttf due to https://hg.hedgewars.org/hedgewars/rev/fbd385a1bcf4
12:25 < nemo> certainly would be nicer if there was a standard command like censor
13:13 < JulianF> nemo: The corresponding svn wishlist feature was always referred to as "obliterate"; it was never developed.
13:16 < JulianF> Since FSFS f7 (svn 1.9), it's become much more feasible to implement "obliterate" than it was before, but nobody is doing it.
13:20 < nemo> JulianF: mm. shame... unfortunately I'm not willing to add yet another FOSS project to learn. I'm already way behind on my Hedgewars contributions
13:30 < nemo> JulianF: so... that blog post. is it still something that one could do, and is there any chance of it breaking things?
14:05 < danielsh> nemo, the article is from 2011. It might have been accurate at the time but new versions have complications
14:05 < danielsh> The smaller one is that rep headers now include both md5 and sha1
14:06 < danielsh> The bigger one is that there's an out of band index mapping sha1 to file offsets, so that one needs to be updated as well
14:06 < danielsh> The final one is that there's no guarantee that the file contents will be stored in plaintext.
14:06 < danielsh> They might be stored as deltas and the deltas might be compressed.
14:12 < danielsh> 'svn obliterate' has been requested for 18 years now...
14:12 < danielsh> but yeah, as JulianF said, the FSFS format 7 (f7) makes it much more possible
14:12 < danielsh> In a nutshell, f6 and earlier use absolute file offsets all over the place, so it's not possible to replace a file's contents ("rep") with a longer one
14:13 < danielsh> f7 adds a layer of indirection ("logical addressing")
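The point about absolute offsets versus a layer of indirection can be illustrated with a toy model. This is only a sketch: the blob layout, the item names, and the `pack`/`read` helpers are made up for illustration and have nothing to do with the real FSFS on-disk format.

```python
# Toy model only: this is NOT the FSFS on-disk format.

def pack(items):
    """Concatenate items into one blob; return (blob, {name: (offset, length)})."""
    blob = b""
    table = {}
    for name, data in items.items():
        table[name] = (len(blob), len(data))
        blob += data
    return blob, table

def read(blob, table, name):
    """Read one item back using whatever offset table the caller holds."""
    offset, length = table[name]
    return blob[offset:offset + length]

original = {"a": b"secret-contents", "b": b"hello", "c": b"world"}
blob, table = pack(original)

# "Physical addressing": other places keep their own copy of the offsets.
frozen_offsets = dict(table)

# Case 1: replace "a" with something longer and rewrite the blob.
longer = dict(original, a=b"[censored: a much longer placeholder]")
blob_longer, index_longer = pack(longer)
stale_b = read(blob_longer, frozen_offsets, "b")   # old offsets: wrong bytes
fresh_b = read(blob_longer, index_longer, "b")     # updated index: correct

# Case 2: replace "a" with a same-length, space-padded value.
padded = dict(original, a=b"".ljust(len(original["a"])))
blob_padded, _ = pack(padded)
padded_b = read(blob_padded, frozen_offsets, "b")  # old offsets still work
```

In this model a longer replacement breaks every reader that holds old offsets, while a reader that goes through an updatable index is unaffected. A replacement padded to exactly the old length leaves the old offsets usable.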
14:40 < nemo> danielsh: BTW, WRT that f6/f7 issue - sounds like it was doable in f6 so long as the obliterate resulted in a shorter file? wouldn't that always be the case when stripping the contents of a file in a revision?
04:53 < danielsh> nemo, That would be correct with s/shorter file/shorter rep/. A rep may be either the plaintext or a compressed delta against a previous revision.
04:53 < danielsh> (See subversion/libsvn_fs_fs/structure:525 for details on reps)
04:56 < danielsh> Also, even in <=f6 there was still something you could do: 1. create an empty commit 2. 'svnadmin freeze /path/to/repo /bin/sh' 3. Add a rep to the empty commit's revision file, adjusting offsets within that file as needed (manually) 4. Update the old rev file to reference the new rep in the *newer* revision, *without breaking offsets* (if need be, replace the sha1 in the rep header by 39 (sic) spaces to make room)
04:57 < danielsh> That should work... but it's really black magic
04:58 < danielsh> authz and (dump/load or svnsync) remains the official recommendation. :)
04:58 < danielsh> sorry, s/rep header/node-rev header/
04:59 < danielsh> node-rev header is the rfc822 thingy in revision files.
08:16 < nemo> danielsh: yeah, agree that svndumpfilter or authz are a lot more straightforward, was just puzzled why "svn obliterate" had to wait until the new filesystem if smaller files were "safe" in past
08:17 < nemo> or was it just that it was too complicated a mess to even attempt at the time ☺
17:19 < danielsh> nemo, To be clear, I was describing how I'd hack this "by hand" in an emergency. That wasn't a design proposal :)
17:20 < danielsh> nemo, That said... the part about making a node-rev reference a rep in a "future" revision is fairly robust
17:21 < danielsh> The difficult part in all this is (1) The need to edit a file without breaking offsets (2) The fact that the entire system is heavily designed around the history being immutable
17:22 < danielsh> anything that changes history *silently* invalidates caches everywhere (in-process caches, http proxies, .svn/pristine/ ...)
17:25 < danielsh> Oh, I forgot one
17:25 < danielsh> if you'd like to change a version of a file other than the last, you need to remember that later versions are stored as deltas against it
17:26 < danielsh> in this context "last" is in the sense of `svn log -q file | grep ^r | head -1`, regardless of the global revnums
17:26 < danielsh> and copies are indexed one way only (you can go from a file to where it was copied *from*, but not to where it was copied *to*)
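The `svn log -q file | grep ^r | head -1` pipeline mentioned above can be approximated in Python. This is a minimal sketch: the function name `last_changed_rev` is hypothetical and the sample log text is made-up illustration data, not output from a real repository.

```python
import re

def last_changed_rev(svn_log_q_output):
    """Return the first revision number listed in `svn log -q` output, or None.

    Roughly mirrors `grep ^r | head -1`: take the first line that starts
    with "r<digits> " and return the number.
    """
    for line in svn_log_q_output.splitlines():
        match = re.match(r"r(\d+) ", line)
        if match:
            return int(match.group(1))
    return None

# Made-up sample in the usual `svn log -q` layout (newest revision first).
sample = """\
------------------------------------------------------------------------
r120 | alice | 2020-03-01 10:00:00 +0000 (Sun, 01 Mar 2020)
------------------------------------------------------------------------
r110 | bob | 2020-02-01 10:00:00 +0000 (Sat, 01 Feb 2020)
------------------------------------------------------------------------
"""

latest = last_changed_rev(sample)
```

The number returned is the newest revision in which that particular file changed, which may be much lower than the repository's global head revision.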
08:33 < nemo> danielsh: WRT deltas, presumably there are checkpoints in the system right?
08:33 < nemo> danielsh: couldn't force a checkpoint at the censor location? there'd be a size cost, but I guess censoring doesn't happen that often
08:34 < nemo> danielsh: normally delta systems have those for performance
09:03 < danielsh> nemo, Yes, there are deltas. https://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas
09:05 < danielsh> There's no need to "force" a checkpoint, though, because DELTA reps reference their base rep directly, so that reference remains valid even after the node-rev no longer points to that rep
09:06 < danielsh> The problem I was getting at is that if you commit the secret in r100, edit the file in r110, and revert r100 in r120, then you'd still have the secret in r110
09:06 < danielsh> (though in principle, you could use diff3 to auto-reconstruct in r110, that might be too much automation)
09:12 < nemo> hm. I see
09:12 < nemo> well, hg censor must have that issue too
09:12 < nemo> hg censor's only requirement is that you specify a changeset and file you want to censor
09:13 < nemo> if the forbidden content is in other revisions, it's your responsibility to catch that
09:13 < nemo> I guess your point is the delta would have the secret even if it isn't in the changeset for 110
09:14 < danielsh> Suppose somebody does 'svn cat file@r110'.
09:14 < danielsh> svn would open the r110 rev-file and <...> and seek to the rep
09:14 < danielsh> the first bytes it reads would be "DELTA 100 n m" where n and m are some integers
09:15 < danielsh> so it would read the delta, and then it would open the r100 rev file in order to get the file the delta is relative to (the "base")
09:15 < danielsh> which is the file you censored.
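The read path described above (open r110, find a DELTA rep, fetch its base from r100) can be sketched with a toy store. Everything here is an assumption made for illustration: the tuple layout, the `copy`/`insert` ops, and the file contents are invented, and real reps use svndiff-encoded deltas rather than this format.

```python
# Toy model only: real FSFS reps use svndiff deltas, not these tuples.

def apply_delta(base, ops):
    """Rebuild a fulltext from a base plus copy/insert instructions."""
    out = b""
    for op in ops:
        if op[0] == "copy":            # ("copy", start, length): bytes from base
            _, start, length = op
            out += base[start:start + length]
        else:                          # ("insert", literal): new bytes
            out += op[1]
    return out

def cat(store, rev):
    """Return the file contents at `rev`, following DELTA bases as needed."""
    rep = store[rev]
    if rep[0] == "PLAIN":
        return rep[1]
    _, base_rev, ops = rep
    return apply_delta(cat(store, base_rev), ops)

r100_text = b"password=hunter2\n"      # hypothetical secret committed in r100
store = {
    100: ("PLAIN", r100_text),
    # r110 keeps everything from r100 and appends one line.
    110: ("DELTA", 100, [("copy", 0, len(r100_text)),
                         ("insert", b"timeout=30\n")]),
}

before = cat(store, 110)               # still contains the secret

# "Censor" r100 by swapping its rep for an empty plaintext.
store[100] = ("PLAIN", b"")
after = cat(store, 110)                # copy ops now find nothing in the base
```

In this model the secret bytes exist only in the r100 rep, so replacing that rep also changes what r110 reconstructs to. Whether a real repository behaves acceptably in that situation is a separate question that this sketch does not answer.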
09:15 < nemo> huh... so that's... good then?