planet
January 16, 2012
Darcs News
darcs weekly news #90
January 16, 2012 11:47 AM UTC
News and discussions
- Ganesh Sittampalam uploaded the second beta of darcs 2.8:
- Zooko O'Whielacronx called for a new maintainer for darcsver:
- Eric Kow summed up the recent issues of Darcs with regards to SSH:
Issues resolved in the last week (2)
- issue845 Eric Kow
- issue2090 Eric Kow
Patches applied in the last week (44)
See darcs wiki entry for details.
December 30, 2011
Darcs News
darcs weekly news #89
December 30, 2011 01:36 PM UTC
News and discussions
- Owen Stephens summarized his work on the darcs-bridge summer of code and did some followup work on it:
- Eric Kow introduced an 'announce' topic filter on the mailing list to provide a low-traffic announcement subset of the list:
- Mark Stosberg announced a donation of $500 on behalf of Summersault. Remember you too can donate by going on the donations page of darcs.net:
- Interested in participating in the next Darcs sprint in March? Add yourself to the wiki page:
Issues resolved in the last week (1)
- issue1705 Eric Kow
Patches applied in the last week (27)
See darcs wiki entry for details.
November 22, 2011
Owen Stephens
I've recently been checking-in with my GSoC darcs-bridge work. I've re-worked the horrible state-file format I originally used, and have been making various other tweaks to the code.
Unfortunately, I've come across a few bugs (thanks to Brent Yorgey and Jesper Reenburg for reporting some initial problems they've hit), and have been trying to iron them out. With that in mind, I've added a bug tracker "Topic" for darcs-bridge on the darcs.net bug tracker: Bug tracker.
Hopefully, I'll be able to sort out the remaining bugs, and then move onto a tidy-up and refactor of the existing code.
October 08, 2011
Stefan Wehr
DPM: Darcs Patch Manager
October 08, 2011 09:22 PM UTC
I’ve just released the initial version of DPM
on Hackage!
The Darcs Patch Manager (DPM for short) is a tool that simplifies working
with the revision control system darcs. It is most
effective when used in an environment where developers do not push their
patches directly to the main repository but where patches undergo a
reviewing process before they are actually applied.
Here is a short story that illustrates how one would use DPM
in such situations.
Suppose that Dave Developer implements a very cool feature. After
polishing his patch, Dave uses darcs send to send the patch:
$ darcs send host:MAIN_REPO
Tue Mar 16 16:55:09 CET 2010 Dave Developer <dave@example.com>
* very cool feature
Shall I send this patch? (1/1) [ynWsfvplxdaqjk], or ? for help: y
Successfully sent patch bundle to: patches@example.com
After the patch has been sent to the address patches@example.com, DPM
comes into play. For this example, we assume that mail delivery for
patches@example.com is handled by some mailfilter program such as maildrop
(http://www.courier-mta.org/maildrop/) or procmail
(http://www.procmail.org/). The task of the mailfilter program is to add
all patches sent to patches@example.com to the DPM database. This is
achieved with the DPM command add:
$ dpm add --help
add: Put the given patch bundles under DPM's control (use '-' to read from stdin).
Usage: add FILE…
Command options:
Global options:
-r DIR --repo-dir=DIR directory of the darcs repository
-s DIR --storage-dir=DIR directory for storing DPM data
-v --verbose be verbose
--debug output debug messages
--batch run in batch mode
--no-colors do not use colors when printing text
--user=USER current user
--from=EMAIL_ADDRESS from address for emails
--review-address=EMAIL_ADDRESS email address for sending reviews
-h, -? --help display this help message
Now suppose that Dave’s patch is in the DPM database. A reviewer, call him
Richard Reviewer, uses the DPM command list to see what patches are
available in this database:
$ dpm list --help
list: List the patches matching the given query.
Query ::= Query ' + ' Query -- logical OR
| Query ' ' Query -- logical AND
| '^' Query -- logical NOT
| '{' Query '}' -- grouping
| ':' Special
| String
Special is one of "undecided", "rejected", "obsolete", "applied",
"reviewed", "open", or "closed", and String is an arbitrary sequence
of non-whitespace characters not starting with '^', '{', '}', '+', or ':'.
If no query is given, DPM lists all open patch groups.
Usage: list QUERY …
Command options:
Global options:
-r DIR --repo-dir=DIR directory of the darcs repository
-s DIR --storage-dir=DIR directory for storing DPM data
-v --verbose be verbose
--debug output debug messages
--batch run in batch mode
--no-colors do not use colors when printing text
--user=USER current user
--from=EMAIL_ADDRESS from address for emails
--review-address=EMAIL_ADDRESS email address for sending reviews
-h, -? --help display this help message
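The query grammar in the help text is small enough to sketch as a recursive-descent evaluator. The Python below is only an illustration and not DPM's own code. Several things in it are assumptions made for the example: the operator precedence (NOT binds tighter than AND, which binds tighter than OR), the substring match used for plain strings, the rule that plain strings contain no braces, and the hypothetical patch records with `name` and `flags` fields.

```python
import re

SPECIALS = {"undecided", "rejected", "obsolete", "applied",
            "reviewed", "open", "closed"}

# One token: an operator, a ':special', or a plain string. Plain strings are
# assumed not to contain braces, so that '{foo}' splits cleanly.
TOKEN = re.compile(r"\s*(\^|\{|\}|\+|:[a-z]+|[^\s^{}+:][^\s{}]*)")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def either(f, g):
    return lambda patch: f(patch) or g(patch)


def both(f, g):
    return lambda patch: f(patch) and g(patch)


def compile_query(text):
    """Turn a query string into a predicate over patch records."""
    tokens = tokenize(text)
    if not tokens:
        # "If no query is given, DPM lists all open patch groups."
        return lambda patch: "open" in patch["flags"]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        left = parse_and()
        while peek() == "+":
            take()
            left = either(left, parse_and())
        return left

    def parse_and():
        # Juxtaposition means AND; stop at anything that ends a conjunction.
        left = parse_not()
        while peek() not in (None, "+", "}"):
            left = both(left, parse_not())
        return left

    def parse_not():
        if peek() == "^":
            take()
            inner = parse_not()
            return lambda patch: not inner(patch)
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "{":
            inner = parse_or()
            if take() != "}":
                raise ValueError("expected '}'")
            return inner
        if token in ("}", "+"):
            raise ValueError(f"unexpected {token!r}")
        if token.startswith(":"):
            if token[1:] not in SPECIALS:
                raise ValueError(f"unknown special {token!r}")
            return lambda patch: token[1:] in patch["flags"]
        return lambda patch: token in patch["name"]

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return predicate


# Hypothetical records, loosely modelled on the example listing.
PATCHES = [
    {"name": "very cool feature", "flags": {"undecided", "open"}},
    {"name": "some other patch", "flags": {"rejected", "reviewed", "open"}},
]


def matching(query):
    keep = compile_query(query)
    return [patch["name"] for patch in PATCHES if keep(patch)]
```

With these records, `matching("cool + :rejected")` selects both patches, while `matching("^:rejected")` keeps only the undecided one.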
In our example, the output of the list command might look as follows:
$ dpm -r MAIN_REPO -s DPM_DB list
very cool feature [State: OPEN]
7861 Tue Mar 16 17:20:45 2010 Dave Developer <dave@example.com>
State: UNDECIDED, Reviewed: no
added
some other patch [State: OPEN]
7631 Tue Mar 16 13:15:20 2010 Eric E. <eric@example.com>
State: REJECTED, Reviewed: yes
added
…
(The -r option specifies the path to the darcs repository in question. The -s
option specifies a directory containing the DPM database. Initially, you simply
create an empty directory.)
DPM groups all patches with the same name inside a patch group. Patch
groups allow keeping track of multiple revisions of the same patch. In the
example, the patch group of name very cool feature has only a single
member, which is the patch Dave just created. The patch is identified by a
unique suffix of its hash (7861 in the example). The output of the list
command further tells us that no reviewer has decided yet what to do with the
patch (it’s in state UNDECIDED).
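The two ideas in this paragraph, grouping patches by name and naming each patch by a short unique hash suffix, can be sketched in a few lines of Python. This is an illustration only: the record layout, the minimum suffix length of four characters and the function names are assumptions for the example, and nothing here claims to match how DPM computes its identifiers.

```python
from collections import defaultdict


def group_by_name(patches):
    """Collect patches that share a name into one patch group."""
    groups = defaultdict(list)
    for patch in patches:
        groups[patch["name"]].append(patch)
    return dict(groups)


def short_ids(hashes, min_len=4):
    """Map each full hash to the shortest suffix (at least min_len long)
    that no other hash in the collection ends with."""
    ids = {}
    for full in hashes:
        length = min(min_len, len(full))
        while length < len(full) and any(
            other != full and other.endswith(full[-length:])
            for other in hashes
        ):
            length += 1
        ids[full] = full[-length:]
    return ids


# Hypothetical records; the hashes are made up apart from their endings.
patches = [
    {"name": "very cool feature", "hash": "aaaa-0001-7861"},
    {"name": "some other patch", "hash": "bbbb-0002-7631"},
    {"name": "very cool feature", "hash": "cccc-0003-2481"},
]

groups = group_by_name(patches)
ids = short_ids([patch["hash"] for patch in patches])
```

Here the group named "very cool feature" ends up with two members, which is the situation the story reaches once Dave sends a second revision of his patch.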
At this point, Richard Reviewer reviews Dave’s patch. During the review,
he detects a minor bug so he rejects the patch:
$ dpm -r MAIN_REPO -s DPM_DB review 7861
Reviewing patch 7861
Starting editor on DPM_DB/reviews/2010-03-16_7861_swehr_24166.dpatch
<inspect patch in editor>
Mark patch 7861 as reviewed? [Y/n] y
Patch 7861 is in state UNDECIDED, reject this patch? [y/N] y
Enter a comment: one minor bug
Marked patch 7861 as reviewed
Moved patch 7861 to REJECTED state
Send review to Dave Developer <dave@example.com>? [Y/n] y
Mail sent successfully.
Now Dave Developer receives an email stating that his patch has been
rejected. The email also contains the full review so that Dave sees why
the patch has been rejected. Thus, Dave starts fixing the bug, does an
amend-record of the patch, and finally sends the patch again.
(Alternatively, he could also create a new patch with exactly the same name as the original patch.)
$ darcs send MAIN_REPO
Tue Mar 16 16:55:09 CET 2010 Dave Developer <dave@example.com>
* very cool feature
Shall I send this patch? (1/1) [ynWsfvplxdaqjk], or ? for help: y
Successfully sent patch bundle to: patches@example.com
Once the email is received, the improved patch is added to the DPM
database. The output of the list command now looks like this:
$ dpm -r MAIN_REPO -s DPM_DB list
very cool feature [State: OPEN]
2481 Tue Mar 16 17:50:23 2010 Dave Developer <dave@example.com>
State: UNDECIDED, Reviewed: no
added
7861 Tue Mar 16 17:20:45 2010 Dave Developer <dave@example.com>
State: REJECTED, Reviewed: yes
marked as rejected: one minor bug
some other patch [State: OPEN]
7631 Tue Mar 16 13:15:20 2010 Eric E. <eric@example.com>
State: REJECTED, Reviewed: yes
added
…
The patch 2481 is the improved revision of the original patch 7861. It is
in the same group as the original patch because both patches have the same
name. Richard Reviewer reviews the improved patch and has no further
complaints:
$ dpm -r MAIN_REPO -s DPM_DB review 2481
Reviewing patch 2481
Starting editor on DPM_DB/reviews/2010-03-16_2481_swehr_876102.dpatch
<inspect patch in editor>
Mark patch 2481 as reviewed? [Y/n] y
Patch 2481 is in state UNDECIDED, reject this patch? [y/N] n
Enter a comment: ok
Marked patch 2481 as reviewed
Send review to Dave Developer <dave@example.com>? [y/N] n
At this point, Richard Reviewer applies the patch with the very cool
feature:
$ dpm apply 2481
About to apply patch 2481
Entering DPM’s dumb (aka interactive) apply command.
Future will hopefully bring more intelligence.
Instructions:
=============
– Press ‘n’ until you reach
Tue Mar 16 17:50:23 2010 Dave Developer <dave@example.com>
* very cool feature
(Hash: 20100316162041-c71f4-871aedab8f4dd3bd042b9188f1496011c7dd2481)
– Press ‘y’ once
– Press ‘d’
Tue Mar 16 17:50:23 2010 Dave Developer <dave@example.com>
* very cool feature
Shall I apply this patch? (1/1) [ynWsfvplxdaqjk], or ? for help: y
Finished applying…
Patch 2481 applied successfully
Send notification to author Dave Developer <dave@example.com> of patch 2481? [Y/n] y
Mail sent successfully.
Applying a patch closes the corresponding patch group. By default, the list command
doesn’t display closed patch groups, but we can force it to do so with the :closed
query:
$ dpm list :closed
very cool feature [State: CLOSED]
2481 Tue Mar 16 17:50:23 2010 Dave Developer <dave@example.com>
State: APPLIED, Reviewed: yes
marked as applied: -
7861 Tue Mar 16 17:20:45 2010 Dave Developer <dave@example.com>
State: REJECTED, Reviewed: yes
marked as rejected: one minor bug
…
Author: Stefan Wehr
September 14, 2011
the Patch-Tag blog
$ shuf /usr/share/dict/words | head -4 | tr '\n' ' '; echo
blotches rarity's unwieldier disarrange
If I got the math right, I believe we have here an easy MEMORABLE passphrase generator that should be relatively secure even against a distributed botnet password crack attack. Specifically, the password should resist a 1K botnet attack for 39 years, or a 1M botnet attack for 14 days.
Note that this is more secure than, say, a passphrase based on a lyric from a favorite song or some snip of text from a blog post, because the passphrase here is random.
Still a lot more memorable than a string of gobbledygook text.
Source / Explanation from my bashrc file:
thartman@pampasgrass:~/shellenv/shared>type thartman_password_gen
thartman_password_gen is a function
thartman_password_gen ()
{
echo ' comic explaining password strength in an intuitive way: http://xkcd.com/936
wc -l /usr/share/dict/words => ~ 100k
log 2 100k => ~ 16
echo 16 * 4 -> 64
distributed password cracking with a botnet: http://www.turnkeylinux.org/blog/tklbam-backup-passphrase
echo search 42 bits with 1000 computers => ~ 5 minutes
echo 64 - 42 => 22
echo search 64 bits with 1k botnet => (2^22 * 5) / (60*24 * 365) => ~ 39 years
secure password: ';
shuf /usr/share/dict/words | head -4 | tr '\n' ' '; echo
}
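The arithmetic in the function's comment can be replayed step by step. The Python sketch below uses the post's own figures (a word list of roughly 100k entries, and the quoted baseline of 42 bits searched in about 5 minutes by 1000 computers). The function names are made up for the example, and `generate` draws words with Python's `secrets` module, which is a stand-in for the `shuf` pipeline rather than a copy of it.

```python
import math
import secrets

MINUTES_PER_YEAR = 60 * 24 * 365
MINUTES_PER_DAY = 60 * 24


def passphrase_bits(wordlist_size, words=4):
    """Entropy estimate, rounding bits per word down as the post does."""
    return math.floor(math.log2(wordlist_size)) * words


def crack_minutes(bits, baseline_bits=42, baseline_minutes=5, speedup=1):
    """Scale the quoted baseline (42 bits in ~5 minutes on 1000 machines)
    to a larger search space; speedup=1000 models a botnet 1000x bigger."""
    return 2 ** (bits - baseline_bits) * baseline_minutes / speedup


def generate(wordlist, count=4):
    """Pick words uniformly at random from the given list."""
    return " ".join(secrets.choice(wordlist) for _ in range(count))


bits = passphrase_bits(100_000)                       # 16 bits per word, 4 words
years_1k = crack_minutes(bits) / MINUTES_PER_YEAR     # 1K-machine botnet
days_1m = crack_minutes(bits, speedup=1000) / MINUTES_PER_DAY  # 1M machines
```

Rounding log2(100k) down to 16 makes the estimate slightly conservative: the unrounded value is about 16.6 bits per word.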
September 03, 2011
Owen Stephens
GSoC: Darcs Bridge – Results
September 03, 2011 01:35 AM UTC
So, that's it, GSoC is over! Sorry I've been slack on the updates, I should have definitely kept on top of them better...
Anyway, it's been an interesting summer, frustrating at (most of the! :-)) times, confusing and hard work... sounds like fun? Well, it was! I've learnt a great deal about Darcs, Haskell, VCSs and working alone (and that I need to force myself to write blog updates in a more timely fashion!) and I've had an ultimately very rewarding experience.
So, what have I actually achieved? (If you want to jump right into how to use the bridge, check out the wiki page I've created here: http://wiki.darcs.net/DarcsBridgeUsage)
I suppose the easiest way to judge this is to take the list of pre-project targets and see where we are with them:
- Allow automatic incremental conversion: Check!
This definitely works well.
- Create a mapping/encoding of multi-head repositories: Check!
Using a special tagging-scheme as described on the wiki, we are able to import (and re-export) branches/merges, mapping individual Git branches onto separate Darcs repositories.
- Import and export foreign patch formats generated by VCS “send” commands: Check 1/2!
This works for applying Git patches to a Darcs repository, but the reverse operation turned out to be difficult to handle correctly. (We're not convinced that there are particularly compelling use-cases either.) If there was particular demand, I would hope that with a bit of further effort/thinking it would be possible to code up a solution.
- Solve the problem of efficiently translating to/from Darcs patches: Check!
This goal is a bit of a strange one; given that we've decided to do complete translations of the repositories, rather than on-the-fly conversions, we've basically side-stepped any after-conversion performance problems, since the generated repositories are repositories-proper and do not have any translation-associated performance problems.
- “Roundtripping”, whereby information may be lost when converting to and back from another repository format (particularly translation to and from Darcs-specific patch types, e.g. replace patches): Check (mostly)!
Exported replace patches can be recovered, assuming the replace primitives are at the "start" or "end" of a patch, rather than in the middle (intermediate states cannot be recovered/transmitted, hence the restriction - see the wiki page for more). Currently (as described in the wiki page) Darcs conflicts that are resolved using >1 patch (assuming the tagging-scheme is used) will be coalesced into a single resolution patch, upon exporting. In practice I imagine this isn't really a problem, but that said I'm hopeful it'd not be particularly difficult to fix.
- The cycle problem, in the presence of multiple bridges.
N/A Again, due to choosing one-off translations I don't think that multiple bridges will cause any issues.
- Create a consistent mapping between Darcs2 and Darcs1 format repositories.
Unchecked. I never got around to this feature; given that there exists a tool to do one-time conversions to Darcs-2 I'm not particularly concerned.
So, all-in-all, pretty good! There are a few things I'd like to get tidied up though...
TODOs:
- Further investigation with Darcs/Haskell gurus (Ganesh and Petr, I'm looking at you!) as to how I can improve the performance and resource-usage of darcs-bridge. Currently, exporting Darcs repositories is too memory-hungry, something I definitely want to improve.
- Attempting to import the Git source Git repo manages to trip Darcs itself up; there is a corner case of the patch-theory implementation in current Darcs: http://bugs.darcs.net/issue1520 that the changes/conflicts/resolutions in the Git repo manage to find. I suspect that others who are much more clever than myself have spent hours looking at the problem with no luck (since it's still not fixed)... I wonder if a fresh pair of eyes will spot anything or if the code's too opaque?
- Release! Once the performance has been tweaked a bit more, it'll be time to actually release darcs-bridge to the wider-world! Maintainership and bug-reporting and naming (currently the code/packaging is all centred around the darcs-fastconvert name, I think darcs-bridge should be separate to signify its improvements/differences) are some things I can think of that need discussion beforehand.
So, what next, Darcs-wise? I think I want to look into understanding and hopefully continuing Petr's work on the next-generation primitive-patches, particularly, how they fit into a repo-model (things like conflicts, duplicates and the issues their design throws up).
And finally...
A big thank you to Google for running the summer-of-code programme, Haskell.org for accepting my project (gaining a very keen Haskeller and Darcs-hacker in the process :-)) and the #darcs inhabitants: kowey, gh, mornfall, sm and iago to name a few and particularly Ganesh for his advice throughout the project; all three groups were invaluable and this project couldn't have gone ahead without any one of them.
August 26, 2011
Petr Rockai
soc reloaded: outcomes
August 26, 2011 11:40 AM UTC
This blog is kind of a final report for the summer. I had a progress report
drafted for about two weeks, so let me paste that here just for the record. To
read about the actual results, please skip to the “State of the Code” section
below.
The Last Progress Report
This is the second (and last) progress update for this summer of code
project. It was written something like a week before the pencil’s down, but I
got disheartened and instead of finishing and posting, I went on to code some
more. Here it goes…
Since my last report, I have decided to turn somewhat more radical again. The
original plan was to stick with the darcs codebase and do most (all) of the
work within that, based primarily on writing tests for the testsuite and not
exposing anything of the new functionality in a user-visible fashion. I changed
my mind about this. The main reason was that the test environment, as it is,
makes certain properties hard to express: a typical test-suite works with
assertions (HUnit) and invariants (QC). In this environment, expressing ideas
like “the displayed patches are aesthetically pleasing” or “the files in the
repository have reasonable shape” is impractical at best.
An alternative would have been to make myself a playground using the darcs
library to expose the new code. But the fact is, our current codebase is
entrenched in all kinds of legacy issues, like handling filenames and
duplicated code. It makes the experimenter’s life harder than necessary, and it
also involves rebuilding a whole lot of code that I never use, over and over.
All in all, I made a somewhat bold decision to cut everything that lived under
Darcs.Patch (plus a few dependencies, as few as possible) into a new library,
which I named patchlib, in the best tradition of cmdlib, pathlib and
fslib. At that point, I also removed custom file path handling from that
portion of code, removed the use of a custom Printer (a pretty-printer
implementation) module and made a few other incompatible changes.
Of course, the testing code went along. The net result, at least for me, was an
ability to build and test a much smaller piece of self-contained code. It also
allowed me to experiment with APIs a bit, where those were used all over darcs,
which made it, within the big codebase, impossible to advance without expending
a disproportionate amount of work on every change. Of course, part of that will
be paid back when we decide to port darcs over to use patchlib.
I originally planned this report for the start of this week, but then I got
caught in a big refactor of the ApplyMonad/Apply classes (again). The refactor
was triggered by the need to pretty-print patches, which is not a completely
easy task (made more complicated by the fact that UUIDs are meaningless for the
user, so formatting patches without context is essentially useless
now). Anyway, I am now much happier with how the ApplyMonad class looks (the
ApplyMonadBase thing was genuinely hideous… good riddance). As a net result
the ApplyMonad class and, even more importantly, the ApplyMonad transformer
(used in applyTo{Tree,State} among other things) is substantially easier to
use on the client side (while maybe very slightly harder on the provider side,
it is also much clearer, IMHO). Overall win.
As for formatting and summarising patches, I have created a new Display class
(I plan to nuke the existing viewing classes more or less, when I manage to
make meaningful Display instances for V1 Prim). The API lets you format patches
based on their ending or starting context. Since bits of the patches need to be
fetched from the hashed store, the display needs to run in a LoadMonad. Since
any reasonable patch formatter also needs to pass state around (and since the
actual type of state passed around will be different for different Prim
implementations), we hide the state in a type family of monad transformers; this
also opens the option to use something other than StateT when appropriate.
(This is the end of the “progress” post. The remainder more or less describes
the end state of the project.)
State of the Code
You can look at the code in my darcs repositories. Specifically, to play
around with a “demo”, you need pathlib, fslib, patchlib, cmdlib and
gorsvet. A bit more about the individual libraries:
- pathlib is a small library for handling file paths; it uses strict
bytestring representation, and adds a certain amount of type safety and a
few form invariants
- fslib is the successor of hashed-storage; it deals with accessing the
filesystem in an efficient manner, with good support for hashing files and
using the hashes for efficient comparisons
- patchlib, as outlined above, came into existence as a “fork” of the
Darcs.Patch hierarchy, give or take some; the code is self-contained, and
has a set of QC/HUnit tests, although these definitely need extending to
cover more of the functionality (and some of the functionality needs to be
removed)
- cmdlib is my commandline parsing library, and is needed by the test client
gorsvet (see next point)
- gorsvet is an experimental client using version 3 primitive patches, and a
very simple hashed repository format; its main purpose is to demo the
implementation of V3 prims, in addition to existing QC coverage.
About patchlib
One of the main problems of the darcs codebase today is insufficient
modularity, and patchlib is an experiment in an attempt to address that
concern. Efforts to bring about significant improvement of the situation from
“inside” (by restructuring existing code) have to date failed. Even though
there have been local improvements, the overall problem stubbornly persists.
Hand in hand with modularity problems come issues with unclear and
underspecified (both internal and external) APIs. Since the separation between
different components of darcs is blurry at best, the pressure to introduce
clean, testable interfaces is minimal. The external library, on the other hand,
is forced to put up a presentable façade. Luckily, the Patch subhierarchy in
darcs is, compared to the remainder of darcs, in fairly good shape in this
respect.
Eventually, patchlib should provide leverage to work with darcs(-style)
patches, including at least:
- Implementation of primitive patches, both version 1 (as used by darcs 1 and
2), compatible with existing repositories, and version 3 (with better
semantics and more efficient representation; subject of this SoC project).
- Implementation of the “real” patches, in version 1 (as used by darcs 1) and
version 2 (as used by darcs 2), and at some point, version 3 (of as yet
unspecified properties).
- Implementation of PatchInfo and “named” patches, which implement changesets
in the darcs sense, and allow tracking their metadata. With PatchInfo (and
to a limited extent, patch implementations themselves) goes a set of
matchers and support code for interactive dependency-aware patch selection.
With a homogeneous set of APIs, mostly mediated by type classes and appropriate
instances, to:
- Create primitive patches (mainly through diffing, but also by
version-specific direct construction). The most important type class is
Diff, defined in Data.Patch.Diff. The interface allows multiple
equivalent representations of a single change, intended to provide an
interface for heuristic detection of high-level patch types like hunk move
or token replace.
- Create real and named patches by building them from constituent primitives.
- Store and load all types of patches. This role used to be filled by the
ShowPatchBasic and ShowPatch classes, but is being superseded by Store
and Load instances (the classes are provided by fslib for generic
hash-based stores).
- Format (and summarise) patches for user-friendly display, using relevant
context to improve the legibility and usefulness of the output. The new
class to achieve this is Display. Previously, this role was filled by
ShowPatch, with its showContextPatch method (which is, however,
significantly under-used by darcs). Since V3 primitive patches are
substantially harder to read without context, the interface for user-level
rendering of patches mediated by Display mandates context use.
- Commute, invert (all patch types) and merge (“real” and “named”
patches). The primary classes are Commute and Merge, currently both
faithful copies of their darcs versions. They will however need to be
changed to allow these operations access to a LoadMonad, for the benefit
of delayed, on-demand loading of extensive text data (applies to hunks at
the moment). Currently, the Commute class allows this, by replacing the
Maybe monad with a CommuteMonad constraint, when compared to the original
darcs definition.
- Load, store and convert (to/from legacy format) PatchInfo objects that are
used to track metadata in Named patches.
About V3 Prims in patchlib
The version 3 primitives in their current incarnation in patchlib have the
following traits (based on the list from the proposal):
- File content and file location/existence are tracked separately, tied
together through universally-unique identifiers (of 256 bits). This avoids a
number of conflict scenarios and may allow further novel features, like
non-history-disruptive project splits and merges or subtree “checkouts”.
- Hunk content is detached from the hunk patch itself through a hash (unless
shorter than twice the length of the hash representation). This makes the
patch files themselves very compact and allows most commute rules to avoid
loading the hunk content altogether (improving speed and reducing memory
footprint). It also makes handling of binary files vastly more efficient.
- The hunks are byte-based, making it substantially more efficient to apply a
patch to a file, since no newline scanning needs to happen.
- The same hunk format that is used for text files is reused for binary hunks,
basically encoding a range substitution operation on the binary file. A
binary delta algorithm can be plugged in to compute more efficient binary
hunks, although even full content replace is much cheaper than the binary
patch type available in V1 prims.
- A set of primitive patches implementing a hunk “move” operation is
implemented, and is passing the generic commutation / application
tests. Unfortunately, at this point there is no diffing algorithm to detect
such moves, although one is planned for the future.
- Hunk and hunk move patches are the only content-editing patches available to
date. Further patch types can be added to the library without restriction as
long as the format is not frozen (at least token replace and indentation
patch types are planned before any such freeze). New object types and
accompanying patch types may be added in later backward-compatible revisions
of the format.
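To make the hunk-related traits above a little more concrete, here is a minimal Python sketch of a byte-based hunk whose content is either stored inline or detached through a hash. It is not patchlib code (patchlib is a Haskell library). The class and function names, the use of SHA-256 in hex form for hunk content, and the in-memory dictionary standing in for the hashed store are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union

HASH_HEX_LEN = 64  # length of a hex SHA-256 digest (assumed representation)


@dataclass(frozen=True)
class Ref:
    """Pointer to content kept in the hashed store, not in the patch."""
    key: str


Content = Union[bytes, Ref]


class HashedStore:
    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def detach(self, data: bytes) -> Content:
        """Keep content inline if it is shorter than twice the hash
        representation; otherwise store it and return a reference."""
        if len(data) < 2 * HASH_HEX_LEN:
            return data
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return Ref(key)

    def load(self, content: Content) -> bytes:
        return self.blobs[content.key] if isinstance(content, Ref) else content


@dataclass(frozen=True)
class Hunk:
    offset: int   # byte offset, so no newline scanning is needed
    old: Content  # bytes expected at the offset
    new: Content  # bytes that replace them


def apply_hunk(store: HashedStore, hunk: Hunk, data: bytes) -> bytes:
    old, new = store.load(hunk.old), store.load(hunk.new)
    end = hunk.offset + len(old)
    if data[hunk.offset:end] != old:
        raise ValueError("hunk does not apply at this offset")
    return data[:hunk.offset] + new + data[end:]


def invert(hunk: Hunk) -> Hunk:
    """The inverse hunk swaps old and new content without loading either."""
    return Hunk(hunk.offset, hunk.new, hunk.old)


store = HashedStore()
edit = Hunk(6, store.detach(b"world"), store.detach(b"darcs"))
patched = apply_hunk(store, edit, b"hello world")
restored = apply_hunk(store, invert(edit), patched)
```

The same `Hunk` shape works for binary data, since nothing in `apply_hunk` assumes text or line boundaries.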
About gorsvet
Gorsvet is a toy implementation of a repository layer that uses V3 primitive
patches. (Un)fortunately, the V3 prims violate fundamental assumptions made by
the repository and command layers of darcs, which means that an integration is
substantially more expensive than fits a single SoC project. However, as
outlined above, it is useful to be able to play around with the V3 prim patches
in a realistic environment. Therefore, gorsvet. I made it into a rather thin
user shell based on cmdlib and a prototype repository layer. The UI more or
less resembles darcs (without interactivity, since that’d be superfluous for a
tool of this scope). You are of course welcome to try the experimental tool
out: the online help should give you an idea how to use it.
One thing I would like to discuss a bit here is the repository format. Since
the patch types are incompatible anyway, we are fully liberated from backward
compatibility considerations. The next darcs repository format can be designed
from scratch, keeping in mind the shortcomings of the previous two formats. The
implementation in gorsvet is a peek at what the result might look like. Anyway,
we still need a better “composite” patch layer, which represents conflicts (and
sits one level above primitive patches), since the current (version 2)
composite patches in darcs are quite unsatisfactory. That also means we have
plenty of time to play around with the repository format, which is more or less
independent of the composite patch format.
As for the format itself, I went for as simple as possible (but no simpler). So
far, I have 2 files and a hashed store: .gorsvet/hashed is a sha256-based,
uncompressed “dumping grounds” for stuff of all kinds. The files are (not
implemented as of this writing) .gorsvet/index (with the same purpose as
_darcs/index — fast, efficient working copy access) and .gorsvet/meta — a
small set of root pointers. This has two very beneficial side effects: very
atomic updates and transactional semantics (free transaction
rollback). Compression and garbage collection of the hashed store can then be
sorted out separately (and does not affect semantics of the repository either
way). There are currently 4 root pointers in meta: shadow, pristine,
inventory and order. The pristine is the same thing as in darcs, shadow
is a similar thing, but at any time reflects the current state of the working
copy (it is automatically updated from working every time, before anything else
happens with the repository). The reason is mainly that the working copy has
entirely wrong semantics for diffing V3 primitive patches: most importantly,
UUID tracking is implemented through shadow.
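Why a hashed store plus a small set of root pointers gives atomic updates and cheap rollback can be shown with a toy model. The Python below is a sketch under stated assumptions, not gorsvet's implementation: the `Repo` class, its method names and the in-memory dictionaries are hypothetical, and only the four root pointer names and the use of SHA-256 come from the description above.

```python
import hashlib


class Repo:
    """Toy model: an append-only hashed store plus a set of root pointers."""

    ROOTS = ("shadow", "pristine", "inventory", "order")

    def __init__(self):
        self.hashed = {}                        # stands in for .gorsvet/hashed
        self.meta = dict.fromkeys(self.ROOTS)   # stands in for .gorsvet/meta

    def put(self, data: bytes) -> str:
        """Adding an object never changes what existing pointers refer to."""
        key = hashlib.sha256(data).hexdigest()
        self.hashed[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.hashed[key]

    def commit(self, **new_roots) -> dict:
        """Switch to the new state in one step and return the previous
        pointers, which is all that is needed to roll back."""
        unknown = set(new_roots) - set(self.ROOTS)
        if unknown:
            raise KeyError(f"unknown root pointers: {sorted(unknown)}")
        previous = self.meta
        self.meta = {**previous, **new_roots}
        return previous

    def rollback(self, previous: dict) -> None:
        self.meta = previous


repo = Repo()
repo.commit(pristine=repo.put(b"tree v1"))
before = repo.commit(pristine=repo.put(b"tree v2"))
current = repo.get(repo.meta["pristine"])
repo.rollback(before)
after_rollback = repo.get(repo.meta["pristine"])
```

After the rollback the object for "tree v2" is still in the store but unreachable from any root, which is the kind of leftover that a separate garbage collection pass could remove.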
The remaining two root pointers represent two new data structures (and
inventory is different from what darcs calls an inventory). The order
simply lists patchinfo hashes (a handle for each patch that does not change
through commutation) in the application order for the repository: in this
sense, order replaces hashed_inventory known from darcs. It is pretty
compact, but on its own also useless (since it gives us no way to get the
patches themselves). The inventory then, on the other hand, is an efficient
map (currently a sorted pair list, written and read as a Data.Map) from
patchinfo hashes to the current patch storage hashes. Moreover, the patchinfo
objects are, unlike in darcs, stored in the hashed store as separate entities,
and the patchinfo hash can be used to efficiently fetch the patchinfo
itself. Therefore, the named patch can be assembled from inventory, and
order puts the patch into the correct context.
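The split between order and inventory can be sketched as two small structures over one hashed store. The Python below is an illustration only: the function names and byte-string payloads are hypothetical, and order and inventory are kept as plain Python objects here, whereas the description above has them reachable through root pointers in meta.

```python
import hashlib


def put(store, data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def record(store, order, inventory, info: bytes, patch: bytes) -> str:
    """Store patchinfo and patch body separately, then register both."""
    info_hash = put(store, info)
    inventory[info_hash] = put(store, patch)  # patchinfo hash -> storage hash
    order.append(info_hash)                   # application order
    return info_hash


def replace_body(store, inventory, info_hash: str, new_patch: bytes) -> None:
    """After a commute the patch body changes, but its patchinfo hash
    (the stable handle) does not, so only the inventory entry moves."""
    inventory[info_hash] = put(store, new_patch)


def named_patches(store, order, inventory):
    """Assemble (patchinfo, patch body) pairs in application order."""
    return [(store[h], store[inventory[h]]) for h in order]


store, order, inventory = {}, [], {}
a = record(store, order, inventory, b"info A", b"patch A")
b = record(store, order, inventory, b"info B", b"patch B")

# Simulate commuting the two patches: bodies change, handles stay the same.
replace_body(store, inventory, a, b"patch A'")
replace_body(store, inventory, b, b"patch B'")
order[:] = [b, a]
history = named_patches(store, order, inventory)
```

Neither structure is useful alone: `order` holds only handles, and `inventory` says nothing about sequence.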
Apart from storing and showing things, I have also implemented a “pull” command
for gorsvet, but it’s currently fairly unusable, since any conflict
automatically means failure (there is no layer to handle conflicts; we could
use version 2 darcs patches, but I think it would constitute a dangerous
slippery slope: we definitely want a more solid implementation, and also want
to avoid a double transition).
Future Work
The obvious future work lies in the conflict handling. There are two main
options in this regard: either re-engineer a patch-level, commute-based
representation of conflicts (in the spirit of mergers and conflictors), as V3
“composite” patches, or alternatively, use a non-patch based mechanism for
tracking conflicts and resolutions. It’s still somewhat early to decide which
is a better choice, and they come with different trade-offs. Nevertheless, the
decision, and the implementation, constitute a major step towards darcs 3.
The other major piece of work that remains is the repository format: in this
area, I have done some research in both the previous and this year’s project,
but there are no definitive answers, even less an implementation. I think we
now have a number of good ideas on how to approach this. We do need to sort out
a few issues though, and the decision on the conflict layer also influences the
shape of the repository.
Each of these two open problems is probably about the size of an ambitious SoC
project. On top of that, a lot of integration work needs to happen to actually
make real use of the advancements. We shall see how much time and resources can
be found for advancing this cause, but I am relatively optimistic: the
primitive level has turned out fairly well, and to me it seems that shedding
the shackles of legacy code sprawl can boost the project as a whole
significantly forward.
(PS: While I agree, on the theoretical level, that nuking significant amounts
of legacy code carries non-trivial risks, for a small volunteer project like
darcs it is imperative to be fun. And a trench war against legacy code is not
fun. Writing new things and exploring possibilities, on the other hand, is
fun. Which means we need a bit more of the latter and a bit less of the former,
even though the project sits in the more conservative camp; after all, we
handle rather precious data… Even when taking natural conservatism into
account, a well-motivated, honest subsystem rewrite is better than a
half-hearted, someone-has-to-do-it maintenance of a piece of code that everyone
hates…)
July 26, 2011
Darcs News
darcs weekly news #88
July 26, 2011 06:13 PM UTC
News and discussions
- The Darcs 2.8 line has been branched for a tentative release in August:
- To catch up with Owen's Summer of Code on git-darcs bridging:
- To catch up with Petr's Summer of Code on Prim3 patches:
Issues resolved in the last week (10)
- issue1473 Florent Becker
- issue1714 Florent Becker
- issue1727 Florent Becker
- issue1740 Florent Becker
- issue2021 Florent Becker
- issue2054 Scott Lawrence
- issue2066 Andreas Brandt
- issue2067 Florent Becker
- issue2076 Florent Becker
- issue2079 Florent Becker
Patches applied in the last week (69)
See
darcs wiki entry for details.
July 22, 2011
Owen Stephens
As I mentioned in my previous post, a problem with exporting multiple Darcs branches is that the patch-based model of Darcs makes it particularly difficult to detect merges of two branches.
We want to be able to detect merges, and export them in the fast-import data stream, for import by Git or other fast-import aware VCSs, since otherwise, it would appear as if two branches never converged, even if they had been merged in Darcs.
As an example, say we have two Darcs repos:
master : ABCD'E'
branch1: ADE
where D' and E' are the commuted versions of D and E, having been pulled into master from branch1. We'd like to see this history in git:
_ B _ C _ M
/ /
A _ D _ E _/
i.e. a merge of ABC and ADE; Git makes these merge points explicit (a commit with >1 parent SHA1s is a merge commit), whereas Darcs does not.
So, the question is, how do we detect merges that are clean (i.e. no conflicts) and non-clean (with conflicts)?
First, a discussion on re-ordering and selective-pulling:
To Darcs, the repositories containing {ABC} and {ACB} are equal and are already sharing all their patches. Git (and thus, the fast-import format) does not see it this way - branches only share patches if the SHA1s of those patches match. This means that we can only treat Darcs branches as equal if they share their patches and ordering.
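The difference between the two views can be illustrated with a toy model in Python. This is only a sketch: the id scheme below (hashing the parent id together with the patch name) is an assumption chosen for illustration and is not Git's real commit format, and `chain_ids` is a hypothetical helper.

```python
import hashlib

def chain_ids(patches):
    """Return one toy commit id per patch, each derived from its parent's id."""
    ids, parent = [], ""
    for name in patches:
        # Toy scheme (assumption): an id depends on the parent id and the patch name.
        parent = hashlib.sha1((parent + "\n" + name).encode()).hexdigest()
        ids.append(parent)
    return ids

abc = ["A", "B", "C"]
acb = ["A", "C", "B"]

# Darcs-style view: the same set of patches, so the repositories are equal.
same_patch_set = set(abc) == set(acb)

# Snapshot-style view: ids diverge as soon as the order differs.
ids_abc, ids_acb = chain_ids(abc), chain_ids(acb)
shared = sum(1 for x, y in zip(ids_abc, ids_acb) if x == y)
```

Here `same_patch_set` is true, yet only the first id (for A) is shared between the two chains, which is why re-ordered patches cannot be treated as common history on the Git side.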
Darcs always provides the user with the ability to cherry-pick patches - selectively choosing the patches to operate on. If some patches are not pulled in, we cannot treat the resulting repository as a merge of branches.
So the resulting constraints are: to use the darcs-bridge effectively, patches should not be unpulled or otherwise re-ordered, and pulls between branches should always use the "--all|-a" option to pull in all changes.
Finally, how do we attempt to find the merge points? What we essentially need to know is where does a merge start in a sequence of patches, and where do the merged patches come from? So in our merged example, we'd need to know that D' is the start of a merge, and the patches are coming from branch1. Knowing this information allows us to export the patches of branch1, and then create a merge commit between the state of branch1 and the state of master.
One potential way of doing this is to create a tag, immediately before pulling a branch, and after the pull has completed. In our example, we'd get:
master: {A,B,C,T1,D',E',T1}
branch1: {A,D,E}
To export these branches, we follow these steps, starting with master:
- While the head-patches of all branches are equal, output a copy of that patch and move the 'reset point' of the branches to the state after exporting the patch.
- When we find a non-equal patch, save the current state as the branches' reset point, and keep exporting the current branch.
- When we hit a merge tag (identified by a unique tag message), we read the patches in until we hit the corresponding merge tag (with equal message). We now have a set of merged patches, and need to find the origin branch.
- For each branch, try to match the list of merged patches. If we find a match, we can export that branch's patches (having reset the current state to that of the branch reset-state). If we don't find the patches, something has been changed, so the only thing we can do is output the patches as they are on the merge target branch (since we can't find the source branch).
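The tag-matching part of these steps can be sketched in Python. This is a toy model under stated assumptions, not the darcs-bridge code: patches are represented by their names only (so D and D' compare equal), the tag prefix `MERGE_TAG_PREFIX` is a hypothetical marker, and `split_merges` and `find_source_branch` are hypothetical helpers.

```python
MERGE_TAG_PREFIX = "darcs-bridge-merge:"  # hypothetical marker for merge tags

def split_merges(patches):
    """Split patch names into ('plain', name) and ('merge', [names]) items."""
    items, i = [], 0
    while i < len(patches):
        name = patches[i]
        if name.startswith(MERGE_TAG_PREFIX):
            # Read patches until the closing tag that carries the same message.
            try:
                end = patches.index(name, i + 1)
            except ValueError:
                raise ValueError("unterminated merge tag: " + name)
            items.append(("merge", patches[i + 1:end]))
            i = end + 1
        else:
            items.append(("plain", name))
            i += 1
    return items

def find_source_branch(merged, branches):
    """Return a branch containing every merged patch, in order, or None."""
    for branch_name, branch_patches in branches.items():
        remaining = iter(branch_patches)
        # 'p in remaining' consumes the iterator, so order is respected.
        if merged and all(p in remaining for p in merged):
            return branch_name
    return None

master = ["A", "B", "C", MERGE_TAG_PREFIX + "1", "D", "E", MERGE_TAG_PREFIX + "1"]
other_branches = {"branch1": ["A", "D", "E"]}

items = split_merges(master)
source = find_source_branch(items[-1][1], other_branches)
```

With the example repositories, the last item is the merged group D, E and its source is found to be branch1. When `find_source_branch` returns None, the fallback described in the last step applies.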
The next question is how we obtain these "merge marker" tags, which is what I am currently pondering... watch this space!
July 15, 2011
Owen Stephens
Another very late blog post, argh! But, good news - I've completed several of the targets of the project:
- Branches that the bridge manages can now be tracked/untracked and listed.
- Incremental imports/exports with branches now work correctly (bar the merging issues).
- Git patches can be applied to a Darcs repo.
We can now tell the bridge to track new Git/Darcs branches, such that they are synced when commits/patches are exported/imported and can also tell the bridge to no longer track branches, so they will no longer be exported/imported.
# Init a test repo.
$ darcs init --repo foo_project
$ cd foo_project
$ echo 1 > a
$ darcs add a
$ darcs rec -am 'add a'
Finished recording patch 'add a'
$ cd ..
# Create a branch of the repo.
$ darcs get foo_project foo_project_branch1
Copying patches, to get lazy repository hit ctrl-C...
Finished getting.
# Init a bridge from the repo.
$ darcs-fastconvert create-bridge foo_project
Identified darcs repo at /tmp/throwaways/foo_project
Cloning source repo from /tmp/throwaways/foo_project to /tmp/throwaways/foo_project_bridge/foo_project
Initialised target git repo at /tmp/throwaways/foo_project_bridge/foo_project_git
Created .darcs_bridge in /tmp/throwaways/foo_project_bridge
Wrote new marks files.
Wrote hook.
Wired up hook in both repos. Now syncing from darcs
Copying old sourcemarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/darcs_export_marks
Doing export.
Doing import.
Copying old targetmarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/git_export_marks
Doing mark update export.
Diffing marks.
1 marks to append.
Import marks updated.
Bridge successfully synced.
# Start tracking the branch we created.
$ darcs-fastconvert branch track foo_project_bridge/ foo_project_branch1/
Copying old sourcemarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/darcs_export_marks
Doing export.
Doing import.
Copying old targetmarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/git_export_marks
Doing mark update export.
Diffing marks.
0 marks to append.
Import marks updated.
Bridge successfully synced.
# Print a list of all tracked branches.
$ darcs-fastconvert branch list foo_project_bridge/
Tracked branches:
Name: master ~ Darcs path: foo_project_bridge/foo_project
Name: foo_project_branch1 ~ Darcs path: foo_project_bridge/foo_project_branch1
# Show that the branch was correctly imported into git.
$ (cd foo_project_bridge/foo_project_git/ && git log -p foo_project_branch1)
commit 893e08f44b0de658e00a49bc61a51c6a6621d59e
Author: Owen Stephens
Date: Fri Jul 15 11:53:53 2011 +0000
add a
diff --git a/a b/a
new file mode 100644
index 0000000..d00491f
--- /dev/null
+++ b/a
@@ -0,0 +1 @@
+1
Another main piece of work that I've completed is to properly handle incremental import/export with branches.
Since Darcs uses a patch-based representation, and the fast-import format uses a snapshot-based representation, we have to jump through some hoops to properly import/export the state required.
To demonstrate, consider the Darcs history with 2 patches:
Fri Jul 15 13:11:04 BST 2011 Owen Stephens
* Amend shopping list, :-(
hunk ./shopping_list.txt 4
+Apples
+Pears
Fri Jul 15 13:10:44 BST 2011 Owen Stephens
* Create shopping list.
addfile ./shopping_list.txt
hunk ./shopping_list.txt 1
+Beer
+Pizza
+Chips
To output this repository in the fast-import format, we need to recreate each intermediate state of the repository, and list the file contents at that state. This is easy - we simply apply each patch and dump the affected files' contents, one after another. At each state, we save the pristine hash (a hash that identifies the pristine state of the repository) and the inventory (the list of patches that have been applied to create the pristine), which allows us to 'reset' ourselves to a previous state at a later point in time, by restoring the pristine/inventory.
Imagine another repository, which has a different final patch:
Fri Jul 15 13:21:06 BST 2011 Owen Stephens
* More treats needed!
hunk ./shopping_list.txt 4
+Cake
+Cider
Fri Jul 15 13:10:44 BST 2011 Owen Stephens
* Create shopping list.
addfile ./shopping_list.txt
hunk ./shopping_list.txt 1
+Beer
+Pizza
+Chips
We want to output the following 2 repos:
original : AB
better : AC
the snapshot-based history graph would look something like:
/-- B
A --
\-- C
We export A, since it is shared by both branches, and then export B. However, we need to reset ourselves back to the state before we applied B - we do so by restoring the pristine/inventory that we stored when we first applied A.
Once we have output C, we throw away the state we have generated/saved, since we have no further need for it, and it could potentially consume a large amount of space.
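The apply, snapshot and reset cycle can be sketched in Python as follows. This is a simplified model: it keeps whole copies of a toy repository state, whereas the bridge is described as saving the pristine hash and inventory. The patch dictionaries and the helpers `apply_hunk` and `export_branches` are hypothetical.

```python
import copy

def apply_hunk(state, patch):
    """Apply a toy add-lines hunk (file, 1-based line, new lines) to a copy of state."""
    new_state = copy.deepcopy(state)
    content = new_state.setdefault(patch["file"], [])
    content[patch["line"] - 1:patch["line"] - 1] = patch["add"]
    return new_state

def export_branches(branches):
    """Return (branch, patch name, snapshot) triples, reusing states of shared prefixes."""
    saved = {(): {}}  # tuple of applied patch names -> repository state
    out = []
    for branch, patches in branches.items():
        applied = ()
        for patch in patches:
            key = applied + (patch["name"],)
            if key not in saved:
                # Not exported yet: "reset" to the parent state, apply, and snapshot.
                saved[key] = apply_hunk(saved[applied], patch)
                out.append((branch, patch["name"], saved[key]))
            applied = key
    saved.clear()  # the saved states are thrown away once the stream is done
    return out

A = {"name": "A", "file": "shopping_list.txt", "line": 1, "add": ["Beer", "Pizza", "Chips"]}
B = {"name": "B", "file": "shopping_list.txt", "line": 4, "add": ["Apples", "Pears"]}
C = {"name": "C", "file": "shopping_list.txt", "line": 4, "add": ["Cake", "Cider"]}

snapshots = export_branches({"original": [A, B], "better": [A, C]})
```

A is exported once, B is built on top of it, and C is built from the state saved after A rather than from the state after B.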
The fly in the ointment is the need to handle incremental imports/exports. Incremental imports/exports are supported by the fast-import format, through the use of "marks" files. Marks files contain a list of mark->patch_hash mappings[1], where a mark is an integer that is output along with each commit/patch in the format stream. In our example above, A would have mark 1, B 2 and C 3, along with their corresponding patch-hashes (and branch names) in the marks file.
Imagine that we exported the repository incrementally: we would first export A and B, and then, in a separate stream, C. The problem is the fact that to export C, we need the state, as it was after exporting A (remember we've thrown it away after exporting B, to save space). The solution is simple, but fairly inelegant - we simply run through all the thus-far exported patches, and recreate the state for each, which is, as expected, expensive.
On the import side, it's a little more difficult - consider reading the incremental stream containing just C - it'll contain a single commit that looks something like:
commit refs/heads/demo2
mark :3
committer Owen Stephens 1310732466 +0000
data 20
More treats needed!
from :1
M 100644 inline shopping_list.txt
data 28
Beer
Pizza
Chips
Cake
Cider
This commit object names the branch on which it should be recreated ("refs/heads/demo2"), the mark for the commit, the committer, the commit message, the ancestor (from) commit mark and the commit modifications.
Note the line "from :1" - this line tells the importer that this commit should be based on the state as it was at mark 1 - i.e. commit A. We need to recreate that state - as mentioned earlier, as we import each commit, we stash the state for later use; however, as in export, we throw away this state, once we've finished a particular import stream. To recreate the state in a later stream, we take the ancestor mark (the from mark) and read the corresponding branch-name and patch-hash from the marks file. We then issue an internal command that performs the equivalent of "darcs get branch-name temporary_location --to-match = 'hash: PATCH_HASH'"; once we have get'd a new copy of the repo at the required state, we simply read the pristine and (entire) inventory, which allows us to reset our current state to that of the marked patch.
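As a rough illustration of the lookup described above, the following Python sketch reads the header of the example commit and resolves its "from" mark. It handles only the fields shown in the example, not the full fast-import grammar. The in-memory `marks` table, its branch name and hash placeholder, and the helpers `parse_commit` and `resolve_parent` are assumptions; the real marks file layout is not shown in the post.

```python
import io

def parse_commit(stream):
    """Parse the header fields of one fast-import commit from a binary stream."""
    commit = {"ref": None, "mark": None, "from": None, "message": None}
    line = stream.readline().rstrip(b"\n")
    if not line.startswith(b"commit "):
        raise ValueError("expected a commit command")
    commit["ref"] = line[len(b"commit "):].decode()
    while True:
        pos = stream.tell()
        line = stream.readline().rstrip(b"\n")
        if line.startswith(b"mark :"):
            commit["mark"] = int(line[len(b"mark :"):])
        elif line.startswith(b"committer "):
            commit["committer"] = line[len(b"committer "):].decode()
        elif line.startswith(b"data "):
            # 'data N' is followed by exactly N bytes of commit message.
            size = int(line[len(b"data "):])
            commit["message"] = stream.read(size).decode().rstrip("\n")
        elif line.startswith(b"from :"):
            commit["from"] = int(line[len(b"from :"):])
        else:
            stream.seek(pos)  # file modifications start here; leave them unread
            return commit

def resolve_parent(commit, marks):
    """Return the (branch, patch hash) pair recorded for the commit's from-mark."""
    if commit["from"] is None:
        return None
    return marks[commit["from"]]

STREAM = (b"commit refs/heads/demo2\n"
          b"mark :3\n"
          b"committer Owen Stephens 1310732466 +0000\n"
          b"data 20\n"
          b"More treats needed!\n"
          b"from :1\n"
          b"M 100644 inline shopping_list.txt\n")

marks = {1: ("demo", "PATCH_HASH_A")}  # hypothetical content of a marks file
commit = parse_commit(io.BytesIO(STREAM))
parent = resolve_parent(commit, marks)
```

The resolved pair is what would then be handed to the "darcs get ... --to-match" equivalent in order to recreate the state at mark 1.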
The final piece of work completed is being able to directly apply git-formatted patches to a Darcs repository:
$ darcs init
$ git init -q
$ echo -e '1\n2\n3' > a
$ git add a && git commit -m 'Add a'
[master (root-commit) 564e9e2] Add a
1 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 a
$ echo -e '4\n5' > a
$ echo -e 'a\nb\nc' > b
$ git add b
$ git add a
$ git commit -m 'Modify a and add b'
[master 411e599] Modify a and add b
2 files changed, 5 insertions(+), 3 deletions(-)
create mode 100644 b
# Revert changes, so we can apply the patch to the Darcs repo.
$ rm a b
# Create a Git patch of all the repo's commits.
$ git format-patch --all --stdout > git.patch
# Apply the Git patch to Darcs.
$ darcs-fastconvert apply-patch . git.patch
Attempting to parse input.
Successfully parsed 2 patches.
Attempting to apply patches.
Applying patch 1 of 2: Add a
Applying patch 2 of 2: Modify a and add b
Succesfully applied patches.
Git patches contain a SHA1 hash of each affected file, which we can use to verify that the files are in the same state that they were in Git, prior to the commit. The patch-apply code computes the target files' SHA1 hashes (Git computes the SHA1 of the string "blob LEN\0CONTENT" of a file) to detect if the files are in the same state as in the Git patch. If the hashes differ, the user is prompted to apply anyway, with any non-applying patches being completely rolled back (the unrecorded state of the repository is also unaffected).
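The blob hash computation mentioned above is small enough to sketch in Python. This is an illustration of the hashing rule only, not the darcs-fastconvert code; `git_blob_sha1` and `matches_patch` are hypothetical helper names.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Hash file content the way Git hashes a blob: 'blob LEN\\0CONTENT'."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def matches_patch(content: bytes, expected_prefix: str) -> bool:
    """Check content against the (possibly abbreviated) hash found in a Git patch."""
    return git_blob_sha1(content).startswith(expected_prefix)

# A file holding the single line "1" matches the abbreviated hash d00491f
# that Git printed for file 'a' in the earlier 'git log -p' output.
file_a_ok = matches_patch(b"1\n", "d00491f")
file_a_changed = matches_patch(b"2\n", "d00491f")
```

A mismatch such as `file_a_changed` is the situation in which the user would be asked whether to continue anyway.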
# Create another Git commit.
$ echo -e 'd\ne' >> b
$ git add b && git commit -m 'Modify b'
[master 59046a6] Modify b
1 files changed, 2 insertions(+), 0 deletions(-)
# Revert, so Darcs can apply the patch.
$ darcs rev -a
Finished reverting.
# Change b, so the expected hash doesn't match, but the patch will still apply cleanly.
$ sed -i 's/a/d/' b
$ git format-patch HEAD~1 --stdout > git.patch
$ darcs-fastconvert apply-patch . git.patch
Attempting to parse input.
Successfully parsed 1 patches.
Attempting to apply patches.
Applying patch 1 of 1: Modify b
WARNING: Hash of b does not match patch
No changes will be recorded, if the patch does not apply.
Continue anyway? [yn]y
Succesfully applied patches.
This means someone can make their own local git clone of a Darcs repo, and send patches to the Darcs-repo owner, who will be able to directly apply them.
Still outstanding in the next week:
- Apply Darcs patches to a Git repo.
- Merge detection - we still need to try to detect and output clean Darcs merges, else we'll lose them when exporting to Git.
- Performance - the performance of import is somewhat slow - we need to work out where and why it is performing badly.
[1] Darcs marks also contain the branch name that the given patch is part of - since Darcs doesn't yet natively support branches, we have to provide this information manually.
July 12, 2011
Petr Rockai
soc reloaded: progress 1
July 12, 2011 10:27 AM UTC
Oops! There has been no update for a long while from me, although I have been
busy with code/patches. So far, I have tackled two areas: generalising and
cleaning up the existing Patch testsuite, so I could apply it to the
in-progress V3 Prim code later. This has been quite successful, although it
took a little longer than I would have liked. With the new structure, (QC)
properties for single patches, commutes and merges can be applied to any
concrete patch type that supports the respective operations. Therefore, I now
have coverage of both V3 Prims as standalone patches (for single patch and
commute properties) and also when used with RealPatch (the implementation of
non-primitive darcs 2 patches).
The latter part was then to make all these tests pass. Since I finished the
respective Read/Show instances yesterday, all tests pass. Commute, apply and
friends were done a couple of days before that. So the next step is to write
more tests that can demonstrate where the code needs to be augmented.
Hunk storage
I have slightly changed the actual (on-disk) hunk format. For now,
“detached” hunk-text storage is not quite supported, I am keeping that
post-midterms. But the format still counts on that being possible. We do need a
new monad (class) for writing patches though, since the Show instance is
somewhat inadequate: the detached storage needs to be handled somehow.
Anyway, the format now looks like this:
hunk NNN .whitespace_encoded_old .new_text
(we use the same method for encoding whitespace as we do in filenames here). We
might want to change to a format that’s faster to produce/parse, since hunks
typically do contain whitespace. On the other hand, only very short hunks will
be encoded in this form. Also, an empty string is encoded as “!”. So the hunk
text (old and new) can take the following forms: .whitespace_encoded, ! (empty) or
“@”. The last form, @, should take 2 parameters, a hash and a length, like
e.g. this: @123456789ABCD<...>:65000. That means that most of the time, we
can ignore the hash and only fool around with the numbers (offset and lengths)
when commuting patches. Of course, applying them is a different matter:
nevertheless, we still do have a substantial advantage over V1 prims there,
since each hunk is a simple splice/catenate operation on bytestrings. With V1
prims, we had to chop up the hunk at newlines and remove the +/- signs.
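To make the three hunk-text forms concrete, here is a small Python sketch of a reader for them. Several details are assumptions rather than facts from the format: NNN is read as an offset, whitespace is assumed to be escaped as a decimal character code between backslashes (in the style of darcs filename encoding), and the names `Inline`, `Detached`, `decode_white`, `parse_hunk_text` and `parse_hunk` are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Inline:
    text: str          # hunk text stored directly in the patch

@dataclass
class Detached:
    hash: str          # hash identifying text stored elsewhere
    length: int        # length of that text

# Assumed escape: a decimal character code between two backslashes.
_ESCAPE = re.compile(r"\\(\d+)\\")

def decode_white(encoded: str) -> str:
    """Undo the assumed whitespace escape."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), encoded)

def parse_hunk_text(token: str):
    """Parse one of the three forms: '!', '.encoded' or '@hash:length'."""
    if token == "!":
        return Inline("")
    if token.startswith("."):
        return Inline(decode_white(token[1:]))
    if token.startswith("@"):
        digest, _, length = token[1:].rpartition(":")
        return Detached(digest, int(length))
    raise ValueError("unrecognised hunk text: " + token)

def parse_hunk(line: str):
    """Parse 'hunk NNN old new' into (offset, old text, new text)."""
    keyword, offset, old, new = line.split(" ")
    if keyword != "hunk":
        raise ValueError("not a hunk line")
    return int(offset), parse_hunk_text(old), parse_hunk_text(new)

inline_example = parse_hunk("hunk 12 .foo\\32\\bar !")
detached_example = parse_hunk_text("@123456789ABCD:65000")
```

For a detached hunk, only the offset and the length are needed while commuting; the hash matters once the text has to be fetched for applying.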
As for implementation, this means we need to abstract commute over a monad
class, which besides commutation failure can express a “fetch text for this
hash” operation. This might be simpler than it was in the Apply case (where a lot
of code had to be modified to accommodate V3 Prims), since the existing
commute runs in the Maybe monad and we can make Maybe an instance of
CommuteMonad. Nevertheless, to make actual use of this, toplevel code that
runs commutes will definitely need to be modified, and in effect, all the
intermediate code that relies on Maybe for commutation will need to be
modified as well. This part could become actually more hairy than was the case
with Apply.
The question of coalesce
Apart from apply and commute, there is currently one more “core” operation in
Darcs patches: coalesce. This operation takes two primitive patches and decides
whether they can be merged into a single primitive patch. This can only happen
if the patches do not commute. Unfortunately, coalesce does not preserve
commutation behaviour: move a b :> move b c gets coalesced into move a c,
which modifies its commutation behaviour with (add, remove, move) patches
mentioning b. On the other hand, coalesce is normally only used during
operations like amend-record, rebase and when handling the “pending” patch.
All these operations modify the identity of a patch, and therefore it shouldn’t
matter much that coalesce fails to commute with commute.
On the downside, coalesce is currently a first-class operation that cannot be
derived from the remaining. Most importantly, it is “redundant” with the diff
operation, that constructs patches from two states. The problem is that with
current (V1) prims, there is no diff operation for some patches. If we had
reliable diff for all patch types, we could implement coalesce in terms of
diff, commute and apply (pseudocode):
coalesce context (a :> b)
  | isNothing (commute (a :> b)) = diff context (apply context $ a :> b)
  | otherwise = a :> b
This would work as long as there was a deterministic diff operation, i.e. one
with the property (for a being a primitive patch) that diff ctx (apply ctx
a) == a for all a. This diff operation does not need (and indeed cannot be
made) universal over different patch types, but fortunately that doesn’t
matter. We can always call it with a specific patch type as one of its inputs:
diff TextHunk ctx1 ctx2
I believe this operation should be possible to have, and it would also allow
significant improvements in the “record” user experience: darcs could try
various differs on a pair of states and offer “better” patches than just hunks
(like eg. replace, move, etc.).
The ability to implement coalesce in terms of other operations is important
because even more than commute, the implementation of coalesce is O(n^2), with
n being the number of different primitive patch types: it needs to take into
account any pair of types. With the above approach, since coalesce always
yields a single resulting patch, we can implement it as follows:
- try to commute a with b; if this works, there’s nothing to coalesce
- if it fails, take a context in which (a :> b) can be applied, ideally as
small as possible (since diff is somewhat expensive)
- try to diff the original and post-a/b context, using both the “a” differ and
“b” differ: if either succeeds (i.e. produces a single primitive patch),
then we have a winner; otherwise, we cannot coalesce either
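The three steps can be tried out on a toy model in Python. This sketch is not darcs code: it only knows "move" patches, a state is a plain mapping from names to contents, the sequence a :> b is assumed to be valid, and `Move`, `apply`, `commute`, `diff_move` and `coalesce` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Move:
    src: str
    dst: str

def apply(state, patch):
    """Rename patch.src to patch.dst in a copy of the state."""
    new_state = dict(state)
    new_state[patch.dst] = new_state.pop(patch.src)
    return new_state

def commute(a, b):
    """Return (b, a) if the two moves touch unrelated names, else None."""
    if {a.src, a.dst} & {b.src, b.dst}:
        return None
    return (b, a)

def diff_move(before, after):
    """A 'move differ': succeed only if exactly one object changed its name."""
    gone = [n for n in before if n not in after]
    new = [n for n in after if n not in before]
    if len(gone) == 1 and len(new) == 1 and before[gone[0]] == after[new[0]]:
        return Move(gone[0], new[0])
    return None

def coalesce(a, b, differs=(diff_move,)):
    """Return a single patch equivalent to a :> b, or None if there is none."""
    if commute(a, b) is not None:
        return None  # the patches commute: nothing to coalesce
    # Smallest context in which a :> b can be applied.
    context = {a.src: "first"}
    if b.src != a.dst:
        context[b.src] = "second"
    result = apply(apply(context, a), b)
    for differ in differs:
        patch = differ(context, result)
        if patch is not None:
            return patch  # a differ produced a single primitive patch
    return None

merged = coalesce(Move("a", "b"), Move("b", "c"))
```

The pair move a b :> move b c coalesces into move a c, while move a b :> move c a neither commutes nor coalesces in this model, because no single move explains the combined change.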
A UI digression
This is not GSoC related: you can skip this section if all you care about is
GSoC…
Anyway, while I am talking about user interface: I think a substantial
improvement in the way “semantic” patches work would come with the ability to
infer those patches “automatically”, in the spirit of the above diff
operation that is available with every patch type. However, it is hard to do
a semantic “diff” on significantly divergent repository states. This is simply
because further changes obscure the relationships of entities in question. When
the only thing that changes is a “mv”, it is easy to detect. But when you
also edit the file, it is no longer possible to tell for sure if this is a move
or a new object.
What would help substantially, then, would be to be able to run diff much more
often. This prompts a workflow change. Of course, we cannot ask people to
commit every time they do a small change. However, we could ask them to
“amend” an “in-progress” patch when they do. This would be especially useful if
they can be coached into stashing their changes before and after things like
applying “sed” to whole codebase, moving around files etc. This would be
basically a supercharged version of “darcs mv”: it would say “I did something,
you figure what is the right way to represent it”. It adds the burden of having
to call this both before and after the “contentious” operation. But it also
adds significant benefits (IMHO).
The other thing that this kind of “open” patch (that you keep adding
things to, until you are satisfied, and then you commit) would allow is progressive
(time-sensitive) revert. This is something that I would be completely sold on:
if I kept telling the VCS to note down my changes reasonably often, I could
get, in exchange, a whole-repository (but still granular) undo operation.
(It is not hard to imagine that you could also have more than one “open” patch
at a time, sorting changes into different buckets for semi-related changes. The
UI for that one would be more tricky though.)
The story of Apply
To get back to GSoC though. For what it’s worth, the test-suite part of the
work and a sketch of the V3 prim implementation is already in the
screened branch. The changes to the Apply class are almost ready for
getting into screened as well; they are currently available as patch635.
The basic challenge with Apply was that V1 prims and V3 prims apply to
different kinds of things. While V1 applies to a filesystem tree (basically
your working copy), this is not the case with V3 anymore. The V3 state is
modelled as a map from UUIDs to Objects. However, it would be extremely uncool
to have two different “Apply” type classes to apply different kinds of patches:
this would also mean that all higher layers of patch code would need to
duplicate their apply implementations. Not cool.
Therefore, associated types to the rescue: I have added an ApplyState
associated type to the Apply class. The Prim level patches then decide what
they can be applied to (currently a Tree, from hashed-storage, for V1 and
an ObjectMap, which still needs to be fleshed out in more detail, for V3).
Any higher levels inherit their apply state from the prims. Cool.
Of course, it’s not as simple, since we actually have to implement that cool
apply method. This has traditionally (well, since I merged my new annotate
code; not that traditional I guess) been done through the ApplyMonad
class. Now ApplyMonad used to have operations like “create a file”, “create a
directory” or “write this bytestring to this file”. That’s cool for V1 prims,
but not very useful for V3 prims. So ApplyMonad needed to be generalised over
multiple apply states. This forced a multi-parameter type class, since there
are no functional dependencies to save us (and therefore associated types do
not apply either). This is because we expect some monads (e.g. IO) to be
instances of both ObjectMap- and Tree-based ApplyMonad. In general, I
didn’t want to limit this to special monads, although we might go for that
option later if it turns out to be superior.
Anyway, the ApplyMonad class is a bit of a meta-class, since the actually
useful set of operations is different for different apply state
representations. But since the methods can carry their own constraints, I have
added a couple of fully generic methods (get current state, set current state
and the like) and a set that only applies to ObjectMap and one that only
applies to Tree. This doesn’t seem to pose significant trouble. Haskell
doesn’t seem to have higher-order type classes that would solve this maybe
slightly more elegantly. (I don’t even know if they are possible. Don’t crucify
me if they aren’t.)
Anyway, long story short, we now have a single apply method in a single
Apply class, that works on both V1 and V3 prims, as witnessed by running the
same set of tests, which sometimes do invoke apply, on both implementations.
The story of Commute
There isn’t much of a story behind this one so far. As I outlined above, there
are things coming here as well, but they are not required to allow V3 prims per
se: only needed for the detached storage optimisation. The commute as it is
implemented works, at least as far as tests are concerned. This goes hand in
hand with some kind of StoreMonad and LoadMonad abstraction, that will
actually allow us to implement the detached storage. The CommuteMonad can
then be a
class (LoadMonad m) => CommuteMonad m where
commuteFail :: m a
kind of deal. A LoadMonad superclass constraint can (should) appear on
ApplyMonad as well. For now, the instances don’t need to be too elaborate (they
can simply fail to fetch anything at all, which will work just fine for V1
prims).
The story of unsafeInterleaveIO
I don’t like unsafeInterleaveIO. At all. My last summer of code was, after a
fashion, about removing a significant source of unsafe, ugly and outright
dangerous lazy IO from darcs. I believe it was a significant success. Now the
LoadMonad / StoreMonad abstraction has a potential to rid us of another
source of lazy IO in darcs: currently, the reading of patch content is
unsafeInterleaveIO’d and the rest of the code treats the repository as a
kind-of-pure data structure built of patches. Of course, the unsafeInterleaveIO
is unsafe because it breaks the type system. Since darcs uses it a lot, there
is no telling which value is actually pure. Arguing about runtime behaviour of
lazy code is hard enough when it is actually pure. Random IO thunks lurking
inside pure values (kind of like an alien in Sigourney’s stomach) turn it into
a nightmare.
This will involve a lot of code being lifted into a monad, fortunately a
significantly restricted monad. In practice, it’ll be IO more often than not
(although in the testsuite, it’ll probably be the likes of Maybe, or StateT
Maybe). What matters is not that the code will execute in IO, but that it is
statically well protected: it cannot access the IO monad and it has to be
explicit about side effects (like loading things from disk). Therefore, we give
things types that actually reflect impurity, without allowing them to spin out
of control with side effects (like they could if they were simply in the IO
monad). A static type system win.
Conclusion
Ok, I guess this is more than enough for today. I’ll try to keep you folks more
informed about the progress in the second half of the endeavour. On the other
hand, I am a bit worried that these posts are more useful as a note-to-self
resource than for general reading by others. Well, let’s hope, dear reader,
that you found at least something of this post useful and/or interesting.
June 26, 2011
Owen Stephens
I've had a fairly busy couple of weeks outside of GSoC, so I didn't have a particularly interesting blogpost to make last week. That said, the bridge is coming along nicely; we are now able to export Darcs branches into the fast-export format (other than a TODO on detecting merges - more on that later).
Some interesting topics that have come up recently:
Prefix sharing of Darcs branches when exporting - given branches ABCD and ABCE we can "share" the patches A,B and C between the branches rather than simply exporting ABCD and A'B'C'E, which would lose the common history of the two branches. The current implementation exports the longest prefix of patches between branches and then (to use the Git terminology) "rebases" any extra patches on top. E.g. branches ABCD and ABDE will be exported as ABCD and ABD'E (N.B. that D and D' are not the same). The current behaviour is somewhat a "best-effort" (it has some complicated "reproducibility" issues) but after a long discussion with Ganesh and Petr, a better approach wasn't found, so for now, it is how it is.
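The prefix computation itself is simple to sketch in Python. This models only the "longest shared prefix, then rebase the rest" planning step, with patches represented by name; `shared_prefix` and `plan_export` are hypothetical helpers, not the bridge's code.

```python
def shared_prefix(*branches):
    """Return the longest run of leading patches common to all branches."""
    prefix = []
    for patches in zip(*branches):
        if any(p != patches[0] for p in patches):
            break
        prefix.append(patches[0])
    return prefix

def plan_export(branches):
    """Split branches into the shared prefix and each branch's remaining patches."""
    prefix = shared_prefix(*branches.values())
    rest = {name: patches[len(prefix):] for name, patches in branches.items()}
    return prefix, rest

prefix, rest = plan_export({
    "branch1": ["A", "B", "C", "D"],
    "branch2": ["A", "B", "D", "E"],
})
```

For ABCD and ABDE the shared prefix is AB. The remaining D, E of the second branch are the patches that end up "rebased" on top of it, which is why the exported D' differs from D.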
Encoding replace patches (and other incompatible patch-types) is tricky. The fast-export stream format simply stores file contents at each commit (just as Git does internally), which is fine for exporting patches once - we just apply the Darcs patches in turn, listing the changed files in full for each patch/commit. However, a property we are keen to keep with the Bridge is that of reproducibility - multiple exports or repeated import/exporting should yield the same changes. For example, if a Darcs replace patch was exported to Git, and then the Git repo exported back into Darcs, we'd like to be able to recover the same replace patch (rather than a large hunk patch).
To illustrate, imagine the Darcs patch: [hunk file1 "foo\nfoo\nfoo" 1, replace file1 foo bar] that adds some foos to file1 and then replaces foos with bars in file1. It is important to know exactly where the "replace" was in the sequence of low-level patches - if we don't know the position we will create the wrong patch when re-importing (e.g. the newly added foos won't be changed to bars, if we place the "replace" before the hunk). It is difficult to encode positions other than "first" or "last", since we are unable to easily represent the intermediate states in Git (to ensure that the states are re-exported later), so for now, these changes will only be handled if they are first or last in a patch. N.B. the only way to force a replace into the middle of a sequence is by using amend-record, so the impact of this decision is *somewhat* limited.
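Why the position of the replace matters can be shown with a few lines of Python. This is a toy model: the replace here uses regular-expression word boundaries as a stand-in for Darcs' token-based replace, and `apply_hunk` and `apply_replace` are hypothetical helpers.

```python
import re

def apply_hunk(lines, position, new_lines):
    """Insert new_lines before the given 1-based line position."""
    return lines[:position - 1] + new_lines + lines[position - 1:]

def apply_replace(lines, old, new):
    """Replace whole-word occurrences of old with new on every line."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return [pattern.sub(new, line) for line in lines]

foos = ["foo", "foo", "foo"]

# Order as recorded in the patch: hunk first, then replace.
hunk_then_replace = apply_replace(apply_hunk([], 1, foos), "foo", "bar")

# Wrong order on re-import: the replace runs before the foos exist.
replace_then_hunk = apply_hunk(apply_replace([], "foo", "bar"), 1, foos)
```

The first order yields three lines of bar, the second leaves three lines of foo, so the same pair of primitive patches produces a different file depending on where the replace sits.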
Upcoming TODOs:
- Add branches to the bridge commands (add, rm, list branches etc.) - since we now support multi-head import/export on both sides, these commands will be very useful.
- Detecting, and making explicit in the fast-convert stream, merges of Darcs branches. Currently, re-exported Git merges are lost, since they are not detected on the Darcs side.
- Performance (import, especially, is sometimes slow).
- Perhaps a way of showing progress without piping into git fast-import? Currently, bridge progress is mostly ignored.
- Accepting foreign patch-formats e.g. be able to apply emailed Darcs patches to a Git repo and vice-versa?
June 15, 2011
Owen Stephens
A quick update post to show my first import of a branching git repo that contains merges.
It's a very simple import, but it works! :)
$ git log --graph --pretty='%ad %an <%ae>%n * %s%n%n %b'
* Tue Jun 14 16:19:34 2011 +0100 Owen Stephens
|\ * Merge branch 'branch1'
| |
| | Conflicts:
| | b
| |
| * Tue Jun 14 16:18:54 2011 +0100 Owen Stephens
| | * b branch1
| |
| |
* | Tue Jun 14 16:19:08 2011 +0100 Owen Stephens
|/ * b master
|
|
* Tue Jun 14 16:18:34 2011 +0100 Owen Stephens
* a master
$ git fast-export --all | darcs-fastconvert import darcs
$ darcs cha --repo darcs
Tue Jun 14 16:19:34 BST 2011 Owen Stephens
* Merge branch 'branch1'
Conflicts:
b
Tue Jun 14 16:18:54 BST 2011 Owen Stephens
* b branch1
Tue Jun 14 16:19:08 BST 2011 Owen Stephens
* b master
Tue Jun 14 16:18:34 BST 2011 Owen Stephens
* a master
The "conflicts b" message is generated by Git, and shows up in the Darcs patch, even though the patch isn't a conflicting patch; also, the merge commit seen by Git is actually 2 patches in Darcs: a 'merge' patch, which contains conflicts, and a 'resolution' patch that contains the resolution as per the Git merge commit. This is to ensure that we preserve the entire patch history, rather than simply "diffing" the end state and the branches.
The following rather verbose commands show the actual patch/commit content:
$ darcs cha --repo darcs -v
Tue Jun 14 16:19:34 BST 2011 Owen Stephens
* Merge branch 'branch1'
Conflicts:
b
hunk ./b 1
+1
+c
Tue Jun 14 16:18:54 BST 2011 Owen Stephens
* b branch1
duplicate
|hunk ./b 1
|-1
|-2
|-3
|rmfile ./b
|:
addfile ./b
conflictor [
hunk ./b 1
+1
+2
+3
]
|:
hunk ./b 1
+a
+b
+c
addfile ./c
hunk ./c 1
+a
+b
+c
Tue Jun 14 16:19:08 BST 2011 Owen Stephens
* b master
addfile ./b
hunk ./b 1
+1
+2
+3
Tue Jun 14 16:18:34 BST 2011 Owen Stephens
* a master
addfile ./a
hunk ./a 1
+1
+2
+3
$ git log --graph -p
* commit 6802fcd03d3ddf69cfb33a803211fe4f22da9542
|\ Merge: f085e43 c5dc576
| | Author: Owen Stephens
| | Date: Tue Jun 14 16:19:34 2011 +0100
| |
| | Merge branch 'branch1'
| |
| | Conflicts:
| | b
| |
| * commit c5dc576b33e8125153c9337b6b2dbf99a2de1a60
| | Author: Owen Stephens
| | Date: Tue Jun 14 16:18:54 2011 +0100
| |
| | b branch1
| |
| | diff --git a/b b/b
| | new file mode 100644
| | index 0000000..de98044
| | --- /dev/null
| | +++ b/b
| | @@ -0,0 +1,3 @@
| | +a
| | +b
| | +c
| | diff --git a/c b/c
| | new file mode 100644
| | index 0000000..de98044
| | --- /dev/null
| | +++ b/c
| | @@ -0,0 +1,3 @@
| | +a
| | +b
| | +c
| |
* | commit f085e43571e1d19dd345c7e3ec7a0a57efaaba26
|/ Author: Owen Stephens
| Date: Tue Jun 14 16:19:08 2011 +0100
|
| b master
|
| diff --git a/b b/b
| new file mode 100644
| index 0000000..01e79c3
| --- /dev/null
| +++ b/b
| @@ -0,0 +1,3 @@
| +1
| +2
| +3
|
* commit 83888d0729210fd84c4557467c6548fcc99aae2c
Author: Owen Stephens
Date: Tue Jun 14 16:18:34 2011 +0100
a master
diff --git a/a b/a
new file mode 100644
index 0000000..01e79c3
--- /dev/null
+++ b/a
@@ -0,0 +1,3 @@
+1
+2
+3
June 13, 2011
Owen Stephens
GSoC: Darcs Bridge – Week 3
June 13, 2011 04:06 PM UTC
So, time for a (somewhat delayed - oops!) week 3 update on darcs-bridge:
By the end of last week, I had coded up a poorly-implemented approach to importing branches, but had hit a few final problems (mostly arising due to my flawed approach). Petr and Ganesh both helped me to see the light (and also a better method for handling branches!) which I've spent the weekend on-and-off hacking up.
The result is this rather innocuous-looking transcript:
$ git branch -a
foo_branch
* master
$ git log --graph --all
* commit 48e3317aad56df72977c80c2b40a34b87349e435
| Author: Owen Stephens <git@owenstephens.co.uk>
| Date: Mon Jun 13 14:28:53 2011 +0100
|
| add b
|
| * commit e85e1bae42b1647bc08bdac2aaec8e402152bce0
|/ Author: Owen Stephens <git@owenstephens.co.uk>
| Date: Mon Jun 13 14:28:53 2011 +0100
|
| add c
|
* commit d8b3bafad054acdd8307fbefdc95287dc715e9a7
Author: Owen Stephens <git@owenstephens.co.uk>
Date: Mon Jun 13 14:28:53 2011 +0100
add a
$ (cd git_repo && git fast-export --all) | darcs-fastconvert import darcs
[...]
$ darcs cha --repo darcs
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add c
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add a
$ darcs cha --repo darcs-branch_foo_branch
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add b
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add a
Which shows (somewhat unclearly) a multi-head git repo with a base commit - "add a" and then two branching commits: "add b" and "add c"; "add b" is made on a new branch, and should not show up in the log of the master branch. Importing this git repo creates a base repo directory and any branches are created in adjacent directories. I've not quite finished off this branch-importing work - I still need to handle merges, but it should not be too much trouble, and it's good to see that the code works on simple cases already!
Other things I've completed over the last few days:
- Correctly handling renames/moves, rather than simply diffing before/after (which would give patches that see fully removed/added files).
- I no longer shell out to darcs-fastconvert (yuck!), when syncing a bridge, as I've reworked the input/output handling that was preventing me from calling the import code internally.
- Testing: I've added a basic test-suite, that will catch the most egregious of foul-ups due to any changes that I make. I'll add more tests in the future, to catch tricky cases, especially with handling branches.
Things coming up: handling merges and exporting multiple darcs "branches" into a single git repository. Onwards!
June 04, 2011
Owen Stephens
GSoC: Darcs Bridge – Week 2
June 04, 2011 10:45 AM UTC
So, it's the end of the 2nd week of GSoC, but only my first full week of work, due to my uni finals; thankfully, they're over and GSoC is going well! I didn't write a week 1 post, since I'd only done < 2 days of work, but this week I've got some good things to discuss.
This week, I've created a working (but not stringently tested, yet!) automatic Darcs<->Git bridge, by extending darcs-fastconvert. The bridge creates a Git clone of an input Darcs repository (vice versa for a Git input repo), using the fast-import data format to import the data from Darcs. The bridge inserts a "hook" into both repos (pre-receive for Git, and pre-apply for Darcs) that ensures that the bridge is synced before allowing new patches to be pushed. If the bridge is out-of-sync, the new commits will be imported and the push/apply will be rejected; the user should then pull in the imported changes and resolve any conflicts locally. We use a mutex to disallow concurrent pushes to the two repos.
If you'd like to test the bridge, you can get a copy of my darcsden repo. The bridge can be created and tested as follows:
darcs get http://darcsden.com/owst/darcs-fastconvert-gsoc
cabal configure; cabal build; cabal install
cd DIR_CONTAINING_REPO
darcs-fastconvert create-bridge --input-repo=REPO_DIR
A directory named REPO_DIR_bridge should have been created, with a clone of the input repo, and a Git copy. These repos should be used as the master repos, and should be pushed to, not edited directly (otherwise, the bridge-syncing commands won't run).
The current tool has limitations, particularly with regard to branches in Darcs and Git, but on simple, linear history repos, the bridge should have no problems.
Some TODOs, for the bridge:
- Improve the help (possibly by modifying cmdlib, the command-line argument parser being used), particularly removing flags for mandatory parameters.
- Create a typeclass for the monad that the "export" command runs in, to allow easy redirection of output. Currently I have to shell-out to my own executable, due to the design of darcs-fastconvert, when I want to internally run an export/import command. This sucks (but at least the hack works as is!), and I will need to do some re-engineering to fix it.
- Create some tests! I need to create some simple shell-tests that will ensure I don't introduce regression errors, when adding features to the bridge.
At the end of the week, I started work on adding Rename/Move handling to the import mechanism of darcs-fastconvert. Git is able to infer moves/copies, using the -C and -M options to git-fast-export, but darcs-fastconvert cannot currently import them. Adding this behaviour will reduce the likelihood of losing information, if a previously converted repo with moves/renames was converted back to its original format.
My next task is to implement simple multi-head importing - currently darcs-fastconvert linearises Git repos with multiple branches, usually leading to "strange" patch contents. I will map Git branches to multiple Darcs repos (the method of branching in Darcs). One particularly tricky problem to solve is that of creating "good" Darcs patches, from a Git merge commit.
May 23, 2011
Petr Rockai
This year, I have accepted Eric’s invitation to submit a proposal to Google
Summer of Code again. I am not going to repeat the proposal itself here,
so please read that as well. This post is more about filling in the technical
details and setting out a plan of work.
First a clarification: even though there are no V2 prims, I call those V3,
because the V1 prims have slightly (and somewhat confusingly) different
semantics in Darcs1 and Darcs2 repositories; if nothing else filename encoding
has changed incompatibly. There have been some commute rule changes as well,
although I am not sure this wasn’t retroactively changed even for Darcs1 repos.
Not important anyway.
In the rest of the post, I will lay out how I anticipate the V3 prims to work
and further what I intend to implement and roughly how.
Primitive patches shall operate on a collection of abstract objects (which
define the “pristine” state), each object in the collection being uniquely
identified. Objects come into existence the first time they are referenced and
they are never destroyed. We assign a type to each object (and object patches
get corresponding arrow types).
I imagine there would be a few object types: binary, text, directory. We can
add “bugs” objects and stuff like that later. The patch types should be
monomorphic to simplify things. We can share implementations between different
patch types if they are identical apart from their type.
Directories
A directory object is a map from names (strings) to object ids. (I say map and
not bimap since there seems no good reason to prevent multiple manifestations
of a single object.) We could even tie hardlinking to this, although that’s
probably pretty useless in practice. We should, however, take care to avoid
loops and similar abominations in the directory structure.
Among other repository properties, we keep a “root” object — this is the UUID
of a (directory) object that represents the root of the working copy of the
repository. The directory can map names to things, like text or binary files,
or other directories.
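To make the directory model a bit more concrete, here is a minimal sketch of a pristine store with a loop check. The names `UUID`, `Object`, `Pristine` and `hasLoop` are all made up for this illustration, and using plain strings as UUIDs is an assumption; none of this is taken from the darcs source.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Stand-in for a real UUID type (assumption for this sketch).
type UUID = String

-- The three object types mentioned in the text.
data Object
  = TextObj [String]
  | BinaryObj String
  | DirObj (Map.Map String UUID)  -- names mapped to object ids

-- The "pristine" state: a collection of uniquely identified objects.
type Pristine = Map.Map UUID Object

-- True if some directory reachable from the given object contains itself,
-- directly or indirectly. Only the current path is tracked, so the same
-- object appearing under two different names is not reported as a loop.
hasLoop :: Pristine -> UUID -> Bool
hasLoop store = go Set.empty
  where
    go path u
      | u `Set.member` path = True
      | otherwise = case Map.lookup u store of
          Just (DirObj entries) ->
            any (go (Set.insert u path)) (Map.elems entries)
          _ -> False
```

Calling `hasLoop store root` with the repository's root UUID would be one way to reject a patch that introduces a cycle, while still allowing multiple manifestations of a single object.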
Akin to the “root” object, we may want to keep track of a “preferences” object
as well. Again, this would be just a UUID of a directory object. The directory
object could then list individual preference files.
Some examples:
- bhunk (binary hunk) :: binary -> binary
- hunk (text hunk) :: text -> text
- bin2text :: binary -> text
- text2bin :: text -> binary
- manifest :: directory -> directory
- demanifest :: directory -> directory
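The arrow types in this list can be written down directly. The sketch below is only illustrative: the constructor names (`BHunk`, `THunk` and so on) and the helpers `arrow`, `typeAfter` and `typeAfterAll` are hypothetical, chosen to mirror the list above.

```haskell
import Control.Monad (foldM)

data ObjectType = Binary | Text | Directory
  deriving (Eq, Show)

-- One constructor per example patch type from the list above.
data PrimKind = BHunk | THunk | Bin2Text | Text2Bin | Manifest | Demanifest
  deriving (Eq, Show)

-- The (source, target) object types of each patch kind.
arrow :: PrimKind -> (ObjectType, ObjectType)
arrow BHunk      = (Binary, Binary)
arrow THunk      = (Text, Text)
arrow Bin2Text   = (Binary, Text)
arrow Text2Bin   = (Text, Binary)
arrow Manifest   = (Directory, Directory)
arrow Demanifest = (Directory, Directory)

-- A patch applies only to an object whose current type is the arrow's
-- source; the result is the object's type afterwards.
typeAfter :: PrimKind -> ObjectType -> Maybe ObjectType
typeAfter k t
  | fst (arrow k) == t = Just (snd (arrow k))
  | otherwise          = Nothing

-- Type-check a whole sequence of patches against one object.
typeAfterAll :: [PrimKind] -> ObjectType -> Maybe ObjectType
typeAfterAll ks t = foldM (flip typeAfter) t ks
```

For example, `typeAfterAll [Bin2Text, THunk] Binary` yields `Just Text`, whereas a text hunk applied to a binary object is rejected with `Nothing`.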
Patches of different types on the same object clearly don’t need to commute. If
there is a binary -> binary patch and a text -> text patch affecting the same
object, they can never change their order. In fact, a -> b patches for a != b
can’t realistically commute with any a -> a patch. This should drastically
reduce the exponential number of commute rules we’d otherwise need to deal
with, and should make the primitive commute function much more modular. In
fact, only a -> a patches for same a become involved in the exponential commute
definition blowup. This should be manageable.
Moreover, if we impose a map from patches to the object they affect, we can
also trivially commute patches that affect different objects. We will need to
generalise this later, however, since some patch types may involve multiple
objects (even though our type system can’t handle that yet, either), or even
involve a list of objects variable under commute.
Patch application needs to obey the type restrictions of course.
We will possibly want to attach a UUID to each primitive patch as well, so it
can be readily identified. Of course, this increases the space overhead, but
presumably not exceedingly so.
Hunks
The basic patch type is the hunk: the representation may be identical for both
binary and text objects. What is not the same is how binary and text hunks are
obtained. For text objects, we should use a whitespace-sensitive diffing
algorithm (line diff, most likely; either the one we already have in darcs, or
alternatively patience diff). For binary objects, we can use one of the binary
delta algorithms. It may be prudent to disallow commute for binary hunks, too.
The format is still not defined, although there is a “first shot” at
http://web.mornfall.net/blog/patch_formats.html. But in the end, we probably
want a somewhat different format anyway, or an additional hunk type, because we
apparently want both removal and addition to happen in a single Prim, for
commute to make more sense. So the format could instead look like:
hunk 123 "old text" "new text"
or
hunk 123 old_hash new_hash
with the quoted-string version being a text-escaped, “inline” variant of the
patch to be used when length of old_text + new_text is less than two
times the length of a hash. Even though patches of this textual form are not of
constant byte width, that doesn’t matter since other patch types cannot be made
to be that way either (like replace, or manifest). Any “substantial” text that
is the actual problematic part to parse is moved away by indirecting it through
hashes.
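The choice between the two variants could be sketched as follows. This is a guess at one possible rendering, not a defined format: `hashLength` (64, as for a hex-encoded SHA-256) is an assumption, the hash function is passed in as a parameter rather than fixed, and Haskell's `show` stands in for whatever text escaping the real format would use.

```haskell
type Hash = String

-- Assumed width of a hex-encoded hash; the post does not fix a hash function.
hashLength :: Int
hashLength = 64

-- Render a hunk at the given position, inlining small content and
-- indirecting large content through hashes.
renderHunk :: (String -> Hash) -> Int -> String -> String -> String
renderHunk hashOf pos old new
  | length old + length new < 2 * hashLength =
      -- inline variant: `show` yields a quoted, escaped string
      unwords ["hunk", show pos, show old, show new]
  | otherwise =
      -- detached variant: only the hashes appear in the patch
      unwords ["hunk", show pos, hashOf old, hashOf new]
```

With any hash function, `renderHunk h 123 "old text" "new text"` produces the inline form shown above, since the combined length is well below two hash widths.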
Multi-object patches
Until now, we restricted ourselves to patches that affect a single object. This
may be genuinely impractical for patches that move around things, be it
complete files (move) or pieces of content (hunk move). We want such patches to
commute as a single unit, either commuting completely or not at all. This could
be achieved differently, by adding a concept of atomic patch group. I am not
entirely sure if that is right or not, but it currently seems like the more
complicated option.
Therefore, we can go on adding multi-object patches. Presumably, the correct
type would be (a, b) -> (c, d). Most commonly of course (i.e. in the two
abovementioned cases), this would end up being (a, a) -> (a, a).
Generic commute rules
Let’s assume a function
touches :: Prim -> [UUID]
we can say that
commute (a :> b) | null (touches a `intersect` touches b) = (b :> a)
Now we can also add the type restrictions. We demand that for each touched
object, all the types in both patches match for the commute to be allowed.
commute (a :> b) | not (a `typematch` b) = fail
Where
typematch a b = all match (touches a `union` touches b)
where match x | type a x /= type b x = False
| fst (type a x) /= snd (type a x) = False
| fst (type b x) /= snd (type b x) = False
| otherwise = True
With type :: Patch -> UUID -> (ObjectType, ObjectType).
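The fragments above are pseudocode (for one, `type` is a reserved word in Haskell). A compilable sketch of the same idea might look like the following. The `Prim` record, `typeOf` and the `Verdict` type are invented for illustration; checking only the objects touched by both patches, rather than the union, is my reading of the rule, since a patch has no type on an object it does not touch.

```haskell
import Data.List (intersect)

type UUID = String  -- stand-in for a real UUID type

data ObjectType = Binary | Text | Directory
  deriving (Eq, Show)

-- A toy prim: a label plus, for every object it touches, the
-- (source, target) type it has on that object.
data Prim = Prim
  { primName  :: String
  , primTypes :: [(UUID, (ObjectType, ObjectType))]
  } deriving (Eq, Show)

touches :: Prim -> [UUID]
touches = map fst . primTypes

typeOf :: Prim -> UUID -> Maybe (ObjectType, ObjectType)
typeOf p u = lookup u (primTypes p)

-- Both patches must have the same a -> a type on every shared object.
typematch :: Prim -> Prim -> Bool
typematch a b = all match (touches a `intersect` touches b)
  where
    match x = case (typeOf a x, typeOf b x) of
      (Just ta, Just tb) -> ta == tb && fst ta == snd ta
      _                  -> False

-- Outcome of the generic rules: swapped, refused outright, or left to
-- the type-specific commute rules.
data Verdict = Commuted Prim Prim | Refused | Specific
  deriving (Eq, Show)

genericCommute :: Prim -> Prim -> Verdict
genericCommute a b
  | null (touches a `intersect` touches b) = Commuted b a
  | not (a `typematch` b)                  = Refused
  | otherwise                              = Specific
```

Only the `Specific` case needs per-type commute code, which is where the reduction in the number of hand-written rules would come from.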
A few more restrictions will need to be applied in case we need a patch type
that may operate on object collections such as a “subtree”, whose composition
can change in time. I have a rather vague idea that some kind of general
mechanism could be used for designating such an object collection, which
could then be built into the framework, making it also possible to
define that kind of patches without introducing exponentially big commute
definitions.
Adding new patch/object types
With the generic commute rules, it becomes possible (and easy) to add new
object types and corresponding patch types to the system, without ending up in
an exponential tangle. One such object type could be “haskell” (holding a
representation of a Haskell AST), or “bug” (for in-repo bug tracker, ala bugs
everywhere). Another useful object type could be “set”, keeping a sorted set of
lines, or a “changelog”, keeping a timestamped list of textual chunks.
Of course, there are other options to achieve a similar effect, but I like how
this “falls out” more or less naturally from the design.
Optimisations
The suggested patch representation allows for some optimisations in the way
patches are stored. This could include per-object buckets, detached hunk
storage (which is more or less implied by the hunk format) or the
like. Per-object buckets are slightly complicated by multi-object patches, but
probably not ruled out. With per-object buckets, UUIDs based on hashing and
minimal context for prims may be feasible, granting a number of desirable
security and convenience properties to the system as a whole.
There’s a couple of issues that have been raised about the proposed design so
far. I think the major one is that merge behaviour might be somewhat surprising
in some respects, especially if root and preference object UUIDs are picked
fresh, instead of being hardcoded: it is not clear how to manage these
cases. On the repository level, any two repositories with disjoint sets of live
UUIDs merge cleanly. However, they don’t necessarily merge cleanly on the
working directory level.
The most counter-intuitive case seems to be when you initialise an empty
repository and then immediately pull from somewhere: you get a conflict here. A
possible solution: darcs init should create a repo with no objects. Any and
all objects come into existence through patches. And once you have two patches
that add a root object, you get a reasonable conflict that can be reasonably
resolved. (The preferred solution probably entailing recursive merge of the two
trees, with conflicting leaves being renamed out of way; this would constitute
“conflict markup” and would need to be explicitly recorded, of course, like
with other conflicts… another desirable solution may be renaming one of the
roots and making it a child of the other, confirming the other as the
repository root.)
About preferences, that’s a bit more tricky. The tradeoff is slightly
different, since pivoting repository roots sanely is probably significantly
more valuable than it is for preferences. More discussion is probably needed on
how to exactly arrange this, but the exact details are not important for the
project. Most of the available options can be implemented relatively painlessly
on top of the proposed infrastructure.
So in the above, I have outlined the design of what I intend to implement. I
think I am now reasonably happy with the abstract level of design. There is of
course a number of implementation challenges to be solved.
Nevertheless, since it is really getting late now, I will keep the “battle
plan” for another post. I intend to spell out roughly what and how I intend to
do on the implementation level, and break this down into a couple of phases,
time-wise. A very quick summary for now…
The core of the implementation work will happen directly in the darcs
repository, since each “version” of primitive patches comes with a separate
module hierarchy (even if there is only V1 so far). I can work in
Darcs.Patch.Prim.V3 more or less without disturbing other work or the release
process. I expect that coverage of the newly written code will come mostly from
QC and unit tests, which will likely live in the darcs source tree as well.
An external, experimental “client” may be an option, depending on how things
go. It would probably implement a simple repository format without any of the
higher level patch types (i.e. no conflict handling etc.), mostly for
demonstration purposes.
A semi-independent vein of the implementation work would entail pushing
lazy IO out of the patch code in darcs per se. This is tied in with decoupling
the hunk content (the text or binary blobs) from the patches themselves in
memory. Even though it may well be possible to achieve using IO interleaving,
an explicit approach will be more transparent, presumably much easier to reason
about and, ultimately, debug, than the current lazy IO code.
May 20, 2011
Owen Stephens
This week I've been participating in some productive revision-procrastination, by starting work on my GSoC project - Darcs Bridge; I still have university exams until next week, so wanted to make sure I lost no GSoC time (and besides, Haskell hacking is much more fun than revising Elliptic curves!).
- First up, I created a darcsden repository, to host my work-in-progress over the summer, here.
- To kick things off code-wise, I wanted to make sure I could build darcs-fastconvert, mornfall's existing method of converting to/from darcs/git, using the "fast-import" de facto standard. This meant updating the code, to build against the latest dev-version of Darcs - 2.7.3 and GHC 7.
- I found, and fixed a bug in Darcs itself, which manifested itself when attempting to add non-existent files within a newly added folder within a repo.
Ensuring darcs-fastconvert compiled was a great first challenge for my project; fixing the new typing requirements forced me to understand how many of Darcs' low-level concepts were implemented, particularly how types are used to represent the contexts of a patch.
Earlier today, my mentor Ganesh and I had an online chat about ideas for the project:
- We want conversion between darcs<->git to be as seamless as possible - a "sync" command.
- Darcs-fastconvert can do this to some extent, but requires manually managing "marks" files (for git and darcs), to persist the state of the conversion. These are somewhat fragile and prone to errors (and also make the export/import command invocations more noisy than they should be).
- We can "shell-out" to git, rather than relying on the user to manage the git side of the conversion. This also allows us to easily get hold of data such as the git commit ids (something that is not exported in the fast-import data-format.) - this'll allow us hopefully to better manage multi-head repos.
- We can assume that the bridge will be responsible for "managing" the darcs and git repositories; particularly, we envisage a "bridge lock" that will allow us to ensure that users cannot commit to the darcs and git repositories simultaneously - the pre-commit hooks in both git and darcs shall fail, if this lock is active. (We initially thought we could use the separate darcs and git locks, but this could well lead to lock ordering or race problems.)
- We can use git "hooks" to ensure that a darcs and git repository are in sync, before new commits can be pushed to the git repo.
That's it for now, but I'll make sure to keep this blog updated, as my work progresses.
May 08, 2011
Darcs News
darcs weekly news #87
May 08, 2011 09:35 PM UTC
News and discussions
- Darcs received funding for two Google Summer of Code students, Owen Stephens and Petr Rockai:
- Andrew Pennebaker built and put online Windows installers for Darcs 2.5.2:
- We are looking for a new issue manager:
- And here is the report of last month's darcs sprint (Paris) in case you missed it:
Issues resolved in the last week (5)
- issue1640 Radoslav Dorcik
- issue1661 Scott Lawrence
- issue1665 Chris Trompette
- issue1804 Eric Kow
- issue2052 Owen Stephens
Patches applied in the last week (27)
See
darcs wiki entry for details.
April 26, 2011
Owen Stephens
GSoC project accepted!
April 26, 2011 11:29 AM UTC
Yesterday I received confirmation that my Darcs GSoC project proposal has been accepted; this summer I'll be working on creating/improving a "bridge" between Darcs and other VCSs, such as Git (for more information, see the Darcs wiki page of my project).
This blog will host my weekly GSoC updates, when the coding period starts in late May.
Thanks to all in the Darcs team who helped me flesh out my proposal (special mentions to Eric, Ganesh and Jason - thanks guys!), I'm very much looking forward to my summer!
April 19, 2011
Eric Kow
In the Darcs community, we've been discussing the recent blog posts saying that Git is inconsistent, that it cannot be made to be consistent.
With Darcs being the foil to Git for the purposes of this discussion, I thought it would be useful if I cleared up a few points, particularly this first one:
consistency is a usability issue
When people say they like Darcs, they don't generally talk about it having a beautiful or elegant theory. Instead, they talk about how easy and simple it is to use, about how they never really had to grapple with a learning curve or feel stupid for doing something wrong.
What makes Darcs so simple to use? Did it hit the right notes by accident or through David Roundy's good taste? Or is usability merely in the eye of the beholder? Some of these explanations may be true, but I think what lies at the heart of Darcs' usability is that it supports a very simple way of understanding a repository:
a darcs repository is a set of patches
This mental model may not be suitable for everybody, and in the long run Darcs may need to improve its support for history tracking. But if you want to understand why, for all its current shortcomings, people continue to use and develop Darcs, you must appreciate how refreshingly simple the set-of-patches mental model can be. As a Darcs user you are freed from a lot of the artefacts of worrying about commit order. Collaborating with people is just a question of shuffling patches around, with no merge states, no rebases, way fewer spurious dependencies to worry about.
But simplicity is hard. In order to make this simple world view possible, Darcs has to guarantee the property that any ordering of patches allowed by Darcs' commutation rules is equivalent. If Darcs gives you the option of skipping a patch, it has to work hard to make sure that if you include the patch later on, the repository you get is equivalent. That's what the patch theory fuss is about. While it's useful that Darcs tends to attract purists and math geeks, we're really not engaged in the pursuit of some sort of ivory tower theoretical elegance for its own sake. Ultimately what we're after is usability.
A good user interface minimises work for the user, be it cognitive, memory or physical work. The joy of Darcs is being able to focus cognitive work on our real jobs, and not on babysitting version control systems. So when Russell O'Connor says that merges ought to be associative, he's not saying this to tick some sort of mathematical box; what I think he's really saying is that, as a Darcs user, he doesn't want to worry about the difference between pushing patches one at a time vs all in one go. Consistency is a usability issue.
darcs is imperfect
Darcs is very much a work in progress. Some users have felt let down by Darcs: whenever performance grew to be unacceptable for their repositories, when they hit one exponential merge too many, or when Darcs just plain did something wrong. Even our much vaunted usability has cracks at the edges, a confirmation prompt too many, an inconsistent flag set, a non-reversible operation or two.
I particularly want to make sure I'm very clear about this point:
darcs patch theory is incomplete
We still don't know how to cope with complicated conflicts. Moreover the implementation of our first two theories is somewhat buggy. Darcs copes well enough with most everyday conflicts, but if a conflict gets hairy enough, Darcs will crash and emit a nasty message. This is one of the reasons why we don't recommend Darcs for large repositories.
Our version of "don't do that" is not to maintain long term feature branches without merging back to the trunk on a regular basis. This is not acceptable for bigger projects, but for smaller projects like Darcs itself, the trade-off between a simple user interface in the general case, and the occasional hairy conflict can be worth it. In the long run, we have to fix this. We are revising our patch theory again, this time taking a much more rigorous and systematic approach to the problem.
In the interim, we will be gaining some powerful new tools to help work around the problem, namely a new "darcs rebase" feature that will allow users to smooth away conflicts rather than letting them get out of hand. This will be a crucial bridging tool while we continue to attack the patch theory problem.
patch theory is simple at heart
I am in the awkward position of being a non-expert maintainer, having to defer a lot of thinking about software engineering and patch theory to the rest of the Darcs team. In a way, this is healthy for Darcs, because we have long suffered from an excess concentration of expertise.
Inverting the pie so that you basically have the number one Darcs Fan as the maintainer is useful because it forces everybody else to break things down into words an Eric can understand.
The good news is that basic patch theory is one of these things an Eric can understand: patches have inverses and may sometimes be commuted. Just learning the core theory teaches you how merging and cherry picking works, why you can trust the set-of-patches abstraction and most importantly, how simple Darcs is. So we're not after some kind of magical AI here, nor are we trying to guess user intention. The things we do with patches are much more mechanical, systematically adjusting patches to context, one at a time, click-clack on the abacus until the merge is complete.
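For readers who like to see the two core operations spelled out, here is a deliberately tiny model of inverses and commutation on line-based hunks. It is a sketch only: the `Hunk` type and the conservative rule of commuting only hunks that do not touch are simplifications made up for this example, not the rules Darcs actually implements.

```haskell
type File = [String]

-- At 1-based line n, remove the `old` lines and insert the `new` lines.
data Hunk = Hunk Int [String] [String]
  deriving (Eq, Show)

-- Every patch has an inverse: swap what is removed and what is added.
invert :: Hunk -> Hunk
invert (Hunk n old new) = Hunk n new old

-- Applying a hunk fails if the file does not contain the expected lines.
apply :: Hunk -> File -> Maybe File
apply (Hunk n old new) f
  | take (length old) rest == old =
      Just (before ++ new ++ drop (length old) rest)
  | otherwise = Nothing
  where (before, rest) = splitAt (n - 1) f

-- Commute "p then q" into "q' then p'", adjusting line numbers to the
-- new context. Hunks that touch or overlap are refused (conservatively).
commute :: (Hunk, Hunk) -> Maybe (Hunk, Hunk)
commute (Hunk n1 o1 w1, Hunk n2 o2 w2)
  | n1 + length w1 < n2 =
      Just (Hunk (n2 - length w1 + length o1) o2 w2, Hunk n1 o1 w1)
  | n2 + length o2 < n1 =
      Just (Hunk n2 o2 w2, Hunk (n1 - length o2 + length w2) o1 w1)
  | otherwise = Nothing
```

Applying a hunk and then its inverse gives back the original file, and a successfully commuted pair produces the same file as the original pair, which is the kind of mechanical, context-adjusting bookkeeping described above.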
patch vs snapshot is not so important
We think it's important to continue working on Darcs because we are exploring territory that no other version control system is looking at - patch-based version control. That said, patches and snapshots are duals of each other. We think that things that Darcs can do are possible in snapshot based version control and we would be very interested to see work in that direction.
The secret to Darcs merging is that it replaces guesswork (fuzz factor) with history. A darcs patch only exists in the context of its predecessors, and if we want to apply a patch to a different context, we mechanically transform the patch to fit. We think this sort of history-aware merging could be implemented in Git. In fact, we would be excited to see somebody taking up the challenge. Git fans! How about stealing history-aware merging from us?
exponential merges still exist but there are fewer of them
We have developed two versions of patch theory. The second version avoids a lot of the common causes of exponential merge blowups, but it is still possible to trigger them. Recent Darcs repositories are created using version 2 of the theory. For compatibility's sake, repositories created before Darcs 2 came along tend to still be using version 1 of the theory (we only recommend converting if conflicts become a problem).
The most well-known remaining cause of blowups in theory 2 is the problem of "conflict fights" where one side of the conflict resolves the conflict and gets on with their life without propagating the resolution back to the other side. What tends to happen there is that we not only encounter the conflict again in the future, but we also conflict with the resolution!
So life is definitely better with Darcs 2. We've given the exponential merge problem a good knock on the head, but it's still staggering around and we're working our way to the finishing blow.
performance is improving
I think that when people complain about Darcs being slow, they're not talking about the exponential merge problem. They're mostly referring to day-to-day issues like the time it takes to check out a repository. Our recent focus has been to solve a lot of these pedestrian performance issues. For example, the upcoming Darcs 2.8 is likely to use a new "packs" feature which makes it possible to fetch a repository in the form of two larger tarballs rather than thousands of little patch files. This makes a big difference!
Another improvement we hope to bring to Darcs 2.8 is the performance of the darcs annotate command (cf. git blame). Annotate has been neglected for a while, and to make things better, we've basically reimplemented the command from scratch with more readable output to boot. As an example of something fixed along the way, one misfeature of the old annotate is that it would work by applying all the patches relevant to a given file, building it up from the very beginning. But if you think about it, annotating a file is really about annotating its current state; we don't care about ancient history! So one of the Darcs hackers had the sort of idea that’s obvious in hindsight: rather than applying patches forwards from the beginning of history, we simply unapply them from the end. Much faster.
We're not yet trying to compete with Git when working on these performance issues. We admire the performance that Git can deliver and we agree that getting speed right is a usability issue (too slow and your user loses their train of thought). But we've been picking a lot of low hanging fruit lately, solving problems that make Darcs faster with very little cost. We hope you'll like the results!
April 18, 2011
Darcs News
The sixth Darcs Hacking Sprint took place on 1-3 April in Paris. As usual we had a lot of fun getting together, thinking and talking about Darcs for 3 days, meeting new developers and seeing old friends.
Preparing Darcs 2.8
Darcs 2.8 is fast approaching!
The main feature we're pushing for is a new "packs" optimisation, which rolls a repo’s pristine and patch files into single files, making darcs get over HTTP significantly faster. The packs optimisation work was done by Alexey Levan, one of our 2010 Google Summer of Code students (and now our Windows packager). Guillaume spent some of the sprint working on a lot of the finishing touches needed to get packs into our users' hands: enabling them by default, writing new tests to ensure that Darcs attempts to use them, and reducing their size. Guillaume and Jérémie (who joined us as he was a local Darcs user and just wanted to give back, merci!) also worked together to get a couple of benchmarks:
| Jérémie's repository | ~900 patches | 10s | 1s |
| darcs screened (full) | ~9300 patches | 37m | 2m |
| darcs screened (lazy) | ~9300 patches | 27s | 7s |
The timings and a couple of other numbers are now available on the darcs wiki optimize --http page.
Future of Darcs
If Darcs didn't exist, it would be worth writing
Ganesh gave an excellent talk on the Future of Darcs, shaking us out of some recently infectious gloom (it was even getting to me!). He gave us a much-needed reminder of what we're fighting for, how to keep it going, and what he's been contributing to the fight.
- Darcs is important because it's fundamentally different. Our patch-based view is still unique and brings a lot of novel thinking to the version control table. We need to work on Darcs because nobody else is doing something like it. If Darcs itself didn't exist, we'd want to create it.
- Patch-based version control is the secret ingredient to Darcs' very easy user interface. The message we hear most often from Darcs fans is not about theory, but about how simple and easy it is to use Darcs. Our users don't care about patch theory per se, but it's because we work from patch theory that our user interface can be both gentle and powerful.
- Darcs provides a path to the future of version control. Ever wished your version control system thought in terms of abstract syntax trees instead of lines of text? We can't do that yet, but because we understand patches in a way nobody else does, we have a good idea for how to get there.
We use Darcs because we love the UI; we hack Darcs because we love the theory. But then what?
First, we still need to catch up; we're making a lot of progress overcoming performance issues, and usability issues such as our rather appalling conflict marking. Second, we need to solve the interoperability problem. Being a research-oriented version control system with a lot of practical catch-up to do puts us in the minority. We need to make it cheaper and cheaper for people who love Darcs to keep using it, when all their friends are using something else; and we need to make it risk free for somebody to give it a spin and see how they feel about it. Finally, we need to start moving towards that future, start tackling some of the really cool features we have in mind for Darcs 3.
So what has Ganesh been up to? Lots! When not taking care of a new baby, he's been implementing new features like rebase, making conflict marking easier to use (with patch names!) and exploring a potential "graphictors" conflict representation (because every Darcs hacker needs a conflict representation of his own). More below!
Rebase Design
[Photo: Thomas, Eric and Ganesh]
Darcs rebase is a new feature which may be available in Darcs 2.8 on an experimental basis.
One of the great things about Darcs is that we generally do not need rebase; the combination of a patch theory and a friendly interactive UI makes many complicated rebase/cherry-pick use cases effortless for Darcs users. Theory and UI get us very far, but sometimes that's not enough. Having a rebase operation would make it possible to do things like smoothing away unwanted conflicts, amending depended-upon patches (including mistakes like that 1 GiB file you added one year ago), and consolidating multiple patches into one.
Ganesh gave us a tour of the changes and cleanups to Darcs code needed to make Darcs rebase work. He also brought up a tricky implementation issue about how the rebase "suspended" patches will interact with darcs amend-record, which we pored over together, whiteboard markers in hand (this is why we need sprints!). Owen captured some of this discussion, so we're hoping to have some nice diagrams explaining the issue in more detail.
Bridge - from Darcs to Git and everything else
Git is a great choice of version control system for many users and projects. Great hosting sites like GitHub, code review tools like Gerrit and graphical interfaces like GitTower add a lot of value to the Git universe and add testimony to the self-reinforcing power of network effects. So what kind of role does Darcs play in an increasingly Git-dominated world? We debated the issue for a while and eventually agreed on a single goal which is to work towards best-effort interoperability between Darcs and other version control systems, but being cautious to avoid tying ourselves tightly to any particular system.
[Photo: Owen on Day 1]
What we envisioned was an advanced bridge based on darcs-fastconvert that would allow us to maintain an incremental bidirectional mapping between repositories in Darcs and another version control system. We want it to be easy, for example to contribute to a GitHub hosted project as a Darcs user, or conversely to maintain a Darcs project, but make it easy for Git-using contributors to submit patches and track your upstream work. One exciting feature of the bridge is that it would allow us to smooth the progression from Darcs 1 to Darcs 2 repositories and eventually to Darcs 3.
Owen and Ganesh worked on identifying the technical challenges behind creating such a bridge and fleshing it out into a Google Summer of Code proposal:
- Creating a mapping of multi-head repos to Darcs repositories.
- Import/Export of foreign patch formats.
- Efficiently mapping between patch-based and snapshot-based models.
- Robust translation between Darcs versions.
- Mapping from “Darcs only” patch-types e.g. replace.
Check out Owen's proposal for more details. Looks like we may have a fun summer ahead of us!
Review backlog
Getting together with a video projector was a good way to catch up on some patch review. Over the three days, Ganesh and Guillaume cleaned out the patch tracker, reviewing patches (6 accepted, 2 rejected, 1 follow-up requested). Thanks, guys!
Development workflow
[Photo: So if "Adam" submits a patch to the tracker...]
Speaking of the patch tracker, there's been lots of experimentation lately, namely, a "screened" branch and a more relaxed review policy. The overall aim is to reduce administrative overhead: the screened branch reduces the need for developers to maintain long term branches, and the relaxed review policy allows us to keep patches flowing, so that minor patches that don’t actually need any review are not jammed up by bureaucratic process. But change is messy! We now have users who aren't entirely sure how to send patches to Darcs or when they can or should amend their patches.
We spent some time thinking about how to achieve our goal of reforming the review process to accommodate the needs of patch submitters and reviewers, while keeping the overall process simple and lightweight. Our conclusion was to:
- preserve the screened/reviewed branch distinction and start explicitly calling the reviewed branch that
- simplify the general flow of patches to a linear screened, reviewed, release sequence (except the usual release-specific backports)
- eventually point http://darcs.net to the screened branch
- shift definitively from an amend-oriented culture to a follow-up oriented one. Once a patch has been accepted to screened, it can no longer be amended
- simplify release process by removing the notion of a soft freeze
Eric, Ganesh and Guillaume also spent some time on the developer documentation. We clarified which repository to send patches to (screened), when you can amend patches (only until they've been screened), and how to propose/add a new feature to Darcs (carefully). We also briefly discussed the high level developer documentation, concluding that we want to move towards it reading more like a book.
Healing paper cuts
Owen and Iago worked on improving Darcs' user interface in some corner cases. Owen studied the infamous "thisrepo" problem. The thisrepo cache entry is a piece of local-filesystem tracking information that causes more recent versions of Darcs to generate an annoying warning when a repository is moved to a new location. Having established that it was safe to do so, he got Darcs to stop producing the entry and to ignore it when it is present. This keeps the warning system useful and relevant, by removing false alarms.
[Photo: Guillaume and Iago]
Iago started out by fixing a deceptively easy bug where darcs amend-record would let you add primitive patches to a tag patch. Darcs tags are implemented as empty named patches that depend on other patches; in theory tags could also contain patches of their own, but this use case is not supported in the implementation. Removing the ability to sneak these patches in avoids potential bugs and usability issues that may arise from it. Iago's work on this bug opened up into a host of related patch name parsing corner cases. Iago spent some time analysing the different possible ways to specify the name of a patch, and how they behave with respect to empty names and names starting with 'TAG'. It took quite a bit of reading of the Darcs.Commands.Record code to figure out how he could effectively fix these problems with minor effort.
An interesting recurring theme came up talking about these issues with Owen, Iago and Ganesh, which is that changing Darcs behaviour can sometimes be a delicate affair because we have to take into account (a) repositories produced by older versions of Darcs and (b) how older versions of Darcs will react to repositories produced by more recent ones. It's the sort of issue that makes Darcs a great place for anybody who wants to confront "real world" software issues, where getting things right includes, and goes beyond, simply nailing down the theory.
Garbage collecting pristine
The darcs hashed repository format (one idea we've stolen from Git!) makes for much better robustness, allows for performance improvements such as the global cache and lazy fetching, and opens the door to future work on verifiability (short secure version identifiers). After a lot of performance work on hashed repositories, we've reached a point where we're ready to deprecate the old-fashioned format and get everybody updated to hashed repositories.
[Photo: Florent formulates a plan]
But there's a lingering performance issue in the back of our minds. To avoid race conditions we have disabled automatic garbage collection of unused hashed repository entries. Repository owners would have to remember to run an occasional darcs optimize to delete these files. Is there a better way?
Guillaume led a discussion which led to a suggestion by Florent: keep track of pristine root hash timestamps and delete files older than 24 hours. Getting a repository should not take that long, so it should be safe to delete older files!
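As a rough illustration of the age-based rule, here is a Python sketch. It is not how Darcs implements anything: the `collect_garbage` function, the `live` set of file names still referenced by the current pristine root, and the use of file modification times as the timestamp are all assumptions made for this example.

```python
import os
import tempfile
import time
from typing import Iterable, List

DAY_SECONDS = 24 * 60 * 60


def collect_garbage(pristine_dir: str, live: Iterable[str], now: float,
                    max_age: float = DAY_SECONDS) -> List[str]:
    """Delete files not in `live` whose last modification is older than `max_age` seconds."""
    keep = set(live)
    deleted = []
    for name in sorted(os.listdir(pristine_dir)):
        path = os.path.join(pristine_dir, name)
        if name in keep or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) > max_age:
            os.remove(path)
            deleted.append(name)
    return deleted


# Demo in a throwaway directory with made-up file names and ages.
with tempfile.TemporaryDirectory() as d:
    now = time.time()
    for name, age in [("live-old", 3 * DAY_SECONDS),
                      ("dead-old", 2 * DAY_SECONDS),
                      ("dead-new", 60)]:
        path = os.path.join(d, name)
        open(path, "w").close()
        os.utime(path, (now - age, now - age))
    print(collect_garbage(d, live={"live-old"}, now=now))
    print(sorted(os.listdir(d)))
```

Only `dead-old` is removed here. The unreferenced but recent `dead-new` survives, which is what protects a slow `get` that is still reading files another process has just made obsolete.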
Darcs Testing and Code Quality
Iago gave a couple of nice presentations on work he has been doing for his MSc, in the context of an MFES curricular unit on formal methods: verifying darcs patch theory properties with the Alloy model checker, and assessing/improving the maintainability of Darcs code. He showed us some rather interesting examples of things that Darcs code does which make testing hard (really long functions seem to be a killer) and discussed improvements he made to our QuickCheck generators. It was great to see him break down our randomly generated tests into categories with varying degrees of meaningfulness and how a few simple tweaks to our generators could make the Darcs tests a lot more useful and informative. Iago will post the slides and his sample Alloy code when he has obtained his qualifications.
Darcs needs to work harder on code quality and testing, but what are you going to do with a handful of hobby-hacking hackers doing their best with their free time? Iago suggests candidates for the two lowest hanging fruit to pick: (a) cutting down our function sizes (check out urlThread in our URL module!) and (b) developing a sort of standard test suite template to be used for each Darcs module. He later spent a bit of time in the airport fleshing this out with the Darcs.Util module as an example.
Experience reports
We had two new participants at the sprint, Owen and Iago. Let's hear it from them. Owen said:
The sprint was a great introduction to the Darcs team, having face-to-face discussions (and being able to quickly answer my beginner-questions) really helped bring me up to speed with not only the code-base but also the underlying concepts and ideas. My primary motivation for coming was to get to know the team and code-base, so that I could make an effective GSoC project proposal, and the sprint certainly helped in that respect.
Since I had only had limited experience with the code-base prior to the sprint, I didn't actually do much coding (I was primarily focusing on my GSoC ideas), however, at future sprints, I'll definitely be able to code more, having "learnt my way around".
I came away feeling very positive about Darcs, with a strong feeling of wanting to contribute, to bring Darcs to the top of the VCS world. It is obvious to me that people do care about Darcs!
I think attending a Darcs sprint is a great chance to have a fun weekend coding and talking about a project that is very much alive, with a few beers and a kebab afterwards! :)
And Iago:
It was a very nice experience. Personally, I though the Sprint will consist on three days of almost full-time work from 9am to 6pm; now I see that the hard work is done during the time between Sprints, whilst Sprints are just time for discussing some project-management stuff, interesting work in progress, ideas, suggestions, etc. Although I have started contributing with Darcs few months before the Sprint, I think that it is a very good opportunity to meet Darcs people and start to contribute with Darcs.
Thanks guys! It was great having you. I hope you can join us for future sprints.
Presentations
- Ganesh on Future of Darcs
- Iago on Model Checking Darcs
- Iago on Darcs Code Quality
Participants
[Photo: It's on timer! (Iago, Guillaume, Owen, Ganesh and Eric)]
- Eric Kow
- Florent Becker
- Ganesh Sittampalam
- Guillaume Hoffmann
- Iago Abal
- Owen Stephens
Short visits
- Jérémie
- Thomas Refis
- Nicolas Pouillard
Thank-you!
We had a great time in Paris. Many thanks to the Initiative de Recherche et Innovation sur le Logiciel Libre (IRILL) for making such nice facilities available to us with so little fuss!
Thanks also to our donors for making sprints accessible to a wide public, and to the Software Freedom Conservancy for taking on the nitty gritty administrative detail that makes it possible for us to focus on the core mission of making Darcs rock.
Merci à tous!
April 09, 2011
Darcs News
darcs weekly news #86
April 09, 2011 03:04 PM UTC
News and discussions
- Radoslav Dorcik provides a web service to visualize darcs repositories accessible via HTTP:
- Several users talked about how they use darcs revert/unrevert as an equivalent to git stash/pop:
Issues resolved in the last week (4)
- issue1344 Gabriel Kerneis
- issue2012 Gabriel Kerneis
- issue2013 Gabriel Kerneis
- issue2049 Ganesh Sittampalam
Patches applied in the last week (53)
See
darcs wiki entry for details.
April 05, 2011
Owen Stephens
GSoC project proposed
April 05, 2011 01:46 PM UTC
I've just submitted my project proposal to Google, to work on Darcs this summer.
Summer of Code is a Google project, paying students to hack on an open source project for the summer.
For details of what I've proposed, see the Darcs wiki entry I've created.
Now, let the waiting begin!
March 18, 2011
Darcs News
darcs weekly news #85
March 18, 2011 12:05 PM UTC
News and discussions
- Darcs 2.5.2 was released:
- Version 0.3.3 of the Jenkins-Darcs plugin was released:
Issues resolved in the last week (0)
Patches applied in the last week (8)
See
darcs wiki entry for details.
Jason Dagit
I'm not really sure what motivated this, but I just used cloc to count the lines of code in both the darcs source and the git source. Here are the numbers. The git source tree:
1951 text files.
1836 unique files.
848 files ignored.
http://cloc.sourceforge.net v 1.51 T=15.0 s (72.3 files/s, 20377.3 lines/s)
--------------------------------------------------------------------------------
Language                         files         blank       comment          code
--------------------------------------------------------------------------------
C                                  267         15517         13469        100133
Bourne Shell                       589         15127          5508         84826
Perl                                40          3798          3441         23825
Tcl/Tk                              39          1453           375          9762
C/C++ Header                        99          1977          3557          8301
make                                12           413           434          2673
Bourne Again Shell                   1           144           110          2165
Lisp                                 2           231           170          1779
Python                              13           465           442          1384
ASP.Net                              8           141             0           931
m4                                   2            87            21           858
CSS                                  2           154            24           710
Javascript                           2           113           319           477
Assembly                             1            26           100            98
XSLT                                 7            15            29            77
DOS Batch                            1             0             0             1
--------------------------------------------------------------------------------
SUM:                              1085         39661         27999        238000
--------------------------------------------------------------------------------
The darcs source tree:
561 text files.
549 unique files.
57 files ignored.
http://cloc.sourceforge.net v 1.51 T=189.0 s (2.7 files/s, 298.0 lines/s)
--------------------------------------------------------------------------------
Language                         files         blank       comment          code
--------------------------------------------------------------------------------
Haskell                            169          4361          7374         27760
Bourne Shell                       300          2071          2869          8333
C                                    7           325           153          1494
HTML                                 5            41             4           316
C/C++ Header                        12            92            83           308
Bourne Again Shell                   3            51            95           180
Perl                                 2            43            36           130
CSS                                  1            21             3            79
make                                 1            12             6            53
Lisp                                 1             5             6            23
--------------------------------------------------------------------------------
SUM:                               501          7022         10629         38676
--------------------------------------------------------------------------------
Take those categories with a grain of salt. For example, the darcs source does not have any lisp files. It is interesting that git has 200k more lines than darcs. I'm not sure what that says. C is far more verbose than Haskell? Although, that's not really fair because they also have an order of magnitude more shell code. If you're just comparing C to Haskell it's a factor of about 4.
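For anyone who wants to check the arithmetic, the comparisons above can be recomputed from the cloc "code" column. This is only a small Python sketch; the variable names are made up for the example and the numbers are copied from the two tables.

```python
# "code" column values copied from the two cloc reports above.
git_total, darcs_total = 238000, 38676   # SUM rows
git_c, darcs_haskell = 100133, 27760     # C (git) vs Haskell (darcs)
git_sh, darcs_sh = 84826, 8333           # Bourne Shell rows

difference = git_total - darcs_total
c_to_haskell = git_c / darcs_haskell
shell_ratio = git_sh / darcs_sh

print(difference)
print(round(c_to_haskell, 1))
print(round(shell_ratio, 1))
```

This prints 199324, 3.6 and 10.2: roughly 200k more lines in git, a C to Haskell factor a little under 4, and about ten times as much shell code.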
March 11, 2011
Darcs News
darcs weekly news #84
March 11, 2011 11:51 AM UTC
News and discussions
- The next Darcs Hacking Sprint will take place on April 1st, 2nd and 3rd in Paris:
- A comparison between Wave and Darcs was posted:
Issues resolved in the last week (0)
Patches applied in the last week (27)
See
darcs wiki entry for details.
February 26, 2011
Darcs News
darcs weekly news #83
February 26, 2011 03:38 PM UTC
News and discussions
- The next Hacking Sprint will take place in Paris soon. Please add yourself to the wiki page if you consider attending:
- Radoslav Dorcik compiled an e-book from Darcs' Wikipedia article and its related pages, and made it available under the epub format
- We discussed our review policy and settled on a more relaxed approach to it:
Issues resolved in the last week (2)
- issue1737 bsrkaditya
- issue2041 Alexey Levan
Patches applied in the last week (38)
See
darcs wiki entry for details.
February 13, 2011
Darcs News
darcs weekly news #82
February 13, 2011 09:03 PM UTC
News and discussions
- Darcs 2.5.1 is the first stable release that can be built with GHC 7:
- A Darcs plugin for the Hudson/Jenkins continuous integration system was released:
Issues resolved in the last week (5)
- issue1350 Gabriel Kerneis
- issue1558 Gabriel Kerneis
- issue2003 Ganesh Sittampalam
- issue2019 Alex Suraci
- issue2035 Eric Kow
Patches applied in the last week (147)
See
darcs wiki entry for details.
February 11, 2011
the Patch-Tag blog
This is mostly for myself, but maybe the googlebot will pick it up and help some others.
Basically, patch-tag encourages https: browsing post log in because, well, it’s the right thing to do. (IMHO, https should be the default option for web browsing, and there is a school of thought about that, but I’m in too much of a hurry to track it down. Comments welcome.)
So, I bought my ssl cert from godaddy to make it possible. And it expired, and I couldn’t remember how to make it work again.
After a bit of mucking around, I chose “renew ssl cert” in godaddy, and paid their pound of flesh. Downloaded a little zipped bundle patch-tag.com.zip from godaddy. Contained 2 .crt files, patch-tag.com.crt and gd_bundle.crt.
To get things using the new cert, I edited
/etc/stunnel/stunnel.pem
leaving the top portion (pk) unchanged, and swapping out the bottom portion (cert) with the contents of patch-tag.com.crt file from godaddy.
I then did /etc/init.d/stunnel4 restart
afaict, good to go.
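For the record, the manual edit above could be scripted along these lines. This is just a Python sketch, assuming the combined file has the private key first and a single certificate block after it, as described; the `swap_certificate` name and the placeholder contents are made up for the example.

```python
CERT_BEGIN = "-----BEGIN CERTIFICATE-----"


def swap_certificate(combined_pem: str, new_cert: str) -> str:
    """Keep everything before the first certificate block, then append `new_cert`."""
    head, marker, _old_cert = combined_pem.partition(CERT_BEGIN)
    if not marker:
        raise ValueError("no certificate block found")
    return head + new_cert.strip() + "\n"


# Placeholder contents only; real files hold base64 data between the markers.
old_pem = (
    "-----BEGIN RSA PRIVATE KEY-----\nKEY\n-----END RSA PRIVATE KEY-----\n"
    "-----BEGIN CERTIFICATE-----\nOLD\n-----END CERTIFICATE-----\n"
)
new_cert = "-----BEGIN CERTIFICATE-----\nNEW\n-----END CERTIFICATE-----\n"
print(swap_certificate(old_pem, new_cert))
```

Keep a backup of the original file before overwriting it, since the private key lives in there too.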
Not sure what that other cert file (gd_bundle.crt) is for.
That’s all folks.
Happy tagging!
PS This page was also helpful for configuring stunnel with a godaddy ssl certificate
February 07, 2011
Darcs News
In the previous post we talked about Darcs 2.8's read-only support for old-fashioned (OF) repositories. This reduced support will help us simplify the code base and focus on new features and optimizations for the Hashed repositories, that are now the recommended way to use Darcs.
New features
Let us review the features that were not in 2.5 and that we are working on now:
- rebase: this is a long-wanted Git feature that may be merged into Darcs 2.8. It is currently maintained on a separate branch. We will be confident to include it in the 2.8 release if we are sure we get the UI right by then. We already had some feedback but if you want to try it please go ahead and tell us what you think.
- repositories without working copies: this is still not accepted in HEAD but we are confident we can have it on time.
Let us now consider an improvement that is already present in Darcs HEAD and is thus guaranteed to appear in 2.8:
packs. Basically this optimization makes getting a repository over HTTP much faster.
What happens with darcs get
Let us come back to the darcs get command and see why fetching Hashed repositories is faster than OF (see also the previous post for more information on OF repositories). A Darcs repository contains (among other things):
- the patches
- a pristine cache, that is the "sum of patches"
Here is what Darcs does when getting an OF repository:
- get patch 0, get patch 1, ..., get patch n
- build the local pristine and working directory trees
Hence, to start hacking on a project hosted like this, you have to wait for a time proportional to the length of the project's history. This is only bearable for small histories. For instance, the Xmonad repository (>1100 patches) can take up to 7 minutes to be retrieved on an ADSL line, while the repository itself is only 5 MBytes big.
On the other hand, when getting a Hashed repository, Darcs does:
1. get pristine file 0, get pristine file 1, ..., get pristine file m
2. get patch n, get patch n-1, ... , get patch 0
3. build local working directory tree
Step 2. can be skipped with the --lazy flag, or can be interrupted by hitting CTRL-C. This is fine, since the new local repo has the address of the remote repo and Darcs can fetch patches on demand. Hence, the wait can be reduced a lot: the download time with --lazy grows only with the size of the working copy. The Xmonad repository has about 30 pristine files, weighing less than 200 KBytes, so getting those would only be a matter of seconds.
By the way, getting a Darcs repo with --lazy is strictly more powerful than doing a checkout of a Subversion repository: even though you don't have patches locally, you have the high-level history of the project (i.e., you can use darcs changes offline). However, looking into the patch contents (darcs changes -v) does require you to access the remote repository.
Darcs 2.8 will contain a new optimization called packs. Running the command darcs optimize --http in a repository will store a pack of the pristine and a pack of the patches inside of it. Packs are basically tar.gz archives. Darcs will detect these packs when doing a get and act as follows:
1. get pristine pack
2. get missing pristine files
3. get patches pack
4. get missing patches
5. build working directory tree
Steps 1. and 3. each consist of transferring a single file via HTTP, which is much faster than transferring many little files. Steps 3. and 4. are skipped if the flag --lazy is used.
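One way to see why this helps is to count HTTP requests. The following Python sketch is a deliberately crude model, not a description of Darcs internals: it assumes one request per file, ignores everything except patches and pristine files, and uses rounded-down Xmonad figures from above (1100 patches, 30 pristine files). The function names are invented for the example.

```python
def requests_old_fashioned(patches: int) -> int:
    """Old-fashioned get: every patch is fetched before anything is built."""
    return patches


def requests_hashed(patches: int, pristine_files: int, lazy: bool = False) -> int:
    """Hashed get: all pristine files, then the patches unless --lazy is used."""
    return pristine_files + (0 if lazy else patches)


def requests_packs(missing_pristine: int = 0, missing_patches: int = 0,
                   lazy: bool = False) -> int:
    """Get with packs: one request per pack, plus one per file missing from the pack."""
    pristine = 1 + missing_pristine
    patches = 0 if lazy else 1 + missing_patches
    return pristine + patches


# Rounded-down figures for the Xmonad repository mentioned earlier.
PATCHES, PRISTINE_FILES = 1100, 30

print(requests_old_fashioned(PATCHES))
print(requests_hashed(PATCHES, PRISTINE_FILES))
print(requests_hashed(PATCHES, PRISTINE_FILES, lazy=True))
print(requests_packs())
print(requests_packs(lazy=True))
```

Under these assumptions the counts are 1100, 1130, 30, 2 and 1. Real transfers also depend on file sizes and latency, so the request count is only a proxy for the wait, but it shows where packs save the most: the per-file round trips.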
How much faster is this? Here are the figures we got on the Darcs repository (>8000 patches, >600 pristine files):
- full get without packs: 40 minutes
- full get with packs: 3 minutes
- lazy get without packs: 20 seconds
- lazy get with packs: 11 seconds
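Expressed as speedup factors, these figures work out as follows. This is a tiny Python sketch using only the four timings listed above; `speedup` is just a helper name for the example.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the second timing is than the first."""
    return before_seconds / after_seconds


full_get = speedup(40 * 60, 3 * 60)  # 40 minutes vs 3 minutes
lazy_get = speedup(20, 11)           # 20 seconds vs 11 seconds

print(round(full_get, 1))
print(round(lazy_get, 1))
```

That is roughly 13 times faster for a full get and a little under 2 times faster for a lazy get.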
You need to run darcs optimize --http manually on the public repository of your project from time to time, as this will not happen automatically. The best moment to do that is after pushing a tag, since this enables people who use get --tag X (with X being the last tag) to take advantage of the optimization.
Conclusion
These are probably not going to be the only new features that we will try to fit into the 2.8 release, so when we have more we will let you know. Before May 2011 we are going to try to release feature-based alpha versions of Darcs. They will be called Darcs 2.7.x and will be for users who want to try out bleeding-edge features and give us feedback.
If you can't wait, you can simply build the current Darcs HEAD with darcs get --lazy http://darcs.net and cabal update && cabal install -f-library inside of the obtained directory.
EDIT: removed darcs-fastconvert from the list since as of now this is not a supported feature of Darcs.