-
Notifications
You must be signed in to change notification settings - Fork 684
ChunkingNG: add '0' padding on the filename #5476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ogoffart, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ckamm, @jturcotte and @danimo to be potential reviewers. |
|
CC\ @DeepDiver1975 The server sorts the chunk alphabeticaly (10 < 2) (cf. the strcmp in https://github.com/owncloud/core/blob/5ce66c4d5f9a895bec41f56f7cc96661eec00ae6/apps/dav/lib/Upload/AssemblyStream.php#L69 ) |
src/libsync/propagateuploadng.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean "leading" 0
|
What is the purpose of this? Why do you care on the client side in which order the server keeps chunks internally? If you need on the client to establish order based on list of chunk names returned by the server then convert to int and sort. Adding leading 0 so the server by side effects gets alphabetical sorting right -- what a horrible idea. Will break the moment server decides to change ordering for any reason. Also, don't you need more zeros to prepend when sorting numbers with more than 2 digits? Likewise on the server. If you need to establish the order for concatenating the chunks => convert to int. BTW: The client should keep the state of the transfer based on the server responses. Normally there should be no reason to inquire server state. That's also recommended by S3 multipart upload (http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html) BTW2. I think you really need to think of this API one level higher: you get the list of chunks uploaded to the server. Not files uploaded to the server. There may be different implementations. These webdav calls could be also REST APIs. The API structure is the same -- just the HTTP verb is different. I think it makes no sense that client thinks of these chunks as individual files on the server. |
The server sorts the chunk by name alphabetically. So if we want to keep the chunk in order, we need to add a few '0' in front of the chunk name
I care in which order the chunks will be assembled: They will be assembled in alphabetical order. So if i upload chunks 0,1,2,3,4,5,6,7,8,9,10,11 I want them to be assembled in that order. Not 0,1,10,11,2,3,... That would be a file corruption.
The "rightJustified" function add a variable number of '0' depending of the size of the string.
That's what the old chunking do. The advantage of querrying the server is that we really know what are the chunks on the servers.
That's correct. The client don't think they are files, just entries in a webdav reply. |
The original idea of the protcol was to use integer numbers - it still should be from my pov. |
|
@ogoffart: Isn't it up to the server to assure correct chunk ordering based on the value you pass? You are passing an integer number (as string due to the way it is encoded in HTTP request). The server should take this number and do whatever necessary to assure the correct sequence. They may pad it with zeros or convert to int or whatever. I really think this should be fixed in the server for this special endpoint. I do not think that the client should assume anything how the server internally stores these chunks and if it does alphabetic sorting internally. From a point of view of the protocol this is a low-level implementation detail that makes the system very fragile in case of future changes. I completely agree with you that we should make sure the protocol doc reflect any changes or additional assumptions. The current description at https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md is not very complete and I can help fixing this but I would like to know what's described on Dragotin's blog is still valid or if some parts are outdated. For example, do you use the PROPFIND and do you assume anything about ordering of the chunk list that server provides to you? |
|
About your last question: in our implementation we do remember which chunks have been uploaded. However, ideally, the client should manage this state completely.There might be corner cases where you want to inquire about the status of chunks on the server but they should be quite rare I believe (if needed at all)? |
|
@moscicki Because of the server being able to timeout chunks we had decided that the state is managed on the server and the client inquires with a PROPFIND when an upload is started/resumed. I agree that the server should be fixed here, not the client. |
as said I'll have a look at this next week |
|
@DeepDiver1975 don't worry, not meant as a blame, more like a -1 to @ogoffart patch. |
|
Likewise here: no offense meant to anyone, please. For our endpoint implementation we will not care about leading zeros because we will convert to int. But for the sanity of the protocol I would suggest to make minimal assumptions and separate protocols from implementation side effects. |
|
It's possible that we'd like to encode more info in the name in the future, like the start offset in the file or the total number of chunks. For example if we want to use the same end point for delta sync. It could make sense to parse the current chunk number already up to a delimiter (e.g. a ".") so that we have some freedom regarding older implementations in the future. Or do we assume that we'll pass everything else through HTTP headers? |
|
How is the file offset provided today? I guess a header? |
|
Currently chunks are just concatenated at |
|
How does the client know in this case what was the chunk size used for this interrupted upload? If you only get the chunk number via propfind without the offset you must assume the chunk size is constant, don't you? About previous question on special headers, according to the doc below the chunk offset should be sent by the client as a header: https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md Could someone that knows the current client implementation verify if the doc is up-to-date and if it is complete? Thank you. |
|
Chunks, like files, are atomically added to WebDAV. Unfinished chunks aren't kept and that means that we can just use the size of each chunk as returned by PROPFIND, and add them up to the point of the offset of the next-to-do chunk (afterward just using the chunk size in the client's configuration). Different chunks can have different sizes for that reason, as long as chunks of a single file are uploaded serially, in order. |
|
Okey, that's clear. But why do you require sequential upload of the chunks? |
|
If we want parallel chunks, we'd actually need the offset in order to patch potential holes in the chain and avoid having to restart from the first missing chunk. If a following chunk has the time to finish but not its predecessor, we need the offset of the next chunk if we can't know the used chunk size of the missing chunk. In any case, we're still doing parallel upload of files (chunked or not), so it would mainly be a win when a single big file is being uploaded (which probably does happen), but with a big-enough chunk size, it probably won't upload faster in parallel anyway. |
|
Don't you have the chunk offset to be sent in OC-Chunk-Offset header? According to spec. The previous version (V1) of chunking allowed for parallel upload. This one does not. If you have a hole in transfer then you should also have a hole in the index number? Why not just calculate the offsets correctly from the PROPFIND and send the missing chunk again? |
|
We can do parallel upload, that's not a problem. |
|
Here are a bunch of other things I noticed while comparing the client behavior to the protocol:
|
|
@ogoffart You are right, merging this until the server is fixed. I think the server needs to use https://secure.php.net/manual/en/function.strnatcmp.php .. I'll try. |
|
You may also consider to activate it now via a server capability. You will need this anyway when the server fixes the issue such that you do not break older servers. It is cleaner to do it now via a capability such as "ChunkingNG.ZeroPaddedChunkNames" = 1. Otherwise you will have to come up with a capability to disable this in the future - not nice. |
For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md
|
@ckamm: thanks a lot for this investigation.
Please note that, following idea from @PVince81 I added 400 Bad Request specification: https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md#put Any comments on this? |
@moscicki Would you adjust https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md to allow for DELETE on chunks? The client can decide to do this if he decides that he aborts the upload or that he thinks he's better off with another chunk size etc. e.g. #5480 |
For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md
For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md Signed-off-by: Morris Jobke <[email protected]>
The server sorts the chunk by name alphabetically. So if we want to keep
the chunk in order, we need to add a few '0' in front of the chunk name