ChunkingNG: add '0' padding on the filename #5476

ogoffart · 2017-01-18T16:43:05Z

The server sorts the chunk by name alphabetically. So if we want to keep
the chunk in order, we need to add a few '0' in front of the chunk name

mention-bot · 2017-01-18T16:43:06Z

@ogoffart, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ckamm, @jturcotte and @danimo to be potential reviewers.

ogoffart · 2017-01-18T16:45:06Z

CC\ @DeepDiver1975 The server sorts the chunk alphabeticaly (10 < 2) (cf. the strcmp in https://github.com/owncloud/core/blob/5ce66c4d5f9a895bec41f56f7cc96661eec00ae6/apps/dav/lib/Upload/AssemblyStream.php#L69 )
Is that intentional, if yes, we need to document it on https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md

ckamm · 2017-01-18T18:08:36Z

src/libsync/propagateuploadng.cpp

you mean "leading" 0

guruz · 2017-01-18T18:28:49Z

I think it's misleading if you really need to do that.

@dragotin @moscicki @PVince81

moscicki · 2017-01-18T20:01:29Z

What is the purpose of this? Why do you care on the client side in which order the server keeps chunks internally? If you need on the client to establish order based on list of chunk names returned by the server then convert to int and sort. Adding leading 0 so the server by side effects gets alphabetical sorting right -- what a horrible idea. Will break the moment server decides to change ordering for any reason. Also, don't you need more zeros to prepend when sorting numbers with more than 2 digits?

Likewise on the server. If you need to establish the order for concatenating the chunks => convert to int.

BTW: The client should keep the state of the transfer based on the server responses. Normally there should be no reason to inquire server state. That's also recommended by S3 multipart upload (http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)

BTW2. I think you really need to think of this API one level higher: you get the list of chunks uploaded to the server. Not files uploaded to the server. There may be different implementations. These webdav calls could be also REST APIs. The API structure is the same -- just the HTTP verb is different. I think it makes no sense that client thinks of these chunks as individual files on the server.

The server sorts the chunk by name alphabetically. So if we want to keep the chunk in order, we need to add a few '0' in front of the chunk name

ogoffart · 2017-01-19T08:25:17Z

@moscicki

Why do you care on the client side in which order the server keeps chunks internally?

I care in which order the chunks will be assembled: They will be assembled in alphabetical order.

So if i upload chunks 0,1,2,3,4,5,6,7,8,9,10,11 I want them to be assembled in that order. Not 0,1,10,11,2,3,... That would be a file corruption.

don't you need more zeros to prepend when sorting numbers with more than 2 digits?

The "rightJustified" function add a variable number of '0' depending of the size of the string.

BTW: The client should keep the state of the transfer based on the server responses.

That's what the old chunking do. The advantage of querrying the server is that we really know what are the chunks on the servers.

BTW2. [...] These webdav calls could be also REST APIs. [...] I think it makes no sense that client thinks of these chunks as individual files on the server.

That's correct. The client don't think they are files, just entries in a webdav reply.
Is your point that your server will not have these individual file and it will therefore be hard for the server to remember the list of chunks with their name?
But that's part of the new chunking protocol: that you can query the existing chunks with PROPFIND.

DeepDiver1975 · 2017-01-19T08:31:20Z

CC\ @DeepDiver1975 The server sorts the chunk alphabeticaly (10 < 2) (cf. the strcmp in https://github.com/owncloud/core/blob/5ce66c4d5f9a895bec41f56f7cc96661eec00ae6/apps/dav/lib/Upload/AssemblyStream.php#L69 )
Is that intentional, if yes, we need to document it on https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md

The original idea of the protcol was to use integer numbers - it still should be from my pov.
Maybe the strcmp is simply wrong - I need to rethink on this - next week.

moscicki · 2017-01-19T09:00:15Z

@ogoffart: Isn't it up to the server to assure correct chunk ordering based on the value you pass? You are passing an integer number (as string due to the way it is encoded in HTTP request). The server should take this number and do whatever necessary to assure the correct sequence. They may pad it with zeros or convert to int or whatever. I really think this should be fixed in the server for this special endpoint. I do not think that the client should assume anything how the server internally stores these chunks and if it does alphabetic sorting internally. From a point of view of the protocol this is a low-level implementation detail that makes the system very fragile in case of future changes.

I completely agree with you that we should make sure the protocol doc reflect any changes or additional assumptions. The current description at https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md is not very complete and I can help fixing this but I would like to know what's described on Dragotin's blog is still valid or if some parts are outdated. For example, do you use the PROPFIND and do you assume anything about ordering of the chunk list that server provides to you?

moscicki · 2017-01-19T09:04:02Z

About your last question: in our implementation we do remember which chunks have been uploaded. However, ideally, the client should manage this state completely.There might be corner cases where you want to inquire about the status of chunks on the server but they should be quite rare I believe (if needed at all)?

guruz · 2017-01-19T11:23:52Z

@moscicki Because of the server being able to timeout chunks we had decided that the state is managed on the server and the client inquires with a PROPFIND when an upload is started/resumed.

I agree that the server should be fixed here, not the client.

DeepDiver1975 · 2017-01-19T11:42:53Z

I agree that the server should be fixed here, not the client.

as said I'll have a look at this next week

guruz · 2017-01-19T11:47:07Z

@DeepDiver1975 don't worry, not meant as a blame, more like a -1 to @ogoffart patch.

moscicki · 2017-01-19T13:45:27Z

Likewise here: no offense meant to anyone, please.

For our endpoint implementation we will not care about leading zeros because we will convert to int. But for the sanity of the protocol I would suggest to make minimal assumptions and separate protocols from implementation side effects.

jturcotte · 2017-01-19T14:42:06Z

It's possible that we'd like to encode more info in the name in the future, like the start offset in the file or the total number of chunks. For example if we want to use the same end point for delta sync.

It could make sense to parse the current chunk number already up to a delimiter (e.g. a ".") so that we have some freedom regarding older implementations in the future.

Or do we assume that we'll pass everything else through HTTP headers?

moscicki · 2017-01-19T14:56:44Z

How is the file offset provided today? I guess a header?

jturcotte · 2017-01-19T15:23:11Z

Currently chunks are just concatenated at MOVE time by the server, and the client calculates the source offset of the next chunks by doing an initial PROPFIND of any existing temp upload folder (e.g. when restarted during upload).

moscicki · 2017-01-19T15:55:27Z

How does the client know in this case what was the chunk size used for this interrupted upload? If you only get the chunk number via propfind without the offset you must assume the chunk size is constant, don't you?

About previous question on special headers, according to the doc below the chunk offset should be sent by the client as a header: https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md

Could someone that knows the current client implementation verify if the doc is up-to-date and if it is complete? Thank you.

jturcotte · 2017-01-19T16:03:17Z

Chunks, like files, are atomically added to WebDAV. Unfinished chunks aren't kept and that means that we can just use the size of each chunk as returned by PROPFIND, and add them up to the point of the offset of the next-to-do chunk (afterward just using the chunk size in the client's configuration). Different chunks can have different sizes for that reason, as long as chunks of a single file are uploaded serially, in order.

moscicki · 2017-01-19T16:13:12Z

Okey, that's clear. But why do you require sequential upload of the chunks?

jturcotte · 2017-01-19T16:18:21Z

If we want parallel chunks, we'd actually need the offset in order to patch potential holes in the chain and avoid having to restart from the first missing chunk.

If a following chunk has the time to finish but not its predecessor, we need the offset of the next chunk if we can't know the used chunk size of the missing chunk.

In any case, we're still doing parallel upload of files (chunked or not), so it would mainly be a win when a single big file is being uploaded (which probably does happen), but with a big-enough chunk size, it probably won't upload faster in parallel anyway.

moscicki · 2017-01-19T17:14:14Z

Don't you have the chunk offset to be sent in OC-Chunk-Offset header? According to spec.

The previous version (V1) of chunking allowed for parallel upload. This one does not. If you have a hole in transfer then you should also have a hole in the index number? Why not just calculate the offsets correctly from the PROPFIND and send the missing chunk again?

ogoffart · 2017-01-19T18:43:26Z

We can do parallel upload, that's not a problem.
When we resume, we resume from the first chunk that was not successfull.

ogoffart · 2017-01-20T09:35:53Z

@guruz:

more like a -1 to @ogoffart patch.

I think we should merge this patch anyway before the beta if the server is not fixed yet.
We can revert it later if released version of the server don't have this problem.

ckamm · 2017-01-20T10:22:07Z

@moscicki @ogoffart

Here are a bunch of other things I noticed while comparing the client behavior to the protocol:

If there is a chunk ordering through the numbers as well as the OC-Chunk-Offset header, what should happen during MOVE precisely? Assembly based on the chunk ordering, using the chunk offset values for validation? (the current oc server version just assembles based on order and doesn't care about chunk-offset)
Is there a particular use for OC-Total-Length? I propose it should be used for detecting missing data in ChunkingNG: Missing chunks are silently ignored core#26988.
The spec mentions If-Destination-Match. The actual implementation sends If: <$dest>([ $match ])
The client currently sometimes uses DELETE on chunks when resuming. There's no mention of that in the protocol document.

guruz · 2017-01-20T15:45:53Z

@ogoffart You are right, merging this until the server is fixed.

I think the server needs to use https://secure.php.net/manual/en/function.strnatcmp.php .. I'll try.

moscicki · 2017-01-20T16:06:32Z

You may also consider to activate it now via a server capability. You will need this anyway when the server fixes the issue such that you do not break older servers. It is cleaner to do it now via a capability such as "ChunkingNG.ZeroPaddedChunkNames" = 1. Otherwise you will have to come up with a capability to disable this in the future - not nice.

For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md

guruz · 2017-01-20T16:50:45Z

-> owncloud/core#26995

moscicki · 2017-01-20T20:35:49Z

@ckamm: thanks a lot for this investigation.

Very good point. I would assume that the order goes by chunk number and that all Chunk-Offset+Content-Lenght must match the entire file range (as declared by OC-Total-Length). Essentially if this is not met then it means that some bytes were not sent and this is a clear error. This error should result in wrong checksum. Do we want to add a possibility to return 400 Bad Request to MOVE in case not all bytes were sent?
Detecting missing data is one application. Another is that you may want to reserve space in your storage system upfront knowing the size. Another is that you can check quota and reject the initial request straightaway. BTW: do we report quota errors on PUT or in this case MKCOL?
Okey, something to be fixed. BTW. Couldn't we use If-Match like for the normal upload?
Why would the client DELETE already uploaded chunks? Does it make any sense?

Please note that, following idea from @PVince81 I added 400 Bad Request specification: https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md#put

Any comments on this?

guruz · 2017-01-26T12:10:25Z

The spec mentions If-Destination-Match. The actual implementation sends If: <$dest>([ $match ])
The client currently sometimes uses DELETE on chunks when resuming. There's no mention of that in the protocol document.

@moscicki Would you adjust https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md to allow for DELETE on chunks? The client can decide to do this if he decides that he aborts the upload or that he thinks he's better off with another chunk size etc. e.g. #5480

For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md

For owncloud/client#5476 Before this, the assembly could be bogusly in the order 0,1,10,11,2,3 etc. As per the spec "The name of every chunk should be its chunk number." https://github.com/cernbox/smashbox/blob/master/protocol/chunking.md Signed-off-by: Morris Jobke <[email protected]>

ckamm reviewed Jan 18, 2017

View reviewed changes

src/libsync/propagateuploadng.cpp Outdated

Copy link

Contributor

ckamm Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean "leading" 0

ChunkingNG: add '0' padding on the filename

403553a

The server sorts the chunk by name alphabetically. So if we want to keep the chunk in order, we need to add a few '0' in front of the chunk name

ogoffart force-pushed the chunking-ng branch from 3d0519f to 403553a Compare January 19, 2017 08:09

PVince81 mentioned this pull request Jan 19, 2017

Use chunking in web upload owncloud/core#26306

Closed

16 tasks

guruz added this to the 2.3.0 milestone Jan 20, 2017

guruz merged commit d6fdda8 into master Jan 20, 2017

guruz deleted the chunking-ng branch January 20, 2017 15:04

guruz mentioned this pull request Jan 20, 2017

Chunking NG: Assemble in natural sort order of files owncloud/core#26995

Merged

moscicki mentioned this pull request Jan 20, 2017

Chunking-NG protocol specification clarifications cernbox/smashbox#76

Open

MorrisJobke mentioned this pull request Mar 17, 2017

Chunking NG: Assemble in natural sort order of files nextcloud/server#3886

Merged

ChunkingNG: add '0' padding on the filename #5476

ChunkingNG: add '0' padding on the filename #5476

Uh oh!

Conversation

ogoffart commented Jan 18, 2017

Uh oh!

mention-bot commented Jan 18, 2017

Uh oh!

ogoffart commented Jan 18, 2017

Uh oh!

ckamm Jan 18, 2017

Choose a reason for hiding this comment

Uh oh!

guruz commented Jan 18, 2017

Uh oh!

moscicki commented Jan 18, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogoffart commented Jan 19, 2017

Uh oh!

DeepDiver1975 commented Jan 19, 2017

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

guruz commented Jan 19, 2017

Uh oh!

DeepDiver1975 commented Jan 19, 2017

Uh oh!

guruz commented Jan 19, 2017

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

jturcotte commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

jturcotte commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

jturcotte commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

jturcotte commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

moscicki commented Jan 19, 2017

Uh oh!

ogoffart commented Jan 19, 2017

Uh oh!

ogoffart commented Jan 20, 2017

Uh oh!

ckamm commented Jan 20, 2017

Uh oh!

guruz commented Jan 20, 2017

Uh oh!

moscicki commented Jan 20, 2017

Uh oh!

guruz commented Jan 20, 2017

Uh oh!

moscicki commented Jan 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

guruz commented Jan 26, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

moscicki commented Jan 18, 2017 •

edited

Loading

jturcotte commented Jan 19, 2017 •

edited

Loading

jturcotte commented Jan 19, 2017 •

edited

Loading

jturcotte commented Jan 19, 2017 •

edited

Loading

jturcotte commented Jan 19, 2017 •

edited

Loading

moscicki commented Jan 20, 2017 •

edited

Loading

guruz commented Jan 26, 2017 •

edited

Loading