...making Linux just a little more fun!

<-- 2c Tips | TAG Index | 1 | 2 | 3 | 4 | Knowledge Base | News Bytes -->

The Answer Gang

By Jim Dennis, Jason Creighton, Chris G, Karl-Heinz, and... (meet the Gang) ... the Editors of Linux Gazette... and You!



(?) Question about file mtime on linux

From Suramya Tomar

Answered By: Benjamin Okopnik, Pete Jewell, John Karnes, Mike Orr, Jay R. Ashworth, Kapil Hari Paranjape

Hi Everyone, I have a question for you about file mtime(modify time) on linux. Does the mtime stamp of a file change as soon as a process starts modifying the file or does it change it after the change is done?

(!) [Ben] After the change is done, of course; up until that time, the file has only been read, not written to. What is modified (in e.g., an editor) is a copy of the file, generally held in memory - which is why the "save" function exists. Otherwise, a crash in the middle of editing would destroy your original file.

(?) The reason I am interested in this is that I am writing a perl script which is supposed to monitor a file for changes and as soon as the change is done run another script which processes the changed file. I don't want the second script to run before the file change is complete(Which would happen if the mtime changes as soon as the modification starts).

So how would I check that the file change is done?One way would be to keep checking in a loop if the mtime changed if it did goto sleep and check again after a few seconds, keep repeating this until the mtime stop changing. But in my opinion this is a stupid way of doing this so I am hoping one of you has a better way of doing it.

(!) [Kapil] File locking might be useful. The script sees the change in the mtime, then waits for the lock to go away and starts processing. You would have to tell the modifying program to use file locking though.
(!) [Sluggo] I would agree to use the kernel monitor first. But for alternatives...
Why not have the first process send the second process a signal when it's done? Put the observing process in indefinate sleep, and have the write process send a SIGALRM (Alarm) to kick it. You'd prob'ly want to put the observer's PID in a well-known file so the other process can find it.
Or use file locking. The writer holds a write lock till it's done. The observer sees the mtime change and acquires a read lock. The read lock blocks until the write lock has released.
If you really want to know when the mtime is set, look in the kernel source. or the libc source, and see what fwrite() does. That's one of the advantages of open-source software. Even if you don't know much C, you can still tell whether the word "mtime" appears above or below a write() call.
Or just think about why most programs don't have to worry about this. Unix programs tend to open-write-close quickly, and close the file when they don't need it. (As opposed to Windows text editors, which often hold the file open the whole time.) Unless the program has to hold the file open for a long time (e.g., streaming log entries), your chances of hitting the file in the middle of an update are pretty slim. Then think about, what's the worst that would happen if you did? Your observer would produce garbled output, with part of one version and part of another. Or maybe it would crash. Would this be the end of the world or a minor inconvenience? At least it would tell you how (in)frequently such a collision is occurring.
(!) [Pete] From past experience I seem to remember that the only 'pure perl' way to ensure that a file is not being modified, without relying on file locks, is to check the size/mtime of the file, wait a bit, and then check it again, repeat until the two size/mtimes are the same.
The reason I ran into this was because I was writing a routine to process a file after it had been uploaded to an ftp server. In the end, because we had control over the ftp server, we configured it so that the uploader and my routine both logged into the ftp server using the same username, and restricted the number of times a user could concurrently login to 1. This did the trick quite nicely.
A good resource for perl questions is the Perl Monks website http://www.perlmonks.org - in fact I would go so far as to say that it is the best resource for perl information on the web.
(!) [Ben] [grin] Multiple xterms are useful for this. In one of them, run this program:

See attached mike.show-mtime.pl.txt

where "foo" is the file you're looking at; this will print the mtime of 'foo' once a second. In another xterm, open 'foo' and modify to your heart's content - all the while glancing at the first term. Nice and easy.
(!) [John Karns] How about making a call to 'lsof' to see if the file is open?
I'm not sure if a latency in the kernel flushing disk buffers would be a concern in this kind of scenario. If so, you might want to have either one of the processes make a call to flush the buffers to ensure that there is not a pending update to the file.
In fact I've sometimes wondered about this myself: if there is a pending write to a file via a dirty buffer, is that automatically taken into account if I read the file before the buffer is flushed? I.e., is the pending change transparently mapped to a read of the file by libc or whatever?
(!) [Sluggo] It would be a very severe bug if it didn't read from the buffer, since that's the official version currently. I've sometimes wondered this myself, but I've found the kernel developers pretty trustworthy so I assumed they wouldn't do such a thing.
(!) [Jay] It seems to me that the kernel should update the mtime in the inode (as the inode is transparently cached in RAM) *everytime a write(1) call is made to the file*.
So, at this point, your question expands to "if someone makes a write call which takes a finite amount of realtime to execute (like, writing 1MB from a RAM buffer to a file), at which end of the execution of that write call will the inode get updated?"
IANAKH, but I believe the pertinent code is in kernel/fs/$FILESYS/file.c
http://www.ibiblio.org/navigator-bin/navigator.cgi?fs/ext2/file.c
and specifically the function ext2_file_write() (for ext2).
Short version:
  1. ext2 updates the inode after it writes the actual data,
  2. since it's buffered, the order in which the actual bytes hit the disk is indeterminate, and
  3. it's filesystem specific, since the answer to the question lives in a filesystem specific location in the source.
The specific answer lies in the relative positions of ll_rw_block() and the assignment to inode->i_mtime in the function.

(?) Wow, I didn't expect my question to generate such a big response. Now I know why I like TAG :)

NEways I did a little research after reading all your answers and found the following:

The file modification time changes continuously when
the file is being created/modified. (For example when
creating the file using cp).

To test this I wrote the following perl script that displays the mtime of my test file every 1 second:

See attached suramya.show-mtime.pl.txt

Then I opened another xterm window and ran the following command: 'cp /var/log/messages test.dat'

suramya@StarKnight:~/Temp$ perl test.pl
Time: 1113452801
Time: 1113452801
Time: 1113452801
Time: 1113452801
Time: 1113452801
Time: 1113452801
Time: 1113452801
Time: 1113452890  <-- Changed
Time: 1113452891  <-- Changed
Time: 1113452891
Time: 1113452893  <-- Changed
Time: 1113452894  <-- Changed
Time: 1113452895  <-- Changed
Time: 1113452896  <-- Changed

As you can see when the copy started the numbers started changing. So now we know that the mtime keeps changing when the file is being modified. And if you think about it, it makes sense: The mtime changes whenever any changes are made to the inode's used by the file so when the file is being created new inodes are being used constantly so the mtime has to change.

Now I will be looking into the other suggestions you guys made and see if I can get this to work. (And no this is not a school project  :) ) I need to export data from an oracle DB to a CSV file and have another script read this CSV file and process it. If the second script reads a half written file 'Bad Things' (TM) will happen.

(!) [Pete] If that's the case, then a low tech solution might suffice. Write your CSV file out from oracle as something like 'temp_output.csv', and then have your oracle process rename the file once it's fully exported. Then your perl script is only looking for the renamed file, instead of trying to guess when the file creation is complete.
(!) [Ben] Oh, nice solution, Pete! You could even follow the (more or less) standard practice of "building" the file in a temp dir, then moving it into the location from which it should be copied. As long as the temp and the final locations are on the same partition, that's just a matter of changing the inode info - which is an atomic op, so there's no chance of retrieving a partial file.

(?) Thanks a lot. This solution works perfectly for my task. Makes my life a lot simpler too :)


This page edited and maintained by the Editors of Linux Gazette
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/


Each TAG thread Copyright © its authors, 2005

Published in issue 114 of Linux Gazette May 2005

<-- 2c Tips | TAG Index | 1 | 2 | 3 | 4 | Knowledge Base | News Bytes -->
Tux