How big is Spotlight's index?
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
Just curious. What is it called and how can I find it to see how big it might be?
"Ahhhhhhhhhhhhhhhh"
Grizzled Veteran
Join Date: May 2001
Location: Ca
I was wondering that as well.
Anyone?
With some loud music + a friend to chat nearby you can get alot done. - but jezz, I'd avoid it if I had the choice---- If only real people came with Alpha Channels.......:)
AIM:xflaer
deinterlaced.com
Senior User
Join Date: Nov 1999
Location: Milkyway Galaxy
Simple:
Check out the hidden folder .Spotlight-V100/ at the root of each of your volumes. Inside that folder you'll find these files, which make up the index:
.journalHistoryLog
ContentIndex.db
store.db
.store.db
_rules.plist
PS. You need to be root to access the .Spotlight-V100/ directory, not just an admin.
---
I Know It All
Posting Junkie
Join Date: Dec 2000
You can see how large your Spotlight index is by typing this:
sudo du -sh /.Spotlight-V100/
Mine is 369 MB. 
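For anyone with more than one volume, the same check can be wrapped in a loop. This is only a sketch: `index_sizes` is a made-up helper name, and it assumes each index lives in `.Spotlight-V100` at the root of the volume, as described earlier in the thread. Sizes are reported in KB via `du -sk`.

```shell
# Print the size (in KB) of the Spotlight index directory on each volume root
# passed as an argument.
# Assumption: the index is the .Spotlight-V100 directory at the volume root.
index_sizes() {
    for root in "$@"; do
        # Strip a trailing slash so "/" becomes "/.Spotlight-V100", not "//..."
        dir="${root%/}/.Spotlight-V100"
        if [ -d "$dir" ]; then
            du -sk "$dir"
        else
            echo "no index: $root"
        fi
    done
}
```

Reading the real directories needs root, so on a Mac you would run something like `sudo sh -c '. ./index_sizes.sh; index_sizes / /Volumes/*'`, where `index_sizes.sh` is a hypothetical file holding the function above.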
Clinically Insane
Join Date: Nov 1999
I would imagine that it depends largely upon how many files you have accessible to you. The size is unlikely to matter as much as the number, so a big drive with only a few files, such as one used for capturing video, would still have a smaller index than a small drive with many files, such as the drive you boot from.
You are in Soviet Russia. It is dark. Grue is likely to be eaten by YOU!
Mac Elite
Join Date: Nov 2001
Originally Posted by Millennium
I would imagine that it depends largely upon how many files you have accessible to you. The size is unlikely to matter as much as the number, so a big drive with only a few files, such as one used for capturing video, would still have a smaller index than a small drive with many files, such as the drive you boot from.
This is going to be highly dependent on the file types. That's because Spotlight will do full-text searches on "text" files (plain text, Word, PDF, every email, etc.), so if you have 10,000 1 MB text documents, you're going to have a HUGE index file.
Larger files (videos, photos, etc.) that aren't text will just have their metadata indexed, so they won't add as much to the index.
Clinically Insane
Join Date: Nov 1999
Originally Posted by CatOne
This is going to be highly dependent on the file types. That's because Spotlight will do full-text searches on "text" files (plain text, Word, PDF, every email, etc.), so if you have 10,000 1 MB text documents, you're going to have a HUGE index file.
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is a small price compared with the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.
Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
Originally Posted by Millennium
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is a small price compared with the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.
Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
Yeah, PDFs will definitely be larger than any native word processing file, including Word, since the format encapsulates a variety of information in order to display the file as it was seen in its native form. However, I don't see how that extra data would increase the size of the Spotlight index for that file type. As others have said, Spotlight is simply indexing text.
"The natural progress of things is for liberty to yield and government to gain ground." TJ
Fresh-Faced Recruit
Join Date: Mar 2005
Location: Uk
Originally Posted by Millennium
Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is a small price compared with the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.
The performance overhead in opening the file and reading it to determine its contents isn't slight but massive.
Isn't the whole point of Spotlight that almost all the info it needs about a file is stored in the database, allowing fast searches?
Most text files people make are going to contain a small number of words repeated frequently. If the database can get away with storing a reference to each word and its frequency, most documents will take up a lot less space in the database.
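To get a feel for why a word list can be so much smaller than the text it came from, here is a rough sketch using standard Unix tools. It is not how Spotlight builds its index (nobody in this thread knows the actual format); `word_counts` is a hypothetical helper that simply reduces text to its distinct words and their frequencies.

```shell
# Reduce text on stdin to one line per distinct word, "<count> <word>",
# most frequent first. Words are lowercased and anything non-alphabetic
# is treated as a separator.
word_counts() {
    tr -cs '[:alpha:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Comparing `wc -w < some_document.txt` (total words) with `word_counts < some_document.txt | wc -l` (distinct words) on a file of your own shows how much repetition there is to exploit.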
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
I mostly have Photoshop files. Anywho, mine is 148 MB.
Mac Elite
Join Date: Jan 2000
Location: Seattle, WA, King
Spotlight will only index the first few hundred KB of files, IIRC.
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
Originally Posted by stinch
The performance overhead in opening the file and reading it to determine its contents isn't slight but massive.
Isn't the whole point of Spotlight that almost all the info it needs about a file is stored in the database, allowing fast searches?
Most text files people make are going to contain a small number of words repeated frequently. If the database can get away with storing a reference to each word and its frequency, most documents will take up a lot less space in the database.
You're partially missing the point - it's an index rather than a database.
Posting Junkie
Join Date: Dec 2000
Originally Posted by Big Mac
You're partially missing the point - it's an index rather than a database.
Actually, the files within the invisible /.Spotlight-V100 directory have the extension .db. To me, that stands for "database."
Professional Poster
Join Date: Jun 2001
Location: Northwest Ohio
Originally Posted by CharlesS
Actually, the files within the invisible /.Spotlight-V100 directory have the extension .db. To me, that stands for "database."
It's an index stored in a database. 
Mac Elite
Join Date: Nov 2001
Originally Posted by Millennium
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is a small price compared with the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.
Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
Millennium, you're a software developer, right? Do you have Apple's developer documentation library on your machine? It's > 450 MB of PDFs. And most of that is text (sure, you can get a 120-page manual on the //vx if you want, and 5 copies of iMac technical specifications).
Also, I have > 100 MB of emails. And that's pretty much 100% text, because the attachments are in a separate folder. Many of those emails are > 80 KB in size, because people don't properly trim the emails on replies and they get long response chains. Gotta love sales reps.
Spotlight doesn't store the FULL text in its index (I'm sure it cuts out ifs, ands, and buts). But it really does have to index many of the words in there... how else could you find "Pythagoras" in a PDF without putting it in an index somewhere? Are you going to hash each word in some way?
Mac Elite
Join Date: Nov 2001
[turbo:/$] sudo du -sk .Spotlight-V100/
236716 .Spotlight-V100/
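For comparison with the other figures in the thread: `du -sk` reports kilobytes, so dividing by 1024 gives megabytes. A tiny sketch (`kb_to_mb` is just an illustrative name, and it rounds down to whole megabytes):

```shell
# Convert a size in kilobytes (as printed by `du -sk`) to whole megabytes.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}
```

`kb_to_mb 236716` prints 231, so the index above is roughly 231 MB.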